I often spend a lot of time setting up the same software over and over again for different purposes. To save time, I decided to write down some useful procedures and hints here. I hope they are helpful to me and to others who visit.

FW: List the revisions of a conda environment

posted Feb 12, 2018, 8:05 PM by Teng-Yok Lee

$ conda list --revisions


Git tips

posted Feb 4, 2018, 6:36 AM by Teng-Yok Lee   [ updated Feb 4, 2018, 6:37 AM ]

This is a collection of tips for fixing common problems I run into.

Ignore the change of file modes

$ git config core.fileMode false

Commonly used dpkg & apt-get options

posted Jan 24, 2018, 6:07 PM by Teng-Yok Lee   [ updated Jan 24, 2018, 6:09 PM ]

Given a file path, query which package it belongs to (REF):
$ dpkg -S <filepath>

Query files in a package:
$ dpkg -L <package>

List installed packages (REF):
$ apt list --installed

FW: How to debug a release build

posted Jul 7, 2017, 6:38 AM by Teng-Yok Lee


With CMake, the same effect can be achieved by modifying the following CMake variables:


Tips for Bash Programming

posted Jun 25, 2017, 7:05 PM by Teng-Yok Lee   [ updated Jun 27, 2017, 8:02 AM ]

  • Add "set -o errexit" to exit when a command fails.
  • Add "set -o nounset" to exit when an undefined variable is used.
  • Declare constant variables with readonly.
How to enable the debug mode? (REF, REF)
  • -v: Run the shell script in verbose mode (print each line as it is read).
  • -n: Read the commands without executing them (syntax check only).
  • -x: Show the executed commands and their arguments.
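
These options can be tried out without writing a script file. The sketch below is only an illustration, assuming bash is available on the PATH: it is a small Python harness that passes short snippets to bash -c and inspects the exit code and output. The helper name run_bash and the variable UNDEFINED_DEMO_VAR are made up for this example.

```python
import subprocess

def run_bash(script, *flags):
    """Run a bash snippet (hypothetical helper) and return the completed process."""
    return subprocess.run(
        ["bash", *flags, "-c", script],
        capture_output=True,
        text=True,
    )

# errexit: the script stops at the first failing command ("false").
errexit = run_bash("set -o errexit\necho start\nfalse\necho unreachable")

# nounset: expanding an undefined variable aborts the script.
nounset = run_bash('set -o nounset\necho "$UNDEFINED_DEMO_VAR"\necho unreachable')

# -n: the command is parsed but not executed, so nothing is printed.
dry_run = run_bash("echo hi", "-n")

# -x: each executed command is traced on stderr before it runs.
traced = run_bash("echo hi", "-x")

if __name__ == "__main__":
    print("errexit exit code:", errexit.returncode, "stdout:", repr(errexit.stdout))
    print("nounset exit code:", nounset.returncode, "stderr:", nounset.stderr.strip())
    print("-n stdout:", repr(dry_run.stdout))
    print("-x stderr:", traced.stderr.strip())
```

In a real script the two set lines normally go right after the shebang, so that every later command is covered.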

Test the slideshow of Google+ photo album

posted Oct 15, 2016, 5:53 AM by Teng-Yok Lee   [ updated Oct 15, 2016, 7:57 PM ]

This post is to test the slideshow feature of Google+ photo on Google Site. Unfortunately it does not work...

Maybe the easiest way is to insert an animated .gif? e.g.

With imagemagick, I can combine multiple images into a single GIF. The command is:

$ convert -delay 100 -loop 1 *.JPG firework.gif

A result is below:

Run Apache Spark with Docker

posted May 5, 2015, 7:32 PM by Teng-Yok Lee   [ updated May 9, 2015, 8:34 AM ]

I want to learn Spark, but I don't have a cluster, so I use Docker to simulate one for practice. This memo mainly reorganizes multiple online tutorials:

Prepare a virtual machine

This step is needed since I am using Windows. At first I used the virtual machine that came with boot2docker, but its root (/) was stored in RAM, so I lost all configuration changes (e.g. to Bash) after rebooting the guest OS. I therefore decided to install Ubuntu on VirtualBox instead. The tutorial in the link below has clear illustrations.


  1. Create a disk with at least 20GB. At first I only prepared 8GB, which quickly ran out of space.
  2. Also, don't use a fixed-size disk because it cannot be resized later if needed (REF).
  3. Assign enough CPUs. I used 4 cores.
  4. Once the guest Ubuntu is installed, install openssh-server (REF) in order to log in to Ubuntu via PuTTY.

Install Docker

Docker's official site has clear instructions. Once logged in to Ubuntu, run the following two commands:

$ wget -qO- | sh

$ sudo docker run hello-world

Run spark

Nevertheless, the git repo URL is different. Now the git command should be:

$ git clone -b blogpost

Then launch the docker containers for spark:
$ sudo ./docker-scripts/deploy/ -i amplab/spark:0.8.0 -c

  1. These scripts do not work for newer versions of Spark. First, they only support versions up to 1.0.0. Second, the script for Spark 1.0.0 does not work: the docker command keeps waiting for the master, which never becomes ready because the master cannot launch Spark.
  2. This command launches a Scala shell. To run pyspark, type exit to terminate this shell first (otherwise it takes all CPUs for its own workers).

Run PySpark

The following is based on the instructions by Aris. The previous step should have output to indicate the master information. For instance:


start shell via:            sudo /home/leeten/projects/docker-scripts/deploy/ -i amplab/spark-shell:0.8.0 -n 5b37cadb558db3380eef69adfd9bcc533dd98a604f529447c81331533dfa951b

visit Spark WebUI at:

visit Hadoop Namenode at:

ssh into master via:        ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@

/data mapped:

kill master via:           sudo docker kill 7fff30fe8ef3b766504844e0f5eace95a10c66d1e72327fbdd5604b5b8536a16


Now you can log in to the master after changing the permissions of the id_rsa file:

$ chmod 400 /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa
$ ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@
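
The chmod step matters because OpenSSH refuses a private key file that is accessible by group or others. As a rough sketch of what mode 400 means (my own illustration for a POSIX system; the function name is_private and the temporary file are made up and only stand in for the real id_rsa), the Python snippet below sets the permission and checks that no group/other bits remain:

```python
import os
import stat
import tempfile

def is_private(path):
    """Return True if no permission bit is set for group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0

# A temporary file stands in for the key file (hypothetical example).
fd, key_path = tempfile.mkstemp()
os.close(fd)

os.chmod(key_path, 0o644)           # readable by everyone: too open for a key
before = is_private(key_path)

os.chmod(key_path, 0o400)           # same effect as "chmod 400 <file>"
after = is_private(key_path)
mode_after = stat.S_IMODE(os.stat(key_path).st_mode)

os.remove(key_path)

if __name__ == "__main__":
    print("private before chmod:", before)
    print("private after chmod:", after, "mode:", oct(mode_after))
```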

On the docker container, launch pyspark:

$ /opt/spark-0.8.0/pyspark

Example: Estimate PI

The following Python code estimates pi. It is based on the code segment in Spark Examples, and the source code can be found on git. However, the current version fails at the statement that creates the SparkContext under Spark 0.8.0, so I revised it as follows:

# REF:
# Complete (But not runnable code):

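import sys
from random import random
from operator import add

# Number of random points to draw. The value is not given in the original
# snippet; 100000 is an assumption, adjust as needed.
NUM_SAMPLES = 100000

def sample(p):
    # Draw a point in the unit square; return 1 if it falls inside the unit circle.
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

# sc is the SparkContext that the pyspark shell creates on startup.
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)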
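
To sanity-check the sampling logic without a Spark cluster, the same map/reduce structure can be sketched in plain Python 3. This is only an illustration, not the Spark example itself: estimate_pi and its seed parameter are names I made up, and the built-in map and functools.reduce stand in for the RDD operations.

```python
import random
from functools import reduce
from operator import add

def sample(_):
    """Draw a point in the unit square; return 1 if it lies inside the unit circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0

def estimate_pi(num_samples, seed=0):
    """Estimate pi as 4 * (fraction of points inside the quarter circle)."""
    random.seed(seed)  # fixed seed so that repeated runs give the same estimate
    count = reduce(add, map(sample, range(num_samples)))
    return 4.0 * count / num_samples

if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000))
```

The fraction of points inside the quarter circle approaches pi/4, which is why the count is multiplied by 4. The estimate improves slowly: the error shrinks roughly with the square root of the number of samples.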


What if docker keeps waiting for the master?

You can log in to the master manually. As mentioned in the previous section, the following command prints the master's IP.

$ ./docker-scripts/deploy/ -i amplab/spark:0.8.0 -c

Then you can directly log in to the host.

$ ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@

Once logged in, check whether Spark is running, or launch Spark manually to see whether it works. In my case, the master failed to launch Spark, so docker kept waiting.

Error: WARN ClusterScheduler: Initial job has not accepted any resource

If spark or pyspark shows the following message, it means that no worker is available (REF):

WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

You can check the spark status on the WebUI, as shown above. In this example, it is


My Python Porting of TestScatterPlotMatrix for VTK

posted Apr 12, 2015, 6:25 AM by Teng-Yok Lee   [ updated Apr 12, 2015, 6:26 AM ]

This code is based on the sample:

I ported it to Python and load the iris dataset as a demo.

import csv

import numpy as np
import vtk
import vtk.util.numpy_support as VN

# Load the iris dataset.
# NOTE: It is downloaded from:

csv_filepath = r'F:\data\multivariate\iris\iris.csv'
n_cols = 4
csv_table = np.loadtxt(open(csv_filepath, "rb"), delimiter=",", usecols=xrange(n_cols))

# Convert into the table format for VTK.
vtk_table = vtk.vtkTable()
for columni in range(n_cols):
    # Convert numpy array to vtk array.
    # Note:
    array = VN.numpy_to_vtk(np.ascontiguousarray(csv_table[:, columni]), deep=1)
    array.SetName("%d" % (columni))

# REF:
matrix = vtk.vtkScatterPlotMatrix()

# Fine tune the color if needed.
# matrix.SetPlotColor(matrix.SCATTERPLOT, vtk.vtkColor4ub(0, 0, 0, 1))
# matrix.SetPlotColor(matrix.ACTIVEPLOT, vtk.vtkColor4ub(0, 0, 0, 1))

matrix_view = vtk.vtkContextView()






Use libSDF to read SDF format on Windows

posted Mar 20, 2015, 9:46 AM by Teng-Yok Lee   [ updated Mar 20, 2015, 9:47 AM ]

This is my memo on reading the data for the IEEE SciVis 2015 Contest. The files are in SDF format. There is a C/C++ library, libSDF, to open them, but some of its instructions are unclear, especially for porting to Windows. Also, I could not find examples of its usage, so I wrote one:


To build libSDF for Visual Studio 2010, I use cygwin and mingw x64. The procedure to install them can be seen here:

Build libSDF for Windows x64 platform

  • Edit SDFfuncs.c: Remove the preprocessor USE_ALLOCA.
  • Edit utils.c: Change the function MPMY_Fopen() to open the file as binary (otherwise not all bytes can be read):
MPMYFile *MPMY_Fopen(const char *path, int mpmy_flags)
{
    MPMYFile *fp;
    int iomode = MPMY_SINGL;
    char mode[8] = {[0] = 'r'};

    Msgf("Fopen %s\n", path);
    if (mpmy_flags & MPMY_RDONLY) mode[0] = 'r'; /* MPMY_RDONLY is 0 since O_RDONLY is (stupidly) 0 */
    if (mpmy_flags & MPMY_WRONLY) mode[0] = 'w';
    if (mpmy_flags & MPMY_APPEND) mode[0] = 'a';
    /* Append 'b' so that the file is opened in binary mode. */
    switch(mode[0]) {
    case 'r':
        strcpy(mode, "r+b");
        break;
    case 'w':
        strcpy(mode, "w+b");
        break;
    case 'a':
        strcpy(mode, "a+b");
        break;
    }
    /* ... (rest of the function omitted) ... */

  • Edit Makefile: Change CC from gcc to the one for mingw.
  • Use cygwin to build the library.
$ make libSDF.a
$ /usr/bin/x86_64-w64-mingw32-dllwrap.exe --export-all-symbols *.o -lm --output-def libSDF_x64.def -o libSDF_x64.dll

  • Use Visual Studio (64-bit) Command Prompt to build the import library (.lib) for the .dll from the .def file:

lib.exe /machine:X64 /def:libSDF_x64.def
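
About the change to MPMY_Fopen() above: binary mode matters because text mode may translate line-ending bytes (and on Windows the C runtime also treats the byte 0x1A as end of file), so the bytes read can differ from the bytes on disk. The Python sketch below is only an analogy, not libSDF code: it shows Python's own text-mode newline translation altering a payload, while binary mode returns it intact. The file name and payload are made up for this example.

```python
import os
import tempfile

# Made-up payload containing bytes that look like line endings.
payload = b"\x00\x01\r\n\x02\r\x03"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode: every byte comes back exactly as written.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode: "\r\n" and "\r" are both translated to "\n" while reading.
    with open(path, "r", encoding="latin-1") as f:
        text_data = f.read()

if __name__ == "__main__":
    print("bytes written:      ", len(payload))
    print("read in binary mode:", len(binary_data))
    print("read in text mode:  ", len(text_data))
```

For binary data such as SDF arrays, a single translated or dropped byte shifts every value after it, which is why the mode string needs the trailing b.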

My quick example to read the array "x"

    #define LOG_VAR(x) cout<<x<<endl;

    char* szSdfFilepath = "F:/data/viscontest/scivis2015/ds14_scivis_0128_e4_dt04_0.0200";

    SDF *sdf = SDFopen(szSdfFilepath, "");
    SDFdebug(1); // Put it 0 to disable debug information.

    int64_t uNrOfRecs = SDFnrecs("x", sdf);
    vector<float> vfData(uNrOfRecs);
    SDFrdvecs(sdf, "x", uNrOfRecs,, 0, NULL);

    for(size_t i = 0; i < uNrOfRecs; i++)
        if( 0.0f != vfData[i] )

Suggestions to avoid ARPACK++ compilation errors

posted Jan 21, 2015, 7:24 AM by Teng-Yok Lee   [ updated Jan 21, 2015, 6:42 PM ]

I plan to put my patches to ARPACK++ in my Google Code repo at the end of Jan. 2015. Here I manually list the fixes for my applications. Note that my applications only use ardsmat.h and ardssym.h, so there could be more errors, but I guess that the fixes for the other parts would be similar.


Comment out the include of arcomp.h (otherwise, arcomplex<float>/arcomplex<double> will be declared inside an extern "C" block, which is not allowed since C does not understand C++ templates).

// #include "arcomp.h"

Also, replace the include of generic.h

#include <generic.h>

with the only macro that is needed, name2:

// REF:
#define name2(a,b) gEnErIc2(a,b)
#define gEnErIc2(a,b) a ## b


DefineParameters(A.ncols(), nevp, &A, ARdsSymMatrix<FLOAT>::MultMv,
DefineParameters(A.ncols(), nevp, &A, &ARdsSymMatrix<FLOAT>::MultMv,


#include <iostream.h>

#include <iostream>


Change the lines at the end (because arcomp.h is not included, the '}' is skipped and thus the extern "C" block is not closed).

#endif // ARCOMP_H

#endif // ARCOMP_H
} // extern "C" {
