Memo

I often spend a lot of time setting up the same software over and over again for different purposes. To save time, I decided to write down some useful procedures and hints here. I hope they will be helpful to me and to others who visit.

How to check a python module path before importing?

posted Jun 24, 2018, 7:01 PM by Teng-Yok Lee

REF: https://stackoverflow.com/questions/247770/retrieving-python-module-path

Use imp. For instance, to locate cv2 without importing it:
import imp
imp.find_module("cv2")
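
The imp module is deprecated since Python 3.4 and was removed in Python 3.12. On newer interpreters, a similar lookup can be sketched with importlib.util.find_spec. The helper name module_path below is my own, not part of any library, and the sketch assumes a top-level module name (for dotted names, find_spec imports the parent packages).

```python
import importlib.util


def module_path(name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is a file path for ordinary modules; it can be "built-in" or
    # "frozen" for modules compiled into the interpreter.
    return spec.origin


if __name__ == "__main__":
    print(module_path("json"))
```

Calling module_path("cv2") in the same way shows which installation of OpenCV would be picked up, which is handy when several copies are on the path.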

Enable OpenGL direct rendering over ssh after installing nVidia driver

posted Jun 23, 2018, 5:30 PM by Teng-Yok Lee   [ updated Jun 23, 2018, 8:29 PM ]

In my experience, the nVidia driver overwrites Mesa's libGL.so in /usr/lib/x86_64-linux-gnu/mesa/. However, Mesa's libGL.so supports direct rendering over ssh, while nVidia's does not.

A quick fix is to copy libGL.so from /usr/lib/x86_64-linux-gnu/mesa/ on another host. Then, after logging in to the host via ssh, set LD_LIBRARY_PATH as follows:

$ export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/mesa:${LD_LIBRARY_PATH}

Then the current session will use mesa's libGL.so.
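
If only one program needs Mesa's libGL.so, the same idea can be applied per process instead of per session. The following is a minimal sketch, assuming the Mesa directory from this memo; the helper name env_with_mesa is hypothetical.

```python
import os

MESA_DIR = "/usr/lib/x86_64-linux-gnu/mesa"  # directory used in the memo above


def env_with_mesa(base_env=None, mesa_dir=MESA_DIR):
    """Return a copy of the environment with mesa_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Skip the ':' when the variable is unset or empty: an empty entry in
    # LD_LIBRARY_PATH is interpreted as the current working directory.
    env["LD_LIBRARY_PATH"] = mesa_dir + (":" + current if current else "")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.call or subprocess.Popen when launching the OpenGL application, so the rest of the session keeps using the nVidia library.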

A more detailed explanation can be found here:
https://praveen-palanisamy.github.io/blog/2017/02/15/Running-OpenGL-Applications-Remotely


FW: List the revisions of a conda environment

posted Feb 12, 2018, 8:05 PM by Teng-Yok Lee

$ conda list --revisions

REF: http://blog.rtwilson.com/conda-revisions-letting-you-rollback-to-a-previous-version-of-your-environment/

Git tips

posted Feb 4, 2018, 6:36 AM by Teng-Yok Lee   [ updated Feb 4, 2018, 6:37 AM ]

This is a collection of tips for fixing common problems I run into.

Ignore the change of file modes

REF: https://stackoverflow.com/questions/1580596/how-do-i-make-git-ignore-file-mode-chmod-changes
$ git config core.fileMode false



Commonly used dpkg & apt-get options

posted Jan 24, 2018, 6:07 PM by Teng-Yok Lee   [ updated Jan 24, 2018, 6:09 PM ]

Given a file path, query the package that owns it (REF):
$ dpkg -S <filepath>

Query files in a package:
$ dpkg -L <package>

List installed packages (REF):
$ apt list --installed

FW: How to debug a release build

posted Jul 7, 2017, 6:38 AM by Teng-Yok Lee

REF: https://msdn.microsoft.com/en-us/library/fsk896zz.aspx

With CMake, the same effect can be achieved by modifying the following CMake variables:

  • Append "/Z7" to CMAKE_CXX_FLAGS_RELEASE and CMAKE_C_FLAGS_RELEASE.
  • Append "/DEBUG /OPT:REF /OPT:ICF" to CMAKE_EXE_LINKER_FLAGS_RELEASE.

Tips for Bash Programming

posted Jun 25, 2017, 7:05 PM by Teng-Yok Lee   [ updated Jun 27, 2017, 8:02 AM ]

REF: http://blog.jobbole.com/111514/
  • Add "set -o errexit" to exit when a command fails.
  • Add "set -o nounset" to exit when an undefined variable is used.
  • Declare constant variables with readonly.
How to enable the debug mode? (REF, REF)
  • -v: Run the shell script in verbose mode (print each line as it is read).
  • -n: Read the commands without executing them.
  • -x: Show the executed commands and their arguments.

Test the slideshow of Google+ photo album

posted Oct 15, 2016, 5:53 AM by Teng-Yok Lee   [ updated Oct 15, 2016, 7:57 PM ]

This post tests the slideshow feature of Google+ Photos on Google Sites. Unfortunately, it does not work...

Maybe the easiest way is to insert an animated .gif? e.g.


With ImageMagick, I can combine multiple images into a single GIF. The command is:

$ convert -delay 100 -loop 1 *.JPG firework.gif


A result is below:





Run Apache Spark with Docker

posted May 5, 2015, 7:32 PM by Teng-Yok Lee   [ updated May 9, 2015, 8:34 AM ]

I want to learn Spark, but I don't have a cluster, so I use Docker to simulate one for practice. This memo mainly reorganizes several online tutorials.

Prepare a virtual machine

This step is needed since I am using Windows. At first I used the virtual machine that came with boot2docker, but its root (/) was stored in RAM, so I lost all configuration changes (e.g. BASH) after rebooting the guest OS. Thus I decided to install Ubuntu on VirtualBox instead. The tutorial linked below has clear illustrations:

http://www.wikihow.com/Install-Ubuntu-on-VirtualBox

NOTE

  1. Create a disk of at least 20GB. At first I only allocated 8GB, which quickly ran out of space.
  2. Also, don't use a fixed-size disk because it cannot be resized if needed (REF).
  3. Assign enough CPUs. I used 4 cores.
  4. Once the guest Ubuntu is installed, install openssh-server (REF) in order to log in to it via PuTTY.

Install Docker


Docker's official site has clear instructions. Once logged in to Ubuntu, run the following 2 commands:

$ wget -qO- https://get.docker.com/ | sh

$ sudo docker run hello-world


Run spark


Note that the git repo URL is different from the one in the original tutorial. The git command should now be:

$ git clone -b blogpost https://github.com/amplab/docker-scripts.git

Then launch the docker containers for spark:
$ sudo ./docker-scripts/deploy/deploy.sh -i amplab/spark:0.8.0 -c

NOTE
  1. These scripts do not work for newer versions of Spark. First, they only support up to 1.0.0. Second, the script for Spark 1.0.0 does not work: the docker command keeps waiting for the master, which never becomes ready because the master cannot launch Spark.
  2. This command launches a Scala shell. To run pyspark, type exit to terminate this shell (otherwise it will take all CPUs for its own workers).

Run PySpark

The following is based on the instructions by Aris. The previous step should have printed the master information. For instance:

***********************************************************************

start shell via:            sudo /home/leeten/projects/docker-scripts/deploy/start_shell.sh -i amplab/spark-shell:0.8.0 -n 5b37cadb558db3380eef69adfd9bcc533dd98a604f529447c81331533dfa951b

visit Spark WebUI at:       http://172.17.0.4:8080/

visit Hadoop Namenode at:   http://172.17.0.4:50070

ssh into master via:        ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.17.0.4

/data mapped:

kill master via:           sudo docker kill 7fff30fe8ef3b766504844e0f5eace95a10c66d1e72327fbdd5604b5b8536a16

***********************************************************************

Now you can log in to the master after changing the permission of the id_rsa file:

$ chmod 400 /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa
$ ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.17.0.4

On the docker container, launch pyspark:

$ /opt/spark-0.8.0/pyspark


Example: Estimate PI



The following Python code estimates PI. It is based on the code segment in Spark Examples, and the source code can be found on git. However, the current version fails with Spark 0.8.0 at the statement that creates the SparkContext. Thus I revised it as follows:

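# REF: https://spark.apache.org/examples.html
# Complete code (not runnable as-is with Spark 0.8.0): https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py
# Run this inside the pyspark shell, which already provides the SparkContext as sc.
# The syntax is Python 2 (xrange, print statement), as used by pyspark in Spark 0.8.0.

from random import random

def sample(_):
    # Draw a random point in the unit square; return 1 if it lies inside the unit circle.
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

NUM_SAMPLES = 100
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)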
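
To sanity-check the sampling logic without a Spark cluster, the same estimate can be sketched in plain Python. A random point in the unit square falls inside the quarter circle with probability PI/4, so 4 * count / N approximates PI. The function name estimate_pi and the seed parameter are my own additions for reproducibility; the built-in map and sum stand in for the map and reduce steps of the RDD.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi using the same sampling rule as the Spark code."""
    rng = random.Random(seed)  # local generator so the result is reproducible

    def sample(_):
        x, y = rng.random(), rng.random()
        return 1 if x * x + y * y < 1 else 0

    # map + sum play the role of RDD.map(...).reduce(add)
    count = sum(map(sample, range(num_samples)))
    return 4.0 * count / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000))
```

With only 100 samples, as in the Spark snippet, the estimate is quite noisy; increasing the sample count narrows the error.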


Troubleshooting

What if docker keeps waiting for the master?

You can log in to the master manually. As mentioned in the previous section, the following command prints the master's IP.

$ ./docker-scripts/deploy/deploy.sh -i amplab/spark:0.8.0 -c


Then you can directly log in to the host.

$ ssh -i /home/leeten/projects/docker-scripts/deploy/../apache-hadoop-hdfs-precise/files/id_rsa -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@172.17.0.4

Once logged in, check whether Spark is running, or manually launch Spark to see whether it works. In my case, the master failed to launch Spark, so docker kept waiting.

Error: WARN ClusterScheduler: Initial job has not accepted any resources

If spark or pyspark shows the following message, it means that no worker is available (REF):


WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

You can check the spark status on the WebUI, as shown above. In this example, it is http://172.17.0.4:8080/.

References

http://www.wikihow.com/Install-Ubuntu-on-VirtualBox
https://docs.docker.com/installation/ubuntulinux/
https://amplab.cs.berkeley.edu/author/schumach/
http://www.rankfocus.com/run-berkeley-sparks-pyspark-using-docker-couple-minutes/

My Python Port of TestScatterPlotMatrix for VTK

posted Apr 12, 2015, 6:25 AM by Teng-Yok Lee   [ updated Apr 12, 2015, 6:26 AM ]

This code is based on the sample: https://github.com/Kitware/VTK/blob/master/Charts/Core/Testing/Cxx/TestScatterPlotMatrix.cxx

I ported it to Python and load the iris dataset as a demo.

import numpy as np
import vtk
import vtk.util.numpy_support as VN

# Load the iris dataset.
# NOTE: It is downloaded from:
# http://aima.cs.berkeley.edu/data/iris.csv
csv_filepath = r'F:\data\multivariate\iris\iris.csv'
n_cols = 4
# Read only the first n_cols columns.
csv_table = np.loadtxt(open(csv_filepath, "rb"), delimiter=",", usecols=range(n_cols))

# Convert into the table format for VTK.
vtk_table = vtk.vtkTable()
for columni in range(n_cols):
    # Convert numpy array to vtk array.
    # Note: https://pyscience.wordpress.com/2014/09/06/numpy-to-vtk-converting-your-numpy-arrays-to-vtk-arrays-and-files/
    array = VN.numpy_to_vtk(np.ascontiguousarray(csv_table[:, columni]), deep=1)
    array.SetName("%d" % (columni))
    vtk_table.AddColumn(array)

######################################################################
# REF:    http://fossies.org/dox/ParaView-v4.1.0-source/TestScatterPlotMatrix_8cxx_source.html
matrix = vtk.vtkScatterPlotMatrix()

# Fine tune the color if needed.
# matrix.SetPlotColor(matrix.SCATTERPLOT, vtk.vtkColor4ub(0, 0, 0, 1))
# matrix.SetPlotColor(matrix.ACTIVEPLOT, vtk.vtkColor4ub(0, 0, 0, 1))

matrix_view = vtk.vtkContextView()
matrix_view.GetScene().AddItem(matrix)
matrix.SetInput(vtk_table)
matrix.SetNumberOfBins(7)

matrix_view.Render()
matrix_view.GetInteractor().Start()
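
The data preparation step (keep only the leading numeric columns and name each column by its index) can be tried without VTK or numpy. Below is a minimal sketch using only the standard library. The function name load_numeric_columns and the inline sample rows are illustrative assumptions; the rows mimic a layout of four numbers followed by a species label and are not read from the actual file.

```python
import csv
import io

# Illustrative rows only: four numeric fields followed by a label.
SAMPLE_CSV = """5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
"""


def load_numeric_columns(fileobj, n_cols):
    """Read the first n_cols fields of each row into columns named "0", "1", ..."""
    columns = {str(i): [] for i in range(n_cols)}
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        for i in range(n_cols):
            columns[str(i)].append(float(row[i]))
    return columns


if __name__ == "__main__":
    table = load_numeric_columns(io.StringIO(SAMPLE_CSV), 4)
    print(table["0"])
```

Each entry of the resulting dictionary corresponds to one column that the loop above would add to the vtkTable.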



