

Showing posts from September, 2013

Initializing a Pandas panel

Sometimes there are multiple tables of data that should be stored in an aligned manner. Pandas Panel is great for this. Panels cannot expand along the major and minor axes after they are created (at least not painlessly). If you know the maximum size of the tabular data, it is convenient to initialize the panel to this maximum size before inserting any data. For example:

    import numpy, pandas as pd
    pn = pd.Panel(major_axis=['1','2','3','4','5','6'], minor_axis=['a','b'])
    pn['A'] = pd.DataFrame(numpy.random.randn(3,2), index=['2','3','5'], columns=['a','b'])
    print(pn['A'])

Which gives:

              a         b
    1       NaN       NaN
    2  1.862536 -0.966010
    3 -0.214348 -0.882993
    4       NaN       NaN
    5 -1.266505  1.248311
    6       NaN       NaN

Edit: Don't need a default item - an empty panel can be created
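Panel has since been removed from pandas, so the snippet above only runs on old releases. A minimal sketch of the same pre-sizing idea with a plain DataFrame, assuming the full axes are known up front: `reindex` expands a partial table onto the fixed axes and fills the missing cells with NaN. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The full axes, fixed in advance (same labels as the Panel example).
major_axis = ['1', '2', '3', '4', '5', '6']
minor_axis = ['a', 'b']

# A partial table that covers only some of the rows.
partial = pd.DataFrame(np.random.randn(3, 2),
                       index=['2', '3', '5'], columns=['a', 'b'])

# reindex returns a new frame on the full axes; absent rows become NaN.
aligned = partial.reindex(index=major_axis, columns=minor_axis)
print(aligned)
```

The original frame is left untouched, so the same partial table can be aligned to different axes if needed.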

Macro photography with reversed lens

I had forgotten the simple joys of experimenting with cameras. Some of you will recall the old trick of reversing your lens to obtain macro photos. Here I simply took my 18-55 kit lens, reversed it, set it to 18mm and took a photo of my laptop monitor. I aimed it at a white part of the screen and you can see the three sub pixels per real pixel which combine together to give the illusion of white.

Pandas panel = collection of tables/data frames aligned by index and column

Pandas panel provides a nice way to collect related data frames together while maintaining correspondence between the index and column values:

    import pandas as pd, pylab

    # Full dimensions of a slice of our panel
    index = ['1','2','3','4']  # major_index
    columns = ['a','b','c']    # minor_index

    df = pd.DataFrame(pylab.randn(4,3), columns=columns, index=index)             # A full slice of the panel
    df2 = pd.DataFrame(pylab.randn(3,2), columns=['a','c'], index=['1','3','4'])  # A partial slice
    df3 = pd.DataFrame(pylab.randn(2,2), columns=['a','b'], index=['2','4'])      # Another partial slice
    df4 = pd.DataFrame(pylab.randn(2,2), columns=['d','e'], index=['5','6'])      # Partial slice with a new column and index

    pn = pd.Panel({'A': df})
    pn['B'] = df2
    pn['C'] = df3
    pn['D'] = df4

    for key in pn.items:
        print(pn[key])

    -> output …
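On pandas versions without Panel, a dict of DataFrames reindexed onto shared axes gives a comparable aligned collection. This is only a sketch, under the assumption that every table should be forced onto the axes of the full slice, so labels outside those axes (such as 'd', 'e', '5', '6') are dropped. `align_to` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

index = ['1', '2', '3', '4']   # shared row labels
columns = ['a', 'b', 'c']      # shared column labels

def align_to(frame, index, columns):
    """Hypothetical helper: put a frame on the shared axes, NaN where absent."""
    return frame.reindex(index=index, columns=columns)

frames = {
    'A': pd.DataFrame(np.random.randn(4, 3), index=index, columns=columns),
    'B': pd.DataFrame(np.random.randn(3, 2), index=['1', '3', '4'], columns=['a', 'c']),
    'C': pd.DataFrame(np.random.randn(2, 2), index=['2', '4'], columns=['a', 'b']),
    'D': pd.DataFrame(np.random.randn(2, 2), index=['5', '6'], columns=['d', 'e']),
}

aligned = {key: align_to(frame, index, columns) for key, frame in frames.items()}

# Stack into one 3-D array with shape (items, rows, columns).
cube = np.stack([aligned[key].values for key in sorted(aligned)])
print(cube.shape)
```

Keeping the dict alongside the stacked array preserves the row and column labels, which the bare array does not carry.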

Wordpress renders LaTeX

I was so pleasantly surprised to learn that WordPress blogs will render LaTeX. The tags are simply $latex and $.
So $latex e^{ix} = \cos(x) + i\sin(x)$ will render as

There are some cool parameters that you can set (from hints here and here):
- Increase the size by adding &s=X, where X is an integer in [-4,4]: $latex x^2 &s=2$
- Instead of inline equations (the default), display as a block (bigger): $latex \displaystyle x^2$

Python: Multiprocessing: xlrd workbook can't be passed as argument

    import multiprocessing as mp, xlrd

    def myfun(b):
        print(b.sheet_names())

    b = xlrd.open_workbook('../../Notes/sessions_and_neurons.xlsx')
    p = mp.Pool(4)
    p.map(myfun, [b, b, b, b])

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/Applications/", line 551, in __bootstrap_inner
      File "/Applications/", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/Applications/", line 319, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
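The usual way around this kind of error is to send each worker something picklable, such as the file path, and open the resource inside the worker. The sketch below shows the principle with only the standard library, using an open file handle as a stand-in for the xlrd workbook (an assumption for illustration; with xlrd the worker would call `xlrd.open_workbook(path)` itself). `count_lines` and the temporary file are hypothetical.

```python
import os
import pickle
import tempfile

def count_lines(path):
    """Worker: receives a path (picklable) and opens the resource itself."""
    with open(path) as handle:
        return sum(1 for _ in handle)

# Hypothetical stand-in data file with three lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    demo_path = tmp.name

# An open handle cannot be pickled, so it could not be sent to a worker.
with open(demo_path) as handle:
    try:
        pickle.dumps(handle)
        handle_is_picklable = True
    except TypeError:
        handle_is_picklable = False

# A plain path string survives the pickle round trip a pool would perform.
path_round_trip = pickle.loads(pickle.dumps(demo_path))
line_count = count_lines(path_round_trip)

os.remove(demo_path)
print(handle_is_picklable, line_count)
```

With a pool, the call would then map the worker over a list of paths rather than a list of open objects.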

Python: Multiprocessing: passing multiple arguments to a function

Write a wrapper function to unpack the arguments before calling the real function. A lambda won't work, because Pool has to pickle the function it sends to the workers and a lambda can't be pickled.

    import multiprocessing as mp

    def myfun(a, b):
        print(a + b)

    def mf_wrap(args):
        return myfun(*args)

    p = mp.Pool(4)
    fl = [(a, b) for a in range(3) for b in range(2)]
    # mf_wrap = lambda args: myfun(*args) -> this sucker, though more pythonic and compact, won't work
    p.map(mf_wrap, fl)
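A quick way to see the lambda problem without starting any worker processes is to try pickling the candidates directly. This is a small sketch, not the post's code: `can_pickle` is a hypothetical helper, and this version of `myfun` returns the sum instead of printing it so the results can be inspected.

```python
import pickle

def myfun(a, b):
    return a + b

def mf_wrap(args):
    """Unpack one tuple of arguments and call the real function."""
    return myfun(*args)

def can_pickle(obj):
    """Hypothetical helper: True if pickle can serialise obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

fl = [(a, b) for a in range(3) for b in range(2)]

# Calling the wrapper directly gives the values a pool's map would collect.
results = [mf_wrap(args) for args in fl]

# The lambda version of the wrapper cannot be pickled.
lambda_ok = can_pickle(lambda args: myfun(*args))
print(results, lambda_ok)
```

On Python 3.3 and later, `Pool.starmap(myfun, fl)` unpacks the tuples itself, so the wrapper is not needed there.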

Calculating confidence intervals: straight Python is as good as scipy.stats.scoreatpercentile

I would say the most efficient AND readable way of working out confidence intervals from bootstraps is:


Here r is an n x b array: the n rows are different runs (e.g. different data sets) and the b columns are the individual bootstraps within a run. This code returns the 95% CIs as three numpy arrays.
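A minimal sketch of the sort-then-pick approach this post describes, assuming the three returned arrays are the lower bound, the median and the upper bound (the post does not say which three). `bootstrap_ci` and `alpha` are illustrative names, not taken from the original code.

```python
import numpy as np

def bootstrap_ci(r, alpha=0.05):
    """Sort-and-index percentile CIs for an (n runs) x (b bootstraps) array.

    Returns (lower, median, upper), each of length n. Which three arrays the
    original code returned is not stated; this choice is an assumption.
    """
    r_sorted = np.sort(np.asarray(r, dtype=float), axis=1)
    b = r_sorted.shape[1]
    # Positions of the alpha/2 and 1 - alpha/2 quantiles in each sorted row.
    lo = int(round((alpha / 2) * (b - 1)))
    hi = int(round((1 - alpha / 2) * (b - 1)))
    return r_sorted[:, lo], np.median(r_sorted, axis=1), r_sorted[:, hi]

# Made-up data: 4 runs of 1000 bootstrap values, each run shifted by its index.
rng = np.random.RandomState(0)
r = rng.randn(4, 1000) + np.arange(4)[:, None]
lower, median, upper = bootstrap_ci(r)
print(lower, median, upper)
```

Sorting once along the bootstrap axis handles all runs at the same time, which is the vectorization the post is after.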

Confidence intervals can be computed by bootstrapping the calculation of a descriptive statistic and then finding the appropriate percentiles of the data. I saw that scipy.stats has a built in percentile function and assumed that it would work really fast because (presumably) the code is in C. I was using a simple minded Python/Numpy implementation by first sorting and then picking the appropriate percentile data. I thought this was going to be inefficient timewise and decided that using scipy.stats.scoreatpercentile was going to be blazing fast because
- It was native C
- It was vectorized - I could compute the CIs for multiple bootstrap runs at the same time …

Three coding fonts

Coding fonts should:
- Look good at small sizes (10-11 pt) - you can see more code in your window
- Have good distinction between characters, especially (O,0), (i,l), (l,1), (`,') - your programs have enough bugs already

Three fonts that I have tried out and that work for me are, in order:
1. Anonymous Pro - Looks good even at 10pt
2. Monaco
3. Consolas

D5100: More notes

It took me a little bit to get warmed up to the concept, but now I definitely see the potential for using DSLRs for movie making. Camcorders (in the price range I would consider) are fitted with single lenses (probably a superzoom) with average optical quality. Their smaller sensor size means a much noisier low light performance.

With this cheap DSLR I can put on my cheap 50mm/1.8 and get HD movies that look 'arty' because I opened the lens up wide. I can take movies in indoor lighting. I can take videos of my cat that look like something showing at Sundance. It really opens things up for creativity.

My only gripe is the auto focus. It's not that it is slow, it's that I can't get it to do what I want, but perhaps I want too much. The AF, with a decent lens, like the 35mm/1.8 AF-S, is fast enough and silent enough. The kit lens is atrocious in this department. My gripe is that I just could not figure out how to efficiently get it to track my subject (my cat).

My assum…

A script to clear and disable recent items in Mac OS X dock

From a hint here.

Mac OS X has the annoying feature of remembering your application history in the dock and not erasing the history when you erase it from the application preferences.

The following is a little bash script that does this for you, provided you pass it the name of the application as it appears in /Applications.

    #!/bin/bash -x
    BUNDLEID=$(defaults read "/Applications/$1/Contents/Info" CFBundleIdentifier)
    defaults delete "$BUNDLEID.LSSharedFileList" RecentDocuments
    defaults write "$BUNDLEID" NSRecentDocumentsLimit 0
    defaults write "$BUNDLEID.LSSharedFileList" RecentDocuments -dict-add MaxAmount 0

You need to run killall Dock after this to restart the dock for the changes to take effect.

The Nikon D5100 (as seen by a D40 shooter)

The D5100 is rather old news now. People are either ogling the m4/3 cameras (I know I am) or looking at Nikon's new models such as the D5200. However, I recall, when the D5100 first came out, and I was the owner of a D40, I badly wanted the high ISO performance and the video.

Well, enough time has passed that the D5100 is now at a sweet price point (especially the refurbished ones), so I did get myself one. There are tons of comprehensive D5100 reviews out there; this will be a short collection of very subjective thoughts from a D40 owner.

What kind of photographer am I?
Well, I'm a casual shooter. A few pics are up on flickr, but I mostly shoot family and don't really put up pictures on web galleries. My favorite subject is the human face in the middle of its many fleeting expressions.
High ISO performance
I'm very happy. Experts on sites such as dpreview complained that noise rendered D5100 photos above 1600 unusable. I was already impressed by the D40's ISO 1600 …