Monday, September 30, 2013

Initializing a Pandas panel

Sometimes there are multiple tables of data that should be stored in an aligned manner. Pandas Panel is great for this. Panels cannot expand along the major and minor axes after they are created (at least not painlessly). If you know the maximum size of the tabular data, it is convenient to initialize the panel to this maximum size before inserting any data. For example:
import numpy, pandas as pd

pn = pd.Panel(major_axis=['1','2','3','4','5','6'], minor_axis=['a','b'])
pn['A'] = pd.DataFrame(numpy.random.randn(3,2), index=['2','3','5'], columns=['a','b'])
print pn['A']

Which gives:
          a         b
1       NaN       NaN
2  1.862536 -0.966010
3 -0.214348 -0.882993
4       NaN       NaN
5 -1.266505  1.248311
6       NaN       NaN

Edit: Don't need a default item - an empty panel can be created

Saturday, September 28, 2013

Macro photography with reversed lens

I had forgotten the simple joys of experimenting with cameras. Some of you will recall the old trick of reversing your lens to obtain macro photos. Here I simply took my 18-55 kit lens, reversed it, set it to 18mm and took a photo of my laptop monitor. I aimed it at a white part of the screen and you can see the three subpixels per real pixel, which combine to give the illusion of white.

Monday, September 23, 2013

Pandas panel = collection of tables/data frames aligned by index and column

Pandas panel provides a nice way to collect related data frames together while maintaining correspondence between the index and column values:


import pandas as pd, pylab

#Full dimensions of a slice of our panel
index = ['1','2','3','4'] #major_axis
columns = ['a','b','c'] #minor_axis

df = pd.DataFrame(pylab.randn(4,3),columns=columns,index=index) #A full slice of the panel
df2 = pd.DataFrame(pylab.randn(3,2),columns=['a','c'],index=['1','3','4']) #A partial slice
df3 = pd.DataFrame(pylab.randn(2,2),columns=['a','b'],index=['2','4']) #Another partial slice
df4 = pd.DataFrame(pylab.randn(2,2),columns=['d','e'],index=['5','6']) #Partial slice with a new column and index


pn = pd.Panel({'A': df})
pn['B'] = df2
pn['C'] = df3
pn['D'] = df4

for key in pn.items:
  print pn[key]

-> output

          a         b         c
1  0.243221 -0.142410  1.228757
2 -0.748140 -0.780719  0.644401
3  0.161369 -0.001034 -0.278070
4 -1.143613 -1.547082  0.025639
          a   b         c
1  1.165219 NaN  1.391501
2       NaN NaN       NaN
3 -1.484183 NaN  0.541619
4  0.810439 NaN -0.848142
          a         b   c
1       NaN       NaN NaN
2  1.310740  1.278829 NaN
3       NaN       NaN NaN
4  0.042748 -0.464065 NaN
    a   b   c
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

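One thing to note is that Panel does not expand along the minor and major axes: they are fixed by the first data frame, so rows and columns that are not already present are silently dropped. That is why 'D', whose index and columns are all new, comes out entirely NaN.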
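
The same alignment can be reproduced on plain data frames with reindex. This is also the route to take on recent pandas releases, where Panel is no longer available (it was removed in pandas 0.25). A minimal sketch, where the names frames and aligned are mine and not part of the code above:

```python
import numpy as np
import pandas as pd

index = ['1', '2', '3', '4']      # rows every table is aligned to
columns = ['a', 'b', 'c']         # columns every table is aligned to

frames = {
    'B': pd.DataFrame(np.random.randn(3, 2), columns=['a', 'c'], index=['1', '3', '4']),
    'D': pd.DataFrame(np.random.randn(2, 2), columns=['d', 'e'], index=['5', '6']),
}

# reindex keeps only the requested labels and fills anything missing with NaN,
# so labels outside index/columns are dropped instead of growing the table.
aligned = {name: df.reindex(index=index, columns=columns)
           for name, df in frames.items()}

for name, df in aligned.items():
    print(name)
    print(df)
```

'B' keeps its values and gains NaN for row '2' and column 'b', while 'D' shares no labels with the target axes and ends up all NaN, just like the last table in the output above.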

Saturday, September 21, 2013

WordPress renders LaTeX

I was so pleasantly surprised to learn that WordPress blogs will render LaTeX. The tags are simply $latex and $.
So $latex e^{ix} = \cos(x) + i\sin(x)$ will render as a typeset equation.

There are some cool parameters that you can set (from hints here and here):
  1. Increase size by adding &s=X where X is an integer in [-4,4]: $latex x^2 &s=2$
  2. Instead of inline equations (default) display as block (bigger): $latex \displaystyle x^2$

Thursday, September 19, 2013

Python: Multiprocessing: xlrd workbook can't be passed as argument

import multiprocessing as mp, xlrd

def myfun(b):
  print b.sheet_names()

b=xlrd.open_workbook('../../Notes/sessions_and_neurons.xlsx')
p = mp.Pool(4)
p.map(myfun, [b,b,b,b])

-> output

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Applications/Canopy.app/appdata/canopy-1.1.0.1371.macosx-x86_64/Canopy.app/Contents/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/Applications/Canopy.app/appdata/canopy-1.1.0.1371.macosx-x86_64/Canopy.app/Contents/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Applications/Canopy.app/appdata/canopy-1.1.0.1371.macosx-x86_64/Canopy.app/Contents/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Python: Multiprocessing: passing multiple arguments to a function

Write a wrapper function to unpack the arguments before calling the real function. A lambda won't work, because it cannot be pickled for transfer to the worker processes.


import multiprocessing as mp

def myfun(a,b):
  print a + b

def mf_wrap(args):
  return myfun(*args)

p = mp.Pool(4)

fl = [(a,b) for a in range(3) for b in range(2)]
#mf_wrap = lambda args: myfun(*args) -> this sucker, though more pythonic and compact, won't work

p.map(mf_wrap, fl)
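
Pool.map sends the function to each worker by pickling it, and pickle records a function by its module-level name, which a lambda does not have. The standard pickle module shows the difference without starting a pool. This is only a sketch: can_pickle is a helper introduced here for illustration, and myfun returns its sum rather than printing it.

```python
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def myfun(a, b):
    return a + b

def mf_wrap(args):
    # Unpack the (a, b) tuple that Pool.map hands over as a single argument
    return myfun(*args)

print(can_pickle(mf_wrap))                     # named, module-level function
print(can_pickle(lambda args: myfun(*args)))   # anonymous function
```

Run as a script, the first check succeeds and the second fails. On Python 3.3 and later, Pool.starmap(myfun, fl) does the tuple unpacking itself, so the wrapper is only needed on older versions.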

Tuesday, September 17, 2013

Calculating confidence intervals: straight Python is as good as scipy.stats.scoreatpercentile

UPDATE:
I would say the most efficient AND readable way of working out confidence intervals from bootstraps is:

numpy.percentile(r,[2.5,50,97.5],axis=1)

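Where r is an n x b array: n is the number of runs (e.g. different data sets) and b is the number of bootstraps within each run. This code returns the 2.5th, 50th and 97.5th percentiles of every run, one array per percentile, i.e. the lower bound of the 95% CI, the median and the upper bound.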
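
As a quick illustration on made-up data, the sketch below builds r by bootstrapping the mean of each run and then takes the percentiles. The array sizes, the seed and the choice of the mean as the statistic are arbitrary assumptions of mine:

```python
import numpy as np

rng = np.random.RandomState(0)
n, m, b = 4, 200, 1000            # runs, observations per run, bootstraps per run

data = rng.randn(n, m)            # one row of raw observations per run

# Resample every run with replacement b times: idx[i, j] holds the m
# observation indices used for bootstrap j of run i.
idx = rng.randint(0, m, size=(n, b, m))
resampled = data[np.arange(n)[:, None, None], idx]   # shape (n, b, m)
r = resampled.mean(axis=2)                           # shape (n, b)

# One value per run for each requested percentile
low, median, high = np.percentile(r, [2.5, 50, 97.5], axis=1)

for i in range(n):
    print("run %d: %.3f [%.3f, %.3f]" % (i, median[i], low[i], high[i]))
```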


Confidence intervals can be computed by bootstrapping the calculation of a descriptive statistic and then finding the appropriate percentiles of the data. I saw that scipy.stats has a built-in percentile function and assumed that it would work really fast because (presumably) the code is in C. I was using a simple-minded Python/Numpy implementation that first sorts and then picks out the data at the appropriate percentiles. I thought this was going to be inefficient timewise and decided that using scipy.stats.scoreatpercentile was going to be blazing fast because
  1. It was native C
  2. It was vectorized - I could compute the CIs for multiple bootstrap runs at the same time
  3. It could pick out multiple percentiles (low and high ends) at the same time.
Funnily enough, my crude measurements showed that the dumb implementation using numpy.sort is just as fast as the built-in one. Well, silly me: it turns out that scipy.stats.scoreatpercentile calls scipy.stats.mquantiles which simply does numpy.sort. I guess I should have thought of that, since sorting is the real bottleneck in this operation and numpy.sort is as efficient as you can get since that's implemented in C.



python test.py | grep 'function calls'
         38 function calls (36 primitive calls) in 0.001 seconds
         12 function calls in 0.001 seconds
         38 function calls (36 primitive calls) in 0.001 seconds
         12 function calls in 0.001 seconds
         38 function calls (36 primitive calls) in 0.876 seconds
         17 function calls in 0.705 seconds
         38 function calls (36 primitive calls) in 0.814 seconds
         17 function calls in 0.704 seconds


import pylab, cProfile, scipy.stats as ss

def conf_int_scipy(x, ci=0.95):
  low_per = 100*(1-ci)/2.
  high_per = 100*ci + low_per
  mn = x.mean()
  cis = ss.scoreatpercentile(x, [low_per, high_per])
  return mn, cis

def conf_int_native(x, ci=0.95):
  ci2 = (1-ci)*.5
  low_idx = int(ci2*x.size)
  high_idx = int((1-ci2)*x.size)
  x.sort()
  return x.mean(), x[low_idx], x[high_idx]


def conf_int_scipy_multi(x, ci=0.95):
  low_per = 100*(1-ci)/2.
  high_per = 100*ci + low_per
  mn = x.mean(axis=0)
  cis = ss.scoreatpercentile(x, [low_per, high_per],axis=0)
  return mn, cis

def conf_int_native_multi(x, ci=0.95):
  ci2 = (1-ci)*.5
  low_idx = int(ci2*x.shape[1])
  high_idx = int((1-ci2)*x.shape[1])
  mn = x.mean(axis=1)
  xs = pylab.sort(x)
  cis = pylab.empty((mn.size,2),dtype=float)
  cis[:,0] = xs[:,low_idx]
  cis[:,1] = xs[:,high_idx]
  return mn, cis

r = pylab.randn(10000)
cProfile.run('conf_int_scipy(r)')
cProfile.run('conf_int_native(r)')

r = pylab.randn(10000)
cProfile.run('conf_int_scipy(r)')
cProfile.run('conf_int_native(r)')

r = pylab.randn(1000,10000)
cProfile.run('conf_int_scipy_multi(r)')
cProfile.run('conf_int_native_multi(r)')

r = pylab.randn(1000,10000)
cProfile.run('conf_int_scipy_multi(r)')
cProfile.run('conf_int_native_multi(r)')

Sunday, September 15, 2013

Three coding fonts

Coding fonts should:
  1. Look good at small sizes (10-11 pt) - you can see more code in your window
  2. Have good distinction between characters, especially (O,0), (i,l), (l,1), (`,') - your programs have enough bugs already
Three fonts that I have tried out and that work for me are, in order
  1. Anonymous Pro - Looks good even at 10pt
  2. Monaco
  3. Consolas

Anonymous Pro 11pt


Monaco 11pt

Consolas 11pt

Saturday, September 14, 2013

D5100: More notes

Video
It took me a little bit to get warmed up to the concept, but now I definitely see the potential for using DSLRs for movie making. Camcorders (in the price range I would consider) are fitted with single lenses (probably a superzoom) with average optical quality. Their smaller sensor size means a much noisier low light performance.

With this cheap DSLR I can put on my cheap 50mm/1.8 and get HD movies that look 'arty' because I opened the lens up wide. I can take movies in indoor lighting. I can take videos of my cat that look like something showing at Sundance. It really opens up for creativity.

My only gripe is the auto focus. It's not that it is slow, it's that I can't get it to do what I want, but perhaps I want too much. The AF, with a decent lens, like the 35mm/1.8 AF-S, is fast enough and silent enough. The kit lens is atrocious in this department. My gripe is that I just could not figure out how to efficiently get it to track my subject (my cat).

My assumption was that with AF-F and subject tracking I would be able to focus on the cat and then, when he moved, focus would follow. Well, not quite. The logic seemed to be to focus on whatever was in the focus box. I need more practice with this to figure out what to do.

Ok, you know how guys get a bad rap because they don't read the manual? I'm one of those guys. This ain't your grandpappy's SLR. You flick the switch to live view. You set to AF-C and select 'Tracking' for mode. You use the eight-way switch to place the green cursor over your subject. Then you hit OK. This registers the pattern in the targeting computer and the square will follow the subject around the screen and keep focus on it. I think this is what fighter pilots use to target their AGMs.

In the meantime, manual focus works a blast, and I can see this working just fine stopped down to f3.4 or so for casual home movies.

I read somewhere that movie makers actually use manual focus. I remember seeing documentaries about film making and the camera-man's assistant was always scuttling around with a tape measure and the actors had lines on the floor indicating where they were to stand for the low DoF shots, so perhaps no AF logic can really understand what YOU want in focus in a moving scene and so you need to do it manually.

Battery life
Normal battery life seems to be fine. I say seems because I have been using the video a bit and, man, does this chew through the battery fast. I had a fully charged battery and I had it down to the last bar within 30min of playing with video, making about 4 videos in this time and having live view on. I suspect that if you are using live view a bunch/making videos you need to find a very high capacity battery or have a bunch of spares. On the days I went out and shot mostly stills, I did not have any problems, the battery went through 100 shots without changing a bar.

I've been using a cheap(er) Wasabi battery and it has worked fine so far. Some people complained that their D5100 did not recognize the battery, or did not recognize it after the first charge, but so far, so good.

Viewfinder
The viewfinder now carries complete exposure information. You see shutter speed, f-number AND ISO in the viewfinder. This is also customizable, which makes it a nice step up from the D40's viewfinder. For some reason, I find it easier to manually focus my 50mm/1.8 on the D5100. I suspect that it is the rangefinder display that is aiding me, though the added brightness may be helping.

My use of the viewfinder is primarily to check focus and secondarily to compose. I'm OK with a little composition slop because I feel I can crop later (I don't worry too much about sticking to standard aspects). So if the coverage (how much of the scene is visible in the viewfinder) is less than 100% I don't care - I'm possibly getting some junk on the edges which I will crop away if needed. But I worry about focus and this matters a lot in low light. So my intuition was that viewfinders with more magnification are better for seeing the details that matter for focus.

Then I went through some specs on film and digital cameras:

F65 (Film camera, entry level) - 0.68x, 89% (Pg 106 of manual)
F4 (Film, top end, interchangeable viewfinders) -  0.7x, 100% (stock), 6x (special) (Pg 107 of manual)
D40 (Digital, entry) - 0.80x, 95%
D5100 (Digital, entry) - 0.78x, 95%
D4 (Digital, top end)- 0.70x, 100%

That's odd. Why would the top camera have LESS magnification than the entry level ones?

I read a nice description of basic viewfinder specs on Luminous Landscape and on Stack Exchange Photography. I always thought that a larger magnification was better, but Stan Rogers's answer on Stack Exchange made me realize that as the magnification gets larger the image gets dimmer, so there is a tradeoff between detail and low light visibility (other trade-offs are mentioned in the Luminous Landscape article).

This is a double hit: it is in low light, when your f-number is low, that accurate focus is most important (and your AF is likely to give up), but your viewfinder has to trade off between being visible and giving you detail.

So, I guess, in the D4 and other FF cameras, the designers went for better visibility (users often describe the viewfinders as brighter) deciding that that would help manual focus with 0.7x mag giving enough acuity for focus.

As a sidenote - where are our interchangeable viewfinders for digital cameras? I don't think even the D4 "flagship" has the ability to pop-off the stock pentaprism assembly and place another viewfinder.


Wednesday, September 11, 2013

A script to clear and disable recent items in Mac OS X Dock

From a hint here.

Mac OS X has the annoying feature of remembering your application history in the dock and not erasing the history when you erase it from the application preferences.

The following is a little bash script that clears the recent items for an application and stops new ones from being recorded, provided you pass the name of the application (e.g. vlc.app) to it.


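#!/bin/bash -x
# $1 is the application bundle name, e.g. vlc.app
# Look up the bundle identifier from the app's Info.plist
BUNDLEID=$(defaults read "/Applications/$1/Contents/Info" CFBundleIdentifier)
# Clear the stored list of recent documents
defaults delete "$BUNDLEID.LSSharedFileList" RecentDocuments
# Keep the app from recording new recent documents
defaults write "$BUNDLEID" NSRecentDocumentsLimit 0
defaults write "$BUNDLEID.LSSharedFileList" RecentDocuments -dict-add MaxAmount 0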

You need to run killall Dock after this to restart the dock for the changes to take effect.

Sunday, September 1, 2013

The Nikon D5100 (as seen by a D40 shooter)

The D5100 is rather old news now. People are either ogling the m4/3 cameras (I know I am) or looking at Nikon's new models such as the D5200. However, I recall, when the D5100 first came out, and I was the owner of a D40, I badly wanted the high ISO performance and the video.

Well, enough time has passed that the D5100 is now at a sweet price point (especially the refurbished ones), so I got myself one. There are tons of comprehensive D5100 reviews out there; this will be a short collection of very subjective thoughts from a D40 owner.

What kind of photographer am I?
Well, I'm a casual shooter. A few pics are up on flickr, but I mostly shoot family and don't really put up pictures on web galleries. My favorite subject is the human face in the middle of its many fleeting expressions.

High ISO performance
I'm very happy. Experts on sites such as dpreview complained that noise rendered D5100 photos above 1600 unusable. I was already impressed by the D40's ISO 1600 performance so I didn't pay any attention to that.

I'm extremely happy with Hi-1 (ISO 12800) and I think even ISO 25600 is better than the D40's Hi-1 (3200). This is candle light shooting, if you have an f1.8 lens. I think I'm going to have a lot of fun with the 35mm f1.8 mounted on the D5100. In addition, Nikon's auto-ISO logic will extend up to 25600 (In the D40 you would have to manually go to 3200 if you wanted that). The night shot (ISO 102400) is fun but I think the lens focuses at infinity.

Digital camera ISOs are an example of how easily we become spoilt. When I shot film the highest I ever shot was 800. I'd see 400 ISO film and salivate. Now I'm like, man, pictures aren't perfect at 25600, things could be better. WHAT MORE DO I WANT?!

Focus points
I used to focus and recompose but then for some reason I got it into my head that this is inaccurate.  While technically true, this only becomes an issue when shooting wide-open and medium to close distance subjects (relative to lens reach).

If you shoot wide open (say f1.8) hand held with the 35mm, trying for a subject at 1m, three things work against focus and recompose: a) The actual geometrical theory: the focal surface is actually a bowl and not flat like the sensor b) You don't actually swivel the camera in place - you shift a little. c) The subject moves during this time.

Once you stop down a little bit, or the subject is further off, the DOF is enough to compensate for focus-recompose.
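
A rough worked example of the geometry in (a), under idealized assumptions that are mine rather than measured: the camera rotates in place, the plane of focus is flat, and the acceptable circle of confusion is 0.02 mm (a value often quoted for APS-C sensors). After swinging by an angle theta, a subject focused at distance d sits d(1 - cos theta) in front of the plane of focus, and that has to fit inside the near-side depth of field. The 15 degree swing and the function names are arbitrary choices for illustration.

```python
import math

def focus_error_mm(distance_mm, swing_deg):
    """How far the plane of focus lands behind the subject after recomposing.

    Assumes the camera rotates in place by swing_deg after locking focus
    at distance_mm, and that the plane of focus is flat.
    """
    return distance_mm * (1.0 - math.cos(math.radians(swing_deg)))

def near_dof_mm(focal_mm, f_number, distance_mm, coc_mm=0.02):
    """Depth of field in front of the focus distance (thin-lens formulas)."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near_limit = (distance_mm * (hyperfocal - focal_mm)
                  / (hyperfocal + distance_mm - 2 * focal_mm))
    return distance_mm - near_limit

error = focus_error_mm(1000, 15)          # subject at 1 m, 15 degree swing
for f_number in (1.8, 4, 8):
    margin = near_dof_mm(35, f_number, 1000)
    print("f/%s: error %.0f mm, near DOF %.0f mm, sharp: %s"
          % (f_number, error, margin, error <= margin))
```

With these assumptions the error is about 34 mm, which exceeds the roughly 28 mm of near-side depth of field at f/1.8 but fits within the roughly 59 mm available at f/4.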

The D5100 has more focus points, but far fewer than the D5200. It's enough for me. I wasted enough time jogging the 4-way switch with the D40's three focus points. Sometimes, you focus-recompose and just shoot. It's better to get a little softness than miss the moment. Interestingly, cycling through the D5100's 11 points is not slower than the D40's, so I appreciate that.

The D5100 is really a step up in this department because now I have focus points above and below the midline (the D40 has three points horizontally along the midline) and this is perfect for both portrait and landscape orientation.

I am told the real utility of denser focal points is during shooting moving subjects and using AF-C, where the focus will predictively follow your subject. I'm impressed, but skeptical. Need to try this out.

Manual focus
I had forgotten this, but another reason I wanted the more advanced Nikons is that they had a focus rangefinder that you can use with manual focus lenses. I've tried this with my 50mm f1.8 and it works pretty well. This is a nice bonus I had not thought about when stepping up.

Video
I find the video quality very good. My other alternative is an 8-year-old Canon PowerShot, so my standards are modest. The video fills the screen of my laptop, and is sharp and saturated. I can't ask for more. What I was a little disappointed in at first was the focus. The hunting was a bad throwback to my compact camera days (which was a big reason why I saved up for a DSLR). However, in decent light (where our criterion for decent is now bathroom or bedroom illumination) the focus is OK. It takes about 1s for the lens (35mm f1.8) to settle down but I can see fast moving objects being a problem on a wide open lens.

Audio
This bothered me a bit. The mic, being in the body, picks up the sounds from dial adjustment, button presses and of course the focus motor on the lens. I would definitely say this, even for casual shooters: get an external mic. I was pleasantly surprised to find a speaker in the camera for playback.

Compared to m4/3
For the longest time I was looking at the m4/3 cameras, and I was looking very closely at the Olympus line, especially the E-PM2 and the E-PL5. High ISO, tiny size, HD video and fast AF, what's not to like?

There were two factors that led me back to Nikon DSLR: a love of shooting through the viewfinder and a lack of familiarity with the m4/3 system.

I have discovered that while I'm fine shooting with an LCD screen, I'm more comfortable using a viewfinder. I'm not sure how I would enjoy using an EVF, and that would add greatly to the cost and a little to the size of the Olympus.

I wasn't sure how quickly you could start up an E-PM2 and shoot from standby mode. Can you leave it forever in standby, like you can leave a DSLR?

I have a very small set of F-mount lenses which I really like. I'm not demanding at all, but I really love the 35mm f1.8 AF-S, and the D40 kit lens is just fine for well lit situations where the sun is not in the front. In addition I have a 50mm f1.8 with no focus motor. It wasn't clear to me how expensive it would be for me to get similar quality lenses for the m4/3.

Lastly, while the photo experts were raving about the speed of the Olympus AF system and in principle it seemed awesome to be able to hold the camera out and then tap on the screen to focus and shoot, I just wasn't sure. If I was motivated enough I would have gone into a camera store and played with the camera for a while.

It's possible I'm passing up an opportunity to get a tiny camera system with great optics and superb video for a bulkier system with poorer video, but I'm comfortable with the price-performance point of the D5100.

UPDATE: I really liked this review at Andy's blog.

Pandas: HDF: Strange corruption with tables wider than 1306 columns

https://github.com/pydata/pandas/issues/4724