

Showing posts from April, 2014

How to tell if a property is a rental

From a thread here:

The most accurate method is to physically go to the place and politely ask the residents, making clear that you are considering buying and are trying to determine how many units are rentals.

Online, look the property up in the assessor's database: if the owner's mailing address is different from the house address, it's almost certainly not the owner's primary residence, which may mean it's rented out.

Online, check the property tax amount if you can see it: many towns in Massachusetts have a residential exemption that applies to primary residences only. If the tax seems lower, then this is probably a primary residence.

What makes marine grade wire special?

I was in the market for plain old 14 AWG wire for a project when I came across spools of wire sold as "tinned marine grade wire". I was curious to know what made this wire "marine grade". From this page, it turns out, marine grade wire is

- 12% bigger for a given wire gauge, so 14G marine wire is thicker than regular 14G wire
- made of finer strands, each individually coated with tin, which makes it resistant to corrosion
- insulated with oil-, moisture- and heat-resistant insulation

Breaking BAM

It's fun to mess with our bioinformatics tools and laugh at ourselves. The BAM format carries a query name string. In an idle moment I wondered, how long a string can I put here before bad things happen to the BAM?

Generally this string carries information related to the device that produced the read, lane number, run number and so on and so forth. Its contents aren't standardized and everyone encodes information here in their own way, so you would think it's an arbitrary-length string. Almost.

Try out the short Python snippet below. It generates a dummy fasta file (test.fa) and a dummy BAM file (test.bam) with one 'aligned' read. The only funky thing is that you can make the qname field of the read as long as you want (by varying the lone input parameter to the script).

import sys
import pysam

try:
    qnamelen = int(sys.argv[1])
except (IndexError, ValueError):  # no argument given, or not an integer
    qnamelen = 255

with open('test.fa', 'w') as f:
    f.write('>test\n')
    f.write('A'*100)
bam_…
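The post is cut off before the result, so here is a hedged illustration of where the "Almost" most likely comes from. Per the SAM/BAM specification, each BAM record stores the read-name length, including its trailing NUL, in an unsigned 8-bit field (`l_read_name`), and the SAM spec limits QNAME to 254 characters. The sketch below packs just that one field with `struct`; `pack_read_name` is a hypothetical helper, not part of pysam, and it does not show how pysam/htslib itself reacts to an oversized name.

```python
import struct

def pack_read_name(qname: str) -> bytes:
    """Pack a BAM-style l_read_name (uint8) followed by the NUL-terminated name."""
    name = qname.encode("ascii") + b"\x00"  # stored length includes the NUL
    # '<B' is an unsigned char, so struct.error is raised if len(name) > 255.
    return struct.pack("<B", len(name)) + name

print(len(pack_read_name("A" * 254)))  # -> 256 (1 length byte + 254 chars + NUL)

try:
    pack_read_name("A" * 255)  # 255 chars + NUL = 256, which overflows a uint8
except struct.error as err:
    print("255-character name does not fit:", err)
```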

Adjusting sticking doors

Several of the doors in ye old house were sticking at the top at the latch end. The door was angled too high (or the door frame top was drooping). My solution, in each case, was to use a chisel to deepen the lower hinge seat on the door frame. This allowed the bottom of the door to swing towards the hinge side, bringing the top away from the frame. I could have added shims to the top hinge too. Indeed, for one of the doors I added shims to the top hinge, chiseled away the lower hinge AND chiseled away a bit of the top corner.

Bash conditionals

#!/bin/bash
set -x
MYVAR=3
if [ ${MYVAR} = 1 ]; then
  : First option
elif [ ${MYVAR} = 2 ]; then
  : Second option
elif [ ${MYVAR} = 3 ]; then
  : Third option
fi
->
+ MYVAR=3
+ '[' 3 = 1 ']'
+ '[' 3 = 2 ']'
+ '[' 3 = 3 ']'
+ : Third option
Watch out for whitespace, because bash is sensitive to it. Drop the space after [ and bash goes looking for a command called [3:

#!/bin/bash
set -x
MYVAR=3
if [${MYVAR} = 1 ]; then
  : First option
elif [ ${MYVAR} = 2 ]; then
  : Second option
elif [ ${MYVAR} = 3 ]; then
  : Third option
fi
->
+ MYVAR=3
+ '[3' = 1 ']'
./ line 4: [3: command not found
+ '[' 3 = 2 ']'
+ '[' 3 = 3 ']'
+ : Third option
Very, very sensitive to whitespace

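#!/bin/bash
set -x
MYVAR=3
if [ ${MYVAR}=1 ]; then
  : First option
elif [ ${MYVAR} = 2 ]; then
  : Second option
elif [ ${MYVAR} = 3 ]; then
  : Third option
fi
->
+ MYVAR=3
+ '[' 3=1 ']'
+ : First option
Without the spaces around =, [ receives the single argument 3=1. A one-argument test is true whenever the string is non-empty, so bash takes the first branch.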
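The last two surprises come down to how many arguments [ (the test builtin) receives. Below is a toy Python model of that rule, a hypothetical toy_test function that only covers the one- and three-argument forms used in these snippets and is not how bash is implemented; shlex.split stands in, roughly, for the shell's word splitting.

```python
import shlex

def toy_test(args):
    """Toy model of `[ ... ]` for the 1- and 3-argument forms only."""
    if len(args) == 1:
        # One argument: true if the string is non-empty.
        return args[0] != ""
    if len(args) == 3 and args[1] == "=":
        # Three arguments with '=' in the middle: string comparison.
        return args[0] == args[2]
    raise NotImplementedError(f"form not modelled: {args!r}")

for expr in ["3 = 1", "3 = 3", "3=1"]:
    args = shlex.split(expr)  # rough stand-in for shell word splitting
    print(f"[ {expr} ]  args={args}  ->  {toy_test(args)}")
```

With the spaces, "3 = 1" becomes three words and is compared; without them, "3=1" is one non-empty word and is simply true.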

The magic of mmap

Big data is sometimes described as data whose size is larger than your available RAM. I think this is a good criterion because once the size of your data (or of any results computed from it) starts to approach your RAM size, you have to start worrying about how you are going to manage memory. If you leave it up to your OS, it will end up writing to and reading from disk in somewhat unpredictable ways, and, depending on the software you use, your program might just quit with no warning or with a courtesy 'Out of memory' message. The fun challenge of "Big Data" is, of course, how to keep doing computations regardless of the size of your data without having your computer quit on you. Some calculations can be done in a blocked fashion, but some require you to access different parts of the data all at once.

Python's mmap module is an excellent way to let someone else do the dirty work of handling data files that are comparable to or larger th…
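The post is cut off here, so this is only a minimal sketch of the general idea, not the author's example. It writes a hypothetical file of fixed-size 16-byte records and then uses mmap to pull out arbitrary records by slicing, leaving it to the OS to page in just the parts that are touched. The file name, record layout and helper names are all assumptions made for illustration.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical layout: each record is a 16-digit ASCII number

def write_demo_file(path, n_records):
    """Write n_records fixed-size records; record i holds the number i."""
    with open(path, "wb") as f:
        f.write(b"".join(f"{i:0{RECORD_SIZE}d}".encode("ascii")
                         for i in range(n_records)))

def read_record(mm, i):
    """Slice record i straight out of the mapping; only touched pages are read."""
    start = i * RECORD_SIZE
    return int(mm[start:start + RECORD_SIZE])

# Demo file in a fresh temporary directory (not cleaned up, for brevity).
path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_demo_file(path, 100_000)

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(len(mm))  # -> 1600000
        # Jump around the file without loading all of it into memory first.
        print(read_record(mm, 0), read_record(mm, 99_999), read_record(mm, 42_000))
```

The mapping behaves like a bytes object, so the rest of the code does not need to know the data lives on disk.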

The logging overhead

Python makes printing logger messages, and adjusting the logger level (which messages to print when), very easy. However, it seems that the logger code comes with a higher overhead than simple 'if' statements that control which messages to print.
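The claim is easy to check with timeit. A minimal sketch, assuming a logger set to INFO so that every DEBUG call is discarded; SHOW_DEBUG and the via_* functions are hypothetical names for illustration. It also times a third option not mentioned in the post, guarding the call with logger.isEnabledFor(logging.DEBUG). The numbers depend on your machine and Python version, so no result is quoted here.

```python
import logging
import timeit

logger = logging.getLogger("overhead_demo")
logger.setLevel(logging.INFO)   # DEBUG messages will be filtered out

SHOW_DEBUG = False              # hypothetical plain flag for the 'if' version

def via_logger(x):
    # Still a method call plus a level check, even though nothing is printed.
    logger.debug("x = %s", x)

def via_if(x):
    # Only a global lookup and a boolean test.
    if SHOW_DEBUG:
        print("x =", x)

def via_guard(x):
    # Middle ground: ask the logger first, skip the debug() call when disabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("x = %s", x)

def time_all(number=200_000):
    """Return the seconds each variant takes for `number` calls."""
    return {
        name: timeit.timeit(lambda f=f: f(42), number=number)
        for name, f in [("logger.debug", via_logger),
                        ("if flag", via_if),
                        ("isEnabledFor guard", via_guard)]
    }

for name, seconds in time_all().items():
    print(f"{name:>20}: {seconds:.3f} s")
```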

Logger messages are very, very useful. Two typical uses of logger messages are debugging a program when it goes awry and printing out its progress (if the user wishes it), to reassure us that the program is running and tell us how much further it has to go.

Python's logger is a lot of fun because it is very easy to set up logging messages at different levels and then filter which messages you actually show. For example, you can code the program to print out the values of some variables at the 'DEBUG' level and the progress of the program at the 'INFO' level. Then you can instruct the code to print out only the INFO messages, or both the DEBUG and INFO messages.
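Here is a small sketch of that DEBUG/INFO split, assuming a throwaway logger named "level_demo" that writes to an in-memory buffer so the output can be inspected; run_with_level is a hypothetical helper. Variable dumps go out at DEBUG and progress lines at INFO, and changing only the logger's level decides which of them appear.

```python
import io
import logging

def run_with_level(level, total_steps=3):
    """Run a toy 'program' and return everything its logger emitted at `level`."""
    stream = io.StringIO()
    logger = logging.getLogger("level_demo")
    logger.handlers.clear()          # fresh handler on every call
    logger.propagate = False         # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)

    for step in range(1, total_steps + 1):
        value = step * 10
        logger.debug("value = %d", value)                         # variable dump
        logger.info("finished step %d of %d", step, total_steps)  # progress
    return stream.getvalue()

print(run_with_level(logging.INFO))   # progress messages only
print(run_with_level(logging.DEBUG))  # progress and variable dumps
```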


Bash: print commands as they execute

I'm making a demo script that calls programs from a toolchain I am developing. The idea of the script is that it runs a set of commands in sequence and also prints what each command will be doing. Say the base script was:

I first put in an echo to describe the command:

echo 'Now we do blah blah'
python

But I wanted to see the commands too. I discovered the set -x command, which turns on bash's debug (xtrace) mode and prints out every command as it is executed.

set -x
echo 'Now we do blah blah'
python

But this now effectively printed the description twice: once when tracing the echo statement and again when actually executing it. Then I discovered bash's no-operation (NOP) command, ":".

set -x
: Now we do blah blah
python
With set -x on, bash prints the : line, description included, but : itself does nothing, so the description appears only once.
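To see the difference between the echo and : versions side by side, this sketch runs both snippets through bash from Python and captures the two output streams separately. It assumes bash is on the PATH; run_traced is a hypothetical helper. The set -x trace is written to stderr, so the : description shows up only there, while the echo description shows up both in the trace and in the normal output.

```python
import subprocess

def run_traced(script):
    """Run a bash snippet; return (stdout, stderr). set -x traces go to stderr."""
    result = subprocess.run(["bash", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout, result.stderr

with_echo = "set -x\necho 'Now we do blah blah'\n"
with_noop = "set -x\n: Now we do blah blah\n"

for label, script in [("echo", with_echo), (":", with_noop)]:
    out, err = run_traced(script)
    print(f"--- {label} ---")
    print("stdout:", repr(out))  # echo also prints the description here
    print("stderr:", repr(err))  # the set -x trace
```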

Python: passing a mix of keyword arguments and dictionary arguments to a function

So Python is cool because of keyword arguments:

def foo(a=1, b=2, c=3):
    print(a, b, c)

foo(a=1)  # -> 1 2 3
Python is cool because you can pass a dictionary whose keys match the argument names:

def foo(a=1, b=2, c=3):
    print(a, b, c)

args = {'a': 1, 'b': 2}
foo(**args)  # -> 1 2 3
But, can you mix the two? Yes, yes you can!

def foo(a=1, b=2, c=3):
    print(a, b, c)

args = {'a': 1, 'b': 2}
foo(c=3, **args)  # -> 1 2 3
Hmm, can we screw up the interpreter? What happens if we send the same argument as a keyword AND a dictionary?

def foo(a=1, b=2, c=3):
    print(a, b, c)

args = {'a': 1, 'b': 2}
foo(a=4, **args)  # -> TypeError: foo() got multiple values for keyword argument 'a'
Nothing gets past Python, eh?
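If you actually want the explicit keyword to win over the dictionary, one option (not from the original post) is to merge everything into a single dictionary first, where later keys overwrite earlier ones. In this sketch foo returns its arguments instead of printing them so the results are easy to check; the {**args, 'a': 4} form needs Python 3.5 or later.

```python
def foo(a=1, b=2, c=3):
    # Returns instead of printing, so the results are easy to compare.
    return a, b, c

args = {'a': 1, 'b': 2}

# Merge first: in a dict display, later keys overwrite earlier ones.
print(foo(**{**args, 'a': 4}))  # -> (4, 2, 3)
# Equivalent spelling with dict(); args itself is left unchanged.
print(foo(**dict(args, a=4)))   # -> (4, 2, 3)

try:
    foo(a=4, **args)  # 'a' arrives twice, so Python refuses
except TypeError as err:
    print("TypeError:", err)
```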