Friday, December 20, 2013

Quitclaim deeds, land records and all THAT

Looking into some legal stuff I noticed that a deed was labelled 'quitclaim'. I was puzzled by what this meant (it sounded a little shady to me). From the page here it seems that a quitclaim deed is weaker than a warranty deed. A warranty deed states that whoever is giving you the deed is legally obliged to defend any challenges to ownership of the land, regardless of how far back in time the challenge originates. A quitclaim deed obliges the grantor to defend only those challenges to ownership that arose while they owned the property; any challenges that arose before are excluded. This seems a little shady, because if it is your land and you are selling it, why would you NOT give the full guarantee of ownership that a warranty deed promises? I started to look into land records (for Massachusetts you can go to http://www.masslandrecords.com/ and do a search based on county) and every transfer of that particular piece of land was quitclaim, going back as far as I could trace it. From a legal blog it seems that the quitclaim deed is very commonly used in Massachusetts, but I don't know exactly why.

Friday, December 13, 2013

Massachusetts State Government registry for lead in rentals/homes

If you are concerned about lead paint in a building you plan to live in, go to http://webapps.ehs.state.ma.us/leadsafehomes. By entering a complete address you can find out whether the dwelling has any violations. If you enter just a street name you can find all the violations recorded for that street.

Sunday, December 8, 2013

Docopt is amazing

I love the command line and I love Python. So, naturally, I am an avid user of the argparse module bundled with Python. Today I discovered docopt and I am totally converted. argparse is great, but there is a bunch of setup code you have to write, things often end up boilerplate-y and messy, and it always seemed like there should be a more concise way of expressing the command line interface to a program. Enter docopt.

docopt lets you describe your command line interface in your docstring. It then parses this description and creates a command line parser that returns a dictionary with the values of all the options filled in. Just like that.

So, for example, one of my scripts has a docstring that looks like

Usage:
  compute_eye_epoch [-R DATAROOT] [-x EXCEL] [-d DATABASE] [-e EPOCH] [-f|-F] [-q]

Options:
  -h --help     Show this screen and exit.
  -R DATAROOT   Root of data directory [default: ../../Data]
  -x EXCEL      Spreadsheet with sessions/trials etc [default: ../../Notes/sessions_and_neurons.xlsx]
  -d DATABASE   sqlite3 database we write to [default: test.sqlite3]
  -e EPOCH      Name of epoch we want to process
  -f            Force recomputation of all entries for this epoch
  -F            Force storing of epoch (automatically forces recomputation)
  -q            Quiet mode (print only ERROR level logger messages)

And the __main__ part of the code is

import docopt

if __name__ == '__main__':
  arguments = docopt.docopt(__doc__, version='v1')
  print arguments

If I call the program with -h, the usage information is printed and the program exits. If I call it with other options, arguments is filled out, for example:

{'-F': False,
 '-R': '../../Data',
 '-d': 'test.sqlite3',
 '-e': None,
 '-f': False,
 '-q': False,
 '-x': '../../Notes/sessions_and_neurons.xlsx'}

This totally removes the barrier to creating command line interfaces and keeps clutter out of the __main__ section of the code! Amazing! Give it a try!

Thursday, December 5, 2013

Database diagrams and sqlite on the cheap

Those diagrams that show you your database tables and the links between them through foreign keys are apparently called Entity Relationship Diagrams (ERDs). I wanted to create one for my sqlite database to keep track of everything but I'm a cheapskate and didn't want to pay anything.

It turns out MySQL Workbench is great for this. You don't need to register with them to download the program, and you don't need a MySQL database running. I simply followed these steps:

  1. From the sqlite3 command line I typed .schema, which printed the database schema to the console.
  2. I pasted the schema into a file and saved it.
  3. I used Import from MySQL Workbench to parse the schema and place it on a diagram. 
The Autolayout feature is pretty good and probably optimizes for visual appeal, but I spent a few minutes changing the layout to what worked logically in my head and also minimized connection overlaps. The translation from the sqlite3 to the MySQL dialect is smooth.
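Incidentally, step 1 can also be done without the interactive shell: the same CREATE statements that .schema prints live in the sqlite_master table. A quick sketch (the two tables here are made up just to have something to dump):

```python
import sqlite3

# Two throwaway tables just so there is a schema to dump
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE session(id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE trial(id INTEGER PRIMARY KEY, "
            "session_id INTEGER REFERENCES session(id))")

# Equivalent of the shell's .schema: pull the CREATE statements from sqlite_master
schema = '\n'.join(row[0] for row in
                   con.execute("SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"))
con.close()
print(schema)
```

You can then redirect that string to a file and feed it to Workbench's Import exactly as in step 2.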

My only complaint with this tool on the Mac is that it is pretty unstable and crashes on a whim. Save often.


Store numpy arrays in sqlite

Use numpy.getbuffer (or sqlite3.Binary) in combination with numpy.frombuffer to lug numpy data in and out of the sqlite3 database:
import sqlite3, numpy

r1d = numpy.random.randn(10)

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE eye(id INTEGER PRIMARY KEY, desc TEXT, data BLOB)")
con.execute("INSERT INTO eye(desc,data) VALUES(?,?)", ("1d", sqlite3.Binary(r1d)))
con.execute("INSERT INTO eye(desc,data) VALUES(?,?)", ("1d", numpy.getbuffer(r1d)))
res = con.execute("SELECT * FROM eye").fetchall()
con.close()

#res ->
#[(1, u'1d', <read-write buffer ptr 0x10371b220, size 80 at 0x10371b1e0>),
# (2, u'1d', <read-write buffer ptr 0x10371b190, size 80 at 0x10371b150>)]

print r1d - numpy.frombuffer(res[0][2])
#->[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

print r1d - numpy.frombuffer(res[1][2])
#->[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

Note that if you need data types other than float or shapes other than 1d, you will also need to store the dtype and shape in the database so that you can pass them to numpy.frombuffer and reshape the array on the way out.
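For example, a round trip for a 2d integer array might look like this (a sketch: the table and column names are made up, and ndarray.tobytes stands in for numpy.getbuffer, which does not exist on Python 3):

```python
import sqlite3
import numpy

# A 2d integer array; its dtype and shape must be stored alongside the bytes
a = numpy.arange(12, dtype=numpy.int32).reshape(3, 4)

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE arr(id INTEGER PRIMARY KEY, dtype TEXT, shape TEXT, data BLOB)")
con.execute("INSERT INTO arr(dtype, shape, data) VALUES (?,?,?)",
            (str(a.dtype), ','.join(str(n) for n in a.shape),
             sqlite3.Binary(a.tobytes())))

# Reconstruct the array using the stored dtype and shape
dtype, shape, blob = con.execute("SELECT dtype, shape, data FROM arr").fetchone()
b = numpy.frombuffer(blob, dtype=dtype).reshape([int(n) for n in shape.split(',')])
con.close()

print((a == b).all())  # -> True
```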

Wednesday, December 4, 2013

Monday, December 2, 2013

Pandas, multiindex, date, HDFstore and frame_tables

Currently, if you have a dataframe with a multiindex that uses a date as one of the indexers, you cannot save it as a frame_table. Use datetime instead.
import pandas as pd, numpy, datetime

print pd.__version__ #-> 0.13.0rc1

idx1 = pd.MultiIndex.from_tuples([(datetime.date(2013,12,d), s, t) for d in range(1,3) for s in range(2) for t in range(3)])
df1 = pd.DataFrame(data=numpy.zeros((len(idx1),2)), columns=['a','b'], index=idx1)
#-> If you want to save as a table in HDF5 use datetime rather than date


with pd.get_store('test1.h5') as f:
  f.put('trials',df1) #-> OK

with pd.get_store('test2.h5') as f:
  f.put('trials',df1,data_columns=True,format='t') #-> TypeError: [date] is not implemented as a table column

#-> Solution is to use datetime
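You can see why by inspecting the dtype of the date level in each case (a sketch): a level built from datetime.datetime values becomes a datetime64 index, which the table format can serialize, while datetime.date values are left as a plain object-dtype level.

```python
import datetime
import pandas as pd

# datetime.date -> the level stays an object-dtype index (not serializable as a table column)
idx_date = pd.MultiIndex.from_tuples(
    [(datetime.date(2013, 12, d), s) for d in (1, 2) for s in (0, 1)])

# datetime.datetime -> the level becomes a datetime64 index, which format='t' can handle
idx_dt = pd.MultiIndex.from_tuples(
    [(datetime.datetime(2013, 12, d), s) for d in (1, 2) for s in (0, 1)])

print(idx_date.levels[0].dtype)
print(idx_dt.levels[0].dtype)
```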

Update: Thanks to Jeff again for the solution.