Skip to main content

Pandas: the frame_table disk space overhead

When a Pandas DataFrame is saved (via PyTables) to hdf5 as a frame_table there is a varying amount of disk space overhead depending on how many columns are declared as data_columns (i.e. columns you can use to select rows by). This overhead can be rather high.


import pandas as pd, numpy

df = pd.DataFrame(numpy.random.randn(1000000,3),columns=['a','b','c'])
df.to_hdf('data_table_nocomp.h5','data') #-> 32 MB
df.to_hdf('data_normal.h5','data',complevel=9,complib='bzip2') #-> 21.9 MB
df.to_hdf('data_table.h5','data',complevel=9,complib='bzip2',table=True) #-> 22.5 MB
df.to_hdf('data_table_columns1.h5','data',complevel=9,complib='bzip2',table=True,data_columns=['a']) #-> 29.1 MB
df.to_hdf('data_table_columns2.h5','data',complevel=9,complib='bzip2',table=True,data_columns=['a','b']) #-> 35.8 MB
df.to_hdf('data_table_columns3.h5','data',complevel=9,complib='bzip2',table=True,data_columns=['a','b','c']) #-> 42.4 MB
df.to_hdf('data_table_columns3_nocomp.h5','data',table=True,data_columns=['a','b','c']) #-> 52.4 MB

Comments

Popular posts from this blog

Flowing text in inkscape (Poster making)

You can flow text into arbitrary shapes in inkscape. (From a hint here).

You simply create a text box, type your text into it, create a frame with some drawing tool, select both the text box and the frame (click and shift) and then go to text->flow into frame.

UPDATE:

The omnipresent anonymous asked:
Trying to enter sentence so that text forms the number three...any ideas?
The solution:
Type '3' using the text toolConvert to path using object->pathSize as necessaryRemove fillUngroupType in actual text in new text boxSelect the text and the '3' pathFlow the text

Drawing circles using matplotlib

Use the pylab.Circle command

import pylab #Imports matplotlib and a host of other useful modules cir1 = pylab.Circle((0,0), radius=0.75, fc='y') #Creates a patch that looks like a circle (fc= face color) cir2 = pylab.Circle((.5,.5), radius=0.25, alpha =.2, fc='b') #Repeat (alpha=.2 means make it very translucent) ax = pylab.axes(aspect=1) #Creates empty axes (aspect=1 means scale things so that circles look like circles) ax.add_patch(cir1) #Grab the current axes, add the patch to it ax.add_patch(cir2) #Repeat pylab.show()

Running a task in a separate thread in a Tkinter app.

Use Queues to communicate between main thread and sub-threadUse wm_protocol/protocol to handle quit eventUse Event to pass a message to sub-threadimport Tkinter as tki, threading, Queue, time def thread(q, stop_event): """q is a Queue object, stop_event is an Event. stop_event from http://stackoverflow.com/questions/6524459/stopping-a-thread-python """ while(not stop_event.is_set()): if q.empty(): q.put(time.strftime('%H:%M:%S')) class App(object): def __init__(self): self.root = tki.Tk() self.win = tki.Text(self.root, undo=True, width=10, height=1) self.win.pack(side='left') self.queue = Queue.Queue(maxsize=1) self.poll_thread_stop_event = threading.Event() self.poll_thread = threading.Thread(target=thread, name='Thread', args=(self.queue,self.poll_thread_stop_event)) self.poll_thread.start() self.poll_interval = 250 self.poll() self.root.wm_protocol("WM_DELETE…