Pandas: the frame_table disk space overhead

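When a pandas DataFrame is saved to HDF5 (via PyTables) as a frame_table, the file carries disk space overhead that grows with the number of columns declared as data_columns (i.e. columns you can use to select rows by). The overhead can be substantial: in the measurements below, each data column adds roughly 6.6 MB to a compressed table of one million rows, so declaring all three columns nearly doubles the file size.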
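To make concrete what a data column buys you, here is a minimal sketch of selecting rows on disk through a data column. It assumes a recent pandas with the PyTables package (`tables`) installed; the helper name `select_positive_a`, the file name `demo.h5` and the small frame size are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDF5 support also needs the PyTables package ("tables")


def select_positive_a(df, path):
    """Store df as a table with 'a' as a data column, then query on 'a'.

    Hypothetical helper for illustration only.
    """
    df.to_hdf(path, key="data", format="table", data_columns=["a"])
    # The where clause is applied while reading, so only matching rows
    # are returned. It can only refer to the index or to data columns.
    return pd.read_hdf(path, key="data", where="a > 0")


df = pd.DataFrame(np.random.randn(1000, 3), columns=["a", "b", "c"])
with tempfile.TemporaryDirectory() as tmp:
    subset = select_positive_a(df, os.path.join(tmp, "demo.h5"))

print(len(subset), "of", len(df), "rows have a > 0")
```

Columns that are not listed in data_columns (here 'b' and 'c') cannot appear in the where clause, which is why it is tempting to declare many of them despite the extra disk space.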


import numpy
import pandas as pd

# One million rows of three float64 columns. The trailing comments are the measured file sizes.
df = pd.DataFrame(numpy.random.randn(1000000, 3), columns=['a', 'b', 'c'])

# Default (fixed) format, without and with compression
df.to_hdf('data_table_nocomp.h5', key='data')  # -> 32 MB
df.to_hdf('data_normal.h5', key='data', complevel=9, complib='bzip2')  # -> 21.9 MB

# Table format (frame_table), compressed, with 0 to 3 data columns
df.to_hdf('data_table.h5', key='data', complevel=9, complib='bzip2', format='table')  # -> 22.5 MB
df.to_hdf('data_table_columns1.h5', key='data', complevel=9, complib='bzip2', format='table', data_columns=['a'])  # -> 29.1 MB
df.to_hdf('data_table_columns2.h5', key='data', complevel=9, complib='bzip2', format='table', data_columns=['a', 'b'])  # -> 35.8 MB
df.to_hdf('data_table_columns3.h5', key='data', complevel=9, complib='bzip2', format='table', data_columns=['a', 'b', 'c'])  # -> 42.4 MB

# Table format, all three data columns, no compression
df.to_hdf('data_table_columns3_nocomp.h5', key='data', format='table', data_columns=['a', 'b', 'c'])  # -> 52.4 MB
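The size comments in the listing above can be turned into a rough per-column figure. The short calculation below only does arithmetic on those reported numbers; treating 1 MB as 10**6 bytes is an assumption, since the post does not say how the sizes were read.

```python
# File sizes in MB, copied from the comments in the listing above
# (bzip2, complevel=9), keyed by the number of data_columns.
sizes_mb = {0: 22.5, 1: 29.1, 2: 35.8, 3: 42.4}
n_rows = 1_000_000

# Extra space added by each successive data column.
steps = [round(sizes_mb[k] - sizes_mb[k - 1], 1) for k in (1, 2, 3)]
print(steps)  # [6.6, 6.7, 6.6]

# Average overhead per data column, expressed per row.
avg_mb = (sizes_mb[3] - sizes_mb[0]) / 3
bytes_per_row = avg_mb * 1e6 / n_rows  # assumes 1 MB = 10**6 bytes
print(round(bytes_per_row, 1))  # 6.6

# Growth of the compressed table when all three columns are data columns.
growth = sizes_mb[3] / sizes_mb[0] - 1
print(f"{growth:.0%}")  # 88%
```

So for this data the cost is close to linear: about 6.6 bytes per row for every data column, on top of a compressed table that needs about 22.5 bytes per row to begin with.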
