Showing posts from February, 2013

Efficiently reading structured binary files using numpy's fromfile method

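    import pylab

    # One record per NEV packet; field names and type codes follow the file layout.
    nev_packet = pylab.dtype([
        ('nstx', 'h'),
        ('npkt_id', 'h'),
        ('npkt_data_size', 'h'),
        ('timestamp', 'Q'),
        ('eventid', 'h'),
        ('nttl', 'H'),
        ('ncrc', 'h'),
        ('ndummy1', 'h'),
        ('ndummy2', 'h'),
        ('dnExtra', '8i'),
        ('eventstring', '128c')
    ])

    # fin is a file name or a file object opened in binary mode;
    # count=-1 reads every record up to the end of the file.
    data = pylab.fromfile(fin, dtype=nev_packet, count=-1)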
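To see the same idea end to end, here is a minimal sketch that writes a few records to a temporary file and reads them back with numpy.fromfile. The three-field record layout and the roundtrip helper are made up for illustration (they are not the NEV packet format), and the example calls numpy directly rather than going through pylab.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout, much smaller than the NEV packet above.
# '<' pins the byte order to little-endian so the file is portable.
record = np.dtype([
    ('packet_id', '<i2'),   # 2-byte signed integer
    ('timestamp', '<u8'),   # 8-byte unsigned integer
    ('label', 'S8'),        # 8-byte fixed-width byte string
])


def roundtrip(n=3):
    """Write n records to a temporary file and read them back."""
    records = np.zeros(n, dtype=record)
    records['packet_id'] = np.arange(n)
    records['timestamp'] = np.arange(n) * 1000
    records['label'] = b'event'

    fd, path = tempfile.mkstemp(suffix='.bin')
    os.close(fd)
    try:
        records.tofile(path)
        # count=-1 means "read every record up to the end of the file".
        return np.fromfile(path, dtype=record, count=-1)
    finally:
        os.remove(path)


data = roundtrip()
print(data['timestamp'])
```

Each field comes back as an ordinary array (data['timestamp'], data['label'], and so on), so no per-record unpacking loop is needed.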

Using -1 to indicate "up to the end"

I was writing some code in Python to read a binary file, extract packets of information, and write out subsets of that information to different files. The code was, in essence, this:

    def func(f, max_packets=-1, buffer_len=100):
        if buffer_len < max_packets:
            buffer_len = max_packets
        d = f.read(buffer_len)
        while len(d):
            # Process
            d = f.read(buffer_len)

There was a bunch of other code between the buffer_len adjustment and the loop, and I wasn't doing a direct read but using numpy.fromfile. Anyway, when I wanted to read the whole file I would pass max_packets=-1. This would make buffer_len=-1, which in turn would try to read the whole file into memory, causing a segfault (sometimes I would get an out-of-memory error, which was more informative). This bug took me several hours to track down.
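One way to sidestep this kind of bug is to keep the "no limit" marker out of the arithmetic entirely by using None instead of -1, so it can never leak into a read size. The sketch below is an illustration under that assumption, not the original code: the read_packets name, its parameters, and the fixed packet size are all hypothetical, and it uses a plain f.read rather than numpy.fromfile.

```python
import io


def read_packets(f, packet_size, max_packets=None, packets_per_read=100):
    """Yield fixed-size packets from f.

    max_packets=None means "read up to the end". Because None is not a
    number, it cannot accidentally be used as a read size.
    """
    if packet_size <= 0 or packets_per_read <= 0:
        raise ValueError("packet_size and packets_per_read must be positive")

    remaining = max_packets
    while remaining is None or remaining > 0:
        # The read size is always a bounded, positive number of bytes.
        n = packets_per_read if remaining is None else min(packets_per_read, remaining)
        chunk = f.read(n * packet_size)

        # Ignore a trailing partial packet at the end of the file.
        usable = len(chunk) - len(chunk) % packet_size
        if usable == 0:
            break
        for i in range(0, usable, packet_size):
            yield chunk[i:i + packet_size]

        if remaining is not None:
            remaining -= usable // packet_size
        if len(chunk) < n * packet_size:
            break  # short read: end of file reached


stream = io.BytesIO(bytes(range(10)))
packets = list(read_packets(stream, packet_size=2, packets_per_read=2))
print(len(packets))
```

Whatever max_packets is, each read asks for at most packets_per_read packets, so memory use stays bounded even when the whole file is requested.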