Showing posts from February, 2013

Efficiently reading structured binary files using numpy's fromfile method

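import numpy as np

# One packet; fields are listed in on-disk order and packed without padding
nev_packet = np.dtype([
    ('nstx', 'h'),
    ('npkt_id', 'h'),
    ('npkt_data_size', 'h'),
    ('timestamp', 'Q'),
    ('eventid', 'h'),
    ('nttl', 'H'),
    ('ncrc', 'h'),
    ('ndummy1', 'h'),
    ('ndummy2', 'h'),
    ('dnExtra', '8i'),         # 8 x int32
    ('eventstring', '128c')    # 128 single bytes
])
# fin is an open binary file (or a filename); count=-1 reads every record
data = np.fromfile(fin, dtype=nev_packet, count=-1)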
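To see what comes back, here is a minimal round-trip sketch. It assumes the same nev_packet layout as above, and the three packets and the file name packets.bin are made up purely for illustration. It writes the packets to a temporary file, reads them back with fromfile, and pulls out one field by name.

```python
import os
import tempfile

import numpy as np

# Same layout as nev_packet above, repeated so this sketch runs on its own.
nev_packet = np.dtype([
    ('nstx', 'h'), ('npkt_id', 'h'), ('npkt_data_size', 'h'),
    ('timestamp', 'Q'), ('eventid', 'h'), ('nttl', 'H'),
    ('ncrc', 'h'), ('ndummy1', 'h'), ('ndummy2', 'h'),
    ('dnExtra', '8i'), ('eventstring', '128c'),
])

# Three fake packets with made-up values (not real recording data).
packets = np.zeros(3, dtype=nev_packet)
packets['timestamp'] = [10, 20, 30]
packets['nttl'] = [0, 1, 0]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'packets.bin')  # hypothetical file name
    packets.tofile(path)
    with open(path, 'rb') as fin:
        # count=-1 reads every complete record up to the end of the file.
        data = np.fromfile(fin, dtype=nev_packet, count=-1)

print(nev_packet.itemsize)   # bytes per packet on disk
print(data['timestamp'])     # one field across all packets
print(data[1]['nttl'])       # one field of one packet
```

Each element of data is one packet, and indexing by field name gives a plain array for that field, so there is no manual unpacking loop.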

Using -1 to indicate "up to the end"

I was writing some code in Python to read a binary file, extract packets of information and write out subsets of information to different files.

The code was, in essence, this:

def func(f, max_packets=-1, buffer_len=100):
    # Meant to cap the buffer at max_packets, but -1 slips through as well
    if buffer_len > max_packets:
        buffer_len = max_packets
    d = f.read(buffer_len)
    while len(d):
        # Process
        d = f.read(buffer_len)


There was a bunch of other code in between the buffer_len adjustment and the loop, and I wasn't doing a direct read but using numpy.fromfile.

Anyway, when I wanted to read the whole file I would pass max_packets=-1. This would set buffer_len=-1, which in turn would try to read the whole file into memory in one go, causing a segfault. (Sometimes I would get an out-of-memory error instead, which was more informative.)

This bug took me several hours to track down.
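One way to avoid this kind of bug is to keep the "no limit" sentinel out of the arithmetic entirely. The sketch below is not the code I ended up with, just a minimal illustration: read_chunks is a hypothetical name, None stands in for "up to the end", and since it uses a plain f.read the counts are in bytes (with numpy.fromfile they would be records).

```python
import io


def read_chunks(f, max_packets=None, buffer_len=100):
    """Yield chunks of at most buffer_len items, stopping after max_packets.

    max_packets=None means "up to the end"; the sentinel is never compared
    with buffer_len and never passed on to f.read().
    """
    if buffer_len <= 0:
        raise ValueError("buffer_len must be positive")
    remaining = max_packets
    while remaining is None or remaining > 0:
        # Only cap the read size when a real limit was given.
        size = buffer_len if remaining is None else min(buffer_len, remaining)
        chunk = f.read(size)
        if not chunk:
            break  # end of file
        if remaining is not None:
            remaining -= len(chunk)
        yield chunk


f = io.BytesIO(b"x" * 250)
whole = [len(c) for c in read_chunks(f)]
f.seek(0)
limited = [len(c) for c in read_chunks(f, max_packets=120)]
print(whole)    # [100, 100, 50]
print(limited)  # [100, 20]
```

The read size stays bounded by buffer_len whether or not a limit is given, so asking for the whole file never turns into a single giant read.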