Sunday, December 9, 2012

Python: using a regexp to do pattern matching on a byte stream

Python regular expressions are a very concise and efficient way of performing pattern matching on strings. Many computing problems involves a similar kind of pattern matching, but on arbitrary data. For my particular application I have a long sequence of one byte digital codes that indicate a sequence of runs for an experiment. Each run starts with the codes 9,9,9 followed by some codes telling me what happened during the experiment and ends with 18,18,18. I need to split up this long sequence of codes into runs and then parse the events in each run.

In the past I would have written a state machine to do this, but I thought, that's a waste: the regexp module already implements logic of this kind. So I came up with the following:

  ec = array.array('B', [ev & 0xff for ev in event_sequence]).tostring()
  separate = re.compile(r"\x09\x09\x09(.*?)\x12\x12\x12")
  trials = separate.findall(ec)

array.array converts the sequence of bytes into a fake string and the regexp does its usual magic.

No comments:

Post a Comment