Skip to main content

Numpy: a note on slicing efficiency

In my application I have three arrays of the same length (and datatype) (say p1, p2 and p3) that have a one to one correspondence with each other. My code works on the arrays in blocks and has to concatenate fragments of these three arrays together. In the rest of my code I take matched slices through these arrays (say p1[0:100], p2[0:100], p3[0:100]) and pass them onto functions for processing,

My first instinct was to create a two-dimensional array since this encapsulates the data nicely, even though the data do not, semantically, constitute a matrix. It was at this point I came across a fact that I had not considered before: taking slices from a 2D array is more time expensive than taking slices from a 1D array.

For example:

In [397]: %%timeit p = numpy.empty((3, 1e6))
   .....: x = p[1, 0:2]
   .....: 
100000 loops, best of 3: 2.18 µs per loop

Vs

In [398]: %%timeit p = numpy.empty(1e6)
   .....: x = p[0:2]
   .....: 
1000000 loops, best of 3: 512 ns per loop

Row-column shape does not make a difference:

In [399]: %%timeit p = numpy.empty((1e6, 3))
x = p[0:2, 1]
   .....: 
100000 loops, best of 3: 1.97 µs per loop

Additionally, A bit of parametric investigation reveals that, interestingly, the size of the slice does not matter:

In [402]: %%timeit p = numpy.empty((3, 1e6))
x = p[1, 0:100000]
   .....: 
100000 loops, best of 3: 2.17 µs per loop

Vs

In [401]: %%timeit p = numpy.empty(1e6)
x = p[0:100000]
   .....: 
1000000 loops, best of 3: 507 ns per loop

I think the most likely reason for this is that we are no longer pulling contiguous memory locations but rather having to pull together fragments from different memory locations and do proper offset computations while we do this. I can't understand why the time is independent of size for the 2D array, though!

So these consideration led me to the following formulation. I first verified that a concat of 2D fragments was not faster than repeated concats of separate 1-D fragments:

In [404]: %%timeit p1 = numpy.empty((3, 1e6)); p2=numpy.empty((3,1e2))
   .....: p = numpy.concatenate((p1, p2),axis=1)
   .....: 
100 loops, best of 3: 8.15 ms per loop

Vs

In [405]: %%timeit p1 = [numpy.empty(1e6) for _ in range(3)]; p2=[numpy.empty(1e2) for _ in range(3)]
   .....: p = [numpy.concatenate((pa, pb)) for pa, pb in zip(p1, p2)]
   .....: 
100 loops, best of 3: 9.34 ms per loop

And then used a list of arrays instead of a 2D array. This works for me because I do a lot of slicing of the resultant arrays and I never need to do any matrix operations with the three rows arranged as a matrix.

Comments

  1. Since my previous comment didn't seem to get posted... here's my question. If efficiency / speed is a concern, why are you using Python?

    ReplyDelete
  2. Hey Ben, I don't see any comment in the moderation list - did you get any acknowledgement that the comment went through?

    Ok, to answer your question:

    Python shines as maintainable code when written idiomatically but Python can't get anywhere near C speeds unless you do some tricks (like write parts in Cython) which makes the code unreadable and hard to maintain.

    However, this does not mean you shouldn't profile your code and remove bottlenecks, especially if the modified code is still idiomatic Python.

    ReplyDelete
  3. Also, the blog was a getting a lot of spam which cluttered up the moderation queue so I think I set it so that only registered users can comment. I will check to make sure folks with blogger accounts/openid can comment.

    ReplyDelete

Post a Comment

Popular posts from this blog

A note on Python's __exit__() and errors

Python's context managers are a very neat way of handling code that needs a teardown once you are done. Python objects have do have a destructor method ( __del__ ) called right before the last instance of the object is about to be destroyed. You can do a teardown there. However there is a lot of fine print to the __del__ method. A cleaner way of doing tear-downs is through Python's context manager , manifested as the with keyword. class CrushMe: def __init__(self): self.f = open('test.txt', 'w') def foo(self, a, b): self.f.write(str(a - b)) def __enter__(self): return self def __exit__(self, exc_type, exc_val, exc_tb): self.f.close() return True with CrushMe() as c: c.foo(2, 3) One thing that is important, and that got me just now, is error handling. I made the mistake of ignoring all those 'junk' arguments ( exc_type, exc_val, exc_tb ). I just skimmed the docs and what popped out is that you need to return True or

Using adminer on Mac OS X

adminer is a nice php based sqlite manager. I prefer the firefox plugin "sqlite manager" but it currently has a strange issue with FF5 that basically makes it unworkable, so I was looking for an alternative to tide me over. I really don't want apache running all the time on my computer and don't want people browsing to my computer, so what I needed to do was: Download the adminer php script into /Library/WebServer/Documents/ Change /etc/apache2/httpd.conf to allow running of php scripts (uncomment the line that begins: LoadModule php5_module Start the apache server: sudo apachectl -k start Operate the script by going to localhost Stop the server: sudo apachectl -k stop