Efficient pickling in Python

Use the highest protocol you have available, which will usually be a binary one. For example:

import cPickle
import pylab

z = pylab.random((10, 10))
# z: ndarray 10x10, 100 elems, type float64, 800 bytes
len(cPickle.dumps(z))                            # -> 2371
len(cPickle.dumps(z, cPickle.HIGHEST_PROTOCOL))  # -> 934

z = pylab.random((50, 50))
# z: ndarray 50x50, 2500 elems, type float64, 20000 bytes
len(cPickle.dumps(z))                            # -> 55586
len(cPickle.dumps(z, cPickle.HIGHEST_PROTOCOL))  # -> 20134
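
The comparison above can be reproduced with nothing but the standard library. The following is a minimal sketch, not the original measurement: it assumes Python 3, where the separate cPickle module no longer exists (the standard pickle module uses the C implementation automatically) and where the default protocol is already a binary one, so protocol 0 is requested explicitly. The helper name pickled_sizes and the list of random floats standing in for the array are made up for illustration, and the byte counts will differ from the numbers quoted above.

```python
import pickle
import random


def pickled_sizes(obj):
    """Return {protocol: size in bytes} for every protocol this interpreter supports."""
    return {
        protocol: len(pickle.dumps(obj, protocol=protocol))
        for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
    }


# Hypothetical stand-in for the 50x50 float64 array: 2500 random floats in a plain list.
random.seed(0)
data = [random.random() for _ in range(2500)]

sizes = pickled_sizes(data)
for protocol, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (protocol, size))
```

Protocol 0 writes each float as text, while the binary protocols store it as 8 bytes plus an opcode, which is where most of the saving comes from.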

For some reason, I thought the default was to use the most efficient protocol, but it isn't.

cPickle.dump?
Type:  builtin_function_or_method
Base Class: 
String Form: 
Namespace: Interactive
Docstring:
   dump(obj, file, protocol=0) -- Write an object in pickle format to the given file.
  
   See the Pickler docstring for the meaning of optional argument proto.

Hence my slow processing and bloated data files.
Note that load detects the protocol automatically, so only the dump calls need to change.
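
To illustrate the point about loading, here is a small round-trip sketch. It assumes Python 3 and the standard pickle module rather than cPickle, and the record dictionary is a made-up example object. The same data is dumped once per available protocol and read back without ever telling the loader which protocol was used.

```python
import pickle

# Hypothetical example object; any picklable value would do.
record = {"name": "z", "shape": (50, 50), "values": [0.25, 0.5, 0.75]}

restored = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(record, protocol=protocol)
    # No protocol argument here: loads() works it out from the byte stream.
    restored[protocol] = pickle.loads(blob)

# Every protocol yields an object equal to the original.
assert all(value == record for value in restored.values())
```

Because the reader needs no changes, switching an existing script to HIGHEST_PROTOCOL only touches the dump side, and files written earlier with protocol 0 stay readable.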

Assorted Experience
