Friday, November 14, 2008

Python tips: pickling it right

Here's an often-seen Python snippet:

pickle.dump(stuff, open('foo.pik', 'w'))

What's wrong with this? Well, several things, as it turns out...!
  1. Use cPickle, not pickle: that will speed things up by 5 or 6 times, effortlessly.

  2. The common, sloppy use of open without a corresponding close is theoretically OK in today's CPython (reference counting closes the file as soon as the call returns), but there's really no good reason to rely on it. Be neat instead, and write (after a from __future__ import with_statement if you're still using Python 2.5):

    with open('foo.pik', 'w') as f:
        cPickle.dump(stuff, f)

  3. Unless there's a very special reason to make you want the pickle dump to be in ASCII (and I've hardly ever seen a good one), don't just use pickle's default, legacy protocol! Rather, explicitly request protocol 2, or better still, unless you need pickle files loadable by older releases of Python, request "the best protocol available".
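
To get a feel for what the protocol choice buys you, here is a minimal sketch that pickles the same object with protocol 0, protocol 2, and the highest protocol available, then compares sizes. The payload stuff is a made-up example, and the try/except import is my assumption for letting the snippet run unchanged on Python 3 as well, where the plain pickle module already picks up its C accelerator when one is available.

```python
try:
    import cPickle as pickle  # Python 2: the fast C implementation
except ImportError:
    import pickle  # Python 3: uses the C accelerator automatically

# Hypothetical payload, just for illustration.
stuff = {'numbers': list(range(1000)), 'label': 'sample'}

sizes = {}
for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(stuff, proto)
    # Every protocol must round-trip to an equal object.
    assert pickle.loads(blob) == stuff
    sizes[proto] = len(blob)

for proto in sorted(sizes):
    print('protocol %d: %d bytes' % (proto, sizes[proto]))
```

Exact byte counts depend on the Python release, but for data like this the binary protocols should come out clearly smaller than the ASCII one.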

So, the best equivalent of that little sloppy but alas-too-common idiom is:

    with open('foo.pik', 'wb') as f:
        cPickle.dump(stuff, f, cPickle.HIGHEST_PROTOCOL)

Don't forget the little b in 'wb', by the way: it won't matter under Linux, Mac OS X, or Solaris, but it will matter on Windows, where text mode translates newlines and can thereby corrupt binary pickle data... and, anyway, as we all know, explicit is better than implicit!-)
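
The same care applies when reading the file back: open it with 'rb'. Here is a small round-trip sketch under a few assumptions of mine: the helper names save and load are hypothetical, the file goes into a temporary directory rather than the current one, and the import fallback is there only so the snippet also runs on Python 3 (on Python 2.5 you would still need the with_statement import).

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def save(obj, path):
    """Pickle obj to path in binary mode with the best protocol available."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load(path):
    """Read back a pickle written by save(); note the matching 'rb'."""
    with open(path, 'rb') as f:
        return pickle.load(f)


stuff = {'answer': 42, 'items': [1, 2, 3]}  # hypothetical payload
path = os.path.join(tempfile.mkdtemp(), 'foo.pik')
save(stuff, path)
restored = load(path)
print(restored == stuff)
```

There is no need to tell load which protocol was used: the unpickler detects it from the data itself.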
