Monday, January 18, 2016

Python: Getting position in a compressed file

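Problem: Python's gzip module is awesome, but it poses a problem when we wish to report reading progress through a compressed file: .tell() returns the position in the uncompressed stream (the number of decompressed bytes read so far), whereas the only total we usually have available is the compressed size of the file, e.g. via os.path.getsize. The two numbers are not comparable, so dividing one by the other does not give a meaningful percentage.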

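Solution: One trick is to extract the original size of the gzipped file from the last 4 bytes of the file, but there are caveats to this: that field stores the uncompressed size modulo 2^32, so it is wrong for data of 4 GiB or more, and for a file made of several concatenated gzip members it describes only the last member. Another method, slightly more principled, is to use .raw.fileobj.tell(), which reports the position in the compressed file rather than in the uncompressed stream, so it can be compared directly against os.path.getsize.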
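Both approaches can be sketched roughly as follows. This is a minimal illustration, assuming Python 3 and a file opened with gzip.open in binary mode; the function names uncompressed_size_from_trailer and read_with_progress are made up for this example. It calls .fileobj on the GzipFile directly; the extra .raw hop mentioned above is assumed to be needed only when the GzipFile is wrapped in a buffered reader.

```python
import gzip
import os
import struct


def uncompressed_size_from_trailer(path):
    """Return the size field stored in the last 4 bytes of a gzip file.

    Caveats: the value is the uncompressed size modulo 2**32, and for a
    file built from several concatenated gzip members it only covers
    the last member.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        # The field is a little-endian unsigned 32-bit integer.
        return struct.unpack("<I", f.read(4))[0]


def read_with_progress(path, chunk_size=64 * 1024):
    """Yield (chunk, fraction) pairs while reading a gzip file.

    fraction is the share of the compressed file consumed so far.
    """
    total = os.path.getsize(path)  # compressed size on disk
    with gzip.open(path, "rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            # gz.tell() would be the uncompressed offset; gz.fileobj is
            # the underlying compressed file, so its tell() is on the
            # same scale as total.
            yield chunk, gz.fileobj.tell() / total
```

A typical use would be `for chunk, frac in read_with_progress("data.gz"): ...`, where "data.gz" is a placeholder path. The reported fraction tends to advance in jumps rather than smoothly, because the gzip reader pulls compressed data from the underlying file in blocks.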
