
Direct access to a position within a zipped file?

May 19, 2010 at 2:00 AM


Normally one would unzip a compressed file sequentially, from the first byte to the last. I, however, need something different. My program interactively processes a huge (several GB) text file by seeking to an arbitrary position within the file and reading from there. During one session the program will typically need to seek and read several times. I want to more or less retain this behaviour WITHOUT unzipping the whole file upfront. I realise that I will hardly be able to seek to the exact position I need within the compressed file, but perhaps there is a way to seek to some anchor position reasonably close to the one I need and then read sequentially from there?

Thanks and regards,

May 19, 2010 at 2:41 AM

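Using a plain zip file with a single 3 GB file within it won't satisfy your requirements. The entry is compressed as one continuous stream, and it's not possible to start uncompressing in the middle of it: to reach a given position you have to decompress everything that comes before it.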

For your special requirements, you might want to segment the file content before compression, then compress the segments. That way you set the "anchor positions" yourself.

Suppose the file is 3 GB. If you cut it into 100 MB segments, you'll have about 30 of them. You can then compress each segment independently and store them all as separate entries in a single zip. If you then need to read the bytes at the 650 MB mark, you know to uncompress only the 7th segment, which covers 600 MB to 700 MB, and skip 50 MB into it.
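
To make the idea concrete, here is a minimal sketch of the segmenting approach. It uses Python's standard zipfile module purely for illustration; the entry naming scheme (seg-000000, seg-000001, ...) and the helper names write_segmented_zip and read_at are my own assumptions, not part of any library, and the same layout could be produced with any zip tool.

```python
import zipfile

SEGMENT_SIZE = 100 * 1024 * 1024  # 100 MB per segment, as in the example above


def segment_name(index):
    """Hypothetical naming scheme for the entries: seg-000000, seg-000001, ..."""
    return f"seg-{index:06d}"


def write_segmented_zip(src, zip_path, segment_size=SEGMENT_SIZE):
    """Split the binary stream `src` into fixed-size pieces, one zip entry each.

    Assumes `src.read(n)` returns n bytes until end of file, which holds for
    regular files opened with open(path, "rb"). Returns the segment count.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        index = 0
        while True:
            chunk = src.read(segment_size)
            if not chunk:
                break
            zf.writestr(segment_name(index), chunk)
            index += 1
    return index


def read_at(zip_path, offset, length, segment_size=SEGMENT_SIZE):
    """Return up to `length` bytes starting at uncompressed byte `offset`."""
    out = bytearray()
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        while length > 0:
            # Which segment holds this offset, and where inside it?
            index, start = divmod(offset, segment_size)
            name = segment_name(index)
            if name not in names:
                break  # offset is past the end of the data
            data = zf.read(name)  # decompresses this one segment only
            piece = data[start:start + length]
            if not piece:
                break  # offset is past the end of the last segment
            out += piece
            offset += len(piece)
            length -= len(piece)
    return bytes(out)
```

With 100 MB segments, an offset of 650 MB maps to zero-based index 6, i.e. the 7th segment, at 50 MB into it. A read that crosses a segment boundary simply continues into the next entry. Smaller segments make each seek cheaper but usually cost some compression ratio, so the segment size is a trade-off you would tune for your data.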