
Performance of Multiple calls to ZipEntry.OpenReader

Apr 6, 2009 at 10:11 PM
I'm using the latest stable build, 1.7.

I have an index file in the root folder that maps out different files in one subdirectory.
I read the index file using OpenReader, and while traversing the index file I read the individual files using further OpenReader calls.

In my test case I have 10000 files. I am clocking each individual iteration and I see a slow degradation of performance.

Are there any suggestions on how to do this kind of reading without degradation?

ZipEntry entry = zip[fileName];
CrcCalculatorStream stream = entry.OpenReader();

This method is much faster than extracting the whole zip file to a temporary directory and then reading the individual files.
Apr 7, 2009 at 10:22 AM
Add a "using" statement. It will close the streams.
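
Something along these lines is what I mean. This is only a rough sketch: the class and method names (EntryReader, ReadEntry) are made up for illustration, and it assumes `zip` is a ZipFile you have already opened and that `fileName` exists in it. The only library calls used are the indexer and OpenReader() from your own snippet.

```csharp
using System.IO;
using Ionic.Zip;

static class EntryReader
{
    // Hypothetical helper: reads one entry fully into memory.
    // The using blocks dispose the reader stream as soon as the read is done.
    public static byte[] ReadEntry(ZipFile zip, string fileName)
    {
        ZipEntry entry = zip[fileName];
        using (Stream stream = entry.OpenReader())
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int n;
            // Read until the entry is exhausted (Read returns 0 at the end).
            while ((n = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, n);
            }
            return buffer.ToArray();
        }
    }
}
```

Declaring the variable as Stream rather than CrcCalculatorStream keeps the sketch independent of which namespace that class lives in.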
Apr 7, 2009 at 6:12 PM
I already tried that. But then I looked at the source code for the CrcCalculatorStream. It doesn't implement Close or Dispose. I might be overlooking something here, but I see that some of the other Stream classes, like DeflateStream, implement Close but not Dispose. I am not sure if this is the cause of the problem or if it's getting closed elsewhere.

My other question is whether I can use OpenReader simultaneously on two ZipEntries without closing the stream from the first call.
Apr 16, 2009 at 2:17 AM
This discussion has been copied to a work item.
Apr 16, 2009 at 2:29 AM
Edited Apr 16, 2009 at 2:40 AM
I believe you cannot use OpenReader() on more than one ZipEntry at a time. OpenReader() is essentially a wrapper on the read-only stream on the zip file. The way it works is that the library does a Seek() on the internal stream, then opens the DeflateStream on that. For the CrcCalculatorStream returned from OpenReader() to work properly, you cannot move the cursor on the zipfile stream while you are reading. Every time you call OpenReader(), the cursor on the zipfile stream is set. The conclusion is: use only a single CrcCalculatorStream from OpenReader() at a time.

What I can suggest is to read in the entire index first, and then use OpenReader() on each successive entry. If the index is very large, you could extract it to a file, and then use a separate stream to read it. If you don't like that idea, you could open two ZipFile instances on the same filesystem file. Call OpenReader() on the index with the first ZipFile, and then call OpenReader() on the other entries using the second ZipFile instance.
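
Here is a rough sketch of the first suggestion (read the whole index, close it, then open the entries one by one). I don't know your index format, so this assumes one entry name per line; that format, and the names IndexedReader, ReadIndex and TotalBytes, are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.IO;
using Ionic.Zip;

static class IndexedReader
{
    // Step 1: pull the whole index into memory, then release its reader.
    // Assumed index format: one entry name per line (hypothetical).
    public static List<string> ReadIndex(ZipFile zip, string indexName)
    {
        List<string> names = new List<string>();
        using (Stream s = zip[indexName].OpenReader())
        using (StreamReader reader = new StreamReader(s))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0) names.Add(line);
            }
        }
        return names;
    }

    // Step 2: open the listed entries strictly one at a time.
    // Only one OpenReader() stream is alive per ZipFile at any moment.
    public static long TotalBytes(ZipFile zip, IEnumerable<string> names)
    {
        long total = 0;
        byte[] chunk = new byte[8192];
        foreach (string name in names)
        {
            using (Stream s = zip[name].OpenReader())
            {
                int n;
                while ((n = s.Read(chunk, 0, chunk.Length)) > 0)
                {
                    total += n; // replace with the real per-file processing
                }
            }
        }
        return total;
    }
}
```

The point of the two steps is that the index reader is closed before the first data entry is opened, so the cursor on the zipfile stream is never moved underneath an active reader.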

In any case, for a given ZipFile instance, you can use only one CrcCalculatorStream returned from an OpenReader() call, at a time.


There is another possible explanation for the increase in time on each iteration when using OpenReader(). The implementation of OpenReader() seeks on the ZipFile stream, and it could be that on successive calls the Seek() takes increasingly longer. That would be a surprise, though, because Seek() should be fast in comparison to reading through a DeflateStream.

Can you tell me where the additional time is consumed on successive iterations? Is it in the OpenReader() call, or is it in the actual read and extract operation?
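
If it helps, a measurement along these lines would separate the phases. It is only a sketch: ReaderTiming and Measure are hypothetical names, and it times the entry lookup (the zip[fileName] indexer) separately from OpenReader() and from the read loop, since I don't know yet which of the three is growing.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Ionic.Zip;

static class ReaderTiming
{
    // Hypothetical helper: reports how long each phase takes for one entry.
    public static void Measure(ZipFile zip, string fileName,
                               out TimeSpan lookupTime,
                               out TimeSpan openTime,
                               out TimeSpan readTime)
    {
        byte[] chunk = new byte[8192];

        // Phase 1: finding the entry by name.
        Stopwatch sw = Stopwatch.StartNew();
        ZipEntry entry = zip[fileName];
        lookupTime = sw.Elapsed;

        // Phase 2: OpenReader(), which includes the Seek() on the zip stream.
        sw.Reset();
        sw.Start();
        using (Stream s = entry.OpenReader())
        {
            openTime = sw.Elapsed;

            // Phase 3: reading and decompressing the entry data.
            sw.Reset();
            sw.Start();
            while (s.Read(chunk, 0, chunk.Length) > 0) { }
            readTime = sw.Elapsed;
        }
    }
}
```

Logging the three values for, say, every 500th file should show which phase grows as the iteration count goes up.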