Granular events tests

Nov 19, 2008 at 6:11 PM
Hello Cheeso,

I did some tests with the latest 1.7 version and I saw one problem and 2 small possible optimizations.

Try to make a zip with the WinForms example:
1 - Files seem to be read twice: you will see the second progress bar run twice for the same file ("1 of 1 files ..."). The second pass is faster than the first (I don't know if that is because the file is already in memory). If the files really are read twice, there is a class design problem (a performance problem).
2 - If I zip/unzip only one file, both progress bars should use the same step <= this is a detail
3 - the second progress bar should be the first one, IMHO (the overall progress bar at the bottom would be more logical) <= this is a detail ;)

After some checking, I see that you really do read the file twice.

if (nCycles > 1) readAgain = false;
else if (!outstream.CanSeek) readAgain = false;
else if (cipher != null && CompressedSize - 12 <= UncompressedSize) readAgain = false;
else readAgain = WantReadAgain();

I don't understand why the file should be read twice.
Nov 19, 2008 at 6:59 PM
Hey Domz, thanks for testing this out.

Files are read twice by the library if running the entry through the DeflateStream class results in an increase in the data size, and if you haven't done anything with ForceNoCompression or the WillReadTwiceOnInflation callback. This is the defined, documented, and correct behavior of the library. If you don't want a file read twice, check the doc for how to avoid it.
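Something like this would avoid the second read entirely (a rough sketch, not tested; the file names are made up, and check the doc for the exact delegate signature if you want the callback route instead):

// rough sketch, not tested
using System;
using Ionic.Utils.Zip;   // namespace of the 1.7 builds, if I recall right

class NoSecondRead
{
    static void Main()
    {
        using (ZipFile zip = new ZipFile())
        {
            // never compress, so there is never an inflated result to undo
            zip.ForceNoCompression = true;

            // alternative: decide per entry with the callback; see the doc
            // for the exact delegate signature and return semantics.
            // zip.WillReadTwiceOnInflation = ... ;

            zip.AddFile("BigRandomData.bin");   // example file name
            zip.Save("archive.zip");
        }
    }
}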

I'm not sure I will fiddle with the positioning of the progress bars.  I'm inclined to just let them be.
As for optimization #2 - I agree that would be nice. I'll look into that.
Nov 19, 2008 at 7:49 PM
Edited Nov 20, 2008 at 2:22 AM
Why not read the file once, store the content in a byte[], and use that array for all your compression processing and post-compression operations (testing the compressed/uncompressed sizes)?
What takes the most processing time: reading the file or compressing the data?
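I guess I could measure it myself with something like this (rough sketch; the numbers will depend on the disk and the CPU):

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

class ReadVersusCompress
{
    static void Main()
    {
        string path = "somefile.bin";   // any test file

        // time spent just reading the file
        Stopwatch sw = Stopwatch.StartNew();
        byte[] data = File.ReadAllBytes(path);
        sw.Stop();
        Console.WriteLine("read:     {0} ms", sw.ElapsedMilliseconds);

        // time spent running the same bytes through DeflateStream
        sw = Stopwatch.StartNew();
        using (MemoryStream ms = new MemoryStream())
        using (DeflateStream def = new DeflateStream(ms, CompressionMode.Compress))
        {
            def.Write(data, 0, data.Length);
        }
        sw.Stop();
        Console.WriteLine("compress: {0} ms", sw.ElapsedMilliseconds);
    }
}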

Maybe a byte[] is a bad choice for memory!?

Sorry to ask you such questions; I am quite a rookie in C# and I would like to understand your design choices.

Also, it seems there are major drawbacks to using "System.IO.Compression". I think you should have a look at the code of the open-source component "SharpZipLib", which has its own zip implementation (supporting ZIP64 and Zip compression levels).
SharpZipLib is the most widely used Zip library in the C# world (so it has been tested deeply).

But don't misunderstand me, your lib is still the best because:
1 - Your support is awesome!
2 - Your library is compliant with the latest Zip specification (Unicode), which is not the case for "SharpZipLib"

Thanks a lot
Nov 19, 2008 at 10:38 PM

Reading the file data once and retaining it works for small files. Zipping up an 8k file is fine. Zipping and caching an 8mb file is less reasonable. Retaining an 80mb file is much less feasible. Zipping an 800mb file that way just won't work.

The caching approach does not work for larger files.

In fact, storing (caching) the file data is what DotNetZip did previously.
I changed it in response to requirements from customers. I changed the architecture to use a streaming approach, which means the library must read the file twice if the compression does not actually reduce the size of the data on the first pass.

This is not a C# thing; it's a basic architecture thing.
Yes, it is a trade-off of speed versus size (the amount of memory consumed).
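
To make the trade-off concrete - this is just an illustration, not the actual library code:

using System;
using System.IO;

class CachingVersusStreaming
{
    static void Main()
    {
        string path = "somefile.bin";   // any input file

        // caching approach: memory use grows with the file size.
        // fine for an 8k file, hopeless for an 800mb one.
        byte[] everything = File.ReadAllBytes(path);
        Console.WriteLine("cached {0} bytes in memory", everything.Length);

        // streaming approach: memory use is just one small buffer, no
        // matter how big the file is.  the catch: if the compressed
        // output turns out larger than the input, the only way to redo
        // the entry without compression is to read the file again.
        byte[] buffer = new byte[8192];
        long total = 0;
        using (FileStream input = File.OpenRead(path))
        {
            int n;
            while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
                total += n;   // a real zip would compress and write here
        }
        Console.WriteLine("streamed {0} bytes through an 8k buffer", total);
    }
}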

I agree that it would be nice to have a more flexible compression algorithm. 
For now it is pluggable and I may be able to replace it at some point in the future.

I know of SharpZipLib but I have not looked at it.

Thanks for the compliments.