DotNetZip uses only a single CPU or core when creating zip archives, even on multi-CPU or multi-core machines.
Creation of ZIP files with this library is a CPU-bound operation. Compression and encryption are the CPU-intensive parts, and compression dominates. The DEFLATE algorithm is effectively a search: it scans the data to be compressed, looking for repeated sequences. It does not settle for any repeated sequence but searches for the longest ones, which means repeated scans through the data. Encryption, in contrast, is not as expensive. The I/O done by DotNetZip is, relatively speaking, not a significant performance limiter.
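To make the cost of that search concrete, here is a minimal Python sketch of a brute-force longest-match search over a sliding window. It is an illustration only and not DotNetZip's or zlib's code: the function name `longest_match` is hypothetical, the window and maximum match length are the usual DEFLATE limits (32 KB and 258 bytes), and real implementations use hash chains rather than a linear scan.

```python
def longest_match(data: bytes, pos: int, window: int = 32768, max_len: int = 258):
    """Return (distance, length) of the longest earlier match for data[pos:].

    Brute force: every candidate start inside the window is compared byte by
    byte against the current position. (0, 0) means no match was found.
    """
    best_dist, best_len = 0, 0
    start = max(0, pos - window)
    limit = min(max_len, len(data) - pos)  # cannot match past the end of input
    for cand in range(start, pos):
        length = 0
        # The match may run past `pos` (overlap), as LZ77-style matching allows.
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - cand, length
    return best_dist, best_len


if __name__ == "__main__":
    # "abcabcabcd": at position 3 the best match starts 3 bytes back, 6 bytes long.
    print(longest_match(b"abcabcabcd", 3))
```

Even with smarter data structures, this inner comparison loop runs for every input position, which is why compression rather than I/O tends to dominate the run time.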
There are a couple of ways to go about exploiting multi-core, multi-CPU machines:
- parallelize the DEFLATE algorithm itself. There is some research on this; we need to see whether it is appropriate and practical to integrate into DotNetZip.
- use multiple threads to compress the individual files in parallel, then assemble the compressed results into the zip archive.
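The second approach can be sketched in Python as below. This is a sketch under stated assumptions, not DotNetZip's implementation: `deflate_entry` and `build_zip` are hypothetical names, files are held in memory as a dict of name to bytes, encryption and ZIP64 are ignored, and timestamps are fixed at 1980-01-01. Compression runs on worker threads (CPython's `zlib`, to my understanding, releases the GIL while compressing), and a single thread then writes the headers and compressed bodies in order.

```python
import io
import struct
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def deflate_entry(item):
    """Compress one (name, data) pair to raw DEFLATE on a worker thread."""
    name, data = item
    co = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw DEFLATE, no zlib header
    body = co.compress(data) + co.flush()
    return name.encode("utf-8"), zlib.crc32(data), len(data), body


def build_zip(files, workers=4):
    """Return the bytes of a zip archive built from {name: data}."""
    # Stage 1: CPU-bound compression, one task per file. map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        entries = list(pool.map(deflate_entry, files.items()))

    # Stage 2: sequential assembly of local headers, bodies, central directory.
    out = io.BytesIO()
    central = []
    flags, method = 0x0800, 8          # UTF-8 names, DEFLATE
    dos_time, dos_date = 0, 0x21       # 1980-01-01 00:00:00 (placeholder)
    for name, crc, size, body in entries:
        offset = out.tell()
        out.write(struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, flags, method,
                              dos_time, dos_date, crc, len(body), size,
                              len(name), 0))
        out.write(name)
        out.write(body)
        central.append(struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20,
                                   flags, method, dos_time, dos_date, crc,
                                   len(body), size, len(name), 0, 0, 0, 0, 0,
                                   offset) + name)
    cd_offset = out.tell()
    cd = b"".join(central)
    out.write(cd)
    # End-of-central-directory record: single disk, no archive comment.
    out.write(struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(entries),
                          len(entries), len(cd), cd_offset, 0))
    return out.getvalue()


if __name__ == "__main__":
    sample = {"a.txt": b"hello " * 1000, "b.bin": bytes(range(256)) * 50}
    with zipfile.ZipFile(io.BytesIO(build_zip(sample))) as zf:
        print(zf.namelist(), zf.testzip())
```

The design point is that only stage 1 scales with the number of cores. Stage 2 stays single-threaded because each entry's offset in the archive depends on the sizes of all entries before it.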
Another option to explore for improving performance, independent of and complementary to this one, is to use a native C++ version of ZLIB. That is for another workitem.