Multiple extractions in parallel: multithreaded solution

Nov 20, 2008 at 6:53 PM
Edited Nov 20, 2008 at 7:00 PM
Hi,
    I am trying to extract several zip files in parallel. The zip files have different names, but they all contain files with the same names (with different contents). When I try this, it throws exceptions such as "CRC error: the file being extracted appears to be corrupted.....BadCrcException.....InternalExtract ...line 1369."
   It does not happen if I extract only one file, or extract the files one after the other. I tried using "TempFileFolder", but then it also started raising exceptions.
The code is as follows:

zFile = ZipFile.Read(_tempFolder + "\\" + fileName);
zFile.ExtractAll(zDir, true);

Let me know how I can do that, and please also give me some performance tips. Thanks,
Regards,
Osman

Coordinator
Nov 20, 2008 at 7:00 PM
So you have multiple instances of the ZipFile class that you are using in multiple threads, is that right?
Or do you have a single instance of the ZipFile class that you are using in multiple threads?

Can you give me a simple test case that reproduces the behavior you're experiencing?
Nov 20, 2008 at 8:11 PM
Hi, multiple threads working with different instances. I did not create the threads directly; I used

http://www.codeplex.com/smartthreadpool to run the function on pool threads. The function has a local variable of type ZipFile and just extracts the zip file. I have thousands of zip files. All the zip files contain the same set of files (the file names are the same, although the contents differ; e.g. an Address.txt file is present in every zip file, each with a different address).

I use these functions:

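private void checkFolder()
{
    string[] fileEntries = Directory.GetFiles(_uploadFolder, "*.zip");

    // Queue one work item per zip file; SmartThreadPool runs them on pool threads.
    foreach (string fileName in fileEntries)
    {
        _pool.QueueWorkItem(new WorkItemCallback(this.processZip), Path.GetFileName(fileName));
    }
}

private object processZip(object fileName)
{
    // Each archive is extracted into its own folder, named after the zip file.
    string zDir = Path.Combine(_tempFolder, Path.GetFileNameWithoutExtension(fileName.ToString()));

    // CreateDirectory does nothing if the folder already exists.
    Directory.CreateDirectory(zDir);

    // The ZipFile is local to this call, so no instance is shared between threads;
    // the using block disposes it once extraction finishes.
    using (ZipFile zFile = ZipFile.Read(Path.Combine(_tempFolder, fileName.ToString())))
    {
        zFile.ExtractAll(zDir, true); // true = overwrite existing files
    }

    return null; // WorkItemCallback must return an object; no result is needed here
}
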
Thanks,
Regards, Osman
Coordinator
Nov 20, 2008 at 9:36 PM
OK, let me have a look.
Nov 21, 2008 at 2:58 PM
Let me give you another tip.
It works if you use lock or sync, but that defeats the purpose of the application: it is no longer truly multithreaded, because the files are processed one after the other.
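A minimal C# sketch of why a lock makes it work but serializes everything. The names `SerializedWorkers` and `ProcessOne` are hypothetical, and `Thread.Sleep` stands in for `ZipFile.Read` plus `ExtractAll`: every worker takes the same static lock, and the code records the highest number of workers that were ever inside it at once.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SerializedWorkers
{
    // One lock object shared by every worker, regardless of which file it handles.
    private static readonly object Gate = new object();

    private static int _inside;    // workers currently inside the lock
    private static int _maxInside; // highest value of _inside observed

    // Hypothetical stand-in for "open one zip file and extract it".
    private static void ProcessOne(int fileIndex)
    {
        lock (Gate)
        {
            // Safe without Interlocked: only one thread can be in here at a time.
            _inside++;
            if (_inside > _maxInside) _maxInside = _inside;
            Thread.Sleep(5); // pretend to extract
            _inside--;
        }
    }

    // Runs 'files' work items in parallel and returns the peak concurrency.
    public static int RunAndReportPeak(int files)
    {
        _inside = 0;
        _maxInside = 0;
        Parallel.For(0, files, ProcessOne);
        return _maxInside;
    }
}
```

With one shared lock object the peak is always 1, which is the one-after-the-other processing described above.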
Nov 25, 2008 at 6:39 PM
Hi,
   I looked at the code and found several locks. That means the component is not truly multithreaded: each locked block can be executed by only one thread at a time. I also noticed that most of the locks are used in the events OnBegin, OnEnd...

I would recommend providing some way to turn off all event generation; it would make the component faster, and maybe also multithreaded. Thanks,
Nov 26, 2008 at 6:05 AM
Edited Nov 26, 2008 at 6:11 AM
That should lock only the current instance, not the other ones!?
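A per-instance lock does behave that way. In this sketch (hypothetical `Worker` and `InstanceLockDemo` classes), each instance locks its own private object, and a `Barrier` with two participants shows that two instances can be inside their locked sections at the same time. The flip side is that such a lock does nothing to protect static state that all instances share.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class Worker
{
    // One lock object per instance, so different instances never block each other.
    private readonly object _gate = new object();

    // Enters this instance's lock, then waits (up to 'timeout') for the other
    // participant to reach the barrier. Returns false if it never arrives.
    public bool DoLockedWork(Barrier rendezvous, TimeSpan timeout)
    {
        lock (_gate)
        {
            return rendezvous.SignalAndWait(timeout);
        }
    }
}

public static class InstanceLockDemo
{
    // True if two workers were inside their critical sections simultaneously.
    public static bool BothInsideAtOnce()
    {
        using var rendezvous = new Barrier(2);
        var a = new Worker();
        var b = new Worker();
        var timeout = TimeSpan.FromSeconds(5);

        Task<bool> ta = Task.Run(() => a.DoLockedWork(rendezvous, timeout));
        Task<bool> tb = Task.Run(() => b.DoLockedWork(rendezvous, timeout));
        return ta.Result && tb.Result;
    }
}
```

If both workers instead locked one shared static object, the second could not enter until the first gave up waiting, and the method would return false.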
Coordinator
Dec 18, 2008 at 11:51 PM
A ZipFile instance is not thread-safe. I've updated the docs to state that; see change set 27058 and work item 5182:
http://www.codeplex.com/DotNetZip/WorkItem/View.aspx?WorkItemId=5182

On the other hand, your case, with multiple threads each working with a different ZipFile instance, is supported and will work if you have the v1.7.1.6 release or later.

There was a bug in releases prior to that: a static CRC table was initialized in the instance constructor. This caused the BadCrcException you originally reported. See work item 6637:  http://www.codeplex.com/DotNetZip/WorkItem/View.aspx?WorkItemId=6637
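The kind of bug described in work item 6637 can be illustrated with a small CRC-32 helper. This is a hypothetical sketch, not DotNetZip's source: the lookup table lives in a `static readonly` field, so the CLR builds it exactly once, thread-safely, before first use, instead of every constructor rewriting shared state.

```csharp
public sealed class Crc32
{
    // Built once by the type initializer. The CLR guarantees it runs only once,
    // even if many threads touch the type at the same moment.
    private static readonly uint[] Table = BuildTable();

    // Hypothetical version of the buggy pattern, for contrast:
    //     private static uint[] table;
    //     public Crc32() { table = new uint[256]; /* ...fill table... */ }
    // Every new instance replaces and refills the shared array, so a thread that is
    // mid-way through a CRC on another instance can read a half-filled table.

    private static uint[] BuildTable()
    {
        const uint poly = 0xEDB88320u; // reflected CRC-32 (IEEE 802.3) polynomial
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? poly ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Per-instance running value; this is the only state an instance owns.
    private uint _crc = 0xFFFFFFFFu;

    public void Update(byte[] data, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
            _crc = Table[(_crc ^ data[i]) & 0xFF] ^ (_crc >> 8);
    }

    // Final CRC value (the running value with the standard final XOR applied).
    public uint Value => _crc ^ 0xFFFFFFFFu;
}
```

Each `Crc32` instance keeps only its own running value, so many instances can be used from different threads at once; the shared table is never written after initialization.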