ZipEntry and the Task Parallel Library

Oct 4, 2011 at 11:32 AM

I have to use ZipEntry with the Task Parallel Library.

First I looked for a Clone or Copy method on the ZipEntry class, but didn't find one.

The problem is that I am looping over the ZipEntries and want to start a parallel task for each entry, but it looks like the references somehow get mixed up.

My code is something like this:

NineOneParser objParser = null;
List<Task> taskHandles = new List<Task>();

using (ZipFile objZip = new ZipFile(ObjMailDatLists.LevelOneValidatedFilePath))
{
    objParser = new NineOneParser();
    foreach (ZipEntry ze in objZip.Entries)
    {
        if (ze.FileName.ToLower().EndsWith(".hdr"))
        {
            taskHandles.Add(Task.Factory.StartNew(() =>
            {
                ZipEntry zeToBeProcessed = ze;
                LoggingHandler.Log("hdr started parsing completed", EnumLogTypes.Verbose);
                ObjMailDatLists.HDRList = objParser.GetHDRList(zeToBeProcessed, ObjMailDatLists.ObjImportJobInfo);
                ObjMailDatLists.HDRFileName = zeToBeProcessed.FileName;
                LoggingHandler.Log("HDR parsing completed", EnumLogTypes.Verbose);
            }));
        }
        if (ze.FileName.ToLower().EndsWith(".seg"))
        {
            taskHandles.Add(Task.Factory.StartNew(() =>
            {
                ZipEntry zeToBeProcessed = ze;
                LoggingHandler.Log("seg started parsing completed", EnumLogTypes.Verbose);
                ObjMailDatLists.SEGList = objParser.GetSEGList(zeToBeProcessed, ObjMailDatLists.ObjImportJobInfo);
                ObjMailDatLists.SEGFileName = zeToBeProcessed.FileName;
                LoggingHandler.Log("seg parsing completed", EnumLogTypes.Verbose);
            }));
        }
        if (ze.FileName.ToLower().EndsWith(".mpu"))
        {
            taskHandles.Add(Task.Factory.StartNew(() =>
            {
                ZipEntry zeToBeProcessed = ze;
                LoggingHandler.Log("mpu started parsing completed", EnumLogTypes.Verbose);
                ObjMailDatLists.MPUList = objParser.GetMPUList(zeToBeProcessed, ObjMailDatLists.ObjImportJobInfo);
                ObjMailDatLists.MPUFileName = zeToBeProcessed.FileName;
                LoggingHandler.Log("mpu parsing completed", EnumLogTypes.Verbose);
            }));
        }
        // ... in reality there are about 15 such file types
    }

    foreach (Task t in taskHandles)
    {
        t.Wait();
    }
    LoggingHandler.Log("All parsing completed", EnumLogTypes.Verbose);
    objZip.Dispose();
}
objParser = null;

And the method called for each entry looks like this:

public List<HDR> GetHDRList(ZipEntry ze, ImportJob objJob)
{
    List<HDR> objList = new List<HDR>();
    using (BufferedStream buffer = new BufferedStream(ze.OpenReader()))
    {
        using (StreamReader reader = new StreamReader(buffer, Encoding.ASCII))
        {
            HDR obj = null;
            string line = String.Empty;
            int recordID = 0;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Length > 0)
                {
                    recordID++;
                    obj = new HDR();
                    obj.IdeAllianceVersionOfJobInBaseClass = objJob.imjIdeAllianceVersion;
                    obj.JobID = line.Substring(0, 8).Trim().Length == 0 ? String.Empty : line.Substring(0, 8).ToLower();//casesensitivecheck
                    obj.IDEAllianceVersion = line.Substring(8, 4).Trim().Length == 0 ? String.Empty : line.Substring(8, 4);
                    obj.HeaderHistorySequenceNumber = line.Substring(12, 4).Trim().Length == 0 ? String.Empty : line.Substring(12, 4);
                    obj.HeaderHistoryStatus = line.Substring(16, 1).Trim().Length == 0 ? String.Empty : line.Substring(16, 1).ToUpper();
                    obj.SystemGeneratedJobID = objJob.JobID;
                    obj.RecordRowNumber = recordID;
                    objList.Add(obj);
                    line = null;
                    obj = null;
                }
            }
            // reader and buffer are closed and disposed by their using blocks
        }
    }
    return objList;
}

The problem is that the ZipEntries get mixed up, so the wrong list ends up with the wrong entry.
Any help with this?
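One likely contributor to the mixing, assuming the code was built with the C# 4 compiler current in 2011: before C# 5, a foreach loop had a single iteration variable shared by all iterations, and `ZipEntry zeToBeProcessed = ze;` inside the lambda only runs when the task starts, by which time `ze` may already point at a later entry. The copy has to be taken on the loop thread, before the lambda is created. Below is a minimal sketch of that pattern; it uses plain file names instead of ZipEntry objects so it needs no zip library, and `CaptureSketch` / `ScheduleByExtension` are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class CaptureSketch
{
    // Hypothetical example: records which file name each task actually saw,
    // keyed by extension (".hdr", ".seg", ...), like the branches in the question.
    public static IDictionary<string, string> ScheduleByExtension(IEnumerable<string> fileNames)
    {
        var seenByExtension = new ConcurrentDictionary<string, string>();
        var tasks = new List<Task>();

        foreach (string name in fileNames)
        {
            // Copy on the loop thread, BEFORE creating the lambda. These locals are
            // fresh on every iteration, so each task captures its own values.
            string current = name;
            string extension = Path.GetExtension(current).ToLowerInvariant();

            tasks.Add(Task.Factory.StartNew(() =>
            {
                seenByExtension[extension] = current;
            }));
        }

        Task.WaitAll(tasks.ToArray());
        return seenByExtension;
    }
}
```

This only fixes the capture. In the question's code every task would still read through the same ZipFile instance, which is a separate problem.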

Coordinator
Oct 5, 2011 at 4:18 PM

You cannot use multiple threads on a single ZipFile instance.

In other words: don't do what you are doing.
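A minimal sketch of one way to respect that rule and still work in parallel: each task opens its own archive instance, so no archive object is ever shared between threads. To stay self-contained it uses .NET's built-in `System.IO.Compression.ZipArchive` (.NET 4.5+) in place of DotNetZip; with DotNetZip the analogous approach would be to open a separate ZipFile inside each task. `PerTaskArchiveSketch` and `CountLinesInParallel` are hypothetical names, and counting non-blank lines stands in for parsers like GetHDRList.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

public static class PerTaskArchiveSketch
{
    // Hypothetical example: counts non-blank lines in each named entry.
    // Assumes every name in entryNames exists in the archive.
    public static Dictionary<string, int> CountLinesInParallel(string zipPath, IEnumerable<string> entryNames)
    {
        Task<KeyValuePair<string, int>>[] tasks = entryNames
            .Select(name => Task.Factory.StartNew(() =>
            {
                // Each task opens its own file handle and archive instance,
                // so no archive object is shared between threads.
                using (FileStream file = File.OpenRead(zipPath))
                using (var archive = new ZipArchive(file, ZipArchiveMode.Read))
                using (var reader = new StreamReader(archive.GetEntry(name).Open()))
                {
                    int count = 0;
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        if (line.Trim().Length > 0) count++;
                    }
                    return new KeyValuePair<string, int>(name, count);
                }
            }))
            .ToArray();

        Task.WaitAll(tasks); // rethrows any task failure as an AggregateException
        return tasks.ToDictionary(t => t.Result.Key, t => t.Result.Value);
    }
}
```

The trade-off is that every task re-opens the file and re-reads the archive's directory, which is cheap next to parsing large entries but not free.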

Oct 10, 2011 at 6:41 AM

Hmmm,

Any idea, Cheeso (Geek), how to improve it using parallelism?

Coordinator
Oct 10, 2011 at 7:31 PM

What are you trying to improve?  Are you trying to improve the library, or improve your use of the library?

And in what way? What is the goal?

Oct 10, 2011 at 9:17 PM

Hi all. I tried this code and found it faster than the same code without the AsParallel extension method:

ParallelQuery<ZipEntry> ze = null;
using (ZipFile zf = new ZipFile(@"G:\test\myFile.zip")) {
    ze = zf.SelectEntries("*.*").AsParallel<ZipEntry>();

    // Extract while zf is still open, and dispose each output stream when done.
    foreach (ZipEntry z in ze) {
        using (FileStream fs = new FileStream(@"G:\temp\" + z.FileName, FileMode.Create, FileAccess.Write)) {
            z.Extract(fs);
        }
    }
}

Nov 7, 2011 at 6:59 PM

I think that ideally you want to overlap I/O with decompression and parsing: while one file is being read from the archive, another one that was read earlier is being decompressed and parsed.

Coordinator
Nov 7, 2011 at 9:29 PM

Yes, that would make sense.
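The overlap suggested in the Nov 7 post can be sketched as a producer/consumer pipeline: a single producer thread reads and decompresses entries one at a time (the only code that touches the archive, in keeping with the one-thread-per-ZipFile rule), while several consumer tasks parse entries that were already read. This is a sketch under stated assumptions, not DotNetZip's API: it swaps in .NET's `System.IO.Compression.ZipArchive` so it has no third-party dependency, `PipelineSketch` and `ReadThenParse` are hypothetical names, and counting non-blank lines stands in for the real parsers.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Returns entry name -> number of non-blank lines (a stand-in for real parsing).
    public static ConcurrentDictionary<string, int> ReadThenParse(Stream zipStream, int parserCount)
    {
        // Unbounded queue of (entry name, decompressed bytes).
        var queue = new BlockingCollection<KeyValuePair<string, byte[]>>();
        var results = new ConcurrentDictionary<string, int>();

        // Producer: the only code that touches the archive, always on one thread.
        Task producer = Task.Factory.StartNew(() =>
        {
            try
            {
                using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, true))
                {
                    foreach (ZipArchiveEntry entry in archive.Entries)
                    {
                        using (Stream input = entry.Open())
                        using (var buffer = new MemoryStream())
                        {
                            input.CopyTo(buffer); // read and decompress this entry
                            queue.Add(new KeyValuePair<string, byte[]>(entry.FullName, buffer.ToArray()));
                        }
                    }
                }
            }
            finally
            {
                queue.CompleteAdding(); // lets consumers finish even if reading fails
            }
        }, TaskCreationOptions.LongRunning);

        // Consumers: parse already-read entries while the producer keeps reading.
        Task[] consumers = Enumerable.Range(0, parserCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                foreach (KeyValuePair<string, byte[]> item in queue.GetConsumingEnumerable())
                {
                    using (var reader = new StreamReader(new MemoryStream(item.Value), Encoding.ASCII))
                    {
                        int count = 0;
                        string line;
                        while ((line = reader.ReadLine()) != null)
                        {
                            if (line.Trim().Length > 0) count++;
                        }
                        results[item.Key] = count;
                    }
                }
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        // Waits for all tasks and rethrows any failure as an AggregateException.
        Task.WaitAll(consumers.Concat(new[] { producer }).ToArray());
        return results;
    }
}
```

The queue is left unbounded for simplicity; passing a bounded capacity to the `BlockingCollection` constructor would cap how many decompressed entries sit in memory at once, at the cost of the producer blocking when parsers fall behind.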