Read zip entry as a stream

Jun 27, 2008 at 6:03 PM
  Hello!

I was looking for a ZIP library for my small project for a few days, and it seems I have finally found everything I need! Thanks for your great work!

I am wondering if you could point me to the easiest way to convert my existing code like:

                StreamReader file = new System.IO.StreamReader(_fileName, _fileEncoding);

                while (!file.EndOfStream)
                {
                  ...
                }

into code with ZIP file support. I'm trying to rewrite code like this:

                ZipFile zip = new ZipFile(_zipFileName);
                foreach (ZipEntry ze in zip)
                {
                    if (ze.FileName.EndsWith (_fileName))
                    {
                        StreamReader file = new StreamReader(May I get stream from ZipEntry here?);
                        while (!file.EndOfStream)
                        {
                         ...
                        }
                     }
                }

but unfortunately I can't find a proper way to get a stream from a ZipEntry.

  Thank you in advance!


Coordinator
Jun 27, 2008 at 7:07 PM

you can extract to an output stream.

One of the methods on the ZipFile is Extract(Filename, OutputStream);

does that help?

Jun 27, 2008 at 8:44 PM
Edited Jun 27, 2008 at 10:45 PM
I have tried this:

                MemoryStream memStream = new MemoryStream();

                ZipFile zip = new ZipFile(_zipFileName);
                foreach (ZipEntry ze in zip)
                {
                    if (ze.FileName.EndsWith (_fileName))
                    {
                        ze.Extract(memStream);
                     }
                }

                memStream.Seek(0, SeekOrigin.Begin);

                StreamReader file = new StreamReader(memStream, _fileEncoding);
                while (!file.EndOfStream)
                 {
                         ... read file
                 }

It works fine but consumes a lot of memory on large files because of the temporary MemoryStream. I had expected a direct way to read a ZipEntry as a stream... Something like:

       StreamReader file = new StreamReader (zipEntry.ToStream(), _encoding);
       ... read file line by line

But anyway, thanks for your response!

Coordinator
Jun 27, 2008 at 10:37 PM
Sounds like a good feature request. . .

Jun 27, 2008 at 11:06 PM
I think it would be nice to have such a feature. Please consider it :)

Because I read this file twice, my final piece of code looks like:

        byte[] fileArray = new byte[0];
        using (ZipFile zip = new ZipFile(_zipFileName))
        {
            foreach (ZipEntry ze in zip)
            {
                if (ze.FileName.EndsWith(_fileName))
                {
                    using (MemoryStream ms = new MemoryStream())
                    {
                        ze.Extract(ms);
                        fileArray = ms.ToArray();
                    }
                    break;
                }
            }
        }

        StreamReader file = new StreamReader(new MemoryStream (fileArray));
        ... read encoding specified in file
        file.Close();

        file = new StreamReader(new MemoryStream (fileArray), _fileEncoding);
        ... read file again with correct encoding

Coordinator
Jul 5, 2008 at 8:48 PM
Abudaba, I added this capability in v1.6. Check it out and see if it satisfies you. . .

Jul 5, 2008 at 11:16 PM
Thanks a lot! I'll check it out.

You're doing a great job, really.

Jul 9, 2008 at 10:41 AM
Abudaba,
Could you please provide me with the full code for reading the entry into a stream directly?

I don't know how you get the correct encoding when you are reading the file for the second time.

Regards

M

Jul 9, 2008 at 12:49 PM
It's a file with a certain format. Actually, it's an OpenOffice spelling dictionary. It has a special option like:

SET ISO8859-1

So I just parse this file looking for this SET option first, then re-open it with the specified encoding.

There's a bit of code, and I don't think it will be suitable for your purposes.
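
For anyone who wants the general shape of that two-pass approach, here is a minimal sketch and not the code from my project. It assumes the entry has already been extracted into a byte array (as with fileArray above). The names AffEncoding, FindDeclaredEncoding, Resolve and OpenWithDeclaredEncoding are made up for illustration, and the "ISO8859-" to "ISO-8859-" rewrite is an assumption about how the dictionary's encoding names map to .NET names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
static class AffEncoding
{
    // Pass 1: read the bytes as ASCII and look for a line such as "SET ISO8859-1".
    // Returns the declared name, or null when no SET line is found.
    public static string FindDeclaredEncoding(byte[] content)
    {
        using (var reader = new StreamReader(new MemoryStream(content), Encoding.ASCII))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(new[] { ' ', '\t' },
                                            StringSplitOptions.RemoveEmptyEntries);
                if (parts.Length >= 2 && parts[0] == "SET")
                    return parts[1];
            }
        }
        return null;
    }

    // Assumption: the file writes "ISO8859-1" where .NET expects "ISO-8859-1".
    // Encodings other than the built-in ones may need extra registration
    // on newer .NET runtimes.
    public static Encoding Resolve(string declared)
    {
        if (declared == null)
            return Encoding.ASCII; // fallback chosen for this sketch only
        if (declared.StartsWith("ISO8859-", StringComparison.OrdinalIgnoreCase))
            declared = "ISO-8859-" + declared.Substring("ISO8859-".Length);
        return Encoding.GetEncoding(declared);
    }

    // Pass 2: open a fresh reader over the same bytes with the declared encoding.
    public static StreamReader OpenWithDeclaredEncoding(byte[] content)
    {
        Encoding encoding = Resolve(FindDeclaredEncoding(content));
        return new StreamReader(new MemoryStream(content), encoding);
    }

    public static void Main()
    {
        byte[] sample = Encoding.GetEncoding("ISO-8859-1")
                                .GetBytes("SET ISO8859-1\nTRY caf\u00e9\n");
        using (StreamReader reader = OpenWithDeclaredEncoding(sample))
        {
            reader.ReadLine();                    // skip the SET line
            Console.WriteLine(reader.ReadLine()); // second line, decoded as Latin-1
        }
    }
}
```

Both passes run over the same in-memory bytes, which is why the array is kept around instead of being read from the zip file twice.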

Coordinator
Jul 9, 2008 at 4:38 PM
So, Abudaba, what did you think of the v1.6 feature?
didja try it?

Jul 9, 2008 at 7:38 PM
I have tried to try it :) But unfortunately for my small project I use Visual C# Express, and I have no SVN client installed on my home laptop, so it's really complicated to check it out and compile.

Sorry :(

Coordinator
Jul 9, 2008 at 8:43 PM
Abudaba,

a) You don't need an SVN client to get the code and build it.
- If you want to check out code, you can use the CodePlex client, which is small and easy to download and run. It is at http://www.codeplex.com/CodePlexClient/Release/ProjectReleases.aspx?ReleaseId=4423
- You don't need any source-control client at all. Just go to the Source tab and click the download link for the change set. You will get a zip file. Unpack it and all the source is there. No need for check-out.

b) You don't need to compile the source to use it.  The change I described is in the v1.6 preview binary, which is available on the Releases tab for DotNetZip.

can you give it a try?
let me know what you think. . .

Jul 9, 2008 at 11:25 PM
Edited Jul 9, 2008 at 11:48 PM
Sorry, I hadn't seen the "Planned" tab on the "Releases" page :)

I have updated to 1.6 and immediately got an ArgumentException with the message "Invalid input", paramname: "outstream | basedir". The context is:

        using (MemoryStream ms = new MemoryStream())
        {
            ze.Extract(ms);
            affarray = ms.ToArray();
        }

With 1.5 it works fine.

Coordinator
Jul 10, 2008 at 5:57 AM

yeap, a very basic bug there. I've uploaded a new binary.

you can try again when you have a moment.

Jul 12, 2008 at 8:11 PM
Sorry for the delay in responding, and thank you for the bugfix!

Today I tried 1.6 again. I have rewritten my code like this:

#if (ZIPSTREAM)
                            AnalyseFile(zipFile + ": " + ze.FileName, e, worker,
                                ze.OpenReader());
#else
                            byte[] filearray = new byte[0];
                            using (MemoryStream ms = new MemoryStream())
                            {
                                ze.Extract(ms);
                                filearray = ms.ToArray();
                            }
                            AnalyseFile(zipFile + ": " + ze.FileName, e, worker,
                                new MemoryStream(filearray));
                            filearray = null;
#endif

As you can see, the code is now much smaller and much clearer, which is cool! I compared performance and memory consumption for both versions. The stream version has a smaller memory footprint, but it is a little slower (about 5%). I think that's to be expected, so there is nothing to complain about.

So, in general it works just fine. Thank you very much!


Coordinator
Jul 12, 2008 at 9:38 PM

I'm glad it works.  It's odd that it is slower; I would think the opposite would be true.

I guess it depends on how your routine "AnalyseFile" works.  If it reads a few bytes at a time from the stream you pass in, then the OpenReader() option would be slower, I'd expect.  The normal ZipEntry.Extract(Stream) routine can extract in big chunks (4k at a time).  So if your AnalyseFile is reading only a few bytes at a time, I'd guess it would be less efficient.  The cost of reading a few bytes at a time from a MemoryStream is probably much lower than the cost of reading a few bytes at a time from a stream connected to disk i/o (which is what you get with OpenReader()). If you want to play with it, you could try wrapping the output of OpenReader() in a BufferedStream and measuring the difference.  See if that works and if it speeds things up a bit.

e.g.

AnalyseFile(zipFile + ": " + ze.FileName, e, worker,
                                new BufferedStream(ze.OpenReader()));


Not sure if this will work, but worth trying. Anyway, I'm glad you like it.
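
To make the buffering idea concrete, here is a minimal sketch that does not touch DotNetZip at all. CountingStream and BufferingDemo are hypothetical names made up for this illustration. CountingStream just counts how many Read calls reach the stream it wraps, standing in for a stream where each read is expensive. The 4-byte reads and the 4096-byte buffer are arbitrary choices for the demo.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: counts the Read calls that reach the inner stream.
class CountingStream : Stream
{
    private readonly Stream _inner;
    public int ReadCalls { get; private set; }

    public CountingStream(Stream inner) { _inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        return _inner.Read(buffer, offset, count);
    }

    // Read-only, forward-only stream: everything else is unsupported.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

static class BufferingDemo
{
    // Drains the data 4 bytes at a time, like a consumer that reads a few
    // bytes per call, and reports how many reads hit the underlying stream.
    public static int CountUnderlyingReads(byte[] data, bool buffered)
    {
        var counter = new CountingStream(new MemoryStream(data));
        Stream source = buffered ? (Stream)new BufferedStream(counter, 4096) : counter;
        var tiny = new byte[4];
        while (source.Read(tiny, 0, tiny.Length) > 0) { }
        return counter.ReadCalls;
    }

    public static void Main()
    {
        var data = new byte[64 * 1024];
        Console.WriteLine("Unbuffered reads: {0}", CountUnderlyingReads(data, false));
        Console.WriteLine("Buffered reads:   {0}", CountUnderlyingReads(data, true));
    }
}
```

The buffered run should reach the underlying stream far less often. Whether that translates into a measurable speedup with OpenReader() depends on how costly each underlying read really is, so it still needs measuring.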


Jul 16, 2008 at 5:27 PM
Cheeso,

Just a word to say that this feature is great. I tried using it in SSIS and it works fine. I wrote about you here:

http://sqlblog.com/blogs/alberto_ferrari/archive/2008/07/16/reading-zip-files-with-ssis.aspx

Thanks.

Alberto

Coordinator
Jul 16, 2008 at 5:35 PM

Ferrari - you have a speedy name.

I'm glad you found it useful!