Occasional Bad CRCs when Zipping one zip file into another (VB .NET)

Oct 25, 2010 at 8:29 PM

I am using DotNetZip to create backups of directories and their contents with VB .Net.  For security reasons, I chose to use a zip file obscured inside a second zip file, so that contents aren't as visible, with the encapsulating zip file being uncompressed & AES-256 encrypted.  It contains both the lightly-compressed "Inner" zip file of the backup itself and a small text "label" file with version info and a few other bits of information.

Occasionally, the resulting zip gets corrupted while it is being encapsulated in the Outer zip file. I know it happens at this step because the program checks the file contents list after each zip operation to confirm that a proper zip file was generated, and no errors are thrown between the two operations, which run sequentially. It's a rough check, but it has worked until now. I haven't been able to replicate the error in testing, and it has happened only a few times in a few hundred runs, but it is troubling because I need consistent backups.

I have to create the Inner zip file as a second file rather than an output stream because of memory limitations on the systems that we are running it on (a few of our different production computers running XP / Vista / 7, so far no commonality).  I am adding a Threading.Thread.Sleep(2000) to be 100% sure the transition from "DotNetZip-XXXXX.tmp" to resulting "Inner" zip file isn't being caught in mid-stream and that all disk access has completed before the encapsulation begins.

I was hoping someone could help here:

Is there a way I can get the contents of the Inner Zip file after the encapsulation, while it is inside the Outer? Is there a way to do this without extracting the files, even temporarily?  If so, I could check the contents list to verify it's not corrupt.  Is there a way I can check the CRC values of the inner files a different way? Am I not seeing something else?

If anyone could provide any other insights, I'd really appreciate it.  Thanks!

Here is the core of the compressing sub, with a few bits chopped out above and below... not very pretty, but it usually works ;) .  It handles the "Outer" zip / "Inner" zip switch with "blnCompressingZip" - if true, create an "Inner" zip / if false, create an "Outer":

 

            Using zip1 As ZipFile = New ZipFile
                zip1.UseZip64WhenSaving = Zip64Option.AsNecessary
                zip1.Name = strZipToCreatePub
                If blnCompressingZip Then
                    ' "Inner" zip: lightly compressed copy of the backup directory
                    zip1.CompressionLevel = Ionic.Zlib.CompressionLevel.BestSpeed
                    zip1.AddDirectory(strToBeZippedPub, "")
                    zip1.Comment = strComment & ", " & Now.ToString("hh:mm tt") & ", " & Now.ToString("MM-dd-yy")
                Else
                    ' "Outer" zip: uncompressed, AES-256 encrypted wrapper holding the label file and the "Inner" zip
                    zip1.Encryption = EncryptionAlgorithm.WinZipAes256
                    zip1.Password = strPasswordPub
                    zip1.CompressionLevel = Ionic.Zlib.CompressionLevel.None
                    zip1.AddFile(strLabelFilePub, "")
                    zip1.AddFile(strToBeZippedPub, "")
                End If
                Me._entriesToZip = intAllFilesAndDirs       'this is the total # of files that are being zipped to display in the progress bar
                Me.SetProgressBars()
                AddHandler zip1.SaveProgress, New EventHandler(Of SaveProgressEventArgs)(AddressOf Me.zip1_SaveProgress)
                zip1.Save(zip1.Name)
            End Using

 

 

Coordinator
Oct 27, 2010 at 1:45 AM

I understand what you're doing, and it makes sense to me.

What you want to do, I think, is validate the outer zip after it's been written.  I think you said you want to validate the contents of the inner zip, but without extracting those files to the filesystem.

To do this, open the outer zip for reading with ZipFile.Read().  There's just one ZipEntry in it, if I understand the situation correctly.  Call ZipEntry.OpenReader() on that entry, which gives you a readable stream.  You can then create a ZipInputStream around that read-only stream.

You also may be able to wrap the result of ZipEntry.OpenReader() in a BufferedStream, and call ZipFile.Read() on THAT.

You cannot succeed by calling ZipFile.Read() directly on the result of ZipEntry.OpenReader() in order to read and validate the inner zip.  This is because when the ZipFile class is used to read a zip file, it seeks around in the file.  It seeks to the end, reads some things, then seeks backwards, and so on.  The result of ZipEntry.OpenReader(), though, is a read-forward-only stream.  It doesn't support seeks.  Using a BufferedStream basically just caches all the inner zip data, and so once again Seek() will work. But be careful, this only works if the buffer size is as large as the entire inner zip file.  Maybe impractical.

On the other hand, the ZipInputStream does support reading from a read-forward-only stream, so it can be used to validate the result of ZipEntry.OpenReader().  

The interface you want to use on ZipInputStream is GetNextEntry();  call that and then call ZipInputStream.Read() to read the de-compressed content of the entry in the inner zip.  Call GetNextEntry() and Read() in a loop to read all entries.

If you call ZipInputStream.Read() repeatedly until it returns 0 (implying no more data is available for the ZipEntry), the Read() method will automatically and transparently validate the CRC for the entry, and throw an exception if the CRC does not match the expected value.  If you do not call Read() until it returns 0, then the ZipInputStream.Read() method will not validate the CRC of the entry.

Therefore to validate the CRC of the entry, just call Read() into a small buffer (say 2k), repeatedly, until it returns 0. 
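
The steps above can be sketched roughly as follows. This is only an outline, not tested against your setup: the module name, the function name ValidateInnerZip and its parameters are made up for illustration, and it assumes the inner zip was added to the root of the outer zip and that the same password was used for every outer entry.

```vbnet
Imports System
Imports System.IO
Imports Ionic.Zip

Module InnerZipCheck
    ' Hypothetical helper: reads every entry of the inner zip while it is still
    ' inside the outer zip, without writing anything to disk.
    ' Returns the number of inner entries that were read to the end.
    ' A CRC mismatch is expected to surface as an exception from Read().
    Function ValidateInnerZip(outerPath As String, innerEntryName As String, password As String) As Integer
        Dim buffer(2047) As Byte            ' small 2k scratch buffer
        Dim checkedCount As Integer = 0

        Using outer As ZipFile = ZipFile.Read(outerPath)
            Dim innerEntry As ZipEntry = outer(innerEntryName)
            If innerEntry Is Nothing Then
                Throw New FileNotFoundException("Inner zip entry not found in outer zip.", innerEntryName)
            End If

            ' OpenReader gives a forward-only stream over the decrypted inner zip.
            Using innerStream As Stream = innerEntry.OpenReader(password)
                Using zis As New ZipInputStream(innerStream)
                    Dim e As ZipEntry = zis.GetNextEntry()
                    While e IsNot Nothing
                        If Not e.IsDirectory Then
                            ' Read until 0 is returned so the CRC check is triggered.
                            While zis.Read(buffer, 0, buffer.Length) > 0
                            End While
                        End If
                        checkedCount += 1
                        e = zis.GetNextEntry()
                    End While
                End Using
            End Using
        End Using

        Return checkedCount
    End Function
End Module
```

With the code from the original post, the entry name would presumably be the file name part of strToBeZippedPub, for example ValidateInnerZip(strZipToCreatePub, Path.GetFileName(strToBeZippedPub), strPasswordPub), wrapped in a Try/Catch that treats any exception as a failed backup.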

Does this answer the question?

ps: I am interested in tracking down any problem with corruption, but to do it, I guess you realize I need a reproducible test case.

last thing: if you want to add an entry to a zip, and the entry is a small string, you can call AddEntry() and pass a System.String.  For your "label" entry, you may find that option interesting.
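
A minimal sketch of that idea, assuming the label text is already in a string. The entry name "label.txt" and the Sub and parameter names are placeholders, not anything from your code:

```vbnet
Imports Ionic.Zip

Module LabelEntryExample
    ' Hypothetical helper: builds the outer zip with the label written straight
    ' from a string, so no temporary label file is needed on disk.
    Sub SaveOuterZip(outerPath As String, innerZipPath As String, labelText As String, password As String)
        Using zip As New ZipFile()
            zip.Encryption = EncryptionAlgorithm.WinZipAes256
            zip.Password = password
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None
            zip.AddEntry("label.txt", labelText)   ' entry content comes from the string
            zip.AddFile(innerZipPath, "")          ' inner zip goes in the root of the outer zip
            zip.Save(outerPath)
        End Using
    End Sub
End Module
```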

 

Nov 23, 2010 at 4:22 PM
Well, I got better info from my end users and caught a corruption in the act. A backup utility that was set to back up everything placed in the destination zip directory would sometimes step in and try to back up parts of the zip file while it was still being copied in. Without the password, it could back up the zip header and file list of the zip, but nothing else... presto! Corrupted backup, good source, good destination.

I ended up putting in a diff check from the source to both the destination and backup files and haven't seen it happen again.
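
For anyone who finds this later, a check along those lines could look something like the sketch below. It is not my actual code: it compares file lengths and SHA-256 hashes rather than doing a true diff, and the module and function names are made up for the example.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Security.Cryptography

Module CopyCheck
    ' Computes the SHA-256 hash of a file by streaming it from disk.
    Function HashFile(filePath As String) As Byte()
        Using sha As SHA256 = SHA256.Create()
            Using fs As FileStream = File.OpenRead(filePath)
                Return sha.ComputeHash(fs)
            End Using
        End Using
    End Function

    ' Returns True when both files have the same length and the same hash.
    Function FilesMatch(sourcePath As String, copyPath As String) As Boolean
        If New FileInfo(sourcePath).Length <> New FileInfo(copyPath).Length Then
            Return False
        End If
        Return HashFile(sourcePath).SequenceEqual(HashFile(copyPath))
    End Function
End Module
```

Calling FilesMatch once for the destination copy and once for the backup copy, and re-copying when it returns False, covers the case described above.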

Thanks for all the help; I will certainly look into some of the other capabilities!


Coordinator
Nov 23, 2010 at 4:40 PM

Wow, I'm glad you figured this out.  I'll think about the possibility of how the library could be modified, to avoid this pitfall. 

 

Nov 23, 2010 at 9:22 PM
Oh, sorry. It wasn't a problem with DotNetZip after all! I'd only written to see if I could detect the bad CRCs somehow on my end... which you explained very well, and thanks again for that.

The issue stemmed from the third-party backup utility (I can PM you the name). I'd have my backups create the zip file in a local backup directory (which always worked correctly), then copy the zip into a "Backup Transfer" folder that the backup utility was watching. It would see the zip file hit the "Backup Transfer" folder and would (rarely) start the transfer with just the header info and the list of contained files. It seemed to treat the zip as a directory, since that is all that the internal Windows "Compressed Folders" could see without password support. I'd end up with a jumbled mess about 1% of the time, and have to put the local backup back into the "Backup Transfer" folder.

So, this actually wasn't a problem with DotNetZip at all... and you answered the questions I did have, which I appreciate. The only issue I have ever had with DotNetZip is that I am still puzzling over how to correctly restore the Date Modified stamps (LastWriteTime), when I create a backup that crosses a time zone. When I unzip it with DotNetZip, the time is off by the difference between the two time zones... but not when I use WinRAR. Should I make a new thread for questions about this for your tracking purposes? I haven't seen this specific thing being asked.

Coordinator
Nov 24, 2010 at 5:34 PM

ah, ok, I understand. 

I think DotNetZip could suffer a similar problem if you zip into a directory that you are currently zipping.  It would be nice to be able to avoid that pitfall but it may be that this is not important enough to enough people, for me to spend time on solving it.

Regarding the timestamp thing - yes, that is better covered in a new thread if you please.  It's not really for tracking - it's for search.  I want people to be able to search on relevant topics, and keeping threads narrowly constrained helps out there.

thanks for posting the resolution to your problem!