
ZipFile.AddFile fails depending on the file size.

description

AddFile truncates the entry, and Extract throws an exception: "bad read of entry test/MyFile.txt from compressed archive."
 
Debugging step by step, it sometimes works fine.
 
My code:
 
Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
    Dim str As New String(" "c, 2490368)
    IO.File.WriteAllText("C:\test\MyFile.txt", str)
End Sub
 
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Using zip As New Ionic.Zip.ZipFile
        zip.AddFile("C:\test\MyFile.txt")
        zip.Save("C:\test\MyZip.zip")
    End Using
End Sub
 
Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    Using zip As Ionic.Zip.ZipFile = Ionic.Zip.ZipFile.Read("C:\test\MyZip.zip")
        For Each arch As Ionic.Zip.ZipEntry In zip
            arch.Extract("C:\test\opened")
        Next
    End Using
End Sub

comments

Pointy wrote Aug 15, 2011 at 7:29 PM

Hi pablo,

I've reproduced the error on my machine - it looks like it's something to do with the parallel compression algorithm and your file size (which is a multiple of the default 128k buffer size for parallel compression: 131072 * 19 = 2490368). The zip file fails to extract in Windows Explorer, and WinZip reports a bad CRC ("Expected 216E2C97, actual 3832F8D9.").

However, if you disable parallel compression with "zip.ParallelDeflateThreshold = -1" in your Button1_Click it seems to avoid the issue and generates a valid zip file which Windows Explorer, WinZip and DotNetZip can all extract from. (Parallel compression also works fine for input data which is 1, 2, 3 or 4 multiples of 128k, but I think that's because it allocates them all to the same processor).
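
For reference, here is the whole workaround as a minimal standalone C# sketch (same paths as your code; the only addition is the ParallelDeflateThreshold line):

using Ionic.Zip;

class Workaround
{
    static void Main()
    {
        using (var zip = new ZipFile())
        {
            // -1 disables parallel compression entirely, so the affected code path is never used.
            zip.ParallelDeflateThreshold = -1;
            zip.AddFile(@"C:\test\MyFile.txt");
            zip.Save(@"C:\test\MyZip.zip");
        }
    }
}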

The code for the parallel deflate algorithm is waaaay more complex than I can cope with, so I can't really offer any fix better than the workaround above.

Hope this helps,

Mike

DexMiK wrote Aug 16, 2011 at 2:25 PM

DexMiK wrote Aug 16, 2011 at 2:26 PM

delete my last and this comment please, wrong work item :(

DexMiK wrote Aug 16, 2011 at 3:35 PM

OK, I found something that might be related to this, but I need further testing...

I zip .mdb files (Access) - all closed for sure (we test that).

Now on some files (I cannot provide any yet, but I hope I can really soon), after saving them into the zip file with zip.Save (without errors), I extract them and get a "bad read" of that file... (I assume it's also a CRC check that fails).

When I set zip.ParallelDeflateThreshold = -1 before zip.Save(...), it also seems to work fine.

Regards,
DexMiK

XrstalLens wrote Sep 16, 2011 at 8:04 PM

I'm getting the same problem (bad read of entry) on a file of size 5373952 bytes, which is exactly 41 times the parallel buffer read size. The zip creation succeeds, but fails to extract. I've had the same error on other files, but didn't know of this problem and didn't know to look at the exact file size.

In my case I'm going to try working around it by turning off compression for any file whose size is an exact multiple of 128K in my AddProgress event handler (I already turn off compression for other files like mp3s) to see if that solves the problem.
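
Roughly what I have in mind, shown at add time rather than in the event handler (just an untested sketch; it assumes the per-entry CompressionLevel property is available in the DotNetZip version you're using):

using System.IO;
using Ionic.Zip;
using Ionic.Zlib;

class StoreRiskySizes
{
    // 128K buffer size used by parallel deflate, per the comments above.
    const long ParallelBufferSize = 131072;

    static void Main()
    {
        using (var zip = new ZipFile())
        {
            foreach (var path in Directory.GetFiles(@"C:\test", "*.txt"))
            {
                ZipEntry entry = zip.AddFile(path);
                // Store (no deflate) any file whose size is an exact multiple of the
                // parallel buffer size, so the parallel compressor never touches it.
                if (new FileInfo(path).Length % ParallelBufferSize == 0)
                    entry.CompressionLevel = CompressionLevel.None;
            }
            zip.Save(@"C:\test\MyZip.zip");
        }
    }
}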

rhpainte wrote Jan 6, 2012 at 6:30 PM

The issue is reproducible any time you are compressing data that fits into an integral number of internal buffers inside ParallelDeflateOutputStream (the default size is set to 64KB). When it has finished loading your data into the buffers, Close is called, which results in _Flush(true) being called. Since we have an integral size, _currentlyFilling is equal to -1 and we go directly to EmitPendingBuffers(true, false) and _FlushFinish().

Inside EmitPendingBuffers, the outer do {} while (doAll && (_lastWritten != _latestCompressed)) loop can exit prematurely, as it doesn't take into account the number of buffers filled; our worker threads could still be busy compressing blocks 3-15 while _lastWritten and _latestCompressed both equal 2.

Workaround:
  1. The suggestion by Mike is great
  2. Modify the buffer size so that it is not nicely aligned with your file size
Fix:
Zlib\ParallelDeflateOutputStream.cs

line 987:
change
} while (doAll && (_lastWritten != _latestCompressed));
to
} while (doAll && (_lastWritten != _latestCompressed || _lastWritten != _lastFilled));

divo wrote Nov 16, 2012 at 11:00 AM

This is a short sample program exhibiting the defect:

using System;
using System.IO;
using System.Linq;
using System.Text;
using Ionic.Zip;

namespace Issue14087
{
class Program
{
    static void Main(string[] args)
    {
        // create a buffer (or file) that is 
        //  - larger than the default ParallelDeflateThreshold (= 512 kB)
        //  - has a size which is a multiple of 128 kB
        var buffer = new byte[917504];
        for (var i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)'a';
        }

        using (var zippedStream = new MemoryStream())
        {
            using (var zip = new ZipFile(Encoding.UTF8))
            {
                // uncommenting the following line can be used as a work-around
                // zip.ParallelDeflateThreshold = -1;
                zip.AddEntry("entry.txt", buffer);
                zip.Save(zippedStream);
            }
            zippedStream.Position = 0;

            using (var zip = ZipFile.Read(zippedStream))
            {
                using (var ms = new MemoryStream())
                {
                    // This line throws a BadReadException
                    zip.Entries.First().Extract(ms);
                }
            }
        }
    }
}
}

divo wrote Nov 16, 2012 at 12:32 PM

The issue's impact is classified as "Low". I'd consider it highly critical, because the generated zip file is basically corrupt and data might be lost.

dagc wrote Jan 22, 2013 at 10:03 AM

I understand it can be hard to debug this bug, but it's very strange to flag it as a "low impact" bug, since a zip library's objective is to create readable zip files. I experienced file corruption for a file I attempted to compress (its size was 3801088 bytes = 29 * 131072), and setting ParallelDeflateThreshold to -1 fixed the issue. In addition, bugs like these are very hard to discover during testing.

mohammadforutan wrote Feb 27, 2013 at 12:08 PM

"zip.ParallelDeflateThreshold = -1" resolve my issue and save my day, thank you Mike

iosifpetre wrote Apr 11, 2013 at 4:31 PM

mohammadforutan, it's nice that "zip.ParallelDeflateThreshold = -1" resolved your issue, but how do I extract the files? That setting only helps when you create the archive; how can I fix an already-created archive so I can get the files out? I need the data from the archived files...
please help

divo wrote Apr 11, 2013 at 6:05 PM

@iosifpetre: There is no way to recover the zip archive once it is corrupted. That is why this issue is so critical. I'd recommend using another zip library like SharpZipLib or the functionality included in the current .NET Framework version.

bob0043 wrote May 3, 2013 at 6:27 PM

I concur with rhpainte as to the location of the problem but differ slightly as to the analysis and fix.

The number of buffers used is partially dependent on the number of processors. Each set of buffers is handled by a separate thread. The variables _latestCompressed, _lastFilled, and _lastWritten keep track of, respectively, the last buffer that has been compressed, the last buffer that has been filled with input awaiting compression, and the last buffer written to the output.

The code is such that _latestCompressed <= _lastFilled is always true. The devil is in the "<" part of that expression. The EmitPendingBuffers function, as written, exits when _lastWritten == _latestCompressed, but that may not be the last buffer that needs to be written -- the last to be written should be _lastFilled. There is a race condition here: depending on input file size, buffer count, thread count, processor workload, and the phase of the moon, _latestCompressed may or may not be equal to _lastFilled when EmitPendingBuffers checks it.

So the fix should be:
change
} while (doAll && (_lastWritten != _latestCompressed));
to
} while (doAll && (_lastWritten != _lastFilled));

This is in function EmitPendingBuffers. In the copy of the source I have that is line 971 of ParallelDeflateOutputStream.cs. Rhpainte notes it as line 987 of the same file.

Using "} while (doAll && (_lastWritten != _latestCompressed || _lastWritten != _lastFilled));", as rhpainte suggested, has a an "extra" check: if (_lastWritten == _lastFilled) is true then (_lastWritten == _latestCompressed) is also true (only compressed buffers are written) and there is no need to check it.

mlavoie88 wrote Jun 7, 2013 at 4:01 PM

I also just encountered this issue. Easy to work around with the ParallelDeflate setting, but difficult to isolate the failure since it occurs only on files of a specific size. I concur with earlier comments that the impact of this bug should be higher.

nmg196 wrote Sep 25, 2013 at 2:11 PM

Why is the Impact set to "low"? For me, and many others above, the result of this bug was a corrupt zip file, meaning the system using this library completely failed. Anything resulting in a corrupt zip file should surely result in a "High" impact level?
Thanks

johnbuuck wrote Nov 26, 2013 at 4:46 PM

Yes, this bug fits the criteria for elevated priority very well.

(1) It will eventually affect almost anyone who uses the product over a long period of time (since many binary file formats have sizes that are multiples of 64k)
(2) It doesn't manifest in a straightforward error that can be recovered from programmatically or manually
(3) Its impact is the irretrievable loss of data
(4) The problem is only discovered (possibly much) later, when attempting to unzip the file

It would be much better to simply issue an error whenever "Parallel Deflate" is active and the file size is a multiple of the internal buffer size. That way the user is not fooled into thinking they have a retrievable archive of the file. The error message could explain the problem and suggest a work-around. Instead, more and more corrupt archives get created with no notification until eventually someone attempts to retrieve the file.
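
Until something like that is built into the library, callers can approximate the check themselves. A rough application-side sketch (my own illustration, untested; the 128K figure comes from the earlier comments, and the exception message is just a suggestion):

using System;
using System.IO;
using Ionic.Zip;

static class SafeZip
{
    // Default parallel-deflate buffer size discussed in this thread (128K).
    const long ParallelBufferSize = 131072;

    // Adds the given files and saves, but refuses to proceed if parallel deflate
    // would be used on a file whose size is an exact multiple of the buffer size.
    public static void SaveChecked(ZipFile zip, string zipPath, params string[] files)
    {
        foreach (var path in files)
        {
            long length = new FileInfo(path).Length;
            bool parallelDeflateWouldRun = zip.ParallelDeflateThreshold >= 0
                                           && length > zip.ParallelDeflateThreshold;
            if (parallelDeflateWouldRun && length % ParallelBufferSize == 0)
                throw new InvalidOperationException(
                    path + " is an exact multiple of 128K; set zip.ParallelDeflateThreshold = -1 " +
                    "before saving, or the resulting archive may be corrupt (see work item 14087).");
            zip.AddFile(path);
        }
        zip.Save(zipPath);
    }
}

Throwing is deliberate: a loud failure at save time is far better than a silently corrupt archive discovered months later.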

mrutter wrote Apr 10 at 11:35 AM

Hello everybody,
I agree with many others above: the impact of this bug should be elevated.

Today, I encountered this bug after hundreds of days in which my application worked fine. I had, for the first time, a source file of 28,639,232 bytes (which is 437 times 64K)!

It looks like this project is not active anymore ... is it?

Thank you very much to the guy who posted a valid workaround (ParallelDeflateThreshold = -1 worked for me too).

Jcwrequests wrote Apr 28 at 11:15 PM

I have also experienced the same issue, which I commented on here: https://dotnetzip.codeplex.com/workitem/16815. Please make this a high-priority item.

Thanks,

Jason Wyglendowski