7

Closed

Compression roundtrip problem - Adler32 is incorrect

description

--- Continued from http://dotnetzip.codeplex.com/Thread/View.aspx?ThreadId=207718&ProjectName=DotNetZip ---
 
Hi again,
 
I've attached a 50k file which contains randomly-generated bytes that break roundtripping in the following code. This is the smallest file I've been able to produce so far that exhibits this problem. Let me know if you're unable to reproduce the problem.
 
Thanks,
 
Mike
 
 
 
class RoundtripTest
{
 
public static void RunTest()
{
 
    byte[] bdata;
 
    // initialise the test data from the sample file
    string fileName = "c:\\temp\\buffer.dat";
    using(System.IO.FileStream input = new System.IO.FileStream(fileName, System.IO.FileMode.Open))
    {
        int fileLength = (int)(new System.IO.FileInfo(fileName).Length);
        bdata = new byte[fileLength - 1];
        // make sure we read the whole file in one go
        if (!(input.Read(bdata, 0, bdata.Length) == bdata.Length))
        {
            throw new System.InvalidOperationException();
        };
    };

    // try to roundtrip the data. we'll actually get an exception from the call to UncompressBuffer
    byte[] c = Ionic.Zlib.ZlibStream.CompressBuffer(bdata);
    byte[] buncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(c);
    System.Console.WriteLine("are equal?: {0}", check(bdata, buncompressed));

}
 
static bool check(byte[] a, byte[] b)
{
    if (a.Length != b.Length) return false;
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] != b[i]) return false;
    }
    return true;
}
 
};

file attachments

Closed Jun 20, 2011 at 7:34 PM by Cheeso
fixed in changeset 79502. Thie first version of the library to contain this fix will be v1.9.1.6.

comments

Cheeso wrote Mar 31, 2010 at 6:24 PM

Mike, Thanks for the report. I reproduced the problem here. WIll let you know what I find.

Pointy wrote Mar 31, 2010 at 6:49 PM

Cool. If you need any more sample buffers to help diagnose this you can use the GenerateFailBuffer method in my original post - the parameters will build one which fails for UncompressBuffer or which fails when decompressing from a base stream (using the ZlibStream.Read / Write). Or just post here and I'll make some more for you.

M

Pointy wrote Apr 9, 2010 at 6:21 PM

Hi Cheeso,

I've found a slightly different issue which might be related to this one. If you run the code below with the attached buffer, the second call to StreamCompress gets truncated by 3 bytes. From what I can tell, the last call to ZlibBaseStream is appending the final adler32 checksum to the interal "pending" buffer but it's overrunning the final compressed output buffer and ignoring it.

Basically, the algorithm doesn't handle the case where the pending buffer only contains the trailing adler32 bytes - it skips these because it has already set "nomoreinput" in the last iteration.

You can check this by replacing
        if (nomoreinput && _wantCompress) return 0;  // workitem 8557
with
        if (nomoreinput && _wantCompress)
        {
            if (_z.dstate.pendingCount > 0) throw new System.InvalidOperationException();
            return 0;  // workitem 8557
        }

I think I've got a fix for this, but I want to test it a bit more before I post. It might also fix the original issue if we're lucky...


Mike
Public Sub TestCompression()

    Dim intChunkSize As Integer
    Dim intBytesRead As Integer
    Dim objBufferBytes() As Byte
    Dim objCompressed1() As Byte
    Dim objCompressed2() As Byte

    ' read the test buffer
    Using objInput As System.IO.FileStream = New System.IO.FileStream("c:\temp\buffer.dat", IO.FileMode.Open)
        ReDim objBufferBytes(Convert.ToInt32(objInput.Length - 1))
        intBytesRead = objInput.Read(objBufferBytes, 0, objBufferBytes.Length)
        If Not (intBytesRead = objBufferBytes.Length) Then
            Throw New System.InvalidOperationException
        End If
    End Using

    ' compress using a stream and a chunk size which is known to work, producing 49373 bytes. 
    intChunkSize = 9875
    objCompressed1 = StreamCompress(objBufferBytes, intChunkSize, Ionic.Zlib.CompressionLevel.BestCompression)

    ' reduce the chunk size and try again which gives no error, but only produces the first 49370 bytes.
    intChunkSize = 9874
    objCompressed2 = StreamCompress(objBufferBytes, intChunkSize, Ionic.Zlib.CompressionLevel.BestCompression)

    ' compare the results
    If Not CompareBytes(objCompressed1, objCompressed2) Then
        Throw New System.InvalidOperationException
    End If

End Sub


''' <summary>
''' Compress a byte array by writing it to a memory stream,
''' then reading it back through a compressing ZlibStream.
''' </summary>
''' <param name="buffer">The data to compress.</param>
''' <param name="chunkSize">The maximum number of bytes to read from the ZlibStream at a time.</param>
''' <param name="level">Compression level</param>
''' <returns></returns>
''' <remarks></remarks>
Private Function StreamCompress(ByVal buffer() As Byte, ByVal chunkSize As Integer, ByVal level As Ionic.Zlib.CompressionLevel) As Byte()
    Dim objCompressed As List(Of Byte)
    Dim objBytesRead() As Byte
    Dim intBytesRead As Integer
    Dim intTotalBytes As Integer
    Using objBufferStream As System.IO.MemoryStream = New System.IO.MemoryStream
        objBufferStream.Write(buffer, 0, buffer.Length)
        objBufferStream.Flush()
        objBufferStream.Seek(0, IO.SeekOrigin.Begin)
        Using objZlibStream As Ionic.Zlib.ZlibStream = New Ionic.Zlib.ZlibStream(objBufferStream, Ionic.Zlib.CompressionMode.Compress, level, True)
            objCompressed = New List(Of Byte)
            While True
                ReDim objBytesRead(chunkSize - 1)
                intBytesRead = objZlibStream.Read(objBytesRead, 0, objBytesRead.Length)
                intTotalBytes = intTotalBytes + intBytesRead
                If (intBytesRead = 0) Then
                    Exit While
                Else
                    ReDim Preserve objBytesRead(intBytesRead - 1)
                    objCompressed.AddRange(objBytesRead)
                End If
            End While
        End Using
    End Using
    Return objCompressed.ToArray
End Function

Private Function CompareBytes(ByVal data1() As Byte, ByVal data2() As Byte) As Boolean
    If Not (data1.Length = data2.Length) Then Return False
    For intIndex As Integer = 0 To (data1.Length - 1)
        If Not (data1(intIndex) = data2(intIndex)) Then
            Return False
        End If
    Next intIndex
    Return True
End Function

Pointy wrote Apr 11, 2010 at 10:44 PM

I've attached a diff which fixes the second part of this problem. If the input gets fully consumed before some of the trailing block bytes are generated (e.g. tables / trees / adler32) then ZlibBaseStream.Read returns 0 too early and only flushes the stuff it's already put into the pending buffer. It never generates the additional bytes on the next call, effectively truncating the output.

I'll also separately upload a diff with unit tests. There are 3 which are fixed by this path, and one which is still failing at the moment with the original problem I reported.

Feel free to chop it around as much as you need to.

Cheers,

M

Pointy wrote Apr 11, 2010 at 10:44 PM

Updated Unit tests.

Pointy wrote Apr 15, 2010 at 11:14 PM

First off, sorry these posts are a bit all over the place. I can try to sort it into some sense of order if it will help?

Secondly, I think I've fixed the other part of the problem. It looks like Zlib.Adler.Adler32 is giving wrong values in some cases. The Adler32 value given by DotNetZip for the attached buffer.dat is "0xEF 0x55 0x88 0xCE", but the implementation in the "official" zlibwapi.dll gives "0xF0 0x36 0x88 0xCE" for the same input.

It looks like it's a bitshifting / overflow problem - if you change int to uint in the first few lines of your Adler32 it fixes the problem and gives the same result as zlibwapi.dll for buffer.dat:

    private static readonly uint BASE = 65521;
    // NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1
    private static readonly int NMAX = 5552;

    static internal uint Adler32(uint adler, byte[] buf, int index, int len)
    {
        if (buf == null)
            return 1;

        uint s1 = (uint) (adler & 0xffff);

uint s2 = (uint) ((adler >> 16) & 0xffff);

I added the following to "Zlib Tests", but you need to expose Adler as public for it to work.
    [TestMethod]
    public void Zlib_Adler32()
    {
        byte[] buffer = GetRoundtripBuffer();
        uint adler = Zlib.Adler.Adler32(1, buffer, 0, buffer.Length);
        Assert.IsTrue(adler==0xf03688ce, "Adler checksum calculation error");
    }
One last thing I noticed was that I'd missed a change from the diff files to fix the other problem. Replace the following in ZlibBaeStream:

while (_z.AvailableBytesOut > 0 && !nomoreinput && rc == ZlibConstants.Z_OK);

with

while (_z.AvailableBytesOut > 0 && rc == ZlibConstants.Z_OK);

Cheers,

Mike

Cheeso wrote Apr 21, 2010 at 5:03 PM

Thanks for your continued work on this, pointy. I will follow up.

Pointy wrote Apr 26, 2010 at 12:40 AM

Unit testing the Adler class:
  • add the following to Zlib's AssemblyInfo.cs to expose "internal" classes and members to the unit test project.
[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("ZlibTest")]
  • make the Adler class internal and the Adler32 method public (or internal).
  • add the following test class to the "Zlib Tests" project
AdlerTests.cs

using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.Collections.Generic;
using System.Text;

namespace Ionic.Zlib.Tests
{
/// <summary>
/// Contains tests for the Zlib.Adler class
/// </summary>
[TestClass]
public class AdlerTests
{

    #region Fields

    private TestContext testContextInstance;

    #endregion

    #region Properties

    /// <summary>
    ///Gets or sets the test context which provides
    ///information about and functionality for the current test run.
    ///</summary>
    public TestContext TestContext
    {
        get
        {
            return testContextInstance;
        }
        set
        {
            testContextInstance = value;
        }
    }

    #endregion

    #region Test Methods

    /// <summary>
    /// Test adler calculation with a null buffer.
    /// </summary>
    [TestMethod]
    public void Zlib_Adler32_NullBufferTest()
    {
        uint adler = Zlib.Adler.Adler32(0, null, 0, 0);
        Assert.IsTrue(adler==0x00000001, "Adler checksum calculation error");
    }

    /// <summary>
    /// Test adler calculation with a known buffer.
    /// </summary>
    [TestMethod]
    public void Zlib_Adler32_BasicTest()
    {
        byte[] buffer = UnitTest1.GetRoundtripBuffer();
        uint adler = Zlib.Adler.Adler32(0, null, 0, 0);
        adler = Zlib.Adler.Adler32(adler, buffer, 0, buffer.Length);
        Assert.IsTrue(adler==0xf03688ce, "Adler checksum calculation error");
    }

    #endregion

}
}

hakonhc wrote May 7, 2010 at 2:01 PM

Hi. I have a possible related problem. The attached file is compressed. Doing this test to decompress leads to: Ionic.Zlib.ZlibException : Bad state (incorrect data check)
    [Test]
    public void ZlibTest() {
        const int WORKING_BUFFER_SIZE = 2048 * 20;
        string fileToDecompress = "268";
        using (System.IO.Stream input = System.IO.File.OpenRead(fileToDecompress))
        {
           using (var raw = System.IO.File.Create(fileToDecompress + ".zlib"))
           {
               using (Stream compressor = new ZlibStream(raw, CompressionMode.Decompress))
               {
                   byte[] buffer = new byte[WORKING_BUFFER_SIZE];
                   int n;
                   while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
                   {
                       compressor.Write(buffer, 0, n);
                   }
               }
           }
       }
    }

hakonhc wrote May 7, 2010 at 2:22 PM

Forgot to mention in my last comment that its working if setting WORKING_BUFFER_SIZE = 2048, but not WORKING_BUFFER_SIZE = 2048 * 20 as in the example.

Pointy wrote May 7, 2010 at 3:43 PM

Hi hakonhc,

Could you attach the original uncompressed file for reference. Also, did you compress the file using DotNetZip? If so could you post the code you used to compress it. I'll check if this is the same issue.

Cheers,

Mike

hakonhc wrote May 10, 2010 at 7:12 AM

Hi Pointy,
The original file (attached) was compressed using the original, native zlib library. I come across this issue when replacing this zlib library and obviously needs to be compatible with the files compressed with the original. It first seems to work ok, but then I came across some files like this that did not work.

Pointy wrote May 10, 2010 at 3:24 PM

Hi hakonhc,

Thanks for the upload. I've reproduced the problem here - I think it's caused by the Adler problem I reported below. Basically DotNetZip is calculating a checksum incorrectly in some cases, and your "286" file happens to trigger that issue.

I tested your file again with the fix in my "Apr 15 at 11:14 PM" post below and this resolves the problem. If you don't mind making a private build of the source code for the project you can use this, or hang on for an official fix from Dino.

Would you mind your sample file was used in a unit test in the project?

Hope this helps,

Mike

hakonhc wrote May 11, 2010 at 12:14 PM

Thank you Mike. I will create my own build with your patch for know, while waiting for the official build.
It's OK if you want to use the files
Regards,

Håkon

hakonhc wrote May 11, 2010 at 1:13 PM

Hi Mike,
I applied the ZlibBaseStream.diff but still got the same error (when the zlib stream is closed) when using a buffer size of 2048*20. Could you post the build that works for you?

Håkon

Pointy wrote May 11, 2010 at 1:59 PM

Håkon,

There's an extra set of changes in Zlib.cs which I hadn't uploaded diffs for. The attached Zlib_2.diff and ZlibBaseStream_2.diff files contain the changes which should resolve the issue.

Thanks,

Mike

Pointy wrote May 11, 2010 at 1:59 PM

Attaching Zlib_2.diff

Pointy wrote Aug 11, 2010 at 6:50 PM

*** Update on the Adler32 problem ***

The following code gives different adler values for the input buffer depending on the chunk size (which I don't think is the correct behaviour) and means some zip files are possibly getting a non-standard crc appended to them. It's due to "s2 += s1" overflowing int.MaxValue and going to a negative value in Ionic.Zlib.Adler.Adler32 if the input length is large enough (>3979 in the example). This causing "s2 %= BASE" to give unexpected values and poisons the return value.

I think the fix is to either make s1 and s2 (and BASE) uint so that overflows are handled gracefully, or add "s1 %= BASE; s2 %= BASE;" just before the end of the unrolled loop to avoid overflows in large chunks.

static void Main(string[] args)
    {
        // pre-calculated adler32 using reference implementation in zlibwapi
        var goal = 4104380882;
        // 3979 doesn't cause an overflow of "s2" in adler32, but 3980 does
        Console.WriteLine(TestAdler32(3979));
        Console.WriteLine(TestAdler32(3980));
        Console.WriteLine(TestAdler32(3999));
    }

    private static uint TestAdler32(int chunk)
    {
        // create a buffer full of 0xff's
        var buffer = new byte[2048 * 4];
        for (var i = 0; i < buffer.Length; i++)
        {
            buffer[i] = 255;
        };
        // compare the adler values from the dotnetzip implementation
        var adler = (uint)0;
        var index = 0;
        adler = Ionic.Zlib.Adler.Adler32(0, null, 0, 0);
        while (index < buffer.Length)
        {
            var length = Math.Min(buffer.Length - index, chunk);
            adler = Ionic.Zlib.Adler.Adler32(adler, buffer, index, length);
            //if (adler != zlib[index + length]) throw new System.InvalidOperationException();
            index = index + chunk;
        }
        //if (adler != goal) throw new System.InvalidOperationException();
        return adler;
    }

Pointy wrote Oct 4, 2010 at 3:14 PM

Something to think about if / when fixing this is "what happens to any zips that have already been created which have a dodgy Adler32 value". At the moment they're not Winzip-compatible because the Adler32 is wrong, but DotNetZip can unpack them because it calculates the same incorrect value when it unzips it.

Fixing the Adler32 algorithm will then break unzipping these zip files in DotNepZip. You might need to calculate both the correct and incorrect values, or add a backward-compatibility flag somewhere so it uses the "broken" algorithm instead of the fixed one.

Just a thought...

M

StefanLeoni wrote Oct 12, 2010 at 9:01 AM

Please fix it. With this bug nobody can use zlib from your library together with other libraries managedzlib, sharpzip ... or in my case with a java server using java deflate.

Pointy wrote Oct 12, 2010 at 10:34 AM

Hi Stefan,

In the meantime you can work around the Adler problem by using a buffer < 3980 bytes. The other issue can be resolved by incorporating the diffs in this thread to make a private build until an official fix gets published.

Let me know if you need any help doing that.

Cheers,

M

m5168 wrote Feb 4, 2011 at 4:55 PM

Any idea when a fix will be available? Any feedback is appreciated.

spongman wrote Apr 6, 2011 at 7:24 AM

is there any intention to fix this bug?

DotNetZip is currently broken with respect to the RFC and other tools/libraries.

Pointy's suggestion of making s1,s2&BASE uints is a valid fix.

Cheeso wrote Jun 13, 2011 at 7:55 PM

Yes I intend to fix this bug. Problem was, I lost my access to a computer, didn't have VS2010. etc etc.
I now have a usable development workstation and I am back on it.

Cheeso wrote Jun 20, 2011 at 7:29 PM

fixed in changeset 79502. Thie first version of the library to contain this fix will be v1.9.1.6.



** Closed by Cheeso 6/20/2011 11:27 AM

Cheeso wrote Jun 20, 2011 at 7:29 PM

re-opening to change the workitem title to something more appropriate.