Compression roundtrip problem

Mar 30, 2010 at 2:13 AM

Hi,

I was having some problems with the ZlibStream class, where it was raising an exception while decompressing a stream. After looking into it, it seems there are some cases where it can't decompress a byte array even when the data was previously compressed using ZlibStream. I don't know whether I'm just doing something wrong, but I've managed to distill the problem down to the code below, which generates random buffers until it finds one that throws an exception when it is compressed and then decompressed. Strangely, some buffers only fail when using UncompressBuffer and some only fail when using ZlibStream.Write with CompressionMode = Decompress.

If you run the TestRoundtrips code below you'll get a "Bad state (incorrect data check)" exception in the first call to my custom DecompressViaStream function, but it might just be me misunderstanding what ZlibStream is supposed to do...

Mike

 

Module ZlibStreamTest

#Region " Test Methods "

    Public Sub TestRoundtrips()
        Dim objOriginal() As Byte
        Dim objCompressed() As Byte
        Dim objDecompressed() As Byte
        ' get an invalid buffer. (size and chars parameters found empirically for a
        ' decent success / fail rate on random buffer generation).
        objOriginal = ZlibStreamTest.GenerateFailBuffer(100000, 232, False, True)
        ' compress by stream, decompress by stream
        objCompressed = ZlibStreamTest.CompressViaStream(objOriginal)
        objDecompressed = ZlibStreamTest.DecompressViaStream(objCompressed)
        If Not ZlibStreamTest.CompareArrays(objDecompressed, objOriginal) Then
            Throw New System.InvalidOperationException
        End If
        ' compress by stream, decompress by array
        objCompressed = ZlibStreamTest.CompressViaStream(objOriginal)
        objDecompressed = ZlibStreamTest.DecompressViaArray(objCompressed)
        If Not ZlibStreamTest.CompareArrays(objDecompressed, objOriginal) Then
            Throw New System.InvalidOperationException
        End If
        ' compress by array, decompress by stream
        objCompressed = ZlibStreamTest.CompressViaArray(objOriginal)
        objDecompressed = ZlibStreamTest.DecompressViaStream(objCompressed)
        If Not ZlibStreamTest.CompareArrays(objDecompressed, objOriginal) Then
            Throw New System.InvalidOperationException
        End If
        ' compress by array, decompress by array
        objCompressed = ZlibStreamTest.CompressViaArray(objOriginal)
        objDecompressed = ZlibStreamTest.DecompressViaArray(objCompressed)
        If Not ZlibStreamTest.CompareArrays(objDecompressed, objOriginal) Then
            Throw New System.InvalidOperationException
        End If
    End Sub

#End Region

#Region " Buffer Generation Methods "

    Public Function GenerateRandomBuffer(ByVal size As Integer, ByVal chars As Integer) As Byte()
        Dim objRng As Random
        Dim objBuffer() As Byte
        Dim strValidChars As String
        ' initialise some working variables
        objRng = New Random
        ReDim objBuffer(size - 1)
        ' build the list of characters we'll allow in the random buffer
        strValidChars = String.Empty
        For intIndex As Integer = 0 To 255
            If (intIndex < chars) Then
                strValidChars = strValidChars & ChrW(intIndex)
            End If
        Next intIndex
        ' build a random buffer
        For intByteIndex As Integer = 0 To (objBuffer.Length - 1)
            objBuffer(intByteIndex) = Convert.ToByte(objRng.Next(0, strValidChars.Length - 1))
        Next intByteIndex
        ' return the result
        Return objBuffer
    End Function

    ''' <summary>
    ''' Generates random buffers until one fails zlib decompression via a base stream.
    ''' </summary>
    ''' <param name="size">
    ''' Number of bytes in the returned buffer.
    ''' </param>
    ''' <param name="chars">
    ''' Number of unique characters to use in the returned buffer. A higher value means
    ''' faster matching of a buffer which has the required "fail" properties.
    ''' </param>
    ''' <param name="streamFail">True if the buffer should fail when using a stream to decompress.</param>
    ''' <param name="arrayFail">True if the buffer should fail when using an array to decompress.</param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Function GenerateFailBuffer(ByVal size As Integer, ByVal chars As Integer, ByVal streamFail As Boolean, ByVal arrayFail As Boolean) As Byte()
        Dim intLoopCount As Integer
        Dim objRawBytes() As Byte
        Dim objCompressed() As Byte
        Dim objDecompressed() As Byte
        Dim blnSkipBuffer As Boolean
        ' keep building buffers until we find one we want to keep
        While True
            Debug.WriteLine("running iteration " & intLoopCount)
            intLoopCount = intLoopCount + 1
            ' build a random buffer
            objRawBytes = ZlibStreamTest.GenerateRandomBuffer(size, chars)
            objCompressed = ZlibStreamTest.CompressViaStream(objRawBytes)
            blnSkipBuffer = False
            ' decompress via a stream, then check the result against the original
            Try
                objDecompressed = ZlibStreamTest.DecompressViaStream(objCompressed)
                If Not ZlibStreamTest.CompareArrays(objRawBytes, objDecompressed) Then
                    Throw New System.InvalidOperationException
                End If
                If streamFail Then blnSkipBuffer = True
            Catch ex As Ionic.Zlib.ZlibException
                ' ignore this error - we're looking for a file which we can read in
                ' wholesale, but that fails streaming
                Debug.WriteLine("failed reading whole file")
                If Not streamFail Then blnSkipBuffer = True
            End Try
            ' decompress via the array helper (UncompressBuffer)
            If Not blnSkipBuffer Then
                Try
                    objDecompressed = ZlibStreamTest.DecompressViaArray(objCompressed)
                    If Not ZlibStreamTest.CompareArrays(objRawBytes, objDecompressed) Then
                        Throw New System.InvalidOperationException
                    End If
                    If arrayFail Then blnSkipBuffer = True
                Catch ex As Ionic.Zlib.ZlibException
                    Debug.WriteLine("failed reading captive stream")
                    If Not arrayFail Then blnSkipBuffer = True
                End Try
            End If
            If Not blnSkipBuffer Then
                Return objRawBytes
            End If
        End While
        Throw New System.InvalidOperationException
    End Function

#End Region

#Region " Compression Methods "

    Public Function CompressViaArray(ByVal data() As Byte) As Byte()
        Dim objFileBytes() As Byte
        ' compress the file bytes
        objFileBytes = Ionic.Zlib.ZlibStream.CompressBuffer(data)
        ' return the result
        Return objFileBytes
    End Function

    Public Function CompressViaStream(ByVal data() As Byte) As Byte()
        Dim objFileBytes() As Byte
        Dim intBytesRead As Integer
        ' compress the raw data via a memory stream
        Using objCompressed As System.IO.MemoryStream = New System.IO.MemoryStream
            ' write the compressed data to the stream
            Using objZlibOutput As Ionic.Zlib.ZlibStream = New Ionic.Zlib.ZlibStream(objCompressed, Ionic.Zlib.CompressionMode.Compress, True)
                objZlibOutput.Write(data, 0, data.Length)
            End Using
            ' reset the stream
            objCompressed.Flush()
            objCompressed.Seek(0, IO.SeekOrigin.Begin)
            ' read the compressed data
            ReDim objFileBytes(Convert.ToInt32(objCompressed.Length) - 1)
            intBytesRead = objCompressed.Read(objFileBytes, 0, objFileBytes.Length)
            ' check we read the full length of the data
            If Not (intBytesRead = objCompressed.Length) Then
                Throw New System.InvalidOperationException
            End If
        End Using
        ' return the result
        Return objFileBytes
    End Function

    Public Function DecompressViaArray(ByVal data() As Byte) As Byte()
        Dim objFileBytes() As Byte
        ' decompress the file bytes
        objFileBytes = Ionic.Zlib.ZlibStream.UncompressBuffer(data)
        ' return the result
        Return objFileBytes
    End Function

    Public Function DecompressViaStream(ByVal data() As Byte) As Byte()
        Dim objFileBytes() As Byte
        Dim intBytesRead As Integer
        ' decompress the data via a memory stream
        Using objCompressed As System.IO.MemoryStream = New System.IO.MemoryStream
            ' write the compressed data to the stream
            Using objZlibOutput As Ionic.Zlib.ZlibStream = New Ionic.Zlib.ZlibStream(objCompressed, Ionic.Zlib.CompressionMode.Decompress, True)
                objZlibOutput.Write(data, 0, data.Length)
            End Using
            ' reset the stream
            objCompressed.Flush()
            objCompressed.Seek(0, IO.SeekOrigin.Begin)
            ' read the decompressed data
            ReDim objFileBytes(Convert.ToInt32(objCompressed.Length) - 1)
            intBytesRead = objCompressed.Read(objFileBytes, 0, objFileBytes.Length)
            ' check we read the full length of the data
            If Not (intBytesRead = objCompressed.Length) Then
                Throw New System.InvalidOperationException
            End If
        End Using
        ' return the result
        Return objFileBytes
    End Function

#End Region

#Region " Comparison Methods "

    Public Function CompareArrays(ByVal obj1() As Byte, ByVal obj2() As Byte) As Boolean
        If Not (obj1.Length = obj2.Length) Then Return False
        For intIndex As Integer = 0 To (obj1.Length - 1)
            If Not (obj1(intIndex) = obj2(intIndex)) Then Return False
        Next intIndex
        Return True
    End Function

#End Region

End Module

Coordinator
Mar 30, 2010 at 5:25 PM

I understand you believe that ZlibStream is not decompressing what it has already compressed.

Show me a simpler example, something with fewer lines of code.  You said you distilled it, but there's an awful lot of code there.

For "round trip" problems with string data it should be as simple as this:

String data = @"The good will is not good because of what it effects 
or accomplishes or because of its competence to achieve some intended 
end: it is good only because of its willing (i.e. it is good in itself).

Even if it should happen that, by a particularly unfortunate fate or by
the niggardly provision of a step-motherly nature, this will should be
wholly lacking in power to accomplish its purpose, and if even the
greatest effort should not avail it to achieve anything of its end, and
if there remained only the good will not as a mere wish, but as the
summoning of all the means in our power - it will sparkle like a jewel
all by itself, as something that had its full worth in itself.";

System.Console.WriteLine("original:");
System.Console.WriteLine(data);
var c = ZlibStream.CompressString(data);
string uncompressed = ZlibStream.UncompressString(c);
System.Console.WriteLine("uncompressed:");
System.Console.WriteLine(uncompressed);
System.Console.WriteLine("are equal?: {0}", data.Equals( uncompressed));

Right? And for byte array data, like this:

byte[]  bdata = {
0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

Func<byte[],byte[],bool> check = (a,b) => {
    if (a.Length != b.Length) return false;
    for (int i=0; i < a.Length;  i++)
    {
        if (a[i]!=b[i]) return false;
    }
    return true;};

byte[] c = ZlibStream.CompressBuffer(bdata);
byte[] buncompressed = ZlibStream.UncompressBuffer(c);
System.Console.WriteLine("are equal?: {0}", check(bdata, buncompressed));

I just ran both of those snippets and the roundtrip checks out. So I'm not clear on the problem.

You said something about random buffers and streams, but that seems like a different thing. Make it really clear and simple for me and I will look into it.  Minimal code to illustrate the problem.
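
For the random-buffer case, a short reproduction along the following lines might be enough. This is only a sketch under stated assumptions: the class name `RoundtripProbe`, the helpers `MakeBuffer` and `SameBytes`, and the range of 1000 seeds are made up for illustration, and there is no guarantee that any particular seed produces a failing buffer. The buffer size (100000) and value range (0 to 230) mirror what the original post's generator produces with its arguments. Seeding `Random` makes each buffer repeatable on the same runtime, so a failing seed can be reported instead of a 50k byte dump.

```csharp
using System;
using Ionic.Zlib;

public static class RoundtripProbe
{
    // Builds a repeatable buffer: the same seed gives the same bytes
    // (assuming the same runtime's Random implementation).
    public static byte[] MakeBuffer(int seed, int size, int maxValueExclusive)
    {
        var rng = new Random(seed);
        var buffer = new byte[size];
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = (byte)rng.Next(0, maxValueExclusive);
        return buffer;
    }

    // Byte-for-byte comparison of two arrays.
    public static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }

    public static void Main()
    {
        // Hypothetical search range; widen it if no seed fails.
        for (int seed = 0; seed < 1000; seed++)
        {
            byte[] original = MakeBuffer(seed, 100000, 231);
            try
            {
                byte[] compressed = ZlibStream.CompressBuffer(original);
                byte[] restored = ZlibStream.UncompressBuffer(compressed);
                if (!SameBytes(original, restored))
                    Console.WriteLine("seed {0}: roundtrip returned different bytes", seed);
            }
            catch (ZlibException ex)
            {
                // Report the seed so the failing buffer can be regenerated.
                Console.WriteLine("seed {0}: {1}", seed, ex.Message);
            }
        }
    }
}
```

If a seed is reported, that single number plus the size and value range should be all anyone needs to regenerate the same buffer.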

Mar 31, 2010 at 3:49 PM

Sorry about the length - I was trying to be as complete as possible. Basically, some data doesn't roundtrip properly if you compress then decompress using ZlibStream - you get an exception when you try to decompress.

The smallest buffer I could find that shows this problem is about 50k, and it ran to over 3,000 lines when I formatted it like your second example. I've created a post in the Issue Tracker instead so I can attach the buffer as a file - see http://dotnetzip.codeplex.com/WorkItem/View.aspx?WorkItemId=10562.
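
With the buffer saved as a file, a check along these lines could exercise it. It is a rough sketch, not the code attached to the work item: the class name `SavedBufferCheck` and the helper `DecompressByReading` are invented here, and the file path is whatever you pass on the command line. It decompresses by reading from a ZlibStream rather than writing to one, which is a third path alongside UncompressBuffer and ZlibStream.Write and may help narrow down where the failure sits.

```csharp
using System;
using System.IO;
using Ionic.Zlib;

public static class SavedBufferCheck
{
    // Decompresses by reading from a ZlibStream that wraps the compressed bytes.
    public static byte[] DecompressByReading(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var zlib = new ZlibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var chunk = new byte[4096];
            int n;
            // Read returns 0 once the end of the compressed data is reached.
            while ((n = zlib.Read(chunk, 0, chunk.Length)) > 0)
                output.Write(chunk, 0, n);
            return output.ToArray();
        }
    }

    public static void Main(string[] args)
    {
        // args[0]: path to the saved raw (uncompressed) buffer.
        byte[] original = File.ReadAllBytes(args[0]);
        byte[] compressed = ZlibStream.CompressBuffer(original);
        byte[] restored = DecompressByReading(compressed);
        Console.WriteLine("lengths: original {0}, restored {1}",
            original.Length, restored.Length);
    }
}
```

If this path succeeds on a buffer where the other two throw, that would point at the decompress-on-write and UncompressBuffer code rather than at the compressed bytes themselves.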

M

Coordinator
Mar 31, 2010 at 5:19 PM

Great, I'll have a look.

 

Coordinator
Mar 31, 2010 at 6:24 PM

Mike, thanks for the report. I reproduced the problem here. Will let you know what I find.