Checksum change on same source

Nov 12, 2009 at 8:56 PM
Edited Nov 12, 2009 at 8:57 PM

Hello,

I am wondering why, when I create a zip file from the same set of source files, the checksum of the resulting zip file changes each time. Nothing has changed in the source files.

Here are the steps I use to create the zip in VB.NET:

Dim pzip as ZipFile
pzip.AddDirectory(sourcefolder, zipdirectory)
pzip.save(targetfile)

Once the process completes, I run an MD5 or CRC32 checksum on the result, and both produce a different value each time a new zip file is created. I tried version 1.8 and the 1.9 beta and got the same result with both.

 

Thank you.

 

Hugh

Coordinator
Nov 13, 2009 at 2:32 AM

Hi hugh,

I agree - the checksum should be the same. I just tried it here, and I get the same hash for the generated zip.

That code doesn't run.  What's the real code? 

If a timestamp changes on a source file, the hash of the resulting zip will change.

Nov 13, 2009 at 1:22 PM
Edited Nov 13, 2009 at 2:03 PM

Hi Cheeso,

Thank you for taking a look at it. Below is the code I used to create the zip and generate the checksum. The CRC32 and checksum functions are included as well. I normally use the MD5 checksum.

    Public Function GetFileChecksum(ByVal tfile As String, Optional ByVal tmode As String = "MD5") As String
        Try
            If File.Exists(tfile) Then
                Dim fStream As Stream = File.OpenRead(tfile)
                Dim checksum() As Byte = {0}

                Select Case tmode.ToLower
                    Case "sha1"
                        Dim mysha1 As SHA1 = New SHA1Managed()
                        checksum = mysha1.ComputeHash(fStream)
                    Case "sha256"
                        Dim mysha256 As SHA256 = New SHA256Managed()
                        checksum = mysha256.ComputeHash(fStream)
                    Case "sha384"
                        Dim mysha384 As SHA384 = New SHA384Managed()
                        checksum = mysha384.ComputeHash(fStream)
                    Case "sha512"
                        Dim mysha512 As SHA512 = New SHA512Managed()
                        checksum = mysha512.ComputeHash(fStream)
                    Case Else
                        Dim mymd5 As MD5 = MD5.Create()
                        checksum = mymd5.ComputeHash(fStream)
                End Select

                fStream.Close()

                Dim buff As StringBuilder = New StringBuilder
                Dim hashByte As Byte
                For Each hashByte In checksum
                    buff.Append(String.Format("{0:X2}", hashByte))
                Next

                'Return Replace(BitConverter.ToString(checksum), "-", "")
                Return buff.ToString
            End If

        Catch ex As Exception

        End Try
        Return 0
    End Function

    Public Function GetCRC32(ByVal sFileName As String) As String
        Try
            Dim FS As FileStream = New FileStream(sFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192)
            Dim CRC32Result As Integer = &HFFFFFFFF
            Dim Buffer(4096) As Byte
            Dim ReadSize As Integer = 4096
            Dim Count As Integer = FS.Read(Buffer, 0, ReadSize)
            Dim CRC32Table(256) As Integer
            Dim DWPolynomial As Integer = &HEDB88320
            Dim DWCRC As Integer
            Dim i As Integer, j As Integer, n As Integer

            'Create CRC32 Table
            For i = 0 To 255
                DWCRC = i
                For j = 8 To 1 Step -1
                    If (DWCRC And 1) Then
                        DWCRC = ((DWCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                        DWCRC = DWCRC Xor DWPolynomial
                    Else
                        DWCRC = ((DWCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                    End If
                Next j
                CRC32Table(i) = DWCRC
            Next i

            'Calculating CRC32 Hash
            Do While (Count > 0)
                For i = 0 To Count - 1
                    n = (CRC32Result And &HFF) Xor Buffer(i)
                    CRC32Result = ((CRC32Result And &HFFFFFF00) \ &H100) And &HFFFFFF
                    CRC32Result = CRC32Result Xor CRC32Table(n)
                Next i
                Count = FS.Read(Buffer, 0, ReadSize)
            Loop

            FS.Close()
            Return Hex(Not (CRC32Result))
        Catch ex As Exception
            MsgBox("CRC32 calculate error: " & ex.StackTrace)
        End Try
        Return ""
    End Function

    Public Function getDirectoryName(ByVal tpath As String) As String
        Dim tdir As String = ""
        Try
            Dim tinfo As New DirectoryInfo(tpath)
            tdir = tinfo.Name
        Catch ex As Exception

        End Try

        Return tdir
    End Function

    Public Function ZipFolder(ByRef pzip As ZipFile, ByVal targetzip As String, ByVal sourcefolder As String _
                          , Optional ByVal trecursive As Boolean = True _
                          , Optional ByVal tfilter As String = "", Optional ByRef tbar As ToolStripProgressBar = Nothing) As Boolean
        Try
            ' Delete existing zip file
            If File.Exists(targetzip) Then
                File.Delete(targetzip)
            End If

            'pzip = New ZipFile(targetzip)
            'Dim zipentry As ZipEntry


            'zipentry = pzip.AddDirectory(sourcefolder, getDirectoryName(sourcefolder))
            'pzip.AddSelectedFiles("name != *~* And name != *$* And name != *.bak And name != *.BAK", sourcefolder, getDirectoryName(sourcefolder), True)
            pzip.AddSelectedFiles(tfilter, sourcefolder, getDirectoryName(sourcefolder), True)

            If Not tbar Is Nothing Then
                tbar.Minimum = 0
                tbar.Value = 0
            End If

            'pzip.Save()

            pzip.Save(targetzip)

            If File.Exists(targetzip) Then
                Return True
            Else
                Return False
            End If
        Catch ex As Exception
            MsgBox("Message" & ex.Message & vbCr & "Stack Trace:" & vbCr & ex.StackTrace, MsgBoxStyle.Critical, "Zip Process Error")
        End Try

        Return False
    End Function

Coordinator
Nov 13, 2009 at 2:34 PM

Hello Hugh,

Thanks for the code - I just ran it, totally unchanged, and got the same results I saw earlier: the MD5 checksum for the generated zip file is always the same, for a set of files that has not changed.

If I "touch" a file in the folder (update its timestamp), then the checksum of the resulting zip file is different, as expected. 

Here's my code, in its entirety:

' HughChksum.vb
'
' Fri, 13 Nov 2009  10:26
'
' compile with
'
'       c:\.net3.5\vbc.exe /t:exe /debug:full /R:Ionic.Zip.dll /out:HughChksum.exe HughChksum.vb
'
' ------------------------------------------------------------------



Imports System
Imports System.IO
Imports System.Text
Imports System.Windows.Forms
Imports System.Security.Cryptography
Imports Ionic.Zip

Namespace Ionic.Tests.Zip

Public Class HughChksum

    ' Methods
    Public Function GetFileChecksum(ByVal tfile As String, Optional ByVal tmode As String = "MD5") As String
        Try
            If File.Exists(tfile) Then
                Dim fStream As Stream = File.OpenRead(tfile)
                Dim checksum() As Byte = {0}

                Select Case tmode.ToLower
                    Case "sha1"
                        Dim mysha1 As SHA1 = New SHA1Managed()
                        checksum = mysha1.ComputeHash(fStream)
                    Case "sha256"
                        Dim mysha256 As SHA256 = New SHA256Managed()
                        checksum = mysha256.ComputeHash(fStream)
                    Case "sha384"
                        Dim mysha384 As SHA384 = New SHA384Managed()
                        checksum = mysha384.ComputeHash(fStream)
                    Case "sha512"
                        Dim mysha512 As SHA512 = New SHA512Managed()
                        checksum = mysha512.ComputeHash(fStream)
                    Case Else
                        Dim mymd5 As MD5 = MD5.Create()
                        checksum = mymd5.ComputeHash(fStream)
                End Select

                fStream.Close()

                Dim buff As StringBuilder = New StringBuilder
                Dim hashByte As Byte
                For Each hashByte In checksum
                    buff.Append(String.Format("{0:X2}", hashByte))
                Next

                'Return Replace(BitConverter.ToString(checksum), "-", "")
                Return buff.ToString
            End If

        Catch ex As Exception

        End Try
        Return 0
    End Function

    Public Function GetCRC32(ByVal sFileName As String) As String
        Try
            Dim FS As FileStream = New FileStream(sFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192)
            Dim CRC32Result As Integer = &HFFFFFFFF
            Dim Buffer(4096) As Byte
            Dim ReadSize As Integer = 4096
            Dim Count As Integer = FS.Read(Buffer, 0, ReadSize)
            Dim CRC32Table(256) As Integer
            Dim DWPolynomial As Integer = &HEDB88320
            Dim DWCRC As Integer
            Dim i As Integer, j As Integer, n As Integer

            'Create CRC32 Table
            For i = 0 To 255
                DWCRC = i
                For j = 8 To 1 Step -1
                    If (DWCRC And 1) Then
                        DWCRC = ((DWCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                        DWCRC = DWCRC Xor DWPolynomial
                    Else
                        DWCRC = ((DWCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                    End If
                Next j
                CRC32Table(i) = DWCRC
            Next i

            'Calculating CRC32 Hash
            Do While (Count > 0)
                For i = 0 To Count - 1
                    n = (CRC32Result And &HFF) Xor Buffer(i)
                    CRC32Result = ((CRC32Result And &HFFFFFF00) \ &H100) And &HFFFFFF
                    CRC32Result = CRC32Result Xor CRC32Table(n)
                Next i
                Count = FS.Read(Buffer, 0, ReadSize)
            Loop

            FS.Close()
            Return Hex(Not (CRC32Result))
        Catch ex As Exception
            MsgBox("CRC32 calculate error: " & ex.StackTrace)
        End Try
        Return ""
    End Function


    
    Public Function getDirectoryName(ByVal tpath As String) As String
        Dim tdir As String = ""
        Try
            Dim tinfo As New DirectoryInfo(tpath)
            tdir = tinfo.Name
        Catch ex As Exception

        End Try

        Return tdir
    End Function


    
    Public Function ZipFolder(ByRef pzip As ZipFile, ByVal targetzip As String, ByVal sourcefolder As String _
                              , Optional ByVal trecursive As Boolean = True _
                              , Optional ByVal tfilter As String = "*.*", Optional ByRef tbar As ToolStripProgressBar = Nothing) As Boolean
        Try
            ' Delete existing zip file
            If File.Exists(targetzip) Then
                File.Delete(targetzip)
            End If

            'pzip = New ZipFile(targetzip)
            'Dim zipentry As ZipEntry

            'zipentry = pzip.AddDirectory(sourcefolder, getDirectoryName(sourcefolder))
            'pzip.AddSelectedFiles("name != *~* And name != *$* And name != *.bak And name != *.BAK", sourcefolder, getDirectoryName(sourcefolder), True)
            pzip.AddSelectedFiles(tfilter, sourcefolder, getDirectoryName(sourcefolder), True)

            If Not tbar Is Nothing Then
                tbar.Minimum = 0
                tbar.Value = 0
            End If

            'pzip.Save()

            pzip.Save(targetzip)

            If File.Exists(targetzip) Then
                Return True
            Else
                Return False
            End If
        Catch ex As Exception
            MsgBox("Message" & ex.Message & vbCr & "Stack Trace:" & vbCr & ex.StackTrace, MsgBoxStyle.Critical, "Zip Process Error")
        End Try

        Return False
    End Function


    Public Sub Run()
        Dim name As String = "HughChksum.zip"
        
        Using zip As ZipFile  = New ZipFile
            zip.StatusMessageTextWriter = System.Console.Out
            ZipFolder(zip, name, folderToZip)
        End Using 

        System.Console.WriteLine("{0}", GetFileChecksum(Name))
        
    End Sub

    
    Public Shared Sub Main(ByVal args As String())
    Try 
        Dim X as New HughChksum(args)
        X.Run
    Catch exc1 As Exception
        Console.WriteLine("Exception: {0}", exc1.ToString)
    End Try
    End Sub


    Public Sub New(ByVal args As String())
        If args.Length <> 1 Then
            Usage
        End If
        
        If args(0)="-?" Then
            Usage
        End If

        folderToZip = args(0)
    End Sub

    Private folderToZip As String

    Private Shared Sub Usage()
        Console.WriteLine( _
        "HughChksum: zips a folder and computes an MD5 checksum on the resulting zip. " & vbcrlf & _
        "usage:" & ChrW(10) & "  HughChksum <folderToZip>" & vbcrlf _
              )
        Environment.Exit(1)
    End Sub

End Class


End Namespace

 

Nov 13, 2009 at 5:02 PM

Thank you very much for your help. This is just weird: I ran your test case on my set of files and the checksums are different each time. I don't see a way to attach my source files, but basically I have 10 identical files, 5 in the root folder and the other 5 in a subfolder, as shown.

 

 

C:\DOTNET\ZipConsole\ZipConsole\bin\Release>ConsoleApplication1 C:\ICT\Programs\
Ziptest
adding selection '*.*' from dir 'C:\ICT\Programs\Ziptest'...
found 10 files...
adding C:\ICT\Programs\Ziptest\TR8001.INI...
adding C:\ICT\Programs\Ziptest\TR8002.INI...
adding C:\ICT\Programs\Ziptest\TR8003.INI...
adding C:\ICT\Programs\Ziptest\TR8004.INI...
adding C:\ICT\Programs\Ziptest\TR8005.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800A.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800B.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800C.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800D.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800F.INI...
saving....
1FAA0D1B7B83A3D77EA0A75ECD1EC137
C:\DOTNET\ZipConsole\ZipConsole\bin\Release>ConsoleApplication1 C:\ICT\Programs\
Ziptest
adding selection '*.*' from dir 'C:\ICT\Programs\Ziptest'...
found 10 files...
adding C:\ICT\Programs\Ziptest\TR8001.INI...
adding C:\ICT\Programs\Ziptest\TR8002.INI...
adding C:\ICT\Programs\Ziptest\TR8003.INI...
adding C:\ICT\Programs\Ziptest\TR8004.INI...
adding C:\ICT\Programs\Ziptest\TR8005.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800A.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800B.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800C.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800D.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800F.INI...
saving....
8002C651428848F629D5C4CA491A5EA8
C:\DOTNET\ZipConsole\ZipConsole\bin\Release>ConsoleApplication1 C:\ICT\Programs\
Ziptest
adding selection '*.*' from dir 'C:\ICT\Programs\Ziptest'...
found 10 files...
adding C:\ICT\Programs\Ziptest\TR8001.INI...
adding C:\ICT\Programs\Ziptest\TR8002.INI...
adding C:\ICT\Programs\Ziptest\TR8003.INI...
adding C:\ICT\Programs\Ziptest\TR8004.INI...
adding C:\ICT\Programs\Ziptest\TR8005.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800A.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800B.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800C.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800D.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800F.INI...
saving....
DB2939073424B932AB2AF6B57B180304

Coordinator
Nov 13, 2009 at 6:05 PM

very weird. But I have an idea.

ok try this:

modify the code so the Run sub looks like this:

    Public Sub Run()
        Dim name As String = "HughChksum.zip"
        
        Using zip As ZipFile  = New ZipFile
            zip.EmitTimesInWindowsFormatWhenSaving = False  '' <-- insert this
            zip.StatusMessageTextWriter = System.Console.Out
            ZipFolder(zip, name, folderToZip)
        End Using 

        System.Console.WriteLine("{0}", GetFileChecksum(Name))
        
    End Sub

By default, the library stores 3 timestamps for each file into the zip: the created, modified, and "last accessed" time.  And yes, in NTFS, the "last accessed" time gets updated when you read the file.

As I was trying to figure out what could be changing, it occurred to me - I think I may have disabled lastaccess tracking on my NTFS volume, as a performance optimization.  Which means, the "last accessed" time is updated lazily, and in my case it might not be updated at all.  In your case, "last access" tracking could still be enabled on your NTFS volume, which means it will be updated, every time you read each ini file. 

By inserting that extra line into the code, you're telling the library not to store the extended (Windows-format) timestamps in the zip; only the standard last-modified time is recorded for each entry. If this produces zips with the same checksum, then you know the timestamps are the cause.

We can talk about whether you want to use EmitTimesInWindowsFormatWhenSaving in your production, after you run the test.

 

Coordinator
Nov 13, 2009 at 6:12 PM

Upon further review, in Windows Vista, NTFS "last access" tracking is disabled by default.

http://blogs.technet.com/filecab/archive/2006/11/07/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx 

In Windows XP and other operating systems, I believe it is enabled by default.  Regardless of the OS you're using, you may have it enabled, in which case the content of the zip would change.

I have it disabled - I use Vista, so it's disabled by default - and as a result the timestamps don't change, and the checksums don't change.
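
One way to see whether "last access" tracking affects a particular file is to read its last-access time before and after opening it. This is only a rough sketch: the module name LastAccessProbe is made up for illustration, and the file to probe is assumed to be passed as the first command-line argument.

```vbnet
Imports System
Imports System.IO

Module LastAccessProbe
    Sub Main(ByVal args As String())
        ' Assumption: args(0) is the path of an existing file to probe.
        Dim target As String = args(0)

        ' Last-access time as reported before this program reads the file.
        Dim before As DateTime = File.GetLastAccessTime(target)

        ' Read the whole file, which is what zipping it would also do.
        Dim contents As Byte() = File.ReadAllBytes(target)

        ' Last-access time as reported after the read.
        Dim after As DateTime = File.GetLastAccessTime(target)

        Console.WriteLine("read {0} bytes", contents.Length)
        Console.WriteLine("before: {0:o}", before)
        Console.WriteLine("after:  {0:o}", after)
        Console.WriteLine(If(after > before, "last-access time changed", "last-access time unchanged"))
    End Sub
End Module
```

Because the update can be lazy, an "unchanged" result from a single run does not prove that tracking is disabled; a "changed" result does show that the timestamp moves when the file is read.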

 

Nov 13, 2009 at 6:20 PM
Edited Nov 13, 2009 at 6:35 PM

After inserting the line into the code, the problem is fixed. I get the same checksum on both runs now.

adding C:\ICT\Programs\Ziptest\TR8004.INI...
adding C:\ICT\Programs\Ziptest\TR8005.INI..
adding C:\ICT\Programs\Ziptest\Folder1\TR800A.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800B.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800C.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800D.INI...
adding C:\ICT\Programs\Ziptest\Folder1\TR800F.INI...
saving....
2C8F5ACBE62764EC8B50F8D2DF12409F
Coordinator
Nov 13, 2009 at 6:39 PM

ok, so that clears up the mystery.  Now, you have to decide whether you want the timestamps in there or not.

Some people use a checksum on a zip to find out if it's been changed, before transmitting it for example.  If that's your scenario then you're going to want the zip to be the same, all the way down to the timestamp.

On the other hand some people like the extended timestamps in the zip file.  Without those, when you unzip, the unzipped files will not have the same time/date metadata.

If you want both - you want the timestamps in there, but you also want the checksum to be consistent - then you can manually set the AccessedTime property on each Entry as it is added to the zip, prior to saving it. You can set it to a fixed time, or set it to the latest "last modified" time of any entry.   Either way it will result in a zip file that has the same checksum, each time.
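
A minimal sketch of that approach, under a few assumptions: the module and method names are hypothetical, the folder is added with AddDirectory rather than the AddSelectedFiles call used earlier in the thread, and each entry's access time is pinned to that entry's own last-modified time (a variant of the "fixed time" idea, chosen because it needs no extra bookkeeping).

```vbnet
Imports System
Imports Ionic.Zip

Module StableZipSketch
    ' Hypothetical helper: zips sourceFolder into targetZip while keeping the
    ' extended timestamps, but replaces the volatile last-accessed time.
    Sub SaveWithPinnedAccessTimes(ByVal sourceFolder As String, ByVal targetZip As String)
        Using zip As New ZipFile()
            ' Add the folder contents at the root of the archive.
            zip.AddDirectory(sourceFolder, "")

            ' Overwrite the last-accessed time on every entry before saving,
            ' so reading the source files no longer changes the zip bytes.
            For Each entry As ZipEntry In zip.Entries
                entry.AccessedTime = entry.ModifiedTime
            Next

            zip.Save(targetZip)
        End Using
    End Sub
End Module
```

With this in place the created and modified times still travel with each entry, and only the last-accessed value is replaced by something that stays the same from run to run.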

 

Nov 13, 2009 at 7:01 PM
Edited Nov 13, 2009 at 7:16 PM

Hi Cheeso,

The first scenario is what I want. I am using the checksum to compare against what I already have on the server, to see whether I need to upload or not. The directory I want has a bunch of files and I don't want to index each one by itself, so I zip the whole directory and compare the checksum of the server version with the current version. If they are not the same, I transmit; if they are the same, I abort. So I really do not want the last access time. I just want the Create/Modified time.

So after reading the documentation on the zip doc, I think I will set this flag to false. I really do not care about the time on the zip file itself, just the content of the zip only.
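
The comparison described above could look roughly like the following. This is a sketch only: the names UploadCheckSketch, Md5Hex and NeedsUpload are hypothetical, and it assumes the server-side checksum has already been obtained as a hex string by some other means.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module UploadCheckSketch
    ' Computes the MD5 of a file and returns it as an uppercase hex string.
    Function Md5Hex(ByVal fileName As String) As String
        Dim hasher As MD5 = MD5.Create()
        Using fs As FileStream = File.OpenRead(fileName)
            Return BitConverter.ToString(hasher.ComputeHash(fs)).Replace("-", "")
        End Using
    End Function

    ' Returns True when the local zip differs from what the server reports.
    ' Assumption: serverChecksum is the hex MD5 of the copy on the server.
    Function NeedsUpload(ByVal localZip As String, ByVal serverChecksum As String) As Boolean
        Dim localChecksum As String = Md5Hex(localZip)
        Return Not String.Equals(localChecksum, serverChecksum, StringComparison.OrdinalIgnoreCase)
    End Function
End Module
```

The check is only meaningful if zipping unchanged files produces identical bytes every time, which is why the timestamp setting matters here.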

 

Thank you very, very much for all of your help. This is such great work. Keep up the great job you are doing.

Coordinator
Nov 13, 2009 at 7:22 PM

Thanks for the compliments.

> I really do not want the last access time. I just want to get the Create/Modified time.

> I really do not care about the time on the zip file itself,...

To clarify, the timestamps I mentioned previously are on the entries within the zip, not on the zip file itself.

Also, if you want the create and modified times - those are 2 different timestamps - then you need the extended timestamp. To get it you must set EmitTimesInWindowsFormatWhenSaving = True, or not set it at all (True is the default). In this case, you will need to set the AccessedTime on each entry in order to keep the checksum of the generated zip file constant across multiple runs.

If you are happy with only the modified time on each entry inside the zip, then you can set EmitTimesInWindowsFormatWhenSaving to False.