Compare Files

Feb 17, 2009 at 9:03 PM
Cheeso, thanks so much for your time with all these posts.
If you or anyone else has some VB examples to help me with comparisons, I would appreciate it.
I want to compare the contents of the zip file (which contains directories and files) with the directory from which it was created. Then based on file modified date, make additions/subtractions based on that date comparison.
Being the rookie I am, I could use a push on how to accomplish the comparison.
Coordinator
Feb 17, 2009 at 11:44 PM
Hey BFJ,
you are building a backup tool?

If so, I'd recommend doing it in three passes:

First pass: go through the zip file and enumerate each entry (this is a For Each loop on the ZipFile instance). For each entry, check System.IO.File.Exists(pathname), where pathname is the entry.FileName prefixed with your desired extract directory on disk. If c:\temp is the desired extract directory, and the FileName on the entry is "documents\WeatherReport.doc", then the combined path is "c:\temp\documents\WeatherReport.doc". You can use System.IO.Path.Combine() to combine those parts to get the full pathname.

With that call to File.Exists, you are checking whether the file that is present in the zip archive is also present on the disk. If the file no longer exists on the disk, then maybe you want to remove it from the ZipFile. Mark this entry for deletion. (You cannot delete it yet, not while you are still inside the enumeration.)

If the file does exist on disk, then create a System.IO.FileInfo on the file, and from that FileInfo object grab the LastWriteTime. Compare the LastWriteTime on the FileInfo object against the LastModified time on the ZipEntry. If the one on disk is later, then I guess you want to mark that file to be added to the zip again, because the disk file has been updated more recently than the copy in the zip archive.

Pass 2:  enumerate all files on disk.  For any files on disk that are not present in the ZipFile, you will want to mark them to be added to the archive.

Pass 3: go through the list of things to be added or deleted, and perform that action.
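Here is a rough sketch of passes 1 and 2, in case it helps. It only fills two lists and changes nothing in the archive. PlanChanges, its parameters and the namesInZip set are names I made up for illustration; the sketch assumes the DotNetZip assembly (Ionic.Zip) is referenced and that you are on .NET 3.5 or later for HashSet. Treat it as a starting point, not a finished tool.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Ionic.Zip   ' DotNetZip (assumed to be referenced by the project)

Module BackupPlanner

    ' Hypothetical helper: fills the two lists, touches nothing in the archive.
    Sub PlanChanges(ByVal zipPath As String, ByVal rootDir As String, _
                    ByVal entriesToBeAdded As List(Of String), _
                    ByVal entriesToBeDeleted As List(Of String))

        ' Names of the file entries found in the archive, kept for pass 2.
        Dim namesInZip As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

        Using archive As ZipFile = ZipFile.Read(zipPath)
            ' Pass 1: walk the archive and compare each entry with the disk.
            For Each entry As ZipEntry In archive
                If entry.IsDirectory Then Continue For

                ' Normalize separators so lookups work with either slash style.
                namesInZip.Add(entry.FileName.Replace("\"c, "/"c))

                Dim diskPath As String = Path.Combine(rootDir, _
                    entry.FileName.Replace("/"c, Path.DirectorySeparatorChar))

                If Not File.Exists(diskPath) Then
                    ' Gone from disk: mark the entry for deletion.
                    entriesToBeDeleted.Add(entry.FileName)
                ElseIf File.GetLastWriteTime(diskPath) > entry.LastModified Then
                    ' Newer on disk: mark the file to be added again.
                    ' Caveat: timestamps stored in a zip can be coarser than those
                    ' on disk, so a small tolerance may be needed here.
                    entriesToBeAdded.Add(diskPath)
                End If
            Next
        End Using

        ' Pass 2: walk the disk and look for files the archive does not have.
        For Each diskPath As String In Directory.GetFiles(rootDir, "*", SearchOption.AllDirectories)
            Dim relative As String = diskPath.Substring(rootDir.Length) _
                .TrimStart("\"c, "/"c).Replace("\"c, "/"c)
            If Not namesInZip.Contains(relative) Then entriesToBeAdded.Add(diskPath)
        Next
    End Sub

End Module
```

The deletion list holds entry names and the addition list holds full disk paths, so each list can be handed directly to the call that needs it in pass 3.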

----

There are lots of ways to "mark an entry for deletion" as I said above, but the simplest is to just keep a list of filenames to be added or deleted. By this I mean a System.Collections.Generic.List(Of String). You will want one list for entries to be added (maybe it is called EntriesToBeAdded) and another list to hold the entries to be deleted.

To remove an entry, just set the indexer to Nothing, or you can call ZipFile.RemoveEntry(entryName).  To add an entry, just call ZipFile.AddFile() or ZipFile.AddDirectory().   
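As a sketch of pass 3, something like the following could apply the two lists. ApplyChanges and its parameters are hypothetical names. I am assuming the deletion list holds entry names, the addition list holds full disk paths under rootDir, and the DotNetZip assembly (Ionic.Zip) is referenced. Because AddFile() complains when an entry of the same name is already present, the sketch removes the stale entry first.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Ionic.Zip   ' DotNetZip (assumed to be referenced by the project)

Module BackupApplier

    ' Hypothetical helper: applies the two lists to the archive and saves it.
    Sub ApplyChanges(ByVal zipPath As String, ByVal rootDir As String, _
                     ByVal entriesToBeAdded As List(Of String), _
                     ByVal entriesToBeDeleted As List(Of String))

        ' Drop any trailing separator so the Substring arithmetic below is safe.
        Dim root As String = rootDir.TrimEnd("\"c, "/"c)

        Using archive As ZipFile = ZipFile.Read(zipPath)

            ' Deletions: look each name up first, then remove it by name.
            For Each name As String In entriesToBeDeleted
                If archive(name) IsNot Nothing Then archive.RemoveEntry(name)
            Next

            ' Additions: keep the folder structure relative to the root directory.
            For Each diskPath As String In entriesToBeAdded
                Dim folderInZip As String = Path.GetDirectoryName(diskPath) _
                    .Substring(root.Length).TrimStart("\"c, "/"c).Replace("\"c, "/"c)
                Dim entryName As String = Path.GetFileName(diskPath)
                If folderInZip <> "" Then entryName = folderInZip & "/" & entryName

                ' Replace a stale copy rather than adding a duplicate name.
                If archive(entryName) IsNot Nothing Then archive.RemoveEntry(entryName)
                archive.AddFile(diskPath, folderInZip)
            Next

            ' Nothing is written to disk until Save() is called.
            archive.Save()
        End Using
    End Sub

End Module
```

All the changes happen on one ZipFile instance inside one Using block, so there is a single open handle on the file and a single Save() at the end.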

----
Last thing,  be careful about overwriting old backups.  Let's say you have a backup created the way I described above, dating from Monday.  Then on Wednesday, you delete a file that you really want, though you do not realize it at the time.  On Friday, you make another backup.  Because the file was removed from the disk on Wednesday, according to the logic I outlined above, you would remove the file from the "backup" zip archive.  Then suppose on Sunday you realize that you actually want the file that you had inadvertently removed on Wednesday.  Whoops!  Your backup from Friday has erased the file from the backup zip.  On the other hand if you had a different backup zip for every day of the week, then you'd be protected.

The point is, the logic in your backup tool has to sort of mesh with your backup management practice.
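For example, one way to get "a different backup zip for every day of the week" is to derive the file name from the day. This is only a sketch; BackupFileNameFor and the "backup-" prefix are made-up names, and a seven-file rotation is just one possible policy.

```vbnet
Imports System

Module BackupNaming

    ' Hypothetical helper: one archive per weekday, so a file deleted on
    ' Wednesday is still present in Monday's and Tuesday's archives.
    Function BackupFileNameFor(ByVal day As DateTime) As String
        Return "backup-" & day.DayOfWeek.ToString() & ".zip"
    End Function

End Module
```

Calling BackupFileNameFor(DateTime.Today) gives the name of the archive to update today; each file is then overwritten only once a week.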

Feb 18, 2009 at 12:52 PM
Thanks, I'll use these ideas to tweak the code.
Feb 18, 2009 at 8:56 PM
I get an argument exception saying the filename, in this case "Aaron.jpg", does not exist in the zip file. But it does exist in the file...
In the code below, MarkFileRemove holds 5 items: 3 files and 2 directories. The very first entry to be removed is Aaron.jpg.
2 Questions-
1-What am I overlooking?
2-The save file at the end won't allow me to save to the original file, says in use by another process, so I have to save it to another file name. Why?



 

Dim sw As New System.IO.StringWriter
Dim PathName As String

'pass 1, mark entries for removal from zip if they no longer exist
Dim MarkFileRemove As New System.Collections.Generic.List(Of Zip.ZipEntry)
Dim FileEntryRemove As Zip.ZipEntry

For Each FileEntryRemove In Zip.ZipFile.Read(OriginalFile, sw)
    PathName = System.IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.MyPictures, FileEntryRemove.FileName)
    If Not System.IO.File.Exists(PathName) Then
        MarkFileRemove.Add(FileEntryRemove)
    End If
Next

'pass 2, remove entries from pass 1
Dim RemoveFile As Zip.ZipEntry

'(I put a breakpoint on the next line; at this break I get my info)
For Each RemoveFile In MarkFileRemove
    zipArchive.RemoveEntry(RemoveFile)
Next

'save file
zipArchive.Save(cDrive & "jeffUpdate.zip")

 

Coordinator
Feb 19, 2009 at 2:06 AM

Ok, I see a couple rough spots.
First, it looks like you are reading the zipfile twice. You have a variable, "zipArchive", which I guess you get from a ZipFile.Read() method call. Until that variable is "Disposed", the zipfile is open and readable. If you are learning .NET, you need to know about the Dispose() method and the Using construct. The name of the method indicates its purpose: the Dispose() method disposes of, or discards, the resources associated with the object (instance) in question. Not all objects in .NET have "resources" that need disposing, but the ZipFile is one that does. It holds filehandles which need to be released when an application is finished using the ZipFile. The Using construct is just a convenient way of using a "disposable" object and calling Dispose() on it when finished.

So, the following code:

Using Foo = GetAnObject()
  ...do stuff here with Foo ... 
End Using

...allows you to do stuff with an object, and then call Dispose() on it implicitly. This is the model that you should follow for using a ZipFile object. Wrap it in a Using clause, always. Of course there are exceptions to this rule, but in your app, it seems like you ought to be using a Using clause. Maybe you are and I just didn't see it.
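If it helps to see the implicit Dispose() call, here is a small self-contained sketch. TrackedResource is a made-up class that only records whether Dispose() ran; it stands in for any disposable object, such as a ZipFile. The second function shows roughly what Using expands to, a Try/Finally that calls Dispose().

```vbnet
Imports System

' Hypothetical stand-in for a disposable object such as a ZipFile.
Class TrackedResource
    Implements IDisposable

    Public IsDisposed As Boolean = False

    Public Sub Dispose() Implements IDisposable.Dispose
        ' A real resource would release its file handles here.
        IsDisposed = True
    End Sub
End Class

Module UsingDemo

    ' Using calls Dispose() for you when the block exits, even on an exception.
    Function DisposedAfterUsing() As Boolean
        Dim seen As TrackedResource = Nothing
        Using res As New TrackedResource()
            seen = res   ' ...do stuff here with res...
        End Using
        Return seen.IsDisposed
    End Function

    ' Roughly the same thing written out by hand.
    Function DisposedAfterTryFinally() As Boolean
        Dim res As New TrackedResource()
        Try
            ' ...do stuff here with res...
        Finally
            res.Dispose()
        End Try
        Return res.IsDisposed
    End Function

End Module
```

Both functions return True, because in both cases Dispose() has run by the time the block is left.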

Ok, next thing. Before the zipArchive variable has been Disposed - that is to say, while it is still usable and the filehandle is still open - you do a ZipFile.Read() again, to enumerate the entries in the zipfile. This is probably wrong. What you have done there is open the file again. You now have 2 ZipFile objects, each with a distinct filehandle, open on the same file. You may not be clear on this, because you call ZipFile.Read() and don't assign the result to a variable of type ZipFile.  But even though you have not assigned it, the ZipFile object is being instantiated.  (For Each FileEntryRemove In Zip.ZipFile.Read(OriginalFile))     This ZipFile.Read() is not wrapped in a Using clause, so the ZipFile object that gets created, does not get Disposed. The filehandle remains open and active. Later when you try to save the original zip file by calling zipArchive.Save(), it fails, because of this 2nd ZipFile that you have implicitly created and not Disposed. I think what you really want to do there instead of reading the file again, is just enumerate the entries in the file you have already read. In other words, enumerate the entries in the zipArchive object itself. This is easy enough to do.

The last thing has to do with object identity. You get a message saying "Aaron.jpg" does not exist in the zip file. The message is probably misleading. The thing is, the entry you are trying to remove from zipArchive actually belongs to a different instance of ZipFile, the one you implicitly opened (but did not assign to a variable) as I described above. You can only remove an entry from the ZipFile instance that it belongs to. Even though the ZipEntry object refers to the same stuff inside the zip file, if you got the ZipEntry from a different instance of ZipFile, then it is not the same ZipEntry. I hope you can see what I mean. In any case, this object identity problem goes away if you don't read the file twice, as I suggested above.
 
Here is some code that may work better for you.

	Using zipArchive As ZipFile = ZipFile.Read(OriginalFile)

	    'pass 1, enumerate entries in the zipArchive, mark them for removal from zip if they no longer exist
	    Dim MarkFileRemove As New System.Collections.Generic.List(Of Zip.ZipEntry)

	    'enumerate entries in the zipArchive
	    For Each Candidate As ZipEntry In zipArchive
	        Dim PathName As String = System.IO.Path.Combine(My.Computer.FileSystem.SpecialDirectories.MyPictures, Candidate.FileName)
	        If Not System.IO.File.Exists(PathName) Then
	            MarkFileRemove.Add(Candidate)
	        End If
	    Next

	    'pass 2, remove entries marked in pass 1
	    Dim RemoveFile As Zip.ZipEntry
	    For Each RemoveFile In MarkFileRemove
	        zipArchive.RemoveEntry(RemoveFile)
	    Next

	    'save file
	    zipArchive.Save()

	End Using