Could the zip.UpdateFiles function be overloaded to accept a DataTable?

Jan 14, 2010 at 1:56 PM

I have included your DotNetZip library (version 1.8) in a VB.NET 2008 (WinForms) program.

I run this program daily at 12:01 AM, and it zips thousands of files into a daily zip file (named in YYYY-MM-DD.zip format).

Last night it zipped 8,319 files. It took 23 seconds to load the file names into a DataTable, whereas it took 9 minutes 35 seconds to load those same file names into a List(Of T) structure that was passed to the UpdateFiles function as follows: zip.UpdateFiles(FileList.ToArray, "").

As the number of files goes up, loading the List(Of T) structure takes disproportionately longer; for example, 11,000 files take approximately 15 minutes.

If I could simply pass a single column DataTable into the UpdateFiles function, I think you can see how much time would be saved.

Maybe you know of a faster method that I'm just not aware of?  Thanks in advance for any advice you might be able to give.

Coordinator
Jan 14, 2010 at 6:35 PM

Hmm, let's see.  Are you saying that it takes your code over 9 minutes to create a List<String>?   You're not saying that it takes a lot of time for DotNetZip to do something.  You're saying that it takes a lot of time for your own code to prepare a data structure that will eventually be used in a call to DotNetZip.  Is that right?

If so, that seems really slow; something is definitely amiss in your code. I just tried it, and on my machine it takes less than 0.06s to create a list of 9000 randomly generated strings. In other words, your code that builds a list is running roughly 10,000 times slower than mine.

Are you, by chance, querying the filesystem for each file?  Querying the database (you mentioned a datatable)?  Look for something unusual that is causing the delay.  Use a finer-grained analysis to figure out where in your code the 9 minutes is coming from.
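
One way to do that finer-grained analysis is to wrap each stage of the job in its own stopwatch. The following is only a minimal sketch: the PhaseTimer class and the two phases shown are hypothetical stand-ins, not code from the program under discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class PhaseTimer
{
    // Runs one phase, prints how long it took, and returns the elapsed seconds.
    public static double Time(string label, Action phase)
    {
        var timer = Stopwatch.StartNew();
        phase();
        timer.Stop();
        Console.WriteLine("{0,-20} {1:N4}s", label, timer.Elapsed.TotalSeconds);
        return timer.Elapsed.TotalSeconds;
    }
}

static class Program
{
    static void Main()
    {
        var names = new List<string>();

        // Hypothetical phases; replace each lambda with one real stage of the job
        // (reading the directory, filling the DataTable, sorting, building the list).
        PhaseTimer.Time("build list", () =>
        {
            for (int i = 0; i < 9000; i++) names.Add("file" + i + ".txt");
        });
        PhaseTimer.Time("convert to array", () => names.ToArray());
    }
}
```

Whichever phase reports minutes rather than milliseconds is the one worth splitting into smaller timed pieces.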

 

Jan 14, 2010 at 7:05 PM

If yours goes that fast, there must certainly be something wrong in my code. 

I take the file names from a directory and, based on each file's last-modified date, assign each one a zip file name, which I load into a DataTable. The columns of my DataTable are: ZipFileName, FileName, LastModifyDate. I sort my DataTable in the order of those 3 columns.

I loop through the DataTable to build my List<String> structure and call the DotNetZip code when the ZipFileName changes.
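
For reference, a loop of this kind can be written so that each row is visited exactly once. The following is a minimal C# sketch, not the code from the original program: the ZipBatcher class, the ForEachBatch method, and the flush callback are hypothetical names, and it assumes the rows are already in ZipFileName order when enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class ZipBatcher
{
    // Walks rows that are already ordered by ZipFileName and hands each
    // archive's file names to 'flush' as one batch. Every row is read once.
    public static void ForEachBatch(DataTable sorted, Action<string, string[]> flush)
    {
        string currentZip = null;
        var batch = new List<string>();

        foreach (DataRow row in sorted.Rows)
        {
            var zipName = (string)row["ZipFileName"];
            if (currentZip != null && zipName != currentZip)
            {
                // The archive name changed: emit the finished group and start a new one.
                flush(currentZip, batch.ToArray());
                batch.Clear();
            }
            currentZip = zipName;
            batch.Add((string)row["FileName"]);
        }

        // Emit the final group, which is not followed by a name change.
        if (batch.Count > 0) flush(currentZip, batch.ToArray());
    }
}

static class Program
{
    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("ZipFileName", typeof(string));
        table.Columns.Add("FileName", typeof(string));
        table.Rows.Add("2010-01-13.zip", "a.log");
        table.Rows.Add("2010-01-13.zip", "b.log");
        table.Rows.Add("2010-01-14.zip", "c.log");

        // In the real program the callback would be where zip.UpdateFiles(files, "") is called.
        ZipBatcher.ForEachBatch(table, (zip, files) =>
            Console.WriteLine("{0}: {1} file(s)", zip, files.Length));
    }
}
```

Written this way, the work per row is constant, so the total time should grow in step with the number of files. If a loop like this still takes minutes, the cost is likely coming from something done per row, such as a filesystem or table lookup.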

The DotNetZip code itself is working nicely. It zipped the 8,319 files in 2 minutes 21 seconds.

I'll have to dig deeper into my code to see what could be causing the delay. 

Would you mind posting your snippet of code for generating the 9000 strings?  I would like to run that on my end, just to see how long it takes.

Thanks, Dave

Coordinator
Jan 14, 2010 at 7:54 PM

No problem Dave.

Here's the code I used to measure the performance of List<String>:

http://cheeso.members.winisp.net/srcview.aspx?file=ListTest.cs

The snip that's interesting:

// Time an operation. This method accepts a single argument.
// The argument is a Func<> (a delegate) that accepts NO
// arguments, and returns a List<String>.  This method starts a
// stopwatch, invokes the delegate, then stops the stopwatch.
private double TimedOperation(int trial, Func<List<String>> a)
{
    var timer= new System.Diagnostics.Stopwatch();
    timer.Start();
    a();
    timer.Stop();
    System.Console.WriteLine("  trial {0}: elapsed time: {1:N4}s", trial, timer.Elapsed.TotalSeconds);
    return timer.Elapsed.TotalSeconds;
}


public List<String> FillList()
{
    var list = new List<String>();
    for (int i=0; i<_numEntries;  i++)
    {
        // pick a length, between 17 and 80 inclusive
        int length = _rnd.Next(64) + 17;
        // generate a string of that length
        var s = GenerateRandomAsciiString(length);
        // add that string to the list
        list.Add(s);
    }

    return list;
}

public void Run()
{
    System.Console.WriteLine("Testing the filling of List<String> with {0} entries ({1} cycles)",
                             _numEntries, _nCycles);

    double totalTime = 0.0;
    Console.WriteLine();
    Console.WriteLine(new String('=',55));
    for (int i=0; i<_nCycles+1; i++)
    {
        totalTime += TimedOperation(i, FillList);
        // throw out the first trial.
        if (i==0) totalTime = 0.0;
    }
    Console.WriteLine(new String('=',55));
    Console.WriteLine("  avg (excluding 0th trial): {0,7:F4}s", totalTime / _nCycles);
}


Coordinator
Jan 14, 2010 at 8:10 PM

Whoops - here it is in a VB Console application

http://cheeso.members.winisp.net/srcview.aspx?file=ListTest.VB.vb

the interesting parts:

Public Function FillList() As List(Of String)
    Dim list As New List(Of String)
    Dim i As Integer
    For i = 0 To Me._numEntries - 1
        Dim length As Integer = (Me._rnd.Next(&H40) + &H11)
        Dim item As String = Me.GenerateRandomAsciiString(length)
        list.Add(item)
    Next i
    Return list
End Function

Private Function TimedOperation(ByVal trial As Integer, ByVal a As Func(Of List(Of String))) As Double
    Dim stopwatch As New Stopwatch
    stopwatch.Start
    a.Invoke
    stopwatch.Stop
    Console.WriteLine("  trial {0}: elapsed time: {1:N4}s", trial, stopwatch.Elapsed.TotalSeconds)
    Return stopwatch.Elapsed.TotalSeconds
End Function

Public Sub Run()
    Console.WriteLine("Testing the filling of List<String> with {0} entries ({1} cycles)", Me._numEntries, Me._nCycles)
    Dim num As Double = 0
    Console.WriteLine
    Console.WriteLine(New String("="c, &H37))
    Dim i As Integer
    For i = 0 To (Me._nCycles + 1) - 1
        num = (num + Me.TimedOperation(i, New Func(Of List(Of String))(AddressOf Me.FillList)))
        If (i = 0) Then
            num = 0
        End If
    Next i
    Console.WriteLine(New String("="c, &H37))
    Console.WriteLine("  avg (excluding 0th trial): {0,7:F4}s", (num / CDbl(Me._nCycles)))
End Sub

 

Jan 14, 2010 at 8:34 PM

Thanks for including the VB.  I'll let you know what I find out.

Jan 18, 2010 at 5:29 PM

This was definitely a coding error on my part. Thanks for all your help in pointing me in the right direction!

Coordinator
Jan 18, 2010 at 7:32 PM

You're welcome. Glad you figured it out.