Progress Bar not available for zipfile? How to give feedback when program seems to hang

I would use a progress bar, but the ‘new’ (.NET 4.5) ZipFile class from System.IO.Compression, which replaced Ionic.Zip.ZipFile, does not have a method to report progress. Is there a way around this? Should I be using a Thread? Or DoWork?

You really have two issues here:

  1. The .NET version of the ZipFile class does not include progress reporting.
  2. The CreateFromDirectory() method blocks until the entire archive has been created.

I am not that familiar with the Ionic/DotNetZip library, but browsing the docs, I don’t see any asynchronous methods for creating an archive from a directory. So #2 would be an issue regardless. The easiest way to solve it is to run the work in a background thread, e.g. using Task.Run().
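As a minimal sketch of that background-thread approach: the helper name BackgroundZip.CreateAsync is hypothetical (it is not part of .NET), while ZipFile.CreateFromDirectory and Task.Run are the real framework calls. This only keeps the caller responsive; it does nothing about progress reporting.

```csharp
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class BackgroundZip
{
    public static Task CreateAsync(string sourceDirectory, string archivePath)
    {
        // ZipFile.CreateFromDirectory blocks until the whole archive is written,
        // so run it on a thread-pool thread and hand back the Task.
        return Task.Run(() => ZipFile.CreateFromDirectory(sourceDirectory, archivePath));
    }
}
```

In a UI event handler one could then write `await BackgroundZip.CreateAsync(source, archive);` and show an indeterminate (marquee-style) progress bar while the task runs.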

As for issue #1, I would not characterize the .NET ZipFile class as having replaced the Ionic library. Yes, it’s new. But .NET already had .zip archive support in previous versions, just not a convenience class like ZipFile. And neither the earlier support for .zip archives nor ZipFile provides progress reporting “out-of-the-box”. So neither really replaces the Ionic DLL per se.

So IMHO, if you were using the Ionic DLL and it worked for you, the best solution is to just keep using it.

If you really don’t want to use it, your options are limited. The .NET ZipFile just doesn’t do what you want. There are some hacky things you could do to work around the missing feature. For writing an archive, you could estimate the compressed size, then monitor the file size as it’s being written and compute an estimated progress from that (i.e. poll the file size in a separate async task, every second or so). For extracting an archive, you could monitor the files being generated, and compute progress that way.
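A rough sketch of the file-size polling idea for the writing case follows. The FileSizePoller class, its PollAsync method and all of its parameters are hypothetical names chosen for illustration, and the estimated final size is a guess the caller has to supply, so the reported fraction is only an estimate.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: polls the size of a file that other code is writing.
static class FileSizePoller
{
    public static async Task PollAsync(string path, long estimatedFinalBytes,
        IProgress<double> progress, CancellationToken token, int intervalMs = 1000)
    {
        while (!token.IsCancellationRequested)
        {
            // A fresh FileInfo each time, so the length is not a cached value.
            var info = new FileInfo(path);
            if (info.Exists && estimatedFinalBytes > 0)
            {
                // Clamp to 1.0 because the estimate may be too small.
                progress.Report(Math.Min(1.0, (double)info.Length / estimatedFinalBytes));
            }

            try { await Task.Delay(intervalMs, token); }
            catch (OperationCanceledException) { break; }
        }
    }
}
```

The caller would start PollAsync, run ZipFile.CreateFromDirectory() in Task.Run(), and cancel the token once the archive is finished. The size reported by the file system may lag behind what has actually been written, which is one more reason the result is approximate.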

But at the end of the day, that sort of approach is far from ideal.

Another option is to use the lower-level ZipArchive class directly, writing the archive yourself explicitly and tracking the bytes as they are read from the source file. To do this, you can write a Stream implementation that wraps the real input stream and reports progress as the bytes are read.

Note: in the course of looking for existing questions related to this one, I found one that is essentially a duplicate, except that it’s asking for a VB.NET answer instead of C#. It also asked for progress updates while extracting from an archive, in addition to creating one. So I adapted my answer here for VB.NET, adding the extraction method and tweaking the implementation a little. I’ve updated the code below to incorporate those changes.

Here’s a simple example of what that Stream might look like (note the comment about this being for illustration purposes; it really would be better to delegate all the virtual methods, not just the abstract members you’re required to implement):

StreamWithProgress.cs

using System;
using System.IO;

class StreamWithProgress : Stream
{
    // NOTE: for illustration purposes. For production code, one would want to
    // override *all* of the virtual methods, delegating to the base _stream object,
    // to ensure performance optimizations in the base _stream object aren't
    // bypassed.

    private readonly Stream _stream;
    private readonly IProgress<int> _readProgress;
    private readonly IProgress<int> _writeProgress;

    public StreamWithProgress(Stream stream, IProgress<int> readProgress, IProgress<int> writeProgress)
    {
        _stream = stream;
        _readProgress = readProgress;
        _writeProgress = writeProgress;
    }

    public override bool CanRead { get { return _stream.CanRead; } }
    public override bool CanSeek {  get { return _stream.CanSeek; } }
    public override bool CanWrite {  get { return _stream.CanWrite; } }
    public override long Length {  get { return _stream.Length; } }
    public override long Position
    {
        get { return _stream.Position; }
        set { _stream.Position = value; }
    }

    public override void Flush() { _stream.Flush(); }
    public override long Seek(long offset, SeekOrigin origin) { return _stream.Seek(offset, origin); }
    public override void SetLength(long value) { _stream.SetLength(value); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead = _stream.Read(buffer, offset, count);

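        // Skip the zero-byte read that signals end of stream; reporting it adds
        // nothing and can yield NaN when the caller divides by a zero total.
        if (bytesRead > 0)
        {
            _readProgress?.Report(bytesRead);
        }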
        return bytesRead;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _stream.Write(buffer, offset, count);
        _writeProgress?.Report(count);
    }
}

With that in hand, it’s relatively simple to handle the archive creation explicitly, using that Stream to monitor the progress:

ZipFileWithProgress.cs

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class ZipFileWithProgress
{
    public static void CreateFromDirectory(string sourceDirectoryName, string destinationArchiveFileName, IProgress<double> progress)
    {
        sourceDirectoryName = Path.GetFullPath(sourceDirectoryName);

        FileInfo[] sourceFiles =
            new DirectoryInfo(sourceDirectoryName).GetFiles("*", SearchOption.AllDirectories);
        double totalBytes = sourceFiles.Sum(f => f.Length);
        long currentBytes = 0;

        using (ZipArchive archive = ZipFile.Open(destinationArchiveFileName, ZipArchiveMode.Create))
        {
            foreach (FileInfo file in sourceFiles)
            {
                // NOTE: naive method to get sub-path from file name, relative to
                // input directory. Production code should be more robust than this.
                // Either use Path class or similar to parse directory separators and
                // reconstruct output file name, or change this entire method to be
                // recursive so that it can follow the sub-directories and include them
                // in the entry name as they are processed.
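                // Strip the source directory prefix (with or without a trailing
                // separator) and use '/' in entry names, as .zip readers expect.
                string entryName = file.FullName.Substring(sourceDirectoryName.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
                    .Replace(Path.DirectorySeparatorChar, '/');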
                ZipArchiveEntry entry = archive.CreateEntry(entryName);

                entry.LastWriteTime = file.LastWriteTime;

                using (Stream inputStream = File.OpenRead(file.FullName))
                using (Stream outputStream = entry.Open())
                {
                    Stream progressStream = new StreamWithProgress(inputStream,
                        new BasicProgress<int>(i =>
                        {
                            currentBytes += i;
                            progress.Report(currentBytes / totalBytes);
                        }), null);

                    progressStream.CopyTo(outputStream);
                }
            }
        }
    }

    public static void ExtractToDirectory(string sourceArchiveFileName, string destinationDirectoryName, IProgress<double> progress)
    {
        using (ZipArchive archive = ZipFile.OpenRead(sourceArchiveFileName))
        {
            double totalBytes = archive.Entries.Sum(e => e.Length);
            long currentBytes = 0;

            foreach (ZipArchiveEntry entry in archive.Entries)
            {
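                string fileName = Path.Combine(destinationDirectoryName, entry.FullName);

                // Entries with an empty Name represent directories; create them
                // and move on, since there is no file data to copy.
                if (string.IsNullOrEmpty(entry.Name))
                {
                    Directory.CreateDirectory(fileName);
                    continue;
                }

                // NOTE: production code should also verify that fileName stays inside
                // destinationDirectoryName, since entry names may contain "..".
                Directory.CreateDirectory(Path.GetDirectoryName(fileName));
                using (Stream inputStream = entry.Open())
                // File.Create truncates an existing file; File.OpenWrite would not.
                using (Stream outputStream = File.Create(fileName))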
                {
                    Stream progressStream = new StreamWithProgress(outputStream, null,
                        new BasicProgress<int>(i =>
                        {
                            currentBytes += i;
                            progress.Report(currentBytes / totalBytes);
                        }));

                    inputStream.CopyTo(progressStream);
                }

                File.SetLastWriteTime(fileName, entry.LastWriteTime.LocalDateTime);
            }
        }
    }
}

Notes:

  • This uses a class called BasicProgress<T> (see below). I tested the code in a console program, where there is no synchronization context, so the built-in Progress<T> class uses the thread pool to execute the ProgressChanged event handlers, which in turn can lead to out-of-order progress reports. BasicProgress<T> simply calls the handler directly, avoiding that issue. In a GUI program using Progress<T>, the execution of the event handlers would be dispatched to the UI thread in order. IMHO, one should still use the synchronous BasicProgress<T> in a library, but the client code for a UI program would be fine using Progress<T> (indeed, that would probably be preferable, since it handles the cross-thread dispatching on your behalf there).
  • This tallies the sum of the file lengths before doing any work. Of course, this incurs a slight start-up cost. For some scenarios, it might be sufficient to just report total bytes processed, and let the client code worry about whether there’s a need to do that initial tally or not.

BasicProgress.cs

using System;

class BasicProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public BasicProgress(Action<T> handler)
    {
        _handler = handler;
    }

    void IProgress<T>.Report(T value)
    {
        _handler(value);
    }
}

And of course, a little program to test it all:

Program.cs

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string sourceDirectory = args[0],
            archive = args[1],
            archiveDirectory = Path.GetDirectoryName(Path.GetFullPath(archive)),
            // Unpack next to the archive, into a uniquely named directory.
            unpackDirectoryName = Path.Combine(archiveDirectory, Guid.NewGuid().ToString());

        File.Delete(archive);
        ZipFileWithProgress.CreateFromDirectory(sourceDirectory, archive,
            new BasicProgress<double>(p => Console.WriteLine($"{p:P2} archiving complete")));

        ZipFileWithProgress.ExtractToDirectory(archive, unpackDirectoryName,
            new BasicProgress<double>(p => Console.WriteLine($"{p:P0} extracting complete")));
    }
}
