Parallel.ForEach memory usage keeps growing

The Parallel.ForEach method is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O bound workload, and so the Parallel.ForEach is not ideal for this case because it needlessly blocks ThreadPool threads. The correct way to do it is asynchronously, with async/await. The recommended class for making asynchronous web requests is the HttpClient, and for controlling the level of concurrency an excellent option is the TPL Dataflow library. For this case it is enough to use the simplest component of this library, the ActionBlock class:

async Task DownloadListAsync(List<string> list)
{
    using (var httpClient = new HttpClient())
    {
        var rest = ExcludeDownloaded(list);
        var block = new ActionBlock<string>(async link =>
        {
            await DownloadFileAsync(httpClient, link);
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 10
        });
        foreach (var link in rest)
        {
            await block.SendAsync(link);
        }
        block.Complete();
        await block.Completion;
    }
}

async Task DownloadFileAsync(HttpClient httpClient, string link)
{
    var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
    var filePath = Path.Combine(SavePath, fileName);
    if (File.Exists(filePath)) return;
    var response = await httpClient.GetAsync(link);
    response.EnsureSuccessStatusCode();
    using (var contentStream = await response.Content.ReadAsStreamAsync())
    using (var fileStream = new FileStream(filePath, FileMode.Create,
        FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
    {
        await contentStream.CopyToAsync(fileStream);
    }
}

The code for downloading a file with HttpClient is not as simple as the WebClient.DownloadFile(), but it’s what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).


Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.


.NET 6 update: The preferable way for parallelizing asynchronous work is now the Parallel.ForEachAsync API. A usage example can be found here.

Leave a Comment