Best Practices to Create and Download a huge ZIP (from several BLOBs) in a WebApp

A kick-off example of a fully dynamic ZIP file, created by streaming each BLOB from the database straight to the client's file system, with no temporary files on the server.

Tested with huge archives, with the following results:

  • Server disk space cost: 0 MegaBytes
  • Server RAM cost: ~ xx MegaBytes. I couldn't measure the memory consumption reliably (or at least I don't know how to do it properly): sampling Runtime.getRuntime().freeMemory() before, during and after the loop gave different, apparently random results across runs of the same routine (a rough probe is sketched right after this list). In any case, the consumption is lower than with byte[], and that's enough.
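
For anyone who wants to try probing it anyway, here is a minimal sketch; note that System.gc() is only a hint to the JVM, which is part of why the numbers jump around:

Runtime rt = Runtime.getRuntime();
System.gc(); // only a *request* for garbage collection, hence the noisy results
long usedBefore = rt.totalMemory() - rt.freeMemory(); // used heap before the loop

// ... run the streaming loop here ...

long usedAfter = rt.totalMemory() - rt.freeMemory();  // used heap after the loop
System.out.printf("Approx. heap delta: %d KB%n", (usedAfter - usedBefore) / 1024);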

FileStreamDto.java using InputStream instead of byte[]

import java.io.InputStream;
import java.io.Serializable;
import lombok.Getter; import lombok.Setter; // Lombok generates the getters/setters

public class FileStreamDto implements Serializable {
    @Getter @Setter private String filename;
    @Getter @Setter private InputStream inputStream;
}

Java Servlet (or Struts2 Action)

/* Read the amount of data to be streamed from database to file system,
   summing the size of all the BLOBs (Oracle's BLOB, PostgreSQL's BYTEA, etc.):
   SELECT sum(length(my_blob_field)) FROM my_table WHERE my_conditions
*/          
Long overallSize = getMyService().precalculateZipSize();

// Tell the browser it's a ZIP
response.setContentType("application/zip"); 
// Tell the browser the filename, and that it needs to be downloaded instead of opened
response.addHeader("Content-Disposition", "attachment; filename=\"myArchive.zip\"");        
// Tell the browser the overall size, so it can show a realistic progress bar.
// Beware: this is only an estimate. The real ZIP is a bit larger than the sum of the
// BLOB sizes (entry headers + central directory), and smaller if entries compress well;
// if your container or client enforces Content-Length strictly, omit this header.
response.setHeader("Content-Length", String.valueOf(overallSize));      

ServletOutputStream sos = response.getOutputStream();       
ZipOutputStream zos = new ZipOutputStream(sos);

// Set up a set of filenames to prevent duplicate entries
HashSet<String> entries = new HashSet<String>();

/* Read all the IDs of the records of interest, to query them later for the streams:
   SELECT my_id FROM my_table WHERE my_conditions */           
List<Long> allId = getMyService().loadAllId();

for (Long currentId : allId){
    /* Load the record for the current ID:         
       SELECT my_filename, my_blob_field FROM my_table WHERE my_id = :currentId            
       Use resultSet.getBinaryStream("my_blob_field") while mapping the BLOB column
       (a JDBC sketch of this method follows the servlet code below) */
    FileStreamDto fileStream = getMyService().loadFileStream(currentId);

    // Create a zipEntry with a non-duplicate filename, and add it to the ZipOutputStream
    ZipEntry zipEntry = new ZipEntry(getUniqueFileName(entries, fileStream.getFilename()));
    zos.putNextEntry(zipEntry);

    // Use Apache Commons IO to pipe the InputStream from the DB into the servlet's
    // OutputStream: at this moment, your file is ALREADY being downloaded and growing.
    // For entries larger than 2 GB, IOUtils.copyLarge() is safer (it returns a long count).
    IOUtils.copy(fileStream.getInputStream(), zos);

    zos.flush();
    zos.closeEntry();

    // Close the DB stream (production code should do this in a finally block,
    // or with try-with-resources on Java 7+)
    fileStream.getInputStream().close();                    
}

zos.close();
sos.close();    
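
For completeness, here is what loadFileStream() might look like in plain JDBC — a minimal sketch, assuming the hypothetical table and column names used in the SQL comments above, and glossing over connection/statement lifecycle (the ResultSet must stay open until IOUtils.copy() has fully consumed the stream):

public FileStreamDto loadFileStream(Long currentId) throws SQLException {
    // 'connection' is assumed to be an open java.sql.Connection
    PreparedStatement ps = connection.prepareStatement(
            "SELECT my_filename, my_blob_field FROM my_table WHERE my_id = ?");
    ps.setLong(1, currentId);
    ResultSet rs = ps.executeQuery();
    rs.next();

    FileStreamDto fileStream = new FileStreamDto();
    fileStream.setFilename(rs.getString("my_filename"));
    // getBinaryStream() is the key: no byte[] is ever materialized in memory
    fileStream.setInputStream(rs.getBinaryStream("my_blob_field"));
    return fileStream;
}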

Helper method for handling duplicate entries

private String getUniqueFileName(HashSet<String> entries, String completeFileName) {
    if (entries.contains(completeFileName)) {
        int extPos = completeFileName.lastIndexOf('.');
        String extension = extPos > 0 ? completeFileName.substring(extPos) : "";
        String partialFileName = extension.length() == 0 ? completeFileName : completeFileName.substring(0, extPos);
        int x = 1;
        while (entries.contains(completeFileName = partialFileName + "(" + x + ")" + extension)) {
            x++;
        }
    }
    entries.add(completeFileName);
    return completeFileName;
}
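
Fed the same name repeatedly, it produces progressively suffixed entries:

HashSet<String> entries = new HashSet<String>();
getUniqueFileName(entries, "report.pdf"); // -> "report.pdf"
getUniqueFileName(entries, "report.pdf"); // -> "report(1).pdf"
getUniqueFileName(entries, "report.pdf"); // -> "report(2).pdf"
getUniqueFileName(entries, "notes");      // -> "notes"
getUniqueFileName(entries, "notes");      // -> "notes(1)" (no extension: suffix at the end)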

Thanks a lot to @prunge for giving me the idea of direct streaming.
