Is there a library for retrieving a file from a remote zip? [closed]

You can solve this a bit more generally with less code. Essentially, create enough of a file-like object for ZipFile to use. So you wind up with z = ZipFile(HttpFile(url)) and it dynamically downloads just the portion needed. The advantage with this is you write less code, and it applies to more than just zip files. (In fact, I wonder if there is something like this already… I’m not finding it though.)

Using the same idea, you could also create a caching wrapper for HttpFile to avoid repeated downloads.

And here’s the code: (note the lack of error-handling)

#!/usr/bin/python
import urllib2

class HttpFile(object):
    def __init__(self, url):
        self.url = url
        self.offset = 0
        self._size = -1

    def size(self):
        if self._size < 0:
            f = urllib2.urlopen(self.url)
            self._size = int(f.headers["Content-length"])
        return self._size

    def read(self, count=-1):
        req = urllib2.Request(self.url)
        if count < 0:
            end = self.size() - 1
        else:
            end = self.offset + count - 1
        req.headers['Range'] = "bytes=%s-%s" % (self.offset, end)
        f = urllib2.urlopen(req)
        data = f.read()
        # FIXME: should check that we got the range expected, etc.
        chunk = len(data)
        if count >= 0:
            assert chunk == count
        self.offset += chunk
        return data

    def seek(self, offset, whence=0):
        if whence == 0:
            self.offset = offset
        elif whence == 1:
            self.offset += offset
        elif whence == 2:
            self.offset = self.size() + offset
        else:
            raise Exception("Invalid whence")

    def tell(self):
        return self.offset

Leave a Comment