You can solve this a bit more generally with less code. Essentially, create enough of a file-like object for ZipFile to use. So you wind up with z = ZipFile(HttpFile(url))
and it dynamically downloads just the portion needed. The advantage with this is you write less code, and it applies to more than just zip files. (In fact, I wonder if there is something like this already… I’m not finding it though.)
Using the same idea, you could also create a caching wrapper for HttpFile to avoid repeated downloads.
And here’s the code: (note the lack of error-handling)
#!/usr/bin/python
import urllib2
class HttpFile(object):
def __init__(self, url):
self.url = url
self.offset = 0
self._size = -1
def size(self):
if self._size < 0:
f = urllib2.urlopen(self.url)
self._size = int(f.headers["Content-length"])
return self._size
def read(self, count=-1):
req = urllib2.Request(self.url)
if count < 0:
end = self.size() - 1
else:
end = self.offset + count - 1
req.headers['Range'] = "bytes=%s-%s" % (self.offset, end)
f = urllib2.urlopen(req)
data = f.read()
# FIXME: should check that we got the range expected, etc.
chunk = len(data)
if count >= 0:
assert chunk == count
self.offset += chunk
return data
def seek(self, offset, whence=0):
if whence == 0:
self.offset = offset
elif whence == 1:
self.offset += offset
elif whence == 2:
self.offset = self.size() + offset
else:
raise Exception("Invalid whence")
def tell(self):
return self.offset