You can use the HTTP Range header to fetch just part of a file (already covered for Python here).
Just start several threads, fetch a different range with each, and you're done 😉
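import threading
import urllib2  # Python 2; on Python 3 use urllib.request instead

url = 'http://www.python.org/'
chunk_size = 1024  # bytes per request; an arbitrary example value
parts = {}  # maps start offset -> data fetched for that chunk

def download(url, start):
    req = urllib2.Request(url)
    # Range bounds are inclusive, so the last byte is start + chunk_size - 1
    req.headers['Range'] = 'bytes=%s-%s' % (start, start + chunk_size - 1)
    f = urllib2.urlopen(req)
    parts[start] = f.read()

threads = []
# Initialize threads
for i in range(0, 10):
    t = threading.Thread(target=download, args=(url, i * chunk_size))
    t.start()
    threads.append(t)
# Join threads back (order doesn't matter, you just want them all)
for t in threads:
    t.join()
# Sort parts and you're done
result = "".join(parts[i] for i in sorted(parts.keys()))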
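The loop above always requests ten chunks, whatever the size of the file. If you know the total size (for example from the Content-Length header of a HEAD request, which is an assumption about your server), you could compute the ranges up front. A minimal sketch; `split_ranges` is a hypothetical helper name, not part of any library:

```python
def split_ranges(total_size, chunk_size):
    """Return inclusive (start, end) byte ranges covering total_size bytes."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        # HTTP Range bounds are inclusive, and the last chunk may be shorter
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
    return ranges

print(split_ranges(2500, 1024))
# → [(0, 1023), (1024, 2047), (2048, 2499)]
```

Each tuple can then go straight into a 'bytes=%d-%d' Range header, one per thread.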
Also note that not every server supports the Range header (in particular, servers where a PHP script is responsible for serving the data often don't implement it).
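One way to guard against that is to check the response before trusting it: a server that honours the header normally answers 206 Partial Content with a Content-Range header, while one that ignores it usually sends the whole body with 200. A rough sketch in Python 3 (`urllib.request`); `range_was_honoured` and `fetch_range` are hypothetical names used only for illustration:

```python
import urllib.request

def range_was_honoured(status, headers):
    """Return True if the response looks like a partial-content reply."""
    # 206 plus a Content-Range header means the server applied our Range header
    return status == 206 and headers.get("Content-Range") is not None

def fetch_range(url, start, end, timeout=10):
    """Fetch bytes start..end (inclusive); raise if the server ignored Range."""
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if not range_was_honoured(resp.status, resp.headers):
            raise RuntimeError("server ignored the Range header")
        return resp.read()
```

If the check fails, falling back to a single ordinary download is probably the simplest option.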