urllib2 and json

Messa’s answer only works if the server isn’t bothering to check the Content-Type header. You’ll need to specify a Content-Type header if you want it to really work. Here’s Messa’s answer modified to include one:

    import json
    import urllib2

    data = json.dumps([1, 2, 3])
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    f = urllib2.urlopen(req)

… Read more
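
For completeness, here is a minimal end-to-end sketch of the same idea; the endpoint URL is hypothetical, and it assumes the server replies with JSON:

    import json
    import urllib2

    url = 'http://example.com/api'  # hypothetical endpoint; substitute your own
    data = json.dumps([1, 2, 3])
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    f = urllib2.urlopen(req)
    result = json.loads(f.read())   # assumes the server sends JSON back
    f.close()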

How to resolve URLError:

The error code 10060 is a connection timeout: it means it cannot connect to the remote peer. It might be a network problem or, more often, a settings issue on your side, such as a proxy setting. You could try to connect to the same host with other tools (such as ncat) and/or from another PC within your same local network to find out where … Read more
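
As a rough illustration of the proxy angle, here is a sketch that points urllib2 at an explicit proxy and retries with a timeout; the proxy address is made up and should be replaced with your network's actual proxy:

    import urllib2

    # Hypothetical proxy address; replace with your network's proxy.
    proxy = urllib2.ProxyHandler({'http': 'http://proxy.example.com:8080'})
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)

    try:
        print urllib2.urlopen('http://www.google.com', timeout=10).read(100)
    except urllib2.URLError as e:
        print 'Still failing:', e.reason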

how to follow meta refreshes in Python

Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication):

    import BeautifulSoup
    import httplib2

    def meta_redirect(content):
        soup = BeautifulSoup.BeautifulSoup(content)
        result = soup.find("meta", attrs={"http-equiv": "Refresh"})
        if result:
            wait, text = result["content"].split(";")
            if text.strip().lower().startswith("url="):
                url = text.strip()[4:]
                return url
        return None

    def get_content(url, key, cert):
        h = httplib2.Http(".cache")
        h.add_certificate(key, cert, "")
        resp, content = h.request(url, "GET")
        # follow the chain of redirects
        while meta_redirect(content):
            resp, content = h.request(meta_redirect(content), "GET")
        return … Read more
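
For reference, a quick sanity check of the meta_redirect() helper on a small HTML fragment; the URL in the fragment is made up, and it assumes the function above is in scope:

    # assumes meta_redirect() from the snippet above has been defined
    html = '<html><head><meta http-equiv="Refresh" content="0; url=http://example.com/next"></head></html>'
    print meta_redirect(html)                           # -> http://example.com/next
    print meta_redirect('<html><head></head></html>')   # -> None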

How do I send a custom header with urllib2 in a HTTP Request?

Not quite. Creating a Request object does not actually send the request, and Request objects have no Read() method. (Also: read() is lowercase.) All you need to do is pass the Request as the first argument to urlopen(), and that will give you your response.

    import urllib2

    request = urllib2.Request("http://www.google.com", headers={"Accept": "text/html"})
    contents = … Read more
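
In case the cut-off line is unclear, here is a minimal sketch of the complete pattern the answer describes, using the same Google URL as above:

    import urllib2

    request = urllib2.Request("http://www.google.com", headers={"Accept": "text/html"})
    response = urllib2.urlopen(request)   # this is what actually sends the request
    contents = response.read()            # read(), lowercase
    print response.info().getheader('Content-Type')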

Using urllib2 with SOCKS proxy

Try with pycurl:

    import pycurl

    c1 = pycurl.Curl()
    c1.setopt(pycurl.URL, 'http://www.google.com')
    c1.setopt(pycurl.PROXY, 'localhost')
    c1.setopt(pycurl.PROXYPORT, 8080)
    c1.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

    c2 = pycurl.Curl()
    c2.setopt(pycurl.URL, 'http://www.yahoo.com')
    c2.setopt(pycurl.PROXY, 'localhost')
    c2.setopt(pycurl.PROXYPORT, 8081)
    c2.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

    c1.perform()
    c2.perform()
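
A small extension of the same pycurl approach, capturing the response body into a buffer instead of letting it print to stdout; the localhost proxy and port are taken from the snippet above and are just placeholders:

    import pycurl
    import StringIO

    buf = StringIO.StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, 'http://www.google.com')
    c.setopt(pycurl.PROXY, 'localhost')
    c.setopt(pycurl.PROXYPORT, 8080)
    c.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the body into the buffer
    c.perform()
    c.close()
    print buf.getvalue()[:200]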

urllib2 file name

Did you mean urllib2.urlopen? You could potentially lift the intended filename if the server was sending a Content-Disposition header, by checking remotefile.info()['Content-Disposition'], but as it is I think you’ll just have to parse the URL. You could use urlparse.urlsplit, but if you have any URLs like the second example, you’ll end up having to … Read more
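
Roughly, the fallback described above could look like the sketch below: try Content-Disposition first, then fall back to the URL path. The url value is hypothetical and the filename parsing is deliberately naive:

    import os
    import urllib2
    import urlparse

    url = 'http://example.com/files/report.pdf'  # hypothetical URL
    remotefile = urllib2.urlopen(url)

    cd = remotefile.info().getheader('Content-Disposition')
    if cd and 'filename=' in cd:
        # naive parse of e.g. 'attachment; filename="report.pdf"'
        filename = cd.split('filename=')[-1].strip('"; ')
    else:
        filename = os.path.basename(urlparse.urlsplit(url).path)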