urllib - w3toppers.com

multiprocessing.pool.MaybeEncodingError: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled:

    pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
    Traceback (most recent call last):
      …
        pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
    TypeError: cannot serialize '_io.BufferedReader' object

multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and that fails here. Since dummy uses threads instead …
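The pickling failure can be reproduced without any network access. The sketch below is an illustration in Python 3, not the original poster's code: is_picklable is a hypothetical helper, and an in-memory io.BufferedReader stands in for the one attached to a real response.

```python
import io
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# Stand-in for the buffered reader attached to an HTTP response.
reader = io.BufferedReader(io.BytesIO(b"payload"))

print(is_picklable(reader))         # the reader object itself
print(is_picklable(reader.read()))  # the bytes read from it
```

A worker that returns the bytes from resp.read() rather than the response object should avoid the error, since plain bytes can be pickled.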

What should I do if socket.setdefaulttimeout() is not working?

While socket.setdefaulttimeout() sets the default timeout for new sockets, the setting is easily overridden if you're not using the sockets directly. In particular, if the library calls socket.setblocking() on its socket, that resets the timeout. urllib2.urlopen() has a timeout argument; however, urllib2.Request has none. As you're using mechanize, you should …
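The reset can be observed on a plain socket without connecting anywhere. This is a small Python 3 sketch; timeouts_around_setblocking is a hypothetical helper name, and it restores the previous default timeout when it finishes.

```python
import socket


def timeouts_around_setblocking(default=5.0):
    """Return a new socket's timeout before and after setblocking(True)."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default)
    try:
        sock = socket.socket()      # new sockets inherit the default timeout
        try:
            before = sock.gettimeout()
            sock.setblocking(True)  # equivalent to settimeout(None)
            after = sock.gettimeout()
        finally:
            sock.close()
    finally:
        socket.setdefaulttimeout(previous)  # leave global state untouched
    return before, after


print(timeouts_around_setblocking())  # (5.0, None)
```

When the library exposes its own timeout parameter, as urlopen() does, passing it explicitly is usually more dependable than relying on the global default.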

Making a POST call instead of GET using urllib2

Do it in stages, and modify the object, like this:

    # make a string with the request type in it:
    method = "POST"
    # create a handler. you can specify different handlers here (file uploads etc)
    # but we go for the default
    handler = urllib2.HTTPHandler()
    # create an OpenerDirector instance
    opener = urllib2.build_opener(handler)
    # …
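For comparison, Python 3's urllib.request.Request accepts the method directly. The following is a minimal sketch of that variant, not a completion of the truncated urllib2 code above; build_post_request, the example URL, and the field names are made up for illustration, and nothing is sent over the network.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical endpoint and form fields, for illustration only.
req = build_post_request("http://example.com/api", {"name": "value"})
print(req.get_method())  # POST
print(req.data)          # b'name=value'
```

Passing the returned object to urllib.request.urlopen() would perform the call.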

How to know if urllib.urlretrieve succeeds?

Consider using urllib2 if that is possible in your case. It is more advanced and easier to use than urllib, and it lets you detect HTTP errors easily:

    >>> import urllib2
    >>> resp = urllib2.urlopen("http://google.com/abc.jpg")
    Traceback (most recent call last):
    <<MANY LINES SKIPPED>>
    urllib2.HTTPError: HTTP Error 404: Not Found

resp is actually an HTTPResponse object that you can …
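In Python 3 the same exception lives in urllib.error. The sketch below shows one way to turn it into a success flag. try_download and the always_404 stub are hypothetical names; the stub exists only so the error path can be exercised without network access.

```python
import urllib.error
import urllib.request


def try_download(url, opener=urllib.request.urlopen):
    """Return (True, body) on success, or (False, reason) on failure."""
    try:
        with opener(url) as resp:
            return True, resp.read()
    except urllib.error.HTTPError as exc:   # server answered with an error status
        return False, exc.code
    except urllib.error.URLError as exc:    # DNS failure, refused connection, ...
        return False, exc.reason


def always_404(url):
    """Hypothetical stand-in for urlopen that always fails with HTTP 404."""
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


print(try_download("http://example.com/abc.jpg", opener=always_404))  # (False, 404)
```

HTTPError is caught first because it is a subclass of URLError.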

urllib.error.URLError:

You should use functions such as urllib.parse.urlencode() and urllib.parse.urljoin() to construct URLs instead of joining the strings manually. This takes care of conversions such as : -> %3A, e.g.:

    >>> import urllib.parse
    >>> urllib.parse.quote(':')
    '%3A'
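A short sketch of combining the two functions follows. build_url and the example host, path, and parameters are illustrative assumptions.

```python
import urllib.parse


def build_url(base, path, params):
    """Join base and path, then append percent-encoded query parameters."""
    return urllib.parse.urljoin(base, path) + "?" + urllib.parse.urlencode(params)


print(build_url("http://example.com/api/", "search", {"q": "a:b c"}))
# http://example.com/api/search?q=a%3Ab+c
```

urlencode() quotes with quote_plus() by default, so the space becomes + while the colon becomes %3A.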

How to unquote a urlencoded unicode string in python?

%uXXXX is a non-standard encoding scheme that has been rejected by the W3C, despite the fact that an implementation continues to live on in JavaScript land. The more common technique seems to be to UTF-8 encode the string and then %-escape the resulting bytes using %XX. This scheme is supported by urllib.unquote:

    >>> urllib2.unquote("%0a")
    …
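If input in the non-standard %uXXXX form still has to be handled, one possible approach is to expand those escapes with a regular expression before calling the standard unquote. This is a hedged sketch rather than a library feature: unquote_with_u is a hypothetical helper, and it does not recombine surrogate pairs for characters outside the Basic Multilingual Plane.

```python
import re
import urllib.parse

# Matches the non-standard JavaScript-style escape, e.g. %u20AC.
_U_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def unquote_with_u(text):
    """Expand %uXXXX escapes, then apply the standard %XX unquoting."""
    expanded = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return urllib.parse.unquote(expanded)


print(unquote_with_u("%u20AC%20ok"))       # non-standard form
print(urllib.parse.unquote("%E2%82%AC"))   # standard UTF-8 + %XX form
```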

Python: Get HTTP headers from urllib2.urlopen call?

Use the response.info() method to get the headers. From the urllib2 docs:

    urllib2.urlopen(url[, data][, timeout])
    …
    This function returns a file-like object with two additional methods:
      geturl(): return the URL of the resource retrieved, commonly used to determine if a redirect was followed
      info(): return the meta-information of the page, such as headers, …
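The Python 3 equivalent, urllib.request.urlopen(), returns an object with the same two methods. The sketch below uses a data: URL so that it runs without network access; fetch_headers is a hypothetical helper name.

```python
import urllib.request


def fetch_headers(url):
    """Return the final URL and the header object for a request."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl(), resp.info()


# A data: URL is handled locally, so no network connection is needed.
final_url, headers = fetch_headers("data:text/plain,hello")
print(final_url)
print(headers["Content-Type"])    # header lookup is case-insensitive
print(headers["Content-Length"])
```

For an http:// or https:// URL the same call should expose the server's response headers.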

python save image from url

    import requests

    img_data = requests.get(image_url).content
    with open('image_name.jpg', 'wb') as handler:
        handler.write(img_data)
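The same download can be done with the standard library alone. The sketch below is an alternative under stated assumptions, not the answer's method: save_url is a hypothetical helper, the 10-second timeout is an arbitrary choice, and the data: URL in the usage lines merely stands in for a real image URL.

```python
import os
import shutil
import tempfile
import urllib.request


def save_url(url, path, timeout=10):
    """Stream the resource at url into path and return the bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, not all at once
    return os.path.getsize(path)


# Offline stand-in for an image URL: five base64-encoded bytes.
target = os.path.join(tempfile.mkdtemp(), "image_name.bin")
print(save_url("data:application/octet-stream;base64,aGVsbG8=", target))  # 5
```

Streaming with shutil.copyfileobj() avoids holding a large file in memory.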

SSL: CERTIFICATE_VERIFY_FAILED with Python3

In my case, I used the ssl module to work around certificate verification like so:

    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context

Then, to read your link content, you can use:

    urllib.request.urlopen(urllink)
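A narrower variant is to pass an SSL context to a single urlopen() call instead of replacing the module-wide default. This is a sketch under assumptions: make_context and fetch are hypothetical helpers, and cafile is an optional path to a CA bundle that you would supply yourself.

```python
import ssl
import urllib.request


def make_context(verify=True, cafile=None):
    """Build an SSL context; verify=False disables certificate checks."""
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        ctx.check_hostname = False       # must be turned off before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, verify=True, cafile=None, timeout=10):
    """Read a URL using a per-request SSL context (performs network I/O)."""
    ctx = make_context(verify, cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()


print(make_context().verify_mode == ssl.CERT_REQUIRED)          # True
print(make_context(verify=False).verify_mode == ssl.CERT_NONE)  # True
```

Supplying a correct CA bundle through cafile keeps verification enabled, whereas verify=False turns it off for that one request only.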