multiprocessing.pool.MaybeEncodingError: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled:

    >>> pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
    Traceback (most recent call last):
    …
        pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
    TypeError: cannot serialize '_io.BufferedReader' object

multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and that is what fails here. Since multiprocessing.dummy uses threads instead …
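
A minimal sketch of the usual workaround, assuming Python 3 and a worker function you control (fetch() and fetch_all() are hypothetical names): read the body inside the worker and return only plain, picklable data such as a (str, bytes) tuple, never the response object itself.

```python
import urllib.request
from multiprocessing import Pool


def fetch(url):
    """Download url in a worker process and return only picklable values."""
    # Returning the response object would fail: its .fp is an
    # io.BufferedReader, which pickle cannot serialize. A (str, bytes)
    # tuple pickles without trouble.
    with urllib.request.urlopen(url) as resp:
        return url, resp.read()


def fetch_all(urls, processes=4):
    """Fetch urls in parallel with a process pool (hypothetical helper)."""
    # Pool.map pickles each worker's return value to send it to the parent.
    with Pool(processes) as pool:
        return pool.map(fetch, urls)
```

On platforms that start workers with the spawn method (Windows, and macOS by default), call fetch_all() from inside an `if __name__ == "__main__":` guard.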

What should I do if socket.setdefaulttimeout() is not working?

While socket.setdefaulttimeout() sets the default timeout for new sockets, if you're not using the sockets directly, the setting can easily be overridden. In particular, if the library calls socket.setblocking() on its socket, that resets the timeout. urllib2.urlopen() has a timeout argument; however, urllib2.Request has no timeout. As you're using mechanize, you should …
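
The setblocking() behaviour described above can be observed directly. This is a small Python 3 sketch (helper names are hypothetical, and mechanize is not involved); it also shows passing timeout= per call to urllib.request.urlopen(), the Python 3 counterpart of urllib2.urlopen()'s timeout argument, instead of relying on the global default.

```python
import socket
import urllib.request


def default_timeout_then_setblocking(default=5.0):
    """Return a new socket's timeout before and after setblocking(True)."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default)  # applies to sockets created from now on
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            before = s.gettimeout()   # inherited from setdefaulttimeout()
            s.setblocking(True)       # equivalent to s.settimeout(None)
            after = s.gettimeout()    # the default timeout is gone
    finally:
        socket.setdefaulttimeout(previous)  # restore the global setting
    return before, after


def read_with_timeout(url, seconds=10.0):
    """Fetch url with a timeout that applies to this request only."""
    with urllib.request.urlopen(url, timeout=seconds) as resp:
        return resp.read()
```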

How to know if urllib.urlretrieve succeeds?

Consider using urllib2 if possible in your case. It is more advanced and easier to use than urllib. You can detect any HTTP errors easily:

    >>> import urllib2
    >>> resp = urllib2.urlopen("http://google.com/abc.jpg")
    Traceback (most recent call last):
    <<MANY LINES SKIPPED>>
    urllib2.HTTPError: HTTP Error 404: Not Found

resp is actually an HTTPResponse object that you can …
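
A sketch of the same idea in Python 3, where urllib2 became urllib.request (the download() helper and its (ok, reason) return convention are assumptions, not part of the answer): open the URL with urlopen(), which raises on HTTP errors, and create the destination file only if opening succeeds.

```python
import shutil
import urllib.error
import urllib.request


def download(url, dest):
    """Save url to dest; return (True, None) on success or (False, reason)."""
    try:
        # urlopen() runs first, so dest is never created if it raises.
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first.
        return False, "HTTP %d" % e.code
    except urllib.error.URLError as e:
        # DNS failure, refused connection, missing local file, ...
        return False, str(e.reason)
    return True, None
```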

Replace special characters in a string in Python

One way is to use re.sub; that's my preferred way.

    import re
    my_str = "hey th~!ere"
    my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
    print(my_new_string)

Output: hey there

Another way is to use re.escape:

    import string
    import re
    my_str = "hey th~!ere"
    chars = re.escape(string.punctuation)
    print(re.sub(r'[' + chars + ']', '', my_str))

Output: hey there

Just a small tip about …
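
An alternative that is not from the original answer: str.translate() can delete characters without a regex. A minimal Python 3 sketch (strip_punctuation() is a hypothetical name). Note that it behaves differently from the whitelist regex above: it removes '.' as well, and it keeps non-ASCII letters, which the whitelist would delete.

```python
import string


def strip_punctuation(text):
    """Remove ASCII punctuation, keeping letters, digits and whitespace."""
    # The third argument of str.maketrans() lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)
```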

urllib.error.URLError:

You should use urllib.parse.urlencode(), urllib.parse.urljoin(), etc. to construct URLs instead of joining strings manually. Functions such as urllib.parse.quote() take care of converting ':' to '%3A', e.g.:

    >>> import urllib.parse
    >>> urllib.parse.quote(':')
    '%3A'
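
A minimal sketch of building a URL from parts with these functions (build_url() and the example.com address are hypothetical). It also shows why quoting matters: an unquoted segment such as "a:b" is parsed by urljoin() as a URL with scheme "a", so the base is discarded.

```python
from urllib.parse import quote, urlencode, urljoin


def build_url(base, segment, params):
    """Join a quoted path segment onto base and append an encoded query."""
    # quote() escapes ':' and spaces, so the segment cannot be mistaken
    # for a URL scheme by urljoin().
    path = urljoin(base, quote(segment))
    # urlencode() quotes each key and value (':' becomes %3A here too).
    return path + "?" + urlencode(params)


print(build_url("http://example.com/api/", "a:b c", {"q": "x:y", "page": 2}))
# http://example.com/api/a%3Ab%20c?q=x%3Ay&page=2
```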

How to unquote a urlencoded unicode string in python?

%uXXXX is a non-standard encoding scheme that has been rejected by the W3C, although an implementation continues to live on in JavaScript land. The more common technique is to UTF-8 encode the string and then %-escape the resulting bytes as %XX. This scheme is supported by urllib.unquote:

    >>> urllib2.unquote("%0a")
    …
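
For comparison, a Python 3 sketch: urllib.parse.unquote() decodes the standard %XX form as UTF-8 and leaves %uXXXX untouched. If you must accept the non-standard form anyway, a hypothetical helper like unquote_with_percent_u() can decode it separately. It is only a sketch: it does not recombine JavaScript surrogate pairs such as %uD83D%uDE00.

```python
import re
from urllib.parse import unquote

# Matches one non-standard %uXXXX escape and captures the four hex digits.
_PCT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def unquote_with_percent_u(s):
    """Decode %uXXXX escapes plus standard UTF-8 %XX escapes in one pass."""
    # re.split() with one capturing group alternates text and hex digits:
    # even indices are ordinary text, odd indices are captured code points.
    parts = _PCT_U.split(s)
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            out.append(chr(int(part, 16)))
        else:
            # Standard %XX decoding; a decoded '%' cannot start a new escape.
            out.append(unquote(part))
    return "".join(out)
```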