How to follow meta refreshes in Python

Here is a solution using BeautifulSoup and httplib2 (and certificate-based authentication):

```python
import BeautifulSoup
import httplib2

def meta_redirect(content):
    soup = BeautifulSoup.BeautifulSoup(content)
    result = soup.find("meta", attrs={"http-equiv": "Refresh"})
    if result:
        wait, text = result["content"].split(";")
        if text.strip().lower().startswith("url="):
            url = text.strip()[4:]
            return url
    return None

def get_content(url, key, cert):
    h = httplib2.Http(".cache")
    h.add_certificate(key, cert, "")
    resp, content = h.request(url, "GET")
    # follow the chain of redirects
    while meta_redirect(content):
        resp, content = h.request(meta_redirect(content), "GET")
    return …
```
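For comparison, here is a minimal Python 3 sketch of the same idea that uses only the standard library: `html.parser` stands in for BeautifulSoup, and the HTTP client is abstracted as a `fetch` callable, which is an assumption of this sketch rather than part of the answer. Unlike the answer's version, it splits the `content` attribute only on the first `;`, resolves relative targets with `urljoin`, and caps the number of hops so a page that refreshes to itself cannot loop forever. The names `_MetaRefreshFinder`, `meta_redirect` and `follow_meta_refreshes` are illustrative.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Records the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "refresh":
            self.content = attributes.get("content") or ""


def meta_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute refresh target of a page, or None if it has none."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    if finder.content is None or ";" not in finder.content:
        return None
    # content looks like "5; url=/next"; split only on the first ";" so
    # semicolons inside the target URL survive.
    _wait, target = finder.content.split(";", 1)
    target = target.strip()
    if not target.lower().startswith("url="):
        return None
    url = target[4:].strip().strip("'\"")
    # Relative targets such as "/next" are resolved against the current page.
    return urljoin(base_url, url)


def follow_meta_refreshes(url: str, fetch: Callable[[str], str],
                          max_hops: int = 10) -> str:
    """Fetch url, follow meta refreshes, and return the final page's HTML.

    fetch is any callable that maps a URL to HTML text (hypothetical
    parameter of this sketch: wrap httplib2, urllib.request, etc.).
    """
    html = fetch(url)
    for _ in range(max_hops):
        next_url = meta_redirect(html, url)
        if next_url is None or next_url == url:
            return html  # no refresh, or a page that refreshes to itself
        url = next_url
        html = fetch(url)
    raise RuntimeError("too many meta refreshes")
```

To keep the answer's httplib2 client and certificate, `fetch` can be a small function that calls `h.request(u, "GET")` and decodes the returned content to text.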

Errno 104, ‘Connection reset by peer’ socket error, or: when does closing a socket result in an RST rather than a FIN?

I’ve had this problem. See “The Python ‘Connection Reset By Peer’ Problem”. You have (most likely) run afoul of small timing issues caused by the Python Global Interpreter Lock. You can (sometimes) correct this with a time.sleep(0.01) placed strategically. “Where?” you ask. Beats me. The idea is to provide some better thread concurrency in and …
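As for the question in the title, one well-known and deterministic way to make `close()` send an RST instead of a FIN is an abortive close: enabling `SO_LINGER` with a zero timeout before closing. The minimal sketch below contrasts that with an ordinary close over loopback and reports what the peer observes. It assumes Linux or macOS, where `struct linger` is two C ints (the `"ii"` packing does not match Windows); the function name `close_and_observe` is hypothetical.

```python
import socket
import struct


def close_and_observe(abortive: bool) -> str:
    """Close the server end of a loopback TCP connection and report what the
    client sees: "eof" for an orderly FIN, "reset" for an RST."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        if abortive:
            # l_onoff=1, l_linger=0: close() discards pending data and
            # aborts the connection with an RST instead of a FIN.
            server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
        server.close()
        client.settimeout(5)
        try:
            data = client.recv(1024)
        except ConnectionResetError:  # errno 104 (ECONNRESET) on Linux
            return "reset"
        return "eof" if data == b"" else "data"
    finally:
        server.close()  # no-op if it was already closed above
        client.close()
        listener.close()
```

On Linux, `close_and_observe(False)` returns `"eof"` and `close_and_observe(True)` returns `"reset"`. Linux also sends an RST when a socket is closed while unread data is still sitting in its receive buffer, which is another common source of errno 104 on the other end.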

How do I get the IP address from an HTTP request using the requests library?

It turns out that it’s rather involved. Here’s a monkey-patch for requests version 1.2.3: it wraps the _make_request method on HTTPConnectionPool to store the result of socket.getpeername() on the HTTPResponse instance. For me, on Python 2.7.3, this instance was available as response.raw._original_response.

```python
from requests.packages.urllib3.connectionpool import HTTPConnectionPool

def _make_request(self, conn, method, url, **kwargs):
    response = self._old_make_request(conn, method, url, **kwargs)
    sock = getattr(conn, 'sock', False)
    if …
```
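The same wrap-and-store pattern can be sketched with only the standard library. This swaps in `http.client` instead of requests/urllib3 (whose internals have changed across versions), wraps `HTTPConnection.getresponse`, and reads `getpeername()` before delegating, because `getresponse` may close the connection and clear `self.sock` when the server asks to close it. The wrapper name `_getresponse_with_peer` and the `peer` attribute are hypothetical.

```python
import http.client

# Keep a reference to the original method so the wrapper can delegate to it.
_original_getresponse = http.client.HTTPConnection.getresponse


def _getresponse_with_peer(self, *args, **kwargs):
    # Capture the peer address while the socket is still attached: the
    # request has already been sent, so self.sock is connected here.
    sock = getattr(self, "sock", None)
    peer = sock.getpeername() if sock is not None else None
    response = _original_getresponse(self, *args, **kwargs)
    # Hypothetical attribute: (host, port) for IPv4, a 4-tuple for IPv6.
    response.peer = peer
    return response


# Monkey-patch: every HTTPConnection (and subclass) now records the peer.
http.client.HTTPConnection.getresponse = _getresponse_with_peer
```

After the patch, `conn.getresponse().peer` holds the address the request was actually sent to, which is useful when a host name resolves to several IPs.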