urllib - w3toppers.com

AttributeError: 'module' object has no attribute 'urlopen'

urllib.urlopen works in Python 2.x. In Python 3 the function lives in urllib.request; from the docs:

    import urllib.request

    with urllib.request.urlopen("http://www.python.org") as url:
        s = url.read()

    # I'm guessing this would output the html source code?
    print(s)

How can I percent-encode URL parameters in Python?

Python 2

From the documentation:

    urllib.quote(string[, safe])

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL. The optional safe parameter specifies additional characters that should not be quoted; its default value is …
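For Python 3, the same function is available as urllib.parse.quote. The following is a minimal sketch of how the safe parameter changes the result; the helper name encode_path_segment and the sample strings are made up for illustration.

```python
from urllib.parse import quote

def encode_path_segment(segment):
    """Percent-encode one path segment, escaping '/' as well (hypothetical helper)."""
    # safe="" overrides the default safe="/" so slashes are escaped too.
    return quote(segment, safe="")

# With the default safe="/", slashes are left alone.
path = quote("/El Niño/")

# With safe="", a slash inside the value is escaped.
segment = encode_path_segment("a/b c")

print(path)
print(segment)
```

Leaving safe at its default is usually what you want for a whole path; passing safe="" is one way to encode a single value that may itself contain a slash.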

How to retrieve the values of dynamic html content using Python

Assuming you are trying to get values from a page that is rendered using JavaScript templates (for instance something like Handlebars), then the unrendered template markup is what you will get with any of the standard solutions (e.g. BeautifulSoup or Requests). This is because the browser uses JavaScript to alter what it received and create new DOM elements. …
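To make the point concrete, here is a small sketch using only the standard library's html.parser. The page source, the id "price", and the class name TextById are invented for the example; the idea is simply that a static parser sees the placeholder the server sent, not the value a browser would fill in later.

```python
from html.parser import HTMLParser

# Hypothetical page source as the server sends it, before any JavaScript runs.
RAW_HTML = '<html><body><span id="price">{{price}}</span></body></html>'

class TextById(HTMLParser):
    """Collect the text found directly inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture (fine for a flat element).
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data

def static_text(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return parser.text

print(static_text(RAW_HTML, "price"))
```

The printed text is the template placeholder itself, which is the same thing any fetch-and-parse approach would see without executing the page's scripts.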

How to send POST request?

If you really want to handle HTTP with Python, I highly recommend Requests: HTTP for Humans. The POST quickstart adapted to your question is:

    >>> import requests
    >>> r = requests.post("http://bugs.python.org", data={'number': 12524, 'type': 'issue', 'action': 'show'})
    >>> print(r.status_code, r.reason)
    200 OK
    >>> print(r.text[:300] + '...')
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    …
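If installing Requests is not an option, a similar POST can be sketched with the standard library in Python 3. This is an illustrative alternative, not the approach the answer recommends; the helper name build_post_request is hypothetical, and the network call is left commented out.

```python
import urllib.parse
import urllib.request

def build_post_request(url, fields):
    """Return a urllib Request that will be sent as a form-encoded POST."""
    # urlencode produces 'key=value&...'; the request body must be bytes.
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib use POST instead of GET.
    return urllib.request.Request(url, data=body)

req = build_post_request(
    "http://bugs.python.org",
    {"number": 12524, "type": "issue", "action": "show"},
)
print(req.get_method(), req.data)

# Sending it requires network access:
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.reason)
```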

Can we use XPath with BeautifulSoup?

Nope, BeautifulSoup, by itself, does not support XPath expressions. An alternative library, lxml, does support XPath 1.0. It has a BeautifulSoup compatible mode where it’ll try and parse broken HTML the way Soup does. However, the default lxml HTML parser does just as good a job of parsing broken HTML, and I believe is faster. …
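lxml is a third-party install. For a quick look at what path-style queries feel like without installing anything, the standard library's xml.etree.ElementTree supports a limited subset of XPath. Note the swap: this is not lxml, it is not full XPath 1.0, and it only accepts well-formed markup, so it will not cope with broken HTML. The sample document and the class name "ext" are made up for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; ElementTree rejects broken HTML.
DOC = """<html><body>
<a class="ext" href="http://www.python.org">Python</a>
<a href="/local">Local</a>
<a class="ext" href="http://bugs.python.org">Bugs</a>
</body></html>"""

def external_links(markup):
    """Return the href of every <a> element whose class attribute is 'ext'."""
    root = ET.fromstring(markup)
    # ".//a[@class='ext']" is within ElementTree's supported XPath subset.
    return [a.get("href") for a in root.findall(".//a[@class='ext']")]

print(external_links(DOC))
```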

Downloading a picture via urllib and python

Python 2

Using urllib.urlretrieve:

    import urllib
    urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")

Python 3

Using urllib.request.urlretrieve (part of Python 3's legacy interface, works exactly the same):

    import urllib.request
    urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
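Since urlretrieve belongs to the legacy interface, one possible alternative in Python 3 is to stream the response from urlopen into a file yourself. This is a sketch under that assumption; the helper name download is hypothetical and the example call is commented out because it needs network access.

```python
import shutil
import urllib.request

def download(url, dest_path):
    """Stream the response body at `url` into the file `dest_path`."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj copies in chunks, so the whole image is never held in memory.
        shutil.copyfileobj(response, out)
    return dest_path

# Example call (needs network access):
# download("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
```

Opening the destination in binary mode ("wb") matters here, because image data is bytes rather than text.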

urllib2.HTTPError: HTTP Error 403: Forbidden

By adding a few more headers I was able to get the data (Python 2 code):

    import urllib2, cookielib

    site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    req = urllib2.Request(site, headers=hdr)

    try:
        page = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()

    content = …
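The snippet above is Python 2 only (urllib2 and the `except ..., e` syntax no longer exist in Python 3). A rough Python 3 counterpart might look like the sketch below. The helper names build_request and fetch and the example.com URL are placeholders, only two of the headers are carried over for brevity, and whether any given server accepts them is not something this sketch can promise.

```python
import urllib.error
import urllib.request

# A subset of the headers from the answer above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
                  "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

def build_request(url):
    """Attach browser-like headers to a request for `url`."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

def fetch(url):
    """Return the response body, or the error page body if the server refuses."""
    try:
        with urllib.request.urlopen(build_request(url)) as page:
            return page.read()
    except urllib.error.HTTPError as e:
        # HTTPError is also a file-like response, so the error body is readable.
        return e.read()

req = build_request("http://example.com/")  # placeholder URL
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```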

How to urlencode a querystring in Python?

Python 2

What you're looking for is urllib.quote_plus:

    import urllib
    safe_string = urllib.quote_plus('string_of_characters_like_these:$#@=?%^Q^$')
    # Value: 'string_of_characters_like_these%3A%24%23%40%3D%3F%25%5EQ%5E%24'

Python 3

In Python 3, the urllib package has been broken into smaller components. You'll use urllib.parse.quote_plus (note the parse child module):

    import urllib.parse
    safe_string = urllib.parse.quote_plus(…)
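As a runnable Python 3 illustration, the sketch below applies quote_plus to the same sample string and also shows urllib.parse.urlencode, which builds a whole query string from a mapping. The parameter names q and page are invented for the example.

```python
from urllib.parse import quote_plus, urlencode

# Encode a single value; spaces would become '+' and reserved characters '%xx'.
safe_string = quote_plus("string_of_characters_like_these:$#@=?%^Q^$")

# Encode several key/value pairs at once (hypothetical parameters).
query = urlencode({"q": "a b&c", "page": 2})

print(safe_string)
print(query)
```

urlencode applies quote_plus to each key and value by default, so it is often the more convenient choice when the parameters already live in a dict.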

What are the differences between the urllib, urllib2, urllib3 and requests module?

I know it’s been said already, but I’d highly recommend the requests Python package. If you’ve used languages other than Python, you’re probably thinking urllib and urllib2 are easy to use, not much code, and highly capable; that’s how I used to think. But the requests package is so unbelievably useful and short that everyone …

UnicodeEncodeError: 'charmap' codec can't encode characters

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

    with open(fname, "w") as f:
        f.write(html)

with this:

    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)

If you need to support Python 2, then use this:

    import io
    with io.open(fname, "w", encoding="utf-8") as f:
        f.write(html)

…
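The effect can be reproduced without relying on the platform default encoding. In this sketch cp1252 stands in for a typical "charmap" codec (an assumption; the actual default depends on the system), and the sample text and the helper name save_html are made up.

```python
import os
import tempfile

# Hypothetical scraped text; U+2713 (check mark) has no cp1252 representation.
html = "<p>price: 5 \u20ac, done \u2713</p>"

# A legacy single-byte codec fails on the check mark...
try:
    html.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

def save_html(fname, text):
    """Write text with an explicit encoding instead of the platform default."""
    with open(fname, "w", encoding="utf-8") as f:
        f.write(text)

# ...while writing with encoding="utf-8" round-trips the text intact.
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)
save_html(path, html)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
os.remove(path)

print(legacy_ok, round_trip == html)
```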