How to handle IncompleteRead: in python

The link you included in your question is simply a wrapper that executes urllib's read() function and catches any incomplete-read exceptions for you. If you don't want to implement this entire patch, you could always just wrap the places where you read your links in a try/except block. For example: try: page = urllib2.urlopen(urls).read() except httplib.IncompleteRead, … Read more
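A minimal Python 2 sketch of that try/except approach, assuming you simply want to retry a few times and, failing that, keep whatever partial body arrived (httplib.IncompleteRead exposes the bytes received so far on its .partial attribute); the URL and retry count are placeholders:

import urllib2
import httplib

def fetch(url, retries=3):
    # Try a few times; on the last failure, fall back to the partial body.
    for attempt in range(retries):
        try:
            return urllib2.urlopen(url).read()
        except httplib.IncompleteRead, e:
            # e.partial holds the bytes received before the connection dropped
            if attempt == retries - 1:
                return e.partial
    return None

page = fetch('http://example.com/some-page')  # placeholder URL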

Scrape the absolute URL instead of a relative path in python

urllib.parse.urljoin() might help. It does a join, but it is smart about it and handles both relative and absolute paths. Note this is Python 3 code.

>>> import urllib.parse
>>> base = 'https://www.example-page-xl.com'
>>> urllib.parse.urljoin(base, '/helloworld/index.php')
'https://www.example-page-xl.com/helloworld/index.php'
>>> urllib.parse.urljoin(base, 'https://www.example-page-xl.com/helloworld/index.php')
'https://www.example-page-xl.com/helloworld/index.php'
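In a scraper you would typically resolve every href you extract against the URL of the page it came from; a small Python 3 sketch of that, where the page URL and the href list are made-up placeholders:

from urllib.parse import urljoin

page_url = 'https://www.example-page-xl.com/section/list.php'   # placeholder page URL
hrefs = ['/helloworld/index.php',                 # relative to the site root
         'details.php?id=7',                      # relative to the current directory
         'https://www.example-page-xl.com/faq']   # already absolute, left unchanged

absolute_links = [urljoin(page_url, href) for href in hrefs]
for link in absolute_links:
    print(link)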

How do I parse an HTML table with Nokogiri?

#!/usr/bin/ruby1.8

require 'nokogiri'
require 'pp'

html = <<-EOS
(The HTML from the question goes here)
EOS

doc = Nokogiri::HTML(html)

rows = doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
details = rows.collect do |row|
  detail = {}
  [
    [:title,  'td[3]/div[1]/a/text()'],
    [:name,   'td[3]/div[2]/span/a/text()'],
    [:date,   'td[4]/text()'],
    [:time,   'td[4]/span/text()'],
    [:number, 'td[5]/a/text()'],
    [:views,  'td[6]/text()'],
  ].each do |name, xpath|
    detail[name] = row.at_xpath(xpath).to_s.strip
  end
  detail
end

pp details

… Read more

mechanize python click a button

Clicking a type="button" in a pure HTML form does nothing. For it to do anything, there must be JavaScript involved, and mechanize doesn't run JavaScript. So your options are: (1) read the JavaScript yourself and simulate with mechanize what it would be doing, or (2) use Spidermonkey to run the JavaScript code. I'd do the first one, since … Read more
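A sketch of the first option in Python 2 with mechanize, assuming that reading the page's JavaScript shows the button simply POSTs a couple of fields to some endpoint; the URLs and parameters below are invented for illustration and should be taken from the actual script you read:

import urllib
import mechanize

br = mechanize.Browser()
br.open('http://example.com/page-with-button')  # placeholder page URL

# Reproduce the request the button's onclick handler would have sent.
# Endpoint and parameters are hypothetical.
data = urllib.urlencode({'action': 'do_thing', 'item_id': '42'})
response = br.open('http://example.com/ajax/endpoint', data)  # POST, because data is supplied
print response.read()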

Using Python and Mechanize to submit form data and authenticate

I would definitely suggest trying to use the API if possible, but this works for me (not for your example post, which has been deleted, but for any active one):

#!/usr/bin/env python
import mechanize
import cookielib
import urllib
import logging
import sys

def main():
    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    … Read more
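The part the excerpt cuts off usually amounts to selecting the login form, filling in the credential fields, and submitting; a Python 2 sketch of that pattern, with the URL, form index, and field names as placeholders (inspect the real form with br.forms() first):

import mechanize
import cookielib

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.addheaders = [('User-Agent', 'Mozilla/5.0')]

br.open('http://example.com/login')   # placeholder login page
br.select_form(nr=0)                  # placeholder: pick the right form index or name
br['username'] = 'my_user'            # placeholder field names and credentials
br['password'] = 'my_pass'
response = br.submit()

print response.geturl()               # where we landed after logging in
print cj                              # cookies collected during authentication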

What should I do if socket.setdefaulttimeout() is not working?

While socket.setdefaulttimeout() will set the default timeout for new sockets, if you're not using the sockets directly the setting can easily be overridden. In particular, if the library calls socket.setblocking() on its socket, it'll reset the timeout. urllib2.urlopen() has a timeout argument; however, there is no timeout in urllib2.Request. As you're using mechanize, you should … Read more
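A short Python 2 sketch of that point: set the global default if you like, but pass an explicit per-request timeout where you can, since a library calling setblocking() would wipe out the default; the URL and timeout values are placeholders:

import socket
import urllib2

socket.setdefaulttimeout(10)   # default for sockets the libraries create...

# ...but an explicit per-request timeout is more reliable.
try:
    page = urllib2.urlopen('http://example.com/slow-page', timeout=5).read()  # placeholder URL
except socket.timeout:
    print 'request timed out'
except urllib2.URLError, e:
    # urllib2 sometimes wraps the timeout in a URLError whose reason is a socket.timeout
    print 'failed:', e.reason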