BeautifulSoup returns unexpected extra spaces
I believe this is a bug in lxml's HTML parser. Try:

    from bs4 import BeautifulSoup
    import urllib2

    # Python 2 code: download the raw page
    html = urllib2.urlopen("http://www.beppegrillo.it")
    prova = html.read()
    # Swap the declared charset name before parsing
    soup = BeautifulSoup(prova.replace('ISO-8859-1', 'utf-8'))
    print soup

This works around the problem by replacing every occurrence of 'ISO-8859-1' in the downloaded markup with 'utf-8' before parsing. I believe the issue was fixed in lxml 3.0 alpha 2 and lxml 2.3.6, so it could be …
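The workaround above is written for Python 2. On Python 3 it might look roughly like the sketch below, which assumes `urllib.request` in place of `urllib2`. The helper names `swap_declared_charset` and `fetch_soup` are made up for illustration and are not part of bs4 or lxml. The replacement is done on bytes, because `read()` returns bytes in Python 3.

```python
from urllib.request import urlopen


def swap_declared_charset(raw_html, old=b"ISO-8859-1", new=b"utf-8"):
    """Replace the charset name declared in the raw markup (hypothetical helper)."""
    # Same literal, case-sensitive replacement as the Python 2 snippet,
    # applied to bytes instead of a str.
    return raw_html.replace(old, new)


def fetch_soup(url):
    """Download a page and parse it after swapping the declared charset."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    with urlopen(url) as response:
        raw_html = response.read()
    # No parser is named, as in the original snippet, so bs4 picks one itself.
    return BeautifulSoup(swap_declared_charset(raw_html))
```

Calling `print(fetch_soup("http://www.beppegrillo.it"))` would then mirror the original snippet. If the installed lxml already contains the fix, the charset swap should not be needed.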