Missing parts on Beautiful Soup results

Beautiful Soup can use different parsers to handle HTML input. The HTML input here is a little broken, and the default HTMLParser parser doesn’t handle it very well; as the session below shows, lxml doesn’t either.

Use the html5lib parser instead:

>>> len(BeautifulSoup(r.text, 'html').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'lxml').find('td', attrs={'class': 'eelantext'}).find_all('p'))
0
>>> len(BeautifulSoup(r.text, 'html5lib').find('td', attrs={'class': 'eelantext'}).find_all('p'))
22
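
To run the same comparison on another page, a small helper along these lines may be useful. It is only a sketch: the name count_paragraphs and the sample markup are made up for illustration, and the sample is well formed, so every installed parser should agree on it. Differences like the 0 vs. 22 above only show up on broken input such as the page from the question. Note that lxml and html5lib are separate packages; the helper records None for any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_paragraphs(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <p> tags inside td.eelantext}.

    The value is None when the parser is not installed or the cell is missing.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser library is absent.
            counts[name] = None
            continue
        cell = soup.find("td", attrs={"class": "eelantext"})
        # find() returns None when nothing matches, so guard before find_all().
        counts[name] = len(cell.find_all("p")) if cell is not None else None
    return counts


# Hypothetical, well-formed sample; the page from the question is not reproduced here.
sample = '<table><tr><td class="eelantext"><p>one</p><p>two</p></td></tr></table>'
print(count_paragraphs(sample))
```

Calling count_paragraphs(r.text) on the real response would give the three counts side by side, which makes it easy to see which parser recovers the most content.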
