Beautiful Soup 4 find_all doesn't find links that Beautiful Soup 3 finds

You have lxml installed, which means that Beautiful Soup 4 will pick that parser by default, in preference to the standard-library html.parser option.

You can upgrade lxml to 3.2.1 (which, for me, returns 1701 results for your test page). lxml itself relies on libxml2 and libxslt, which may be to blame here too, so you may have to upgrade those instead of, or as well as, lxml. See the lxml requirements page; currently libxml2 2.7.8 or newer is recommended.
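To see which versions are actually in play before upgrading anything, something like the following sketch may help. The helper name lxml_versions is made up for illustration; the version constants it reads are the ones lxml exposes on its etree module.

```python
def lxml_versions():
    """Return lxml-related version tuples, or None if lxml is not installed.

    lxml_versions is a hypothetical helper name, not part of lxml or bs4.
    """
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        # Version of the libxml2 library loaded at runtime.
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        # Version of libxml2 that lxml was compiled against.
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed; bs4 will fall back to another parser")
    else:
        for name, version in versions.items():
            print(name, ".".join(str(part) for part in version))
```

If the runtime libxml2 version is older than the recommended one, upgrading lxml alone may not change the parse result.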

Or explicitly specify the other parser when parsing the soup:

s4 = bs4.BeautifulSoup(r.text, 'html.parser')
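To check whether the parser is really what makes the difference, you could run the same markup through each parser you have installed and compare the link counts. This is only a rough sketch: count_links and the sample HTML are invented for illustration, and in your case you would pass r.text instead.

```python
import bs4


def count_links(html, parsers=("html.parser", "lxml", "html5lib")):
    """Count <a> tags found by each parser; None marks a parser that is not installed.

    count_links is a hypothetical helper, not part of Beautiful Soup.
    """
    counts = {}
    for name in parsers:
        try:
            soup = bs4.BeautifulSoup(html, name)
        except bs4.FeatureNotFound:
            # Raised by bs4 when the requested parser library is missing.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("a"))
    return counts


if __name__ == "__main__":
    sample = '<p><a href="/one">one</a> and <a href="/two">two</a></p>'
    for parser, found in count_links(sample).items():
        print(parser, found)
```

On well-formed markup the counts should agree; a large gap on your real page would point at the parser (or the libxml2 underneath it) rather than at find_all.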
