What causes `None` results from BeautifulSoup functions? How can I avoid "AttributeError: 'NoneType' object has no attribute..." with BeautifulSoup?

Overview

In general, BeautifulSoup offers two kinds of queries: those that look for a single specific element (tag, attribute, text etc.), and those that look for every element that meets the requirements.

For the latter group – the ones like .find_all that can give multiple results – the return value will be a list. If there weren’t any results, then the list is simply empty. Nice and simple.

However, for methods like .find and .select_one that can only give a single result, if nothing is found in the HTML, the result will be None. BeautifulSoup will not directly raise an exception to explain the problem. Instead, an AttributeError will commonly occur in the code that follows, when it tries to use the None inappropriately (because it expected to receive something else, typically an instance of the Tag class that BeautifulSoup defines). This happens because None simply doesn’t support the operation; it’s called an AttributeError because the . syntax means to access an attribute of whatever is on the left-hand side.
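As a minimal sketch of the difference between the two kinds of query, assuming a small made-up document (the HTML below is hypothetical and only stands in for the one in the question) and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for this illustration.
html = """
<p class="story">
  <a href="https://example.com/elsie" class="sister">Elsie</a>
  <a href="https://example.com/lacie" class="sister">Lacie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Multi-result queries: no match gives an empty, list-like result.
all_brothers = soup.find_all("a", class_="brother")
selected_brothers = soup.select("a.brother")
print(len(all_brothers), len(selected_brothers))  # 0 0

# Single-result queries: no match gives None.
first_brother = soup.find("a", class_="brother")
print(first_brother is None)  # True

# When something does match, the result is a Tag.
first_sister = soup.find("a", class_="sister")
print(type(first_sister).__name__, first_sister.get_text())  # Tag Elsie
```

Looping over an empty result is harmless, which is why the multi-result methods rarely produce this error; only the single-result methods hand back a None that later code can trip over.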

Examples

Let’s consider the non-working code examples in the question one by one:

>>> print(soup.sister)
None

This tries to look for a <sister> tag in the HTML (not a different tag that has a class, id or other such attribute equal to sister). There isn’t one, so the result is None.
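To make the distinction concrete, here is a short sketch using a hypothetical one-link document (not the one from the question). Attribute-style access searches by tag name, so it behaves like .find with that name; searching by class needs the class_ keyword:

```python
from bs4 import BeautifulSoup

# Hypothetical one-link document for illustration.
html = '<p><a href="https://example.com/elsie" class="sister">Elsie</a></p>'
soup = BeautifulSoup(html, "html.parser")

by_attribute = soup.sister             # shorthand: looks for a <sister> tag
by_tag_name = soup.find("sister")      # the same search, spelled out
by_class = soup.find(class_="sister")  # any tag whose class is "sister"

print(by_attribute, by_tag_name)  # None None
print(by_class.name, by_class.get_text())  # a Elsie
```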

>>> print(soup.find('a', class_='brother'))
None

This tries to find an <a> tag that has a class attribute equal to brother, like <a href="https://example.com/bobby" class="brother">Bobby</a>. The document doesn’t contain anything like that; none of the a tags have that class (they all have the sister class instead).

>>> print(soup.select_one('a.brother'))
None

This is another way to do the same thing as the previous example, with a different method. (Instead of passing a tag name and some attribute values, we pass a CSS query selector.) The result is the same.

>>> soup.select_one('a.brother').text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'text'

Since soup.select_one('a.brother') returned None, this is the same as trying to do None.text. The error means exactly what it says: None doesn’t have a text to access. In fact, it doesn’t have any “ordinary” attributes; the NoneType class only defines special methods like __str__ (which converts None to the string 'None', so that it can look like the actual text None when it is printed).
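One way to avoid the error is to check the result before using it. The sketch below assumes a hypothetical one-link document, and text_or_default is a helper name invented here, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Hypothetical document for illustration.
html = '<p><a href="https://example.com/elsie" class="sister">Elsie</a></p>'
soup = BeautifulSoup(html, "html.parser")


def text_or_default(tag, default=""):
    """Return the tag's text, or `default` if the search found nothing."""
    return tag.get_text(strip=True) if tag is not None else default


# Explicit check: handle the "not found" case before touching attributes.
brother = soup.select_one("a.brother")
if brother is None:
    message = "no a.brother element found"
else:
    message = brother.text
print(message)  # no a.brother element found

# The same idea wrapped in the (hypothetical) helper.
sister_text = text_or_default(soup.select_one("a.sister"))
brother_text = text_or_default(soup.select_one("a.brother"), default="(missing)")
print(sister_text, brother_text)  # Elsie (missing)
```

Whether to substitute a default, skip the item, or raise a more descriptive exception depends on what a missing element means for the task at hand.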

“But why isn’t the data found?”

First, carefully check for typos, of course.

However, the most common problem is trying to scrape dynamic content. Keep in mind that BeautifulSoup processes static HTML, not JavaScript. It can only use data that would be seen when visiting the webpage with JavaScript disabled.

Modern webpages commonly generate a lot of the page data by running JavaScript in the client’s web browser. In typical cases, this JavaScript code will make more HTTP requests to get data, format it, and effectively edit the page (alter the DOM) on the fly. BeautifulSoup cannot handle any of this. It sees the JavaScript code in the web page as just more text.
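The effect can be reproduced without a network connection. In this sketch (the page and its script are made up for illustration), the <p class="price"> element would only exist after a browser ran the script, so BeautifulSoup never sees it:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the price paragraph is created by JavaScript at runtime.
html = """
<html><body>
<div id="app"></div>
<script>
  var p = document.createElement("p");
  p.className = "price";
  p.textContent = "42";
  document.getElementById("app").appendChild(p);
</script>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# The script is never executed, so the element it would create is absent.
price = soup.find("p", class_="price")
print(price)  # None

# The container exists in the static HTML, but it is still empty.
app_children = soup.find(id="app").find_all(True)
print(app_children)  # []

# The JavaScript itself is available only as plain text.
script_source = soup.script.string
print("createElement" in script_source)  # True
```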

To scrape a dynamic website, consider using Selenium to emulate interacting with the web page. Alternatively, investigate what happens when using the site normally, in order to look for API requests that can then be handled directly. For example, the JavaScript code on the page might make a call to some API “endpoint” (a URI that can be directly accessed in the same way as a web page URL), which returns some JSON data. In that case, write code to compute the needed URI, load it normally, read and parse the JSON response, and proceed with that data. In many cases this will be much easier than working with BeautifulSoup would have been even on the static content.
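The API approach might look roughly like the following sketch, which uses only the standard library. Everything specific in it is an assumption: the endpoint URL, the page parameter, and the shape of the JSON are all hypothetical, and a real site's API has to be discovered with the browser's network tools.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; a real one must be found by inspecting the site.
API_BASE = "https://example.com/api/items"


def build_url(page):
    """Compute the URI for one page of results (parameter name is assumed)."""
    return f"{API_BASE}?{urlencode({'page': page})}"


def parse_items(payload):
    """Extract item names from a JSON response of the assumed shape."""
    data = json.loads(payload)
    return [item["name"] for item in data["items"]]


def fetch_item_names(page):
    """Load the endpoint like any other URL and parse the JSON it returns."""
    with urlopen(build_url(page), timeout=10) as response:
        return parse_items(response.read().decode("utf-8"))


# Parsing demonstrated on a sample payload, so no network access is needed.
sample = '{"items": [{"name": "Elsie"}, {"name": "Lacie"}]}'
print(build_url(2))         # https://example.com/api/items?page=2
print(parse_items(sample))  # ['Elsie', 'Lacie']
```

Note that no HTML parsing is involved at all: once the data arrives as JSON, it is already structured.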

Another possible cause of confusion is that a browser’s “inspector” view might fill in missing tags that are not present in the actual HTML, but are part of the DOM that the web browser creates when (very leniently) parsing that data. In the linked example, the browser showed a <tbody> tag inside a <table> in its “Inspect Element” view, even though that was not present in the actual page source. Thus, the Tag that BeautifulSoup created to represent the table reported None for its tbody attribute. Typically, problems like this can be worked around by searching within a subsection of the soup, rather than trying to “step into” each nested tag.
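A small sketch of that workaround, assuming a made-up table with no <tbody> in its source and the built-in "html.parser" (other parsers may build a different tree from the same HTML):

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the table has rows but no <tbody> tag.
html = """
<table id="scores">
  <tr><td>Elsie</td><td>10</td></tr>
  <tr><td>Lacie</td><td>12</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="scores")

# Stepping into a tag that only the browser's inspector shows gives None,
# so something like table.tbody.find_all("tr") would raise AttributeError.
print(table.tbody)  # None

# Searching within the table finds the rows at any nesting depth,
# whether or not a <tbody> sits in between.
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in table.find_all("tr")]
print(rows)  # [['Elsie', '10'], ['Lacie', '12']]
```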
