The problem is that your <a> tag, with the <i> tag inside, doesn’t have the string attribute you expect it to have. First let’s take a look at what the text="" argument for find() does.
NOTE: The text argument is an old name; since BeautifulSoup 4.4.0 it’s called string.
From the docs:
Although string is for finding strings, you can combine it with
arguments that find tags: Beautiful Soup will find all tags whose
.string matches your value for string. This code finds the tags
whose .string is “Elsie”:
soup.find_all("a", string="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
Now let’s take a look at what Tag’s string attribute is (from the docs again):
If a tag has only one child, and that child is a NavigableString, the
child is made available as .string:
title_tag.string
# u'The Dormouse's story'
(…)
If a tag contains more than one thing, then it’s not clear what
.string should refer to, so .string is defined to be None:
print(soup.html.string)
# None
This is exactly your case. Your <a> tag contains both text and an <i> tag, so its .string is None. Therefore, find() has nothing to compare your search string against, and the tag can’t match.
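To see this for yourself, here is a minimal sketch. The shortened href and the choice of the built-in html.parser are my own assumptions for the demo, not something from your code:

```python
from bs4 import BeautifulSoup

# Simplified markup: an <a> tag holding an <i> tag plus some text.
html = '<a href="/update"><i class="fa fa-edit"></i> Edit</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# Two children (the <i> tag and the text), so .string is None.
print(link.string)

# Matching on string together with a tag name compares against .string,
# so nothing is found here.
print(soup.find("a", string="Edit"))

# get_text() still collects the text from all descendants.
print(link.get_text(strip=True))
```

The first two prints should show None, while get_text() still gives you the visible text.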
How to solve this?
Maybe there is a better solution, but I would probably go with something like this:
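import re
from bs4 import BeautifulSoup as BS

# Pass the parser explicitly to avoid the "no parser specified" warning.
soup = BS("""
<a href="https://stackoverflow.com/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
""", "html.parser")

# First narrow the search down by href, then check the text inside each link.
links = soup.find_all('a', href="https://stackoverflow.com/customer-menu/1/accounts/1/update")
thelink = None  # stays None if no link contains "Edit"
for link in links:
    # string= is the current name for text= (BeautifulSoup 4.4.0+)
    if link.find(string=re.compile("Edit")):
        thelink = link
        break
print(thelink)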
There are probably not many links pointing to /customer-menu/1/accounts/1/update, so this should be fast enough.
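Another option, if you would rather match on the visible text directly, is to pass a function to find(). This is only a sketch: the is_edit_link helper and the second "Delete" link are hypothetical, added just to show that only the right tag is picked:

```python
from bs4 import BeautifulSoup

# The "Delete" link is made up for the demo; only the first link is from the question.
html = """
<a href="https://stackoverflow.com/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
<a href="https://stackoverflow.com/customer-menu/1/accounts/1/delete">
<i class="fa fa-trash"></i> Delete
</a>
"""
soup = BeautifulSoup(html, "html.parser")


def is_edit_link(tag):
    """Hypothetical filter: an <a> tag whose combined text is exactly 'Edit'."""
    return tag.name == "a" and tag.get_text(strip=True) == "Edit"


# find() calls the function on every tag and returns the first one it accepts.
edit_link = soup.find(is_edit_link)
print(edit_link["href"])
```

This avoids relying on .string at all, because get_text() joins the text of all children. It does check every tag in the document, so narrowing down by href first may still be the cheaper route on large pages.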