BeautifulSoup getText from between , not picking up subsequent paragraphs

You are getting close!

# Find all of the text between paragraph tags and strip out the html
page = soup.find('p').getText()

Using find (as you’ve noticed) stops after finding one result. You need find_all if you want all the paragraphs. If the pages are formatted consistently ( just looked over one), you could also use something like

soup.find('div',{'id':'ctl00_PlaceHolderMain_RichHtmlField1__ControlWrapper_RichHtmlField'})

to zero in on the body of the article.

Leave a Comment