BeautifulSoup getText from between <p>, not picking up subsequent paragraphs

You are getting close!

# Find all of the text between paragraph tags and strip out the html
page = soup.find('p').getText()
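To illustrate, here is a minimal sketch that runs against a small hypothetical HTML snippet rather than the real page. It compares what find returns with what find_all returns. It uses get_text, the current name for getText, which bs4 still accepts as an alias:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: three paragraphs of body text.
html = """
<html><body>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <p>Third paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag (or None if nothing matches).
first_only = soup.find("p").get_text()

# find_all() returns a list of every matching tag, so join their text.
all_paragraphs = "\n".join(p.get_text() for p in soup.find_all("p"))

print(first_only)
print(all_paragraphs)
```

Only the first paragraph survives the find() call. The find_all() version collects all three.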

Using find (as you’ve noticed) stops after the first result. You need find_all if you want all the paragraphs. If the pages are formatted consistently (I just looked over one), you could also use something like

soup.find('div',{'id':'ctl00_PlaceHolderMain_RichHtmlField1__ControlWrapper_RichHtmlField'})

to zero in on the body of the article.
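A rough sketch of combining the two ideas, assuming the article body really sits inside that div. The markup below is hypothetical; only the div id comes from the page. The sketch scopes the search to the wrapper first and then collects every paragraph inside it, so <p> tags elsewhere on the page, such as a footer, are left out:

```python
from bs4 import BeautifulSoup

# The wrapper id quoted above; the surrounding markup is a hypothetical example.
ARTICLE_ID = "ctl00_PlaceHolderMain_RichHtmlField1__ControlWrapper_RichHtmlField"

html = f"""
<html><body>
  <div id="{ARTICLE_ID}">
    <p>Article paragraph one.</p>
    <p>Article paragraph two.</p>
  </div>
  <p>Footer text that is not part of the article.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to the article wrapper first. find() returns None if
# the id is missing, so guard against that before calling find_all().
body = soup.find("div", {"id": ARTICLE_ID})
if body is None:
    article_text = ""
else:
    # Search for <p> tags only inside the wrapper div, trimming whitespace.
    article_text = "\n".join(p.get_text(strip=True) for p in body.find_all("p"))

print(article_text)
```

If some pages use a different wrapper, the None check keeps the script from crashing with an AttributeError.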
