Extract content of <script> with BeautifulSoup

From the documentation:

As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> tags are not considered to be ‘text’, since those tags are not part of the human-visible content of the page.

So the accepted answer from falsetru above is fine, but with Beautiful Soup 4.9.0 or later use .string instead of .text, or you’ll be puzzled, as I was, by .text coming back empty for <script> tags.
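
Here is a minimal sketch of the difference, assuming Beautiful Soup 4.9.0 or later with the html.parser backend. The HTML snippet and the variable names are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one visible paragraph and one inline script.
html = """
<html><body>
<p>Visible paragraph</p>
<script>var answer = 42;</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
script = soup.find("script")

# .string returns the tag's single child string regardless of its type,
# so it still works for <script> contents.
script_source = script.string
print(script_source)

# Text extraction over the whole document only collects "human-visible"
# strings; on 4.9.0+ the script body is expected to be left out here.
page_text = soup.get_text()
print("answer" in page_text)
```

If a script tag has several children or none, .string returns None, so it is worth checking for that before parsing the result further.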
