Extracting text from an HTML file using Python

The best piece of code I found for extracting text without picking up JavaScript or other unwanted content:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"
    html = urlopen(url).read()
    soup = BeautifulSoup(html, features="html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get …
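For comparison, the same idea (keep the visible text, drop whatever sits inside script and style elements) can be sketched with only the standard library's html.parser module. This is not the BeautifulSoup approach quoted above, just a minimal dependency-free illustration of the technique. The names TextExtractor, extract_text and sample_html are made up for this example, and the inline HTML string stands in for a downloaded page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Strip every line and drop the ones that end up empty.
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def extract_text(html):
    """Return the text of an HTML string without script or style content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample document used in place of a fetched page.
sample_html = """<html><head><title>Demo</title>
<style>body { color: red; }</style>
<script>var x = "hidden";</script></head>
<body><h1>Heading</h1>
<p>First paragraph.</p></body></html>"""

print(extract_text(sample_html))
```

The parser here is deliberately simple: it does not repair badly broken markup the way BeautifulSoup does, so for real-world pages the BeautifulSoup version is probably the more robust choice.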