Python regular expression for HTML parsing

For this particular case, BeautifulSoup takes more code than a regex, but it is much more robust: it does not depend on attribute order, quoting style, or whitespace inside the tag. Since you already know which regexp to use, I'll just contribute the BeautifulSoup version 🙂

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup

# Or retrieve it from the web, etc.
html_data = open('/yourwebsite/page.html', 'r').read()

# Create the soup object from the HTML data
soup = BeautifulSoup(html_data)

# Find the proper tag. The attribute filters go in attrs, because passing
# name= as a keyword would clash with find()'s first parameter (the tag name).
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})

# Index the attribute by name instead of relying on its position in the tag
value = fooId['value']
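
If installing BeautifulSoup is not an option, roughly the same lookup can be sketched with the standard library's html.parser module. This is only a minimal sketch, not a replacement for the answer above: the class name HiddenInputFinder, the helper find_hidden_value, and the sample markup are all made up for illustration. It shows why a parser is more robust than a regex here, since the match does not depend on attribute order or quoting style.

```python
from html.parser import HTMLParser


class HiddenInputFinder(HTMLParser):
    """Hypothetical helper: remember the value of the first hidden input with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a dict makes the lookup order-independent
        attributes = dict(attrs)
        if (
            self.value is None
            and tag == "input"
            and attributes.get("type") == "hidden"
            and attributes.get("name") == self.field_name
        ):
            self.value = attributes.get("value")


def find_hidden_value(html_data, field_name):
    """Return the value attribute of the matching hidden input, or None if there is none."""
    finder = HiddenInputFinder(field_name)
    finder.feed(html_data)
    return finder.value


# Hypothetical sample markup: attribute order and quoting differ between the tags
sample = """
<form>
  <input type="text" name="fooId" value="not-this-one">
  <input value='12-3456789-1111111111' name="fooId" type='hidden'>
</form>
"""
print(find_hidden_value(sample, "fooId"))  # prints 12-3456789-1111111111
```

The text input with the same name is skipped because its type is not hidden, which is the kind of case a hand-written regex tends to get wrong.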
