Is there a built in package to parse html into dom?

I would recommend lxml. I like BeautifulSoup, but there are maintenance issues generally and compatibility issues with the later releases. I’ve been happy using lxml.


Later: the best recommendations are to use lxml, html5lib, or BeautifulSoup 3.0.8. BeautifulSoup 3.1.x is meant for python 3.x and is known to have problems with earlier python versions, as noted on the BeautifulSoup website.

Ian Bicking has a good article on using lxml.

ElementTree is a further recommendation, but I have never used it.


2012-01-18: someone has come by and decided to downvote me and Bartosz because we recommended python packages that are easily obtained but not part of the python distribution. So for the highly literal StackOverflowers: “You can use xml.dom.minidom, but no one will recommend this over the alternatives.”

Leave a Comment