Beautiful Soup and Table Scraping – lxml vs html parser

Short answer.

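If you already have lxml installed, just use it: of the three parsers compared below, it is the fastest and still lenient with malformed markup.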
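A minimal sketch of "use lxml if you have it", assuming Beautiful Soup 4 is installed. The helper name `make_soup` and its parameters are hypothetical, not part of the library; the only library behaviour relied on is that Beautiful Soup raises `FeatureNotFound` when the requested parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Parse markup with the preferred parser, falling back if it is missing.

    `make_soup` is an illustrative helper, not a Beautiful Soup API.
    """
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return BeautifulSoup(markup, fallback)


soup = make_soup("<table><tr><td>1</td><td>2</td></tr></table>")
cells = [td.get_text(strip=True) for td in soup.find_all("td")]
print(cells)
```

Naming the parser explicitly, rather than letting Beautiful Soup pick one, keeps results consistent between machines that have different parsers installed.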


html.parser: BeautifulSoup(markup, "html.parser")

  • Advantages: Batteries included, decent speed, lenient (as of Python
    2.7.3 and 3.2.2)

  • Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)

lxml: BeautifulSoup(markup, "lxml")

  • Advantages: Very fast, Lenient

  • Disadvantages: External C dependency

html5lib: BeautifulSoup(markup, "html5lib")

  • Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5

  • Disadvantages: Very slow, External Python dependency
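For the table-scraping case itself, the parser is just the second argument, so it is easy to swap. The following is a rough sketch under assumptions: the sample markup, the `HTML` constant and the `table_rows` helper are made up for illustration. It defaults to "html.parser" so it runs without extra dependencies; pass "lxml" or "html5lib" if those are installed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, used only for illustration.
HTML = """
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ada</td><td>90</td></tr>
  <tr><td>Linus</td><td>85</td></tr>
</table>
"""


def table_rows(markup, parser="html.parser"):
    """Return the first table in the markup as a list of rows of cell text.

    `table_rows` is a hypothetical helper, not a Beautiful Soup API.
    """
    soup = BeautifulSoup(markup, parser)
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(HTML)
for row in rows:
    print(row)
```

Because `find_all("tr")` searches recursively, the sketch does not depend on whether a given parser wraps the rows in an extra `<tbody>` element. A table nested inside another table would have its rows picked up too, so real pages may need a narrower search.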
