BeautifulSoup: get css classes from html

BeautifulSoup itself doesn’t parse CSS style declarations at all, but you can extract those sections and then parse them with a dedicated CSS parser.

Depending on your needs, there are several CSS parsers available for Python. I’d pick cssutils (requires Python 2.5 or later, including Python 3): it is the most complete in its support, and it handles inline styles too.

Other options are css-py and tinycss.

To grab and parse all such style sections (example with cssutils):

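import cssutils
from bs4 import BeautifulSoup

tree = BeautifulSoup(html, "html.parser")  # assumes html holds your page source as a string
sheets = []
for styletag in tree.find_all('style', type="text/css"):
    if not styletag.string:  # no inline CSS text to parse
        continue
    # parseString handles a whole stylesheet; parseStyle is only for inline style="..." values
    sheets.append(cssutils.parseString(styletag.string))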

With cssutils you can then combine these, resolve imports, and even have it fetch external stylesheets.
