Equivalent to InnerHTML when using lxml.html to parse HTML

Sorry for bringing this up again, but I’ve been looking for a solution and yours contains a bug:

<body>This text is ignored
<h1>Title</h1><p>Some text</p></body>

Text directly under the root element is ignored. I ended up doing this:

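from lxml import html
# body.text is the text before the first child; tostring() also emits each child's tail text
(body.text or '') + ''.join(
    html.tostring(child, encoding='unicode') for child in body.iterchildren())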
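
For anyone who wants to try it, here is the same idea wrapped in a small function. This is only a sketch: the name inner_html and the sample markup are my own, and it assumes lxml is installed. Passing encoding='unicode' makes tostring() return text rather than bytes, so the pieces can be joined with body.text.

```python
from lxml import html


def inner_html(element):
    """Return the markup inside `element`, without the element's own tags."""
    # element.text holds any text that comes before the first child.
    parts = [element.text or '']
    for child in element.iterchildren():
        # By default tostring() serializes the child together with its tail
        # text, so text sitting between siblings is preserved as well.
        parts.append(html.tostring(child, encoding='unicode'))
    return ''.join(parts)


# Hypothetical sample document, modelled on the snippet above.
doc = html.fromstring(
    '<html><body>This text is kept<h1>Title</h1><p>Some text</p></body></html>'
)
body = doc.find('body')
print(inner_html(body))
```

The leading text, which the original solution dropped, should now appear in front of the serialized children.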
