lxml - w3toppers.com

src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory

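Install the libxml2 and libxslt development headers. They are packaged as libxml2-devel and libxslt-devel on Fedora/RHEL; on Debian/Ubuntu use:

    sudo apt-get install libxml2-dev libxslt1-dev

After installing them, follow the steps above.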

Why does this xpath fail using lxml in python?

1. Browsers frequently change the HTML

Browsers quite frequently change the HTML served to them to make it “valid”. For example, if you serve a browser this invalid HTML:

    <table>
      <p>bad paragraph</p>
      <tr><td>Note that cells and rows can be unclosed (and valid) in HTML
    </table>

To render it, the browser is helpful and tries to …
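A practical consequence is that XPath should be written against the tree lxml actually builds from the raw markup, not against what the browser inspector shows. A minimal sketch, assuming lxml is installed; the helper name `show_lxml_tree` is hypothetical, and the exact tree libxml2 builds from invalid markup like this may differ from a browser's.

```python
import lxml.html

# The invalid table from the example above, as raw markup (no browser repairs).
RAW_HTML = """<table>
<p>bad paragraph</p>
<tr><td>Note that cells and rows can be unclosed (and valid) in HTML
</table>"""


def show_lxml_tree(raw_html):
    """Hypothetical helper: return lxml's own serialization of the parsed document."""
    root = lxml.html.document_fromstring(raw_html)
    return lxml.html.tostring(root, pretty_print=True, encoding="unicode")


serialized = show_lxml_tree(RAW_HTML)
print(serialized)

# Write XPath against lxml's tree, e.g. find the paragraph text wherever it ended up.
root = lxml.html.document_fromstring(RAW_HTML)
paragraphs = root.xpath("//p/text()")
print(paragraphs)
```

Printing the serialization first is a cheap way to spot elements the browser added or moved before debugging an XPath that "should" match.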

Extracting lxml xpath for html table

You are probably looking at the HTML in Firebug, correct? The browser inserts the implicit <tbody> tag when it is not present in the document, but lxml only processes the tags actually present in the raw HTML string. So omit the tbody level in your XPath. For example, this works:

    import lxml.html
    tree = lxml.html.fromstring(raw_html)
    tree.xpath('//table[@class="quotes"]/tr')

…
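To see the difference side by side, here is a small sketch with a made-up quotes table (the HTML string is an assumption, not from the original question). It compares the browser-style path that includes tbody, the path without it, and a descendant-axis variant that matches either way.

```python
import lxml.html

# Raw HTML as a server might send it: no explicit <tbody>.
RAW_HTML = '<table class="quotes"><tr><td>ACME</td><td>12.5</td></tr></table>'

tree = lxml.html.fromstring(RAW_HTML)

# XPath copied from a browser inspector usually includes the inserted <tbody>.
with_tbody = tree.xpath('//table[@class="quotes"]/tbody/tr')

# Omitting tbody matches what lxml actually parsed.
without_tbody = tree.xpath('//table[@class="quotes"]/tr')

# Descendant axis: matches rows whether or not a <tbody> exists.
robust = tree.xpath('//table[@class="quotes"]//tr')

print(len(with_tbody), len(without_tbody), len(robust))
```

The `//tr` form is convenient when the same code must handle pages with and without an explicit tbody, but note that it would also pick up rows of any table nested inside the quotes table.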

Equivalent to InnerHTML when using lxml.html to parse HTML

Sorry for bringing this up again, but I’ve been looking for a solution and yours contains a bug:

    <body>This text is ignored <h1>Title</h1><p>Some text</p></body>

Text directly under the root element is ignored. I ended up doing this:

    (body.text or '') + ''.join([html.tostring(child, encoding='unicode') for child in body.iterchildren()])

(On Python 3, encoding='unicode' is needed because html.tostring() returns bytes by default.)
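Wrapped as a reusable function, the same approach might look like the sketch below. The name `inner_html` is hypothetical, and escaping the leading text with `xml.sax.saxutils.escape` is an addition not present in the snippet above: without it, a literal `<` or `&` in `body.text` would come out unescaped.

```python
from xml.sax.saxutils import escape

from lxml import html


def inner_html(element):
    """Serialize an element's contents (leading text plus children), not its own tag."""
    # element.text is the text before the first child; it is None when there is none.
    # It is stored unescaped, so escape &, < and > before emitting it as markup.
    leading_text = escape(element.text or "")
    # tostring() includes each child's tail text by default, so text between
    # and after children is kept. encoding="unicode" returns str instead of bytes.
    children = "".join(html.tostring(child, encoding="unicode") for child in element)
    return leading_text + children


doc = html.document_fromstring(
    "<body>This text is ignored <h1>Title</h1><p>Some text</p></body>"
)
body = doc.find("body")
result = inner_html(body)
print(result)
```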

How to install lxml on Windows

Or you can go to Christoph Gohlke’s Python page and download the right lxml file. (Since I use Python 3.4 on Windows, I generally download lxml-3.4.4-cp34-none-win32.whl.) Go to the folder it is in. Click on the background (so nothing is selected), then hold Left Shift and right-click at the same time …
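Choosing "the right lxml file" means matching the wheel's Python tag (for example cp34) and platform tag (win32 or win_amd64) to your interpreter. A minimal sketch that prints those two fragments, assuming CPython on Windows; the function name `expected_wheel_tags` is hypothetical.

```python
import struct
import sys


def expected_wheel_tags():
    """Return (python_tag, platform_tag) fragments to look for in a Windows wheel name.

    Assumes CPython on Windows: 'cp' + major/minor version, and 'win32' for a
    32-bit interpreter or 'win_amd64' for a 64-bit one.
    """
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes: 4 on a 32-bit interpreter, 8 on a 64-bit one.
    platform_tag = "win_amd64" if struct.calcsize("P") == 8 else "win32"
    return python_tag, platform_tag


tags = expected_wheel_tags()
print(tags)  # e.g. ('cp34', 'win32') on a 32-bit Python 3.4
```

The bitness check matters because a 64-bit Windows machine can still run a 32-bit Python, and the wheel must match the interpreter, not the OS. Once you have the matching file, `pip install` followed by the wheel's filename installs it from that folder.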

How do you install lxml on OS X Leopard without using MacPorts or Fink?

Thanks to @jessenoller on Twitter I have an answer that fits my needs: you can compile lxml with static dependencies, hence avoiding messing with the libxml2 that ships with OS X. Here’s what worked for me:

    cd /tmp
    curl -O http://lxml.de/files/lxml-3.6.0.tgz
    tar -xzvf lxml-3.6.0.tgz
    cd lxml-3.6.0
    python setup.py build --static-deps --libxml2-version=2.7.3 --libxslt-version=1.1.24
    sudo python …
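After such a build, one way to confirm that lxml is using the statically linked libraries rather than the system copy is to compare the versions it reports. A short sketch using lxml's version constants; with the command above you would expect the libxml2 entries to read 2.7.3 and the libxslt entries 1.1.24, though that expectation is an assumption about your particular build.

```python
from lxml import etree

# Versions lxml reports at runtime vs. the ones it was compiled against.
# A mismatch between "runtime" and "compiled" suggests a different shared
# library is being loaded than the one lxml was built with.
versions = {
    "lxml": etree.LXML_VERSION,
    "libxml2 (runtime)": etree.LIBXML_VERSION,
    "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
    "libxslt (runtime)": etree.LIBXSLT_VERSION,
    "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
}

for name, version in versions.items():
    print(name, ".".join(map(str, version)))
```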

How to Pretty Print HTML to a file, with indentation

I ended up using BeautifulSoup directly; it is what lxml.html.soupparser uses to parse HTML. BeautifulSoup has a prettify() method that does exactly what its name says: it prettifies the HTML with proper indentation. Note that BeautifulSoup will NOT fix the HTML, so broken code remains broken. But in this case, since the code is …
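If you would rather stay within lxml, a different technique from the BeautifulSoup approach above is `etree.indent()`, available in lxml 4.5 and later, which rewrites whitespace-only text to reflect nesting depth before serializing. A minimal sketch; the function name `write_pretty_html` and the sample HTML are hypothetical.

```python
import os
import tempfile

import lxml.html
from lxml import etree

RAW_HTML = "<html><body><div><p>Hello</p><p>World</p></div></body></html>"


def write_pretty_html(raw_html, path, space="    "):
    """Parse HTML, indent it with etree.indent() (lxml >= 4.5), and write it to path."""
    root = lxml.html.document_fromstring(raw_html)
    # indent() sets whitespace-only text/tail nodes according to nesting depth;
    # elements that already contain non-whitespace text are left alone.
    etree.indent(root, space=space)
    pretty = lxml.html.tostring(root, encoding="unicode")
    with open(path, "w", encoding="utf-8") as f:
        f.write(pretty)
    return pretty


out_path = os.path.join(tempfile.mkdtemp(), "pretty.html")
pretty = write_pretty_html(RAW_HTML, out_path)
print(pretty)
```

Because indent() adds real whitespace text nodes, be careful with whitespace-sensitive content such as `<pre>` blocks or inline elements, where the extra spaces can change how the page renders.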

Installing lxml module in python

Just do:

    sudo apt-get install python-lxml

For Python 2 (e.g., required by Inkscape):

    sudo apt-get install python2-lxml

If you plan to install from source, albertov’s answer will help. But unless you have a reason to, don’t: just install it from the repository.

How to include the namespaces in an XML file using lxml?

Here is how it can be done:

    from lxml import etree

    attr_qname = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation")

    nsmap = {None: "http://www.xxxx",
             "stm": "http://xxxx/1/0/0",
             "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

    root = etree.Element("route",
                         {attr_qname: "http://xxxx/1/0/0 stm_extensions.xsd"},
                         version="1.1",
                         nsmap=nsmap)

    print(etree.tostring(root, encoding="unicode"))

Output from this code (line breaks have been added for readability):

    <route xmlns:stm="http://xxxx/1/0/0"
           xmlns="http://www.xxxx"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://xxxx/1/0/0 stm_extensions.xsd"
           version="1.1"/>

The main “trick” is …
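An equivalent way to write this uses Clark notation (`"{uri}local"`) strings instead of `etree.QName`. The sketch below also puts the root tag itself in the default namespace and adds a child in the stm namespace to show lxml reusing the prefix declared on the root. The namespace URIs are the placeholders from the answer; the child name `extension` is hypothetical.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
DEFAULT_NS = "http://www.xxxx"  # placeholder namespace URIs from the answer
STM = "http://xxxx/1/0/0"

nsmap = {None: DEFAULT_NS, "stm": STM, "xsi": XSI}

# "{uri}local" places the element itself in the default namespace, so the tree
# matches what a parser would produce when reading the serialized XML back.
root = etree.Element("{%s}route" % DEFAULT_NS, nsmap=nsmap)
root.set("version", "1.1")
# Same effect as etree.QName(XSI, "schemaLocation"): lxml uses the xsi: prefix from nsmap.
root.set("{%s}schemaLocation" % XSI, STM + " stm_extensions.xsd")

# A child in the stm namespace reuses the stm: prefix declared on the root.
etree.SubElement(root, "{%s}extension" % STM)  # hypothetical child element name

xml = etree.tostring(root, pretty_print=True, encoding="unicode")
print(xml)
```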

python – lxml: enforcing a specific order for attributes

It looks like lxml serializes attributes in the order you set them:

    >>> from lxml import etree as ET
    >>> x = ET.Element("x")
    >>> x.set('a', '1')
    >>> x.set('b', '2')
    >>> ET.tostring(x)
    '<x a="1" b="2"/>'
    >>> y = ET.Element("y")
    >>> y.set('b', '2')
    >>> y.set('a', '1')
    >>> ET.tostring(y)
    '<y b="2" a="1"/>'

Note that when you pass attributes using …
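Relying only on the set()-order behavior shown in that session, an existing element's attributes can be forced into a specific order by clearing and re-setting them. A minimal sketch; `reorder_attributes` is a hypothetical helper, and keep in mind that XML itself treats attribute order as insignificant, so this only affects the serialized text.

```python
from lxml import etree


def reorder_attributes(element, order):
    """Re-set an element's attributes so they serialize in the given order.

    Names listed in `order` come first (if present); any remaining attributes
    keep their existing relative order after them.
    """
    current = list(element.attrib.items())
    present = dict(current)
    ordered = [(name, present[name]) for name in order if name in present]
    rest = [(name, value) for name, value in current if name not in order]
    element.attrib.clear()
    for name, value in ordered + rest:
        element.set(name, value)
    return element


el = etree.Element("x")
el.set("a", "1")
el.set("b", "2")
el.set("c", "3")
reorder_attributes(el, ["c", "a"])
print(etree.tostring(el))  # b'<x c="3" a="1" b="2"/>'
```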