src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
Install the libxml2 and libxslt development headers with: sudo apt-get install libxml2-dev libxslt1-dev. After installing them, follow the steps above.
1. Browsers frequently change the HTML. Browsers quite frequently change the HTML served to them to make it “valid”. For example, if you serve a browser this invalid HTML: <table> <p>bad paragraph</p> <tr><td>Note that cells and rows can be unclosed (and valid) in HTML </table> To render it, the browser is helpful and tries to …
You are probably looking at the HTML in Firebug, correct? The browser inserts the implicit <tbody> tag when it is not present in the document. The lxml library only processes the tags present in the raw HTML string. Omit the tbody level in your XPath. For example, this works: tree = lxml.html.fromstring(raw_html) tree.xpath('//table[@class="quotes"]/tr') …
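A minimal sketch of the tbody point above, assuming lxml is installed. The table content is invented for illustration; only the `quotes` class name comes from the answer.

```python
import lxml.html

# Hypothetical raw HTML as a server might send it: no <tbody> in the markup.
raw_html = (
    '<table class="quotes">'
    "<tr><td>AAPL</td><td>101.5</td></tr>"
    "<tr><td>MSFT</td><td>47.2</td></tr>"
    "</table>"
)

tree = lxml.html.fromstring(raw_html)

# XPath copied from a browser inspector usually includes tbody and finds nothing here.
rows_with_tbody = tree.xpath('//table[@class="quotes"]/tbody/tr')

# Omitting the tbody level matches the rows that are actually in the source.
rows_without_tbody = tree.xpath('//table[@class="quotes"]/tr')

# Using // below the table works whether or not a tbody is present.
rows_any_depth = tree.xpath('//table[@class="quotes"]//tr')

print(len(rows_with_tbody), len(rows_without_tbody), len(rows_any_depth))
```

The third form is a reasonable default when the same XPath has to handle pages that sometimes include an explicit tbody.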
Sorry for bringing this up again, but I’ve been looking for a solution and yours contains a bug. Given <body>This text is ignored <h1>Title</h1><p>Some text</p></body>, the text directly under the root element is ignored. I ended up doing this: (body.text or '') + ''.join([html.tostring(child, encoding='unicode') for child in body.iterchildren()])
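The one-liner above can be wrapped in a small helper. This is a sketch, not the original poster's exact code: the name `inner_html` is hypothetical, and `encoding="unicode"` is used so the pieces are strings rather than bytes on Python 3.

```python
from lxml import html


def inner_html(element):
    """Return the element's leading text followed by its serialized children."""
    # element.text is the text before the first child; it is None when absent.
    parts = [element.text or ""]
    for child in element.iterchildren():
        # tostring() includes each child's tail text by default,
        # so text between sibling elements is preserved as well.
        parts.append(html.tostring(child, encoding="unicode"))
    return "".join(parts)


doc = html.document_fromstring(
    "<html><body>This text is kept <h1>Title</h1><p>Some text</p></body></html>"
)
result = inner_html(doc.body)
print(result)
```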
Or you can go to Christoph Gohlke’s Python page and download the right lxml file. (Since I use Python 3.4 on Windows, I download lxml-3.4.4-cp34-none-win32.whl.) Go to the folder it is in. Click in the background (so nothing is selected), then press left Shift and right-click at the same time …
Thanks to @jessenoller on Twitter I have an answer that fits my needs: you can compile lxml with static dependencies, which avoids messing with the libxml2 that ships with OS X. Here’s what worked for me: cd /tmp curl -O http://lxml.de/files/lxml-3.6.0.tgz tar -xzvf lxml-3.6.0.tgz cd lxml-3.6.0 python setup.py build --static-deps --libxml2-version=2.7.3 --libxslt-version=1.1.24 sudo python …
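After a build like this, it can help to check which libxml2 and libxslt versions the installed lxml actually uses. The sketch below relies on the version tuples exposed by `lxml.etree`; the helper name `format_version` is hypothetical.

```python
from lxml import etree


def format_version(version_tuple):
    """Turn a version tuple such as (2, 7, 3) into the string '2.7.3'."""
    return ".".join(str(part) for part in version_tuple)


versions = {
    "lxml": format_version(etree.LXML_VERSION),
    "libxml2 (running)": format_version(etree.LIBXML_VERSION),
    "libxml2 (compiled against)": format_version(etree.LIBXML_COMPILED_VERSION),
    "libxslt (running)": format_version(etree.LIBXSLT_VERSION),
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If the static build worked, the libxml2 lines should show the version passed to setup.py rather than the system library's version.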
I ended up using BeautifulSoup directly. It is what lxml.html.soupparser uses for parsing HTML. BeautifulSoup has a prettify method that does exactly what its name says: it prettifies the HTML with proper indentation. BeautifulSoup will NOT fix the HTML, so broken code remains broken. But in this case, since the code is …
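A minimal sketch of the prettify approach, assuming the `bs4` package (BeautifulSoup 4) is installed. The sample markup and the choice of the built-in `"html.parser"` backend are assumptions; the answer does not say which parser was used.

```python
from bs4 import BeautifulSoup

# Hypothetical single-line markup to be re-indented.
raw_html = "<div><p>First</p><p>Second</p></div>"

# "html.parser" is Python's built-in parser; other backends may
# produce slightly different trees for the same input.
soup = BeautifulSoup(raw_html, "html.parser")

# prettify() returns the document as a string with one tag per line
# and nested tags indented.
pretty = soup.prettify()
print(pretty)
```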
Just do: sudo apt-get install python-lxml. For Python 2 (e.g., required by Inkscape): sudo apt-get install python2-lxml. If you are planning to install from source, then albertov’s answer will help. But unless you have a reason to, don’t; just install it from the repository.
Here is how it can be done: from lxml import etree attr_qname = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation") nsmap = {None: "http://www.xxxx", "stm": "http://xxxx/1/0/0", "xsi": "http://www.w3.org/2001/XMLSchema-instance"} root = etree.Element("route", {attr_qname: "http://xxxx/1/0/0 stm_extensions.xsd"}, version="1.1", nsmap=nsmap) print(etree.tostring(root).decode()) Output from this code (line breaks have been added for readability): <route xmlns:stm="http://xxxx/1/0/0" xmlns="http://www.xxxx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xxxx/1/0/0 stm_extensions.xsd" version="1.1"/> The main “trick” is …
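As a follow-up sketch, the namespaced attribute can be read back using its fully qualified name in `{namespace}localname` form, which `QName.text` provides. The placeholder URIs are the ones from the answer; the variable names `XSI`, `clark_name`, and `value` are illustrative.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# QName pairs a namespace URI with a local name.
attr_qname = etree.QName(XSI, "schemaLocation")

root = etree.Element(
    "route",
    {attr_qname: "http://xxxx/1/0/0 stm_extensions.xsd"},
    nsmap={"xsi": XSI},
)

# .text is the "{namespace}localname" form that lxml uses for attribute lookups.
clark_name = attr_qname.text
value = root.get(clark_name)

print(clark_name)
print(value)
print(etree.tostring(root).decode())
```

The prefix (`xsi`) only affects serialization; lookups always go through the namespace URI.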
It looks like lxml serializes attributes in the order you set them: >>> from lxml import etree as ET >>> x = ET.Element("x") >>> x.set('a', '1') >>> x.set('b', '2') >>> ET.tostring(x) '<x a="1" b="2"/>' >>> y = ET.Element("y") >>> y.set('b', '2') >>> y.set('a', '1') >>> ET.tostring(y) '<y b="2" a="1"/>' Note that when you pass attributes using …
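Building on the observation that attributes serialize in the order they are set, one way to get a predictable order is to set them one by one in the order you want. This is a sketch of that idea, with a hypothetical `attrs` dictionary, not a documented guarantee beyond the behavior shown above.

```python
from lxml import etree

# Hypothetical attributes, deliberately listed out of alphabetical order.
attrs = {"b": "2", "a": "1"}

elem = etree.Element("x")

# Set attributes in sorted name order so the serialized output follows that order.
for name in sorted(attrs):
    elem.set(name, attrs[name])

serialized = etree.tostring(elem)
print(serialized)
```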