Extracting lxml xpath for html table

You are probably looking at the HTML in Firebug, correct? The browser will insert the implicit tag <tbody> when it is not present in the document. The lxml library will only process the tags present in the raw HTML string. Omit the tbody level in your XPath. For example, this works:

    tree = lxml.html.fromstring(raw_html)
    tree.xpath('//table[@class="quotes"]/tr')

…
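
To make the tbody point concrete, here is a minimal sketch, assuming a small made-up table (the markup and cell values are hypothetical, not from the original question). It compares an XPath with and without the tbody step against HTML that never contained that tag, and adds a more tolerant variant that works either way.

```python
import lxml.html

# Hypothetical markup: a table written without <tbody>, as it might appear
# in the raw page source (not in the browser's DOM inspector).
raw_html = """
<html><body>
<table class="quotes">
  <tr><td>AAA</td><td>1.23</td></tr>
  <tr><td>BBB</td><td>4.56</td></tr>
</table>
</body></html>
"""

tree = lxml.html.fromstring(raw_html)

# XPath that mirrors the raw HTML: rows are direct children of the table.
rows_without_tbody = tree.xpath('//table[@class="quotes"]/tr')

# XPath copied from the browser's view: finds nothing, because lxml does not
# add the <tbody> element that the browser shows.
rows_with_tbody = tree.xpath('//table[@class="quotes"]/tbody/tr')

# Tolerant variant: matches rows whether or not the source contains <tbody>.
rows_any_depth = tree.xpath('//table[@class="quotes"]//tr')

# Pull the cell text out of each matched row.
cells = [[td.text_content() for td in row.xpath('./td')]
         for row in rows_without_tbody]
print(len(rows_without_tbody), len(rows_with_tbody), len(rows_any_depth))
print(cells)
```

The `//tr` form is convenient when you cannot tell in advance whether the page author wrote tbody explicitly, though it will also pick up rows of any nested tables.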

How do you install lxml on OS X Leopard without using MacPorts or Fink?

Thanks to @jessenoller on Twitter I have an answer that fits my needs: you can compile lxml with static dependencies, which avoids touching the libxml2 that ships with OS X. Here’s what worked for me:

    cd /tmp
    curl -O http://lxml.de/files/lxml-3.6.0.tgz
    tar -xzvf lxml-3.6.0.tgz
    cd lxml-3.6.0
    python setup.py build --static-deps --libxml2-version=2.7.3 --libxslt-version=1.1.24
    sudo python …

How to include the namespaces into a xml file using lxml?

Here is how it can be done:

    from lxml import etree

    attr_qname = etree.QName("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation")
    nsmap = {None: "http://www.xxxx",
             "stm": "http://xxxx/1/0/0",
             "xsi": "http://www.w3.org/2001/XMLSchema-instance"}

    root = etree.Element("route",
                         {attr_qname: "http://xxxx/1/0/0 stm_extensions.xsd"},
                         version="1.1",
                         nsmap=nsmap)

    print etree.tostring(root)

Output from this code (line breaks have been added for readability):

    <route xmlns:stm="http://xxxx/1/0/0"
           xmlns="http://www.xxxx"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://xxxx/1/0/0 stm_extensions.xsd"
           version="1.1"/>

The main "trick" is …
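
For readers on Python 3, the same idea can be sketched as follows. This is an illustration under assumptions rather than part of the original answer: the child element name `ext` is hypothetical, and the bytes returned by `etree.tostring` are decoded before printing.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
STM = "http://xxxx/1/0/0"

# Prefix-to-URI map; the None key declares the default namespace.
nsmap = {None: "http://www.xxxx", "stm": STM, "xsi": XSI}

# A namespace-qualified attribute name, i.e. xsi:schemaLocation.
attr_qname = etree.QName(XSI, "schemaLocation")

root = etree.Element(
    "route",
    {attr_qname: "http://xxxx/1/0/0 stm_extensions.xsd"},
    version="1.1",
    nsmap=nsmap,
)

# Hypothetical child in the stm namespace; it reuses the prefix that the
# root element already declares.
etree.SubElement(root, "{%s}ext" % STM)

# tostring() returns bytes on Python 3, so decode before printing.
xml_text = etree.tostring(root).decode("utf-8")
print(xml_text)

# The qualified attribute can be read back using Clark notation ({uri}name).
schema_location = root.get(attr_qname.text)
```

Declaring every prefix once in the root's `nsmap` keeps the descendants free of repeated xmlns declarations.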

python – lxml: enforcing a specific order for attributes

It looks like lxml serializes attributes in the order you set them:

    >>> from lxml import etree as ET
    >>> x = ET.Element("x")
    >>> x.set('a', '1')
    >>> x.set('b', '2')
    >>> ET.tostring(x)
    '<x a="1" b="2"/>'
    >>> y = ET.Element("y")
    >>> y.set('b', '2')
    >>> y.set('a', '1')
    >>> ET.tostring(y)
    '<y b="2" a="1"/>'

Note that when you pass attributes using …
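
Building on that observation, one way to impose an order on an existing element is to clear its attributes and set them again in the sequence you want. The sketch below assumes Python 3 (where `tostring` returns bytes) and relies on the set-order behaviour shown above; `reorder_attributes` is a hypothetical helper, not an lxml function.

```python
from lxml import etree as ET


def reorder_attributes(element, order):
    """Hypothetical helper: re-set attributes so names in `order` come first.

    Attributes not listed in `order` keep their relative order and follow
    the listed ones.
    """
    current = dict(element.attrib)   # snapshot of the existing attributes
    element.attrib.clear()           # drop them all from the element
    for name in order:
        if name in current:
            element.set(name, current.pop(name))
    for name, value in current.items():
        element.set(name, value)


y = ET.Element("y")
y.set("b", "2")
y.set("a", "1")

before = ET.tostring(y)              # b'<y b="2" a="1"/>'
reorder_attributes(y, ["a", "b"])
after = ET.tostring(y)               # b'<y a="1" b="2"/>'
print(before, after)
```

Keep in mind that attribute order carries no meaning in XML, so this only matters for consumers that compare serialized text.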