lxml - w3toppers.com

lxml: add namespace to input file

Modifying the namespace mapping of a node is not possible in lxml. See this open ticket, which lists this feature as a wishlist item. It originated from this thread on the lxml mailing list, where replacing the root node is suggested as a workaround. There are some issues with replacing the root node …
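Here is a minimal sketch of the root-replacement workaround. It assumes the goal is to add one prefix/URI pair to the root's namespace map. The helper name `add_namespace` is hypothetical, and this is not the code from the mailing list thread. The helper builds a new root element with the extended `nsmap`, copies the attributes and text, and moves the children across.

```python
from lxml import etree


def add_namespace(root, prefix, uri):
    """Return a new root element whose nsmap also declares prefix -> uri.

    Hypothetical helper: lxml cannot change an existing element's nsmap,
    so we build a replacement element and move the children into it.
    """
    nsmap = dict(root.nsmap)
    nsmap[prefix] = uri
    new_root = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=nsmap)
    new_root.text = root.text
    # list() so we don't iterate over root while its children are being moved
    new_root.extend(list(root))
    return new_root


root = etree.fromstring('<root xmlns:a="urn:a" id="1"><a:child>x</a:child></root>')
new_root = add_namespace(root, "b", "urn:b")
print(etree.tostring(new_root))
```

This approach has some limits. Any code still holding the old root now sees an empty element. Document-level comments or processing instructions next to the old root are also not carried over. If you need a full document to write out, wrap the result with `etree.ElementTree(new_root)`.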

Parsing broken XML with lxml.etree.iterparse

Edit: This is an older answer, and I would do it differently today. And I'm not just referring to the dumb snark … since then, BeautifulSoup4 has become available, and it's really quite nice. I recommend it to anyone who stumbles over here. The currently accepted answer is, well, not what one should do. The …
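Here is a minimal sketch of the BeautifulSoup4 route, assuming both `bs4` and `lxml` are installed. The `"xml"` feature makes Beautiful Soup use lxml's XML parser, which it runs in recovery mode. As a result, a document with a missing closing tag should still yield a usable tree. The function name `parse_broken_xml` is hypothetical.

```python
from bs4 import BeautifulSoup


def parse_broken_xml(markup):
    """Parse possibly malformed XML with Beautiful Soup's lxml-backed "xml" parser."""
    # Requires lxml; without it bs4 raises FeatureNotFound for the "xml" feature
    return BeautifulSoup(markup, "xml")


# The closing </root> tag is missing
soup = parse_broken_xml("<root><item>1</item><item>2</item>")
print([item.get_text() for item in soup.find_all("item")])
```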

Beautiful Soup and Table Scraping – lxml vs html parser

Short answer: if you already have lxml installed, just use it.

- html.parser: BeautifulSoup(markup, "html.parser")
  Advantages: batteries included, decent speed, lenient (as of Python 2.7.3 and 3.2)
  Disadvantages: not very lenient (before Python 2.7.3 or 3.2.2)
- lxml: BeautifulSoup(markup, "lxml")
  Advantages: very fast, lenient
  Disadvantages: external C dependency
- html5lib: BeautifulSoup(markup, "html5lib")
  Advantages: extremely lenient, parses …
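Following the short answer, here is a small sketch that uses lxml when it is installed and otherwise falls back to the batteries-included html.parser. It then pulls the cell text out of each table row. The helper names `make_soup` and `table_rows` are hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Prefer the fast lxml parser; fall back to the stdlib html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


def table_rows(markup):
    """Return the stripped text of every th/td cell, row by row."""
    soup = make_soup(markup)
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")
    ]


html = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>apple</td><td>3</td></tr></table>"
print(table_rows(html))  # [['Name', 'Qty'], ['apple', '3']]
```

The parsers can build different trees from the same invalid markup. On messy real-world pages, the result may therefore depend on which branch of `make_soup` runs.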

Why is lxml.etree.iterparse() eating up all my memory?

As iterparse iterates over the entire file, a tree is built and no elements are freed. The advantage of this is that the elements remember who their parent is, and you can form XPaths that refer to ancestor elements. The disadvantage is that it can consume a lot of memory. In order to free …
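The excerpt stops before the fix. One widely used pattern is to clear each element once it has been processed, then delete the already-processed siblings that the parent still holds. This is a sketch, not necessarily what the full answer does. The `record` tag name and the choice to yield only each element's text are assumptions for illustration.

```python
import io

from lxml import etree


def iter_record_text(source, tag):
    """Yield the text of each <tag> element without letting the tree grow."""
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        yield elem.text
        # Runs after the caller has consumed the yielded value:
        # free the element's own children and text
        elem.clear()
        # The parent still references earlier siblings; delete them too
        while elem.getprevious() is not None:
            del elem.getparent()[0]


data = b"<root>" + b"".join(b"<record>%d</record>" % i for i in range(5)) + b"</root>"
print(list(iter_record_text(io.BytesIO(data), "record")))  # ['0', '1', '2', '3', '4']
```

The trade-off is the one described above. Once siblings are deleted, XPath expressions that look back at earlier parts of the document no longer see them.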

Getting errors when importing lxml.etree in Python

I had the same problem. If you installed it with pip like this:

pip install lxml

try this instead:

STATIC_DEPS=true pip install lxml

This solved the problem for me. Found at this website.
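To confirm that `import lxml.etree` works after reinstalling, a quick check like the following prints the lxml version and the libxml2/libxslt versions it reports. The version constants are part of `lxml.etree`; the helper name `lxml_versions` is hypothetical. With a static build, the "compiled against" and "loaded at runtime" versions would normally match.

```python
from lxml import etree


def lxml_versions():
    """Collect the version tuples exposed by lxml.etree."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
        "libxml2 (loaded at runtime)": etree.LIBXML_VERSION,
        "libxslt (compiled against)": etree.LIBXSLT_COMPILED_VERSION,
        "libxslt (loaded at runtime)": etree.LIBXSLT_VERSION,
    }


for name, version in lxml_versions().items():
    print(name, ".".join(map(str, version)))
```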

Installing lxml for Python 3.4 on Windows x86 (32-bit) with Visual Studio C++ 2010 Express

I also ran into this problem, and the workarounds provided above did not work for me either. My system configuration: Win7 64-bit, Python 3.3, Visual Studio 2013. I tried the method in the first link under Related questions, but it failed. That method is to create a system variable for vs2010 …

Why doesn’t xpath work when processing an XHTML document with lxml (in python)?

The problem is the namespaces. When parsed as XML, the img tag is in the http://www.w3.org/1999/xhtml namespace, since that is the default namespace for the element. You are asking for the img tag in no namespace. Try this:

>>> tree.getroot().xpath(
...     "//xhtml:img",
...     namespaces={'xhtml': 'http://www.w3.org/1999/xhtml'}
... )
[<Element {http://www.w3.org/1999/xhtml}img at 11a29e0>]
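Here is a self-contained version of the same idea, using a small inline XHTML document in place of the asker's file. The bare `//img` query finds nothing because `img` is in the XHTML namespace. The prefixed query with a `namespaces` mapping finds it. The prefix `xhtml` is arbitrary; it only has to match the key in the mapping.

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOC = b'<html xmlns="http://www.w3.org/1999/xhtml"><body><img src="a.png"/></body></html>'

root = etree.fromstring(DOC)
# No-namespace query: matches nothing, since img inherits the XHTML default namespace
print(root.xpath("//img"))  # []
# Bind a prefix to the XHTML namespace and use it in the expression
imgs = root.xpath("//xhtml:img", namespaces={"xhtml": XHTML_NS})
print([img.get("src") for img in imgs])  # ['a.png']
```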

Installing lxml, libxml2, libxslt on Windows 8.1

I was able to fix the installation with the following steps; I hope others find this helpful. My installation of pip was working fine before the problem. I went to the Windows command line and made sure that wheel was installed:

C:\Python34>python -m pip install wheel
Requirement already satisfied (use --upgrade to upgrade): wheel in …

Pretty print in lxml is failing when I add tags to a parsed tree

It has to do with how lxml treats whitespace; see the lxml FAQ for details. To fix this, change the loading part of the file to the following:

parser = etree.XMLParser(remove_blank_text=True)
root = etree.parse('file.xml', parser).getroot()

I didn't test it, but with this change it should indent your file just fine.
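Here is a sketch of the whole round trip, assuming the input is already indented. Parsing with `remove_blank_text=True` drops the whitespace-only text nodes, which leaves `pretty_print=True` free to re-indent the tree, including the newly added element. The element name `new` and the helper name `add_and_pretty_print` are placeholders.

```python
from lxml import etree


def add_and_pretty_print(xml_bytes):
    """Parse without blank text, append a child, and serialize with indentation."""
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_bytes, parser)
    etree.SubElement(root, "new").text = "added"
    return etree.tostring(root, pretty_print=True).decode()


src = b"<root>\n  <old>1</old>\n</root>\n"
print(add_and_pretty_print(src))
# <root>
#   <old>1</old>
#   <new>added</new>
# </root>
```

Without the parser option, the original whitespace text nodes stay in the tree, and the serializer does not re-indent around them.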

Python: Using xpath locally / on a specific element

Your XPath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e.:

links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
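To see the difference, here is a small sketch with two tables. The markup is made up for illustration, and the example.com URLs match the answer's filter. Called on the second table, the absolute expression still returns links from the whole document, while the relative one finds only the link inside that table.

```python
from lxml import etree

PAGE = b"""<div>
<table id="t1"><tr><td><a href="http://www.example.com/filter/a">A</a></td></tr></table>
<table id="t2"><tr><td><a href="http://www.example.com/filter/b">B</a></td></tr></table>
</div>"""

root = etree.fromstring(PAGE)
table = root.xpath("//table[@id='t2']")[0]

# Absolute: "//" starts at the document root, so both tables' links match
absolute = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
# Relative: ".//" searches only below the current element
relative = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
print(len(absolute), [a.text for a in relative])  # 2 ['B']
```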