libxml install error using pip

Make sure the development packages of libxml2 and libxslt are installed. From the lxml documentation, assuming you are running a Debian-based distribution:

    sudo apt-get install libxml2-dev libxslt-dev python-dev

On Debian-based systems, it should also be enough to install the known build dependencies of python-lxml or python3-lxml, e.g.:

    sudo apt-get build-dep python3-lxml

How to install lxml on Ubuntu

Since you’re on Ubuntu, don’t bother with those source packages. Just install the development packages using apt-get:

    apt-get install libxml2-dev libxslt1-dev python-dev

If you’re happy with a possibly older version of lxml, though, you could instead try

    apt-get install python-lxml

and be done with it. 🙂
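Whichever route you take, it can help to confirm what actually got installed. The following is only a minimal sketch: the helper name lxml_status is made up for illustration, and it relies on the version tuples that lxml.etree exposes (LXML_VERSION, LIBXML_VERSION, LIBXSLT_VERSION). It reports rather than raises when lxml is missing.

```python
import importlib.util


def lxml_status():
    """Return a one-line description of the lxml installation (hypothetical helper)."""
    # find_spec returns None when the package cannot be located at all.
    if importlib.util.find_spec("lxml") is None:
        return "lxml is not installed"
    try:
        from lxml import etree
    except ImportError as exc:
        # The package directory exists but the compiled extension failed to load.
        return "lxml is installed but failed to import: {}".format(exc)

    def dotted(version):
        # Version constants are tuples of integers, e.g. (2, 9, 14).
        return ".".join(str(part) for part in version)

    return "lxml {} (libxml2 {}, libxslt {})".format(
        dotted(etree.LXML_VERSION),
        dotted(etree.LIBXML_VERSION),
        dotted(etree.LIBXSLT_VERSION),
    )


print(lxml_status())
```

If the distribution package gives you an older lxml than you expected, the first number in the output is where it will show.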

Using Python Iterparse For Large XML Files

Try Liza Daly’s fast_iter. After processing an element, elem, it calls elem.clear() to remove descendants and also removes preceding siblings.

    def fast_iter(context, func, *args, **kwargs):
        """
        http://lxml.de/parsing.html#modifying-the-tree
        Based on Liza Daly's fast_iter
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants …
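To see the clear-as-you-go idea run end to end, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml, so it is not Liza Daly's fast_iter itself. ElementTree has no getprevious() or getparent(), so this variant keeps a reference to the root element and clears it after each record. The helper name process_records and the sample catalog data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def process_records(source, tag, func):
    """Call func(elem) for each <tag> element, freeing parsed data as we go.

    Hypothetical stdlib-only helper, not part of lxml or ElementTree.
    """
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            func(elem)
            count += 1
            # The record has been handled, so detach everything the root
            # has accumulated so far and let it be garbage-collected.
            root.clear()
    return count


# Made-up sample data standing in for a large file on disk.
sample = io.BytesIO(
    b"<catalog>"
    b"<book id='1'><title>A</title></book>"
    b"<book id='2'><title>B</title></book>"
    b"<book id='3'><title>C</title></book>"
    b"</catalog>"
)

titles = []
n = process_records(sample, "book", lambda e: titles.append(e.findtext("title")))
print(n, titles)  # prints: 3 ['A', 'B', 'C']
```

Note that root.clear() also drops the root element's own attributes and text, so read anything you need from the root before the loop starts.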