Is there a way to get a line number from an ElementTree Element

It took me a while to work out how to do this in Python 3.x (3.3.2 here), so I thought I would summarize:

    import sys

    # Force the pure-Python XML parser instead of the faster C accelerator,
    # because we can't hook the C implementation
    sys.modules['_elementtree'] = None
    import xml.etree.ElementTree as ET

    class LineNumberingParser(ET.XMLParser):
        def _start_list(self, *args, **kwargs):
            # Here …
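The excerpt above is cut off, so the following is a separate sketch rather than the answer's own code. It swaps in a different technique: instead of subclassing XMLParser and blocking the C accelerator, it drives `xml.parsers.expat` directly and feeds a standard `TreeBuilder`. The names `LineElement`, `parse_with_lines`, and the `line` attribute are hypothetical, chosen only for illustration, and namespace handling is deliberately left out.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class LineElement(ET.Element):
    """Python-level Element subclass, so instances accept an extra attribute."""


def parse_with_lines(xml_text):
    """Parse xml_text and record each element's start line in `elem.line`."""
    builder = ET.TreeBuilder(element_factory=LineElement)
    # No namespace separator is passed, so prefixed tags stay as written.
    parser = expat.ParserCreate()

    def on_start(tag, attrs):
        elem = builder.start(tag, attrs)
        # expat reports the line on which the current start tag begins.
        elem.line = parser.CurrentLineNumber

    parser.StartElementHandler = on_start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)
    return builder.close()


doc = "<root>\n  <a/>\n  <b>\n    <c/>\n  </b>\n</root>"
root = parse_with_lines(doc)
lines = [(elem.tag, elem.line) for elem in root.iter()]
```

Because only public expat and TreeBuilder calls are used, this should not depend on which ElementTree implementation (C or pure Python) is loaded.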

Converting xml to dictionary using ElementTree

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification":

    from collections import defaultdict

    def etree_to_dict(t):
        d = {t.tag: {} if t.attrib else None}
        children = list(t)
        if children:
            dd = defaultdict(list)
            for dc in map(etree_to_dict, children):
                for k, v in dc.items():
                    dd[k].append(v)
            d = {t.tag: {k: v[0] if len(v) == …
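Since the function above is truncated, here is an independent, simplified sketch of the same idea. It is not a reconstruction of the original code. The conventions it uses (attributes prefixed with `@`, mixed text stored under `#text`, repeated child tags collected into a list) are assumptions made for this example, and the name `element_to_dict` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_to_dict(elem):
    """Convert an Element into a nested dict (simplified, assumed conventions)."""
    node = {}
    # Attributes are stored under "@name" keys.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Group children by tag so repeated tags become a list.
    grouped = defaultdict(list)
    for child in elem:
        grouped[child.tag].append(element_to_dict(child)[child.tag])
    for tag, items in grouped.items():
        node[tag] = items[0] if len(items) == 1 else items
    # Text becomes the value itself, or "#text" when other keys exist.
    text = (elem.text or "").strip()
    if text:
        if node:
            node["#text"] = text
        else:
            node = text
    return {elem.tag: node or None}


sample = ET.fromstring('<root id="1"><item>a</item><item>b</item><name>x</name><empty/></root>')
result = element_to_dict(sample)
```

An element with no attributes, children, or text maps to `None` in this sketch; tail text is ignored.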

Faithfully Preserve Comments in Parsed XML

Tested with Python 2.7 and 3.5, the following code should work as intended.

    #!/usr/bin/env python
    # CommentedTreeBuilder.py
    from xml.etree import ElementTree

    class CommentedTreeBuilder(ElementTree.TreeBuilder):
        def comment(self, data):
            self.start(ElementTree.Comment, {})
            self.data(data)
            self.end(ElementTree.Comment)

Then, in the main code, use

    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())

as the parser instead of the current one. By the way, comments work correctly out of …
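A short usage sketch for the builder described above, assuming a small inline document (the sample XML and variable names are made up for illustration). It parses a string containing a comment and serializes it again to check that the comment survives the round trip.

```python
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        # Store the comment as a node whose tag is the Comment factory.
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
root = ElementTree.fromstring("<root><!-- keep me --><child/></root>", parser=parser)
serialized = ElementTree.tostring(root, encoding="unicode")
```

On Python 3.8 and later, `ElementTree.TreeBuilder(insert_comments=True)` is a built-in alternative that may make the subclass unnecessary.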

lxml etree xmlparser remove unwanted namespace

    import io
    import lxml.etree as ET

    # bytes literal, so that io.BytesIO accepts it on Python 3
    content = b'''\
    <Envelope xmlns="http://www.example.com/zzz/yyy">
      <Header>
        <Version>1</Version>
      </Header>
      <Body>
        some stuff
      </Body>
    </Envelope>
    '''
    dom = ET.parse(io.BytesIO(content))

You can find namespace-aware nodes using the xpath method:

    body = dom.xpath('//ns:Body', namespaces={'ns': 'http://www.example.com/zzz/yyy'})
    print(body)
    # [<Element {http://www.example.com/zzz/yyy}Body at 90b2d4c>]

If you really want to remove namespaces, you could use an XSL transformation:

    # http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
    xslt = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> …

How to write XML declaration using xml.etree.ElementTree

I am surprised to find that there doesn't seem to be a way with ElementTree.tostring(). You can, however, use ElementTree.ElementTree.write() to write your XML document to a fake file:

    from io import BytesIO
    from xml.etree import ElementTree as ET

    document = ET.Element('outer')
    node = ET.SubElement(document, 'inner')
    et = ET.ElementTree(document)

    f = BytesIO()
    et.write(f, encoding='utf-8', xml_declaration=True) …
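A self-contained sketch of the fake-file approach described above, showing how the written bytes can be read back from the buffer. The variable names `buffer` and `xml_bytes` are illustrative only.

```python
from io import BytesIO
from xml.etree import ElementTree as ET

document = ET.Element("outer")
ET.SubElement(document, "inner")

# Write into an in-memory buffer instead of a real file.
buffer = BytesIO()
ET.ElementTree(document).write(buffer, encoding="utf-8", xml_declaration=True)

# The declaration is the first line of the serialized bytes.
xml_bytes = buffer.getvalue()
```

On Python 3.8 and later, `ET.tostring()` also accepts `xml_declaration=True`, which may remove the need for the buffer.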

parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml)

You are using the decoded unicode value. Use the r.raw raw response data instead:

    r = requests.get(url, params=payload, stream=True)
    r.raw.decode_content = True
    etree.parse(r.raw)

which will read the data from the response directly; do note the stream=True option to .get(). Setting the r.raw.decode_content = True flag ensures that the raw socket will give you the decompressed content …

ElementTree and unicode

You may have stumbled upon this problem while using Requests (HTTP for Humans). response.text decodes the response by default; use response.content instead to get the undecoded bytes, so that ElementTree can decode them itself. Just remember to use the correct encoding. More info: http://docs.python-requests.org/en/latest/user/quickstart/#response-content
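To illustrate why passing undecoded bytes helps, here is a minimal sketch that needs no network access. The byte string `raw` is a stand-in for what `response.content` would return, and the sample document is made up. ElementTree reads the encoding from the XML declaration and decodes the bytes itself.

```python
import xml.etree.ElementTree as ET

# Stand-in for response.content: ISO-8859-1 bytes with a matching declaration.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><name>caf\u00e9</name>'.encode("iso-8859-1")

# The parser honors the declared encoding, so no manual decode step is needed.
root = ET.fromstring(raw)
name = root.text
```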

Using XPath in ElementTree

You have two problems.

1) element contains only the root element, not recursively the whole document. It is of type Element, not ElementTree.

2) Your search string needs to use namespaces if you keep the namespace in the XML.

To fix problem #1, you need to change:

    element = ET.parse(fp).getroot()

to:

    element …

parsing xml containing default namespace to get an element value using lxml

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:

    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Note that not only the element where the default namespace is declared is in that namespace: all descendant elements implicitly inherit the ancestor's default namespace, unless otherwise specified (using explicit namespace prefix …
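The inheritance of a default namespace can be seen in a small sketch. It uses the standard library's ElementTree as a stand-in for lxml, and the prefix `sm` and the sample URL are arbitrary choices for this example. An unqualified search finds nothing, while a search with a prefix mapped to the namespace finds the nested element.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap1.xml</loc></sitemap>"
    "</sitemapindex>"
)
root = ET.fromstring(xml_text)

# Unqualified name: no match, because <loc> lives in the default namespace.
unqualified = root.findall(".//loc")

# Map any prefix to the namespace URI and use it in the path.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
locs = [elem.text for elem in root.findall(".//sm:loc", ns)]
```

lxml's xpath method takes the same kind of prefix mapping through its `namespaces` keyword argument.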