Dealing with malformed XML [duplicate]

I know this isn’t the answer you want, but the XML spec is quite clear and strict: malformed XML is fatal. If a document doesn’t pass a validator, your code should not even attempt to “fix” it, any more than you’d try to automatically “fix” some program code. From The Annotated XML Specification: …
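A minimal Python sketch of that policy, assuming the standard-library ElementTree parser: the malformed document is rejected and the error reported, with no repair attempted. The helper name `parse_strict` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_strict(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # Report the problem instead of attempting a repair.
        print(f"rejected: {err}")
        return None

good = parse_strict("<a><b/></a>")
bad = parse_strict("<a><b></a>")  # mismatched tag: not well-formed
```

The caller decides what to do with a rejected document (log it, return it to the sender); the parser itself never guesses at what the author meant.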

get the namespaces from xml with python ElementTree

The code for creating a dictionary with all the declared namespaces can be made quite simple. This is all that is needed:

    import xml.etree.ElementTree as ET
    my_namespaces = dict([node for _, node in ET.iterparse('file.xml', events=['start-ns'])])

You don’t need to use StringIO or open(). Just provide the XML filename as an argument to iterparse(). Each item …
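To try the one-liner without an existing file, here is a self-contained sketch. The sample document, the `urn:example:...` URIs, and the helper name `declared_namespaces` are made up for illustration; the temporary file only stands in for a real `file.xml`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document, written to a temporary file for the demo.
SAMPLE = (
    '<root xmlns="urn:example:default" xmlns:ex="urn:example:extra">'
    "<ex:item/></root>"
)

def declared_namespaces(path):
    """Map each declared prefix to its URI ('' is the default namespace)."""
    # Each 'start-ns' event yields (event, (prefix, uri)).
    return dict(node for _, node in ET.iterparse(path, events=["start-ns"]))

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE)
    sample_path = handle.name

try:
    namespaces = declared_namespaces(sample_path)
finally:
    os.remove(sample_path)

print(namespaces)
```

Because the result is a plain dictionary, a prefix that is declared more than once in the document keeps only the last URI seen.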

How to read plain text content with XSLT 1.0

It is not possible for the input document to be plain text because the input to an XSLT 1.0 transformation must be well-formed XML. Here are some alternative ways to access plain text in an XSLT transformation:

- Use unparsed-text in XSLT 2.0.
- Pass the plain text in via top-level parameters (xsl:param).
- Preprocess the text file …
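One possible reading of the preprocessing option, sketched in Python: wrap each line of the text file in an element so that the result is well-formed XML an XSLT 1.0 processor can accept. The function name and the `text`/`line` element names are assumptions, not something the answer prescribes.

```python
import xml.etree.ElementTree as ET

def wrap_text_as_xml(text, root_tag="text", line_tag="line"):
    """Wrap each line of plain text in an element so XSLT 1.0 can read it."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        ET.SubElement(root, line_tag).text = line
    # ElementTree escapes &, < and > when serializing.
    return ET.tostring(root, encoding="unicode")

xml_input = wrap_text_as_xml("a < b\nfish & chips")
print(xml_input)
```

Building the tree through the API, instead of concatenating strings, leaves the escaping of markup characters to the serializer.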

XSLT Transformation – dynamic element names

This XSL stylesheet:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="Field">
        <xsl:element name="{@Name}">
          <xsl:value-of select="@Value"/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>

Applied to well-formed input:

    <SiebelMessage MessageId="1-18J35" IntObjectName="XRX R5 Letter Instance" MessageType="Integration Object" IntObjectFormat="Siebel Hierarchical">
      <LetterInstance Id="1-1RUYIF" Language="ENU" TemplateType="SA">
        <Field Value="CO Last Name" Datatype="String" Name="ContractingOfficerLastName"/>
      </LetterInstance>
    </SiebelMessage>

Produces:

    <SiebelMessage MessageId="1-18J35" …
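For comparison only, the same renaming idea can be sketched outside XSLT with Python's ElementTree. This is not the stylesheet's mechanism, just the equivalent tree edit: each Field child is replaced by an element named after its Name attribute, with the Value attribute as text. The helper name `rename_fields` and the trimmed input are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the input, kept short for the demo.
SOURCE = (
    '<LetterInstance Id="1-1RUYIF">'
    '<Field Value="CO Last Name" Datatype="String" '
    'Name="ContractingOfficerLastName"/>'
    "</LetterInstance>"
)

def rename_fields(root):
    """Replace every Field child with an element named after its Name."""
    # Snapshot the tree first so replacing children does not disturb iteration.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "Field":
                replacement = ET.Element(child.get("Name"))
                replacement.text = child.get("Value")
                replacement.tail = child.tail
                parent[index] = replacement
    return root

result = ET.tostring(rename_fields(ET.fromstring(SOURCE)), encoding="unicode")
print(result)
```

As with xsl:element, the attribute value has to be a legal XML element name; ElementTree does not check this when the element is created, so a value containing spaces would serialize to malformed output.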

XSLT – How to keep only wanted elements from XML

This general transformation:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ns="some:ns">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <ns:WhiteList>
        <name>ns:currency</name>
        <name>ns:currency_code3</name>
      </ns:WhiteList>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="*[not(descendant-or-self::*[name()=document('')/*/ns:WhiteList/*])]"/>
    </xsl:stylesheet>

when applied to the provided XML document (with a namespace definition added to make it well-formed):

    <ns:stuff xmlns:ns="some:ns">
      <ns:things>
        <ns:currency>somecurrency</ns:currency>
        <ns:currency_code/>
        <ns:currency_code2/>
        <ns:currency_code3/>
        <ns:currency_code4/>
      </ns:things>
    </ns:stuff>

produces the wanted …
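The filtering rule (drop every element that neither is whitelisted nor contains a whitelisted descendant) can also be sketched in Python with ElementTree. This is an illustration of the rule and not a replacement for the stylesheet. Unlike the XSLT, which compares prefixed names via name(), this sketch compares namespace-expanded tags; the `prune` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "some:ns"
# Whitelisted names in ElementTree's {uri}local notation.
WHITELIST = {f"{{{NS}}}currency", f"{{{NS}}}currency_code3"}

SOURCE = (
    '<ns:stuff xmlns:ns="some:ns"><ns:things>'
    "<ns:currency>somecurrency</ns:currency>"
    "<ns:currency_code/><ns:currency_code2/>"
    "<ns:currency_code3/><ns:currency_code4/>"
    "</ns:things></ns:stuff>"
)

def prune(element):
    """Remove children with no whitelisted descendant-or-self.

    Returns True if the element itself should be kept.
    """
    for child in list(element):
        if not prune(child):
            element.remove(child)
    # After pruning, any surviving child implies a whitelisted descendant.
    return element.tag in WHITELIST or len(element) > 0

root = ET.fromstring(SOURCE)
keep_root = prune(root)
kept = [child.tag for child in root[0]] if keep_root else []
print(kept)
```

The function only prunes children, so the caller has to check its return value to decide whether the root element itself survives.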