Obtain attribute names from XML using VBA

Displaying XML structures including attributes via recursive function calls

My example code demonstrates a way to

  • [1] assign the entire XML structure to a 2-dim array using XMLDOM methods and
  • [2] optionally write it back to a sheet.

Amplifying hints:

I added these ► structured hints to offer more help than by displaying code only, as I remarked that many of these points lead to repeated questions by other users, too:

  • Trying to list XML structures you lose good view with increasing hierarchy depth of your node elements (type constant 1 NODE_ELEMENT), so I urgently recommend the use of ► recursive calls as used in this example code.
  • Furthermore you might have not considered the special construction of node text (type constant 3 NODE_TEXT) being the first child of a name giving parent element – c.f. sections A. and B. in main function listChildNodes. Your loops through child nodes would not distinguish between the mentioned types. Just study the comments in the cited function for details.
  • I suppose your XML file starts with a needed processing instruction like e.g. <?xml version="1.0" encoding="utf-8"?>, so that it can be actually identified as XML file.
  • The calling procedure DisplayXML() uses late binding instead of early bound reference to MS XML similar to your post, but uses the recommended MSXML2 version 6.0. It calls the main function via its DocumentElement <Elements> (BTW a single node element) and a second argument referring to a predefined 2-dim array v.
  • Versioning: If you would set your XMLFILE object to memory with Set XDoc = CreateObject("MSXML2.DOMDocument") generally you are getting the older default Version (3.0), so in most cases it’s preferrable to use explicitly Set XDoc = CreateObject("MSXML2.DOMDocument.6.0") instead (including XPath automatically).
  • If you don’t use the Load function to get a True (file loaded successfully) or False (load error) back, it is not necessary to set the file name into brackets ().
  • The XPath operator // in search strings would return any occurences at any level (c.f. XMLFile.SelectNodes("//Elements") in your OP).
  • Consider also the use of XSLT, a special-purpose language designed to tranform XML files into all kind of end-use formats.

Calling procedure DisplayXML

Hint: It would sufficient to dimension the array ‘s row count only with an estimated number of items in the calling procedure (e.g. 1000), as the main function executes a ReDim (including a double transposition) automatically if needed. Nevertheless I added the exact items count here from the start via XPath/XMLDOM expression XMLFile.SelectNodes("//*").Length counting any item in the entire file.

Option Explicit          ' declaration head of your code module

Sub DisplayXML()
Dim XMLFile As Object
Dim XMLFileName As String
'Set XMLFile = CreateObject("Microsoft.XMLDOM")   ' former style not recommended
Set XMLFile = CreateObject("MSXML2.DOMDocument.6.0")

XMLFileName = "C:\Users\Input.xml"                             ' << change to your xml file name
XMLFile.Async = False
XMLFile.ValidateOnParse = False
Debug.Print XMLFile.XML

If XMLFile.Load(XMLFileName) Then
' [1] write xml info to array with exact or assumed items count
  Dim v As Variant: ReDim v(1 To XMLFile.SelectNodes("//*").Length, 1 To 2)
  listChildNodes XMLFile.DocumentElement, v                 ' call helper function

' [2] write results to sheet "Dump"                         ' change to your sheet name
  With ThisWorkbook.Worksheets("Dump")
       .Range("A:B") = ""                                   ' clear result range
       .Range("A1:B1") = Split("XML Tag,Node Value", ",")   ' titles
       .Range("A2").Resize(UBound(v), UBound(v, 2)) = v     ' get  2-dim info array
  End With
Else
       MsgBox "Load Error " & XMLFileName
End If
Set XMLFile = Nothing
End Sub

Structured results displayed in sheet

Hint: If you don’t want the level indentation or enumerated Level hierarchy, you can easily adapt the main function listChildNodes() below.

+----+---------------------+-----------------+
|    |         A           |       B         |
+----+---------------------+-----------------+
|1   | XML Tag             | Node Value      |
+----+---------------------+-----------------+
|2   | 0 Elements          |                 |
+----+---------------------+-----------------+
|3   |   1 Details         |                 |
+----+---------------------+-----------------+
|4   |     2 Name          | ABC             |
+----+---------------------+-----------------+
|5   |     2 Address       | 123ABC          |
+----+---------------------+-----------------+
|6   |     2 College       |                 |
+----+---------------------+-----------------+
|7   |       3 collname    | safasf          |
+----+---------------------+-----------------+
|8   |       3 collnumber  |                 |
+----+---------------------+-----------------+
|9   |   1 Dept[@num="123"]|                 |
+----+---------------------+-----------------+
|10  |     2 Deptname      | IT              |
+----+---------------------+-----------------+
|11  |     2 ID            | A123            |
+----+---------------------+-----------------+

It is also possible to refer to a precise node element, e.g. via

listChildNodes XMLFile.DocumentElement.SelectSingleNode("Dept[@num=""123""]"),v, 1, 1       ' starting from item no 1 and Level no 1

This would list the indicated node set alone:

+----+---------------------+-----------------+
|    |         A           |       B         |
+----+---------------------+-----------------+
|1   | XML Tag             | Node Value      |
+----+---------------------+-----------------+
|2   |   1 Dept[@num="123"]|                 |
+----+---------------------+-----------------+
|3   |     2 Deptname      | IT              |
+----+---------------------+-----------------+
|4   |     2 ID            | A123            |
+----+---------------------+-----------------+

Recursive main function listChildNodes()

Looping through childnode collections this function calls itself (i.e. the current node object) repeatedly (“recursively”) and assigns the entire XML structure to a given 2-dim array (2nd argument). Furthermore it allows indendation and indicates the hierarchy levels. Note that the array in this example has to be 1-based.

Edit 20/8 2018 includes an automatic increase of array size if the items counter i exceeds the current array’s upper boundary (UBound(v), i.e. in its first dimension = items count). Technical note: As such a ReDim isn’t possible in a minor (here 1st) dimension, an intermediate transposition changing ‘rows’ (dim 1) to ‘columns’ (dim 2) is necessary.

Function listChildNodes(oCurrNode As Object, _
                        ByRef v As Variant, _
                        Optional ByRef i As Long = 1, _
                        Optional iLvl As Integer = 0 _
                        ) As Boolean
' Purpose: assign the complete node structure with contents to a 1-based 2-dim array
' Author:  T.M.
' Note: Late binding XML doesn't allow the use of IXMLDOMNodeType enumeration constants
'       (1 ... NODE_ELEMENT, 2 ... NODE_ATTRIBUTE, 3 ... NODE_TEXT etc.)
' Escape
  If oCurrNode Is Nothing Then Exit Function
  If i < 1 Then i = 1                                       ' one based items Counter
' Edit 20/8 2018 - Automatic increase of array size if needed 
  If i >= UBound(v) Then                                    ' change array size if needed
     Dim tmp As Variant
     tmp = Application.Transpose(v)                         ' change rows to columns
     ReDim Preserve tmp(1 To 2, 1 To UBound(v) + 1000)      ' increase row numbers
     v = Application.Transpose(tmp)                         ' transpose back
     Erase tmp
  End If
  Const NAMEColumn& = 1, VALUEColumn& = 2                   ' constants for column 1 and 2
' Declare variables
  Dim oChildNode As Object                                  ' late bound node object
  Dim bDisplay   As Boolean
' ---------------------------------------------------------------------
' A. It's nothing but a TextNode (i.e. a parent node's firstChild!)
' ---------------------------------------------------------------------
If (oCurrNode.NodeType = 3) Then                                 ' 3 ... NODE_TEXT
  ' display pure text content (NODE_TEXT) of parent elements
    v(i, VALUEColumn) = oCurrNode.Text                           ' nodeValue of text node
  ' return
    listChildNodes = True
ElseIf oCurrNode.NodeType = 1 Then                                ' 1 ... NODE_ELEMENT
   ' --------------------------------------------------------------
   ' B.1 NODE_ELEMENT WITHOUT text node immediately below,
   '     a) e.g. <Details> followed by node element <NAME>,
   '        (i.e. FirstChild.NodeType must not be of type NODE_TEXT = 3)
   '     b) e.g. <College> node element without any child node
   '     Note: a text content (NODE_TEXT) actually is a child node(!) to an element node
   '           (see section A. getting the FirstChild of a NODE_ELEMENT)
   ' --------------------------------------------------------------
   ' a) display parent elements of other element nodes
     If oCurrNode.HasChildNodes Then
         If Not oCurrNode.FirstChild.NodeType = 3 Then             ' <>3 ... not a NODE_TEXT
            bDisplay = True
         End If
   ' b) always display empty node elements
     Else                                                           ' empty NODE_ELEMENT
            bDisplay = True
     End If
     If bDisplay Then
            v(i, NAMEColumn) = String(iLvl * 2, " ") & _
                               iLvl & " " & _
                               oCurrNode.nodename & getAtts(oCurrNode)
            i = i + 1
     End If

   ' --------------------------------------------------------------
   ' B.2 check child nodes
   ' --------------------------------------------------------------
     For Each oChildNode In oCurrNode.ChildNodes
      ' ~~~~~~~~~~~~~~~~~
      ' recursive call <<
      ' ~~~~~~~~~~~~~~~~~
        bDisplay = listChildNodes(oChildNode, v, i, iLvl + 1)

        If bDisplay Then
            v(i, NAMEColumn) = String(iLvl * 2, " ") & _
                               iLvl & " " & _
                               oCurrNode.nodename & getAtts(oCurrNode)
            i = i + 1
        End If
     Next oChildNode
   ' return
     listChildNodes = False

Else    ' just to demonstrate the use of other xml types as e.g. <!-- comments -->
     If oCurrNode.NodeType = 8 Then   ' 8 ... NODE_COMMENT
        v(i, VALUEColumn) = "<!-- " & oCurrNode.NodeValue & "-->"
        i = i + 1
     End If
   ' return
     listChildNodes = False
End If

End Function

Helper function getAtts()

This helper function called by the above function returns a string enumerating all attribute names and values of a given node similar to XPath notation; the code can be easily adapted to your needs.

Function getAtts(ByRef node As Object) As String
' Purpose: return attribute(s) string in brackets, e.g. '[@num="123"]'
' Note:    called by above function listChildNodes()
' Author:  T.M.
  Dim sAtts$, ii&
  If node.Attributes.Length > 0 Then
      ii = 0: sAtts = ""
      For ii = 0 To node.Attributes.Length - 1
        sAtts = sAtts & "[@" & node.Attributes.Item(ii).nodename & "=""" & node.Attributes.Item(ii).NodeValue & """]"
      Next ii
  End If
' return
  getAtts = sAtts
End Function

Leave a Comment