XML Attributes vs Elements [duplicate]

There is an article titled “Principles of XML design: When to use elements versus attributes” on IBM’s website.

Though there doesn’t appear to be many hard and fast rules, there are some good guidelines mentioned in the posting. For instance, one of the recommendations is to use elements when your data must not be normalized for white space as XML processors can normalize data within an attribute thereby modifying the raw text.

I find myself referring to this article from time to time as I develop various XML structures. Hopefully this will be helpful to others as well.

edit – From the site:

Principle of core content

If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element. For human-readable documents this generally means the core content that is being communicated to the reader. For machine-oriented records formats this generally means the data that comes directly from the problem domain. If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes. This avoids cluttering up the core content with auxiliary material. For machine-oriented records formats, this generally means application-specific notations on the main data from the problem-domain.

As an example, I have seen many XML formats, usually home-grown in businesses, where document titles were placed in an attribute. I think a title is such a fundamental part of the communication of a document that it should always be in element content. On the other hand, I have often seen cases where internal product identifiers were thrown as elements into descriptive records of the product. In some of these cases, attributes were more appropriate because the specific internal product code would not be of primary interest to most readers or processors of the document, especially when the ID was of a very long or inscrutable format.

You might have heard the principle data goes in elements, metadata in attributes. The above two paragraphs really express the same principle, but in more deliberate and less fuzzy language.

Principle of structured information

If the information is expressed in a structured form, especially if the structure may be extensible, use elements. On the other hand: If the information is expressed as an atomic token, use attributes. Elements are the extensible engine for expressing structure in XML. Almost all XML processing tools are designed around this fact, and if you break down structured information properly into elements, you’ll find that your processing tools complement your design, and that you thereby gain productivity and maintainability. Attributes are designed for expressing simple properties of the information represented in an element. If you work against the basic architecture of XML by shoehorning structured information into attributes you may gain some specious terseness and convenience, but you will probably pay in maintenance costs.

Dates are a good example: A date has fixed structure and generally acts as a single token, so it makes sense as an attribute (preferably expressed in ISO-8601). Representing personal names, on the other hand, is a case where I’ve seen this principle surprise designers. I see names in attributes a lot, but I have always argued that personal names should be in element content. A personal name has surprisingly variable structure (in some cultures you can cause confusion or offense by omitting honorifics or assuming an order of parts of names). A personal name is also rarely an atomic token. As an example, sometimes you may want to search or sort by a forename and sometimes by a surname. I should point out that it is just as problematic to shoehorn a full name into the content of a single element as it is to put it in an attribute.

Leave a Comment