Why use an HTML5 semantic tag instead of div?

The Oxford Dictionary states:

semantics: the branch of linguistics and logic concerned with meaning.

As their name suggests, these tags are meant to add meaning to your web page. Good semantics plays an important role in the automated processing of documents. This automated processing happens more often than you realize: every search engine ranking is derived from the automated processing of all the websites out there.

If you visit a (well-designed) web page, you as the human reader can immediately (visually) distinguish all the page elements and, more importantly, understand the content. In the top left you see the company logo; next to it is the site navigation; there is a search bar, some text about the company, a link to a product you can buy and a legal disclaimer at the bottom.

However, machines are dumb and cannot do this.
Looking at the same page as you, all the web crawler would see is an image, a list of anchor tags, a text node, an input field and an image with a link on it. At the bottom there is another text node.
Now, how should it know which part of the document you intended to be the navigation, the main article, or some not-so-important footnote? It can guess by analyzing your document structure against common criteria that hint at a specific element's purpose.
E.g. a ul list of internal links is most likely some kind of page navigation, and the text at the end of the document is probably something necessary but not very important to the everyday viewer (the legal disclaimer).

Now imagine a nav element were used instead of a plain div. The machine immediately knows what this element is for:

<!-- machine: okay, this structure looks like it might be a navigation element? -->
<div><ul><li><a href="internal_link">...</a></li></ul></div>

<!-- machine: ah, a navigation element! -->
<nav><ul><li><a href="internal_link">...</a></li></ul></nav>
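To make the difference concrete, here is a minimal Python sketch of a toy crawler built on the standard library's `html.parser`. The `classify` function and its rule (a `<ul>` whose links all start with "/" is *probably* navigation) are hypothetical illustrations, not how any real search engine works: with a plain div the crawler can only guess, while a `<nav>` ancestor settles the question.

```python
from html.parser import HTMLParser


class NavDetector(HTMLParser):
    """Toy crawler that records every <ul> list and the links inside it.

    Hypothetical heuristic (an assumption, not a real search engine rule):
    a <ul> inside <nav> IS navigation; a <ul> whose links are all internal
    (href starting with "/") is PROBABLY navigation.
    """

    def __init__(self):
        super().__init__()
        self.nav_depth = 0   # > 0 while we are inside a <nav> element
        self.lists = []      # one dict per <ul> encountered
        self.stack = []      # indices of the currently open <ul> lists

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "ul":
            self.lists.append({"in_nav": self.nav_depth > 0, "hrefs": []})
            self.stack.append(len(self.lists) - 1)
        elif tag == "a" and self.stack:
            # attrs is a list of (name, value) pairs; value may be None
            href = dict(attrs).get("href") or ""
            self.lists[self.stack[-1]]["hrefs"].append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1
        elif tag == "ul" and self.stack:
            self.stack.pop()


def classify(html):
    parser = NavDetector()
    parser.feed(html)
    parser.close()
    results = []
    for ul in parser.lists:
        if ul["in_nav"]:
            results.append("navigation (declared by <nav>)")
        elif ul["hrefs"] and all(h.startswith("/") for h in ul["hrefs"]):
            results.append("probably navigation (guessed)")
        else:
            results.append("unknown")
    return results


div_page = '<div><ul><li><a href="/about">About</a></li><li><a href="/shop">Shop</a></li></ul></div>'
nav_page = '<nav><ul><li><a href="/about">About</a></li></ul></nav>'

print(classify(div_page))  # ['probably navigation (guessed)']
print(classify(nav_page))  # ['navigation (declared by <nav>)']
```

A list of external links in a plain div would come out as "unknown": the heuristic simply has nothing to go on, which is exactly the gap the `<nav>` tag closes.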

The same goes for the other elements. Text inside a main tag is clearly the most important information on the page! Over to the left, the text node, the image and the anchor node all belong together because they are grouped inside a section tag. Down at the bottom there is some text inside a footer element: the machine still doesn't know what that text means, but it can now deduce that it is some sort of fine print.
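A similar sketch (again hypothetical, standard library only) shows how a machine could sort page text by landmark element. The `LANDMARKS` role labels are our own illustrative wording, not an official mapping, and text is attributed only to the innermost landmark, so the section's text is not also counted as main content.

```python
from html.parser import HTMLParser

# Landmark elements we care about, with a plain-language role for each.
# These role labels are illustrative wording, not an official mapping.
LANDMARKS = {
    "nav": "navigation",
    "main": "main content",
    "section": "grouped content",
    "footer": "fine print",
}


class LandmarkCollector(HTMLParser):
    """Collects the text found inside each landmark element."""

    def __init__(self):
        super().__init__()
        self.open = []  # stack of landmark tags we are currently inside
        self.text = {role: [] for role in LANDMARKS.values()}

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if self.open and self.open[-1] == tag:
            self.open.pop()

    def handle_data(self, data):
        # Attribute non-blank text to the innermost open landmark only.
        if self.open and data.strip():
            self.text[LANDMARKS[self.open[-1]]].append(data.strip())


def landmarks(html):
    collector = LandmarkCollector()
    collector.feed(html)
    collector.close()
    return {role: " ".join(parts) for role, parts in collector.text.items() if parts}


page = """
<nav><a href="/">Home</a></nav>
<main>
  <h1>Our new product</h1>
  <section><img src="p.png" alt="Product"><p>Buy it <a href="/buy">here</a></p></section>
</main>
<footer>All rights reserved.</footer>
"""

print(landmarks(page))
```

The crawler never has to interpret "All rights reserved." to file it under fine print; the footer element tells it where the text belongs.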

Example:
You, as the user (reading a page without seeing the actual markup), don't care whether an element is enclosed in an <i> or an <em> tag. Most browsers render both tags identically, as italic text, and as long as it stands out from the surrounding text it serves its purpose.

However, there is a big difference in terms of semantics:
<i> originally meant italic: simply a presentational hint telling the browser how to render the text. HTML5 redefined it as text in an alternate voice or mood (a foreign phrase, a technical term, a ship name), but it still carries no emphasis.
<em> means emphasis: it marks a piece of information the author wants to stress. The browser is then no longer bound to an italic rendering, but could render it in italic, bold, underlined, in a different color… For visually impaired users, a screen reader could raise its voice, or use whatever method seems best suited to the situation, to emphasise this information.
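The following Python sketch illustrates the point. The hypothetical `render` function lets the caller decide how emphasis is presented (upper case standing in for a louder voice, `**...**` for bold), while `<i>` text passes through unmarked because it carries no stress emphasis. It is a toy, not how any real browser or screen reader works.

```python
from html.parser import HTMLParser


class EmphasisRenderer(HTMLParser):
    """Toy renderer that decides for itself how to present <em>.

    `mark` is a hypothetical callback standing in for whatever a browser or
    screen reader might do (italics, bold, a louder voice, ...).
    <i> carries no stress emphasis, so its text is passed through unchanged.
    """

    def __init__(self, mark):
        super().__init__()
        self.mark = mark
        self.em_depth = 0  # > 0 while inside an <em> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.em_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.em_depth:
            self.em_depth -= 1

    def handle_data(self, data):
        self.out.append(self.mark(data) if self.em_depth else data)


def render(html, mark):
    renderer = EmphasisRenderer(mark)
    renderer.feed(html)
    renderer.close()  # flush any trailing text
    return "".join(renderer.out)


text = "Do <em>not</em> touch the <i>Titanic</i> model."
print(render(text, lambda s: s.upper()))     # Do NOT touch the Titanic model.
print(render(text, lambda s: f"**{s}**"))    # Do **not** touch the Titanic model.
```

The markup stays the same in both calls; only the presentation chosen by the consumer changes, which is what a semantic tag allows.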

Final thought:
Semantic tags are not the end of the story. There are also metadata, ontologies and resource description languages (such as RDF), which go a step further: they help connect data across different web pages and can even help create new knowledge!

Wikipedia, for example, does a really bad job of presenting its data semantically.

https://en.wikipedia.org/wiki/Barack_Obama
https://en.wikipedia.org/wiki/Donald_Trump
https://en.wikipedia.org/wiki/Joe_Biden

All three are people who at some point were president of the USA.

All three articles contain a sidebar that displays this information, and you can compare them (by opening the pages and switching back and forth), but the data is not semantically described.
Instead, Wikipedia could use an ontology to describe a person, e.g. the one at http://dbpedia.org/ontology/Person. A president might then be described like this:

<!-- President is a subclass of Politician, which is a subclass of Person -->
<President>
    <birthname>Barack Hussein Obama II</birthname>
    <birthdate>1961-08-04</birthdate>
    <headOf>country::USA</headOf>
    <tenure>2009-01-20 – 2017-01-20</tenure>
</President>

Not only could you (and machines) now compare those three people directly (on a dynamically generated page!), but you could even create new knowledge: e.g. a list of all presidents of the United States, which is rather boring, but also more interesting facts such as who all the current world leaders are, how many female world leaders there are, who the youngest leader is, how many types of leaders there are (presidents/emperors/queens/dictators), who served the longest, how many of them are taller than 175 cm and have brown eyes, and so on.
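Once the data is structured like this, such questions become simple queries. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The records follow the simplified President markup above (the element names are illustrative, not the real DBpedia vocabulary), contain the birth names and birth dates of the three presidents linked above, and leave out tenures for brevity; `load` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Illustrative records modelled on the President markup above; element names
# are not the actual DBpedia vocabulary. Tenures are omitted for brevity.
RECORDS = """
<people>
  <President>
    <birthname>Barack Hussein Obama II</birthname>
    <birthdate>1961-08-04</birthdate>
    <headOf>country::USA</headOf>
  </President>
  <President>
    <birthname>Donald John Trump</birthname>
    <birthdate>1946-06-14</birthdate>
    <headOf>country::USA</headOf>
  </President>
  <President>
    <birthname>Joseph Robinette Biden Jr.</birthname>
    <birthdate>1942-11-20</birthdate>
    <headOf>country::USA</headOf>
  </President>
</people>
"""


def load(xml_text):
    """Turn each <President> element into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "name": p.findtext("birthname"),
            "born": date.fromisoformat(p.findtext("birthdate")),
            "head_of": p.findtext("headOf"),
        }
        for p in root.iter("President")
    ]


people = load(RECORDS)

# "Show a list of all presidents of the United States"
us_presidents = [p["name"] for p in people if p["head_of"] == "country::USA"]

# "Who is the youngest?" -> the person with the latest birth date
youngest = max(people, key=lambda p: p["born"])["name"]

print(us_presidents)
print(youngest)  # Barack Hussein Obama II
```

With a sidebar of free text, each of these answers requires a human to read and compare pages; with typed fields, it is a one-line filter or `max`.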

In conclusion, good semantics is super cool (but also, on a technical level, hard to achieve and maintain).
