Can Schema.org Expected Types be collections, too?

Every Schema.org property can have multiple values. It doesn't necessarily make sense for some properties (e.g., birthDate), but it's still allowed.

In JSON-LD:
"sameAs": ["https://stackoverflow.com/foo", "/bar"],

In Microdata:
<link itemprop="sameAs" href="https://stackoverflow.com/foo" />
<link itemprop="sameAs" href="http://stackoverflow.com/bar" />

In RDFa:
<link property="sameAs" href="https://stackoverflow.com/foo" />
<link property="sameAs" href="http://stackoverflow.com/bar" />

This doesn't necessarily mean that Google (or any other …
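Because any property may carry one value or several, code that reads JSON-LD should not assume a single value. A minimal Python sketch, assuming the markup has already been extracted as plain JSON (no @context processing or JSON-LD expansion), normalizes each property to a list. The helper `as_list` and the example Person item are hypothetical.

```python
import json
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize a JSON-LD property value: absent -> [], single value -> [value]."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hypothetical item: "sameAs" has two values, "name" has one, "birthDate" none.
item = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Foo",
    "sameAs": ["https://stackoverflow.com/foo", "http://stackoverflow.com/bar"],
}

# Round-trip through JSON to mimic reading the markup from a page.
parsed = json.loads(json.dumps(item))
for prop in ("name", "sameAs", "birthDate"):
    print(prop, as_list(parsed.get(prop)))
```

A missing property yields an empty list, so the same loop handles zero, one, or several values without special cases.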

Why does Google Testing Tool use the “id” attribute to generate a URL for the Microdata item?

This is strange. It does not conform to the Microdata Note. Apart from its use with Microdata's itemref attribute, HTML5's id attribute has no special meaning in Microdata. If Google wants to use the id value anyway, they should at least generate the URL with a fragment identifier, i.e., http://www.example.com/#foobar. My guess is that they are (probably unintentionally) …
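To make the suggested alternative concrete, here is a small Python sketch using the standard library's `urllib.parse.urljoin`. `item_url_from_id` builds the kind of URL recommended above (the page URL plus a fragment identifier). `item_url_as_path` is only an assumed model of resolving the id as a relative path, included for comparison; it is not a description of how Google's tool actually works. Both function names are hypothetical.

```python
from urllib.parse import urljoin


def item_url_from_id(page_url: str, element_id: str) -> str:
    """Keep the item on the same page: page URL plus '#' + id as the fragment."""
    return urljoin(page_url, "#" + element_id)


def item_url_as_path(page_url: str, element_id: str) -> str:
    """Assumed model for comparison: resolve the bare id as a relative path."""
    return urljoin(page_url, element_id)


page = "http://www.example.com/news/article"
print(item_url_from_id(page, "foobar"))  # http://www.example.com/news/article#foobar
print(item_url_as_path(page, "foobar"))  # http://www.example.com/news/foobar
print(item_url_from_id("http://www.example.com/", "foobar"))  # http://www.example.com/#foobar
```

The fragment form still points into the same document, while the path form can point at a different (possibly nonexistent) page, which is why a bare id makes a poor item URL.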

Schema.org NewsArticle: invalid value for logo property

Your markup is valid HTML5+Microdata and you are using the Schema.org vocabulary appropriately. By "validator", you probably mean Google's Structured Data Testing Tool. Note that errors shown in this tool don't necessarily mean that your markup is wrong; they often mean that you won't get a certain Google search result feature unless you provide …

How to extract information from a Wikipedia infobox?

The wrong way: trying to parse HTML

Use (cURL/jQuery/file_get_contents/requests/wget/more jQuery) to fetch the HTML code of the article, then use a DOM parser to extract table.infobox tr[3] td, or use a regex. This is actually a really bad idea most of the time. Wikipedia's HTML code is not particularly parsing-friendly (especially infoboxes which are …
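One reason a positional selector like table.infobox tr[3] td is fragile: infoboxes for the same kind of article do not always have the same rows, so "the third row" can be a different field on each page. The minimal Python sketch below uses only the standard library's `html.parser` to illustrate this. The two HTML snippets are hypothetical, heavily simplified infoboxes (not real Wikipedia output), `InfoboxRows` and `third_row` are hypothetical names, and the parser deliberately ignores complications such as nested tables.

```python
from html.parser import HTMLParser
from typing import List, Optional


class InfoboxRows(HTMLParser):
    """Collect the text of each <tr> inside a <table class="infobox"> (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.in_infobox = False
        self.rows: List[str] = []
        self._current: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "infobox" in classes:
            self.in_infobox = True
        elif tag == "tr" and self.in_infobox:
            self._current = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append(" ".join(self._current))
            self._current = None
        elif tag == "table" and self.in_infobox:
            # Simplification: a nested </table> would also end the infobox here.
            self.in_infobox = False

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current.append(data.strip())


def third_row(html: str) -> str:
    """Positional extraction, like tr[3]: return the text of the third row."""
    parser = InfoboxRows()
    parser.feed(html)
    return parser.rows[2]


# Two hypothetical infoboxes; B has an extra image row, so later rows shift down.
A = """<table class="infobox"><tr><th>Name</th><td>Foo</td></tr>
<tr><th>Born</th><td>1970</td></tr><tr><th>Died</th><td>2020</td></tr></table>"""
B = """<table class="infobox"><tr><th>Name</th><td>Foo</td></tr>
<tr><td>Image</td></tr><tr><th>Born</th><td>1970</td></tr>
<tr><th>Died</th><td>2020</td></tr></table>"""

print(third_row(A))  # Died 2020
print(third_row(B))  # Born 1970
```

The same selector silently returns a different field for the second page, with no error to signal the mismatch.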