How does Apple find dates, times and addresses in emails?

Apple most likely uses information extraction techniques for this.

Here is a demo of Stanford’s SUTime, a library for recognizing and normalizing time expressions that is part of Stanford CoreNLP:

http://nlp.stanford.edu:8080/sutime/process

One way to build such a system is to extract attributes (features) of each n-gram (a sequence of n consecutive words) in a document:

  • numberOfLetters
  • numberOfSymbols
  • length
  • previousWord
  • nextWord
  • nextWordNumberOfSymbols
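Here is a minimal sketch of computing these attributes for one n-gram. The function name `ngram_features` is hypothetical. It also assumes a "symbol" is any character that is not a letter, digit or whitespace, and that an empty string stands in for a missing neighbouring word.

```python
from __future__ import annotations


def count_letters(text: str) -> int:
    """Number of alphabetic characters in the text."""
    return sum(ch.isalpha() for ch in text)


def count_symbols(text: str) -> int:
    """Assumption: a symbol is any character that is not a letter, digit or whitespace."""
    return sum(not ch.isalnum() and not ch.isspace() for ch in text)


def ngram_features(tokens: list[str], start: int, n: int = 1) -> dict:
    """Hypothetical helper: attributes of the n-gram tokens[start:start + n] and its neighbours."""
    ngram = " ".join(tokens[start:start + n])
    # An empty string stands in for a missing neighbour at the start or end of the text.
    prev_word = tokens[start - 1] if start > 0 else ""
    next_word = tokens[start + n] if start + n < len(tokens) else ""
    return {
        "numberOfLetters": count_letters(ngram),
        "numberOfSymbols": count_symbols(ngram),
        "length": len(ngram),
        "previousWord": prev_word,
        "nextWord": next_word,
        "nextWordNumberOfSymbols": count_symbols(next_word),
    }


tokens = "Wed Feb. 29th at noon".split()
print(ngram_features(tokens, 1))
# {'numberOfLetters': 3, 'numberOfSymbols': 1, 'length': 4, 'previousWord': 'Wed', 'nextWord': '29th', 'nextWordNumberOfSymbols': 0}
```

Passing `n=2` gives the same attributes for two-word n-grams such as "Feb. 29th". Sliding `start` across the text produces one row per candidate n-gram.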

Then train a classification algorithm on labeled positive and negative examples:

Observation  nLetters  nSymbols  length  prevWord   nextWord  isPartOfDate
"Feb."       3         1         4       "Wed"      "29th"    TRUE
"DEC"        3         0         3       "company"  "went"    FALSE
...

You might get away with about 50 examples of each class, but more labeled data generally helps. The algorithm learns from these examples and can then classify n-grams it hasn’t seen before.
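To make the training step concrete, here is a minimal sketch that uses a perceptron. This is only one simple classification algorithm among many, and the post does not say which one to use. The features are a hypothetical subset inspired by the attributes above: the symbol count, the previous word, and whether the next word contains a digit. The four training sentences are made-up toy examples, far fewer than the roughly 50 per class suggested above.

```python
def count_symbols(word):
    # Assumption: a symbol is any character that is not a letter, digit or whitespace.
    return sum(not ch.isalnum() and not ch.isspace() for ch in word)


def features(tokens, i):
    """Hypothetical binary features for the word at position i."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else ""
    next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
    feats = {"bias", f"symbols={count_symbols(word)}", f"prev={prev_word.lower()}"}
    if any(ch.isdigit() for ch in next_word):
        feats.add("next_has_digit")
    return feats


def score(weights, feats):
    # Unseen features contribute a weight of 0.
    return sum(weights.get(f, 0) for f in feats)


def train_perceptron(examples, max_epochs=20):
    """Classic perceptron: on each mistake, nudge the active feature weights toward the true label."""
    weights = {}
    for _ in range(max_epochs):
        mistakes = 0
        for tokens, i, is_date in examples:
            feats = features(tokens, i)
            if (score(weights, feats) > 0) != is_date:
                mistakes += 1
                step = 1 if is_date else -1
                for f in feats:
                    weights[f] = weights.get(f, 0) + step
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


def is_part_of_date(weights, tokens, i):
    return score(weights, features(tokens, i)) > 0


# Hypothetical toy training data: (tokens, index of the word to classify, label).
training = [
    ("Wed Feb. 29th".split(), 1, True),
    ("on March 3rd".split(), 1, True),
    ("company DEC went".split(), 1, False),
    ("the may be".split(), 1, False),
]
weights = train_perceptron(training)
print(is_part_of_date(weights, "Thu Dec. 31st".split(), 1))  # True
print(is_part_of_date(weights, "we went home".split(), 1))   # False
```

The two test sentences never appear in the training data. The model still labels "Dec." as part of a date because it learned positive weights for a following word containing a digit and for a word with one symbol (the period).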

It might learn rules such as:

  • if the previous word contains only letters and possibly periods…
  • and the current word is in “february”, “mar.”, “the” …
  • and the next word is in “twelfth”, any_number …
  • then the n-gram is part of a date
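Here is a sketch of the rule above written out by hand. It is only a sanity check, not what any particular classifier would literally output. The word lists contain only the examples named in the rule, because the original lists are open-ended. Reading "only characters" as letters and any_number as digits with an optional ordinal suffix are assumptions.

```python
import re

# Only the example values named in the rule; the real lists are open-ended.
CURRENT_WORDS = {"february", "mar.", "the"}
NEXT_WORDS = {"twelfth"}

# Assumption: "only characters and maybe periods" means letters, optionally with periods.
PREV_RE = re.compile(r"[A-Za-z.]+")
# Assumption: "any_number" means digits, optionally with an ordinal suffix (12, 12th, 3rd).
NUMBER_RE = re.compile(r"\d+(st|nd|rd|th)?", re.IGNORECASE)


def rule_says_date(prev_word, word, next_word):
    """Hypothetical hand-coded version of the learned rule."""
    return (
        PREV_RE.fullmatch(prev_word) is not None
        and word.lower() in CURRENT_WORDS
        and (next_word.lower() in NEXT_WORDS or NUMBER_RE.fullmatch(next_word) is not None)
    )


print(rule_says_date("Wed", "February", "12th"))  # True
print(rule_says_date("on", "the", "twelfth"))     # True
print(rule_says_date("company", "DEC", "went"))   # False
```

A learned classifier weighs such conditions against each other rather than requiring all of them at once. A hand-written rule like this is brittle by comparison, which is the main argument for learning from examples.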

Here is a decent video by a Google engineer on the subject
