How does Apple find dates, times and addresses in emails?

They likely use Information Extraction techniques for this. Here is a demo of Stanford’s SUTime tool: http://nlp.stanford.edu:8080/sutime/process You would extract attributes about n-grams (consecutive words) in a document: numberOfLetters numberOfSymbols length previousWord nextWord nextWordNumberOfSymbols … And then use a classification algorithm, and feed it positive and negative examples: Observation nLetters nSymbols length prevWord nextWord isPartOfDate … Read more

Extract list of Persons and Organizations using Stanford NER Tagger in NLTK

Thanks to the link discovered by @Vaulstein, it is clear that the trained Stanford tagger, as distributed (at least in 2012) does not chunk named entities. From the accepted answer: Many NER systems use more complex labels such as IOB labels, where codes like B-PERS indicates where a person entity starts. The CRFClassifier class and … Read more

NLTK Named Entity recognition to a Python list

nltk.ne_chunk returns a nested nltk.tree.Tree object so you would have to traverse the Tree object to get to the NEs. Take a look at Named Entity Recognition with Regular Expression: NLTK >>> from nltk import ne_chunk, pos_tag, word_tokenize >>> from nltk.tree import Tree >>> >>> def get_continuous_chunks(text): … chunked = ne_chunk(pos_tag(word_tokenize(text))) … continuous_chunk = [] … Read more