How do you implement a “Did you mean”? [duplicate]

Actually, what Google does is decidedly non-trivial and, at first, counter-intuitive. They don’t check against a dictionary; rather, they use statistics to identify “similar” queries that returned more results than your query. The exact algorithm is, of course, not known. There are different sub-problems to solve here, …
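To make the "similar query with more results" idea concrete, here is a minimal sketch. It is not Google's algorithm (which, as noted, is not public): it assumes a hypothetical `result_counts` mapping from past queries to how many results each returned, and it uses plain Levenshtein distance as a stand-in for "similar". The names `did_you_mean` and `max_distance` are invented for this illustration.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def did_you_mean(query: str,
                 result_counts: Dict[str, int],
                 max_distance: int = 2) -> Optional[str]:
    """Return the most productive similar query, or None.

    result_counts is a hypothetical log: query -> number of results.
    A candidate qualifies only if it is within max_distance edits of
    the query AND returned more results than the query itself.
    """
    best = None
    best_count = result_counts.get(query, 0)
    for candidate, count in result_counts.items():
        if candidate == query:
            continue
        if count > best_count and edit_distance(query, candidate) <= max_distance:
            best, best_count = candidate, count
    return best


# Toy data, made up for the example.
log = {"britney spears": 1000, "brittany spears": 40, "britny spears": 3}
print(did_you_mean("britny spears", log))   # britney spears
print(did_you_mean("britney spears", log))  # None
```

A real system would need far more than this (for instance an index so that not every logged query is compared), but the sketch shows why no dictionary is required: the query log itself supplies the candidates.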

Stemmers vs Lemmatizers

Q1: “[..] are English stemmers any useful at all today? Since we have a plethora of lemmatization tools for English” Yes. Stemmers are much simpler, smaller, and usually faster than lemmatizers, and for many applications their results are good enough. Using a lemmatizer in those cases is a waste of resources. Consider, for example, dimensionality reduction …
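To illustrate how little machinery a stemmer needs, here is a toy suffix stripper. It is a deliberately crude sketch, not the Porter algorithm or any library's stemmer; the suffix list and the `min_stem` threshold are assumptions chosen for the example.

```python
# Suffixes are tried in order, so "ing" is checked before the bare "s".
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least min_stem letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Four surface forms collapse to one feature, which is the
# dimensionality-reduction effect mentioned above.
forms = ["jump", "jumps", "jumped", "jumping"]
print({toy_stem(w) for w in forms})  # {'jump'}
```

No dictionary, part-of-speech tagger, or morphological analysis is involved, which is where the size and speed advantage over a lemmatizer comes from. The price is errors on irregular forms, which many applications can tolerate.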

Detecting syllables in a word

Read about the TeX approach to this problem for the purposes of hyphenation. See especially Frank Liang’s thesis, Word Hy-phen-a-tion by Com-put-er. His algorithm is very accurate, and it is supplemented by a small exceptions dictionary for the cases where the algorithm does not work.
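The core of the pattern-based approach can be sketched in a few lines. Each pattern interleaves letters with digits; every pattern that occurs in the word votes on the gaps between letters, the highest digit per gap wins, and an odd winner permits a break. The code below is a simplified sketch under stated assumptions: `PATTERNS` is a tiny illustrative set covering one word (a real pattern file has thousands of entries), and the `left_min`/`right_min` defaults and the `exceptions` parameter are choices made for this example.

```python
import re

# Tiny illustrative pattern set; real pattern files are far larger.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0]): one level per gap."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns, exceptions=None, left_min=2, right_min=3):
    """Return the pieces of word between permitted break points."""
    if exceptions and word in exceptions:
        return exceptions[word]          # the exceptions dictionary wins
    dotted = "." + word.lower() + "."    # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)     # scores[i]: gap before dotted[i]
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:               # apply at every occurrence
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:       # odd level: a break is allowed
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Hyphenation points are not exactly syllable boundaries, so for syllable counting this is an approximation, but the even/odd voting scheme is what lets a compact pattern set encode both rules and their exceptions.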

What are some simple NLP projects that a CS undergrad can try implementing? [closed]

There are plenty of them. Here is a list of different NLP problems: spam detection; text genre categorization (news, fiction, science paper); finding similar texts (for example, search for similar articles); find something about author (genre, native-speaker/non-native-speaker); create automatic grader for student’s work; check text for plagiarism; create an application that looks for grammatical errors …