What is NLTK POS tagger asking me to download?

From NLTK v3.2 onward, please use:

    >>> import nltk
    >>> nltk.__version__
    '3.2.1'
    >>> nltk.download('averaged_perceptron_tagger')
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data] /home/alvas/nltk_data...
    [nltk_data] Package averaged_perceptron_tagger is already up-to-date!
    True

For NLTK versions using the old MaxEnt model, i.e. v3.1 and below, please use:

    >>> import nltk
    >>> nltk.download('maxent_treebank_pos_tagger')
    [nltk_data] Downloading package maxent_treebank_pos_tagger to
    [nltk_data] …
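If a script has to run against either kind of installation, the package id can be picked from nltk.__version__ using the same version rule as this answer. A minimal sketch follows; tagger_package_for is a hypothetical helper (not part of NLTK), it compares only the major.minor numbers, and it encodes just the two cases described above:

```python
import re


def tagger_package_for(version: str) -> str:
    """Return the nltk_data package id for the default POS tagger.

    Hypothetical helper mirroring the rule above: v3.2 and later use the
    averaged perceptron tagger, v3.1 and below the MaxEnt treebank tagger.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (3, 2):
        return "averaged_perceptron_tagger"
    return "maxent_treebank_pos_tagger"
```

Usage, assuming NLTK is installed:

    nltk.download(tagger_package_for(nltk.__version__))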

NLTK WordNet Lemmatizer: Shouldn’t it lemmatize all inflections of a word?

The WordNet lemmatizer does take the POS tag into account, but it doesn’t magically determine it:

    >>> nltk.stem.WordNetLemmatizer().lemmatize('loving')
    'loving'
    >>> nltk.stem.WordNetLemmatizer().lemmatize('loving', 'v')
    u'love'

Without a POS tag, lemmatize() falls back to its default, pos='n', and treats everything you feed it as a noun. So here it thinks you’re passing it the noun “loving” (as in “sweet loving”).
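In practice the POS usually comes from nltk.pos_tag, which returns Penn Treebank tags (NN, VBG, JJ, RB, etc.), whereas lemmatize() expects WordNet's single-letter codes ('n', 'v', 'a', 'r'). A minimal sketch of the usual mapping, assuming the tag prefix is enough to decide and that anything unrecognised should fall back to the noun default; penn_to_wordnet is a hypothetical helper, not an NLTK function:

```python
def penn_to_wordnet(penn_tag: str) -> str:
    """Map a Penn Treebank tag to the WordNet POS letter used by lemmatize().

    Hypothetical helper: JJ* -> adjective, VB* -> verb, RB* -> adverb;
    everything else falls back to 'n', the lemmatizer's own default.
    """
    if penn_tag.startswith("JJ"):
        return "a"  # adjective (wordnet.ADJ)
    if penn_tag.startswith("VB"):
        return "v"  # verb (wordnet.VERB)
    if penn_tag.startswith("RB"):
        return "r"  # adverb (wordnet.ADV)
    return "n"  # noun (wordnet.NOUN), also the fallback
```

Usage, assuming NLTK and its data are installed and tokens is a list of words:

    lemmatizer = nltk.stem.WordNetLemmatizer()
    [lemmatizer.lemmatize(word, penn_to_wordnet(tag)) for word, tag in nltk.pos_tag(tokens)]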

What is the difference between lemmatization vs stemming?

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope …
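To make the difference concrete, here is a toy contrast (not NLTK code): crude_stem chops the first matching suffix from a small hypothetical suffix list (real stemmers such as Porter's use ordered rule sets), while toy_lemmatize looks the word up, together with its POS, in a tiny hypothetical lexicon standing in for a dictionary such as WordNet:

```python
# Hypothetical suffix list, checked in order; real stemmers use ordered rule sets.
SUFFIXES = ("ingly", "ing", "ies", "ed", "es", "s")

# Hypothetical toy lexicon standing in for a real dictionary such as WordNet.
LEMMAS = {
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("better", "a"): "good",
}


def crude_stem(word: str) -> str:
    """Chop the first matching suffix, as long as at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Look up (word, pos) in the lexicon; unknown words are returned unchanged."""
    return LEMMAS.get((word, pos), word)
```

crude_stem('studies') gives 'stud' and crude_stem('running') gives 'runn', neither of which is a word, while the lookup returns 'study'. For 'saw' the lemma depends on the POS ('see' as a verb, 'saw' as a noun), which no amount of suffix chopping can recover.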

NLTK v3.2: Unable to nltk.pos_tag()

EDITED: This issue has been resolved in NLTK v3.2.1, so upgrading NLTK should fix it, e.g.:

    pip install -U nltk

I faced the same issue, and the error I encountered was as follows:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\__init__.py", line 110, in pos_tag
        tagger = PerceptronTagger()
      File …
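If you cannot upgrade straight away, a script can at least detect the affected release before calling nltk.pos_tag(). A minimal sketch, assuming (per the title and the edit above) that only the 3.2 release itself is affected; parse_version and needs_pos_tag_upgrade are hypothetical helpers, and the parsing keeps only the first three numeric components of the version string:

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '3.2', '3.2.1' or '3.2.1rc1' into a 3-tuple of ints, zero-padded."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def needs_pos_tag_upgrade(version: str) -> bool:
    """True only for the 3.2 release, whose pos_tag() issue was fixed in 3.2.1."""
    return parse_version(version) == (3, 2, 0)
```

Usage, assuming NLTK is installed:

    if needs_pos_tag_upgrade(nltk.__version__):
        raise RuntimeError("NLTK 3.2 detected; run pip install -U nltk")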

Using my own corpus instead of movie_reviews corpus for Classification in NLTK

If you have your data in exactly the same structure as the movie_reviews corpus in NLTK, there are two ways to “hack” your way through:

1. Put your corpus directory where your nltk_data is saved

First check where your nltk_data is saved:

    >>> import nltk
    >>> nltk.data.find('corpora/movie_reviews')
    FileSystemPathPointer(u'/home/alvas/nltk_data/corpora/movie_reviews')

Then move your directory to where …
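For reference, the movie_reviews layout means one sub-directory per category (pos/ and neg/) holding plain-text files. The sketch below is not the hack described in this answer; it is a plain-Python illustration, under that layout assumption, of what reading such a directory amounts to, producing (words, category) pairs. load_categorized_corpus is a hypothetical helper, and it tokenizes with a simple whitespace split rather than NLTK's tokenizer:

```python
from pathlib import Path


def load_categorized_corpus(root):
    """Return (tokens, category) pairs for every root/<category>/*.txt file.

    Hypothetical helper assuming the movie_reviews layout: one sub-directory
    per category containing plain-text files. Tokenization is a whitespace
    split, a simplification of NLTK's own word tokenizer.
    """
    documents = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files such as a README at the top level
        for path in sorted(category_dir.glob("*.txt")):
            tokens = path.read_text(encoding="utf-8").split()
            documents.append((tokens, category_dir.name))
    return documents
```

The resulting list can stand in for the usual movie_reviews comprehension, [(list(movie_reviews.words(fileid)), category) for category in movie_reviews.categories() for fileid in movie_reviews.fileids(category)].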

Saving nltk drawn parse tree to image file

Using the nltk.draw.tree.TreeView object to create the canvas frame automatically:

    >>> from nltk.tree import Tree
    >>> from nltk.draw.tree import TreeView
    >>> t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
    >>> TreeView(t)._cframe.print_to_file('output.ps')

Then convert the PostScript file to PNG, e.g. with ImageMagick's convert:

    >>> import os
    >>> os.system('convert output.ps output.png')
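os.system only returns the command's exit status, so a failed or missing convert is easy to miss. A slightly more defensive sketch uses subprocess.run with check=True and checks that the converter is on PATH first. ps_to_png and its converter parameter are hypothetical; it assumes ImageMagick (with PostScript support) is installed, and the converter parameter is there in case your install names the command differently, e.g. magick in ImageMagick 7:

```python
import shutil
import subprocess


def ps_to_png(ps_path: str, png_path: str, converter: str = "convert") -> None:
    """Convert a PostScript file to PNG by calling an external converter.

    Hypothetical helper. Raises FileNotFoundError if the converter is not on
    PATH, and subprocess.CalledProcessError if it exits with a non-zero status.
    """
    if shutil.which(converter) is None:
        raise FileNotFoundError(f"{converter!r} not found on PATH (is ImageMagick installed?)")
    # Passing a list (no shell) avoids quoting problems with unusual file names.
    subprocess.run([converter, ps_path, png_path], check=True)
```

Usage, after the print_to_file call above: ps_to_png('output.ps', 'output.png').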