To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you’re unsure which data or models you need, you can start with the basic set of data and models:

>>> import nltk
>>> nltk.download('popular')

This downloads the collection of “popular” resources.
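If you call nltk.download() from a script rather than the interpreter, it can be handy to skip the download when the resource is already on disk. Here is a minimal sketch of that idea. The helper name `ensure_resource` and its `find`/`download` parameters are my own (hypothetical), not part of NLTK; by default it relies on nltk.data.find(), which raises LookupError when a resource is missing, and on nltk.download().

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path cannot be found locally.

    `find` and `download` are hypothetical hooks for this sketch; when left
    as None they default to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tried with stand-ins
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)      # raises LookupError if the resource is missing
        return False
    except LookupError:
        download(package_id)     # e.g. nltk.download('punkt')
        return True
```

For the tokenizer above, the call would be ensure_resource('tokenizers/punkt', 'punkt'). The first argument is the path of the resource inside nltk_data, the second is the package id you would pass to nltk.download().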

Make sure you have the latest version of NLTK, since it is actively maintained and regularly improved:

$ pip install --upgrade nltk

EDITED

In case anyone runs into errors when downloading larger datasets from nltk, here is a workaround from https://stackoverflow.com/a/38135306/610569

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed'  # Trick the index into treating panlex_lite as already installed.
>>> dler.download('popular')

And if anyone wants to find nltk_data directory, see https://stackoverflow.com/a/36383314/610569

And to configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
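As a rough illustration of what those two answers cover: NLTK looks for data in the directories listed in nltk.data.path, which is a plain Python list, so you can inspect it or add your own directory to it. The sketch below assumes a directory of your choosing; the helper names `add_data_dir` and `use_custom_data_dir` and the example path ~/my_nltk_data are hypothetical, not part of NLTK.

```python
import os

def add_data_dir(search_path, directory):
    """Put `directory` at the front of a search-path list if it is not there yet.

    `search_path` is expected to be a list such as nltk.data.path.
    Returns the normalized absolute directory.
    """
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory not in search_path:
        search_path.insert(0, directory)   # searched before the default locations
    return directory

def use_custom_data_dir(directory="~/my_nltk_data"):
    """Hypothetical wrapper: register `directory` with NLTK and show the search path."""
    import nltk
    target = add_data_dir(nltk.data.path, directory)
    print(nltk.data.path)                  # the directories NLTK will search
    return target
```

To download into that directory instead of the default one, pass it explicitly, e.g. nltk.download('punkt', download_dir=target). Setting the NLTK_DATA environment variable before starting Python is another way to add a directory to the search path.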
