Java or Python for Natural Language Processing [closed]

Java vs Python for NLP is very much a preference or necessity. Depending on the company/projects you’ll need to use one or the other and often there isn’t much of a choice unless you’re heading a project.

Other than NLTK (www.nltk.org), there are actually other libraries for text processing in python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)

For Java, there’re tonnes of others but here’s another list:

This is a nice comparison for basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html

A useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you’re uncertain, which is the language to go for NLP, personally i say, “any language that will give you the desired analysis/output”, see Which language or tools to learn for natural language processing?

Here’s a pretty recent (2017) of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp


Other than language processing tools, you would very much need machine learning tools to incorporate into NLP pipelines.

There’s a whole range in Python and Java, and once again it’s up to preference and whether the libraries are user-friendly enough:

Machine Learning libraries in python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)


With the recent (2015) deep learning tsunami in NLP, possibly you could consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I’ll avoid listing deep learning tools out of non-favoritism / neutrality.


Other Stackoverflow questions that also asked for NLP/ML tools:

Leave a Comment