Choosing Java or Python for NLP is largely a matter of preference or necessity. Depending on the company or project, you’ll need to use one or the other, and often there isn’t much of a choice unless you’re heading the project.
Other than NLTK (www.nltk.org), there are several other libraries for text processing in Python:
- TextBlob: http://textblob.readthedocs.org/en/dev/
- Gensim: http://radimrehurek.com/gensim/
- Pattern: http://www.clips.ua.ac.be/pattern
- spaCy: http://spacy.io
- Orange: http://orange.biolab.si/features/
- PyNLPl (pronounced "pineapple"): https://github.com/proycon/pynlpl
(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)
For Java, there are tonnes of libraries, but here’s a short list:
- Freeling: http://nlp.lsi.upc.edu/freeling/
- OpenNLP: http://opennlp.apache.org/
- LingPipe: http://alias-i.com/lingpipe/
- Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/ (comes with wrappers for other languages, Python included)
- CogComp NLP: https://github.com/CogComp/cogcomp-nlp
For a nice comparison of basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
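To give a feel for what "basic string processing" means here, this is a minimal sketch using only the Python standard library. The function names `tokenize` and `top_words` and the sample sentence are my own, made up for illustration; they are not taken from the linked comparison or from any of the libraries above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample input, just for demonstration.
sample = "The cat sat on the mat. The mat was flat."

print(tokenize(sample))
print(top_words(sample, 2))
```

A regex tokenizer like this is crude (no sentence splitting, no handling of hyphens or Unicode), which is where the libraries listed above start to pay off.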
For a useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4
If you’re uncertain which language to go for in NLP, personally I say “any language that will give you the desired analysis/output”; see “Which language or tools to learn for natural language processing?”
Here’s a pretty recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp
An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp
Other than language processing tools, you will very likely need machine learning tools to incorporate into NLP pipelines. There’s a whole range in Python and Java, and once again it’s up to preference and whether the libraries are user-friendly enough:
Machine Learning libraries in Python:
- Sklearn (Scikit-learn): http://scikit-learn.org/stable/
- Milk: http://luispedro.org/software/milk
- Scipy: http://www.scipy.org/
- Theano: http://deeplearning.net/software/theano/
- PyML: http://pyml.sourceforge.net/
- pyBrain: http://pybrain.org/
- Graphlab Create (Commercial tool but free academic license for 1 year): https://dato.com/products/create/
(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)
Machine Learning libraries in Java:
- Weka: http://www.cs.waikato.ac.nz/ml/weka/index.html
- Mallet: http://mallet.cs.umass.edu/
- Mahout: https://mahout.apache.org/
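To illustrate how a machine learning component slots into an NLP pipeline (tokenize, count words, classify), here is a rough sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is hand-rolled with the standard library only so it runs without installing anything; the class name `TinyNaiveBayes` and the toy training data are hypothetical and do not reflect the API of any library listed above. In practice one of those libraries would replace this class.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, made-up training data purely for demonstration.
train_texts = [
    "great fun film",
    "loved this great story",
    "boring dull film",
    "hated this dull story",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("a great story"))
print(model.predict("dull and boring"))
```

With this toy data the two calls print `pos` and `neg` respectively. The point is only the shape of the pipeline: the NLP step produces tokens, and the ML step turns token counts into a prediction.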
With the recent (2015) deep learning tsunami in NLP, you could also consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
I’ll avoid listing individual deep learning tools for the sake of neutrality.
Other Stack Overflow questions that also ask about NLP/ML tools:
- Machine Learning and Natural Language Processing
- What are good starting points for someone interested in natural language processing?
- Natural language processing
- Natural Language Processing in Java (NLP)
- Is there a good natural language processing library
- Simple Natural Language Processing Startup for Java
- What libraries offer basic or advanced NLP methods?
- Latest good languages and books for Natural Language Processing, the basics
- (For NER) Entity Extraction/Recognition with free tools while feeding Lucene Index
- (With PHP) NLP programming tools using PHP?
- (With Ruby) https://stackoverflow.com/questions/3776361/ruby-nlp-libraries