What is a good Java library for Parts-Of-Speech tagging? [closed]

Are you looking to tag POS in a specific domain? Most general-purpose taggers are trained on newswire text, and they typically don’t perform well on specialized domains (such as biomedical text). Some taggers are trained specifically for such domains, for example dTagger (Java) for biomedical text.

For newswire text, Adwait Ratnaparkhi’s MXPOST is very good and is the one I would recommend.

Other Java implementations include:

  1. MontyLingua
  2. Berkeley Parser (not really a POS tagger, but full-blown parsers typically include one. Search for Java syntactic parsers and you will find many.)
  3. QTag
  4. LBJ

OpenNLP and LingPipe, mentioned in the other answers, are also pretty decent.

Info on the state of the art in POS tagging can be found here. As you can see, LTAG-Spinal (also mentioned in another answer) ranks best as of now, but the differences in accuracy among the taggers are small. I have not used LTAG myself.

Also note that the baseline accuracy for POS tagging is about 90%. The baseline here means: (a) tag every known word with its most frequent POS tag from a lexicon, and (b) tag every unknown word as a noun.
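
To make the baseline concrete, here is a minimal sketch of it in plain Java. It is not taken from any of the libraries above. The class name `BaselineTagger`, the `observe` and `tag` methods, and the Penn Treebank-style tag names (`DT`, `VB`, `NN`) are all assumptions made for illustration, and the tiny lexicon in `main` is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical baseline tagger: most frequent tag per word, noun for unknowns.
class BaselineTagger {
    // Assumed tag for unknown words (Penn Treebank singular noun).
    private static final String UNKNOWN_TAG = "NN";

    // Lexicon: word -> (tag -> number of times the word was seen with that tag).
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one (word, tag) observation from tagged training data.
    public void observe(String word, String tag) {
        counts.computeIfAbsent(word, w -> new HashMap<>())
              .merge(tag, 1, Integer::sum);
    }

    // (a) known word: return its most frequent tag; (b) unknown word: noun.
    public String tag(String word) {
        Map<String, Integer> tagCounts = counts.get(word);
        if (tagCounts == null) {
            return UNKNOWN_TAG;
        }
        String best = UNKNOWN_TAG;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BaselineTagger tagger = new BaselineTagger();
        // Made-up training observations, for illustration only.
        tagger.observe("the", "DT");
        tagger.observe("dogs", "NNS");
        tagger.observe("run", "VB");
        tagger.observe("run", "VB");
        tagger.observe("run", "NN");

        for (String word : "the dogs run quickly".split(" ")) {
            System.out.println(word + "/" + tagger.tag(word));
        }
    }
}
```

In practice the counts would come from a tagged corpus rather than hand-entered pairs. A tagger is only worth adopting if it clearly beats this kind of lookup on your own text.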
