nlp - w3toppers.com

Anyone know of some good Word Sense Disambiguation software?

My list is not exhaustive, and Googling for more will surely serve your purposes better. For software, here's a short list; remember to cite the relevant sources!

GWSD: Unsupervised Graph-based Word Sense Disambiguation
http://lit.csci.unt.edu/~rada/downloads/GWSD/GWSD.1.0.tar.gz

SenseLearner: All-Words Word Sense Disambiguation Tool
http://lit.csci.unt.edu/~rada/downloads/senselearner/SenseLearner2.0.tar.gz

KYOTO UKB graph-based WSD
http://ixa2.si.ehu.es/ukb/

pyWSD: Python Implementation of Simple WSD algorithms
https://github.com/alvations/pywsd
…
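To give a feel for what a "simple" WSD algorithm looks like, here is a minimal simplified-Lesk sketch: pick the sense whose dictionary gloss shares the most content words with the sentence. The sense inventory, glosses, and stopword list below are made up for illustration (real tools usually draw glosses from WordNet), and the function is hypothetical, not the API of any package listed above.

```python
import re

# Hypothetical toy sense inventory; real systems take glosses from WordNet.
SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    },
}

# Tiny stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "i", "in", "of", "on",
             "such", "that", "the", "to", "we"}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the sense of `word` whose gloss overlaps most with `sentence`.

    Ties go to the sense listed first.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I deposited money at the bank"))
# bank.financial
print(simplified_lesk("bank", "We sat on the bank of the river watching the water"))
# bank.river
```

Note that "deposited" does not match "deposits" here; real Lesk variants usually add lemmatization or stemming for exactly that reason.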

TreeTagger installation successful but cannot open .par file

I think there are two problems. First, the scripts should have “-utf8” in their name, e.g. cmd/tagger-chunker-german-utf8, because you downloaded the UTF-8 data. Second, tagging and chunking each require their own data file. The homepage has two sections, “Parameter files for PC” and “Chunker parameter files for PC”; download the files from both …
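Both fixes boil down to a checklist (the right script name plus one parameter file for tagging and one for chunking), which a small Python helper can verify before you run anything. This is a sketch under assumptions: the script name follows the cmd/tagger-chunker-&lt;language&gt;-utf8 pattern from the answer, but the lib/ directory and the example .par file names are guesses, so open your tagger-chunker script to see which files it actually loads. The function name is hypothetical.

```python
from pathlib import Path


def missing_treetagger_files(root, language, par_files, utf8=True):
    """List the parts of a TreeTagger tagger+chunker setup that are absent.

    The script is expected at cmd/tagger-chunker-<language>[-utf8].
    `par_files` must name every parameter file the script loads (one for
    tagging, one for chunking); they are assumed to live under lib/.
    Returns paths relative to `root`, using forward slashes.
    """
    root = Path(root)
    suffix = "-utf8" if utf8 else ""
    needed = [root / "cmd" / f"tagger-chunker-{language}{suffix}"]
    needed += [root / "lib" / name for name in par_files]
    return [p.relative_to(root).as_posix() for p in needed if not p.exists()]


# Hypothetical install path and parameter file names.
print(missing_treetagger_files("/opt/treetagger", "german",
                               ["german-utf8.par", "german-chunker-utf8.par"]))
```

An empty list means every file the check knows about is in place; anything listed is what still needs downloading or renaming.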

Creating a custom categorized corpus in NLTK and Python

Here is the answer to my own question. Since I was considering two cases, I think it's worth covering both in case someone needs the answer in the future. If you have the same setup as the movie_reviews corpus (several folders labeled in the same way you would like your labels to …
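For the folder-per-label layout, NLTK's CategorizedPlaintextCorpusReader can derive each file's category from its path via a regex whose first group is the label. The sketch below builds a throwaway three-file corpus in a temporary directory; the folder names, file names, and review texts are invented for illustration, and it assumes NLTK is installed.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Build a tiny throwaway corpus laid out like movie_reviews: one folder per label.
root = tempfile.mkdtemp()
samples = {
    "pos/review1.txt": "A wonderful, moving film.",
    "pos/review2.txt": "Great acting and a clever plot.",
    "neg/review3.txt": "Dull and far too long.",
}
for fileid, text in samples.items():
    path = os.path.join(root, *fileid.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Every .txt file is part of the corpus; cat_pattern's first group
# (the folder name) becomes that file's category.
reader = CategorizedPlaintextCorpusReader(
    root,
    r".*\.txt",
    cat_pattern=r"(pos|neg)/.*",
    encoding="utf-8",
)

print(reader.categories())               # ['neg', 'pos']
print(reader.fileids(categories="pos"))  # ['pos/review1.txt', 'pos/review2.txt']
print(reader.words(categories="neg"))
```

If the labels live in a separate mapping file rather than in folder names, the same reader's cat_file argument is another option.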

Language recognition in Java

See what you think of the language-detection implementation in Apache Tika. This assumes that you want to find out what language a text is in, as opposed to building a parser for a programming language.
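Tika does the work for you, but for intuition about how statistical language identification commonly works, here is a toy character-trigram classifier: build a trigram-count profile per language, then pick the language whose profile is most similar (cosine similarity) to the input's. This is not Tika's algorithm or API; the training sentences are tiny, invented samples, and real detectors train on large corpora.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical, far-too-small training samples, one per language.
PROFILES = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat sleeps in the house"),
    "de": trigram_profile("der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus"),
}


def detect_language(text):
    """Return the language whose trigram profile is closest to the text's."""
    sample = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(sample, PROFILES[lang]))


print(detect_language("the dog and the cat"))     # en
print(detect_language("der hund und die katze"))  # de
```

Character n-grams are a common choice because they need no tokenizer and pick up spelling patterns (such as "the" versus "der") even from short inputs.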

Natural-language date and time parser for Java

Natty is a really good replacement for JChronic.

Saving an NLTK-drawn parse tree to an image file

Use the nltk.draw.tree.TreeView object to create the canvas frame automatically:

>>> from nltk.tree import Tree
>>> from nltk.draw.tree import TreeView
>>> t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
>>> TreeView(t)._cframe.print_to_file('output.ps')

Then convert the PostScript file to PNG (this calls ImageMagick's convert command, which must be installed):

>>> import os
>>> os.system('convert output.ps output.png')

How do I open multiple instances of Visual Studio Code?

Ctrl + Shift + N opens a new window, while pressing Ctrl + K, releasing the keys, and then pressing O opens the current tab in a new window. You can then use File → Open Folder to have two instances of Visual Studio Code with a different folder in each window. ⌘ + Shift + …

LDA model generates different topics every time I train on the same corpus

Why do the same LDA parameters and corpus generate different topics every time? Because LDA uses randomness in both the training and inference steps. And how do I stabilize topic generation? By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, using numpy.random.seed: SOME_FIXED_SEED = 42 # …
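To see why reseeding works, here is a toy collapsed Gibbs sampler for LDA written with numpy. Every random draw (the initial topic of each token and each resampling step) comes from the global numpy.random state, so resetting the seed replays the exact same sequence of draws. This is an illustrative sketch, not gensim's or any library's implementation; the documents, topic count, and hyperparameters are made up.

```python
import numpy as np


def toy_lda_gibbs(docs, n_topics, vocab_size, n_iter=30, alpha=0.1, beta=0.01):
    """Minimal collapsed Gibbs sampler for LDA (illustration only).

    `docs` is a list of documents, each a list of integer word ids.
    All randomness comes from the global numpy.random state.
    Returns the topic-word count matrix.
    """
    # Random initial topic for every token.
    z = [np.random.randint(n_topics, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), n_topics))   # tokens per (document, topic)
    nkw = np.zeros((n_topics, vocab_size))  # tokens per (topic, word)
    nk = np.zeros(n_topics)                 # tokens per topic
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Conditional p(z = k | all other assignments), unnormalized.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = np.random.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    return nkw


docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 5, 4, 3]]
SOME_FIXED_SEED = 42

np.random.seed(SOME_FIXED_SEED)
first = toy_lda_gibbs(docs, n_topics=2, vocab_size=6)
np.random.seed(SOME_FIXED_SEED)
second = toy_lda_gibbs(docs, n_topics=2, vocab_size=6)
print(np.array_equal(first, second))  # True: same seed, same topics
```

If you train with gensim, its LdaModel also accepts a random_state argument, which pins the model's randomness without relying on the global numpy state.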

Java API for plural forms of English words

Check out Evo Inflector, which implements an English pluralization algorithm based on Damian Conway's paper “An Algorithmic Approach to English Pluralization”. The library is tested against data from Wiktionary and reports a 100% success rate for the 1,000 most used English words and a 70% success rate for all the words listed in Wiktionary. If you want even more accuracy …
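To show the general shape of a rule-based pluralizer (exception tables consulted first, then ordered suffix rules, then a default "+s"), here is a heavily simplified Python sketch. The tables and rules are a tiny illustrative subset chosen for this example; it is neither Evo Inflector's API nor Conway's full rule set.

```python
import re

# Small illustrative tables; a real inflector ships far larger ones.
IRREGULAR = {"child": "children", "man": "men", "woman": "women",
             "mouse": "mice", "person": "people", "foot": "feet", "tooth": "teeth"}
UNINFLECTED = {"sheep", "fish", "deer", "series", "species"}

# Suffix rules tried in order: (pattern, replacement).
SUFFIX_RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # box -> boxes, church -> churches
    (r"([^aeiou])y$", r"\1ies"),   # city -> cities, but day -> days
]


def pluralize(noun):
    """Pluralize a lowercase English noun: tables first, then suffix rules."""
    if noun in UNINFLECTED:
        return noun
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    for pattern, replacement in SUFFIX_RULES:
        if re.search(pattern, noun):
            return re.sub(pattern, replacement, noun)
    return noun + "s"


print([pluralize(w) for w in ["cat", "box", "church", "city", "day", "child", "sheep"]])
# ['cats', 'boxes', 'churches', 'cities', 'days', 'children', 'sheep']
```

Words such as "stomach" or "quiz" already defeat these few rules, which is why a tested library with much larger tables is the better choice in practice.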

Difference between Python’s collections.Counter and nltk.probability.FreqDist

nltk.probability.FreqDist is a subclass of collections.Counter. From the docs: A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency …
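A short demonstration of the subclass relationship: everything a Counter does still works on a FreqDist, and FreqDist layers probability-oriented helpers such as N(), freq(), max(), and hapaxes() on top. The token list is an arbitrary example, and NLTK must be installed.

```python
from collections import Counter

from nltk.probability import FreqDist

tokens = "the cat sat on the mat the end".split()
fd = FreqDist(tokens)

# FreqDist is a Counter, so Counter methods work unchanged.
print(isinstance(fd, Counter))  # True
print(fd.most_common(1))        # [('the', 3)]

# FreqDist adds probability-oriented helpers on top.
print(fd.N())                   # 8 (total number of outcomes)
print(fd.freq("the"))           # 0.375 (3 / 8)
print(fd.max())                 # the
print(fd.hapaxes())             # samples that occur exactly once
```

On Python 3.10 and later, a plain Counter's total() method returns the same number as N().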