nltk NaiveBayesClassifier training for sentiment analysis

You need to change your data structure. Here is your train list as it currently stands: >>> train = [('I love this sandwich.', 'pos'), ('This is an amazing place!', 'pos'), ('I feel very good about these beers.', 'pos'), ('This is my best work.', 'pos'), ("What an awesome view", 'pos'), ('I do not like this restaurant', …
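As a rough illustration of the "change your data structure" point: `nltk.NaiveBayesClassifier.train` consumes (feature_dict, label) pairs, not (sentence, label) pairs. The sketch below converts sentences into boolean "word present?" feature dicts over the training vocabulary. The whitespace-and-strip tokenizer is a simplified stand-in for `nltk.word_tokenize` (an assumption made so no tokenizer data is needed), and the "neg" row is invented for illustration only.

```python
# Toy data: the first two rows come from the question; the "neg" row is made up
# purely for illustration.
train = [
    ("I love this sandwich.", "pos"),
    ("This is an amazing place!", "pos"),
    ("The service here was awful.", "neg"),
]

PUNCT = ".,!?;:\"'"


def tokenize(sentence):
    """Lowercase, split on whitespace and trim surrounding punctuation.

    Simplified stand-in for nltk.word_tokenize (assumption, not NLTK's tokenizer).
    """
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def to_features(sentence, vocabulary):
    """Map a sentence to {word: True/False} over the whole training vocabulary."""
    present = set(tokenize(sentence))
    return {word: (word in present) for word in vocabulary}


# Every distinct token seen in training becomes one boolean feature.
vocabulary = {w for sentence, _ in train for w in tokenize(sentence)}

# NaiveBayesClassifier.train expects (feature_dict, label) pairs like these.
train_set = [(to_features(sentence, vocabulary), label) for sentence, label in train]
```

With NLTK installed, `classifier = nltk.NaiveBayesClassifier.train(train_set)` followed by `classifier.classify(to_features("What an awesome view", vocabulary))` should then work, because every item now has the shape the classifier expects.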

Python: tf-idf-cosine: to find document similarity

First off, if you want to extract count features and apply TF-IDF normalization and row-wise Euclidean normalization, you can do it in one operation with TfidfVectorizer: >>> from sklearn.feature_extraction.text import TfidfVectorizer >>> from sklearn.datasets import fetch_20newsgroups >>> twenty = fetch_20newsgroups() >>> tfidf = TfidfVectorizer().fit_transform(twenty.data) >>> tfidf <11314x130088 sparse matrix of type '<type 'numpy.float64'>' with 1787553 …
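The reason the row-wise normalization matters: once every TF-IDF row has unit L2 norm, cosine similarity reduces to a plain dot product. The pure-Python sketch below mirrors what we understand to be TfidfVectorizer's defaults (lowercasing, token pattern `(?u)\b\w\w+\b`, raw term counts, smooth idf = ln((1 + n) / (1 + df)) + 1, L2 norm); it is an approximation for understanding, not scikit-learn's implementation.

```python
import math
import re
from collections import Counter

# Assumed to match TfidfVectorizer's default token_pattern: words of 2+ word chars.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document.

    tf = raw count, idf = ln((1 + n) / (1 + df)) + 1 (smoothed idf).
    """
    tokenised = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for tokens in tokenised:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Row-wise Euclidean normalisation (empty documents stay empty).
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def cosine(u, v):
    """For L2-normalised sparse vectors, cosine similarity is just the dot product."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

With scikit-learn itself, the equivalent step on the `tfidf` matrix above would be `linear_kernel(tfidf[0:1], tfidf).flatten()` from `sklearn.metrics.pairwise`, which computes the dot products of the first document against all others.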

wordnet lemmatization and pos tagging in python

First of all, you can use nltk.pos_tag() directly without training it. The function loads a pretrained tagger from a file. You can see the file name with nltk.tag._POS_TAGGER: >>> nltk.tag._POS_TAGGER 'taggers/maxent_treebank_pos_tagger/english.pickle' Because it was trained on the Treebank corpus, it also uses the Treebank tag set. The following function maps the Treebank tags …
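One way to do the Treebank-to-WordNet mapping, sketched here without loading any NLTK data: Penn Treebank tags for each coarse class share a first letter (JJ*, VB*, NN*, RB*), and WordNet's POS constants in `nltk.corpus.reader.wordnet` are the single letters "n", "v", "a" and "r". The function name `treebank_to_wordnet` is our own, not part of NLTK.

```python
# WordNet POS constants as defined in nltk.corpus.reader.wordnet:
# NOUN = "n", VERB = "v", ADJ = "a", ADV = "r".
WN_NOUN, WN_VERB, WN_ADJ, WN_ADV = "n", "v", "a", "r"

# Treebank tags share a first letter per coarse class: JJ/JJR/JJS, VB*, NN*, RB*.
_PREFIX_TO_WORDNET = {"J": WN_ADJ, "V": WN_VERB, "N": WN_NOUN, "R": WN_ADV}


def treebank_to_wordnet(tag, default=None):
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS letter, else `default`."""
    return _PREFIX_TO_WORDNET.get(tag[:1], default)
```

A typical use would be `WordNetLemmatizer().lemmatize(word, treebank_to_wordnet(tag, "n"))` for each `(word, tag)` pair from `nltk.pos_tag(tokens)`; falling back to "n" matches the lemmatizer's own default POS.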

Convert words between verb/noun/adjective forms

This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well: from nltk.corpus import wordnet as wn def nounify(verb_word): """ Transform a verb to the closest noun: die …
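A minimal sketch of the same heuristic, written against what we assume is the NLTK 3 WordNet API (where `lemmas()`, `name()`, `synset()` and `pos()` are methods): collect the noun lemmas derivationally related to every lemma of the word's verb synsets, then rank them by relative frequency. The WordNet reader is passed in as a parameter so the ranking logic can be exercised without the corpus installed; `rank_by_frequency` is a hypothetical helper name.

```python
from collections import Counter


def rank_by_frequency(words):
    """Return (word, share) pairs, most frequent first; ties broken alphabetically."""
    counts = Counter(words)
    total = sum(counts.values())
    return sorted(
        ((w, c / total) for w, c in counts.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def nounify(verb_word, wordnet):
    """Rank nouns derivationally related to `verb_word` by how often they occur.

    `wordnet` is assumed to be NLTK's `nltk.corpus.wordnet` reader (NLTK 3 API).
    Returns [] when the word has no verb synsets.
    """
    nouns = [
        related.name()
        for synset in wordnet.synsets(verb_word, pos="v")  # verb senses only
        for lemma in synset.lemmas()
        for related in lemma.derivationally_related_forms()
        if related.synset().pos() == "n"  # keep only noun forms
    ]
    return rank_by_frequency(nouns)
```

Usage would look like `from nltk.corpus import wordnet as wn` followed by `nounify("die", wn)`; the exact nouns returned depend on the installed WordNet version.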

Classification using movie review corpus in NLTK/Python

Yes, the tutorial in chapter 6 is aimed at giving students basic knowledge, and from there students should build on it by exploring what is and isn't available in NLTK. So let's go through the problems one at a time. Firstly, the way to get 'pos'/'neg' documents through the directory is most probably …
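For orientation, here is a sketch of the chapter-6-style pipeline: label documents by category rather than by walking the directory, pick the most frequent words, and turn each document into `contains(word)` features. The corpus object is passed in and is assumed to behave like `nltk.corpus.movie_reviews` (`categories()`, `fileids(category)`, `words(fileid)`); `collections.Counter` stands in for `nltk.FreqDist`, and lowercasing the document words is our own choice.

```python
from collections import Counter


def load_labelled_documents(corpus):
    """Return [(words, category), ...] for every file in a categorised corpus.

    `corpus` is assumed to behave like nltk.corpus.movie_reviews.
    """
    return [
        (list(corpus.words(fileid)), category)
        for category in corpus.categories()      # e.g. 'neg', 'pos'
        for fileid in corpus.fileids(category)   # files belonging to that label
    ]


def top_words(documents, n=2000):
    """The n most frequent lowercased words across all documents."""
    counts = Counter(w.lower() for words, _ in documents for w in words)
    return [w for w, _ in counts.most_common(n)]


def document_features(words, word_features):
    """Boolean contains(word) features over the chosen vocabulary."""
    present = {w.lower() for w in words}
    return {f"contains({w})": (w in present) for w in word_features}
```

From there, shuffling the labelled documents, building `(document_features(words, vocab), label)` pairs, training with `nltk.NaiveBayesClassifier.train` and scoring with `nltk.classify.accuracy` on a held-out slice follows the chapter's pattern.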

pip issue installing almost any library

I found it sufficient to specify the PyPI host as trusted. For example, pip install --trusted-host pypi.python.org pytest-xdist, or pip install --trusted-host pypi.python.org --upgrade pip. This solved the following error: Could not fetch URL https://pypi.python.org/simple/pytest-cov/: There was a problem confirming the ssl certificate: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600) - skipping Could not find a version that …

Why is my NLTK function slow when processing the DataFrame?

Your original nlkt() loops through each row three times. def nlkt(val): val=repr(val) clean_txt = [word for word in val.split() if word.lower() not in stopwords.words('english')] nopunc = [char for char in str(clean_txt) if char not in string.punctuation] nonum = [char for char in nopunc if not char.isdigit()] words_string = ''.join(nonum) return words_string Also, each time you're …
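A sketch of a faster variant under our own assumptions: build the stopword set and a punctuation/digit deletion table once, then make a single pass over each row's tokens. Set membership replaces the repeated list lookups against `stopwords.words('english')`, and `str.translate` replaces the two character-by-character comprehensions. It uses `str(val)` instead of `repr(val)` and drops tokens that become empty, so the output can differ slightly from `nlkt()` (for example, no doubled spaces). `make_cleaner` is a hypothetical name.

```python
import string

# Built once at import time: deletes every punctuation character and digit.
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.digits)


def make_cleaner(stop_words):
    """Return a cleaning function that closes over a precomputed stopword set."""
    stop = {w.lower() for w in stop_words}  # O(1) membership tests

    def clean(val):
        tokens = str(val).split()
        # 1) drop stopwords case-insensitively, as the original comprehension did
        kept = (tok for tok in tokens if tok.lower() not in stop)
        # 2) strip punctuation and digits from each remaining token in one pass
        stripped = (tok.translate(_STRIP_TABLE) for tok in kept)
        # 3) discard tokens that were only punctuation/digits, then join
        return " ".join(tok for tok in stripped if tok)

    return clean
```

Assuming a DataFrame with a text column (the name `text` here is hypothetical), usage would be `clean = make_cleaner(stopwords.words('english'))` and then `df['text'].apply(clean)`, so the stopword list is read only once for the whole frame.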