Topic distribution: How do we see which documents belong to which topic after doing LDA in Python?

Using the probabilities of the topics, you can try to set some threshold and use it as a clustering baseline, but I am sure there are better ways to do clustering than this 'hacky' method.

from gensim import corpora, models, similarities
from itertools import chain

""" DEMO """
documents = ["Human machine interface for lab … Read more
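Once the model is trained, gensim can report each document's topic mixture directly. A minimal sketch of that step, assuming gensim's LdaModel; the tiny corpus and variable names below are illustrative, not taken from the excerpt above:

from gensim import corpora, models

# toy corpus; in practice use your own tokenized documents
texts = [["human", "machine", "interface"], ["graph", "trees", "survey"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)

# per-document topic distribution: a list of (topic_id, probability) pairs
for i, bow in enumerate(corpus):
    doc_topics = lda.get_document_topics(bow, minimum_probability=0.0)
    # pick the dominant topic as a simple "hard" cluster assignment
    best_topic = max(doc_topics, key=lambda tp: tp[1])
    print(f"document {i}: topics={doc_topics}, dominant={best_topic}")

Thresholding or taking the arg-max over these probabilities is exactly the "hacky" clustering baseline described above.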

Convert word2vec bin file to text

I use this code to load the binary model and then save it to a text file:

from gensim.models.keyedvectors import KeyedVectors

model = KeyedVectors.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)
model.save_word2vec_format('path/to/GoogleNews-vectors-negative300.txt', binary=False)

References: API and nullege.

Note: the code above is for the newer version of gensim. For the previous version, I used this code:

from gensim.models import word2vec

model = word2vec.Word2Vec.load_word2vec_format('path/to/GoogleNews-vectors-negative300.bin', binary=True)
model.save_word2vec_format('path/to/GoogleNews-vectors-negative300.txt', binary=False)
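As a quick sanity check, the converted text file can be loaded back with the same KeyedVectors API; a small sketch (the word 'dog' is only an example, any in-vocabulary word works):

from gensim.models.keyedvectors import KeyedVectors

# load the plain-text vectors that were just written out
text_model = KeyedVectors.load_word2vec_format('path/to/GoogleNews-vectors-negative300.txt', binary=False)

# confirm the vectors survived the round trip
print(text_model['dog'][:5])                 # first few dimensions of one vector
print(text_model.most_similar('dog', topn=3))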

LDA model generates different topics every time I train on the same corpus

Why do the same LDA parameters and corpus generate different topics every time? Because LDA uses randomness in both its training and inference steps. And how do I stabilize the topic generation? By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed:

SOME_FIXED_SEED = 42 # … Read more
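In newer gensim releases the same effect can also be achieved by passing a fixed random_state to the model itself; a minimal sketch, where the corpus and dictionary setup are illustrative:

import numpy as np
from gensim import corpora, models

SOME_FIXED_SEED = 42
np.random.seed(SOME_FIXED_SEED)  # the global-seed approach described above

texts = [["human", "machine", "interface"], ["graph", "trees", "survey"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# LdaModel also accepts an explicit seed via random_state,
# which keeps topic assignments reproducible across runs
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      random_state=SOME_FIXED_SEED)
print(lda.print_topics())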

Using a Gensim FastText model with an LSTM NN in Keras

Here is the procedure to incorporate the FastText model inside an LSTM Keras network.

# define dummy data and preprocess it
docs = ['Well done', 'Good work', 'Great effort', 'nice work', 'Excellent',
        'Weak', 'Poor effort', 'not good', 'poor work', 'Could have done better']
docs = [d.lower().split() for d in docs]

# train fasttext from gensim api … Read more
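A minimal sketch of one common way to wire this up, assuming gensim 4.x and tf.keras: copy the trained FastText vectors into a frozen Keras Embedding layer and stack an LSTM on top. The tiny corpus, labels, and hyperparameters below are illustrative, not taken from the original answer.

import numpy as np
import tensorflow as tf
from gensim.models import FastText

docs = [['well', 'done'], ['good', 'work'], ['weak'], ['poor', 'work']]
labels = np.array([1, 1, 0, 0])

# train fasttext on the toy corpus
ft = FastText(sentences=docs, vector_size=8, window=2, min_count=1, epochs=50)

# map each word to an integer index (0 reserved for padding) and build the embedding matrix
vocab = {w: i + 1 for i, w in enumerate(ft.wv.index_to_key)}
emb_matrix = np.zeros((len(vocab) + 1, ft.vector_size))
for word, idx in vocab.items():
    emb_matrix[idx] = ft.wv[word]

# encode and pad the documents to a fixed length
max_len = 4
encoded = [[vocab[w] for w in d] for d in docs]
padded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=max_len, padding='post')

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=emb_matrix.shape[0], output_dim=ft.vector_size,
                              weights=[emb_matrix], trainable=False),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=10, verbose=0)

Setting trainable=False keeps the FastText vectors fixed; set it to True if you want the embeddings fine-tuned along with the LSTM.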

How to calculate sentence similarity using gensim's word2vec model in Python

This is actually a pretty challenging problem that you are asking. Computing sentence similarity requires building a grammatical model of the sentence, understanding equivalent structures (e.g. “he walked to the store yesterday” and “yesterday, he walked to the store”), finding similarity not just in the pronouns and verbs but also in the proper nouns, finding … Read more
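One simple baseline that sidesteps the grammatical modeling described above is to average the word vectors of each sentence and compare the averages with cosine similarity. A rough sketch, where the tiny corpus and hyperparameters are illustrative; a real model would be trained on far more text:

import numpy as np
from gensim.models import Word2Vec

sentences = [["he", "walked", "to", "the", "store", "yesterday"],
             ["yesterday", "he", "walked", "to", "the", "store"],
             ["the", "cat", "sat", "on", "the", "mat"]]
wv = Word2Vec(sentences, vector_size=50, min_count=1, epochs=100).wv

def sentence_vector(tokens, wv):
    # average the vectors of the in-vocabulary words
    vecs = [wv[w] for w in tokens if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector(sentences[0], wv)
s2 = sentence_vector(sentences[1], wv)
print(cosine(s1, s2))  # word order is ignored, so these two score very high

Averaging discards word order and syntax entirely, which is exactly why the two reordered sentences above come out nearly identical; that is a feature for paraphrase-style similarity and a limitation for anything structure-sensitive.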

My Doc2Vec code, after many loops/epochs of training, isn’t giving good results. What might be wrong?

Do not call .train() multiple times in your own loop that tries to do alpha arithmetic. It's unnecessary and error-prone. Specifically, in the above code, decrementing the original 0.025 alpha by 0.001 forty times results in a final alpha of (0.025 - 40*0.001) = -0.015, which would also have been negative for many of the training epochs. … Read more
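The simpler pattern for recent gensim versions is to let a single .train() call manage the learning-rate decay internally; a minimal sketch, with an illustrative toy corpus and hyperparameters:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# toy tagged corpus; replace with your own documents
raw_docs = [["human", "machine", "interface"], ["graph", "trees", "survey"]]
corpus = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(raw_docs)]

model = Doc2Vec(vector_size=50, min_count=1, epochs=40)  # alpha decay handled internally
model.build_vocab(corpus)

# one call to train(): no manual alpha arithmetic, no outer loop
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

print(model.dv[0][:5])  # learned vector for the first document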