LDA model generates different topics every time I train on the same corpus

Why do the same LDA parameters and corpus generate different topics every time?

Because LDA uses randomness in both training and inference steps.

And how do I stabilize the topic generation?

By resetting NumPy's global random seed to the same value immediately before every training or inference run, using numpy.random.seed:

import numpy as np

SOME_FIXED_SEED = 42

# before training/inference:
np.random.seed(SOME_FIXED_SEED)

(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I’ve already opened an issue.)
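To see why reseeding helps, here is a minimal sketch that uses only NumPy. The function draw_topic_weights is a hypothetical stand-in for a randomized training step, not part of Gensim. It assumes the random step draws from NumPy's global generator, which is what the fix above relies on.

```python
import numpy as np

SOME_FIXED_SEED = 42


def draw_topic_weights(seed, num_topics=5):
    """Hypothetical stand-in for a randomized training step (not a Gensim API)."""
    # Reset the global NumPy generator right before the random step.
    np.random.seed(seed)
    # Sample one topic-weight vector from a symmetric Dirichlet distribution.
    return np.random.dirichlet(np.ones(num_topics))


first = draw_topic_weights(SOME_FIXED_SEED)
second = draw_topic_weights(SOME_FIXED_SEED)

# Both runs started from the same seed, so the draws match exactly.
print(np.array_equal(first, second))
```

The seed has to be reset before each run, not just once at the top of the script. Otherwise the second run continues from wherever the generator was left by the first and produces different draws.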
