LDA model generates different topics everytime i train on the same corpus

Why does the same LDA parameters and corpus generate different topics everytime?

Because LDA uses randomness in both training and inference steps.

And how do i stabilize the topic generation?

By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed:

SOME_FIXED_SEED = 42

# before training/inference:
np.random.seed(SOME_FIXED_SEED)

(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I’ve already opened an issue.)

Leave a Comment