topic-modeling
LDA model generates different topics every time I train on the same corpus
Why do the same LDA parameters and corpus generate different topics every time? Because LDA uses randomness in both the training and inference steps. And how do I stabilize the topic generation? By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed: SOME_FIXED_SEED = 42 # … Read more
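A minimal sketch of the seed-resetting idea, assuming an LDA library that draws from NumPy's global random state (the `train_lda` wrapper and its body are illustrative, not from the answer):

```python
import numpy as np

SOME_FIXED_SEED = 42

def seeded_draw():
    # Reset the global NumPy seed immediately before the randomized
    # step (here stood in for by a plain random draw), so the random
    # initialization is identical on every run.
    np.random.seed(SOME_FIXED_SEED)
    return np.random.rand(3)

# Two "training runs" now start from the same random state,
# so their random draws (and hence the learned topics) match.
a = seeded_draw()
b = seeded_draw()
```

Note that some libraries expose a cleaner per-model alternative (for example, gensim's LdaModel accepts a `random_state` argument), which avoids touching the global NumPy state.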
Remove empty documents from DocumentTermMatrix in R topicmodels?
“Each row of the input matrix needs to contain at least one non-zero entry” — the error means that the sparse matrix contains a row without entries (words). One idea is to compute the sum of words by row: rowTotals <- apply(dtm, 1, sum) # find the sum of words in each document dtm.new <- dtm[rowTotals > 0, ] … Read more
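The same filtering can be sketched in Python on a toy dense document-term matrix (the data and names here are illustrative, not from the R snippet):

```python
import numpy as np

# Toy document-term matrix: 3 documents x 3 terms.
# The second row is an empty document (all zeros).
dtm = np.array([[1, 0, 2],
                [0, 0, 0],
                [0, 3, 0]])

# Sum the word counts in each row (document).
row_totals = dtm.sum(axis=1)

# Keep only documents that contain at least one word.
dtm_new = dtm[row_totals > 0]
```

With a real sparse matrix the idea is identical: compute per-row totals and index with the boolean mask `row_totals > 0`.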
Spark MLlib LDA, how to infer the topics distribution of a new unseen document?
As of Spark 1.5 this functionality has not been implemented for the DistributedLDAModel. What you’re going to need to do is convert your model to a LocalLDAModel using the toLocal method and then call the topicDistributions(documents: RDD[(Long, Vector)]) method, where documents are the new (i.e. out-of-training) documents, something like this: newDocuments: RDD[(Long, Vector)] = … … Read more