
I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model, and visualized the results with the pyLDAvis library. When I print the words with the highest probability of appearing in a topic with `pprint(lda_model.print_topics())`, I get results for the first topic similar to:

$0.066*\text{car} + 0.032*\text{gas} + 0.031*\text{model} + 0.031*\text{top} + 0.024*\text{CO2} \ + \ ... \ + \ 0.012*\text{investment}$

The results are good, as they are indicative of the topic. However, when I adjust the relevance parameter provided by pyLDAvis (the $\lambda$ value), I get results that are more specific to the topic. For example, with $\lambda=0.2$ the top 5 words are:

car, horsepower, torque, speed, V8
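For reference, the score behind the λ slider is the relevance measure from the LDAvis paper (Sievert and Shirley, 2014). For a word $w$ and topic $k$, let $\phi_{kw} = p(w \mid k)$ be the topic-word probability and $p_w$ the word's overall probability in the corpus:

$$\text{relevance}(w, k \mid \lambda) = \lambda \log \phi_{kw} + (1 - \lambda) \log \frac{\phi_{kw}}{p_w}$$

With $\lambda = 1$ this reduces to $\log p(w \mid k)$, which gives the same ordering as `print_topics`. Smaller values put more weight on the lift $\phi_{kw}/p_w$, which favors words that are distinctive for the topic over words that are common across the whole corpus. In pyLDAvis's `topic_info` table, the two terms appear as the `logprob` and `loglift` columns.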

My question: is there any function or parameter in gensim that can return the (word, probability) pairs ranked by relevance for a specific $\lambda$ value?

Ethan

1 Answer


According to this SO post, this is the way:

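```python
import pandas as pd

lambd = 0.6  # a specific relevance metric value (pyLDAvis's λ)
all_topics = {}
num_topics = lda_model.num_topics
num_terms = 10  # number of top terms to keep per topic

# pyLDAvis labels topics 'Topic1', 'Topic2', ... (1-based; the order may not match gensim's topic ids)
for i in range(1, num_topics + 1):
    topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic' + str(i)].copy()
    # relevance = λ * logprob + (1 - λ) * loglift
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics)
```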
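If you would rather not depend on pyLDAvis's `topic_info` table, the same score can be computed directly from the topic-term matrix. This also returns the probability of each word, which gives the (word, probability) pairs the question asks for. The following is a minimal sketch assuming numpy. The function name `top_terms_by_relevance` and its arguments are hypothetical, not part of gensim or pyLDAvis.

```python
import numpy as np


def top_terms_by_relevance(topic_term, term_freq, vocab, topic_id, lambd=0.6, topn=10):
    """Rank the words of one topic by LDAvis-style relevance.

    topic_term: array of shape (num_topics, num_terms); row k holds p(w | topic k)
    term_freq:  array of shape (num_terms,); corpus count of each term (must be > 0)
    vocab:      sequence mapping term id -> word
    Returns a list of (word, relevance, p(w | topic)) tuples, best first.
    """
    topic_term = np.asarray(topic_term, dtype=float)
    # ravel() so an (n, 1) matrix of counts also works
    term_freq = np.asarray(term_freq, dtype=float).ravel()

    phi = topic_term[topic_id]               # p(w | topic)
    p_w = term_freq / term_freq.sum()        # marginal probability of each word
    log_prob = np.log(phi)                   # pyLDAvis "logprob"
    log_lift = np.log(phi / p_w)             # pyLDAvis "loglift"
    relevance = lambd * log_prob + (1 - lambd) * log_lift

    order = np.argsort(-relevance)[:topn]    # highest relevance first
    return [(vocab[i], float(relevance[i]), float(phi[i])) for i in order]
```

With gensim, `lda_model.get_topics()` returns the topic-term matrix with one row per topic, indexed by dictionary id. The corpus term counts can be obtained with something like `gensim.matutils.corpus2csc(corpus, num_terms=len(dictionary)).sum(axis=1)`, and the vocabulary with `[dictionary[i] for i in range(len(dictionary))]`. These inputs are assumptions about how pyLDAvis builds its own, so the ranking should be close to the slider's but may not match it exactly.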

scipio1465