Questions tagged [topic-model]

A topic model describes each text in a large corpus as a probability distribution over topics, which are themselves probability distributions over words. Every topic makes a quantified contribution to a given text.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
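The mixture idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two topics, their word probabilities, and the document's topic mix are made-up toy numbers, and `word_probabilities` is a hypothetical helper, not part of any library.

```python
# Toy, hand-picked numbers (assumptions for illustration, not fitted values).
# Each topic is a probability distribution over the vocabulary, so it sums to 1.
topics = {
    "dogs": {"dog": 0.5, "bone": 0.3, "cat": 0.1, "meow": 0.1},
    "cats": {"dog": 0.1, "bone": 0.1, "cat": 0.5, "meow": 0.3},
}

# A document is a probability distribution over topics (also sums to 1).
doc_topic_mix = {"dogs": 0.75, "cats": 0.25}


def word_probabilities(doc_topic_mix, topics):
    """Return p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    probs = {}
    for topic, weight in doc_topic_mix.items():
        for word, p in topics[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs


word_probs = word_probabilities(doc_topic_mix, topics)

# Print words from most to least probable for this document.
for word, p in sorted(word_probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

Because the document leans towards the "dogs" topic, "dog" and "bone" come out as the most probable words, and the resulting word probabilities again sum to 1 over the whole vocabulary.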

Statistical models used for topic modelling

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
  • Non-Negative Matrix Factorisation

Software / Libraries

151 questions
2
votes
1 answer

Do weights of keywords for each topic add up to 1 in topic modeling?

When you run a topic model (say, LDA), you can get outputs for some number of topics with corresponding keywords and their weights. Based on my understanding, people usually output the top 10 or top 20 keywords for each topic. For these keywords, they…
Todd
  • 123
  • 3
2
votes
1 answer

How to build News Tagging model(s)

I am trying to build a news tagging system. Given a news article, find the 5-6 key terms that best describe the article. Refer to the image below from Google News. What are some approaches I can look at to get human…
Anuj Gupta
  • 266
  • 1
  • 10
1
vote
0 answers

automated topic modeling topic naming

Are there well-known automated methods for deriving a name for each topic obtained through topic modeling? For a specific problem at hand, I will probably default to an algorithm on top of an ontology, on top of the topic modeling results. But…
matanox
  • 131
  • 4
0
votes
1 answer

Is it correct to create topic models using both train and test data?

I have a dataset of text documents split into train and test sets. My task is binary classification, classifying these documents as either 1 or -1. I have already computed some features using TF-IDF and n-grams and tested my model. Now, I want…
Pedram
  • 133
  • 5
0
votes
0 answers

How to recreate a WE1S project?

The WE1S (WhatEvery1Says) project is so resourceful and well documented that I really want to use it. Unfortunately, I still don't know where the repo that the Workspace documentation refers to is. I searched for the new_project.ipynb file in the…
Ooker
  • 123
  • 6
0
votes
0 answers

What to do when a jargon term is the same as a common word?

Let's say that in a particular field, the word "the" has a specific meaning and is not just a determiner. The common "the" and the specific "the" are mixed together in the corpus. Is there a way to handle this? Or manually tagging the specific ones…
Ooker
  • 123
  • 6