Are there good datasets or even pretrained models for classifying chat lines? I'd like to infer the topic people are talking about and possibly topic changes.

One problem with the data available for training is that it is heavily biased toward specific domains. There are, for example, large sets of chat logs from tech support channels, but they cover only a few topics, all of them technical. In general, it seems hard to find data for training a network that generalizes to different kinds of channels, each with a more or less fixed topic, without training on those channels directly.

Non-technical chat rooms have other focuses. Flirt chats are common, but there are also chat rooms attached to, for example, a forum for aquarium owners. The aquarium owners probably talk about basic everyday topics such as the weather and, of course, about aquariums.

Datasets such as tweets are probably also a poor approximation, because most tweets are self-contained, whereas many chat lines only make sense in the context of the few lines before them.
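To make the context point concrete, here is a minimal sketch of the kind of input I have in mind: each chat line is joined with the few lines before it, so that a classifier would see the context rather than the isolated line. The function name `context_windows`, the window size `k`, the separator, and the sample chat are all made up for illustration; this is not taken from any existing dataset or library.

```python
from typing import List


def context_windows(lines: List[str], k: int = 2, sep: str = " | ") -> List[str]:
    """Join each chat line with up to k preceding lines (hypothetical helper)."""
    windows = []
    for i in range(len(lines)):
        # The first lines of a log have fewer than k predecessors.
        start = max(0, i - k)
        windows.append(sep.join(lines[start:i + 1]))
    return windows


# Made-up example chat; "twice a day" is meaningless without the lines before it.
chat = [
    "anyone here keep guppies?",
    "yes, about twenty of them",
    "how often do you feed them?",
    "twice a day",
]

for window in context_windows(chat, k=2):
    print(window)
```

Each resulting string could then be fed to whatever topic classifier is used, in place of the single line. A dataset of self-contained messages would not exercise this kind of input.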

allo