Topic Flow Model: A Graph Theoretic Temporal Topic Model for Noisy Mediums
Singh, Lisa O
In the modern era, data is being created faster than ever. Social media, in particular, churns out hundreds of millions of short documents a day. It would be useful to understand the underlying topics being discussed on popular social media channels, and how those discussions evolve over time. State-of-the-art topic models accurately classify texts large and small, but few attempt to follow topics through time, and many are adversely affected by the large amount of noise in social media documents. We propose Topic Flow Model (TFM), a graph-theoretic temporal topic model that identifies topics as they emerge and tracks them through time as they persist, diminish, and re-emerge. TFM identifies topic words by capturing the changing relationship strength of words over time, and offers solutions for dealing with flood words, i.e., domain-specific words that pollute topics. We conduct an extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data and find that its topic accuracy and signal-to-noise ratio are better than those of state-of-the-art methods.
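To make the two ideas in the abstract concrete ("changing relationship strength of words over time" and "flood words"), here is a minimal toy sketch. It is not the TFM algorithm from this thesis. The document-level co-occurrence counting, the document-frequency rule for flood words, the `min_gain` and `threshold` parameters, and all function names are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted word graph: edge (u, v) -> number of docs containing both words.

    Assumption: relationship strength = document-level co-occurrence count.
    """
    edges = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # sorted so each pair has one canonical order
        for u, v in combinations(words, 2):
            edges[(u, v)] += 1
    return edges


def flood_words(windows, threshold=0.5):
    """Hypothetical rule: a word is a flood word if it appears in at least
    `threshold` of all documents, pooled across every time window."""
    total = 0
    df = Counter()
    for docs in windows:
        for doc in docs:
            total += 1
            df.update(set(doc.lower().split()))
    return {w for w, c in df.items() if c / total >= threshold}


def strengthening_edges(prev_docs, curr_docs, flood, min_gain=2):
    """Edges (ignoring flood words) whose weight grew by at least `min_gain`
    from the previous time window to the current one."""
    prev = cooccurrence_graph(prev_docs)
    curr = cooccurrence_graph(curr_docs)
    result = {}
    for (u, v), w in curr.items():
        if u in flood or v in flood:
            continue  # flood words would otherwise link to almost everything
        gain = w - prev.get((u, v), 0)
        if gain >= min_gain:
            result[(u, v)] = gain
    return result


# Toy example: two time windows of short, noisy "posts".
windows = [
    ["covid vaccine news", "election news today", "weather news"],
    ["vaccine trial results news", "vaccine trial approved news", "vaccine trial news update"],
]
flood = flood_words(windows, threshold=0.9)
emerging = strengthening_edges(windows[0], windows[1], flood)
print(flood)     # {'news'}
print(emerging)  # {('trial', 'vaccine'): 3}
```

In this toy data, "news" appears in every document and is filtered as a flood word; otherwise its edges to "vaccine" and "trial" would strengthen just as much as the genuinely emerging pair ("trial", "vaccine").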
Showing items related by title, author, creator and subject.
Churchill, Robert J (Georgetown University, 2021). Data has evolved rapidly since the inception of topic models over twenty years ago. The most popular topic models perform poorly on large contemporary data sets that contain short, noisy texts. This dissertation aims to ...