Topic Flow Model: A Graph Theoretic Temporal Topic Model for Noisy Mediums
Creator
Churchill, Robert
Advisor
Singh, Lisa O
Abstract
In the modern era, data is being created faster than ever. Social media, in particular, churns out hundreds of millions of short documents a day. It would be useful to understand the underlying topics being discussed on popular social media channels, and how those discussions evolve over time. State-of-the-art topic models accurately classify texts large and small, but few attempt to follow topics through time, and many are adversely affected by the large amount of noise in social media documents. We propose Topic Flow Model (TFM), a graph-theoretic temporal topic model that identifies topics as they emerge and tracks them through time as they persist, diminish, and re-emerge. TFM identifies topic words by capturing the changing relationship strength of words over time, and offers solutions for dealing with flood words, i.e., domain-specific words that pollute topics. We conduct an extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data and find that its topic accuracy and signal-to-noise ratio are better than those of state-of-the-art methods.
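The abstract describes TFM only at a high level and does not give its algorithm. As a rough illustration of the ideas it names, here is a minimal Python sketch, assuming whitespace-tokenized documents already grouped into time windows. It builds a word co-occurrence graph per window, reports edges whose weight grows between consecutive windows, and filters candidate flood words with a simple document-frequency rule. All function names, thresholds, and the flood-word heuristic are hypothetical and are not TFM's actual method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs, stopwords=frozenset()):
    """Weighted, undirected word graph for one time window.

    Edge (u, v) with u < v has weight = number of documents in the
    window containing both u and v (naive whitespace tokenization).
    """
    edges = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()) - set(stopwords))
        for u, v in combinations(words, 2):
            edges[(u, v)] += 1
    return edges


def flood_words(windows, threshold=0.8):
    """Hypothetical flood-word heuristic: words whose document
    frequency is at least `threshold` in every time window."""
    common = None
    for docs in windows:
        if not docs:
            return set()
        df = Counter(w for doc in docs for w in set(doc.lower().split()))
        frequent = {w for w, c in df.items() if c / len(docs) >= threshold}
        common = frequent if common is None else common & frequent
    return common or set()


def strengthening_edges(prev, curr, min_gain=1):
    """Edges whose weight grew by at least `min_gain` from prev to curr."""
    gains = {e: w - prev.get(e, 0) for e, w in curr.items()}
    return {e: g for e, g in gains.items() if g >= min_gain}


# Toy data (invented for illustration): two time windows of short posts.
windows = [
    ["rt storm hits coast", "rt election debate tonight", "rt storm warning"],
    ["rt storm coast damage", "rt storm coast evacuation",
     "rt storm coast flooding", "rt election results"],
]
flood = flood_words(windows)                     # {'rt'}
g1 = cooccurrence_graph(windows[0], flood)
g2 = cooccurrence_graph(windows[1], flood)
print(strengthening_edges(g1, g2, min_gain=2))   # {('coast', 'storm'): 2}
```

Here "rt" appears in every post of both windows, so the heuristic treats it as a flood word, while the storm/coast edge strengthens from one to three co-occurrences. Raising `threshold` makes the flood-word filter more conservative; a realistic pipeline would also need proper tokenization and stopword handling.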
Description
M.S.
Permanent Link
http://hdl.handle.net/10822/1044619
Date Published
2017
Publisher
Georgetown University
Extent
97 leaves
Related items
Showing items related by title, author, creator and subject.
- Modernizing Topic Models: Accounting for Noise, Time, and Domain Knowledge
  Churchill, Robert J (Georgetown University, 2021)
  Data has evolved rapidly since the inception of topic models over twenty years ago. The most popular topic models perform poorly on large contemporary data sets that contain short, noisy texts. This dissertation aims to ...