    Topic Flow Model: A Graph Theoretic Temporal Topic Model for Noisy Mediums

    View/Open: Churchill_georgetown_0076M_13764.pdf (2.8MB)

    Creator
    Churchill, Robert
    Advisor
    Singh, Lisa O
    Abstract
    In the modern era, data is being created faster than ever. Social media, in particular, churns out hundreds of millions of short documents a day. It would be useful to understand the underlying topics being discussed on popular channels of social media, and how those discussions evolve over time. There exist state of the art topic models that accurately classify texts large and small, but few attempt to follow topics through time, and many are adversely affected by the large amount of noise in social media documents. We propose Topic Flow Model (TFM), a graph theoretic temporal topic model that identifies topics as they emerge, and tracks them through time as they persist, diminish, and re-emerge. TFM identifies topic words by capturing the changing relationship strength of words over time, and offers solutions for dealing with flood words, i.e., domain specific words that pollute topics. We conduct an extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data and find that the topic accuracy and signal to noise ratio are better than state of the art methods.
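
    The abstract's central idea, tracking how the relationship strength between words changes over time and setting aside flood words, can be illustrated with a minimal sketch. This is not the thesis's algorithm. It assumes that relationship strength is the number of documents in which two words co-occur within a time window, and that a flood word is one appearing in at least a fixed fraction of all documents. The function names and the `min_doc_fraction` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Edge weights: number of documents in which each unordered word pair co-occurs."""
    edges = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        edges.update(combinations(words, 2))
    return edges


def edge_weight_changes(windows):
    """For each pair of consecutive time windows, map every edge to its weight change."""
    graphs = [cooccurrence_graph(docs) for docs in windows]
    changes = []
    for prev, curr in zip(graphs, graphs[1:]):
        # Counter returns 0 for edges that are absent from a window.
        changes.append({e: curr[e] - prev[e] for e in set(prev) | set(curr)})
    return changes


def flood_words(windows, min_doc_fraction=0.8):
    """Words present in at least min_doc_fraction of all documents (assumed definition)."""
    docs = [doc for window in windows for doc in window]
    doc_freq = Counter(w for doc in docs for w in set(doc.lower().split()))
    return {w for w, n in doc_freq.items() if n / len(docs) >= min_doc_fraction}


# Two toy time windows of short documents (made-up data).
windows = [
    ["rt storm coming", "rt game tonight"],
    ["rt storm warning", "rt storm damage"],
]
changes = edge_weight_changes(windows)
print(changes[0][("rt", "storm")])  # edge strengthens between the two windows
print(flood_words(windows))         # words that appear in nearly every document
```

    In this toy data, the edge between "rt" and "storm" gains weight from the first window to the second, while "rt" itself appears in every document and would be treated as a flood word under the assumed threshold.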
     
    Description
    M.S.
    Permanent Link
    http://hdl.handle.net/10822/1044619
    Date Published
    2017
    Subject
    Data Mining; Graph Mining; Machine Learning; Natural Language Processing; Text Mining; Topic Modeling; Computer science
    Type
    thesis
    Publisher
    Georgetown University
    Extent
    97 leaves
    Collections
    • Graduate Theses and Dissertations - Computer Science

    Related items

    Showing items related by title, author, creator and subject.

    • Modernizing Topic Models: Accounting for Noise, Time, and Domain Knowledge

      Churchill, Robert J (Georgetown University, 2021)
      Data has evolved rapidly since the inception of topic models over twenty years ago. The most popular topic models perform poorly on large contemporary data sets that contain short, noisy texts. This dissertation aims to ...

    ©2009 - 2022 Georgetown University Library
    37th & O Streets NW
    Washington DC 20057-1174
    202.687.7385
    digitalscholarship@georgetown.edu