
    Quantitative Authorship Attribution of Users of Mexican Drug Dealing Related Online Forums


    Creator
    Rico Sulayes, Antonio
    Advisor
    Schilling, Natalie
    Abstract
    As violence in the Mexican drug war has escalated, a proliferation of social media sites about drug trafficking in Mexico was followed by the murder of some of their users and the eventual disappearance of many of those sites. Despite these events, a number of drug-dealing-related social media outlets with a large number of contributions still exist in the country. In this dissertation, I show that quantitative authorship attribution techniques, including state-of-the-art machine learning algorithms, can be successfully applied to match posts of unknown authorship with their authors. Using data from randomly selected prolific users of a drug-dealing-related online forum, I test a number of quantitative classification techniques in over a thousand authorship attribution tasks. Each task attempts to identify the author of texts, chosen to stand in for anonymous texts, within a closed set of known authors. In the best results across these experiments, which include corpora with up to 40 potential authors, the accuracy is higher than in previous research that used data from drug-dealing-related online forums and employed discriminant analysis (DA), the first method applied to this kind of data (Rico-Sulayes, 2011). These results are obtained with the statistically relevant contribution of a number of novel discriminating features. The features used in the best-performing experiments, tagged with a fully automated system, represent paralinguistic and grammatically shallow information, yet they seem to capture stylistic decisions or habits that allow a quantitative approach to achieve a higher success rate than previous DA-based authorship attribution research.
By analyzing the kind of information that yields the best results in these experiments and improving on the success rate of previous research, this dissertation should help further the application and acceptance of authorship attribution research in real-life contexts, such as criminal investigation and legal prosecution.
    Description
    Ph.D.
    Permanent Link
    http://hdl.handle.net/10822/557726
    Date Published
    2012
    Subject
    authorship attribution; machine learning classification; social media; Linguistics; Sociolinguistics; Information technology
    Type
    thesis
    Publisher
    Georgetown University
    Extent
    294 leaves
    Collections
    • Graduate Theses and Dissertations - Linguistics

    Georgetown University Seal
    ©2009 - 2022 Georgetown University Library
    37th & O Streets NW
    Washington DC 20057-1174
    202.687.7385
    digitalscholarship@georgetown.edu
    Accessibility
     

     
