Quantitative Authorship Attribution of Users of Mexican Drug Dealing Related Online Forums
Rico Sulayes, Antonio
As violence in the Mexican drug war has escalated, a proliferation of social media sites about drug trafficking in Mexico has been followed by the murder of some of their users and the eventual disappearance of many of those sites. Despite these events, a number of drug-dealing-related social media outlets with a large number of contributions still exist in the country. In this dissertation, I show that quantitative authorship attribution techniques, including state-of-the-art machine learning algorithms, can be successfully applied to match posts of unknown authorship with their authors. Using data from randomly selected prolific users of a drug-dealing-related online forum, I test a number of quantitative classification techniques in over a thousand authorship attribution tasks. Each task attempts to identify the author of texts, chosen to stand in for anonymous texts, within a closed set of known authors. In the best results across these experiments, which include corpora with up to 40 potential authors, the accuracy obtained is higher than in previous research that used data from drug-dealing-related online forums and employed discriminant analysis (DA), the first method ever applied to this kind of data (Rico-Sulayes, 2011). These results are obtained with the statistically relevant contribution of a number of novel discriminating features. An examination of the features used in the experiments with the best results shows that these features (tagged with a fully automated system) represent paralinguistic and grammatically shallow information, and yet they seem to capture stylistic decisions or habits that allow a quantitative approach to achieve a higher success rate than previous DA-based authorship attribution research.
By both analyzing the kind of information that yields the best results in the experiments conducted and improving on the success rate of previous research, this dissertation should help further the application and acceptance of authorship attribution research in real-life contexts, such as criminal investigation and legal prosecution.
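
To make the closed-set task described in this abstract concrete, the following is a minimal sketch in Python. It is not the dissertation's method: it swaps in a simple nearest-centroid classifier in place of discriminant analysis or the machine learning algorithms tested, and the three features, the function names, and the toy posts are all hypothetical, chosen only to illustrate how shallow surface information can separate authors.

```python
import math
import string
from collections import defaultdict


def shallow_features(text):
    """Return a small vector of shallow, surface-level rates for one post.

    These three features are illustrative assumptions, not the feature set
    used in the dissertation.
    """
    n_chars = max(len(text), 1)
    words = text.split()
    n_words = max(len(words), 1)
    punct_rate = sum(ch in string.punctuation for ch in text) / n_chars
    upper_rate = sum(ch.isupper() for ch in text) / n_chars
    # Mean token length, divided by 10 so it sits on a scale close to the rates.
    mean_word_len = sum(len(w) for w in words) / n_words / 10.0
    return [punct_rate, upper_rate, mean_word_len]


def train_centroids(labeled_posts):
    """Average the feature vectors of each known author's posts."""
    by_author = defaultdict(list)
    for author, text in labeled_posts:
        by_author[author].append(shallow_features(text))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in by_author.items()
    }


def attribute(text, centroids):
    """Pick the known author whose centroid is closest (Euclidean distance)."""
    vector = shallow_features(text)
    return min(centroids, key=lambda author: math.dist(vector, centroids[author]))


# Toy, invented posts from two hypothetical authors (the closed set).
training_posts = [
    ("author_a", "QUE ONDA!!! TODO BIEN!!!"),
    ("author_a", "NO MANCHES!!! YA VIENEN!!!"),
    ("author_b", "pues no se creo que manana"),
    ("author_b", "yo digo que mejor esperamos"),
]
centroids = train_centroids(training_posts)

# A held-out post stands in for an anonymous text.
print(attribute("AGUAS!!! YA LLEGARON!!!", centroids))
```

Because the set of candidate authors is closed, the classifier always returns one of the known authors; it has no way to answer "none of the above", which is the defining property of the task described above.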