    A Multifactorial, Multitask Approach to Automated Speaker Profiling

    View/Open: Simpson_georgetown_0076D_14405.pdf (2.6MB)

    Creator
    Simpson, Sean Skyler
    Advisor
    Zeldes, Amir
    Nycz, Jennifer
    ORCID
    0000-0003-1285-5666
    Abstract
    Automated Speaker Profiling (ASP) refers broadly to the computational prediction of speaker traits based on cues mined from the speech signal. Accurate prediction of such traits has a wide variety of applications, such as automating the collection of customer metadata, improving smart-speaker and voice-assistant interactions, and narrowing down suspect pools in forensic situations.
     
    Approaches to ASP to date have primarily focused on single-task computational models, i.e. models that each predict one speaker trait in isolation. Recent work, however, has suggested that a multi-task learning framework, in which a system learns to predict multiple related traits simultaneously and each trait-prediction task has access to the training signals of all other trait-prediction tasks, can increase classification accuracy along all trait axes considered.
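
    As a rough illustration of the multi-task setup described above, the sketch below shows one common way such a system can be arranged: a single shared hidden layer feeds one softmax output head per trait, and the per-trait cross-entropy losses are summed so that every task's training signal reaches the shared weights. This is not the dissertation's architecture. The class name MultiTaskNet, the layer sizes, the number of classes per trait, and the feature values are all hypothetical, chosen only to make the example run.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply one dense layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical multi-task classifier: shared layer, one head per trait."""

    def __init__(self, n_features, n_hidden, task_sizes, seed=0):
        rng = random.Random(seed)
        # The shared layer is used by every trait-prediction task.
        self.shared = init_layer(n_hidden, n_features, rng)
        # Each trait gets its own output head on top of the shared layer.
        self.heads = {task: init_layer(n_classes, n_hidden, rng)
                      for task, n_classes in task_sizes.items()}

    def forward(self, x):
        """Return a class-probability list for every trait."""
        hidden = [math.tanh(h) for h in linear(*self.shared, x)]
        return {task: softmax(linear(w, b, hidden))
                for task, (w, b) in self.heads.items()}

    def joint_loss(self, x, labels):
        """Sum of per-trait cross-entropy losses for one example."""
        probs = self.forward(x)
        return sum(-math.log(probs[task][label])
                   for task, label in labels.items())


# Assumed class counts per trait; the real label inventories may differ.
task_sizes = {"sex": 2, "ethnicity": 4, "age": 3, "region": 5, "education": 3}
net = MultiTaskNet(n_features=6, n_hidden=8, task_sizes=task_sizes)

features = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]   # made-up feature vector
gold = {"sex": 1, "ethnicity": 0, "age": 2, "region": 3, "education": 1}
predictions = net.forward(features)
loss = net.joint_loss(features, gold)
```

    Because the loss is a sum over traits, minimizing it with any gradient-based optimizer would update the shared layer using errors from all five tasks at once, which is the sense in which each task has access to the others' training signals.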
     
    Likewise, most work on ASP to date has focused primarily on acoustic cues as predictive features for speaker profiling. However, there is a wide range of evidence from the sociolinguistic literature that lexical and phonological cues may also be of use in predicting social characteristics of a given speaker. Recent work in the field of author profiling has also demonstrated the utility of lexical features in predicting social information about authors of textual data, though few studies have investigated whether this carries over to spoken data.
     
    In this dissertation I focus on the prediction of five social traits: sex, ethnicity, age, region, and education. Linguistic features from the acoustic, phonetic, and lexical realms are extracted from 60-second chunks of speech taken from the 2008 NIST SRE corpus and used to train several types of predictive models. Naive (majority-class prediction) and informed (single-task neural network) models are trained to provide baseline predictions against which multi-task neural network models are evaluated. Feature importance experiments are performed to investigate which features and feature types are most useful for predicting which social traits.
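
    The naive baseline mentioned above can be sketched in a few lines: a majority-class model ignores the input features and always predicts the most frequent label seen in training. The function names and the region labels below are hypothetical and serve only as an illustration, not as the dissertation's evaluation code.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of positions where the prediction matches the gold label."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up labels for one trait (region), for illustration only.
train_regions = ["south", "north", "south", "west", "south"]
test_regions = ["south", "north", "south", "south"]

majority_label = fit_majority_baseline(train_regions)
baseline_predictions = [majority_label] * len(test_regions)
baseline_accuracy = accuracy(baseline_predictions, test_regions)
```

    Any trained model is only informative for a trait to the extent that it beats this score, which is why a baseline of this kind is a useful floor for comparison.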
     
    Results presented in chapters 5-7 of this dissertation demonstrate that multitask models consistently outperform single-task models, that models are most accurate when provided information from all three linguistic levels considered, and that lexical features as a group contribute substantially more predictive power than either phonetic or acoustic features.
     
    Description
    Ph.D.
    Permanent Link
    http://hdl.handle.net/10822/1057312
    Date Published
    2019
    Subject
    Automated Speaker Profiling; Computational linguistics; Machine learning; Multi-task learning; Sociolinguistics; Speaker classification; Linguistics
    Type
    thesis
    Publisher
    Georgetown University
    Extent
    300 leaves
    Collections
    • Graduate Theses and Dissertations - Linguistics


    Georgetown University Seal
    ©2009 - 2023 Georgetown University Library
    37th & O Streets NW
    Washington DC 20057-1174
    202.687.7385
    digitalscholarship@georgetown.edu
    Accessibility