
    Mining Linguistic Tone Patterns Using Fundamental Frequency Time-Series Data

    View/Open: Zhang_georgetown_0076D_13831.pdf (3.8MB)

    Creator
    Zhang, Shuo
    Advisor
    Zeldes, Amir
    Zsiga, Elizabeth
    Abstract
    With the rapid advancement in computing power, recent years have seen the availability of large-scale corpora of speech audio data and, within them, fundamental frequency (f0) time-series data of speech prosody. However, this wealth of f0 data has yet to be mined for knowledge that has many potential theoretical implications and practical applications in prosody-related tasks. Due to the nature of speech prosody data, Speech Prosody Mining (SPM) in a large prosody corpus faces classic time-series data mining challenges such as high dimensionality and high time complexity in distance computation (e.g., Dynamic Time Warping). Meanwhile, the analysis and understanding of speech prosody subsequence patterns demand novel analytical methods that leverage a variety of algorithms and data structures in the computational linguistics and computer science toolkits, prompting us to develop creative solutions in order to extract meaning from large prosody databases.
     
    In this dissertation, we conceptualize SPM in a time-series data mining framework by focusing on a specific task in speech prosody: the analysis and machine learning of Mandarin tones. The dissertation is divided into five parts, each further divided into several chapters. In Part I, we review the necessary background and previous work related to the production, perception, and modeling of Mandarin tones. In Part II, we report the data collection used in this work and describe the speech processing and data preprocessing steps in detail.
     
    Parts III and IV form the core of the dissertation, where we develop novel methods for mining tone N-gram data. In Part III, we investigate the use of time-series symbolic representation for computing time-series similarity in the speech prosody domain. In Part IV, we first show how to improve a state-of-the-art motif discovery algorithm to produce more meaningful rankings in the retrieval of previously unknown tone N-gram patterns. In the next chapter, we investigate the most exciting problem at the heart of tone modeling: how well can we predict tone N-gram contour shape types in spontaneous speech using features from various linguistic domains, such as syntax, morphology, discourse, and phonology? The results shed light, from an information-theoretic perspective, on how these factors contribute to the realization of speech prosody in tone production. In the final part, we describe applications of these methods, including generalization to other tone languages and the development of software for the retrieval and analysis of speech prosody. We conclude by discussing how the current work extends to a general framework for corpus-based, large-scale intonation analysis.
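
The abstract cites Dynamic Time Warping (DTW) as an example of a costly distance computation. As a rough illustration only, the textbook dynamic-programming form of DTW can be sketched as below. This is not the dissertation's implementation; the function name `dtw_distance` and the toy f0 values are assumptions made for the example. The nested loops make the quadratic cost in sequence length visible, which is the time-complexity concern the abstract raises.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):          # O(n * m) time: every pair of points is visited
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local point-to-point distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier point of b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical f0 contours in Hz (made-up values, not data from the dissertation).
rising = [100.0, 110.0, 120.0, 130.0]
rising_slow = [100.0, 100.0, 110.0, 120.0, 130.0]  # same shape, delayed onset
falling = [130.0, 120.0, 110.0, 100.0]

same_shape = dtw_distance(rising, rising_slow)
different_shape = dtw_distance(rising, falling)
```

In this toy case the two rising contours differ only in timing, so warping aligns them at zero cost, while the rising and falling contours remain far apart.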
     
    Description
    Ph.D.
    Permanent Link
    http://hdl.handle.net/10822/1047816
    Date Published
    2017
    Subject
    machine learning; Mandarin tone; speech prosody; time-series data mining; Linguistics; Computer engineering; Computer science
    Type
    thesis
    Publisher
    Georgetown University
    Extent
    238 leaves
    Collections
    • Graduate Theses and Dissertations - Linguistics
    Georgetown University Seal
    ©2009 - 2022 Georgetown University Library
    37th & O Streets NW
    Washington DC 20057-1174
    202.687.7385
    digitalscholarship@georgetown.edu
    Accessibility
     

     
