Mining Linguistic Tone Patterns Using Fundamental Frequency Time-Series Data
With the rapid advancement in computing power, recent years have seen the availability of large-scale corpora of speech audio data and, within them, fundamental frequency (f0) time-series data of speech prosody. However, this wealth of f0 data has yet to be mined for knowledge that has many potential theoretical implications and practical applications in prosody-related tasks. Due to the nature of speech prosody data, Speech Prosody Mining (SPM) in a large prosody corpus faces classic time-series data mining challenges such as high dimensionality and high time complexity in distance computation (e.g., Dynamic Time Warping). Meanwhile, the analysis and understanding of speech prosody subsequence patterns demand novel analytical methods that leverage a variety of algorithms and data structures in the computational linguistics and computer science toolkits, prompting us to develop creative solutions in order to extract meaning from large prosody databases.
In this dissertation, we conceptualize SPM in a time-series data mining framework by focusing on a specific task in speech prosody: the analysis and machine learning of Mandarin tones. The dissertation is divided into five parts, each further divided into several chapters. In Part I, we review the necessary background and previous work related to the production, perception, and modeling of Mandarin tones. In Part II, we report the data collection used in this work, and we describe the speech processing and data preprocessing steps in detail.
Parts III and IV comprise the core segments of the dissertation, where we develop novel methods for mining tone N-gram data. In Part III, we investigate the use of time-series symbolic representation for computing time-series similarity in the speech prosody domain. In Part IV, we first show how to improve a state-of-the-art motif discovery algorithm to produce more meaningful rankings in the retrieval of previously unknown tone N-gram patterns.
In the next chapter, we investigate the most exciting problem at the heart of tone modeling: how well can we predict the tone N-gram contour shape types in spontaneous speech by using a variety of features from various linguistic domains, such as syntax, morphology, discourse, and phonology? The results shed light on how these factors contribute to the realization of speech prosody in tone production from an information-theoretic perspective. In the final part, we describe applications of these methods, including generalization to other tone languages and the development of software for the retrieval and analysis of speech prosody. Finally, we discuss how the current work extends to a general framework for corpus-based, large-scale intonation analysis.