AUTOMATIC DISAMBIGUATION OF CHINESE MODAL EXPRESSIONS - A SUPERVISED MACHINE LEARNING EXPERIMENT
Creator
Chi, Ting
Advisor
Davis, Anthony R.
Portner, Paul H.
Abstract
This thesis reports an annotation of Chinese modal expressions in the Chinese Treebank (CHTB) 4.0 with eleven attributes that may affect how a modal expression is read. The annotated data provide distributional information about the modality types and attributes of Chinese modal expressions, signaling terms that determine modality type, and training data for modality type disambiguation. Using the annotated data, the thesis presents a supervised machine learning experiment on modality type disambiguation for Chinese modal expressions. The disambiguation is framed as a Priority versus Non-Priority classification and uses three algorithms (Naive Bayes, maximum entropy, and decision tree) with features extracted from surrounding words as well as from the annotated data. The results show that maximum entropy performs best among the three algorithms. Among the feature sets used to train the classifiers, features extracted from the annotated data achieve the highest accuracy in predicting the modality type of Chinese modal expressions: 0.9383.
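The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea: a Naive Bayes classifier that labels a modal as Priority or Non-Priority from the words around it. The window size, the `MODAL` placeholder token, the function and class names, and the four English-gloss training sentences are all invented for illustration. They are not the thesis's features, data, or code, and the maximum entropy and decision tree classifiers are not shown.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, modal_index, window=2):
    """Words within `window` positions of the modal, excluding the modal itself."""
    lo = max(0, modal_index - window)
    hi = min(len(tokens), modal_index + window + 1)
    return [tokens[i] for i in range(lo, hi) if i != modal_index]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over context words."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for feats, label in zip(feature_lists, labels):
            self.word_counts[label].update(feats)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in feats:
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: (tokens, index of the modal, label). Not from CHTB.
TRAIN = [
    (["you", "MODAL", "obey", "rules"], 1, "priority"),
    (["students", "MODAL", "obey", "teacher"], 1, "priority"),
    (["he", "MODAL", "be", "home", "now"], 1, "non-priority"),
    (["it", "MODAL", "be", "raining", "now"], 1, "non-priority"),
]

model = NaiveBayes().fit(
    [context_features(tokens, i) for tokens, i, _ in TRAIN],
    [label for _, _, label in TRAIN],
)

pred_a = model.predict(context_features(["we", "MODAL", "obey", "law"], 1))
pred_b = model.predict(context_features(["she", "MODAL", "be", "home"], 1))
print(pred_a, pred_b)
```

In a real experiment the same feature extraction step would feed each of the three algorithms in turn, and accuracy would be measured on held-out annotated examples rather than on hand-written sentences.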
Description
M.S.
Permanent Link
http://hdl.handle.net/10822/558392
Date Published
2013
Subject
Type
Publisher
Georgetown University
Extent
88 leaves
Collections
Related items
Showing items related by title, author, creator and subject.
- Automatic presentation of sense-specific lexical information in an intelligent learning system
  Eom, Soojeong (Georgetown University, 2012)
  Learning vocabulary and understanding texts present difficulty for language learners due to, among other things, the high degree of lexical ambiguity. By developing an intelligent tutoring system, this dissertation examines ...