AUTOMATIC DISAMBIGUATION OF CHINESE MODAL EXPRESSIONS - A SUPERVISED MACHINE LEARNING EXPERIMENT
Davis, Anthony R.
Portner, Paul H.
This thesis reports on an annotation of Chinese modal expressions in the Chinese Treebank (CHTB) 4.0, using eleven attributes that may affect how a modal expression is read. The annotated data provide distributional information about the modality types and attributes of Chinese modal expressions, identify signaling terms that determine modality type, and serve as training data for modality type disambiguation. Using these data, the thesis presents a supervised machine learning experiment on disambiguating the modality type of Chinese modal expressions. The disambiguation is framed as a Priority vs. Non-Priority classification and is carried out with three algorithms (Naive Bayes, maximum entropy, and decision tree), using features extracted from surrounding words as well as from the annotated data. Maximum entropy performs best of the three algorithms. Among the feature sets used to train the classifiers, the features extracted from the annotated data achieve the highest accuracy in predicting modality type, at 0.9383.
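The record does not include the thesis's code, so the following is only a minimal sketch of what a Priority/Non-Priority classifier over contextual features might look like. It assumes a hypothetical feature set (the modal word plus the words in a ±2 token window) and implements just one of the three algorithms named above, a multinomial Naive Bayes with add-one smoothing. The maximum entropy and decision tree models, and the features drawn from the eleven annotated attributes, are not shown. The training sentences and their labels are invented for illustration and are not taken from CHTB 4.0.

```python
import math
from collections import Counter, defaultdict

# Hypothetical feature set: the modal itself plus the words in a +/-2 token
# window around it. This is NOT the thesis's actual feature inventory.
def extract_features(tokens, i, window=2):
    feats = [f"modal={tokens[i]}"]
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{off}]={word}")
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total feature tokens seen per class (denominator of the likelihood).
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, feats):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # log P(c) + sum over features of log P(f | c), in log space to avoid underflow.
            score = math.log(count / n)
            for f in feats:
                num = self.feat_counts[c][f] + self.alpha
                den = self.totals[c] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, X, y):
    # Fraction of examples whose predicted label matches the gold label.
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


# Invented toy examples: (tokens, index of the modal, label). Not CHTB data.
train = [
    (["你", "可以", "走", "了"], 1, "Priority"),        # permission
    (["学生", "应该", "努力", "学习"], 1, "Priority"),   # obligation
    (["他", "可以", "游泳"], 1, "Non-Priority"),         # ability
    (["明天", "可能", "下雨"], 1, "Non-Priority"),       # epistemic possibility
]
X_train = [extract_features(t, i) for t, i, _ in train]
y_train = [label for _, _, label in train]
model = NaiveBayes().fit(X_train, y_train)

test = [
    (["我们", "应该", "努力", "工作"], 1, "Priority"),
    (["她", "可以", "跳舞"], 1, "Non-Priority"),
]
X_test = [extract_features(t, i) for t, i, _ in test]
y_test = [label for _, _, label in test]
print(accuracy(model, X_test, y_test))
```

Because every classifier consumes the same list of string features, a maximum entropy model (multinomial logistic regression) or a decision tree could be swapped in for `NaiveBayes` without changing `extract_features`; the abstract reports that maximum entropy performed best in the actual experiment.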
Showing items related by title, author, creator and subject.
Eom, Soojeong (Georgetown University, 2012) Learning vocabulary and understanding texts present difficulty for language learners due to, among other things, the high degree of lexical ambiguity. By developing an intelligent tutoring system, this dissertation examines ...