Relation Extraction for Protein-protein Interactions Affected by Mutations
Precision Medicine (PM) is a promising approach to cancer treatment in modern medical practice. Information about protein-protein interactions and the mutations that affect them is essential for understanding biological processes, and obtaining it is one of the key aims of PM research. While previous text mining research has made great progress in extracting protein-protein interactions (PPIs) from biomedical literature, few efforts have explored methods for extracting PPIs that are affected by mutations.

In this thesis, I propose a feature-rich supervised method to extract PPIs affected by mutations from biomedical literature. First, a supervised model is trained to predict whether a pair of proteins in a new instance is interacting. Next, a ‘mutation refinement’ step is applied as a filter to determine the final answer. I compared the effectiveness of two training corpora for model training: (i) the BioCreative VI PM track training set and (ii) AIMed+BioInfer. Experimental results show that the supervised model trained on the combined corpus (AIMed+BioInfer) achieved better performance. Additionally, features selected from previous PPI extraction work were tested alongside additional features for model training. Evaluation on the BioCreative VI PM track test dataset demonstrates the effectiveness of the features proposed in my method. The system achieves up to a 44% improvement in F1-score over the baseline method.
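The two-stage design described above (an interaction classifier followed by a mutation filter) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how the refinement step works, so the regular expression `MUTATION_RE`, the function names, and the stand-in classifier passed as `is_interacting` are all assumptions made for illustration. The `f1_score` helper shows the standard F1 computation over predicted and gold protein pairs.

```python
import re
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]

# Assumed, simplified pattern for point mutations such as "V600E" or
# "p.Gly12Asp". Mutation mentions in real literature are far more varied.
MUTATION_RE = re.compile(
    r"\b(?:[A-Z]\d+[A-Z]|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2})\b"
)


def mentions_mutation(text: str) -> bool:
    """Return True if the text contains something that looks like a mutation."""
    return MUTATION_RE.search(text) is not None


def extract_ppim(
    text: str,
    candidate_pairs: Iterable[Pair],
    is_interacting: Callable[[str, Pair], bool],
) -> List[Pair]:
    """Stage 1: keep pairs the classifier marks as interacting.
    Stage 2 (assumed refinement rule): discard every pair if the text
    mentions no mutation at all."""
    predicted = [pair for pair in candidate_pairs if is_interacting(text, pair)]
    if not mentions_mutation(text):
        return []
    return predicted


def f1_score(predicted: Iterable[Pair], gold: Iterable[Pair]) -> float:
    """Harmonic mean of precision and recall over sets of protein pairs."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def toy_classifier(text: str, pair: Pair) -> bool:
    """Hypothetical stand-in for a trained model: both names co-occur."""
    return all(name in text for name in pair)
```

For example, calling `extract_ppim` on the sentence "The BRAF V600E mutation disrupts binding of BRAF to MEK1." with the candidate pair `("BRAF", "MEK1")` and `toy_classifier` keeps the pair, whereas the same call on "BRAF binds MEK1." returns an empty list because no mutation is mentioned. In practice, a trained supervised model would replace `toy_classifier`.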