Computational Analysis of Difficult-to-predict Genes and Detection of Lineage-specific Genes in Honey Bee (Apis mellifera)
Creator
Bennett, Anna Kathleen
Advisor
Elsik, Christine G.
Abstract
Honey bees are key agricultural pollinators and a research model for social behavior and the evolution of eusociality. Generation of reliable gene predictions is critical to the success of laboratory experiments and comparative analyses for sequenced genomes. The honey bee genome, published in 2006 by the Honey Bee Genome Sequencing Consortium, had fewer predicted genes than expected, partially due to a lack of transcriptome data and protein homologs from closely related species. As part of ongoing Consortium efforts, the genomes of two closely related taxa, the dwarf honey bee (Apis florea) and the buff-tailed bumble bee (Bombus terrestris), were sequenced; the draft A. mellifera genome was improved with additional sequence; and multiple A. mellifera tissue transcriptomes were sequenced. The Consortium predicted genes using seven methods and used these data to produce an improved official gene set (OGSv3.2) with ~5,000 more protein-coding genes than the first set (OGSv1.0).
We present our approach to detecting previously unknown genes in A. mellifera. We found new genes through two routes: improvements in genome assembly completeness, and changes to prediction pipelines together with additional transcript and homology evidence. Genes in the latter set were shorter, were less likely to overlap evidence alignments, and had a different surrounding genome GC content than previously predicted genes, which made them challenging to predict. Based on these findings, new genome projects will benefit from more effective gene prediction strategies.
We focused on the N-SCAN method specifically, to determine if it predicted more genes than were in OGSv1.0. N-SCAN leveraged the sequence conservation between A. mellifera and the two additional bee genomes to predict genes not found by other prediction pipelines. Therefore, N-SCAN proved useful and is a worthwhile tool to include in gene prediction efforts where closely related sequenced species are available.
We identified 4,277 genes specific to A. mellifera or to lineages within the insect order Hymenoptera. We detected lineage-specific genes using homology to other species' genomes and proteins, and propose mechanisms for their emergence. We identified genes with tentative roles in brood care, immunity, defense and other processes important to hive health, and thus associated with the evolution of eusociality in Apis.
Description
Ph.D.
Permanent Link
http://hdl.handle.net/10822/559464
Date Published
2013
Subject
Type
Publisher
Georgetown University
Extent
105 leaves