Classifying Coding and Noncoding Regions of the Genome Using Only Sequence Data: A Study Using Deep and Interpretable Neural Networks
Team Information
Team Members
Shruti Verma, Undergraduate Student, Computer Science Department, Columbia Engineering
Faculty Advisor: Xuebing Wu, Assistant Professor, Department of Systems Biology and Department of Medicine
Abstract
The goal of this project is to determine whether neural network models can accurately identify the coding and noncoding regions of a genome, given only the sequence of nucleotide bases in the segment in question. To this end, various model architectures, preprocessing techniques, and evaluation methods were tested. Because the project is motivated by a desire to model and understand the underlying biology as closely as possible, it was important that the networks be interpretable in some way, for example by indicating which features of a given sequence led the model to classify it as coding or noncoding. Though the project is still in progress, results thus far have been favorable, and future steps and directions are well established.
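As an illustration of what a sequence-only input can look like, the sketch below one-hot encodes a nucleotide string, a common preprocessing step before feeding DNA to a neural network. The abstract does not say which preprocessing the project actually used, so the encoding choice, the function name `one_hot_encode`, and the handling of ambiguous bases are all assumptions made for this example.

```python
# Hypothetical illustration: one-hot encoding is assumed here,
# not confirmed as the preprocessing used in the project.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base, with columns in A, C, G, T order."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(NUCLEOTIDES)
        # Assumption: ambiguous bases such as N are left as an all-zero row.
        if base in index:
            row[index[base]] = 1.0
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Print each base next to its encoded row.
    for base, row in zip("ATGN", one_hot_encode("ATGN")):
        print(base, row)
```

Each row of the result corresponds to one position in the sequence, which is one reason this representation is convenient for interpretability: any per-position importance score a model produces can be mapped directly back to a specific base.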
Contact this Team
Contact: Xiaofu He