Classifying Coding and Noncoding Regions of the Genome Using Only Sequence Data: A Study using Deep and Interpretable Neural Networks


Team Information

Team Members

  • Shruti Verma, Undergraduate Student, Computer Science Department, Columbia Engineering

  • Faculty Advisor: Xuebing Wu, Assistant Professor, Department of Systems Biology and Department of Medicine

Abstract

This project investigates whether neural network models can accurately identify the coding and noncoding regions of a genome given only the sequence of nucleotide bases in the segment in question. To that end, various model architectures, preprocessing techniques, and evaluation methods were tried. Because the goal stems from a desire to model and understand the underlying biology as closely as possible, the networks also needed to be interpretable in some way, for example by indicating which parts of a given sequence led the model to classify it as coding or noncoding. The project is still in progress, but results so far have been favorable, and the next steps and future directions are well established.
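As a rough illustration of the two ideas in the abstract (feeding a model raw sequence only, and asking which positions drove its decision), the sketch below one-hot encodes a DNA string and computes a simple occlusion-based importance score per position. This is not the project's actual pipeline: the function names `one_hot`, `toy_score`, and `occlusion_importance` are hypothetical, and `toy_score` is a hand-written stand-in for a trained network that merely counts ATG start codons. Occlusion is only one of several possible interpretability techniques, and the abstract does not say which one the project uses.

```python
# Illustrative sketch only; all names here are hypothetical, not from the project.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order.

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_score(encoded):
    """Stand-in for a trained model: counts ATG start codons in the encoding."""
    motif = one_hot("ATG")
    return sum(
        1.0 for i in range(len(encoded) - 2) if encoded[i:i + 3] == motif
    )


def occlusion_importance(encoded, score_fn):
    """Zero out one position at a time and record how much the score drops."""
    baseline = score_fn(encoded)
    importances = []
    for i in range(len(encoded)):
        masked = [row[:] for row in encoded]  # copy so the input is not mutated
        masked[i] = [0.0] * len(BASES)        # hide this position from the model
        importances.append(baseline - score_fn(masked))
    return importances


if __name__ == "__main__":
    encoded = one_hot("CCATGCC")
    print(occlusion_importance(encoded, toy_score))
```

With a real network in place of `toy_score`, positions whose occlusion changes the prediction most would be the ones the model relies on, which is the kind of signal the abstract describes wanting from an interpretable model.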


Contact this Team

Contact: Xiaofu He (use form to send email)
