Revisiting Motif Trees: High Performance White Box Predictors for Biological Sequences Modeling
Event Type: Minisymposium
Computer Science and Applied Mathematics
Emerging Application Domains
Chemistry and Materials
Climate and Weather
Physics
Solid Earth Dynamics
Life Sciences
Engineering
Time: Thursday, 13 June 2019, 12:45 - 13:15
Location: HG F 3
Description: Due to continual improvements in sequencing methods and equipment, biological databases are growing exponentially. To exploit such amounts of data, machine learning has been routinely used in bioinformatics over the last 25 years. However, most previous research focused on building black box sequence predictors. While very useful for analyzing and sorting raw biological data, these approaches did little to improve biological understanding. In this talk, we present a new method for modeling sequence consensus based on decision trees and language theory. The resulting classifiers not only predict sequence patterns but also make it possible to visualise pattern structure and requirements. As an example, we will show how to build an efficient model for the N-terminal acetylation of proteins, which is the most common post-translational modification in eukaryotic cells.
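To give a rough picture of what a "white box" tree built from language-theoretic tests can look like, here is a minimal sketch. It is not the presenters' method: it assumes that each internal node tests whether the start of a protein sequence belongs to a regular language (written as a Python regular expression), and that leaves carry a class label. The class `MotifNode`, its `predict` and `explain` methods, and the motifs in the toy tree are all hypothetical; the motifs are illustrative placeholders, not a validated acetylation model.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MotifNode:
    """A node of a hypothetical motif tree.

    Internal nodes hold a regular expression (a motif over the amino-acid
    alphabet) tested against the start of the sequence; leaves hold a label.
    """
    motif: Optional[str] = None                 # regex, anchored at the N-terminus via re.match
    if_match: Optional["MotifNode"] = None      # subtree followed when the motif matches
    if_no_match: Optional["MotifNode"] = None   # subtree followed otherwise
    label: Optional[str] = None                 # set only on leaves

    def predict(self, seq: str) -> str:
        """Walk the tree from this node down to a leaf and return its label."""
        if self.label is not None:
            return self.label
        branch = self.if_match if re.match(self.motif, seq) else self.if_no_match
        return branch.predict(seq)

    def explain(self, indent: int = 0) -> str:
        """Render the tree as nested if/else rules (the readable 'white box' view)."""
        pad = "  " * indent
        if self.label is not None:
            return f"{pad}-> {self.label}\n"
        return (
            f"{pad}if N-terminus matches /{self.motif}/:\n"
            + self.if_match.explain(indent + 1)
            + f"{pad}else:\n"
            + self.if_no_match.explain(indent + 1)
        )


# Toy tree with illustrative placeholder motifs (NOT a biologically validated model).
tree = MotifNode(
    motif=r"M[ASTCG]",
    if_match=MotifNode(label="acetylated"),
    if_no_match=MotifNode(
        motif=r"M[DE]",
        if_match=MotifNode(label="acetylated"),
        if_no_match=MotifNode(label="not acetylated"),
    ),
)

if __name__ == "__main__":
    for s in ["MASKLV", "MEEL", "MKLV"]:
        print(s, "->", tree.predict(s))
    print(tree.explain())
```

With this toy tree, `"MASKLV"` and `"MEEL"` are labelled `acetylated` and `"MKLV"` is labelled `not acetylated`. The point of the design is that the same structure used for prediction can be printed by `explain()` as human-readable rules, which is the kind of interpretability a black box predictor does not offer.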