Gene function inference

Genotype-to-phenotype mapping:

Changes in codon adaptation across orthologous gene families can systematically predict the function of many genes, using machine learning to rule out confounding variables. We have experimentally validated novel roles in adaptation to environmental stressors (oxygen, heat, salinity) for dozens of E. coli genes.
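The paragraph above does not say how codon adaptation is quantified. One standard per-gene measure is Sharp and Li's Codon Adaptation Index (CAI), the geometric mean of each codon's "relative adaptiveness" against a reference set of highly expressed genes. The sketch below assumes CAI as the measure and a pseudocount of 0.5 for unseen codons; the function names are hypothetical, and the machine-learning step that controls for confounders is not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the compact TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight w = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TO_AA)
    by_aa = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        by_aa[aa].append(codon)
    weights = {}
    for aa, synonyms in by_aa.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # stop codons and single-codon amino acids (Met, Trp) carry no signal
        top = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(seq, weights):
    """Geometric mean of relative adaptiveness over the informative codons of seq."""
    ws = [weights[c] for c in codons(seq) if c in weights]
    if not ws:
        return float("nan")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))
```

Computing CAI for each member of an orthologous family, genome by genome, yields the kind of per-gene signal whose changes could then be related to phenotypes.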

We have systematically annotated more than 3,000 prokaryotic taxa with more than 400 phenotypes, drawing on comparative genomics and text mining. This revealed thousands of gene families causally involved in diverse microbial traits, as well as pervasive epistasis that has shaped the gene repertoires of these organisms.
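A basic building block for linking gene families to annotated traits is a 2x2 table of taxa (family present or absent, trait present or absent) tested for enrichment. The sketch below is a deliberately naive illustration, not the method described above: it uses a one-sided Fisher's exact test, ignores phylogenetic non-independence between taxa, and does no multiple-testing correction, all of which a real causal analysis would need to address. The function names are hypothetical.

```python
from math import comb


def family_trait_table(carriers, trait_taxa, all_taxa):
    """Build the 2x2 table (a, b, c, d) from sets of taxon identifiers."""
    a = len(carriers & trait_taxa)   # family present, trait present
    b = len(carriers - trait_taxa)   # family present, trait absent
    c = len(trait_taxa - carriers)   # family absent, trait present
    d = len(all_taxa) - a - b - c    # neither
    return a, b, c, d


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value that the family is enriched in trait-positive taxa."""
    n = a + b + c + d
    row1 = a + b  # taxa carrying the family
    col1 = a + c  # taxa with the trait
    # Sum hypergeometric probabilities of tables at least as extreme as the observed one.
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)
        for k in range(a, min(row1, col1) + 1)
    )
```

For example, a family found in all three trait-positive taxa and none of three trait-negative taxa gives p = 1/20.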

Methods for automated gene function prediction:

The increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function systematically. Metagenome phyletic profiles (MPPs) can accurately predict 826 Gene Ontology functional categories, and MPPs derived from diverse environments infer distinct, non-overlapping sets of gene functions.
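To make the idea of a metagenome phyletic profile concrete: a gene family can be represented by the set of metagenomic samples in which it is detected, and families with similar profiles are expected to share functions. The classifier behind the result above is not specified here, so the sketch below swaps in a simple k-nearest-neighbour vote weighted by Jaccard similarity. The sample names, GO identifiers and function names are illustrative assumptions.

```python
from collections import Counter


def jaccard(x, y):
    """Jaccard similarity of two presence/absence profiles given as sets of sample IDs."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def predict_go_terms(query_profile, labelled, k=3):
    """Score GO terms for a query profile from its k most similar annotated profiles.

    labelled: list of (profile_set, go_term_set) pairs for gene families of known function.
    """
    neighbours = sorted(
        labelled, key=lambda item: jaccard(query_profile, item[0]), reverse=True
    )[:k]
    scores = Counter()
    for profile, terms in neighbours:
        sim = jaccard(query_profile, profile)
        for term in terms:
            scores[term] += sim  # similarity-weighted vote
    return scores
```

In practice each GO category would be thresholded on held-out data to decide which predictions count as reliable.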

We analyzed 5 million genes from 2,071 genomes to evaluate established methodologies for automated function prediction (AFP). While more than 1,000 functions yielded reliable predictions, most of these were accessible to only one or two of the methods. Different methods also tend to assign a given function to non-overlapping sets of genes. Genomic AFP methods thus display striking complementarity, both gene-wise and function-wise.
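The two kinds of complementarity described above can be quantified with simple set arithmetic. In the sketch below, gene-wise overlap is the Jaccard index between the (gene, function) assignments of two methods, and function-wise coverage counts how many methods reliably predict each function. The method names and inputs are hypothetical placeholders; how "reliable" is defined is left to the caller.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(predictions):
    """Jaccard overlap of (gene, function) assignments for every pair of methods.

    predictions: dict mapping method name -> set of (gene, function) pairs.
    """
    overlap = {}
    for m1, m2 in combinations(sorted(predictions), 2):
        a, b = predictions[m1], predictions[m2]
        union = a | b
        overlap[(m1, m2)] = len(a & b) / len(union) if union else 0.0
    return overlap


def methods_per_function(reliable):
    """Count, for each function, how many methods predict it reliably.

    reliable: dict mapping method name -> set of reliably predicted functions.
    """
    counts = Counter()
    for functions in reliable.values():
        counts.update(functions)
    return counts
```

Low pairwise overlaps, together with most functions having a count of 1 or 2, are the pattern the paragraph above describes.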

"All we know about the world teaches us that the effects of A and B are always different - in some decimal place - for any A and B. Thus asking 'are the effects different?' is foolish." -- John Tukey