Inferring microbial gene function:

The increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function systematically // Metagenome phyletic profiles (MPPs) can accurately predict 826 Gene Ontology functional categories // MPPs derived from diverse environments infer distinct, non-overlapping sets of gene functions
Changes in codon adaptation across orthologous gene families can systematically predict the function of many genes when machine learning is used to rule out confounding variables. We have experimentally validated novel roles in adaptation to environmental stressors (oxygen, heat, salinity) for tens of E. coli genes.
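
The phyletic-profile idea mentioned above can be illustrated with a toy example: a gene family is described by the set of samples in which it occurs, and an unannotated family borrows the annotation of the annotated family with the most similar profile. This is only a minimal nearest-neighbour sketch under stated assumptions, not the classifier used in the published work; all gene names, sample names and function labels are invented.

```python
# Toy illustration only: gene names, samples and labels are hypothetical.

def jaccard(a, b):
    """Jaccard similarity between two sets of samples."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Presence of each gene family across (hypothetical) metagenomic samples.
profiles = {
    "geneA": {"soil1", "soil2", "gut1"},
    "geneB": {"soil1", "soil2", "gut1", "gut2"},
    "geneC": {"ocean1", "ocean2"},
    "query": {"soil1", "soil2", "gut1"},
}

# Hypothetical annotations for the families with known function.
known_function = {
    "geneA": "function_X",
    "geneB": "function_X",
    "geneC": "function_Y",
}

def predict_function(query):
    """Transfer the label of the annotated family with the most similar profile."""
    best = max(known_function, key=lambda g: jaccard(profiles[query], profiles[g]))
    return known_function[best], jaccard(profiles[query], profiles[best])

label, similarity = predict_function("query")
print(label, similarity)
```

A real analysis would use many more samples, quantitative abundances and a trained classifier evaluated by cross-validation; the sketch only shows why co-occurrence across environments carries functional signal.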

Predicting oncogenes:

A statistical method, ALFRED, tests Knudson’s two-hit hypothesis to systematically identify inherited cancer-predisposing genes // We identify novel genes, such as the chromatin modifier NSD1, which cause cancer through germline variants and somatic loss-of-heterozygosity // 1 in 50 tumors is associated with novel ALFRED genes
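
As a rough illustration of the kind of question a two-hit test asks, the sketch below checks whether tumors from carriers of a germline variant in a gene show loss of heterozygosity at that gene more often than tumors from non-carriers. It uses a generic one-sided Fisher's exact test, which is an assumption made for illustration and not the ALFRED procedure itself; the counts and the function name `fisher_one_sided` are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing at least `a` in the top-left cell
    under the hypergeometric null with fixed row and column totals.
    """
    row1 = a + b          # e.g. tumors from germline variant carriers
    col1 = a + c          # e.g. tumors with loss of heterozygosity (LOH)
    n = a + b + c + d     # all tumors
    total = comb(n, col1)
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / total

# Hypothetical counts for one gene:
# carriers with LOH, carriers without LOH, non-carriers with LOH, non-carriers without LOH
p_value = fisher_one_sided(8, 2, 100, 400)
print(p_value)
```

A small p-value here would indicate that LOH co-occurs with the germline variant more often than expected by chance, which is the pattern the two-hit hypothesis predicts for a predisposition gene.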

Methods for gene function prediction:

We analyzed 5 million genes from 2071 genomes to evaluate established methodologies for automated function prediction (AFP). While >1000 functions yielded reliable predictions, the majority of these were accessible to only one or two of the methods. Different methods tend to assign functions to non-overlapping sets of genes. Genomic AFP methods display a striking complementarity, both gene-wise and function-wise.
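
Gene-wise complementarity of this kind can be quantified in a simple way: count, for every gene that received a prediction, how many methods produced it. The sketch below is a minimal example of that bookkeeping; the method names and gene sets are invented and do not correspond to the methods or data in the study.

```python
from collections import Counter

# Hypothetical sets of genes that each AFP method assigned a function to.
predictions = {
    "method_1": {"g1", "g2", "g3"},
    "method_2": {"g3", "g4"},
    "method_3": {"g5", "g6"},
}

def methods_per_gene(preds):
    """Count how many methods made a prediction for each gene."""
    return Counter(gene for genes in preds.values() for gene in genes)

def unique_fraction(preds):
    """Fraction of predicted genes that are covered by exactly one method."""
    counts = methods_per_gene(preds)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

print(unique_fraction(predictions))
```

A fraction close to 1 means the methods rarely annotate the same genes, so combining them extends coverage rather than merely confirming existing predictions. The same counting can be repeated over functions instead of genes to look at function-wise complementarity.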

"Prediction is very difficult, especially about the future." -- Niels Bohr.