The Australian National University
The John Curtin School of Medical Research
ANU College of Medicine, Biology & Environment

Overview Computational Genomics

The genome is the information system at the heart of all biological organisms. This central biological role, together with the availability of the completed human genome, makes a compelling case for a genome-centric unification of modern biomedical research and medical practice. If this potential of genome sequence is to be realised, however, technical challenges must be overcome and fundamental knowledge acquired. Acquisition of the genome sequence itself is not enough; we must now add meaning to this resource by addressing fundamental questions such as:

  • Where are the genes located?
  • What sequences control the expression of genes?
  • Which genes interact to form molecular networks?
  • Which genetic variants affect human health?

Research in the Computational Genomics laboratory is focussed on genome decryption: identifying regions of the genome that encode functions influencing susceptibility to disease. We aim to maximise the capacity for de novo biological prediction from genome sequence alone. For example, can we predict whether two molecules interact from an analysis of their DNA sequences, or predict the phenotypic impact of a genetic variant?
Our approach to these questions lies primarily in the realm of comparative genomics, i.e. we infer the properties of genome sequences by examining how those sequences change. This comparative approach succeeds by detecting the influence of natural selection, a powerful indicator of biological function. Natural selection affects the distribution of genetic variation both within and between species. Between species, natural selection alters rates of divergence by eliminating deleterious variation or favouring adaptive variation, leading to a reduction or an acceleration in divergence rate, respectively.
Because shifts in substitution rate are measured relative to the background neutral substitution rate, and because mutation rates vary across the genome, we must understand the basis for mutation rate fluctuation in order to infer the operation of natural selection. We seek this understanding by developing statistical models of genome sequences that are driven by meta-data (genome annotation). Overlaying meta-data onto genome sequence lets us integrate discoveries from other sources directly into our research.
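As a rough illustration of measuring divergence relative to a neutral background, the following Python sketch compares the estimated divergence of a candidate region with that of a presumed-neutral region from the same pair of species. It is a minimal sketch under stated assumptions, not the laboratory's method: the function names are hypothetical, the Jukes-Cantor correction is swapped in as the simplest available substitution model, and a single neutral reference ignores exactly the regional mutation rate variation discussed above.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def relative_rate(region_pair, neutral_pair):
    """Divergence of a region divided by divergence of a neutral reference.

    Both arguments are (seq_a, seq_b) tuples of aligned sequences taken
    from the same two species (an assumption of this sketch).
    """
    d_region = jc69_distance(p_distance(*region_pair))
    d_neutral = jc69_distance(p_distance(*neutral_pair))
    if d_neutral == 0:
        raise ValueError("neutral reference shows no divergence")
    return d_region / d_neutral
```

Under these assumptions, a ratio well below 1 is consistent with the elimination of deleterious variation and a ratio well above 1 with accelerated divergence, although a real analysis would need a statistical test and a locally calibrated neutral rate.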

For Students

Our research spans several traditional scientific disciplines that are now largely amalgamated under the label of Bioinformatics: comparative and population genomics, mathematical statistics and computer science. Students in my group need some facility with programming, demonstrated either formally (through course work) or by an aptitude for it. A background in mathematical statistics and/or biology is also valuable.
Projects are available to outstanding students in the program areas below. Please contact Dr. Gavin Huttley for more details.

Three integrated CG research programs:

Technology and Methodology Development:

Testing novel ideas about sequence change requires a mathematical representation with implementation in software. Most existing software packages for statistical analysis of comparative genomics are narrow in focus and not geared to the broader meta-data driven analyses of interest to us. Accordingly, we have developed the COmparative GENomics Toolkit (COGENT). This toolkit is the hub of our analytical activity, and is freely available. It provides facilities for genomic data manipulation, flexible model specification, and scales from single to multi-CPU architectures. A detailed description is available on our software page.
One project intended to generalise across biological processes concerns multiple alignment. This remains a significant challenge, and most groups focus on protein sequence alignments. We aim to develop a probabilistic multiple alignment approach that can incorporate insights from our analyses of genome mutation to improve fidelity. This project is being undertaken in collaboration with Cray Australia Pty. Ltd.

Characterising Mutation:

Making inferences about the functional properties of a genomic region from patterns of substitution rates requires calibration to adjust for complexity in the mutation spectra. Rates of substitution are demonstrably heterogeneous across the genome in mammals and between mammal lineages. We have several projects underway that aim to dissect the causes of these important patterns. These efforts are focussed on the properties of DNA that make it particularly prone to mutation, and on the influence of DNA replication and DNA repair systems. Of particular interest is the modified nucleotide mC (5-methyl-cytosine), which plays a critical role in mammalian developmental biology and also exhibits a very high mutation rate. Published outcomes of this work demonstrate the striking influence of mC on genomic diversity, and reveal a cost to methylation of the protein-coding gene BRCA1.
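One widely used way to see the footprint of mC hypermutability in a sequence is the observed/expected CpG ratio. In mammals cytosine methylation occurs mainly at CpG dinucleotides, and deamination of mC to thymine depletes them over time. The Python sketch below computes that ratio; it is an illustrative calculation only, the function name is hypothetical, and it is not presented as the analysis used in the published work mentioned above.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count assumes C and G occur independently, so the ratio
    is (CpG count * sequence length) / (C count * G count).
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)
```

Ratios well below 1 over a region are consistent with a history of methylation-driven CpG loss, whereas regions that remain largely unmethylated in the germline tend to retain ratios closer to 1.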

Inferring Function:

Identifying functional elements in the genome is our ultimate objective. These elements are not confined to protein-coding portions; they also include RNA-coding genes and the regulatory elements that dictate patterns of expression. Evidence indicates that some of these non-protein-coding regions have distinct patterns of substitution (see Wakefield, Maxwell and Huttley, 2005). For instance, cytosine residues that are subject to methylation can be functional but prone to mutation. We seek to establish how this mutation-selection balance plays out within the genome.
Biological processes arise from networks of interacting molecules. These networks underlie the important genetic phenomenon of epistasis, in which the genotype-to-phenotype map at one locus depends on the genotypes present at other loci. Such effects underlie the genetic etiology of complex diseases. We have projects underway to assess the potential utility of comparative genomic analyses for revealing these dependencies.
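The definition of epistasis above can be made concrete with a two-locus toy example. The Python sketch below measures how far the double-mutant phenotype departs from the sum of the single-locus effects; the function name, the 0/1 allele coding and the additive scale are all assumptions made for illustration.

```python
def epistasis(phenotype):
    """Deviation from additivity for two biallelic loci.

    `phenotype` maps (allele at locus 1, allele at locus 2) to a mean
    phenotype value, with alleles coded 0 (reference) and 1 (variant).
    A result of 0 means the effect at one locus does not depend on the
    genotype at the other locus.
    """
    return (
        phenotype[(1, 1)]
        - phenotype[(1, 0)]
        - phenotype[(0, 1)]
        + phenotype[(0, 0)]
    )


# Purely additive loci: each variant adds a fixed amount.
additive = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 4.0}

# Interacting loci: the double variant exceeds the additive prediction.
interacting = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 6.0}
```

Here `epistasis(additive)` is zero, while `epistasis(interacting)` is positive because the effect of the variant at one locus changes with the genotype at the other.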
One of the most successful approaches to date for establishing associations between genetic variation and human phenotype has been the use of candidate genes: genes that external evidence, such as molecular biology research, suggests contribute to a suspected process. We are involved in assessing the contribution to human lupus of candidate genes identified from genome-wide ENU mouse mutagenesis.