The genome is the information system
at the heart of all biological organisms. This central biological
role, together with the availability of the completed human genome, makes a
compelling case for a genome-centric unification of modern biomedical
research and medical practice. Technical challenges must be overcome
and fundamental knowledge acquired, however, if this potential of
genome sequence is to be realised. Acquisition of the genome sequence
itself is not enough; we must now add meaning to this resource by
addressing fundamental questions such as:
- Where are the genes located?
- What sequences control the expression of genes?
- Which genes interact to form molecular networks?
- Which genetic variants affect human health?
Research in the Computational Genomics laboratory is focussed on
genome decryption: identifying regions of the genome which
encode functions that influence susceptibility to disease. We aim
to maximise the capacity for de novo biological prediction from genome
sequence alone. For example, can we predict whether two molecules
interact from an analysis of their DNA sequences, or predict the
phenotypic impact of a genetic variant?
Our approach to these questions lies primarily in the realm of comparative
genomics, i.e. we infer genome sequence properties by examining how
sequences change. This comparative approach succeeds through detecting
the influence of natural selection, a powerful indicator of biological
function. Natural selection affects the distribution of genetic
variation both within and between species. Between species, the effect of
natural selection on rates of divergence manifests via the elimination
of deleterious variation or the favouring of adaptive variation. These
processes lead to a reduction and an acceleration in divergence rate, respectively.
As shifts in substitution rate are relative to the background neutral
substitution rate, and because mutation rates vary across the genome,
we must understand the basis for mutation rate fluctuation in order
to infer the operation of natural selection. We seek this understanding
by developing statistical models of genome sequences that are driven by
meta-data (genome annotation). By overlaying meta-data onto genome sequence
we are able to integrate discovery from other sources directly into
our research.
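As a rough illustration of the comparison described above, the following
Python sketch estimates divergence for a candidate region and for a
presumed-neutral reference region, then reports their ratio. It is not the
laboratory's method: the Jukes-Cantor correction is swapped in as the simplest
available substitution model, and the function names and toy sequences are
hypothetical. It also assumes a single background rate, whereas the text
identifies mutation rate variation as the key complication.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned A/C/G/T sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns containing gaps or ambiguity codes.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def relative_rate(region_pair, neutral_pair):
    """Divergence of a region relative to a presumed-neutral background.

    Values below 1 are consistent with removal of deleterious variation,
    values above 1 with accelerated (possibly adaptive) divergence.
    """
    d_region = jukes_cantor_distance(p_distance(*region_pair))
    d_neutral = jukes_cantor_distance(p_distance(*neutral_pair))
    if d_neutral == 0:
        raise ValueError("neutral background shows no divergence")
    return d_region / d_neutral


# Toy (made-up) alignments: 1 difference in the candidate region,
# 4 differences in the presumed-neutral reference.
region = ("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA")
neutral = ("ACGTACGTACGTACGTACGT", "TCGTAAGTACCTACGTACGA")
ratio = relative_rate(region, neutral)
print(f"relative divergence: {ratio:.2f}")
```

A ratio on its own is only suggestive; in practice the comparison would need a
formal statistical test and a background rate appropriate to the local
sequence context.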
For Students
Our research spans several traditional scientific disciplines that
are now largely amalgamated under the label of Bioinformatics: comparative
and population genomics, mathematical statistics and computer science.
Students in my group need to have some facility with programming,
demonstrated either formally (through course work) or by an aptitude for it.
Other valuable backgrounds are mathematical statistics and biology.
Projects are available to outstanding students in the program areas
below. Please contact Dr. Gavin Huttley for more details.
Three integrated CG research programs:
Technology and Methodology Development:
Testing novel ideas about sequence change requires a
mathematical representation with implementation in software. Most
existing software packages for statistical analysis of comparative
genomics are narrow in focus and not geared to the broader meta-data
driven analyses of interest to us. Accordingly, we have developed
the COmparative GENomics Toolkit (COGENT). This toolkit is the hub
of our analytical activity, and is freely available. It provides facilities
for genomic data manipulation and flexible model specification, and scales
from single-CPU to multi-CPU architectures. A detailed description is
available on our software page.
One project that aims to be generalised across biological processes
concerns multiple alignment. This remains a significant challenge,
with most groups focussing on protein sequence alignments. We aim
to develop a probabilistic multiple alignment approach that can incorporate
insights from our analyses of genome mutation to improve fidelity.
This project is being undertaken in collaboration with Cray
Australia Pty. Ltd.
Characterising Mutation:
Making inference about the functional properties of
a genomic region from patterns of substitution rates requires calibration
to adjust for complexity in the mutation spectra. Rates of substitution
are demonstrably heterogeneous across the genome in mammals and between
mammal lineages. We have several projects underway that aim to dissect
the causes of these important patterns. These efforts are focussed
on the properties of DNA that make it particularly prone to mutation,
and the influence of DNA replication/DNA repair systems. Of particular
interest is the modified nucleotide mC (5-methyl-cytosine), which
plays a critical role in mammal developmental biology and also exhibits
a very high mutation rate. Published outcomes of this work demonstrate
the striking influence of mC on genomic diversity, and reveal a cost
to methylation of the protein-coding gene BRCA1.
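One common signature of the high mutation rate of 5-methyl-cytosine is a
deficit of CpG dinucleotides: in mammals, cytosine methylation occurs
predominantly at CpG sites, and methylated cytosine readily deaminates to
thymine. The Python sketch below computes the widely used observed/expected
CpG ratio for a sequence. It is offered as an illustrative assumption, not as
the analysis used in the published work, and the function name is
hypothetical.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count assumes C and G occur independently:
    expected = (count of C * count of G) / sequence length.
    Ratios well below 1 indicate CpG depletion.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("sequence must contain both C and G")
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg_count = seq.count("CG")
    return cpg_count * n / (c_count * g_count)


# Toy (made-up) sequences: one CpG-rich, one with no CpG at all.
print(cpg_observed_expected("CGCGTACGCG"))
print(cpg_observed_expected("CATGCATGCA"))
```

Comparing this ratio across annotated regions (for example, methylated versus
unmethylated sequence) is one simple way to see the footprint of mC mutation.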
Inferring Function:
Identifying functional elements in the genome is our
ultimate objective. These elements are not confined to protein-coding
portions, but also include RNA-coding genes and the regulatory elements
that dictate the pattern of expression. Evidence indicates some of
these non-protein-coding regions have distinct patterns of substitution
(see Wakefield, Maxwell and Huttley, 2005). For instance, cytosine
residues that are subject to methylation can be functional but prone
to mutation. We seek to establish how this mutation-selection balance
plays out within the genome.
Biological processes arise from networks of interacting molecules.
These networks underlie the important genetic phenomenon of epistasis,
a dependence of the genotype-to-phenotype map at a locus on the genotypes
present at other loci. Such effects underlie the genetic etiology
of complex diseases. We have projects underway to assess the potential
utility of comparative genomic analyses for revealing these dependencies.
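To make the definition of epistasis concrete, the Python sketch below uses the
simplest textbook case: two loci, each with two alleles, and a phenotype value
for each of the four haploid genotypes. Epistasis is measured as the departure
from additivity, that is, how much the effect of allele A changes with the
allele present at the second locus. The function name and phenotype values are
hypothetical and chosen only for illustration.

```python
def additive_epistasis(w_ab, w_Ab, w_aB, w_AB):
    """Departure from additivity in a two-locus genotype-to-phenotype map.

    Computed as (effect of A on a B background) minus
    (effect of A on a b background). Zero means the loci act additively.
    """
    effect_of_A_with_B = w_AB - w_aB
    effect_of_A_with_b = w_Ab - w_ab
    return effect_of_A_with_B - effect_of_A_with_b


# Made-up phenotype values for genotypes ab, Ab, aB, AB.
print(additive_epistasis(10, 12, 11, 13))  # additive: A adds 2 on both backgrounds
print(additive_epistasis(10, 10, 10, 15))  # A only has an effect alongside B
```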
One of the most successful approaches to date for establishing associations
between genetic variation and human phenotype has been the use of candidate
genes: genes that, on external evidence such as molecular biology
research, contribute to a suspected process. We are involved in assessing
the contribution to human lupus of candidate genes identified from
a genome-wide ENU mouse mutagenesis screen.