Potential PhD projects
Accelerating mRNA design with artificial intelligence
Chemically modified nucleotides are essential building blocks of therapeutic mRNA, as they enhance translation activity and reduce adverse effects. The safety and efficacy of this therapeutic strategy are demonstrated by recent pandemic-driven immunisation initiatives. There are over 150 naturally occurring modifications, all compatible with the human innate immune system, but only a handful can be comprehensively measured in RNA molecules or have been tested so far for therapeutic development. The lack of tools for identifying chemical modifications hampers critical advances for the next iteration of mRNA therapies.
In this project, we aim to develop innovative deep-learning methodologies in combination with the use of cutting-edge sequencing technology to systematically identify nucleotide modifications in mRNA. We will use the Nanopore sequencing technology, which threads molecules through tiny pores and produces a signal that can be processed to uncover the sequence of mRNA nucleotides and their chemical modifications. The project will develop new signal processing strategies and deep learning approaches to interpret the signal and predict the chemical modifications in individual molecules, and by doing so, facilitate the study of RNA modifications in disease and the development of new mRNA therapeutics.
Computational challenges include the development of methods for complex signal processing, training strategies for deep learning in a highly imbalanced problem, deep learning architectures for input signals of variable length, incorporation of feature engineering in deep learning architectures, use of transformers to identify relevant parts of the input signal, and use of generative models to explore configurations.
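Two of the challenges listed above, input signals of variable length and highly imbalanced classes, can be illustrated with a minimal sketch. This is not the project's method: the function names `pad_signals` and `inverse_frequency_weights`, the toy signals, and the 0/1 label coding are all assumptions made for illustration.

```python
import numpy as np

def pad_signals(signals, pad_value=0.0):
    """Pad 1-D signals of different lengths into one array plus a validity mask."""
    max_len = max(len(s) for s in signals)
    batch = np.full((len(signals), max_len), pad_value, dtype=float)
    mask = np.zeros((len(signals), max_len), dtype=bool)
    for i, s in enumerate(signals):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True  # True marks real samples, False marks padding
    return batch, mask

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy data (hypothetical): four signal segments of unequal length.
signals = [
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0]),
    np.array([5.0, 6.0]),
    np.array([7.0, 8.0, 9.0, 10.0]),
]
# Hypothetical label coding: 0 = unmodified, 1 = modified (the rare class).
labels = np.array([0, 0, 0, 1])

batch, mask = pad_signals(signals)
class_weights = inverse_frequency_weights(labels)
```

The mask lets a downstream model ignore padded positions, and the class weights can be passed to a weighted loss so that the rare class is not swamped by the common one.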
|Supervisor: Professor Eduardo Eyras||Co-supervisor: Dr Cheng Soon Ong|
MicroRNA as diagnostics and therapeutics for retinal degenerations
MicroRNA (miRNA) are small, endogenous, non-coding molecules that are powerful regulators of genetic information. Despite only being discovered as recently as the turn of this century, miRNAs are already used in clinical trials as therapeutic candidates for complex diseases such as cancer. In fact, miRNAs have already been implicated in the pathogenesis of complex neurodegenerative disorders such as Parkinson’s, Alzheimer’s, and Age-related Macular Degeneration (AMD).
We currently have a number of ongoing projects where we are attempting to harness the regulatory capabilities of miRNAs to use as therapeutics for AMD. A single miRNA has the ability to control multiple different mRNA, often within the same molecular pathway (e.g. inflammation). This ability makes miRNAs promising therapeutic molecules to target multiple players in a single pathway. Due to the complex nature of AMD, we believe that this approach may prove fruitful in ameliorating key pathways known to lead to retinal degeneration, such as inflammation and oxidative stress.
The Clear Vision Research Lab is currently undertaking projects in which we are investigating the use of miRNA as biomarkers for retinal degenerations. Our research aims to identify specific miRNAs indicative for different stages in retinal disease and develop a method of disease grading based on their expression in biofluids.
We have access to a number of human and adult databases with various RNAseq datasets (single cell, RNAseq, Spatial transcriptomics) and require computational support to bring these computational elements together and begin searching both for miRNA and other RNA of interest for the treatment and diagnosis of retinal degenerations.
|Supervisor: A/Professor Riccardo Natoli||Co-supervisor: Mr Adrian Cioanca|
Development of CRISPR gene editing technology for health applications
Bacteria face constant pressure from bacteriophages (phages) that can kill them. In the resulting phage-bacteria arms race, bacteria have developed sophisticated defence systems against phages. These systems, such as the Nobel Prize-winning CRISPR technology, have been harnessed as gene editing tools for gene therapy, molecular diagnostics and synthetic biology.
The fundamental aim of this project is to better understand, using a computational biology approach, how bacteria defend against viral infection and how we can harness these bacterial defence systems as novel CRISPR gene editing technology. The project will involve computational analysis of large and highly diverse metagenome datasets to mine for novel antiphage defence systems and to use microbial community composition as a health biomarker. It will combine diverse computational tools and algorithms, such as deep learning techniques, pattern recognition algorithms and complex genome assembly.
Together, this work will expand the existing toolbox of CRISPR editing enzymes and may uncover new gene editing tools and health biomarkers.
This project will involve establishing new analysis pipelines and algorithms to assemble and mine highly diverse metagenome datasets. The major challenge consists in assembling a highly diverse mix of unknown viral, bacterial and eukaryotic genomes. This project will involve the use of deep learning, pattern recognition and clustering techniques to assemble and mine these metagenomes.
|Supervisor: Dr Gaetan Burgio||Co-supervisor: Professor Mark Polizzotto|
Better disease detection and diagnosis through the use of deep learning with flow cytometry datasets
Complex genetic diseases, such as some autoimmune diseases, can be the result of a combination of interacting mutations in different genes. Each mutation has a small effect that combines with other small effects due to other mutations, which together add up to cause disease. Sophisticated pattern-recognition data tools are required to identify the genetic causes of these complex diseases. We use data from Fluorescence-Activated Cell Sorting (FACS) machines to capture the small effects of single mutations on the composition of different cell populations in a single individual. The data produced by FACS machines is large and multidimensional, and we have applied dimensionality reduction and deep learning to identify subtle changes that occur due to mutations.
Projects will expose students to the full stack of data activities from data generation, pre-processing, normalisation and dimensionality reduction, through to training of neural networks to use this cellular data to differentiate a normal individual from a patient with a disease. Project data will include that from genetically-identical mice and from hospital patients with autoimmune conditions. We also will investigate methods that may allow us to see what aspects of the data allow accurate diagnoses. This project has a number of avenues that a student can choose to explore depending on their interest in a) clinical applications, b) methodology development, and even c) data generation with cell sorters.
Computational challenges are present in all aspects of the project, from applying and developing DL methodology to diagnose disease, to better extracting information from data through normalisation, to improvement of metadata to allow better data label sets.
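The normalisation and dimensionality reduction steps mentioned above can be sketched as follows. This is an illustrative outline only: the arcsinh transform with a cofactor is one common choice for cytometry intensities rather than the lab's confirmed pipeline, the cofactor value of 150 is an assumption, and the event matrix is simulated.

```python
import numpy as np

def arcsinh_transform(intensities, cofactor=150.0):
    """Compress the dynamic range of intensities (cofactor value is an assumption)."""
    return np.arcsinh(np.asarray(intensities, dtype=float) / cofactor)

def pca(data, n_components=2):
    """Principal component analysis via SVD of the column-centred data."""
    centred = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

# Simulated stand-in for real data: 500 cells (events) x 6 fluorescence channels.
rng = np.random.default_rng(0)
events = rng.lognormal(mean=5.0, sigma=1.0, size=(500, 6))

scores, explained = pca(arcsinh_transform(events), n_components=2)
```

The low-dimensional `scores` could then serve as input to a classifier, while `explained` indicates how much variance each retained component captures.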
|Supervisor: Dr Dan Andrews|
Genome assemblies and annotation to uncover complex biology
We are seeking talented PhD students to take on the challenge of generating telomere-to-telomere genome assemblies for human and reptilian (lizard and snake) species. We are collecting nationally important and internationally relevant genomic data assets towards this goal. PhD students will work in a highly collaborative network to generate genome assemblies, address key questions in evolutionary biology (e.g. sex-chromosome evolution, chromosomal rearrangements, rDNA and centromere biology) and human health genomics (e.g. tempo and mode of structural variations, Alu repeat family, NumTs, rDNA and their role in diseases), and develop novel methods and software. High-quality genome assemblies accompanied by annotations will lead to impactful publications and open opportunities to conduct discovery-led molecular studies.
Third-generation sequencing is enabling rapid progress in generating high-quality genome assemblies. However, rDNA, segmental duplications, centromeres and mitochondrial DNA regions are often difficult to assemble. In addition, different sequencing data types and error rates pose distinct challenges for the genome assembly process. Innovative annotation-aware computational methods have significant potential to reduce errors and the need for manual curation. Genome assemblies also require annotations for understanding gene families, their evolution and functional roles to uncover species biology. Current predictive methods for gene annotation achieve approximately 70% accuracy. Students will have the opportunity to work with high-quality data to develop deep-learning methods for genome annotation using molecular and comparative evidence.
|Supervisor: Dr Hardip Patel||Co-supervisor: Professor Arthur Georges|
Isoform-resolved prediction of mRNA activity for therapeutic applications and synthetic biology
RNA-based technologies are developing rapidly, creating high demand for information on functional messenger RNA (mRNA) design and RNA disease signatures. Control of mRNA translation into proteins (protein biosynthesis) is directly involved in development, memory and synaptic plasticity, ageing, viral-host interactions, stress response and cell damage, malignancy onset and drug resistance, and offers an excellent and under-exploited area for therapeutic targeting and diagnostics. The functional mRNA ‘language’, however, remains largely obscure. Our group has pioneered a best-in-class technology to interrogate the details of protein biosynthesis control transcript-wise and transcriptome-wide. We have also recently developed a Stochastic Translation Efficiency (STE) measure, which employs sophisticated machine learning and the broadest available set of translation complex classes to provide accurate predictions of translational rates.
This project will develop an STE model for human cells for the first time. Further novelty will lie in stratifying the isoform and modification data in the model based on direct long-read RNA sequencing. We will apply the human STE model to dissect the drug response invoked during the evolution of drug resistance in diffuse large B-cell lymphoma, a blood cancer model with a high relapse rate. The outcomes will create an atlas of human mRNA building blocks and identify new targetable pathways in cancers of significant burden.
The computational challenges, and the excitement, of the proposed research lie in developing novel ensemble machine learning approaches capable of screening through and aggregating diverse data types and complex patterns of dependencies. Our methods to profile translational complexes collect multivariate signals and the most complete snapshot of translation complex distribution. This opens extensive opportunities for extracting diverse parameters of protein biosynthesis dynamics, improving prediction accuracy, and classifying the respective mRNA building blocks and their functional language by ‘power’ in translation and response type under pre-determined and predicted conditions. More specifically, this project will address the problem of accurate isoform-aware attribution of the short read sequences that are employed by translation complex profiling. Protein biosynthesis predictions are made either from direct measurements of protein metabolism (e.g. by mass spectrometry) with assignment back to the transcripts, or by inferring translational rates from high-throughput short-read RNA-sequencing-derived signals. Both approaches rely on identifying short peptide or RNA fragments and are challenged by the isoform diversity characteristic of complex eukaryotic cells, such as those of metazoan, mammalian and human origin. Neither approach, in its current implementation, can confidently attribute the recorded data to specific isoform types. These computational challenges will be addressed by incorporating direct long-read sequencing data, in addition to the short-read translation profiling data, into the ensemble learning methods. This will enable association of any detected RNA modification types with the respective RNA building block functionality, and will provide potential for subsequent improvements as the number of detectable RNA modification types expands.
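The difficulty of attributing short reads to isoforms can be made concrete with a small expectation-maximisation (EM) sketch. EM is a standard way to share ambiguous reads between compatible isoforms. It is shown here only as an illustration and is not the ensemble method proposed above. The compatibility matrix is a toy example, the function name is hypothetical, and isoform length effects are deliberately ignored.

```python
import numpy as np

def em_isoform_abundance(compatibility, n_iter=200):
    """Estimate isoform proportions from a read-by-isoform compatibility matrix.

    compatibility[r, i] is 1 if read r could originate from isoform i, else 0.
    Every read must be compatible with at least one isoform.
    """
    compat = np.asarray(compatibility, dtype=float)
    n_reads, n_isoforms = compat.shape
    theta = np.full(n_isoforms, 1.0 / n_isoforms)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: split each read across its compatible isoforms in proportion to theta.
        weighted = compat * theta
        responsibilities = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities.
        theta = responsibilities.sum(axis=0) / n_reads
    return theta

# Toy example: 3 reads unique to isoform A, 1 unique to isoform B, 4 ambiguous.
compatibility = [[1, 0]] * 3 + [[0, 1]] * 1 + [[1, 1]] * 4
theta = em_isoform_abundance(compatibility)
```

In this toy case the ambiguous reads are apportioned according to the evidence from the uniquely assigned reads. Long-read data, as described above, would reduce the number of ambiguous rows in the first place.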
|Supervisor: Dr Nikolay Shirokikh||Co-supervisor: Professor Eduardo Eyras|
Neuronal computations underlying sensory decision making
A key challenge in systems neuroscience is to understand how the external world is represented inside our brain. How do the dynamics of neuronal population activity underlie the encoding of sensory inputs and the generation of appropriate behavioural decisions? This project combines electrophysiological, optical imaging, and behavioural methods in mice with computational analyses to investigate how the elegant circuitry of the mammalian cortex underpins the efficient encoding and decoding of sensory signals.
The computational methods include modelling of neuronal population dynamics and information theoretic analysis of neuronal data.
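As a minimal illustration of information theoretic analysis, the sketch below computes a plug-in estimate of the mutual information between a discrete stimulus and a discretised neuronal response. The function name and toy data are assumptions for illustration. Plug-in estimates are biased when trial counts are small, so analyses of real data would typically need a bias correction.

```python
import numpy as np

def mutual_information_bits(stimulus, response):
    """Plug-in estimate of I(stimulus; response) in bits for discrete variables."""
    _, s_idx = np.unique(np.ravel(stimulus), return_inverse=True)
    _, r_idx = np.unique(np.ravel(response), return_inverse=True)
    # Joint histogram of stimulus and response categories.
    joint = np.zeros((s_idx.max() + 1, r_idx.max() + 1))
    np.add.at(joint, (s_idx, r_idx), 1.0)
    joint /= joint.sum()
    p_s = joint.sum(axis=1, keepdims=True)  # marginal over responses
    p_r = joint.sum(axis=0, keepdims=True)  # marginal over stimuli
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_s @ p_r)[nonzero])))

# Toy trials (hypothetical): two stimuli, responses binned as low (0) or high (1).
stimulus = [0, 0, 1, 1]
informative_response = [0, 0, 1, 1]    # response fully determined by stimulus
uninformative_response = [0, 1, 0, 1]  # response independent of stimulus

mi_informative = mutual_information_bits(stimulus, informative_response)
mi_uninformative = mutual_information_bits(stimulus, uninformative_response)
```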
|Supervisor: Professor Ehsan Arabzadeh||Co-supervisor: Dr Matthew Tang|
Deciphering the spatial role of RNA binding proteins in retina degeneration
Age-related Macular Degeneration (AMD) affects the macular region of the retina, which is responsible for visual acuity and colour vision. However, the pathophysiology and molecular pathways that drive AMD progression are not yet fully understood. In this project, we will investigate the role of neuronal-specific RNA binding proteins (RBPs) in retinal neuronal cell type differentiation and in the regulation of mRNA alternative processing in response to neurodegeneration and inflammation in AMD. We will integrate spatial transcriptome and single-cell RNA-seq data to characterise mRNA alternative usage patterns in retinal cell populations during degeneration. We will further develop new interpretable neural network predictive models for quantification of RBP combinatorial interactions and regulatory modules.
Deep neural network models in genomics, spatial transcriptome analysis, single-cell RNA-seq analysis
|Supervisor: A/Professor Jean (Jiayu) Wen||Co-supervisor: A/Professor Riccardo Natoli|
Advanced imaging tools for dissecting gene function in symmetry breaking
In this project you will use cutting-edge imaging techniques and advanced computational tools for post-processing of optical images. Two types of imaging and analysis experiments will be used to understand how genes function to control complex biological functions. In one type of experiment, optical and computational methods that visualise fluid dynamics in living tissues will be used. Fluorescent particles will be added to fluid, the fluid illuminated, and particle movement recorded and analysed to calculate the velocity field of the flow within the fluid. In the second type of experiment, gene expression will be visualised in a massively parallel manner in intact tissues using a frontier technology called spatial transcriptomics. For both approaches you will conduct a time series comparison of normal and abnormal mouse embryos to understand how embryonic symmetry is broken and spatial chirality of body organs is established during normal development. A failure of this process is responsible for a variety of birth defects, including abnormalities of the heart. The project is therefore relevant to cutting-edge genomics, the embryonic development of all vertebrates, and human health and disease.
Advanced computational tools for post-processing of optical images
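The particle-tracking experiment described above estimates flow velocity from how a particle pattern moves between frames. A minimal sketch of that idea, assuming a single interrogation window and a purely translational shift, uses FFT-based cross-correlation. The function names, pixel size and frame interval are illustrative assumptions. A full velocity field would repeat this over many small windows tiled across the image.

```python
import numpy as np

def estimate_shift(frame_a, frame_b):
    """Return the integer (dy, dx) displacement that best maps frame_a onto frame_b."""
    fa = np.fft.fft2(frame_a - frame_a.mean())
    fb = np.fft.fft2(frame_b - frame_b.mean())
    # Circular cross-correlation; its peak sits at the displacement.
    correlation = np.fft.ifft2(np.conj(fa) * fb).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative displacements.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, correlation.shape))

def shift_to_velocity(shift_px, pixel_size_um, frame_interval_s):
    """Convert a pixel displacement to velocity in micrometres per second."""
    return tuple(s * pixel_size_um / frame_interval_s for s in shift_px)

# Simulated frames: a random particle pattern moved 3 px down and 2 px left.
rng = np.random.default_rng(1)
frame_a = rng.random((64, 64))
frame_b = np.roll(frame_a, shift=(3, -2), axis=(0, 1))

shift = estimate_shift(frame_a, frame_b)
# Hypothetical acquisition settings: 0.5 um per pixel, 0.1 s between frames.
velocity = shift_to_velocity(shift, pixel_size_um=0.5, frame_interval_s=0.1)
```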
|Supervisor: Professor Ruth Arkell||Co-supervisor: Dr Woei Ming (Steve) Lee|
Inferring predictive DNA/RNA motif grammar from eukaryotic genomes using deep learning methods
In the past decades, deep learning methods have revolutionised inference and interpretation from Big Data. The human genome can be interpreted as long, unstructured text containing instructions for manufacturing hundreds of different cell types and their gene expression program at steady state and in response to stimuli. One of the biggest challenges of this century is to infer the language by which proteins can communicate with nucleic acids and gain access to or copy their encoded information.
The theory of cell-type specific gene expression posits that key transcription factors (TFs) and RNA binding proteins (RBPs) activate developmental switches and cell-type specific gene expression programs via specific DNA binding at distal regulatory loci (enhancers) and tightly regulated splicing of the pre-messenger RNA. However, there is no available method to elucidate the “motif grammar” guiding TF and RBP binding.
By inferring specific motifs and the underlying grammars, this project will greatly contribute to our knowledge of how cell-type specificity is encoded in the genomes of multicellular organisms. In cancerous cells, mutated DNA/RNA recognition sequences may activate or diminish gene expression programs, with detrimental effects on the human body. Outcomes of this project will support the development of diagnostic tools by predicting the effect of mutations on TF or RBP binding.
In the proposed project, the candidate will design and build a deep learning model featuring multilayer Convolutional Neural Networks (CNNs), natural language processing (NLP) models and/or Generative Adversarial Networks (GANs) to infer general motif grammars of TF and RBP recognition sequences based on in-silico and experimentally validated TF binding and splicing data. The building blocks of these grammars will be short, specific DNA/RNA motifs, together with their order, spacing and co-occurrence, expressed by logical relations (e.g. AND/OR gates). Our laboratory was recently awarded an allocation on the NCI Gadi supercomputer, ensuring sufficient capacity to perform the underlying computations.
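To make the idea of a motif grammar concrete, the sketch below evaluates hand-written rules about motif order, spacing and co-occurrence on a toy sequence. It is a rule checker rather than the deep learning model proposed above. The function names, the two patterns and the sequence are hypothetical and chosen only for illustration.

```python
import re

def find_motif(sequence, pattern):
    """Return (start, end) for every match of a regex motif, including overlaps."""
    return [(m.start(1), m.end(1))
            for m in re.finditer(f"(?=({pattern}))", sequence)]

def ordered_pair(sequence, pattern_a, pattern_b, min_gap, max_gap):
    """AND rule with order and spacing: B starts min_gap..max_gap bases after A ends."""
    hits_a = find_motif(sequence, pattern_a)
    hits_b = find_motif(sequence, pattern_b)
    return any(min_gap <= start_b - end_a <= max_gap
               for _, end_a in hits_a
               for start_b, _ in hits_b)

def either(sequence, pattern_a, pattern_b):
    """OR rule: at least one of the two motifs is present."""
    return bool(find_motif(sequence, pattern_a) or find_motif(sequence, pattern_b))

# Toy patterns and sequence (hypothetical, for illustration only).
MOTIF_A = "TGA[CG]TCA"
MOTIF_B = "CACGTG"
sequence = "GG" + "TGACTCA" + "TTTT" + "CACGTG" + "AA"

a_then_b = ordered_pair(sequence, MOTIF_A, MOTIF_B, min_gap=0, max_gap=10)
b_then_a = ordered_pair(sequence, MOTIF_B, MOTIF_A, min_gap=0, max_gap=10)
```

Here `a_then_b` holds because motif B begins four bases after motif A ends, while `b_then_a` does not, which shows how order and spacing distinguish grammars built from the same motifs.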
|Supervisor: Dr Attila Horvath||Co-supervisor: Professor Eduardo Eyras|
Characterising the closed loop model of mRNA translation using machine learning methods
Translation initiation is among the most complex and extensively regulated steps of protein synthesis. An intriguing model of translation initiation is the ‘closed-loop’ model, in which the ends of the template mRNA molecule are circularised by specific protein factors and/or ribosome complexes, leading to effective translation. As the closed loop may have an accelerating effect on scanning and ribosome recycling, a differential propensity of mRNAs to form a closed loop would affect their translational outcomes at steady state and in response to stress.
To investigate whether closed-loop formation is different between mRNAs, we have developed novel in vivo approaches such as TCP-seq (Archer & Shirokikh et al. 2016, Nature), a method capable of obtaining snapshots of scanning, translating or disassembling ribosome complexes and their associating protein factors. Using advanced machine learning methods, we now aim to discover characteristic mRNA motifs that can facilitate closed loop formation. Deeper understanding of this process would greatly contribute to our knowledge of translation, and aid future design of translation-related therapeutics.
This project aligns well with other ongoing projects in the Preiss Group investigating the mechanistic details of the closed loop model using cutting-edge wet-lab techniques and the three-dimensional arrangement of ribosome clusters using statistical and machine-learning models (Horvath & Janapala et al. 2022, bioRxiv preprint).
In the proposed project, the candidate will integrate several TCP-seq data sets and develop computationally intensive machine learning methods to discover the necessary and/or sufficient mRNA features that initiate and maintain closed loop formation. Once the main features have been discovered, the candidate will build a comprehensive model that is able to predict the propensity of closed loop formation for individual mRNAs and determine the relative contribution of the underlying features using Deep Convolutional Neural Networks (CNNs) combined with a Long Short-Term Memory (LSTM) architecture. Our laboratory was recently awarded access to the NCI Gadi supercomputer, ensuring sufficient capacity to perform the underlying computations.
|Supervisor: Professor Thomas Preiss||Co-supervisor: Dr Attila Horvath|
Predicting the 3D structure of eukaryotic genomes based on functional genomics data sets
The eukaryotic genome is organised in a complex and dynamic 3D structure to maintain the safe storage and accurate replication of the genetic information, and to allow sophisticated gene regulation. To achieve this, protein complexes, such as the Cohesin complex, work together to tightly regulate the genome both in space and in time. However, how these complexes find their correct genomic locations, and which sequence elements contribute to the establishment of the 3D structure, is poorly understood and remains one of the most important questions in genome biology.
Recent studies have suggested that non-coding RNAs (ncRNAs) and R-loops have an important role in this process. Our laboratory is generating HiC libraries to map the 3D genome, both in S. pombe and in human cell-cultures. We have also established various systems to modulate nuclear ncRNAs and R-loops to characterise their function. We aim to combine these data sets and develop a predictive framework to determine the role of sequence elements, ncRNAs and R-loops in the spatial organisation of the genome and in the correct localisation of the Cohesin and other relevant complexes.
Understanding the contribution of ncRNAs, R-loops and various sequence elements in genome organisation would enable a greater degree of control over gene expression programs. This could lead to the development of novel therapeutic methods to selectively silence or induce target genes by modulating individual ncRNAs/R-loops.
In the proposed project, the candidate will integrate several functional genomics data sets, including quantitative R-loop maps and genome-wide protein binding data. Using 3D genome mappings inferred from HiC experiments as “ground truth”, we aim to build a comprehensive model to reconstruct the 3D structure of the genome and determine the relative contribution of sequence elements, R-loops and various protein complexes. To achieve this, the candidate will need to build Deep Convolutional Neural Networks (CNN) combined with Multidimensional Scaling (MDS) architecture.
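The Multidimensional Scaling (MDS) step mentioned above turns pairwise distances between genomic loci into 3D coordinates. A minimal sketch of classical MDS follows, assuming a distance matrix is already available. Converting HiC contact frequencies into distances is a separate modelling step that is not shown, and the simulated loci and function names are illustrative assumptions.

```python
import numpy as np

def classical_mds(distances, n_dims=3):
    """Embed points in n_dims dimensions from a matrix of pairwise distances."""
    squared = np.asarray(distances, dtype=float) ** 2
    n = squared.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ squared @ centring  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]     # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def pairwise_distances(coords):
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Simulated stand-in for real data: 10 loci with known 3D positions.
rng = np.random.default_rng(2)
true_coords = rng.normal(size=(10, 3))
true_distances = pairwise_distances(true_coords)

reconstructed = classical_mds(true_distances, n_dims=3)
reconstructed_distances = pairwise_distances(reconstructed)
```

The reconstruction matches the original structure only up to rotation, reflection and translation, so comparisons are best made on pairwise distances rather than raw coordinates.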
|Supervisor: Dr Attila Horvath||Co-supervisor: A/Professor Tamás Fischer|
Disease-based deep learning for single-cell multi-omics data integration
Human fibrotic diseases constitute a major health problem. Our incomplete knowledge of the pathogenesis of the fibrotic process stems from the absence of appropriate and fully validated biomarkers. While high-throughput single-cell technologies have revolutionized medical research, integrated single-cell multi-omics workflows are in their infancy. In this project, we aim to develop a multi-omics workflow that combines layers of single-cell data (integration of sample similarities, joint dimension reduction techniques, and statistical modeling approaches) to deepen our understanding of fibrosis progression. We further aim to develop a disease-guided deep learning network and to profile fibroblast (stromal) cells across multiple layers of omics data (sequencing, transcriptomics, protein, function). The computational analysis will recognize resident fibroblast cell types and their states (RNA species, protein expression, behavior). We aim to discover novel fibroblast cell populations that contribute to fibrosis in different organs. This project will create a new multi-modal single-cell multi-omics platform and facilitate collaborative work on computational multi-omics analysis.
“Less information, better integration”: efficient multi-omics integration strategies by learning disease progression at multiple scales. Disease-informed deep learning network for fibrosis.
|Supervisor: Dr Woei Ming (Steve) Lee||Co-supervisor: Professor Lexing Xie|