In-silico simulation of genomic high-throughput sequencing data to assess and benchmark computational tools and to guide parameter optimisation

High-throughput sequencing (HTS) data have become a major component of quantitative genomics and computational biology. Simulating HTS data is critical to developing new or improved methods for their analysis. As HTS protocols become increasingly complex, the need for methods that generate matching synthetic data grows. Synthetic data make it possible to study in silico how genomic aberrations, such as those present in cancer, affect sequencing libraries, and they help optimise critical parameters of analysis methods.

This project, supervised by Dr Maurits Evers, will suit a candidate with an interest in computation and/or bioinformatics.

As part of this project you will:

  1. Develop and optimise a method to generate synthetic Chromatin Immunoprecipitation sequencing (ChIP-seq) data, simulating a typical transcription factor binding ChIP experiment.
  2. Use and assess state-of-the-art methods for the identification and quantification of transcription factor binding sites based on synthetic ChIP-seq data. In this context, you will explore the effect of varying copy numbers of specific genes on the resulting synthetic ChIP-seq libraries.
  3. Make use of the high-performance computing environments of the ANU Bioinformatics Consultancy (ABC) unit and the National Computational Infrastructure (NCI).
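
To give a flavour of the first two tasks, here is a minimal sketch in Python of how read positions for a toy transcription factor ChIP-seq library might be simulated, with coverage scaling with local copy number. It is an illustration under simplifying assumptions, not the method the project will develop: the function names (copy_number_at, simulate_chip_seq), the parameters (signal_fraction, fragment_sd) and the default diploid copy number of 2 are all hypothetical, and real simulators model fragment lengths, strands, sequencing errors and mappability as well.

```python
import random

def copy_number_at(pos, segments, default_cn=2):
    """Return the copy number of the (start, end, cn) segment containing pos."""
    for start, end, cn in segments:
        if start <= pos < end:
            return cn
    return default_cn  # assumption: regions not listed are diploid

def simulate_chip_seq(genome_length, sites, segments, n_reads,
                      signal_fraction=0.3, fragment_sd=75, seed=1):
    """Simulate read positions on a toy linear genome (hypothetical model)."""
    rng = random.Random(seed)
    max_cn = max([2] + [cn for _, _, cn in segments])
    # A binding site on an amplified segment contributes more fragments.
    site_weights = [copy_number_at(s, segments) for s in sites]
    reads = []
    while len(reads) < n_reads:
        if rng.random() < signal_fraction:
            # Signal read: scattered around a binding site.
            site = rng.choices(sites, weights=site_weights, k=1)[0]
            pos = int(rng.gauss(site, fragment_sd))
        else:
            # Background read: rejection sampling so that background
            # coverage is proportional to the local copy number.
            pos = rng.randrange(genome_length)
            if rng.random() >= copy_number_at(pos, segments) / max_cn:
                continue
        if 0 <= pos < genome_length:
            reads.append(pos)
    return reads

def reads_near(reads, site, window=300):
    """Count reads within +/- window bases of a site."""
    return sum(abs(r - site) <= window for r in reads)

# Two binding sites; the second lies on a segment amplified to six copies.
sites = [2_000, 8_000]
segments = [(7_000, 9_000, 6)]
reads = simulate_chip_seq(10_000, sites, segments, n_reads=5_000)
print(reads_near(reads, 2_000), reads_near(reads, 8_000))
```

In this toy model the site on the amplified segment collects more reads than the diploid site even though both are bound equally, which is the kind of copy-number effect on peak calling and quantification that the project sets out to explore.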

Time Frame: This project is expected to take 3-12 months

Requirements: Experience with Linux environments; practical experience with programming in R; experience with high-performance computing environments is beneficial