Establish a robust and reproducible high-throughput data analysis workflow using Snakemake

High-throughput sequencing (HTS) methods have become the gold standard for studying genomes and transcriptomes. The growing volume and complexity of HTS data and protocols call for correspondingly complex analysis workflows; for example, analysing chromatin immunoprecipitation sequencing (ChIP-seq) data requires a range of tools for quality control, read alignment and quantitative assessment of DNA-protein binding. To ensure that results can be trusted and repeated, computational and statistical analysis steps need to be (1) transparent, (2) reproducible, and (3) well documented.

This project, supervised by Dr Maurits Evers, will suit a candidate with an interest in computation and/or bioinformatics.

As part of this project you will:

  1. Familiarise yourself with the bioinformatics workflow management system Snakemake. You will contribute to implementing a reproducible, modular analysis workflow for HTS data available from The Hannan Group.
  2. Develop an understanding of the different steps involved in state-of-the-art bioinformatics data analysis and the computational tools associated with them. This will involve learning to adapt and use specialised methods, as well as understanding the underlying computational and statistical methods.
  3. Learn how to document and version-control collaborative computational projects using GitHub and R.
  4. Make use of the high-performance computing environments of the ANU Bioinformatics Consultancy (ABC) unit and the National Computational Infrastructure (NCI).

Time Frame: This project is expected to take 3-12 months.

Requirements: Experience with Linux environments; practical experience with programming in R and/or Python. Experience with software development and project version control is beneficial.