Bayesian statistical modelling has become an attractive method for analysing data from a wide range of areas. Briefly, Bayesian inference estimates a posterior probability distribution by combining prior beliefs with the likelihood of the observed data. The advantages of Bayesian inference include an intuitive interpretation (estimates are updated, and typically become more precise, as more information/data becomes available) and a flexible model structure that can incorporate and utilise additional data. With the release of the probabilistic programming language “Stan” and its R (and Python) interfaces, it is now convenient to implement Bayesian modelling approaches for analysing genomic high-throughput sequencing data in R. The development of these methods will be particularly important in the context of animal and patient cancer data, where (1) replicate numbers are low, and (2) tumour samples show large between-replicate variability.
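To make the idea of updating a prior with data concrete, here is a minimal sketch in Python of a conjugate Gamma-Poisson update for the expression level of a single gene. It is an illustration only and not the model to be developed in this project: the function names, the prior parameters and the read counts are all hypothetical, and a realistic RNA-seq model would usually need to account for overdispersion and be fitted with a sampler such as the one Stan provides.

```python
def gamma_poisson_update(prior_shape, prior_rate, counts):
    """Posterior (shape, rate) of a Gamma prior on a Poisson rate.

    With a Gamma(shape, rate) prior and Poisson-distributed counts,
    the posterior is Gamma(shape + sum(counts), rate + n).
    """
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    return post_shape, post_rate


def gamma_mean(shape, rate):
    """Mean of a Gamma distribution in the shape/rate parameterisation."""
    return shape / rate


# Hypothetical weak prior with mean 2.0 / 0.1 = 20 reads.
prior_shape, prior_rate = 2.0, 0.1

# Hypothetical read counts for one gene across three replicates.
counts = [35, 41, 29]

post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, counts)
prior_mean = gamma_mean(prior_shape, prior_rate)
posterior_mean = gamma_mean(post_shape, post_rate)
sample_mean = sum(counts) / len(counts)

print(f"prior mean:     {prior_mean:.2f}")
print(f"sample mean:    {sample_mean:.2f}")
print(f"posterior mean: {posterior_mean:.2f}")
```

With only three replicates the posterior mean lies between the prior mean and the sample mean; as more replicates are added, the data increasingly dominate the prior.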
This project, supervised by Dr Maurits Evers, will suit a candidate with an interest in computation and/or bioinformatics.
As part of this project you will:
- Develop and implement a Bayesian (hierarchical) model in R using Stan to assess differential gene expression from available high-throughput RNA sequencing (RNA-seq) data. You will learn about Bayesian statistical inference, and how to implement, optimise and test a Bayesian model.
- Perform a thorough statistical comparison of results from your model with those from other state-of-the-art methods for assessing differential gene expression. This may involve surveying and testing existing methods, and developing a sensible test environment for benchmarking.
- Learn how to document your work and track changes over the course of the project, with a focus on transparency and reproducibility.
Time Frame: This project is expected to take 3-12 months.
Requirements: A good background in statistics; practical experience in statistical modelling and/or machine learning is beneficial.