How novel computer technology transforms bioinformatics analyses

Dr Denis Bauer, CSIRO

The genome holds information on prospective disease risk. As a result, 50% of the world population is expected to have their genome sequenced by 2025. This will create more data than astronomy, Twitter, and YouTube combined (more than 20 exabytes). To tackle this challenge, we use Apache Spark to create analysis software tailored to the complexity of genomic data. VariantSpark can analyse 3000 samples with 80 million features in under 30 minutes, enabling real-time diagnosis by finding “patients like me”. This platform is contributing to motor neuron disease research (Ice Bucket Challenge) here in Australia.

Another trend in the life sciences and other fields alike is real-time analysis through cloud-based solutions. Keeping runtime constant can be challenging for problems that vary in complexity, such as genome engineering. Here, the whole genome needs to be analysed anew for every location where the beneficial genomic change can be introduced, so the workload varies by orders of magnitude. Using AWS Lambda, we break this task down into smaller sub-tasks that can be solved in parallel by instantaneously recruiting additional Lambda functions as the complexity increases. The resulting tool, GT-Scan2, was featured on Jeff Barr's prestigious AWS blog because it brings together novel scientific insights and unprecedented cloud-compute capacity.
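
The fan-out idea described above can be illustrated with a minimal local sketch. This is not GT-Scan2's code: a thread pool stands in for the Lambda invocations, and the function names, the toy scoring rule (counting how often a candidate sequence occurs in the genome), and the worker cap are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_occurrences(genome: str, target: str) -> int:
    """Count occurrences of target in genome, including overlapping ones."""
    count = 0
    start = genome.find(target)
    while start != -1:
        count += 1
        start = genome.find(target, start + 1)
    return count


def score_site(genome: str, position: int, length: int) -> tuple[int, int]:
    """One sub-task: scan the whole genome for a single candidate site.

    Hypothetical scoring rule: the number of times the candidate's
    sequence appears anywhere in the genome.
    """
    target = genome[position:position + length]
    return position, count_occurrences(genome, target)


def analyse_sites(genome: str, positions: list[int],
                  length: int = 4, max_workers: int = 64) -> dict[int, int]:
    """Fan out one sub-task per candidate site and collect the results.

    The number of workers grows with the number of candidate sites
    (up to an assumed cap), mimicking how more functions are recruited
    as the problem gets larger.
    """
    if not positions:
        return {}
    workers = min(len(positions), max_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: score_site(genome, p, length), positions)
        return dict(results)


if __name__ == "__main__":
    toy_genome = "ACGTACGTTTGA"
    print(analyse_sites(toy_genome, [0, 4, 8]))
```

In a cloud deployment each call to score_site would instead be a separate function invocation, so the wall-clock time stays roughly constant as long as enough functions can be recruited for the number of candidate sites.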

Dr Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s eHealth program. She has a PhD in Bioinformatics from the University of Queensland and held postdoctoral appointments in biological machine learning at the Institute for Molecular Bioscience and in genetics at the Queensland Brain Institute. Her expertise is in computational genome engineering and big data compute systems. She is involved in national and international initiatives, funded with $200M, that are tasked with integrating genomic information into medical practice. She has 30 peer-reviewed publications (14 as first or senior author), six of them in journals with IF > 8 (e.g. Nat Genet.), and an h-index of 11. To date she has attracted more than $6.5 million in funding as Chief Investigator.