Cloud Computing in Research

I have worked with cloud-based scientific computing since 2012, when running genomic analyses on commercial cloud infrastructure was still a fringe activity in academic research. Since then I have contributed to building, teaching, and deploying cloud-backed bioinformatics — across continents and across the boundary between research, training, and institutional adoption.

Early adoption (2012–2015)

I was part of the first wave of researchers moving production-scale genomics onto commercial cloud infrastructure. At Harvard Medical School, I helped design and operate the cloud-based computational environments used for high-throughput sequencing courses and applied research, and contributed to the COSMOS distributed-pipeline framework and its bioinformatics pipelines:

  • GenomeKey — GATK best-practices variant calling.
  • PV-Key — somatic tumour/normal variant calling.
  • MC-Key — multi-cloud implementation of GenomeKey, designed to run across providers.

COSMOS-based pipelines were subsequently deployed in clinical production at Invitae, processing hundreds of thousands of patient samples — one of the early concrete demonstrations that cloud-deployed genomic workflows could meet diagnostic-grade reliability and throughput requirements.

For the underlying tooling, see Software.

Capacity building — cloud-backed workshops

A consistent thread of my capacity-building work has been using cloud infrastructure to teach computational biology at scale in places where local high-performance computing is scarce. Without cloud-backed shared environments, these workshops would not have been possible at the scale they ran:

  • Introduction to Bioinformatics workshop series — Morocco, USA, Tanzania (Pan-African Bioinformatics Network — H3AbioNet, H3Africa), Australia.
  • Advanced Bioinformatics workshop series — same partner network.
  • Indonesia bioinformatics workshop — single-instance training delivered via cloud-hosted environments.

In China, I have given research talks and led a round table on the role of large-scale data infrastructure in genomics — that work is conversational rather than instructional, and is listed under Conferences and Invited Talks.

Capacity building — cloud-backed formal teaching

Cloud-hosted environments have also underpinned my formal teaching, allowing whole cohorts to run real genomic analyses without each student needing dedicated HPC access:

  • Harvard Medical School: High-throughput Sequencing (BMI714), 2014–2015. Cloud-deployed practicums.
  • The University of Adelaide — Population Genomics, Ancient DNA, and Graph Pangenomes courses (2020–2024). Cloud-backed teaching infrastructure.

For full course details, see Teaching.

Institutional adoption — Adelaide University

I played a key role in the adoption of AWS at Adelaide University (formerly the University of Adelaide) through the RONIN research-cloud platform — translating early-adopter genomic-cloud experience into an institutionally supported pathway for research groups across the university, in partnership with researchers, IT, and procurement.

A concrete downstream outcome: my PhD student Shyamsundar Ravishankar was sponsored by RONIN to present “Scalable genomic data processing on cloud” at the Australasian Genomic Technologies Association (AGTA) 2024 Annual Conference, where his talk on cloud-backed ancient DNA workflows won the best oral student presentation award.

Applied and advisory work

For commercial and government engagements where cloud-based genomic analysis has been a core deliverable, see Advisory and Consulting.