Cloud Computing in Research

I have worked with cloud-based scientific computing since 2012, when running genomic analyses on commercial cloud infrastructure was still a fringe activity in academic research. Since then I have contributed to building, teaching, and deploying cloud-backed bioinformatics — across continents and across the boundary between research, training, and institutional adoption.

Early adoption (2012–2015)

I was part of the first wave of researchers moving production-scale genomics onto commercial cloud infrastructure. At Harvard Medical School, I helped design and operate the cloud-based computational environments used for high-throughput sequencing courses and applied research, and contributed to the COSMOS distributed-pipeline framework and its bioinformatics pipelines:

  • GenomeKey — GATK best-practices variant calling.
  • PV-Key — somatic tumour/normal variant calling.
  • MC-Key — multi-cloud implementation of GenomeKey, designed to run across providers.

COSMOS-based pipelines were subsequently deployed in clinical production at Invitae, processing hundreds of thousands of patient samples — one of the early concrete demonstrations that cloud-deployed genomic workflows could meet diagnostic-grade reliability and throughput requirements.
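The core idea behind a distributed-pipeline framework like COSMOS is expressing an analysis as a dependency graph of tasks, so a scheduler can dispatch each stage to cloud workers as soon as its prerequisites finish. The sketch below is purely illustrative: a toy task graph with the shape of a GATK-style variant-calling workflow. The class and task names are hypothetical, not the actual COSMOS API.

```python
# Illustrative sketch only: a toy workflow graph in the spirit of a
# distributed-pipeline framework. Names are hypothetical, not COSMOS's API.
from collections import defaultdict, deque

class Workflow:
    def __init__(self):
        self.deps = defaultdict(list)   # task -> prerequisite tasks
        self.tasks = []

    def add_task(self, name, after=()):
        """Register a task that runs after the given prerequisites."""
        self.tasks.append(name)
        self.deps[name] = list(after)
        return name

    def run_order(self):
        """Topologically sort tasks: each runs only after its prerequisites."""
        indegree = {t: len(self.deps[t]) for t in self.tasks}
        children = defaultdict(list)
        for task, parents in self.deps.items():
            for p in parents:
                children[p].append(task)
        ready = deque(t for t in self.tasks if indegree[t] == 0)
        order = []
        while ready:
            t = ready.popleft()
            order.append(t)
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        return order

# A linear GATK-best-practices-shaped chain; real pipelines fan out
# per-sample and per-chromosome, which is where cloud parallelism pays off.
wf = Workflow()
align = wf.add_task("bwa_align")
dedup = wf.add_task("mark_duplicates", after=[align])
recal = wf.add_task("base_recalibration", after=[dedup])
call  = wf.add_task("haplotype_caller", after=[recal])
print(wf.run_order())
```

In a real deployment each task wraps a command line and the scheduler maps ready tasks onto a pool of cloud instances; the graph abstraction is what lets the same pipeline definition run on a laptop, a cluster, or multiple cloud providers.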

For the underlying tooling, see Software.

Capacity building — cloud-backed workshops

A consistent thread of my capacity-building work has been using cloud infrastructure to teach computational biology at scale in places where local high-performance computing is scarce. Without cloud-backed shared environments, these workshops would not have been possible at the scale they ran:

  • Introduction to Bioinformatics workshop series — Morocco, USA, Tanzania (Pan-African Bioinformatics Network — H3AbioNet, H3Africa), Australia.
  • Advanced Bioinformatics workshop series — same partner network.
  • Indonesia bioinformatics workshop — one-off training workshop delivered via cloud-hosted environments.

In China, I have given research talks and led a round table on the role of large-scale data infrastructure in genomics — that work is conversational rather than instructional, and is listed under Conferences and Invited Talks.

Capacity building — cloud-backed formal teaching

Cloud-hosted environments have also underpinned my formal teaching, allowing whole cohorts to run real genomic analyses without each student needing dedicated HPC access:

  • Harvard Medical School — High-throughput Sequencing (BMI714), 2014–2015. Cloud-deployed practicums.
  • The University of Adelaide — Population Genomics, Ancient DNA, and Graph Pangenomes courses (2020–2024). Cloud-backed teaching infrastructure.

For full course details, see Teaching.

Institutional adoption — Adelaide University

I played a key role in the adoption of AWS at Adelaide University (formerly the University of Adelaide), through the Ronin research-cloud platform. This involved working with researchers, IT, and procurement to translate early-adopter genomic-cloud experience into an institutionally supported pathway for research groups across the university.

[Specific scope, dates, and counterparts to be added.]

Applied and advisory work

For commercial and government engagements where cloud-based genomic analysis has been a core deliverable, see Advisory and Consulting.