Cloud Computing in Research
I have worked with cloud-based scientific computing since 2012, when running genomic analyses on commercial cloud infrastructure was still a fringe activity in academic research. Since then I have contributed to building, teaching, and deploying cloud-backed bioinformatics — across continents and across the boundary between research, training, and institutional adoption.
Early adoption (2012–2015)
I was part of the first wave of researchers moving production-scale genomics onto commercial cloud infrastructure. At Harvard Medical School, I helped design and operate the cloud-based computational environments used for high-throughput sequencing courses and applied research, and contributed to the COSMOS distributed-pipeline framework and its bioinformatics pipelines:
- GenomeKey — GATK best-practices variant calling.
- PV-Key — somatic tumour/normal variant calling.
- MC-Key — multi-cloud implementation of GenomeKey, designed to run across providers.
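The design idea behind a multi-cloud pipeline like MC-Key can be illustrated with a minimal provider-abstraction sketch. Everything here is hypothetical for illustration only: the class names (`CloudProvider`, `AWSProvider`, `GCPProvider`), the `submit` method, and the `run_stage` helper are not MC-Key's or COSMOS's actual API.

```python
# Hypothetical sketch of a provider-abstraction layer: the same pipeline
# stage is dispatched unchanged to each configured cloud backend.
# All names are illustrative, not the real MC-Key/COSMOS interfaces.

class CloudProvider:
    """Minimal interface a cloud backend must implement."""
    name = "abstract"

    def submit(self, command: str) -> str:
        raise NotImplementedError

class AWSProvider(CloudProvider):
    name = "aws"

    def submit(self, command: str) -> str:
        # A real backend would launch an instance or batch job here.
        return f"[{self.name}] queued: {command}"

class GCPProvider(CloudProvider):
    name = "gcp"

    def submit(self, command: str) -> str:
        return f"[{self.name}] queued: {command}"

def run_stage(providers, command):
    """Dispatch one pipeline stage to every configured provider."""
    return [p.submit(command) for p in providers]

if __name__ == "__main__":
    for receipt in run_stage([AWSProvider(), GCPProvider()],
                             "call-variants sample.bam"):
        print(receipt)
```

The point of the pattern is that pipeline logic stays provider-agnostic: adding a new cloud means implementing one small backend class, not rewriting the workflow.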
COSMOS-based pipelines were subsequently deployed in clinical production at Invitae, processing hundreds of thousands of patient samples. This was one of the early concrete demonstrations that cloud-deployed genomic workflows could meet diagnostic-grade reliability and throughput requirements.
For the underlying tooling, see Software.
Capacity building — cloud-backed workshops
A consistent thread of my capacity-building work has been using cloud infrastructure to teach computational biology at scale in places where local high-performance computing is scarce. Without cloud-backed shared environments, these workshops would not have been possible at that scale:
- Introduction to Bioinformatics workshop series — Morocco, USA, Tanzania (Pan-African Bioinformatics Network — H3AbioNet, H3Africa), Australia.
- Advanced Bioinformatics workshop series — same partner network.
- Indonesia bioinformatics workshop — a one-off training delivered via cloud-hosted environments.
In China, I have given research talks and led a round table on the role of large-scale data infrastructure in genomics — that work is conversational rather than instructional, and is listed under Conferences and Invited Talks.
Capacity building — cloud-backed formal teaching
Cloud-hosted environments have also underpinned my formal teaching, allowing whole cohorts to run real genomic analyses without each student needing dedicated HPC access:
- Harvard Medical School — High-throughput Sequencing (BMI714), 2014–2015. Cloud-deployed practicums.
- The University of Adelaide — Population Genomics, Ancient DNA, and Graph Pangenomes courses (2020–2024). Cloud-backed teaching infrastructure.
For full course details, see Teaching.
Institutional adoption — Adelaide University
I played a key role in the adoption of AWS at Adelaide University (formerly the University of Adelaide), through the Ronin research-cloud platform. This involved working with researchers, IT, and procurement to translate early-adopter genomic-cloud experience into an institutionally supported pathway for research groups across the university.
[Specific scope, dates, and counterparts to be added.]
Applied and advisory work
For commercial and government engagements where cloud-based genomic analysis has been a core deliverable, see Advisory and Consulting.