New Publication: Diploid Dual Assemblies Reveal the Telocentric Structure and Allelic Heterogeneity of Canine Genomes
We’re delighted to share our latest publication in NAR Genomics and Bioinformatics, led by Jeffrey M. Kidd. The study presents phase-resolved diploid dual assemblies from five canines, giving a detailed view of canine genome architecture, centromere organisation, and the extensive allelic heterogeneity that shapes dog genomes.
By combining long-read sequencing with Hi-C-based scaffolding, we reconstructed both haplotypes of each individual at exceptional contiguity, with over half of all chromosomes recovered as single contigs and 21 gap-free chromosomes in the Australian Cattle Dog assembly. These resources allowed us to revisit long-standing questions about canine centromere structure and to construct a pangenome graph that captures variation invisible to single-reference approaches.
Key Findings
- High-quality diploid assemblies: Phase-resolved dual assemblies for five canines, with 21 gap-free chromosomes in the Australian Cattle Dog
- Telocentric confirmation: 27 centromeres analysed, beginning ~59 kb from chromosome starts and flanked by a conserved ~35 kb subtelomeric repeat segment shared across autosomes
- Pangenome insights: A 10-haplotype pangenome graph reveals that short tandem repeats are ~3× more common than VNTRs, with extensive nested allelic variation
- Structural variation at scale: 145,294 insertion/deletion structural variants identified across the cohort
- Mobile element activity: 52% of SVs are SINEC insertions; new SINEC insertions occur in roughly 1 in 15 births, and LINE-1 insertions in roughly 1 in 117 births
- Active retrotransposition: Identification of segregating full-length LINE-1 elements capable of ongoing retrotransposition
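To put the headline numbers in context, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted in the list above; the variable names are our own, and because the inputs are rounded ("52%", "roughly 1 in 15"), the results are approximate rather than values reported in the paper.

```python
# Figures quoted in the Key Findings list (rounded in the source).
total_svs = 145_294      # insertion/deletion structural variants across the cohort
sinec_fraction = 0.52    # share of SVs that are SINEC insertions
sinec_births = 15        # roughly one new SINEC insertion per 15 births
line1_births = 117       # roughly one new LINE-1 insertion per 117 births

# Approximate number of SVs attributable to SINEC insertions.
approx_sinec_svs = round(total_svs * sinec_fraction)

# Per-birth rates of new insertions, and how much more often
# a new SINEC insertion arises than a new LINE-1 insertion.
sinec_rate = 1 / sinec_births
line1_rate = 1 / line1_births
rate_ratio = sinec_rate / line1_rate

print(f"Approximate SINEC insertion SVs: {approx_sinec_svs:,}")
print(f"New SINEC insertions per birth:  {sinec_rate:.3f}")
print(f"New LINE-1 insertions per birth: {line1_rate:.4f}")
print(f"SINEC-to-LINE-1 rate ratio:      {rate_ratio:.1f}")
```

On these rounded inputs, SINEC insertions account for roughly 75,000 of the structural variants, and new SINEC insertions arise about eight times as often as new LINE-1 insertions.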
Significance
Dogs are among the most genetically diverse mammalian species and a key model for studying disease, behaviour, and domestication. Single-reference genomes have long obscured the true scale of allelic variation across breeds. The diploid dual assemblies and pangenome graph released here provide a much richer substrate for variant discovery, association studies, and comparative genomics, and they settle outstanding questions about the telocentric organisation of canine chromosomes.
The frequency of new SINEC and LINE-1 insertions also has direct implications for understanding mutational processes in dogs and the contribution of mobile elements to phenotypic and disease variation across breeds.
Data Availability
- Raw sequencing data: NCBI SRA, BioProject PRJNA1285458
- Assemblies: GenBank accessions JBPXEE000000000–DBJLPD000000000
- Pangenome VCF: Zenodo 10.5281/zenodo.15881717
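For readers who want to script access to these records, the sketch below turns the identifiers listed above into landing-page links. It assumes the standard DOI resolver (`https://doi.org/`) and the usual NCBI BioProject URL pattern; the helper names `doi_url` and `bioproject_url` are hypothetical, and the code only builds the links without downloading anything.

```python
# Base URLs: the public DOI resolver and the NCBI BioProject landing-page pattern.
DOI_RESOLVER = "https://doi.org/"
BIOPROJECT_BASE = "https://www.ncbi.nlm.nih.gov/bioproject/"


def doi_url(doi: str) -> str:
    """Return the resolver link for a DOI (hypothetical helper)."""
    return DOI_RESOLVER + doi


def bioproject_url(accession: str) -> str:
    """Return the BioProject landing page for an accession (hypothetical helper)."""
    return BIOPROJECT_BASE + accession


# Identifiers copied from the Data Availability list and the paper citation.
links = {
    "raw_reads": bioproject_url("PRJNA1285458"),
    "pangenome_vcf": doi_url("10.5281/zenodo.15881717"),
    "paper": doi_url("10.1093/nargab/lqag035"),
}

for name, url in links.items():
    print(f"{name}: {url}")
```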
📄 Read the Full Open-Access Paper
Journal: NAR Genomics and Bioinformatics
Volume 8, Issue 2
Published: 23 April 2026
DOI: 10.1093/nargab/lqag035