A versatile multi-component mixed model for bacterial genome-wide association studies

Arthur Frouin


Date
30 June 2026

Genome-wide association studies (GWAS) have played a crucial role in uncovering the genetics underlying complex human traits. Recently, there has been considerable interest in adapting GWAS-like methodologies to investigate pathogenic bacteria. Despite the variety of methods proposed, it remains unclear how best to model the intricate population structures found in bacterial cohorts. In this study, we analyze the genetic architecture of whole-genome sequencing data from three distinct bacterial species and show that the standard models used in human genetics, which existing bacterial GWAS methods typically adopt, fall short when applied to organisms with highly structured genomes. Building on these findings, we introduce ChoruMM, a robust and powerful multi-component linear mixed model whose components are inferred by hierarchical clustering of the bacterial genetic relatedness matrix. Extensive simulations show that our approach reduces false positives while maintaining, or even improving, detection rates compared to current pipelines. The ChoruMM package also includes post-processing and visualization tools designed to address the prevalent issue of long-range correlations in bacterial genomes, enabling accurate assessment and calibration of type I error rates.
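To make the idea of inferring components from the relatedness matrix concrete, here is a minimal Python sketch. It is not the ChoruMM implementation or its API. It assumes that a genetic relatedness matrix (GRM) is computed from standardized binary genotypes, that the GRM is turned into a distance and clustered with average linkage, and that each cluster contributes one block-restricted kernel, so that a mixed model of the form y = Xb + g_1 + ... + g_K + e, with g_k ~ N(0, s_k^2 K_k), could be fitted on them. The function names, the linkage method, the number of clusters, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def genetic_relatedness_matrix(genotypes):
    """GRM from an (n_samples, n_variants) 0/1 genotype matrix (assumed encoding)."""
    std = genotypes.std(axis=0)
    keep = std > 0  # drop monomorphic variants, which cannot be standardized
    z = (genotypes[:, keep] - genotypes[:, keep].mean(axis=0)) / std[keep]
    return z @ z.T / z.shape[1]


def cluster_components(grm, n_clusters):
    """Cluster samples on the GRM and build one block kernel per cluster."""
    # Turn similarities into distances: d_ij^2 = K_ii + K_jj - 2 K_ij.
    diag = np.diag(grm)
    dist = np.sqrt(np.clip(diag[:, None] + diag[None, :] - 2.0 * grm, 0.0, None))
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)

    # Average-linkage hierarchical clustering, cut into at most n_clusters groups.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Each kernel keeps relatedness only between samples of the same cluster.
    kernels = []
    for c in np.unique(labels):
        mask = (labels == c).astype(float)
        kernels.append(grm * np.outer(mask, mask))
    return labels, kernels


# Toy cohort (simulated, not real data): two clades of 20 isolates each,
# with mirrored allele frequencies so the population is strongly structured.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, size=300)
freq_b = 1.0 - freq_a
genotypes = np.vstack([
    rng.binomial(1, freq_a, size=(20, 300)),
    rng.binomial(1, freq_b, size=(20, 300)),
]).astype(float)

grm = genetic_relatedness_matrix(genotypes)
labels, kernels = cluster_components(grm, n_clusters=2)
print("cluster sizes:", np.bincount(labels)[1:])
```

In this sketch the kernels together reproduce the within-cluster part of the GRM, and a variance-component fit would then estimate one variance per kernel. Whether the components should be nested, how many there should be, and how the tree is cut are modelling decisions that the abstract does not specify.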