Pulse Brain · Growing Health Evidence Index
Tier 3 · Observational / field trial · Peer-reviewed

Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis

Simon Haworth, Ruth E. Mitchell, Laura J. Corbin, Kaitlin H. Wade, Tom Dudding, Ashley Budu‐Aggrey, David Carslake, Gibran Hemani, Lavinia Paternoster, George Davey Smith, Neil M Davies, Daniel J. Lawson, Nicholas J. Timpson

Nature Communications · 2019


Summary

This study identifies a methodological concern for large-scale genetic epidemiology: latent geographic structure within the UK Biobank cohort produces coincident variation in genotypes and health outcomes, which can bias causal inference from genetic data. The authors show that routine statistical adjustments, including adjustment for study centre and for principal components derived from genotype data, do not fully account for this geographic confounding. The findings underline the need to recognise and account for population substructure when inferring genetic contributions to complex traits from large biobank studies.

UK applicability

This paper applies directly to UK research using UK Biobank data and is relevant to other large British cohort studies. For researchers conducting genetic epidemiology or Mendelian randomisation studies in the United Kingdom, it indicates that routine adjustment for study centre and principal components may not be enough, and that geographic structure needs more careful handling when analysing UK Biobank and similar resources.

Key measures

Association between genetic variants and birth location; geographic structure in health outcomes; bias in genotype-phenotype associations attributable to latent geographic structure

Outcomes reported

The study demonstrated that single genetic variants and genetic scores are associated with birth location within UK Biobank, and that geographic structure in genotype data cannot be adequately accounted for using routine statistical adjustments. The analysis revealed that major health outcomes appear geographically structured, and that coincident structure in health outcomes and genotype data can yield biased associations in epidemiological inference.
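
The mechanism described above can be illustrated with a toy simulation. The sketch below is not the authors' analysis and uses no UK Biobank data: the one-dimensional "geography" variable, the allele-frequency cline, the number of centres, and all effect sizes are hypothetical values chosen only to make the pattern visible. The genotype has no causal effect on the outcome, yet an unadjusted regression finds one; adjusting for a coarse centre grouping shrinks the estimate without removing it, and only adjustment for the exact latent location brings it close to zero.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residualise(values, covariate):
    """Residuals of values after a linear regression on one covariate."""
    b = slope(covariate, values)
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    return [v - mean_v - b * (c - mean_c) for v, c in zip(values, covariate)]


def demean_within(values, groups):
    """Subtract each group's mean (equivalent to adjusting for group dummies)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n=20000, n_centres=3, geo_effect=5.0, seed=1):
    """Toy model; every parameter here is a hypothetical illustration value."""
    rng = random.Random(seed)
    # Latent location on a 0..1 axis (assumed stand-in for birth location).
    geo = [rng.random() for _ in range(n)]
    # Allele frequency drifts with location; genotype counts 0, 1 or 2 alleles.
    genotype = []
    for g in geo:
        p = 0.2 + 0.6 * g
        genotype.append((rng.random() < p) + (rng.random() < p))
    # Outcome depends on location only: the true genotype effect is zero.
    outcome = [geo_effect * g + rng.gauss(0.0, 1.0) for g in geo]
    # Coarse grouping of location, standing in for study centre.
    centre = [min(int(g * n_centres), n_centres - 1) for g in geo]

    naive = slope(genotype, outcome)
    centre_adjusted = slope(demean_within(genotype, centre),
                            demean_within(outcome, centre))
    geo_adjusted = slope(residualise(genotype, geo),
                         residualise(outcome, geo))
    return {"naive": naive,
            "centre_adjusted": centre_adjusted,
            "geo_adjusted": geo_adjusted}


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

In real data the latent structure is not observed directly, which is why the last estimate is unavailable in practice and why residual bias after coarse adjustment matters.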

Theme: Measurement & metrics
Subject: Measurement methods & nutrient profiling
Study type: Research
Study design: Observational cohort
Source type: Peer-reviewed study
Status: Published
Geography: United Kingdom
System type: Human clinical
DOI: 10.1038/s41467-018-08219-1
Catalogue ID: BFmor3gaas-2njagk
