Psychology of Individual Differences
What the science actually says about psychological variation — heritability, environmental shaping, gene-environment interaction, sex differences, cognitive capacity. A minefield of motivated reasoning where the actual generating functions are obscured by politics.
This is the first topic taken end-to-end through the LLM Iterate pipeline, with all six stages completed: lit review (the landscape), topology (dependency graph of what depends on what), model (variance decomposition + closed-form pieces + interactive dashboard), data (eight predictions tested against published consortium estimates), build (interactive explorer with 24 traits), and writeup (long-form synthesis for an educated lay reader).
The headline finding is that heritability is real, replicated, and substantial across most psychological traits — but a sizable fraction of what gets called “genetic” in twin studies is actually environmental in origin, mediated through parents who transmit both the alleles AND the correlated rearing environment (a phenomenon called genetic nurture). Direct biological causation is genuine and important; it’s also typically smaller than the headline numbers suggest, especially for socially-structured traits like educational attainment, where the cleanest estimate of direct genetic effect is about one-third of what classical twin studies report.
If you want the full synthesis in prose, read the writeup — it’s the canonical end-to-end piece, written for an educated lay reader with all acronyms defined. If you want a hands-on tool, the explorer lets you pick any of two dozen traits and see the variance breakdown in three plain-language buckets (direct genes / family setup / environment + chance), plus the four motivated-reasoning traps the field gets caught in and the asymmetric environmental-effects finding. The model and data stages have the formal math and the empirical tests behind those numbers; the topology and lit review document the dependency structure and the underlying literature.
TLDR
Virtually every measured psychological trait is moderately to substantially heritable, hyper-polygenic, and shaped by environments that are themselves partly genetic in origin. The post-2010 genomic era confirmed the core findings of mid-20th-century behavior genetics while simultaneously demolishing the candidate-gene paradigm that dominated psychiatry from 1996–2010. Mean twin heritability across all measured human traits is ~49% (Polderman et al. 2015), and psychological traits fall broadly in that moderate-to-substantial range; genome-wide association studies (GWAS) increasingly account for this through thousands of tiny-effect common variants plus rarer large-effect variants in neurodevelopmental conditions.
A crucial methodological development since 2018 is the recognition that assortative mating and gene-environment correlation systematically inflate GWAS-derived estimates. Border et al. (2022, Science) showed that cross-trait assortative mating alone can account for substantial fractions of reported genetic correlations — including some psychiatric cross-disorder correlations previously attributed to shared biology. Kong et al.’s (2018) “genetic nurture” finding demonstrated that roughly half of population-level polygenic score prediction for educational attainment reflects environmentally-mediated parental effects, not direct genetic causation. These corrections don’t eliminate genetic influence — they reframe it.
The findings most disputed in public discourse are not the ones the field itself still contests: heritability is settled science, the “parenting wars” are largely resolved, and the candidate-gene-by-environment literature has collapsed. What remains genuinely open is mechanistic: how genes build minds, why the gender equality paradox exists, what drives the Flynn Effect’s reversal, and whether between-population mean differences have any genetic component (a question currently unanswerable with available methods, not “settled” in either direction). The generating function for psychological variation is not “genes vs. environment” but a tightly coupled developmental system in which genetic predispositions, environments created by genetically-similar parents, assortative mating patterns, stochastic noise, and cultural context are deeply entangled.
This document is structured for someone building a formal model of psychological variation. Each section flags effect sizes, replication status, consensus, live debate, and ideological distortion from any direction.
1. Heritability: The Foundation Finding
The Polderman meta-analysis
Polderman et al. (2015, Nature Genetics) meta-analyzed 50 years of twin research (17,804 traits, 14.5 million twin pairs, 2,748 publications) and reported a mean heritability across all human traits of 49%. Polderman et al. 2015. For ~69% of traits, the twin correlations are consistent with a simple, purely additive genetic model (MZ correlation about twice the DZ correlation), with no need for shared-environment or dominance terms.
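The classical twin logic behind these estimates can be sketched with Falconer’s formulas, a method-of-moments shortcut to the ACE decomposition (modern studies fit structural equation models instead). The sketch assumes equal environments for MZ and DZ twins, no assortative mating, and no dominance; the twin correlations in the usage line are hypothetical, chosen so that rMZ = 2·rDZ, the purely additive pattern Polderman et al. report for most traits.

```python
from dataclasses import dataclass


@dataclass
class ACE:
    a2: float  # additive genetic share (h²)
    c2: float  # shared-environment share
    e2: float  # non-shared environment + measurement error


def falconer(r_mz: float, r_dz: float) -> ACE:
    """Falconer's method-of-moments estimates from twin correlations.

    Assumes MZ twins share 100% and DZ twins 50% of additive genetic
    variance, both twin types share rearing environment equally (EEA),
    no assortative mating, and no dominance.
    """
    a2 = 2 * (r_mz - r_dz)  # extra MZ similarity is attributed to genes
    c2 = 2 * r_dz - r_mz    # twin similarity not explained by genes
    e2 = 1 - r_mz           # whatever makes MZ twins differ
    return ACE(a2, c2, e2)


# Hypothetical correlations with r_mz = 2 * r_dz (purely additive pattern).
est = falconer(r_mz=0.50, r_dz=0.25)
print(f"h2={est.a2:.2f}  c2={est.c2:.2f}  e2={est.e2:.2f}")
```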
Turkheimer’s Laws and the Fourth Law
Turkheimer’s Three Laws (2000) — all human behavioral traits are heritable; shared family environment is smaller than genes; substantial variance is explained by neither — were extended by Chabris et al. (2015) with the Fourth Law: a typical behavioral trait is associated with very many genetic variants of tiny effect. This emerged from the failure of candidate-gene studies and the polygenic architecture revealed by GWAS.
What heritability actually means (and doesn’t)
Heritability is a population statistic, not an individual one. Saying IQ is 70% heritable does not mean 70% of any person’s IQ comes from genes. It is not deterministic (height is ~80% heritable yet rose ~10 cm in 20th-century Europe through nutrition) and not immutable (h² changes with environment: if all environments became identical, genetic differences would be the only systematic source of variance left and h² would approach 1.0). The most common misinterpretation collapses statistical variance partitioning into causal mechanism.
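Because h² is a ratio of variances, the same genetic differences can yield very different heritabilities depending on how much environments vary. A minimal sketch, assuming a simple additive model with no gene-environment correlation or interaction (the variance values are arbitrary illustrative units):

```python
def heritability(v_genetic: float, v_environment: float) -> float:
    """h² = V_G / (V_G + V_E) under an additive model with no rGE or GxE."""
    return v_genetic / (v_genetic + v_environment)


V_G = 1.0  # genetic variance held fixed throughout (arbitrary units)
for v_e in (3.0, 1.0, 0.25, 0.01):
    # Shrinking environmental variance raises h² without changing any genes.
    print(f"V_E={v_e:<5} h2={heritability(V_G, v_e):.2f}")
```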
Twin and molecular estimates by domain
| Domain | Twin h² | SNP-h² | Largest GWAS (N, study) | Loci | Best PGS R² |
|---|---|---|---|---|---|
| Adult IQ / g | 0.70–0.80 | ~0.20 | 269,867 (Savage 2018) | 205 | ~0.05 |
| Educational attainment | ~0.40 | ~0.13 | 3M (Okbay 2022) | 3,952 | 0.12–0.16 |
| Big Five (avg) | 0.40–0.60 | 0.05–0.18 | 449k (Nagel 2018) | 136 (neuroticism) | <0.05 |
| Political orientation | ~0.40 | — | — | — | — |
| Religiosity | 0.30–0.45 | — | — | — | — |
| Risk tolerance | ~0.30 | 0.05 | 1M (Karlsson Linnér 2019) | 99 | <0.02 |
| Schizophrenia | 0.60–0.80 | 0.24 | 320k (Trubetskoy 2022) | 287 | 0.07–0.10 |
| Bipolar | 0.70–0.85 | 0.18–0.20 | 414k (Mullins 2021) | 64 | 0.04 |
| MDD | 0.35–0.40 | 0.09 | 807k (Howard 2019) | 102 | 0.02–0.03 |
| ADHD | 0.74 | 0.14 | 225k (Demontis 2023) | 27 | 0.04–0.06 |
| Autism | 0.80 | 0.12 | 46k (Grove 2019) | 5 | <0.03 |
Note: Political orientation and religiosity are included because they are among the few adult traits where shared family environment (C) remains substantial (~20–30%), unlike personality and cognition where C ≈ 0 by adulthood. See Alford, Funk & Hibbing (2005); Hatemi et al. (2014).
The Wilson Effect
Bouchard (2013) documented that IQ heritability rises with age, from ~20% at age 5 to ~80% by adulthood, with shared-environment effects dropping from ~55% in early childhood to roughly zero by adolescence. Bouchard 2013. Briley & Tucker-Drob (2013) meta-analyzed longitudinal twin and adoption studies and found that much of this rise reflects amplification of early genetic effects across development, a pattern consistent with gene-environment correlation (niche-picking). Briley & Tucker-Drob 2013. This finding is robust, replicated, and counterintuitive: genetic differences become more expressed as people age into self-selected environments.
The “missing heritability” problem
The gap between twin h² (~0.70 for IQ) and SNP-h² (~0.20) launched a decade of debate. Wainschtein et al. (2022, Nature Genetics) essentially closed it for height: using whole-genome sequencing in 25,465 unrelated individuals, h² recovered to 0.68 when rare and low-LD variants were included. Wainschtein et al. 2022. For psychological traits the same pattern is emerging. The current synthesis: missing heritability is partly real (rare variants, dominance, GxE) and partly artifactual (twin overestimation from assortative mating and rGE, measurement noise).
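To make the gap concrete, the sketch below divides the molecular h² by twin h² for several rows of the table above, using range midpoints where the table gives a range (a simplifying choice of ours, not a published estimate). The height row uses the ~80% twin figure cited earlier and the 0.68 whole-genome estimate from Wainschtein et al.

```python
# (twin h², SNP-based or WGS-based h²); midpoints used where the table gives ranges.
ESTIMATES = {
    "Adult IQ / g": (0.75, 0.20),
    "Educational attainment": (0.40, 0.13),
    "Schizophrenia": (0.70, 0.24),
    "Bipolar": (0.775, 0.19),
    "MDD": (0.375, 0.09),
    "ADHD": (0.74, 0.14),
    "Autism": (0.80, 0.12),
    "Height (WGS)": (0.80, 0.68),
}


def captured_fraction(twin_h2: float, molecular_h2: float) -> float:
    """Share of twin heritability recovered by the molecular estimate."""
    return molecular_h2 / twin_h2


for trait, (twin, mol) in ESTIMATES.items():
    print(f"{trait:<24} {captured_fraction(twin, mol):.0%} of twin h2 recovered")
```

With these midpoints, the SNP-based rows recover roughly 15–35% of twin h², while whole-genome sequencing recovers about 85% for height.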
Assortative mating: a pervasive inflation source
Assortative mating (AM) — the tendency for partners to resemble each other on traits — has emerged as a major methodological concern. People mate assortatively on education (spousal r ≈ 0.40–0.60), IQ (~0.40), personality (~0.10–0.20), height (~0.20), and psychiatric conditions. AM has three consequences for genetic estimates:
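- Distorted heritability estimates: AM increases additive genetic variance across generations by creating linkage disequilibrium among causal variants. Counterintuitively, most twin studies underestimate heritability by ignoring AM, because AM raises DZ twins’ genetic similarity above the assumed 50% and the classical model absorbs that excess into shared environment (C). GWAS-based SNP-h², by contrast, may be inflated by AM-induced LD. Border et al. 2022, Nat Commun.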
- Inflated genetic correlations: Border et al. (2022, Science) introduced cross-trait assortative mating (xAM) and showed that phenotypic cross-mate correlations explain R² = 74% of the variance in reported genetic correlation estimates. Some psychiatric cross-disorder genetic correlations — previously interpreted as evidence of shared biology — may be largely or entirely attributable to xAM. Border et al. 2022.
- Inflated PGS prediction: Within-family PGS effects are roughly half of population-level effects for educational attainment (Okbay 2022), partly because AM and population stratification inflate between-family comparisons.
Plomin (2022, Behav Genet) argues this is a prediction-vs-explanation distinction: AM inflates causal genetic estimates but doesn’t invalidate PGS as predictors, since AM-induced variance is real population variance. Plomin 2022. This is technically correct but sidesteps the question of why PGS predict — whether through direct genetic causation or through correlated environments created by assortatively-mating parents.
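The counterintuitive twin-study claim in the first bullet above can be checked with a short calculation. At equilibrium, if mates’ additive genetic values correlate at m, the additive genetic correlation of DZ twins rises from 0.5 to 0.5·(1 + m), while MZ twins stay at 1. A sketch, assuming no true shared environment, no dominance, and equal environments; the values of a² and m are hypothetical:

```python
def twin_correlations(a2: float, m: float, c2: float = 0.0) -> tuple[float, float]:
    """Expected MZ and DZ phenotypic correlations under equilibrium AM.

    a2: true additive genetic share of phenotypic variance.
    m:  correlation between mates' additive genetic values (0 = random mating).
    c2: true shared-environment share.
    """
    r_mz = a2 + c2
    r_dz = 0.5 * (1 + m) * a2 + c2  # AM pushes DZ genetic similarity above 50%
    return r_mz, r_dz


def naive_falconer(r_mz: float, r_dz: float) -> tuple[float, float]:
    """Falconer estimates (h2, c2) that wrongly assume random mating."""
    return 2 * (r_mz - r_dz), 2 * r_dz - r_mz


true_a2 = 0.60
for m in (0.0, 0.2, 0.4):
    h2_hat, c2_hat = naive_falconer(*twin_correlations(true_a2, m))
    # Algebraically, h2_hat = a2 * (1 - m) and c2_hat = a2 * m.
    print(f"m={m:.1f}: estimated h2={h2_hat:.2f}, spurious c2={c2_hat:.2f}")
```

The genetic variance the naive model misses reappears as spurious shared environment, so correcting for AM would be expected to raise, not lower, classical twin estimates.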
The genetic nurture revolution
Kong et al. (2018, Science) used 21,637 Icelandic probands with parental genotypes to compute polygenic scores from non-transmitted parental alleles. The non-transmitted PGS predicted offspring educational attainment at ~30% the magnitude of the transmitted PGS, meaning parental genotypes shape children via the environments they create, even for alleles never inherited. Kong et al. 2018. Okbay et al. (2022, EA4, Nature Genetics), whose GWAS included 3 million people, confirmed this with within-family analyses: direct effects are roughly half the population-level PGS magnitude. Okbay et al. 2022. The implication: GWAS effect sizes for socially-valued traits are inflated by indirect/dynastic effects, and roughly half of what we used to call “genetic transmission” is actually environmentally mediated by genetically-similar parents.
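Kong et al.’s ~30% figure is a ratio of regression coefficients, whereas “share of prediction” is usually read as variance explained, which scales with the square of a coefficient. A back-of-envelope sketch, assuming no assortative mating or population stratification (so the population PGS coefficient equals direct plus indirect effects) and equal PGS variance across components; the function names are ours:

```python
def direct_fraction_of_coefficient(nt_over_t: float) -> float:
    """Direct share of the population PGS coefficient.

    nt_over_t: non-transmitted / transmitted coefficient ratio. The
    non-transmitted coefficient captures only the indirect (nurture) effect,
    while the transmitted one captures direct + indirect.
    """
    return 1.0 - nt_over_t


def direct_fraction_of_r2(nt_over_t: float) -> float:
    """Variance explained scales with the squared coefficient."""
    return direct_fraction_of_coefficient(nt_over_t) ** 2


ratio = 0.30  # Kong et al. (2018): non-transmitted effect ~30% of transmitted
print(f"direct share of coefficient: {direct_fraction_of_coefficient(ratio):.2f}")
print(f"direct share of R^2:         {direct_fraction_of_r2(ratio):.2f}")
```

Under these assumptions, a non-transmitted effect of ~30% leaves roughly half of the population-level R² attributable to direct genetic effects.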
2. Environmental Shaping: Real, But Smaller and Weirder Than Common Sense Suggests
The shared-vs-non-shared distinction is the single most disorienting finding for laypeople. Across hundreds of twin and adoption studies, the shared family environment (C) accounts for ~0% of variance in adult personality and most adult cognition (Plomin & Daniels 1987; Bouchard & McGue 2003). Parental warmth, parenting style, dinner conversations, books in the home — once genetic transmission is controlled, almost none of this leaves a measurable trace on adult personality. Important exceptions where C remains substantial: educational attainment (~20%), antisocial behavior, religiosity (~25%), political orientation (~20–30%), and childhood (but not adult) externalizing.
What “non-shared environment” actually is
Turkheimer & Waldron’s (2000) meta-analysis of measured non-shared environmental predictors found these accounted for only ~2% of variance in outcomes. Plomin’s recent verdict: non-shared environment is “real but largely random,” more akin to stochastic developmental noise — differential peer experiences, illness, idiosyncratic events, measurement error — than systematic experience.
The Equal Environments Assumption
Critics (Joseph, Charney; Fosse et al. 2015) note that MZ twins are treated more similarly than DZ twins. The central empirical defense: Kendler et al. (1993) showed misperceived-zygosity twins had phenotypic similarity tracking true zygosity. Kendler et al. 1993. MZ-reared-apart correlations (Bouchard’s Minnesota study) closely match MZ-reared-together correlations. Felson (2014) reanalysis: EEA is “not strictly valid, but bias is modest.” Modern SNP-based heritability estimates entirely bypass EEA and give somewhat lower but still substantial h². Verdict: EEA is approximately valid; bias is modest (~10–20% inflation) for most traits.
Environmental factors with robust causal effects on cognition
A small number of environmental insults have large, replicated, causal effects — typically asymmetric (removing severe deficits matters more than enrichment above normal):
- Lead exposure: Lanphear et al. (2005) pooled 7 prospective cohorts: a rise in blood lead from 1 to 10 µg/dL was associated with a 6.2-point IQ decline, with a steeper slope at low concentrations. Causal status: strong.
- Severe iodine deficiency: 8–12 IQ point cost; supplementation recovers ~8.7 points (Bougma 2013; Qian 2005). RCT-supported, globally replicated.
- Heavy prenatal alcohol: FAS produces mean IQ ~70; Mendelian-randomization confirms causation.
- Schooling: Ritchie & Tucker-Drob (2018) meta-analyzed 142 effect sizes from ~600,000 participants across three quasi-experimental designs. Each year of education raises IQ by 1–5 points (mean ~3.4), persisting into old age. It is the most consistent durable IQ-raising intervention identified. Ritchie & Tucker-Drob 2018.
- Air pollution (PM2.5): ~−0.27 IQ points per 1 µg/m³ (Aghaei 2024). The per-unit effect is small (and not directly comparable to lead’s, since the units differ), but exposure is widespread.
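The “steeper slope at low concentrations” for lead is what a log-linear dose-response implies. A minimal sketch, assuming IQ loss is proportional to the natural log of blood lead and calibrating the slope to the 6.2-point loss between 1 and 10 µg/dL quoted above (the calibration and function name are ours, not coefficients taken from Lanphear et al.):

```python
import math

# Calibrate: 6.2 IQ points lost over ln(10 / 1), about 2.69 points per log unit.
BETA = 6.2 / math.log(10 / 1)


def iq_loss(low_ugdl: float, high_ugdl: float) -> float:
    """IQ points lost when blood lead rises from low to high (µg/dL)."""
    return BETA * math.log(high_ugdl / low_ugdl)


# The same +1 µg/dL costs far more near the bottom of the range.
for low in (1, 5, 9):
    print(f"{low}->{low + 1} µg/dL: {iq_loss(low, low + 1):.2f} IQ points")
```

Under this model the first microgram costs several times more IQ than the tenth.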
The Scarr-Rowe interaction (SES × heritability)
Turkheimer et al. (2003) reported that in impoverished families, IQ heritability was ~10% with shared environment ~60%; in affluent families this reversed. Turkheimer et al. 2003. Tucker-Drob & Bates (2016) meta-analysis: replicated in U.S. samples but absent in Western European/Australian samples — likely because more universal healthcare/education reduces environmental variance at the bottom. Status: real but context-dependent and U.S.-specific.
Parenting effects: the Harris correction, partially reversed
Judith Rich Harris (1995; The Nurture Assumption 1998) argued that within-normal-range parenting has minimal long-term effects on adult personality. The empirical core was correct: C ≈ 0 for adult personality. But Harris overstated her case. Korean-American adoption studies (Sacerdote 2007; Beauchamp et al. 2023) show real but modest causal effects of family environment on educational attainment, BMI, drinking, smoking — transmission coefficients ~25% of biological-family magnitude. Severe deprivation/abuse causes clear damage. The accurate position: within the normal Western range, parenting style has small effects on adult personality; family environment has measurable but modest effects on attainment outcomes; severe parenting variation matters substantially.
Neighborhood and peer effects
Chetty & Hendren (2018, QJE) used 5+ million U.S. cross-county movers with sibling fixed effects: each year of childhood exposure to a 1-SD better county raises adult income by ~0.5–0.7%. Chetty & Hendren 2018. A Moving to Opportunity reanalysis (Chetty, Hendren & Katz 2016) found that children who moved before age 13 had adult earnings 31% higher than controls. Place matters, but its effects accumulate slowly across many years of exposure.
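A rough sketch of how the per-year estimate accumulates, assuming effects add linearly across years and about 20 years of childhood exposure (both simplifications of ours, not parameters from the paper):

```python
def income_gain(per_year_effect: float, years_exposed: float) -> float:
    """Cumulative proportional income gain from exposure to a 1-SD better county.

    Assumes each year of exposure contributes the same effect and that the
    effects add linearly.
    """
    return per_year_effect * years_exposed


for per_year in (0.005, 0.007):  # 0.5% and 0.7% per year of exposure
    for years in (5, 10, 20):      # e.g., moving at ~15, ~10, or at birth
        gain = income_gain(per_year, years)
        print(f"{per_year:.1%}/yr x {years:>2} yrs -> +{gain:.1%} adult income")
```

Even full-childhood exposure yields a gain on the order of 10–14%, and moves late in childhood capture only a fraction of that.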
The Flynn Effect and its reversal
Flynn (1984, 1987) documented ~3 IQ points/decade gains across the 20th century. Causes contested: nutrition, schooling, infectious-disease reduction, test sophistication, smaller families — no single mechanism established. Bratsberg & Rogeberg (2018, PNAS) used Norwegian within-family conscript data to demonstrate that both the Flynn Effect and its post-1990s reversal are environmentally driven (visible within sibships, ruling out dysgenic/compositional explanations). Bratsberg & Rogeberg 2018. Similar declines now reported in Denmark, Finland, the Netherlands, France, the UK, and Germany. The reversal’s cause is unknown — this is one of the field’s most important open questions.
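Why within-sibship trends rule out compositional explanations can be shown with a toy dataset. In the sketch below (all numbers hypothetical), families whose children are born later also have lower baseline scores, a purely compositional shift, while a separate environmental trend lowers every child’s score by 0.2 points per birth year. A pooled regression mixes the two; demeaning within families, as a sibling fixed-effects design does, recovers only the environmental trend.

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


ENV_TREND = -0.2  # hypothetical environmental change in IQ per birth year

# Ten hypothetical families, each with two siblings born three years apart.
families = []
for i in range(10):
    baseline = 100 - 0.5 * i  # later-cohort families score lower: composition only
    years = [1970 + 2 * i, 1973 + 2 * i]
    scores = [baseline + ENV_TREND * (y - 1970) for y in years]
    families.append((years, scores))

# Pooled regression ignores family membership, so it mixes both effects.
pooled = ols_slope(
    [y for years, _ in families for y in years],
    [s for _, scores in families for s in scores],
)

# Sibling fixed effects: demean within each family, then pool the deviations.
dev_years, dev_scores = [], []
for years, scores in families:
    mean_year, mean_score = sum(years) / 2, sum(scores) / 2
    dev_years += [y - mean_year for y in years]
    dev_scores += [s - mean_score for s in scores]
within = ols_slope(dev_years, dev_scores)

print(f"pooled slope: {pooled:.3f} per year; within-family slope: {within:.3f} per year")
```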
3. Gene-Environment Interplay: rGE Wins, Candidate-GxE Collapsed
Three types of gene-environment correlation
The Plomin/DeFries/Loehlin (1977) framework distinguishes passive rGE (parents transmit both genes and correlated rearing environment), evocative rGE (heritable child traits elicit specific responses), and active rGE / niche-picking (individuals select environments matching genetic propensities). Kendler & Baker’s (2007) systematic review shows essentially every measured environment is itself heritable (15–35%), meaning observational claims like “parental warmth causes child outcomes” are confounded by passive rGE. The genetic nurture and within-family PGS results (Section 1) quantify this for educational attainment: roughly half of population-level “genetic” prediction reflects indirect environmental effects of genetically-similar parents.
The candidate gene × environment collapse
Caspi et al. (2003, Science) reported that 5-HTTLPR short-allele carriers showed elevated depression risk under stress. The paper became one of the most cited in psychiatry (>9000 citations). It collapsed:
- Risch et al. (2009, JAMA): meta-analysis of 14 studies, N=14,250 — no evidence. Risch et al. 2009.
- Culverhouse et al. (2018): pre-registered collaborative meta-analysis, 31 datasets, N=38,802 — definitively no evidence.
- Border et al. (2019, Am J Psychiatry): examined 18 most-studied depression candidate genes in N up to 443,264. No clear evidence for any candidate gene polymorphism on depression. As a set, candidate genes were no more associated with depression than non-candidate genes. Border et al. 2019.
- Duncan & Keller (2011): 96% of novel candidate-GxE studies were significant; only 27% of replication attempts were. Duncan & Keller 2011.
MAOA × maltreatment (Caspi et al. 2002) is the partial exception that survived meta-analysis (Byrd & Manuck 2014) — modest male-specific interaction, but smaller than originally reported.
Differential susceptibility / orchid-dandelion
Belsky & Pluess (2009) reframed “risk alleles” as “plasticity alleles” — some individuals are more reactive to environments “for better and for worse.” Belsky & Pluess 2009. The theory is generative; the empirical record is mixed. Recent systematic reviews find that interactions between child characteristics and parenting rarely replicate across cohorts and developmental domains. Distinguishing differential susceptibility from diathesis-stress requires very large, preregistered samples. de Villiers et al. 2018.
Epigenetics: real biology, oversold psychology
DNA methylation, histone modifications, and non-coding RNA regulation are real, well-characterized mechanisms important in development. The controversy concerns whether environmentally-induced epigenetic marks are faithfully transmitted across generations in humans. They generally are not.
- Heard & Martienssen (2014, Cell): in mammals, two waves of near-complete epigenetic reprogramming erase most acquired methylation marks. Robust transgenerational epigenetic inheritance occurs in plants and C. elegans; in humans it remains largely speculative. Heard & Martienssen 2014.
- Dutch Hunger Winter (Heijmans et al. 2008): real within-individual epigenetic effect persisting decades, not evidence of transmission to grandchildren.
- Yehuda’s Holocaust FKBP5 study (2016): tiny sample (n=8 control parents), opposite-direction effects in parents vs. offspring, no germline measurement. Yehuda’s own group failed to replicate. The “trauma is inherited epigenetically” narrative is not supported by current evidence.
Critical periods: solid developmental neuroscience
Hensch (2005, Nat Rev Neurosci) provides a mechanistically rigorous account of cortical critical-period plasticity. Hensch 2005. GABAergic maturation (parvalbumin-positive interneurons) gates onset; perineuronal nets and myelin-associated inhibitors close periods. This represents the high end of how environmental experience shapes brain structure — genuine, replicated, and mechanistically understood.
4. Sex and Gender Differences: Large Where You’re Not Told They Are
Sex differences are one of psychology’s most ideologically distorted areas — distorted by both minimization and overstatement. The actual picture: small differences in average cognitive ability, large differences in interests and physical aggression, moderate-to-large multivariate personality differences, and a robust but mechanistically contested gender equality paradox.
Cognitive abilities
Mental rotation shows d ≈ 0.56–0.73 male advantage (Voyer et al. 1995), among the largest cognitive sex differences documented. Mean math performance: d ≈ 0.05–0.10 (Lindberg et al. 2010), essentially no average difference. Writing: substantial female advantage. School grades favor girls overall (Voyer & Voyer 2014). At extreme tails (95th–99th percentile) males outnumber females ~2:1 in many countries, driven by slightly greater male variance (~3–15% higher) compounding at the extremes, together with any small mean difference.
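How much a modest variance difference matters at the tails can be checked under a normal-distribution assumption with equal numbers of males and females. The sketch below (function name and parameters are ours) computes the male:female ratio above a cutoff set at a percentile of the female distribution. With a variance ratio of 1.15 alone the ratio is only about 1.25:1 at the 95th and 1.5:1 at the 99th percentile; adding a small mean difference of the size cited for math (d ≈ 0.1) brings the 99th-percentile ratio close to 2:1.

```python
import math
from statistics import NormalDist


def tail_ratio(percentile: float, variance_ratio: float, mean_diff_d: float = 0.0) -> float:
    """Male:female ratio above a cutoff at a percentile of the female distribution.

    Female scores ~ N(0, 1); male scores ~ N(mean_diff_d, sqrt(variance_ratio)).
    Assumes normality and equal group sizes.
    """
    female = NormalDist(0.0, 1.0)
    male = NormalDist(mean_diff_d, math.sqrt(variance_ratio))
    cutoff = female.inv_cdf(percentile)
    return (1 - male.cdf(cutoff)) / (1 - female.cdf(cutoff))


for pct in (0.95, 0.99):
    for vr, d in ((1.03, 0.0), (1.15, 0.0), (1.15, 0.1)):
        print(f"p{pct * 100:.0f}, VR={vr}, d={d}: {tail_ratio(pct, vr, d):.2f}:1")
```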
Personality: univariate vs. multivariate framing
Univariate Big Five differences are moderate: women higher on Neuroticism (d ≈ 0.40) and Agreeableness (d ≈ 0.40). Del Giudice, Booth & Irwing (2012) computed multivariate Mahalanobis D = 2.71 on 16PF data from 10,261 Americans, implying ~10% overlap between male and female personality profiles. Del Giudice et al. 2012. Hyde’s (2005) “Gender Similarities Hypothesis” — most differences trivial or small — is mathematically compatible but tells a very different qualitative story. Both univariate and multivariate framings should be reported jointly; selective use is ideological.
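Converting a standardized distance into “overlap” depends on the index chosen. A sketch assuming two normal distributions with equal covariance: the overlapping coefficient OVL = 2Φ(−D/2) gives about 17% for D = 2.71, while the joint-distribution index OVL2 = OVL / (2 − OVL) gives about 10%, which appears to be the index behind the ~10% figure above. The same functions apply to a univariate d, shown here for Neuroticism (d ≈ 0.40).

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def ovl(distance: float) -> float:
    """Overlapping coefficient of two equal-variance normals separated by d (or D)."""
    return 2 * PHI(-abs(distance) / 2)


def ovl2(distance: float) -> float:
    """Overlap expressed as a share of the combined (joint) distribution."""
    o = ovl(distance)
    return o / (2 - o)


for label, dist in (("Multivariate D (16PF)", 2.71), ("Neuroticism d", 0.40)):
    print(f"{label:<22} OVL={ovl(dist):.1%}  OVL2={ovl2(dist):.1%}")
```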
Interests: the largest sex difference in psychology
Su, Rounds & Armstrong (2009, Psych Bulletin) meta-analyzed 503,188 people: the People-Things dimension d = 0.93, with engineering interest d = 1.11. Su et al. 2009. These are very large by psychological standards and the largest in the entire literature on psychological sex differences.
Aggression
Archer (2004): physical aggression d ≈ 0.40–0.60 male; trait anger near zero. Males commit ~95% of homicides globally. Archer 2004. Indirect/relational aggression: Card et al. (2008) found sex differences trivial (d < 0.10), challenging the narrative that girls are just as aggressive as boys, only through indirect means.
The Gender Equality Paradox (replicated; mechanism contested)
A robust empirical pattern across at least four domains: personality, preference, interest, and depression-rate differences are larger in more gender-equal and wealthier countries.
- Schmitt et al. (2008): 55-nation Big Five study — differences largest in egalitarian Western cultures. Schmitt et al. 2008.
- Falk & Hermle (2018, Science): 80,000 adults, 76 countries — sex differences in 6 economic preferences positively related to GDP and gender equality.
- Stoet & Geary (2018): STEM Gender-Equality Paradox — more gender-equal countries had smaller female share of STEM graduates. A corrigendum addressed methods; the core correlation remained robust.
The correlation is robust. The causal mechanism — innate-expression release in wealthy environments vs. measurement artifacts vs. ecological confounds — is genuinely contested.
Mental health asymmetries
Depression female:male ≈ 2:1; anorexia ~10:1 female; ADHD diagnosis ~2–3:1 male; antisocial personality, substance use, completed suicide all male-skewed; autism ~3–4:1 male; schizophrenia roughly equal but more severe early-onset in males.
Biological mechanisms
CAH girls (prenatally elevated androgens) show masculinized toy preferences and play patterns (Kung et al. 2024 meta-analysis). Same-sex-typed toy preferences in vervet and rhesus monkeys parallel human findings, supporting partial biological mediation. Wood & Eagly’s social role theory faces empirical challenge from the gender equality paradox.
5. Cognitive Ability and Intelligence
The g-factor
Spearman’s 1904 finding of a positive manifold — every cognitive test correlates positively with every other — is arguably the most replicated finding in psychology. A first unrotated principal factor captures 40–50% of variance in any sufficiently broad battery. van der Maas et al. (2006) mutualism model offers an alternative: g may be an emergent network property of reciprocally beneficial cognitive processes during development, not a unitary biological cause. van der Maas et al. 2006. Most working researchers treat g as a robust statistical regularity whose causal architecture is unsettled.
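The link between a positive manifold and a large first factor can be sketched with a toy battery in which every pair of tests correlates equally (a simplifying assumption; real batteries have unequal loadings, and a first principal component is not identical to a factor-analytic g). For k tests with common correlation r, the largest eigenvalue of the correlation matrix is 1 + (k − 1)r, so its share of total variance is (1 + (k − 1)r)/k. The code checks that closed form with power iteration:

```python
import math


def uniform_correlation_matrix(k: int, r: float) -> list[list[float]]:
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


def mat_vec(m: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenvalue(m: list[list[float]], iters: int = 200) -> float:
    """Largest eigenvalue of a symmetric matrix via power iteration."""
    v = [float(i + 1) for i in range(len(m))]  # arbitrary non-degenerate start
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient (v has unit length).
    return sum(a * b for a, b in zip(v, mat_vec(m, v)))


k, r = 10, 0.40  # hypothetical battery: 10 tests, all intercorrelated at 0.40
lam = top_eigenvalue(uniform_correlation_matrix(k, r))
print(f"first-component share: {lam / k:.1%}  closed form: {(1 + (k - 1) * r) / k:.1%}")
```

With r between 0.30 and 0.40 across ten tests, the first component takes 37–46% of the variance, close to the 40–50% range cited above.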
Structure: CHC theory
Carroll’s (1993) three-stratum theory — g at top, ~8–10 broad abilities (Gf, Gc, Gv, Ga, Gs, Gsm, Glr, Gq, Grw), ~70+ narrow abilities — was integrated with Cattell-Horn into the Cattell-Horn-Carroll (CHC) framework, which underlies modern IQ tests.
Predictive validity
Schmidt & Hunter (1998): corrected GMA validity for job performance r ≈ 0.51. Sackett et al. (2022) argued the range-restriction corrections were too aggressive; their more conservative corrected estimate is r ≈ 0.31. GMA remains among the most predictive selection tools. Childhood IQ predicts educational attainment at r ≈ 0.50–0.70. Calvin et al. (2011) meta-analysis (1.1M participants, 22,453 deaths): each 1-SD higher childhood IQ → ~24% lower all-cause mortality. Calvin et al. 2011.
Lifespan stability
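Lothian Birth Cohort: age 11 → age 90 corrected correlation r ≈ 0.67. Deary et al. 2013. Squaring the corrected r gives ~45% shared variance; uncorrected correlations are lower, so roughly one-third to one-half of the variance in mental ability at 90 is accounted for by ability at 11.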
Group differences in test scores: the most distorted area
Roth et al. (2001) meta-analysis (N=6.2M): U.S. Black-White cognitive ability gap d ≈ 1.0 (~15 IQ points). Dickens & Flynn (2006): Black IQ rose 4–7 points relative to whites between 1972–2002 (about one-third of the gap). Dickens & Flynn 2006. The gap exists, has narrowed somewhat, and has not closed.
The mainstream contemporary position (Nisbett et al. 2012; Turkheimer, Harden & Nisbett 2017) is that within-group heritability does not license between-group inferences (Lewontin’s point). Current PGS cannot fill that gap: Martin et al. (2019, Nature Genetics) showed PGS are ~4.5 times less accurate in African-ancestry individuals because of differing LD patterns and allele frequencies, so they cannot validly compare mean genetic predisposition across continental ancestry groups. Mostafavi et al. (2020) showed PGS prediction accuracy also varies within European-ancestry samples across SES strata.
The honest scientific position: gaps in test scores are real, partly narrowing, and their causes are not currently identifiable as genetic, environmental, or both — direct evidence is absent and mainstream geneticists treat the question as not currently answerable.
Distortion from the hereditarian direction: treating g-loadedness as evidence of genetic etiology (environmental causes can also be g-loaded); citing fringe admixture studies published in weak-peer-review venues; conflating absence of evidence with agnosticism. Distortion from the environmentalist direction: claiming gaps have closed when they only partly narrowed; dismissing IQ as “culturally biased” despite measurement-invariance evidence; overstating stereotype threat (Flore & Wicherts 2015 meta-analysis showed publication-biased modest effects).
Brain correlates
Brain volume × IQ: r ≈ 0.24 (Pietschnig et al. 2015, 2022). P-FIT theory (Jung & Haier 2007): intelligence supported by parieto-frontal network. Jung & Haier 2007.
Creativity and intelligence
The “IQ ≈ 120 threshold” hypothesis is largely disconfirmed (Weiss et al. 2020). Intelligence and creativity correlate ~r = 0.20–0.30 across the range. Openness to Experience is the personality trait most reliably correlated with creative achievement (~0.30–0.40).
6. Personality and Temperament
The Big Five (OCEAN) and HEXACO
The Big Five emerged from the lexical hypothesis. Heritability is ~40–60% per twin studies; SNP-h² is 8–18%. Nagel et al. (2018) identified 136 loci for neuroticism in 449,484 people. Nagel et al. 2018. The ReGPC consortium (2025) reports 703 loci for neuroticism in 1M+ participants. ReGPC 2025.
Roberts & DelVecchio (2000): rank-order stability rises from ~0.31 in childhood to ~0.74 by midlife (cumulative continuity). Roberts & DelVecchio 2000. Roberts et al. (2006) documented the maturity principle: mean-level increases in Conscientiousness, Agreeableness, and Emotional Stability with age, especially in young adulthood. Bleidorn et al. (2022) provide an update.
HEXACO (Ashton & Lee): lexical studies in 12+ languages consistently yield six factors, the sixth being Honesty-Humility. H predicts integrity-related criteria incrementally over Big Five. Ashton & Lee 2008.
Temperament: the developmental foundation
Temperament research constitutes a parallel tradition to adult personality, focused on biologically-grounded individual differences emerging in infancy.
Rothbart’s model identifies three overarching dimensions: Surgency/Extraversion (activity, positive affect, approach), Negative Affectivity (fear, anger, sadness, discomfort), and Effortful Control (attentional regulation, inhibitory control, low-intensity pleasure). Effortful Control is particularly important — it is the self-regulatory component of temperament, developing primarily during ages 2–7 as the anterior attention network matures, and is a strong predictor of later externalizing problems, academic success, and conscience development.
Kagan’s Behavioral Inhibition (BI) framework focuses on extreme phenotypes: ~15–20% of infants show a high-reactive pattern (vigorous motor activity and distress to novel stimuli at 4 months) and tend to become behaviorally inhibited toddlers, cautious and avoidant with unfamiliar people and situations. BI maps approximately onto low Surgency + high Negative Affectivity (especially fear). Kagan’s longitudinal studies showed BI is moderately heritable (~50%), associated with higher resting heart rate and amygdala excitability, and predicts elevated risk for social anxiety disorder in adolescence (OR ~2–4). However, ~60% of high-reactive infants do not become clinically anxious adults: biology is a foundation, not a constraint.
Thomas & Chess’s (1977) “goodness of fit” model — later empirically supported — emphasized that temperamental difficulty per se doesn’t predict poor outcomes; the match between child temperament and environmental demands does.
The temperament → personality continuity is increasingly well-documented: infant Surgency maps onto adult Extraversion; infant Negative Affectivity onto Neuroticism; infant Effortful Control onto Conscientiousness. The mapping is imperfect — adult personality includes social-cognitive layers (identity, values, narrative) absent in temperament.
Cross-cultural universality
McCrae & Terracciano (2005): clean Big Five replication in 50 cultures. McCrae & Terracciano 2005. Gurven et al. (2013) challenged this with the Tsimane forager-horticulturalists, where the full Big Five did not robustly emerge. Gurven et al. 2013. Consensus: 3 factors (E, A, C) replicate cross-linguistically; the full Big Five replicates well in Indo-European languages; non-WEIRD samples sometimes show structural deviations.
Dark traits and the D factor
Paulhus & Williams (2002): the Dark Triad (Machiavellianism, narcissism, psychopathy). Buckels, Jones & Paulhus (2013) added everyday sadism. Moshagen, Hilbig & Zettler (2018) proposed the D factor — a general tendency to maximize individual utility while disregarding others — as the common core, mapping strongly onto low Honesty-Humility. Moshagen et al. 2018.
Person-situation debate: resolved
The Mischel (1968) critique — cross-situational consistency rarely exceeds r ≈ 0.30 — was resolved through aggregation, interactionism (Mischel & Shoda’s CAPS model), and Fleeson’s within-person variability framework. The modern consensus: persons, situations, and their interactions all matter.
Personality predicts outcomes as strongly as IQ and SES
Roberts et al. (2007), “The Power of Personality”: a meta-analytic comparison shows that personality effects on mortality, divorce, and occupational attainment are indistinguishable in magnitude from SES and cognitive-ability effects. Conscientiousness predicts mortality through health behaviors with a large effect size (Bogg & Roberts 2004).
Recent theoretical developments
DeYoung’s Cybernetic Big Five Theory (2015): traits as parameters of a cybernetic goal-pursuit system. Mõttus et al. (2017): “personality nuances” research argues item-level traits capture incremental valid variance below the facet level.
7. Neurodiversity and Psychopathology: Dimensional, Polygenic, Transdiagnostic
The genomic era has produced three conclusions that fundamentally reshape psychiatric nosology: all major psychiatric conditions are highly heritable, hyper-polygenic, and substantially genetically overlapping across diagnostic categories.
Headline findings by disorder
- Schizophrenia: twin h² ~80%; Trubetskoy et al. (2022, Nature): 287 loci; SNP-h² ~24%. Environmental risk factors: urban birth (~2× risk), high-potency cannabis (OR ~3.9), migration, obstetric complications.
- Bipolar: twin h² ~70–85%; Mullins et al. (2021): 64 loci; rg(SCZ,BD) ~0.7.
- Major Depression: twin h² ~37%; Howard et al. (2019): 102 loci; SNP-h² ~9%. Strong rg with neuroticism (~0.7).
- ADHD: twin h² ~74%; Demontis et al. (2023): 27 loci. Negative rg with educational attainment and IQ.
- Autism: twin h² ~80%; Grove et al. (2019): 5 common-variant loci plus substantial rare/de novo variants of large effect (CHD8, SCN2A, SYNGAP1). The common-variant PGS correlates positively with IQ and education, whereas ID-comorbid autism (more rare-variant-driven) correlates negatively.
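The gap between twin h² and SNP-h² in the list above can be expressed as the share of twin heritability tagged by common SNPs. A small sketch using only the point estimates given above; it includes only the two disorders with both numbers, and collapsing ranges to single values is a simplification:

```python
# Point estimates from the list above; only disorders with both numbers are included.
ESTIMATES = {
    "schizophrenia": {"twin_h2": 0.80, "snp_h2": 0.24},
    "major depression": {"twin_h2": 0.37, "snp_h2": 0.09},
}

def snp_share(twin_h2: float, snp_h2: float) -> float:
    """Fraction of twin-based heritability tagged by common SNPs."""
    return snp_h2 / twin_h2

for disorder, est in ESTIMATES.items():
    print(f"{disorder}: {snp_share(**est):.0%} of twin h2 captured by common SNPs")
```

Both shares (about 30% and 24%) fall well short of 100%; how the remainder splits between rare variation and twin-design inflation is the question the sequencing and within-family methods address.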
Cross-disorder pleiotropy (with assortative mating caveat)
Brainstorm Consortium (2018, Science): substantial genetic correlations among psychiatric disorders. Cross-Disorder PGC (Lee et al. 2019, Cell): across 8 disorders, 109 pleiotropic loci and three clusters (compulsive, mood/psychotic, early-onset neurodevelopmental).
Critical caveat: Border et al. (2022, Science) showed that cross-trait assortative mating (xAM: partners resembling each other across different traits, not only on the same trait) can generate spurious genetic correlations between phenotypes with entirely distinct genetic bases. Some fraction of reported psychiatric cross-disorder genetic correlations may therefore reflect xAM rather than shared biology. The magnitude of this artifact is actively being quantified and represents a major revision in progress.
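How xAM manufactures a genetic correlation can be shown with a one-generation toy simulation. Assumptions (all illustrative): an infinitesimal additive model, h² = 0.5 for both traits, two traits with completely independent genetic values in the parents, and mating in which fathers’ trait 1 is paired with a noisy reading of mothers’ trait 2:

```python
import numpy as np

def simulate_xam(n=200_000, h2=0.5, spouse_noise_sd=2.3, seed=0):
    """One generation of cross-trait assortative mating under an infinitesimal additive model.

    Returns (within-parent genetic correlation, spousal cross-trait phenotypic
    correlation, offspring genetic correlation) between traits 1 and 2.
    """
    rng = np.random.default_rng(seed)

    def draw_parents():
        # Independent genetic values: the two traits share NO causal variants.
        g1 = rng.normal(0.0, np.sqrt(h2), n)
        g2 = rng.normal(0.0, np.sqrt(h2), n)
        y1 = g1 + rng.normal(0.0, np.sqrt(1 - h2), n)
        y2 = g2 + rng.normal(0.0, np.sqrt(1 - h2), n)
        return g1, g2, y1, y2

    fg1, fg2, fy1, _ = draw_parents()   # fathers
    mg1, mg2, _, my2 = draw_parents()   # mothers

    # xAM: rank fathers on trait 1 and mothers on a noisy reading of trait 2, then pair by rank.
    f_idx = np.argsort(fy1)
    m_idx = np.argsort(my2 + rng.normal(0.0, spouse_noise_sd, n))
    fg1, fg2, fy1 = fg1[f_idx], fg2[f_idx], fy1[f_idx]
    mg1, mg2, my2 = mg1[m_idx], mg2[m_idx], my2[m_idx]

    # Offspring: midparent genetic value plus Mendelian segregation noise (variance h2 / 2).
    seg_sd = np.sqrt(h2 / 2)
    cg1 = (fg1 + mg1) / 2 + rng.normal(0.0, seg_sd, n)
    cg2 = (fg2 + mg2) / 2 + rng.normal(0.0, seg_sd, n)

    def corr(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    return corr(fg1, fg2), corr(fy1, my2), corr(cg1, cg2)

parent_rg, spouse_r, child_rg = simulate_xam()
print(f"parents rg={parent_rg:.3f}, spousal r={spouse_r:.2f}, offspring rg={child_rg:.3f}")
```

With these assumed settings the spousal cross-trait correlation is about 0.4, and the offspring genetic correlation rises from zero to roughly 0.05 after a single generation, even though no variant affects both traits. Continued assortment over generations would be expected to build it up further.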
The p-factor
Caspi et al. (2014): a single p (general psychopathology) factor fit Dunedin cohort data better than three-factor models, analogous to g for cognitive ability. Higher p is associated with greater impairment, familiality, and worse developmental histories. Replicated in dozens of samples. Interpretations are contested: genuine common liability, a statistical artifact of bifactor over-extraction, or a reflection of impairment/distress per se.
Dimensional alternatives: HiTOP and RDoC
HiTOP (Kotov et al. 2017): a quantitatively derived, hierarchically organized dimensional alternative to the DSM. RDoC (NIMH, 2009–): six neurobiologically grounded dimensional research domains. Both align with taxometric evidence that most psychopathology is dimensional rather than taxonic, and together they mark the dimensional turn in psychiatric science.
Polygenic scores in clinics: not yet
Best PGS R² ~7–10% for schizophrenia. PGS alone does not outperform family history. PGS performance drops 50–70% in non-European-ancestry populations — a major equity and portability problem.
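Why an R² of ~7–10% does little at the individual level can be seen with a liability-threshold sketch. Assumptions (all illustrative): R² is on the liability scale, lifetime prevalence is 1%, liability is normally distributed, and the second run multiplies R² by 0.4 to mimic a ~60% portability loss:

```python
import numpy as np
from statistics import NormalDist

def top_decile_risk(pgs_r2, prevalence, n=1_000_000, seed=0):
    """Liability-threshold sketch: return (overall case rate, case rate in the top PGS decile)."""
    rng = np.random.default_rng(seed)
    # Liability = PGS part + everything else, scaled to unit variance.
    pgs = rng.normal(0.0, np.sqrt(pgs_r2), n)
    rest = rng.normal(0.0, np.sqrt(1.0 - pgs_r2), n)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)   # liability cut-off for the assumed prevalence
    is_case = (pgs + rest) > threshold
    in_top_decile = pgs >= np.quantile(pgs, 0.9)
    return float(is_case.mean()), float(is_case[in_top_decile].mean())

for r2 in (0.08, 0.08 * 0.4):
    overall, top = top_decile_risk(r2, prevalence=0.01)
    print(f"R2={r2:.3f}: overall {overall:.2%}, top decile {top:.2%}")
```

With these settings the overall case rate stays near 1%, the top PGS decile reaches only about 3% (about 2% in the reduced-R² run), and most cases still arise outside the top decile.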
The neurodiversity framework: scientific–identity tensions
The term “neurodiversity” is usually credited to Singer (1998); the paradigm it names reframes autism, ADHD, and dyslexia as natural variation rather than pathology. The framework has legitimate ethical force but sits in tension with deficit-oriented findings for severe presentations (profound autism with ID, epilepsy, self-injury). A defensible position recognizes both the reality of impairment at the severe end and the population-level continuous variation that grades into normality.
8. Key Researchers and Labs
| Researcher | Affiliation | Central contribution |
|---|---|---|
| Robert Plomin | King’s College London | Behavioral genetics synthesis; Blueprint; GPS |
| Eric Turkheimer | University of Virginia | Three Laws; Scarr-Rowe; philosophical foundations |
| K. Paige Harden | UT Austin | Genetic Lottery; causal inference with PGS |
| Avshalom Caspi / Terrie Moffitt | Duke / King’s | p-factor; Dunedin cohort; early candidate-GxE studies (MAOA, 5-HTTLPR) |
| Ian Deary | Edinburgh | Lothian Birth Cohorts; cognitive epidemiology |
| Elliot Tucker-Drob | UT Austin | Education-IQ meta-analysis; Wilson Effect mechanisms |
| Daniel Benjamin / SSGAC | UCLA | EA GWAS consortium; social-science genomics |
| Colin DeYoung | Minnesota | Cybernetic Big Five Theory; personality neuroscience |
| Jay Belsky | UC Davis | Differential susceptibility |
| Marco Del Giudice | UNM | Multivariate sex differences |
| Janet Hyde | Wisconsin | Gender similarities hypothesis |
| David Geary | Missouri | Sex differences in math/STEM |
| Brent Roberts | UIUC | Personality development; maturity principle |
| Alexander Young | UCLA | Genetic nurture; within-family methods |
| Richard Border | Harvard/UCLA | Candidate gene demolition; xAM |
| Peter Hatemi / John Hibbing | Penn State / Nebraska | Genopolitics; heritability of political attitudes |
9. The Integrated Picture: What Generates Psychological Variation
The model
A formal model of individual psychological variation should treat the person as the joint product of:
(a) A hyper-polygenic genome encoding thousands of small-effect predispositions (plus some rare large-effect variants in neurodevelopmental conditions). Twin h² for most traits falls in 0.40–0.80.
(b) Substantial gene-environment correlation through passive (parents transmit genes + correlated environments), evocative (child traits elicit responses), and active (niche-picking) channels. Roughly half of population-level PGS prediction reflects indirect/environmental mediation by genetically similar parents, not direct genetic causation.
(c) Assortative mating inflating additive genetic variance, genetic correlations between traits, and PGS prediction accuracy. This is a recently quantified source of systematic bias in nearly all genetic estimates.
(d) A small set of large-effect environmental insults — lead, severe iodine deficiency, heavy prenatal alcohol, severe deprivation — plus schooling (~3.4 IQ points/year). Effects are typically asymmetric: removing severe deficits matters more than enriching above-normal environments.
(e) Substantial stochastic developmental noise — the dominant source of the non-shared environment, which accounts for ~50% of personality variance and is not yet well-characterized mechanistically.
(f) Cultural/institutional contexts that modulate which genetic predispositions are expressed and rewarded (WEIRD effects, gender equality paradox, Scarr-Rowe interaction, Flynn Effect).
(g) Developmental unfolding across time — temperament in infancy (biologically grounded reactivity and regulation) becomes personality in adulthood (adding social-cognitive layers), with heritability increasing across the lifespan (Wilson Effect) and rank-order stability rising to ~0.74 by midlife.
Where political distortion is strongest, by direction
From the environmentalist/blank-slate direction: dismissing twin study validity wholesale; overstating Scarr-Rowe; promoting transgenerational epigenetic narratives that exceed evidence; dismissing IQ as culturally biased despite measurement-invariance findings; overstating stereotype-threat magnitudes; minimizing the gender equality paradox.
From the hereditarian direction: citing within-population heritability to license between-population genetic inferences; citing fringe admixture studies as if mainstream; treating g-loadedness of gaps as evidence of genetic etiology when environmental causes can also be g-loaded; ignoring the assortative mating and genetic nurture corrections to PGS.
From the “gender similarities” direction: selective citation of d ≈ 0.05 for math to imply no differences anywhere; obscuring multivariate D ≈ 2.71 with univariate framing; minimizing d ≈ 0.93 people-things interest differences.
From popular evolutionary psychology: treating dimensional differences as taxonic; extrapolating from small ds to categorical claims; overgeneralizing from specific tasks to broad domain claims.
Open questions worth modeling
- Mechanistic interpretation of PGS: Plomin’s “causal genetic” view vs. Turkheimer’s “weak genetic explanation” — genuinely open.
- Flynn Effect reversal: cause unknown; one of the most important open questions in differential psychology.
- Gender equality paradox mechanism: innate-expression release vs. measurement artifacts vs. wealth confounds — unsettled.
- Between-population cognitive differences: currently scientifically unanswerable (PGS portability too poor; cross-ancestry GWAS at scale don’t exist). Honest position: unresolved, not settled in either direction.
- The causal architecture of g: latent common cause vs. emergent network property (mutualism) — the positive manifold is not in dispute; what generates it is.
- What non-shared environment actually is: stochastic noise, epigenetic variation, immune/microbial variation, differential peer networks — largely uncharacterized despite accounting for ~50% of personality variance.
- Assortative mating correction magnitudes: how much do AM and xAM corrections change the substantive picture of genetic architecture and cross-trait pleiotropy? Active area of revision.
10. Load-Bearing Assumptions and Falsification Conditions
This section makes explicit which conclusions in this review depend on which assumptions, and what evidence would substantially revise or flip them. Ordered roughly by how much of the document’s picture collapses if the assumption fails.
Assumption 1: The twin method provides approximately valid variance decomposition
What depends on it: Nearly all h² estimates in Section 1’s table, the C ≈ 0 finding for adult personality, the Wilson Effect, the Scarr-Rowe interaction.
Status: Approximately valid. SNP-h² estimates (which bypass EEA entirely) give lower but still substantial heritability for every trait measured. MZ-reared-apart designs converge with MZ-reared-together. Felson (2014) estimates ~10–20% EEA-induced inflation, not enough to eliminate the core finding.
What would flip it: SNP-h² for psychological traits systematically converging on <0.05 (would suggest twin h² is mostly EEA artifact). Or: a large, well-powered MZ-reared-apart study finding IQ correlations <0.40 (current estimates ~0.70). Neither has occurred.
Robustness verdict: HIGH. The convergence of twin, adoption, and molecular methods on moderate-to-substantial heritability is the most replicated finding in the field.
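The classical twin logic behind Assumption 1 can be sketched with Falconer’s formulas, which themselves assume equal environments, purely additive genetic effects, and random mating. The twin correlations below are hypothetical, and “inflation” is read here as a relative overstatement of h², which is one interpretation of Felson’s 10–20% range:

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer's classical twin decomposition into A (additive genetic), C (shared env.), E (non-shared)."""
    a2 = 2 * (r_mz - r_dz)   # additive genetic variance (h2)
    c2 = 2 * r_dz - r_mz     # shared environment
    e2 = 1 - r_mz            # non-shared environment + measurement error
    return {"A": a2, "C": c2, "E": e2}

est = falconer_ace(r_mz=0.75, r_dz=0.45)          # hypothetical twin correlations
print({k: round(v, 2) for k, v in est.items()})   # {'A': 0.6, 'C': 0.15, 'E': 0.25}
for inflation in (0.10, 0.20):                     # Felson's ~10-20% EEA inflation, read as relative
    print(f"h2 after removing {inflation:.0%} inflation: {est['A'] * (1 - inflation):.2f}")
```

Even at the upper end of that range, an h² of 0.60 only falls to about 0.48, which is the sense in which EEA violations are “not enough to eliminate the core finding.”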
Assumption 2: GWAS identifies real genetic signal (not just population structure and AM artifacts)
What depends on it: The entire PGS enterprise, genetic nurture estimates, cross-disorder pleiotropy findings, the “missing heritability” narrative.
Status: Substantially valid but with known inflation. Within-family PGS effects are non-zero for educational attainment (~half of population effects), meaning direct genetic signal exists. But the magnitude of AM and stratification inflation is still being quantified.
What would flip it: Within-family PGS effects for most traits converging on ~zero (would mean population-level PGS prediction is entirely indirect/environmental). Current evidence: within-family effects are reduced but clearly non-zero for EA, BMI, height; less well-characterized for personality and psychiatric traits.
Robustness verdict: MODERATE-HIGH for the existence of direct genetic effects; MODERATE for their precise magnitude, which is actively being revised downward.
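Why sibling comparisons isolate direct effects can be shown with a toy simulation. The model is an assumption: the outcome depends on the child’s own PGS (direct effect) and on the parents’ average PGS (genetic nurture, standing in for the rearing environment parents create). The effect sizes are hypothetical, chosen so the population slope is twice the within-sibship slope to mirror the ~½ ratio reported for EA; nothing here is estimated from data:

```python
import numpy as np

def pgs_slopes(n_families=100_000, beta_direct=0.2, beta_nurture=0.4, seed=1):
    """Return (population slope, within-sibship slope) of outcome on own PGS."""
    rng = np.random.default_rng(seed)
    father = rng.normal(0, 1, n_families)
    mother = rng.normal(0, 1, n_families)
    midparent = (father + mother) / 2            # variance 0.5
    # Two siblings per family: midparent plus independent Mendelian segregation (variance 0.5 each).
    sib = midparent[:, None] + rng.normal(0, np.sqrt(0.5), (n_families, 2))
    # Outcome = direct effect of own PGS + genetic nurture via parents' PGS + noise.
    y = beta_direct * sib + beta_nurture * midparent[:, None] + rng.normal(0, 1, (n_families, 2))

    x_pop, y_pop = sib.ravel(), y.ravel()
    pop_slope = np.cov(x_pop, y_pop)[0, 1] / np.var(x_pop, ddof=1)

    dx, dy = sib[:, 0] - sib[:, 1], y[:, 0] - y[:, 1]   # differencing cancels family-level terms
    within_slope = np.cov(dx, dy)[0, 1] / np.var(dx, ddof=1)
    return float(pop_slope), float(within_slope)

pop, within = pgs_slopes()
print(f"population slope {pop:.2f}, within-sibship slope {within:.2f}")
```

Differencing siblings removes everything the pair shares, including the parental-PGS pathway, so only the direct effect survives. The cost is power: the sibling-difference slope has a larger standard error for the same number of people.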
Assumption 3: g is a real dimension of individual variation (not a measurement artifact)
What depends on it: The entire intelligence section (Section 5), predictive validity claims, group-difference discussions, the CHC structure.
Status: The positive manifold is among the most replicated findings in psychology. Whether g is a latent common cause or an emergent network property (mutualism) is unsettled, but both interpretations preserve g’s predictive validity and the meaningfulness of individual differences in general cognitive ability.
What would flip it: A sufficiently broad, well-constructed cognitive battery where the first principal component explains <15% of variance (would undermine the positive manifold). Or: successful interventions that consistently raise one cognitive ability while lowering others (would violate the manifold’s structure). Neither has been demonstrated.
Robustness verdict: HIGH for g as a statistical regularity with predictive validity. MODERATE for g as a unitary biological mechanism (mutualism remains a viable alternative).
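The first falsifier can be translated into a threshold on test intercorrelations. A sketch assuming an idealized battery in which every pair of tests correlates equally (real batteries are not equicorrelated); for such a matrix the largest eigenvalue is 1 + (p − 1)r, so the first component’s share of variance is (1 + (p − 1)r)/p:

```python
import numpy as np

def first_pc_share(n_tests: int, mean_r: float) -> float:
    """Share of total variance on the first principal component of an equicorrelated battery."""
    R = np.full((n_tests, n_tests), mean_r)
    np.fill_diagonal(R, 1.0)
    eigenvalues = np.linalg.eigvalsh(R)       # ascending order for symmetric matrices
    return float(eigenvalues[-1] / n_tests)   # trace of a correlation matrix = n_tests

print(round(first_pc_share(10, 0.30), 2))     # closed form: (1 + 9 * 0.30) / 10 = 0.37
```

For a 10-test battery, the first component drops below 15% only if the average intercorrelation falls below (0.15 × 10 − 1)/9 ≈ 0.056, which would amount to a near-collapse of the positive manifold.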
Assumption 4: Sex-difference effect sizes from meta-analyses are not primarily measurement artifacts
What depends on it: The gender equality paradox, the claim that interest differences (d = 0.93) are among psychology’s largest, the multivariate personality finding (D = 2.71).
Status: Interest measures (Su et al. 2009) use well-validated instruments; the d = 0.93 holds across inventories and cultures. The Del Giudice multivariate D is sensitive to the number of variables included and the specific battery, though the qualitative finding (large multivariate difference despite moderate univariate ds) is robust across datasets. CAH and non-human primate evidence provides independent convergent support for biological mediation of interest differences.
What would flip it: A large cross-cultural study using behavioral (not self-report) interest measures finding d < 0.30 for people-things. Or: evidence that the gender equality paradox disappears when using non-self-report personality measures (reference-group effects could inflate self-report differences in egalitarian countries). Current evidence: Falk & Hermle (2018) used incentivized behavioral measures for some preferences and found the paradox held, but full behavioral replication across all domains is incomplete.
Robustness verdict: HIGH for the existence of substantial sex differences in interests and aggression. MODERATE for the precise magnitude of multivariate personality differences. MODERATE for the gender equality paradox’s causal interpretation.
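The method sensitivity of D follows from how it is computed. A sketch, assuming multivariate normal groups with a common covariance matrix: D = √(dᵀR⁻¹d) combines the univariate ds through the pooled correlation matrix R, and with uncorrelated traits it reduces to the square root of the summed squared ds. The overlap conversion is also an assumption: the “~10% overlap” figure for D ≈ 2.71 appears to match OVL₂ = OVL/(2 − OVL), where OVL = 2Φ(−D/2) is the overlapping coefficient. The 15-trait battery below is hypothetical:

```python
import math
import numpy as np

def mahalanobis_d(d: np.ndarray, R: np.ndarray) -> float:
    """Multivariate standardized difference D = sqrt(d' R^-1 d) from univariate ds and pooled correlations R."""
    return float(np.sqrt(d @ np.linalg.solve(R, d)))

def overlap(D: float) -> tuple:
    """Return (OVL, OVL2) for two equal-covariance multivariate normal groups separated by D."""
    phi = 0.5 * (1 + math.erf((-D / 2) / math.sqrt(2)))   # standard normal CDF at -D/2
    ovl = 2 * phi
    return ovl, ovl / (2 - ovl)

# Hypothetical battery: 15 uncorrelated traits, each with a modest univariate d = 0.5.
d = np.full(15, 0.5)
print(round(mahalanobis_d(d, np.eye(15)), 2))    # sqrt(15 * 0.25) ≈ 1.94
print([round(x, 3) for x in overlap(2.71)])      # OVL ≈ 0.175, OVL2 ≈ 0.096
```

Because the squared Mahalanobis distance never decreases when variables are added, D computed on a 15-factor battery is not directly comparable with D on five broad factors, which is one source of the method sensitivity noted above.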
Assumption 5: The candidate-GxE collapse generalizes — specific gene × environment interactions are mostly small or nonexistent
What depends on it: Section 3’s dismissal of 5-HTTLPR and similar findings, the shift toward polygenic approaches.
Status: For candidate genes, the collapse is definitive (Border et al. 2019). But this does not necessarily mean polygenic-score × environment interactions are also null. PGS × environment work is younger, uses better methods, and could in principle yield robust results.
What would flip it: Multiple large, pre-registered PGS × measured-environment studies showing robust, replicable interactions explaining >5% of variance. Current evidence: a few suggestive findings (PGS-for-education × compulsory schooling reforms) but nothing approaching the scale or replication needed for confidence.
Robustness verdict: HIGH for the candidate-gene collapse. LOW-MODERATE confidence in the broader claim that specific GxE interactions are generally small — this is an extrapolation from the candidate-gene failure, and the polygenic GxE literature is too young to draw strong conclusions.
Assumption 6: Cross-disorder genetic correlations reflect shared biology (pleiotropy)
What depends on it: The p-factor interpretation, HiTOP structure, the “dimensional turn” in psychiatry, transdiagnostic treatment rationales.
Status: Substantially challenged by Border et al. (2022). Cross-trait assortative mating can generate spurious genetic correlations between traits with entirely distinct genetic bases. The R² = 74% finding means most of the variance in genetic correlation estimates tracks spousal phenotypic correlations — though this does not prove all genetic correlations are spurious (some genuine pleiotropy surely exists).
What would flip it: Within-family designs showing that cross-disorder genetic correlations survive AM correction at >50% of current estimates. Or: identification of specific shared biological pathways (e.g., synaptic pruning variants affecting both SCZ and BD) that don’t depend on LD induced by AM.
Robustness verdict: MODERATE. The dimensional/transdiagnostic pattern is likely real but inflated. The magnitude of genuine pleiotropy vs. AM artifact is one of the field’s most active methodological debates.
11. Toward Topology: Structure for the Next Phase
This section identifies the natural graph/network structure embedded in this literature, to facilitate the transition from landscape analysis to formal topology mapping.
Natural node types
- Trait nodes: Cognitive abilities (g, Gf, Gc, Gv, Gs…), personality dimensions (Big Five/HEXACO factors and facets), temperament dimensions (Surgency, Negative Affectivity, Effortful Control), psychopathology spectra (internalizing, externalizing, thought disorder), interests (people-things, RIASEC), political/moral attitudes
- Mechanism nodes: Genetic architecture (common polygenic, rare large-effect, de novo), environmental factors (lead, iodine, schooling, deprivation, neighborhoods), developmental processes (critical periods, niche-picking, genetic nurture, AM), stochastic noise
- Method nodes: Twin studies, adoption studies, GWAS, PGS, within-family designs, Mendelian randomization, meta-analysis
- Population-level modifier nodes: SES (Scarr-Rowe), culture (WEIRD), gender equality index, historical period (Flynn Effect)
Natural edge types
- Genetic correlations (with AM caveat): e.g., rg(SCZ, BD) ≈ 0.7; rg(EA, IQ) ≈ 0.7; rg(neuroticism, MDD) ≈ 0.7
- Developmental continuity: temperament → personality (Surgency → Extraversion; Effortful Control → Conscientiousness)
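- Causal environmental effects: lead → IQ (−6.2 pts as blood lead rises from 1 to 10 µg/dL, per E8; the dose-response is nonlinear, not a constant per-10-µg/dL slope); schooling → IQ (+3.4 pts/year)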
- Predictive validity edges: g → job performance (r ≈ 0.42); Conscientiousness → mortality; EA PGS → income
- Methodological dependency: twin h² → SNP-h² → PGS R² (each constraining the next)
- Taxonomic hierarchy: g → broad abilities → narrow abilities (CHC); p → spectra → subfactors → syndromes (HiTOP)
- Moderation edges: SES × heritability (Scarr-Rowe); gender equality × sex differences (GEP); age × heritability (Wilson Effect)
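E8’s 6.2-point figure for 1 → 10 µg/dL is easy to misread as a linear slope. A sketch, assuming a log-linear dose-response calibrated only to that single figure (the functional form is an assumption here, not a fitted model):

```python
import math

# Log-linear dose-response calibrated so that 1 -> 10 µg/dL costs 6.2 IQ points (assumed shape).
BETA_PER_LN_UNIT = 6.2 / math.log(10 / 1)   # ≈ 2.69 IQ points per natural-log unit of blood lead

def iq_decrement(lead_from: float, lead_to: float) -> float:
    """Predicted IQ loss (points) when blood lead rises from lead_from to lead_to (µg/dL)."""
    return BETA_PER_LN_UNIT * math.log(lead_to / lead_from)

for lo, hi in [(1, 2), (1, 10), (10, 20), (20, 30)]:
    print(f"{lo:>2} -> {hi:>2} µg/dL: -{iq_decrement(lo, hi):.1f} IQ points")
```

Under this assumed shape every doubling of blood lead costs the same ~1.9 points, so moving from 1 to 2 µg/dL matters as much as moving from 10 to 20; a linear reading would understate harm at low exposure and overstate it at high exposure.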
Key structural features for the graph
- Two parallel hierarchies (CHC for cognition, HiTOP for psychopathology) that share genetic correlations at the top level (g correlates with p inversely)
- A developmental cascade from temperament (infancy) through personality (adulthood) through outcomes (mortality, income, relationships), with heritability increasing and shared-environment decreasing across the lifespan
- A methodological funnel from twin estimates (broadest, highest h²) through molecular estimates (narrower, lower h²) through within-family estimates (narrowest, lowest but most causally clean)
- Cross-domain genetic correlations that form a web connecting cognition, personality, and psychopathology — but with the critical caveat that an unknown fraction may be AM artifact rather than biological pleiotropy
Highest-leverage next steps for topology phase
- Build the trait correlation matrix: Assemble published genetic correlations (from LD Score regression / GWAS) among the ~20–30 most well-characterized traits spanning cognition, personality, and psychopathology. Annotate each with AM-corrected estimates where available. This matrix is the empirical backbone of the topology.
- Map the developmental cascade: Create a directed graph from temperament → personality → outcomes with age-indexed heritability and stability coefficients as edge weights. This captures the time dimension that a static correlation matrix misses.
- Formalize the variance decomposition: For each major trait, create a standardized decomposition: [direct genetic] + [genetic nurture/indirect] + [AM-induced] + [shared environment] + [measured non-shared environment] + [stochastic residual]. Where values are unknown, flag them explicitly. This is the generating function skeleton that the formalization phase will flesh out.
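A minimal data structure for the standardized decomposition in the last step, assuming variance is scaled to 1 and the six components are additive (so the covariance and interaction terms of the full variance equation are left out). Field names are hypothetical; unknown components stay None so they are flagged rather than silently counted as zero:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class VarianceDecomposition:
    """Standardized decomposition of one trait's variance; None marks an unknown component."""
    trait: str
    direct_genetic: Optional[float] = None
    genetic_nurture: Optional[float] = None
    am_induced: Optional[float] = None
    shared_environment: Optional[float] = None
    measured_nonshared: Optional[float] = None
    stochastic_residual: Optional[float] = None

    def components(self) -> dict:
        return {f.name: getattr(self, f.name) for f in fields(self) if f.name != "trait"}

    def unknowns(self) -> list:
        """Components not yet estimated, flagged explicitly."""
        return [name for name, value in self.components().items() if value is None]

    def unallocated(self) -> float:
        """Share of total variance (scaled to 1) not yet assigned to any known component."""
        known = sum(v for v in self.components().values() if v is not None)
        if known > 1 + 1e-9:
            raise ValueError(f"{self.trait}: known components sum to {known:.2f} > 1")
        return 1 - known

# Placeholder values for illustration only.
example = VarianceDecomposition("example trait", direct_genetic=0.20, shared_environment=0.10)
print(example.unknowns())
print(round(example.unallocated(), 2))
```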
Dependency graph of the lit review. Three categories of high-stakes node (foundational cruxes / reframer nodes / logical guardrails) plus weakest links, four variant views, three Stage-3 options, an objections section, and a glossary. Updated through 2024-2025 literature on AM correction, within-family GWAS at scale, PGS portability, Scarr-Rowe collapse, GEP replication, and missing-heritability closure.
TLDR
The lit review documents what the science says about psychological variation. This topology asks a sharper question: what depends on what? Strip the field down to its load-bearing structure and the picture is surprisingly clean. Three foundational assumptions — that twin/adoption methods give approximately valid variance decomposition (A1), that GWAS signal reflects real genetic effects rather than population structure or assortative-mating artifact (A2), and that a general factor of cognitive ability g exists as a real dimension of individual variation (A3) — sit upstream of most of the empirical and synthesis nodes in the graph; if any one of them flipped, large regions would have to be rebuilt. Everything else is either an empirical claim resting on these foundations, a methodological prerequisite that lets the foundations be tested, a logical necessity that constrains how the empirical claims can be interpreted, or a generating mechanism that explains why the empirical pattern looks the way it does.
The high-stakes nodes split into three categories — keeping them separate is the single most useful conceptual move in this topology. Foundational cruxes (A1 twin validity, A2 GWAS signal real, A3 g exists) are the assumptions that, if falsified, force rebuilding regions of the picture. Reframer nodes (G2/E6 passive rGE / genetic nurture; G6/E7 cross-trait assortative mating) don’t break the picture if reversed — they change what it means; their magnitudes are being actively quantified and their precise share of population-level “genetic” effects is the field’s most consequential open quantity. Logical guardrails (L1 variance-ratio definition; L4 Lewontin firewall) cannot be falsified — they can only be ignored, which is exactly how most public-discourse misuse of the field proceeds. Conflating these three types under a single label of “important findings” is a major source of bad-faith debate.
The field’s weakest links are not where public discourse focuses heat. Mainstream contests over “is heritability real” target settled findings (A1+E1 are robust), and the 2025 whole-genome-sequencing work (Wainschtein et al. 2025, Nature), which captures ~88% of pedigree-based narrow-sense heritability, substantially answers the “missing heritability” critique as well. The actual fragile zones in 2026 are: (a) the generalization from candidate-GxE failure to all-GxE-is-small, where the null is partially holding (Allegrini 2020 for education, a 2025 systematic review for depression) but the literature is still too young for confidence; (b) Scarr-Rowe, which has weakened further: Ghirardi et al. 2024 found 39/42 PGI×SES interactions in the opposite (compensatory) direction, so “deprivation suppresses heritability” is now evidence-thin; (c) the polygenic-score → mechanism inference (Plomin’s “causal” view vs. Turkheimer’s “weak explanation” view), which remains genuinely undecided; (d) the magnitude of AM correction across psychiatric cross-disorder rg estimates, now being actively addressed: Ma, Wang, Border et al. 2024 (LAVA-Knock) is the first method to systematically reduce xAM-induced bias, with a field-wide answer likely in 2–3 years. The Flynn-reversal cause and the Gender Equality Paradox mechanism remain open mechanistic questions, but they are open in a different way: the empirical patterns themselves are robust, and only the explanation is contested. The 2025 GEP systematic review (Herlitz et al.) actually strengthened the pattern across personality, verbal abilities, episodic memory, and negative emotions.
This topology is the input to model formalization (Stage 3). The cleanest formalization target is the variance decomposition equation: V(trait) = V(direct genetic) + V(genetic nurture / indirect) + V(AM-induced LD) + V(shared environment) + V(measured non-shared) + V(stochastic) + 2·Cov(genes, environment) + interaction terms — with each term parameterized by trait, age, and population context, and with the AM and rGE terms being where current methodological revision is concentrated. The four variant views below (Vulnerability / Flow / Minimal / Politicization) read the same graph through different lenses to make the formalization choices easier.
The graph
All ~50 nodes and their dependencies, shown in the interactive graph. Each node carries its claim and load-bearing weight, and each edge its relation type. The variant toggles read the same graph through different lenses (vulnerability, flow, minimal claim set, politicization).
How to read this graph
Every node in the lit review collapses to one of eight types. Edges between them carry one of seven relations. Together they make the structure inspectable.
Node types
| Code | Type | What it is |
|---|---|---|
| A | Foundational assumption | A claim the field cannot operate without; if false, large downstream regions collapse |
| M | Methodological prerequisite | A study design or estimation tool that must work for the empirical claims to be testable |
| E | Empirical claim | A specific measured finding with an effect size and replication status |
| L | Logical necessity | Follows from definitions or algebra; not empirically refutable |
| G | Generating mechanism | A causal process that explains a pattern (rGE, AM, niche-picking, critical periods) |
| S | Synthesis claim | An integrative statement combining multiple lower-level claims |
| O | Open question | Genuinely undecided with current methods or evidence |
| D | Distortion vector | Where motivated reasoning concentrates (typed by direction) |
Edge types
| Code | Edge | Meaning |
|---|---|---|
| dep | depends-on | If target collapses, source collapses |
| imp | implies | Logical implication |
| sup | empirically-supports | Evidence relation |
| conf | confounds / inflates | Artifact relationship (e.g., AM inflates rg) |
| mod | moderates | Changes magnitude (e.g., SES × heritability) |
| dev | develops-into | Temporal/developmental successor (temperament → personality) |
| corr | corrects | Within-family corrects between-family bias |
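The dep semantics (if the target collapses, the source collapses) make collapse propagation a simple graph traversal. A sketch with a small, hand-picked edge list based on the “What depends on it” entries in Section 10; the edge list is illustrative rather than the topology’s actual edge set, and only dep edges propagate:

```python
from collections import defaultdict, deque

# Edges as (source, relation, target); "dep" means the source depends on the target.
EDGES = [
    ("E1", "dep", "A1"),    # mean heritability ≈ 0.49
    ("E2", "dep", "A1"),    # C ≈ 0 for adult personality
    ("E3", "dep", "A1"),    # Wilson Effect
    ("M5", "dep", "A2"),    # polygenic scores
    ("E6", "dep", "A2"),    # genetic nurture estimates
    ("E16", "dep", "A2"),   # cross-disorder genetic correlations
    ("E17", "dep", "E16"),  # p-factor interpretation leans on cross-disorder rg
    ("E7", "conf", "E16"),  # xAM inflates rg: an artifact edge, not a dependency
]

def collapse(node: str, edges=EDGES) -> set:
    """Return every node that transitively depends on `node` via dep edges."""
    dependents = defaultdict(list)
    for source, relation, target in edges:
        if relation == "dep":
            dependents[target].append(source)
    fallen, queue = set(), deque([node])
    while queue:
        for source in dependents[queue.popleft()]:
            if source not in fallen:
                fallen.add(source)
                queue.append(source)
    return fallen

print(sorted(collapse("A2")))   # ['E16', 'E17', 'E6', 'M5']
```

The conf edge from E7 is deliberately ignored by the traversal: an artifact that inflates E16 changes its magnitude rather than collapsing it.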
Weight scale (load-bearing weight, 1–5)
- 5 — crux node; collapse propagates across multiple sections of the lit review
- 4 — load-bearing within a section
- 3 — important but local
- 2 — corroborating
- 1 — decorative; could be removed without changing the picture
1. Node catalog
Each node carries: type code · weight · short claim · key citation · status. Status flags: ✓ (robust/replicated), ~ (partial/qualified), ? (contested), ✗ (refuted, kept as historical reference).
A — Foundational assumptions
| ID | Wt | Claim | Status |
|---|---|---|---|
| A1 | 5 | Twin/adoption methods provide approximately valid variance decomposition (EEA modestly violated but not fatally) | ✓ |
| A2 | 5 | GWAS signal reflects real genetic effects, not (only) population stratification or AM artifact | ✓ partial |
| A3 | 5 | A general factor of cognitive ability g is a real dimension of individual variation (positive manifold) | ✓ statistical / ? mechanism |
| A4 | 3 | Heritability findings apply to the population sampled, not to individuals or other populations (scope) | ✓ |
| A5 | 3 | Phenotypes are reliably and validly measurable across cultures and time | ~ |
| A6 | 3 | Most psychological variation is dimensional, not taxonic | ✓ |
M — Methodological prerequisites
| ID | Wt | Tool | Notes |
|---|---|---|---|
| M1 | 4 | Twin studies (MZ/DZ) at scale | Polderman 2015 meta: 14.5M pairs |
| M2 | 4 | Adoption studies, especially cross-cultural (Korean-American) | Sacerdote 2007; Beauchamp 2023 |
| M3 | 5 | GWAS at N ≥ 100k (ideally ≥ 1M for personality/EA) | Okbay 2022 (3M for EA) |
| M4 | 5 | Within-family designs (sibling FE, MZ-discordant, parent-offspring trios) | Kong 2018; Okbay 2022 (EA, N=3M); Howe et al. 2022 (Nature Genetics) extended this to 178k siblings × 25 phenotypes, with within-sibship estimates systematically smaller than population estimates for height, EA, cognitive ability, depressive symptoms, and smoking. The within-family approach is now mature beyond educational attainment. ✓ |
| M5 | 4 | Polygenic scores (PGS) | Best R² ~0.16 for EA, ~0.10 for SCZ |
| M6 | 3 | Cross-trait LD-score regression for genetic correlations | Brainstorm 2018 |
| M7 | 3 | Mendelian randomization | For causal inference from observational data |
| M8 | 3 | Pre-registration & collaborative meta-analysis | Demolished candidate-GxE |
| M9 | 4 | Whole-genome sequencing (rare-variant capture) | 2025 follow-up (Wainschtein et al. 2025, Nature; UK Biobank ~500k) captures ~88% of pedigree-based narrow-sense heritability across many traits (20% rare + 68% common variants); the “missing heritability” problem is now substantially resolved for many phenotypes. ✓ |
E — Empirical claims
Cognition / IQ:
| ID | Wt | Claim | Status |
|---|---|---|---|
| E1 | 5 | Mean trait heritability ≈ 0.49 across 17,804 traits (Polderman 2015) | ✓ |
| E2 | 5 | Shared environment C ≈ 0 for adult personality and most adult cognition | ✓ with exceptions (EA, religiosity, politics) |
| E3 | 4 | Wilson Effect: IQ heritability rises from ~0.20 (age 5) to ~0.80 (adulthood) | ✓ |
| E4 | 5 | Hyper-polygenic architecture: thousands of small-effect variants | ✓ (Turkheimer’s 4th Law, Chabris 2015) |
| E5 | 4 | Candidate-gene approach for psychiatric/personality traits failed (5-HTTLPR etc.) | ✗ original claims; ✓ collapse finding |
| E6 | 5 | Within-family PGS effects are ~½ population-level effects (genetic nurture) | ✓ for EA, BMI, height |
| E7 | 5 | Cross-trait assortative mating explains R²≈74% of variance in genetic correlation estimates | ✓ (Border 2022, Science) |
| E8 | 4 | Lead exposure 1–10 µg/dL → −6.2 IQ pts (Lanphear 2005) | ✓ |
| E9 | 4 | Each year of schooling adds ~3.4 IQ points, persisting into old age | ✓ (Ritchie & Tucker-Drob 2018) |
| E22 | 4 | Within-population heritability does not license between-population inference | ✓ logical |
| E23 | 4 | PGS prediction accuracy decays continuously along the genetic-distance continuum from training population (Pearson r = −0.95 between genetic distance and PGS accuracy across 84 traits, Ding et al. 2023, Nature). Reframes the older “discrete ancestry-group drop” picture (Martin 2019; Mostafavi 2020) | ✓ |
| E24 | 3 | Flynn Effect and its post-1990s reversal are both environmentally driven (within-sibship evidence) | ✓ pattern; ? cause |
| E25 | 2 | Scarr-Rowe: SES × heritability hypothesis (more genetic expression in higher-SES). Weakening further in 2024 — Ghirardi et al. 2024 found 39/42 PGI×SES interactions in education NEGATIVE (compensatory direction); only 1 significant positive. Pattern is now closer to “compensatory hypothesis holds, Scarr-Rowe fails” than to “context-dependent” | ✗ |
| E18 | 4 | Positive manifold: every cognitive test correlates positively with every other | ✓ |
| E26 | 3 | Childhood IQ → all-cause mortality: each 1-SD ≈ 24% lower mortality | ✓ (Calvin 2011) |
| E27 | 3 | Lifespan IQ stability: Lothian Birth Cohort age-11 → age-90 r ≈ 0.67 | ✓ |
| E28 | 3 | Severe iodine/alcohol/deprivation cause large asymmetric IQ effects | ✓ |
Personality / temperament:
| ID | Wt | Claim | Status |
|---|---|---|---|
| E29 | 4 | Big Five h² ≈ 0.40–0.60; cross-cultural replication for E/A/C | ✓ partial (Tsimane qualifier) |
| E30 | 4 | Cumulative continuity: rank-order stability rises to ~0.74 by midlife | ✓ |
| E31 | 4 | Maturity principle: mean-level ↑ in C, A, ES with age | ✓ |
| E32 | 3 | Temperament dimensions (Surgency, Negative Affectivity, Effortful Control) → adult personality | ✓ |
| E33 | 3 | Personality predicts mortality/divorce/income at magnitudes ≈ SES & cognition | ✓ (Roberts 2007) |
Sex differences:
| ID | Wt | Claim | Status |
|---|---|---|---|
| E10 | 4 | Multivariate personality sex difference (16PF; Del Giudice 2012): D ≈ 2.71 (~10% overlap) | ~ method-sensitive |
| E11 | 4 | People-things interest difference d ≈ 0.93 (largest in psychology) | ✓ |
| E12 | 3 | Mental rotation d ≈ 0.56–0.73 male advantage | ✓ |
| E13 | 3 | Math performance d ≈ 0.05–0.10 (essentially equal) | ✓ |
| E14 | 4 | Gender Equality Paradox: differences larger in egalitarian/wealthier countries. Herlitz et al. 2025 systematic review (54 articles, 27 meta-analyses) confirmed the pattern across personality, verbal abilities, episodic memory, and negative emotions — pattern replication has strengthened, not weakened | ✓ pattern; ? mechanism |
| E34 | 3 | Physical aggression d ≈ 0.40–0.60 male; ~95% of homicides male | ✓ |
| E35 | 2 | CAH girls show masculinized toy preferences; primate parallels | ✓ |
Psychopathology:
| ID | Wt | Claim | Status |
|---|---|---|---|
| E15 | 4 | All major psychiatric disorders highly heritable (h² 0.35–0.85) and hyper-polygenic | ✓ |
| E16 | 4 | Cross-disorder genetic correlations exist | ✓ existence; ? magnitude post-AM |
| E17 | 3 | A p factor (general psychopathology) fits cross-syndrome data | ✓ statistical; ? interpretation |
| E36 | 3 | Autism: common-PGS positively correlated with IQ; rare/de-novo drives ID-comorbid cases | ✓ |
| E37 | 2 | Critical-period plasticity (GABAergic, perineuronal nets) is mechanistically real | ✓ |
L — Logical necessities
| ID | Wt | Claim |
|---|---|---|
| L1 | 5 | Heritability is a population variance ratio; it does not partition individual phenotypes (mathematical form — A4 is the scope-of-claim sibling) |
| L2 | 4 | h² changes with environmental variance: hold genes constant, equalize environments → h² → 1 |
| L3 | 5 | Within-family designs control for between-family confounds (rGE, stratification, AM) |
| L4 | 5 | Within-population heritability provides no information about between-population mean differences (Lewontin). E22 in the empirical column is the applied form of this same point |
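| L5 | 3 | Multivariate D ≥ the largest univariate d (in absolute value) under any covariance structure; how far D exceeds it depends on how the differences align with the correlations among dimensions |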
| L6 | 3 | Positive manifold permits both unitary-cause and emergent-network interpretations of g |
| L7 | 3 | Effect-size interpretation is scale-dependent (d=0.10 trivial in trait psychology, large in clinical) |
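L2 can be checked with a few lines of arithmetic. The sketch below assumes the simplest additive model, V(P) = V(A) + V(E), with no gene-environment covariance and no interaction. The function name and the variance values are illustrative, not estimates for any trait.

```python
def heritability(v_genetic: float, v_environment: float) -> float:
    """h² as a population variance ratio, V(A) / V(P).

    Assumes the simplest additive model V(P) = V(A) + V(E): no
    gene-environment covariance and no interaction.
    """
    return v_genetic / (v_genetic + v_environment)


V_A = 1.0  # genetic variance held fixed (arbitrary units)

# Progressively equalize environments by shrinking V(E).
for v_e in (3.0, 1.0, 0.25, 0.0):
    print(f"V(E) = {v_e:.2f} -> h² = {heritability(V_A, v_e):.2f}")
```

With V(A) fixed at 1, shrinking V(E) from 3 to 0 moves h² from 0.25 through 0.50 and 0.80 to 1.00, even though no genotype has changed. This is the same point L1 makes formally: h² describes a population’s variance, not anything fixed about individuals.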
G — Generating mechanisms
| ID | Wt | Mechanism | Drives |
|---|---|---|---|
| G1 | 4 | Active rGE / niche-picking | E3 (Wilson Effect amplification) |
| G2 | 5 | Passive rGE. Wang 2021 / Isungset 2022 confirm indirect ≈ ½ direct genetic effect for EA. Nivard et al. 2024 found indirect genetic effects on offspring achievement extend beyond the nuclear family — dynastic / extended-family / community processes contribute, so the “parents transmit gene + correlated environment” framing understates the spread | E6 (genetic nurture); inflation of population-level h² |
| G3 | 3 | Evocative rGE | Heritability of “environments” (Kendler & Baker 2007) |
| G4 | 3 | Critical-period plasticity (GABAergic maturation) | Asymmetric environmental effects on early development |
| G5 | 4 | Assortative mating → LD induction | Inflates additive genetic variance V(A) at the population level (Yengo 2018: 14–23% for height); inflates SNP h² and PGS effect sizes. Counterintuitively biases Falconer twin h² downward (raises rDZ relative to rMZ); twin-vs-WF gap for socially-structured traits is dominated by genetic nurture / EEA, with AM partially offsetting. |
| G6 | 5 | Cross-trait AM → spurious genetic correlations | Confounds E16, E17 (p-factor) |
| G7 | 4 | Stochastic developmental noise | Dominant source of non-shared environment |
| G8 | 3 | Selection / niche construction across the lifespan | Bridges temperament → personality → outcome cascade |
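G5’s counterintuitive claim about Falconer’s estimator can be illustrated under a deliberately simple ACE model. This is a sketch, not a model of assortative mating. It assumes AM acts only by raising the genetic correlation between DZ twins above the random-mating value of 0.5. The true variance shares and the raised value of 0.6 are hypothetical.

```python
def twin_correlations(a: float, c: float, dz_genetic_r: float = 0.5) -> tuple[float, float]:
    """Expected MZ and DZ phenotypic correlations under a simple ACE model.

    a, c: true additive-genetic and shared-environment variance shares.
    dz_genetic_r: genetic correlation between DZ twins. It is 0.5 under random
    mating; here AM is modeled only as pushing it above 0.5 (a simplification).
    """
    r_mz = a + c
    r_dz = dz_genetic_r * a + c
    return r_mz, r_dz


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimator, which assumes a DZ genetic correlation of exactly 0.5."""
    return 2.0 * (r_mz - r_dz)


def falconer_c2(r_mz: float, r_dz: float) -> float:
    """Falconer's shared-environment estimator under the same assumption."""
    return 2.0 * r_dz - r_mz


TRUE_A, TRUE_C = 0.5, 0.1  # hypothetical true variance shares

for dz_r in (0.5, 0.6):  # random mating vs. a hypothetical AM-raised DZ correlation
    r_mz, r_dz = twin_correlations(TRUE_A, TRUE_C, dz_r)
    print(f"DZ genetic r = {dz_r}: h² estimate = {falconer_h2(r_mz, r_dz):.2f}, "
          f"c² estimate = {falconer_c2(r_mz, r_dz):.2f}")
```

Under random mating, the estimator recovers the true shares (0.50 and 0.10). With the DZ genetic correlation raised to 0.6, rDZ moves closer to rMZ and the h² estimate falls to 0.40. The missing 0.10 reappears as an inflated shared-environment estimate of 0.20.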
S — Synthesis claims
| ID | Wt | Claim |
|---|---|---|
| S1 | 5 | “Genes vs. environment” is the wrong frame; the system is tightly coupled (genome × rGE × AM × few large environmental insults × stochastic noise × culture × developmental unfolding) |
| S2 | 5 | Twin h² ≥ SNP h² ≥ within-family h² gradient quantifies AM/rGE/measurement inflation across estimation methods |
| S3 | 4 | Heritability ≠ destiny; high h² is compatible with large environmental shifts (height: h² ≈ 0.80, +10cm in a century) |
| S4 | 4 | Most “non-shared environment” is stochastic, not systematic — it accounts for ~50% of personality variance and is poorly characterized |
| S5 | 4 | Two parallel hierarchies (CHC for cognition, HiTOP for psychopathology) connected at the top by inverse g↔p genetic correlation |
| S6 | 4 | Developmental cascade: temperament (infant biological reactivity) → personality (adult social-cognitive layer added) → outcomes (mortality, attainment, relationships) with h² ↑ and shared-env ↓ across the lifespan |
O — Open questions
| ID | Wt | Question | Why it matters |
|---|---|---|---|
| O1 | 5 | Mechanistic interpretation of PGS: “causal genetic” (Plomin) vs. “weak explanation” (Turkheimer) | Determines what PGS prediction means |
| O2 | 3 | Cause of Flynn-effect reversal post-1990s | Empirical pattern robust (Bratsberg & Rogeberg 2018 within-sibship Norway). Mechanism still unsettled. Pietschnig et al. 2024 (Vienna 2005–2018 cohort) added a wrinkle: the positive manifold itself may be weakening — gains in some abilities aren’t tracking gains in others, suggesting the g-loading of the rise/fall is not constant. Hypothesized mechanisms (screens, reduced long-form reading, attention) circulate without empirical pinning |
| O3 | 4 | Causal mechanism behind Gender Equality Paradox | Innate-expression release vs. measurement artifact vs. confound — selection of explanation has political stakes |
| O4 | 5 | Between-population mean differences: any genetic component? | Currently scientifically unanswerable with available methods (PGS portability too poor, cross-ancestry GWAS at scale don’t exist). Honest position: unresolved, not settled in either direction |
| O5 | 3 | g architecture: latent common cause vs. emergent network (mutualism, van der Maas 2006) | Affects how interventions could in principle move g |
| O6 | 4 | What “non-shared environment” actually is: stochastic noise, immune/microbial, peer networks, epigenetic, measurement error | Largest unmodeled variance component in personality |
| O7 | 5 | Magnitude of AM-correction across the cross-disorder genetic correlation matrix | Active revision. Ma, Wang, Border et al. 2024 AJHG introduced LAVA-Knock — a local-genetic-correlation method that reduces xAM-induced bias. Methods to give the answer are now emerging, not just to flag the problem |
D — Distortion vectors (where motivated reasoning concentrates)
| ID | Direction | Targets | Failure mode |
|---|---|---|---|
| D1 | Blank-slate / environmentalist | A1, E1, E10–E14 | Dismiss twin studies wholesale; oversell transgenerational epigenetics; overstate stereotype threat; minimize sex differences via univariate-only framing |
| D2 | Hereditarian | L4, E22, E23, O4 | Ignore Lewontin; treat g-loadedness of gaps as evidence of genetic etiology; cite fringe admixture studies; ignore AM/rGE corrections to PGS |
| D3 | “Gender similarities” minimization | E10, E11, E14 | Selective citation (math d=0.05) to imply no differences anywhere; obscure D=2.71 multivariate; minimize d=0.93 interest gap |
| D4 | Pop evpsych overgeneralization | E10–E14, A6 | Treat dimensional ds as taxonic; extrapolate small ds to categorical claims; overgeneralize from specific tasks to broad-domain claims |
2. Dependency cascade
The cascade reads from foundations up to synthesis, and from corrections back down to corrected claims.
Forward cascade (foundations → empirical claims → synthesis)
A1 ──dep──> M1, M2 ──sup──> E1, E2, E3, E25, E29
A2 ──dep──> M3, M5 ──sup──> E4, E6, E7, E15–E17, E22–E23
A3 ──dep──> E18 ──sup──> E26, E27 ──imp──> S5
A4 (scope) + L1 (form) ──guards──> interpretation of E1, E2, E3 and S3
A6 ──imp──> S5 (dimensional turn in psychiatry)
M9 ──corr──> E1 (closes missing-heritability gap)
M3 + M4 ──sup──> E6 (genetic nurture), E7 (xAM)
E5 (candidate-gene collapse) ──imp──> E4 (polygenic architecture confirmed by absence of large hits)
E1 + E2 + E3 + G1 ──imp──> S6 (developmental cascade)
E1 + E4 + G2 + G5 ──imp──> S2 (h² gradient by method)
E10 + E11 + E12 + E13 + L5 ──imp──> "small univariate, large multivariate" sex-difference picture
E14 + O3 ──imp──> mechanism-pending GEP
E15 + E16 + E17 ──imp──> S5 (HiTOP/p)
E22 + E23 + L4 ──imp──> O4 (between-pop unanswerable currently)
E1 + E2 + E4 + E6 + E7 + E8 + E9 + G1–G7 ──imp──> S1, S2 (integrated picture)
Backward / corrective cascade (newer evidence revises older claims)
G2 (passive rGE) ──corr──> E1 estimates (population-level overstates direct genetic)
G6 (cross-trait AM) ──corr──> E16 (some psychiatric rg's may be xAM artifact)
M4 (within-family) ──corr──> E6 magnitude (~½ of population PGS)
M8 (preregistration) ──corr──> E5 (collapsed candidate-GxE)
M9 (WGS) ──corr──> "missing heritability" interpretation
Distortion → target edges
D1 ──attacks──> A1, E1, E10, E11, E12, E14
D2 ──attacks──> L4, E23 (ignores); exploits A2 in the absence of G2/G6 corrections
D3 ──attacks──> E10, E11 (selective univariate framing)
D4 ──attacks──> A6, L7
3. Where pressure concentrates
A common failure mode in this literature is to treat all high-stakes nodes as the same type of thing. They are not. The graph has three distinct categories of high-stakes node and one category of fragile claim — keeping these separate sharpens what the field actually needs to resolve.
3a. Foundational cruxes — falsification breaks regions of the picture
These are the empirical-or-methodological assumptions that, if wrong, force rebuilding large parts of the lit review.
A1 — Twin/adoption method validity. Carries Section 1 of the lit review; heritability-by-domain table; Wilson Effect. Robustness: HIGH (MZ-reared-apart, SNP-h² bypassing EEA, misperceived-zygosity all converge). Would flip if SNP-h² for psychological traits systematically converged on <0.05 — has not occurred.
A2 — GWAS signal is real (not artifact). Carries the PGS enterprise; genetic nurture estimates; cross-disorder pleiotropy; modern psychiatric genetics. Robustness: MODERATE-HIGH (within-family PGS effects are non-zero for EA, BMI, height — direct signal exists; AM/stratification inflation magnitudes still being quantified). Would flip if within-family PGS effects converged on zero across most traits.
A3 — g is a real dimension of cognitive variation. Carries Section 5 of lit review; predictive-validity claims; CHC structure; mortality/income predictions. Robustness: HIGH for g as a statistical regularity; MODERATE for g as unitary biological mechanism. Would flip if a broad cognitive battery had first-PC <15% or if interventions reliably moved one ability while lowering others. 2024 wrinkle: Pietschnig et al. 2024 reported the positive manifold may be weakening across recent cohorts — softly pressures A3 in a new way without refuting it.
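A3’s falsification threshold is stated as the first principal component’s share of variance. The sketch below computes that share from a correlation matrix with NumPy. The equicorrelated battery is a hypothetical simplification, since real batteries are not uniformly intercorrelated. It does give a closed form to check against: with k tests all correlating at r, the first eigenvalue is 1 + (k − 1)r, so the share is (1 + (k − 1)r)/k.

```python
import numpy as np


def first_pc_share(corr: np.ndarray) -> float:
    """Share of total standardized variance captured by the first principal component."""
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return float(eigenvalues[-1] / eigenvalues.sum())


def equicorrelation(k: int, r: float) -> np.ndarray:
    """Hypothetical battery of k tests that all intercorrelate at r."""
    return np.full((k, k), r) + (1.0 - r) * np.eye(k)


for r in (0.0, 0.1, 0.3, 0.5):
    share = first_pc_share(equicorrelation(10, r))
    print(f"r = {r:.1f}: first PC explains {share:.0%} of variance")
```

For a ten-test battery, the share falls below 15% only if the average intercorrelation drops under about 0.056, since (1 + 9r)/10 = 0.15 gives r ≈ 0.056. Typical intercorrelations among cognitive tests are much larger than that, which shows how far current data sit from A3’s flip condition.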
3b. Reframer nodes — the answer is open and reshapes interpretation
These don’t break the picture if reversed; they change what the picture means. Their magnitudes are being actively quantified in 2024–2026 work. Conflating reframers with foundational cruxes is the most common conceptual error in pop-science treatments of this field.
G2 / E6 — Passive rGE / genetic nurture. Reframes the meaning of every population-level genetic estimate. Without G2, “genetic transmission” reads as direct biological causation; with G2, ~half is environmentally mediated by genetically-similar parents (Wang 2021 / Isungset 2022). Nivard et al. 2024 (Nat Hum Behav) showed indirect genetic effects extend beyond the nuclear family to dynastic / extended-family processes. The existence is robust; precise magnitude across all traits is still being quantified.
G6 / E7 — Cross-trait assortative mating. Reframes the cross-disorder genetic-correlation matrix and the p-factor’s interpretation. Border 2022 (Science) showed phenotypic cross-mate correlations explain R²=74% of variance in genetic-correlation estimates. Ma, Wang, Border et al. 2024 (LAVA-Knock) is the first method to systematically reduce xAM-induced bias. The share of any specific rg that is artifact vs. genuine pleiotropy is still pending.
3c. Logical guardrails — unfalsifiable but load-bearing for interpretation
These cannot be falsified — they are algebraic / definitional truths. They can be ignored, which is how most public-discourse misuse of the field happens.
L1 — Heritability is a population variance ratio, not an individual partition. Cannot be falsified. Public misreading of “70% heritable IQ” as “70% of any individual’s IQ comes from genes” is the failure of L1, not the science.
L4 — Within-population heritability does not license between-population mean inference (Lewontin firewall). Cannot be falsified — it is a logical/algebraic point. Can only be ignored. The empirical buttress today is E23 (PGS portability collapse along genetic-distance continuum, Ding 2023): even if you wanted to use within-pop methods to speak to between-pop differences, the methods don’t currently work.
3d. Decorative material (safe to compress)
Removable from the topology without changing the qualitative picture:
- E35 (CAH / primate toy preferences) — convergent evidence, not necessary
- E37 (specific GABAergic critical-period mechanisms) — biologically real, not load-bearing for the variation argument
- HEXACO Honesty-Humility specifics — incremental over Big Five
- Specific Dark-Triad subdimensions — D-factor synthesis (Moshagen 2018) carries more weight
- P-FIT brain network specifics — corroborate g but don’t establish it
- Yehuda Holocaust FKBP5 transgenerational findings — refuted/non-replicated; kept only as historical anchor for D1 distortion
- Specific candidate-gene findings (5-HTTLPR depression) — refuted; kept as historical anchor for the field’s methodological turn (M8)
4. Weakest links
These are the load-bearing pieces with the lowest current confidence. Targeted attack on any one would do the most damage to the integrated picture.
W1: Generalization from candidate-GxE failure to “all GxE is small” (E5 → broader claim)
Why fragile: The candidate-gene collapse is definitive. The extrapolation that polygenic-score × environment interactions are also small is an inductive leap, not a result. As of 2025, the evidence is broadly consistent with that null without yet establishing it. A 2025 systematic review of 56 PGS×E studies for depression found mostly null or small effects. A multivariable PGS×E study of educational achievement (Allegrini et al. 2020) found “no evidence that GxE effects significantly contributed to multivariable prediction.” UK Biobank work (2024) distinguishing among explanations of GxE shows that many apparent GxE signals are confounded by scale, ascertainment, or population structure. The extrapolation from candidate-gene failure is looking less like an inductive leap and more like a substantive empirical pattern — but the literature is still too young to support a strong null.
Pressure test: Several large preregistered PGS×E studies finding interactions explaining >5% variance would substantially revise this corner of the picture.
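One of the confounds named in W1, scale dependence, is easy to demonstrate. In the toy example below, which uses hypothetical effect sizes on a balanced 3 × 3 grid, the outcome has no interaction at all on the log scale. An ordinary least-squares model fitted on the raw scale still reports a nonzero G×E coefficient. This is a sketch of the general measurement-scale issue, not a reconstruction of any specific study’s analysis.

```python
import numpy as np

# Toy data on a full grid: genotype score G and environment E, each at three levels.
g, e = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
g, e = g.ravel(), e.ravel()

# Assumed generating process: purely additive on the log scale,
# i.e. multiplicative (no interaction) once exponentiated.
log_y = 0.4 * g + 0.3 * e
y = np.exp(log_y)


def interaction_coef(outcome: np.ndarray) -> float:
    """OLS coefficient on G*E in the model outcome ~ 1 + G + E + G*E."""
    design = np.column_stack([np.ones_like(g), g, e, g * e])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(coefs[3])


print(f"G×E coefficient, raw scale: {interaction_coef(y):+.3f}")
print(f"G×E coefficient, log scale: {interaction_coef(log_y):+.3f}")
```

Because the grid is balanced, the raw-scale interaction coefficient equals sinh(0.4)·sinh(0.3) ≈ 0.125. The log-scale coefficient is zero up to floating-point error. An interaction that appears or vanishes under a monotone transformation of the outcome says more about the measurement scale than about biology.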
W2: Scarr-Rowe (E25) — has substantially weakened since pass-0
Why fragile: The original meta-analytic picture was “replicates in US, fails in W. Europe / Australia” (Tucker-Drob & Bates 2016). Ghirardi et al. 2024 (Netherlands Twin Register, polygenic-index design across 42 PGI×SES interaction tests for educational outcomes) found 39/42 negative, 0 significant positive, 1 marginally significant positive — i.e., the opposite sign from Scarr-Rowe in most cases. The picture in 2026 is closer to “the compensatory hypothesis (more genetic expression in low-SES because constrained environments suppress non-genetic variance) is the better-supported pattern, at least for educational outcomes.” E25’s weight has been downgraded from 3 → 2 to reflect this. The narrative “deprivation suppresses heritability” — popular in policy discourse — is now evidence-thin.
W3: Plomin-vs-Turkheimer interpretation of PGS (O1)
Why fragile: Both views are compatible with current data. The choice determines what a PGS means: direct biology, or a summary statistic of correlated environments. The field publishes ambiguously across both interpretations. The question will likely be settled only by PGS built entirely from within-family GWAS at sample sizes large enough to be well powered.
W4: Magnitude of AM-correction across psychiatric cross-disorder rg matrix (O7) — methods now emerging
Why still fragile but improving: Border 2022 showed xAM explains R²=74% of variance in genetic-correlation estimates but didn’t prove all rg’s are spurious — some genuine pleiotropy surely exists. As of 2024, the field is moving from flagging the problem to building correction methods. Ma, Wang, Border et al. 2024 (American Journal of Human Genetics) introduced LAVA-Knock, a local-genetic-correlation method using knockoff inference to reduce xAM-induced bias; tested across 630 trait pairs in simulation and real GWAS, it substantially reduces but does not eliminate the bias. A 2024 study found AM genetic signatures across SCZ, BD, MDD, alcohol phenotypes, and Tourette syndrome, indicating that xAM is not confined to one or two disorders. What’s still pending: how much of the cross-disorder rg matrix and the p-factor genetic signal survives systematic application of AM-correction methods at scale. Likely answer in 2–3 years.
W5: Gender Equality Paradox mechanism (O3, E14) — pattern strengthened, mechanism still contested
Empirical pattern: more robust as of 2025. Herlitz et al. 2025 systematic review (54 articles, 27 meta-analyses, Perspectives on Psychological Science) found the paradox replicates across personality, verbal abilities, episodic memory, and negative emotions. Balducci et al. 2024 extended it to within-individual academic strengths cross-temporally. The “this won’t replicate” objection has weakened.
Mechanism: still contested. Three live candidates: (a) innate-expression release in resource-rich environments, (b) reference-group / self-anchoring artifacts in self-report (people compare to their gender peers, not to humans-in-general), (c) wealth/freedom confounds with gender equality. Behavioral / incentivized-measure replications (Falk & Hermle 2018 for economic preferences) cover only part of the domain. The decisive test — non-self-report behavioral replication across personality and interests — is still incomplete. Each candidate mechanism implies different normative conclusions, which is part of why this remains contested rather than resolved.
W6: Flynn-reversal cause (O2) — and a new wrinkle on the positive manifold
Why fragile: The pattern is environmentally driven (within-sibship, Bratsberg & Rogeberg 2018), so “dysgenic” explanations are out. But no mechanism (screen time, education quality, attention, nutrition, lead, microplastics) has been pinned down with within-cohort empirical work. Pietschnig et al. 2024 (Vienna 2005–2018 cohort) added a structural twist: the positive manifold itself may be weakening across cohorts — meaning the recent rise/fall is not uniformly g-loaded. If confirmed broadly, this softly pressures A3 (g exists as a stable dimension) — not refuting it, but suggesting its strength may be cohort-dependent. Still not load-bearing for the integrated picture, but interacts with A3 in a new way.
W7: A6 (dimensional vs. taxonic) at psychiatric extremes
Why fragile: Most psychopathology is dimensional (taxometric evidence is robust), but for severe early-onset autism with intellectual disability, rare large-effect variants (CHD8, SCN2A, SYNGAP1) drive a partly taxonic picture. The “all dimensional” framing oversells continuity at the severe tail.
5. Variant views
The same graph, read four ways.
Variant A: Vulnerability map — where does this break?
The vulnerability map is the union of the three foundational cruxes (§3a), two reframer nodes (§3b), two logical guardrails (§3c), and seven weakest links (§4). Together they describe the smallest set of pressure points whose movement would force restructuring of the integrated picture:
- Falsify A1: SNP-h² systematically <0.05 → twin-method discredited → Section 1 collapses
- Falsify A2: within-family PGS → 0 → modern psychiatric genetics collapses
- Falsify A3: positive manifold dissolves → Section 5 collapses
- Falsify G2: within-family PGS = population PGS → genetic nurture is null → Plomin direct-causal view wins (O1 resolves)
- Falsify G6 fully: AM correction barely changes rg matrix → cross-disorder pleiotropy is real
- Violate L4: cannot be falsified, only ignored — but its violation in public discourse is the largest single source of public confusion
If exactly one of these were to flip, the rebuild would be: A1→ rebuild Section 1 only; A2→ rebuild Sections 1, 3, 7 (~40% of lit review); A3→ rebuild Section 5 (~25%); G2/G6→ keep numbers, rewrite causal interpretation throughout.
Variant B: Flow map — how does causation propagate?
Causation in this system runs in two directions, both important.
Forward developmental flow (genome → outcomes):
```
Genome (polygenic + few rare large-effect)
│
├──> Temperament (infant biological reactivity: Surgency / NA / EC)
│    │
│    ├──> Active rGE / niche-picking ─────────┐
│    │                                        │
│    └──> Evocative rGE (eliciting responses) ┤
│                                             │
└──> Direct expression in brain development ──┤
                                              ▼
                                              Personality (adult)
                                              │
                                              ├──> Attainment
                                              ├──> Relationships
                                              ├──> Health behaviors
                                              └──> Mortality
```
Indirect / dynastic flow (parents’ genome → offspring environment → offspring outcome):
```
Parents' genome
│
├──> Parents' phenotype (income, vocabulary, parenting style, neighborhood choice)
│    │
│    └──> Offspring's rearing environment ───────┐
│                                                ▼
│                                                Offspring outcomes
│                                                ▲
└──> Transmitted alleles ────────────────────────┘
```
The genetic-nurture finding (E6) says these two pathways have roughly equal magnitude for educational attainment. They are partially separable only via within-family designs (M4) or non-transmitted-allele PGS.
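The non-transmitted-allele design mentioned above can be sketched with a small simulation. The effect sizes are hypothetical and set equal so that the direct effect is half the population-level effect, mirroring E6. The model assumes the parents’ PGS affects the child only through the rearing environment, and it ignores AM and population stratification. Regressing the outcome on the transmitted and non-transmitted halves together separates the two pathways. The non-transmitted coefficient estimates genetic nurture, and the difference between the two coefficients estimates the direct effect.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)
n = 200_000

# Hypothetical effect sizes, set equal so that the direct effect is half the total.
DIRECT = 0.15   # child's own (transmitted) PGS acting on the child
NURTURE = 0.15  # parents' PGS acting through the rearing environment

transmitted = rng.standard_normal(n)      # alleles the child inherited
non_transmitted = rng.standard_normal(n)  # parental alleles the child did not inherit
parental_pgs = transmitted + non_transmitted

outcome = DIRECT * transmitted + NURTURE * parental_pgs + rng.standard_normal(n)

# Regress the outcome on both halves at once (intercept included).
design = np.column_stack([np.ones(n), transmitted, non_transmitted])
coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
beta_t, beta_nt = coefs[1], coefs[2]

print(f"transmitted-allele coefficient:     {beta_t:.3f}  (direct + nurture)")
print(f"non-transmitted-allele coefficient: {beta_nt:.3f}  (nurture only)")
print(f"implied direct effect:              {beta_t - beta_nt:.3f}")
```

In this setup, a regression on the child’s PGS alone would return roughly DIRECT + NURTURE = 0.30, twice the direct effect. That is the kind of inflation within-family designs (M4) are meant to remove.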
Cross-generational drift via assortative mating:
```
Mating choice (correlated on phenotype)
│
└──> LD induction among causal variants (G5)
     │
     ├──> Inflated additive genetic variance
     ├──> Inflated SNP h²
     ├──> Inflated cross-trait genetic correlations (G6)
     └──> Inflated PGS prediction accuracy
```
Variant C: Minimal claim set — smallest set supporting the conclusion
The smallest collection of claims that yields the integrated picture (S1) is eight nodes:
- E1 — Mean trait h² ≈ 0.49 (heritability is real and substantial)
- E4 — Polygenic architecture (no master genes)
- E6 — Within-family PGS ≈ ½ population PGS (genetic nurture is real)
- E7 — Cross-trait AM is a major source of inflated genetic correlations
- E8 — A small set of large environmental insults have causal effects (lead, iodine, alcohol, deprivation, schooling)
- L4 — Within-pop ≠ between-pop (Lewontin)
- G7 — Stochastic developmental noise is the dominant source of non-shared environment
- A6 — Most psychological variation is dimensional, not taxonic
These eight together generate the qualitative integrated picture without requiring detailed effect-size tables, cross-cultural caveats, or specific candidate-gene history. The remaining ~50 nodes refine and corroborate but do not change the shape.
Variant D: Politicization map — where does motivated reasoning concentrate?
This is the variant most relevant to the topic framing (“a minefield of motivated reasoning on all sides”).
Distortion-to-target matrix:
| Distortion | Targets | Move | Counter-evidence |
|---|---|---|---|
| D1 Blank-slate | A1, E1, E10–E14 | “Twin studies are flawed; differences are socialization” | SNP-h² (bypasses EEA), MZ-reared-apart, Su 2009 (d=0.93), CAH/primate convergence |
| D2 Hereditarian | L4, E22, E23, O4 | “Group differences are genetic” | PGS portability collapse (E23); Lewontin (L4); cross-ancestry GWAS at scale don’t exist |
| D3 Gender-similarities | E10, E11, E14 | “All differences are tiny (cite math d=0.05)” | Multivariate D=2.71 (E10); people-things d=0.93 (E11); GEP (E14) |
| D4 Pop-evpsych | A6, L7, E10–E14 | “Men are X, women are Y” (categorical from dimensional) | A6 (dimensional); L7 (effect-size context) |
Why all four distortions can target the same evidence base: the evidence base contains both large differences (people-things d=0.93) and trivial ones (math d=0.05) and strong heritability (h²=0.49) and large environmental insults (lead, schooling) and logical guardrails against between-group inference (L4). Any single-direction narrative requires selective citation. The integrated picture (S1) requires holding all of it at once.
Operational implication for the formalization stage: any model that only parameterizes the variance components without parameterizing the interpretation of those components will be silently captured by whichever distortion the reader is most prone to. The formal model needs to make L4, G2, G6, and the dimensional/taxonic distinction (A6) structurally visible, not just numerically present.
6. Topology → formalization handoff
What the next stage (model formalization) should pick up.
Ready for equations
- Variance decomposition — fully specifiable now:
  V(P) = V(A_direct) + V(A_indirect) + V(A_AM-LD) + V(C_residual) + V(E_measured) + V(E_stochastic) + 2·Cov(G,E) + V(GxE)
  Each V is parameterized by trait, age, population (US vs. Europe for E25), and method (twin / SNP / within-family). Cov(G,E) captures rGE; V(A_AM-LD) captures G5/G6; V(A_indirect) captures G2.
- Method gradient (S2): twin h² ≥ SNP h² ≥ within-family h², with the gaps decomposable into AM, rGE, and rare-variant contributions. Parameterize as a function of estimation method.
- Wilson-effect curve: h²(age) = a + b·log(age), or a similar form that saturates at the adult level, with the slope driven by G1 (active rGE). Calibratable from Bouchard 2013 and Briley & Tucker-Drob 2013.
- Multivariate sex-difference algebra: D² = (μ₁ − μ₂)ᵀ Σ⁻¹ (μ₁ − μ₂), with a worked example showing how D = 2.71 follows from moderate univariate ds combined through the covariance structure Σ.
- PGS-portability decay function: prediction accuracy as a continuous function of genetic distance from the training population (Ding et al. 2023, Nature: r = −0.95 between genetic distance and accuracy across 84 traits).
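A minimal container for the variance decomposition might look like the following. The field names mirror the equation’s terms, and the numbers are placeholders rather than estimates for any trait. The class does no identification or estimation; it only does the bookkeeping of summing terms and reporting shares.

```python
from dataclasses import asdict, dataclass


@dataclass
class VarianceComponents:
    """Terms of the V(P) decomposition; all values here are hypothetical placeholders."""
    a_direct: float
    a_indirect: float
    a_am_ld: float
    c_residual: float
    e_measured: float
    e_stochastic: float
    cov_ge: float  # Cov(G,E); enters V(P) twice
    gxe: float

    def phenotypic_variance(self) -> float:
        """V(P) as the sum of all terms, with 2·Cov(G,E)."""
        return (self.a_direct + self.a_indirect + self.a_am_ld + self.c_residual
                + self.e_measured + self.e_stochastic + 2.0 * self.cov_ge + self.gxe)

    def shares(self) -> dict[str, float]:
        """Each term's share of V(P); the covariance share uses 2·Cov(G,E)."""
        total = self.phenotypic_variance()
        terms = asdict(self)
        terms["cov_ge"] *= 2.0
        return {name: value / total for name, value in terms.items()}


example = VarianceComponents(a_direct=0.20, a_indirect=0.10, a_am_ld=0.05, c_residual=0.05,
                             e_measured=0.10, e_stochastic=0.40, cov_ge=0.05, gxe=0.0)
print(f"V(P) = {example.phenotypic_variance():.2f}")
for name, share in example.shares().items():
    print(f"{name:>12}: {share:.1%}")
```

The placeholder values sum to V(P) = 1.00, so each share reads directly as a proportion of phenotypic variance. In a calibrated version, each field would become a function of trait, age, population and method.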
Still at observation stage (formalization premature)
- O1 — Plomin/Turkheimer interpretation of PGS: not yet a formal disagreement, just a verbal one
- O3 — GEP mechanism: the algebra of “innate expression release” is not yet specified
- O6 — what non-shared environment is: no candidate decomposition
- O7 — share of cross-disorder rg that survives AM correction: empirical question pending — but methods are now emerging (LAVA-Knock); answer likely in 2–3 years, at which point this moves to “ready for equations”
Connection to adjacent topics in the LLM-iterate pipeline
This topology is the natural input to Parent-to-Child Transmission (planned topic). The genetic-nurture finding (G2/E6) and the dynastic-extension finding (Nivard 2024) are the empirical answers to “how much does parenting matter beyond genes” that the parent-child topic will need to build on. When that topic spins up, the variance decomposition equation here should be its starting point.
Less directly: the Evolution-Modernity Mismatch topic will lean on the GEP (E14/O3) and Flynn-reversal (O2) findings as evidence of environment-driven shifts in expressed psychological variation. Bedrock Generating Functions can read the variance decomposition itself as one such bedrock function.
7. Next moves — three options for Stage 3
The user picks one of these as the primary formalization target. Each leaves the others viable as later modules but shapes Stage 4 (data) differently.
Option A — Variance decomposition + method gradient (most central)
Build the central equation V(P) = V(A_direct) + V(A_indirect) + V(A_AM-LD) + V(C_residual) + V(E_measured) + V(E_stochastic) + 2·Cov(G,E) + V(GxE) parameterized by trait, age, population, and estimation method. Build a tool that takes a published h² estimate (twin, SNP, or within-family) and outputs a method-corrected estimate with explicit AM/rGE adjustment.
Pros: most central to the topic; directly answers “what generates psychological variation”; feeds Stage 4 cleanly (every term has published estimates somewhere). Cons: many parameters; risk of producing a calculator nobody uses without strong UI judgment. Stage 4 implication: pull h² estimates from PGC, SSGAC, GIANT consortia; calibrate the method-gradient term per trait class.
Option B — Multivariate sex-difference algebra (most pedagogically clean)
Formalize how moderate univariate Cohen’s ds combine into a large multivariate Mahalanobis D, with a worked personality example showing how D ≈ 2.71 emerges from the univariate ds and the correlation matrix Σ. Build a dashboard letting the user dial univariate ds and the correlation matrix to see D move.
Pros: tightly scoped; resolves the single biggest framing trap in the GEP debate (univariate vs. multivariate framings of the same data); high pedagogical leverage. Cons: narrower than A; doesn’t engage the heritability core. Stage 4 implication: pull effect-size matrices from Del Giudice 2012 and Schmitt 2008 cross-cultural data; replicate D under different correlation structures.
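The core of Option B fits in a few lines. The sketch below uses NumPy plus the standard library, and its function names are hypothetical. It computes D from a vector of univariate ds and a within-group correlation matrix. It then converts D into two common overlap indices under an equal-covariance normal model. The two-dimensional inputs are illustrative, not published personality estimates.

```python
import math

import numpy as np


def mahalanobis_d(d: np.ndarray, corr: np.ndarray) -> float:
    """Multivariate D from standardized mean differences d and the within-group correlation matrix."""
    return float(np.sqrt(d @ np.linalg.solve(corr, d)))


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_coefficient(D: float) -> float:
    """Shared area of two equal-variance normal distributions separated by D."""
    return 2.0 * phi(-D / 2.0)


def cohen_u1_overlap(D: float) -> float:
    """1 - Cohen's U1: the overlapping share of the combined area of both distributions."""
    u1 = (2.0 * phi(D / 2.0) - 1.0) / phi(D / 2.0)
    return 1.0 - u1


d = np.array([0.5, 0.5])  # identical univariate differences on two dimensions
for r in (0.5, 0.0, -0.5):
    corr = np.array([[1.0, r], [r, 1.0]])
    print(f"r = {r:+.1f}: D = {mahalanobis_d(d, corr):.3f}")

print(f"D = 2.71: OVL = {overlap_coefficient(2.71):.0%}, 1 - U1 = {cohen_u1_overlap(2.71):.0%}")
```

Two things fall out of this. First, D is never smaller than the largest single |d|, but how much larger it is depends on the correlations. Identical ds of 0.5 give D ≈ 0.58, 0.71 or 1.00 as r moves from +0.5 to 0 to −0.5, so same-signed differences on positively correlated dimensions are partly redundant. Second, overlap percentages depend on the index. For D = 2.71, 1 − U1 is about 10% while the overlapping coefficient is about 18%, so a dashboard should state which index it reports.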
Option C — PGS-portability calibration (most practically useful)
Turn Ding et al. 2023’s continuous decay finding into a usable accuracy estimator: enter an individual’s genetic distance from the PGS training population and get an accuracy-decay multiplier. Apply across the major trait PGSs (EA, SCZ, BMI, etc.).
Pros: directly addresses a real-world bias; smallest scope; ships fastest; useful even outside this project’s domain. Cons: less central to the heritability question; might fit better as a tool than a topic-stage. Stage 4 implication: pull cross-ancestry GWAS validation data from the All of Us / GenomeAsia / H3Africa consortia.
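A first pass at the Option C estimator could be as simple as a least-squares line through (genetic distance, accuracy) pairs, normalized to the accuracy at distance zero. The linear form is an assumption chosen for simplicity. It is motivated by the “continuous decay” finding rather than taken from Ding et al. 2023. The distance scale and the data points below are synthetic, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_decay(distance: np.ndarray, accuracy: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of accuracy ≈ intercept + slope * distance; returns (intercept, slope)."""
    slope, intercept = np.polyfit(distance, accuracy, deg=1)
    return float(intercept), float(slope)


def accuracy_multiplier(distance: float, intercept: float, slope: float) -> float:
    """Predicted accuracy at `distance` relative to the training population (distance 0).

    Clipped at zero because a straight line eventually extrapolates below zero.
    """
    return max(0.0, (intercept + slope * distance) / intercept)


# Synthetic validation points (hypothetical distance units, made-up accuracies).
distances = np.array([0.0, 0.1, 0.2, 0.3])
accuracies = np.array([0.10, 0.085, 0.07, 0.055])

intercept, slope = fit_linear_decay(distances, accuracies)
for dist in (0.0, 0.2, 0.5, 1.0):
    print(f"distance {dist:.1f}: accuracy multiplier {accuracy_multiplier(dist, intercept, slope):.2f}")
```

The clipping matters at large distances, where a straight line would otherwise predict negative accuracy. A calibrated version would fit per-trait curves, not necessarily linear ones, to real cross-ancestry validation data.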
My recommendation: A as primary (most central to the topic’s stated purpose), with B as a stretch module if scope allows. C is high-value but might better live as a standalone tool promoted to /models later.
8. Objections to this topology (adversarial + steelman)
Four ways a careful reader could push back. The strongest version of each, then my response.
Objection 1 — Discrete typed edges falsify a continuous, magnitude-weighted, context-conditional system
Heritability is not “supported by” a twin study in the same binary way that a logical implication holds. The system is a tightly coupled developmental process; flattening it into nodes-and-arrows with discrete edge types loses information about magnitude, conditional dependence, and gradient relationships.
Response: Acknowledged, and intentional. The topology is the qualitative skeleton; edge weights and conditional dependencies are the job of Stage 3 (formalization), where each edge will be turned into a parameterized function. The graph’s value is not that it stands in for the full system but that it makes the structure visible cheaply enough that the formalization knows where to put the parameters.
Objection 2 — The crux/decorative split is editorial, not empirical
There is no algorithm that picks crux nodes; the choice depends on which failure modes you are worried about. A 1990s topology of this field would have crowned candidate-gene findings as cruxes. Naming A2 (GWAS signal real) a crux today is a judgment call about the field’s current methodological commitments — not an objective feature of the science.
Response: Correct. Cruxes are time-stamped. This topology is a 2026 snapshot. If the field shifts (post-AM-correction era, post-within-family-PGS-at-scale era) the crux set will shift — that is what the refinement passes are for. Use this as a current map, not an immutable structural claim.
Objection 3 — Calling L4 a “logical firewall” overstates the case
Lewontin’s 1970 argument has been challenged. Edwards (2003) “Lewontin’s Fallacy” showed that Lewontin’s specific quantitative point — that ~85% of human genetic variance is within rather than between populations — does not preclude reliable population-classification from genetic markers. Modern population genetics treats between-population genetic inference as more nuanced than the firewall framing suggests.
Response: The Edwards critique is real, but it addresses a different claim. Edwards refuted “you cannot reliably classify individuals into populations from genetic data.” The L4 firewall as I formulate it says “within-population heritability provides no information about between-population mean differences without strong auxiliary assumptions about shared causal architecture and equal environments.” Those are different propositions. PGS portability collapse (E23) is the contemporary empirical evidence that the auxiliary assumptions are not currently being met for psychological traits. The firewall framing survives the Edwards critique; its strength rests on the empirical PGS-portability finding, not on the original Lewontin variance argument alone.
Objection 4 — The Politicization variant is meta-commentary, not topology
The D nodes and attacks edges describe how people misuse the evidence base. That is epistemics or sociology of science, not structural topology of the field. A pure topology should omit them.
Response: Fair, and the inclusion is non-orthodox. It is justified here only by the topic framing — the user’s prompt explicitly described the field as “a minefield of motivated reasoning … where the actual generating functions are obscured by politics.” A topology of just the science would omit the D nodes; a topology that helps a reader navigate the field as it is actually encountered should include them. The D nodes will not be carried into Stage 3 formalization — they exist for navigation, not for downstream computation.
9. Glossary
For readers approaching this from outside the field. Terms appear throughout the lit review and topology; this is the lookup table.
| Term | Meaning |
|---|---|
| h² | Heritability — fraction of trait variance in a population attributable to genetic variation. A population statistic, not an individual one. |
| SNP | Single-nucleotide polymorphism — a single-base difference at a position in the genome where multiple variants exist in the population. |
| GWAS | Genome-wide association study — scans hundreds of thousands of SNPs against a measured trait, looking for statistical association. |
| PGS | Polygenic score — a per-individual sum of trait-associated SNPs weighted by their GWAS effect sizes. Used as a predictor. |
| LD | Linkage disequilibrium — non-random association between alleles at nearby loci, typically because they are inherited together. |
| AM | Assortative mating — partners resemble each other on a trait above chance. xAM = cross-trait AM (e.g., taller-than-average people pairing with more-educated-than-average partners). |
| rGE | Gene-environment correlation. Passive (parents transmit genes + correlated environment), evocative (heritable traits elicit responses), active (people select environments matching propensities). |
| GxE | Gene-environment interaction — the same genotype produces different phenotypes in different environments. |
| EEA | Equal environments assumption — the twin-method assumption that MZ and DZ twins are treated similarly enough that any extra MZ phenotypic resemblance reflects genetics, not differential treatment. |
| MZ / DZ | Monozygotic (identical, ~100% shared DNA) / dizygotic (fraternal, ~50% shared DNA) twins. |
| rg | Genetic correlation between two traits — how much the same genetic variants influence both. |
| WGS | Whole-genome sequencing — capturing every base in the genome, including rare variants GWAS misses. |
| g-factor | General factor of cognitive ability — the latent dimension behind the positive manifold (every cognitive test correlates positively with every other). |
| p-factor | Proposed general factor of psychopathology — analogous to g, derived from cross-syndrome correlations. |
| CHC / HiTOP | Cattell-Horn-Carroll cognitive-ability hierarchy / Hierarchical Taxonomy of Psychopathology (a dimensional alternative to DSM). |
| d (Cohen’s d) | Standardized mean difference between two groups, in standard-deviation units. Effect-size labels (small / medium / large) are scale-dependent — see L7 in the node catalog. |
| Mahalanobis D | Multivariate generalization of Cohen’s d — distance between two group means in the geometry of the trait space, accounting for correlation between traits. |
| Within-family design | Comparing siblings or MZ-discordant twins or parent-offspring trios within the same family — controls for between-family confounds (population stratification, AM, passive rGE). |
| Genetic nurture | Effect of parents’ genotype on offspring outcomes via the environment the parents create — including alleles the parent did not transmit. |
10. Stage_outputs convention reference
Raw working drafts from each LLM-iterate stage live at:
stage_outputs/<topic>/<stage>.md
Where <topic> is kebab-case (e.g., human-psych-variation) and <stage> is one of: lit-review, topology, model, data, build, writeup. Polished versions move into src/content/ai_research/<topic>/<stage>.mdx with proper frontmatter (title, description, date, status, refinementPass, refinementLog) once ready to publish on the site.
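The convention is mechanical, so it fits in a small helper. This is a hypothetical sketch: `stage_paths` is not a function in the project. It encodes only the two path templates and six stage names given above and does not generate frontmatter.

```python
import re

# Stage names as listed in the stage_outputs convention.
STAGES = ("lit-review", "topology", "model", "data", "build", "writeup")
# Lowercase words joined by single hyphens, e.g. "human-psych-variation".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def stage_paths(topic: str, stage: str) -> tuple:
    """Return (raw draft path, polished .mdx path) for one topic/stage pair."""
    if not KEBAB_CASE.fullmatch(topic):
        raise ValueError(f"topic must be kebab-case, got {topic!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    draft = f"stage_outputs/{topic}/{stage}.md"
    published = f"src/content/ai_research/{topic}/{stage}.mdx"
    return draft, published


draft, published = stage_paths("human-psych-variation", "topology")
print(draft)
print(published)
```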
The interactive D3 graph for this topology lives at src/components/research/PsychVariationGraph.tsx and is mounted in src/content/ai_research/human-psych-variation/topology.mdx via client:load.
Generating function for human psychological variation. One equation per person; variance decomposition follows. Closed-form pieces: Crow–Felsenstein AM inflation, Wilson-Effect saturation, genetic-nurture additive split, multivariate sex-difference Mahalanobis D. Twin / SNP / within-family heritability are projections of the same decomposition. Interactive dashboard included.
TLDR
The topology answered “what depends on what?”. The formalization answers a sharper question: given a person, where does their phenotype come from in expectation? The answer is a single generating function that, once written down, dissolves several apparent paradoxes in the field — most importantly the gap between twin heritability, SNP heritability, and within-family heritability (they estimate different sums of the same underlying components, and the differences are informative).
The spine of this stage is one equation. Phenotype P for a person in a population is P = A_d + A_i + A_LD + C + E_m + E_s + I, with each term a contribution from a distinct mechanism: direct genetic effects from the person’s own transmitted alleles, indirect genetic effects from parental (and broader-family) genomes operating through the environment they create, assortative-mating-induced linkage among causal variants, residual shared environment, measured non-shared environment, stochastic developmental noise, and gene-environment interaction terms. Variance decomposition follows directly, and is block-orthogonal rather than fully orthogonal: V(P) = ΣV(component) + 2·Cov(A_d, A_i) + 2·Cov(A_d, E_m) + 2·Cov(A_d, C) + V(I). The cross-terms are the formal home of every gene-environment correlation finding in the literature; pretending they are zero is the most common modeling error. Three closed-form pieces drop out — the Crow–Felsenstein assortative-mating partition V(A_LD) = h²_obs · r_δ with r_δ = m·h²_obs (the dashboard partitions h² rather than inflating it; the equilibrium is reached in 5–10 generations of stable assortment), the Wilson-Effect logistic curve h²(t) = h²_∞ / (1 + exp(−k·(t − t_50))), and the method gradient that says twin h² ≥ SNP h² ≥ within-family h² with the gaps decomposable into AM-LD, indirect-genetic, and rare-variant pieces.
A second module handles the multivariate sex-difference algebra, because the single largest framing trap in this field is the gap between univariate Cohen’s d (typically 0.2–0.6 across personality dimensions) and the multivariate Mahalanobis distance D, where D² = Δμᵀ·Σ⁻¹·Δμ (D can reach 2.7 when traits are weakly correlated and you stack 15 of them, as in Del Giudice 2012). The same data, two numbers, opposite-sounding stories, and both are correct. The formalization makes the bridge explicit so the reader can dial univariate d’s and inter-trait correlations and watch D move.
What this stage does not formalize: the Plomin/Turkheimer interpretation of polygenic scores (verbal disagreement, no candidate equation), the mechanism behind the Gender Equality Paradox (three live hypotheses with no shared formalism), and the magnitude of AM-correction across the full cross-disorder genetic-correlation matrix (active research, methods just emerging). These remain at the observation stage; premature math here would mask uncertainty rather than reduce it. The L4 Lewontin firewall is preserved as a structural property of the model: the entire generating function is within-population, and nothing in it licenses between-population mean inference.
[Interactive dashboard panels: Inputs · Variance decomposition · Method gradient · Assortative-mating partition]
Wilson h²(t) is the AM-equilibrium population heritability V(A_AM)/V(P). The Crow-Felsenstein partition splits V(A_AM) into V(A_d) (clean direct, what within-family designs estimate) and V(A_LD) (population-level linkage among trait-relevant alleles induced by non-random mating). Note that classical twin h² (Falconer) is biased downward relative to V(A_AM)/V(P) by the factor (1 − m_A), because AM raises the DZ correlation relative to the MZ correlation. It is, however, typically inflated upward by EEA violations and genetic-nurture leakage, and for socially-structured traits the net effect is upward. V(A_i) is added on top as the variance contribution of genetic nurture. The gap between empirical twin h² and within-family h² for socially-structured traits is dominated by genetic nurture and EEA, not AM (see the §2.2 caveat).
How to read this stage
The dashboard above is the artifact. Everything below is the spec.
In plain language: when researchers report a “heritability of 0.50,” what is being claimed is that if you took the variance in a trait across a population and asked how much of it tracks genetic differences, half of it does. It is a statement about the population’s variance, not about any single person, and not about between-population differences. It says nothing causal beyond that: high heritability is fully compatible with large environmental effects (height is ~80% heritable and has risen ~10 cm in a century). Different methods estimating “heritability” answer slightly different questions: twin studies pick up the broadest definition, within-family designs the narrowest. The gap between them is informative.
The stage formalizes that picture by writing one equation per person, decomposing it into named pieces, and showing how each measurement method projects onto a different subset of the pieces. Three closed-form sub-equations follow (assortative-mating inflation, Wilson-Effect age curve, genetic-nurture split). A second module addresses the same algebra applied to group differences — most prominently sex differences, but the framework is general. The dashboard lets you turn the knobs and watch the consequences.
You can read this top-down (TLDR → equation → closed forms → boundary conditions) or bottom-up (play with the dashboard, then come back to the equations when something surprises you). Either order works. The cruxes section at the end (§12) is where the load-bearing assumptions live; if any one fails, parts of the picture have to be rebuilt.
1. Move I’m making
This stage is a decomposition + generating function + integration, in that order:
- Decomposition: orthogonalize phenotypic variance into mechanism-specific components, with explicit non-orthogonal Cov(G,E) and interaction terms as the principled exceptions.
- Generating function: write the per-person phenotype as a deterministic function of those components plus stochastic noise. The variance decomposition follows by taking V(·) of the generating function.
- Integration: show that twin, SNP, and within-family heritability estimators are projections of the same underlying decomposition onto different observable subspaces. The Wilson Effect, AM inflation, and genetic-nurture findings then read as motion of those projections, not as separate phenomena.
What’s not ready: anything in the topology marked O (open), and the polygenic-score causal-vs-summary debate, where the underlying disagreement isn’t yet a formal one.
2. The generating function
For a single person i in a population at developmental time t, sampled from a stable mating regime:
P_i(t) = A_d,i + A_i,i + A_LD,i + C_i + E_m,i + E_s,i + I_i + μ(t)
| Term | Mechanism | Source identity |
|---|---|---|
| A_d | Direct genetic: additive effect of the person’s own transmitted causal alleles, evaluated as if mating were random | Σ_k β_k · g_{ik} over causal SNPs k |
| A_i | Indirect genetic (genetic nurture): additive effect of parents’ (and extended-family) genotypes operating through the rearing environment | parents’ PGS × environmental transmission coefficient |
| A_LD | Assortative-mating LD inflation: additional additive variance induced by linkage among causal variants from non-random mating | At AM equilibrium, V(A_d) + V(A_LD) = h²_obs; the partition is V(A_LD)/h²_obs = r_δ. |
| C | Shared environment residual: environmental effects shared by siblings not already captured by A_i. Adult personality: ~0. Education / religiosity / politics: nonzero | |
| E_m | Measured non-shared environment: identifiable causes (lead, schooling, head injury, peer composition, nutrition) | each enters with a measured causal coefficient, e.g. lead: β ≈ −6.2 IQ points as blood lead rises from 1 to 10 µg/dL |
| E_s | Stochastic developmental noise: unmeasured non-shared variance from developmental contingencies, immune/microbial factors, microscale neural variation, and measurement error | the unmodeled residual; ~50% of personality variance |
| I | Interaction terms: G×E, G×G (epistasis), G×age. As of 2025 evidence, generally small at PGS-by-environment scale; large only at extreme environmental insults | residual non-additivity |
| μ(t) | Population mean at age t: not a person-level term but the developmental trajectory the person grows through | calibrated to age-norm tables |
Why this form: this is the additive-decomposition default of quantitative genetics extended with the two corrections that the 2018–2025 literature has installed into the field: separating A_d from A_i (Kong 2018, Young 2022) and separating A_d from A_LD (Border 2022, Yengo 2018, Wainschtein 2025). Earlier formulations folded both A_i and A_LD into A_d, and so overstated how much of the population-level genetic signal is direct biological causation. The within-family literature is what made these terms separately estimable.
Scope note — scalar trait, not g-loaded vector: P_i(t) is written as a scalar for one trait at a time. For cognitive ability, this collapses an underlying multi-ability structure (g + specific abilities, the CHC hierarchy) into a single phenotypic measure. The collapse is faithful when reporting g-loaded composite scores (e.g., full-scale IQ), and reasonable for any single primary ability. It is not faithful when the question is “how much of A_d for cognition is g versus specific abilities” — that requires a multivariate extension where each ability gets its own decomposition and g enters as a latent common factor across them. The topology’s foundational assumption A3 (g exists as a real dimension) lives at this level: the model below operates inside a single ability/composite and inherits g as a property of which ability is being measured rather than as a structural component. For sex differences (Module B, §3.4), the multivariate extension is necessary by construction; that’s why it appears as a separate module.
2.1 Variance decomposition
Taking variance of the generating function and tracking the cross-terms:
V(P) = V(A_d) + V(A_i) + V(A_LD)
+ V(C) + V(E_m) + V(E_s)
+ 2·Cov(A_d, A_i) ← genetic nurture is correlated with direct effects (parents pass both)
+ 2·Cov(A_d, E_m) ← active rGE: people select environments matching propensities
+ 2·Cov(A_d, C) ← passive rGE residual (small once A_i is split out)
+ V(I)
The off-diagonal Cov terms are why “orthogonal decomposition” is the wrong frame for this system. The system is block-orthogonal: the additive components are roughly orthogonal to the residual environment but not to each other, and the cross-terms are the formal home of every gene-environment correlation finding in the literature. Pretending they’re zero is the single most common modeling error.
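To see why zeroing the cross-terms is an error, the sketch below draws people from the generating function with made-up component variances, switches on only Cov(A_d, A_i), sets V(I) and the other covariances to zero, and compares the sample variance of P with the diagonal sum alone. All numbers (the six variances and the correlation `RHO`) are illustrative assumptions, not calibrated values.

```python
import math
import random

# Illustrative variance components (hypothetical values, not estimates for any trait).
V = {"A_d": 0.30, "A_i": 0.05, "A_LD": 0.05, "C": 0.05, "E_m": 0.10, "E_s": 0.40}
RHO = 0.3  # assumed correlation between A_d and A_i (parents pass both)


def analytic_variance(v, rho):
    """V(P) from the decomposition with only the 2·Cov(A_d, A_i) cross-term nonzero."""
    cov_d_i = rho * math.sqrt(v["A_d"] * v["A_i"])
    return sum(v.values()) + 2 * cov_d_i


def simulated_variance(v, rho, n=100_000, seed=1):
    """Draw n people from the generating function and return the population variance of P."""
    rng = random.Random(seed)
    sd = {name: math.sqrt(var) for name, var in v.items()}
    phenotypes = []
    for _ in range(n):
        z_d, z_i = rng.gauss(0, 1), rng.gauss(0, 1)
        a_d = sd["A_d"] * z_d
        # A_i reuses a fraction rho of A_d's standardized draw, so Corr(A_d, A_i) = rho.
        a_i = sd["A_i"] * (rho * z_d + math.sqrt(1 - rho**2) * z_i)
        independent = sum(rng.gauss(0, sd[k]) for k in ("A_LD", "C", "E_m", "E_s"))
        phenotypes.append(a_d + a_i + independent)
    mean = sum(phenotypes) / n
    return sum((p - mean) ** 2 for p in phenotypes) / n


naive = sum(V.values())                 # cross-term wrongly set to zero
with_cross = analytic_variance(V, RHO)
simulated = simulated_variance(V, RHO)
print(f"naive {naive:.3f}  with cross-term {with_cross:.3f}  simulated {simulated:.3f}")
```

The simulated variance tracks the version with the cross-term, not the naive diagonal sum.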
2.2 Heritability identities
Three quantities are estimable from data; each picks up a different subset of the variance terms. The mapping is more subtle than a casual reading of the literature suggests, and it is worth getting right because the public-discourse confusion about “twin studies overestimate” turns on this exact algebra.
The non-obvious point about twin h²: V(A_i) (genetic nurture) is shared identically by MZ and DZ co-twins, because they share the same parents. Under a correctly specified ACE model, this variance lands in C, not A. So a faithful classical twin model does not count genetic nurture as heritability. The empirical observation that twin h² > within-family h² (e.g., for EA: 0.40 vs ~0.15) is therefore not due to twin h² capturing A_i directly. It comes from model-misspecification leakages that push in opposite directions. Unmodeled assortative mating breaks the ACE assumption rDZ_A = 0.5 (the true sibling additive correlation under AM is 0.5·(1+r_δ)), which raises rDZ and biases twin h² downward. Pushing the other way, parents may treat MZ pairs more alike than DZ pairs, so the equal-environments assumption fails and genetic nurture is no longer shared equally across the two twin types; both effects inflate rMZ relative to rDZ and leak variance into A. For socially-structured traits the upward leakages win (see the caveat on Falconer’s AM bias below).
| Estimator | What it estimates (correctly specified) | Practical leakages |
|---|---|---|
| Twin h² (classical ACE: 2·(rMZ − rDZ)) | V(A_d) + V(A_LD) | EEA violations and unequally shared genetic nurture leak some V(A_i) and V(C) into A; unmodeled AM biases the estimate down by the factor (1 − m_A). Empirically, classical twin h² for EA exceeds within-family by ~0.20–0.25. |
| SNP h² (GREML, LDSC on population GWAS) | V(A_d, common) + V(A_LD, common) + V(A_i, common)·attenuated | Population GWAS effect sizes β_pop = β_d + k·β_i (where k is the AM coupling between transmitted and non-transmitted alleles), so SNP h² is inflated by some V(A_i), but attenuated relative to the full V(A_i) because k < 1. Excludes rare variants. |
| WGS h² (Wainschtein 2025) | SNP h² + V(A_d, rare) + V(A_LD, rare) | Closes the rare-variant gap; same A_i contamination as SNP h² unless within-family. |
| Within-family h² (sib-FE, MZ-discordant, parent-offspring trios) | V(A_d) | Removes A_i and A_LD cleanly; leaves direct additive only. With WGS: V(A_d) + V(A_d, rare). |
This is the method gradient (S2 in the topology):
twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h²
The gaps are not measurement error. They are the data’s way of telling you how much of “heritability” is structural (AM-LD), how much is environmental-via-parents (A_i), and how much depends on rare variants common-variant arrays cannot tag.
For educational attainment in 2025: classical twin h² ≈ 0.40, common-variant SNP h² ≈ 0.20–0.25, WGS h² ≈ 0.30 (with rare-variant contribution), within-family additive ≈ 0.15.
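The arithmetic behind these numbers is three subtractions. A minimal sketch, assuming the midpoint 0.225 for the quoted 0.20–0.25 SNP range; the label on each gap follows the estimator table’s reading of what separates adjacent methods and is interpretive, not something the subtraction itself identifies.

```python
def gradient_gaps(twin, wgs, snp, within):
    """Split the method gradient into its three adjacent gaps (fractions of V(P))."""
    ordered = [twin, wgs, snp, within]
    if any(upper < lower for upper, lower in zip(ordered, ordered[1:])):
        raise ValueError("expected twin >= WGS >= SNP >= within-family")
    return {
        # Read as classical-ACE leakage (EEA, genetic nurture) net of the AM downward bias.
        "twin_minus_wgs": twin - wgs,
        # Read as rare-variant direct and AM-LD variance that arrays cannot tag.
        "wgs_minus_snp": wgs - snp,
        # Read as common-variant AM-LD plus the attenuated genetic-nurture share.
        "snp_minus_within": snp - within,
    }


# EA in 2025; snp=0.225 is the midpoint of the quoted range (an assumption).
gaps = gradient_gaps(twin=0.40, wgs=0.30, snp=0.225, within=0.15)
for name, size in gaps.items():
    print(f"{name}: {size:.3f}")
```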
Important caveat on Falconer’s AM bias (added in pass 6 after a reviewer correction). The “What it estimates” column above is exact only under random mating. Under positive AM, Falconer’s formula 2·(rMZ − rDZ) is biased downward by factor (1 − m_A) where m_A ≈ m·h² — fraternal twins share more than 50% of trait-relevant alleles because their parents are genetically more similar than chance, raising rDZ relative to rMZ and shrinking the formula’s output. So Falconer estimates [V(A_d) + V(A_LD)] · (1 − m_A) / V(P), not V(A_AM)/V(P) directly.
This matters for interpreting the gap. When classical twin h² > within-family h² for socially-structured traits (EA: 0.40 vs 0.15), the gap is dominated by other classical-ACE biases — primarily the equal-environments assumption (MZ co-twins are treated more similarly than DZ co-twins, inflating MZ correlation) and genetic-nurture leakage (V(A_i) leaks into A under model misspecification rather than landing cleanly in C) — partially offset by the AM downward bias. AM is a real phenomenon at the population level (Crow-Felsenstein V(A_LD) inflation; see §3.1 below) but it does not, on net, drive the twin-vs-within-family gap. The dominant inflation source is genetic nurture and EEA, with direct empirical anchors in Kong 2018 (non-transmitted PGS effect = 29.9% of transmitted for EA) and Okbay 2022 EA4 (within-family direct ~50% of population PGI). Within-family designs control for AM, EEA, and genetic nurture simultaneously.
This is the single calculation a careful reader of “twin studies vs molecular studies” headlines should be able to do. The numbers don’t disagree; they answer different questions.
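That calculation can be run as a toy model. The function below assumes rMZ = a + c (plus any extra MZ-only environmental similarity) and rDZ = 0.5·(1 + m_A)·a + c with m_A ≈ m·a, as in the Falconer caveat above. The `eea_extra` knob and all input values are hypothetical, chosen only to show the AM and EEA biases pulling in opposite directions.

```python
def falconer_h2(a, c, m, eea_extra=0.0):
    """Falconer's 2·(rMZ − rDZ) when the true model has AM and, optionally, an EEA violation.

    a          true AM-equilibrium additive share V(A_AM)/V(P)
    c          shared-environment share, identical for MZ and DZ pairs
    m          spousal phenotypic correlation; m_A ≈ m·a
    eea_extra  extra environmental similarity for MZ pairs only (hypothetical knob)
    """
    m_a = m * a
    r_mz = a + c + eea_extra
    r_dz = 0.5 * (1 + m_a) * a + c  # AM lifts DZ additive sharing above one half
    return 2 * (r_mz - r_dz)


clean = falconer_h2(a=0.40, c=0.10, m=0.4)                  # 0.40 · (1 − 0.16)
leaky = falconer_h2(a=0.40, c=0.10, m=0.4, eea_extra=0.05)  # same, plus 2 · 0.05
print(f"AM only: {clean:.3f}   AM + EEA violation: {leaky:.3f}")
```

With AM alone the estimate falls below the true 0.40; a modest EEA violation more than cancels that shortfall.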
3. Closed-form pieces
Three components admit clean equations. The rest are calibrated empirically.
3.1 Assortative-mating inflation (Crow–Felsenstein)
There are two ways to use the AM-inflation formula, and they answer different questions.
Forward problem (rarely the relevant one): given the random-mating heritability h²_rm of a trait, what is the equilibrium heritability after stable AM? The answer is a fixed-point coupling r_δ = m · h²*, V_A* = V_A / (1 − r_δ), h²* = V_A* / (V_A* + V_E), reached in ~5–10 generations of stable assortment (Crow & Felsenstein 1968). One-iteration approximation: r_δ ≈ m · h²_rm, inflation ≈ 1 / (1 − m·h²_rm).
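The forward fixed point can be solved numerically. A minimal sketch, assuming variances scaled so V_A + V_E = 1 under random mating; the loop iterations are a numerical device and should not be read as generations. The function name and inputs are illustrative.

```python
def am_equilibrium(h2_rm, m, tol=1e-12, max_iter=10_000):
    """Solve r_δ = m·h²*, V_A* = V_A/(1 − r_δ), h²* = V_A*/(V_A* + V_E) by fixed-point iteration."""
    v_a, v_e = h2_rm, 1.0 - h2_rm
    h2 = h2_rm
    for _ in range(max_iter):
        r_delta = m * h2
        v_a_star = v_a / (1.0 - r_delta)
        h2_next = v_a_star / (v_a_star + v_e)
        if abs(h2_next - h2) < tol:
            return h2_next, m * h2_next
        h2 = h2_next
    raise RuntimeError("fixed-point iteration did not converge")


h2_star, r_delta = am_equilibrium(h2_rm=0.50, m=0.4)
one_step_inflation = 1 / (1 - 0.4 * 0.50)  # one-iteration approximation 1/(1 − m·h²_rm)
print(f"h²* ≈ {h2_star:.4f}, r_δ ≈ {r_delta:.4f}, one-step inflation ≈ {one_step_inflation:.3f}")
```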
Inverse / partition problem (what the dashboard does): given the AM-equilibrium population additive variance V(A_AM)/V(P) = h²_obs, partition it into the random-mating-equivalent direct component V(A_d) and the AM-induced LD inflation V(A_LD):
r_δ ≈ m · h²_obs
V(A_d) = h²_obs / (1 + r_δ + r_δ² + …) = h²_obs · (1 − r_δ)
V(A_LD) = h²_obs − V(A_d) = h²_obs · r_δ
The Wilson curve gives h²_obs(t) directly, so the partition uses r_δ = m · h²_obs(t) with no iteration needed. This is what the dashboard implements. Pass-2 versions of the dashboard erroneously inflated h²_obs on top of itself, pushing twin h² above 1.0 at high parameter values; pass-4 corrected this.
Note on what h²_obs should represent here (added in pass 6 after a reviewer correction). The partition formula is a clean population-level decomposition of V(A_AM). Different estimators recover V(A_AM)/V(P) with different biases: SNP-based heritability (GREML / LDSC on unrelated individuals) recovers it approximately without bias; classical twin h² (Falconer) recovers V(A_AM)/V(P) · (1 − m_A), biased downward by AM and partially offset upward by EEA violations and genetic-nurture leakage. The dashboard’s Wilson-fit twin estimates conflate these biases. The partition formula’s empirical validation is the match against SNP-based AM-LD estimates: Yengo 2018 measures V(A_LD)/V(A) for height at 14–23% empirically, matching the formula’s prediction of m·h² ≈ 0.25 · 0.85 ≈ 21%. For socially-structured traits where Falconer twin h² is itself substantially inflated by EEA + genetic nurture, applying the partition formula to twin h² over-attributes the partition share to AM relative to its true population-level magnitude.
Worked anchors:
- Educational attainment with m ≈ 0.4, h²_obs ≈ 0.40 (twin) → r_δ ≈ 0.16, V(A_d) ≈ 0.34, V(A_LD) ≈ 0.06. Caveat: the h²_obs ≈ 0.40 here is the Falconer twin estimate, which is a biased proxy for the AM-equilibrium V(A)/V(P); applying the partition to the SNP-based estimate (~0.13) would give a smaller absolute V(A_LD).
- Height with m ≈ 0.25, h²_obs ≈ 0.85 → r_δ ≈ 0.21, V(A_d) ≈ 0.67, V(A_LD) ≈ 0.18, which matches the 14–23% empirical “AM-inflated” share Border et al. and Yengo et al. report (this is the trait where the partition’s empirical validation is cleanest, because Falconer-bias vs SNP-h² discrepancies are smaller for height than for socially-structured traits).
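The inverse partition is three lines of arithmetic. The sketch below reproduces both worked anchors; the function name is illustrative, and the EA input carries the same caveat as above (it is the Falconer twin estimate, not a clean V(A_AM)/V(P)).

```python
def am_partition(h2_obs, m):
    """Split AM-equilibrium additive variance h²_obs into V(A_d) and V(A_LD) (§3.1 inverse problem)."""
    r_delta = m * h2_obs
    v_ad = h2_obs * (1.0 - r_delta)  # random-mating-equivalent direct component
    v_ald = h2_obs * r_delta         # AM-induced LD inflation
    return r_delta, v_ad, v_ald


for trait, m, h2 in [("educational attainment", 0.4, 0.40), ("height", 0.25, 0.85)]:
    r_delta, v_ad, v_ald = am_partition(h2, m)
    print(f"{trait}: r_δ ≈ {r_delta:.3f}, V(A_d) ≈ {v_ad:.3f}, V(A_LD) ≈ {v_ald:.3f}")
```

By construction V(A_d) + V(A_LD) = h²_obs, so the partition never pushes heritability above its observed value.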
Cross-trait AM (m_xy ≠ 0) extends the same logic to off-diagonal entries of the genetic-covariance matrix and is the formal reason E7 finds R² = 0.74 between phenotypic-cross-mate correlations and genetic-correlation estimates. The cross-trait AM result (Border 2022) survives independently of the within-trait Falconer-bias issue: it’s about between-trait LD inflating reported genetic correlations between disorders, which is empirically validated and not in dispute.
3.2 Wilson-Effect saturation curve
Heritability of cognitive ability rises with age because active rGE (G1) compounds: as children gain agency, they select environments matching their genetic propensities, amplifying genetic variance and shrinking shared environment. The empirical age curve from Bouchard 2013 and Briley & Tucker-Drob 2013 is sigmoidal — slow rise in early childhood, fastest gain in late childhood / early adolescence, saturation in late adolescence. A logistic gives a clean three-parameter fit:
h²(t) = h²_∞ / (1 + exp(−k_h · (t − t_50)))
With h²_∞ ≈ 0.80, t_50 ≈ 9 years (age at half-asymptote), and k_h ≈ 0.30/year: h²(5) ≈ 0.19, h²(10) ≈ 0.46, h²(15) ≈ 0.69, h²(25) ≈ 0.79. These match Bouchard’s anchors within ~3 percentage points across the full developmental range.
(Earlier passes used a saturating-exponential h²_∞ − (h²_∞ − h²_0)·exp(−k·t), which rises too fast at the young end — it produced h²(5) ≈ 0.52 for cognition vs the empirical ~0.20. The logistic form is the smallest functional change that fits the empirical sigmoidal pattern.)
The shared-environment trace runs an inverse path with a non-zero asymptote, since shared environment for cognition does not actually drop to zero in adulthood (~0.05 plateau is well-attested):
c²(t) = c²_∞ + (c²_0 − c²_∞) · exp(−k_c · t)
Cognition: c²_0 ≈ 0.50, c²_∞ ≈ 0.05, k_c ≈ 0.15/year. For Big Five personality, c²_∞ ≈ 0 is appropriate (shared family environment effectively vanishes for personality by adulthood). For educational attainment and religiosity, c²_∞ ≈ 0.10–0.15 should be substituted — these are exception traits where shared environment persists throughout life.
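Both curves are easy to evaluate directly. A minimal sketch using the cognitive defaults quoted above as keyword arguments; the function names are illustrative, and other trait classes are handled by overriding `c2_inf` as the text describes.

```python
import math


def h2_wilson(t, h2_inf=0.80, t50=9.0, k_h=0.30):
    """Logistic Wilson-Effect curve h²(t) = h²_∞ / (1 + exp(−k_h·(t − t_50)))."""
    return h2_inf / (1.0 + math.exp(-k_h * (t - t50)))


def c2_shared(t, c2_0=0.50, c2_inf=0.05, k_c=0.15):
    """Shared-environment decay c²(t) = c²_∞ + (c²_0 − c²_∞)·exp(−k_c·t)."""
    return c2_inf + (c2_0 - c2_inf) * math.exp(-k_c * t)


for age in (5, 10, 15, 25):
    print(f"age {age:>2}: h² ≈ {h2_wilson(age):.2f}, c² ≈ {c2_shared(age):.2f}")

# Trait-class substitutions from the text: Big Five c2_inf ≈ 0; EA / religiosity c2_inf ≈ 0.10–0.15.
```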
Both formulas are phenomenological — the parameters are not derived from a deeper model. They are calibration knobs for the dashboard.
3.3 Genetic-nurture decomposition (additive form)
Define g_T as the offspring’s transmitted-allele PGS and g_NT as the parental non-transmitted-allele PGS. Then:
A_d = β_d · g_T
A_i = β_i · g_NT
Empirically (Kong 2018, Wang 2021, Okbay 2022, Howe 2022):
β_i / β_d ≈ 0.3 – 0.5 (educational attainment)
β_i / β_d ≈ 0.0 – 0.1 (height, BMI)
β_i / β_d ≈ 0.4 – 0.6 (cognitive performance)
β-level vs variance-level. The ratio β_i/β_d quoted above is at the regression-coefficient level. Translating to a variance contribution requires squaring (for the pure variance term) and an explicit cross-term:
V(A_i) = β_i² · V(g) = (β_i/β_d)² · V(A_d)
2·Cov(A_d, A_i) = 2·k · β_d · β_i · V(g) = 2·k · (β_i/β_d) · V(A_d)
Where k is the AM-induced correlation between an offspring’s transmitted-allele PGS and the parental non-transmitted-allele PGS. Under random mating k ≈ 0 (Mendelian segregation makes them independent). Under stable AM, k > 0 because spousal phenotypic correlation creates correlation between mom-transmitted alleles and dad-non-transmitted alleles (and vice versa); for AM-strong traits (EA, height) k is empirically in the 0.1–0.5 range, depending on the strength and stability of assortment.
This means 2·Cov(A_d, A_i) is not generally larger than V(A_i). For EA with β_i/β_d ≈ 0.4, V(A_d) ≈ 0.15, and a moderate k ≈ 0.2: V(A_i) ≈ 0.024, 2·Cov ≈ 2·0.2·0.4·0.15 = 0.024. Total genetic-nurture variance contribution ≈ 0.048 — modest, on the same order as V(A_i) itself.
The dashboard displays only the pure V(A_i) = (β_i/β_d)² · V(A_d) slice as a clean variance bucket. The cross-term 2·Cov(A_d, A_i) is the leakage path that makes empirical twin h² (under unmodeled AM) exceed the dashboard’s “Twin h² (ACE)” output. It is acknowledged in the help text rather than allocated to a separate bar segment, partly because k is poorly constrained empirically and partly because adding a cross-term slice would over-clutter the visualization without changing the qualitative picture.
The relation V(A_i) + 2·Cov(A_d, A_i) ≈ V_PGS,population − V_PGS,within-family is approximate but useful: it turns “missing heritability after within-family correction” from a puzzle into an order-of-magnitude measurement. Within the model, the left-hand side equals exactly (β_i/β_d) · V(A_d) · (2k + (β_i/β_d)), which depends on k and shrinks toward V(A_i) alone as AM weakens (k → 0).
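The §3.3 identities reduce to a few multiplications. A minimal sketch, taking β_i/β_d, V(A_d) and k as inputs (k is the poorly constrained one) and reproducing the EA worked numbers; the function name is illustrative.

```python
def genetic_nurture_variance(ratio, v_ad, k):
    """Return (V(A_i), 2·Cov(A_d, A_i), total) from the §3.3 identities.

    ratio  β_i / β_d at the regression-coefficient level
    v_ad   direct additive variance V(A_d)
    k      AM-induced correlation between g_T and g_NT (≈ 0 under random mating)
    """
    v_ai = ratio**2 * v_ad            # pure variance term: (β_i/β_d)² · V(A_d)
    cross = 2.0 * k * ratio * v_ad    # cross-term: 2·k · (β_i/β_d) · V(A_d)
    return v_ai, cross, v_ai + cross


v_ai, cross, total = genetic_nurture_variance(ratio=0.4, v_ad=0.15, k=0.2)
print(f"V(A_i) ≈ {v_ai:.3f}, 2·Cov ≈ {cross:.3f}, total ≈ {total:.3f}")
```

Setting `k=0` drops the cross-term entirely, which is the random-mating limit.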
3.4 Multivariate sex-difference algebra (Module B)
For a trait vector x with covariance matrix Σ and group means μ_F, μ_M, the multivariate effect size is the Mahalanobis distance:
D² = (μ_F − μ_M)ᵀ · Σ⁻¹ · (μ_F − μ_M)
For uncorrelated traits with equal univariate effect sizes |d|, D² = n·d² so D = d·√n. For correlated traits, the inverse covariance structure either amplifies or shrinks D depending on whether sex-difference vectors are aligned with high-variance or low-variance directions of Σ.
Worked example. Take 15 personality dimensions (16PF), univariate |d| ≈ 0.5 on average, with positive inter-trait correlations averaging ρ ≈ 0.20. Then approximately:
D² ≈ d² · 1ᵀ · Σ⁻¹ · 1
≈ d² · n / (1 + (n − 1)·ρ̄) if Σ has a constant-correlation structure
≈ 0.25 · 15 / (1 + 14·0.20)
≈ 0.25 · 3.95
≈ 0.99
D ≈ 1.0
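The equicorrelated shortcut can be checked against a direct computation of D² = Δμᵀ·Σ⁻¹·Δμ. The pure-Python sketch below avoids NumPy and solves Σ·w = Δμ by Gaussian elimination rather than inverting Σ. The 15-trait, d = 0.5, ρ = 0.20 inputs are the illustrative values from the worked example; the two-trait comparison at the end is an added toy case for the alignment point made at the top of this section.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting (a is a list of rows)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis_d2(delta, sigma):
    """D² = Δμᵀ · Σ⁻¹ · Δμ, computed by solving Σ·w = Δμ."""
    w = solve(sigma, delta)
    return sum(d * wi for d, wi in zip(delta, w))


def equicorrelated(n, rho):
    """Unit-variance n×n correlation matrix with every off-diagonal entry equal to rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


n, d, rho = 15, 0.5, 0.20
d2_exact = mahalanobis_d2([d] * n, equicorrelated(n, rho))
d2_closed = d**2 * n / (1 + (n - 1) * rho)  # equicorrelated shortcut
print(f"D² exact {d2_exact:.4f}, closed form {d2_closed:.4f}, D ≈ {d2_exact ** 0.5:.2f}")

# Same univariate |d| on two traits (rho = 0.5), opposite alignment with Σ:
aligned = mahalanobis_d2([0.5, 0.5], equicorrelated(2, 0.5))   # high-variance direction
opposed = mahalanobis_d2([0.5, -0.5], equicorrelated(2, 0.5))  # low-variance direction
print(f"aligned D² {aligned:.3f}, opposed D² {opposed:.3f}")
```

With identical univariate effects, the low-variance alignment yields three times the D² of the high-variance one.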
The equicorrelated approximation undershoots Del Giudice 2012’s reported D = 2.71, and this gap is informative rather than a bug. Recovering 2.71 in the equicorrelated form would require average univariate |d| ≈ 1.36 (since 2.71² ≈ 7.34 and 7.34 / 3.95 ≈ 1.86 = d²), far above what 16PF or NEO papers report at the observed level. What Del Giudice actually did was use multigroup latent-variable modeling with measurement-error disattenuation: he corrected each factor’s d for unreliability and then computed D on the latent (true-score) means. Disattenuation magnifies effect sizes when reliability is well below 1.0, and aggregating across 15 factors then compounds the magnification. The honest summary is: at the level of observed (raw, pre-disattenuation) measurement, multivariate sex-difference D for personality is ~1.0–1.5; Del Giudice’s 2.71 is the latent-true-score analogue.
The intuition behind the algebra still holds: if men and women differ on dimensions that are weakly correlated with each other, every dimension contributes independent information, and D grows with √n. If they differ on highly correlated dimensions, the differences carry redundant information and D plateaus. But the gap between observed and disattenuated D is itself a substantive piece of the field’s debate — and worth flagging rather than papering over.
Why this matters for distortions. D3 (the “gender similarities” framing) cites univariate d ≈ 0.05 for math performance and reads it as evidence of broad similarity. D4 (pop-evpsych framing) cites multivariate D ≈ 2.71 and reads it as evidence of broad difference. Both citations are correct. The bridge equation shows that they are about different objects: a single dimension vs. a 15-dimensional space. Anyone who hasn’t internalized this algebra can be silently captured by either framing.
The algebra is general — not just for sex. D² = (μ_A − μ_B)ᵀ · Σ⁻¹ · (μ_A − μ_B) applies to any two-group comparison: sex, age cohort, occupational sample, clinical vs. control, urban vs. rural — anywhere group means are reported on a multivariate panel. The module is presented in sex-difference language because that is where the framing trap concentrates, but readers thinking about other group comparisons can use the same dashboard. The L4 firewall (§5.2) does not block this generalization at the within-population level; it only blocks the leap from within-population variance/distance estimates to between-population causal claims. A descriptive D between two samples is fine; a causal interpretation of that D requires assumptions the model does not provide.
3.5 PGS portability decay (deferred)
Topology Variant C: accuracy(distance) calibration from Ding et al. 2023 (r = −0.95 between genetic distance and PGS R² across 84 traits) is a clean candidate for a closed-form module. It is deferred to a future tool because it sits at the population-genetics boundary rather than inside the within-population generative process this stage formalizes. Listed as a follow-up.
4. Composing the parts: anchors the dashboard preserves
The dashboard above stitches §3.1, §3.2, and §3.3 into one panel. Its sliders set trait class, age, m, β_i/β_d, and rare-variant share. Its outputs are the variance decomposition and the three method-specific h² numbers. Four sanity-check anchors hold under the calibrated defaults:
- IQ at age 5 (cognitive): h²(5) ≈ 0.18, V(C) ≈ 0.26, V(A_d) ≈ 0.17. Matches Bouchard 2013.
- IQ at age 25 (cognitive, m=0.4, β_i/β_d=0.4): h²(25) ≈ 0.79, V(A_d) ≈ 0.54, V(A_LD) ≈ 0.25, V(A_i) ≈ 0.09, V(C) ≈ 0.06. Within-family h² (= V(A_d)) ≈ 0.54. That is more than three times the often-quoted EA within-family estimate of 0.15, because cognition is a higher-h² trait than education.
- Big Five across adulthood: h² ≈ 0.45, V(C) ≈ 0, V(E) ≈ 0.55. Effectively flat from age 5 onward.
- Variance budget closes: V(A_d) + V(A_LD) + V(A_i) + V(C) + V(E) = 1.0 by construction. Twin h² never exceeds the Wilson asymptote.
These are the calibration targets. The least obvious is anchor 4. Under default parameters, the previous dashboard pass let the variance budget overflow (twin h² > 1.0 at age 25 with m=0.4), which was a real bug. The current partition h²_obs = V(A_d) + V(A_LD) keeps the budget bounded by construction.
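The partition reproduces anchor 2 directly. The sketch below uses only the relations stated in the glossary, r_δ = m · h²_obs and V_A* = V_A / (1 − r_δ). Both function names are hypothetical and are not taken from the dashboard code. The second function shows one plausible way a budget overflow like the one described above can arise: the 1/(1 − r_δ) factor is applied again to a quantity that is already at equilibrium. The earlier bug may have taken a different form.

```python
def am_partition(h2_obs: float, m: float) -> tuple[float, float]:
    """Partition equilibrium twin h2 into (V(A_d), V(A_LD)).

    r_delta = m * h2_obs and V_A* = V_A / (1 - r_delta), so the random-mating
    component is V(A_d) = h2_obs * (1 - r_delta); V(A_LD) is the remainder.
    """
    r_delta = m * h2_obs
    v_ad = h2_obs * (1.0 - r_delta)
    return v_ad, h2_obs - v_ad

def inflate_again(h2_obs: float, m: float) -> float:
    """Hypothetical anti-pattern: applying 1 / (1 - r_delta) to a quantity that
    is already at equilibrium. Shown only to illustrate an overflow."""
    return h2_obs / (1.0 - m * h2_obs)

v_ad, v_ald = am_partition(0.79, 0.4)  # anchor 2: IQ at 25, m = 0.4
print(f"V(A_d) = {v_ad:.2f}, V(A_LD) = {v_ald:.2f}")  # V(A_d) = 0.54, V(A_LD) = 0.25
print(f"inflating again: {inflate_again(0.79, 0.4):.3f}")  # exceeds 1.0
```

Because V(A_d) + V(A_LD) always equals h²_obs, the partition cannot push the budget above 1. Re-inflating can.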
For traits without a dedicated dashboard class (educational attainment, height, religiosity, political affiliation), the user can approximate by choosing the closest class and adjusting sliders. EA-like behavior emerges from the cognitive class with m=0.4, with the caveat that h²(25) for EA is closer to 0.40 than 0.79. The dashboard's cognitive class is calibrated to IQ, not EA.
5. Boundary conditions and where the model breaks
The generating function is correct only inside its scope. Five boundaries are explicit:
- Severe psychiatric tail. The hyperpolygenic form A_d = Σ β_k g_{ik} assumes thousands of small effects. For early-onset autism with intellectual disability, single rare variants (CHD8, SCN2A) can carry effects of d > 1.0. The decomposition still works component by component, but A_d becomes dominated by a small number of large-effect alleles, which is effectively Mendelian rather than polygenic. The model should either widen its prior on individual β_k or hand off to a separate Mendelian module at the tail.
- Between-population mean differences (L4 firewall). Every term in the generating function is defined within a population at a stable mating regime. The model is structurally silent on between-population means: there is no μ_pop term to compare. Computing D² = (μ_pop1 − μ_pop2)ᵀ Σ⁻¹ (μ_pop1 − μ_pop2) is mathematically possible but requires assuming Σ_pop1 = Σ_pop2 and equal causal architecture across populations. Neither assumption is empirically supported, and the PGS-portability collapse in Ding 2023 is the empirical evidence that it fails. This is the L4 / Lewontin firewall encoded directly into model scope.
- Severe environmental insults. V(I) (interaction) is small at PGS-by-environment scale but large when environments cross a threshold (lead, alcohol, severe deprivation, iodine deficiency). The additive decomposition under-fits at thresholds. Use the model in the normal range, and switch to an explicit threshold-effect model at the extremes.
- Non-equilibrium AM. The Crow–Felsenstein formula assumes AM has reached equilibrium. In populations whose assortment regimes are changing rapidly (e.g. rapid shifts in educational stratification), the inflation factor is en route to the equilibrium value, not at it. Where assortment is rising, use the formula as an upper bound.
- Individual-level inference (L1). V(A_d) is a population variance. For a single person, A_d is a realization, not a partition. Statements like “70% of this individual’s intelligence is genetic” do not type-check against the model. The dashboard exposes population variance only.
6. Distortion-aware reading
Each component of the decomposition has a public-discourse failure mode. The model’s job is to make the failure visible, not to suppress it.
| Component | Common misreading | What the model says |
|---|---|---|
| V(A_d) (high) | “Genes determine outcomes” | Population variance. Says nothing about a specific person’s prospects. |
| V(A_i) (large) | “Family environment doesn’t matter” | The opposite: this term is family environment, mediated by parental genotypes that correlate with parental phenotypes. |
| V(A_LD) | Usually invisible to public discourse | Inflates V(A) at the population level by ~10–25% via AM-induced LD between trait-relevant alleles (Yengo 2018: 14–23% for height, matching the formula prediction). Does NOT on net inflate Falconer twin h²: AM actually biases Falconer downward, partially offsetting other classical-ACE biases (see §2.2 caveat). |
| Cov(A_d, E_m) (active rGE) | “People shape their environments” → therefore environments don’t matter | They matter: the covariance term is their effect, just non-orthogonal to genes. |
| Twin h² ≥ within-family h² | “Twin studies overestimate” | They estimate a different quantity (population additive variance vs. direct effect). Both are real. |
| Multivariate D large | “Sexes are categorically different” | D is a distribution distance; individuals across the distributions still overlap substantially. Dimensional, not taxonic. |
| Univariate d small | “Sexes are essentially the same” | True for the dimension cited, false in the multivariate space. |
D1 and D2 (the two heaviest distortions) both operate by selecting a subset of these readings. The model doesn’t resolve the political dispute, but anyone running the dashboard should be able to see why each side is technically correct about the term they’re highlighting and incomplete about the rest.
7. Adversarial + steelman
Four objections to the formalization itself. The strongest version of each, then the model’s honest response.
Objection 1 — Variance bookkeeping is not a causal model
The decomposition partitions variance into named components, but it never specifies why A_d produces phenotype P rather than the reverse. A regression coefficient β_d from a within-family GWAS is not a causal effect; it is a statistical association under specific design assumptions. Calling the decomposition a “generating function” is false advertising — it generates expected variance given parameters, not actual phenotype given a causal mechanism.
Steelman: This is the strongest objection because it is the same disagreement that drives O1 (Plomin vs Turkheimer). The model accommodates both readings rather than picking one: under the Plomin reading, β_d is a causal coefficient and the decomposition is generative in the strong sense; under the Turkheimer reading, β_d is a regression coefficient that happens to be unbiased under within-family identification, but the underlying biology is unspecified. Both readings predict the same variance budget, which is why the data hasn’t yet decided between them.
Response: Acknowledged. The model is more accurately described as a conditional variance generating function: given parameters, it generates the expected variance pattern. The causal interpretation of those parameters is exactly what is contested. The decomposition is valuable precisely because it lets both interpretations be expressed in a shared language. Stage 4 (data) is where the disagreement gets sharper, because the test is whether the within-family β_d moves under environmental intervention. Plomin predicts no, Turkheimer predicts yes, and the model can express both predictions cleanly.
Objection 2 — ACE assumptions are unrealistic enough that “twin h²” is not really estimating anything physical
The equal-environments assumption (EEA) fails, because MZ co-twins are treated more similarly than DZ co-twins. Shared-environment effects vary by zygosity, and non-shared environment for siblings is correlated with shared parental treatment. Stack the violations and the entire ACE framework is just a parametric reparameterization of the data, not an estimation of underlying components.
Steelman: Joseph and Richardson’s critique of behavior genetics rests partly on this argument: the assumptions that make twin h² meaningful are violated enough to make the resulting numbers epistemically empty. The strongest version isn’t that twin studies are “wrong” but that they’re under-determined — multiple causal worlds produce the same rMZ and rDZ patterns.
Response: Partially conceded. Classical ACE is under-determined and the assumption-violation issue is real. However, two empirical findings constrain the under-determination: (a) SNP-based heritability (which uses unrelated individuals and bypasses EEA entirely) recovers a substantial fraction of twin h² across major traits — for height about 60% with common SNPs alone (rising to ~80% with whole-genome sequencing that captures rare variants), for cognitive ability about 25–40%, for educational attainment about 30–50%. The fraction varies by trait but is consistently non-trivial — the EEA-bias-only explanation for twin h² is empirically untenable; (b) MZ-reared-apart studies (Bouchard 1990 and updates) reproduce the basic Wilson Effect pattern with EEA structurally absent. The model takes twin h² as an upper bound on direct + indirect additive variance, not as a precise estimate. The method gradient is what makes the imprecision survivable: comparing twin to within-family bounds the gap.
Objection 3 — Additive form misses dominance and epistasis
Dominance variance V_D and epistatic variance V_I (gene-gene interactions) are real and measurable. Twin studies fitting ADE models routinely find non-trivial V_D. The additive-only generating function is a simplification that loses information.
Steelman: For some traits — height (V_D ≈ 0), educational attainment (V_D ≈ 0–0.05) — dominance is small and the additive simplification is fine. For others — psychiatric disorders, where ADE models often outperform ACE — non-additive variance is potentially substantial. The additive-only model is not “wrong” so much as inappropriate for that subset of traits.
Response: Concede the scope limit. The generating function as written is for traits where polygenic-additive architecture dominates (which is most psychological traits, per Hill, Goddard & Visscher 2008’s argument that even where dominance exists, additive variance often captures most of the variance because of allele-frequency distributions). For severe psychopathology and other traits with substantial V_D, an extended model would replace A_d with A_d + D_d and add Cov(A_d, D_d) cross-terms. The dashboard does not currently expose this; the prose acknowledges the boundary in §5.
Objection 4 — Multivariate D conflates measurement structure with reality
Mahalanobis D depends on Σ. Σ is the within-sex covariance of measured traits, which depends on which traits you measure, how you measure them, and how the population varies. Different measurement panels produce different Ds for the same underlying difference. The disattenuated D = 2.71 from Del Giudice is not a property of human nature; it’s a property of the 16PF + the U.S. sample + the latent-variable model.
Steelman: This is correct and underappreciated. D is not a population parameter in the way that μ_F − μ_M is. It is a model-relative summary statistic. Two researchers using different but equally defensible measurement panels can produce D values that differ by a factor of two or more.
Response: Conceded fully. The multivariate-D module’s value is comparative, not absolute. Given a measurement structure, it tells you how much multivariate aggregation magnifies the apparent sex difference relative to any single dimension. Its pedagogical purpose is to show why the same data (a panel of dimensions) supports both “small per-dimension differences” and “large multivariate distance.” Neither claim is wrong, but neither is the whole answer. The dashboard surfaces the dependency on Σ via the ρ̄ slider so users can see how D moves under different correlation structures.
8. Open questions that the model exposes (Stage-4 inputs)
The formal apparatus makes four open questions sharper than verbal discussion alone:
- O1 (PGS interpretation). The decomposition treats β_d · g_T as a direct genetic term. Plomin’s “PGS is a real biological cause” reading takes β_d as a structural causal coefficient. Turkheimer’s “PGS is a summary of correlated environments” reading says β_d is contaminated by uncontrolled Cov(A_d, E_m). The two interpretations make different predictions about how β_d should change under environmental intervention. Stage 4 question: for traits with large enough within-family GWAS, does the within-family β_d move under intervention (schooling reform, nutrition shifts)? Plomin predicts it shouldn’t, and Turkheimer predicts it should.
- O3 (Gender Equality Paradox). The multivariate algebra in §3.4 shows that D depends on the inter-trait correlation structure Σ. If Σ differs between high-equality and low-equality societies, D will differ even if univariate μ_F − μ_M differences are fixed. Stage 4 question: does Σ (the personality covariance matrix itself) change across societies, or only the means? This is a different empirical question from “are the differences innate.”
- O6 (what E_s actually is). The model treats stochastic developmental noise as an unmodeled residual. As Stage 4 data accumulates, candidates (immune/microbial, peer-network, epigenetic, measurement error) can be peeled off into E_m, and the residual E_s should shrink. Stage 4 question: how much of the current ~50% personality E_s can be moved into E_m given current measurement panels?
- O7 (cross-disorder rg post-AM correction). Module 3.4’s bridge between cross-trait phenotypic correlations and genetic correlations under AM (Border 2022, LAVA-Knock 2024) gives a formal correction. Stage 4 question: applied at scale to the full psychiatric-disorder rg matrix, what fraction of the cross-disorder genetic correlations survive the correction?
The two questions deferred from Section 1 (PGS portability and the GEP causal mechanism) are not sharpened by the model — they require new measurement, not new math.
9. Handoff to Stage 4 (data pipeline)
The model defines five parameter sets that Stage 4 needs to populate:
| Parameter | Source | Trait coverage |
|---|---|---|
| β_d, β_i | Within-family GWAS (Howe 2022, Okbay 2022) | EA, height, BMI, cognitive ability, depressive symptoms, smoking (extending) |
| m (cross-spouse phenotypic correlation) | UK Biobank, HUNT, MoBa | EA, height, BMI, cognition, neuroticism (well-covered) |
| h²(t) calibration | Bouchard 2013, Briley & Tucker-Drob 2013 longitudinal twin | Cognition (well-covered); personality (sparse); psychopathology (very sparse) |
| Σ for sex-difference module | Del Giudice 2012, Schmitt 2008, Kaiser 2020 | 16PF, NEO, Big Five |
| share_rare | Wainschtein 2025 | Height, EA, several psychiatric traits (extending) |
The single highest-value Stage-4 deliverable: a per-trait table of (twin h², SNP h², WGS h², within-family h², m, β_i/β_d) at adulthood, ideally with cohort-by-age stratification. Most of the components already exist in published consortium summaries; the table is mostly aggregation, not new analysis.
10. Connection to adjacent topics
- Parent-to-Child Transmission (planned). The A_i term is the formal answer to “how much does parenting matter beyond genes for outcomes that look genetic.” That topic should adopt this generating function as its starting point and refine β_i by domain (cognition vs. personality vs. health behaviors) and by mechanism (vocabulary input, expectation-setting, neighborhood selection). The Nivard et al. 2024 finding that indirect genetic effects extend beyond the nuclear family implies β_i should be further decomposed into a parent-level term and a dynastic/extended-family term.
- Evolution-Modernity Mismatch (planned). The μ(t) population-mean trajectory is the formal home of secular shifts (Flynn rise, Flynn reversal, age-of-puberty drift). Within-cohort within-sibship designs are the cleanest separator of genuine environmental shifts in μ(t) from compositional or selection artifacts. The Pietschnig 2024 finding that the positive manifold itself may be weakening across recent cohorts suggests μ(t) is not a one-dimensional curve but a moving structure of which abilities are gaining or losing, which the current scalar form does not capture.
(A connection to a planned “Bedrock Generating Functions” topic was floated in pass 1 but dropped — the analogy was real but too loose to do useful work here, and any cross-domain claim should live in that topic’s own formalization rather than be asserted from this one.)
11. Glossary (formalization-specific additions)
This section’s symbols are listed in the order they appear in the generating function. The lit-review and topology glossaries cover the field-level terminology (h², SNP, GWAS, PGS, AM, rGE, GxE, etc.) and are not duplicated here.
| Symbol / term | Meaning |
|---|---|
| P_i(t) | Phenotype of person i at developmental age t. Scalar (for one trait at a time); see §2 scope note for the multi-ability extension. |
| A_d | Direct additive genetic component: Σ_k β_k · g_{ik} over causal SNPs the person inherits, evaluated as if mating were random. |
| A_i | Indirect additive genetic component (genetic nurture): additive effect of parents’ (and extended-family) genotypes operating through the rearing environment. |
| A_LD | AM-induced LD inflation: additional additive variance from non-random mating creating linkage among causal variants. |
| C | Shared-environment residual not already absorbed by A_i. |
| E_m / E_s | Measured non-shared environment (lead, schooling, etc.) / stochastic developmental noise (the unmodeled residual). |
| I | Interaction terms: G×E, G×G (epistasis), G×age. |
| μ(t) | Population-mean trajectory at age t (developmental norm, not a person-level term). |
| β_d / β_i | Direct / indirect genetic regression coefficients on phenotype, estimated from within-family / parental-genotype designs. |
| g_T / g_NT | Polygenic score from offspring’s transmitted alleles / parents’ non-transmitted alleles. |
| m | Cross-spouse phenotypic correlation (assortative-mating strength on the measured trait). |
| r_δ | Cross-spouse correlation in additive genetic value; equals m · h²_obs at AM equilibrium. The dashboard uses this directly (no fixed-point iteration) since Wilson h²(t) is already the equilibrium quantity. |
| k | AM-induced correlation between transmitted and non-transmitted alleles within parents; appears in the genetic-nurture variance identity (§3.3). |
| V_A* | Additive genetic variance at AM equilibrium; V_A* = V_A / (1 − r_δ) in the Crow–Felsenstein form. The dashboard observes V_A* directly via h²_obs and uses the formula to partition it into V(A_d) and V(A_LD). |
| h²(t) | Heritability as a function of age; logistic form h²_∞ / (1 + exp(−k·(t − t_50))) in §3.2, where k is the logistic rate (not the AM correlation k above). Earlier passes used a saturating exponential, which fit the asymptote but rose too fast in childhood; the logistic is the smallest functional change that captures the empirical sigmoidal pattern. |
| Σ | Trait-level (within-sex or within-group) covariance matrix used in multivariate-D calculation. |
| Mahalanobis D | Multivariate generalization of Cohen’s d: √(Δμᵀ Σ⁻¹ Δμ). |
| ρ̄ | Average inter-trait correlation in Σ; the equicorrelated approximation collapses D² to d²·n/(1+(n−1)ρ̄) (§3.4). |
| block-orthogonal | Decomposition where major components are orthogonal to the residual environment but cross-terms within components (e.g. Cov(A_d, A_i)) are explicit, not zero. |
| method gradient | The relationship twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h², driven by which components each estimator includes. |
12. Cruxes for this model
The topology had cruxes for the field. This stage’s cruxes are different — they are the load-bearing assumptions of the formalization itself. If any one flips, the model needs to be restructured.
| Crux | Load-bearing claim | What would flip it |
|---|---|---|
| C1 | Within-family GWAS effect estimates are unbiased estimates of β_d. The whole A_d / A_i separation depends on this. | A demonstration that within-family designs have a systematic confound (e.g., differential parental treatment that correlates with offspring genotype) that biases β_d by more than ~10%. So far Howe 2022 / Okbay 2022 within-sibship GWAS are mutually consistent and consistent with trio-based estimates, suggesting the confound is bounded. |
| C2 | AM equilibrium has been reached or is close enough that the partition relation r_δ = m·h²_obs holds. | A demonstration that recent population-scale shifts in assortment (educational stratification expansion since 1970, online dating since 2010) have moved populations far from equilibrium for psychologically-relevant traits — at which point the observed r_δ would lag the formula’s prediction. Currently no direct evidence the partition is mis-calibrated; would require longitudinal m-by-cohort data. |
| C3 | Hyperpolygenic architecture: A_d = Σ β_k g_{ik} over thousands of small effects, no single locus dominates. | Discovery that for a major psychological trait class, ~5–10 large-effect variants account for >50% of V(A_d). Currently true only for the severe psychiatric tail (autism with ID, severe schizophrenia spectrum), where the model already concedes scope (§5.1). Would generalize to mainstream cognition only if a CRISPR-era discovery overturned the polygenic consensus. |
| C4 | A_d, A_i, A_LD are jointly identifiable given the available designs. | A demonstration that twin/SNP/within-family/WGS estimators are not sufficient to disentangle all three (e.g., that AM-LD and rare-variant contributions are mutually confounded in a way no current design can break). This would force collapsing the decomposition or treating one component as a residual. Active concern: rare-variant heritability in WGS may itself be inflated by AM-LD among rare variants, which would muddy C4. |
| C5 | Equicorrelated Σ is a useful approximation for the multivariate sex-difference module. | A demonstration that real personality covariance matrices have block-structured (or low-rank) Σ that produces qualitatively different D from the equicorrelated approximation. Already partially true: 16PF has known higher-order factor structure, which is why the equicorrelated approximation undershoots Del Giudice’s latent-variable result. Crux holds in a weakened form: equicorrelated is useful pedagogically but not quantitatively for high-dimensional panels. |
The most consequential of these is C4. If A_d, A_i, and A_LD cannot be jointly identified by current designs, the variance decomposition reduces to a coarser partition (genetic-additive vs everything else), and the field-level dispute about how much “genetic” effect is environment-mediated remains parametrically unresolvable rather than just empirically pending.
Empirical pipeline that confronts the model's eight testable predictions with currently-published consortium estimates. Seven hold cleanly (AM partition, Wilson curve, multivariate-D gap, PGS portability decay, xAM inflation, environmental causes, G×E interaction-conditional); one (the cross-paper method gradient) is mixed in an informative way. Curated CSVs (downloadable) + Python pipeline + interactive findings panel.
TLDR
This stage takes the model’s eight concrete predictions about how human psychological variation breaks down — how much of trait-variance is genetic-direct vs. genetic-via-parents vs. assortative-mating-induced vs. measured-environment vs. gene-environment-interaction — and confronts each one with currently-published consortium numbers. Seven predictions hold cleanly. One — that the four standard heritability estimators (twin, whole-genome-sequence, common-SNP, within-family) should line up in a strict numeric ordering — is mixed across published papers because each paper uses different cohorts and methods, but holds within any single paper that runs the comparison properly. That “mixed” verdict turns out to be informative rather than a model failure: it tells you the cross-paper landscape is noisier than a literal subtraction of estimates suggests.
Headline empirical findings: assortative mating (people pairing with partners of similar traits) creates linkage between trait-relevant alleles, contributing a Crow-Felsenstein V(A_LD)/V(A_AM) share of ~20% for height, ~22% for educational attainment, ~36% for schizophrenia, ~33% for ADHD, ~36% for autism (the AM-strong psychiatric block; affective disorders sit lower at ~6–14%). These percentages are population-level decompositions of V(A) at AM equilibrium — not “fraction of twin h² explained by AM.” Falconer’s classical twin formula is itself biased downward by AM, and for socially-structured traits the empirical gap between twin h² and within-family h² is dominated by genetic nurture and equal-environments-assumption violations, not AM-induced LD (see §2 H2 caveats for the corrected interpretation). Heritability of cognitive ability rises from ~20% in early childhood to ~80% in adulthood along a logistic curve fitted to Bouchard 2013’s seven anchor points within 1.8 percentage points. Multivariate sex-difference effect sizes are large (16PF Mahalanobis distance D = 2.7) when computed at the latent-variable level with measurement-error disattenuation, but only D ≈ 1 at the raw observed level — the entire “Mars-and-Venus” framing trap lives inside that disattenuation correction, not inside the multivariate algebra. Polygenic scores trained on European-ancestry data lose ~37%, ~50%, and ~78% of their accuracy in South Asian, East Asian, and African ancestry samples respectively (Martin 2019), consistent with Ding 2023’s independent continuous-distance result of Pearson r = −0.95 across 84 traits. Cross-trait assortative mating accounts for ~74% of the variance in reported psychiatric cross-disorder genetic correlations (Border 2022, 132 trait pairs). 
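The portability percentages are fold-reductions in accuracy restated as losses. A score that is k times less accurate in a target sample has lost 1 − 1/k of its accuracy. The sketch below uses fold factors of 1.6, 2.0 and 4.5 because those values reproduce the quoted ~37%, ~50% and ~78% losses. They serve as an arithmetic check, not as independently sourced numbers. The function name is hypothetical.

```python
def accuracy_loss_pct(fold_reduction: float) -> float:
    """Percent of PGS accuracy lost when accuracy in the target sample is
    `fold_reduction` times lower than in the training ancestry."""
    if fold_reduction < 1.0:
        raise ValueError("expects a reduction, i.e. a fold factor >= 1")
    return 100.0 * (1.0 - 1.0 / fold_reduction)

# Fold factors chosen to reproduce the ~37%, ~50% and ~78% figures quoted above.
for ancestry, fold in [("South Asian", 1.6), ("East Asian", 2.0), ("African", 4.5)]:
    print(f"{ancestry}: {accuracy_loss_pct(fold):.1f}% of accuracy lost")
```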
The small set of measured environments with replicated causal effects on cognition is asymmetric: severe insults (lead, fetal alcohol, deprivation, malnutrition) cost 10–30 IQ points, while enrichment above normal yields at most a few points per intervention. And gene-by-environment interaction (V(I)) shows the classic Scarr-Rowe pattern of higher heritability at higher SES only in US samples (Tucker-Drob & Bates 2016 meta-analysis: a’ = 0.074, p < .0005); equity-buffered W. European / Australian samples show no such interaction (a’ = −0.027, n.s.) — the cross-national heterogeneity is exactly what the model predicts under “V(I) is small at typical environmental variance, larger at extreme tails.”
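The a’ coefficients above come from a moderation model in which the additive genetic path varies linearly with SES. The sketch below assumes Purcell-style linear moderation of the A path and a hypothetical main-effect path a = 0.6 on a standardized scale. That value is not an estimate from Tucker-Drob & Bates. The code illustrates how a positive a’ fans genetic variance out at high SES, while a near-zero or negative a’ leaves it flat or slightly reversed. It is not a re-analysis of the meta-analysis.

```python
def moderated_genetic_variance(a: float, a_prime: float, ses: float) -> float:
    """Additive genetic variance when the A path is moderated linearly by SES:
    path = a + a_prime * SES (SES in SD units), variance = path ** 2."""
    return (a + a_prime * ses) ** 2

A_MAIN = 0.6  # hypothetical main-effect path on a standardized scale
for region, a_prime in [("US", 0.074), ("W. Europe / Australia", -0.027)]:
    low = moderated_genetic_variance(A_MAIN, a_prime, -2.0)
    high = moderated_genetic_variance(A_MAIN, a_prime, 2.0)
    print(f"{region}: V(A) at SES -2 SD = {low:.3f}, at +2 SD = {high:.3f}")
```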
The pipeline is intentionally small. Seven curated CSVs (one per data type, every cell source-cited), a single ~350-line Python script that produces every chart on this page, dependencies pandas + numpy + scipy. Inputs are downloadable from /data/human-psych-variation/. Stage 5 (build) consumes the CSVs directly. What the pipeline does not answer: whether polygenic scores measure direct biological causation or correlated environments (the Plomin–Turkheimer dispute, undecidable without a within-family environmental intervention no group has run); the mechanism behind the Gender Equality Paradox (needs cross-society multivariate panels that don’t exist at scale); and the full assortative-mating-corrected psychiatric genetic-correlation matrix (active research, not yet pipeline-runnable from public summary statistics).
A few terms
The data stage inherits the model formalization’s vocabulary. If you arrived here without reading the model stage, the terms below cover what’s used in the prose:
- Heritability (h²). The fraction of variance in a trait, across people in a population, that tracks genetic differences. A population statistic, not an individual one — saying “IQ is 70% heritable” does not mean 70% of any one person’s IQ is genetic.
- Twin h², SNP h², WGS h², within-family h². Four ways to estimate heritability, each picking up a slightly different slice of the underlying genetic variance. Twin: from MZ vs. DZ similarity. SNP: from GWAS effect sizes on common variants only. WGS: SNP plus rare variants. Within-family: from sibling differences, controls for parental environment.
- Assortative mating (m). The correlation between partners on a trait. Partners are similar on educational attainment (m = 0.55), height (m = 0.24), and political views (m = 0.58). The model’s claim is that AM creates linkage between causal genetic variants, inflating population additive genetic variance V(A) by a calculable amount (not Falconer twin h², which AM biases downward).
- Polygenic score (PGS). A weighted sum of risk alleles per person, used to predict the trait. PGS R² is the variance the score explains in a held-out sample.
- Mahalanobis D. The multivariate analogue of Cohen’s d for sex (or any group) differences across multiple correlated measurements.
- V(E_m). The model’s variance bucket for measured non-shared environment — exposures with named causal coefficients (lead, schooling, iodine, etc.).
- V(I). The model’s variance bucket for interaction effects: gene × environment, gene × gene (epistasis), gene × age. The model’s specific claim is V(I) is small at typical PGS-by-environment scale but larger when environmental variance includes extreme tails — tested in H8 below.
- Scarr-Rowe interaction. The hypothesis that IQ heritability is lower in low-SES families than in high-SES families. It is named for Scarr and Rowe and is best known from Turkheimer 2003’s US data. Tucker-Drob & Bates 2016 meta-analyzed it and found the pattern replicates in US samples but vanishes in W. European / Australian samples. The cross-national heterogeneity is the H8 test of V(I).
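The PGS and R² definitions can be made concrete with a toy simulation. The genotypes are entirely synthetic, and no real data or GWAS is involved. The "estimated" weights are the true simulated effects plus noise, standing in for GWAS estimates from an independent sample. R² is then the squared correlation between score and phenotype in the simulated sample. All sizes and variances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only: 2,000 people, 500 independent variants.
n_people, n_snps = 2000, 500
freqs = rng.uniform(0.05, 0.5, n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps))  # allele counts 0/1/2
true_beta = rng.normal(0.0, 0.05, n_snps)
genetic_value = genotypes @ true_beta
noise = rng.normal(0.0, genetic_value.std(), n_people)  # simulated h^2 of about 0.5
phenotype = genetic_value + noise

# PGS: weighted sum of allele counts. The weights are the true effects plus
# estimation noise, standing in for GWAS estimates from another sample.
est_beta = true_beta + rng.normal(0.0, 0.03, n_snps)
pgs = genotypes @ est_beta

# Variance explained in this sample = squared correlation of PGS with phenotype.
r2 = np.corrcoef(pgs, phenotype)[0, 1] ** 2
print(f"PGS R^2 = {r2:.3f}")
```

The R² stays below the simulated h² because the weights are noisy, which mirrors why PGS R² sits under SNP h² in practice.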
H1. Method gradient (mixed)
The model predicts twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h². Across 15 traits with ≥2 estimators, the strict ordering holds for 9 (all 2-estimator rows where twin > SNP); fails for 6 (all rows with 3+ estimators). The pattern of failure is informative: SNP h² is consistently lower than within-family h² for socially-stratified traits, because LDSC misses the rare-variant share that within-family designs capture through transmission.
Each row plots the published estimates for one trait on the 0–1 h² scale. Larger dot = larger-N or older estimator (twin); smaller dots = newer methods. The grey bar spans min(observed) to max(observed) — its length is the cross-paper noise. Sienna dot at the trait label = predicted ordering holds; muted dot = ordering fails (informative pattern, not model failure). The "violations" you see (e.g., height WGS=0.68 below within-sibship=0.78) are cross-paper / cross-method differences, not bugs in the model: Wainschtein 2022 used N=25k unrelated EUR with WGS-GREML; Howe 2022 used N=178k siblings with sib-regression. The clean within-paper test (Howe 2022 alone, population vs. within-sibship on the same sample) holds in the predicted direction across all seven AM/IGE-strong traits the model singles out.
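The per-row verdict amounts to a pairwise check over whichever estimators a trait has. The sketch below assumes hypothetical keys (twin, wgs, snp, within_family) rather than the actual headers of heritability_estimates.csv. Missing estimators are skipped. Because ≥ is transitive, checking adjacent present values is enough.

```python
from typing import Mapping, Optional

# Predicted ordering: twin >= WGS >= SNP >= within-family.
ORDER = ("twin", "wgs", "snp", "within_family")

def ordering_holds(estimates: Mapping[str, Optional[float]]) -> bool:
    """True if the estimators present for a trait respect the predicted ordering.
    Missing estimators (absent or None) are skipped; at least two are required."""
    present = [estimates[k] for k in ORDER if estimates.get(k) is not None]
    if len(present) < 2:
        raise ValueError("need at least two estimators to test an ordering")
    return all(a >= b for a, b in zip(present, present[1:]))

# Height, using only the two values quoted in the panel notes above:
print(ordering_holds({"wgs": 0.68, "within_family": 0.78}))  # False
# An invented two-estimator row where twin > SNP:
print(ordering_holds({"twin": 0.50, "snp": 0.20}))  # True
```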
How to read this stage
The panel above is the artifact. The prose below is the spec.
The pipeline takes the model’s eight predictions and confronts them with currently-published numbers. Each prediction gets one of three verdicts:
- supported: the data matches the model’s quantitative claim within a few points;
- mixed: the qualitative claim is right, but the quantitative test surfaces structural noise;
- supported with caveat: the prediction holds, but only under a specific framing that the prose makes explicit.

The point isn’t to produce new estimates, since the numbers all come from published consortium meta-analyses. The point is to align them in one place so the model’s predictions can be tested cleanly, and to flag where the literature is good enough vs. where the field hasn’t yet collected what the model would need.
You can read this top-down (TLDR → eight predictions → adversarial → connections) or bottom-up (download the CSVs, look at the script, then come back here for the framing).
1. Pipeline architecture
Seven curated CSVs in public/data/human-psych-variation/ (downloadable from the live site, tracked in git):
| File | Rows | Purpose |
|---|---|---|
| heritability_estimates.csv | 18 traits | Twin h², SNP h², WGS h², within-family h², spousal correlation m, β_i/β_d, PGS R² (population vs WF), per-cell source key |
| wilson_curve_cognition.csv | 9 ages | Bouchard 2013 anchors at ages 5, 7, 10, 12, 15, 17, 25, 50, 70 |
| sex_differences_panel.csv | 7 panels | Per-panel univariate d̄, ρ̄, n_dimensions, observed D, disattenuated D; sources: Hyde 2008, Su 2009, Schmitt 2008, Del Giudice 2012, Kaiser 2020, Ritchie 2018 |
| pgs_portability.csv | 13 rows | PGS R² ratio (relative to European training) by target ancestry × trait, with genetic distance |
| environmental_effects.csv | 10 exposures | Per-exposure causal effect sizes on cognition: lead, schooling, iodine, FAS, PM2.5, deprivation, malnutrition, breastfeeding, adoption, parenting |
| gxe_interactions.csv | 7 rows | Tucker-Drob & Bates 2016 meta-analysis a’ by region (US vs non-US), Turkheimer 2003 anchors, German replication |
| sources.csv | 23 papers | Full citation, DOI/URL, what each paper is used for |
A single Python script (pipeline.py) reads the inputs, computes derived quantities (AM partition, Wilson logistic fit, equicorrelated D, PGS portability slope, genetic-nurture variance contribution, environmental-effect summary), and writes:
- out/method_gradient.csv: per-trait alignment with deltas
- out/am_partition.csv: r_δ, V(A_d), V(A_LD) per trait
- out/genetic_nurture.csv: V(A_i) and cross-term per trait
- out/sex_diff.csv: equicorrelated D per panel
- out/findings.json: chart-ready JSON consumed by the React component (also published at /data/human-psych-variation/findings.json)
- out/findings_table.md: markdown audit table of the predictions
Dependencies: pandas, numpy, scipy. No web fetches, no external services, no individual-level genetic data. Reproduces in under 1 second on a laptop.
2. Eight predictions, eight tests
H1 — Method gradient (mixed)
Claim. twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h² per trait, with gaps decomposing into AM-LD, indirect-genetic, and rare-variant contributions.
Result. Across 15 traits with at least two published estimators, the strict ordering holds for 9 (all 2-estimator rows where twin h² > SNP h²) and fails for 6 (all 3-estimator rows). Every failure is the same: SNP h² is lower than within-family h² for socially-stratified traits — for height, SNP h²=0.50 vs. within-sibship h²=0.78; for EA, SNP h²=0.13 vs. within-sibship h²=0.15; for IQ adult, SNP h²=0.20 vs. extrapolated WF h²=0.50. This is not a model failure but a structural property of LDSC: it captures common-variant additive variance in unrelated populations and undercounts the rare-variant share, while within-family designs capture rare variants implicitly through transmission. The model’s V(A_d) is naturally higher than what SNP h² estimates.
Within a single paper, the prediction holds cleanly. Howe 2022 (N=178,086 siblings) is the only published study that runs population vs. within-sibship GWAS on the same sample. Their Figure 4 shows population effects exceed within-sibship effects for height, EA, age at first birth, # children, cognitive ability, depressive symptoms, smoking — exactly the seven traits the model singles out as having non-trivial indirect-genetic contributions.
What this teaches. “twin h² > within-family h²” is the canonical robust finding (always holds). “SNP h² between twin and within-family” is a methodological artifact when applied across papers — the right cross-check is twin vs. within-family directly, leaving SNP h² as a third estimator that answers a slightly different question (common-variant only).
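The per-row check behind this verdict can be sketched as follows. This is a minimal illustration, not pipeline.py: the estimator keys (twin, wgs, snp, wf) and both function names are hypothetical. Missing estimators are simply skipped, which is why rows with only twin and SNP estimates can pass while rows that carry a within-family estimate expose the SNP-below-WF pattern.

```python
from typing import Dict, Optional

# Predicted ordering, largest expected h² first.
ORDER = ("twin", "wgs", "snp", "wf")


def ordering_holds(estimates: Dict[str, Optional[float]]) -> bool:
    """True if the estimators present in a row satisfy twin >= WGS >= SNP >= WF.

    Missing estimators (absent keys or None) are skipped, so a row with only
    twin and SNP estimates is judged on that single comparison.
    """
    present = [estimates[k] for k in ORDER if estimates.get(k) is not None]
    return all(a >= b for a, b in zip(present, present[1:]))


def twin_exceeds_wf(estimates: Dict[str, Optional[float]]) -> Optional[bool]:
    """The direct twin vs. within-family cross-check, or None if either is missing."""
    twin, wf = estimates.get("twin"), estimates.get("wf")
    if twin is None or wf is None:
        return None
    return twin > wf


# Height and EA values quoted in H1 (no twin estimate is listed there).
print(ordering_holds({"wgs": 0.68, "snp": 0.50, "wf": 0.78}))  # False
print(ordering_holds({"snp": 0.13, "wf": 0.15}))               # False
```

Skipping missing estimators rather than imputing them keeps each verdict tied to published numbers only.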
H2 — AM partition (supported)
Claim. V(A_LD) = m·h² as a share of V(A), with the AM equilibrium reached.
Result. Predicted V(A_LD) shares of observed h²: educational attainment 22%, height 20%, BMI 12%, schizophrenia 36%, ADHD 33%, autism 36%, bipolar 14%, MDD 6%, IQ adult 35%. Height matches Yengo 2018’s reported empirical 14–23% range; EA matches Border 2022’s qualitative “substantial fraction” finding.
The psychiatric numbers were corrected in pass 4. Pass-1/2/3 used m=0.30 for schizophrenia, ADHD, and autism (cited as “Nordsletten 2016 imputed” without verified value). Nordsletten 2016 actually reports tetrachoric spousal correlations greater than 0.40 for all three disorders — moving these from m=0.30 to m=0.45 lifts their predicted V(A_LD) share from ~24% to ~36% of h². This is a real and substantively different reading: about one third of the additive genetic variance for severe psychiatric conditions is structural assortative-mating-induced LD rather than independent direct biological signal. The model’s prediction stands; the data is more dramatic than pass-1 numbers showed.
Caveats. The Crow–Felsenstein partition assumes AM equilibrium. For traits under rapid assortment shifts (EA post-1970), this is approximate. The IQ adult prediction (35%) sits at the upper end and may overshoot — Horwitz 2023’s IQ partner correlation r=0.44 comes from a small (N=5,672) meta-analytic sample. For psychiatric disorders, “spousal correlation” is a tetrachoric across a binary diagnosis, which behaves differently than a continuous-trait partner correlation under the same equilibrium assumption — the prediction is qualitatively right but quantitative precision is lower.
A reviewer correction added in pass 7. The framing “structural assortative-mating-induced LD” implied that AM is the source of the gap between Falconer twin h² and within-family h² for socially-structured traits. This is incorrect: Falconer’s 2·(rMZ − rDZ) is itself biased downward by AM (under positive AM, fraternal twins share more than 50% of trait-relevant alleles, raising rDZ relative to rMZ). The empirical gap between twin h² and within-family h² for socially-structured traits is dominated by genetic nurture and equal-environments-assumption violations, partially offset by AM’s downward bias on Falconer. The formula V(A_LD) = m·h² is mathematically valid as a Crow-Felsenstein population-level decomposition of V(A) at AM equilibrium — Yengo 2018’s empirical 14–23% V(A_LD)/V(A) for height matches the formula prediction at the population level — but it does NOT predict the twin-vs-within-family gap, and the percentages reported above (“22% of h² for EA” etc.) should be read as population-level V(A_LD)/V(A_AM) shares, not as “fraction of twin h² explained by AM.” The cross-trait AM result (Border 2022, H6 below) is independent of this issue and stands as reported.
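To make the population-level reading concrete, the sketch below iterates a Bulmer-style infinitesimal-model recursion for additive variance under phenotypic assortative mating, V(A)′ = ½·V(A)·(1 + m·h²) + ½·V(A_0), where V(A_0) is the additive variance without AM-induced LD. At equilibrium V(A) = V(A_0)/(1 − m·h²), so V(A_LD)/V(A) = m·h², with h² the equilibrium heritability. It assumes no selection, constant non-additive variance, and many unlinked loci; the function names and toy inputs are illustrative, not pipeline values. As the correction above says, nothing here speaks to the twin-vs-within-family gap.

```python
from typing import Tuple


def am_equilibrium(v_a0: float, v_e: float, m: float,
                   generations: int = 200) -> Tuple[float, float, float]:
    """Iterate additive variance under phenotypic assortative mating.

    v_a0: additive variance at linkage equilibrium (no AM-induced LD)
    v_e:  all non-additive variance, held constant across generations
    m:    phenotypic correlation between mates
    Returns (V(A), V(A_LD), h²) after `generations` rounds.
    """
    v_a = v_a0
    for _ in range(generations):
        h2 = v_a / (v_a + v_e)
        # Mates' breeding values correlate at m*h2; half of the existing
        # LD decays each generation while assortment rebuilds it.
        v_a = 0.5 * v_a * (1.0 + m * h2) + 0.5 * v_a0
    h2 = v_a / (v_a + v_e)
    return v_a, v_a - v_a0, h2


def ld_share(m: float, h2: float) -> float:
    """Crow-Felsenstein equilibrium share V(A_LD)/V(A) = m*h2."""
    return m * h2


v_a, v_ld, h2 = am_equilibrium(v_a0=0.5, v_e=0.3, m=0.45)
print(f"simulated share {v_ld / v_a:.4f} vs m*h2 {ld_share(0.45, h2):.4f}")

# h2 = 0.80 reproduces the ~24% (m = 0.30) and ~36% (m = 0.45) psychiatric figures in H2.
print(round(ld_share(0.30, 0.80), 2), round(ld_share(0.45, 0.80), 2))  # 0.24 0.36
```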
H3 — Wilson logistic curve (supported)
Claim. h²(t) = h²_∞ / (1 + exp(−k·(t − t₅₀))) for cognitive ability across age.
Result. Fitted to Bouchard 2013 anchors:
h²_∞ = 0.81
t_50 = 9.0 years
k = 0.27 / year
Max residual: 1.8 percentage points (at age 12). The earlier saturating-exponential form (Stage 3 pass 2) had max residual 32 pp at age 5. The logistic is the smallest functional change that matches the empirical sigmoidal pattern, and the fitted parameters are within sampling noise of the model’s prior values (h²_∞=0.80, t_50=9.0, k=0.30).
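A minimal sketch of the fit with scipy.optimize.curve_fit, assuming the nine anchor ages listed for wilson_curve_cognition.csv. Bouchard 2013's h² values are not reproduced in this section, so the sketch generates noiseless targets from the reported fit and only checks that the three parameters are recoverable from nine anchors; substituting the CSV's values would reproduce the actual fit and residuals.

```python
import numpy as np
from scipy.optimize import curve_fit


def wilson_logistic(t, h2_inf, t50, k):
    """h²(t) = h²_inf / (1 + exp(-k * (t - t50)))."""
    return h2_inf / (1.0 + np.exp(-k * (t - t50)))


# Anchor ages from wilson_curve_cognition.csv. The h² values are NOT
# Bouchard 2013's: they are synthetic, generated from the reported fit.
ages = np.array([5, 7, 10, 12, 15, 17, 25, 50, 70], dtype=float)
h2_synthetic = wilson_logistic(ages, 0.81, 9.0, 0.27)

# Start from the model's prior values (h²_inf=0.80, t50=9.0, k=0.30).
popt, pcov = curve_fit(wilson_logistic, ages, h2_synthetic, p0=(0.80, 9.0, 0.30))
h2_inf, t50, k = popt
residuals_pp = 100.0 * (h2_synthetic - wilson_logistic(ages, *popt))
print(f"h2_inf={h2_inf:.2f}, t50={t50:.1f}, k={k:.2f}, "
      f"max |resid|={np.abs(residuals_pp).max():.2f} pp")
```

Starting from the prior values mirrors the comparison in the text: the interesting output on real anchors is how far the fitted parameters drift from those priors.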
H4 — Equicorrelated D vs disattenuated D (supported with caveat)
Claim. Equicorrelated D² = d̄²·n / (1 + (n−1)·ρ̄) is a pedagogical anchor; the gap to disattenuated D is exactly the latent-variable correction.
Result. For Del Giudice 2012’s 16PF panel (n=15, d̄=0.50, ρ̄=0.18): equicorrelated D = 1.03; disattenuated D = 2.71. Ratio: 2.6×. The equicorrelated approximation is quantitatively wrong for high-dimensional disattenuated panels — but not because of an algebra error. The 2.6× factor is the disattenuation correction: latent-variable modeling magnifies effect sizes by ~1/√reliability per factor before aggregation.
For the public-discourse framing trap (univariate d small vs. multivariate D large), this means the gap exists at both the observed and latent levels: D=1.03 vs. d̄=0.50 is already a ~2× scale-up at the observed level, and disattenuation pushes it further. Both Hyde 2005 (“similarities hypothesis”) and Del Giudice 2012 (“Mars and Venus”) are correct about their respective objects of measurement.
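The closed form in H4 can be checked against the general Mahalanobis distance D = √(dᵀΣ⁻¹d) computed from a full equicorrelated Σ. The sketch below (function names are hypothetical) uses the 16PF inputs quoted above (n=15, d̄=0.50, ρ̄=0.18) and also prints the large-n ceiling d̄/√ρ̄ of the equicorrelated formula. Under this approximation with these d̄ and ρ̄, adding dimensions cannot push observed D much past ~1.2, so reaching 2.71 requires the disattenuation step, which lies outside this algebra.

```python
import numpy as np


def equicorrelated_D(d_bar: float, rho_bar: float, n: int) -> float:
    """Closed form from H4: D² = d̄²·n / (1 + (n−1)·ρ̄)."""
    return float(np.sqrt(d_bar**2 * n / (1.0 + (n - 1) * rho_bar)))


def mahalanobis_D(d: np.ndarray, sigma: np.ndarray) -> float:
    """General form D = sqrt(dᵀ Σ⁻¹ d), solving the linear system instead of inverting Σ."""
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))


n, d_bar, rho_bar = 15, 0.50, 0.18  # 16PF inputs quoted in H4
sigma = np.full((n, n), rho_bar) + (1.0 - rho_bar) * np.eye(n)  # 1 on diagonal, ρ̄ elsewhere
d = np.full(n, d_bar)

print(round(equicorrelated_D(d_bar, rho_bar, n), 2))  # 1.03
print(round(mahalanobis_D(d, sigma), 2))              # 1.03 via the full matrix
print(round(float(d_bar / np.sqrt(rho_bar)), 2))      # 1.18, the n → ∞ ceiling
```

Swapping the equicorrelated Σ for an empirical correlation matrix in mahalanobis_D is the natural next step when the approximation is in doubt.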
H5 — PGS portability decay (supported)
Claim. PGS accuracy decays with genetic distance from the training population.
Result. Ding et al. 2023 reports Pearson r = −0.95 between continuous PCA-based genetic distance and PGS R² across 84 traits (their analysis on individual-level UK Biobank + ATLAS data, N≈524k, which we don’t have access to). Independent categorical-ancestry estimates corroborate the trend: Martin et al. 2019 reports relative-accuracy reductions of 37%, 50%, and 78% in South Asian, East Asian, and African ancestries vs. European training; per-trait, Okbay 2022 EA4 reports near-zero EA-PGS accuracy in African samples; Yengo 2022 reports height-PGS accuracy at 10–20% of European levels in non-European ancestries; Trubetskoy 2022 reports schizophrenia-PGS accuracy at ~30% in African samples. The pipeline aggregates these per-ancestry literature anchors into one panel and computes a slope as a sanity check that the literature is internally consistent (Pearson r = −0.99 on 11 anchored rows). This is not an independent replication of Ding 2023 — those rows are themselves drawn from primary papers — but it is a defensible visualization of the convergent empirical pattern.
Why this matters for the L4 firewall. The model’s between-population scope restriction is structurally argued: there is no μ_pop term in the generating function. The empirical evidence for why the restriction matters is the portability decay — the same SNP “effect sizes” do not estimate the same causal coefficients in different populations. Causal architecture is not portable; descriptive variance partitions arguably are, but not for cross-population mean comparisons.
H6 — Cross-trait AM inflation (supported)
Claim. Cross-trait assortative mating accounts for a substantial fraction of reported psychiatric cross-disorder genetic correlations.
Result. Border 2022 (UK Biobank N=40,697 spousal pairs, 132 trait pairs): R² = 0.7432 (95% CI: 0.67–0.82) between phenotypic cross-mate correlations and reported genetic correlations. Across 6 psychiatric disorders × 5 generations: average xAM share γ̂ = 0.29. Anxiety × MDD: γ̂ = 0.21 (95% CI: 0.17–0.25). AUD × schizophrenia: γ̂ = 0.83 (95% CI: 0.59–1.24).
Interpreting γ̂. The γ̂ statistic is the ratio of the xAM-alone-projected genetic correlation to the empirical genetic correlation. A value near 1 is consistent with xAM accounting for the entire reported rg — it does not prove xAM is the cause, since alternative causal architectures (genuinely shared biology with the same effect-size profile) could produce the same ratio. But γ̂ values bounded well below 1 require an additional shared-biology contribution beyond what xAM alone can explain. The Border result is therefore a pressure-test: if reported cross-disorder rg estimates were entirely about shared biology, γ̂ would be small; the average γ̂ = 0.29 with significant pair-level variance shows the literature’s cross-disorder rg estimates carry an xAM contribution that is empirically non-trivial and pair-specific.
Implication. The within-trait V(A_LD) term is the within-trait analogue of cross-trait xAM. Same operation (LD created by non-random mating among causal alleles); they show up in different summary statistics.
H7 — Environmental causes (supported)
Claim. The model’s V(E_m) term — variance contribution of measured non-shared environment — is non-empty: a small set of exposures have large, replicated, causal effects on cognitive outcomes.
Result. Per-exposure effect sizes:
| Exposure | Effect on IQ | Source | Design |
|---|---|---|---|
| Schooling, per year | +1 to +5 pts (mean +3.4) | Ritchie & Tucker-Drob 2018 (600k participants, 3 designs) | Quasi-experimental meta |
| Breastfeeding (PROBIT RCT) | +3.2 pts | Kramer 2008 (N=17,046) | Cluster RCT |
| Within-Western-normal parenting | ~0 to +1 pts | Plomin & Daniels 1987 meta | Within-family twin |
| PM₂.₅, per 1 µg/m³ | −0.27 pts | Aghaei 2024 meta | Observational meta |
| Lead, blood 1→10 µg/dL | −6.2 pts (CI −8.6 to −3.8) | Lanphear 2005 (N=1,333, 7 cohorts) | Pooled longitudinal |
| Iodine, severe deficiency | −10 pts (recovers +8.7 with supplementation) | Bougma 2013 | Observational + RCT |
| Adoption: high → low SES | −12 pts | Capron & Duyme 1996 (N=38) | Natural experiment |
| Severe psychosocial deprivation | −15 pts | Nelson 2007 BEIP (N=136) | Natural experiment |
| Severe chronic malnutrition | −15 pts | Grantham-McGregor 2007 | Observational |
| Prenatal alcohol (full FAS) | −30 pts | Streissguth 2004 | Observational + MR |
Asymmetry is the headline finding. Removing severe insults (lead, malnutrition, deprivation, FAS) recovers double-digit IQ points; enrichment above normal (better parenting, breastfeeding) yields single-digit gains at most. The variance-share interpretation V(E_m)/V(P) depends on each exposure’s prevalence in a given population — sparse-but-large exposures (FAS, severe deprivation) contribute little to population variance despite large per-person effects, while moderate-but-common exposures (variable schooling quality, low-grade lead) contribute more. This is why the high-h² findings of behavior genetics coexist with large environmental effects without contradiction: heritability is a population-variance statistic, individual environmental effects can be enormous, and most populations have already removed the worst tails.
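The prevalence point comes down to one line of algebra: a binary exposure with prevalence p that shifts scores by β points contributes β²·p·(1−p) to phenotypic variance. A minimal sketch, assuming an additive effect, no correlation with other causes, and IQ scaled to SD 15. The two exposures and their prevalences are hypothetical, not estimates for any row in the table.

```python
IQ_SD = 15.0  # conventional IQ scaling: SD of 15 points, so V(P) = 225


def exposure_variance_share(effect_pts: float, prevalence: float, sd: float = IQ_SD) -> float:
    """Share of V(P) contributed by one binary exposure with an additive effect.

    A 0/1 exposure with prevalence p has variance p*(1 - p); scaling by the
    effect beta gives beta**2 * p * (1 - p), divided here by V(P) = sd**2.
    """
    return effect_pts**2 * prevalence * (1.0 - prevalence) / sd**2


# Hypothetical exposures, not estimates for any row in the table above.
rare_large = exposure_variance_share(effect_pts=-30.0, prevalence=0.001)
common_moderate = exposure_variance_share(effect_pts=-5.0, prevalence=0.20)
print(f"rare, large: {rare_large:.1%}; common, moderate: {common_moderate:.1%}")
# rare, large: 0.4%; common, moderate: 1.8%
```

A −30-point exposure affecting 1 in 1,000 people accounts for less variance than a −5-point exposure affecting 1 in 5, even though its per-person effect is six times larger.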
H8 — G×E interaction (V(I) bucket) — supported conditional
Claim. The model’s V(I) term — variance contribution of gene-environment interaction — is small at typical PGS-by-environment scale but larger when environmental variance is wide enough to include extreme tails.
Result. Tucker-Drob & Bates 2016 meta-analyzed 43 effect sizes across 14 independent studies (24,926 twin / sibling pairs, ≈50,000 individuals) testing the Scarr-Rowe Gene × SES interaction on intelligence. Their Purcell-biometric-model coefficient a' represents the expected change in the additive genetic regression on intelligence per SD of SES. Reported numbers:
| Sample | a’ | SE | Significance | N pairs |
|---|---|---|---|---|
| US-pooled | +0.074 | 0.020 | p < 0.0005 | 11,340 |
| Non-US-pooled (W. Europe / Australia) | −0.027 | 0.022 | p = 0.22 (n.s.) | 13,586 |
| Overall pooled | +0.029 | 0.019 | p = 0.14 (n.s.) | 24,926 |
Two further anchors: the founding observation from Turkheimer 2003, where IQ heritability is h² ≈ 0.10 in low-SES US families and rises to h² ≈ 0.72 in high-SES US families, and an independent null replication in Germany (Spengler 2018: a’ = −0.01, n.s.).
Interpretation. The cross-national heterogeneity is consistent with the model’s “extreme-environment-threshold” reading. US samples have wider environmental tails: extreme low-SES conditions are both more common and more severe than in W. European or Australian welfare-state samples. The model predicts that V(I) appears where the low-SES tail is wide enough to include genuine environmental constraint that suppresses genetic expression. Equity-buffered samples truncate that tail, so the interaction shrinks toward zero. The verdict is “supported conditional” because the prediction is conditional on environmental variance: the same model that predicts a positive a’ in wide-tailed US samples predicts a’ ≈ 0 in equity-buffered samples, and both match the pooled estimates (+0.074 and −0.027).
Caveat. The Scarr-Rowe finding is itself contested in the literature. Several individual replications have been null even within US samples (e.g., Hanscombe 2012); the pooled US a’ = 0.074 is moderate but not large. The model claim “V(I) is small at typical PGS-by-environment scale” is most supportable; the stronger claim “G×E reliably appears at extreme tails” is supportable but with wider error bars than H1–H7.
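A simplified sketch of what a’ means: in a Purcell-style moderation model the additive path becomes a(M) = a₀ + a’·M, with M the SES level in SD units, so the additive variance component is a(M)². The code moderates only the additive path and holds the shared (c) and unique (e) paths fixed, a simplification of the full model. The baseline paths a₀=0.6, c=0.4, e=0.5 are hypothetical, a’ is assumed to be on the same scale as those paths, and only the two a’ values come from Tucker-Drob & Bates 2016. It shows the direction of the effect, not its size in any real sample.

```python
import numpy as np


def heritability_by_ses(ses_sd, a0: float, a_prime: float, c: float, e: float) -> np.ndarray:
    """h²(M) when only the additive path is moderated: a(M) = a0 + a_prime * M.

    ses_sd: moderator values in SD units of SES (scalar or array)
    c, e:   shared- and unique-environment paths, held fixed (a simplification)
    """
    a = a0 + a_prime * np.asarray(ses_sd, dtype=float)
    return a**2 / (a**2 + c**2 + e**2)


ses = np.array([-2.0, 0.0, 2.0])
# Hypothetical baseline paths; the a' values are the pooled US and non-US estimates.
us_like = heritability_by_ses(ses, a0=0.6, a_prime=0.074, c=0.4, e=0.5)
non_us_like = heritability_by_ses(ses, a0=0.6, a_prime=-0.027, c=0.4, e=0.5)
print(np.round(us_like, 2))      # rises with SES
print(np.round(non_us_like, 2))  # falls slightly with SES
```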
3. Headline numbers
| Statistic | Value | Source |
|---|---|---|
| Mean h² across human traits | 0.49 | Polderman 2015 (17,804 traits, 14.5M twin pairs) |
| Non-transmitted EA-PGS effect | 29.9% of transmitted | Kong 2018 (N=21,637) |
| EA4 within-family direct effect | ~50% of population PGI | Okbay 2022 (N=3M) |
| Height WGS h² | 0.68 (SE 0.10) | Wainschtein 2022 (N=25,465) |
| Share of pedigree h² captured by WGS | 88% | Wainschtein 2025 (N=347,630, 34 traits) |
| Spousal correlation EA | 0.55 | Horwitz 2023 (N≈1.9M pairs) |
| Spousal correlation political | 0.58 | Horwitz 2023 |
| Spousal correlation IQ | 0.44 | Horwitz 2023 (N=5,672 pairs) |
| Cross-trait AM inflation R² | 0.74 (CI: 0.67–0.82) | Border 2022 (132 pairs) |
| Avg psychiatric γ̂ (xAM share) | 0.29 | Border 2022 |
| Wilson curve h²_∞ (cognition) | 0.81 (fit) | Pipeline fit to Bouchard 2013 |
| Wilson curve t_50 (cognition) | 9.0 years (fit) | Pipeline fit |
| 16PF Mahalanobis D observed | 1.03 | Equicorrelated approximation |
| 16PF Mahalanobis D disattenuated | 2.71 | Del Giudice 2012 |
| PGS R² ~ genetic distance | r = −0.95 (continuous) | Ding 2023 (84 traits, 524k indivs) |
| PGS accuracy in AFR vs EUR | 22% relative (78% reduction) | Martin 2019 (across-trait avg) |
| Lead 1→10 µg/dL → IQ | −6.2 pts | Lanphear 2005 |
| Schooling/year → IQ | +1 to +5 pts | Ritchie & Tucker-Drob 2018 |
| G×SES (US) | a’ = +0.074 (p < .0005) | Tucker-Drob & Bates 2016 (43 effects, 25k pairs) |
| G×SES (non-US) | a’ = −0.027 (n.s.) | Tucker-Drob & Bates 2016 |
| Turkheimer 2003 IQ h² range | 0.10 (low SES) → 0.72 (high SES) | Turkheimer 2003 |
4. Analytical choices
The pipeline has six judgment calls. Each is flagged in the script as # ASSUMPTION:. The most consequential:
- Twin h² as h²_observed for AM partition. Twin h² is closer to the AM-equilibrium quantity than SNP h². For traits without twin estimates we fall back to SNP h².
- AM equilibrium assumption. The Crow–Felsenstein partition assumes mating regimes are stable. For EA (post-1970 educational expansion) this is approximate.
- k ≈ 0.5·m for the genetic-nurture cross-term. The AM-coupling parameter k (not the logistic rate k of H3) is empirically 0.1–0.5 for AM-strong traits; we interpolate.
- Equicorrelated Σ for multivariate D. Real personality covariance matrices have hierarchical structure; the equicorrelated approximation is pedagogical, not quantitative for high-dimensional panels.
- PGS portability linear in genetic distance. Ding 2023 reports a strong linear correlation. For genetic distances near zero the relationship may be non-linear. Our curated panel (13 hand-curated rows) is small.
- Within-family h² for IQ extrapolated. No within-family GWAS h² has been published for cognitive ability at the same scale as Howe 2022’s other traits. We extrapolate from EA’s WF h² and the EA-IQ rg.
5. What the pipeline does not deliver
Three open questions from the model’s §8 list are not sharpened by this stage, despite being framable:
- O1 — PGS interpretation (Plomin/Turkheimer). The decisive test is whether within-family β_d moves under environmental intervention. No paper has the design — Sacerdote 2007 Korean adoption comes closest but predates within-family GWAS. Status: open.
- O3 — Gender Equality Paradox. Tests whether multivariate sex-difference D depends on Σ-by-society in addition to μ-by-society. Stoet & Geary 2018 / Schmitt 2008 give univariate cross-cultural d’s; the multivariate piece requires Σ-by-society panels that do not yet exist at scale. Status: likely answerable in the next 5 years.
- O7 — xAM-corrected full psychiatric rg matrix. Border 2022 establishes the principle on 6 disorders. Applied at scale to the full PGC cross-disorder matrix, the corrected rg’s are likely smaller — but no group has done the correction systematically. Status: active research.
For these three, the Stage-4 honest answer is “the pipeline frames them but doesn’t resolve them.”
6. Adversarial + steelman
Four objections to the pipeline. The strongest version of each, then the honest response.
Objection 1 — This is variance bookkeeping, not new analysis
The pipeline arranges other people’s published estimates in a table and runs simple closed-form computations on top. It does not produce new heritability estimates, does not analyze raw data, and does not test causal mechanisms. Calling it “an empirical pipeline” overstates what is actually a literature-alignment exercise.
Steelman. True at the bookkeeping level. A real empirical pipeline would pull GWAS summary statistics, run LDSC against multiple traits, replicate Howe 2022’s within-sibship analysis on UK Biobank data, and compute fresh AM-LD partition estimates per trait. That requires individual-level genetic data we do not have access to and would not be appropriate to ship from a content site.
Response. Conceded as a scope restriction. The pipeline’s value is at the meta-level: it confronts the model’s predictions with the literature that already exists and surfaces what does and does not match. Three contributions are genuinely new even at this scale: (a) AM-partition predictions computed trait by trait across the whole panel with current Horwitz 2023 m-values, where Yengo 2018 worked on individual traits and Border 2022 on cross-trait pairs; (b) the equicorrelated-D vs. disattenuated-D bridge that locates the entire Hyde-vs-Del-Giudice gap quantitatively in the disattenuation correction; (c) the explicit reframing of H1 as “within-paper holds, cross-paper noisy” with the structural reason. None of these required new data analysis, but none were available in one place before.
Objection 2 — The CSV is too small to support strong claims
18 traits is a small panel. The headline-sounding patterns (e.g., “the AM partition holds across AM-strong traits”) rest on roughly six traits. A bigger panel might tell a different story.
Steelman. True for any single trait — the AM partition prediction for IQ adult lands at the upper end of the empirical range and could be wrong. For the multivariate-D module, only one panel (16PF Del Giudice) drives the pedagogical claim; the same algebra on a different instrument might give a smaller disattenuation gap.
Response. The headline patterns are robust within the curated traits and consistent with primary-literature meta-analyses (Polderman 17,804 traits, Border 132 pairs, Horwitz 22 traits + 133-trait UK Biobank scan). Adding another 50 traits would not change the qualitative result for H2 or H6 because those rest on consortium meta-analyses not single-CSV cells. The single-CSV results are calibration checks, not new estimation. Where the pipeline does need more data — H5 portability with 13 hand-curated rows — this is flagged explicitly as Objection 4 below.
Objection 3 — Border 2022 is a single high-profile paper with significant methodological pushback
Resting H6 on a single 2022 paper from one group is fragile. xAM as a confounder of psychiatric cross-disorder rg has been proposed by other authors (Howe 2024, Cai 2025 commentary) but Border’s specific R²=0.74 figure and the 5-generation-equilibrium assumption it depends on have been pushed back on. The “γ̂ averages 0.29” claim depends on a specific xAM dynamics model.
Steelman. Conceded. R²=0.74 may shrink under different equilibrium assumptions; γ̂ values for specific pairs may move under alternative AM models. The aggressive interpretation (“xAM accounts for ~30% of psychiatric rg”) is doing motivated work in the discourse and would benefit from independent replication by groups outside the Border / Keller cluster.
Response. The model’s H6 prediction does not depend on Border’s specific γ̂ values — it depends on the qualitative claim that cross-trait AM affects rg estimates non-trivially. That qualitative claim has independent support: Howe 2022’s within-sibship estimates of EA-BMI rg attenuate to near-zero, Yengo 2018 establishes within-trait AM-LD inflation for height, and the within-trait V(A_LD) prediction (H2) is tested independently from any cross-trait psychiatric finding. The data.mdx prose treats Border 2022 as suggestive about the magnitude rather than dispositive. This was strengthened in pass 2 — the γ̂ wording is now “consistent with xAM accounting for X%” rather than “X% caused by xAM.”
Objection 4 — H5 PGS portability is circular as a test
Pass 1 framed H5 as “replicating Ding 2023’s r = −0.95 on a curated 5-trait panel and getting r = −0.98.” That was circular: the curated rows were themselves rough approximations of Ding’s continuous-distance pattern, so the resulting slope was internal to the curation, not an independent test.
Response (pass 3 fix). The CSV was refactored to use named per-ancestry literature anchors instead — Martin 2019 across-trait averages (37%/50%/78% accuracy reduction in SAS/EAS/AFR vs. EUR), Okbay 2022 EA in AFR (relative R² ~10%), Yengo 2022 height in AFR (~20%), Trubetskoy 2022 SCZ in AFR (~30%). The pipeline still computes a Pearson r on this aggregated panel (now r = −0.99), but the prose now describes it honestly as “internally consistent literature-anchored trend, consistent with Ding 2023’s independent continuous-distance result,” not as a replication. The strong empirical claim — that PGS accuracy collapses across ancestry distance — rests on Ding 2023’s primary analysis, with Martin 2019 / Okbay 2022 / Yengo 2022 / Trubetskoy 2022 as independent corroboration on different cohorts and methods.
7. Connection to model cruxes
Three of the model’s five cruxes (§12) are partly tested by the pipeline:
- C1 (within-family GWAS unbiased) — relied upon throughout. Consistent with within-paper agreement across Howe 2022, Okbay 2022, Kong 2018.
- C2 (AM partition formula) — partly tested by H2; the height prediction falls inside Yengo 2018’s empirical 14–23% range and the EA prediction matches Border 2022’s qualitative finding.
- C5 (equicorrelated Σ as useful approximation) — partly tested by H4; equicorrelated undershoots disattenuated D by 2.6× for the 16PF panel. Crux holds pedagogically but not quantitatively at high n — same caveat the model already flags.
Cruxes C3 (hyperpolygenic architecture) and C4 (joint identifiability of A_d/A_i/A_LD) are not tested by the pipeline.
8. Connections to other work
To the model dashboard (/ai-research/human-psych-variation/model). The dashboard’s default parameters were set by the model formalization’s priors. Several should be updated from the data stage’s anchors: spousal correlations for cognitive (m=0.40 → keep, Horwitz IQ=0.44 confirms), personality (m=0.15 → keep, Horwitz neuroticism=0.11 close), psychopathology (m=0.20 → raise to 0.45 for schizophrenia, ADHD, and autism, following the pass-4 Nordsletten 2016 correction in H2). The Wilson logistic parameters in the dashboard already match the data-stage fit (h²_∞=0.80 vs. fitted 0.81, t_50=9 exact, k=0.30 vs. 0.27); the dashboard can either adopt the fitted values or note the small discrepancy explicitly.
To the planned parent-to-child transmission topic. The V(A_i) data here directly feeds that topic. Howe 2022’s within-sibship analysis is the canonical empirical anchor for indirect genetic effects across the seven traits the model singles out (height, EA, age at first birth, # children, cognitive ability, depressive symptoms, smoking). The Kong 2018 non-transmitted-PGS finding (29.9% of transmitted for EA) and the Okbay 2022 EA4 within-family attenuation (~50% of population PGI) are the two anchor numbers the parent-to-child topic should adopt as starting input.
To the planned evolution-modernity-mismatch topic. The Wilson curve fit here is the developmental-age analogue of generation-scale changes the mismatch topic will need to address. Pietschnig 2024’s finding that the positive manifold itself may be weakening across recent cohorts implies μ(t) is not a one-dimensional trajectory but a moving structure of which abilities are gaining or losing. The data stage’s logistic captures developmental motion within a single cohort; the mismatch topic will need to extend it to cross-cohort drift.
9. Stage-5 handoff
The Stage-5 build artifact should be a public-facing tool that:
- Lets a visitor pick a trait and see the per-trait variance decomposition (twin h², SNP h², WGS h², WF h², m, V(A_LD), V(A_i), and the relevant V(E_m) exposures) in a single panel.
- Surfaces the H1 mixed result honestly: within-paper Howe 2022 chart vs cross-paper alignment.
- Implements the Mahalanobis-D module with the disattenuation toggle so users can see the framing trap directly.
- Shows the environmental-effects table with prevalence-weighted variance-share estimates per population (this is the stage-5-specific extension — none of the existing tools do this).
- Cites a source for every number with a link to the relevant paper.
Inputs are at /data/human-psych-variation/. Stage 5 can either re-run pipeline.py at site-build time or freeze findings.json as a static asset.
10. Pipeline cruxes
The model stage’s §12 listed five load-bearing assumptions of the formalization. The pipeline has its own load-bearing assumptions — places where if the assumption fails, specific findings have to be rebuilt. Five matter most.
| Crux | Load-bearing claim | What would flip it |
|---|---|---|
| D1 | The published estimates I’m citing are correctly extracted from primary sources. ~12 of the highest-uncertainty values were web-verified directly from the cited paper or a PubMed Central mirror; the rest rely on training-time recall plus the cited paper’s existence. | A spot-check of the curated CSV against the supplementary tables of any individual paper finds a meaningful discrepancy (>1 SE on the cited estimate). Most of the H2/H3/H6 verdicts would shift correspondingly. |
| D2 | Twin h² is a usable proxy for h²_observed in the AM partition. The Crow–Felsenstein formula V(A_LD) = m·h² assumes h² is the AM-equilibrium quantity; twin h² is the closest readily-available estimate. | A demonstration that twin h² systematically over- or under-estimates the AM-equilibrium h² for the trait class (e.g., if classical ACE leakage from V(A_i) into A is consistently 5+ percentage points). The H2 partition shares would all shift by a similar fraction. |
| D3 | The equicorrelated approximation captures the qualitative multivariate-D framing trap. The pedagogical claim is “stacking weakly-correlated dimensions makes D grow with √n” (strictly, only while (n−1)·ρ̄ stays small, since equicorrelated D levels off at d̄/√ρ̄ as n grows); the quantitative claim at high-dimensional disattenuated panels is acknowledged not to hold. | A demonstration that real personality covariance matrices have block-structured Σ such that even the qualitative claim fails for the public-discourse-relevant case (16PF / Big Five). H4 would need a worked-example refit using a non-equicorrelated Σ. |
| D4 | Cross-paper alignment of estimators (twin/SNP/WGS/WF) is structurally noisy enough that within-paper tests are required for clean inference. This is the framing for H1’s “mixed” verdict. | A within-paper study that runs all four estimators on the same sample and finds the strict ordering fails. To my knowledge no such study exists; if one publishes and the ordering breaks, H1’s “mixed-but-informative” reading collapses to “wrong.” |
| D5 | Per-ancestry PGS-portability anchors from Martin 2019 / Okbay 2022 / Yengo 2022 / Trubetskoy 2022 are concordant with Ding 2023’s continuous-distance result. Without individual-level data we cannot compute the continuous-distance slope ourselves; we are taking concordance on faith. | A reanalysis of the cited papers’ public summary statistics that finds substantially different per-ancestry decay rates than the headline reports. H5’s “consistent with Ding 2023” framing would weaken to “qualitatively matches but quantitatively in dispute.” |
The most consequential is D1 — every other crux assumes the underlying CSV cells are correct. The web-verification round in pass 1 reduced this risk for the dozen highest-stakes numbers; the rest is a calibrated bet on training-time recall and would benefit from a future pass that audits each cell against its primary source.
A reader's tool for the psychology of individual differences. Pick a trait, see the three plain-language buckets (direct genes / family setup / environment + chance) instead of the V(A_d)/V(A_LD)/V(A_i) decomposition. Plus the four motivated-reasoning traps the field gets caught in, the asymmetric environmental-effects finding, three "heritability ≠ destiny" misreadings, and a seven-bullet take-away. Translates the formalization and data pipeline into something a non-specialist can actually use.
TLDR
The model formalization produced one equation per person and seven variance components. The data pipeline produced eight tested predictions and seven downloadable CSVs. Both are correct, both are useful for someone who already speaks the vocabulary, and neither does what the topic statement asked: produce something useful for someone who wants to understand how and why people differ without being captured by motivated reasoning from any direction.
This build is that translation layer. It collapses the seven-component variance decomposition into three plain-language buckets — direct genes, family setup, environment + chance — picks the ten traits a reader most likely cares about, and for each one shows the bucket breakdown, the key environmental levers (when relevant), and the two specific ways the most common political readings of that trait go wrong. Plus four secondary views: the four motivated-reasoning traps the field gets caught in (with what each side cites correctly and ignores), the asymmetric environmental-effects finding (severe insults cost 10–30 IQ points; enrichment above normal yields a few at most — the single most useful action-oriented insight), three “heritability ≠ destiny” misreadings with worked examples, and a seven-bullet take-away that holds up across mainstream behavior genetics in 2026.
If you want to engage with the math, the model stage has the parametric dashboard and the data stage has the prediction-by-prediction empirical tests. This page is for the reader who wants to come away knowing what to actually believe.
Pick a trait
Cognitive ability — adults
Why people differ in cognitive ability as adults is mostly genetic at the population level — but a sizeable chunk of what twin studies count as 'genetic' is actually the family setup parents create, not direct biological causation.
Why people differ — three buckets
Direct genes (about 50%): the slice that's actually direct biological causation. This is what within-family designs (sibling-fixed-effect, MZ-discordant, parent-offspring trio GWAS) recover after stripping out parental environment and assortative-mating-induced linkage.
Family setup (about 35%): most of this bucket is genetic nurture. Parents who pass on cognitive-ability variants also create environments correlated with those variants (vocabulary, books, expectations, peer-group selection), and classical twin models cannot easily separate this from direct biological causation. Within-family GWAS for cognition recovers ~0.50, substantially below twin h² of 0.79; the gap is dominated by genetic-nurture leakage. About 5% is residual shared family environment that persists into adulthood. Assortative mating (m=0.44 for IQ) does inflate population-level V(A) via LD, but it biases Falconer's twin formula downward, partially canceling the gap rather than adding to it.
Environment + chance (about 15%): most of this small bucket is unmeasured developmental noise. Identified large levers (severe deprivation, lead, fetal alcohol syndrome) account for almost no population variance in modern Western samples because their prevalence is now low.
Severe negative levers (when present)
- Prenatal alcohol (full FAS): −30 IQ pts (Streissguth 2004)
- Severe deprivation (Romanian orphanages): −15 IQ pts (Nelson 2007, BEIP)
- Lead, blood 1→10 µg/dL: −6.2 IQ pts (Lanphear 2005)
Positive levers
- Schooling, per year: +1 to +5 IQ pts (Ritchie & Tucker-Drob 2018)
- Within-Western-normal parenting: ~0 to +1 IQ pts (Plomin & Daniels 1987)
What environmentalist readings get wrong here
'Heritability is just methodological artifact' is not what the evidence shows. SNP-based heritability bypasses twin-design assumptions and recovers a substantial fraction of twin h². Adoption studies converge on similar numbers. The signal is real. But citing 0.79 as if it means 'genes determine 79% of cognitive ability' confuses a population-variance ratio with an individual partition. Both moves drop information.
What hereditarian readings get wrong here
Citing 0.79 to argue 'environment doesn't matter much for cognition' ignores that ~37% of the 'genetic' bucket disappears when you switch to within-family designs. The direct-biological component is closer to 50%. The gap to twin h² is dominated by genetic nurture and equal-environments-assumption violations, not by direct biological causation.
Take away
About half of why adults differ in cognitive ability is direct genetic effect. Another ~35% is the family setup: the environment parents create around their kids that correlates with the variants they pass on, plus a little residual shared environment. The remaining ~15% is everything else. The interesting policy levers are at the tails (preventing severe insults like lead, malnutrition, fetal alcohol, and severe deprivation), not at the middle (parenting style within Western normal).
How to use this
The default view is trait lookup. Pick a trait — adult cognitive ability, schizophrenia, height, political orientation — and see the three-bucket breakdown plus the trait-specific traps and take-away. Most readers should start there, then move through the four secondary views in order.
A few framing notes:
The three buckets are not orthogonal categories of cause. They are three plain-language groupings of the seven model-formalization variance terms (A_d, A_LD, A_i, C, E_m, E_s, I). Direct genes is the within-family direct-effect slice — the part that is unambiguously direct biological causation. Family setup combines AM-induced LD, genetic nurture, and residual shared environment — all the things that get counted as “genetic” in twin studies but are not direct biological causation. Environment + chance combines measured non-shared environment and stochastic developmental noise. The split is pedagogical; the model shows the underlying seven-term decomposition.
The four-traps view is opinionated in a way the other views are not. The label “trap” assumes that motivated reasoning is what produces these positions, which is not entirely fair — most people cite the evidence they have seen and have not personally vetted what they have not seen. The integrated reading at the bottom of each trap card is the closest the artifact comes to a normative claim about how the field should be read, and it is not algorithmically derivable from the data alone. If you disagree with one of the integrated readings, the topology stage has the underlying graph.
The asymmetry finding is the most action-relevant single insight in the topic. If you only take one thing away from this work, take that one — the population-level cognitive levers run almost entirely through preventing severe insults, not through optimizing within normal. Most parental anxiety and policy expenditure on enrichment is misallocated relative to where the empirical effect sizes are.
The seven take-aways are calibrated to be the things a behavior-geneticist in 2026 would actually defend in a public talk. Finer-grained claims (specific magnitudes per trait, mechanism per finding, what polygenic scores measure causally) sit downstream of these and are more contested.
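The bucket grouping in the first framing note can be made concrete with the adult-cognition card, which shows roughly 50% / 35% / 15%. The back-of-envelope sketch below uses a deliberately simplified rule, not the explorer's own seven-term computation:

- Direct genes is the within-family h².
- Family setup is the twin-minus-within-family gap plus residual shared environment.
- Environment + chance is whatever total variance remains.

The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buckets:
    direct_genes: float
    family_setup: float
    environment_and_chance: float


def three_buckets(twin_h2: float, within_family_h2: float, residual_c: float) -> Buckets:
    """Simplified three-bucket split (hypothetical helper, not the explorer's method).

    Everything twin h2 counts as genetic beyond the within-family direct effect,
    plus residual shared environment, goes to "family setup"; the remainder of
    total variance goes to "environment + chance".
    """
    if not 0.0 <= within_family_h2 <= twin_h2 <= 1.0:
        raise ValueError("expected 0 <= within-family h2 <= twin h2 <= 1")
    family = (twin_h2 - within_family_h2) + residual_c
    rest = 1.0 - twin_h2 - residual_c
    if rest < 0.0:
        raise ValueError("twin h2 plus residual C exceeds total variance")
    return Buckets(within_family_h2, family, rest)


# Adult cognitive ability, using the values quoted on the trait card.
adult_cognition = three_buckets(twin_h2=0.79, within_family_h2=0.50, residual_c=0.05)
print(
    f"direct {adult_cognition.direct_genes:.2f} / "
    f"family {adult_cognition.family_setup:.2f} / "
    f"env+chance {adult_cognition.environment_and_chance:.2f}"
)
```

The card's inputs are twin h² 0.79, within-family h² 0.50, and residual shared environment 0.05. With those inputs the split is 0.50 / 0.34 / 0.16, which rounds to the card's half / ~35% / ~15%.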
What this is not
It is not a prediction tool. There is no model that takes your demographics, your parents’ phenotypes, or your DNA and outputs a predicted trait value. The science does not currently support that for psychological traits, and the data stage shows why — polygenic scores trained on European-ancestry data lose 30–80% of their accuracy across other ancestries, and within-family direct effects are often less than half of population-level prediction.
It is not policy advice. The asymmetry finding has clear implications for cognitive intervention (lead remediation has higher effect-per-dollar than enrichment programs), but turning empirical asymmetries into policy involves trade-offs the science does not adjudicate.
It is not a complete picture. Three open questions named in the model stage (the Plomin/Turkheimer dispute about what polygenic scores measure, the mechanism behind the Gender Equality Paradox, the magnitude of assortative-mating correction across the full psychiatric cross-disorder rg matrix) are not answered here because the field has not answered them. The honest reading is “we don’t know yet”; the artifact does not pretend otherwise.
Connection to the rest of the pipeline
The trait-lookup numbers are computed directly from public/data/human-psych-variation/heritability_estimates.csv (the Stage-4 input), with the H2 partition (V(A_LD) = m·h²) and the genetic-nurture identity (V(A_i) = (β_i/β_d)² · V(A_d)) applied as in the model formalization §3.3 pass 5. The asymmetry view’s exposure list comes from environmental_effects.csv (the H7 input). The four-traps view materializes the topology stage’s Variant D distortion-to-target matrix (D1–D4) into reader-facing cards.
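The sketch below writes out the two identities quoted above literally, exactly as stated. The inputs are illustrative:

- m = 0.44 and h² = 0.79 are the adult-cognition values from the trait card.
- β_i/β_d = 0.3 and V(A_d) = 0.50 are placeholders chosen for the example. They are not values taken from heritability_estimates.csv.

```python
def ld_variance_from_am(m: float, h2: float) -> float:
    """H2 partition as written in the text: V(A_LD) = m * h2 (m = partner correlation)."""
    return m * h2


def genetic_nurture_variance(beta_i: float, beta_d: float, v_direct: float) -> float:
    """Genetic-nurture identity as written in the text: V(A_i) = (beta_i / beta_d)^2 * V(A_d)."""
    if beta_d == 0.0:
        raise ValueError("direct effect beta_d must be non-zero")
    return (beta_i / beta_d) ** 2 * v_direct


# m and h2 from the adult-cognition card; beta ratio and V(A_d) are placeholders.
print(f"V(A_LD) = {ld_variance_from_am(0.44, 0.79):.4f}")
print(f"V(A_i)  = {genetic_nurture_variance(0.3, 1.0, 0.50):.4f}")
```

Because the identity squares the indirect-to-direct ratio, a modest ratio of parental to direct effects translates into a much smaller variance share.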
A future stretch would promote some of this to /dashboards/human-psych-variation as a public dashboard that lets the visitor enter their own per-trait estimates and see the buckets recompute. That is one of the planned dashboard slots in the site PRD but is out of scope for the first build.
Long-form synthesis of the whole pipeline. What the science actually says about how and why people psychologically differ — written for an educated lay reader, with acronyms defined and the public-discourse traps spelled out. About 4,500 words.
TLDR
Behavior genetics has now had about fifty years of twin studies, twenty years of genome-wide DNA work, and the past five years of within-family designs that strip the structural inflation out of older “genetic” estimates. The science has converged on a picture of why people psychologically differ — and almost nobody in public describes it accurately. The headline finding is that heritability is real, replicated, and substantial across most psychological traits — but a sizable fraction of what gets called “genetic” in twin studies is actually environmental in origin, mediated through parents who transmit both the alleles AND the correlated rearing environment (a phenomenon called genetic nurture). Direct biological causation is genuine and important; it’s also typically smaller than the headline numbers suggest, especially for socially-structured traits like educational attainment, where the cleanest estimate of direct genetic effect is about one-third of what classical twin studies report.
Two findings should change how a non-specialist thinks about this field.

First, environmental effects are dramatically asymmetric. Severe insults (lead exposure, fetal alcohol syndrome, severe deprivation, malnutrition) cost roughly six to thirty IQ points each. Enrichment above the modern Western normal range yields a few points at most. The big policy and parenting levers are at the negative tail (preventing severe insults), not at the middle (optimizing within normal).

Second, high heritability is fully compatible with large environmental change at the population level. Average adult height has risen about ten centimeters in a century at a within-cohort heritability of ~0.85. Average IQ rose roughly 25–30 points across mid-20th-century cohorts in most measured populations, at a within-cohort heritability of ~0.80, with plateaus and partial reversals in some countries from the 1990s onward. “Heritable” does not mean “fixed.”
Public discourse on this field is captured by four motivated-reasoning patterns: the blank-slate environmentalism that dismisses heritability as methodological artifact, the hereditarianism that treats genetic effect as biology-as-destiny and licenses between-population inference, the gender-similarities framing that cites small per-dimension sex differences while ignoring large multivariate ones, and the pop-evolutionary-psychology overreach that treats dimensional differences as categorical. Each cites real evidence and ignores real evidence. The honest reading requires holding all of it at once. The actionable layer is then narrower than any of the four traps imply: protect against severe environmental insults; do not over-invest in within-normal optimization; expect heritable traits to be substantially heritable but not fixed; do not extrapolate within-population variance ratios to between-population mean inferences.
The field is not done. Three real open questions remain — what polygenic scores actually measure causally, the mechanism behind the Gender Equality Paradox, and the magnitude of assortative-mating contamination across the psychiatric cross-disorder correlation matrix — and this writeup says so where it should rather than papering over them. The companion explorer lets you pick any of two dozen traits and see the variance breakdown; the model and data stages have the math and the empirical tests.
1. Why this field is a minefield
The question “why do people psychologically differ from each other” is one of the most heat-attracting questions in the social sciences, for reasons that have nothing to do with the science and everything to do with what a clean answer would license. Each direction of motivated reasoning has something at stake. People with a blank-slate intuition fear that admitting heritable differences exist licenses fatalism, eugenics, or political programs they find abhorrent. People with a hereditarian intuition fear that denying heritable differences licenses bad social policy, distorts family-formation incentives, or papers over evidence they consider straightforwardly true. People with a gender-similarities intuition fear that framing sex differences as substantial licenses sexism. People with a pop-evolutionary-psychology intuition fear that minimizing sex differences abandons what they consider robust biological reality.
What complicates the conversation further is that the evidence base contains material that supports each of these positions in some form. That is not a contradiction; it is the natural shape of the data. Heritability is real (good for hereditarians); a lot of “genetic” effect dissolves under within-family designs (good for blank-slaters); single-dimension sex differences are typically small (good for similarities-framers); aggregated multivariate sex differences are large (good for evolutionary-psychology framers). The mistake every direction makes is selective citation: cite the evidence that supports your reading, ignore the evidence that doesn’t. The integrated reading is harder to hold onto, but it’s the only one that actually fits the data.
The pipeline behind this writeup — five earlier stages of progressively more rigorous analysis — converges on a picture that is more nuanced than any single-direction narrative but that is nonetheless reasonably definite. There are things the field knows. There are things the field does not know but is converging on. There are things the field does not currently have the methods to know, and we should say so when that’s the case.
2. The vocabulary
Heritability, written h², is the single most-misunderstood statistic in this field. It is the fraction of variance in a trait, across people in a population, that tracks genetic differences between those people. It is a population-level statistic, not an individual partition. Saying “IQ is 70% heritable” does not mean “70% of any one person’s IQ is genetic.” It means “across this population, 70% of why people differ in IQ tracks genetic differences.”
The cleanest way to internalize this distinction: imagine 100 plants of identical genotype, raised in the same kind of pot but with different sun angle, watering, and soil chemistry. The heritability of their height in this population is 0%, because all the variation between them comes from those environmental differences. But for any single plant, asking “how much of its height is genetic” is meaningless. The genotype set the type of plant; the environment did the growing; neither percentage applies. Heritability is about the spread, not about any individual value. This applies with equal force to cognitive ability, personality, height, or anything else.
Within-population heritability also says nothing about between-population mean differences. If you plant the same genetic mix of corn in fertile soil and depleted soil, the within-each-plot heritability of height can be high (variation tracks genetics within each soil), while the difference between plot means is entirely environmental (the soil). The within-plot heritability tells you nothing about why the plot means differ. This is the Lewontin firewall, named after the geneticist who first laid it out cleanly in 1970, and it is a logical/algebraic point — not an empirical claim that can be falsified.
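A small simulation makes the corn example concrete. It is a sketch under arbitrary assumptions:

- Both plots draw genotypes from the same distribution, with genetic SD 2 and noise SD 1. The expected within-plot heritability is therefore 4 / (4 + 1) = 0.8.
- The fertile soil adds a flat 10 units to every plant.
- The seed, sample size, and units are illustrative.

```python
from __future__ import annotations

import random
import statistics


def simulate_plot(n: int, soil_effect: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Return (genetic values, heights) for one plot of n plants.

    Genotypes come from the same distribution in every plot (SD 2.0); each plant
    also gets independent environmental noise (SD 1.0) plus the plot's soil effect.
    """
    genetic = [rng.gauss(0.0, 2.0) for _ in range(n)]
    height = [g + soil_effect + rng.gauss(0.0, 1.0) for g in genetic]
    return genetic, height


def heritability(genetic: list[float], height: list[float]) -> float:
    """Within-population h^2 = Var(G) / Var(P): a ratio of spreads, not of causes of a mean."""
    return statistics.pvariance(genetic) / statistics.pvariance(height)


rng = random.Random(1970)  # arbitrary seed
g_fertile, h_fertile = simulate_plot(20_000, soil_effect=10.0, rng=rng)
g_depleted, h_depleted = simulate_plot(20_000, soil_effect=0.0, rng=rng)

h2_fertile = heritability(g_fertile, h_fertile)
h2_depleted = heritability(g_depleted, h_depleted)
genetic_gap = statistics.fmean(g_fertile) - statistics.fmean(g_depleted)
height_gap = statistics.fmean(h_fertile) - statistics.fmean(h_depleted)

print(f"h2 within fertile plot:  {h2_fertile:.2f}")
print(f"h2 within depleted plot: {h2_depleted:.2f}")
print(f"gap in mean genetic value: {genetic_gap:.2f}")
print(f"gap in mean height:        {height_gap:.2f}")
```

Three things come out of the run:

- Both within-plot heritabilities land near 0.8.
- The plots' mean genetic values are essentially identical.
- The whole gap between plot means is the 10-unit soil effect.

Nothing in the within-plot ratio points to that cause.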
A few more terms before we go further:
- Genome-wide association study (GWAS): a study that scans hundreds of thousands or millions of single-letter DNA variants — single nucleotide polymorphisms or SNPs — looking for statistical association with a measured trait.
- Polygenic score (PGS): a per-person sum of trait-associated SNPs, weighted by their estimated effect sizes from a GWAS. Used as a predictor.
- MZ and DZ twins: identical (monozygotic, ~100% shared DNA) and fraternal (dizygotic, ~50% shared DNA). Comparing how much more similar identical twins are than fraternal twins is the classical engine behind heritability estimates.
- Assortative mating (AM): the phenomenon where partners resemble each other on a trait above chance. Educational attainment shows the strongest non-attitudinal AM signal at a partner correlation of about 0.55 (Horwitz 2023); political orientation shows the strongest of any trait at 0.58.
- Gene-environment correlation (rGE): the phenomenon where genes and environments are not independent of each other. Passive rGE: parents transmit both genes and a correlated environment to offspring. Evocative rGE: heritable traits elicit certain reactions from others. Active rGE: people select environments matching their genetic propensities.
- Educational attainment (EA): years of schooling completed. Used in this field as a measurable proxy for life outcomes that involve cognitive and conscientiousness loadings.
- Within-family designs: comparing siblings, MZ-discordant twins, or parent-offspring trios within the same family. These control for between-family confounds — primarily the parental-environment effects mediated through shared parental genes (genetic nurture), plus assortative-mating-induced linkage at the population level — and produce the cleanest estimate of direct genetic effect.
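The polygenic-score definition above amounts to a dot product between two vectors: a person's effect-allele counts (0, 1, or 2 per SNP) and the GWAS effect sizes. The sketch below is minimal:

- The SNP names, weights, and genotype are hypothetical.
- The function simply skips ungenotyped SNPs. Real scoring tools typically impute them and adjust the weights for linkage between nearby SNPs.

```python
from __future__ import annotations


def polygenic_score(allele_counts: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele counts; SNPs missing from the genotype are skipped."""
    score = 0.0
    for snp, beta in weights.items():
        count = allele_counts.get(snp)
        if count is None:
            continue  # simplification: no imputation of missing genotypes
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2")
        score += count * beta
    return score


# Hypothetical GWAS weights and one hypothetical person's genotype.
gwas_weights = {"snp_a": 0.020, "snp_b": -0.015, "snp_c": 0.008}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(f"PGS = {polygenic_score(person, gwas_weights):.3f}")
```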
With these in hand, the rest of the writeup should be readable.
3. Seven big ideas
Seven findings are robust enough that a careful reader should walk away believing them. The contested questions in this field sit at finer-grained resolutions; these seven are field-level consensus.
3.1 Heritability is real, replicated, and substantial
Across 17,804 traits measured in 14.5 million twin pairs across 2,748 publications (Polderman et al. 2015), the mean trait heritability is about 0.49. SNP-based heritability methods, which use unrelated individuals and bypass the assumptions twin studies make about twin environments being equally similar, recover a substantial fraction of twin-based heritability across major traits — about 60% for height with common SNPs alone (rising to ~80% when whole-genome sequencing captures rare variants), about 25–40% for cognitive ability, and about 30–50% for educational attainment. The fraction recovered is highest for traits with the simplest genetic architectures (height, BMI) and lowest for socially-structured traits where assortative mating and parental environments contribute substantially to twin estimates. Adoption studies — where children are reared by parents they share no genes with — recover heritability estimates broadly consistent with twin and SNP-based methods. Within-family GWAS, which compare siblings or trios and control for shared parental environment, find non-zero direct genetic effects across a range of traits including educational attainment, body mass index, height, and cognitive ability.
The “twin studies are bunk” position does not survive contact with the cumulative evidence. Heritability is real. The methodological critiques have force at the margin but cannot account for the convergence across designs.
3.2 But it’s a population statistic, not an individual partition
This was covered in section 2 but bears repeating because the failure to internalize it is the single most consequential public-discourse error about this field. “70% heritable” means “70% of why people differ in this population is genetic.” It does not mean “70% of any one person’s value is genetic.” Treating it as an individual partition produces nonsense in both directions: it overstates determinism for the hereditarian reading and overstates plasticity for the environmentalist reading. There is no individual percentage decomposition of “this person is X% genetic and Y% environmental.” That number does not exist.
3.3 Roughly 8% to 60%+ of “genetic” effect is structural inflation, depending on trait
This is the finding that most reshapes the picture once you know it, and it is the finding most absent from popular coverage. Twin studies measure resemblance between MZ and DZ twins and translate it into a heritability estimate using assumptions about how genes and environments combine. For socially-structured traits, this estimate substantially overstates the direct-biological-causation slice. The dominant reason is genetic nurture: parents who pass on certain alleles to their children also create environments correlated with those alleles — vocabulary, books, expectations, neighborhood choice, schooling. Classical twin models cannot easily separate this environmental contribution from direct genetic causation, because the genetic-nurture component is shared identically by MZ and DZ co-twins (they share parents) and tends to leak into the additive genetic variance estimate. Within-family designs strip it out by comparing siblings within the same family.
The empirical evidence is concrete and direct. Kong et al. 2018 (Science) compared the predictive power of parents’ transmitted polygenic scores (the alleles the offspring actually inherited) to parents’ non-transmitted polygenic scores (the alleles the offspring did not inherit but the parents still acted on environmentally) for educational attainment. The non-transmitted-allele effect was 29.9% of the transmitted effect — direct evidence that “genetic” prediction for socially-structured traits is partly mediated by parents’ environmental behaviors that correlate with their alleles. Okbay et al. 2022 EA4 (N=3M) showed the within-family direct effect for educational attainment is roughly half the population-level polygenic-score effect; the other half is environmental contamination via the home.
For educational attainment, the canonical twin-based heritability is ~0.40. Within-sibship heritability is ~0.15 (Howe et al. 2022, the largest within-family study to date, with about 178,000 siblings). The 0.25 gap is dominated by genetic nurture, plus other classical-twin-design biases like the equal-environments assumption. MZ co-twins are treated more similarly than DZ co-twins, which inflates the MZ-DZ correlation gap that twin h² is computed from.
A second mechanism — assortative mating — is also real and worth understanding, but its effect on twin estimates is more counterintuitive than is sometimes claimed. People pair with similar partners (educational attainment shows the strongest non-attitudinal AM signal at m=0.55; political orientation shows the strongest of any trait at m=0.58), and this creates linkage between trait-relevant alleles in offspring (Yengo et al. 2018 estimates 14–23% inflation of population-level additive genetic variance for height). But the effect on Falconer’s classical twin formula 2(rMZ − rDZ) runs in the opposite direction from genetic nurture’s effect: under positive AM, fraternal twins share more than 50% of trait-relevant alleles (because their parents are more genetically similar than under random mating), which raises DZ correlation relative to MZ correlation and biases the formula downward. So while AM is a real source of LD inflation in the population’s V(A), it does not on net inflate the twin-vs-within-family gap — that gap is dominated by genetic nurture and EEA violations, and AM partially cancels rather than adds to them. (This is a subtle technical point that is genuinely confused in popular writing on the topic; the cross-trait variant of AM does inflate reported genetic correlations between disorders, which is the Border 2022 result, but that is about between-trait LD, not within-trait twin-h² estimates.)
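The direction of the AM bias follows from the same ACE algebra behind Falconer's formula. Suppose r_MZ = a² + c² and r_DZ = g·a² + c², where g is the DZ twins' genetic correlation. Then a² = (r_MZ − r_DZ)/(1 − g), and Falconer's 2(r_MZ − r_DZ) is the special case g = 0.5.

The sketch below uses hypothetical values: true a² = 0.60, c² = 0.10, and AM raising g to 0.55. It deliberately ignores AM's separate effect on the population's total additive variance.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Classical Falconer estimate; implicitly assumes a DZ genetic correlation of 0.5."""
    return 2.0 * (r_mz - r_dz)


def ace_h2(r_mz: float, r_dz: float, dz_genetic_corr: float = 0.5) -> float:
    """ACE estimate with the DZ genetic correlation g left as a parameter.

    r_MZ = a2 + c2 and r_DZ = g * a2 + c2, so a2 = (r_MZ - r_DZ) / (1 - g).
    With g = 0.5 (random mating) this reduces to Falconer's 2 * (r_MZ - r_DZ).
    """
    return (r_mz - r_dz) / (1.0 - dz_genetic_corr)


# Hypothetical truth: a2 = 0.60, c2 = 0.10, and assortative mating raises the
# DZ genetic correlation from 0.50 to 0.55.
a2, c2, g_dz = 0.60, 0.10, 0.55
r_mz = a2 + c2
r_dz = g_dz * a2 + c2

print(f"Falconer estimate: {falconer_h2(r_mz, r_dz):.3f}")
print(f"AM-aware estimate: {ace_h2(r_mz, r_dz, g_dz):.3f}")
```

The Falconer estimate comes out at 0.54 against a true 0.60, which is the downward bias described above. Plugging the inflated g back in recovers 0.60.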
Within-family designs are not assumption-free either — they assume siblings receive equally similar parental treatment and equally similar non-genetic exposures, which is approximately but not exactly true. But they remove the largest twin-design biases (the equal-environments assumption, genetic-nurture confounding, and AM-related complications) simultaneously, and across the published within-family studies the direct-effect estimates are mutually consistent across cohorts and methods. Treating within-family h² as the cleanest current estimate of direct biological causation is a defensible operational choice, not a perfect one.
The size of the twin-vs-within-family gap varies dramatically by trait. For height, where within-sibship heritability (0.78) is essentially as high as twin heritability (0.85), the structural-inflation share is small (~8%). For socially-structured traits like educational attainment, it’s large — the cleanest direct-biology estimate is about three-eighths of the twin-based number, meaning more than half of “genetic” effect on EA in twin studies is actually environmental in origin via genetic nurture. None of this means the underlying biology isn’t real — it means the headline numbers from older twin studies overstate the direct-causation slice for socially-structured traits, and the within-family literature is what made the correction possible.
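The section heading's 8%-to-60%+ range comes from evaluating (twin h² − within-family h²) / twin h² per trait. The short sketch below uses the pairs quoted in this section and on the adult-cognition card. The dictionary is illustrative, not the pipeline's CSV.

```python
def structural_inflation_share(twin_h2: float, within_family_h2: float) -> float:
    """Fraction of the twin-based 'genetic' estimate not recovered by within-family designs."""
    if twin_h2 <= 0.0:
        raise ValueError("twin h2 must be positive")
    return (twin_h2 - within_family_h2) / twin_h2


# (twin h2, within-family h2) pairs quoted in the text.
traits = {
    "height": (0.85, 0.78),
    "adult cognitive ability": (0.79, 0.50),
    "educational attainment": (0.40, 0.15),
}
for name, (twin, within) in traits.items():
    share = structural_inflation_share(twin, within)
    print(f"{name:<24} inflation {share:.1%}   direct share {within / twin:.1%}")
```

The three traits come out as follows:

- Height lands near 8%.
- Adult cognition lands near 37%.
- Educational attainment lands at 62.5%. Its direct share is 0.15 / 0.40 = 0.375, the three-eighths quoted above.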
3.4 Environmental effects are real and asymmetric, with insults dominating
Heritability findings and large environmental effects coexist without contradiction, and the way they coexist is dramatically asymmetric. The environmental effects on cognitive ability that have been measured most cleanly are these:
- Severe insults: prenatal alcohol (full fetal alcohol syndrome, FAS) costs about 30 IQ points; severe deprivation in early childhood (the Romanian-orphanage cohort) costs about 15; severe chronic malnutrition costs about 15; adoption from a high-SES (socioeconomic status) family into a low-SES family costs about 12; severe iodine deficiency costs about 10; lead exposure (going from blood lead 1 to 10 µg/dL) costs about 6.
- Within-normal enrichment: an additional year of schooling adds 1–5 IQ points (mean ≈ 3.4 in Ritchie & Tucker-Drob 2018’s meta-analysis of 600,000 participants); breastfeeding adds about 3 in the PROBIT randomized trial; parenting variation within the Western normal range adds roughly 0–1.
The asymmetry is the lesson. Removing severe insults recovers double-digit IQ points; enrichment above the Western normal range yields a few points at most. This is why the high-heritability findings of behavior genetics and the existence of large environmental effects are not contradictory: heritability is a population-variance statistic, and in any modern population that has already removed the worst environmental tails, most remaining variance is genetic — not because environment doesn’t matter, but because you already removed the environmental factors that mattered most. The variance contribution of fetal alcohol syndrome to a Norwegian sample’s cognitive variance is small not because FAS doesn’t matter for the affected child (it matters by 30 points) but because almost no Norwegian children have it.
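The Norwegian FAS point is arithmetic about prevalence. Take a yes/no exposure that is present in a fraction p of the population and shifts the trait by δ points. The variance it contributes is δ²·p·(1 − p).

The sketch below assumes the exposure is independent of every other cause and that IQ has SD 15. Both prevalences are hypothetical, not measured rates.

```python
def variance_share_of_binary_exposure(effect: float, prevalence: float, total_sd: float = 15.0) -> float:
    """Share of total trait variance attributable to a yes/no exposure.

    An exposure present with probability p that shifts the trait by `effect`
    contributes effect**2 * p * (1 - p) to the variance. Assumes independence
    from every other cause of the trait (an idealization).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    return effect ** 2 * prevalence * (1.0 - prevalence) / total_sd ** 2


# Hypothetical: a 30-point insult affecting 1 in 1,000 children versus a
# 3-point enrichment reaching half the population.
print(f"rare severe insult:  {variance_share_of_binary_exposure(30.0, 0.001):.2%}")
print(f"common mild benefit: {variance_share_of_binary_exposure(3.0, 0.5):.2%}")
```

The two cases compare as follows:

- A 30-point insult affecting one child in a thousand accounts for about 0.4% of IQ variance.
- A 3-point enrichment reaching half the population accounts for about 1%.

The smaller per-child effect contributes more variance. That is why a variance statistic cannot rank levers by how much they matter to an affected child.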
For policy this means the highest-effect-per-dollar interventions are at the negative tail: lead remediation, iodine fortification, fetal-alcohol prevention, basic nutrition, schooling access. For parents this means anxiety about “optimizing” within normal is mostly misallocated: the big lever is preventing severe insults, not perfecting parenting style. The explorer’s “Asymmetry” view renders the full exposure list as a single forest plot sorted by effect size, with implications broken out for parents and policy.
3.5 Heritability is developmental, not static — the Wilson Effect
The cognitive-ability heritability number cited in popular coverage — “IQ is 70-80% heritable” — is the adult number. In children, heritability is much lower. Heritability of cognitive ability rises along a smooth logistic curve from about 0.20 at age five to about 0.80 in adulthood, an empirical pattern called the Wilson Effect after the developmental psychologist who first described it. Bouchard 2013 fit this curve to seven anchor ages and recovered the parameters cleanly: heritability is about 0.20 at age 5, 0.46 at age 10, 0.69 at age 15, and 0.79 at age 25.
The mechanism is not that genetic effects “turn on” with age. It is that shared family environment dominates in childhood and gets crowded out as children gain agency over their own environments. A small child’s reading material, schooling, and peer group are mostly chosen for them by their parents. A teenager’s are mostly chosen by themselves — and the choices they make track their genetic propensities, amplifying the apparent genetic signal (a phenomenon called active gene-environment correlation). The same genome that produces ~20% heritability at age five produces ~80% heritability at age twenty-five not because the genes have done more, but because the environment has shifted from imposed to self-selected.
The implication is that childhood is environmentally most malleable. The same environmental shift produces a much larger effect on a five-year-old than on a twenty-five-year-old, because the child has not yet shifted into self-selected environment mode. Severe environmental insults landing during developmental windows (lead poisoning at age 2, severe deprivation at age 4) leave permanent marks; the same insults landing on adults are smaller in effect. Conversely, “remediation” interventions that work well on children frequently fail on adults because the developmental window has closed. The asymmetric environmental-effects finding from the previous section is largest in early childhood and shrinks across the life course. (Compare child vs. adult cognitive ability in the explorer to see the bucket shift in concrete numbers.)
3.6 High heritability is fully compatible with large environmental shifts
The Wilson Effect is the within-life-course version of a more general truth: heritability is context-dependent. The same shape shows up across cohorts.
The cleanest demonstration is height. Within any modern Western country, about 85% of why adults differ in height tracks genetic differences. Average adult height has risen about ten centimeters in a century — entirely from environmental change (nutrition, infection control, prenatal care). The same heritability that “shows height is genetic” coexists with one of the largest environmental shifts in any biological trait. The within-cohort heritability and the between-cohort secular rise are not in conflict; they answer different questions.
The same logic applies to cognitive ability. The Flynn Effect raised average measured IQ by roughly 25–30 points across mid-20th-century cohorts in most measured populations (Pietschnig & Voracek 2015 meta-analysis: ~2.3 IQ points per decade across 105 samples), in populations whose within-cohort IQ heritability remained in the 0.7–0.8 range. The pattern has slowed and partially reversed in some countries from the 1990s onward, the cause of which is itself an open question — but the same-genes-different-environment-different-mean pattern is the lesson. Smoking shows the same pattern: heritability of smoking initiation is about 0.50 within any modern cohort, and US adult smoking prevalence fell from about 42% in 1965 to about 12% today — a roughly 70% reduction over sixty years from taxation, public-smoking restrictions, and shifting norms. Heritable does not mean fixed. This is one of the most important things to internalize about this field, and one of the things most consistently mishandled in public coverage.
3.7 Within-population heritability does not license between-population claims
This is the Lewontin firewall, and no data set can overturn it, because it is a logical, algebraic point rather than an empirical claim. Within-population heritability provides no information, by itself, about whether between-population mean differences have a genetic component: heritability is a ratio of variances computed inside one population, and nothing in its definition refers to the difference between two populations' means.
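A minimal simulation makes the point concrete. It assumes a purely additive toy model (phenotype = G + E, scaled so that Var(G) = h² = 0.8 inside each population) with hypothetical parameters; the helper names are illustrative and not part of the pipeline's model stage. Two scenarios produce the same one-unit gap between population means, once from a purely environmental shift and once from a purely genetic one, and the within-population heritabilities cannot tell them apart.

```python
import random
import statistics

# Hypothetical toy model: names and parameter values are illustrative only.


def simulate_population(n, g_mean, e_mean, h2, rng):
    """Additive toy: phenotype = G + E, with Var(G) = h2 and Var(E) = 1 - h2."""
    g = [rng.gauss(g_mean, h2 ** 0.5) for _ in range(n)]
    e = [rng.gauss(e_mean, (1 - h2) ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return g, p


def within_h2(g, p):
    """Within-population heritability: share of phenotypic variance that is genetic."""
    return statistics.pvariance(g) / statistics.pvariance(p)


def compare(scenario, pop2_g_mean, pop2_e_mean, n=20_000, h2=0.8, seed=0):
    """Return (label, within-h2 in pop 1, within-h2 in pop 2, gap in phenotype means)."""
    rng = random.Random(seed)
    g1, p1 = simulate_population(n, 0.0, 0.0, h2, rng)
    g2, p2 = simulate_population(n, pop2_g_mean, pop2_e_mean, h2, rng)
    gap = statistics.fmean(p2) - statistics.fmean(p1)
    return scenario, within_h2(g1, p1), within_h2(g2, p2), gap


results = [
    compare("gap is purely environmental", pop2_g_mean=0.0, pop2_e_mean=1.0),
    compare("gap is purely genetic", pop2_g_mean=1.0, pop2_e_mean=0.0),
]
for name, h2_a, h2_b, gap in results:
    print(f"{name}: within h2 = {h2_a:.2f} / {h2_b:.2f}, mean gap = {gap:.2f}")
```

Both calls reuse the same seed, so the underlying random draws are identical and only the location of the shift differs: the printed within-population heritabilities match, while the cause of the gap is opposite in the two scenarios.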
The empirical buttress to the logical point is that polygenic scores (the molecular-genetics tool that would in principle let researchers ask the between-population question) lose accuracy when applied across ancestries, and the loss is substantial. Martin et al. 2019 report relative-accuracy reductions of 37% in South Asian, 50% in East Asian, and 78% in African ancestries compared to European training, averaged across major traits. Ding et al. 2023 (Nature, 84 traits, 524,000 individuals) extended this finding to a continuous distance scale and found a Pearson correlation of −0.95 between genetic distance from the European-ancestry training population and PGS prediction accuracy. The same SNP “effect sizes” do not estimate the same causal coefficients across populations. The methods that would license a between-population genetic comparison demonstrably do not work across populations as currently constructed.
The honest position on between-population mean differences: as of 2026, the science is not equipped to answer the question in either direction. People who claim it has been answered, in either direction, are over-claiming relative to what the methods can do.
4. The four motivated-reasoning traps
The pipeline’s topology stage maps four directions of public-discourse motivated reasoning explicitly. Each cites real evidence; each ignores real evidence; each can be steel-manned into a more defensible position that mostly aligns with the integrated reading the science actually supports.
The blank-slate / pure-environmentalist position claims that psychological differences are mostly socialization, that twin studies are flawed, and that heritability is a methodological artifact. Cited correctly: the equal-environments assumption in twin studies is partially violated, adoption studies have selection effects, cultural variation in trait expression is real, stereotype threat exists. Ignored: SNP-based heritability bypasses the twin-design assumptions and still recovers a substantial share of twin h² across major traits; adoption studies converge on similar estimates; within-family GWAS finds non-zero direct genetic effects; severe psychiatric conditions show heritability of 0.79–0.80 across cultures. The integrated reading: the methodological critiques have force at the margin but cannot account for the convergence across designs. The honest version of this position survives: “population-level genetic variance ratios are real, but they don’t license the moves people make from them: individual partition, between-population inference, fixed-trait reading.” That’s true, and is exactly what the science says when stated carefully.
The hereditarian position claims that differences are mostly genetic, that group disparities reflect underlying biology, and that environment is overrated. Cited correctly: mean trait heritability is 0.49 across 17,804 traits, twin studies replicate, GWAS hits replicate, within-family designs find non-zero direct effects. Ignored: 30–60% of “genetic” effect for socially-structured traits is structural inflation rather than direct biology (with educational attainment specifically over 60%); PGS portability collapse blocks between-population inference empirically; the Lewontin firewall blocks it logically; high heritability coexists with large environmental shifts (height +10 cm, IQ +25–30 points across mid-20th-century cohorts); severe environmental insults each cost double-digit IQ points; cross-trait assortative mating accounts for ~74% of variance in reported psychiatric cross-disorder genetic correlations (Border 2022). The integrated reading: heritability is real and substantial, the within-population claim survives, but the move to “between-population means are genetic” is blocked twice (logically and empirically), and the move to “fixed at individual level” is blocked by the asymmetric environmental-effects finding and the Wilson Effect (heritability is developmental). The honest version: “within-population genetic variance is real and substantial, period.” Which is true.
The gender-similarities (single-dimension) framing claims that sex differences are tiny, citing math performance d ≈ 0.05 and similar small per-dimension effects. Cited correctly: math, verbal, and many specific cognitive-task differences are small; Hyde 2005’s similarities hypothesis is empirically supported for most single dimensions. Ignored: the people-things interest difference is d ≈ 0.93, one of the largest effect sizes in psychology (Su 2009, N = 503,000); aggregated across 15 personality dimensions with realistic inter-trait correlations, the multivariate Mahalanobis distance between male and female means is D ≈ 1.0 at the observed level and D ≈ 2.7 at the latent (measurement-error-corrected) level — large by any standard; the Gender Equality Paradox (Herlitz 2025 systematic review) finds differences are larger in more egalitarian societies, which is hard to reconcile with pure-socialization predictions. The integrated reading: both Hyde 2005 and the multivariate-D literature are correct about different objects. On any single dimension, sex differences are small. Aggregated across many weakly-correlated dimensions, the multivariate distance is large. Both halves are true; the trap from each side picks one and ignores the other.
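The multivariate aggregation is Mahalanobis distance, D = √(dᵀR⁻¹d), where d is the vector of per-dimension standardized mean differences and R is the inter-trait correlation matrix. The stdlib-only sketch below uses hypothetical inputs (fifteen dimensions at |d| = 0.3 each and simple equicorrelation matrices, not the published values) to show that the same small per-dimension effects aggregate to different D depending on how the traits correlate and which way each difference points. The helper names are illustrative.

```python
import math

# Hypothetical inputs; helper names are illustrative, not from the pipeline.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small, well-conditioned a)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis_d(d, corr):
    """Multivariate effect size D = sqrt(d' R^-1 d) for standardized mean differences d."""
    w = solve(corr, d)
    return math.sqrt(sum(di * wi for di, wi in zip(d, w)))


def equicorrelation(k, rho):
    """k x k correlation matrix with every off-diagonal entry equal to rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


same_sign = [0.3] * 15                                           # hypothetical d values
mixed_sign = [0.3 if i % 2 == 0 else -0.3 for i in range(15)]
print(mahalanobis_d(same_sign, equicorrelation(15, 0.0)))   # independent traits
print(mahalanobis_d(same_sign, equicorrelation(15, 0.1)))   # correlated, differences aligned
print(mahalanobis_d(mixed_sign, equicorrelation(15, 0.1)))  # correlated, differences opposed
```

With independent traits D = 0.3 × √15 ≈ 1.16; positive correlations among same-direction differences shrink D to 0.75, while differences that run against the correlation structure push it to about 1.22. That sensitivity is why D depends on which traits are in the measurement panel.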
The pop-evolutionary-psychology overreach claims that “men are X, women are Y,” that differences are categorical and evolved, and that they predict at the individual level. Cited correctly: multivariate D ≈ 2.7, people-things d ≈ 0.93, cross-cultural replication of mean differences, biological-developmental data (girls with congenital adrenal hyperplasia show masculinized toy preferences). Ignored: psychological variation is dimensional, not taxonic — there are no two clean categories; distribution overlap at D = 1.0 is ~60%, at D = 2.7 still ~18% — “categorical” is the wrong shape; effect-size labels are scale-dependent; Mahalanobis D is a model-relative summary statistic that depends on which traits are measured. The integrated reading: aggregate sex differences are real and large, but “categorical” misrepresents the shape, individual prediction from group membership is poor, and the headline D depends on the measurement panel. The honest version: “aggregate multivariate sex differences are substantial, individual prediction from sex alone is weak.” Which is true, and which undermines the categorical reading.
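The overlap figures follow from a standard closed form. Assuming both groups are multivariate normal with a shared covariance matrix (an idealization), the two distributions overlap exactly as two unit-variance normals whose means sit D apart along the discriminant axis, so the overlapping coefficient is 2Φ(−D/2). This sketch (function names illustrative) reproduces the ~60% and ~18% figures.

```python
import math

# Illustrative helpers for the equal-covariance normal idealization.


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_coefficient(distance):
    """Shared area of two equal-variance normals whose means differ by `distance` SDs."""
    return 2.0 * normal_cdf(-abs(distance) / 2.0)


for distance in (0.5, 1.0, 2.7):
    print(f"D = {distance}: overlap = {overlap_coefficient(distance):.1%}")
```

D = 1.0 gives about 61.7% overlap and D = 2.7 about 17.7%: a large aggregate separation that still leaves substantial overlap, which is the shape the "categorical" reading misses.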
The lesson across all four traps: they each work by selective citation. The integrated picture requires holding all of it at once — large heritability and large structural inflation, small per-dimension sex differences and large multivariate ones, high within-population heritability and a logical block on between-population inference. Any single-direction narrative is structurally incomplete. The explorer’s “Four traps” view has the full cited / ignored / integrated breakdown for each direction with trait-specific applications.
5. What’s still open
The field is not done. Three real open questions remain, and the writeup is more honest if it names them than if it papers over them.
What polygenic scores actually measure causally. The Plomin / Turkheimer dispute. Plomin’s reading: a within-family-validated polygenic score is a real biological cause. Turkheimer’s reading: even a within-family PGS is a summary of correlated environments and biological factors that the design can’t fully separate. Both readings predict the same variance budget, which is why the data hasn’t yet decided between them. The decisive test would be a within-family experiment that perturbs the environment and watches whether the PGS coefficient moves the way Plomin predicts (it shouldn’t) or the way Turkheimer predicts (it should). No such study has been run at scale. Until one is, this question is open.
The mechanism behind the Gender Equality Paradox. The empirical pattern, in which sex differences in personality, interests, and several other domains are larger in more gender-egalitarian societies, has strengthened across multiple replications (Herlitz et al. 2025 systematic review). Three mechanism candidates remain live: (a) release of innate expression in resource-rich environments (the “constraints removed” reading); (b) reference-group or self-anchoring artifacts in self-report measurement (people in more egalitarian societies may rate themselves relative to same-gender peers rather than to people in general); (c) wealth and freedom confounds that correlate with gender-equality indices. The pattern is robust; the mechanism is not.
The full assortative-mating-corrected psychiatric cross-disorder correlation matrix. Border et al. 2022 (Science) showed that cross-trait assortative mating accounts for about 74% of variance in reported psychiatric cross-disorder genetic correlations across 132 trait pairs in UK Biobank. If the correction were applied to the full Psychiatric Genomics Consortium cross-disorder matrix, the correlations would likely shrink, and some “shared underlying biology” claims about the p-factor and cross-disorder pleiotropy would weaken. As of late 2024, a method that systematically corrects for this (LAVA-Knock; Ma, Wang, Border et al.) has emerged. The full re-analysis is active research, and the question is likely answerable within 2–3 years.
A handful of other questions sit in the same “framable but not yet answerable” category — what “non-shared environment” actually is at the mechanism level, the cause of the Flynn Effect’s recent reversal in some cohorts, whether the positive manifold of cognitive ability is itself shifting across cohorts (Pietschnig 2024). The pipeline’s lit review and topology cover them in more detail.
6. What this means for action
The most action-relevant single insight in the topic is the asymmetry of environmental effects. Most parents and most policy operate as if the asymmetry runs the other way, as if optimizing within the normal range is where the leverage is. The data says it isn't.
For parents. The big levers are at the negative tail. Prevent severe insults: lead exposure (still meaningfully present in some housing stock), prenatal alcohol (fetal alcohol exposure produces effects of about 30 IQ points), severe early malnutrition, untreated iodine deficiency, severe deprivation. Within the Western normal range, additional optimization of parenting style, enrichment activities, and educational supplements yields a few IQ points at most. The empirical literature finds the within-family contribution of “what parents do” to adult personality is essentially zero, and the within-family contribution to adult cognitive ability is small relative to direct biology and to schooling itself. Anxiety about “optimizing” within normal is mostly misallocated. This is not a license to be neglectful — neglect is itself a severe insult — but it is a license to relax about whether one is doing exactly the right enrichment activity. The big things are protecting against severe insults and ensuring schooling. (See the explorer’s child cognitive ability trait for the variance breakdown that supports this — at age 5 the family bucket is ~52% of variance and shrinks to ~34% by adulthood, while the actionable environmental tail concentrates in severe insults.)
For policy. Lead remediation, iodine fortification, fetal-alcohol prevention, basic nutrition, and schooling access are the highest-effect-per-dollar cognitive interventions ever measured. Universal pre-K and similar middle-of-the-distribution interventions show genuine but smaller effects. Programs targeted at “enrichment above normal” generally do not move long-term outcomes at meaningful effect sizes. Public-health interventions on smoking show the analogue at the behavioral level: tobacco taxation, public-smoking bans, and minimum-age sales laws cut US adult smoking prevalence by ~70% over sixty years despite a within-cohort heritability of 0.50. Environmental change at population scale is not blocked by within-cohort heritability.
For individuals. The within-individual story splits cleanly along trait-class lines. Traits with moderate heritability and large environmental + chance contribution — depression, anxiety, neuroticism-related affect, self-control, subjective wellbeing — move at clinically meaningful effect sizes under behavioral or pharmacological intervention. CBT moves anxiety and depression at d ≈ 0.7 vs. control. Mindfulness, exercise, and behavioral activation move neuroticism-related outcomes modestly. Social connection, meaning, and physical activity move wellbeing baselines persistently. (See the anxiety, depression, and subjective wellbeing trait pages for the breakdowns.) Traits with high direct-genetic heritability and small environmental + chance contribution — adult cognitive ability, height, schizophrenia, autism — show much smaller within-individual responsiveness to intervention once the developmental window has closed. Cognitive ability post-adolescence does not move much from intervention; height post-adolescence doesn’t move at all; schizophrenia and autism are responsive to treatment in symptom management but not in underlying load. The “biology is destiny” framing is wrong (you have substantial behavioral leverage on the moderate-heritability traits); the “I can rewrite myself with willpower” framing also exceeds what the literature supports (the high-heritability traits don’t move much). The honest middle is trait-specific: know which side of this split your trait of interest sits on before deciding how much effort to invest.
7. Closing
The science of psychological variation is in better shape than its public discourse. Within the field, behavior geneticists, social-genomics researchers, and developmental psychologists have substantially converged on the picture this writeup describes. Outside the field, almost every direction of motivated reasoning continues to cite the slice of evidence it likes and ignore the rest. The earlier stages of the pipeline — the lit review, the topology, the model formalization, the data pipeline, and the interactive explorer — carry the technical detail behind every claim above.
What I would most want a reader to walk away with: a calibrated humility about what is known, a clean separation between what the science says and what motivated reasoning loads onto it, and the asymmetry finding. If the choice is between “I leave knowing the field is full of contested empirical claims” and “I leave knowing severe environmental insults are the big lever and within-normal optimization is mostly noise,” the second is more useful. The data supports both.