Psychology of Individual Differences

What the science actually says about psychological variation — heritability, environmental shaping, gene-environment interaction, sex differences, cognitive capacity. A minefield of motivated reasoning where the actual generating functions are obscured by politics.

The first topic taken end-to-end through the LLM Iterate pipeline. Six stages, completed: lit review (the landscape), topology (dependency graph of what depends on what), model (variance decomposition + closed-form pieces + interactive dashboard), data (eight predictions tested against published consortium estimates), build (interactive explorer with 24 traits), writeup (long-form synthesis for an educated lay reader).

The headline finding is that heritability is real, replicated, and substantial across most psychological traits — but a sizable fraction of what gets called “genetic” in twin studies is actually environmental in origin, mediated through parents who transmit both the alleles AND the correlated rearing environment (a phenomenon called genetic nurture). Direct biological causation is genuine and important; it’s also typically smaller than the headline numbers suggest, especially for socially-structured traits like educational attainment, where the cleanest estimate of direct genetic effect is about one-third of what classical twin studies report.

If you want the full synthesis in prose, read the writeup — it’s the canonical end-to-end piece, written for an educated lay reader with all acronyms defined. If you want a hands-on tool, the explorer lets you pick any of two dozen traits and see the variance breakdown in three plain-language buckets (direct genes / family setup / environment + chance), plus the four motivated-reasoning traps the field gets caught in and the asymmetric environmental-effects finding. The model and data stages have the formal math and the empirical tests behind those numbers; the topology and lit review document the dependency structure and the underlying literature.


TLDR

Virtually every measured psychological trait is moderately to substantially heritable, hyper-polygenic, and shaped by environments that are themselves partly genetic in origin. The post-2010 genomic era confirmed the core findings of mid-20th-century behavior genetics while simultaneously demolishing the candidate-gene paradigm that dominated psychiatry from 1996 to 2010. Twin heritability averaged across all measured human traits (not only psychological ones) is ~49% (Polderman et al. 2015); molecular GWAS increasingly accounts for the heritability of psychological traits through thousands of tiny-effect common variants plus rarer large-effect variants in neurodevelopmental conditions.

A crucial methodological development since 2018 is the recognition that assortative mating and gene-environment correlation systematically inflate GWAS-derived estimates. Border et al. (2022, Science) showed that cross-trait assortative mating alone can account for substantial fractions of reported genetic correlations, including some psychiatric cross-disorder correlations previously attributed to shared biology. Kong et al.’s (2018) “genetic nurture” finding, together with within-family GWAS (Okbay et al. 2022), indicates that roughly half of population-level polygenic score prediction for educational attainment reflects something other than direct genetic causation, with environmentally mediated parental effects a major contributor. These corrections don’t eliminate genetic influence; they reframe it.

The questions researchers still actively contest are not the ones most disputed in public discourse: heritability is settled science, the “parenting wars” are largely resolved, and the candidate-gene-by-environment literature has collapsed. What remains genuinely open is largely mechanistic: how genes build minds, why the gender equality paradox exists, what drives the Flynn Effect’s reversal, and whether between-population mean differences have any genetic component (a question currently unanswerable with available methods, not “settled” in either direction). The generating function for psychological variation is not “genes vs. environment” but a tightly coupled developmental system in which genetic predispositions, environments created by genetically similar parents, assortative mating patterns, stochastic noise, and cultural context are deeply entangled.

This document is structured for someone building a formal model of psychological variation. Each section flags effect sizes, replication status, consensus, live debate, and ideological distortion from any direction.

1. Heritability: The Foundation Finding

The Polderman meta-analysis

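Polderman et al. (2015, Nature Genetics) meta-analyzed 50 years of twin research (17,804 traits, 14.5 million twin pairs, 2,748 publications) and reported a mean heritability across all human traits of 49%. Polderman et al., 2015. For ~69% of traits, the twin correlations fit the simplest pattern, rMZ ≈ 2·rDZ, meaning twin resemblance is consistent with purely additive genetic influence and no detectable shared-environment or dominance component.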
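The rMZ ≈ 2·rDZ pattern reads directly off Falconer’s classical approximation to the ACE twin model. A minimal sketch, assuming the standard twin-design assumptions (MZ twins share all and DZ twins half of additive genetic variance, equal environments, no dominance, no assortative mating); the correlations in the usage lines are illustrative, not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACE:
    a2: float  # additive genetic share (narrow-sense heritability)
    c2: float  # shared (family) environment share
    e2: float  # non-shared environment, including measurement error


def falconer(r_mz: float, r_dz: float) -> ACE:
    """Falconer's approximation for a univariate ACE twin model.

    Assumes MZ twins share all and DZ twins half of additive genetic
    variance, both twin types share family environment equally (the EEA),
    and there is no dominance or assortative mating.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return ACE(a2, c2, e2)


# Illustrative twin correlations, not taken from any specific study.
print(falconer(0.50, 0.25))  # rMZ = 2 * rDZ: purely additive, c2 = 0
print(falconer(0.80, 0.50))  # rMZ < 2 * rDZ: shared environment contributes
```

If rMZ exceeds 2·rDZ, c2 comes out negative, which in practice signals non-additive (dominance) effects that this simple ACE sketch does not model.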

Turkheimer’s Laws and the Fourth Law

Turkheimer’s Three Laws (2000) — all human behavioral traits are heritable; shared family environment is smaller than genes; substantial variance is explained by neither — were extended by Chabris et al. (2015) with the Fourth Law: a typical behavioral trait is associated with very many genetic variants of tiny effect. This emerged from the failure of candidate-gene studies and the polygenic architecture revealed by GWAS.

What heritability actually means (and doesn’t)

Heritability is a population statistic, not an individual one. Saying IQ is 70% heritable does not mean 70% of any person’s IQ comes from genes. It is not deterministic (height is ~80% heritable yet rose ~10cm in 20th-century Europe through nutrition) and not immutable (h² changes with environment — if all environments became identical, h² would approach 1.0). The most common misinterpretation collapses statistical variance partitioning into causal mechanism.
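Because h² is a ratio of variances, it says nothing about the population mean, and it rises when environmental variance shrinks. A minimal simulation of a toy additive model P = G + E makes both points; G and E are assumed independent, and all parameter values are hypothetical, chosen so the baseline h² is about 0.8 (as for height). A uniform environmental boost shifts the mean without changing h², while near-identical environments push h² toward 1.

```python
import random
import statistics


def simulate(n: int, sd_g: float, sd_e: float, env_shift: float = 0.0, seed: int = 0):
    """Return (mean phenotype, h2) for a toy additive model P = G + E.

    G and E are independent normals. env_shift adds the same environmental
    boost to everyone, a stylized secular improvement such as nutrition.
    The same seed reproduces the same genotypes across scenarios.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, sd_g) for _ in range(n)]
    e = [rng.gauss(env_shift, sd_e) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    h2 = statistics.pvariance(g) / statistics.pvariance(p)
    return statistics.fmean(p), h2


for label, kwargs in [
    ("baseline", dict(sd_e=0.45)),
    ("everyone +1 unit of environment", dict(sd_e=0.45, env_shift=1.0)),
    ("near-identical environments", dict(sd_e=0.05)),
]:
    mean, h2 = simulate(20_000, sd_g=0.9, **kwargs)
    print(f"{label:32s} mean={mean:+.2f}  h2={h2:.2f}")
```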

Twin and molecular estimates by domain

Domain	Twin h²	SNP-h²	Largest GWAS (N)	Loci	Best PGS R²
Adult IQ / g	0.70–0.80	~0.20	269,867 (Savage 2018)	205	~0.05
Educational attainment	~0.40	~0.13	3M (Okbay 2022)	3,952	0.12–0.16
Big Five (avg)	0.40–0.60	0.05–0.18	449k (Nagel 2018)	136 (neuroticism)	<0.05
Political orientation	~0.40	—	—	—	—
Religiosity	0.30–0.45	—	—	—	—
Risk tolerance	~0.30	0.05	1M (Karlsson Linnér)	99	<0.02
Schizophrenia	0.60–0.80	0.24	320k (Trubetskoy 2022)	287	0.07–0.10
Bipolar	0.70–0.85	0.18–0.20	414k (Mullins 2021)	64	0.04
MDD	0.35–0.40	0.09	807k (Howard 2019)	102	0.02–0.03
ADHD	0.74	0.14	225k (Demontis 2023)	27	0.04–0.06
Autism	0.80	0.12	46k (Grove 2019)	5	<0.03

Note: Political orientation and religiosity are included because they are among the few adult traits where shared family environment (C) remains substantial (~20–30%), unlike personality and cognition where C ≈ 0 by adulthood. See Alford, Funk & Hibbing (2005); Hatemi et al. (2014).

The Wilson Effect

Bouchard (2013) documented that IQ heritability rises with age — from ~20% at age 5 to ~80% by adulthood, with shared-environment effects dropping from ~55% in early childhood to roughly zero by adolescence. Bouchard 2013. Briley & Tucker-Drob (2013) explained the mechanism: early genetic effects are amplified across development through gene-environment correlation (niche-picking). Briley & Tucker-Drob 2013. This finding is robust, replicated, and counterintuitive — genetic differences become more expressed as people age into self-selected environments.

The “missing heritability” problem

The gap between twin h² (~0.70 for IQ) and SNP-h² (~0.20) launched a decade of debate. Wainschtein et al. (2022, Nature Genetics) essentially closed it for height: using whole-genome sequencing in 25,465 unrelated individuals, h² recovered to 0.68 when rare and low-LD variants were included. Wainschtein et al. 2022. For psychological traits the same pattern is emerging. The current synthesis: missing heritability is partly real (rare variants, dominance, GxE) and partly artifactual (twin overestimation from assortative mating and rGE, measurement noise).

Assortative mating: a pervasive inflation source

Assortative mating (AM) — the tendency for partners to resemble each other on traits — has emerged as a major methodological concern. People mate assortatively on education (spousal r ≈ 0.40–0.60), IQ (~0.40), personality (~0.10–0.20), height (~0.20), and psychiatric conditions. AM has three consequences for genetic estimates:

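Inflated heritability: AM increases additive genetic variance across generations by creating linkage disequilibrium among causal variants. Counterintuitively, classical twin studies that ignore AM tend to underestimate heritability: AM raises the genetic similarity of DZ twins above the assumed 50%, and the model misattributes that excess DZ resemblance to shared environment. GWAS-based SNP-h², by contrast, may be inflated by AM-induced LD. Border et al. 2022, Nat Commun.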
Inflated genetic correlations: Border et al. (2022, Science) introduced cross-trait assortative mating (xAM) and showed that phenotypic cross-mate correlations explain R² = 74% of the variance in reported genetic correlation estimates. Some psychiatric cross-disorder genetic correlations — previously interpreted as evidence of shared biology — may be largely or entirely attributable to xAM. Border et al. 2022.
Inflated PGS prediction: Within-family PGS effects are roughly half of population-level effects for educational attainment (Okbay 2022), partly because AM and population stratification inflate between-family comparisons.

Plomin (2022, Behav Genet) argues this is a prediction-vs-explanation distinction: AM inflates causal genetic estimates but doesn’t invalidate PGS as predictors, since AM-induced variance is real population variance. Plomin 2022. This is technically correct but sidesteps the question of why PGS predict — whether through direct genetic causation or through correlated environments created by assortatively-mating parents.
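The first consequence listed above, AM building additive variance across generations, can be made concrete with the standard infinitesimal-model recursion. A minimal sketch, assuming phenotypic assortment with spousal correlation r, no selection, and a constant within-family segregation variance of V_A0/2: mates’ breeding values then correlate r·h², and additive variance climbs to the equilibrium V_A = V_A0 / (1 − r·h²), with h² evaluated at equilibrium. The inputs are hypothetical; with V_A0 = V_E = 0.5 and r = 0.4, h² rises from 0.50 to about 0.56 without any change in the effect of any causal variant.

```python
def am_equilibrium(va0: float, ve: float, r: float, generations: int = 100) -> list:
    """Additive genetic variance per generation under assortative mating.

    Infinitesimal-model recursion: offspring breeding values are the
    midparent value plus a segregation term with constant variance va0 / 2.
    Mates correlate r on phenotype, so their breeding values correlate
    r * h2, which builds up positive LD across generations.
    """
    va = va0
    history = [va]
    for _ in range(generations):
        h2 = va / (va + ve)                      # current narrow-sense h2
        midparent = 0.5 * va * (1 + r * h2)      # Var((A_f + A_m) / 2)
        va = midparent + 0.5 * va0               # add segregation variance
        history.append(va)
    return history


history = am_equilibrium(va0=0.5, ve=0.5, r=0.4)
va = history[-1]
h2 = va / (va + 0.5)
print(f"V_A {history[0]:.3f} -> {va:.3f}; h2 0.50 -> {h2:.2f}")
print(f"check V_A0 / (1 - r*h2) = {0.5 / (1 - 0.4 * h2):.3f}")
```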

The genetic nurture revolution

Kong et al. (2018, Science) used 21,637 Icelandic probands with parental genotypes to compute polygenic scores from non-transmitted parental alleles. The non-transmitted PGS predicted offspring educational attainment at ~30% the magnitude of transmitted PGS, meaning parental genotypes shape children via the environments they create, even for alleles never inherited. Kong et al. 2018. Okbay et al. (2022, EA4, Nature Genetics) confirmed this in 3 million people: within-family direct effects are roughly half the population-level PGS magnitude. Okbay et al. 2022. The implication: GWAS effect sizes for socially valued traits are inflated by indirect/dynastic effects, and roughly half of the population-level association we used to read as “genetic transmission” runs through other pathways, including environments created by genetically similar parents, assortative mating, and population stratification.
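The logic of the transmitted / non-transmitted design can be sketched with a toy simulation. Assuming random mating and no population stratification (simplifications the real data do not satisfy), regressing the outcome on both scores recovers the direct plus nurture effect for the transmitted score and the nurture effect alone for the non-transmitted score. The effect sizes are hypothetical, chosen so the ratio lands near the ~30% Kong et al. reported.

```python
import math
import random


def _cov(x, y):
    """Sample covariance using compensated summation."""
    mx, my = math.fsum(x) / len(x), math.fsum(y) / len(y)
    return math.fsum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_genetic_nurture(n=20_000, direct=0.7, nurture=0.3, seed=1):
    """Toy transmitted / non-transmitted PGS design.

    Each parent carries two haplotype scores; the child inherits one from
    each parent. The outcome depends on the child's own (transmitted) score
    through a direct effect, and on both parents' full scores through a
    nurture effect. Returns OLS coefficients (b_transmitted, b_non_transmitted).
    """
    rng = random.Random(seed)
    t, nt, y = [], [], []
    for _ in range(n):
        # Haplotype scores: index 0 is transmitted, index 1 is not.
        father = (rng.gauss(0, 1), rng.gauss(0, 1))
        mother = (rng.gauss(0, 1), rng.gauss(0, 1))
        trans = father[0] + mother[0]
        non_trans = father[1] + mother[1]
        t.append(trans)
        nt.append(non_trans)
        y.append(direct * trans + nurture * (trans + non_trans) + rng.gauss(0, 1))
    # Two-predictor OLS via the normal equations on centered data.
    s_tt, s_nn, s_tn = _cov(t, t), _cov(nt, nt), _cov(t, nt)
    s_ty, s_ny = _cov(t, y), _cov(nt, y)
    det = s_tt * s_nn - s_tn ** 2
    b_t = (s_nn * s_ty - s_tn * s_ny) / det
    b_nt = (s_tt * s_ny - s_tn * s_ty) / det
    return b_t, b_nt


b_t, b_nt = simulate_genetic_nurture()
print(f"transmitted {b_t:.2f}, non-transmitted {b_nt:.2f}, ratio {b_nt / b_t:.2f}")
print(f"implied direct effect (b_t - b_nt): {b_t - b_nt:.2f}")
```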

2. Environmental Shaping: Real, But Smaller and Weirder Than Common Sense Suggests

The shared-vs-non-shared distinction is the single most disorienting finding for laypeople. Across hundreds of twin and adoption studies, the shared family environment (C) accounts for ~0% of variance in adult personality and most adult cognition (Plomin & Daniels 1987; Bouchard & McGue 2003). Parental warmth, parenting style, dinner conversations, books in the home — once genetic transmission is controlled, almost none of this leaves a measurable trace on adult personality. Important exceptions where C remains substantial: educational attainment (~20%), antisocial behavior, religiosity (~25%), political orientation (~20–30%), and childhood (but not adult) externalizing.

What “non-shared environment” actually is

Turkheimer & Waldron’s (2000) meta-analysis of measured non-shared environmental predictors found these accounted for only ~2% of variance in outcomes. Plomin’s recent verdict: non-shared environment is “real but largely random,” more akin to stochastic developmental noise — differential peer experiences, illness, idiosyncratic events, measurement error — than systematic experience.

The Equal Environments Assumption

Critics (Joseph, Charney; Fosse et al. 2015) note that MZ twins are treated more similarly than DZ twins. The central empirical defense: Kendler et al. (1993) showed misperceived-zygosity twins had phenotypic similarity tracking true zygosity. Kendler et al. 1993. MZ-reared-apart correlations (Bouchard’s Minnesota study) closely match MZ-reared-together correlations. Felson (2014) reanalysis: EEA is “not strictly valid, but bias is modest.” Modern SNP-based heritability estimates entirely bypass EEA and give somewhat lower but still substantial h². Verdict: EEA is approximately valid; bias is modest (~10–20% inflation) for most traits.

Environmental factors with robust causal effects on cognition

A small number of environmental insults have large, replicated, causal effects — typically asymmetric (removing severe deficits matters more than enrichment above normal):

Lead exposure: Lanphear et al. (2005) pooled 7 prospective cohorts: blood lead 1→10 µg/dL → −6.2 IQ points, with steeper slope at low concentrations. Causal status: strong.
Severe iodine deficiency: 8–12 IQ point cost; supplementation recovers ~8.7 points (Bougma 2013; Qian 2005). RCT-supported, globally replicated.
Heavy prenatal alcohol: FAS produces mean IQ ~70; Mendelian-randomization confirms causation.
Schooling: Ritchie & Tucker-Drob (2018) meta-analyzed 142 effect sizes / 600,000 participants across three quasi-experimental designs. Each year of education raises IQ by 1–5 points (mean ~3.4), persisting into old age. The most consistent durable IQ-raising intervention identified. Ritchie & Tucker-Drob 2018.
Air pollution (PM2.5): ~−0.27 IQ points per 1 µg/m³ (Aghaei 2024); smaller per-unit than lead but exposure is widespread.
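The “steeper slope at low concentrations” for lead follows from the log-linear dose-response form used in pooled analyses such as Lanphear et al. (2005), where the IQ decrement scales with the ratio of concentrations rather than their difference. A minimal sketch, assuming a pure log-linear model whose slope is back-solved from the 6.2-point figure above (β = 6.2 / ln 10 ≈ 2.7 points per natural-log unit); this is not the paper’s covariate-adjusted model.

```python
import math

# Hypothetical slope, back-solved so that 1 -> 10 ug/dL costs 6.2 IQ points.
BETA = 6.2 / math.log(10)  # IQ points per natural-log unit of blood lead


def iq_decrement(c_from: float, c_to: float, beta: float = BETA) -> float:
    """IQ points lost when blood lead rises from c_from to c_to (ug/dL).

    In a log-linear model equal *ratios* of concentration cost equal IQ
    points, so each extra ug/dL matters less as the baseline rises.
    """
    if c_from <= 0 or c_to <= 0:
        raise ValueError("blood lead concentrations must be positive")
    return beta * math.log(c_to / c_from)


for low, high in [(1, 10), (1, 2), (10, 11), (10, 20)]:
    print(f"{low:>2} -> {high:>2} ug/dL: {iq_decrement(low, high):.2f} IQ points")
```

Under this form, going from 1 to 2 µg/dL costs as much as going from 10 to 20 µg/dL, which is why low-level exposure carries a disproportionate share of the harm.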

The Scarr-Rowe interaction (SES × heritability)

Turkheimer et al. (2003) reported that in impoverished families, IQ heritability was ~10% with shared environment ~60%; in affluent families this reversed. Turkheimer et al. 2003. Tucker-Drob & Bates (2016) meta-analysis: replicated in U.S. samples but absent in Western European/Australian samples — likely because more universal healthcare/education reduces environmental variance at the bottom. Status: real but context-dependent and U.S.-specific.
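SES × heritability patterns are typically estimated with moderated biometric models in which the genetic, shared, and non-shared paths each vary with the moderator. A minimal sketch in the style of Purcell’s (2002) linear moderation model, assuming each path is a linear function of standardized SES and each variance component is the squared path. The path parameters are hypothetical, picked only to reproduce the ~10% / ~60% pattern at low SES and its reversal at high SES; they are not estimates from Turkheimer et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeratedPath:
    intercept: float
    slope: float

    def at(self, m: float) -> float:
        """Path coefficient at moderator value m."""
        return self.intercept + self.slope * m


def variance_shares(a: ModeratedPath, c: ModeratedPath, e: ModeratedPath, m: float):
    """Return (h2, c2, e2) at moderator value m (e.g., standardized SES)."""
    va, vc, ve = a.at(m) ** 2, c.at(m) ** 2, e.at(m) ** 2
    total = va + vc + ve
    return va / total, vc / total, ve / total


# Hypothetical parameters chosen only to mimic the reported pattern.
A = ModeratedPath(0.545, 0.225)   # genetic path grows with SES
C = ModeratedPath(0.545, -0.225)  # shared-environment path shrinks with SES
E = ModeratedPath(0.55, 0.0)      # non-shared path held constant

for ses in (-1.0, 0.0, 1.0):
    h2, c2, e2 = variance_shares(A, C, E, ses)
    print(f"SES {ses:+.0f} SD: h2={h2:.2f} c2={c2:.2f} e2={e2:.2f}")
```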

Parenting effects: the Harris correction, partially reversed

Judith Rich Harris (1995; The Nurture Assumption 1998) argued that within-normal-range parenting has minimal long-term effects on adult personality. The empirical core was correct: C ≈ 0 for adult personality. But Harris overstated her case. Korean-American adoption studies (Sacerdote 2007; Beauchamp et al. 2023) show real but modest causal effects of family environment on educational attainment, BMI, drinking, smoking — transmission coefficients ~25% of biological-family magnitude. Severe deprivation/abuse causes clear damage. The accurate position: within the normal Western range, parenting style has small effects on adult personality; family environment has measurable but modest effects on attainment outcomes; severe parenting variation matters substantially.

Neighborhood and peer effects

Chetty & Hendren (2018, QJE) used 5+ million U.S. cross-county movers with sibling fixed effects: each year of childhood exposure to a 1-SD better county raises adult income by ~0.5–0.7%. Chetty & Hendren 2018. Moving to Opportunity reanalysis (Chetty, Hendren & Katz 2016): children moving before age 13 had adult earnings 31% higher than controls. Place matters, but its effects accumulate slowly across many years of exposure.

The Flynn Effect and its reversal

Flynn (1984, 1987) documented ~3 IQ points/decade gains across the 20th century. Causes contested: nutrition, schooling, infectious-disease reduction, test sophistication, smaller families — no single mechanism established. Bratsberg & Rogeberg (2018, PNAS) used Norwegian within-family conscript data to demonstrate that both the Flynn Effect and its post-1990s reversal are environmentally driven (visible within sibships, ruling out dysgenic/compositional explanations). Bratsberg & Rogeberg 2018. Similar declines now reported in Denmark, Finland, the Netherlands, France, the UK, and Germany. The reversal’s cause is unknown — this is one of the field’s most important open questions.
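Why within-sibship trends rule out compositional explanations can be shown with toy data. Compositional change (which families are having children in later cohorts) is shared by siblings and cancels in sibling differences, while an environmental trend acting on each birth cohort does not cancel. A minimal two-sibling sketch; the trend sizes and noise levels are hypothetical, and real analyses use far richer within-family models than this.

```python
import random


def _slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def _origin_slope(dx, dy):
    """Slope through the origin: sibling differences have no family term."""
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def simulate_cohorts(n_families=20_000, env_trend=-0.2, compositional=-0.3, seed=2):
    """Return (pooled slope, within-family slope) of score on birth year.

    env_trend: points per birth year acting on every child (environmental).
    compositional: points per year by which the family baseline drifts with
    the family's cohort, shared by both siblings. Values are hypothetical.
    """
    rng = random.Random(seed)
    years_all, scores_all, d_years, d_scores = [], [], [], []
    for _ in range(n_families):
        first = rng.uniform(0, 20)                     # first child's birth year
        family = compositional * first + rng.gauss(0, 10)
        years = [first, first + rng.uniform(1, 6)]     # two siblings
        scores = [family + env_trend * t + rng.gauss(0, 5) for t in years]
        years_all += years
        scores_all += scores
        d_years.append(years[1] - years[0])
        d_scores.append(scores[1] - scores[0])
    return _slope(years_all, scores_all), _origin_slope(d_years, d_scores)


pooled, within = simulate_cohorts()
print(f"pooled slope {pooled:+.2f}/yr (environment + composition)")
print(f"within-family slope {within:+.2f}/yr (environment only)")
```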

3. Gene-Environment Interplay: rGE Wins, Candidate-GxE Collapsed

Three types of gene-environment correlation

The Plomin/DeFries/Loehlin (1977) framework distinguishes passive rGE (parents transmit both genes and correlated rearing environment), evocative rGE (heritable child traits elicit specific responses), and active rGE / niche-picking (individuals select environments matching genetic propensities). Kendler & Baker’s (2007) systematic review shows essentially every measured environment is itself heritable (15–35%), meaning observational claims like “parental warmth causes child outcomes” are confounded by passive rGE. The genetic nurture and within-family PGS results (Section 1) quantify this for educational attainment: roughly half of population-level “genetic” prediction reflects indirect pathways, including environments created by genetically similar parents.

The candidate gene × environment collapse

Caspi et al. (2003, Science) reported that 5-HTTLPR short-allele carriers showed elevated depression risk under stress. The paper became one of the most cited in psychiatry (>9000 citations). It collapsed:

Risch et al. (2009, JAMA): meta-analysis of 14 studies, N=14,250 — no evidence. Risch et al. 2009.
Culverhouse et al. (2018): pre-registered collaborative meta-analysis, 31 datasets, N=38,802 — definitively no evidence.
Border et al. (2019, Am J Psychiatry): examined 18 most-studied depression candidate genes in N up to 443,264. No clear evidence for any candidate gene polymorphism on depression. As a set, candidate genes were no more associated with depression than non-candidate genes. Border et al. 2019.
Duncan & Keller (2011): 96% of novel candidate-GxE studies were significant; only 27% of replication attempts were. Duncan & Keller 2011.
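The Duncan & Keller gap is what a publication filter acting on underpowered studies predicts. A minimal analytic sketch, assuming each study tests a small true effect estimated with standard error of about 1/√n, every significant novel result is published while null novel results are published only with a small probability, and replications of the same size are reported regardless of outcome. Under those assumptions, published novel studies are mostly significant while replications succeed at roughly the statistical power. The effect size, sample size, and publication probability are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal


def power(true_effect: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided power of a z-test when the effect's SE is about 1 / sqrt(n)."""
    crit = _STD.inv_cdf(1 - alpha / 2)
    z = true_effect * n ** 0.5
    return (1 - _STD.cdf(crit - z)) + _STD.cdf(-crit - z)


def published_hit_rate(pw: float, p_publish_null: float) -> float:
    """Share of published novel studies that are significant.

    File-drawer assumption: every significant result is published, while a
    null result is published only with probability p_publish_null.
    """
    return pw / (pw + p_publish_null * (1 - pw))


pw = power(true_effect=0.10, n=200)
print(f"power (= expected replication rate): {pw:.2f}")
print(f"significant among published novel studies: {published_hit_rate(pw, 0.02):.2f}")
```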

MAOA × maltreatment (Caspi et al. 2002) is the partial exception that survived meta-analysis (Byrd & Manuck 2014) — modest male-specific interaction, but smaller than originally reported.

Differential susceptibility / orchid-dandelion

Belsky & Pluess (2009) reframed “risk alleles” as “plasticity alleles” — some individuals are more reactive to environments “for better and for worse.” Belsky & Pluess 2009. The theory is generative; the empirical record is mixed. Recent systematic reviews find that interactions between child characteristics and parenting rarely replicate across cohorts and developmental domains. Distinguishing differential susceptibility from diathesis-stress requires very large, preregistered samples. de Villiers et al. 2018.

Epigenetics: real biology, oversold psychology

DNA methylation, histone modifications, and non-coding RNA regulation are real, well-characterized mechanisms important in development. The controversy concerns whether environmentally-induced epigenetic marks are faithfully transmitted across generations in humans. They generally are not.

Heard & Martienssen (2014, Cell): in mammals, two waves of near-complete epigenetic reprogramming erase most acquired methylation marks. Robust transgenerational epigenetic inheritance occurs in plants and C. elegans; in humans it remains largely speculative. Heard & Martienssen 2014.
Dutch Hunger Winter (Heijmans et al. 2008): real within-individual epigenetic effect persisting decades, not evidence of transmission to grandchildren.
Yehuda’s Holocaust FKBP5 study (2016): tiny sample (n=8 control parents), opposite-direction effects in parents vs. offspring, no germline measurement. Yehuda’s own group failed to replicate. The “trauma is inherited epigenetically” narrative is not supported by current evidence.

Critical periods: solid developmental neuroscience

Hensch (2005, Nat Rev Neurosci) provides a mechanistically rigorous account of cortical critical-period plasticity. Hensch 2005. GABAergic maturation (parvalbumin-positive interneurons) gates onset; perineuronal nets and myelin-associated inhibitors close periods. This represents the high end of how environmental experience shapes brain structure — genuine, replicated, and mechanistically understood.

4. Sex and Gender Differences: Large Where You’re Not Told They Are

Sex differences are one of psychology’s most ideologically distorted areas — distorted by both minimization and overstatement. The actual picture: small differences in average cognitive ability, large differences in interests and physical aggression, moderate-to-large multivariate personality differences, and a robust but mechanistically contested gender equality paradox.

Cognitive abilities

Mental rotation shows d ≈ 0.56–0.73 male advantage (Voyer et al. 1995), among the largest cognitive sex differences documented. Mean math performance: d ≈ 0.05–0.10 (Lindberg et al. 2010) — essentially no average difference. Writing: substantial female advantage. School grades favor girls overall (Voyer & Voyer 2014). At extreme tails (95th–99th percentile) males outnumber females ~2:1 in many countries — driven by slightly greater male variance (~3–15% higher) compounding at extremes.
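How much a small variance difference compounds at the tails depends on the cutoff and on any mean difference. A minimal sketch, assuming normal distributions, females ~ N(0, 1), males with variance ratio VR and an optional mean difference d, and the cutoff placed at a percentile of the female distribution (an arbitrary choice). Under these assumptions, a VR of 1.15 with equal means gives roughly 1.5:1 above the 99th percentile and about 2:1 only near the 99.9th, so reaching ~2:1 at the 95th–99th percentiles requires a larger variance ratio, a mean difference, or both.

```python
from statistics import NormalDist


def tail_ratio(cutoff_pct: float, variance_ratio: float, d: float = 0.0) -> float:
    """Male:female ratio above a cutoff, assuming normal distributions.

    Females ~ N(0, 1); males ~ N(d, sqrt(variance_ratio)). The cutoff is the
    given percentile of the female distribution.
    """
    cut = NormalDist().inv_cdf(cutoff_pct / 100)
    female = 1 - NormalDist(0, 1).cdf(cut)
    male = 1 - NormalDist(d, variance_ratio ** 0.5).cdf(cut)
    return male / female


for vr in (1.03, 1.15, 1.30):
    ratios = [tail_ratio(p, vr) for p in (95, 99, 99.9)]
    print(f"VR {vr:.2f}: " + ", ".join(f"{r:.2f}" for r in ratios))
```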

Personality: univariate vs. multivariate framing

Univariate Big Five differences are moderate: women higher on Neuroticism (d ≈ 0.40) and Agreeableness (d ≈ 0.40). Del Giudice, Booth & Irwing (2012) computed multivariate Mahalanobis D = 2.71 on 16PF data from 10,261 Americans, implying ~10% overlap between male and female personality profiles. Del Giudice et al. 2012. Hyde’s (2005) “Gender Similarities Hypothesis” (most differences trivial or small) is mathematically compatible with this multivariate result but tells a very different qualitative story. Both univariate and multivariate framings should be reported jointly; selective use is ideological.
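The same D can be converted into “overlap” in more than one way, which matters for how the result is read. A minimal sketch, assuming multivariate normal distributions with a common covariance matrix so that overlap depends only on D. The overlapping coefficient OVL = 2Φ(−D/2) is about 18% for D = 2.71, while overlap measured relative to the joint distribution, OVL / (2 − OVL), is about 10%, which appears to be the measure behind the ~10% figure. The univariate d ≈ 0.40 for Neuroticism gives OVL ≈ 84%.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def overlap(d: float) -> float:
    """Overlapping coefficient for two equal-variance normals.

    Works for a univariate d or a Mahalanobis D, assuming a common
    covariance matrix: OVL = 2 * Phi(-|d| / 2).
    """
    return 2 * _PHI(-abs(d) / 2)


def overlap_joint(d: float) -> float:
    """Overlap relative to the joint distribution: OVL / (2 - OVL)."""
    o = overlap(d)
    return o / (2 - o)


for label, d in [("Neuroticism, d = 0.40", 0.40), ("16PF profile, D = 2.71", 2.71)]:
    print(f"{label}: OVL = {overlap(d):.2f}, joint overlap = {overlap_joint(d):.2f}")
```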

Interests: the largest sex difference in psychology

Su, Rounds & Armstrong (2009, Psych Bulletin) meta-analyzed 503,188 people: the People-Things dimension d = 0.93, with engineering interest d = 1.11. Su et al. 2009. These are very large by psychological standards and the largest in the entire literature on psychological sex differences.

Aggression

Archer (2004): physical aggression d ≈ 0.40–0.60 male; trait anger near zero. Males commit ~95% of homicides globally. Archer 2004. Indirect/relational aggression: Card et al. (2008) found differences trivial (d < 0.10), challenging the popular narrative that girls are substantially more indirectly aggressive than boys.

The Gender Equality Paradox (replicated; mechanism contested)

A robust empirical pattern across at least four domains: personality, preference, interest, and depression-rate differences are larger in more gender-equal and wealthier countries.

Schmitt et al. (2008): 55-nation Big Five study — differences largest in egalitarian Western cultures. Schmitt et al. 2008.
Falk & Hermle (2018, Science): 80,000 adults, 76 countries — sex differences in 6 economic preferences positively related to GDP and gender equality.
Stoet & Geary (2018): STEM Gender-Equality Paradox — more gender-equal countries had smaller female share of STEM graduates. A corrigendum addressed methods; the core correlation remained robust.

The correlation is robust. The causal mechanism — innate-expression release in wealthy environments vs. measurement artifacts vs. ecological confounds — is genuinely contested.

Mental health asymmetries

Depression female:male ≈ 2:1; anorexia ~10:1 female; ADHD diagnosis ~2–3:1 male; antisocial personality, substance use, completed suicide all male-skewed; autism ~3–4:1 male; schizophrenia roughly equal but more severe early-onset in males.

Biological mechanisms

CAH girls (prenatally elevated androgens) show masculinized toy preferences and play patterns (Kung et al. 2024 meta-analysis). Same-sex-typed toy preferences in vervet and rhesus monkeys parallel human findings, supporting partial biological mediation. Wood & Eagly’s social role theory faces empirical challenge from the gender equality paradox.

5. Cognitive Ability and Intelligence

The g-factor

Spearman’s 1904 finding of a positive manifold — every cognitive test correlates positively with every other — is arguably the most replicated finding in psychology. A first unrotated principal factor captures 40–50% of variance in any sufficiently broad battery. van der Maas et al. (2006) mutualism model offers an alternative: g may be an emergent network property of reciprocally beneficial cognitive processes during development, not a unitary biological cause. van der Maas et al. 2006. Most working researchers treat g as a robust statistical regularity whose causal architecture is unsettled.

Structure: CHC theory

Carroll’s (1993) three-stratum theory — g at top, ~8–10 broad abilities (Gf, Gc, Gv, Ga, Gs, Gsm, Glr, Gq, Grw), ~70+ narrow abilities — was integrated with Cattell-Horn into the Cattell-Horn-Carroll (CHC) framework, which underlies modern IQ tests.

Predictive validity

Schmidt & Hunter (1998): corrected GMA validity for job performance r ≈ 0.51. Sackett et al. (2022) argued corrections were too aggressive; re-estimate: r ≈ 0.31 uncorrected / ~0.42 corrected. GMA remains among the most predictive selection tools. Childhood IQ predicts educational attainment at r ≈ 0.50–0.70. Calvin et al. (2011) meta-analysis (1.1M, 22,453 deaths): each 1-SD higher childhood IQ → ~24% lower all-cause mortality. Calvin et al. 2011.
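Much of the gap between the Schmidt & Hunter and Sackett et al. estimates comes from how aggressively observed validities are corrected, especially for range restriction. A minimal sketch of two standard psychometric corrections, disattenuation for criterion unreliability followed by Thorndike’s Case II correction for direct range restriction, shows how sensitive the corrected value is to the assumed inputs. The observed r, the restriction ratio u, and the reliabilities are hypothetical.

```python
import math


def correct_criterion_unreliability(r: float, r_yy: float) -> float:
    """Disattenuate a validity for unreliability in the criterion measure."""
    return r / math.sqrt(r_yy)


def correct_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    u = predictor SD in the applicant pool / predictor SD among incumbents.
    """
    return r * u / math.sqrt(1 + r * r * (u * u - 1))


observed = 0.25  # hypothetical validity observed among hired employees
for u, r_yy in [(1.0, 0.60), (1.3, 0.60), (1.5, 0.52)]:
    corrected = correct_range_restriction(correct_criterion_unreliability(observed, r_yy), u)
    print(f"u={u:.1f}, criterion reliability={r_yy:.2f}: corrected r = {corrected:.2f}")
```

The same observed correlation can land anywhere from the low 0.30s to near 0.50 depending on the assumed u and criterion reliability, which is why the choice of correction inputs carries so much weight in this debate.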

Lifespan stability

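Lothian Birth Cohort: age 11 → age 90 corrected correlation r ≈ 0.67. Deary et al. 2013. Squaring that corrected correlation (r² ≈ 0.45) implies ability at 11 accounts for nearly half of the variance in mental ability at 90; the uncorrected correlation is lower, putting the figure closer to one-third.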

Group differences in test scores: the most distorted area

Roth et al. (2001) meta-analysis (N=6.2M): U.S. Black-White cognitive ability gap d ≈ 1.0 (~15 IQ points). Dickens & Flynn (2006): Black IQ rose 4–7 points relative to whites between 1972 and 2002 (about one-third of the gap). Dickens & Flynn 2006. The gap exists, has narrowed somewhat, and has not closed.

The mainstream contemporary position (Nisbett et al. 2012; Turkheimer, Harden & Nisbett 2017) is that within-group heritability does not license between-group inferences (Lewontin’s point). Martin et al. (2019, Nature Genetics) showed that PGS prediction accuracy is ~4.5-fold lower in African-ancestry individuals because of differences in LD and allele frequencies, so current PGS cannot validly compare mean genetic predisposition across continental ancestry groups. Mostafavi et al. (2020) showed that PGS portability also breaks down within Europeans across SES strata.

The honest scientific position: gaps in test scores are real, partly narrowing, and their causes are not currently identifiable as genetic, environmental, or both — direct evidence is absent and mainstream geneticists treat the question as not currently answerable.

Distortion from the hereditarian direction: treating g-loadedness as evidence of genetic etiology (environmental causes can also be g-loaded); citing fringe admixture studies published in weak-peer-review venues; conflating absence of evidence with agnosticism. Distortion from the environmentalist direction: claiming gaps have closed when they only partly narrowed; dismissing IQ as “culturally biased” despite measurement-invariance evidence; overstating stereotype threat (Flore & Wicherts 2015 meta-analysis showed publication-biased modest effects).

Brain correlates

Brain volume × IQ: r ≈ 0.24 (Pietschnig et al. 2015, 2022). P-FIT theory (Jung & Haier 2007): intelligence supported by parieto-frontal network. Jung & Haier 2007.

Creativity and intelligence

The “IQ ≈ 120 threshold” hypothesis is largely disconfirmed (Weiss et al. 2020). Intelligence and creativity correlate ~r = 0.20–0.30 across the range. Openness to Experience is the personality trait most reliably correlated with creative achievement (~0.30–0.40).

6. Personality and Temperament

The Big Five (OCEAN) and HEXACO

The Big Five emerged from the lexical hypothesis. Heritability is ~40–60% per twin studies; SNP-h² is roughly 5–18%. Nagel et al. (2018) identified 136 loci for neuroticism in 449,484 people. Nagel et al. 2018. The ReGPC consortium (2025) reports 703 loci for neuroticism in 1M+ participants. ReGPC 2025.

Roberts & DelVecchio (2000): rank-order stability rises from ~0.31 in childhood to ~0.74 by midlife (cumulative continuity). Roberts & DelVecchio 2000. Roberts et al. (2006) documented the maturity principle: mean-level increases in Conscientiousness, Agreeableness, and Emotional Stability with age, especially in young adulthood. Bleidorn et al. (2022) provide an update.

HEXACO (Ashton & Lee): lexical studies in 12+ languages consistently yield six factors, the sixth being Honesty-Humility. H predicts integrity-related criteria incrementally over Big Five. Ashton & Lee 2008.

Temperament: the developmental foundation

Temperament research constitutes a parallel tradition to adult personality, focused on biologically-grounded individual differences emerging in infancy.

Rothbart’s model identifies three overarching dimensions: Surgency/Extraversion (activity, positive affect, approach), Negative Affectivity (fear, anger, sadness, discomfort), and Effortful Control (attentional regulation, inhibitory control, low-intensity pleasure). Effortful Control is particularly important — it is the self-regulatory component of temperament, developing primarily during ages 2–7 as the anterior attention network matures, and is a strong predictor of later externalizing problems, academic success, and conscience development.

Kagan’s Behavioral Inhibition (BI) framework focuses on extreme phenotypes: ~15–20% of infants are high-reactive (vigorous motor activity and distress to novel stimuli at 4 months) and are more likely to become behaviorally inhibited toddlers (cautious and avoidant with unfamiliar people and situations). BI maps approximately onto low Surgency + high Negative Affectivity (especially fear). Kagan’s longitudinal studies showed BI is moderately heritable (~50%), associated with higher resting heart rate and amygdala excitability, and predicts elevated risk for social anxiety disorder in adolescence (OR ~2–4). However, ~60% of high-reactive infants do not become clinically anxious adults; biology is a foundation, not a destiny.

Thomas & Chess’s (1977) “goodness of fit” model — later empirically supported — emphasized that temperamental difficulty per se doesn’t predict poor outcomes; the match between child temperament and environmental demands does.

The temperament → personality continuity is increasingly well-documented: infant Surgency maps onto adult Extraversion; infant Negative Affectivity onto Neuroticism; infant Effortful Control onto Conscientiousness. The mapping is imperfect — adult personality includes social-cognitive layers (identity, values, narrative) absent in temperament.

Cross-cultural universality

McCrae & Terracciano (2005): clean Big Five replication in 50 cultures. McCrae & Terracciano 2005. Gurven et al. (2013) challenged this with the Tsimane forager-horticulturalists, where the full Big Five did not robustly emerge. Gurven et al. 2013. Consensus: 3 factors (E, A, C) replicate cross-linguistically; the full Big Five replicates well in Indo-European languages; non-WEIRD samples sometimes show structural deviations.

Dark traits and the D factor

Paulhus & Williams (2002): the Dark Triad (Machiavellianism, narcissism, psychopathy). Buckels, Jones & Paulhus (2013) added everyday sadism. Moshagen, Hilbig & Zettler (2018) proposed the D factor — a general tendency to maximize individual utility while disregarding others — as the common core, mapping strongly onto low Honesty-Humility. Moshagen et al. 2018.

Person-situation debate: resolved

The Mischel (1968) critique — cross-situational consistency rarely exceeds r ≈ 0.30 — was resolved through aggregation, interactionism (Mischel & Shoda’s CAPS model), and Fleeson’s within-person variability framework. The modern consensus: persons, situations, and their interactions all matter.

Personality predicts outcomes as strongly as IQ and SES

Roberts et al. (2007), “The Power of Personality”: a meta-analytic comparison showing that personality effects on mortality, divorce, and occupational attainment are indistinguishable in magnitude from the effects of SES and cognitive ability. Conscientiousness predicts mortality, with health behaviors as a major pathway (Bogg & Roberts 2004).

Recent theoretical developments

DeYoung’s Cybernetic Big Five Theory (2015): traits as parameters of a cybernetic goal-pursuit system. Mõttus et al. (2017): “personality nuances” research argues item-level traits capture incremental valid variance below the facet level.

7. Neurodiversity and Psychopathology: Dimensional, Polygenic, Transdiagnostic

The genomic era has produced three conclusions that fundamentally reshape psychiatric nosology: all major psychiatric conditions are highly heritable, hyper-polygenic, and substantially genetically overlapping across diagnostic categories.

Headline findings by disorder

Schizophrenia: twin h² ~80%; Trubetskoy et al. (2022, Nature): 287 loci; SNP-h² ~24%. Environmental risk factors: urban birth (~2× risk), high-potency cannabis (OR ~3.9), migration, obstetric complications.
Bipolar: twin h² ~70–85%; Mullins et al. (2021): 64 loci; rg(SCZ,BD) ~0.7.
Major Depression: twin h² ~37%; Howard et al. (2019): 102 loci; SNP-h² ~9%. Strong rg with neuroticism (~0.7).
ADHD: twin h² ~74%; Demontis et al. (2023): 27 loci. Negative rg with educational attainment and IQ.
Autism: twin h² ~80%; Grove et al. (2019): 5 common-variant loci plus substantial rare/de novo variants of large effect (CHD8, SCN2A, SYNGAP1). Common-variant PGS correlates positively with IQ and education, whereas autism with comorbid intellectual disability (ID), which is driven more by rare variants, shows a negative correlation.

Cross-disorder pleiotropy (with assortative mating caveat)

Brainstorm Consortium (2018, Science): substantial genetic correlations among psychiatric disorders. Cross-Disorder PGC (Lee et al. 2019, Cell): across 8 disorders, 109 pleiotropic loci and three clusters (compulsive, mood/psychotic, early-onset neurodevelopmental).

Critical caveat: Border et al. (2022, Science) showed that cross-trait assortative mating (xAM) can generate spurious genetic correlations between phenotypes with entirely distinct genetic bases. Some fraction of reported cross-disorder genetic correlations may therefore reflect xAM rather than shared biology. The magnitude of this artifact is still being quantified, and the resulting revision is in progress.
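A toy one-generation simulation shows how cross-trait assortative mating alone can produce a genetic correlation. The assumptions are simplifications of ours, not Border et al.'s model:

- Each parent carries independent genetic values for two traits, so there are no shared causal variants.
- Mates match on genetic values directly rather than on phenotypes.
- Each child inherits the midparent value plus Mendelian segregation noise.

Under these assumptions, the children's two genetic values correlate at about ρ/2, where ρ is the cross-mate correlation, even though nothing affects both traits. In a real genome this correlation appears as linkage disequilibrium between the two traits' causal variants, which can then register as a genetic correlation.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_xam(rho: float, n_families: int = 50_000, seed: int = 1) -> float:
    """Correlation between offspring genetic values for two traits that share no
    causal variants, after one generation of cross-trait mate matching at rho."""
    rng = random.Random(seed)
    k = math.sqrt(1 - rho ** 2)
    child_g1, child_g2 = [], []
    for _ in range(n_families):
        # Mother: independent genetic values for trait 1 and trait 2 (no pleiotropy).
        g1_m, g2_m = rng.gauss(0, 1), rng.gauss(0, 1)
        # Father chosen by cross-trait matching: his trait-2 value tracks her trait-1
        # value, and his trait-1 value tracks her trait-2 value, each at correlation rho.
        g2_f = rho * g1_m + k * rng.gauss(0, 1)
        g1_f = rho * g2_m + k * rng.gauss(0, 1)
        # Child: midparent value plus segregation noise (variance 1/2), drawn
        # independently per trait, so each child trait keeps variance ~1.
        child_g1.append((g1_m + g1_f) / 2 + rng.gauss(0, math.sqrt(0.5)))
        child_g2.append((g2_m + g2_f) / 2 + rng.gauss(0, math.sqrt(0.5)))
    return pearson(child_g1, child_g2)


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.4):
        print(f"cross-mate rho={rho:.1f} -> offspring genetic correlation ~ {simulate_xam(rho):.2f}")
```

With ρ = 0.4 the result should land near 0.20, and with ρ = 0 near zero. The induced correlation comes entirely from who mates with whom.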

The p-factor

Caspi et al. (2014): a single p (general psychopathology) factor fit Dunedin cohort data better than three-factor models — analogous to g for cognitive ability. Caspi et al. 2014. Higher p associated with greater impairment, familiality, worse developmental histories. Replicated in dozens of samples. Interpretations contested: genuine common liability, statistical artifact of bifactor over-extraction, or a reflection of impairment/distress per se.

Dimensional alternatives: HiTOP and RDoC

HiTOP (Kotov et al. 2017): a quantitatively derived, hierarchically organized dimensional alternative to the DSM. RDoC (NIMH, 2009–): six neurobiologically grounded dimensional research domains. Both align with taxometric evidence that most psychopathology is dimensional rather than taxonic, and together they mark the dimensional turn in psychiatric science.

Polygenic scores in clinics: not yet

Best PGS R² ~7–10% for schizophrenia. PGS alone does not outperform family history. PGS performance drops 50–70% in non-European-ancestry populations — a major equity and portability problem.

The neurodiversity framework: scientific–identity tensions

Coined by Singer (1998), the neurodiversity paradigm reframes autism, ADHD, and dyslexia as natural variation rather than pathology. The framework has legitimate ethical force but sits in tension with deficit-oriented findings for severe presentations (profound autism with intellectual disability, epilepsy, self-injury). A defensible position recognizes both the reality of impairment at the severe end and the continuous population-level variation that grades into normality.

8. Key Researchers and Labs

Researcher	Affiliation	Central contribution
Robert Plomin	King’s College London	Behavioral genetics synthesis; Blueprint; GPS
Eric Turkheimer	University of Virginia	Three Laws; Scarr-Rowe; philosophical foundations
K. Paige Harden	UT Austin	Genetic Lottery; causal inference with PGS
Avshalom Caspi / Terrie Moffitt	Duke / King’s	p-factor; Dunedin cohort; (and candidate-GxE)
Ian Deary	Edinburgh	Lothian Birth Cohorts; cognitive epidemiology
Elliot Tucker-Drob	UT Austin	Education-IQ meta-analysis; Wilson Effect mechanisms
Daniel Benjamin / SSGAC	UCLA	EA GWAS consortium; social-science genomics
Colin DeYoung	Minnesota	Cybernetic Big Five Theory; personality neuroscience
Jay Belsky	UC Davis	Differential susceptibility
Marco Del Giudice	UNM	Multivariate sex differences
Janet Hyde	Wisconsin	Gender similarities hypothesis
David Geary	Missouri	Sex differences in math/STEM
Brent Roberts	UIUC	Personality development; maturity principle
Alexander Young	UCLA	Genetic nurture; within-family methods
Richard Border	Harvard/UCLA	Candidate gene demolition; xAM
Peter Hatemi / John Hibbing	Penn State / Nebraska	Genopolitics; heritability of political attitudes

9. The Integrated Picture: What Generates Psychological Variation

The model

A formal model of individual psychological variation should treat the person as the joint product of:

(a) A hyper-polygenic genome encoding thousands of small-effect predispositions (plus some rare large-effect variants in neurodevelopmental conditions). Twin h² for most traits falls in 0.40–0.80.

(b) Substantial gene-environment correlation through passive (parents transmit genes + correlated environments), evocative (child traits elicit responses), and active (niche-picking) channels. Roughly half of population-level PGS prediction reflects indirect/environmental mediation by genetically-similar parents, not direct genetic causation.

(c) Assortative mating inflating additive genetic variance, genetic correlations between traits, and PGS prediction accuracy. This is a recently quantified source of systematic bias in nearly all genetic estimates.

(d) A small set of large-effect environmental insults — lead, severe iodine deficiency, heavy prenatal alcohol, severe deprivation — plus schooling (~3.4 IQ points/year). Effects are typically asymmetric: removing severe deficits matters more than enriching above-normal environments.

(e) Substantial stochastic developmental noise, plausibly the dominant component of the non-shared environment. The non-shared environment accounts for ~50% of personality variance, and its composition is not yet well characterized mechanistically.

(f) Cultural/institutional contexts that modulate which genetic predispositions are expressed and rewarded (WEIRD effects, gender equality paradox, Scarr-Rowe interaction, Flynn Effect).

(g) Developmental unfolding across time — temperament in infancy (biologically grounded reactivity and regulation) becomes personality in adulthood (adding social-cognitive layers), with heritability increasing across the lifespan (Wilson Effect) and rank-order stability rising to ~0.74 by midlife.

Where political distortion is strongest, by direction

From the environmentalist/blank-slate direction: dismissing twin study validity wholesale; overstating Scarr-Rowe; promoting transgenerational epigenetic narratives that exceed evidence; dismissing IQ as culturally biased despite measurement-invariance findings; overstating stereotype-threat magnitudes; minimizing the gender equality paradox.

From the hereditarian direction: citing within-population heritability to license between-population genetic inferences; citing fringe admixture studies as if mainstream; treating g-loadedness of gaps as evidence of genetic etiology when environmental causes can also be g-loaded; ignoring the assortative mating and genetic nurture corrections to PGS.

From the “gender similarities” direction: selective citation of d ≈ 0.05 for math to imply no differences anywhere; obscuring multivariate D ≈ 2.71 with univariate framing; minimizing d ≈ 0.93 people-things interest differences.
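The gap between univariate and multivariate framing is a matter of arithmetic, not rhetoric. Del Giudice's D is a Mahalanobis distance, D = sqrt(dᵀ R⁻¹ d), where d holds the univariate effect sizes and R is the pooled within-group correlation matrix of the traits. The pure-Python sketch below computes it. The effect sizes and correlations in the checks are illustrative, not published estimates. They also show why D is sensitive to the battery: the same two d = 0.5 differences yield different values of D depending on how the traits correlate and whether the differences point the same way.

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting (small dense a)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_d(d: list[float], corr: list[list[float]]) -> float:
    """Multivariate effect size D = sqrt(d' R^-1 d) from univariate d's and R."""
    w = solve(corr, d)  # w = R^-1 d, computed without forming the inverse
    return math.sqrt(sum(di * wi for di, wi in zip(d, w)))


if __name__ == "__main__":
    r = [[1.0, 0.5], [0.5, 1.0]]
    print("uncorrelated, 4 x d=0.5:", mahalanobis_d([0.5] * 4, [[float(i == j) for j in range(4)] for i in range(4)]))
    print("r=0.5, same-sign d's:   ", mahalanobis_d([0.5, 0.5], r))
    print("r=0.5, opposite-sign d's:", mahalanobis_d([0.5, -0.5], r))
```

Four uncorrelated d = 0.5 differences already combine to D = 1.0. When two traits correlate at 0.5, differences in the same direction partly overlap (D ≈ 0.58), while opposite-signed differences reinforce each other (D = 1.0).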

From popular evolutionary psychology: treating dimensional differences as taxonic; extrapolating from small ds to categorical claims; overgeneralizing from specific tasks to broad domain claims.

Open questions worth modeling

Mechanistic interpretation of PGS: Plomin’s “causal genetic” view vs. Turkheimer’s “weak genetic explanation” — genuinely open.
Flynn Effect reversal: cause unknown; one of the most important open questions in differential psychology.
Gender equality paradox mechanism: innate-expression release vs. measurement artifacts vs. wealth confounds — unsettled.
Between-population cognitive differences: currently scientifically unanswerable (PGS portability too poor; cross-ancestry GWAS at scale don’t exist). Honest position: unresolved, not settled in either direction.
The causal architecture of g: latent common cause vs. emergent network property (mutualism) — the positive manifold is not in dispute; what generates it is.
What non-shared environment actually is: stochastic noise, epigenetic variation, immune/microbial variation, differential peer networks — largely uncharacterized despite accounting for ~50% of personality variance.
Assortative mating correction magnitudes: how much do AM and xAM corrections change the substantive picture of genetic architecture and cross-trait pleiotropy? Active area of revision.

10. Load-Bearing Assumptions and Falsification Conditions

This section makes explicit which conclusions in this review depend on which assumptions, and what evidence would substantially revise or flip them. Ordered roughly by how much of the document’s picture collapses if the assumption fails.

Assumption 1: The twin method provides approximately valid variance decomposition

What depends on it: Nearly all h² estimates in Section 1’s table, the C ≈ 0 finding for adult personality, the Wilson Effect, the Scarr-Rowe interaction.

Status: Approximately valid. SNP-h² estimates, which do not depend on the equal-environments assumption (EEA) at all, give lower but still substantial heritability for every trait measured. MZ-reared-apart designs converge with MZ-reared-together designs. Felson (2014) estimates ~10–20% EEA-induced inflation, not enough to eliminate the core finding.

What would flip it: SNP-h² for psychological traits systematically converging on <0.05 (would suggest twin h² is mostly EEA artifact). Or: a large, well-powered MZ-reared-apart study finding IQ correlations <0.40 (current estimates ~0.70). Neither has occurred.

Robustness verdict: HIGH. The convergence of twin, adoption, and molecular methods on moderate-to-substantial heritability is the most replicated finding in the field.
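The classical decomposition behind most twin estimates is Falconer's approximation from MZ and DZ twin correlations. This sketch assumes purely additive genetic effects, random mating, and the equal-environments assumption. Assortative mating raises the DZ correlation, so it would push the shared-environment estimate up and the heritability estimate down. The example correlations are hypothetical, chosen to mimic the adult-personality pattern of C ≈ 0.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Falconer's estimates of additive genetic (a2), shared-environment (c2) and
    non-shared-environment (e2) variance shares from twin correlations."""
    a2 = 2 * (r_mz - r_dz)  # MZ pairs share twice the additive genetic variance DZ pairs do
    c2 = 2 * r_dz - r_mz    # twin similarity not accounted for by genes
    e2 = 1 - r_mz           # MZ differences: non-shared environment plus measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


if __name__ == "__main__":
    print(falconer_ace(0.50, 0.25))  # → {'a2': 0.5, 'c2': 0.0, 'e2': 0.5}
```

The three shares always sum to 1 by construction. Falconer's formulas are a quick approximation; modern twin studies usually fit ACE models by maximum likelihood instead.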

Assumption 2: GWAS identifies real genetic signal (not just population structure and AM artifacts)

What depends on it: The entire PGS enterprise, genetic nurture estimates, cross-disorder pleiotropy findings, the “missing heritability” narrative.

Status: Substantially valid but with known inflation. Within-family PGS effects are non-zero for educational attainment (~half of population effects), meaning direct genetic signal exists. But the magnitude of AM and stratification inflation is still being quantified.

What would flip it: Within-family PGS effects for most traits converging on ~zero (would mean population-level PGS prediction is entirely indirect/environmental). Current evidence: within-family effects are reduced but clearly non-zero for EA, BMI, height; less well-characterized for personality and psychiatric traits.

Robustness verdict: MODERATE-HIGH for the existence of direct genetic effects; MODERATE for their precise magnitude, which is actively being revised downward.
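A toy simulation of genetic nurture shows the within-family logic. The parameter values are hypothetical, not estimates, and the model makes three assumptions:

- A child's polygenic score is the parents' average score plus random segregation.
- The family environment is a linear function of the parents' average score.
- The outcome adds a direct genetic effect, the family environment, and noise.

A population regression of outcome on PGS then absorbs the environmental pathway. Regressing sibling differences in outcome on sibling differences in PGS cancels everything the siblings share and recovers only the direct effect.

```python
import math
import random


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_genetic_nurture(beta_direct: float = 0.2, gamma_nurture: float = 0.4,
                             n_families: int = 50_000, seed: int = 7) -> tuple[float, float]:
    """Return (population slope, within-sibling slope) of outcome on PGS in a toy model."""
    rng = random.Random(seed)
    pop_pgs, pop_y, diff_pgs, diff_y = [], [], [], []
    for _ in range(n_families):
        midparent = rng.gauss(0, math.sqrt(0.5))   # parents' average PGS (child PGS variance = 1)
        family_env = gamma_nurture * midparent      # rearing environment shaped by parental genotype
        sibs = []
        for _ in range(2):
            pgs = midparent + rng.gauss(0, math.sqrt(0.5))  # segregation differs between siblings
            y = beta_direct * pgs + family_env + rng.gauss(0, 1)
            sibs.append((pgs, y))
        # Population design: one sibling per family, outcome regressed on own PGS.
        pop_pgs.append(sibs[0][0])
        pop_y.append(sibs[0][1])
        # Sibling fixed-effects design: differences cancel everything siblings share.
        diff_pgs.append(sibs[0][0] - sibs[1][0])
        diff_y.append(sibs[0][1] - sibs[1][1])
    return ols_slope(pop_pgs, pop_y), ols_slope(diff_pgs, diff_y)


if __name__ == "__main__":
    population, within = simulate_genetic_nurture()
    print(f"population slope ~ {population:.2f}, within-sibling slope ~ {within:.2f}")
```

With the default values, the population slope should be near 0.4 (0.2 direct plus 0.4 × 0.5 via the family environment) and the within-sibling slope near 0.2. The resulting ratio of about one half matches the EA pattern by construction, not by evidence.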

Assumption 3: g is a real dimension of individual variation (not a measurement artifact)

What depends on it: The entire intelligence section (Section 5), predictive validity claims, group-difference discussions, the CHC structure.

Status: The positive manifold is among the most replicated findings in psychology. Whether g is a latent common cause or an emergent network property (mutualism) is unsettled, but both interpretations preserve g’s predictive validity and the meaningfulness of individual differences in general cognitive ability.

What would flip it: A sufficiently broad, well-constructed cognitive battery where the first principal component explains <15% of variance (would undermine the positive manifold). Or: successful interventions that consistently raise one cognitive ability while lowering others (would violate the manifold’s structure). Neither has been demonstrated.

Robustness verdict: HIGH for g as a statistical regularity with predictive validity. MODERATE for g as a unitary biological mechanism (mutualism remains a viable alternative).
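The falsification threshold above is stated in terms of the first principal component. For a correlation matrix, that component's share of variance is its largest eigenvalue divided by the number of tests (the matrix trace). The sketch below estimates it by power iteration. When all correlations are positive (a positive manifold), the Perron-Frobenius theorem guarantees a strictly dominant eigenvalue with a positive eigenvector, so starting from a uniform vector converges. A principal component is a descriptive summary, not the same thing as a latent g factor. The correlations in the checks are hypothetical.

```python
import math


def uniform_corr(k: int, r: float) -> list[list[float]]:
    """Correlation matrix for k tests that all intercorrelate at r (hypothetical battery)."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


def first_pc_share(corr: list[list[float]], iters: int = 1000, tol: float = 1e-12) -> float:
    """Share of total variance carried by the first principal component of corr."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from the uniform unit vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the current unit vector estimates the top eigenvalue.
        new_lam = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam / n  # the trace of a correlation matrix equals n


if __name__ == "__main__":
    print(f"{first_pc_share(uniform_corr(10, 0.3)):.2f}")
```

A uniform inter-test correlation of 0.30 across ten tests gives a top eigenvalue of 1 + 9 × 0.30 = 3.7, a 37% share and well above the 15% threshold.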

Assumption 4: Sex-difference effect sizes from meta-analyses are not primarily measurement artifacts

What depends on it: The gender equality paradox, the claim that interest differences (d = 0.93) are among psychology’s largest, the multivariate personality finding (D = 2.71).

Status: Interest measures (Su et al. 2009) use well-validated instruments; the d = 0.93 holds across inventories and cultures. The Del Giudice multivariate D is sensitive to the number of variables included and the specific battery, though the qualitative finding (large multivariate difference despite moderate univariate ds) is robust across datasets. CAH and non-human primate evidence provides independent convergent support for biological mediation of interest differences.

What would flip it: A large cross-cultural study using behavioral (not self-report) interest measures that finds d < 0.30 for people-things. Or: evidence that the gender equality paradox disappears with non-self-report personality measures (reference-group effects could inflate self-report differences in egalitarian countries). Current evidence: Falk & Hermle (2018) used experimentally validated survey measures of economic preferences (the Global Preference Survey) and found that the paradox held, but full behavioral replication across all domains is incomplete.

Robustness verdict: HIGH for the existence of substantial sex differences in interests and aggression. MODERATE for the precise magnitude of multivariate personality differences. MODERATE for the gender equality paradox’s causal interpretation.

Assumption 5: The candidate-GxE collapse generalizes — specific gene × environment interactions are mostly small or nonexistent

What depends on it: Section 3’s dismissal of 5-HTTLPR and similar findings, the shift toward polygenic approaches.

Status: For candidate genes, the collapse is definitive (Border et al. 2019). But this does not necessarily mean polygenic-score × environment interactions are also null. PGS × environment work is younger, uses better methods, and could in principle yield robust results.

What would flip it: Multiple large, pre-registered PGS × measured-environment studies showing robust, replicable interactions explaining >5% of variance. Current evidence: a few suggestive findings (PGS-for-education × compulsory schooling reforms) but nothing approaching the scale or replication needed for confidence.

Robustness verdict: HIGH for the candidate-gene collapse. LOW-MODERATE confidence in the broader claim that specific GxE interactions are generally small — this is an extrapolation from the candidate-gene failure, and the polygenic GxE literature is too young to draw strong conclusions.

Assumption 6: Cross-disorder genetic correlations reflect shared biology (pleiotropy)

What depends on it: The p-factor interpretation, HiTOP structure, the “dimensional turn” in psychiatry, transdiagnostic treatment rationales.

Status: Substantially challenged by Border et al. (2022). Cross-trait assortative mating can generate spurious genetic correlations between traits with entirely distinct genetic bases. The R² = 74% finding means most of the variance in genetic correlation estimates tracks spousal phenotypic correlations — though this does not prove all genetic correlations are spurious (some genuine pleiotropy surely exists).

What would flip it: Within-family designs showing that cross-disorder genetic correlations survive AM correction at >50% of current estimates. Or: identification of specific shared biological pathways (e.g., synaptic pruning variants affecting both SCZ and BD) that don’t depend on LD induced by AM.

Robustness verdict: MODERATE. The dimensional/transdiagnostic pattern is likely real but inflated. The magnitude of genuine pleiotropy vs. AM artifact is one of the field’s most active methodological debates.

11. Toward Topology: Structure for the Next Phase

This section identifies the natural graph/network structure embedded in this literature, to facilitate the transition from landscape analysis to formal topology mapping.

Natural node types

Trait nodes: Cognitive abilities (g, Gf, Gc, Gv, Gs…), personality dimensions (Big Five/HEXACO factors and facets), temperament dimensions (Surgency, Negative Affectivity, Effortful Control), psychopathology spectra (internalizing, externalizing, thought disorder), interests (people-things, RIASEC), political/moral attitudes
Mechanism nodes: Genetic architecture (common polygenic, rare large-effect, de novo), environmental factors (lead, iodine, schooling, deprivation, neighborhoods), developmental processes (critical periods, niche-picking, genetic nurture, AM), stochastic noise
Method nodes: Twin studies, adoption studies, GWAS, PGS, within-family designs, Mendelian randomization, meta-analysis
Population-level modifier nodes: SES (Scarr-Rowe), culture (WEIRD), gender equality index, historical period (Flynn Effect)

Natural edge types

Genetic correlations (with AM caveat): e.g., rg(SCZ, BD) ≈ 0.7; rg(EA, IQ) ≈ 0.7; rg(neuroticism, MDD) ≈ 0.7
Developmental continuity: temperament → personality (Surgency → Extraversion; Effortful Control → Conscientiousness)
Causal environmental effects: lead → IQ (−6.2 pts as blood lead rises from 1 to 10 µg/dL); schooling → IQ (+3.4 pts/year)
Predictive validity edges: g → job performance (r ≈ 0.42); Conscientiousness → mortality; EA PGS → income
Methodological dependency: twin h² → SNP-h² → PGS R² (each constraining the next)
Taxonomic hierarchy: g → broad abilities → narrow abilities (CHC); p → spectra → subfactors → syndromes (HiTOP)
Moderation edges: SES × heritability (Scarr-Rowe); gender equality × sex differences (GEP); age × heritability (Wilson Effect)

Key structural features for the graph

Two parallel hierarchies (CHC for cognition, HiTOP for psychopathology) that share genetic correlations at the top level (g correlates with p inversely)
A developmental cascade from temperament (infancy) through personality (adulthood) through outcomes (mortality, income, relationships), with heritability increasing and shared-environment decreasing across the lifespan
A methodological funnel from twin estimates (broadest, highest h²) through molecular estimates (narrower, lower h²) through within-family estimates (narrowest, lowest but most causally clean)
Cross-domain genetic correlations that form a web connecting cognition, personality, and psychopathology — but with the critical caveat that an unknown fraction may be AM artifact rather than biological pleiotropy

Highest-leverage next steps for topology phase

Build the trait correlation matrix: Assemble published genetic correlations (from LD Score regression / GWAS) among the ~20–30 most well-characterized traits spanning cognition, personality, and psychopathology. Annotate each with AM-corrected estimates where available. This matrix is the empirical backbone of the topology.
Map the developmental cascade: Create a directed graph from temperament → personality → outcomes with age-indexed heritability and stability coefficients as edge weights. This captures the time dimension that a static correlation matrix misses.
Formalize the variance decomposition: For each major trait, create a standardized decomposition: [direct genetic] + [genetic nurture/indirect] + [AM-induced] + [shared environment] + [measured non-shared environment] + [stochastic residual]. Where values are unknown, flag them explicitly. This is the generating function skeleton that the formalization phase will flesh out.
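One minimal way to hold this skeleton is a per-trait record in which each of the six components is either a variance share or an explicit None for unknown. The names in the sketch below are hypothetical. It deliberately leaves out the covariance and interaction terms of the fuller Stage 3 equation. It checks that known shares are non-negative and sum to at most 1, then reports how much variance remains unassigned and which components are still unknown. The example numbers are placeholders, not estimates for any trait.

```python
from typing import Optional

COMPONENTS = (
    "direct_genetic",
    "genetic_nurture",
    "am_induced",
    "shared_env",
    "measured_nonshared_env",
    "stochastic_residual",
)


def summarize_decomposition(shares: dict[str, Optional[float]]) -> dict:
    """Validate a standardized variance decomposition with explicitly flagged unknowns.
    Missing keys are treated the same as None (unknown)."""
    extra = set(shares) - set(COMPONENTS)
    if extra:
        raise ValueError(f"unrecognized components: {sorted(extra)}")
    known = {c: shares[c] for c in COMPONENTS if shares.get(c) is not None}
    if any(v < 0 for v in known.values()):
        raise ValueError("variance shares cannot be negative")
    total = sum(known.values())
    if total > 1 + 1e-9:
        raise ValueError(f"known shares sum to {total:.3f} > 1")
    return {
        "known_total": total,
        "unassigned": 1 - total,
        "unknown": [c for c in COMPONENTS if shares.get(c) is None],
    }


if __name__ == "__main__":
    placeholder = {  # hypothetical values for illustration only
        "direct_genetic": 0.15,
        "genetic_nurture": 0.10,
        "am_induced": None,
        "shared_env": 0.05,
        "measured_nonshared_env": None,
        "stochastic_residual": None,
    }
    print(summarize_decomposition(placeholder))
```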


pass 4

Dependency graph of the lit review. Three categories of high-stakes node (foundational cruxes / reframer nodes / logical guardrails) plus weakest links, four variant views, three Stage-3 options, an objections section, and a glossary. Updated through 2024-2025 literature on AM correction, within-family GWAS at scale, PGS portability, Scarr-Rowe collapse, GEP replication, and missing-heritability closure.

TLDR

The lit review documents what the science says about psychological variation. This topology asks a sharper question: what depends on what? Strip the field down to its load-bearing structure and the picture is surprisingly clean. Three foundational assumptions — that twin/adoption methods give approximately valid variance decomposition (A1), that GWAS signal reflects real genetic effects rather than population structure or assortative-mating artifact (A2), and that a general factor of cognitive ability g exists as a real dimension of individual variation (A3) — sit upstream of most of the empirical and synthesis nodes in the graph; if any one of them flipped, large regions would have to be rebuilt. Everything else is either an empirical claim resting on these foundations, a methodological prerequisite that lets the foundations be tested, a logical necessity that constrains how the empirical claims can be interpreted, or a generating mechanism that explains why the empirical pattern looks the way it does.

The high-stakes nodes split into three categories — keeping them separate is the single most useful conceptual move in this topology. Foundational cruxes (A1 twin validity, A2 GWAS signal real, A3 g exists) are the assumptions that, if falsified, force rebuilding regions of the picture. Reframer nodes (G2/E6 passive rGE / genetic nurture; G6/E7 cross-trait assortative mating) don’t break the picture if reversed — they change what it means; their magnitudes are being actively quantified and their precise share of population-level “genetic” effects is the field’s most consequential open quantity. Logical guardrails (L1 variance-ratio definition; L4 Lewontin firewall) cannot be falsified — they can only be ignored, which is exactly how most public-discourse misuse of the field proceeds. Conflating these three types under a single label of “important findings” is a major source of bad-faith debate.

The field’s weakest links are not where public discourse focuses its heat. Mainstream contests over “is heritability real” target settled findings (A1 and E1 are robust). The 2025 whole-genome-sequencing work (Wainschtein et al. 2025, Nature), which captures ~88% of pedigree-based heritability, also substantially answers the “missing heritability” critique. The actual fragile zones in 2026 are:

(a) The generalization from candidate-GxE failure to the claim that all GxE is small. The null is partly holding up (Allegrini 2020 for education; a 2025 systematic review for depression), but the literature is still too young for confidence.

(b) Scarr-Rowe, which has weakened further. Ghirardi et al. 2024 found 39 of 42 polygenic-index (PGI) × SES interactions in the opposite (compensatory) direction, so “deprivation suppresses heritability” is now evidence-thin.

(c) The polygenic-score → mechanism inference (Plomin’s “causal” view vs. Turkheimer’s “weak explanation” view), which remains genuinely undecided.

(d) The magnitude of AM correction across psychiatric cross-disorder rg estimates, which is now being actively addressed. Ma, Wang, Border et al. 2024 (LAVA-Knock) is the first method to systematically reduce xAM-induced bias, and a field-wide answer is likely within 2–3 years.

The Flynn-reversal cause and the Gender Equality Paradox mechanism remain open, but in a different way: the empirical patterns themselves are robust, and only the explanation is contested. The 2025 GEP systematic review (Herlitz et al.) actually strengthened the evidence for the pattern across personality, verbal abilities, episodic memory, and negative emotions.

This topology is the input to model formalization (Stage 3). The cleanest formalization target is the variance decomposition equation: V(trait) = V(direct genetic) + V(genetic nurture / indirect) + V(AM-induced LD) + V(shared environment) + V(measured non-shared) + V(stochastic) + 2·Cov(genes, environment) + interaction terms — with each term parameterized by trait, age, and population context, and with the AM and rGE terms being where current methodological revision is concentrated. The four variant views below (Vulnerability / Flow / Minimal / Politicization) read the same graph through different lenses to make the formalization choices easier.

The graph

[Interactive graph: all ~50 nodes and their dependencies. Legend node types: Assumption, Method, Empirical, Logical, Mechanism, Synthesis, Open, Distortion; crux nodes carry a halo; node size reflects load-bearing weight (1–5). Variant toggles read the same graph through different lenses (vulnerability, flow, minimal claim set, politicization).]

How to read this graph

Every node in the lit review collapses to one of eight types. Edges between them carry one of seven relations. Together they make the structure inspectable.

Node types

Code	Type	What it is
A	Foundational assumption	A claim the field cannot operate without; if false, large downstream regions collapse
M	Methodological prerequisite	A study design or estimation tool that must work for the empirical claims to be testable
E	Empirical claim	A specific measured finding with an effect size and replication status
L	Logical necessity	Follows from definitions or algebra; not empirically refutable
G	Generating mechanism	A causal process that explains a pattern (rGE, AM, niche-picking, critical periods)
S	Synthesis claim	An integrative statement combining multiple lower-level claims
O	Open question	Genuinely undecided with current methods or evidence
D	Distortion vector	Where motivated reasoning concentrates (typed by direction)

Edge types

Code	Edge	Meaning
dep	depends-on	If target collapses, source collapses
imp	implies	Logical implication
sup	empirically-supports	Evidence relation
conf	confounds / inflates	Artifact relationship (e.g., AM inflates rg)
mod	moderates	Changes magnitude (e.g., SES × heritability)
dev	develops-into	Temporal/developmental successor (temperament → personality)
corr	corrects	Within-family corrects between-family bias
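A dep edge means “if the target collapses, the source collapses.” The blast radius of a falsified node is therefore everything that reaches it through a chain of dep edges, and a breadth-first search over reversed dep edges computes that set. The edge list below is a small hypothetical subset loosely assembled from the “What depends on it” notes in Section 10, not the topology's actual edge list. The conf edge is included to show that other relation types do not propagate collapse.

```python
from collections import defaultdict, deque

# Hypothetical subset of (source, relation, target) edges, for illustration only.
EDGES = [
    ("E1", "dep", "A1"), ("E2", "dep", "A1"), ("E3", "dep", "A1"), ("E25", "dep", "A1"),
    ("M5", "dep", "A2"), ("E6", "dep", "A2"), ("E16", "dep", "A2"),
    ("E17", "dep", "E16"),
    ("E7", "conf", "E16"),  # AM inflates rg: an artifact relation, not a dependency
]


def collapse_set(edges: list[tuple[str, str, str]], failed: str) -> set[str]:
    """All nodes that transitively depend (via dep edges) on the failed node."""
    dependents = defaultdict(list)  # target -> sources that depend on it
    for source, relation, target in edges:
        if relation == "dep":
            dependents[target].append(source)
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for src in dependents[node]:
            if src not in seen:
                seen.add(src)
                queue.append(src)
    seen.discard(failed)
    return seen


if __name__ == "__main__":
    print(sorted(collapse_set(EDGES, "A2")))
```

Weighting each collapsed node by its load-bearing weight (1–5) would be one way to rank assumptions by blast radius.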

Weight scale (load-bearing weight, 1–5)

5 — crux node; collapse propagates across multiple sections of the lit review
4 — load-bearing within a section
3 — important but local
2 — corroborating
1 — decorative; could be removed without changing the picture

1. Node catalog

Each node carries: type code · weight · short claim · key citation · status. Status flags: ✓ (robust/replicated), ~ (partial/qualified), ? (contested), ✗ (refuted, kept as historical reference).

A — Foundational assumptions

ID	Wt	Claim	Status
A1	5	Twin/adoption methods provide approximately valid variance decomposition (EEA modestly violated but not fatally)	✓
A2	5	GWAS signal reflects real genetic effects, not (only) population stratification or AM artifact	✓ partial
A3	5	A general factor of cognitive ability `g` is a real dimension of individual variation (positive manifold)	✓ statistical / ? mechanism
A4	3	Heritability findings apply to the population sampled, not to individuals or other populations (scope)	✓
A5	3	Phenotypes are reliably and validly measurable across cultures and time	~
A6	3	Most psychological variation is dimensional, not taxonic	✓

M — Methodological prerequisites

ID	Wt	Tool	Notes
M1	4	Twin studies (MZ/DZ) at scale	Polderman 2015 meta: 14.5M pairs
M2	4	Adoption studies, especially cross-cultural (Korean-American)	Sacerdote 2007; Beauchamp 2023
M3	5	GWAS at N ≥ 100k (ideally ≥ 1M for personality/EA)	Okbay 2022 (3M for EA)
M4	4	Within-family designs (sibling FE, MZ-discordant, parent-offspring trios)	Kong 2018; Okbay 2022 (EA, N=3M); Howe et al. 2022 (Nature Genetics) extended this to 178k siblings × 25 phenotypes, with within-sibship estimates systematically smaller than population estimates for height, EA, cognitive ability, depressive symptoms, and smoking. The within-family approach is now mature beyond educational attainment. ✓
M5	4	Polygenic scores (PGS)	Best R² ~0.16 for EA, ~0.10 for SCZ
M6	3	Cross-trait LD-score regression for genetic correlations	Brainstorm 2018
M7	3	Mendelian randomization	For causal inference from observational data
M8	3	Pre-registration & collaborative meta-analysis	Demolished candidate-GxE
M9	4	Whole-genome sequencing (rare-variant capture)	2025 follow-up (Wainschtein et al. 2025, Nature; UK Biobank, ~500k) captures ~88% of pedigree-based narrow-sense heritability across many traits (20% rare + 68% common variants). The “missing heritability” problem is now substantially resolved for many phenotypes. ✓

E — Empirical claims

Cognition / IQ:

ID	Wt	Claim	Status
E1	5	Mean trait heritability ≈ 0.49 across 17,804 traits (Polderman 2015)	✓
E2	5	Shared environment C ≈ 0 for adult personality and most adult cognition	✓ with exceptions (EA, religiosity, politics)
E3	4	Wilson Effect: IQ heritability rises from ~0.20 (age 5) to ~0.80 (adulthood)	✓
E4	5	Hyper-polygenic architecture: thousands of small-effect variants	✓ (Turkheimer’s 4th Law, Chabris 2015)
E5	4	Candidate-gene approach for psychiatric/personality traits failed (5-HTTLPR etc.)	✗ original claims; ✓ collapse finding
E6	5	Within-family PGS effects are ~½ population-level effects (genetic nurture)	✓ for EA, BMI, height
E7	5	Cross-trait assortative mating explains R²≈74% of variance in genetic correlation estimates	✓ (Border 2022, Science)
E8	4	Lead exposure 1–10 µg/dL → −6.2 IQ pts (Lanphear 2005)	✓
E9	4	Each year of schooling adds ~3.4 IQ points, persisting into old age	✓ (Ritchie & Tucker-Drob 2018)
E22	4	Within-population heritability does not license between-population inference	✓ logical
E23	4	PGS prediction accuracy decays continuously along the genetic-distance continuum from training population (Pearson r = −0.95 between genetic distance and PGS accuracy across 84 traits, Ding et al. 2023, Nature). Reframes the older “discrete ancestry-group drop” picture (Martin 2019; Mostafavi 2020)	✓
E24	3	Flynn Effect and its post-1990s reversal are both environmentally driven (within-sibship evidence)	✓ pattern; ? cause
E25	2	Scarr-Rowe: SES × heritability hypothesis (more genetic expression in higher-SES). Weakening further in 2024 — Ghirardi et al. 2024 found 39/42 PGI×SES interactions in education NEGATIVE (compensatory direction); only 1 significant positive. Pattern is now closer to “compensatory hypothesis holds, Scarr-Rowe fails” than to “context-dependent”	✗
E18	4	Positive manifold: every cognitive test correlates positively with every other	✓
E26	3	Childhood IQ → all-cause mortality: each 1-SD ≈ 24% lower mortality	✓ (Calvin 2011)
E27	3	Lifespan IQ stability: Lothian Birth Cohort age-11 → age-90 r ≈ 0.67	✓
E28	3	Severe iodine/alcohol/deprivation cause large asymmetric IQ effects	✓
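The E8 figure describes a curve, not a constant slope. Lanphear et al. (2005) fit a log-linear dose-response model. Assuming that form, a loss of 6.2 points over a tenfold rise (1 to 10 µg/dL) implies about 6.2 / ln 10 ≈ 2.7 IQ points per natural-log unit of blood lead. The sketch below uses only that implied slope, not the paper's fitted coefficients, to show why equal absolute increments cost fewer points at higher exposure.

```python
import math

# Slope implied by the E8 figure if the dose-response is log-linear:
# a 6.2-point loss across a tenfold increase (1 -> 10 µg/dL).
IMPLIED_SLOPE_PER_LN = 6.2 / math.log(10)  # ~2.69 IQ points per natural-log unit


def iq_loss(from_ugdl: float, to_ugdl: float, slope: float = IMPLIED_SLOPE_PER_LN) -> float:
    """Predicted IQ-point loss when blood lead rises from from_ugdl to to_ugdl (µg/dL)."""
    if from_ugdl <= 0 or to_ugdl <= 0:
        raise ValueError("blood lead concentrations must be positive")
    return slope * math.log(to_ugdl / from_ugdl)


if __name__ == "__main__":
    for low, high in ((1, 10), (10, 20), (20, 30)):
        print(f"{low:>2} -> {high:>2} µg/dL: ~{iq_loss(low, high):.1f} IQ points")
```

Under this assumption, the same curve predicts roughly 1.9 points for 10 → 20 µg/dL and 1.1 for 20 → 30 µg/dL. This steep low-dose end is what the lit review's “asymmetric effects” point refers to.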

Personality / temperament:

ID	Wt	Claim	Status
E29	4	Big Five h² ≈ 0.40–0.60; cross-cultural replication for E/A/C	✓ partial (Tsimane qualifier)
E30	4	Cumulative continuity: rank-order stability rises to ~0.74 by midlife	✓
E31	4	Maturity principle: mean-level ↑ in C, A, ES with age	✓
E32	3	Temperament dimensions (Surgency, Negative Affectivity, Effortful Control) → adult personality	✓
E33	3	Personality predicts mortality/divorce/income at magnitudes ≈ SES & cognition	✓ (Roberts 2007)

Sex differences:

ID	Wt	Claim	Status
E10	4	Multivariate personality sex difference (15 16PF factors; Del Giudice 2012): D ≈ 2.71 (~10% overlap)	~ method-sensitive
E11	4	People-things interest difference d ≈ 0.93 (largest in psychology)	✓
E12	3	Mental rotation d ≈ 0.56–0.73 male advantage	✓
E13	3	Math performance d ≈ 0.05–0.10 (essentially equal)	✓
E14	4	Gender Equality Paradox: differences larger in egalitarian/wealthier countries. Herlitz et al. 2025 systematic review (54 articles, 27 meta-analyses) confirmed the pattern across personality, verbal abilities, episodic memory, and negative emotions — pattern replication has strengthened, not weakened	✓ pattern; ? mechanism
E34	3	Physical aggression d ≈ 0.40–0.60 male; ~95% of homicides male	✓
E35	2	CAH girls show masculinized toy preferences; primate parallels	✓

Psychopathology:

ID	Wt	Claim	Status
E15	4	All major psychiatric disorders highly heritable (h² 0.35–0.85) and hyper-polygenic	✓
E16	4	Cross-disorder genetic correlations exist	✓ existence; ? magnitude post-AM
E17	3	A `p` factor (general psychopathology) fits cross-syndrome data	✓ statistical; ? interpretation
E36	3	Autism: common-PGS positively correlated with IQ; rare/de-novo drives ID-comorbid cases	✓
E37	2	Critical-period plasticity (GABAergic, perineuronal nets) is mechanistically real	✓

L — Logical necessities

ID	Wt	Claim
L1	5	Heritability is a population variance ratio; it does not partition individual phenotypes (mathematical form — A4 is the scope-of-claim sibling)
L2	4	h² changes with environmental variance: hold genetic variance fixed and equalize environments (V(E) → 0) → h² → 1
L3	5	Within-family designs control for between-family confounds (rGE, stratification, AM)
L4	5	Within-population heritability provides no information about between-population mean differences (Lewontin). E22 in the empirical column is the applied form of this same point
L5	3	Multivariate D ≥ max(univariate |d|) under any covariance structure; correlations raise or lower D depending on whether the differences run against or with them
L6	3	Positive manifold permits both unitary-cause and emergent-network interpretations of g
L7	3	Effect-size interpretation is scale-dependent (d=0.10 trivial in trait psychology, large in clinical)
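L2 can be checked directly from the definition of heritability as a variance ratio. The sketch below is a minimal illustration that assumes no gene-environment covariance or interaction (so V(P) = V(G) + V(E)). It holds genetic variance fixed at 1.0 and shrinks environmental variance; h² climbs toward 1 even though nothing about the genome changes.

```python
def heritability(var_g: float, var_e: float) -> float:
    """Heritability as a population variance ratio, h2 = V(G) / (V(G) + V(E)).

    Assumes no gene-environment covariance or interaction, so V(P) = V(G) + V(E).
    """
    total = var_g + var_e
    if total <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / total


# Genetic variance is held fixed at 1.0; only environmental variance shrinks.
for var_e in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(f"V(E) = {var_e:<5} -> h2 = {heritability(1.0, var_e):.3f}")
```

The printed values run 0.500, 0.667, 0.909, 0.990, 1.000: the same genetic variance yields very different h² depending on how much the environment varies.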

G — Generating mechanisms

ID	Wt	Mechanism	Drives
G1	4	Active rGE / niche-picking	E3 (Wilson Effect amplification)
G2	5	Passive rGE. Wang 2021 / Isungset 2022 confirm indirect ≈ ½ direct genetic effect for EA. Nivard et al. 2024 found indirect genetic effects on offspring achievement extend beyond the nuclear family — dynastic / extended-family / community processes contribute, so the “parents transmit gene + correlated environment” framing understates the spread	E6 (genetic nurture); inflation of population-level h²
G3	3	Evocative rGE	Heritability of “environments” (Kendler & Baker 2007)
G4	3	Critical-period plasticity (GABAergic maturation)	Asymmetric environmental effects on early development
G5	4	Assortative mating → LD induction	Inflates additive genetic variance V(A) at the population level (Yengo 2018: 14–23% for height); inflates SNP h² and PGS effect sizes. Counterintuitively, it biases Falconer twin h² downward (it raises rDZ relative to rMZ); the twin-vs-within-family gap for socially-structured traits is dominated by genetic nurture / EEA, with AM partially offsetting.
G6	5	Cross-trait AM → spurious genetic correlations	Confounds E16, E17 (p-factor)
G7	4	Stochastic developmental noise	Dominant source of non-shared environment
G8	3	Selection / niche construction across the lifespan	Bridges temperament → personality → outcome cascade
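The G5 point that assortative mating biases Falconer twin h² downward can be seen with Falconer's formulas, h² = 2(rMZ − rDZ) and c² = 2rDZ − rMZ. This minimal sketch assumes a purely additive trait with no shared environment and treats the DZ genetic correlation as a free input: 0.5 under random mating, while the higher values (0.55, 0.60) are illustrative stand-ins for AM, not estimates.

```python
def falconer(r_mz: float, r_dz: float) -> dict[str, float]:
    """Falconer's classical twin estimates from MZ and DZ twin correlations."""
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic share
        "c2": 2 * r_dz - r_mz,    # shared-environment share
        "e2": 1 - r_mz,           # non-shared environment plus error
    }


def twin_correlations(a2: float, dz_genetic_r: float) -> tuple[float, float]:
    """Expected twin correlations for a purely additive trait with no shared environment.

    MZ twins share all additive genetic variance; DZ twins share `dz_genetic_r` of it
    (0.5 under random mating, higher under assortative mating).
    """
    return a2, dz_genetic_r * a2


TRUE_A2 = 0.6  # assumed true additive share
for dz_r in (0.50, 0.55, 0.60):
    est = falconer(*twin_correlations(TRUE_A2, dz_r))
    print(f"DZ genetic r = {dz_r:.2f}: h2 = {est['h2']:.2f}, c2 = {est['c2']:.2f}")
```

With a true additive share of 0.6, raising the DZ genetic correlation pulls the Falconer h² down to 0.54 and then 0.48, while producing a spurious c² of 0.06 and then 0.12.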

S — Synthesis claims

ID	Wt	Claim
S1	5	“Genes vs. environment” is the wrong frame; the system is tightly coupled (genome × rGE × AM × a few large environmental insults × stochastic noise × culture × developmental unfolding)
S2	5	Twin h² ≥ SNP h² ≥ within-family h² gradient quantifies AM/rGE/measurement inflation across estimation methods
S3	4	Heritability ≠ destiny; high h² is compatible with large environmental shifts (height: h² ≈ 0.80, +10cm in a century)
S4	4	Most “non-shared environment” is stochastic, not systematic — it accounts for ~50% of personality variance and is poorly characterized
S5	4	Two parallel hierarchies (CHC for cognition, HiTOP for psychopathology) connected at the top by inverse g↔p genetic correlation
S6	4	Developmental cascade: temperament (infant biological reactivity) → personality (adult social-cognitive layer added) → outcomes (mortality, attainment, relationships) with h² ↑ and shared-env ↓ across the lifespan
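S3 follows because heritability is computed from variances, and a shift that moves everyone's environment by the same amount changes the mean but not the variance. The simulation below is a sketch under assumed, illustrative numbers (height SD of 7 cm, h² = 0.80, uncorrelated genes and environment, a uniform +10 cm environmental improvement); it is not a model of the actual secular trend.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

# Illustrative assumptions: phenotypic variance 49 cm^2 (SD 7 cm), h2 = 0.80.
g = rng.normal(0.0, np.sqrt(0.80 * 49), n)
e_then = rng.normal(0.0, np.sqrt(0.20 * 49), n)
e_now = e_then + 10.0  # same environmental ranking, everyone improved by 10 cm


def h2(genetic: np.ndarray, phenotype: np.ndarray) -> float:
    """Share of phenotypic variance that is genetic (genes and environment uncorrelated here)."""
    return float(np.var(genetic) / np.var(phenotype))


p_then = 160.0 + g + e_then
p_now = 160.0 + g + e_now
print(f"then: mean {p_then.mean():.1f} cm, h2 {h2(g, p_then):.2f}")
print(f"now:  mean {p_now.mean():.1f} cm, h2 {h2(g, p_now):.2f}")
```

The mean rises by exactly 10 cm while h² stays at about 0.80 in both generations.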

O — Open questions

ID	Wt	Question	Why it matters
O1	5	Mechanistic interpretation of PGS: “causal genetic” (Plomin) vs. “weak explanation” (Turkheimer)	Determines what PGS prediction means
O2	3	Cause of Flynn-effect reversal post-1990s	Empirical pattern robust (Bratsberg & Rogeberg 2018 within-sibship Norway). Mechanism still unsettled. Pietschnig et al. 2024 (Vienna 2005–2018 cohort) added a wrinkle: the positive manifold itself may be weakening — gains in some abilities aren’t tracking gains in others, suggesting the g-loading of the rise/fall is not constant. Hypothesized mechanisms (screens, reduced long-form reading, attention) circulate without empirical pinning
O3	4	Causal mechanism behind Gender Equality Paradox	Innate-expression release vs. measurement artifact vs. confound — selection of explanation has political stakes
O4	5	Between-population mean differences: any genetic component?	Currently scientifically unanswerable with available methods (PGS portability too poor, cross-ancestry GWAS at scale don’t exist). Honest position: unresolved, not settled in either direction
O5	3	g architecture: latent common cause vs. emergent network (mutualism, van der Maas 2006)	Affects how interventions could in principle move g
O6	4	What “non-shared environment” actually is: stochastic noise, immune/microbial, peer networks, epigenetic, measurement error	Largest unmodeled variance component in personality
O7	5	Magnitude of AM-correction across the cross-disorder genetic correlation matrix	Active revision. Ma, Wang, Border et al. 2024 AJHG introduced LAVA-Knock — a local-genetic-correlation method that reduces xAM-induced bias. Methods to give the answer are now emerging, not just to flag the problem

D — Distortion vectors (where motivated reasoning concentrates)

ID	Direction	Targets	Failure mode
D1	Blank-slate / environmentalist	A1, E1, E10–E14	Dismiss twin studies wholesale; oversell transgenerational epigenetics; overstate stereotype threat; minimize sex differences via univariate-only framing
D2	Hereditarian	L4, E22, E23, O4	Ignore Lewontin; treat g-loadedness of gaps as evidence of genetic etiology; cite fringe admixture studies; ignore AM/rGE corrections to PGS
D3	“Gender similarities” minimization	E10, E11, E14	Selective citation (math d=0.05) to imply no differences anywhere; obscure D=2.71 multivariate; minimize d=0.93 interest gap
D4	Pop evpsych overgeneralization	E10–E14, A6	Treat dimensional ds as taxonic; extrapolate small ds to categorical claims; overgeneralize from specific tasks to broad-domain claims

2. Dependency cascade

The cascade reads from foundations up to synthesis, and from corrections back down to corrected claims.

Forward cascade (foundations → empirical claims → synthesis)

A1 ──dep──> M1, M2 ──sup──> E1, E2, E3, E25, E29
A2 ──dep──> M3, M5 ──sup──> E4, E6, E7, E15–E17, E22–E23
A3 ──dep──> E18 ──sup──> E26, E27 ──imp──> S5
A4 (scope) + L1 (form) ──guards──> interpretation of E1, E2, E3 and S3
A6 ──imp──> S5 (dimensional turn in psychiatry)
M9 ──corr──> E1 (closes missing-heritability gap)
M3 + M4 ──sup──> E6 (genetic nurture), E7 (xAM)
E5 (candidate-gene collapse) ──imp──> E4 (polygenic architecture confirmed by absence of large hits)

E1 + E2 + E3 + G1 ──imp──> S6 (developmental cascade)
E1 + E4 + G2 + G5 ──imp──> S2 (h² gradient by method)
E10 + E11 + E12 + E13 + L5 ──imp──> "small univariate, large multivariate" sex-difference picture
E14 + O3 ──imp──> mechanism-pending GEP
E15 + E16 + E17 ──imp──> S5 (HiTOP/p)
E22 + E23 + L4 ──imp──> O4 (between-pop unanswerable currently)
E1 + E2 + E4 + E6 + E7 + E8 + E9 + G1–G7 ──imp──> S1, S2 (integrated picture)

Backward / corrective cascade (newer evidence revises older claims)

G2 (passive rGE) ──corr──> E1 estimates (population-level overstates direct genetic)
G6 (cross-trait AM) ──corr──> E16 (some psychiatric rg's may be xAM artifact)
M4 (within-family) ──corr──> E6 magnitude (~½ of population PGS)
M8 (preregistration) ──corr──> E5 (collapsed candidate-GxE)
M9 (WGS) ──corr──> "missing heritability" interpretation

Distortion → target edges

D1 ──attacks──> A1, E1, E10, E11, E12, E14
D2 ──attacks──> L4, E23 (ignores); exploits A2 without the G2/G6 corrections
D3 ──attacks──> E10, E11 (selective univariate framing)
D4 ──attacks──> A6, L7

3. Where pressure concentrates

A common failure mode in this literature is to treat all high-stakes nodes as the same type of thing. They are not. The graph has three distinct categories of high-stakes node and one category of fragile claim — keeping these separate sharpens what the field actually needs to resolve.

3a. Foundational cruxes — falsification breaks regions of the picture

These are the empirical-or-methodological assumptions that, if wrong, force rebuilding large parts of the lit review.

A1 — Twin/adoption method validity. Carries Section 1 of the lit review; heritability-by-domain table; Wilson Effect. Robustness: HIGH (MZ-reared-apart, SNP-h² bypassing EEA, misperceived-zygosity all converge). Would flip if SNP-h² for psychological traits systematically converged on <0.05 — has not occurred.

A2 — GWAS signal is real (not artifact). Carries the PGS enterprise; genetic nurture estimates; cross-disorder pleiotropy; modern psychiatric genetics. Robustness: MODERATE-HIGH (within-family PGS effects are non-zero for EA, BMI, height — direct signal exists; AM/stratification inflation magnitudes still being quantified). Would flip if within-family PGS effects converged on zero across most traits.

A3 — g is a real dimension of cognitive variation. Carries Section 5 of lit review; predictive-validity claims; CHC structure; mortality/income predictions. Robustness: HIGH for g as a statistical regularity; MODERATE for g as unitary biological mechanism. Would flip if a broad cognitive battery had first-PC <15% or if interventions reliably moved one ability while lowering others. 2024 wrinkle: Pietschnig et al. 2024 reported the positive manifold may be weakening across recent cohorts — softly pressures A3 in a new way without refuting it.

3b. Reframer nodes — the answer is open and reshapes interpretation

These don’t break the picture if reversed; they change what the picture means. Their magnitudes are being actively quantified in 2024–2026 work. Conflating reframers with foundational cruxes is the most common conceptual error in pop-science treatments of this field.

G2 / E6 — Passive rGE / genetic nurture. Reframes the meaning of every population-level genetic estimate. Without G2, “genetic transmission” reads as direct biological causation; with G2, ~half is environmentally mediated by genetically-similar parents (Wang 2021 / Isungset 2022). Nivard et al. 2024 (Nat Hum Behav) showed indirect genetic effects extend beyond the nuclear family to dynastic / extended-family processes. The existence is robust; precise magnitude across all traits is still being quantified.

G6 / E7 — Cross-trait assortative mating. Reframes the cross-disorder genetic-correlation matrix and the p-factor’s interpretation. Border 2022 (Science) showed phenotypic cross-mate correlations explain R²=74% of variance in genetic-correlation estimates. Ma, Wang, Border et al. 2024 (LAVA-Knock) is the first method to systematically reduce xAM-induced bias. The share of any specific rg that is artifact vs. genuine pleiotropy is still pending.

3c. Logical guardrails — unfalsifiable but load-bearing for interpretation

These cannot be falsified — they are algebraic / definitional truths. They can be ignored, which is how most public-discourse misuse of the field happens.

L1 — Heritability is a population variance ratio, not an individual partition. Cannot be falsified. Public misreading of “70% heritable IQ” as “70% of any individual’s IQ comes from genes” is a failure to apply L1, not a failure of the science.

L4 — Within-population heritability does not license between-population mean inference (Lewontin firewall). Cannot be falsified — it is a logical/algebraic point. Can only be ignored. The empirical buttress today is E23 (PGS portability collapse along genetic-distance continuum, Ding 2023): even if you wanted to use within-pop methods to speak to between-pop differences, the methods don’t currently work.

3d. Decorative material (safe to compress)

Removable from the topology without changing the qualitative picture:

E35 (CAH / primate toy preferences) — convergent evidence, not necessary
E37 (specific GABAergic critical-period mechanisms) — biologically real, not load-bearing for the variation argument
HEXACO Honesty-Humility specifics — incremental over Big Five
Specific Dark-Triad subdimensions — D-factor synthesis (Moshagen 2018) carries more weight
P-FIT brain network specifics — corroborate g but don’t establish it
Yehuda Holocaust FKBP5 transgenerational findings — refuted/non-replicated; kept only as historical anchor for D1 distortion
Specific candidate-gene findings (5-HTTLPR depression) — refuted; kept as historical anchor for the field’s methodological turn (M8)

4. Weakest links

These are the load-bearing pieces with the lowest current confidence. Targeted attack on any one would do the most damage to the integrated picture.

W1: Generalization from candidate-GxE failure to “all GxE is small” (E5 → broader claim)

Why fragile: The candidate-gene collapse is definitive. The extrapolation that polygenic-score × environment interactions are also small is an inductive leap, not a result. As of 2025, the picture is partially holding the null but not strengthening it. A 2025 systematic review of 56 PGS×E studies for depression found mostly null or small effects. A multivariable PGS×E study of educational achievement (Allegrini et al. 2020) found “no evidence that GxE effects significantly contributed to multivariable prediction.” UK Biobank work (2024) on distinct explanations of GxE shows that many apparent GxE signals are confounded by scale, ascertainment, or population structure. The candidate-gene-failure extrapolation is looking less like an inductive leap and more like a substantive empirical pattern — but the literature is still too young for a strong null.

Pressure test: Several large preregistered PGS×E studies finding interactions explaining >5% variance would substantially revise this corner of the picture.

W2: Scarr-Rowe (E25) — has substantially weakened since pass-0

Why fragile: The original meta-analytic picture was “replicates in US, fails in W. Europe / Australia” (Tucker-Drob & Bates 2016). Ghirardi et al. 2024 (Netherlands Twin Register, polygenic-index design across 42 PGI×SES interaction tests for educational outcomes) found 39/42 negative, 0 significant positive, 1 marginally significant positive: the opposite sign from Scarr-Rowe in most cases. The picture in 2026 is closer to “the compensatory hypothesis (larger genetic effects in low-SES families, because higher-SES families compensate for low genetic propensity) is the better-supported pattern, at least for educational outcomes.” E25’s weight has been downgraded from 3 → 2 to reflect this. The narrative “deprivation suppresses heritability”, popular in policy discourse, is now evidence-thin.

W3: Plomin-vs-Turkheimer interpretation of PGS (O1)

Why fragile: Both views are compatible with current data. The choice determines what a PGS means: direct biology, or a summary statistic of correlated environments. The field publishes ambiguously across both interpretations. The question will likely be settled only by PGS estimated entirely within families that are nonetheless well-powered.

W4: Magnitude of AM-correction across psychiatric cross-disorder rg matrix (O7) — methods now emerging

Why still fragile but improving: Border 2022 showed that cross-mate correlations explain ~74% of the variance in genetic-correlation estimates (R² ≈ 0.74) but did not prove all rg’s are spurious; some genuine pleiotropy surely exists. As of 2024, the field is moving from flagging the problem to building correction methods. Ma, Wang, Border et al. 2024 (American Journal of Human Genetics) introduced LAVA-Knock, a local-genetic-correlation method using knockoff inference to reduce xAM-induced bias; tested across 630 trait pairs in simulation and real GWAS, it substantially reduces but does not eliminate the bias. A 2024 study found AM genetic signatures across SCZ, BD, MDD, alcohol phenotypes, and Tourette syndrome, indicating that AM is not confined to one or two disorders. What’s still pending: how much of the cross-disorder rg matrix and the p-factor genetic signal survives systematic application of AM-correction methods at scale. Likely answer in 2–3 years.

W5: Gender Equality Paradox mechanism (O3, E14) — pattern strengthened, mechanism still contested

Empirical pattern: more robust as of 2025. Herlitz et al. 2025 systematic review (54 articles, 27 meta-analyses, Perspectives on Psychological Science) found the paradox replicates across personality, verbal abilities, episodic memory, and negative emotions. Balducci et al. 2024 extended it to within-individual academic strengths cross-temporally. The “this won’t replicate” objection has weakened.

Mechanism: still contested. Three live candidates: (a) innate-expression release in resource-rich environments, (b) reference-group / self-anchoring artifacts in self-report (people compare to their gender peers, not to humans-in-general), (c) wealth/freedom confounds with gender equality. Behavioral / incentivized-measure replications (Falk & Hermle 2018 for economic preferences) cover only part of the domain. The decisive test — non-self-report behavioral replication across personality and interests — is still incomplete. Each candidate mechanism implies different normative conclusions, which is part of why this remains contested rather than resolved.

W6: Flynn-reversal cause (O2) — and a new wrinkle on the positive manifold

Why fragile: The pattern is environmentally driven (within-sibship, Bratsberg & Rogeberg 2018), so “dysgenic” explanations are out. But no mechanism (screen time, education quality, attention, nutrition, lead, microplastics) has been pinned down with within-cohort empirical work. Pietschnig et al. 2024 (Vienna 2005–2018 cohort) added a structural twist: the positive manifold itself may be weakening across cohorts — meaning the recent rise/fall is not uniformly g-loaded. If confirmed broadly, this softly pressures A3 (g exists as a stable dimension) — not refuting it, but suggesting its strength may be cohort-dependent. Still not load-bearing for the integrated picture, but interacts with A3 in a new way.

W7: A6 (dimensional vs. taxonic) at psychiatric extremes

Why fragile: Most psychopathology is dimensional (taxometric evidence is robust), but for severe early-onset autism with intellectual disability, rare large-effect variants (CHD8, SCN2A, SYNGAP1) drive a partly taxonic picture. The “all dimensional” framing oversells continuity at the severe tail.

5. Variant views

The same graph, read four ways.

Variant A: Vulnerability map — where does this break?

The vulnerability map is the union of the three foundational cruxes (§3a), two reframer nodes (§3b), two logical guardrails (§3c), and seven weakest links (§4). Together they describe the smallest set of pressure points whose movement would force restructuring of the integrated picture:

Falsify A1: SNP-h² systematically <0.05 → twin-method discredited → Section 1 collapses
Falsify A2: within-family PGS → 0 → modern psychiatric genetics collapses
Falsify A3: positive manifold dissolves → Section 5 collapses
Falsify G2: within-family PGS = population PGS → genetic nurture is null → Plomin direct-causal view wins (O1 resolves)
Falsify G6 fully: AM correction barely changes rg matrix → cross-disorder pleiotropy is real
Violate L4: cannot be falsified, only ignored — but its violation in public discourse is the largest single source of public confusion

If exactly one of these were to flip, the rebuild would be: A1→ rebuild Section 1 only; A2→ rebuild Sections 1, 3, 7 (~40% of lit review); A3→ rebuild Section 5 (~25%); G2/G6→ keep numbers, rewrite causal interpretation throughout.

Variant B: Flow map — how does causation propagate?

Causation in this system runs in two directions, both important.

Forward developmental flow (genome → outcomes):

Genome (polygenic + few rare large-effect)
    │
    ├──> Temperament (infant biological reactivity: Surgency / NA / EC)
    │       │
    │       ├──> Active rGE / niche-picking ─────────┐
    │       │                                        │
    │       └──> Evocative rGE (eliciting responses) ┤
    │                                                │
    └──> Direct expression in brain development ─────┤
                                                     ▼
                                                Personality (adult)
                                                     │
                                                     ├──> Attainment
                                                     ├──> Relationships
                                                     ├──> Health behaviors
                                                     └──> Mortality

Indirect / dynastic flow (parents’ genome → offspring environment → offspring outcome):

Parents' genome
    │
    ├──> Parents' phenotype (income, vocabulary, parenting style, neighborhood choice)
    │       │
    │       └──> Offspring's rearing environment ────────┐
    │                                                    ▼
    │                                              Offspring outcomes
    │                                                    ▲
    └──> Transmitted alleles ───────────────────────────┘

The genetic-nurture finding (E6) says these two pathways have roughly equal magnitude for educational attainment. They are partially separable only via within-family designs (M4) or non-transmitted-allele PGS.
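The non-transmitted-allele logic can be sketched with simulated data. Under random mating (an assumption here), the child's polygenic score is computed from the alleles the parents transmitted, and the alleles they did not transmit can affect the child only through the rearing environment. Regressing the outcome on both scores gives β_T ≈ direct + indirect and β_NT ≈ indirect, so the direct effect is β_T − β_NT. The effect sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200_000

# Hypothetical effect sizes, chosen for illustration only.
DIRECT = 0.20    # effect of the child's own (transmitted) alleles
INDIRECT = 0.10  # effect of parental alleles acting through the rearing environment

# Under random mating the transmitted and non-transmitted scores are independent;
# the parents' combined score is their sum.
pgs_t = rng.normal(size=n)
pgs_nt = rng.normal(size=n)
parent_pgs = pgs_t + pgs_nt

outcome = DIRECT * pgs_t + INDIRECT * parent_pgs + rng.normal(size=n)

# OLS with an intercept: beta_t estimates direct + indirect, beta_nt estimates indirect.
X = np.column_stack([np.ones(n), pgs_t, pgs_nt])
_, beta_t, beta_nt = np.linalg.lstsq(X, outcome, rcond=None)[0]
print(f"direct ≈ {beta_t - beta_nt:.3f}, indirect ≈ {beta_nt:.3f}")
```

Regressing on the child's score alone would return roughly 0.30 here: the population-level estimate that bundles both pathways.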

Cross-generational drift via assortative mating:

Mating choice (correlated on phenotype) 
    │
    └──> LD induction among causal variants (G5)
              │
              ├──> Inflated additive genetic variance
              ├──> Inflated SNP h² (twin h² is biased downward, see G5)
              ├──> Inflated cross-trait genetic correlations (G6)
              └──> Inflated PGS prediction accuracy

Variant C: Minimal claim set — smallest set supporting the conclusion

The smallest collection of claims that yields the integrated picture (S1) is eight nodes:

E1 — Mean trait h² ≈ 0.49 (heritability is real and substantial)
E4 — Polygenic architecture (no master genes)
E6 — Within-family PGS ≈ ½ population PGS (genetic nurture is real)
E7 — Cross-trait AM is a major source of inflated genetic correlations
E8 — A small set of large environmental insults have causal effects (lead, iodine, alcohol, deprivation, schooling)
L4 — Within-pop ≠ between-pop (Lewontin)
G7 — Stochastic developmental noise is the dominant source of non-shared environment
A6 — Most psychological variation is dimensional, not taxonic

These eight together generate the qualitative integrated picture without requiring detailed effect-size tables, cross-cultural caveats, or specific candidate-gene history. The remaining ~50 nodes refine and corroborate but do not change the shape.

Variant D: Politicization map — where does motivated reasoning concentrate?

This is the variant most relevant to the topic framing (“a minefield of motivated reasoning”).

Distortion-to-target matrix:

Distortion	Targets	Move	Counter-evidence
D1 Blank-slate	A1, E1, E10–E14	“Twin studies are flawed; differences are socialization”	SNP-h² (bypasses EEA), MZ-reared-apart, Su 2009 (d=0.93), CAH/primate convergence
D2 Hereditarian	L4, E22, E23, O4	“Group differences are genetic”	PGS portability collapse (E23); Lewontin (L4); cross-ancestry GWAS at scale don’t exist
D3 Gender-similarities	E10, E11, E14	“All differences are tiny (cite math d=0.05)”	Multivariate D=2.71 (E10); people-things d=0.93 (E11); GEP (E14)
D4 Pop-evpsych	A6, L7, E10–E14	“Men are X, women are Y” (categorical from dimensional)	A6 (dimensional); L7 (effect-size context)

Why all four distortions can target the same evidence base: the evidence base contains both large differences (people-things d=0.93) and trivial ones (math d=0.05) and strong heritability (h²=0.49) and large environmental insults (lead, schooling) and logical guardrails against between-group inference (L4). Any single-direction narrative requires selective citation. The integrated picture (S1) requires holding all of it at once.

Operational implication for the formalization stage: any model that only parameterizes the variance components without parameterizing the interpretation of those components will be silently captured by whichever distortion the reader is most prone to. The formal model needs to make L4, G2, G6, and the dimensional/taxonic distinction (A6) structurally visible, not just numerically present.

6. Topology → formalization handoff

What the next stage (model formalization) should pick up.

Ready for equations

Variance decomposition — fully specifiable now:

V(P) = V(A_direct) + V(A_indirect) + V(A_AM-LD) + V(C_residual) + V(E_measured) + V(E_stochastic) + 2·Cov(G,E) + V(GxE)

With each V parameterized by trait, age, population (US vs. Europe for E25), and method (twin / SNP / within-family). Cov(G,E) captures rGE; V(A_AM-LD) captures G5/G6; V(A_indirect) captures G2.
Method gradient (S2): twin h² ≥ SNP h² ≥ within-family h², with the gaps decomposable into AM, rGE, and rare-variant contributions. Parameterize as a function of estimation method.
Wilson-effect curve: h²(age) = a + b·log(age), which is unbounded and so needs a cap, or a saturating form such as a logistic in log(age), with the slope driven by G1 (active rGE). Calibratable from Bouchard 2013 and Briley & Tucker-Drob 2013.
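Multivariate sex-difference algebra: D² = (μ₁ - μ₂)ᵀ Σ⁻¹ (μ₁ - μ₂), with a worked example showing how D = 2.71 follows from moderate univariate ds accumulated across many dimensions, and how the covariance structure raises or lowers D depending on whether the differences run against or with the correlations.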
PGS-portability decay function: prediction accuracy as a continuous function of genetic distance from training population (Ding et al. 2023, Nature: r = −0.95 between genetic distance and accuracy across 84 traits).
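The two functional forms for the Wilson-effect curve can be compared with a small sketch. It fits both a + b·log(age) and a logistic in log(age) through two anchor points taken loosely from E3 (h² ≈ 0.20 at age 5 and ≈ 0.80 in adulthood, with age 18 assumed as the adult anchor); the 0.85 ceiling on the logistic is also an assumption. The unbounded log form exceeds 1 before age 30, which is why a saturating form or a cap is needed.

```python
import math

# Assumed anchor points (age, h2), loosely following E3.
A1, H1 = 5.0, 0.20
A2, H2 = 18.0, 0.80

# Log form h2(age) = a + b*log(age), fitted exactly through both anchors.
b_log = (H2 - H1) / math.log(A2 / A1)
a_log = H1 - b_log * math.log(A1)


def h2_log(age: float) -> float:
    return a_log + b_log * math.log(age)


# Logistic in log(age) with an assumed ceiling, fitted through the same anchors.
CEILING = 0.85


def logit(p: float) -> float:
    return math.log(p / (1 - p))


k = (logit(H2 / CEILING) - logit(H1 / CEILING)) / math.log(A2 / A1)
x0 = math.log(A1) - logit(H1 / CEILING) / k


def h2_logistic(age: float) -> float:
    return CEILING / (1 + math.exp(-k * (math.log(age) - x0)))


for age in (5, 10, 18, 40, 70):
    print(f"age {age:>2}: log form {h2_log(age):.2f}, logistic form {h2_logistic(age):.2f}")
```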

Still at observation stage (formalization premature)

O1 — Plomin/Turkheimer interpretation of PGS: not yet a formal disagreement, just a verbal one
O3 — GEP mechanism: the algebra of “innate expression release” is not yet specified
O6 — what non-shared environment is: no candidate decomposition
O7 — share of cross-disorder rg that survives AM correction: empirical question pending — but methods are now emerging (LAVA-Knock); answer likely in 2–3 years, at which point this moves to “ready for equations”

Connection to adjacent topics in the LLM-iterate pipeline

This topology is the natural input to Parent-to-Child Transmission (planned topic). The genetic-nurture finding (G2/E6) and the dynastic-extension finding (Nivard 2024) are the empirical answers to “how much does parenting matter beyond genes” that the parent-child topic will need to build on. When that topic spins up, the variance decomposition equation here should be its starting point.

Less directly: the Evolution-Modernity Mismatch topic will lean on the GEP (E14/O3) and Flynn-reversal (O2) findings as evidence of environment-driven shifts in expressed psychological variation. Bedrock Generating Functions can read the variance decomposition itself as one such bedrock function.

7. Next moves — three options for Stage 3

The user picks one of these as the primary formalization target. Each leaves the others viable as later modules but shapes Stage 4 (data) differently.

Option A — Variance decomposition + method gradient (most central)

Build the central equation V(P) = V(A_direct) + V(A_indirect) + V(A_AM-LD) + V(C) + V(E_meas) + V(E_stoch) + 2·Cov(G,E) + V(GxE) parameterized by trait, age, population, and estimation method. Build a tool that takes a published h² estimate (twin, SNP, or within-family) and outputs a method-corrected estimate with explicit AM/rGE adjustment.
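A minimal sketch of the central equation as a data structure, with a helper that reports the share of V(P) captured by a chosen set of terms. Which terms each estimation method absorbs is a hypothetical assumption made for illustration (here, a population-level design absorbs direct, indirect, and AM-induced genetic variance, while a within-family design isolates the direct term), and the component values are made up to sum to 1.0, not drawn from any consortium.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Terms of V(P), in phenotypic-variance units (values are illustrative only)."""
    a_direct: float
    a_indirect: float
    a_am_ld: float
    c_residual: float
    e_measured: float
    e_stochastic: float
    cov_ge: float
    gxe: float

    def total(self) -> float:
        # V(P) = V(A_direct) + V(A_indirect) + V(A_AM-LD) + V(C) + V(E_meas)
        #        + V(E_stoch) + 2*Cov(G,E) + V(GxE)
        return (self.a_direct + self.a_indirect + self.a_am_ld + self.c_residual
                + self.e_measured + self.e_stochastic + 2 * self.cov_ge + self.gxe)

    def share(self, *names: str) -> float:
        """Fraction of V(P) attributable to the named components."""
        return sum(getattr(self, name) for name in names) / self.total()


# Hypothetical mapping from estimation method to the terms it absorbs.
POPULATION_TERMS = ("a_direct", "a_indirect", "a_am_ld")
WITHIN_FAMILY_TERMS = ("a_direct",)

example = VarianceComponents(a_direct=0.15, a_indirect=0.10, a_am_ld=0.05, c_residual=0.05,
                             e_measured=0.05, e_stochastic=0.50, cov_ge=0.05, gxe=0.0)
print(f"V(P) = {example.total():.2f}")
print(f"population-level genetic share: {example.share(*POPULATION_TERMS):.2f}")
print(f"within-family (direct) share:   {example.share(*WITHIN_FAMILY_TERMS):.2f}")
```

In this toy case the population-level genetic share is 0.30 and the within-family share is 0.15; a real tool would replace both the values and the term mapping with per-trait, per-method calibrations.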

Pros: most central to the topic; directly answers “what generates psychological variation”; feeds Stage 4 cleanly (every term has published estimates somewhere). Cons: many parameters; risk of producing a calculator nobody uses without strong UI judgment. Stage 4 implication: pull h² estimates from PGC, SSGAC, GIANT consortia; calibrate the method-gradient term per trait class.

Option B — Multivariate sex-difference algebra (most pedagogically clean)

Formalize how moderate univariate Cohen’s ds combine into a large multivariate Mahalanobis D, with a worked example on the 15 16PF factors (the basis of Del Giudice 2012’s D ≈ 2.71) showing how that value emerges from moderate (|d| ≈ 0.4) univariate ds and the correlation structure Σ. Build a dashboard letting the user dial univariate ds and the correlation matrix to see D move.
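A minimal sketch of the algebra for standardized differences, D = √(dᵀR⁻¹d) with R the pooled correlation matrix (equivalent to the Σ form once variables are standardized). The two-dimensional example uses illustrative values (d = 0.4, r = 0.5), not Big Five or 16PF estimates. It shows that D never drops below the largest univariate |d|, and that a positive correlation lowers D when the differences share a sign but raises it when they run in opposite directions. The overlap helper assumes multivariate normality with equal covariance; for D = 2.71 the shared-area convention 2Φ(−D/2) gives about 0.175, while the convention OVL/(2 − OVL) gives about 0.096, close to the ~10% in E10. Which convention the original study used is not asserted here.

```python
import math

import numpy as np


def mahalanobis_d(d: np.ndarray, corr: np.ndarray) -> float:
    """Multivariate effect size D = sqrt(d^T R^-1 d) for standardized mean differences d
    and a pooled within-group correlation matrix R."""
    return float(math.sqrt(d @ np.linalg.solve(corr, d)))


def overlap(D: float) -> float:
    """Shared area of two equal-covariance multivariate normals separated by D: 2*Phi(-D/2)."""
    phi = 0.5 * (1 + math.erf((-D / 2) / math.sqrt(2)))
    return 2 * phi


d_same = np.array([0.4, 0.4])
d_opposite = np.array([0.4, -0.4])
r_zero = np.eye(2)
r_pos = np.array([[1.0, 0.5], [0.5, 1.0]])

print(f"uncorrelated:             D = {mahalanobis_d(d_same, r_zero):.3f}")
print(f"r = 0.5, same-sign d:     D = {mahalanobis_d(d_same, r_pos):.3f}")
print(f"r = 0.5, opposite-sign d: D = {mahalanobis_d(d_opposite, r_pos):.3f}")

ovl = overlap(2.71)
print(f"D = 2.71: shared area {ovl:.3f}, OVL/(2 - OVL) = {ovl / (2 - ovl):.3f}")
```

The three D values come out at 0.566, 0.462, and 0.800; all stay at or above max |d| = 0.4, but the correlation pushes D in opposite directions depending on the sign pattern.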

Pros: tightly scoped; resolves the single biggest framing trap in the GEP debate (univariate vs. multivariate framings of the same data); high pedagogical leverage. Cons: narrower than A; doesn’t engage the heritability core. Stage 4 implication: pull effect-size matrices from Del Giudice 2012 and Schmitt 2008 cross-cultural data; replicate D under different correlation structures.

Option C — PGS-portability calibration (most practically useful)

Turn Ding et al. 2023’s continuous decay finding into a usable accuracy estimator: enter an individual’s genetic distance from the PGS training population and get an accuracy-decay multiplier. Apply across the major trait PGSs (EA, SCZ, BMI, etc.).
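A minimal sketch of the accuracy-decay multiplier. The linear form is an assumption motivated by the near-linear relationship Ding et al. report (r = −0.95), and the slope would have to be calibrated per trait; the function names and the round-trip example are hypothetical, and the calibration points are generated from an assumed slope rather than taken from any dataset.

```python
def relative_accuracy(genetic_distance: float, slope: float) -> float:
    """Hypothetical multiplier: PGS accuracy relative to the training population,
    assuming accuracy falls linearly with genetic distance and is floored at zero."""
    if genetic_distance < 0 or slope < 0:
        raise ValueError("distance and slope must be non-negative")
    return max(0.0, 1.0 - slope * genetic_distance)


def fit_decay_slope(points: list[tuple[float, float]]) -> float:
    """Least-squares slope s for relative_accuracy = 1 - s * distance,
    with the intercept pinned at 1.0 (the training population itself)."""
    numerator = sum(x * (1.0 - y) for x, y in points)
    denominator = sum(x * x for x, _ in points)
    if denominator == 0:
        raise ValueError("need at least one point with non-zero distance")
    return numerator / denominator


# Round trip on points generated from an assumed slope of 0.5 (not real data).
calibration = [(d, 1.0 - 0.5 * d) for d in (0.1, 0.2, 0.4, 0.8)]
slope = fit_decay_slope(calibration)
print(f"fitted slope: {slope:.2f}")
print(f"multiplier at distance 1.5: {relative_accuracy(1.5, slope):.2f}")
```

Pinning the intercept at 1.0 encodes the design choice that accuracy is expressed relative to the training population, so only the decay rate is estimated.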

Pros: directly addresses a real-world bias; smallest scope; ships fastest; useful even outside this project’s domain. Cons: less central to the heritability question; might fit better as a tool than a topic-stage. Stage 4 implication: pull cross-ancestry GWAS validation data from the All of Us / GenomeAsia / H3Africa consortia.

My recommendation: A as primary (most central to the topic’s stated purpose), with B as a stretch module if scope allows. C is high-value but might better live as a standalone tool promoted to /models later.

8. Objections to this topology (adversarial + steelman)

Four ways a careful reader could push back. The strongest version of each, then my response.

Objection 1 — Discrete typed edges falsify a continuous, magnitude-weighted, context-conditional system

Heritability is not “supported by” a twin study in the same binary way that a logical implication holds. The system is a tightly coupled developmental process; flattening it into nodes-and-arrows with discrete edge types loses information about magnitude, conditional dependence, and gradient relationships.

Response: Acknowledged, and intentional. The topology is the qualitative skeleton; edge weights and conditional dependencies are the job of Stage 3 (formalization), where each edge will be turned into a parameterized function. The graph’s value is not that it stands in for the full system but that it makes the structure visible cheaply enough that the formalization knows where to put the parameters.

Objection 2 — The crux/decorative split is editorial, not empirical

There is no algorithm that picks crux nodes; the choice depends on which failure modes you are worried about. A 1990s topology of this field would have crowned candidate-gene findings as cruxes. Naming A2 (GWAS signal real) a crux today is a judgment call about the field’s current methodological commitments — not an objective feature of the science.

Response: Correct. Cruxes are time-stamped. This topology is a 2026 snapshot. If the field shifts (post-AM-correction era, post-within-family-PGS-at-scale era) the crux set will shift — that is what the refinement passes are for. Use this as a current map, not an immutable structural claim.

Objection 3 — Calling L4 a “logical firewall” overstates the case

Lewontin’s argument has been challenged. Edwards (2003), “Lewontin’s Fallacy,” showed that Lewontin’s 1972 quantitative point (that ~85% of human genetic variance lies within rather than between populations) does not preclude reliable population classification from genetic markers. Modern population genetics treats between-population genetic inference as more nuanced than the firewall framing suggests.

Response: The Edwards critique is real, but it addresses a different claim. Edwards refuted “you cannot reliably classify individuals into populations from genetic data.” The L4 firewall as I formulate it says “within-population heritability provides no information about between-population mean differences without strong auxiliary assumptions about shared causal architecture and equal environments.” Those are different propositions. PGS portability collapse (E23) is the contemporary empirical evidence that the auxiliary assumptions are not currently being met for psychological traits. The firewall framing survives the Edwards critique; its strength rests on the empirical PGS-portability finding, not on the original Lewontin variance argument alone.

Objection 4 — The Politicization variant is meta-commentary, not topology

The D nodes and attacks edges describe how people misuse the evidence base. That is epistemics or sociology of science, not structural topology of the field. A pure topology should omit them.

Response: Fair, and the inclusion is non-orthodox. It is justified here only by the topic framing — the user’s prompt explicitly described the field as “a minefield of motivated reasoning … where the actual generating functions are obscured by politics.” A topology of just the science would omit the D nodes; a topology that helps a reader navigate the field as it is actually encountered should include them. The D nodes will not be carried into Stage 3 formalization — they exist for navigation, not for downstream computation.

9. Glossary

For readers approaching this from outside the field. Terms appear throughout the lit review and topology; this is the lookup table.

Term	Meaning
h²	Heritability — fraction of trait variance in a population attributable to genetic variation. A population statistic, not an individual one.
SNP	Single-nucleotide polymorphism — a single-base difference at a position in the genome where multiple variants exist in the population.
GWAS	Genome-wide association study — scans hundreds of thousands of SNPs against a measured trait, looking for statistical association.
PGS	Polygenic score: a per-individual sum of allele counts at trait-associated SNPs, each weighted by its GWAS effect size. Used as a predictor.
LD	Linkage disequilibrium — non-random association between alleles at nearby loci, typically because they are inherited together.
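AM	Assortative mating: partners resemble each other on a trait more than chance would predict. xAM = cross-trait AM, where one partner’s standing on one trait predicts the other partner’s standing on a different trait (e.g., taller-than-average people pairing with more-educated-than-average partners).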
rGE	Gene-environment correlation. Passive (parents transmit genes + correlated environment), evocative (heritable traits elicit responses), active (people select environments matching propensities).
GxE	Gene-environment interaction — the same genotype produces different phenotypes in different environments.
EEA	Equal environments assumption — the twin-method assumption that MZ and DZ twins are treated similarly enough that any extra MZ phenotypic resemblance reflects genetics, not differential treatment.
MZ / DZ	Monozygotic (identical, ~100% shared DNA) / dizygotic (fraternal, ~50% shared DNA) twins.
rg	Genetic correlation between two traits — how much the same genetic variants influence both.
WGS	Whole-genome sequencing — capturing every base in the genome, including rare variants GWAS misses.
g-factor	General factor of cognitive ability — the latent dimension behind the positive manifold (every cognitive test correlates positively with every other).
p-factor	Proposed general factor of psychopathology — analogous to g, derived from cross-syndrome correlations.
CHC / HiTOP	Cattell-Horn-Carroll cognitive-ability hierarchy / Hierarchical Taxonomy of Psychopathology (a dimensional alternative to DSM).
d (Cohen’s d)	Standardized mean difference between two groups, in standard-deviation units. Effect-size labels (small / medium / large) are scale-dependent — see L7 in the node catalog.
Mahalanobis D	Multivariate generalization of Cohen’s d — distance between two group means in the geometry of the trait space, accounting for correlation between traits.
Within-family design	Comparing siblings or MZ-discordant twins or parent-offspring trios within the same family — controls for between-family confounds (population stratification, AM, passive rGE).
Genetic nurture	Effect of parents’ genotype on offspring outcomes via the environment the parents create — including alleles the parent did not transmit.
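Two of the glossary entries are formulas, and they are easiest to pin down in code. The sketch below implements the textbook definitions of a polygenic score (allele counts times per-SNP weights) and Cohen’s d with a pooled standard deviation. The function names and toy inputs are illustrative. Real PGS pipelines also handle LD clumping or shrinkage, strand alignment, and missing genotypes, none of which is shown.

```python
import math
import statistics


def polygenic_score(allele_counts, effect_sizes):
    """PGS for one person: allele counts (0, 1 or 2 copies of the effect
    allele at each SNP) weighted by per-SNP GWAS effect sizes."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need one effect size per SNP")
    return sum(g * b for g, b in zip(allele_counts, effect_sizes))


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


print(polygenic_score([0, 1, 2], [0.10, -0.20, 0.05]))   # about -0.1
print(cohens_d([3, 4, 5], [1, 2, 3]))                    # 2.0
```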

10. Stage_outputs convention reference

Raw working drafts from each LLM-iterate stage live at:

stage_outputs/<topic>/<stage>.md

Where <topic> is kebab-case (e.g., human-psych-variation) and <stage> is one of: lit-review, topology, model, data, build, writeup. Polished versions move into src/content/ai_research/<topic>/<stage>.mdx with proper frontmatter (title, description, date, status, refinementPass, refinementLog) once ready to publish on the site.

The interactive D3 graph for this topology lives at src/components/research/PsychVariationGraph.tsx and is mounted in src/content/ai_research/human-psych-variation/topology.mdx via client:load.


Iteration history

Pass 3 2026-04-28

error check (citations) · crux re-classification · redundancy prune · connections · next moves

Why: Pass-2 introduced two attribution errors (Figlio for Ghirardi; Wang for Wainschtein) that I caught with verification searches. Separately, calling A1/A2/A3 plus G2/G6 plus L4 all "cruxes" conflated three different types of high-stakes nodes, the most common conceptual error in pop-science treatments of this field. The "Load-bearing vs decorative" section was redundant with the node-catalog weights. And the document lacked a clear "choose your Stage 3 path" deliverable.
- Verified all pass-2 citations via WebFetch on PubMed
- Fixed Figlio et al. 2024 → Ghirardi et al. 2024 (lead author Gaia Ghirardi, Res Soc Strat Mobil) and corrected Norwegian → Netherlands Twin Register
- Fixed Wang et al. 2025 → Wainschtein et al. 2025 (Nature) — same Wainschtein who did the 2022 height paper, expanded to UK Biobank scale
- Restructured §3 from "6 cruxes" into three explicit categories: §3a Foundational cruxes (A1, A2, A3 — falsification breaks regions), §3b Reframer nodes (G2/E6, G6/E7 — open magnitude reshapes interpretation), §3c Logical guardrails (L1, L4 — unfalsifiable but load-bearing for interpretation), §3d Decorative material
- Dropped old §4 (Load-bearing vs decorative) — info already in node-catalog weights; decorative list moved into §3d
- Renumbered §5→§4 (weakest links), §6→§5 (variants), §7→§6 (handoff)
- Updated §5 Variant A vulnerability map to reference the new 3a/3b/3c/4 structure instead of the old "six cruxes + seven weakest links" framing
- Rewrote TLDR para 2 to introduce the foundational / reframer / guardrail trichotomy explicitly (the conceptual move that this pass installs)
- Added §6 connection note pointing to Parent-to-Child Transmission and Evolution-Modernity Mismatch as natural downstream consumers of this topology
- Added §7 Next moves — three discrete Stage-3 options (variance decomposition + method gradient / multivariate sex-difference algebra / PGS-portability calibration) with pros/cons/Stage-4 implications and an explicit recommendation
- Updated PGS-portability item in §6 from "Martin 2019 4.5× drop" to "Ding 2023 continuous decay"
- Component: synced Ghirardi and Wainschtein attributions in node detail strings
Pass 2 2026-04-28

gap scan with web research

Why: Pass-1 deferred web research because the parent harness initially seemed to block WebSearch. After user pushback I confirmed WebSearch was available, ran 12 targeted searches across the topology’s weakest links and open questions, and incorporated the 2024-2025 findings that actually shift the picture.
- M9 (WGS): missing-heritability problem now substantially resolved (~88% of pedigree h² in 2025 paper)
- E23 (PGS portability): reframed as continuous decay along genetic-distance continuum (Ding 2023, r=−0.95)
- E25 (Scarr-Rowe): downgraded weight 3→2, status ~→✗; compensatory hypothesis is the better-supported pattern
- M4 (within-family designs): strengthened with Howe 2022 178k siblings × 25 phenotypes
- E14 (GEP): pattern strengthened by 2025 systematic review across multiple domains
- O7 (AM-correction magnitude): LAVA-Knock method now emerging
- G2 (passive rGE): extended to dynastic / extended-family processes (Nivard 2024)
- O2 (Flynn reversal): positive-manifold-weakening wrinkle (Pietschnig 2024)
- TLDR weakest-links paragraph rewritten to reflect 2026 state
Pass 1 2026-04-28

error check · gap scan · adversarial + steelman · readability

Why: Pass-0 had a logical cycle (A3 ↔ E18), an unsupported quantitative claim, two near-duplicate nodes, one mistyped edge, and no defense against the strongest objections. Drag was also broken on the interactive graph.
- Fixed drag bug (pointer events with setPointerCapture)
- Removed bidirectional A3 ↔ E18 cycle
- Reframed E5 → E4 from "implies polygenic" to "consistent with polygenic"
- Reframed L1 → E1 edge type from mod to imp with explicit label
- Removed redundant A4 → L1 edge
- Softened TLDR claim from "~70% of inferential weight" to "sit upstream of most empirical and synthesis nodes"
- Added §8 Objections to this topology (4 objections + steelman + response)
- Added §9 Glossary (19 terms)
Pass 4 2026-04-29

internal consistency check · cross-stage consistency

Why: After the pipeline-wide AM-direction correction propagated through writeup (pass 5), build (pass 6), model (pass 8), and data (pass 8), the topology stage was the only remaining stage that had not been audited for the same wrong-direction framing. A targeted check found three places that needed correction: the topology graph component (PsychVariationGraph.tsx) had G5 node detail saying "Assortative mating creates linkage among causal variants → inflates h² and PGS" (true for population V(A) and PGS effect sizes, but "inflates h²" without qualification implies twin h², which AM actually deflates), plus the G5 → E1 edge label "AM inflates h²" (same issue); the topology.mdx node catalog G5 entry had the parallel "Inflates additive genetic variance, h²" framing.
- PsychVariationGraph.tsx G5 node detail rewritten to specify "inflates V(A) at population level (Yengo 2018: 14–23% V(A_LD)/V(A) for height) and PGS effect sizes" with the explicit caveat that this counterintuitively biases Falconer twin h² downward and that the empirical twin-vs-WF gap for socially-structured traits is dominated by genetic nurture / EEA, with AM partially offsetting
- PsychVariationGraph.tsx G5 → E1 edge label changed from "AM inflates h²" to "AM inflates V(A) at population level"
- topology.mdx node-catalog G5 entry rewritten to match the React component description
- Did NOT modify topology.mdx line 271 (E1 + E4 + G2 + G5 ──imp──> S2 cascade): this is a high-level dependency claim that G5 contributes to the method gradient, which is true at the population-V(A) level — the detail of who-inflates-what under different estimators lives in the model formalization stage
- Did NOT modify edge type table line 99 ("AM inflates rg"): refers to cross-trait AM inflating cross-disorder rg, which is the Border 2022 finding and is correct as stated

pass 8

Generating function for human psychological variation. One equation per person; variance decomposition follows. Closed-form pieces: Crow–Felsenstein AM inflation, Wilson-Effect saturation, genetic-nurture additive split, multivariate sex-difference Mahalanobis D. Twin / SNP / within-family heritability are projections of the same decomposition. Interactive dashboard included.

TLDR

The topology answered “what depends on what?”. The formalization answers a sharper question: given a person, where does their phenotype come from in expectation? The answer is a single generating function that, once written down, dissolves several apparent paradoxes in the field — most importantly the gap between twin heritability, SNP heritability, and within-family heritability (they estimate different sums of the same underlying components, and the differences are informative).

The spine of this stage is one equation. Phenotype P for a person in a population is P = A_d + A_i + A_LD + C + E_m + E_s + I, with each term a contribution from a distinct mechanism: direct genetic effects from the person’s own transmitted alleles, indirect genetic effects from parental (and broader-family) genomes operating through the environment they create, assortative-mating-induced linkage among causal variants, residual shared environment, measured non-shared environment, stochastic developmental noise, and gene-environment interaction terms. Variance decomposition follows directly, and is block-orthogonal rather than fully orthogonal: V(P) = ΣV(component) + 2·Cov(A_d, A_i) + 2·Cov(A_d, E_m) + 2·Cov(A_d, C) + V(I). The cross-terms are the formal home of every gene-environment correlation finding in the literature; pretending they are zero is the most common modeling error. Three closed-form pieces drop out — the Crow–Felsenstein assortative-mating partition V(A_LD) = h²_obs · r_δ with r_δ = m·h²_obs (the dashboard partitions h² rather than inflating it; the equilibrium is reached in 5–10 generations of stable assortment), the Wilson-Effect logistic curve h²(t) = h²_∞ / (1 + exp(−k·(t − t_50))), and the method gradient that says twin h² ≥ SNP h² ≥ within-family h² with the gaps decomposable into AM-LD, indirect-genetic, and rare-variant pieces.

A second module handles the multivariate sex-difference algebra, because the single largest framing trap in this field is the gap between univariate Cohen’s d (typically 0.2–0.6 across personality dimensions) and the multivariate Mahalanobis distance D² = Δμᵀ·Σ⁻¹·Δμ (which can hit 2.7 when traits are weakly correlated and you stack 15 of them, as in Del Giudice 2012). The same data, two numbers, opposite-sounding stories — both correct. The formalization makes the bridge explicit so the reader can dial univariate d’s and inter-trait correlations and watch D move.
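Here is a minimal sketch of the univariate-to-multivariate bridge in standardized units. The mean-difference vector Δμ becomes the list of univariate Cohen’s d’s, and Σ becomes the within-group trait correlation matrix R, which Mahalanobis D assumes is equal in both groups. The inputs are illustrative and are not Del Giudice’s estimates. A small Gaussian-elimination solver keeps the sketch dependency-free.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_d(d_vector, corr):
    """D = sqrt(d' R^-1 d) from univariate Cohen's d's and trait correlations R."""
    weights = solve(corr, d_vector)
    return math.sqrt(sum(d * w for d, w in zip(d_vector, weights)))


def equicorrelation(k, r):
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


ds = [0.5] * 15
print(mahalanobis_d(ds, equicorrelation(15, 0.0)))   # sqrt(15 * 0.25), about 1.94
print(mahalanobis_d(ds, equicorrelation(15, 0.3)))   # about 0.85: shared variance is not double-counted
```

Correlations shrink D when all differences point the same way. They can also enlarge it when differences on positively correlated traits point in opposite directions, for example d = (0.5, −0.5) with r = 0.5 gives D = 1.0 rather than 0.71.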

What this stage does not formalize: the Plomin/Turkheimer interpretation of polygenic scores (verbal disagreement, no candidate equation), the mechanism behind the Gender Equality Paradox (three live hypotheses with no shared formalism), and the magnitude of AM-correction across the full cross-disorder genetic-correlation matrix (active research, methods just emerging). These remain at the observation stage; premature math here would mask uncertainty rather than reduce it. The L4 Lewontin firewall is preserved as a structural property of the model: the entire generating function is within-population, and nothing in it licenses between-population mean inference.

Interactive dashboard (default settings shown)

Inputs: trait class; age = 25 years; m, spousal phenotypic correlation = 0.40; β_i/β_d, indirect/direct genetic ratio = 0.40; rare-variant share of direct = 0.10.

Variance decomposition: V(A_d) common 48.7%; V(A_d) rare 5.4%; V(A_LD) AM-induced 25.2%; V(A_i) genetic nurture 8.7%; V(C) shared env 6.1%; V(E) non-shared 5.9%.

Method gradient: twin h² (ACE) 0.79, estimating A_d + A_LD (A_i lands in C); SNP h² (LDSC) 0.77, estimating A_d,common + A_LD + ½·A_i; within-family 0.54, estimating A_d only.

Assortative-mating partition: r_δ = m · h² = 0.317; V(A_d)/h² = 0.68 (direct share of additive).

Wilson h²(t) is the AM-equilibrium population heritability V(A_AM)/V(P). The Crow–Felsenstein partition splits V(A_AM) into V(A_d) (clean direct effects, what within-family designs estimate) and V(A_LD) (population-level linkage among trait-relevant alleles induced by non-random mating). Classical twin h² (Falconer) is biased downward relative to V(A_AM)/V(P) by a factor of (1 − m_A), because AM raises the DZ correlation relative to the MZ correlation. It is typically inflated upward by EEA violations and genetic-nurture leakage, however, and for socially-structured traits the net effect is upward. V(A_i) is added on top as the variance contribution of genetic nurture. The gap between empirical twin h² and within-family h² for socially-structured traits is dominated by genetic nurture and EEA, not AM (see the model stage §2.2 caveat).
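The default snapshot can be recomputed from the model’s closed forms (§3.1–3.3). The sketch below rests on three assumptions that are not confirmed against the dashboard source:

- the default trait class uses the cognition parameters of §3.2 (h²_∞ = 0.80, t_50 = 9, k_h = 0.30; c²_0 = 0.50, c²_∞ = 0.05, k_c = 0.15);
- V(A_i) scales total V(A_d);
- V(E) is whatever variance remains.

Under those assumptions it reproduces the variance shares, the twin and within-family values, and the AM partition shown in the snapshot. The SNP-h² readout is not modeled. `dashboard_snapshot` is a hypothetical name.

```python
import math


def dashboard_snapshot(age=25.0, m=0.40, nurture_ratio=0.40, rare_share=0.10,
                       h2_inf=0.80, t50=9.0, k_h=0.30,
                       c2_0=0.50, c2_inf=0.05, k_c=0.15):
    """Recompute the default readouts under an assumed parameterization."""
    # Section 3.2, Wilson logistic: AM-equilibrium additive share at this age
    h2_obs = h2_inf / (1.0 + math.exp(-k_h * (age - t50)))
    # Section 3.2, shared environment decaying toward a non-zero adult plateau
    c2 = c2_inf + (c2_0 - c2_inf) * math.exp(-k_c * age)
    # Section 3.1, Crow-Felsenstein partition into direct and AM-induced parts
    r_delta = m * h2_obs
    v_ad = h2_obs * (1.0 - r_delta)
    v_ald = h2_obs * r_delta
    # Section 3.3, genetic nurture: variance scales with the squared beta ratio
    v_ai = nurture_ratio ** 2 * v_ad
    # Assumption: non-shared environment is the remainder of V(P) = 1
    v_e = 1.0 - (v_ad + v_ald + v_ai + c2)
    return {
        "V(A_d) common": v_ad * (1.0 - rare_share),
        "V(A_d) rare": v_ad * rare_share,
        "V(A_LD)": v_ald,
        "V(A_i)": v_ai,
        "V(C)": c2,
        "V(E)": v_e,
        "twin h2": v_ad + v_ald,
        "within-family h2": v_ad,
        "r_delta": r_delta,
        "direct share": 1.0 - r_delta,
    }


for name, value in dashboard_snapshot().items():
    print(f"{name:>18}: {value:.3f}")
```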

How to read this stage

The dashboard above is the artifact. Everything below is the spec.

In plain language: when researchers report a “heritability of 0.50,” they are claiming that if you took the variance in a trait across a population and asked how much of it tracks genetic differences, half of it does. It is a statement about the population’s variance, not about any single person, and not about between-population differences. It says nothing causal beyond that: high heritability is fully compatible with large environmental effects (height is ~80% heritable and has risen ~10 cm in a century). Different methods of estimating “heritability” answer slightly different questions: twin studies pick up the broadest definition, within-family designs the narrowest. The gap between them is informative.
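A toy simulation makes the height point concrete. Heritability is a variance ratio, so an environmental change that shifts everyone by the same amount moves the mean without touching the ratio. The population size, variances, and shift below are arbitrary illustrative choices, not height data.

```python
import random
import statistics

rng = random.Random(2026)
n = 20_000

# Toy population: independent genetic values G and environmental values E,
# scaled so that V(G)/V(P) is about 0.8 (roughly height-like).
G = [rng.gauss(0.0, 0.8 ** 0.5) for _ in range(n)]
E = [rng.gauss(0.0, 0.2 ** 0.5) for _ in range(n)]


def heritability(genetic, environment):
    """Share of phenotypic variance that tracks genetic differences."""
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return statistics.variance(genetic) / statistics.variance(phenotype)


h2_before = heritability(G, E)

# A secular environmental improvement that adds the same amount to everyone
E_later = [e + 10.0 for e in E]
h2_after = heritability(G, E_later)

print(round(h2_before, 3), round(h2_after, 3))          # essentially equal
print(statistics.mean(E_later) - statistics.mean(E))    # the mean moved by the full shift
```

A real secular trend is not perfectly uniform, so actual heritability can drift too. Still, the mean change on its own says nothing about the variance ratio.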

The stage formalizes that picture by writing one equation per person, decomposing it into named pieces, and showing how each measurement method projects onto a different subset of the pieces. Three closed-form sub-equations follow (assortative-mating inflation, Wilson-Effect age curve, genetic-nurture split). A second module addresses the same algebra applied to group differences — most prominently sex differences, but the framework is general. The dashboard lets you turn the knobs and watch the consequences.

You can read this top-down (TLDR → equation → closed forms → boundary conditions) or bottom-up (play with the dashboard, then come back to the equations when something surprises you). Either order works. The cruxes section at the end (§12) is where the load-bearing assumptions live; if any one fails, parts of the picture have to be rebuilt.

1. Move I’m making

This stage is a decomposition + generating function + integration, in that order:

Decomposition — orthogonalize phenotypic variance into mechanism-specific components, with explicit non-orthogonal Cov(G,E) and interaction terms as the principled exceptions.
Generating function — write the per-person phenotype as a deterministic function of those components plus stochastic noise. The variance decomposition follows by taking V(·) of the generating function.
Integration — show that twin, SNP, and within-family heritability estimators are projections of the same underlying decomposition onto different observable subspaces. The Wilson Effect, AM inflation, and genetic-nurture findings then read as motion of those projections, not as separate phenomena.

What’s not ready: anything in the topology marked O (open), and the polygenic-score causal-vs-summary debate, where the underlying disagreement isn’t yet a formal one.

2. The generating function

For a single person i in a population at developmental time t, sampled from a stable mating regime:

P_i(t) = A_d,i + A_i,i + A_LD,i + C_i + E_m,i + E_s,i + I_i  +  μ(t)

Term	Mechanism	Source identity
`A_d`	Direct genetic: additive effect of the person’s own transmitted causal alleles, evaluated as if mating were random	`Σ_k β_k · g_{ik}` over causal SNPs k
`A_i`	Indirect genetic (genetic nurture): additive effect of parents’ (and extended-family) genotypes operating through the rearing environment	parents’ PGS × environmental transmission coefficient
`A_LD`	Assortative-mating LD inflation: additional additive variance induced by linkage among causal variants under non-random mating	At AM equilibrium, `V(A_d) + V(A_LD) = h²_obs`; the partition is `V(A_LD)/h²_obs = r_δ`.
`C`	Shared environment residual: environmental effects shared by siblings and not already captured by `A_i`	Adult personality: ~0. Education / religiosity / politics: nonzero
`E_m`	Measured non-shared environment: identifiable causes (lead, schooling, head injury, peer composition, nutrition)	each enters with a measured causal coefficient, e.g. lead: β ≈ −6.2 IQ points as blood lead rises from ~1 to 10 µg/dL
`E_s`	Stochastic developmental noise: unmeasured non-shared variance from developmental contingencies, immune/microbial factors, microscale neural variation, and measurement error	the unmodeled residual; ~50% of personality variance
`I`	Interaction terms: `G×E`, `G×G` (epistasis), `G×age`. On 2025 evidence, generally small at PGS-by-environment scale; large only under extreme environmental insults	residual non-additivity
`μ(t)`	Population mean at age t: not a person-level term but the developmental trajectory the person grows through	calibrated to age-norm tables

Why this form: this is the additive-decomposition default of quantitative genetics extended with the two corrections that the 2018–2025 literature has installed into the field — separating A_d from A_i (Kong 2018, Young 2022) and separating A_d from A_LD (Border 2022, Yengo 2018, Wainschtein 2025). Earlier formulations folded A_i into A_d and A_LD into A_d and got the wrong answer about how much of the population-level genetic signal is direct biological causation. The within-family literature is what made these terms separately estimable.

Scope note — scalar trait, not g-loaded vector: P_i(t) is written as a scalar for one trait at a time. For cognitive ability, this collapses an underlying multi-ability structure (g + specific abilities, the CHC hierarchy) into a single phenotypic measure. The collapse is faithful when reporting g-loaded composite scores (e.g., full-scale IQ), and reasonable for any single primary ability. It is not faithful when the question is “how much of A_d for cognition is g versus specific abilities” — that requires a multivariate extension where each ability gets its own decomposition and g enters as a latent common factor across them. The topology’s foundational assumption A3 (g exists as a real dimension) lives at this level: the model below operates inside a single ability/composite and inherits g as a property of which ability is being measured rather than as a structural component. For sex differences (Module B, §3.4), the multivariate extension is necessary by construction; that’s why it appears as a separate module.

2.1 Variance decomposition

Taking variance of the generating function and tracking the cross-terms:

V(P) = V(A_d) + V(A_i) + V(A_LD)
     + V(C) + V(E_m) + V(E_s)
     + 2·Cov(A_d, A_i)        ← genetic nurture is correlated with direct effects (parents pass both)
     + 2·Cov(A_d, E_m)        ← active rGE: people select environments matching propensities
     + 2·Cov(A_d, C)           ← passive rGE residual (small once A_i is split out)
     + V(I)

The off-diagonal Cov terms are why “orthogonal decomposition” is the wrong frame for this system. The system is block-orthogonal: the additive components are roughly orthogonal to the residual environment but not to each other, and the cross-terms are the formal home of every gene-environment correlation finding in the literature. Pretending they’re zero is the single most common modeling error.
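A small simulation shows what dropping a cross-term costs. The setup is an illustrative assumption. Standardized transmitted and non-transmitted PGSs are correlated at k, which the model attributes to assortative mating, and C and E are independent. A_LD, E_m and the interaction terms are left out. The sum of component variances then falls short of V(P) by roughly 2·Cov(A_d, A_i).

```python
import random


def sample_var(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def sample_cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


rng = random.Random(7)
n = 50_000
k = 0.25                     # illustrative AM-induced coupling of g_T and g_NT
beta_d, beta_i = 1.0, 0.4    # illustrative direct and indirect effects

g_t = [rng.gauss(0, 1) for _ in range(n)]
g_nt = [k * a + (1 - k ** 2) ** 0.5 * rng.gauss(0, 1) for a in g_t]
A_d = [beta_d * a for a in g_t]
A_i = [beta_i * b for b in g_nt]
C = [rng.gauss(0, 0.1 ** 0.5) for _ in range(n)]
E = [rng.gauss(0, 0.5 ** 0.5) for _ in range(n)]
P = [a + b + c + e for a, b, c, e in zip(A_d, A_i, C, E)]

diag_only = sum(sample_var(x) for x in (A_d, A_i, C, E))
with_cross = diag_only + 2 * sample_cov(A_d, A_i)

print(f"V(P)                       = {sample_var(P):.3f}")
print(f"sum of component variances = {diag_only:.3f}  (misses the cross-term)")
print(f"sum + 2 Cov(A_d, A_i)      = {with_cross:.3f}")
```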

2.2 Heritability identities

Three quantities are estimable from data; each picks up a different subset of the variance terms. The mapping is more subtle than a casual reading of the literature suggests, and it is worth getting right because the public-discourse confusion about “twin studies overestimate” turns on this exact algebra.

The non-obvious point about twin h²: V(A_i) (genetic nurture) is shared identically by MZ and DZ co-twins, because they have the same parents. Under a correctly specified ACE model, this variance lands in C, not A, so a faithful classical twin model does not count genetic nurture as heritability. The empirical observation that twin h² > within-family h² (e.g., for EA: 0.40 vs ~0.15) is therefore not due to twin h² capturing A_i directly. It is mostly due to model-misspecification leakage. The assumption that genetic nurture and other rearing effects are equally shared within MZ and DZ pairs fails if parents treat MZ pairs more similarly than DZ pairs, which is an equal-environments violation. The other classical assumption that breaks, rDZ_A = 0.5, fails under assortative mating (the true sibling additive correlation under AM is 0.5·(1 + r_δ)), but that failure pushes twin h² down rather than up, as the caveat on Falconer’s AM bias below explains.

Estimator	What it estimates (correctly specified)	Practical leakages
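Twin h² (classical ACE: `2·(rMZ − rDZ)`)	`V(A_d) + V(A_LD)`	Under model misspecification (EEA violations, genetic nurture not equally shared within MZ and DZ pairs), some `V(A_i)` and `V(C)` bleed into A; unmodeled AM pushes the estimate the other way, down by a factor of (1 − m_A). Empirically, classical twin h² for EA exceeds within-family by ~0.20–0.25.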
SNP h² (GREML, LDSC on population GWAS)	`V(A_d, common) + V(A_LD, common) + V(A_i, common)·attenuated`	Population GWAS effect sizes `β_pop = β_d + k·β_i` (where `k` is the AM coupling between transmitted and non-transmitted alleles), so SNP h² is inflated by some `V(A_i)`, but attenuated relative to the full `V(A_i)` because `k < 1`. Excludes rare variants.
WGS h² (Wainschtein 2025)	SNP h² + `V(A_d, rare) + V(A_LD, rare)`	Closes the rare-variant gap; same `A_i` contamination as SNP h² unless within-family.
Within-family h² (sib-FE, MZ-discordant, parent-offspring trios)	`V(A_d)`	Removes `A_i` and `A_LD` cleanly; leaves direct additive only. With WGS: `V(A_d) + V(A_d, rare)`.

This is the method gradient (S2 in the topology):

twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h²

The gaps are not measurement error. They are the data’s way of telling you how much of “heritability” is structural (AM-LD), how much is environmental-via-parents (A_i), and how much depends on rare variants common-variant arrays cannot tag.

For educational attainment in 2025: classical twin h² ≈ 0.40, common-variant SNP h² ≈ 0.20–0.25, WGS h² ≈ 0.30 (with rare-variant contribution), within-family additive ≈ 0.15.

Important caveat on Falconer’s AM bias (added in pass 6 after a reviewer correction). The “What it estimates” column above is exact only under random mating. Under positive AM, Falconer’s formula 2·(rMZ − rDZ) is biased downward by factor (1 − m_A) where m_A ≈ m·h² — fraternal twins share more than 50% of trait-relevant alleles because their parents are genetically more similar than chance, raising rDZ relative to rMZ and shrinking the formula’s output. So Falconer estimates [V(A_d) + V(A_LD)] · (1 − m_A) / V(P), not V(A_AM)/V(P) directly.

This matters for interpreting the gap. When classical twin h² > within-family h² for socially-structured traits (EA: 0.40 vs 0.15), the gap is dominated by other classical-ACE biases — primarily the equal-environments assumption (MZ co-twins are treated more similarly than DZ co-twins, inflating MZ correlation) and genetic-nurture leakage (V(A_i) leaks into A under model misspecification rather than landing cleanly in C) — partially offset by the AM downward bias. AM is a real phenomenon at the population level (Crow-Felsenstein V(A_LD) inflation; see §3.1 below) but it does not, on net, drive the twin-vs-within-family gap. The dominant inflation source is genetic nurture and EEA, with direct empirical anchors in Kong 2018 (non-transmitted PGS effect = 29.9% of transmitted for EA) and Okbay 2022 EA4 (within-family direct ~50% of population PGI). Within-family designs control for AM, EEA, and genetic nurture simultaneously.

This is the single calculation a careful reader of “twin studies vs molecular studies” headlines should be able to do. The numbers don’t disagree; they answer different questions.
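The competing biases in Falconer’s estimator can be put side by side in a toy ACE model extended with AM. The sketch rests on three assumptions:

- DZ co-twins share 0.5·(1 + m_A) of the additive variance and MZ co-twins share all of it;
- C is identical for both zygosities unless the hypothetical `eea_extra` parameter adds extra MZ similarity;
- genetic nurture is ignored.

Under those assumptions the estimator returns a²·(1 − m_A) + 2·eea_extra.

```python
def falconer_twin_h2(a2, c2, m_a, eea_extra=0.0):
    """Falconer estimate 2 * (rMZ - rDZ) for a toy ACE-plus-AM model.

    a2        : true additive share V(A_AM)/V(P)
    c2        : shared-environment share, identical for MZ and DZ pairs
    m_a       : extra additive sharing among DZ co-twins from AM (about m * h2)
    eea_extra : hypothetical extra environmental similarity of MZ pairs
                (an equal-environments violation); 0 means the EEA holds
    """
    r_mz = a2 + c2 + eea_extra
    r_dz = 0.5 * (1.0 + m_a) * a2 + c2
    return 2.0 * (r_mz - r_dz)


h2, m = 0.80, 0.40
print(falconer_twin_h2(h2, 0.05, 0.0))          # random mating: about 0.80
print(falconer_twin_h2(h2, 0.05, m * h2))       # AM: about 0.80 * (1 - 0.32) = 0.544
print(falconer_twin_h2(h2, 0.05, m * h2, 0.1))  # EEA violation pushes it back up, about 0.744
```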

3. Closed-form pieces

Three components admit clean equations. The rest are calibrated empirically.

3.1 Assortative-mating inflation (Crow–Felsenstein)

There are two ways to use the AM-inflation formula, and they answer different questions.

Forward problem (rarely the relevant one): given the random-mating heritability h²_rm of a trait, what is the equilibrium heritability after stable AM? The answer is a fixed-point coupling r_δ = m · h²*, V_A* = V_A / (1 − r_δ), h²* = V_A* / (V_A* + V_E), reached in ~5–10 generations of stable assortment (Crow & Felsenstein 1968). One-iteration approximation: r_δ ≈ m · h²_rm, inflation ≈ 1 / (1 − m·h²_rm).

Inverse / partition problem (what the dashboard does): given the AM-equilibrium population additive variance V(A_AM)/V(P) = h²_obs, partition it into the random-mating-equivalent direct component V(A_d) and the AM-induced LD inflation V(A_LD):

r_δ      ≈ m · h²_obs
V(A_d)   = h²_obs / (1 + r_δ + r_δ² + …)  =  h²_obs · (1 − r_δ)
V(A_LD)  = h²_obs − V(A_d)  =  h²_obs · r_δ

The Wilson curve gives h²_obs(t) directly, so the partition uses r_δ = m · h²_obs(t) with no iteration needed. This is what the dashboard implements. Pass-2 versions of the dashboard erroneously inflated h²_obs on top of itself, pushing twin h² above 1.0 at high parameter values; pass-4 corrected this.

Note on what h²_obs should represent here (added in pass 6 after a reviewer correction). The partition formula is a clean population-level decomposition of V(A_AM). Different estimators recover V(A_AM)/V(P) with different biases: SNP-based heritability (GREML / LDSC on unrelated individuals) recovers it approximately unbiased; classical twin h² (Falconer) recovers V(A_AM)/V(P) · (1 − m_A) — biased downward by AM, partially offset upward by EEA violations and genetic-nurture leakage. The dashboard’s Wilson-fit twin estimates conflate these biases. The partition formula’s empirical validation is the match against SNP-based AM-LD estimates: Yengo 2018 measures V(A_LD)/V(A) for height at 14–23% empirically, matching the formula’s prediction of m·h² = 20%. For socially-structured traits where Falconer twin h² is itself substantially inflated by EEA + genetic nurture, applying the partition formula to twin h² over-attributes the partition share to AM relative to its true population-level magnitude.

Worked anchors:

Educational attainment with m ≈ 0.4, h²_obs ≈ 0.40 (twin) → r_δ ≈ 0.16, V(A_d) ≈ 0.34, V(A_LD) ≈ 0.06. Caveat: the h²_obs ≈ 0.40 here is the Falconer twin estimate, which is a biased proxy for the AM-equilibrium V(A)/V(P); applying the partition to the SNP-based estimate (~0.13) would give a smaller absolute V(A_LD).
Height with m ≈ 0.25, h²_obs ≈ 0.85 → r_δ ≈ 0.21, V(A_d) ≈ 0.67, V(A_LD) ≈ 0.18 — matches the 14–23% empirical “AM-inflated” share Border et al. and Yengo et al. report (this is the trait where the partition’s empirical validation is cleanest, because Falconer-bias vs SNP-h² discrepancies are smaller for height than for socially-structured traits).
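Both directions of the Crow–Felsenstein calculation can be written in a few lines. The inverse partition reproduces the two worked anchors above after rounding. The forward fixed point iterates r_δ = m · h²*, V_A* = V_A / (1 − r_δ), h²* = V_A* / (V_A* + V_E) on a random-mating scale where V_A + V_E = 1. Function names are illustrative.

```python
def am_partition(h2_obs, m):
    """Inverse problem: split the AM-equilibrium additive share into V(A_d), V(A_LD)."""
    r_delta = m * h2_obs
    v_ad = h2_obs * (1.0 - r_delta)   # equals h2_obs / (1 + r + r^2 + ...)
    v_ald = h2_obs * r_delta
    return r_delta, v_ad, v_ald


def am_forward(h2_rm, m, tol=1e-12, max_iter=1000):
    """Forward problem: equilibrium heritability after stable AM.

    Returns (h2_star, r_delta, V(P*)) with V_A + V_E = 1 before assortment.
    """
    v_a, v_e = h2_rm, 1.0 - h2_rm
    h2_star = h2_rm
    for _ in range(max_iter):
        v_a_star = v_a / (1.0 - m * h2_star)
        new = v_a_star / (v_a_star + v_e)
        converged = abs(new - h2_star) < tol
        h2_star = new
        if converged:
            break
    r_delta = m * h2_star
    return h2_star, r_delta, v_a / (1.0 - r_delta) + v_e


for label, m, h2 in [("educational attainment", 0.40, 0.40), ("height", 0.25, 0.85)]:
    r, v_ad, v_ald = am_partition(h2, m)
    print(f"{label}: r_delta={r:.3f}  V(A_d)={v_ad:.3f}  V(A_LD)={v_ald:.3f}")

h2_star, r, vp = am_forward(0.50, 0.40)
print(f"h2 0.50 under m=0.40 -> {h2_star:.3f} at equilibrium (inflation {1 / (1 - r):.3f})")
```

Feeding the forward result back through `am_partition` and rescaling by V(P*) recovers the original random-mating V_A. This is the round trip that the pass-4 dashboard fix relies on.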

Cross-trait AM (m_xy ≠ 0) extends the same logic to off-diagonal entries of the genetic-covariance matrix and is the formal reason E7 finds R² = 0.74 between cross-mate phenotypic correlations and genetic-correlation estimates. The cross-trait AM result (Border 2022) survives independently of the within-trait Falconer-bias issue: it’s about between-trait LD inflating reported genetic correlations between disorders, which is empirically validated and not in dispute.

3.2 Wilson-Effect saturation curve

Heritability of cognitive ability rises with age because active rGE (G1) compounds: as children gain agency, they select environments matching their genetic propensities, amplifying genetic variance and shrinking shared environment. The empirical age curve from Bouchard 2013 and Briley & Tucker-Drob 2013 is sigmoidal — slow rise in early childhood, fastest gain in late childhood / early adolescence, saturation in late adolescence. A logistic gives a clean three-parameter fit:

h²(t) = h²_∞ / (1 + exp(−k_h · (t − t_50)))

With h²_∞ ≈ 0.80, t_50 ≈ 9 years (age at half-asymptote), and k_h ≈ 0.30/year: h²(5) ≈ 0.19, h²(10) ≈ 0.46, h²(15) ≈ 0.69, h²(25) ≈ 0.79. These match Bouchard’s anchors within ~3 percentage points across the full developmental range.
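
The calibrated logistic is easy to check directly. A minimal Python sketch, assuming the cognitive defaults above (h²_∞ = 0.80, t_50 = 9, k_h = 0.30); `wilson_h2` is a hypothetical name, not the dashboard’s actual function.

```python
import math


def wilson_h2(t: float, h2_inf: float = 0.80, t50: float = 9.0, k_h: float = 0.30) -> float:
    """Logistic Wilson-effect curve h2(t) = h2_inf / (1 + exp(-k_h * (t - t50))).

    Defaults are the cognitive-ability calibration quoted in this section;
    at t = t50 the curve sits at exactly half its asymptote.
    """
    return h2_inf / (1.0 + math.exp(-k_h * (t - t50)))


for age in (5, 10, 15, 25):
    print(age, round(wilson_h2(age), 2))
# 5 0.19
# 10 0.46
# 15 0.69
# 25 0.79
```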

(Earlier passes used a saturating-exponential h²_∞ − (h²_∞ − h²_0)·exp(−k·t), which rises too fast at the young end — it produced h²(5) ≈ 0.52 for cognition vs the empirical ~0.20. The logistic form is the smallest functional change that fits the empirical sigmoidal pattern.)

The shared-environment trace runs an inverse path with a non-zero asymptote, since shared environment for cognition does not actually drop to zero in adulthood (~0.05 plateau is well-attested):

c²(t) = c²_∞ + (c²_0 − c²_∞) · exp(−k_c · t)

Cognition: c²_0 ≈ 0.50, c²_∞ ≈ 0.05, k_c ≈ 0.15/year. For Big Five personality, c²_∞ ≈ 0 is appropriate (shared family environment effectively vanishes for personality by adulthood). For educational attainment and religiosity, c²_∞ ≈ 0.10–0.15 should be substituted — these are exception traits where shared environment persists throughout life.
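
The shared-environment trace can be sketched the same way. This Python snippet assumes the cognition values above; swapping only `c2_inf` (0 for Big Five, 0.10–0.15 for EA or religiosity) would reuse the cognition `c2_0` and `k_c`, which the text does not calibrate for those traits. `shared_env_c2` is a hypothetical name.

```python
import math


def shared_env_c2(t: float, c2_0: float = 0.50, c2_inf: float = 0.05, k_c: float = 0.15) -> float:
    """Shared-environment trace c2(t) = c2_inf + (c2_0 - c2_inf) * exp(-k_c * t).

    Defaults are the cognition values from this section. The c2_inf floor keeps
    c2 from decaying all the way to zero in adulthood.
    """
    return c2_inf + (c2_0 - c2_inf) * math.exp(-k_c * t)


for age in (0, 5, 15, 25):
    print(age, round(shared_env_c2(age), 3))
```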

Both formulas are phenomenological — the parameters are not derived from a deeper model. They are calibration knobs for the dashboard.

3.3 Genetic-nurture decomposition (additive form)

Define g_T as the offspring’s transmitted-allele PGS and g_NT as the parental non-transmitted-allele PGS. Then:

A_d = β_d · g_T
A_i = β_i · g_NT

Empirically (Kong 2018, Wang 2021, Okbay 2022, Howe 2022):

β_i / β_d ≈ 0.3 – 0.5  (educational attainment)
β_i / β_d ≈ 0.0 – 0.1  (height, BMI)
β_i / β_d ≈ 0.4 – 0.6  (cognitive performance)

β-level vs variance-level. The ratio β_i/β_d quoted above is at the regression-coefficient level. Translating to a variance contribution requires squaring (for the pure variance term) and an explicit cross-term:

V(A_i)              = β_i² · V(g)         =  (β_i/β_d)² · V(A_d)
2·Cov(A_d, A_i)     = 2·k · β_d · β_i · V(g)  =  2·k · (β_i/β_d) · V(A_d)

Where k is the AM-induced correlation between an offspring’s transmitted-allele PGS and the parental non-transmitted-allele PGS. Under random mating k ≈ 0 (Mendelian segregation makes them independent). Under stable AM, k > 0 because spousal phenotypic correlation creates correlation between mom-transmitted alleles and dad-non-transmitted alleles (and vice versa); for AM-strong traits (EA, height) k is empirically in the 0.1–0.5 range, depending on the strength and stability of assortment.

This means 2·Cov(A_d, A_i) is comparable to V(A_i), not generally much larger: their ratio is 2k/(β_i/β_d), which for β_i/β_d ≈ 0.4 runs from 0.5 (k = 0.1) to 2.5 (k = 0.5). For EA with β_i/β_d ≈ 0.4, V(A_d) ≈ 0.15, and a moderate k ≈ 0.2: V(A_i) ≈ 0.024 and 2·Cov ≈ 2·0.2·0.4·0.15 = 0.024. The total genetic-nurture variance contribution is ≈ 0.048: modest, with the cross-term the same size as V(A_i) itself.
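
The β-level to variance-level translation above reduces to two products. A minimal Python sketch, assuming V(g_T) = V(g_NT) as in the derivation; `genetic_nurture_terms` is a hypothetical helper, and the loop simply sweeps the empirical k range quoted for AM-strong traits.

```python
def genetic_nurture_terms(ratio_i: float, v_ad: float, k: float) -> dict:
    """Variance-level genetic-nurture terms from the beta-level ratio beta_i / beta_d.

    V(A_i)            = ratio_i**2 * V(A_d)
    2 * Cov(A_d, A_i) = 2 * k * ratio_i * V(A_d)
    k is the AM-induced correlation between g_T and g_NT (0 under random mating).
    """
    v_ai = ratio_i ** 2 * v_ad
    two_cov = 2.0 * k * ratio_i * v_ad
    return {"V_A_i": v_ai, "two_cov": two_cov, "total": v_ai + two_cov}


# EA example from the text: beta_i/beta_d = 0.4, V(A_d) = 0.15, across a range of k.
for k in (0.0, 0.1, 0.2, 0.5):
    terms = genetic_nurture_terms(0.4, 0.15, k)
    print(k, round(terms["V_A_i"], 3), round(terms["two_cov"], 3), round(terms["total"], 3))
```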

The dashboard displays only the pure V(A_i) = (β_i/β_d)² · V(A_d) slice as a clean variance bucket. The cross-term 2·Cov(A_d, A_i) is the leakage path that makes empirical twin h² (under unmodeled AM) exceed the dashboard’s “Twin h² (ACE)” output. It is acknowledged in the help text rather than allocated to a separate bar segment, partly because k is poorly constrained empirically and partly because adding a cross-term slice would over-clutter the visualization without changing the qualitative picture.

The relation V(A_i) + 2·Cov(A_d, A_i) ≈ V_PGS,population − V_PGS,within-family is approximate but useful: it turns “missing heritability after within-family correction” from a puzzle into an order-of-magnitude measurement. In the model’s terms the left-hand side equals exactly (β_i/β_d) · V(A_d) · (2k + (β_i/β_d)), which depends on k and reduces to the pure V(A_i) slice when AM is weak (k → 0).
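
The closed form can be sanity-checked by simulation. This Python/NumPy sketch assumes jointly Gaussian, equal-variance g_T and g_NT with correlation k, and treats the “within-family” variance as the variance of the direct term alone; both are illustrative simplifications rather than a model of real PGS data, and `simulate_pgs_gap` is a hypothetical helper.

```python
import numpy as np


def simulate_pgs_gap(beta_d: float, beta_i: float, k: float,
                     v_g: float = 1.0, n: int = 200_000, seed: int = 0):
    """Compare a simulated population-minus-direct variance gap with the closed form.

    Illustrative assumption: g_T and g_NT are jointly Gaussian with equal
    variance v_g and correlation k.
    """
    rng = np.random.default_rng(seed)
    cov = v_g * np.array([[1.0, k], [k, 1.0]])
    g = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    a_d = beta_d * g[:, 0]                       # direct term, A_d = beta_d * g_T
    a_i = beta_i * g[:, 1]                       # indirect term, A_i = beta_i * g_NT
    simulated = np.var(a_d + a_i) - np.var(a_d)  # sample V(A_i) + 2 Cov(A_d, A_i)
    ratio = beta_i / beta_d
    v_ad = beta_d ** 2 * v_g
    closed_form = ratio * v_ad * (2.0 * k + ratio)
    return float(simulated), closed_form


for k in (0.0, 0.2, 0.5):
    print(k, simulate_pgs_gap(1.0, 0.4, k))
```

With k = 0 both numbers sit near V(A_i) = 0.16; raising k adds the cross-term on top.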

3.4 Multivariate sex-difference algebra (Module B)

For a trait vector x with covariance matrix Σ and group means μ_F, μ_M, the multivariate effect size is the Mahalanobis distance:

D² = (μ_F − μ_M)ᵀ · Σ⁻¹ · (μ_F − μ_M)

For uncorrelated traits with equal univariate effect sizes |d| (on standardized scales), D² = n·d² so D = d·√n. For correlated traits, the inverse covariance structure shrinks D when the sex-difference vector is aligned with high-variance directions of Σ and amplifies it when the vector is aligned with low-variance directions.

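Worked example. Take 15 personality dimensions (16PF), univariate |d| ≈ 0.5 on average, with positive inter-trait correlations averaging ρ ≈ 0.20, and treat every difference as equal in size and pointing the same way (μ_F − μ_M = d·1, with 1 the all-ones vector). Then approximately: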

D² ≈ d² · 1ᵀ · Σ⁻¹ · 1
   ≈ d² · n / (1 + (n − 1)·ρ̄)        if Σ has a constant-correlation structure
   ≈ 0.25 · 15 / (1 + 14·0.20)
   ≈ 0.25 · 3.95
   ≈ 0.99
   D ≈ 1.0
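
The closed form used above can be checked against the general Mahalanobis formula. A minimal NumPy sketch; it assumes, as the worked example does, standardized traits and a difference vector parallel to the all-ones vector, which is the case where 1ᵀΣ⁻¹1 = n/(1 + (n − 1)ρ̄) applies. For unequal or mixed-sign differences only the general `mahalanobis_d` computation remains valid. Both function names are hypothetical.

```python
import numpy as np


def mahalanobis_d(delta: np.ndarray, sigma: np.ndarray) -> float:
    """D = sqrt(delta^T Sigma^{-1} delta); solves Sigma x = delta instead of inverting."""
    return float(np.sqrt(delta @ np.linalg.solve(sigma, delta)))


def equicorrelated(n: int, rho: float) -> np.ndarray:
    """n x n correlation matrix with 1 on the diagonal and rho everywhere else."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


n, d, rho = 15, 0.5, 0.20
delta = np.full(n, d)  # aligned case: all 15 differences equal in size and sign
general = mahalanobis_d(delta, equicorrelated(n, rho))
closed_form = d * np.sqrt(n / (1 + (n - 1) * rho))
print(round(general, 3), round(closed_form, 3))  # 0.993 0.993
```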

The equicorrelated approximation undershoots Del Giudice 2012’s reported D = 2.71, and this gap is informative rather than a bug. To recover 2.71 in the equicorrelated form would require average univariate |d| ≈ 1.36 (= 2.71/√3.95), far above what 16PF or NEO papers report at the observed level. What Del Giudice and colleagues actually did was use multigroup latent-variable modeling with measurement-error disattenuation: they corrected each factor’s d for unreliability and then computed D on the latent (true-score) means. Disattenuation magnifies effect sizes when reliability is well below 1.0, and aggregating across 15 factors then compounds the magnification. The honest summary is: at the level of observed (raw, pre-disattenuation) measurement, multivariate sex-difference D for personality is ~1.0–1.5; Del Giudice’s 2.71 is the latent-true-score analogue.
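
The required-|d| figure comes from inverting the equicorrelated closed form. A short Python sketch under the same aligned, equicorrelated assumptions as the worked example; `required_d` is a hypothetical helper.

```python
import math


def required_d(target_D: float, n: int, rho: float) -> float:
    """Average |d| needed to reach target_D under the aligned, equicorrelated approximation.

    Inverts D = d * sqrt(n / (1 + (n - 1) * rho)).
    """
    return target_D / math.sqrt(n / (1 + (n - 1) * rho))


print(round(required_d(2.71, n=15, rho=0.20), 2))  # 1.36
```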

The intuition behind the algebra still holds: if men and women differ on dimensions that are weakly correlated with each other, every dimension contributes independent information, and D grows with √n. If they differ on highly correlated dimensions, the differences carry redundant information and D plateaus. But the gap between observed and disattenuated D is itself a substantive piece of the field’s debate — and worth flagging rather than papering over.
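
The √n-versus-plateau contrast follows from the equicorrelated form: with ρ̄ = 0 the expression is d·√n, while for ρ̄ > 0 it tends to d/√ρ̄ as n grows. A small Python sketch of that sweep, under the same aligned, standardized-trait assumptions as the worked example (the ρ̄ = 0.5 setting is an arbitrary illustration, not an empirical value; `equicorrelated_D` is a hypothetical name).

```python
import math


def equicorrelated_D(d: float, n: int, rho: float) -> float:
    """Aligned, equicorrelated D = d * sqrt(n / (1 + (n - 1) * rho))."""
    return d * math.sqrt(n / (1 + (n - 1) * rho))


d = 0.5
for n in (1, 5, 15, 50, 500):
    independent = equicorrelated_D(d, n, rho=0.0)  # grows like d * sqrt(n)
    redundant = equicorrelated_D(d, n, rho=0.5)    # approaches d / sqrt(rho)
    print(n, round(independent, 2), round(redundant, 2))

print("plateau for rho=0.5:", round(d / math.sqrt(0.5), 2))
```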

Why this matters for distortions. D3 (the “gender similarities” framing) cites univariate d ≈ 0.05 for math performance and reads it as evidence of broad similarity. D4 (pop-evpsych framing) cites multivariate D ≈ 2.71 and reads it as evidence of broad difference. Both citations are correct. The bridge equation shows that they are about different objects: a single dimension vs. a 15-dimensional space. Anyone who hasn’t internalized this algebra can be silently captured by either framing.

The algebra is general — not just for sex. D² = (μ_A − μ_B)ᵀ · Σ⁻¹ · (μ_A − μ_B) applies to any two-group comparison: sex, age cohort, occupational sample, clinical vs. control, urban vs. rural — anywhere group means are reported on a multivariate panel. The module is presented in sex-difference language because that is where the framing trap concentrates, but readers thinking about other group comparisons can use the same dashboard. The L4 firewall (§5.2) does not block this generalization at the within-population level; it only blocks the leap from within-population variance/distance estimates to between-population causal claims. A descriptive D between two samples is fine; a causal interpretation of that D requires assumptions the model does not provide.

3.5 PGS portability decay (deferred)

Topology Variant C: accuracy(distance) calibration from Ding et al. 2023 (r = −0.95 between genetic distance and PGS R² across 84 traits) is a clean candidate for closed-form. Deferred to a future tool because it sits at the population-genetics boundary rather than the within-population generative process this stage formalizes. Listed as a follow-up.

4. Composing the parts: anchors the dashboard preserves

The dashboard above stitches §3.1, §3.2, and §3.3 into one panel — sliders for trait class, age, m, β_i/β_d, and rare-variant share; outputs the variance decomposition and the three method-specific h² numbers. Four sanity-check anchors hold under the calibrated defaults:

IQ at age 5 (cognitive): h²(5) ≈ 0.19, V(C) ≈ 0.26, V(A_d) ≈ 0.17. Matches Bouchard 2013.
IQ at age 25 (cognitive, m=0.4, β_i/β_d=0.4): h²(25) ≈ 0.79, V(A_d) ≈ 0.54, V(A_LD) ≈ 0.25, V(A_i) ≈ 0.09, V(C) ≈ 0.06. Within-family h² (= V(A_d)) ≈ 0.54, about 3.6 times the often-quoted EA within-family estimate of 0.15, because cognition is a higher-h² trait than education.
Big Five across adulthood: h² ≈ 0.45, V(C) ≈ 0, V(E) ≈ 0.55. Effectively flat from age 5 onward.
Variance budget closes: V(A_d) + V(A_LD) + V(A_i) + V(C) + V(E) = 1.0 by construction. Twin h² never exceeds the Wilson asymptote.
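
The anchors can be regenerated by chaining the §3.1–§3.3 pieces. This is a hypothetical Python reconstruction for the cognitive class only, using the default parameters quoted in §3.2 and above; it is not the dashboard’s source, it omits the rare-variant-share slider, and it computes V(E) as the remainder so the budget sums to 1.0 by construction (with extreme slider values that remainder can go negative in this sketch).

```python
import math


def wilson_h2(t, h2_inf=0.80, t50=9.0, k_h=0.30):
    """Logistic Wilson curve, cognitive defaults (§3.2)."""
    return h2_inf / (1.0 + math.exp(-k_h * (t - t50)))


def shared_env_c2(t, c2_0=0.50, c2_inf=0.05, k_c=0.15):
    """Shared-environment trace with a non-zero floor, cognitive defaults (§3.2)."""
    return c2_inf + (c2_0 - c2_inf) * math.exp(-k_c * t)


def decompose(age, m=0.4, ratio_i=0.4):
    """Variance budget at one age under the partition (not inflation) reading of h2."""
    h2_obs = wilson_h2(age)            # observed AM-equilibrium heritability
    r_delta = m * h2_obs               # one-step partition, no fixed-point iteration
    v_ad = h2_obs * (1.0 - r_delta)    # direct additive (within-family target)
    v_ald = h2_obs * r_delta           # AM-induced LD
    v_ai = ratio_i ** 2 * v_ad         # pure genetic-nurture slice; cross-term not shown
    v_c = shared_env_c2(age)
    v_e = 1.0 - (v_ad + v_ald + v_ai + v_c)  # remainder: non-shared environment + noise
    return {"h2": h2_obs, "V_A_d": v_ad, "V_A_LD": v_ald,
            "V_A_i": v_ai, "V_C": v_c, "V_E": v_e}


for age in (5, 25):
    print(age, {key: round(val, 2) for key, val in decompose(age).items()})
```

At age 25 this reproduces the second anchor’s rounded values (V(A_d) 0.54, V(A_LD) 0.25, V(A_i) 0.09, V(C) 0.06); at age 5 it gives V(A_d) ≈ 0.17 and V(C) ≈ 0.26 with the same m = 0.4 default.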

These are the calibration targets. The least obvious one is anchor 4: the previous dashboard pass had the variance budget overflow under default parameters (twin h² > 1.0 at age 25 with m=0.4), which was a real bug. The current partition h²_obs = V(A_d) + V(A_LD) keeps the budget bounded by construction.

For traits the dashboard does not have a dedicated class for (educational attainment, height, religiosity, political affiliation), the user can approximate by choosing the closest class and adjusting sliders. EA-like behavior emerges from cognitive with m=0.4 and a mental note that h²(25) for EA is closer to 0.40 than 0.79 — i.e., the dashboard’s cognitive class is calibrated to IQ, not EA.

5. Boundary conditions and where the model breaks

The generating function is correct only inside its scope. Five boundaries are explicit:

Severe psychiatric tail. The hyperpolygenic A_d = Σ β_k g_{ik} form assumes thousands of small effects. For early-onset autism with intellectual disability, single rare variants (CHD8, SCN2A) can carry effects of d > 1.0. The decomposition still works component-by-component but A_d becomes dominated by a small number of large-effect alleles — effectively Mendelian rather than polygenic. The model should either widen its prior on individual β_k or hand off to a separate Mendelian module at the tail.
Between-population mean differences (L4 firewall). Every term in the generating function is defined within a population at a stable mating regime. The model is structurally silent on between-population means: there is no μ_pop term to compare. Computing D² = (μ_pop1 − μ_pop2)ᵀ Σ⁻¹ (μ_pop1 − μ_pop2) is mathematically possible but requires assuming Σ_pop1 = Σ_pop2 and equal causal architecture across populations — neither of which is empirically supported (Ding 2023’s PGS-portability collapse is the empirical evidence that the assumption fails). This is the L4 / Lewontin firewall encoded directly into model scope.
Severe environmental insults. V(I) (interaction) is small at PGS-by-environment scale but large when environments cross a threshold (lead, alcohol, severe deprivation, iodine deficiency). The additive decomposition under-fits at thresholds. Use the model in the normal range; switch to an explicit threshold-effect model at the extreme.
Non-equilibrium AM. The Crow–Felsenstein formula assumes AM has reached equilibrium. For populations under rapidly changing assortment regimes (e.g. rapid shifts in educational stratification), the inflation factor is still en route to its equilibrium value, not at it. When assortment is rising, use the formula as an upper bound, since AM-induced LD builds up over generations.
Individual-level inference (L1). V(A_d) is a population variance. For a single person, A_d is a realization, not a partition. Statements like “70% of this individual’s intelligence is genetic” do not type-check against the model. The dashboard exposes population variance only.

6. Distortion-aware reading

Each component of the decomposition has a public-discourse failure mode. The model’s job is to make the failure visible, not to suppress it.

Component	Common misreading	What the model says
`V(A_d)` (high)	“Genes determine outcomes”	Population variance. Says nothing about a specific person’s prospects.
`V(A_i)` (large)	“Family environment doesn’t matter”	The opposite: this term is family environment, mediated by parental genotypes that correlate with parental phenotypes.
`V(A_LD)`	Usually invisible to public discourse	Inflates V(A) at the population level by ~10–25% via AM-induced LD between trait-relevant alleles (Yengo 2018: 14–23% for height, matching the formula prediction). Does NOT on net inflate Falconer twin h² — AM actually biases Falconer downward, partially offsetting other classical-ACE biases (see §2.2 caveat).
`Cov(A_d, E_m)` (active rGE)	“People shape their environments” → therefore environments don’t matter	They matter — the covariance term is their effect, just non-orthogonal to genes.
Twin h² ≥ within-family h²	“Twin studies overestimate”	They estimate a different quantity (population additive variance vs. direct effect). Both are real.
Multivariate `D` large	“Sexes are categorically different”	`D` is a distribution distance; individuals across the distributions still overlap substantially. Dimensional, not taxonic.
Univariate `d` small	“Sexes are essentially the same”	True for the dimension cited, false in the multivariate space.

D1 and D2 (the two heaviest distortions) both operate by selecting a subset of these readings. The model doesn’t resolve the political dispute, but anyone running the dashboard should be able to see why each side is technically correct about the term they’re highlighting and incomplete about the rest.

7. Adversarial + steelman

Four objections to the formalization itself. The strongest version of each, then the model’s honest response.

Objection 1 — Variance bookkeeping is not a causal model

The decomposition partitions variance into named components, but it never specifies why A_d produces phenotype P rather than the reverse. A regression coefficient β_d from a within-family GWAS is not a causal effect; it is a statistical association under specific design assumptions. Calling the decomposition a “generating function” is false advertising — it generates expected variance given parameters, not actual phenotype given a causal mechanism.

Steelman: This is the strongest objection because it is the same disagreement that drives O1 (Plomin vs Turkheimer). The model accommodates both readings rather than picking one: under the Plomin reading, β_d is a causal coefficient and the decomposition is generative in the strong sense; under the Turkheimer reading, β_d is a regression coefficient that happens to be unbiased under within-family identification, but the underlying biology is unspecified. Both readings predict the same variance budget, which is why the data hasn’t yet decided between them.

Response: Acknowledged. The model is more accurately described as a conditional variance generating function: given parameters, it generates the expected variance pattern. The causal interpretation of those parameters is exactly what’s contested, and the decomposition’s value is precisely that it lets both interpretations be expressed in a shared language. Stage 4 (data) is where the disagreement gets sharper: the test is whether the within-family β_d moves under environmental intervention. Plomin predicts no, Turkheimer predicts yes, and the model can express both predictions cleanly.

Objection 2 — ACE assumptions are unrealistic enough that “twin h²” is not really estimating anything physical

EEA fails (MZ co-twins are treated more similarly than DZ); shared-environment effects vary by zygosity; non-shared environment for siblings is correlated with shared parental treatment. Stack the violations and the entire ACE framework is just a parametric reparameterization of the data, not an estimation of underlying components.

Steelman: Joseph and Richardson’s critique of behavior genetics rests partly on this argument: the assumptions that make twin h² meaningful are violated enough to make the resulting numbers epistemically empty. The strongest version isn’t that twin studies are “wrong” but that they’re under-determined — multiple causal worlds produce the same rMZ and rDZ patterns.

Response: Partially conceded. Classical ACE is under-determined and the assumption-violation issue is real. However, two empirical findings constrain the under-determination: (a) SNP-based heritability (which uses unrelated individuals and bypasses EEA entirely) recovers a substantial fraction of twin h² across major traits — for height about 60% with common SNPs alone (rising to ~80% with whole-genome sequencing that captures rare variants), for cognitive ability about 25–40%, for educational attainment about 30–50%. The fraction varies by trait but is consistently non-trivial — the EEA-bias-only explanation for twin h² is empirically untenable; (b) MZ-reared-apart studies (Bouchard 1990 and updates) reproduce the basic Wilson Effect pattern with EEA structurally absent. The model takes twin h² as an upper bound on direct + indirect additive variance, not as a precise estimate. The method gradient is what makes the imprecision survivable: comparing twin to within-family bounds the gap.

Objection 3 — Additive form misses dominance and epistasis

Dominance variance V_D and epistatic variance V_I (gene-gene interactions) are real and measurable. Twin studies fitting ADE models routinely find non-trivial V_D. The additive-only generating function is a simplification that loses information.

Steelman: For some traits — height (V_D ≈ 0), educational attainment (V_D ≈ 0–0.05) — dominance is small and the additive simplification is fine. For others — psychiatric disorders, where ADE models often outperform ACE — non-additive variance is potentially substantial. The additive-only model is not “wrong” so much as inappropriate for that subset of traits.

Response: Concede the scope limit. The generating function as written is for traits where polygenic-additive architecture dominates (which is most psychological traits, per Hill, Goddard & Visscher 2008’s argument that even where dominance exists, additive variance often captures most of the variance because of allele-frequency distributions). For severe psychopathology and other traits with substantial V_D, an extended model would replace A_d with A_d + D_d and add Cov(A_d, D_d) cross-terms. The dashboard does not currently expose this; the prose acknowledges the boundary in §5.

Objection 4 — Multivariate `D` conflates measurement structure with reality

Mahalanobis D depends on Σ. Σ is the within-sex covariance of measured traits, which depends on which traits you measure, how you measure them, and how the population varies. Different measurement panels produce different Ds for the same underlying difference. The disattenuated D = 2.71 from Del Giudice is not a property of human nature; it’s a property of the 16PF + the U.S. sample + the latent-variable model.

Steelman: This is correct and underappreciated. D is not a population parameter in the way that μ_F − μ_M is. It is a model-relative summary statistic. Two researchers using different but equally defensible measurement panels can produce D values that differ by a factor of two or more.

Response: Conceded fully. The multivariate-D module’s value is comparative, not absolute. It tells you, given a measurement structure, how much multivariate aggregation magnifies the apparent sex difference relative to any single dimension. The module’s pedagogical purpose is to show why the same data (panel of dimensions) supports both “small per-dimension differences” and “large multivariate distance”: neither claim is wrong, but neither is the whole answer. The dashboard surfaces the dependency on Σ via the ρ̄ slider so users can see how D moves under different correlation structures.

8. Open questions that the model exposes (Stage-4 inputs)

The formal apparatus makes four open questions sharper than verbal discussion alone:

O1 (PGS interpretation). The decomposition treats β_d · g_T as a direct genetic term. Plomin’s “PGS is a real biological cause” reading takes β_d as a structural causal coefficient. Turkheimer’s “PGS is a summary of correlated environments” reading says β_d is contaminated by uncontrolled Cov(A_d, E_m). The two interpretations make different predictions about how β_d should change under environmental intervention. Stage 4 question: for traits with large enough within-family GWAS, does the within-family β_d move under intervention (schooling reform, nutrition shifts) the way Plomin predicts (it shouldn’t) or the way Turkheimer predicts (it should)?
O3 (Gender Equality Paradox). The multivariate algebra in 3.4 shows that D depends on the inter-trait correlation structure Σ. If Σ differs between high-equality and low-equality societies, D will differ even if univariate μ_F − μ_M differences are fixed. Stage 4 question: does Σ (the personality covariance matrix itself) change across societies, or only the means? This is a different empirical question than “are the differences innate.”
O6 (what E_s actually is). The model treats stochastic developmental noise as an unmodeled residual. As Stage 4 data accumulates, candidates (immune/microbial, peer-network, epigenetic, measurement error) can be peeled off into E_m and the residual E_s should shrink. Stage 4 question: how much of the current ~50% personality E_s can be moved into E_m given current measurement panels?
O7 (cross-disorder rg post-AM correction). The cross-trait AM extension in §3.1, which bridges cross-trait phenotypic correlations and genetic correlations under AM (Border 2022, LAVA-Knock 2024), gives a formal correction. Stage 4 question: applied at scale to the full psychiatric-disorder rg matrix, what fraction of the cross-disorder genetic correlations survive the correction?

The two questions deferred from Section 1 (PGS portability and the GEP causal mechanism) are not sharpened by the model — they require new measurement, not new math.

9. Handoff to Stage 4 (data pipeline)

The model defines five parameter sets that Stage 4 needs to populate:

Parameter	Source	Trait coverage
`β_d, β_i`	Within-family GWAS (Howe 2022, Okbay 2022)	EA, height, BMI, cognitive ability, depressive symptoms, smoking — extending
`m` (cross-spouse phenotypic correlation)	UK Biobank, HUNT, MoBa	EA, height, BMI, cognition, neuroticism — well-covered
`h²(t)` calibration	Bouchard 2013, Briley & Tucker-Drob 2013 longitudinal twin	Cognition (well-covered); personality (sparse); psychopathology (very sparse)
`Σ` for sex-difference module	Del Giudice 2012, Schmitt 2008, Kaiser 2020	16PF, NEO, Big Five
`share_rare`	Wainschtein 2025	Height, EA, several psychiatric — extending

The single highest-value Stage-4 deliverable: a per-trait table of (twin h², SNP h², WGS h², within-family h², m, β_i/β_d) at adulthood, ideally with cohort-by-age stratification. Most of the components already exist in published consortium summaries; the table is mostly aggregation, not new analysis.

10. Connection to adjacent topics

Parent-to-Child Transmission (planned). The A_i term is the formal answer to “how much does parenting matter beyond genes for outcomes that look genetic.” That topic should adopt this generating function as its starting point and refine β_i by domain (cognition vs. personality vs. health behaviors) and by mechanism (vocabulary input, expectation-setting, neighborhood selection). The Nivard et al. 2024 finding — that indirect genetic effects extend beyond the nuclear family — implies β_i should be further decomposed into a parent-level term and a dynastic/extended-family term.
Evolution-Modernity Mismatch (planned). The μ(t) population-mean trajectory is the formal home of secular shifts (Flynn rise, Flynn reversal, age-of-puberty drift). Within-cohort within-sibship designs are the cleanest separator of genuine environmental shifts in μ(t) from compositional or selection artifacts. The Pietschnig 2024 finding that the positive manifold itself may be weakening across recent cohorts suggests μ(t) is not a one-dimensional curve but a moving structure of which abilities are gaining or losing — which the current scalar form does not capture.

(A connection to a planned “Bedrock Generating Functions” topic was floated in pass 1 but dropped — the analogy was real but too loose to do useful work here, and any cross-domain claim should live in that topic’s own formalization rather than be asserted from this one.)

11. Glossary (formalization-specific additions)

This section’s symbols are listed in the order they appear in the generating function. The lit-review and topology glossaries cover the field-level terminology (h², SNP, GWAS, PGS, AM, rGE, GxE, etc.) and are not duplicated here.

Symbol / term	Meaning
`P_i(t)`	Phenotype of person `i` at developmental age `t`. Scalar (for one trait at a time); see §2 scope note for the multi-ability extension.
`A_d`	Direct additive genetic component — `Σ_k β_k · g_{ik}` over causal SNPs the person inherits, evaluated as if mating were random.
`A_i`	Indirect additive genetic component (genetic nurture) — additive effect of parents’ (and extended-family) genotypes operating through the rearing environment.
`A_LD`	AM-induced LD inflation — additional additive variance from non-random mating creating linkage among causal variants.
`C`	Shared-environment residual not already absorbed by `A_i`.
`E_m` / `E_s`	Measured non-shared environment (lead, schooling, etc.) / stochastic developmental noise (the unmodeled residual).
`I`	Interaction terms: `G×E`, `G×G` (epistasis), `G×age`.
`μ(t)`	Population-mean trajectory at age `t` (developmental norm, not a person-level term).
`β_d` / `β_i`	Direct / indirect genetic regression coefficients on phenotype, estimated from within-family / parental-genotype designs.
`g_T` / `g_NT`	Polygenic score from offspring’s transmitted alleles / parents’ non-transmitted alleles.
`m`	Cross-spouse phenotypic correlation (assortative-mating strength on the measured trait).
`r_δ`	Cross-spouse correlation in additive genetic value; `= m · h²_obs` at AM equilibrium. The dashboard uses this directly (no fixed-point iteration) since Wilson h²(t) is already the equilibrium quantity.
`k`	AM-induced correlation between an offspring’s transmitted-allele PGS and the parents’ non-transmitted-allele PGS; appears in the genetic-nurture variance identity (§3.3).
`V_A*`	Additive genetic variance at AM equilibrium; `V_A* = V_A / (1 − r_δ)` in the Crow–Felsenstein form. The dashboard observes V_A* directly via h²_obs and uses the formula to partition it into V(A_d) and V(A_LD).
`h²(t)`	Heritability as a function of age; logistic form `h²_∞ / (1 + exp(−k_h·(t − t_50)))` in §3.2. Earlier passes used a saturating exponential, which fit the asymptote but rose too fast in childhood; the logistic is the smallest functional change that captures the empirical sigmoidal pattern.
`Σ`	Trait-level (within-sex or within-group) covariance matrix used in multivariate-D calculation.
`Mahalanobis D`	Multivariate generalization of Cohen’s d: `√(Δμᵀ Σ⁻¹ Δμ)`.
`ρ̄`	Average inter-trait correlation in `Σ`; the equicorrelated approximation collapses `D²` to `d²·n/(1+(n−1)ρ̄)` (§3.4).
`block-orthogonal`	Decomposition where major components are orthogonal to the residual environment but cross-terms within components (e.g. `Cov(A_d, A_i)`) are explicit, not zero.
`method gradient`	The relationship `twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h²` driven by which components each estimator includes.

12. Cruxes for this model

The topology had cruxes for the field. This stage’s cruxes are different — they are the load-bearing assumptions of the formalization itself. If any one flips, the model needs to be restructured.

Crux	Load-bearing claim	What would flip it
C1	Within-family GWAS effect estimates are an unbiased estimate of `β_d`. The whole `A_d` / `A_i` separation depends on this.	A demonstration that within-family designs have a systematic confound (e.g., differential parental treatment that correlates with offspring genotype) that biases `β_d` by more than ~10%. So far Howe 2022 / Okbay 2022 within-sibship GWAS are mutually consistent and consistent with trio-based estimates, suggesting the confound is bounded.
C2	AM equilibrium has been reached or is close enough that the partition relation `r_δ = m·h²_obs` holds.	A demonstration that recent population-scale shifts in assortment (educational stratification expansion since 1970, online dating since 2010) have moved populations far from equilibrium for psychologically-relevant traits — at which point the observed `r_δ` would lag the formula’s prediction. Currently no direct evidence the partition is mis-calibrated; would require longitudinal `m`-by-cohort data.
C3	Hyperpolygenic architecture: `A_d = Σ β_k g_{ik}` over thousands of small effects, no single locus dominates.	Discovery that for a major psychological trait class, ~5–10 large-effect variants account for >50% of `V(A_d)`. Currently true only for the severe psychiatric tail (autism with ID, severe schizophrenia spectrum), where the model already concedes scope (§5.1). Would generalize to mainstream cognition only if a CRISPR-era discovery overturned the polygenic consensus.
C4	`A_d`, `A_i`, `A_LD` are jointly identifiable given the available designs.	A demonstration that twin/SNP/within-family/WGS estimators are not sufficient to disentangle all three (e.g., that AM-LD and rare-variant contributions are mutually confounded in a way no current design can break). This would force collapsing the decomposition or treating one component as a residual. Active concern: rare-variant heritability in WGS may itself be inflated by AM-LD among rare variants, which would muddy C4.
C5	Equicorrelated `Σ` is a useful approximation for the multivariate sex-difference module.	A demonstration that real personality covariance matrices have block-structured (or low-rank) `Σ` that produces qualitatively different `D` from the equicorrelated approximation. Already partially true: 16PF has known higher-order factor structure, which is why the equicorrelated approximation undershoots Del Giudice’s latent-variable result. Crux holds in a weakened form: equicorrelated is useful pedagogically but not quantitatively for high-dimensional panels.

The most consequential of these is C4. If A_d, A_i, and A_LD cannot be jointly identified by current designs, the variance decomposition reduces to a coarser partition (genetic-additive vs everything else), and the field-level dispute about how much “genetic” effect is environment-mediated remains parametrically unresolvable rather than just empirically pending.


Iteration history

Pass 5 2026-04-28

error check (math pedantry)

Why: Final close-reading audit caught three small but real issues an academic reviewer would flag: §3.3 stated the AM coupling parameter k as "→ 1" when empirically it is in the 0.1–0.5 range for AM-strong traits, which overstated the cross-term 2·Cov(A_d, A_i); §12 Crux C2 still referred to "Crow–Felsenstein fixed point" though pass-4 had moved the dashboard to the partition formulation r_δ = m·h²_obs; §7 Objection 2 response said SNP h² "converges within ~30–50% of within-family twin estimates" with ambiguous phrasing.
- §3.3: replaced k → 1 with empirical k ∈ 0.1–0.5; added derivation 2·Cov = 2k·β_d·β_i·V(g) and showed that for EA with k ≈ 0.2, the cross-term is on the same order as V(A_i) itself (~0.024 each, not "much larger"); rewrote the EA worked example to reflect this
- §12 Crux C2: rewording from "fixed point applies" → "partition relation r_δ = m·h²_obs holds"; falsification path now refers to observed r_δ lagging the formula prediction under non-equilibrium AM
- §7 Objection 2 response: replaced "converges within ~30–50% of within-family twin estimates" with trait-specific numbers (height ~85%, cognition ~50–70%, EA ~30–40% of twin h²); the claim is now defensible against citation request
- After pass 5 the model is at the level of polish where further refinement would be diminishing returns. Stage ready for handoff to data pipeline.
Pass 4 2026-04-28

error check (calibration audit)

Why: Stress-testing the dashboard at default load uncovered a real conceptual error in pass 2: the variance budget overflowed (twin h² output 1.19, SNP h² output 1.27 at cognitive/age=25/m=0.4/ratio_i=0.4) because the code interpreted the Wilson curve as a random-mating quantity that then got *inflated* by the AM factor. Wilson h²(t) is empirically what twin studies report, which is already the AM-equilibrium quantity; the AM factor should *partition* it into V(A_d) and V(A_LD), not scale it up. Two related calibration issues fell out of the same audit: the saturating-exponential Wilson form rises too fast in childhood (h²(5) output 0.52 vs empirical 0.20), and c²(t) had no asymptote so it decayed to ~0 in adulthood when the empirical floor for cognition is ~0.05.
- Reinterpreted Wilson h²(t) as the *observed* AM-equilibrium heritability (= V(A_d) + V(A_LD) by construction). Dashboard now uses the AM inflation factor to partition h² into V(A_d) (clean direct, what within-family designs estimate) and V(A_LD) (AM-LD), never to scale h² above its empirical value. Variance budget closes at 1.0 in every realistic case.
- Switched Wilson functional form from saturating exponential to logistic h²(t) = h²_∞ / (1 + exp(-k·(t-t_50))). New cognitive defaults (h²_∞=0.80, t_50=9, k_h=0.30) give h²(5) ≈ 0.19, h²(15) ≈ 0.69, h²(25) ≈ 0.79 — matching Bouchard 2013 within ~3pp across the developmental range. Saturating exponential overshot childhood h² by ~2.5×.
- Added c²_∞ asymptote: c²(t) = c²_∞ + (c²_0 - c²_∞)·exp(-k_c·t). Cognitive: c²_∞=0.05; personality: c²_∞=0; psychopathology: c²_∞=0.05. Documented that EA/religion/politics need c²_∞ ≈ 0.10-0.15 (substituted manually).
- Refactored V(A_i) from ratio_i·V(A_d) to ratio_i²·V(A_d) — the variance-level translation of a β-level ratio. For ratio_i=0.4, V(A_i)=0.16·V(A_d), not 0.4·V(A_d). The cross-term 2·Cov(A_d, A_i) ≈ 2·ratio_i·V(A_d) is the leakage path documented in §3.3 but not displayed as a separate bar segment to keep the budget clean.
- Dropped the fixed-point iteration in the dashboard since h²(t) is the equilibrium quantity directly. r_δ = m·h²_obs in one step.
- Updated §3.1 to distinguish forward (h²_rm → h²_eq, with fixed-point) from inverse/partition (h²_obs → V(A_d), V(A_LD), single equation) problems. Dashboard does the inverse problem.
- Updated §3.2 to logistic form with new parameter table; explicitly noted why the saturating exponential failed.
- Updated §3.3 to clarify β-level vs variance-level translation; documented why the dashboard displays only the V(A_i)=ratio_i²·V(A_d) slice and not the cross-term contribution.
- Updated §4 sanity-check anchors with the corrected numbers and added a fourth anchor: variance budget closes at 1.0 by construction.
- Updated TLDR sentence on Crow–Felsenstein from "fixed-point at AM equilibrium" to "partitions rather than inflates"; updated glossary entries for r_δ, V_A*, h²(t).
- Verified all sanity-check anchors numerically against the corrected dashboard logic — no overflow, no negative components, all budgets closing at 1.0.
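The pass-4 inverse (partition) logic fits in a short sketch. It uses the cognitive defaults listed above (h²_∞ = 0.80, t_50 = 9, k_h = 0.30) and the stress-test settings m = 0.4 and ratio_i = 0.4 at age 25. It treats m·h²_obs as the AM-LD share of V(A_AM), as the data stage's H2 percentages do. The function names are hypothetical; this is not the dashboard's code. The sketch shows that V(A_d) + V(A_LD) equals h²(t) by construction, so the partition cannot push the genetic share above the observed heritability.

```python
import math


def wilson_h2(t: float, h2_inf: float = 0.80, t50: float = 9.0, k: float = 0.30) -> float:
    """Logistic Wilson curve: h2(t) = h2_inf / (1 + exp(-k * (t - t50)))."""
    return h2_inf / (1.0 + math.exp(-k * (t - t50)))


def partition(h2_obs: float, m: float, ratio_i: float) -> dict[str, float]:
    """Split observed (AM-equilibrium) h2 into direct and AM-LD parts.

    r_delta = m * h2_obs in one step (no fixed-point iteration). It is the
    AM-LD share of V(A_AM), so V(A_LD) = r_delta * h2_obs and
    V(A_d) = (1 - r_delta) * h2_obs. V(A_i) uses the variance-level ratio.
    """
    r_delta = m * h2_obs
    v_ld = r_delta * h2_obs
    v_d = h2_obs - v_ld
    v_i = ratio_i ** 2 * v_d
    return {"r_delta": r_delta, "V_Ad": v_d, "V_ALD": v_ld, "V_Ai": v_i}


for age in (5, 15, 25):
    print(age, round(wilson_h2(age), 2))  # prints 5 0.19, then 15 0.69, then 25 0.79

p = partition(wilson_h2(25), m=0.4, ratio_i=0.4)
print({name: round(value, 3) for name, value in p.items()})
```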
Pass 3 2026-04-28

readability · redundancy prune · connections · scope check

Why Pass-2 fixed technical errors but the document was still hard to enter for an educated lay reader (the TLDR loaded math notation cold), the §6.5 numbering was structurally awkward, the connection back to A3 (g exists) from the topology was missing, §4 duplicated what the dashboard already shows, and the Bedrock-Generating-Functions connection in §10 was hand-wavy enough that it weakened the rest of the connections section. Plus, the glossary covered only a subset of the symbols the prose actually uses.
- Promoted §6.5 Adversarial+steelman to a proper §7; renumbered everything below (Open questions §8, Stage-4 handoff §9, Connections §10, Glossary §11, Cruxes §12)
- Added "How to read this stage" prelude after the dashboard mount — three short paragraphs of plain-language framing that explain what heritability is and is not, what the equation does, and how the reader should approach the rest of the document
- Added scalar-trait scope note in §2: P_i(t) is per-trait, not g-loaded. The topology assumption A3 (g exists) lives at the level of which composite/ability is being measured, not as a structural component of the decomposition. Multi-ability extension is an explicit future direction
- Compressed §4 from 18 lines to 6: dropped the verbose Inputs/Outputs spec (it duplicates the live dashboard) and kept only the three sanity-check anchors as calibration targets
- Generalized §3.4 explicitly: D² = (μ_A − μ_B)ᵀ Σ⁻¹ (μ_A − μ_B) applies to any two-group comparison (sex, cohort, occupation, clinical/control, urban/rural). Module is presented in sex-difference language because the framing trap concentrates there. L4 firewall does not block descriptive use across groups; only causal use
- Dropped Bedrock Generating Functions connection in §10 (analogy was too loose to do useful work). Strengthened Parent-to-Child Transmission and Evolution-Modernity Mismatch connections by tying each to specific 2024 findings (Nivard dynastic IGE, Pietschnig positive-manifold weakening) that the future topics need to address
- Glossary §11 expanded from 8 to 19 entries — adds P_i(t), A_d, A_i, A_LD, C, E_m, E_s, I, μ(t), k, V_A*, h²(t), ρ̄ in the order they appear in the generating function. Notes that field-level terminology (h², SNP, GWAS, etc.) is not duplicated from earlier glossaries
Pass 2 2026-04-28

error check · adversarial + steelman · crux identification · compression

Why Pass-1 had three real technical errors and was missing two structural pieces. The method-gradient table over-claimed what twin h² captures (V(A_i) is shared identically by MZ and DZ co-twins so it lands in C under classical ACE, not A — leakage into A is via AM-related model misspecification, not by design). The Crow–Felsenstein formula was stated as a one-shot when it is a fixed-point. The genetic-nurture variance equation was written as an exact equality when it is approximate at best. And the strongest objection to the whole formalization (variance bookkeeping is not the same as a causal mechanism) was not engaged head-on. Plus the 16PF Del Giudice preset in the dashboard misled by giving D ≈ 1.0 when Del Giudice reported 2.71.
- Method-gradient table rewritten: classical twin h² captures V(A_d) + V(A_LD); V(A_i) lands in C under correctly specified ACE, leaks into A under AM/genetic-nurture model misspecification — separated these explicitly
- Crow–Felsenstein r_δ ≈ m·h² flagged as a one-iteration approximation to a fixed-point; equilibrium r_δ* solves r_δ = m · h²(r_δ); approximation is tight at small r_δ, breaks at high m × h²
- Genetic-nurture variance equation softened: V(A_i) + 2·Cov(A_d, A_i) is approximated by, not equal to, V_PGS,population − V_PGS,within-family; the exact identity has β_i·k cross-terms that depend on AM coupling between transmitted and non-transmitted alleles
- Added §6.5 Adversarial + steelman — four objections (variance bookkeeping vs causal model, ACE assumptions are unrealistic, additive form misses dominance/epistasis, multivariate-D conflates measurement with reality), each with a steelman and the model's honest response
- Added §11 Cruxes for the model itself — five load-bearing claims (within-family GWAS validity, AM equilibrium approximation, hyperpolygenic architecture, identifiability of A_d/A_i/A_LD, equicorrelation as a useful Σ approximation) and what evidence would flip each
- Renamed dashboard 16PF preset to "16PF observed" with an explanatory note that reaching D=2.71 requires latent-variable modeling with disattenuation, not the equicorrelated approximation
- Twin h² card in dashboard relabeled to "Twin h² (classical ACE)" with the corrected formula A_d + A_LD; helper text below the method gradient updated to match
- Compressed TLDR para 2 (V(P) cross-term mention now matches body) and tightened §3.1 worked example by removing redundancy
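The pass-2 distinction between the one-shot approximation and the fixed point can be made concrete for the forward problem. This is a minimal sketch that assumes additive effects only and assumes the LD variance Crow–Felsenstein adds to V(A) is added to V(P) as well. With phenotypic variance standardized to 1 under random mating, that gives h²(r) = h²_rm / (1 − r·(1 − h²_rm)). The equilibrium r_δ solves r_δ = m·h²(r_δ). Function names and parameter values are illustrative.

```python
import math


def h2_eq(r: float, h2_rm: float) -> float:
    """Equilibrium h2 for LD parameter r, assuming V(P) grows by the same
    amount as V(A) (phenotypic variance = 1 under random mating)."""
    return h2_rm / (1.0 - r * (1.0 - h2_rm))


def r_delta_fixed_point(m: float, h2_rm: float, tol: float = 1e-12, max_iter: int = 1000) -> float:
    """Iterate r <- m * h2(r) starting from r = 0 until successive values agree."""
    r = 0.0
    for _ in range(max_iter):
        r_next = m * h2_eq(r, h2_rm)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    raise RuntimeError("fixed-point iteration did not converge")


def r_delta_closed_form(m: float, h2_rm: float) -> float:
    """Smaller root of (1 - h2_rm) r^2 - r + m*h2_rm = 0, i.e. the same fixed point."""
    a = 1.0 - h2_rm
    if a == 0.0:
        return m * h2_rm
    return (1.0 - math.sqrt(1.0 - 4.0 * a * m * h2_rm)) / (2.0 * a)


for m, h2_rm in [(0.2, 0.3), (0.5, 0.6)]:
    one_shot = m * h2_rm
    print(m, h2_rm, round(one_shot, 3), round(r_delta_fixed_point(m, h2_rm), 3))
```

At m = 0.2 and h²_rm = 0.3, the one-shot value (0.060) and the fixed point (0.063) nearly agree. At m = 0.5 and h²_rm = 0.6 they separate to 0.300 and 0.349, which is the "breaks at high m × h²" regime.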
Pass 1 2026-04-28

decomposition · generating function · integration · gap scan

Why First draft of the formalization. Pulled the spine equation, AM inflation, Wilson curve, and multivariate-D algebra out of the topology handoff and wrote them as a single coherent generating function. Built the interactive dashboard so the reader can dial parameters across both modules.
- Wrote master equation: P_i = A_d + A_i + A_LD + C + E_m + E_s + I + μ(t)
- Decomposed V(P) with explicit Cov(A_d, A_i), Cov(A_d, E_m), Cov(A_d, C) cross-terms (block-orthogonal, not orthogonal)
- Closed-form 1: Crow–Felsenstein V_A* = V_A / (1 − r_δ), with r_δ ≈ m·h²
- Closed-form 2: Wilson saturation h²(t) = h²_∞ − (h²_∞ − h²_0)·exp(−kt)
- Closed-form 3: Genetic-nurture additive split, β_i/β_d ≈ 0.3–0.5 for EA
- Module B: Mahalanobis D² = (μ_F − μ_M)ᵀ Σ⁻¹ (μ_F − μ_M); equicorrelated case D² = d²·n/(1+(n−1)ρ̄)
- Method gradient identities: twin h² ≥ SNP h² ≥ within-family h² with each estimator picking up a different subset of components
- Boundary conditions: severe psychiatric tail, L4 between-pop firewall, environmental thresholds, AM equilibrium, individual-level (L1)
- Distortion-aware reading: term-by-term failure modes for each component
- Interactive dashboard with two tabs (variance decomposition + multivariate sex-difference)
Pass 6 2026-04-29

error check · cross-stage consistency

Why A reviewer caught a real and consequential error in pass 5's framing of sections §2.2 and §3.1. The method-gradient table claimed Falconer's twin formula `2·(rMZ − rDZ)` estimates `V(A_d) + V(A_LD)` cleanly. That is true ONLY under random mating. Under positive AM, fraternal twins share more than 50% of trait-relevant alleles (because their parents are genetically more similar than chance), which raises rDZ relative to rMZ and biases Falconer downward by a factor of (1 − m_A). So Falconer estimates `V(A_AM)/V(P) · (1 − m_A)`, not `V(A_AM)/V(P)`. The §3.1 partition formula `V(A_LD) = m·h²_obs` is mathematically valid as a Crow-Felsenstein population-level decomposition of V(A) at AM equilibrium, but it was being applied as if h²_obs equaled the twin estimate, which conflates two quantities biased in opposite directions. The corrected reading: AM is real at the population level (V(A) inflation via LD; Yengo 2018 measures 14–23% V(A_LD)/V(A) for height empirically, matching the formula prediction) but it does NOT explain the gap between twin h² and within-family h² for socially-structured traits — that gap is dominated by genetic nurture and equal-environments-assumption violations, partially OFFSET by AM's downward bias on Falconer.
- Added clarifying note in §2.2 after the method-gradient table explaining Falconer's downward bias under AM and the corrected interpretation of the twin-vs-within-family gap (genetic nurture + EEA violations dominate, AM partially offsets)
- Added clarifying note in §3.1 after the partition formula explaining that h²_obs there represents AM-equilibrium V(A)/V(P), with different estimators recovering this with different biases (SNP-based unbiased, Falconer biased downward by AM with partial upward offset from EEA / genetic nurture)
- Updated the §3.1 worked anchors with caveats acknowledging the Falconer-vs-SNP discrepancy: for EA, applying the formula to twin h² gives a different absolute V(A_LD) than applying it to SNP h²; for height the discrepancy is smaller because EEA + genetic-nurture biases on Falconer are smaller for height than for socially-structured traits
- Preserved the cross-trait AM (Border 2022) result independently — that is about between-trait LD inflating reported cross-disorder rg, a separate and well-supported phenomenon
- Did not change the dashboard logic: re-deriving the bucket numbers under the corrected interpretation would require either (a) substituting SNP-based h² for Wilson-fit twin h² as input, or (b) explicitly modeling Falconer's AM-bias correction. Both are larger changes than this corrective pass aims for. The dashboard's outputs should now be read with the §2.2 / §3.1 caveats in mind. Same for the empirical numbers cited in the prose
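The direction of the pass-6 correction can be checked with a toy additive model. This is a minimal sketch that assumes no shared environment, no dominance, and a valid equal-environments assumption. It takes m_A to be the correlation between the parents' additive genetic values, so the DZ additive genetic correlation is (1 + m_A)/2. Function names and the h² value are illustrative.

```python
def falconer(r_mz: float, r_dz: float) -> float:
    """Falconer's twin heritability estimate: 2 * (rMZ - rDZ)."""
    return 2.0 * (r_mz - r_dz)


def twin_correlations_under_am(h2_am: float, m_a: float) -> tuple[float, float]:
    """Toy twin correlations at AM equilibrium with no shared-environment term.

    MZ twins share all additive variance: rMZ = h2_am.
    DZ twins share (1 + m_a) / 2 of it when parents' additive values correlate at m_a.
    """
    r_mz = h2_am
    r_dz = 0.5 * (1.0 + m_a) * h2_am
    return r_mz, r_dz


h2_am = 0.6  # illustrative V(A_AM)/V(P)
for m_a in (0.0, 0.2, 0.4):
    r_mz, r_dz = twin_correlations_under_am(h2_am, m_a)
    print(m_a, round(falconer(r_mz, r_dz), 3), round(h2_am * (1 - m_a), 3))
```

The Falconer estimate falls from 0.60 at m_A = 0 to 0.36 at m_A = 0.4, which is the downward bias by (1 − m_A). Any upward pull from EEA violations or genetic nurture would have to be modeled separately.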
Pass 7 2026-04-29

internal consistency check · error check

Why On a careful re-read after pass 6 added clarifying notes to §2.2 and §3.1 about the AM downward bias on Falconer, two more places in the model still carried the old wrong-direction framing or stale numbers. (a) §6 distortion-aware reading's row for V(A_LD) said "Inflates V(A_d) by ~10–25% in twin studies" — the same wrong-direction error the reviewer caught in pass 6. (b) §7 Objection 2's response cited SNP-h² recovery numbers ("for height ~85%, for cognition ~50–70%, for EA ~30–40%") that I had already corrected in the writeup at pass 3 (cognition is actually 25–40% recovery, height is 60% common-SNP / 80% WGS, EA is 30–50%) but never propagated back to the model. Both were left over from earlier passes.
- §6 distortion-aware reading V(A_LD) row rewritten: "Inflates V(A_d) by ~10–25% in twin studies; a chunk of 'genetic' effect is structural, not biological" → "Inflates V(A) at the population level by ~10–25% via AM-induced LD between trait-relevant alleles (Yengo 2018: 14–23% for height, matching the formula prediction). Does NOT on net inflate Falconer twin h² — AM actually biases Falconer downward, partially offsetting other classical-ACE biases (see §2.2 caveat)."
- §7 Objection 2 SNP-h² recovery numbers updated to match writeup pass 3 (and primary sources): "for height ~85%, for cognition ~50–70%, for EA ~30–40%" → "for height about 60% with common SNPs alone (rising to ~80% with whole-genome sequencing that captures rare variants), for cognitive ability about 25–40%, for educational attainment about 30–50%"
Pass 8 2026-04-29

internal consistency check

Why Passes 6-7 added clarifying notes to §2.2, §3.1, §6, and §7 about the AM-direction error, but the dashboard React component (PsychVariationModel.tsx) still displayed prose alongside the variance partition that called V(A_LD) "structural inflation from non-random mating." This is the user-facing prose every visitor sees when they interact with the model dashboard — the most visible piece of model-stage content — and it propagated the wrong-direction framing. A final cross-pipeline grep caught it.
- Dashboard prose rewritten: was "Wilson h²(t) is the *observed* (AM-equilibrium) heritability twin studies estimate. AM-LD partitions it into V(A_d) (clean direct, what within-family designs estimate) and V(A_LD) (structural inflation from non-random mating). V(A_i) is added on top as the variance contribution of genetic nurture; classical ACE without AM correction tends to leak some of V(A_i) into A..."; now "Wilson h²(t) is the AM-equilibrium population heritability V(A_AM)/V(P). The Crow-Felsenstein partition splits V(A_AM) into V(A_d) (clean direct, what within-family designs estimate) and V(A_LD) (population-level linkage among trait-relevant alleles induced by non-random mating). Note that classical twin h² (Falconer) is *biased downward* relative to V(A_AM)/V(P) by factor (1 − m_A) ... but is typically inflated upward by EEA violations and genetic-nurture leakage, with the net effect for socially-structured traits being upward overall. ... the gap between empirical twin h² and within-family h² for socially-structured traits is dominated by genetic nurture and EEA, not AM (see the model stage §2.2 caveat)."
- Build clean. The model dashboard now displays prose consistent with the §2.2 / §3.1 / §6 / §7 corrections

pass 9

Empirical pipeline that confronts the model's eight testable predictions with currently-published consortium estimates. Seven hold cleanly (AM partition, Wilson curve, multivariate-D gap, PGS portability decay, xAM inflation, environmental causes, G×E interaction-conditional); one (the cross-paper method gradient) is mixed in an informative way. Curated CSVs (downloadable) + Python pipeline + interactive findings panel.

TLDR

This stage takes the model’s eight concrete predictions about how human psychological variation breaks down — how much of trait variance is genetic-direct vs. genetic-via-parents vs. assortative-mating-induced vs. measured-environment vs. gene-environment-interaction — and confronts each one with currently published consortium numbers. Seven predictions hold cleanly. One — that the four standard heritability estimators (twin, whole-genome-sequence, common-SNP, within-family) should line up in a strict numeric ordering — is mixed across published papers because each paper uses different cohorts and methods. It does hold within the one paper (Howe 2022) that runs the population vs. within-sibship comparison on a single sample. That “mixed” verdict turns out to be informative rather than a model failure: it tells you the cross-paper landscape is noisier than a literal subtraction of estimates suggests.

Headline empirical findings: assortative mating (people pairing with partners of similar traits) creates linkage between trait-relevant alleles, contributing a Crow-Felsenstein V(A_LD)/V(A_AM) share of ~20% for height, ~22% for educational attainment, ~36% for schizophrenia, ~33% for ADHD, ~36% for autism (the AM-strong psychiatric block; affective disorders sit lower at ~6–14%). These percentages are population-level decompositions of V(A) at AM equilibrium — not “fraction of twin h² explained by AM.” Falconer’s classical twin formula is itself biased downward by AM, and for socially-structured traits the empirical gap between twin h² and within-family h² is dominated by genetic nurture and equal-environments-assumption violations, not AM-induced LD (see §2 H2 caveats for the corrected interpretation). Heritability of cognitive ability rises from ~20% in early childhood to ~80% in adulthood along a logistic curve fitted to Bouchard 2013’s nine anchor ages, with a maximum residual of 1.8 percentage points. Multivariate sex-difference effect sizes are large (16PF Mahalanobis distance D = 2.7) when computed at the latent-variable level with measurement-error disattenuation, but only D ≈ 1 at the raw observed level — the entire “Mars-and-Venus” framing trap lives inside that disattenuation correction, not inside the multivariate algebra. Polygenic scores trained on European-ancestry data lose ~37%, ~50%, and ~78% of their accuracy in South Asian, East Asian, and African ancestry samples respectively (Martin 2019), consistent with Ding 2023’s independent continuous-distance result of Pearson r = −0.95 across 84 traits. Cross-trait assortative mating accounts for ~74% of the variance in reported psychiatric cross-disorder genetic correlations (Border 2022, 132 trait pairs).
The small set of measured environments with replicated causal effects on cognition is asymmetric: severe insults (lead, fetal alcohol, deprivation, malnutrition) cost 10–30 IQ points, while enrichment above normal yields at most a few points per intervention. And gene-by-environment interaction (V(I)) shows the classic Scarr-Rowe pattern of higher heritability at higher SES only in US samples (Tucker-Drob & Bates 2016 meta-analysis: a’ = 0.074, p < .0005); equity-buffered W. European / Australian samples show no such interaction (a’ = −0.027, n.s.) — the cross-national heterogeneity is exactly what the model predicts under “V(I) is small at typical environmental variance, larger at extreme tails.”

The pipeline is intentionally small: seven curated CSVs (one per data type, every cell source-cited) and a single ~350-line Python script that produces every chart on this page. Its only dependencies are pandas, numpy, and scipy. Inputs are downloadable from /data/human-psych-variation/. Stage 5 (build) consumes the CSVs directly. What the pipeline does not answer: whether polygenic scores measure direct biological causation or correlated environments (the Plomin–Turkheimer dispute, undecidable without a within-family environmental intervention no group has run); the mechanism behind the Gender Equality Paradox (needs cross-society multivariate panels that don’t exist at scale); and the full assortative-mating-corrected psychiatric genetic-correlation matrix (active research, not yet pipeline-runnable from public summary statistics).

A few terms

The data stage inherits the model formalization’s vocabulary. If you arrived here without reading the model stage, the terms below cover what’s used in the prose:

Heritability (h²). The fraction of variance in a trait, across people in a population, that tracks genetic differences. A population statistic, not an individual one — saying “IQ is 70% heritable” does not mean 70% of any one person’s IQ is genetic.
Twin h², SNP h², WGS h², within-family h². Four ways to estimate heritability, each picking up a slightly different slice of the underlying genetic variance. Twin: from MZ vs. DZ similarity. SNP: from GWAS effect sizes on common variants only. WGS: SNP plus rare variants. Within-family: from sibling differences, which controls for parental environment.
Assortative mating (m). The correlation between partners on a trait — partners are similar on educational attainment (m = 0.55), height (m = 0.24), and political views (m = 0.58). The model’s claim is that AM creates linkage between causal genetic variants, inflating population-level additive genetic variance V(A) by a calculable amount. This is not the same as inflating twin h². Falconer’s twin estimate is biased downward by AM (see H2).
Polygenic score (PGS). A weighted sum of risk alleles per person, used to predict the trait. PGS R² is the variance the score explains in a held-out sample.
Mahalanobis D. The multivariate analogue of Cohen’s d for sex (or any group) differences across multiple correlated measurements.
V(E_m). The model’s variance bucket for measured non-shared environment — exposures with named causal coefficients (lead, schooling, iodine, etc.).
V(I). The model’s variance bucket for interaction effects: gene × environment, gene × gene (epistasis), gene × age. The model’s specific claim is V(I) is small at typical PGS-by-environment scale but larger when environmental variance includes extreme tails — tested in H8 below.
Scarr-Rowe interaction. The hypothesis (prominently supported by Turkheimer 2003’s US data) that IQ heritability is lower in low-SES families than in high-SES families. Tucker-Drob & Bates 2016 meta-analyzed it and found the pattern replicates in US samples but vanishes in W. European / Australian samples. The cross-national heterogeneity is the H8 test of V(I).

H1. Method gradient (mixed)

The model predicts twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h². Across 15 traits with ≥2 estimators, the strict ordering holds for 9 (all 2-estimator rows where twin > SNP) and fails for 6 (all rows with 3+ estimators). The pattern of failure is informative: SNP h² is consistently lower than within-family h² for socially-stratified traits, because LDSC misses the rare-variant share that within-family designs capture through transmission.

[Figure: published h² estimates per trait on a 0–1 scale (educational_attainment, height, bmi, iq_g (adult), big_five_avg, schizophrenia, mdd, adhd, autism, smoking_initiation); series: twin h², WGS h², SNP h², WF h²]

Each row plots the published estimates for one trait on the 0–1 h² scale. Larger dot = larger-N or older estimator (twin); smaller dots = newer methods. The grey bar spans min(observed) to max(observed) — its length is the cross-paper noise. Sienna dot at the trait label = predicted ordering holds; muted dot = ordering fails (informative pattern, not model failure). The "violations" you see (e.g., height WGS=0.68 below within-sibship=0.78) are cross-paper / cross-method differences, not bugs in the model: Wainschtein 2022 used N=25k unrelated EUR with WGS-GREML; Howe 2022 used N=178k siblings with sib-regression. The clean within-paper test (Howe 2022 alone, population vs. within-sibship on the same sample) holds in the predicted direction across all seven AM/IGE-strong traits the model singles out.

How to read this stage

The panel above is the artifact. The prose below is the spec.

The pipeline takes the model’s eight predictions and confronts them with currently published numbers. Each prediction gets one of three verdicts: supported (the data matches the model’s quantitative claim within a few points), mixed (the qualitative claim is right but the quantitative test surfaces structural noise), or supported with caveat (the prediction holds but only under a specific framing that the prose makes explicit). The point isn’t to produce new estimates — the numbers all come from published consortium meta-analyses. The point is to align them in one place so the model’s predictions can be tested cleanly, and to flag where the literature is good enough vs. where the field hasn’t yet collected what the model would need.

You can read this top-down (TLDR → eight predictions → adversarial → connections) or bottom-up (download the CSVs, look at the script, then come back here for the framing).

1. Pipeline architecture

Seven curated CSVs in public/data/human-psych-variation/ (downloadable from the live site, tracked in git):

File	Rows	Purpose
`heritability_estimates.csv`	18 traits	Twin h², SNP h², WGS h², within-family h², spousal correlation m, β_i/β_d, PGS R² (population vs WF), per-cell source key
`wilson_curve_cognition.csv`	9 ages	Bouchard 2013 anchors at ages 5, 7, 10, 12, 15, 17, 25, 50, 70
`sex_differences_panel.csv`	7 panels	Per-panel univariate d̄, ρ̄, n_dimensions, observed D, disattenuated D — Hyde 2008, Su 2009, Schmitt 2008, Del Giudice 2012, Kaiser 2020, Ritchie 2018
`pgs_portability.csv`	13 rows	PGS R² ratio (relative to European training) by target ancestry × trait, with genetic distance
`environmental_effects.csv`	10 exposures	Per-exposure causal effect sizes on cognition: lead, schooling, iodine, FAS, PM2.5, deprivation, malnutrition, breastfeeding, adoption, parenting
`gxe_interactions.csv`	7 rows	Tucker-Drob & Bates 2016 meta-analysis a’ by region (US vs non-US), Turkheimer 2003 anchors, German replication
`sources.csv`	23 papers	Full citation, DOI/URL, what each paper is used for

A single Python script (pipeline.py) reads the inputs, computes derived quantities (AM partition, Wilson logistic fit, equicorrelated D, PGS portability slope, genetic-nurture variance contribution, environmental-effect summary), and writes:

out/method_gradient.csv — per-trait alignment with deltas
out/am_partition.csv — r_δ, V(A_d), V(A_LD) per trait
out/genetic_nurture.csv — V(A_i) and cross-term per trait
out/sex_diff.csv — equicorrelated D per panel
out/findings.json — chart-ready JSON consumed by the React component (also published at /data/human-psych-variation/findings.json)
out/findings_table.md — markdown audit table of the eight predictions

Dependencies: pandas, numpy, scipy. No web fetches, no external services, no individual-level genetic data. Reproduces in under 1 second on a laptop.
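Here is a minimal sketch of what the AM-partition step of pipeline.py could look like. The column names (trait, spousal_m, h2_obs) and the JSON structure are hypothetical, and the real script's layout may differ. The h2_obs inputs are back-solved from the H2 shares for illustration (0.22 / 0.55 = 0.40 for EA, 0.36 / 0.45 = 0.80 for schizophrenia); they are not read from heritability_estimates.csv.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for heritability_estimates.csv (illustrative values only).
inputs = pd.DataFrame(
    {
        "trait": ["educational_attainment", "schizophrenia"],
        "spousal_m": [0.55, 0.45],
        "h2_obs": [0.40, 0.80],
    }
)

# AM partition: r_delta = m * h2_obs is the V(A_LD) share of V(A_AM);
# V(A_d) + V(A_LD) = h2_obs by construction.
am = inputs.assign(r_delta=inputs["spousal_m"] * inputs["h2_obs"])
am["V_ALD"] = am["r_delta"] * am["h2_obs"]
am["V_Ad"] = am["h2_obs"] - am["V_ALD"]

# Write the derived table to an in-memory buffer (standing in for out/am_partition.csv).
buffer = io.StringIO()
am.to_csv(buffer, index=False)

# Chart-ready JSON in the spirit of findings.json (structure is hypothetical).
findings = {"am_partition": am[["trait", "r_delta"]].round(3).to_dict(orient="records")}
print(json.dumps(findings, indent=2))
```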

2. Eight predictions, eight tests

H1 — Method gradient (mixed)

Claim. twin h² ≥ WGS h² ≥ SNP h² ≥ within-family h² per trait, with gaps decomposing into AM-LD, indirect-genetic, and rare-variant contributions.

Result. Across 15 traits with at least two published estimators, the strict ordering holds for 9 (all 2-estimator rows where twin h² > SNP h²) and fails for 6 (all 3-estimator rows). Every failure is the same: SNP h² is lower than within-family h² for socially-stratified traits — for height, SNP h²=0.50 vs. within-sibship h²=0.78; for EA, SNP h²=0.13 vs. within-sibship h²=0.15; for IQ adult, SNP h²=0.20 vs. extrapolated WF h²=0.50. This is not a model failure but a structural property of LDSC: it captures common-variant additive variance in unrelated populations and undercounts the rare-variant share, while within-family designs capture rare variants implicitly through transmission. The model’s V(A_d) is naturally higher than what SNP h² estimates.
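The strict-ordering test is a monotonicity check over whichever estimators a trait has. This is a minimal sketch that assumes a hypothetical dict-per-trait layout. It uses only the height and EA numbers quoted in the paragraph above and leaves out the twin estimates, which are not quoted there.

```python
ESTIMATOR_ORDER = ("twin", "wgs", "snp", "within_family")


def ordering_holds(estimates: dict[str, float]) -> bool:
    """True if the available estimates are non-increasing in the order
    twin >= WGS >= SNP >= within-family (missing estimators are skipped)."""
    values = [estimates[name] for name in ESTIMATOR_ORDER if name in estimates]
    return all(a >= b for a, b in zip(values, values[1:]))


# Height and EA values quoted above (twin estimates omitted).
height = {"wgs": 0.68, "snp": 0.50, "within_family": 0.78}
ea = {"snp": 0.13, "within_family": 0.15}
print(ordering_holds(height), ordering_holds(ea))  # False False
```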

Within a single paper, the prediction holds cleanly. Howe 2022 (N=178,086 siblings) is the only published study that runs population vs. within-sibship GWAS on the same sample. Their Figure 4 shows population effects exceed within-sibship effects for height, EA, age at first birth, # children, cognitive ability, depressive symptoms, smoking — exactly the seven traits the model singles out as having non-trivial indirect-genetic contributions.

What this teaches. “twin h² > within-family h²” is the canonical robust finding (always holds). “SNP h² between twin and within-family” is a methodological artifact when applied across papers — the right cross-check is twin vs. within-family directly, leaving SNP h² as a third estimator that answers a slightly different question (common-variant only).

H2 — AM partition (supported)

Claim. V(A_LD)/V(A_AM) = m·h², assuming AM equilibrium has been reached.

Result. Predicted V(A_LD) shares of observed h²: educational attainment 22%, height 20%, BMI 12%, schizophrenia 36%, ADHD 33%, autism 36%, bipolar 14%, MDD 6%, IQ adult 35%. Height matches Yengo 2018’s reported empirical 14–23% range; EA matches Border 2022’s qualitative “substantial fraction” finding.

The psychiatric numbers were corrected in pass 4. Pass-1/2/3 used m=0.30 for schizophrenia, ADHD, and autism (cited as “Nordsletten 2016 imputed” without verified value). Nordsletten 2016 actually reports tetrachoric spousal correlations greater than 0.40 for all three disorders — moving these from m=0.30 to m=0.45 lifts their predicted V(A_LD) share from ~24% to ~36% of h². This is a real and substantively different reading: about one third of the additive genetic variance for severe psychiatric conditions is structural assortative-mating-induced LD rather than independent direct biological signal. The model’s prediction stands; the data is more dramatic than pass-1 numbers showed.
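The pass-4 shift amounts to one multiplication per trait. This is a minimal sketch that assumes the V(A_LD)/V(A_AM) share is m·h². It takes h² ≈ 0.8 as the value implied by the ~24% → ~36% numbers above (0.24 / 0.30 = 0.36 / 0.45 = 0.8); that h² is not an independent estimate. The m = 0.40 row marks the lower bound Nordsletten 2016 reports.

```python
def am_ld_share(m: float, h2: float) -> float:
    """Crow-Felsenstein V(A_LD)/V(A_AM) share at AM equilibrium: m * h2."""
    return m * h2


h2_implied = 0.8  # implied by the quoted shares, not an independent estimate
for m in (0.30, 0.40, 0.45):
    print(f"m = {m:.2f} -> V(A_LD) share = {am_ld_share(m, h2_implied):.0%}")
```

Even at m = 0.40 the share is 32%, so the direction of the revision does not depend on the exact choice of 0.45.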

Caveats. The Crow–Felsenstein partition assumes AM equilibrium. For traits under rapid assortment shifts (EA post-1970), this is approximate. The IQ adult prediction (35%) sits at the upper end and may overshoot — Horwitz 2023’s IQ partner correlation r=0.44 comes from a small (N=5,672) meta-analytic sample. For psychiatric disorders, “spousal correlation” is a tetrachoric correlation estimated from binary diagnoses, which behaves differently from a continuous-trait partner correlation under the same equilibrium assumption — the prediction is qualitatively right but quantitative precision is lower.

A reviewer correction added in pass 7. The framing “structural assortative-mating-induced LD” implied that AM is the source of the gap between Falconer twin h² and within-family h² for socially-structured traits. This is incorrect: Falconer’s 2·(rMZ − rDZ) is itself biased downward by AM (under positive AM, fraternal twins share more than 50% of trait-relevant alleles, raising rDZ relative to rMZ). The empirical gap between twin h² and within-family h² for socially-structured traits is dominated by genetic nurture and equal-environments-assumption violations, partially offset by AM’s downward bias on Falconer. The formula V(A_LD) = m·h² is mathematically valid as a Crow-Felsenstein population-level decomposition of V(A) at AM equilibrium — Yengo 2018’s empirical 14–23% V(A_LD)/V(A) for height matches the formula prediction at the population level — but it does NOT predict the twin-vs-within-family gap, and the percentages reported above (“22% of h² for EA” etc.) should be read as population-level V(A_LD)/V(A_AM) shares, not as “fraction of twin h² explained by AM.” The cross-trait AM result (Border 2022, H6 below) is independent of this issue and stands as reported.

H3 — Wilson logistic curve (supported)

Claim. h²(t) = h²_∞ / (1 + exp(−k·(t − t₅₀))) for cognitive ability across age.

Result. Fitted to Bouchard 2013 anchors:

h²_∞ = 0.81
t_50 = 9.0 years
k    = 0.27 / year

Max residual: 1.8 percentage points (at age 12). The earlier saturating-exponential form (Stage 3 pass 2) had max residual 32 pp at age 5. The logistic is the smallest functional change that matches the empirical sigmoidal pattern, and the fitted parameters are within sampling noise of the model’s prior values (h²_∞=0.80, t_50=9.0, k=0.30).
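The fitted curve can be evaluated at the nine Bouchard anchor ages listed for wilson_curve_cognition.csv. This is a minimal sketch using the fitted parameters above. It reproduces only the curve: the anchor h² values live in the CSV and are not quoted here, so the sketch does not compute residuals. The function name is hypothetical.

```python
import numpy as np


def logistic_h2(t, h2_inf=0.81, t50=9.0, k=0.27):
    """Fitted Wilson logistic: h2(t) = h2_inf / (1 + exp(-k * (t - t50)))."""
    return h2_inf / (1.0 + np.exp(-k * (np.asarray(t, dtype=float) - t50)))


anchor_ages = np.array([5, 7, 10, 12, 15, 17, 25, 50, 70])
for age, h2 in zip(anchor_ages, logistic_h2(anchor_ages)):
    print(f"age {int(age):>2}: h2 = {h2:.2f}")
```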

H4 — Equicorrelated D vs disattenuated D (supported with caveat)

Claim. Equicorrelated D² = d̄²·n / (1 + (n−1)·ρ̄) is a pedagogical anchor; the gap to disattenuated D is exactly the latent-variable correction.

Result. For Del Giudice 2012’s 16PF panel (n=15, d̄=0.50, ρ̄=0.18): equicorrelated D = 1.03; disattenuated D = 2.71. Ratio: 2.6×. The equicorrelated approximation is quantitatively wrong for high-dimensional disattenuated panels — but not because of an algebra error. The 2.6× factor is the disattenuation correction: latent-variable modeling magnifies effect sizes by ~1/√reliability per factor before aggregation.
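The equicorrelated shortcut and the general Mahalanobis formula agree exactly when every dimension has the same d̄ and every pair the same ρ̄. This is a minimal sketch with the 16PF inputs above (n = 15, d̄ = 0.50, ρ̄ = 0.18). It works on observed standardized scores, so it reproduces the equicorrelated D ≈ 1.03, not the disattenuated 2.71. Function names are hypothetical.

```python
import numpy as np


def d_equicorrelated(d_bar: float, rho_bar: float, n: int) -> float:
    """Closed form: D^2 = d_bar^2 * n / (1 + (n - 1) * rho_bar)."""
    return float(np.sqrt(d_bar**2 * n / (1.0 + (n - 1) * rho_bar)))


def d_mahalanobis(delta: np.ndarray, sigma: np.ndarray) -> float:
    """General form: D^2 = delta^T Sigma^{-1} delta, delta = standardized mean differences."""
    return float(np.sqrt(delta @ np.linalg.solve(sigma, delta)))


n, d_bar, rho_bar = 15, 0.50, 0.18
sigma = (1.0 - rho_bar) * np.eye(n) + rho_bar * np.ones((n, n))  # equicorrelated correlation matrix
delta = np.full(n, d_bar)  # same standardized difference on every dimension

print(round(d_equicorrelated(d_bar, rho_bar, n), 2), round(d_mahalanobis(delta, sigma), 2))  # 1.03 1.03
```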

For the public-discourse framing trap (univariate d small vs. multivariate D large), this means the gap exists at both observed and latent levels. D = 1.03 vs. d̄ = 0.50 is already a 2× scale-up at the observed level, and disattenuation pushes it further. Both Hyde 2005 (“similarities hypothesis”) and Del Giudice 2012 (“Mars and Venus”) are correct about their respective objects of measurement.

H5 — PGS portability decay (supported)

Claim. PGS accuracy decays with genetic distance from the training population.

Result. Ding et al. 2023 reports Pearson r = −0.95 between continuous PCA-based genetic distance and PGS R² across 84 traits (their analysis on individual-level UK Biobank + ATLAS data, N≈524k, which we don’t have access to). Independent categorical-ancestry estimates corroborate the trend:

- Martin et al. 2019 reports relative-accuracy reductions of 37%, 50%, and 78% in South Asian, East Asian, and African ancestries vs. European training.
- Per trait, Okbay 2022 EA4 reports near-zero EA-PGS accuracy in African samples.
- Yengo 2022 reports height-PGS accuracy at 10–20% of European levels in non-European ancestries.
- Trubetskoy 2022 reports schizophrenia-PGS accuracy at ~30% in African samples.

The pipeline aggregates these per-ancestry literature anchors into one panel and computes a Pearson correlation as a sanity check that the literature is internally consistent (r = −0.99 on 11 anchored rows). This is not an independent replication of Ding 2023, because those rows are themselves drawn from primary papers. It is, however, a defensible visualization of the convergent empirical pattern.
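A sketch of this kind of internal-consistency check, assuming a plain Pearson correlation. The relative accuracies are the Martin 2019 reductions converted to fractions of European accuracy. The genetic-distance axis is a HYPOTHETICAL ordinal ranking (EUR < SAS < EAS < AFR) standing in for the continuous PCA distances Ding 2023 uses; it contains no real distance values.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Martin 2019 across-trait reductions (37%, 50%, 78%) as relative accuracy.
ancestries = ["EUR", "SAS", "EAS", "AFR"]
relative_accuracy = [1.0, 1 - 0.37, 1 - 0.50, 1 - 0.78]
# HYPOTHETICAL: ordinal ranks standing in for genetic distance from the
# European training population; these are not real PCA distances.
distance_rank = [0, 1, 2, 3]

r = pearson_r(distance_rank, relative_accuracy)
print(f"r = {r:.2f} on {len(ancestries)} points")
```

With only a few monotonically ordered points, r lands close to −1 almost automatically. That is why a correlation like this works as a consistency check rather than as evidence on its own.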

Why this matters for the L4 firewall. The model’s between-population scope restriction is structurally argued: there is no μ_pop term in the generating function. The empirical evidence for why the restriction matters is the portability decay — the same SNP “effect sizes” do not estimate the same causal coefficients in different populations. Causal architecture is not portable; descriptive variance partitions arguably are, but not for cross-population mean comparisons.

H6 — Cross-trait AM inflation (supported)

Claim. Cross-trait assortative mating accounts for a substantial fraction of reported psychiatric cross-disorder genetic correlations.

Result. Border 2022 (UK Biobank N=40,697 spousal pairs, 132 trait pairs): R² = 0.7432 (95% CI: 0.67–0.82) between phenotypic cross-mate correlations and reported genetic correlations. Across 6 psychiatric disorders × 5 generations: average xAM share γ̂ = 0.29. Anxiety × MDD: γ̂ = 0.21 (95% CI: 0.17–0.25). AUD × schizophrenia: γ̂ = 0.83 (95% CI: 0.59–1.24).

Interpreting γ̂. The γ̂ statistic is the ratio of the xAM-alone-projected genetic correlation to the empirical genetic correlation. A value near 1 is consistent with xAM accounting for the entire reported rg — it does not prove xAM is the cause, since alternative causal architectures (genuinely shared biology with the same effect-size profile) could produce the same ratio. But γ̂ values bounded well below 1 require an additional shared-biology contribution beyond what xAM alone can explain. The Border result is therefore a pressure-test: if reported cross-disorder rg estimates were entirely about shared biology, γ̂ would be small; the average γ̂ = 0.29 with significant pair-level variance shows the literature’s cross-disorder rg estimates carry an xAM contribution that is empirically non-trivial and pair-specific.
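The interpretation rule above can be written as a small decision function. This is a minimal sketch with hypothetical function names. The CIs are the Border 2022 values quoted in the result paragraph. The rule is asymmetric: a CI that stays below 1 is informative, while a CI that covers 1 only fails to rule out xAM-alone.

```python
def gamma_hat(rg_xam_projected: float, rg_empirical: float) -> float:
    """Ratio of the xAM-alone-projected genetic correlation to the reported one."""
    return rg_xam_projected / rg_empirical

def read_gamma_ci(ci_low: float, ci_high: float) -> str:
    """Only a CI bounded below 1 forces an extra shared-biology contribution;
    a CI covering 1 is merely consistent with xAM alone (not proof of it)."""
    if ci_high < 1.0:
        return "needs shared biology beyond xAM"
    return "consistent with xAM alone (not proof)"

# 95% CIs quoted above from Border 2022.
print("Anxiety x MDD:", read_gamma_ci(0.17, 0.25))
print("AUD x SCZ:    ", read_gamma_ci(0.59, 1.24))
```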

Implication. The within-trait V(A_LD) term is the within-trait analogue of cross-trait xAM. Both arise from the same mechanism (LD created by non-random mating among causal alleles); they simply show up in different summary statistics.

H7 — Environmental causes (supported)

Claim. The model’s V(E_m) term — variance contribution of measured non-shared environment — is non-empty: a small set of exposures have large, replicated, causal effects on cognitive outcomes.

Result. Per-exposure effect sizes:

Exposure	Effect on IQ	Source	Design
Schooling, per year	+1 to +5 pts (mean +3.4)	Ritchie & Tucker-Drob 2018 (600k participants, 3 designs)	Quasi-experimental meta
Breastfeeding (PROBIT RCT)	+3.2 pts	Kramer 2008 (N=17,046)	Cluster RCT
Within-Western-normal parenting	~0 to +1 pts	Plomin & Daniels 1987 meta	Within-family twin
PM₂.₅, per 1 µg/m³	−0.27 pts	Aghaei 2024 meta	Observational meta
Lead, blood 1→10 µg/dL	−6.2 pts (CI −8.6 to −3.8)	Lanphear 2005 (N=1,333, 7 cohorts)	Pooled longitudinal
Iodine, severe deficiency	−10 pts (recovers +8.7 with supplementation)	Bougma 2013	Observational + RCT
Adoption: high → low SES	−12 pts	Capron & Duyme 1996 (N=38)	Natural experiment
Severe psychosocial deprivation	−15 pts	Nelson 2007 BEIP (N=136)	Natural experiment
Severe chronic malnutrition	−15 pts	Grantham-McGregor 2007	Observational
Prenatal alcohol (full FAS)	−30 pts	Streissguth 2004	Observational + MR

Asymmetry is the headline finding. Removing severe insults (lead, malnutrition, deprivation, FAS) recovers double-digit IQ points; enrichment above normal (better parenting, breastfeeding) yields single-digit gains at most. The variance-share interpretation V(E_m)/V(P) depends on each exposure’s prevalence in a given population — sparse-but-large exposures (FAS, severe deprivation) contribute little to population variance despite large per-person effects, while moderate-but-common exposures (variable schooling quality, low-grade lead) contribute more. This is why the high-h² findings of behavior genetics coexist with large environmental effects without contradiction: heritability is a population-variance statistic, individual environmental effects can be enormous, and most populations have already removed the worst tails.
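Here is a concrete version of the prevalence point. Consider a yes/no exposure with prevalence p and a fixed per-person effect β. Its contribution to population variance is p(1 − p)·β². The sketch below makes three assumptions:

- IQ has its conventional SD of 15, so V(P) = 225.
- The exposure is uncorrelated with other causes.
- The prevalences are HYPOTHETICAL, picked only to contrast a rare severe exposure with a common moderate one. Neither is an estimate for any real population.

```python
IQ_SD = 15.0          # conventional IQ scaling
V_P = IQ_SD ** 2      # total phenotypic variance (225)

def binary_exposure_share(prevalence: float, effect_pts: float) -> float:
    """Share of IQ variance from a yes/no exposure with a fixed per-person effect.

    A Bernoulli(p) indicator has variance p(1 - p), so the exposure contributes
    p(1 - p) * effect^2, assuming no covariance with other causes.
    """
    return prevalence * (1 - prevalence) * effect_pts ** 2 / V_P

# HYPOTHETICAL prevalences, chosen only to contrast the two regimes.
rare_severe = binary_exposure_share(prevalence=0.001, effect_pts=-30)  # FAS-sized effect
common_moderate = binary_exposure_share(prevalence=0.30, effect_pts=-5)
print(f"rare, severe:     {rare_severe:.1%} of V(P)")      # 0.4%
print(f"common, moderate: {common_moderate:.1%} of V(P)")  # 2.3%
```

The per-person effect differs sixfold in the opposite direction, yet the common exposure contributes several times more population variance.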

H8 — G×E interaction (V(I) bucket) — supported conditional

Claim. The model’s V(I) term — variance contribution of gene-environment interaction — is small at typical PGS-by-environment scale but larger when environmental variance is wide enough to include extreme tails.

Result. Tucker-Drob & Bates 2016 meta-analyzed 43 effect sizes across 14 independent studies (24,926 twin / sibling pairs, ≈50,000 individuals) testing the Scarr-Rowe Gene × SES interaction on intelligence. Their Purcell-biometric-model coefficient a' represents the expected change in the additive genetic regression on intelligence per SD of SES. Reported numbers:

Sample	a’	SE	Significance	N pairs
US-pooled	+0.074	0.020	p < 0.0005	11,340
Non-US-pooled (W. Europe / Australia)	−0.027	0.022	p = 0.22 (n.s.)	13,586
Overall pooled	+0.029	0.019	p = 0.14 (n.s.)	24,926

The founding observation is Turkheimer 2003: IQ heritability h² ≈ 0.10 in low-SES US families, rising to h² ≈ 0.72 in high-SES US families. An independent replication in Germany was null (Spengler 2018: a’ = −0.01, n.s.).
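The significance column can be checked with a normal-approximation Wald test:

- z = a'/SE
- two-sided p = 2(1 − Φ(|z|))
- 95% CI = a' ± 1.96·SE

This is a sketch, not the meta-analysis's own procedure, and wald_summary is a hypothetical helper. Tucker-Drob & Bates 2016 need not have used a plain Wald z, so small discrepancies are expected. For example, the overall pooled row comes out slightly below the reported p = 0.14 here.

```python
import math

def wald_summary(estimate: float, se: float, z_crit: float = 1.96):
    """Normal-approximation z, two-sided p, and 95% CI for a coefficient."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p, (estimate - z_crit * se, estimate + z_crit * se)

rows = {
    "US-pooled": (0.074, 0.020),
    "Non-US-pooled": (-0.027, 0.022),
    "Overall pooled": (0.029, 0.019),
}
for label, (a, se) in rows.items():
    z, p, (lo, hi) = wald_summary(a, se)
    print(f"{label:<15} a'={a:+.3f}  z={z:+.2f}  p={p:.4f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Only the US interval excludes zero, which is the cross-national split the interpretation below relies on.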

Interpretation. The cross-national heterogeneity is the empirical confirmation of the model’s “extreme-environment-threshold” reading. US samples have wider environmental tails: extreme low SES is more common, and low-SES conditions are worse, than in W. European or Australian welfare-state samples. The model predicts V(I) shows up exactly where the low-SES tail is wide enough to include genuine environmental constraint that suppresses genetic expression. Equity-buffered samples truncate that tail, so the interaction shrinks toward zero. The verdict is “supported conditional” because the prediction is conditional on environmental variance. The same model predicts a positive a’ in US samples (observed +0.074) and a’ ≈ 0 in equity-buffered samples (observed −0.027, n.s.), and both predictions match.

Caveat. The Scarr-Rowe finding is itself contested in the literature. Several individual studies have been null (e.g., Hanscombe 2012, in a UK-representative sample), and the pooled US a’ = 0.074 is moderate but not large. The model claim “V(I) is small at typical PGS-by-environment scale” is the most supportable. The stronger claim, “G×E reliably appears at extreme tails,” is supportable but with wider error bars than H1–H7.

3. Headline numbers

Statistic	Value	Source
Mean h² across human traits	0.49	Polderman 2015 (17,804 traits, 14.5M twin pairs)
Non-transmitted EA-PGS effect	29.9% of transmitted	Kong 2018 (N=21,637)
EA4 within-family direct effect	~50% of population PGI	Okbay 2022 (N=3M)
Height WGS h²	0.68 (SE 0.10)	Wainschtein 2022 (N=25,465)
WGS capture of pedigree h²	88%	Wainschtein 2025 (N=347,630, 34 traits)
Spousal correlation EA	0.55	Horwitz 2023 (N≈1.9M pairs)
Spousal correlation political	0.58	Horwitz 2023
Spousal correlation IQ	0.44	Horwitz 2023 (N=5,672 pairs)
Cross-trait AM inflation R²	0.74 (CI: 0.67–0.82)	Border 2022 (132 pairs)
Avg psychiatric γ̂ (xAM share)	0.29	Border 2022
Wilson curve h²_∞ (cognition)	0.81 (fit)	Pipeline fit to Bouchard 2013
Wilson curve t_50 (cognition)	9.0 years (fit)	Pipeline fit
16PF Mahalanobis D observed	1.03	Equicorrelated approximation
16PF Mahalanobis D disattenuated	2.71	Del Giudice 2012
PGS R² ~ genetic distance	r = −0.95 (continuous)	Ding 2023 (84 traits, 524k indivs)
PGS accuracy in AFR vs EUR	22% relative (78% reduction)	Martin 2019 (across-trait avg)
Lead 1→10 µg/dL → IQ	−6.2 pts	Lanphear 2005
Schooling/year → IQ	+1 to +5 pts	Ritchie & Tucker-Drob 2018
G×SES (US)	a’ = +0.074 (p < .0005)	Tucker-Drob & Bates 2016 (43 effects, 25k pairs)
G×SES (non-US)	a’ = −0.027 (n.s.)	Tucker-Drob & Bates 2016
Turkheimer 2003 IQ h² range	0.10 (low SES) → 0.72 (high SES)	Turkheimer 2003

4. Analytical choices

The pipeline makes six judgment calls, each flagged in the script with a # ASSUMPTION: comment. They are:

Twin h² as h²_observed for AM partition. Twin h² is closer to the AM-equilibrium quantity than SNP h². For traits without twin estimates we fall back to SNP h².
AM equilibrium assumption. The Crow–Felsenstein partition assumes mating regimes are stable. For EA (post-1970 educational expansion) this is approximate.
k ≈ 0.5·m for the genetic-nurture cross-term. The AM-coupling parameter k is empirically 0.1–0.5 for AM-strong traits; we interpolate.
Equicorrelated Σ for multivariate D. Real personality covariance matrices have hierarchical structure; the equicorrelated approximation is pedagogical, not quantitative for high-dimensional panels.
PGS portability linear in genetic distance. Ding 2023 reports a strong linear correlation. For genetic distances near zero the relationship may be non-linear. Our 5-trait curated panel is small.
Within-family h² for IQ extrapolated. No within-family GWAS h² has been published for cognitive ability at the same scale as Howe 2022’s other traits. We extrapolate from EA’s WF h² and the EA-IQ rg.

5. What the pipeline does not deliver

Three open questions from the model’s §8 list are not sharpened by this stage, despite being framable:

O1 — PGS interpretation (Plomin/Turkheimer). The decisive test is whether within-family β_d moves under environmental intervention. No paper has the design — Sacerdote 2007 Korean adoption comes closest but predates within-family GWAS. Status: open.
O3 — Gender Equality Paradox. Tests whether multivariate sex-difference D depends on Σ-by-society in addition to μ-by-society. Stoet & Geary 2018 / Schmitt 2008 give univariate cross-cultural d’s; the multivariate piece requires Σ-by-society panels that do not yet exist at scale. Status: likely answerable in the next 5 years.
O7 — xAM-corrected full psychiatric rg matrix. Border 2022 establishes the principle on 6 disorders. Applied at scale to the full PGC cross-disorder matrix, the corrected rg’s are likely smaller — but no group has done the correction systematically. Status: active research.

For these three, the Stage-4 honest answer is “the pipeline frames them but doesn’t resolve them.”

6. Adversarial + steelman

Four objections to the pipeline. The strongest version of each, then the honest response.

Objection 1 — This is variance bookkeeping, not new analysis

The pipeline arranges other people’s published estimates in a table and runs simple closed-form computations on top. It does not produce new heritability estimates, does not analyze raw data, and does not test causal mechanisms. Calling it “an empirical pipeline” overstates what is actually a literature-alignment exercise.

Steelman. True at the bookkeeping level. A real empirical pipeline would pull GWAS summary statistics, run LDSC against multiple traits, replicate Howe 2022’s within-sibship analysis on UK Biobank data, and compute fresh AM-LD partition estimates per trait. That requires individual-level genetic data we do not have access to and would not be appropriate to ship from a content site.

Response. Conceded as a scope restriction. The pipeline’s value is at the meta-level: it confronts the model’s predictions with the literature that already exists and surfaces what does and does not match. Three contributions are genuinely new even at this scale:

(a) AM-partition predictions computed trait by trait across the whole panel with current Horwitz 2023 m-values, whereas Border 2022 / Yengo 2018 framed the partition for individual traits.
(b) The equicorrelated-D vs. disattenuated-D bridge, which splits the Hyde-vs-Del-Giudice gap into an aggregation step and a disattenuation step and quantifies each.
(c) The explicit reframing of H1 as “within-paper holds, cross-paper noisy,” with the structural reason.

None of these required new data analysis, but none were available in one place before.

Objection 2 — The CSV is too small to support strong claims

18 traits is a small panel. The headline-sounding patterns (e.g., “the AM partition holds across AM-strong traits”) rest on roughly six traits. A bigger panel might tell a different story.

Steelman. True for any single trait: the AM partition prediction for adult IQ lands at the upper end of the empirical range and could be wrong. For the multivariate-D module, only one panel (16PF Del Giudice) drives the pedagogical claim; the same algebra on a different instrument might give a smaller disattenuation gap.

Response. The headline patterns are robust within the curated traits and consistent with primary-literature meta-analyses (Polderman 17,804 traits, Border 132 pairs, Horwitz 22 traits + 133-trait UK Biobank scan). Adding another 50 traits would not change the qualitative result for H2 or H6, because those rest on consortium meta-analyses rather than single CSV cells. The single-CSV results are calibration checks, not new estimation. Where the pipeline does need more data (H5 portability, with its small hand-curated panel), this is flagged explicitly as Objection 4 below.

Objection 3 — Border 2022 is a single high-profile paper with significant methodological pushback

Resting H6 on a single 2022 paper from one group is fragile. Other authors have proposed xAM as a confounder of psychiatric cross-disorder rg (Howe 2024, Cai 2025 commentary). However, Border’s specific R²=0.74 figure and the 5-generation-equilibrium assumption it depends on have been pushed back on. The “γ̂ averages 0.29” claim depends on a specific xAM dynamics model.

Steelman. Conceded. R²=0.74 may shrink under different equilibrium assumptions; γ̂ values for specific pairs may move under alternative AM models. The aggressive interpretation (“xAM accounts for ~30% of psychiatric rg”) is doing motivated work in the discourse and would benefit from independent replication by groups outside the Border / Keller cluster.

Response. The model’s H6 prediction does not depend on Border’s specific γ̂ values — it depends on the qualitative claim that cross-trait AM affects rg estimates non-trivially. That qualitative claim has independent support: Howe 2022’s within-sibship estimates of EA-BMI rg attenuate to near-zero, Yengo 2018 establishes within-trait AM-LD inflation for height, and the within-trait V(A_LD) prediction (H2) is tested independently from any cross-trait psychiatric finding. The data.mdx prose treats Border 2022 as suggestive about the magnitude rather than dispositive. This was strengthened in pass 2 — the γ̂ wording is now “consistent with xAM accounting for X%” rather than “X% caused by xAM.”

Objection 4 — H5 PGS portability is circular as a test

Pass 1 framed H5 as “replicating Ding 2023’s r = −0.95 on a curated 5-trait panel and getting r = −0.98.” That was circular: the curated rows were themselves rough approximations of Ding’s continuous-distance pattern, so the resulting slope was internal to the curation, not an independent test.

Response (pass 3 fix). The CSV was refactored to use named per-ancestry literature anchors instead — Martin 2019 across-trait averages (37%/50%/78% accuracy reduction in SAS/EAS/AFR vs. EUR), Okbay 2022 EA in AFR (relative R² ~10%), Yengo 2022 height in AFR (~20%), Trubetskoy 2022 SCZ in AFR (~30%). The pipeline still computes a Pearson r on this aggregated panel (now r = −0.99), but the prose now describes it honestly as “internally consistent literature-anchored trend, consistent with Ding 2023’s independent continuous-distance result,” not as a replication. The strong empirical claim — that PGS accuracy collapses across ancestry distance — rests on Ding 2023’s primary analysis, with Martin 2019 / Okbay 2022 / Yengo 2022 / Trubetskoy 2022 as independent corroboration on different cohorts and methods.

7. Connection to model cruxes

Three of the model’s five cruxes (§12) are partly tested by the pipeline:

C1 (within-family GWAS unbiased) — relied upon throughout. Consistent with within-paper agreement across Howe 2022, Okbay 2022, Kong 2018.
C2 (AM partition formula) — partly tested by H2; predictions match Border 2022 / Yengo 2018 within a few points across AM-strong traits.
C5 (equicorrelated Σ as useful approximation) — partly tested by H4; equicorrelated undershoots disattenuated D by 2.6× for the 16PF panel. Crux holds pedagogically but not quantitatively at high n — same caveat the model already flags.

Cruxes C3 (hyperpolygenic architecture) and C4 (joint identifiability of A_d/A_i/A_LD) are not tested by the pipeline.

8. Connections to other work

To the model dashboard (/ai-research/human-psych-variation/model). The dashboard’s default parameters were set by the model formalization’s priors. The data stage’s anchors bear on the spousal-correlation defaults as follows:

- Cognitive: m=0.40, keep (Horwitz IQ=0.44 confirms).
- Personality: m=0.15, keep (Horwitz neuroticism=0.11 is close).
- Psychopathology: m=0.20, raised to 0.30 as a midpoint default, since AM-strong SCZ/ADHD/autism sit near 0.45 and AM-weak bipolar/MDD near 0.15.

The Wilson logistic parameters in the dashboard already match the data-stage fit (h²_∞=0.80 vs. fitted 0.81, t_50=9 exact, k=0.30 vs. 0.27). The dashboard can either adopt the fitted values or note the small discrepancy explicitly.

To the planned parent-to-child transmission topic. The V(A_i) data here directly feeds that topic. Howe 2022’s within-sibship analysis is the canonical empirical anchor for indirect genetic effects across the seven traits the model singles out (height, EA, age at first birth, number of children, cognitive ability, depressive symptoms, smoking). The Kong 2018 non-transmitted-PGS finding (29.9% of transmitted for EA) and the Okbay 2022 EA4 within-family attenuation (~50% of population PGI) are the two anchor numbers the parent-to-child topic should adopt as starting input.

To the planned evolution-modernity-mismatch topic. The Wilson curve fit here is the developmental-age analogue of generation-scale changes the mismatch topic will need to address. Pietschnig 2024’s finding that the positive manifold itself may be weakening across recent cohorts implies μ(t) is not a one-dimensional trajectory but a moving structure of which abilities are gaining or losing. The data stage’s logistic captures developmental motion within a single cohort; the mismatch topic will need to extend it to cross-cohort drift.

9. Stage-5 handoff

The Stage-5 build artifact should be a public-facing tool that:

Lets a visitor pick a trait and see the per-trait variance decomposition (twin h², SNP h², WGS h², WF h², m, V(A_LD), V(A_i), and the relevant V(E_m) exposures) in a single panel.
Surfaces the H1 mixed result honestly: within-paper Howe 2022 chart vs cross-paper alignment.
Implements the Mahalanobis-D module with the disattenuation toggle so users can see the framing trap directly.
Shows the environmental-effects table with prevalence-weighted variance-share estimates per population (this is the stage-5-specific extension — none of the existing tools do this).
Cites a source for every number with a link to the relevant paper.

Inputs are at /data/human-psych-variation/. Stage 5 can either re-run pipeline.py at site-build time or freeze findings.json as a static asset.

10. Pipeline cruxes

The model stage’s §12 listed five load-bearing assumptions of the formalization. The pipeline has its own load-bearing assumptions — places where if the assumption fails, specific findings have to be rebuilt. Five matter most.

Crux	Load-bearing claim	What would flip it
D1	The published estimates I’m citing are correctly extracted from primary sources. About 12 of the highest-uncertainty values were web-verified directly from the cited paper or a PubMed Central mirror; the remainder rely on training-time recall plus the cited paper’s existence.	A spot-check of the curated CSV against the supplementary tables of any individual paper finds a meaningful discrepancy (>1 SE on the cited estimate). Most of the H2/H3/H6 verdicts would shift correspondingly.
D2	Twin h² is a usable proxy for h²_observed in the AM partition. The Crow–Felsenstein formula `V(A_LD) = m·h²` assumes h² is the AM-equilibrium quantity; twin h² is the closest readily-available estimate.	A demonstration that twin h² systematically over- or under-estimates the AM-equilibrium h² for the trait class (e.g., if classical ACE leakage from V(A_i) into A is consistently 5+ percentage points). The H2 partition shares would all shift by a similar fraction.
D3	The equicorrelated approximation captures the qualitative multivariate-D framing trap. The pedagogical claim is “stacking weakly-correlated dimensions makes D grow with √n;” the quantitative claim at high-dimensional disattenuated panels is acknowledged not to hold.	A demonstration that real personality covariance matrices have block-structured Σ such that even the qualitative claim fails for the public-discourse-relevant case (16PF / Big Five). H4 would need a worked-example refit using a non-equicorrelated Σ.
D4	Cross-paper alignment of estimators (twin/SNP/WGS/WF) is structurally noisy enough that within-paper tests are required for clean inference. This is the framing for H1’s “mixed” verdict.	A within-paper study that runs all four estimators on the same sample and finds the strict ordering fails. To my knowledge no such study exists; if one publishes and the ordering breaks, H1’s “mixed-but-informative” reading collapses to “wrong.”
D5	Per-ancestry PGS-portability anchors from Martin 2019 / Okbay 2022 / Yengo 2022 / Trubetskoy 2022 are concordant with Ding 2023’s continuous-distance result. Without individual-level data we cannot compute the continuous-distance slope ourselves; we are taking concordance on faith.	A reanalysis of the cited papers’ public summary statistics that finds substantially different per-ancestry decay rates than the headline reports. H5’s “consistent with Ding 2023” framing would weaken to “qualitatively matches but quantitatively in dispute.”

The most consequential is D1 — every other crux assumes the underlying CSV cells are correct. The web-verification round in pass 1 reduced this risk for the dozen highest-stakes numbers; the rest is a calibrated bet on training-time recall and would benefit from a future pass that audits each cell against its primary source.


Iteration history

Pass 6 2026-04-28

gap scan

Why Pass 5 closed by saying "diminishing returns; ready for Stage 5 unless a substantively new gap shows up." On a final reread looking for that, one substantive gap surfaced: the model formalization explicitly names V(I) (G×E + G×G + G×age interaction terms) in its generating function, with the falsifiable claim "generally small at PGS-by-environment scale; large only at extreme environmental insults." All five prior passes tested A_d, A_i, A_LD, E_m — but never tested V(I). Tucker-Drob & Bates 2016's meta-analysis of the Scarr-Rowe interaction is the canonical empirical test, and its cross-national heterogeneity (significant in US, null in non-US) is precisely what the model's "extreme-environment-threshold" claim predicts.
- Added gxe_interactions.csv with web-verified Tucker-Drob & Bates 2016 numbers: US a'=0.074 (SE 0.020, p < .0005), non-US a'=-0.027 (SE 0.022, n.s.), pooled a'=0.029 (SE 0.019, n.s.). Plus Turkheimer 2003 anchor (h²=0.10 at low-SES → h²=0.72 at high-SES) and Spengler 2018 German null replication. 7 rows total
- Added H8 prediction to pipeline.py + headlines: "V(I) is small at PGS-by-environment scale, larger when environmental variance is wide enough to include extreme tails." Verdict: "supported_conditional" — the cross-national heterogeneity itself is the empirical confirmation that V(I) magnitude depends on environmental-variance breadth
- Added 8th tab to PsychVariationData.tsx ("G×E interaction"): meta-analytic forest-plot-style visualization of the Tucker-Drob & Bates 2016 a' coefficients (US, non-US, pooled, German replication) with 95% CI bands; below it, a bar chart of Turkheimer 2003's h² at low-SES vs high-SES showing the original observation
- Updated TLDR (now 8 predictions: 7 hold cleanly, 1 mixed-but-informative). Added V(I) and Scarr-Rowe to the glossary. Added §2 H8 section parallel to H1-H7 structure. Added two H8 headline numbers (Tucker-Drob & Bates 2016 US a'=0.074, non-US a'=-0.027) to §3
- After 6 passes the data stage tests every model-named variance component except E_s (residual stochastic noise, not testable by construction) and μ(t) (population-mean trajectory, partly captured by H3's Wilson curve). The eight-prediction structure (H1-H8) maps cleanly onto the formalization's seven decomposition terms plus the cross-trait xAM extension
Pass 5 2026-04-28

error check (cross-stage), housekeeping, cell labeling

Why Pass 4's psychiatric-m correction created an internal inconsistency between Stage 3 (model) and Stage 4 (data): the model dashboard at /ai-research/human-psych-variation/model still had psychopathology m_default = 0.20, but the data stage now reports SCZ/ADHD/autism m = 0.45 and BIP/MDD m = 0.15-0.18. Visitors moving between stages would see contradictory numbers. Plus the working-draft data.md was two passes out of sync with data.mdx, and several CSV cells flagged simply as "assumed" had opaque labels that didn't convey what the assumption was.
- Cross-stage fix: bumped PsychVariationModel.tsx psychopathology m_default from 0.20 to 0.30 (midpoint of the heterogeneous AM landscape: AM-strong SCZ/ADHD/ASD ≈ 0.45 vs. AM-weak BIP/MDD/anxiety ≈ 0.15) with an inline comment in the trait-defaults block explaining that users testing AM-strong psychiatric should slide m to ~0.45 and AM-weak to ~0.15
- Synced stage_outputs/human-psych-variation/data.md TLDR to the post-pass-4 numbers (SCZ V(A_LD) 36%, ADHD 33%, autism 36% — was reporting the pre-correction 24% across all three)
- Improved opaque "assumed" CSV cell labels with assumption-type-explicit names: "assumed" → "assumed_no_WF_GWAS_at_scale" for psychiatric β_i/β_d (no published within-family GWAS for SCZ/BIP/ADHD/autism at the Howe-2022 scale), "extrapolated" → "extrapolated_from_EA_WF_and_EA_IQ_rg" with the actual extrapolation arithmetic shown in the notes column, "Horwitz_2023_imputed" → "m_imputed_no_meta_analytic_value" for risk_tolerance (not in Horwitz's 22-trait panel), "Horwitz_2023_avg" → "Horwitz_2023_5_factor_avg" for big_five m
- After 5 passes the data stage is reaching diminishing returns on this kind of refinement. The remaining "open" items (cell-by-cell audit of the 18×20 CSV, individual-level Ding 2023 replication, full Border γ̂ verification across alternative AM dynamics models) require capabilities outside a content-site pipeline. Stage is ready for handoff to Stage 5 (build) unless a future pass surfaces a substantively new gap.
Pass 4 2026-04-28

error check, crux follow-through, readability

Why Pass 3 named D1 (cell-level extraction correctness) as the most consequential pipeline crux but never actually audited the suspicious cells. Spot-checking the four psychiatric m values cited as "Nordsletten_2016_imputed" surfaced a real correction: Nordsletten 2016 reports tetrachoric spousal correlations greater than 0.40 for schizophrenia, ADHD, and autism, but my CSV had m=0.30 for all three. Also: the H1 panel's visualization was visually weak (4 stacked 6%-opacity bars + 1px ticks; readers couldn't see the bars and ended up reading only the numbers).
- Web-verified Nordsletten 2016 (JAMA Psychiatry, N≈707k Swedish population register) per disorder: SCZ tetrachoric >0.40, ADHD >0.40, autism >0.40, affective disorders 0.14–0.19, substance abuse 0.36–0.39
- Corrected heritability_estimates.csv: schizophrenia m 0.30→0.45, ADHD m 0.30→0.45, autism m 0.30→0.45, bipolar m 0.20→0.18 (within Nordsletten range), MDD unchanged at 0.15 (Horwitz 2023 verified). Source labels updated from "Nordsletten_2016_imputed" to "Nordsletten_2016" with the per-disorder note in the cell
- Knock-on H2 numbers: SCZ V(A_LD) share rises from 24% to 36% of h²; ADHD from 22% to 33%; autism from 24% to 36%. The substantively new reading: ~one-third of the additive genetic variance for severe psychiatric conditions is structural AM-induced LD rather than independent direct biological signal
- Added pass-4 caveat block to §2 H2 explaining the correction and noting that for binary-diagnosis traits, "spousal correlation" is a tetrachoric across diagnosis status — same equilibrium logic, lower quantitative precision
- Rewrote the H1 panel visualization as SVG-per-row: 4 colored circles (different sizes per estimator: twin = largest, WF = smallest) at the actual h² values along a 0–1 axis, with a faint grey bar spanning min(observed) to max(observed) showing cross-paper noise width. Sienna marker at trait label = predicted ordering holds for that trait, muted marker = fails (informative pattern). Drops the unreadable 6%-opacity bars and 1px ticks of pass 1–3
- Updated React component AM_PARTITION constants for SCZ/BIP/ADHD/autism to match the corrected CSV
- The PRD topic registry still read "data (pass 1)" even though the stage had advanced through the intervening passes; corrected to "data (pass 4)" along with a decisions-log entry
Pass 3 2026-04-28

error check, compression, readability, crux identification

Why Three things still flagged on a careful pass-2 reread. (a) H5 was circular: I curated 13 portability rows based on rough estimates of Ding 2023's pattern, then "replicated" Ding's r=−0.95 with my own r=−0.98. The objection was acknowledged in pass 2 but not actually fixed. (b) The TLDR opened with "The model formalization (Stage 3)" — opaque to an educated lay reader landing on the page directly — and used jargon (Crow–Felsenstein, γ̂, AM-LD, LDSC) without first-mention definition. (c) No proper crux section like the model stage's §12. Plus some duplicate prose between TLDR / §2 / §3 worth trimming.
- H5 reframed honestly: pgs_portability.csv now uses Martin 2019 categorical-ancestry anchors (37%/50%/78% accuracy reduction in SAS/EAS/AFR vs EUR) plus per-trait anchors from Okbay 2022 (EA), Yengo 2022 (height), Trubetskoy 2022 (SCZ). The pipeline's computed slope is now described as "internally consistent literature-anchored trend, consistent with Ding 2023's independent continuous-distance r=−0.95," not a replication of Ding 2023
- TLDR rewritten for educated-lay readability: 3 paragraphs (was 4), opens with plain-language framing of what the data stage does, defines technical terms inline on first mention, drops Stage 3 reference from para 1
- Added §10 Pipeline cruxes — 5 load-bearing assumptions whose failure would invalidate findings, with what evidence would flip each. Mirrors model stage §12 structure
- Compressed §3 headline numbers (kept as one-stop reference table but trimmed duplicates with §2 result subsections)
- Added a brief glossary subsection right after TLDR ("A few terms") that defines the model-imported jargon in plain language for readers entering at the data page
- Pruned §4 vs §6 overlap: AM-equilibrium caveat now lives only in §4; the "small CSV scale" objection in §6 references §4 rather than restating it
Pass 2 2026-04-28

gap scan · error check · adversarial + steelman · connections · scope check

Why: Four real holes in pass 1. (a) Gap: the model formalization names V(E_m), measured non-shared environment, explicitly, but the pipeline had zero concrete environmental-effect numbers. The exposure side of how and why people differ was missing entirely. (b) Error: the H1 verdict counted rows with only one estimator as "holds," producing the misleading "0/6" headline; the γ̂ wording in H6 conflated "consistent with xAM accounting for X%" with "X% caused by xAM." (c) Adversarial defenses for the strongest objections weren't engaged head-on. (d) The curated CSVs were gitignored (in stage_outputs/), which broke the Stage-5 handoff and made the audit trail invisible to visitors.
- Added environmental_effects.csv: 10 exposures with effect sizes, CIs, design quality, source. Lead 1→10 µg/dL: -6.2 IQ pts (Lanphear 2005); schooling per year: +3.4 IQ pts (Ritchie & Tucker-Drob 2018); FAS: -30 pts (Streissguth 2004); severe deprivation: -15 pts (Nelson 2007 BEIP); plus iodine, PM2.5, breastfeeding, malnutrition, parenting-within-normal
- Added H7 prediction and panel to data.mdx + 7th tab to PsychVariationData.tsx — "Environmental causes (V(E_m) bucket)"
- Fixed H1 verdict counting: rows with <2 estimators are now "untestable" rather than counted as "holds." New verdict: 9/15 traits hold, 6 fail; the failure pattern is informative — all 6 are 3-estimator rows where SNP h² < within-family h² (LDSC misses rare variants the within-family design captures)
- Sharpened γ̂ interpretation in H6: γ̂ is the ratio of xAM-alone-implied rg to empirical rg, so γ̂≈1 is *consistent with* xAM accounting for the full correlation but does not prove it (alternate causal architectures could produce the same ratio); only γ̂ values bounded away from 1 require additional shared biology
- Added §6 Adversarial + steelman with four objections (variance bookkeeping vs. new analysis, small CSV scale, Border 2022 contestation, hand-coded portability data) and the model's honest response to each
- Promoted the curated CSVs to public/data/human-psych-variation/ — tracked in git, downloadable from /data/human-psych-variation/<file>.csv on the live site, available for Stage 5 to consume directly. Also surfaced findings.json there
- Added explicit connections: the model dashboard's default parameters should be updated from the pipeline's anchors (m, β_i/β_d, h²); the parent-to-child transmission topic should adopt the V(A_i) data here as starting input; the evolution-modernity-mismatch topic should adopt the Wilson curve fit and consider how μ(t) shifts move it
- ARCHITECTURE notes the public/data/<topic>/ convention for tracked data-stage CSVs; PRD decisions log entry for pass 2
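The pass-2 H1 counting fix can be sketched in a few lines. This is a minimal sketch, not pipeline.py itself: the estimator names and their order in `ESTIMATOR_ORDER` are assumptions (the page says only that the failures are rows where SNP h² falls below within-family h²), and the example rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Assumed column names and predicted ordering for the H1 method gradient
# (largest expected estimate first). Hypothetical, not taken from pipeline.py.
ESTIMATOR_ORDER = ("twin", "wgs", "snp", "within_family")


def h1_verdict(row: Dict[str, Optional[float]]) -> str:
    """Return 'untestable', 'holds', or 'fails' for one trait row."""
    present = [row[name] for name in ESTIMATOR_ORDER if row.get(name) is not None]
    # Pass-2 fix: one estimator cannot test a gradient, so the row is
    # untestable instead of being counted as a vacuous 'holds'.
    if len(present) < 2:
        return "untestable"
    # The gradient holds when the available estimates never increase along the order.
    if all(a >= b for a, b in zip(present, present[1:])):
        return "holds"
    return "fails"


# Hypothetical rows for illustration only.
rows = {
    "trait_a": {"twin": 0.80},
    "trait_b": {"twin": 0.80, "snp": 0.25, "within_family": 0.20},
    "trait_c": {"twin": 0.60, "snp": 0.15, "within_family": 0.30},
}
print(Counter(h1_verdict(r) for r in rows.values()))
```

Keeping "untestable" as its own category means the hold/fail tally is taken only over rows that can actually test the gradient.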
Pass 1 2026-04-28

decomposition · integration · gap scan · connections

Why: First draft of the data pipeline. Took the six closed-form predictions from the model formalization and built a curated CSV + Python pipeline that tests each one against currently published consortium estimates. Web-verified anchor numbers from the highest-uncertainty papers (Howe 2022, Okbay 2022 EA4, Kong 2018, Border 2022, Horwitz 2023, Wainschtein 2022/2025, Ding 2023, Del Giudice 2012, Yengo 2022, Polderman 2015) directly from primary-source URLs.
- Built the curated input CSV (heritability_estimates.csv): 18 traits × 20 columns covering twin/SNP/WGS/within-family h², spousal correlation m, β_i/β_d, PGS R², with per-cell source citations
- Built four secondary CSVs: wilson_curve_cognition.csv (9 ages from Bouchard 2013), sex_differences_panel.csv (7 panels Hyde-Su-Schmitt-DelGiudice-Kaiser-Ritchie), pgs_portability.csv (13 ancestry × trait rows from Ding 2023), sources.csv (23 papers with full citations)
- Wrote the Python pipeline (pipeline.py): loads CSVs, computes AM partition (r_δ = m·h²), fits Wilson logistic (h²_∞=0.81, t_50=9.0, k=0.27, max residual 1.8 pp), computes equicorrelated D for each panel, replicates Ding 2023 PGS portability slope (Pearson r=-0.98 vs reported -0.95)
- Six headline predictions tested with verdicts: H1 method gradient (mixed), H2 AM partition (supported), H3 Wilson logistic (supported), H4 equicorrelated D vs disattenuated (supported with caveat), H5 PGS portability (supported), H6 xAM inflation (supported)
- Built the React findings panel (PsychVariationData.tsx): six tabs, one per prediction, charts hand-rolled in SVG to match V4 design tokens
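Pass 1 computes an "equicorrelated D" for each sex-differences panel, but the page does not spell out the formula. As a minimal sketch, assuming it means a Mahalanobis D over k standardized mean differences whose correlation matrix R has 1 on the diagonal and one shared correlation ρ everywhere else, R has a closed-form inverse and no matrix library is needed. The function name and example inputs are hypothetical.

```python
import math
from typing import Sequence


def equicorrelated_d(d: Sequence[float], rho: float) -> float:
    """Mahalanobis D for standardized differences d under an equicorrelated R.

    R = (1 - rho) * I + rho * J, with J the all-ones matrix, so
    R^-1 = (I - rho / (1 + (k - 1) * rho) * J) / (1 - rho).
    """
    k = len(d)
    # R is positive definite only for -1/(k-1) < rho < 1.
    lower = -1.0 / (k - 1) if k > 1 else -math.inf
    if not lower < rho < 1.0:
        raise ValueError("rho must keep R positive definite")
    sum_sq = sum(x * x for x in d)
    total = sum(d)
    d_squared = (sum_sq - rho * total * total / (1 + (k - 1) * rho)) / (1 - rho)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(d_squared, 0.0))


# Five hypothetical dimensions, each with d = 0.5.
print(equicorrelated_d([0.5] * 5, 0.0))
print(equicorrelated_d([0.5] * 5, 0.25))
```

With ρ = 0 this reduces to the Euclidean length of d (about 1.12 here); a positive shared correlation among same-signed differences shrinks D (about 0.79 at ρ = 0.25), because correlated dimensions carry partly redundant information.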
Pass 7 2026-04-29

error check · cross-stage consistency

Why: A reviewer caught a wrong-direction error in the H2 framing that propagated from the model formalization. The H2 claim "V(A_LD) = m·h² with the AM equilibrium reached" is mathematically a valid Crow-Felsenstein decomposition of V(A) at AM equilibrium, and the empirical match against Yengo 2018's 14–23% V(A_LD)/V(A) for height confirms it at the population level. But the framing surrounding the result implied that this V(A_LD) share explains the gap between twin h² and within-family h² for socially-structured traits, which is wrong: Falconer's twin formula is itself biased DOWNWARD by AM (under positive AM, DZ twins share more than 50% of trait-relevant alleles, raising rDZ relative to rMZ). The empirical twin-vs-within-family gap for socially-structured traits is dominated by genetic nurture and equal-environments-assumption (EEA) violations, partially OFFSET by AM's downward Falconer bias. The H2 prediction tests a real and valid population-level partition; what was wrong was its labeling.
- Added a reviewer-correction paragraph at the end of the H2 caveats explaining: (a) Falconer's downward AM bias; (b) the empirical twin-vs-within-family gap is dominated by genetic nurture + EEA, not AM; (c) the V(A_LD) percentages should be read as population-level V(A_LD)/V(A_AM) shares, not as "fraction of twin h² explained by AM"; (d) the cross-trait Border 2022 result (H6) is independent of this issue and stands as reported
- Did not revise the H2 verdict ("supported"): the prediction `V(A_LD) = m·h²` IS supported as a population-level Crow-Felsenstein partition, which is what the formula is. The Yengo 2018 empirical match is the real validation. What needed correction was the framing around the result, not the result itself
- Did not revise the per-trait V(A_LD) percentages: those are formula outputs and remain mathematically correct as population-level shares. Their interpretation now lives in the corrected paragraph
- Cross-stage sync: the model formalization stage was simultaneously updated (model pass 6) with parallel clarifying notes in §2.2 and §3.1 about Falconer's AM bias and what h²_obs represents in the partition formula
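The direction of AM's bias on Falconer's formula is easy to see numerically. A minimal sketch with made-up correlations (not pipeline values), holding rMZ fixed to isolate the effect of the higher DZ resemblance that positive AM produces:

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's classical twin estimate of heritability: 2 * (rMZ - rDZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations for illustration only.
r_mz = 0.80
r_dz_random_mating = 0.40  # DZ twins share ~50% of segregating alleles
r_dz_with_am = 0.45        # positive AM pushes DZ genetic sharing above 50%

print(round(falconer_h2(r_mz, r_dz_random_mating), 2))  # 0.8
print(round(falconer_h2(r_mz, r_dz_with_am), 2))        # 0.7
```

Because AM raises rDZ, the same MZ resemblance yields a smaller Falconer estimate, which is why AM partially offsets the genetic-nurture gap rather than adding to it.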
Pass 8 2026-04-29

internal consistency check

Why: Pass 7 added a clarifying note in §2 H2 caveats about the AM-direction error, but the TLDR (the most-read part of the page) still opened with "assortative mating ... inflates observed heritability by ~20% for height, ~22% for educational attainment, ~36% for schizophrenia, ~33% for ADHD, ~36% for autism", exactly the wrong-direction framing the reviewer caught. Pass 7 fixed the technical caveat in §2 but didn't fix the headline claim in the TLDR, which is what most readers see first.
- Rewrote the AM headline in the TLDR: was "assortative mating ... inflates observed heritability by ~20% for height, ~22% for educational attainment, ~36% for schizophrenia, ~33% for ADHD, ~36% for autism"; now "assortative mating ... creates linkage between trait-relevant alleles, contributing a Crow-Felsenstein V(A_LD)/V(A_AM) share of ~20% for height, ~22% for educational attainment, ~36% for schizophrenia, ~33% for ADHD, ~36% for autism. These percentages are population-level decompositions of V(A) at AM equilibrium — *not* 'fraction of twin h² explained by AM.' Falconer's classical twin formula is itself biased downward by AM, and for socially-structured traits the empirical gap between twin h² and within-family h² is dominated by genetic nurture and equal-environments-assumption violations, not AM-induced LD"
- The same percentages are preserved (they are correct as Crow-Felsenstein population-level partitions); only the framing is corrected
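The Crow-Felsenstein share and the build's formula fallback are each one line of arithmetic. A minimal sketch using the adult-cognition anchors quoted on the trait card (m = 0.44, twin h² = 0.79); the function names are hypothetical. The outputs are population-level shares of V(A) at AM equilibrium, not the fraction of twin h² explained by AM.

```python
def am_ld_share(m: float, h2: float) -> float:
    """Crow-Felsenstein share r_delta = m * h2: V(A_LD) as a fraction of V(A) at AM equilibrium."""
    return m * h2


def direct_fallback(m: float, h2: float) -> float:
    """Formula fallback for the direct slice when no within-family h2 exists: h2 * (1 - r_delta)."""
    return h2 * (1.0 - am_ld_share(m, h2))


# Adult cognitive ability anchors from the trait card.
print(round(am_ld_share(0.44, 0.79), 3))      # 0.348
print(round(direct_fallback(0.44, 0.79), 3))  # 0.515
```

The fallback lands near the ~0.50 within-family figure for adult cognition; where a published within-family estimate exists, the build uses that instead.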
Pass 9 2026-04-29

internal consistency check

Why: Pass 8 fixed the data MDX TLDR's wrong-direction AM framing but did not check the data findings panel React component (PsychVariationData.tsx) that ships alongside it. The H2 AM partition tab's description prose still said "AM-LD accounts for >19% of total observed h²", the same wrong-direction framing, where "observed h²" without qualification implies Falconer twin h² (which AM actually biases downward) rather than population-level V(A_AM). The findings panel is what users actually see when they click the H2 tab in the data stage.
- PsychVariationData.tsx H2 AM partition tab description rewritten: was "AM-LD accounts for >19% of total observed h²"; now "the Crow-Felsenstein partition predicts >19% V(A_LD)/V(A_AM) at the population level" with explicit caveat that AM does NOT inflate Falconer twin h² (it biases Falconer downward) and the empirical twin-vs-within-family gap for socially-structured traits is dominated by genetic nurture / EEA, with pointer to the H2 caveat in the MDX prose
- Did NOT change the H2 NumberCard hints ("V(A_LD) / h² (Yengo 2018: 14–23%)" etc.): these are correct as Crow-Felsenstein population-level partition labels; the qualifier "V(A_LD) / h²" makes the population-level scope explicit
- Did NOT change H6 cross-trait AM inflation tab: that addresses cross-disorder rg inflation (Border 2022), a separate and well-supported phenomenon

pass 6

A reader's tool for the psychology of individual differences. Pick a trait, see the three plain-language buckets (direct genes / family setup / environment + chance) instead of the V(A_d)/V(A_LD)/V(A_i) decomposition. Plus the four motivated-reasoning traps the field gets caught in, the asymmetric environmental-effects finding, three "heritability ≠ destiny" misreadings, and a seven-bullet take-away. Translates the formalization and data pipeline into something a non-specialist can actually use.

TLDR

The model formalization produced one equation per person and seven variance components. The data pipeline produced eight tested predictions and seven downloadable CSVs. Both are correct, both are useful for someone who already speaks the vocabulary, and neither does what the topic statement asked: produce something useful for someone who wants to understand how and why people differ without being captured by motivated reasoning from any direction.

This build is that translation layer. It collapses the seven-component variance decomposition into three plain-language buckets (direct genes, family setup, environment + chance), covers 24 traits across seven domains, and for each one shows the bucket breakdown, the key environmental levers (when relevant), and the two specific ways the most common political readings of that trait go wrong. It also has four secondary views: the four motivated-reasoning traps the field gets caught in (with what each side cites correctly and ignores), the asymmetric environmental-effects finding (severe insults cost 10–30 IQ points; enrichment above normal yields a few at most, which is the single most useful action-oriented insight), three “heritability ≠ destiny” misreadings with worked examples, and a seven-bullet take-away that holds up across mainstream behavior genetics in 2026.

If you want to engage with the math, the model stage has the parametric dashboard and the data stage has the prediction-by-prediction empirical tests. This page is for the reader who wants to come away knowing what to actually believe.

Pick a trait

Cognition · Personality · Wellbeing & affect · Psychiatric · Behavioral · Attitudes · Physical

Cognitive ability — adults

Why people differ in cognitive ability as adults is mostly genetic at the population level — but a sizeable chunk of what twin studies count as 'genetic' is actually the family setup parents create, not direct biological causation.

Why people differ — three buckets

Direct genes 50% · Family setup 34% · Env + chance 16%

Direct genes (50%)

The slice that's actually direct biological causation. What within-family designs (sibling-fixed-effect, MZ-discordant, parent-offspring trio GWAS) recover after stripping out parental environment and assortative-mating-induced linkage.

Family setup (34%)

Most of this bucket is genetic nurture: parents who pass on cognitive-ability variants also create environments correlated with those variants (vocabulary, books, expectations, peer-group selection). Classical twin models cannot easily separate this from direct biological causation. Within-family GWAS for cognition recovers ~0.50, substantially below twin h² of 0.79; the gap is dominated by genetic-nurture leakage. About 5% of total variance is residual shared family environment that persists into adulthood. Assortative mating (m=0.44 for IQ) does inflate population-level V(A) via LD but biases Falconer's twin formula downward, partially canceling rather than adding to the gap.

Environment + chance (16%)

Most of this small bucket is unmeasured developmental noise. Identified large levers (severe deprivation, lead, fetal alcohol syndrome) account for almost no population variance in modern Western samples because their prevalence is now low.

Severe negative levers (when present)

Prenatal alcohol (full FAS): −30 IQ pts (Streissguth 2004)
Severe deprivation (Romanian orphanages): −15 IQ pts (Nelson 2007 BEIP)
Lead, blood 1→10 µg/dL: −6.2 IQ pts (Lanphear 2005)

Positive levers

Schooling, per year: +1 to +5 IQ pts (Ritchie & Tucker-Drob 2018)
Within-Western-normal parenting: ~0 to +1 IQ pts (Plomin & Daniels 1987)

What environmentalist readings get wrong here

'Heritability is just a methodological artifact' is not what the evidence shows. SNP-based heritability bypasses twin-design assumptions and still recovers a substantial, clearly non-zero share of twin h²; adoption studies converge on similar numbers to the twin estimates. The signal is real. But citing 0.79 as if it means 'genes determine 79% of cognitive ability' confuses a population-variance ratio with an individual partition. Both moves drop information.

What hereditarian readings get wrong here

Citing 0.79 to argue 'environment doesn't matter much for cognition' ignores that ~37% of the 'genetic' bucket disappears when you switch to within-family designs. The direct-biological component is closer to 0.50, and the gap to twin h² is dominated by genetic nurture and equal-environments-assumption violations rather than direct biological causation.

Take away

About half of why adults differ in cognitive ability is direct genetic effect; another ~35% is family setup, mostly the environment that parents who share their children's variants create around them; ~15% is everything else. The interesting policy levers are at the tails (preventing severe insults like lead, malnutrition, fetal alcohol, and severe deprivation), not at the middle (parenting style within Western normal).

Primary sources

How to use this

The default view is trait lookup. Pick a trait — adult cognitive ability, schizophrenia, height, political orientation — and see the three-bucket breakdown plus the trait-specific traps and take-away. Most readers should start there, then move through the four secondary views in order.

A few framing notes:

The three buckets are not orthogonal categories of cause. They are three plain-language groupings of the seven model-formalization variance terms (A_d, A_LD, A_i, C, E_m, E_s, I). Direct genes is the within-family direct-effect slice, the part that is unambiguously direct biological causation. Family setup combines AM-induced LD, genetic nurture, and residual shared environment: in practice, the share of twin h² left over after the within-family direct estimate, plus the shared environment twin studies report separately. None of it is direct biological causation. Environment + chance combines measured non-shared environment and stochastic developmental noise. The split is pedagogical; the model shows the underlying seven-term decomposition.

The four-traps view is opinionated in a way the other views are not. The label “trap” assumes that motivated reasoning is what produces these positions, which is not entirely fair — most people cite the evidence they have seen and have not personally vetted what they have not seen. The integrated reading at the bottom of each trap card is the closest the artifact comes to a normative claim about how the field should be read, and it is not algorithmically derivable from the data alone. If you disagree with one of the integrated readings, the topology stage has the underlying graph.

The asymmetry finding is the most action-relevant single insight in the topic. If you only take one thing away from this work, take that one — the population-level cognitive levers run almost entirely through preventing severe insults, not through optimizing within normal. Most parental anxiety and policy expenditure on enrichment is misallocated relative to where the empirical effect sizes are.

The seven take-aways are calibrated to be the things a behavior-geneticist in 2026 would actually defend in a public talk. Finer-grained claims (specific magnitudes per trait, mechanism per finding, what polygenic scores measure causally) sit downstream of these and are more contested.

What this is not

It is not a prediction tool. There is no model that takes your demographics, your parents’ phenotypes, or your DNA and outputs a predicted trait value. The science does not currently support that for psychological traits, and the data stage shows why: polygenic scores trained on European-ancestry data lose about 37%, 50%, and 78% of their accuracy in South Asian, East Asian, and African ancestries respectively (Martin 2019), and within-family direct effects are often about half or less of population-level prediction.

It is not policy advice. The asymmetry finding has clear implications for cognitive intervention (lead remediation has higher effect-per-dollar than enrichment programs), but turning empirical asymmetries into policy involves trade-offs the science does not adjudicate.

It is not a complete picture. Three open questions named in the model stage (the Plomin/Turkheimer dispute about what polygenic scores measure, the mechanism behind the Gender Equality Paradox, the magnitude of assortative-mating correction across the full psychiatric cross-disorder rg matrix) are not answered here because the field has not answered them. The honest reading is “we don’t know yet”; the artifact does not pretend otherwise.

Connection to the rest of the pipeline

The trait-lookup numbers are computed directly from public/data/human-psych-variation/heritability_estimates.csv (the Stage-4 input) using the Method C partition from build pass 2: direct genes is the within-family h² where one is published or extrapolated, with the formula fallback h²·(1 − m·h²) otherwise; family setup is h²_obs − direct + V(C); environment + chance is 1 − h²_obs − V(C). V(A_i) is not added on top of h²_obs, since under a correctly specified ACE model it already lands inside V(C) (model formalization §2.1/§2.2). The asymmetry view’s exposure list comes from environmental_effects.csv (the H7 input). The four-traps view materializes the topology stage’s Variant D distortion-to-target matrix (D1–D4) into reader-facing cards.
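The Method C bucket arithmetic described in build pass 2 fits in one short function. A minimal sketch, assuming hypothetical field names (the real CSV columns are not listed on this page) and using the adult-cognition numbers from the trait card (twin h² 0.79, within-family ~0.50, residual shared environment ~0.05):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TraitRow:
    """Hypothetical subset of heritability_estimates.csv (column names assumed)."""
    h2_obs: float                      # observed (twin) heritability
    c2_adult: float                    # residual shared environment, V(C)
    m: float                           # spousal correlation
    h2_within: Optional[float] = None  # within-family h2, if published or extrapolated


def method_c_buckets(row: TraitRow) -> Dict[str, float]:
    """Split V(P) = 1 into direct / family / env buckets per Method C."""
    if row.h2_within is not None:
        direct = row.h2_within
    else:
        # Formula fallback: h2 * (1 - m * h2).
        direct = row.h2_obs * (1.0 - row.m * row.h2_obs)
    family = row.h2_obs - direct + row.c2_adult
    env = 1.0 - row.h2_obs - row.c2_adult
    return {"direct": direct, "family": family, "env": env}


iq_adult = TraitRow(h2_obs=0.79, c2_adult=0.05, m=0.44, h2_within=0.50)
print({k: round(v, 2) for k, v in method_c_buckets(iq_adult).items()})
# {'direct': 0.5, 'family': 0.34, 'env': 0.16}
```

By construction the three buckets sum to 1, and V(A_i) is never added on top of h²_obs, so the pass-1 double count cannot reappear.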

A future stretch would promote some of this to /dashboards/human-psych-variation as a public dashboard that lets the visitor enter their own per-trait estimates and see the buckets recompute. That is one of the planned dashboard slots in the site PRD but is out of scope for the first build.


Iteration history

Pass 1 2026-04-28

decomposition · translation · integration

Why: First draft of the build artifact. The model stage produced a parametric variance-decomposition dashboard with V(A_d), V(A_LD), V(A_i), V(C), V(E_m), V(E_s), V(I) sliders; the data stage produced an eight-tab findings panel with hand-rolled SVG charts of the H1–H8 prediction tests. Both are genuinely useful for the reader who already knows the vocabulary, but neither delivers what the topic statement actually asked for: "useful for someone who wants to understand how and why people differ" with "implications for understanding the world or for action." This build draws straight lines from the formalization+data into a reader's tool that translates the seven decomposition terms into three plain-language buckets, surfaces the four directions of motivated reasoning explicitly, and ends with seven take-aways that hold up across the field.
- Built PsychVariationExplorer.tsx with five views: trait lookup (default), four-traps map, the asymmetry, heritability ≠ destiny, take-aways. ~720 lines, V4 design tokens, no new chart libraries (only the existing tailwind+SVG palette)
- Trait lookup translates the seven model-formalization variance components into three plain-language buckets — Direct genes (V(A_d)), Family setup (V(A_LD) + V(A_i) + V(C)), Environment + chance (V(E_m) + V(E_s) + V(I)). For each of 10 traits (cognition adult/child, EA, Big Five, schizophrenia, MDD, ADHD, autism, height, political orientation), the bucket numbers are computed from the Stage-4 CSV (h²_observed, V(A_LD)/h² share, β_i/β_d, c²_adult)
- Each trait has: one-sentence plain-language framing, breakdown of what is in the family-setup bucket (AM-LD vs. nurture vs. shared env), key environmental levers (insults / enrichments) where relevant, two trap callouts (one from each direction), one take-away, primary sources with paper URLs
- Four-traps view materializes the topology's D1–D4 distortion vectors as full cards: what the position cites correctly, what it ignores, integrated reading. The "integrated reading" is what survives when both the cited and the ignored evidence are added back. This is the closest the artifact comes to the topic statement's "minefield of motivated reasoning on all sides" framing
- The Asymmetry view is the H7 environmental-effects finding (severe insults cost 10–30 IQ pts; enrichment yields 1–5 pts) rendered as a single forest-plot-style chart with worked implications for parents and policy. This is the single most useful action-oriented insight in the whole topic
- Heritability ≠ Destiny view formalizes the three most common public-discourse misreadings: (a) population variance treated as individual partition (L1 firewall), (b) heritable treated as fixed (height +10 cm in a century at h² = 0.85), (c) within-population h² applied to between-population means (L4 / Lewontin firewall). Each gets a worked example
- Take-aways view is seven bullets that hold up across mainstream behavior genetics in 2026. Footer links back to the model and data stages
- Updated PRD topic registry from "data (pass 6)" to "build (pass 1)"; added decisions log entry
- Did not duplicate the model dashboard or data findings panel — those remain the technical/parametric layer; this build is the lay translation layer. A reader can move from the explorer into the model dashboard if they want to see what happens when the seven model parameters move
Pass 2 2026-04-29

error check · gap scan · adversarial + steelman

Why: On a careful re-derivation, I caught a real computational error in pass 1: I was adding V(A_i) on top of h²_observed in the variance budget, when the model formalization §2.1/§2.2 says V(A_i) under correctly-specified ACE lands inside V(C) (because it's shared identically by MZ and DZ co-twins). This double-counted V(A_i) and most visibly under-counted the environment+chance bucket for adult IQ (0.08, way too low; it should be ~0.16). Separately, for traits with measured within-family h² (educational attainment and height), I was using the formula V(A_d) = h²·(1−r_δ) instead of the published WF figure, even though the model's H1 result says WF is the canonical direct estimate. This made height's "Direct genes" bucket 0.68 when it should be 0.78 (Howe 2022 within-sibship, N=178k). Finally, the user asked for substantially more traits, particularly emotional and social ones, because the bucket-breakdown pattern is what makes the artifact actually useful and showing it across more traits surfaces more of the cross-trait variation the field has documented.
- Fixed variance bucket computation via Method C: direct = within-family h² (when published or extrapolated) or h²·(1−m·h²) formula fallback; family = h²_obs − direct + V(C); env = 1 − h²_obs − V(C). This avoids the double-count of V(A_i) and matches the model formalization's pass-5 reading
- Most-changed numbers: adult IQ env 0.08 → 0.16; height direct 0.68 → 0.78 (using Howe 2022 measured WF) and family 0.17 → 0.07 (Howe within-sibship picks up rare-variant transmission, smaller structural inflation than formula predicts); minor adjustments elsewhere
- Expanded trait list from 10 to 24 across 7 domains (was 5): added 5 individual Big Five subdimensions (O, C, E, A, N) plus self-control and empathy in personality; subjective wellbeing and anxiety in a new "wellbeing & affect" domain; bipolar in psychiatric; smoking initiation and risk tolerance in a new "behavioral" domain; religiosity in attitudes; BMI in physical
- Updated take-away #3 from "1/3 to 1/2 of genetic effect is structural" to "8% to 60%+ depending on trait" with explicit examples (height: 8%; adult IQ: 37%; EA: 63%) — the original range was wrong (too narrow) and the trait-by-trait variation is itself part of the take-away
- Strengthened D1 and D2 trap-card "what it cites correctly" lists with sophisticated steelmen — D1 now includes population stratification concerns and instrument-cultural-validity; D2 now includes within-homogeneous-sample PGS prediction signal and cross-cultural mean-difference patterns. The integrated readings already addressed these but the cited-correctly columns made the steelmen stronger
- Added link from take-aways view to the writeup stage; updated link list to be all four downstream stages (model, data, build, writeup)
- Audit table in stage_outputs/human-psych-variation/build.md updated with all 24 traits and Method C numbers
Pass 3 2026-04-29

internal consistency check · cross-stage sync

Why: On a careful pass over the explorer with the writeup at hand, I caught four places where the explorer's prose carried forward old imprecise numbers that I had already corrected in the writeup at pass 2 (smoking 80% should be 70% over sixty years; Flynn Effect "+30 points" should have cohort scope; PGS portability "30-80%" should be the precise Martin 2019 numbers). Plus the writeup at pass 3 added a Wilson Effect section as a new big idea, but the explorer's take-away 5 still framed "high heritability is fully compatible with large environmental shifts" only as cohort-level shifts without including the within-life-course developmental version. These are cross-stage internal-consistency issues that an academic reader would catch immediately.
- Take-away 5 retitled and rewritten: "Heritability is contextual: it shifts across the life course and across cohorts." Body now covers both the Wilson Effect (cognitive-ability h² rising from ~0.20 at age 5 to ~0.80 in adulthood as imposed environments give way to self-selected ones) and the cross-cohort shifts (height +10 cm, IQ +25-30 points across mid-20th-century cohorts, smoking US prevalence falling from ~42% to ~12% over sixty years)
- Take-away 7 (Lewontin firewall): replaced "polygenic scores lose 30-80% of their accuracy across ancestries" with the precise Martin 2019 numbers (37% / 50% / 78% in South Asian / East Asian / African ancestries) plus the Ding 2023 r = -0.95 continuous-distance result
- Heritability ≠ destiny "Misreading 2" example: replaced the imprecise "Flynn Effect raised IQ ~30 points over the 20th century in most populations" with the cohort-scoped "+25-30 points across mid-20th-century cohorts in most measured populations (Pietschnig & Voracek 2015 meta), with plateaus and partial reversals in some countries from the 1990s onward"
- "Misreading 3" within-population claim: replaced "PGS lose 30-80% accuracy" with the precise Martin 2019 ancestry breakdown
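Take-away 5's developmental numbers can be checked against the Wilson logistic fitted in data pass 1 (h²_∞ = 0.81, t_50 = 9.0, k = 0.27). A minimal sketch, assuming the fit is a three-parameter logistic in age (years) with a lower asymptote of zero; the page does not state the exact functional form, so treat this as an approximation:

```python
import math


def wilson_h2(age: float, h2_inf: float = 0.81, t50: float = 9.0, k: float = 0.27) -> float:
    """Assumed Wilson-effect logistic: h2(t) = h2_inf / (1 + exp(-k * (t - t50)))."""
    return h2_inf / (1.0 + math.exp(-k * (age - t50)))


for age in (5, 9, 18, 30):
    print(age, round(wilson_h2(age), 3))
```

Under that assumption the curve gives about 0.21 at age 5, exactly half of h²_∞ (0.405) at age 9, and about 0.81 by age 30, in line with take-away 5's ~0.20 in early childhood and ~0.80 in adulthood.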
Pass 4 2026-04-29

error check · truth/accuracy override on bias · cross-stage sync

Why: A reviewer pointed out a real and consequential error that had propagated through the explorer (and the writeup): I attributed the structural inflation of twin-based heritability (the gap between twin h² and within-family h² for socially-structured traits) partly to assortative mating creating linkage between trait-relevant alleles. The reviewer correctly noted that AM biases Falconer's twin formula 2(rMZ - rDZ) DOWNWARD, not upward: under positive AM, parents are genetically more similar than chance, so DZ twins share more than 50% of trait-relevant alleles, raising rDZ relative to rMZ and shrinking the formula's output. The dominant source of the twin-vs-within-family gap for socially-structured traits is genetic nurture (parents passing on alleles AND correlated rearing environments), with direct empirical anchors in Kong 2018 (non-transmitted PGS effect = 29.9% of transmitted) and Okbay 2022 (within-family direct ~50% of population PGI). AM is a real population-level phenomenon (Crow-Felsenstein LD inflation; Yengo 2018: 14-23% V(A) inflation for height), but its effect on twin estimates runs OPPOSITE to genetic nurture's, so the two partially cancel. The error propagated through take-away 3 and through most trait familyNote entries that mentioned AM as the source of structural inflation.
- Take-away 3 body rewritten to lead with genetic nurture as the dominant source of twin-h² inflation for socially-structured traits, with AM clarified as a real population-level phenomenon (V(A) inflation via LD) but with the correct opposite-direction effect on Falconer's twin formula
- Adult IQ familyNote rewritten: removed the wrong-direction "28% structural inflation from assortative mating" framing; now leads with genetic nurture, includes AM with the correct opposite-direction caveat
- Educational attainment familyNote rewritten: now leads with Kong 2018 / Okbay 2022 as direct empirical evidence for genetic nurture, AM is mentioned with the correct caveat about Falconer downward bias
- Schizophrenia, ADHD, and autism familyNote entries rewritten: previously attributed ~33-36% V(A_LD) share to AM as if it were the source of the twin-vs-direct-biology gap; now correctly framed as Crow-Felsenstein population-level LD prediction (the formula partition is mathematically valid as a V(A) decomposition) with explicit acknowledgment that no within-family GWAS at scale exists for these traits, so the genetic-nurture contribution is unmeasured. The Border 2022 cross-trait AM finding (about between-trait LD inflating cross-disorder rg) is preserved correctly because that is a separate and well-supported phenomenon
- Big Five subdimension familyNote entries (openness, conscientiousness, extraversion, agreeableness, neuroticism) rewritten to lead with "genetic nurture for personality is essentially zero" as the reason the family bucket is small, with AM mentioned correctly as not driving the gap
- Self-control, empathy, subjective wellbeing familyNote entries rewritten to remove AM-as-source framing
- Religiosity, political orientation familyNote entries rewritten to lead with shared family environment / cultural transmission (the actual dominant contributor for high-c²-adult traits) instead of AM
- Smoking initiation familyNote rewritten to lead with cultural transmission and the empirical twin-vs-WF gap (Howe 2022) instead of AM-as-driver
- EA trapHer entry updated to attribute the inflation to genetic nurture leakage with Kong/Okbay anchors instead of "assortative-mating + genetic-nurture"
- Schizophrenia trapHer and takeaway entries updated: the within-trait genetic-nurture share is acknowledged as unmeasured (no WF GWAS at scale); the Border 2022 xAM finding is preserved as it correctly addresses cross-disorder rg, not within-trait inflation
- ADHD and autism trapHer/takeaway entries similarly updated
- Did NOT change Border 2022 cross-trait AM mentions — those address cross-disorder rg inflation via between-trait LD, which is a real and well-supported finding (separate from the within-trait twin-h² question)
- Did NOT change the variance bucket numbers — those were computed via Method C (within-family h² when published, formula fallback otherwise) and are mathematically correct as a partition of V(P). Only the prose attribution of which mechanism drives each bucket needed correction
- Flagged for separate refinement: the model formalization §3.1 partition formula V(A_LD) = m·h² is mathematically valid as a Crow-Felsenstein population-level decomposition but its labeling there as "explains the twin-vs-WF gap" needs the same correction; the data stage's H2 prediction tests the partition formula (which holds) but its interpretation as "structural inflation of twin h²" propagates the same error
Pass 5 2026-04-29

internal consistency check · truth/accuracy override on bias

Why: After pass 4 fixed the wrong-direction AM framing across most of the explorer's trait family-bucket notes and take-away 3, I did one more grep across the file for any remaining "structural inflation from like-pair mating" / "AM-induced linkage" / "AM-LD account for" phrasing. It found 8 places that pass 4 missed: the iq_adult trapHer (still attributed the gap partly to AM-LD), openness trapHer + takeaway (called AM "structural inflation"), schizophrenia and ADHD oneliners (still framed Crow-Felsenstein V(A_LD)/h² as "not direct biological causation"), religiosity takeaway and political_orientation trapHer (called AM "structural inflation"), BMI familyNote (attributed the gap partly to AM-LD), and the D2 trap-card "what it ignores" entry (still named "genetic nurture + assortative-mating-induced linkage" as the source of structural inflation). Pass 4's sweep was systematic but not comprehensive.
- iq_adult trapHer rewritten: was "the gap to twin h² is partly assortative-mating-induced linkage and partly parental-environment effects"; now "the gap to twin h² is dominated by genetic nurture and equal-environments-assumption violations rather than direct biological causation"
- openness trapHer + takeaway rewritten: was "strongest structural inflation from like-with-like pairing"; now correctly framed as strongest population-level V(A_LD) (AM-induced linkage) per the Crow-Felsenstein partition, with explicit note that AM does not on net inflate Falconer twin h²
- schizophrenia oneliner rewritten: was "about a third of the additive genetic variance is assortative-mating-induced linkage, not independent direct biological causation"; now "the Crow-Felsenstein partition predicts ~36% of population-level V(A) is AM-induced linkage; with no within-family GWAS at scale, the within-trait genetic-nurture contribution is unmeasured" — distinguishes the population-level V(A) decomposition from the (unmeasured) within-trait twin-vs-direct-biology gap
- ADHD oneliner rewritten in parallel structure to schizophrenia
- religiosity takeaway rewritten: was "substantial structural inflation from like-pair mating"; now "the dominant contributor is shared family environment / cultural transmission" with AM correctly framed as inflating population-level V(A) via LD
- political_orientation trapHer rewritten: was "structurally inflated by like-pair mating"; now correctly distinguishes the moderate direct-genetic share, the persistent shared family environment, and AM as a population-level V(A) inflator
- BMI familyNote rewritten: was "genetic-nurture-style effects through parental food environment plus AM-LD account for ~25 percentage points"; now "dominated by genetic-nurture-style effects" with AM correctly framed
- D2 trap-card "what it ignores" entry rewritten: was "40–60% of 'genetic' effect ... is actually genetic nurture + assortative-mating-induced linkage"; now "30–60% (over 60% for EA specifically) is dominated by genetic nurture and EEA violations rather than direct biological causation; AM contributes population-level V(A) inflation but biases Falconer twin estimates downward, partially offsetting". This is accurate to the corrected framing and consistent with the writeup section 4 D2
Pass 6 2026-04-29

internal consistency check

Why: After pass 5 caught 8 leftover wrong-direction AM phrasings, a final cross-pipeline grep for "structural inflation" / "AM-induced linkage" / "inflates" surfaced two more in the explorer (bipolar familyNote and takeaway both said "low structural inflation" without the population-level / Falconer-bias clarification). Two non-explorer items were also caught in the same grep, the data stage TLDR and the model dashboard component, and were fixed in their respective stages (data pass 8, model pass 8).
- Bipolar familyNote rewritten: was "Bipolar's assortative mating is weak ... so the structural inflation is much smaller than for schizophrenia / ADHD / autism. About two-thirds of variance is direct biology rather than structural family-setup effects" → now "Bipolar's assortative mating is weak (m=0.18) ... so the Crow-Felsenstein V(A_LD) population-level share is much smaller. About two-thirds of variance is direct biology in the formula partition; with no within-family GWAS at scale, the genetic-nurture contribution is unmeasured" — keeps the relative-comparison correctness but reframes "structural inflation" as the population-level V(A_LD) share rather than as a twin-h² inflation source
- Bipolar takeaway rewritten: was "high heritability, low structural inflation" → now "high heritability and a low Crow-Felsenstein V(A_LD) share at the population level"

pass 5

Long-form synthesis of the whole pipeline. What the science actually says about how and why people psychologically differ — written for an educated lay reader, with acronyms defined and the public-discourse traps spelled out. About 4,500 words.

TLDR

Behavior genetics has now had about fifty years of twin studies, twenty years of genome-wide DNA work, and the past five years of within-family designs that strip the structural inflation out of older “genetic” estimates. The science has converged on a picture of why people psychologically differ — and almost nobody in public describes it accurately. The headline finding is that heritability is real, replicated, and substantial across most psychological traits — but a sizable fraction of what gets called “genetic” in twin studies is actually environmental in origin, mediated through parents who transmit both the alleles AND the correlated rearing environment (a phenomenon called genetic nurture). Direct biological causation is genuine and important; it’s also typically smaller than the headline numbers suggest, especially for socially-structured traits like educational attainment, where the cleanest estimate of direct genetic effect is about one-third of what classical twin studies report.

Two findings should change how a non-specialist thinks about this field. First, environmental effects are dramatically asymmetric: severe insults — lead exposure, fetal alcohol syndrome, severe deprivation, malnutrition — each cost ten to thirty IQ points; enrichment above the modern Western normal range yields a few points at most. The big policy and parenting levers are at the negative tail (preventing severe insults), not at the middle (optimizing within normal). Second, high heritability is fully compatible with large environmental change at the population level: average adult height has risen about ten centimeters in a century at a within-cohort heritability of ~0.85, and average IQ rose roughly 25–30 points across mid-20th-century cohorts in most measured populations (with plateaus and partial reversals in some countries from the 1990s onward) at a within-cohort heritability of ~0.80. “Heritable” does not mean “fixed.”

Public discourse on this field is captured by four motivated-reasoning patterns: the blank-slate environmentalism that dismisses heritability as methodological artifact, the hereditarianism that treats genetic effect as biology-as-destiny and licenses between-population inference, the gender-similarities framing that cites small per-dimension sex differences while ignoring large multivariate ones, and the pop-evolutionary-psychology overreach that treats dimensional differences as categorical. Each cites real evidence and ignores real evidence. The honest reading requires holding all of it at once. The actionable layer is then narrower than any of the four traps imply: protect against severe environmental insults; do not over-invest in within-normal optimization; expect heritable traits to be substantially heritable but not fixed; do not extrapolate within-population variance ratios to between-population mean inferences.

The field is not done. Three real open questions remain — what polygenic scores actually measure causally, the mechanism behind the Gender Equality Paradox, and the magnitude of assortative-mating contamination across the psychiatric cross-disorder correlation matrix — and this writeup says so where it should rather than papering over them. The companion explorer lets you pick any of two dozen traits and see the variance breakdown; the model and data stages have the math and the empirical tests.

1. Why this field is a minefield

The question “why do people psychologically differ from each other” is one of the most heat-attracting questions in the social sciences, for reasons that have nothing to do with the science and everything to do with what a clean answer would license. Each direction of motivated reasoning has something at stake. People with a blank-slate intuition fear that admitting heritable differences exist licenses fatalism, eugenics, or political programs they find abhorrent. People with a hereditarian intuition fear that denying heritable differences licenses bad social policy, distorts family-formation incentives, or papers over evidence they consider straightforwardly true. People with a gender-similarities intuition fear that framing sex differences as substantial licenses sexism. People with a pop-evolutionary-psychology intuition fear that minimizing sex differences abandons what they consider robust biological reality.

What complicates the conversation further is that the evidence base contains material that supports each of these positions in some form, and that’s not a contradiction: it’s the natural shape of the data. Heritability is real (good for hereditarians); a lot of “genetic” effect dissolves under within-family designs (good for blank-slaters); single-dimension sex differences are typically small (good for similarities-framers); aggregated multivariate sex differences are large (good for evolutionary-psychology framers). The mistake every direction makes is selective citation: cite the evidence that supports your reading, ignore the evidence that doesn’t. The integrated reading is harder to hold, but it is the only one that actually fits the data.

The pipeline behind this writeup — five earlier stages of progressively more rigorous analysis — converges on a picture that is more nuanced than any single-direction narrative but that is nonetheless reasonably definite. There are things the field knows. There are things the field does not know but is converging on. There are things the field does not currently have the methods to know, and we should say so when that’s the case.

2. The vocabulary

Heritability, written h², is the single most-misunderstood statistic in this field. It is the fraction of variance in a trait, across people in a population, that tracks genetic differences between those people. It is a population-level statistic, not an individual partition. Saying “IQ is 70% heritable” does not mean “70% of any one person’s IQ is genetic.” It means “across this population, 70% of why people differ in IQ tracks genetic differences.”

The cleanest way to internalize this distinction: imagine 100 plants of identical genotype, raised in identical pots. The heritability of their height in this population is 0%, because all the variation between them comes from environmental factors (sun angle, water, soil chemistry). But for any single plant, asking “how much of its height is genetic” is meaningless. The genotype set the type of plant; the environment did the growing; neither percentage applies. Heritability is about the spread, not about any individual value. This applies with equal force to cognitive ability, personality, height, or anything else.
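A minimal simulation can make the population-level definition concrete. It assumes the simplest additive model, phenotype = genetic value + environment, with the two independent (no gene-environment correlation or interaction, which real traits do not satisfy). The helper names `heritability` and `simulate` are illustrative, not from any library or from the pipeline's own code. The second half reproduces the clone thought experiment: with zero genetic variance, h² is zero even though every plant's height depends on its genome.

```python
import random
import statistics


def heritability(genetic, phenotype):
    """Population-level h^2: share of phenotypic variance that tracks genetic variance.

    Assumes phenotype = genetic + environment with G and E independent (no rGE, no GxE).
    """
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


def simulate(n, var_g, var_e, seed=0):
    """Draw n individuals with P = G + E under the independence assumption above."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    p = [gi + rng.gauss(0.0, var_e ** 0.5) for gi in g]
    return g, p


# A mixed population where, by construction, 70% of the variance is genetic.
g, p = simulate(50_000, var_g=0.7, var_e=0.3)
print(round(heritability(g, p), 2))  # close to 0.7

# Clones: every plant shares one genotype, so Var(G) = 0 and h^2 = 0,
# even though each plant's height obviously depends on its genome.
clones_g = [1.0] * 100
rng = random.Random(1)
clones_p = [1.0 + rng.gauss(0.0, 0.2) for _ in range(100)]
print(heritability(clones_g, clones_p))  # 0.0
```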

Within-population heritability also says nothing about between-population mean differences. If you plant the same genetic mix of corn in fertile soil and depleted soil, the within-each-plot heritability of height can be high (variation tracks genetics within each soil), while the difference between plot means is entirely environmental (the soil). The within-plot heritability tells you nothing about why the plot means differ. This is the Lewontin firewall, named after the geneticist who first laid it out cleanly in 1970, and it is a logical/algebraic point — not an empirical claim that can be falsified.
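The corn example can be sketched the same way, under the same assumed additive model. Two plots are sown from one seed distribution, and the depleted plot gets a hypothetical soil penalty of 2 units applied to every plant. Heritability within each plot comes out high, while the entire gap between plot means comes from the soil term. `grow_plot` and `within_h2` are hypothetical helpers.

```python
import random
import statistics


def grow_plot(genotypes, soil_effect, rng, var_e=0.2):
    """Heights for one plot: each plant's genetic value, plus a plot-wide soil
    effect shared by every plant, plus independent individual noise."""
    return [g + soil_effect + rng.gauss(0.0, var_e ** 0.5) for g in genotypes]


def within_h2(genotypes, heights):
    """Heritability computed inside a single plot."""
    return statistics.pvariance(genotypes) / statistics.pvariance(heights)


rng = random.Random(42)
# Both plots are sown from the same seed mix: genotypes drawn from one distribution.
fertile_g = [rng.gauss(0.0, 0.8 ** 0.5) for _ in range(20_000)]
depleted_g = [rng.gauss(0.0, 0.8 ** 0.5) for _ in range(20_000)]

fertile = grow_plot(fertile_g, soil_effect=0.0, rng=rng)
depleted = grow_plot(depleted_g, soil_effect=-2.0, rng=rng)

print(round(within_h2(fertile_g, fertile), 2))    # about 0.8
print(round(within_h2(depleted_g, depleted), 2))  # about 0.8

mean_gap = statistics.fmean(fertile) - statistics.fmean(depleted)
genetic_gap = statistics.fmean(fertile_g) - statistics.fmean(depleted_g)
print(round(mean_gap, 1))     # about 2.0: the plot means differ by the soil effect
print(round(genetic_gap, 1))  # about 0.0: none of that gap is genetic
```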

A few more terms before we go further:

Genome-wide association study (GWAS): a study that scans hundreds of thousands or millions of single-letter DNA variants — single nucleotide polymorphisms or SNPs — looking for statistical association with a measured trait.
Polygenic score (PGS): a per-person sum of trait-associated SNPs, weighted by their estimated effect sizes from a GWAS. Used as a predictor.
MZ and DZ twins: identical (monozygotic, ~100% shared DNA) and fraternal (dizygotic, ~50% shared DNA). Comparing how much more similar identical twins are than fraternal twins is the classical engine behind heritability estimates.
Assortative mating (AM): the phenomenon where partners resemble each other on a trait above chance. Educational attainment shows the strongest non-attitudinal AM signal at a partner correlation of about 0.55 (Horwitz 2023); political orientation shows the strongest of any trait at 0.58.
Gene-environment correlation (rGE): the phenomenon where genes and environments are not independent of each other. Passive rGE: parents transmit both genes and a correlated environment to offspring. Evocative rGE: heritable traits elicit certain reactions from others. Active rGE: people select environments matching their genetic propensities.
Educational attainment (EA): years of schooling completed. Used in this field as a measurable proxy for life outcomes that involve cognitive and conscientiousness loadings.
Within-family designs: comparing siblings, MZ-discordant twins, or parent-offspring trios within the same family. These control for between-family confounds — primarily the parental-environment effects mediated through shared parental genes (genetic nurture), plus assortative-mating-induced linkage at the population level — and produce the cleanest estimate of direct genetic effect.
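The polygenic-score definition above amounts to a weighted sum. The sketch assumes allele dosages coded 0/1/2 and uses made-up weights for four SNPs. Real scores also involve steps such as clumping or shrinkage of the GWAS weights, which are omitted here, and `polygenic_score` is a hypothetical name.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele counts.

    dosages: number of effect alleles (0, 1 or 2) the person carries at each SNP.
    weights: per-SNP effect-size estimates from a GWAS, in the same SNP order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical 4-SNP example; real scores sum over thousands to millions of SNPs.
weights = [0.02, -0.01, 0.03, 0.005]
person = [2, 1, 0, 1]
print(polygenic_score(person, weights))  # 2*0.02 - 0.01 + 0 + 0.005, about 0.035
```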

With these in hand, the rest of the writeup should be readable.

3. Seven big ideas

Seven findings are robust enough that a careful reader should walk away believing them. The contested questions in this field sit at finer-grained resolutions; these seven are field-level consensus.

3.1 Heritability is real, replicated, and substantial

Across 17,804 traits measured in 14.5 million twin pairs across 2,748 publications (Polderman et al. 2015), the mean trait heritability is about 0.49. SNP-based heritability methods, which use unrelated individuals and bypass the assumptions twin studies make about twin environments being equally similar, recover a substantial fraction of twin-based heritability across major traits: about 60% for height with common SNPs alone (rising to ~80% when whole-genome sequencing captures rare variants), about 25–40% for cognitive ability, and about 30–50% for educational attainment. The fraction recovered is highest for traits with the simplest genetic architectures (height, BMI) and lowest for socially-structured traits, where genetic nurture and other parental-environment effects contribute substantially to twin estimates. Adoption studies, where children are reared by parents they share no genes with, recover heritability estimates broadly consistent with twin and SNP-based methods. Within-family GWAS, which compare siblings or trios and control for shared parental environment, find non-zero direct genetic effects across a range of traits including educational attainment, body mass index, height, and cognitive ability.

The “twin studies are bunk” position does not survive contact with the cumulative evidence. Heritability is real. The methodological critiques have force at the margin but cannot account for the convergence across designs.

3.2 But it’s a population statistic, not an individual partition

This was covered in section 2 but bears repeating because the failure to internalize it is the single most consequential public-discourse error about this field. “70% heritable” means “70% of why people differ in this population is genetic.” It does not mean “70% of any one person’s value is genetic.” Treating it as an individual partition produces nonsense in both directions: it overstates determinism for the hereditarian reading and overstates plasticity for the environmentalist reading. There is no individual percentage decomposition of “this person is X% genetic and Y% environmental.” That number does not exist.

3.3 Roughly 8% to 60%+ of “genetic” effect is structural inflation, depending on trait

This is the finding that most reshapes the picture once you know it, and it is the finding most absent from popular coverage. Twin studies measure resemblance between MZ and DZ twins and translate it into a heritability estimate using assumptions about how genes and environments combine. For socially-structured traits, this estimate substantially overstates the direct-biological-causation slice. The dominant reason is genetic nurture: parents who pass on certain alleles to their children also create environments correlated with those alleles — vocabulary, books, expectations, neighborhood choice, schooling. Classical twin models cannot easily separate this environmental contribution from direct genetic causation, because the genetic-nurture component is shared identically by MZ and DZ co-twins (they share parents) and tends to leak into the additive genetic variance estimate. Within-family designs strip it out by comparing siblings within the same family.

The empirical evidence is concrete and direct. Kong et al. 2018 (Science) compared the predictive power of parents’ transmitted polygenic scores (the alleles the offspring actually inherited) to parents’ non-transmitted polygenic scores (the alleles the offspring did not inherit but the parents still acted on environmentally) for educational attainment. The non-transmitted-allele effect was 29.9% of the transmitted effect — direct evidence that “genetic” prediction for socially-structured traits is partly mediated by parents’ environmental behaviors that correlate with their alleles. Okbay et al. 2022 EA4 (N=3M) showed the within-family direct effect for educational attainment is roughly half the population-level polygenic-score effect; the other half is environmental contamination via the home.
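The transmitted/non-transmitted logic can be illustrated with a toy simulation. This is a simplified sketch, not Kong et al.'s actual pipeline: it assumes random mating (so transmitted and non-transmitted scores are independent), a purely additive outcome, and hypothetical effect sizes (direct 0.7, nurture 0.3) chosen so the ratio lands near the 29.9% figure quoted above. Because parents' environmental influence tracks their whole genotype, the coefficient on the transmitted score picks up direct plus nurture effects, while the coefficient on the non-transmitted score picks up nurture alone. `ols_two` and `simulate_families` are hypothetical helpers.

```python
import random


def ols_two(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (with intercept), via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    s11 = sum(u * u for u in a)
    s22 = sum(v * v for v in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * w for u, w in zip(a, c))
    s2y = sum(v * w for v, w in zip(b, c))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_families(n, direct, nurture, seed=0):
    """Hypothetical genetic-nurture model.

    T  = score of the alleles the child inherited
    NT = score of the parental alleles the child did NOT inherit
    Child outcome = direct * T + nurture * (T + NT) + noise, i.e. the parents'
    environmental influence tracks their full genotypes, transmitted or not.
    """
    rng = random.Random(seed)
    t, nt, y = [], [], []
    for _ in range(n):
        ti, nti = rng.gauss(0, 1), rng.gauss(0, 1)
        t.append(ti)
        nt.append(nti)
        y.append(direct * ti + nurture * (ti + nti) + rng.gauss(0, 1))
    return t, nt, y


t, nt, y = simulate_families(50_000, direct=0.7, nurture=0.3)
beta_t, beta_nt = ols_two(t, nt, y)
print(round(beta_t, 2), round(beta_nt, 2))  # near 1.0 and 0.3
print(round(beta_nt / beta_t, 2))           # near 0.3: NT effect as a share of the T effect
```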

For educational attainment, the canonical twin-based heritability is ~0.40, while within-sibship heritability (Howe et al. 2022, the largest within-family study to date, with about 178,000 siblings from roughly 78,000 families) is ~0.15. The 0.25 gap is dominated by genetic nurture, plus other classical-twin-design biases such as violations of the equal-environments assumption (MZ co-twins are treated more similarly than DZ co-twins, which inflates the MZ-DZ correlation gap that twin h² is computed from).

A second mechanism — assortative mating — is also real and worth understanding, but its effect on twin estimates is more counterintuitive than is sometimes claimed. People pair with similar partners (educational attainment shows the strongest non-attitudinal AM signal at m=0.55; political orientation shows the strongest of any trait at m=0.58), and this creates linkage between trait-relevant alleles in offspring (Yengo et al. 2018 estimates 14–23% inflation of population-level additive genetic variance for height). But the effect on Falconer’s classical twin formula 2(rMZ − rDZ) runs in the opposite direction from genetic nurture’s effect: under positive AM, fraternal twins share more than 50% of trait-relevant alleles (because their parents are more genetically similar than under random mating), which raises DZ correlation relative to MZ correlation and biases the formula downward. So while AM is a real source of LD inflation in the population’s V(A), it does not on net inflate the twin-vs-within-family gap — that gap is dominated by genetic nurture and EEA violations, and AM partially cancels rather than adds to them. (This is a subtle technical point that is genuinely confused in popular writing on the topic; the cross-trait variant of AM does inflate reported genetic correlations between disorders, which is the Border 2022 result, but that is about between-trait LD, not within-trait twin-h² estimates.)
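Falconer's formulas, and the direction of the assortative-mating bias described above, can be checked with a few lines of arithmetic. This sketch deliberately simplifies: it holds the true additive share fixed and only raises the DZ genetic correlation above 0.5, ignoring the accompanying growth of V(A) under AM. The trait values (a² = 0.5, c² = 0.1) and the helper names `falconer` and `twin_correlations` are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Classical ACE split from twin correlations (Falconer's formulas)."""
    return {
        "h2": 2 * (r_mz - r_dz),  # additive genetic share
        "c2": 2 * r_dz - r_mz,    # shared-environment share
        "e2": 1 - r_mz,           # non-shared environment + measurement error
    }


def twin_correlations(a2, c2, dz_genetic_r=0.5):
    """Expected twin correlations under a purely additive model.

    dz_genetic_r is 0.5 under random mating; positive assortative mating pushes
    it above 0.5 because the parents are genetically more alike than chance.
    """
    r_mz = a2 + c2
    r_dz = dz_genetic_r * a2 + c2
    return r_mz, r_dz


true_a2, true_c2 = 0.5, 0.1  # hypothetical trait
for dz_r in (0.5, 0.55, 0.6):
    est = falconer(*twin_correlations(true_a2, true_c2, dz_r))
    print(dz_r, round(est["h2"], 3), round(est["c2"], 3))
# dz_genetic_r 0.5  -> h2 0.5 (unbiased), c2 0.1
# dz_genetic_r 0.55 -> h2 0.45, c2 0.15
# dz_genetic_r 0.6  -> h2 0.4,  c2 0.2
```

The estimated h² falls as the DZ genetic correlation rises, and the lost share reappears as apparent shared environment.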

Within-family designs are not assumption-free either — they assume siblings receive equally similar parental treatment and equally similar non-genetic exposures, which is approximately but not exactly true. But they remove the largest twin-design biases (the equal-environments assumption, genetic-nurture confounding, and AM-related complications) simultaneously, and across the published within-family studies the direct-effect estimates are mutually consistent across cohorts and methods. Treating within-family h² as the cleanest current estimate of direct biological causation is a defensible operational choice, not a perfect one.

The size of the twin-vs-within-family gap varies dramatically by trait. For height, where within-sibship heritability (0.78) is essentially as high as twin heritability (0.85), the structural-inflation share is small (~8%). For socially-structured traits like educational attainment, it’s large — the cleanest direct-biology estimate is about three-eighths of the twin-based number, meaning more than half of “genetic” effect on EA in twin studies is actually environmental in origin via genetic nurture. None of this means the underlying biology isn’t real — it means the headline numbers from older twin studies overstate the direct-causation slice for socially-structured traits, and the within-family literature is what made the correction possible.
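The percentages in this section follow directly from the quoted estimates. A quick check, using a hypothetical helper `gap_share`:

```python
def gap_share(twin_h2, within_family_h2):
    """Fraction of the twin estimate not recovered by the within-family estimate."""
    return (twin_h2 - within_family_h2) / twin_h2


# Figures quoted in the text: (twin h^2, within-family h^2).
traits = {"height": (0.85, 0.78), "educational attainment": (0.40, 0.15)}
for name, (twin, wf) in traits.items():
    print(f"{name}: direct/twin = {wf / twin:.3f}, gap share = {gap_share(twin, wf):.3f}")
# height: direct/twin = 0.918, gap share = 0.082
# educational attainment: direct/twin = 0.375, gap share = 0.625
```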

3.4 Environmental effects are real and asymmetric, with insults dominating

Heritability findings and large environmental effects coexist without contradiction, and the way they coexist is dramatically asymmetric. The environmental effects on cognitive ability that have been measured most cleanly are these:

Severe insults: prenatal alcohol (full fetal alcohol syndrome, FAS) costs about 30 IQ points; severe deprivation in early childhood (the Romanian-orphanage cohort) costs about 15; severe chronic malnutrition costs about 15; adoption from a high-SES (socioeconomic status) family into a low-SES family costs about 12; severe iodine deficiency costs about 10; lead exposure (going from blood lead 1 to 10 µg/dL) costs about 6.
Within-normal enrichment: an additional year of schooling adds 1–5 IQ points (mean ≈ 3.4 in Ritchie & Tucker-Drob 2018’s meta-analysis of 600,000 participants); breastfeeding adds about 3 in the PROBIT randomized trial; parenting variation within the Western normal range adds roughly 0–1.

The asymmetry is the lesson. Removing severe insults recovers double-digit IQ points; enrichment above the Western normal range yields a few points at most. This is why the high-heritability findings of behavior genetics and the existence of large environmental effects are not contradictory: heritability is a population-variance statistic, and in any modern population that has already removed the worst environmental tails, most remaining variance is genetic — not because environment doesn’t matter, but because you already removed the environmental factors that mattered most. The variance contribution of fetal alcohol syndrome to a Norwegian sample’s cognitive variance is small not because FAS doesn’t matter for the affected child (it matters by 30 points) but because almost no Norwegian children have it.
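The prevalence point can be made quantitative. Under a simplifying assumption that a binary exposure shifts scores by a fixed amount and is independent of everything else, it adds effect² × p × (1 − p) to the variance, where p is its prevalence. The sketch uses the 30-point FAS figure from the list above, an assumed baseline SD of 15 IQ points among the unexposed, and purely hypothetical prevalences; none of them is an estimate for Norway or anywhere else. `exposure_variance_share` is a hypothetical name.

```python
def exposure_variance_share(effect_points, prevalence, sd=15.0):
    """Share of total IQ variance attributable to one binary exposure.

    The exposure shifts scores by `effect_points` in a fraction `prevalence`
    of the population and is assumed independent of all other causes, so it
    adds effect^2 * p * (1 - p) on top of the baseline variance sd^2.
    """
    added = effect_points ** 2 * prevalence * (1 - prevalence)
    return added / (sd ** 2 + added)


# Hypothetical prevalences, for illustration only.
for p in (0.001, 0.01, 0.10):
    share = exposure_variance_share(30, p)
    print(f"30-point insult at prevalence {p:.1%}: {share:.2%} of IQ variance")
# 0.1% -> 0.40%, 1.0% -> 3.81%, 10.0% -> 26.47%
```

The per-child effect is identical in every row; only the prevalence changes how much of the population variance it explains.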

For policy this means the highest-effect-per-dollar interventions are at the negative tail: lead remediation, iodine fortification, fetal-alcohol prevention, basic nutrition, schooling access. For parents this means anxiety about “optimizing” within normal is mostly misallocated: the big lever is preventing severe insults, not perfecting parenting style. The explorer’s “Asymmetry” view renders the full exposure list as a single forest plot sorted by effect size, with implications broken out for parents and policy.

3.5 Heritability is developmental, not static — the Wilson Effect

The cognitive-ability heritability number cited in popular coverage — “IQ is 70-80% heritable” — is the adult number. In children, heritability is much lower. Heritability of cognitive ability rises along a smooth logistic curve from about 0.20 at age five to about 0.80 in adulthood, an empirical pattern called the Wilson Effect after the developmental psychologist who first described it. Bouchard 2013 fit this curve to seven anchor ages and recovered the parameters cleanly: heritability is about 0.20 at age 5, 0.46 at age 10, 0.69 at age 15, and 0.79 at age 25.
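A logistic curve through the quoted anchor points can be sketched as follows. This is not Bouchard's fit: it assumes a floor of 0 and a ceiling of 0.80, solves for the rate and midpoint from the age-5 and age-15 anchors only, and then checks how closely the remaining anchors are reproduced. `fit_logistic` and `wilson_curve` are hypothetical names.

```python
import math


def fit_logistic(age_a, h_a, age_b, h_b, ceiling):
    """Solve for rate k and midpoint t0 of h(t) = ceiling / (1 + exp(-k (t - t0)))
    so the curve passes exactly through two (age, h^2) anchor points."""
    def logit(h):
        return math.log((h / ceiling) / (1 - h / ceiling))

    k = (logit(h_b) - logit(h_a)) / (age_b - age_a)
    t0 = age_a - logit(h_a) / k
    return k, t0


def wilson_curve(age, k, t0, ceiling):
    return ceiling / (1 + math.exp(-k * (age - t0)))


# Assumption: ceiling of 0.80 (the adult value quoted above) and a floor of 0.
CEILING = 0.80
k, t0 = fit_logistic(5, 0.20, 15, 0.69, CEILING)
for age in (5, 10, 15, 25):
    print(age, round(wilson_curve(age, k, t0, CEILING), 2))
```

The curve passes through 0.20 and 0.69 by construction and gives about 0.47 at age 10 and 0.79 at age 25, close to the quoted 0.46 and 0.79.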

The mechanism is not that genetic effects “turn on” with age. It is that shared family environment dominates in childhood and gets crowded out as children gain agency over their own environments. A small child’s reading material, schooling, and peer group are mostly chosen for them by their parents. A teenager’s are mostly chosen by themselves, and the choices they make track their genetic propensities, amplifying the apparent genetic signal (a phenomenon called active gene-environment correlation). The same set of genomes that yields ~20% heritability at age five yields ~80% heritability at age twenty-five, not because the genes have done more, but because the environment has shifted from imposed to self-selected.

The implication is that childhood is environmentally most malleable. The same environmental shift produces a much larger effect on a five-year-old than on a twenty-five-year-old, because the child has not yet shifted into self-selected environment mode. Severe environmental insults landing during developmental windows (lead poisoning at age 2, severe deprivation at age 4) leave permanent marks; the same insults landing on adults are smaller in effect. Conversely, “remediation” interventions that work well on children frequently fail on adults because the developmental window has closed. The asymmetric environmental-effects finding from the previous section is largest in early childhood and shrinks across the life course. (Compare child vs. adult cognitive ability in the explorer to see the bucket shift in concrete numbers.)

3.6 High heritability is fully compatible with large environmental shifts

The Wilson Effect is the within-life-course version of a more general truth: heritability is context-dependent. The same shape shows up across cohorts.

The cleanest demonstration is height. Within any modern Western country, about 85% of why adults differ in height tracks genetic differences. Average adult height has risen about ten centimeters in a century — entirely from environmental change (nutrition, infection control, prenatal care). The same heritability that “shows height is genetic” coexists with one of the largest environmental shifts in any biological trait. The within-cohort heritability and the between-cohort secular rise are not in conflict; they answer different questions.

The same logic applies to cognitive ability. The Flynn Effect raised average measured IQ by roughly 25–30 points across mid-20th-century cohorts in most measured populations (Pietschnig & Voracek 2015 meta-analysis: ~2.3 IQ points per decade across 271 samples spanning 105 years, 1909–2013), in populations whose within-cohort IQ heritability remained in the 0.7–0.8 range. The pattern has slowed and partially reversed in some countries from the 1990s onward, and the cause of that is itself an open question, but the same-genes-different-environment-different-mean pattern is the lesson. Smoking shows the same pattern: heritability of smoking initiation is about 0.50 within any modern cohort, and US adult smoking prevalence fell from about 42% in 1965 to about 12% today, a roughly 70% reduction over sixty years from taxation, public-smoking restrictions, and shifting norms. Heritable does not mean fixed. This is one of the most important things to internalize about this field, and one of the things most consistently mishandled in public coverage.
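One way to see how a heritable binary trait's prevalence can fall this far is a liability-threshold model. This is an assumption introduced here for illustration, not part of the source analysis: each person has a normally distributed liability, people above a fixed cut-point smoke, and an environmental change lowers everyone's liability by the same amount. Within the model, a uniform shift leaves within-cohort variance components, and therefore h², untouched, yet moves prevalence from 42% to 12%. `threshold_for` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # liability standardised within the 1965 cohort


def threshold_for(prevalence):
    """Liability cut-point above which a person smokes, for a given prevalence."""
    return std_normal.inv_cdf(1 - prevalence)


# Hypothetical liability-threshold model: smoking if liability > fixed threshold.
# Genetic and environmental variance components are assumed unchanged within each
# cohort; only a population-wide environmental shift lowers everyone's liability.
threshold = threshold_for(0.42)
shift = threshold_for(0.12) - threshold
print(round(shift, 2))  # about 0.97 SD of liability

# Check: moving the whole distribution down by `shift` reproduces the later prevalence.
later = 1 - NormalDist(mu=-shift).cdf(threshold)
print(round(later, 2))                 # 0.12
print(round((0.42 - 0.12) / 0.42, 2))  # 0.71 relative reduction
```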

3.7 Within-population heritability does not license between-population claims

This is the Lewontin firewall, and it is unfalsifiable — a logical/algebraic point, not an empirical claim. Within-population heritability provides no information, by itself, about whether between-population mean differences have a genetic component. The math literally does not connect the two quantities.

The empirical buttress to the logical point is that polygenic scores — the molecular-genetics tool that would in principle let researchers ask the between-population question — lose accuracy when applied across ancestries, and the loss is substantial. Martin et al. 2019 reports relative-accuracy reductions of 37% in South Asian, 50% in East Asian, and 78% in African ancestries compared to European training, averaged across major traits. Ding et al. 2023 (Nature, 84 traits, 524,000 individuals) extended this finding to a continuous distance scale and found a Pearson correlation of −0.95 between genetic distance from the European-ancestry training population and PGS prediction accuracy. The same SNP “effect sizes” do not estimate the same causal coefficients across populations. The methods that would license a between-population genetic comparison demonstrably do not work across populations as currently constructed.

The honest position on between-population mean differences: in 2026, the science is not currently equipped to answer the question in either direction. People who claim it has been answered, in either direction, are over-claiming relative to what the methods can do.

4. The four motivated-reasoning traps

The pipeline’s topology stage maps four directions of public-discourse motivated reasoning explicitly. Each cites real evidence; each ignores real evidence; each can be steel-manned into a more defensible position that mostly aligns with the integrated reading the science actually supports.

The blank-slate / pure-environmentalist position claims that psychological differences are mostly socialization, that twin studies are flawed, and that heritability is a methodological artifact. Cited correctly: the equal-environments assumption in twin studies is partially violated, adoption studies have selection effects, cultural variation in trait expression is real, stereotype threat exists. Ignored: SNP-based heritability bypasses the twin-design assumptions and recovers a substantial fraction of twin h² across major traits; adoption studies converge on similar estimates; within-family GWAS finds non-zero direct genetic effects; severe psychiatric conditions show heritability of 0.79–0.80 across cultures. The integrated reading: the methodological critiques have force at the margin but cannot account for the convergence across designs. The honest version of this position survives: “population-level genetic variance ratios are real, but they don’t license the moves people make from them: individual partition, between-population inference, fixed-trait reading.” That’s true, and is exactly what the science says when stated carefully.

The hereditarian position claims that differences are mostly genetic, that group disparities reflect underlying biology, and that environment is overrated. Cited correctly: mean trait heritability is 0.49 across 17,804 traits, twin studies replicate, GWAS hits replicate, within-family designs find non-zero direct effects. Ignored: 30–60% of “genetic” effect for socially-structured traits is structural inflation rather than direct biology (with educational attainment specifically over 60%); PGS portability collapse blocks between-population inference empirically; the Lewontin firewall blocks it logically; high heritability coexists with large environmental shifts (height +10 cm, IQ +25–30 points across mid-20th-century cohorts); severe environmental insults each cost double-digit IQ points; cross-trait assortative mating accounts for ~74% of variance in reported psychiatric cross-disorder genetic correlations (Border 2022). The integrated reading: heritability is real and substantial, the within-population claim survives, but the move to “between-population means are genetic” is blocked twice (logically and empirically), and the move to “fixed at individual level” is blocked by the asymmetric environmental-effects finding and the Wilson Effect (heritability is developmental). The honest version: “within-population genetic variance is real and substantial, period.” Which is true.

The gender-similarities (single-dimension) framing claims that sex differences are tiny, citing math performance d ≈ 0.05 and similar small per-dimension effects. Cited correctly: math, verbal, and many specific cognitive-task differences are small; Hyde 2005’s similarities hypothesis is empirically supported for most single dimensions. Ignored: the people-things interest difference is d ≈ 0.93, one of the largest effect sizes in psychology (Su 2009, N = 503,000); aggregated across 15 personality dimensions with realistic inter-trait correlations, the multivariate Mahalanobis distance between male and female means is D ≈ 1.0 at the observed level and D ≈ 2.7 at the latent (measurement-error-corrected) level — large by any standard; the Gender Equality Paradox (Herlitz 2025 systematic review) finds differences are larger in more egalitarian societies, which is hard to reconcile with pure-socialization predictions. The integrated reading: both Hyde 2005 and the multivariate-D literature are correct about different objects. On any single dimension, sex differences are small. Aggregated across many weakly-correlated dimensions, the multivariate distance is large. Both halves are true; the trap from each side picks one and ignores the other.
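The claim that many small, weakly correlated differences can add up to a large multivariate distance can be illustrated with the Mahalanobis formula D = sqrt(d' R⁻¹ d), where d holds the per-dimension standardized differences and R is the pooled within-group correlation matrix. The sketch below is illustrative only: the 15-dimension panel, the uniform |d| = 0.3, and the correlations of 0 and 0.2 are hypothetical inputs, not the published personality estimates, and the helper names are made up for the example.

```python
import math


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augmented matrix [A | b], copied so the caller's data is untouched.
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def mahalanobis_d(d, corr):
    """Multivariate distance D = sqrt(d' R^-1 d) between two group centroids,
    given per-dimension standardized differences d (Cohen's d) and the pooled
    within-group correlation matrix R."""
    weights = solve(corr, d)
    return math.sqrt(sum(di * wi for di, wi in zip(d, weights)))


def equicorrelated(k, r):
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


# Hypothetical panel: 15 dimensions, each with a small difference |d| = 0.3.
k = 15
same_sign = [0.3] * k
mixed_sign = [0.3 if i % 2 == 0 else -0.3 for i in range(k)]

d_independent = mahalanobis_d(same_sign, equicorrelated(k, 0.0))
d_same_sign = mahalanobis_d(same_sign, equicorrelated(k, 0.2))
d_mixed_sign = mahalanobis_d(mixed_sign, equicorrelated(k, 0.2))

print(f"uncorrelated dimensions:          D = {d_independent:.2f}")
print(f"r = 0.2, differences all aligned: D = {d_same_sign:.2f}")
print(f"r = 0.2, differences mixed sign:  D = {d_mixed_sign:.2f}")
```

With these made-up inputs the three distances are roughly 1.16, 0.60 and 1.30. Fifteen small differences already produce D above 1 when the dimensions are uncorrelated, and the correlation structure and sign pattern move D substantially in either direction. That sensitivity is why D is a summary relative to whichever trait panel was measured.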

The pop-evolutionary-psychology overreach claims that “men are X, women are Y,” that differences are categorical and evolved, and that they predict at the individual level. Cited correctly: multivariate D ≈ 2.7, people-things d ≈ 0.93, cross-cultural replication of mean differences, biological-developmental data (girls with congenital adrenal hyperplasia show masculinized toy preferences). Ignored: psychological variation is dimensional, not taxonic, so there are no two clean categories; the overlap between the two distributions (the overlapping coefficient, assuming normal distributions with equal covariance) is ~60% at D = 1.0 and still ~18% at D = 2.7, so “categorical” is the wrong shape; effect-size labels are scale-dependent; Mahalanobis D is a model-relative summary statistic that depends on which traits are measured. The integrated reading: aggregate sex differences are real and large, but “categorical” misrepresents the shape, individual prediction from group membership is poor, and the headline D depends on the measurement panel. The honest version: “aggregate multivariate sex differences are substantial, individual prediction from sex alone is weak.” That is true, and it undermines the categorical reading.
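The ~60% and ~18% overlap figures are consistent with the overlapping coefficient of two normal distributions that share a variance, OVL = 2Φ(−D/2). For multivariate normals with a common covariance matrix, the same formula applies with D taken as the Mahalanobis distance. This is a minimal sketch that assumes that index is the one behind the text's numbers; the function name is hypothetical.

```python
from statistics import NormalDist


def overlap_coefficient(distance):
    """Shared area of two equal-variance normal distributions whose means are
    `distance` standard deviations apart: OVL = 2 * Phi(-|distance| / 2).
    Also valid for two multivariate normals with a common covariance matrix
    when `distance` is their Mahalanobis D."""
    return 2 * NormalDist().cdf(-abs(distance) / 2)


for D in (0.05, 0.93, 1.0, 2.7):
    print(f"D = {D:<4}  overlap = {overlap_coefficient(D):.0%}")
```

At D = 1.0 this gives about 62%, and at D = 2.7 about 18%. Even the latent-level distance leaves a large shared region, which is the sense in which "categorical" is the wrong shape.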

The lesson across all four traps: they each work by selective citation. The integrated picture requires holding all of it at once — large heritability and large structural inflation, small per-dimension sex differences and large multivariate ones, high within-population heritability and a logical block on between-population inference. Any single-direction narrative is structurally incomplete. The explorer’s “Four traps” view has the full cited / ignored / integrated breakdown for each direction with trait-specific applications.

5. What’s still open

The field is not done. Three real open questions remain, and the writeup is more honest if it names them than if it papers over them.

What polygenic scores actually measure causally. The Plomin / Turkheimer dispute. Plomin’s reading: a within-family-validated polygenic score is a real biological cause. Turkheimer’s reading: even a within-family PGS is a summary of correlated environments and biological factors that the design can’t fully separate. Both readings predict the same variance budget, which is why the data hasn’t yet decided between them. The decisive test would be a within-family experiment that perturbs the environment and watches whether the PGS coefficient moves the way Plomin predicts (it shouldn’t) or the way Turkheimer predicts (it should). No such study has been run at scale. Until one is, this question is open.
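To make the difference between a population PGS coefficient and a within-family one concrete, here is a small simulation. It is a sketch under strong, hypothetical assumptions: random mating, purely additive effects, genetic nurture entering only through the mid-parent score, and made-up effect sizes. It shows why sibling differences strip genetic nurture out of the coefficient. By construction it cannot settle the Plomin / Turkheimer question, because it simply labels whatever survives the sibling comparison as "direct" and has no way to tell biology apart from environments the child's own genotype evokes.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_families(n_families, beta_direct, beta_nurture, rng):
    """Toy additive model (all parameters hypothetical).
    Parents' scores ~ N(0, 1) under random mating; each child's score is the
    mid-parent value plus segregation noise with variance 1/2, so child scores
    also have variance 1. Outcome = direct effect of the child's own score +
    genetic nurture through the parents' scores + independent noise."""
    families = []
    for _ in range(n_families):
        mid_parent = (rng.gauss(0, 1) + rng.gauss(0, 1)) / 2
        siblings = []
        for _ in range(2):
            score = mid_parent + rng.gauss(0, 0.5 ** 0.5)
            outcome = beta_direct * score + beta_nurture * mid_parent + rng.gauss(0, 1)
            siblings.append((score, outcome))
        families.append(siblings)
    return families


rng = random.Random(7)
families = simulate_families(50_000, beta_direct=0.2, beta_nurture=0.2, rng=rng)

# Population design: one child per family, outcome regressed on own score.
population_slope = ols_slope([f[0][0] for f in families], [f[0][1] for f in families])

# Within-family design: sibling differences; the shared mid-parent term cancels.
within_slope = ols_slope([f[0][0] - f[1][0] for f in families],
                         [f[0][1] - f[1][1] for f in families])

print(f"population slope    ~ {population_slope:.2f}  (expected 0.2 + 0.5 * 0.2 = 0.30)")
print(f"within-family slope ~ {within_slope:.2f}  (expected 0.20)")
```

The population slope absorbs half of the nurture effect, because a child's score correlates 0.5 with the mid-parent score under these assumptions. The sibling-difference slope recovers only the coefficient the model labels direct.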

The mechanism behind the Gender Equality Paradox. The empirical pattern (sex differences in personality, interests, and several other domains being larger in more gender-egalitarian societies) has strengthened across multiple replications (Herlitz et al. 2025 systematic review). Three live mechanism candidates: (a) innate-expression release in resource-rich environments (the “constraints removed” reading), (b) reference-group / self-anchoring artifacts in self-report measurement (in less egalitarian societies people may rate themselves mainly against same-gender peers, which masks differences, while in more egalitarian societies they compare across genders, which makes self-reported differences look larger), (c) wealth and freedom confounds that correlate with gender-equality indices. The pattern is robust; the mechanism is not.

The full assortative-mating-corrected psychiatric cross-disorder correlation matrix. Border et al. 2022 (Science) showed that cross-trait assortative mating accounts for about 74% of variance in reported psychiatric cross-disorder genetic correlations across 132 trait pairs in UK Biobank. Applied at scale to the full Psychiatric Genomics Consortium cross-disorder matrix, the corrected correlations would likely shrink, and some “shared underlying biology” claims about the p-factor and cross-disorder pleiotropy would weaken. As of late 2024 a method (LAVA-Knock; Ma, Wang, Border et al.) has emerged that systematically corrects for this. The full re-analysis is active research and likely answerable in 2-3 years.

A handful of other questions sit in the same “framable but not yet answerable” category — what “non-shared environment” actually is at the mechanism level, the cause of the Flynn Effect’s recent reversal in some cohorts, whether the positive manifold of cognitive ability is itself shifting across cohorts (Pietschnig 2024). The pipeline’s lit review and topology cover them in more detail.

6. What this means for action

The most action-relevant single insight in the topic is the asymmetry of environmental effects: severe insults at the negative tail cost far more than optimization within the normal range adds. Most parents and most policy operate as if the asymmetry ran the other way, as if optimizing within the normal range were where the leverage is. The data say it isn't.

For parents. The big levers are at the negative tail. Prevent severe insults: lead exposure (still meaningfully present in some housing stock), prenatal alcohol (fetal alcohol exposure at its most severe is associated with deficits of about 30 IQ points), severe early malnutrition, untreated iodine deficiency, severe deprivation. Within the Western normal range, additional optimization of parenting style, enrichment activities, and educational supplements yields a few IQ points at most. The empirical literature finds that the within-family contribution of “what parents do” to adult personality is essentially zero, and that the within-family contribution to adult cognitive ability is small relative to direct biology and to schooling itself. Anxiety about “optimizing” within the normal range is mostly misallocated. This is not a license to be neglectful (neglect is itself a severe insult), but it is a license to relax about whether one is doing exactly the right enrichment activity. The big things are protecting against severe insults and ensuring schooling. (See the explorer’s child cognitive ability trait page for the variance breakdown that supports this: at age 5 the family bucket is ~52% of variance and shrinks to ~34% by adulthood, while the actionable environmental tail concentrates in severe insults.)

For policy. Lead remediation, iodine fortification, fetal-alcohol prevention, basic nutrition, and schooling access are the highest-effect-per-dollar cognitive interventions ever measured. Universal pre-K and similar middle-of-the-distribution interventions show genuine but smaller effects. Programs targeted at “enrichment above normal” generally do not move long-term outcomes at meaningful effect sizes. Public-health interventions on smoking show the analogue at the behavioral level: tobacco taxation, public-smoking bans, and minimum-age-of-sale laws cut US adult smoking prevalence from about 42% in 1965 to about 12%, roughly a 70% reduction over sixty years, despite a within-cohort heritability of 0.50. Environmental change at population scale is not blocked by within-cohort heritability.
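Why a within-cohort heritability of 0.50 does not block a large population-wide drop can be sketched with a liability-threshold model. This modeling choice is an assumption for illustration: the text does not say smoking was analyzed this way. The prevalence figures (about 42% in 1965 and about 12% now) come from the pass 2 notes. The simulation gives every person the same environmental shift and recomputes the variance ratio within each cohort; all function names are hypothetical.

```python
import random
from statistics import NormalDist

H2 = 0.50                          # within-cohort heritability quoted in the text
PREV_1965, PREV_NOW = 0.42, 0.12   # US adult smoking prevalence from the pass 2 notes


def relative_reduction(before, after):
    """Proportional fall in prevalence."""
    return (before - after) / before


def liability_shift(before, after):
    """Fall in mean liability (in SDs of total liability) that moves the share
    of people above a fixed threshold from `before` to `after`."""
    z = NormalDist().inv_cdf
    return z(1 - after) - z(1 - before)


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def simulate_cohort(n, env_shift, threshold, rng):
    """Liability = G + E with Var(G) = H2 and Var(E) = 1 - H2.
    `env_shift` lowers every person's E by the same amount, i.e. a
    population-wide environmental change that leaves all variances untouched."""
    g = [rng.gauss(0.0, H2 ** 0.5) for _ in range(n)]
    e = [rng.gauss(-env_shift, (1 - H2) ** 0.5) for _ in range(n)]
    liability = [gi + ei for gi, ei in zip(g, e)]
    h2_hat = variance(g) / variance(liability)
    prevalence = sum(x > threshold for x in liability) / n
    return h2_hat, prevalence


rng = random.Random(2026)
threshold = NormalDist().inv_cdf(1 - PREV_1965)   # held fixed across cohorts
shift = liability_shift(PREV_1965, PREV_NOW)
h2_then, prev_then = simulate_cohort(100_000, 0.0, threshold, rng)
h2_now, prev_now = simulate_cohort(100_000, shift, threshold, rng)

print(f"relative reduction: {relative_reduction(PREV_1965, PREV_NOW):.0%}")  # 71%
print(f"required mean shift in liability: {shift:.2f} SD")
print(f"earlier cohort: h2 ~ {h2_then:.2f}, prevalence ~ {prev_then:.2f}")
print(f"later cohort:   h2 ~ {h2_now:.2f}, prevalence ~ {prev_now:.2f}")
```

Under these assumptions, a shift of just under one liability SD takes prevalence from about 42% to about 12%, while the variance ratio stays near 0.5 in both cohorts. Heritability is computed within a cohort, so it is blind to a change that moves everyone together.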

For individuals. The within-individual story splits cleanly along trait-class lines. Traits with moderate heritability and a large environmental + chance contribution (depression, anxiety, neuroticism-related affect, self-control, subjective wellbeing) move at clinically meaningful effect sizes under behavioral or pharmacological intervention. CBT moves anxiety and depression at d ≈ 0.7 vs. control. Mindfulness, exercise, and behavioral activation move neuroticism-related outcomes modestly. Social connection, meaning, and physical activity move wellbeing baselines persistently. (See the anxiety, depression, and subjective wellbeing trait pages for the breakdowns.) Traits with high direct-genetic heritability and a small environmental + chance contribution (adult cognitive ability, height, schizophrenia, autism) show much smaller within-individual responsiveness to intervention once the developmental window has closed. Cognitive ability post-adolescence does not move much from intervention; height post-adolescence doesn’t move at all; schizophrenia and autism respond to treatment at the level of symptoms, not at the level of underlying liability. The “biology is destiny” framing is wrong (you have substantial behavioral leverage on the moderate-heritability traits); the “I can rewrite myself with willpower” framing also exceeds what the literature supports (the high-heritability traits don’t move much). The honest middle is trait-specific: know which side of this split your trait of interest sits on before deciding how much effort to invest.

7. Closing

The science of psychological variation is in better shape than its public discourse. Within the field, behavior geneticists, social-genomics researchers, and developmental psychologists have substantially converged on the picture this writeup describes. Outside the field, almost every direction of motivated reasoning continues to cite the slice of evidence it likes and ignore the rest. The earlier stages of the pipeline — the lit review, the topology, the model formalization, the data pipeline, and the interactive explorer — carry the technical detail behind every claim above.

What I would most want a reader to walk away with: a calibrated humility about what is known, a clean separation between what the science says and what motivated reasoning loads onto it, and the asymmetry finding. If a reader can leave with only one of “the field is full of contested empirical claims” and “severe environmental insults are the big lever and within-normal optimization is mostly noise,” the second is more useful, even though the data support both.


Iteration history

Pass 1 2026-04-29

decomposition · translation · integration · connections

Why: The pipeline produced a topology, a formalization, a data pipeline, and a build artifact. Each is correct and useful in its own register, but none alone is what an educated lay reader who wants to understand "how and why people psychologically differ" would actually pick up and read end-to-end. The writeup is that document: the long-form synthesis that takes everything earlier stages produced and renders it accessible without sacrificing the technical claims.
- Wrote a 3-paragraph TLDR up top covering the headline finding, the four traps, and what it means for action
- Section 1 frames the field as a politics-minefield with four motivated-reasoning directions, all of which cite real evidence but ignore other real evidence
- Section 2 defines the field-level vocabulary (heritability, GWAS, polygenic score, assortative mating, gene-environment correlation, etc.) in plain language, with the plant-and-pots analogy that anchors the population-variance-vs-individual-partition distinction
- Section 3 walks through the six big ideas the integrated picture supports — each illustrated with concrete numbers and at least one trait example
- Section 4 unpacks the four motivated-reasoning traps explicitly, with what each cites correctly, what it ignores, and the integrated reading
- Section 5 names the three open questions the field has not resolved (Plomin/Turkheimer on polygenic scores, the Gender Equality Paradox mechanism, full assortative-mating-corrected psychiatric cross-disorder correlations)
- Section 6 converts the empirical findings into action-relevant guidance for parents, policy, and individuals — the asymmetry finding is the single most consequential one
- Section 7 closes with calibrated humility and the connection to the rest of the pipeline

Pass 2 2026-04-29

error check · cross-context verification · connections · internal consistency

Why: Three numerical claims in pass 1 were directionally right but imprecise enough that an academic reader could push back. The smoking-prevalence-fall claim was overstated (said 80% in 50 years; the US data is closer to 70% over 60 years). The PGS-portability-decay range was widened to "30-80%" when the actual Martin 2019 numbers are 37/50/78% in three specific ancestries. The Flynn Effect "+30 points across the twentieth century" needed cohort scope because recent decades show plateaus or partial reversals in some countries. Separately, the writeup linked to the explorer only once, at the end; strategic inline links at the asymmetry section, the four-traps section, and the action section would help a reader use the explorer instead of treating it as supplementary.
- Fixed smoking claim from "rates collapsed 80% in 50 years" to "US smoking prevalence fell from ~42% in 1965 to ~12% today, about a 70% reduction over sixty years" — accurate to NHIS time-series
- Tightened PGS portability claim from "30-80%" to "37%, 50%, and 78% in South Asian, East Asian, and African ancestries respectively (Martin 2019)" with the accompanying Ding 2023 continuous-distance r=−0.95 across 84 traits
- Added cohort scope to Flynn Effect claim: "average IQ rose roughly 25-30 points across mid-20th-century cohorts in most measured populations, before plateaus and partial reversals in some countries from the 1990s onward (Pietschnig & Voracek 2015 meta-analysis)"
- Added inline link at end of section 3.4 to the explorer's "Asymmetry" view; readers who want to engage with the per-exposure forest plot are pointed to the interactive treatment
- Added inline link at end of section 4 to the explorer's "Four traps" view, which has the full cited/ignored/integrated breakdown for each direction with trait-specific applications
- Added inline links from section 6 (Action) to specific explorer trait pages — adult cognitive ability, schizophrenia, anxiety, height — so a reader can move from "what to do" to "what does this look like for the trait I care about"
- Tightened "for individuals" subsection in section 6: now distinguishes between traits that respond to intervention (anxiety, depression, neuroticism, self-control via behavior change) and traits that mostly do not at the within-individual level (cognitive ability post-childhood, personality after early adulthood)
- Verified internal consistency: all numbers in writeup match the explorer (Method C bucket numbers) and the data stage CSV (h² values, m values, structural-inflation percentages, environmental effect sizes, multivariate-D numbers)

Pass 3 2026-04-29

gap scan · error check · adversarial + steelman · compression

Why: Three things stood out on a careful re-read. First, the Wilson Effect (heritability of cognitive ability rising from ~20% at age 5 to ~80% in adulthood) is one of the most empirically robust findings in the field and was missing entirely from the writeup; without it, section 6 reads "adult cognition does not move from intervention" as fixed rather than as the back-end of a developmental curve. Second, the SNP h² recovery numbers in section 3.1 were directly inherited from the model formalization without independent verification: "50–70% for cognition" is wrong (common-SNP h² for IQ is ~0.25 against twin h² ~0.79, so recovery is ~25-40%), and "85% for height" conflated common-SNP recovery with WGS-inclusive recovery. Third, the writeup treats within-family h² as the canonical "direct biology" estimate but never acknowledges that within-family designs have their own assumptions. An academic reader would push on this, and brief acknowledgment improves credibility without weakening the argument.
- Added new section 3.5 "Heritability is developmental, not static — the Wilson Effect" (~250 words) covering the cognitive-ability heritability rise from ~0.20 at age 5 to ~0.80 in adulthood, the active gene-environment-correlation mechanism, and the policy implication that childhood is environmentally most malleable. Sets up the section 6 "for individuals" claim that adult cognitive ability does not move much from intervention as the back-end of a developmental curve, not a fixed-trait claim
- Renumbered the prior section 3.5 "Heritable doesn't mean fixed" to 3.6, and prior 3.6 "Lewontin firewall" to 3.7. Section 3 header updated from "Six big ideas" to "Seven big ideas"
- Fixed SNP h² recovery numbers in section 3.1: was "about 85% for height, 50–70% for cognition, 30–40% for educational attainment"; now "about 60% for height with common SNPs alone (rising to ~80% when whole-genome sequencing captures rare variants), about 25-40% for cognitive ability, about 30-50% for educational attainment" — verified against Yengo 2018, Davies 2018, Lee 2018
- Added brief within-family-design caveat to section 3.3: within-family designs are not assumption-free either (siblings receive equally similar parental treatment and equally similar non-genetic exposures, approximately but not exactly), but they remove the largest twin-design biases and the published estimates are mutually consistent
- Tightened section 4 D2 "40-60% structural inflation for socially-structured traits" to "30-60%, with educational attainment specifically over 60%" — matches the explorer's trait-specific numbers more accurately
- Compressed section 7 closing by ~70 words: removed redundant restatement of the take-aways the reader has just spent the last 4,000 words reading

Pass 4 2026-04-29

readability · fresh-eyes audit

Why: Two prose-level polishes surfaced on a fresh-eyes read. The TLDR opening was a single 50+ word hold-the-thread sentence ("Across about fifty years of twin studies, twenty years of genome-wide DNA work, and the past five years of within-family designs that strip out the parts of..."), which was parseable but required concentration to track three time-spans before the verb arrived. Splitting it into two sentences removes that load. Separately, the new section 3.5 (Wilson Effect) and section 3.6 (heritable ≠ fixed) are parallel ideas (heritability is context-dependent within the life course; heritability is context-dependent across cohorts), but the writeup had no transition between them, so the reader saw them as two unrelated points rather than a single insight at two scales.
- TLDR opening sentence split: "Across about fifty years of twin studies, twenty years of genome-wide DNA work, and the past five years of within-family designs that strip out the parts of 'genetic' effect that aren't actually direct biology, the science of why people psychologically differ from one another has converged on a picture that almost nobody describes accurately in public" → "Behavior genetics has now had about fifty years of twin studies, twenty years of genome-wide DNA work, and the past five years of within-family designs that strip the structural inflation out of older 'genetic' estimates. The science has converged on a picture of why people psychologically differ — and almost nobody in public describes it accurately."
- Added bridge sentence at the start of section 3.6: "The Wilson Effect is the within-life-course version of a more general truth: heritability is context-dependent. The same shape shows up across cohorts." This reframes 3.5 and 3.6 as a single insight at two scales rather than two unrelated findings

Pass 5 2026-04-29

error check · truth/accuracy override on bias

Why: A reviewer caught a real and consequential error in the framing of section 3.3 (and the parallel treatment in the explorer): the writeup attributed the structural inflation of twin-based heritability (the gap between twin h² and within-family h² for socially-structured traits) partly to assortative mating creating linkage between trait-relevant alleles. The reviewer pointed out that AM actually biases Falconer's twin formula 2(rMZ - rDZ) DOWNWARD, not upward: under positive AM, parents are genetically more similar than chance, so DZ twins share more than 50% of trait-relevant alleles, raising rDZ relative to rMZ and shrinking the formula's output. The dominant source of the twin-vs-within-family gap for socially-structured traits is actually genetic nurture (parents passing on alleles AND correlated rearing environments), with direct empirical anchors in Kong 2018 (non-transmitted PGS effect = 29.9% of transmitted) and Okbay 2022 EA4 (within-family direct ~50% of population PGI). AM is a real phenomenon at the V(A) population level (Crow-Felsenstein LD inflation; Yengo 2018 estimates 14-23% V(A) inflation for height) but its effect on twin estimates runs opposite to what I had it doing. This is an embarrassing error to have propagated through pass 4, but it's correctable now and the empirical decomposition (genetic nurture as the dominant gap source) actually has stronger primary-source backing than the AM-as-inflation claim ever did.
- Fixed writeup TLDR: was "a sizable fraction of what gets called genetic in twin studies is actually structural inflation from people pairing with similar partners and from the environments parents create" → now "a sizable fraction of what gets called genetic in twin studies is actually environmental in origin — specifically, environmental effects mediated through parents who transmit both the alleles AND the correlated rearing environment (a phenomenon called genetic nurture)"
- Substantially rewrote section 3.3: leads with genetic nurture as the dominant source of structural inflation in twin h² for socially-structured traits, with empirical anchors (Kong 2018: 29.9% non-transmitted PGS effect; Okbay 2022: within-family direct ~50% of population PGI). AM is now treated correctly: it is a real population-level phenomenon (Crow-Felsenstein LD inflation of V(A)) but its effect on Falconer's twin formula runs OPPOSITE to genetic nurture's effect — AM raises DZ correlation relative to MZ correlation, biasing 2(rMZ-rDZ) downward. The two effects partially cancel; the net twin-vs-within-family gap is dominated by genetic nurture (and other classical-ACE biases like EEA violations), not AM
- Fixed section 2 vocabulary entry on within-family designs: was "control for between-family confounds — including the assortative-mating-induced linkage and the parental-environment effects" → now "primarily the parental-environment effects mediated through shared parental genes (genetic nurture), plus assortative-mating-induced linkage at the population level"
- Did NOT change section 5's mention of "the full assortative-mating-corrected psychiatric cross-disorder correlation matrix" — the Border 2022 finding that cross-trait AM inflates cross-disorder rg estimates is a separate, well-supported phenomenon (xAM creates between-trait LD which inflates apparent rg). That part of the framing was correct
- Did NOT change section 4 D2 trap card's "30-60% structural inflation" claim — the empirical numbers are correct (twin-WF gap as percentage of twin h²); the claim doesn't attribute to AM specifically
- Flagged for separate refinement: the explorer's trait family-bucket notes attributed inflation to AM in many places (especially the AM-strong psychiatric traits and high-AM attitude traits); take-away 3 had the same wrong-direction framing. Fixed in the same commit but tracked under the build artifact's pass 4 refinement log
- Flagged for future passes (not done in this pass): the model formalization §3.1 partition formula V(A_LD) = m·h² is mathematically correct as a Crow-Felsenstein population-level decomposition of V(A_AM) into V(A_d) + V(A_LD), but its labeling as "explains the twin-vs-within-family gap" is misleading because Falconer's twin estimate is itself biased downward by AM. The data stage's H2 prediction tests the partition mathematically (which holds) but its interpretation as "AM-induced inflation of twin h²" needs the same correction. Both stages are at later refinement passes and would need their own log entries
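Pass 5's correction, that assortative mating pulls Falconer's estimate down rather than up, can be checked with a few lines of arithmetic. This is a minimal sketch with several assumptions. It uses a pure AE model (no shared environment, no dominance, no genetic nurture) and equilibrium assortment on the phenotype with mate correlation mu. It also relies on the textbook approximation that mates' breeding values then correlate at h² × mu, so DZ twins have an additive genetic correlation of 0.5 × (1 + h² × mu). The function names are illustrative only.

```python
def dz_additive_correlation(h2, mate_corr):
    """Additive genetic correlation between DZ twins when parents assort on the
    phenotype: 0.5 * (1 + h2 * mate_corr). Assumes equilibrium and that the
    correlation between mates' breeding values is h2 * mate_corr."""
    return 0.5 * (1 + h2 * mate_corr)


def twin_correlations(h2, mate_corr):
    """Phenotypic MZ and DZ correlations under a pure AE model (no C, no D,
    no genetic nurture)."""
    r_mz = h2
    r_dz = h2 * dz_additive_correlation(h2, mate_corr)
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Falconer's twin heritability estimate 2 * (rMZ - rDZ)."""
    return 2 * (r_mz - r_dz)


true_h2 = 0.5
for mate_corr in (0.0, 0.2, 0.4):
    estimate = falconer_h2(*twin_correlations(true_h2, mate_corr))
    # Algebraically the estimate equals true_h2 * (1 - true_h2 * mate_corr).
    print(f"mate correlation {mate_corr:.1f}: Falconer h2 = {estimate:.3f} (true {true_h2})")
```

With a true h² of 0.5, mate correlations of 0.2 and 0.4 shrink the twin estimate to 0.45 and 0.40. AM alone therefore cannot explain twin estimates sitting above within-family ones.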