Model pass 6

Model

Generator-verifier loop with a per-task autonomy slider. The per-task value function V(u, v; θ) decomposes into four orthogonal channels (quality, attention, risk, skill); V is exactly bilinear in (u, v) so per-task optima land at three corners — do-yourself, self-automator, spec-driven. Centaur and cyborg arise as aggregate-level patterns from cross-sub-task corner mixing. Portfolio aggregation under a daily attention budget surfaces the Lagrangian shadow price μ that reroutes longer tasks first when budget binds. Five cruxes, six Stage-4 fitting targets, three engaged objections. Interactive two-tab dashboard included.

TLDR

The lit review documents a research landscape; the topology stripped it down to load-bearing structure. This stage formalises the cleanest target the topology surfaced: a generator-verifier loop with a per-task autonomy slider, designed to survive capability change rather than encode a snapshot of any specific model’s frontier. The optimisation target is output quality per unit of human attention for an individual knowledge worker. The formalisation makes three moves at once — decomposition (four orthogonal value channels: quality, attention, risk, skill), generating function (a per-task value function whose corner solutions reproduce the five empirically observed workflow modes), and integration (a single formalism that composes Karpathy’s slider, Mollick’s typology, Vasconcelos verification-economics, Bastani’s atrophy, Bainbridge’s substitution myth, and Madras-Mozannar L2D into one object).

The per-task value function is V(u, v; θ) = Q(u,v) − α·A(u,v) + λ·S(u,v) − σ·R(u,v), where u is the autonomy level (fraction of the task delegated to AI) and v is the verification depth (fraction of AI output independently checked). The four channels are conceptually distinct mechanisms: quality Q rewards letting the better agent do the work, with verified output achieving the complementary-product ceiling c_⋆ = c_AI + (1 − c_AI)·c_H (either AI was right or human catches the error); attention A charges for human time and for the irreducible monitoring cost even at full delegation (the L1 substitution-myth invariant baked in via ε > 0); risk R penalises uncaught AI errors at a rate proportional to stakes σ; skill S rewards practice and penalises unverified delegation (Bastani-style atrophy). V turns out to be exactly bilinear in (u, v) — collecting terms gives V = K_0 + K_u·u + K_v·v + K_uv·u·v — which means the maximum on the unit square is always at a corner. Three corners are candidates: (0, 0) do-yourself, (1, 0) self-automator, (1, 1) spec-driven (the corner (0, 1) is dominated since verifying with no AI involvement is pure cost). Centaur and cyborg modes do not arise as per-task optima — they appear only as aggregate-level patterns when a worker mixes corner policies across sub-tasks with heterogeneous θ. This is a substantive prediction, not a limitation.

The portfolio extension aggregates per-task decisions under a daily attention budget. The headline result the model is designed to produce is S1 (workflow architecture > model capability): on the same task mix and same c_AI, optimal routing dominates “max-AI” (self-automate everything) and dominates the naive flat-cyborg heuristic (u=0.7, v=0.3 everywhere). The bilinearity finding sharpens this: the naive flat-cyborg policy is exactly the failure mode — it applies an interior (u, v) value that the bilinear structure says no individual sub-task should land at. Optimal routing differentiates across tasks (different corners for different θ); the aggregate (ū, v̄) across the day looks interior because the corners differ, not because any single decision is interior. The generating function is parameterised by capability (the L3 invariant): if c_AI rises uniformly across tasks, the model rebalances; if it rises only on certain task types, the boundary of optimal u* shifts but the shape of the routing rule stays put.

This stage produces seven things: (1) a math object — V(u, v; θ) and the four-channel decomposition; (2) a workflow-mode classifier — three-corner per-task router plus a five-region label-partition of the (u, v) plane for observed worker behaviour; (3) a portfolio aggregator with budget-aware shadow-price routing (μ-binary-search over per-task α_eff = α + μ·g); (4) the interactive two-tab dashboard below; (5) five cruxes of the model (load-bearing claims whose collapse rebuilds it); (6) six Stage-4 fitting targets (parameter calibrations and qualitative predictions the data pipeline should test); (7) three engaged objections (c_AI unobservable, model just recovers practitioner intuitions, model is single-shot not strategic) with steelmen and what survives. Scope is explicit: the formalisation does not capture the aggregate-zero puzzle (E4/O2 — organisational dynamics), cross-task productivity bundling (G8/Cowen), c_AI miscalibration on novel tasks (O7), sycophancy as a verification-degrader (E13), or frontier migration over time (O4). These are named scope-limits, not silent assumptions.

Task parameters (θ)

c_HHuman capability on this task0.50

c_AIAI capability on this task0.65

φVerification cost ratio (verify / generate)0.30

σStakes (uncaught-error penalty weight)0.40

λSkill-formation value (preserve this skill?)0.40

Task presets

AI mildly stronger, modestly cheap verification, moderate stakes. Optimum lands at spec-driven — AI does the synthesis, you read the output.

Optimal policy

1.00

autonomy

1.00

verification

+0.295

net value

Workflow mode

Spec-driven / independent-then-synthesize

(u, v) = (1.00, 1.00) → spec-driven

Channel decomposition

Q quality

+0.325

A attention

+0.550

R risk

+0.000

S skill

-0.400

Each bar shows that channel's contribution to V at the optimum. Q is gain over the human-only baseline; A is attention saved (or spent) vs. the M-only floor; R is the stakes-weighted risk penalty; S is the skill change weighted by λ.

Diagnostics

Q (quality) = 0.825, A (attention) = 0.530, R (risk) = 0.000, S (skill) = 0.000

c_⋆ = c_AI + (1 − c_AI)·c_H = 0.825

V is bilinear in (u, v); the maximum on the unit square is at a corner. The router will return one of three corners — (0, 0) do-yourself, (1, 0) self-automator, or (1, 1) spec-driven. Centaur and cyborg modes appear at the aggregate level when a worker mixes corner policies across sub-tasks with heterogeneous θ — see Day Portfolio.

V(u, v; θ) = Q(u,v) − α·A(u,v) + λ·S(u,v) − σ·R(u,v). Constants: α = 1.00 (normalised), ε = 0.15 (residual attention at u=1; L1 invariant), β = 0.05 (per-task atrophy rate), M = 0.08 (routing tax). Mode thresholds: u_lo = 0.15, u_hi = 0.85, v_lo = 0.3, v_hi = 0.6. Optimum found by 41×41 grid search on the unit square. Stage-4 fitting will tighten α, ε, β, M against telemetry data; mode-distribution match against Randazzo BCG sample (~60% cyborg / ~30% centaur / ~10% self-automator) is Q3 of the named fitting targets.

How to read this stage

The dashboard above is the artifact. Everything below is the spec.

Two interactive surfaces:

Per-task router. Inputs: (c_H, c_AI, φ, σ, λ) for one task. Outputs: optimal (u*, v*), the workflow-mode label, and a four-bar decomposition showing which channel dominates. Use this to answer “for a task with these characteristics, what’s the right way to use AI?”
Day portfolio. Inputs: a basket of task types with counts. Outputs: total quality, total attention, total skill change under four strategies (always-self / max-AI / naive cyborg / optimal routing). Use this to see the S1 effect — same AI, different workflow architectures, very different outcomes.

If the dashboard says one thing and your gut says another, the diagnostic is to check (a) whether your (c_H, c_AI) estimates are calibrated and (b) whether the constants (α, ε, β, M) are calibrated for your task density. This is exactly what Stage 4 (the data pipeline) is for.

1. The formalisation moves

Three things this stage does — explicit so they’re inspectable separately.

Move 1 — Decomposition. The per-task value V splits into four orthogonal channels (Q, A, R, S). “Orthogonal” here means: each channel responds to (u, v) in a distinguishable way, so when a slider moves, the user can see which channel is driving the change. This isn’t a stylised choice — it’s how the topology’s mechanism nodes (G3, G7, G9, L1) actually compose. Without the decomposition, “the gain from AI” is a black box; with it, the user can ask “is this gain coming from quality, time-saved, or skill?” and answer.

Move 2 — Generating function. The five workflow modes (P1 / E10) are not given as a typology to be matched. They are generated as solutions to argmax V(u, v; θ) under different parameter regimes. This is the L3 invariant in operational form: change θ, the optimum moves, the mode label changes — but the function that produces the mode is fixed. A practitioner who hardcodes “use Cursor for boilerplate, ChatGPT for strategy” gets a lookup table; this gets a generating function.

Move 3 — Integration. Six prior objects compose into one: Karpathy’s autonomy slider (P2 — the u axis), Shneiderman’s 2D framework (L4 — the (autonomy, control) plane is the (u, v) plane), Vasconcelos verification-economics (G3 — the −α·v·φ term), Bastani’s guardrail finding (E9 — the −β·u·(1-v) skill term), Bainbridge’s substitution myth (L1 — the ε > 0 residual attention), and Madras-Mozannar L2D (L2 — the joint-surface optimisation at portfolio level). None of these alone is the model; the model is what they jointly imply.

What’s not yet ready for formalisation, kept in §9 as scope-limits: cross-task bundling (G8), organisational absorption (E4/O2), miscalibration of c_AI (O7), sycophancy as quality-degrader (E13), frontier migration over time (O4).

2. Variables and objects

Decision variables (per task), continuous on the unit square:

Symbol	Range	Meaning
`u`	[0, 1]	Autonomy level — fraction of generation delegated to AI
`v`	[0, 1]	Verification depth — fraction of AI output independently checked

Task parameters (the vector θ):

Symbol	Range	Meaning
`c_H`	[0, 1]	Human capability on this task type
`c_AI`	[0, 1]	AI capability on this task type
`φ`	≥ 0	Verification-cost ratio (verify-time / generate-time)
`σ`	[0, 1]	Stakes — weight on uncaught-error penalty
`λ`	[0, 1]	Skill-formation value — how much the worker cares about preserving this skill

Constants (calibrated, not per-task):

Symbol	Default	Meaning
`α`	1.00	Attention price (normalisation)
`ε`	0.15	Residual attention at full delegation — L1 invariant (substitution myth)
`β`	0.05	Skill-atrophy rate per unit of unverified delegation
`M`	0.08	Per-task metacognitive routing tax — A1/G1 invariant
`c_⋆`	`c_AI + (1−c_AI)·c_H`	Verified-output ceiling — “either AI got it right OR human catches the error” (complementary product of independent error events)

The constants are calibrated to the lit-review anchors (Mozannar CUPS for ε; Bastani 17% drop for β; Tankelevitch metacognitive load for M). They can be re-fit by Stage 4 against telemetry data.

3. The per-task value function

V(u, v; θ) = Q(u, v) − α·A(u, v) + λ·S(u, v) − σ·R(u, v)

3.1 Quality channel `Q`

Q(u, v) = (1 − u)·c_H + u·[(1 − v)·c_AI + v·c_⋆]

With probability (1 − u) the human did the generation and quality is c_H. With probability u the AI did the generation, of which fraction (1 − v) ships unverified at quality c_AI and fraction v is verified. Verified output achieves quality c_⋆ = c_AI + (1 − c_AI)·c_H = 1 − (1 − c_H)·(1 − c_AI) — the probability that either the AI got it right or (it didn’t and) the human catches the error, treating the two error events as independent. Linear in u for fixed v; linear in v for fixed u.

The complementary-product form natively handles the deskilled-verifier limit (c_H → 0 ⇒ c_⋆ → c_AI — verification adds nothing when the human can’t recognise errors) and the verifier-stronger limit (c_H → 1 ⇒ c_⋆ → 1 — careful human verification approaches a quality ceiling). Pass 1 used c_⋆ = max(c_H, c_AI), which over-credits verification when c_H > c_AI (as if the human catches every AI error) and under-credits it when c_H < c_AI (as if a partly-skilled human catches no AI errors). The form here treats both with one expression and one structural assumption (independence of human and AI error events).

Caveat: c_H enters both the generation cost (when u = 0) and the verification benefit (the extra catching power at v > 0). Karpathy’s G9 generator-verifier-asymmetry says verification is typically easier than generation — recognising an error costs less than producing a correct answer from scratch. A more accurate model would carry a separate c_V (verifier capability) ≥ c_H for low-c_H workers. Held for Stage 4; named in §9 scope-limits.

3.2 Attention channel `A`

A(u, v) = (1 − u·(1 − ε)) + v·φ + M

Three pieces:

(1 − u·(1 − ε)): human-side generation cost. At u = 0, the human does it all (cost = 1, the base generation time). At u = 1, residual attention ε remains — every offload creates monitoring/coordination work (Bainbridge L1). ε > 0 is the L1 invariant in operational form: a model with ε = 0 would predict that full delegation is free of attention cost, which is exactly the substitution myth.
v·φ: verification cost. Linear in verification depth, scaled by the per-task verification-cost ratio φ. This is the G3 (Vasconcelos verification-economics) term.
M: per-task metacognitive routing tax. Constant per task — classifying, choosing the workflow, monitoring AI for handoffs. Tankelevitch’s metacognitive-demand finding (G1) compressed to a constant; Stage 4 can test whether M is task-type-dependent.

Note that ε is the only term that decouples attention from the (u, v) decision. It is what makes “max-AI” not free.

3.3 Risk channel `R`

R(u, v) = u·(1 − v)·(1 − c_AI)

Probability-of-uncaught-error: AI generated the output (u), it was not verified (1 − v), and the AI was wrong (1 − c_AI). Multiplied by stakes σ in the value function. This is the G2 (ironies-of-automation) term: rare critical errors get missed precisely when delegation is high and verification is low.

For high-σ tasks, this term is large enough to drive v → 1 (the spec-driven / independent-then-synthesize regime, E8 — Everett 2025). For low-σ tasks, it’s negligible and the optimum can sit at v = 0 without harm.

3.4 Skill channel `S`

S(u, v) = (1 − u) − β·u·(1 − v)

Two terms:

(1 − u): practice — the human builds skill on the fraction of the work they did themselves.
−β·u·(1 − v): atrophy — unverified delegation erodes capacity. Verification preserves engagement (this is Bastani’s “hint mode” finding E9: guardrails → no atrophy). The product u·(1 − v) is exactly the self-automator regime where atrophy is fastest.

λ is the worker’s per-task valuation of preserving the skill. For tasks the worker explicitly wants to maintain capacity on (their core craft), λ is high and the skill term has bite. For boilerplate or one-off tasks, λ is low and skill is correctly ignored.

3.5 Putting it together

V(u, v; θ) =   (1 − u)·c_H + u·[(1 − v)·c_AI + v·c_⋆]            ← Q
             − α·[(1 − u·(1 − ε)) + v·φ + M]                       ← −α·A
             + λ·[(1 − u) − β·u·(1 − v)]                           ← +λ·S
             − σ·[u·(1 − v)·(1 − c_AI)]                            ← −σ·R

Five parameters in θ, two decisions, four channels. Exactly bilinear in (u, v): collecting terms,

V(u, v; θ) = K_0 + K_u·u + K_v·v + K_uv·u·v

with

K_0 = c_H − α − α·M + λ
K_u = (c_AI − c_H) + α·(1 − ε) − λ·(1 + β) − σ·(1 − c_AI)
K_v = −α·φ (K_v captures the cost of verification when there is no AI to verify, i.e., at u = 0; pure cost, hence ≤ 0 — this is exactly why corner (0, 1) is dominated by (0, 0) below)
K_uv = (c_⋆ − c_AI) + λ·β + σ·(1 − c_AI) = (1 − c_AI)·c_H + λ·β + σ·(1 − c_AI) (always ≥ 0 — verification gain is monotone in u)

Bilinear functions on a unit square attain their maximum at a corner. The interior critical point, when it exists, is a saddle (Hessian eigenvalues ±K_uv). So the per-task optimum is at one of the four corners — and since K_v < 0, (0, 1) is dominated by (0, 0) (verifying when no AI is involved is pure cost). The three meaningful corners:

(0, 0) — do-yourself.
(1, 0) — full delegation, no verification (self-automator).
(1, 1) — full delegation with full verification (spec-driven / independent-then-synthesize).

Which corner wins depends on the signs of K_u, K_u + K_uv + K_v, and K_v + K_uv — three linear comparisons over θ.

4. Optimal policy: the three corners that win

V is bilinear → max at a corner. Of the four corners, (0, 1) is dominated by (0, 0) because K_v < 0 (verifying with no AI involvement is pure cost). Three candidates remain — and a clean decision tree determines the winner:

Spec-driven (1, 1) wins iff K_u + K_v + K_uv > 0 and K_v + K_uv > 0.
Else self-automator (1, 0) wins iff K_u > 0.
Else do-yourself (0, 0) wins.

Equivalently: spec-driven dominates self-automator when α·φ < K_uv (verification cost is below benefit at full delegation: α·φ < (1 − c_AI)·c_H + λ·β + σ·(1 − c_AI)); self-automator dominates do-yourself when K_u > 0 (the attention savings + AI quality gain outweigh skill loss + stakes risk).

Comparative statics — what moves the corner choice:

Parameter increase	Effect on `u*`	Effect on `v` (given `u = 1`)
`c_AI − c_H` ↑ via `c_AI` rising (`c_H` fixed)	↑ (K_u rises)	↓ (K_uv falls: less verification benefit)
`φ` ↑ (verification expensive)	weakly ↓ via v*-flip	↓ (K_v more negative)
`σ` ↑ (stakes)	weakly ↓ (K_u falls by `σ·(1−c_AI)`)	↑ (K_uv rises by `σ·(1−c_AI)`)
`λ` ↑ (skill matters)	weakly ↓ (K_u falls by `λ·(1+β)`)	↑ slightly (K_uv rises by `λ·β`)
`c_AI` ↑ alone	↑	↓ (both `(1−c_AI)·c_H` and `σ·(1−c_AI)` fall)

This is the L3 invariant in tabular form: if c_AI rises uniformly, optimal u* rises and v* falls. The structural shape of the rule does not change. Two signs worth flagging because they’re non-obvious: more reliable AI (c_AI ↑) reduces verification benefit (fewer errors to catch); higher stakes (σ ↑) pushes u* down (refuse AI when stakes are high) and v* up (if you do use AI, verify carefully) — a real tension the bilinear structure makes explicit.

The five practitioner modes — labels for the (u, v) plane

The practitioner literature names five workflow modes. They span the (u, v) plane and are useful as vocabulary for labelling observed worker behaviour at any (u, v):

Region	Mode	Practitioner anchor
`u ≈ 0`	Do-yourself	(no-AI / refuse-AI)
`u ∈ (0, 1)`, `v` high	Centaur	Mollick — clean handoff with verification gate
`u ∈ (0, 1)`, `v` low/mid	Cyborg	Mollick — interleaved, partial verification
`u ≈ 1`, `v` high	Spec-driven / independent-then-synthesize	Everett 2025; Compound Engineering
`u ≈ 1`, `v` low	Self-automator	Randazzo HBS 26-036 (the trap)

The per-task router only returns the three corners — (0, 0), (1, 0), (1, 1) — corresponding to do-yourself, self-automator, and spec-driven. Centaur and cyborg do not arise as per-task optima. They appear empirically as aggregate-level labels when a worker mixes corner policies across sub-tasks with heterogeneous θ — some sub-tasks done alone, others fully delegated; some AI outputs verified, others shipped. The day-level (ū, v̄) averages out to interior values that get labelled “cyborg” or “centaur” depending on the ratio. The Day Portfolio tab demonstrates this directly.

This is a substantive prediction of the formalisation, not a limitation. Bilinearity is faithful to the structure of the problem: V collects to one bilinear form V = K_0 + K_u·u + K_v·v + K_uv·u·v, even though three of the four channels (Q, R, S) carry their own u·v terms — they sum to one consolidated K_uv. The model says: at any single sub-task with a single θ, pick a corner — fully delegate or don’t, fully verify or don’t. The naive flat-cyborg strategy (u = 0.7, v = 0.3 applied to every task) is exactly the failure mode — applying an interior policy uniformly is what no individual sub-task should do under bilinearity.

5. Portfolio aggregation — the day

A worker faces N tasks per day, each with its own θ_i. Total attention budget T. The portfolio problem:

maximise   Σ_i Q_i(u_i, v_i) · count_i
subject to Σ_i A_i(u_i, v_i) · g_i · count_i ≤ T

where g_i is the task type’s base generation time. The Lagrangian gives a shadow price μ ≥ 0 on the budget constraint, and the per-task choice rule becomes

argmax V_i − μ·A_i·g_i  =  argmax V_i with α replaced by α_eff_i = α + μ·g_i

— that is, budget pressure raises the effective attention price, more so for longer-base-time tasks. When μ = 0 the budget is slack and per-task choice is unconstrained argmax V; when μ > 0 the budget binds and α_eff rises until total absolute attention Σ A_i·g_i·count_i fits T. The biasing is structural: raising α_eff raises K_u (self-automator becomes more attractive vs. do-yourself) and makes K_v = −α_eff·φ more negative (verification becomes less attractive). So as the budget tightens, longer tasks reroute first from spec-driven (1, 1) to self-automator (1, 0) — the lowest-A corner.

This is the L2 invariant operationalised at portfolio level. The naive practitioner rule “use AI when AI is better” compares c_H to c_AI task-by-task in isolation; the joint-surface rule compares marginal Q per unit attention saved against the day’s shadow price. They give the same answer when attention is abundant (μ ≈ 0); they diverge sharply when the day is tight.

Strategies the dashboard compares:

Always-self. u_i = 0 for all i. No AI, no atrophy, no verification cost — but no productivity gain. The pre-AI baseline.
Max-AI. u_i = 1, v_i = 0 for all i. The corner uniformly applied. Fast, but high risk on high-σ tasks and accelerated atrophy on high-λ tasks.
Naive cyborg. u_i = 0.7, v_i = 0.3 for all i. A flat interior policy applied uniformly — exactly what the bilinear structure says no individual task should land at. Interior values arise legitimately only as averages over heterogeneous-θ sub-tasks. Applying them uniformly violates the structure and underperforms.
Optimal routing (budget-aware). Per-task corner choice under the shadow-price-adjusted α_eff_i = α + μ·g_i, with μ solved by binary search until the budget is met (or until even uniform self-automator overflows). Each task lands at the corner appropriate to its θ and the day’s binding budget. The aggregate (ū, v̄) across the day looks interior because different tasks land at different corners; the dashboard surfaces μ below the strategy table so the user can see when the budget is biting.

The headline prediction (S1): optimal routing dominates max-AI by quality-per-attention and dominates always-self by attention efficiency, on the same c_AI. The gap between optimal routing on mid-tier AI and naive-flat routing on frontier AI is the empirical bound the model places on “workflow architecture > model capability.” Mechanism: optimal routing differentiates across tasks (different corners for different θ) and reroutes under budget pressure (shadow price μ); naive uniformly applies an interior policy that is structurally never the per-task optimum at any α.

6. Calibration anchors

Where the constants come from. None of these is precise; all are Stage-4 fitting targets.

α = 1 — normalisation. Attention is the numéraire; all other costs are denominated in attention units.
ε = 0.15 — Mozannar CUPS (E12) shows verification + monitoring is a “substantial fraction” of total interaction time even when AI is doing the generation. 15% is a midpoint of the reported range; Stage 4 should tighten this from telemetry.
β = 0.05 — Bastani’s 17% unassisted-performance drop (E9) over a session of ~30 unguardrailed tasks → ~0.5% atrophy per task at u = 1, v = 0. Setting β = 0.05 means the model implies ~5% atrophy per task in the worst-case regime, which compounds to Bastani’s order-of-magnitude over a week. The lit review explicitly notes most studies are under 12 months; β at the per-task scale is what compounds to the longitudinal scale.
M = 0.08 — Tankelevitch (G1) finds metacognitive load is the binding constraint for AI users; CUPS (E12) finds verification + planning consume a meaningful fraction of total time. 8% per task is a calibration anchor consistent with the magnitude of the metacognitive-bottleneck claim. Stage 4 should test whether M varies by task type (it likely does — high-stakes strategic decisions have larger M than routine email).

At the per-task level, the defaults predict (1, 0) self-automator at routine-low-stakes corners (Randazzo’s ~10% empirically), (1, 1) spec-driven where verification benefit dominates verification cost, and (0, 0) do-yourself in outside-frontier regimes. The full Randazzo 60/30/10 (cyborg/centaur/self-automator) distribution emerges only at the aggregate level, when a worker’s day mixes corner policies across heterogeneous-θ sub-tasks — see §4. The defaults predict outside-frontier harm (Dell’Acqua E3) when c_H > c_AI and the worker mis-routes to u > 0 (the model’s prescription is u* = 0 there); the harm is from disobeying the optimum, not from a model output.

7. Worked anchors against the empirical record

Six probes. Each picks a parameter regime, computes the corner optimum exactly, and compares to the lit-review anchor.

A. Brynjolfsson (E1, E2) — customer-service novice +34%, top performers ~0%.

Novice: θ = (c_H = 0.40, c_AI = 0.70, φ = 0.20, σ = 0.20, λ = 0.10). c_⋆ = 0.70 + 0.30·0.40 = 0.82. Then K_u = 0.30 + 0.85 − 0.105 − 0.06 = 0.985 > 0 (full delegation beats do-yourself), and K_v + K_uv = −0.20 + (0.12 + 0.005 + 0.06) = −0.015 < 0 (verification cost barely exceeds benefit). Optimum: (1, 0) self-automator. Quality lift over always-self: c_AI − c_H = +0.30. Direction matches the +34% Brynjolfsson finding (which is in resolution rate, mixing speed and accuracy).

Expert: θ = (c_H = 0.85, c_AI = 0.70, φ = 0.20, σ = 0.20, λ = 0.10). c_⋆ = 0.70 + 0.30·0.85 = 0.955. Then K_u = −0.15 + 0.85 − 0.105 − 0.06 = 0.535 > 0, and K_v + K_uv = −0.20 + (0.255 + 0.005 + 0.06) = +0.12 > 0 (verification benefit dominates because c_⋆ − c_AI = 0.255 is large — a skilled human catches AI errors). Optimum: (1, 1) spec-driven. Quality lift: c_⋆ − c_H = +0.105, only +12% relative.

So the model produces: novice at full-delegate-no-verify (Q = 0.70 from c_AI alone), expert at full-delegate-full-verify (Q = 0.955 = AI augmented by skilled human catching). Both delegate; the difference is in verification depth. The empirical “expert ~0%” finding reflects throughput ceiling (experts already at maximum call rate, can’t redeploy saved attention to more calls) rather than zero quality lift — a context the model doesn’t carry.

B. Dell’Acqua BCG (E3) — outside-frontier 19-pp quality drop. θ = (c_H = 0.70, c_AI = 0.40, φ = 0.30, σ = 0.50, λ = 0.50). c_⋆ = 0.40 + 0.60·0.70 = 0.82. Then K_u = −0.30 + 0.85 − 0.525 − 0.30 = −0.275 < 0. Optimum: (0, 0) do-yourself. If the worker mis-routes to (1, 0), quality drops from c_H = 0.70 to c_AI = 0.40 — a 30 pp loss. The empirical 19 pp reflects partial mis-routing (some subjects partially used AI, some didn’t); the model’s prediction is a clean upper bound on the harm.

C. Bastani PNAS (E9) — guardrails preserve skill. Generic learning task with high skill-formation: θ = (c_H = 0.40, c_AI = 0.70, φ = 0.30, σ = 0.20, λ = 0.50). c_⋆ = 0.82. K_u = 0.30 + 0.85 − 0.525 − 0.06 = 0.565 > 0. K_v + K_uv = −0.30 + (0.12 + 0.025 + 0.06) = −0.095 < 0. Optimum: (1, 0) self-automator.

This is a real and honest finding: at lit-review-anchored constants (β = 0.05), the skill-preservation push toward verification (λ·β·u = 0.025 at u = 1) is too small to flip the corner against verification cost. The guardrail effect is structurally present — at the (1, 1) corner, S = 0 (no atrophy) versus S = −0.05 at (1, 0), so λ·ΔS = +0.025 — but the verification cost (α·φ = 0.30) dominates. Numerically, self-automator beats spec-driven by α·φ − K_uv = 0.30 − 0.205 = 0.095 net at default constants. To make guardrails decisive (flip the corner to spec-driven), the model needs λ·β > α·φ − [(1 − c_AI)·c_H + σ·(1 − c_AI)] = 0.30 − 0.18 = 0.12. At default λ = 0.50, β must exceed 0.24; at β = 0.05, λ alone cannot flip the corner (would require λ > 2.4, impossible since λ ∈ [0, 1]); at default β = 0.05 and λ = 0.50, lowering φ flips the corner once φ < 0.205 — barely cheaper verification than the default 0.30.

Honest reading: the model says the guardrail effect is real but weak at default constants. Stage-4 fitting target Q2 is whether β should be larger to match Bastani’s empirical magnitude. This is not a model failure — it’s the model surfacing a calibration question the lit review left implicit.

D. Everett (E8) — independent-then-synthesize restores complementarity. θ = (c_H = 0.65, c_AI = 0.70, φ = 0.40, σ = 0.90, λ = 0.30). c_⋆ = 0.70 + 0.30·0.65 = 0.895. K_u = 0.05 + 0.85 − 0.315 − 0.27 = 0.315 > 0. K_v + K_uv = −0.40 + (0.195 + 0.015 + 0.27) = +0.08 > 0. Optimum: (1, 1) spec-driven. Mechanism: the σ·(1 − c_AI) = 0.27 term in K_uv makes verification valuable precisely because stakes are high and AI is fallible. Matches Everett’s lit-review story exactly.

E. Randazzo self-automator (E10) — ~10% of consultants in the trap. Routine consulting task: θ = (c_H = 0.70, c_AI = 0.85, φ = 0.20, σ = 0.20, λ = 0.10). c_⋆ = 0.955. K_u = 0.15 + 0.85 − 0.105 − 0.03 = 0.865 > 0. K_v + K_uv = −0.20 + (0.105 + 0.005 + 0.03) = −0.06 < 0. Optimum: (1, 0) self-automator. Self-automator is the correct policy at this θ — the empirical finding “~10% of consultants are self-automators” is about θ-distribution (~10% of work-instances have these characteristics), not about systematic mis-routing. A misread of Randazzo’s data as “self-automator is always wrong” is a category error the model corrects.

F. Schoenegger (E18) — even overconfident GPT improves forecasting +23–43%. This is outside the model’s current formalism. The model attributes any gain at u > 0 to c_AI (advice quality), but Schoenegger’s finding suggests structured reasoning is doing significant work independent of the AI’s confidence calibration. A constant +δ to Q whenever u > 0 would represent this — held for future passes if it changes downstream predictions. Honest gap.

Cross-context note: outcome heterogeneity across these anchors

The six papers above measure different outcome variables. The model’s Q (“probability of correct/high-quality output”) maps cleanly onto Dell’Acqua, Everett, and Schoenegger (all output-quality measures). Brynjolfsson’s “issues resolved per hour” maps approximately onto Q × call-rate, where call-rate depends on attention saved (A); the model’s +34% novice prediction is a Q-only claim, while Brynjolfsson’s empirical 34% bundles throughput. Bastani’s “unassisted retest performance” is a downstream effect of the S channel accumulated over many task instances, not a per-task Q measurement. Randazzo’s “behavioural-mode distribution” is a categorical prediction over which corner the worker chooses, not a quality measure at all. Mozannar’s CUPS data is process telemetry (time fractions across interaction states), informative for α and ε calibration but not for Q.

Stage 4 must disaggregate which constants get fit against which outcome types — pooling them as a single calibration target would silently average over methodological apples and oranges. This is named explicitly as Q1–Q6 in §11.

8. The five cruxes

Load-bearing claims of the model. Collapse rebuilds it.

C1 — Two-axis decision space (u, v). The decision is reduced to autonomy and verification depth. If the actual workflow choice space has more dimensions that matter — e.g., context-engineering depth, prompt-iteration count, tool selection — the model is incomplete. What would flip it: empirical evidence that two workflows with identical (u, v) but different context-engineering produce systematically different outcomes (which the practitioner literature suggests is real — S4).

C2 — Verification effectiveness equals generation skill. c_⋆ = c_AI + (1 − c_AI)·c_H treats c_H as both generation skill and verifier capability — a worker who’d produce 40%-quality output alone catches 40% of AI errors. Karpathy’s G9 generator-verifier-asymmetry says verification is typically easier than generation; a separate c_V ≥ c_H parameter would make low-c_H workers more effective verifiers. What would flip it: empirical evidence that verifier-recognition rates are uncorrelated with generation skill (would require introducing c_V). Deskilled-verifier worry is partially handled by the formula already (c_H → 0 ⇒ c_⋆ → c_AI), but the mechanism of skilled-but-not-creative verifier (the editor archetype) is not in scope.

C3 — ε > 0 is the right operationalisation of L1. The substitution-myth invariant is captured as residual attention at full delegation. If the actual structure is more like “delegation creates new tasks of comparable effort” rather than “delegation creates monitoring overhead”, a constant ε is wrong shape. What would flip it: evidence that delegation-induced work scales with task complexity, not as a flat constant.

C4 — Skill atrophy β is task-type-uniform. All tasks atrophy at the same rate per unit of u·(1-v). Some skills (e.g., motor-procedural) atrophy slower than others (verbal-fluency, calibration). What would flip it: longitudinal data on skill-specific atrophy rates under controlled AI-use exposure.

C5 — Tasks are independent in the portfolio. Σ_i V_i aggregates linearly. Cross-task productivity bundling (G8/Cowen) violates this — productivity gains on related tasks are correlated, not additive. What would flip it: empirical evidence that observed aggregate productivity (E4 / Humlum-Vestergaard zero) is driven by task-coupling effects, not just by individual mis-routing. This is the most likely-to-flip crux: the aggregate-zero puzzle is a smoking gun for it.

9. Scope limits — what the model does NOT capture

Honest disclosure of where the formalisation stops.

Aggregate-zero puzzle (E4 / O2). Humlum-Vestergaard’s zero across 25,000 Danish workers is organisational, not individual — task reorganisation, managerial absorption, coordination costs. The model is individual-level (the A6 crux of the topology); it cannot rebut or explain E4 directly. This is a sibling-artifact problem (organisational-level model), not a parameter to set. The model is locally optimal at the individual level; it is silent about whether aggregate effects emerge.
Cross-task bundling (G8). V is summed across tasks; in reality V_i and V_j covary when the tasks are productivity-linked. C5 names this; the model does not encode it.
Calibration error on c_AI (O7). c_AI is treated as known. In practice users are systematically miscalibrated (E16 — higher AI confidence → less critical thinking; E13 — sycophancy escalation). The model’s c_AI should be the user’s belief about AI capability; if belief and reality diverge, the model is locally optimal under a wrong belief. The topology’s O7 (verification cost vs. verification calibration) is the natural extension.
Sycophancy as quality-degrader (E13). The model assumes verification only helps. Randazzo HBS 26-021 documents AI flipping correct human judgments under pushback — verification can worsen outcomes when sycophancy escalates. Not encoded; would require a c_⋆ that depends on the human’s resistance to AI-pushback.
Frontier migration (O4). c_AI is static within a session. Over months the frontier moves; the user must recalibrate. The model is a snapshot — extending it to a dynamic version would couple c_AI(t) to a learning model of the user’s frontier-mapping rate.
Multi-tool attention interference (G10 — Wickens MRT). The model is per-task; it does not capture the cost of running Cursor + ChatGPT + Slack simultaneously. The topology added G10 specifically to flag this; Stage 4 / Stage 5 should test whether a M_concurrent(N_tools) extension is needed.
Verifier skill ≠ generation skill (Karpathy G9). The model uses c_H as both generation capability and verification effectiveness. Empirically, verification is often easier than generation — recognising an error costs less than producing a correct answer. A c_V ≥ c_H parameter would let low-c_H workers benefit from verification more than the current formula predicts. Held for Stage 4 / future passes; the C2 crux names this.
Partial verification (“skim”) rounds to full or none. Bilinearity makes v* ∈ {0, 1}. The empirical reality of skimming (partial-depth verification at reduced cost and reduced effectiveness) does not map onto the model — the model rounds skim up to full-verify when verification is cheap and down to no-verify when it’s expensive. A future variant with convex verification cost (v²·φ instead of v·φ) or with a separate verification-depth-vs-effectiveness curve would produce interior v* solutions. Not added in this pass to preserve bilinearity’s analytical clarity.

These eight are not bugs of the formalisation. They are the boundary of what an individual-task bilinear generator-verifier loop can carry. Beyond it lies organisational design, dynamic learning, and team-level cognition — sibling artifacts.

10. Adversarial + steelman

Three objections the formalisation has not yet engaged head-on, and the strongest version that survives each.

Objection 1: `c_AI` is unobservable, so the model is unactionable on novel tasks.

Steelman. The optimal policy depends on c_AI, but in any new task or unfamiliar domain the worker doesn’t know c_AI ahead of time. Calibrating c_AI requires running the task and verifying the output — but the model says “compute argmax V using c_AI you don’t have.” For experienced task types c_AI is calibratable from history; for genuinely novel tasks (much of knowledge work) it isn’t. The Vasconcelos / Fok & Weld verification-economics frame already captures this — engagement is rational when verification is cheap; verifying a single AI output IS your way of measuring c_AI for that task type. The model assumes the calibration question is solved when it’s actually the binding constraint.

Why partially right. For genuinely novel tasks where c_AI is unknown, the model cannot make a precise recommendation. Stage-4 fitting target Q3 (mode-distribution match against Randazzo BCG) implicitly assumes calibrated c_AI distributions across BCG-like tasks — only valid for well-studied domains.

Why the strongest version survives. The model doesn’t need precise c_AI; it needs robustness across c_AI ranges, and the corner structure is robust. For almost any c_AI < 0.4 in a high-stakes regime (σ ≥ 0.7), the corner is do-yourself; for almost any c_AI > 0.8 in low-stakes routine (σ ≤ 0.2, low λ), the corner is self-automator. The “interesting” boundary regions — where c_AI uncertainty matters — are precisely the spec-driven regions. So the model’s prescription under uncertainty is: set v = 1 to verify and learn. The spec-driven corner doubles as a Bayesian-update mechanism; the verification cost α·v·φ is the explicit price of resolving the uncertainty. Stage 4 should formalise the explore-vs-exploit dynamics this implies (Q6 in §11).

Objection 2: The model recovers practitioner intuitions and adds no new predictions.

Steelman. Mollick, Karpathy, Anthropic, and Cognition collectively say “match the workflow to the task.” The model’s predictions of corner solutions match Randazzo’s empirical mode distribution. So what is the formalism contributing beyond a formal scaffold for intuitions practitioners already had?

Why partially right. Many model predictions match practitioner consensus on headline conclusions. Self-automator for routine, do-yourself for novel-and-high-stakes, spec-driven for high-stakes-with-cheap-verify — practitioners already say these.

Why the strongest version survives. The model contributes five things the practitioner literature does not:

Quantitative trade-offs. The model says how much worse self-automator is than spec-driven at default constants — at the Bastani anchor specifically, 0.095 net (α·φ − K_uv = 0.30 − 0.205). Practitioner literature is qualitative; this is parametrically calibratable, and Stage 4 will pin the constants.
Non-obvious simultaneous constraints. Higher stakes (σ ↑) push BOTH u DOWN (refuse AI for high-stakes work) AND v UP (if you do use AI, verify carefully) — a simultaneous prescription the practitioner literature does not make explicit. §4’s comparative-statics surfaces this; intuition often conflates the two.
Naive cyborg as structural failure mode. Bilinearity says applying interior (u, v) uniformly is what no individual sub-task should do. This is a sharper criticism than “be thoughtful about which mode you use” — it identifies a specific failure mode (the BCG cyborg majority running flat-(0.7, 0.3) policies) and says they’re structurally wrong, not just suboptimal.
Budget-aware shadow-price reformulation. The μ mechanism says: under attention scarcity, reroute longer-base-time tasks first (since α_eff_i = α + μ·g_i rises proportionally with g_i). Practitioner literature has nothing like this prescriptive structure for portfolio-level decisions.
Stage-4 calibration targets. The model identifies six specific empirical questions (Q1–Q6 in §11) that fit a generator-verifier dispatch framework. Practitioner heuristics are unfalsifiable by design — they update without preserving the reasoning, so users can’t tell when a heuristic stops applying. The L3 invariant is the antidote.

The model recovers practitioner intuitions on top-line conclusions and adds quantitative, edge-case, and falsifiable structure beyond. The contribution is in calibration, simultaneous constraints, structural failure modes, portfolio-level shadow pricing, and Stage-4 testability — not in inventing new top-level recommendations.

Objection 3: The model is single-shot; the topic question is dynamic.

Steelman. “Optimal configuration for an individual knowledge worker” is implicitly a static answer to a fundamentally dynamic question — AI capability shifts month-over-month, the worker’s skill atrophies under sustained delegation, calibration on c_AI drifts as new model versions ship. A static dispatcher with parametric flexibility doesn’t tell the worker how to anticipate and prepare for capability change.

Survives because (a) Parasuraman, Sheridan & Wickens’ (2000) function-allocation framework has been useful for 25+ years despite being static — parametric statics IS what’s wanted from a generating function (the L3 invariant); (b) the trajectory questions properly belong to sibling topics in the LLM Iterate roster — navigating-ai-world for AI-induced trajectory of skill/meaning/relational channels, the planned prediction-calibration topic for c_AI calibration drift, the planned bedrock-generating-functions for the temporal-aggregation patterns this and other models share. Static-but-parameterised is the right scope here; dynamic extensions are cross-topic by design, not gaps in the present formalisation.

11. Stage-4 fitting targets

Six named questions Stage 4 should test against data.

Q1 — (α, ε) from CUPS telemetry. Mozannar CUPS gives time fractions across coding interaction states. Calibrate ε from observed monitoring-time-at-high-u; calibrate α from observed cyborg-vs-centaur time efficiency.

Q2 — β from Bastani longitudinal regime. Bastani’s 17% drop is one window. Longer-window data (Lee-Sarkar CHI 2025; Anti-Social Century) should pin per-task atrophy rate. The model predicts: β should be ~10× larger for skills the worker uses heavily than for tasks they delegate occasionally.

Q3 — Mode-distribution match against Randazzo BCG. Given a θ-distribution prior over BCG-consultant tasks, does the optimal-routing distribution over (u*, v*) match the observed (~60% cyborg / ~30% centaur / ~10% self-automator)? If not, which constant is mis-fit?

Q4 — Outside-frontier harm prediction (Dell’Acqua replication). For subjects forced to use AI on c_H > c_AI tasks, the model predicts a quality drop proportional to u·(c_H − c_AI). Stage 4 should test the linearity and the slope.

Q5 — Workflow-architecture-vs-capability bound. The headline S1 prediction. Construct two simulated workforces: (a) frontier AI + naive flat-cyborg routing, (b) mid-tier AI + optimal routing on the same task mix. The model predicts (b) outperforms (a) on net quality once c_AI_naive falls below some threshold. Stage 4 should locate the threshold and test against Everett 2025 / Dell’Acqua at that precision.

Q6 — Calibration uncertainty and explore-vs-exploit on c_AI. The model treats c_AI as a known input. In practice, workers learn c_AI by running the task and verifying the output (Vasconcelos verification-economics). For novel tasks, the spec-driven corner doubles as a calibration mechanism — verification reveals c_AI for next time. Stage 4 should formalise the explore-vs-exploit structure: what is the optimal exploration premium when c_AI variance is high? When does verifying-to-learn dominate verifying-to-quality-control? The §9 scope-limit on c_AI miscalibration becomes a structural extension via this question, not just an acknowledged gap. Engages the §10 Objection 1 directly.

Data starting points for each Q

The natural starting point for each fitting target is the lit-review paper(s) that motivated the corresponding parameter. Honest notes on likely data availability:

Q1 (α, ε) — Mozannar CUPS 2024 (E12) telemetry across coding interaction states. Process-level data is likely Microsoft Research-internal; replication via Cursor / Claude Code anonymized usage logs is the alternative path.
Q2 (β) — Bastani PNAS 2025 (E9) is the primary anchor; PNAS supplementary materials likely include the longitudinal panel needed to estimate per-task atrophy. Lee-Sarkar CHI 2025 (E16) provides multi-task panel context for a complementary fit.
Q3 (mode distribution) — Randazzo HBS WP 26-036 (E10) for the 60/30/10 BCG distribution. HBS supplementary materials may include individual-level mode tags; failing that, an in-house replication on a smaller knowledge-worker sample is tractable.
Q4 (outside-frontier slope) — Dell’Acqua BCG study (E3); HBS data release may include individual-level outcome grades plus inside/outside frontier classification per task. The linearity test of u·(c_H − c_AI) is a clean within-subjects design.
Q5 (workflow > capability) — Everett 2025 (E8) demonstrates the workflow-restoration mechanism but does not directly test the bound. The cleanest test is a new RCT comparing routing strategies on the same c_AI (e.g., GPT-4 + naive flat-cyborg vs. GPT-3.5 + optimal routing on a matched task mix); existing data is suggestive, not decisive.
Q6 (explore-exploit on c_AI) — Likely requires new experimental work or simulation. Closest analogs are the contextual-bandit and multi-armed-bandit-with-costly-verification literatures, but the knowledge-workflow application is novel; this is a Stage-4 originated study, not a re-analysis of existing data.

Stage 4’s first move is to scope data availability for Q1-Q5; Q6 may require originating new data or a simulation harness. Following the human-psych-variation pattern, the Stage-4 build should live at stage_outputs/technology-utilization-architecture/data/ with a curated CSV per fitting target, a runnable Python pipeline that reproduces every chart on the published data.mdx, and a data/out/ folder for derived outputs.

12. Connections to other topics

Where this model attaches to sibling AI’s-Research topics.

Human-psych-variation. λ (skill-formation value) and β (atrophy rate) are individual differences. Need-for-Cognition (Buçinca 2021, E6) moderates v — high-NfC users verify more. Cognitive-style covariation belongs in that topic, not here.
Navigating-AI-world. This model’s β is the per-task version of nav-AI’s ΔM_comp (competence erosion). The portfolio-level S aggregation is the within-work-domain version of nav-AI’s ΔV/ΔM trade-off. The two models share the substitution-myth (L1) and verification-economics (G3) invariants; they differ in what they’re optimising — nav-AI optimises a life-scale meaning budget, this model optimises a workday quality-per-attention budget.
Trust architecture (planned). Sycophancy (E13) and human-side calibration on AI capability (O7) are trust-regime questions — what feedback signals make c_AI knowable.
Prediction & calibration (planned). The topology’s O7 connection. Calibration on c_AI is the calibration sub-problem this model treats as exogenous.
Information fidelity (planned). φ (verification cost) depends on output-format quality and grounding — verifying a structured output with citations is cheap; verifying free prose is expensive. The information-fidelity topic should formalise what makes φ low or high.
Bedrock generating functions (planned). The four-channel decomposition V = Q − α·A + λ·S − σ·R is a candidate generating-function pattern: every decision under attention scarcity has the same four channels. The bedrock topic should test whether this generalises beyond AI workflow.

Glossary

autonomy level (u) — fraction of a task’s generation delegated to AI. Karpathy slider P2 made parametric.
verification depth (v) — fraction of AI output independently checked by the human.
c_H, c_AI — human and AI capabilities on a task type; probability of correct/high-quality output.
c_⋆ — verified-output ceiling; max(c_H, c_AI) under the assumption the human can verify.
φ — verification-cost ratio: time-to-verify divided by time-to-generate-from-scratch.
σ — stakes; weight on uncaught-error penalty.
λ — skill-formation value of the task to the worker.
α — attention price (utility weight on time). Normalised to 1.
ε — residual attention at full delegation. The L1 substitution-myth invariant.
β — per-task skill-atrophy rate under unverified delegation.
M — per-task metacognitive routing tax. The G1 metacognitive-bottleneck invariant.
centaur — Mollick: clean human/AI handoff with verification gate. (u, v) mid + high.
cyborg — Mollick: interleaved sub-task delegation with partial verification. (u, v) mid + mid/low.
self-automator — Randazzo: full delegation, no verification. (u, v) high + low. The atrophy-trap regime.
spec-driven / independent-then-synthesize — Everett 2025; Compound Engineering: full delegation with full verification. (u, v) high + high.
do-yourself — no AI involvement. u ≈ 0.
L1 (substitution myth) — every offload creates new monitoring/verification work. Encoded as ε > 0.
L2 (joint surface) — optimal allocation requires modelling the joint performance, not comparing solo capabilities. Encoded as the portfolio-level argmax.
L3 (parameterise by capability) — the formalism is a generating function over θ, not a lookup table. The whole structure of the model.
G3 (verification trade-off) — engagement is rational only when verification is cheap relative to expected payoff. The −α·v·φ term.
G7 (skill atrophy) — capacities not exercised decay. The −β·u·(1-v) term.
G9 (generator-verifier asymmetry) — production cost falls toward zero with AI; verification cost stays roughly constant. The asymmetry between u·ε (generation residual) and v·φ (verification full).

Iteration history

Pass 1 2026-04-29

decompositiongenerating functionintegration

Why First draft of the formalisation. Pulled the per-task value function out of the topology handoff (Option B: generator-verifier loop with autonomy slider) and wrote it as a single coherent decomposition + generating function. Built the interactive two-tab dashboard so the reader can dial parameters across both per-task routing and day-portfolio aggregation.
- Wrote per-task value function: V(u, v; θ) = Q(u,v) − α·A(u,v) + λ·S(u,v) − σ·R(u,v)
- Decomposed V into four orthogonal channels (quality / attention / risk / skill), each responding to (u, v) in a distinguishable way
- Encoded the L1 substitution-myth invariant as ε > 0 (residual attention at full delegation)
- Encoded the G3 verification-economics term as −α·v·φ; G7 skill atrophy as −β·u·(1-v); G2 ironies-of-automation as σ·u·(1-v)·(1-c_AI)
- Mapped (u*, v*) regions to five workflow modes (do-yourself, centaur, cyborg, self-automator, spec-driven) with default thresholds
- Portfolio aggregation under daily attention budget; four strategies compared (always-self / max-AI / naive cyborg / optimal routing) — generates the S1 headline as a comparable-strategies experiment
- Calibrated constants (α=1, ε=0.15, β=0.05, M=0.08) against lit-review anchors (Mozannar CUPS, Bastani 17%, Tankelevitch metacognitive load)
- Six worked anchors against the empirical record (Brynjolfsson novice/expert split, Dell'Acqua outside-frontier, Bastani guardrail effect, Everett independent-then-synthesize, Randazzo self-automator, Schoenegger overconfident-AI as gap)
- Five model cruxes named (C1 two-axis decision space, C2 verified ceiling assumption, C3 ε operationalisation of L1, C4 task-uniform β, C5 portfolio independence)
- Five Stage-4 fitting targets named (Q1 α/ε from CUPS, Q2 β from Bastani longitudinal, Q3 mode-distribution match against Randazzo BCG, Q4 outside-frontier prediction, Q5 workflow-architecture-vs-capability bound)
- Six scope-limits explicitly named (aggregate-zero E4/O2, cross-task bundling G8, c_AI miscalibration O7, sycophancy E13 quality-degrader, frontier migration O4, multi-tool MRT G10)
- Sibling-topic connections wired (human-psych-variation, navigating-ai-world, planned trust-architecture / prediction-calibration / information-fidelity / bedrock-generating-functions)
- Interactive component CognitivePartnershipModel.tsx (two tabs: per-task router with seven presets + day portfolio with four-strategy comparison)
Pass 2 2026-04-30

error checkinternal consistency checktruth/accuracy override on bias

Why A fresh-eyes audit of pass 1 surfaced three real problems. (a) c_⋆ = max(c_H, c_AI) over-credits verification when c_H > c_AI (treats the human as catching every AI error) and under-credits it when c_H < c_AI (treats a partly-skilled human as catching no AI errors); the empirically defensible form is the complementary product c_⋆ = c_AI + (1−c_AI)·c_H = 1 − (1−c_H)·(1−c_AI), which falls out from independent error events and naturally handles both the deskilled-verifier limit and the verifier-stronger limit. (b) The bilinearity claim was sloppy — V is exactly bilinear in (u, v), so the maximum on the unit square is at a corner. The interior critical point is a saddle. The pass-1 prose said "near a corner or on an edge or at a unique interior critical point" which is wrong. (c) The comparative-statics table had two wrong rows: under (c_AI − c_H) ↑, c_⋆ − c_AI = (1−c_AI)·c_H weakly decreases in c_AI, and σ·(1−c_AI) decreases too, so K_uv falls and v* weakly decreases (pass 1 said "slightly ↑"); under φ ↑, u* responds via the v* corner-flip and is weakly decreasing (pass 1 said "ambiguous"). The bilinearity finding has a substantive consequence: per-task optima land at one of three corners — (0, 0) do-yourself, (1, 0) self-automator, (1, 1) spec-driven — and centaur/cyborg modes appear only as aggregate-level patterns from mixing corners across sub-tasks with heterogeneous θ. The pass-1 mode-region table implied otherwise.
- Replaced c_⋆ = max(c_H, c_AI) with c_⋆ = c_AI + (1−c_AI)·c_H = 1 − (1−c_H)·(1−c_AI) in §3.1, the React component, and the dashboard diagnostic readout. The deskilled-verifier scope-limit caveat in §3.1 is now subsumed by the formula itself (c_H → 0 ⇒ c_⋆ → c_AI without any extra parameter)
- Sharpened §3.5 and §4: V is exactly bilinear in (u, v) with V = K_0 + K_u·u + K_v·v + K_uv·u·v; the maximum on the unit square is always at a corner, never interior
- Reinterpreted §4 mode classification: per-task optimum lands at one of three corners — (0, 0), (1, 0), (1, 1). Centaur and cyborg modes are aggregate-level patterns where a worker mixes corner policies across sub-tasks with heterogeneous θ. The five mode-regions remain a useful spatial decomposition of the (u, v) plane (for labelling observed worker behaviour at any (u, v)), but the per-task router will only return three corners. Updated the "naive cyborg" framing in §5 accordingly: applying interior (u, v) values to every task is exactly what the model says is wrong — optimal routing differentiates across tasks (different corners for different θ), naive uses the same interior for all
- Fixed §4 comparative-statics table: (c_AI − c_H) ↑ now correctly shows v* weakly ↓ (was "↑ slightly"); φ ↑ now shows u* weakly ↓ via v*-flip (was "ambiguous"); c_AI ↑ alone shows v* ↓ (matches "AI more reliable → less verification benefit")
- Recomputed all six worked anchors in §7 under corrected math. Novice (Brynjolfsson) now correctly lands at (1, 0) self-automator with quality lift c_AI − c_H = +0.30; expert at (1, 1) spec-driven with smaller lift +0.06 = +7%. Dell'Acqua outside-frontier at (0, 0) do-yourself with 30 pp loss if mis-routed. Everett high-stakes at (1, 1) spec-driven (matches). Randazzo low-stakes routine at (1, 0) self-automator (matches)
- Bastani guardrail anchor (§7C) honesty fix: at default β = 0.05 and σ ≤ 0.40, the skill-preservation push toward v=1 is too small to flip the corner — the guardrail effect is structural-but-weak in the model. The model says "self-automator dominates spec-driven by ~0.07 net at default constants; raise β or λ to flip." This is now explicitly named as Stage-4 fitting target Q2 (was already there) plus a new diagnostic note that β ≈ 0.20+ would give skill-preservation real corner-flipping leverage
- React component preset notes updated to match what the model actually predicts: novice_centaur preset renamed to novice_spec with adjusted params (φ=0.20, σ=0.50, λ=0.50) producing the (1, 1) spec-driven optimum that "verify to learn" implies; lit_synthesis φ=0.40 → 0.30 to land cleanly at spec-driven; routine_email note acknowledges that the model's v=1 here is the empirical "skim" with the bilinear model rounding partial-verify up to full-verify
- Added a small dashboard note explaining the bilinearity → corner-only finding inline in the diagnostic panel, so users do not conclude the dashboard is broken when v* always returns 0 or 1
- Added scope-limit item in §9: verification effectiveness is modelled as equal to generation skill (c_H drives both c_⋆ and the verification benefit). Karpathy's G9 generator-verifier-asymmetry suggests verification is often easier than generation; a future c_V (verifier skill) parameter separate from c_H would make low-c_H verifiers more effective. Held for Stage 4 / future passes
Pass 3 2026-04-30

internal consistency checkgap scan

Why A pass-2 cold read surfaced two distinct issues. (a) §5 claimed that under attention scarcity the Lagrangian shadow price μ adjusts the per-task choice rule (argmax V_i − μ·A_i·g_i), biasing optimal routing toward higher u when budget binds. The dashboard component did not actually implement this — the "optimal routing" strategy computed per-task argmax V independent of the budget; when budget binds, the dashboard just flagged "over budget" rather than rerouting. The dashboard claimed to demonstrate something it did not. Real prose-vs-code gap. (b) §7C Bastani anchor arithmetic: I wrote "self-automator dominates spec-driven by ~0.07 net at default constants" but V(1,0) − V(1,1) = α·φ − K_uv = 0.30 − (0.12 + 0.025 + 0.06) = 0.30 − 0.205 = 0.095, not 0.07. (c) Bonus framing issue: §4 sat awkwardly with a 5-region mode-classification table next to the corner-only finding — readers could plausibly misread the table as a prediction rather than a labelling tool. Wanted cleaner separation.
- Implemented budget-aware optimal routing in CognitivePartnershipModel.tsx. Added optimizeAtAlpha (per-task corner optimum at a given α_eff) and optimizeBudgetMu (binary search on the shadow price μ until total absolute attention sum_i A_i·g_i·count_i fits the budget). Per-task α_eff_i = α + μ·g_i — longer-base-time tasks feel budget pressure proportionally more, which is the right Lagrangian dimensional analysis. evaluatePortfolio now takes the budget and runs the shadow-price binary search for the optimal strategy; other strategies (always-self / max-AI / naive) keep their fixed (u, v) policy as comparators
- Dashboard now surfaces μ in a footer line below the strategy table: when μ = 0 the budget is slack and per-task choice is unconstrained argmax V; when μ > 0 the budget binds and α_eff = α + μ·g routes longer tasks toward self-automator (the lowest-A corner). Tightening the budget slider triggers visible rerouting, demonstrating the §5 claim
- Updated §5 prose to describe what the dashboard actually does. Removed the "naive cyborg fits in budget; optimal should also fit" framing (which was true under unconstrained per-task optimization but moot now that optimal is budget-aware). Added an explicit description of how μ rises as budget tightens, and of which task types reroute first (high-g tasks with high-σ low-stakes tradeoffs are the first to flip from spec-driven to self-automator)
- Fixed §7C arithmetic: "self-automator dominates spec-driven by ~0.07" → "by 0.095 (α·φ − K_uv = 0.30 − 0.205)" with the explicit subtraction shown. Same direction as before but the number was wrong
- Restructured §4. Now leads with the three-corner decision tree (which corner wins under which sign conditions) and treats the five-region table as a labelling tool for observed worker behaviour, explicitly demoted from "prediction" to "vocabulary-mapping." Removed the redundant comparative-statics-row repetition by tightening the table commentary
- Tightened §3.5 K_v parenthetical from "always ≤ 0 — verifying nothing has no benefit; verifying anything has cost φ" to "K_v captures the cost of verification when there is no AI to verify (u = 0); pure cost, hence ≤ 0 — this is exactly why corner (0, 1) is dominated by (0, 0)". The new wording connects K_v to the (0, 1)-dominance result that follows in the same section
Pass 4 2026-04-30

adversarial + steelmancross-context verification

Why Two structural omissions surfaced on a fresh read of pass 3. (a) The topology had a §8 Objections section engaging the strongest critiques head-on; the model has no equivalent, and three real objections deserve direct engagement: c_AI is unobservable in practice (so the model is unactionable on novel tasks), the model recovers practitioner intuitions and adds nothing new, and the model is single-shot when the topic asks about an evolving capability landscape. Without addressing these, the model reads as undefended on its weakest flanks. (b) §7 worked anchors silently treat six papers measuring different outcome variables (Brynjolfsson is throughput × quality, Dell'Acqua/Everett/Schoenegger are clean Q, Bastani is unassisted-retest performance — a downstream of S over many task instances, Randazzo is behavioural-mode distribution, Mozannar is process telemetry) as if they all calibrated the same model parameter. That is methodologically loose; Stage 4 needs to disaggregate which constants get calibrated against which outcome types.
- Added §10 "Adversarial + steelman" with three objections engaged head-on. (1) c_AI is unobservable on novel tasks: steelman that calibration is the binding constraint Vasconcelos already named; survives because the corner structure is robust across c_AI ranges (deeply self-automator or deeply do-yourself for almost any c_AI in the right (σ, λ, φ) regime), and the spec-driven corner doubles as a Bayesian-update mechanism (verify to learn c_AI for next time) — the verification cost α·v·φ is the explicit price of resolving the uncertainty. (2) Model recovers practitioner intuitions: steelman that Mollick/Karpathy/Anthropic already say the headline conclusions; survives because the model contributes five things the practitioner literature does not — quantitative trade-offs (the 0.095 Bastani gap, etc.), simultaneous constraint (σ ↑ pushes both u down AND v up, not made explicit by practitioners), naive cyborg as structural failure mode (sharper than "be thoughtful"), budget-aware shadow-price reformulation (reroute long tasks first), and Stage-4 calibration targets (practitioner heuristics are unfalsifiable by design). (3) Single-shot vs strategic: brief steelman that the topic asks for a static answer to a dynamic question; survives because Parasuraman 2000 has been useful for 25+ years despite being static, and trajectory questions belong to sibling topics (navigating-ai-world for AI-induced trajectory; planned prediction-calibration for c_AI calibration drift) rather than this one.
- Added cross-context outcome-heterogeneity note at end of §7. Six anchors measure different outcome variables: Q (Dell'Acqua/Everett/Schoenegger) cleanly map to the model's Q; Brynjolfsson maps approximately to Q × call-rate where call-rate depends on attention saved A; Bastani maps to S over many task instances; Randazzo maps to a categorical prediction over corners; Mozannar is process telemetry. Stage 4 should not pool these as a single calibration target — different constants fit against different outcome types.
- Added Stage-4 fitting target Q6 — calibration uncertainty and explore-vs-exploit on c_AI. The model treats c_AI as a known input but in practice workers learn c_AI by running the task; for novel tasks, the spec-driven corner doubles as a calibration mechanism. Stage 4 should formalise what optimal exploration premium looks like when c_AI variance is high. The §9 scope-limit on c_AI miscalibration becomes a structural extension here, not just an acknowledged gap.
- Renumbered: §10 Stage-4 fitting targets → §11; §11 Connections to other topics → §12. Added §10 Adversarial + steelman.
Pass 5 2026-04-30

next movesinternal consistency check

Why Reading the document holistically as input to Stage 4: §11 names six fitting targets (what to test) but does not say where the data lives. Stage 4 will need that pointer first to scope tractability. Adding it now closes the input/output boundary cleanly and saves Stage 4 from re-doing the lit-search to find data. Plus a quick consistency scan after pass-4 numbering changes to catch anything stale.
- Added "Data starting points for each Q" subsection at the end of §11. Each Q1-Q6 gets a one-line pointer to the lit-review paper(s) that motivated the corresponding parameter, with honest notes on likely data availability and the cases where new experimental work is the natural path (Q5 cleanly tested only by a new RCT; Q6 likely needs new design as the closest analogs are bandit-with-verification literature in different domains)
- Added a closing handoff sentence pointing to the project-internal Stage-4 build convention (data CSVs in stage_outputs/<topic>/data/ with a runnable Python pipeline, mirroring human-psych-variation pass-1 of the data stage) so a Stage-4 starter has a concrete shape to follow
- Internal consistency scan: §11 intro line "Six named questions" matches Q1-Q6 ✓; §10 Objection 2 "five things the practitioner literature does not" matches the bullet list ✓; description field "five cruxes, six Stage-4 fitting targets, three engaged objections" matches the artifacts ✓; TLDR para 4 "produces seven things" matches the (1)-(7) list ✓. No stale references found
Pass 6 2026-04-30

fresh-eyes auditinternal consistency check

Why Reading the document cold top-to-bottom surfaced five real issues that snuck in during the iterative passes — exactly the kind of thing fresh-eyes audit is for. (a) §6 calibration anchors final paragraph claimed "the defaults reproduce the empirical mode distribution (Randazzo: ~60% cyborg, ~30% centaur, ~10% self-automator)" — but §4 (after pass 2/3) established that centaur and cyborg do NOT arise as per-task optima; the model can only reproduce that distribution by aggregating heterogeneous-θ tasks. Stale wording from pass 1 that silently re-asserted what later passes disclaimed. (b) §6 mentioned `u_lo` — but u_lo is a React-component labelling threshold, not a model parameter; it should not appear in formalisation prose. (c) §9 closing paragraph said "These seven are not bugs" but I count eight bullets — pass 2 added two new scope-limits and the closing count did not update. (d) §4 prose said "four linear channels with one cross-term u·v" — but Q, R, and S each carry their own u·v bilinear term; only A is strictly linear in both u and v. The consolidated V has one cross-term, but calling the channels "linear" is imprecise. (e) §7C numerical claims about how to flip the Bastani corner ("β ≈ 0.20+ or higher λ (~0.80+) or lower φ (~0.20)") were loose; the actual condition is λ·β > 0.12, which means β must exceed 0.24 at default λ = 0.50, λ must exceed 0.12·(1/β) at given β (so λ alone cannot flip at β = 0.05 — would need λ > 2.4, impossible), and φ must fall below 0.205 at default constants. None of these issues changes a model conclusion, but they would all surface under academic review.
- §6 final paragraph rewritten to be consistent with the bilinearity finding: "The defaults predict (1, 0) self-automator at the routine-low-stakes corner (Randazzo's ~10% empirically), (1, 1) spec-driven where verification benefit dominates verification cost, and (0, 0) do-yourself in outside-frontier regimes. The full Randazzo 60/30/10 distribution emerges only at the aggregate level when a worker mixes corner policies across heterogeneous-θ sub-tasks." Outside-frontier harm reframed: "predict outside-frontier harm (Dell'Acqua E3) when c_H > c_AI and the worker mis-routes to u > 0 (the model's prescription is u* = 0 there)" — making explicit that the harm is from disobeying the optimum, not from a model output.
- §6 stripped the `u_lo` reference. The constants section is for model parameters; mode-classification thresholds are a labelling artifact of the React component and belong there, not in the prose
- §9 closing count fixed: "These seven are not bugs" → "These eight are not bugs" matching the actual eight-item list (aggregate-zero, cross-task bundling, c_AI miscalibration, sycophancy, frontier migration, multi-tool MRT, verifier-skill ≠ generation-skill, partial-verification skim)
- §4 prose tightened: "Bilinearity is faithful to the structure of the problem (four linear channels with one cross-term u·v)" → "Bilinearity is faithful to the structure of the problem: V collects to one bilinear form V = K_0 + K_u·u + K_v·v + K_uv·u·v, even though three of the four channels (Q, R, S) carry their own u·v terms — they sum to one consolidated K_uv". The new phrasing is technically accurate (channels are themselves bilinear; the cross-terms sum to one) and preserves the structural-honesty intent of the original
- §7C numerical claims tightened with explicit algebra: "you'd need β ≈ 0.20+ or higher λ (~0.80+) or lower φ (~0.20)" → "the corner flips when λ·β > α·φ − [(1−c_AI)·c_H + σ·(1−c_AI)] = 0.30 − 0.18 = 0.12. At default λ = 0.50, β must exceed 0.24; at β = 0.05, λ alone cannot flip the corner (would need λ > 2.4, impossible since λ ∈ [0, 1]); at the default β = 0.05, λ = 0.50, the corner flips when φ falls below 0.205 — barely cheaper verification than the default 0.30."
- Internal consistency re-scan after edits: §4 comparative-statics ↑/↓ semantics for binary u*, v* (under bilinearity) is fine in context — the arrows mean "more likely to be 1 vs 0," which is the natural reading. No further changes.