## Model
*topic: technology-utilization-architecture · stage: model · pass 6 · in-progress*

Generator-verifier loop with a per-task autonomy slider. The per-task value function V(u, v; θ) decomposes into four orthogonal channels (quality, attention, risk, skill); V is exactly bilinear in (u, v) so per-task optima land at three corners — do-yourself, self-automator, spec-driven. Centaur and cyborg arise as aggregate-level patterns from cross-sub-task corner mixing. Portfolio aggregation under a daily attention budget surfaces the Lagrangian shadow price μ that reroutes longer tasks first when budget binds. Five cruxes, six Stage-4 fitting targets, three engaged objections. Interactive two-tab dashboard included.

## TLDR

The lit review documents a research landscape; the topology stripped it down to load-bearing structure. This stage formalises the cleanest target the topology surfaced: a **generator-verifier loop with a per-task autonomy slider**, designed to survive capability change rather than encode a snapshot of any specific model's frontier. The optimisation target is **output quality per unit of human attention** for an individual knowledge worker. The formalisation makes three moves at once — **decomposition** (four orthogonal value channels: quality, attention, risk, skill), **generating function** (a per-task value function whose corner solutions reproduce the five empirically observed workflow modes), and **integration** (a single formalism that composes Karpathy's slider, Mollick's typology, Vasconcelos verification-economics, Bastani's atrophy, Bainbridge's substitution myth, and Madras-Mozannar L2D into one object).

The per-task value function is `V(u, v; θ) = Q(u,v) − α·A(u,v) + λ·S(u,v) − σ·R(u,v)`, where `u` is the autonomy level (fraction of the task delegated to AI) and `v` is the verification depth (fraction of AI output independently checked). The four channels are conceptually distinct mechanisms: quality `Q` rewards letting the better agent do the work, with verified output achieving the complementary-product ceiling `c_⋆ = c_AI + (1 − c_AI)·c_H` (either AI was right or human catches the error); attention `A` charges for human time *and* for the irreducible monitoring cost even at full delegation (the L1 substitution-myth invariant baked in via `ε > 0`); risk `R` penalises uncaught AI errors at a rate proportional to stakes σ; skill `S` rewards practice and penalises unverified delegation (Bastani-style atrophy). V turns out to be *exactly bilinear* in `(u, v)` — collecting terms gives `V = K_0 + K_u·u + K_v·v + K_uv·u·v` — which means the maximum on the unit square is always at a corner. Three corners are candidates: `(0, 0)` do-yourself, `(1, 0)` self-automator, `(1, 1)` spec-driven (the corner `(0, 1)` is dominated since verifying with no AI involvement is pure cost). **Centaur and cyborg modes do not arise as per-task optima** — they appear only as aggregate-level patterns when a worker mixes corner policies across sub-tasks with heterogeneous θ. This is a substantive prediction, not a limitation.

The portfolio extension aggregates per-task decisions under a daily attention budget. The headline result the model is designed to produce is **S1** (workflow architecture > model capability): on the same task mix and same `c_AI`, optimal routing dominates "max-AI" (self-automate everything) and dominates the naive flat-cyborg heuristic (`u=0.7, v=0.3` everywhere). The bilinearity finding sharpens this: the naive flat-cyborg policy is exactly the failure mode — it applies an interior `(u, v)` value that the bilinear structure says no individual sub-task should land at. Optimal routing differentiates across tasks (different corners for different θ); the aggregate `(ū, v̄)` across the day looks interior because the corners differ, not because any single decision is interior. The generating function is **parameterised by capability** (the L3 invariant): if `c_AI` rises uniformly across tasks, the model rebalances; if it rises only on certain task types, the boundary of optimal `u*` shifts but the *shape* of the routing rule stays put.

This stage produces seven things: (1) a math object — `V(u, v; θ)` and the four-channel decomposition; (2) a workflow-mode classifier — three-corner per-task router plus a five-region label-partition of the (u, v) plane for observed worker behaviour; (3) a portfolio aggregator with budget-aware shadow-price routing (μ-binary-search over per-task `α_eff = α + μ·g`); (4) the interactive two-tab dashboard below; (5) **five cruxes** of the model (load-bearing claims whose collapse rebuilds it); (6) **six Stage-4 fitting targets** (parameter calibrations and qualitative predictions the data pipeline should test); (7) three engaged objections (`c_AI` unobservable, model just recovers practitioner intuitions, model is single-shot not strategic) with steelmen and what survives. Scope is explicit: the formalisation does *not* capture the aggregate-zero puzzle (E4/O2 — organisational dynamics), cross-task productivity bundling (G8/Cowen), `c_AI` miscalibration on novel tasks (O7), sycophancy as a verification-degrader (E13), or frontier migration over time (O4). These are named scope-limits, not silent assumptions.

<CognitivePartnershipModel client:load />

## How to read this stage

The dashboard above is the artifact. Everything below is the spec.

Two interactive surfaces:

1. **Per-task router.** Inputs: `(c_H, c_AI, φ, σ, λ)` for one task. Outputs: optimal `(u*, v*)`, the workflow-mode label, and a four-bar decomposition showing which channel dominates. Use this to answer "for a task with these characteristics, what's the right way to use AI?"

2. **Day portfolio.** Inputs: a basket of task types with counts. Outputs: total quality, total attention, total skill change under four strategies (always-self / max-AI / naive cyborg / optimal routing). Use this to see the S1 effect — same AI, different workflow architectures, very different outcomes.

If the dashboard says one thing and your gut says another, the diagnostic is to check (a) whether your `(c_H, c_AI)` estimates are calibrated and (b) whether the constants `(α, ε, β, M)` are calibrated for your task density. This is exactly what Stage 4 (the data pipeline) is for.

## 1. The formalisation moves

Three things this stage does — explicit so they're inspectable separately.

**Move 1 — Decomposition.** The per-task value `V` splits into four orthogonal channels (`Q`, `A`, `R`, `S`). "Orthogonal" here means: each channel responds to `(u, v)` in a distinguishable way, so when a slider moves, the user can see which channel is driving the change. This isn't a stylised choice — it's how the topology's mechanism nodes (G3, G7, G9, L1) actually compose. Without the decomposition, "the gain from AI" is a black box; with it, the user can ask "is this gain coming from quality, time-saved, or skill?" and answer.

**Move 2 — Generating function.** The five workflow modes (P1 / E10) are not given as a typology to be matched. They are *generated* as solutions to `argmax V(u, v; θ)` under different parameter regimes. This is the L3 invariant in operational form: change `θ`, the optimum moves, the mode label changes — but the *function that produces the mode* is fixed. A practitioner who hardcodes "use Cursor for boilerplate, ChatGPT for strategy" gets a lookup table; this gets a generating function.

**Move 3 — Integration.** Six prior objects compose into one: Karpathy's autonomy slider (P2 — the `u` axis), Shneiderman's 2D framework (L4 — the (autonomy, control) plane is the (u, v) plane), Vasconcelos verification-economics (G3 — the `−α·v·φ` term), Bastani's guardrail finding (E9 — the `−β·u·(1-v)` skill term), Bainbridge's substitution myth (L1 — the `ε > 0` residual attention), and Madras-Mozannar L2D (L2 — the joint-surface optimisation at portfolio level). None of these alone is the model; the model is what they jointly imply.

What's *not* yet ready for formalisation, kept in §9 as scope-limits: cross-task bundling (G8), organisational absorption (E4/O2), miscalibration of `c_AI` (O7), sycophancy as quality-degrader (E13), frontier migration over time (O4).

## 2. Variables and objects

**Decision variables (per task), continuous on the unit square:**

| Symbol | Range | Meaning |
|---|---|---|
| `u` | [0, 1] | Autonomy level — fraction of generation delegated to AI |
| `v` | [0, 1] | Verification depth — fraction of AI output independently checked |

**Task parameters** (the vector `θ`):

| Symbol | Range | Meaning |
|---|---|---|
| `c_H` | [0, 1] | Human capability on this task type |
| `c_AI` | [0, 1] | AI capability on this task type |
| `φ` | ≥ 0 | Verification-cost ratio (verify-time / generate-time) |
| `σ` | [0, 1] | Stakes — weight on uncaught-error penalty |
| `λ` | [0, 1] | Skill-formation value — how much the worker cares about preserving this skill |

**Constants** (calibrated, not per-task):

| Symbol | Default | Meaning |
|---|---|---|
| `α` | 1.00 | Attention price (normalisation) |
| `ε` | 0.15 | Residual attention at full delegation — L1 invariant (substitution myth) |
| `β` | 0.05 | Skill-atrophy rate per unit of unverified delegation |
| `M` | 0.08 | Per-task metacognitive routing tax — A1/G1 invariant |
| `c_⋆` | `c_AI + (1−c_AI)·c_H` | Verified-output ceiling — "either AI got it right OR human catches the error" (complementary product of independent error events) |

The constants are calibrated to the lit-review anchors (Mozannar CUPS for ε; Bastani 17% drop for β; Tankelevitch metacognitive load for M). They can be re-fit by Stage 4 against telemetry data.

## 3. The per-task value function

```
V(u, v; θ) = Q(u, v) − α·A(u, v) + λ·S(u, v) − σ·R(u, v)
```

### 3.1 Quality channel `Q`

```
Q(u, v) = (1 − u)·c_H + u·[(1 − v)·c_AI + v·c_⋆]
```

With probability `(1 − u)` the human did the generation and quality is `c_H`. With probability `u` the AI did the generation, of which fraction `(1 − v)` ships unverified at quality `c_AI` and fraction `v` is verified. Verified output achieves quality `c_⋆ = c_AI + (1 − c_AI)·c_H = 1 − (1 − c_H)·(1 − c_AI)` — the probability that *either* the AI got it right *or* (it didn't and) the human catches the error, treating the two error events as independent. Linear in `u` for fixed `v`; linear in `v` for fixed `u`.

The complementary-product form natively handles the deskilled-verifier limit (`c_H → 0` ⇒ `c_⋆ → c_AI` — verification adds nothing when the human can't recognise errors) and the verifier-stronger limit (`c_H → 1` ⇒ `c_⋆ → 1` — careful human verification approaches a quality ceiling). Pass 1 used `c_⋆ = max(c_H, c_AI)`, which over-credits verification when `c_H > c_AI` (as if the human catches every AI error) and under-credits it when `c_H < c_AI` (as if a partly-skilled human catches no AI errors). The form here treats both with one expression and one structural assumption (independence of human and AI error events).

Caveat: `c_H` enters both the generation cost (when `u = 0`) and the verification benefit (the extra catching power at `v > 0`). Karpathy's G9 generator-verifier-asymmetry says verification is typically *easier* than generation — recognising an error costs less than producing a correct answer from scratch. A more accurate model would carry a separate `c_V` (verifier capability) ≥ `c_H` for low-`c_H` workers. Held for Stage 4; named in §9 scope-limits.

### 3.2 Attention channel `A`

```
A(u, v) = (1 − u·(1 − ε)) + v·φ + M
```

Three pieces:

- `(1 − u·(1 − ε))`: human-side generation cost. At `u = 0`, the human does it all (cost = 1, the base generation time). At `u = 1`, residual attention `ε` remains — every offload creates monitoring/coordination work (Bainbridge L1). `ε > 0` is *the* L1 invariant in operational form: a model with `ε = 0` would predict that full delegation is free of attention cost, which is exactly the substitution myth.
- `v·φ`: verification cost. Linear in verification depth, scaled by the per-task verification-cost ratio `φ`. This is the G3 (Vasconcelos verification-economics) term.
- `M`: per-task metacognitive routing tax. Constant per task — classifying, choosing the workflow, monitoring AI for handoffs. Tankelevitch's metacognitive-demand finding (G1) compressed to a constant; Stage 4 can test whether `M` is task-type-dependent.

Note that ε is the *only* term that decouples attention from the (u, v) decision. It is what makes "max-AI" not free.

### 3.3 Risk channel `R`

```
R(u, v) = u·(1 − v)·(1 − c_AI)
```

Probability-of-uncaught-error: AI generated the output (`u`), it was not verified (`1 − v`), and the AI was wrong (`1 − c_AI`). Multiplied by stakes `σ` in the value function. This is the G2 (ironies-of-automation) term: rare critical errors get missed precisely when delegation is high and verification is low.

For high-σ tasks, this term is large enough to drive `v → 1` (the spec-driven / independent-then-synthesize regime, E8 — Everett 2025). For low-σ tasks, it's negligible and the optimum can sit at `v = 0` without harm.

### 3.4 Skill channel `S`

```
S(u, v) = (1 − u) − β·u·(1 − v)
```

Two terms:

- `(1 − u)`: practice — the human builds skill on the fraction of the work they did themselves.
- `−β·u·(1 − v)`: atrophy — unverified delegation erodes capacity. Verification preserves engagement (this is Bastani's "hint mode" finding E9: guardrails → no atrophy). The product `u·(1 − v)` is exactly the self-automator regime where atrophy is fastest.

`λ` is the worker's per-task valuation of preserving the skill. For tasks the worker explicitly wants to maintain capacity on (their core craft), `λ` is high and the skill term has bite. For boilerplate or one-off tasks, `λ` is low and skill is correctly ignored.

### 3.5 Putting it together

```
V(u, v; θ) =   (1 − u)·c_H + u·[(1 − v)·c_AI + v·c_⋆]            ← Q
             − α·[(1 − u·(1 − ε)) + v·φ + M]                       ← −α·A
             + λ·[(1 − u) − β·u·(1 − v)]                           ← +λ·S
             − σ·[u·(1 − v)·(1 − c_AI)]                            ← −σ·R
```

Five parameters in θ, two decisions, four channels. **Exactly bilinear** in `(u, v)`: collecting terms,

```
V(u, v; θ) = K_0 + K_u·u + K_v·v + K_uv·u·v
```

with

- `K_0 = c_H − α − α·M + λ`
- `K_u = (c_AI − c_H) + α·(1 − ε) − λ·(1 + β) − σ·(1 − c_AI)`
- `K_v = −α·φ` (K_v captures the cost of verification when there is no AI to verify, i.e., at `u = 0`; pure cost, hence ≤ 0 — this is exactly why corner `(0, 1)` is dominated by `(0, 0)` below)
- `K_uv = (c_⋆ − c_AI) + λ·β + σ·(1 − c_AI) = (1 − c_AI)·c_H + λ·β + σ·(1 − c_AI)` (always ≥ 0 — verification gain is monotone in `u`)

Bilinear functions on a unit square attain their maximum at a corner. The interior critical point, when it exists, is a saddle (Hessian eigenvalues `±K_uv`). So the per-task optimum is at one of the four corners — and since `K_v < 0`, `(0, 1)` is dominated by `(0, 0)` (verifying when no AI is involved is pure cost). The three meaningful corners:

- **`(0, 0)`** — do-yourself.
- **`(1, 0)`** — full delegation, no verification (self-automator).
- **`(1, 1)`** — full delegation with full verification (spec-driven / independent-then-synthesize).

Which corner wins depends on the signs of `K_u`, `K_u + K_uv + K_v`, and `K_v + K_uv` — three linear comparisons over θ.

## 4. Optimal policy: the three corners that win

V is bilinear → max at a corner. Of the four corners, `(0, 1)` is dominated by `(0, 0)` because `K_v < 0` (verifying with no AI involvement is pure cost). Three candidates remain — and a clean decision tree determines the winner:

- **Spec-driven `(1, 1)`** wins iff `K_u + K_v + K_uv > 0` *and* `K_v + K_uv > 0`.
- Else **self-automator `(1, 0)`** wins iff `K_u > 0`.
- Else **do-yourself `(0, 0)`** wins.

Equivalently: spec-driven dominates self-automator when `α·φ < K_uv` (verification cost is below benefit at full delegation: `α·φ < (1 − c_AI)·c_H + λ·β + σ·(1 − c_AI)`); self-automator dominates do-yourself when `K_u > 0` (the attention savings + AI quality gain outweigh skill loss + stakes risk).

**Comparative statics — what moves the corner choice:**

| Parameter increase | Effect on `u*` | Effect on `v*` (given `u* = 1`) |
|---|---|---|
| `c_AI − c_H` ↑ via `c_AI` rising (`c_H` fixed) | ↑ (K_u rises) | ↓ (K_uv falls: less verification benefit) |
| `φ` ↑ (verification expensive) | weakly ↓ via v\*-flip | ↓ (K_v more negative) |
| `σ` ↑ (stakes) | weakly ↓ (K_u falls by `σ·(1−c_AI)`) | ↑ (K_uv rises by `σ·(1−c_AI)`) |
| `λ` ↑ (skill matters) | weakly ↓ (K_u falls by `λ·(1+β)`) | ↑ slightly (K_uv rises by `λ·β`) |
| `c_AI` ↑ alone | ↑ | ↓ (both `(1−c_AI)·c_H` and `σ·(1−c_AI)` fall) |

This is the L3 invariant in tabular form: if `c_AI` rises uniformly, optimal `u*` rises and `v*` falls. The structural shape of the rule does not change. Two signs worth flagging because they're non-obvious: more reliable AI (`c_AI` ↑) *reduces* verification benefit (fewer errors to catch); higher stakes (`σ` ↑) pushes `u*` *down* (refuse AI when stakes are high) *and* `v*` *up* (if you do use AI, verify carefully) — a real tension the bilinear structure makes explicit.

### The five practitioner modes — labels for the (u, v) plane

The practitioner literature names five workflow modes. They span the (u, v) plane and are useful as **vocabulary for labelling observed worker behaviour** at any `(u, v)`:

| Region | Mode | Practitioner anchor |
|---|---|---|
| `u ≈ 0` | **Do-yourself** | (no-AI / refuse-AI) |
| `u ∈ (0, 1)`, `v` high | **Centaur** | Mollick — clean handoff with verification gate |
| `u ∈ (0, 1)`, `v` low/mid | **Cyborg** | Mollick — interleaved, partial verification |
| `u ≈ 1`, `v` high | **Spec-driven / independent-then-synthesize** | Everett 2025; Compound Engineering |
| `u ≈ 1`, `v` low | **Self-automator** | Randazzo HBS 26-036 (the trap) |

The per-task router only returns the three corners — `(0, 0)`, `(1, 0)`, `(1, 1)` — corresponding to do-yourself, self-automator, and spec-driven. **Centaur and cyborg do not arise as per-task optima.** They appear empirically as **aggregate-level labels** when a worker mixes corner policies across sub-tasks with heterogeneous θ — some sub-tasks done alone, others fully delegated; some AI outputs verified, others shipped. The day-level `(ū, v̄)` averages out to interior values that get labelled "cyborg" or "centaur" depending on the ratio. The Day Portfolio tab demonstrates this directly.

This is a substantive prediction of the formalisation, not a limitation. Bilinearity is faithful to the structure of the problem: V collects to one bilinear form `V = K_0 + K_u·u + K_v·v + K_uv·u·v`, even though three of the four channels (Q, R, S) carry their own `u·v` terms — they sum to one consolidated `K_uv`. The model says: at any single sub-task with a single θ, pick a corner — fully delegate or don't, fully verify or don't. The naive flat-cyborg strategy (`u = 0.7, v = 0.3` applied to every task) is exactly the failure mode — applying an interior policy uniformly is what no individual sub-task should do under bilinearity.

## 5. Portfolio aggregation — the day

A worker faces N tasks per day, each with its own `θ_i`. Total attention budget `T`. The portfolio problem:

```
maximise   Σ_i Q_i(u_i, v_i) · count_i
subject to Σ_i A_i(u_i, v_i) · g_i · count_i ≤ T
```

where `g_i` is the task type's base generation time. The Lagrangian gives a shadow price `μ ≥ 0` on the budget constraint, and the per-task choice rule becomes

```
argmax V_i − μ·A_i·g_i  =  argmax V_i with α replaced by α_eff_i = α + μ·g_i
```

— that is, **budget pressure raises the effective attention price, more so for longer-base-time tasks**. When `μ = 0` the budget is slack and per-task choice is unconstrained `argmax V`; when `μ > 0` the budget binds and `α_eff` rises until total absolute attention `Σ A_i·g_i·count_i` fits `T`. The biasing is structural: raising `α_eff` raises `K_u` (self-automator becomes more attractive vs. do-yourself) and makes `K_v = −α_eff·φ` more negative (verification becomes less attractive). So as the budget tightens, longer tasks reroute first from spec-driven `(1, 1)` to self-automator `(1, 0)` — the lowest-A corner.

This is the L2 invariant operationalised at portfolio level. The naive practitioner rule "use AI when AI is better" compares `c_H` to `c_AI` task-by-task in isolation; the joint-surface rule compares marginal `Q` per unit attention saved against the day's shadow price. They give the same answer when attention is abundant (`μ ≈ 0`); they diverge sharply when the day is tight.

**Strategies the dashboard compares:**

1. **Always-self.** `u_i = 0` for all i. No AI, no atrophy, no verification cost — but no productivity gain. The pre-AI baseline.
2. **Max-AI.** `u_i = 1, v_i = 0` for all i. The corner uniformly applied. Fast, but high risk on high-σ tasks and accelerated atrophy on high-λ tasks.
3. **Naive cyborg.** `u_i = 0.7, v_i = 0.3` for all i. A flat *interior* policy applied uniformly — exactly what the bilinear structure says no individual task should land at. Interior values arise legitimately only as averages over heterogeneous-θ sub-tasks. Applying them uniformly violates the structure and underperforms.
4. **Optimal routing (budget-aware).** Per-task corner choice under the shadow-price-adjusted `α_eff_i = α + μ·g_i`, with `μ` solved by binary search until the budget is met (or until even uniform self-automator overflows). Each task lands at the corner appropriate to its θ *and* the day's binding budget. The aggregate `(ū, v̄)` across the day looks interior because different tasks land at different corners; the dashboard surfaces `μ` below the strategy table so the user can see when the budget is biting.

The headline prediction (S1): **optimal routing dominates max-AI by quality-per-attention and dominates always-self by attention efficiency, on the same `c_AI`**. The gap between optimal routing on mid-tier AI and naive-flat routing on frontier AI is the empirical bound the model places on "workflow architecture > model capability." Mechanism: optimal routing differentiates across tasks (different corners for different θ) and reroutes under budget pressure (shadow price `μ`); naive uniformly applies an interior policy that is structurally never the per-task optimum at any α.

## 6. Calibration anchors

Where the constants come from. None of these is precise; all are Stage-4 fitting targets.

- **`α = 1`** — normalisation. Attention is the numéraire; all other costs are denominated in attention units.
- **`ε = 0.15`** — Mozannar CUPS (E12) shows verification + monitoring is a "substantial fraction" of total interaction time even when AI is doing the generation. 15% is a midpoint of the reported range; Stage 4 should tighten this from telemetry.
- **`β = 0.05`** — Bastani's 17% unassisted-performance drop (E9) over a session of ~30 unguardrailed tasks → ~0.5% atrophy per task at `u = 1, v = 0`. Setting `β = 0.05` means the model implies ~5% atrophy per task in the worst-case regime, which compounds to Bastani's order-of-magnitude over a week. The lit review explicitly notes most studies are under 12 months; β at the per-task scale is what compounds to the longitudinal scale.
- **`M = 0.08`** — Tankelevitch (G1) finds metacognitive load is the binding constraint for AI users; CUPS (E12) finds verification + planning consume a meaningful fraction of total time. 8% per task is a calibration anchor consistent with the magnitude of the metacognitive-bottleneck claim. Stage 4 should test whether `M` varies by task type (it likely does — high-stakes strategic decisions have larger M than routine email).

At the per-task level, the defaults predict `(1, 0)` self-automator at routine-low-stakes corners (Randazzo's ~10% empirically), `(1, 1)` spec-driven where verification benefit dominates verification cost, and `(0, 0)` do-yourself in outside-frontier regimes. The full Randazzo 60/30/10 (cyborg/centaur/self-automator) distribution emerges only at the *aggregate* level, when a worker's day mixes corner policies across heterogeneous-θ sub-tasks — see §4. The defaults predict outside-frontier harm (Dell'Acqua E3) when `c_H > c_AI` and the worker mis-routes to `u > 0` (the model's prescription is `u* = 0` there); the harm is from disobeying the optimum, not from a model output.

## 7. Worked anchors against the empirical record

Six probes. Each picks a parameter regime, computes the corner optimum exactly, and compares to the lit-review anchor.

**A. Brynjolfsson (E1, E2) — customer-service novice +34%, top performers ~0%.**

*Novice:* `θ = (c_H = 0.40, c_AI = 0.70, φ = 0.20, σ = 0.20, λ = 0.10)`. `c_⋆ = 0.70 + 0.30·0.40 = 0.82`. Then `K_u = 0.30 + 0.85 − 0.105 − 0.06 = 0.985 > 0` (full delegation beats do-yourself), and `K_v + K_uv = −0.20 + (0.12 + 0.005 + 0.06) = −0.015 < 0` (verification cost barely exceeds benefit). **Optimum: `(1, 0)` self-automator.** Quality lift over always-self: `c_AI − c_H = +0.30`. Direction matches the +34% Brynjolfsson finding (which is in resolution rate, mixing speed and accuracy).

*Expert:* `θ = (c_H = 0.85, c_AI = 0.70, φ = 0.20, σ = 0.20, λ = 0.10)`. `c_⋆ = 0.70 + 0.30·0.85 = 0.955`. Then `K_u = −0.15 + 0.85 − 0.105 − 0.06 = 0.535 > 0`, and `K_v + K_uv = −0.20 + (0.255 + 0.005 + 0.06) = +0.12 > 0` (verification benefit dominates because `c_⋆ − c_AI = 0.255` is large — a skilled human catches AI errors). **Optimum: `(1, 1)` spec-driven.** Quality lift: `c_⋆ − c_H = +0.105`, only +12% relative.

So the model produces: novice at full-delegate-no-verify (Q = 0.70 from c_AI alone), expert at full-delegate-full-verify (Q = 0.955 = AI augmented by skilled human catching). Both delegate; the difference is in verification depth. The empirical "expert ~0%" finding reflects throughput ceiling (experts already at maximum call rate, can't redeploy saved attention to more calls) rather than zero quality lift — a context the model doesn't carry.

**B. Dell'Acqua BCG (E3) — outside-frontier 19-pp quality drop.** `θ = (c_H = 0.70, c_AI = 0.40, φ = 0.30, σ = 0.50, λ = 0.50)`. `c_⋆ = 0.40 + 0.60·0.70 = 0.82`. Then `K_u = −0.30 + 0.85 − 0.525 − 0.30 = −0.275 < 0`. **Optimum: `(0, 0)` do-yourself.** If the worker mis-routes to `(1, 0)`, quality drops from `c_H = 0.70` to `c_AI = 0.40` — a 30 pp loss. The empirical 19 pp reflects partial mis-routing (some subjects partially used AI, some didn't); the model's prediction is a clean upper bound on the harm.

**C. Bastani PNAS (E9) — guardrails preserve skill.** Generic learning task with high skill-formation: `θ = (c_H = 0.40, c_AI = 0.70, φ = 0.30, σ = 0.20, λ = 0.50)`. `c_⋆ = 0.82`. `K_u = 0.30 + 0.85 − 0.525 − 0.06 = 0.565 > 0`. `K_v + K_uv = −0.30 + (0.12 + 0.025 + 0.06) = −0.095 < 0`. **Optimum: `(1, 0)` self-automator.**

This is a real and honest finding: at lit-review-anchored constants (β = 0.05), the skill-preservation push toward verification (`λ·β·u = 0.025` at u = 1) is too small to flip the corner against verification cost. The guardrail effect is *structurally present* — at the (1, 1) corner, `S = 0` (no atrophy) versus `S = −0.05` at (1, 0), so `λ·ΔS = +0.025` — but the verification cost (`α·φ = 0.30`) dominates. Numerically, self-automator beats spec-driven by `α·φ − K_uv = 0.30 − 0.205 = 0.095` net at default constants. To make guardrails decisive (flip the corner to spec-driven), the model needs `λ·β > α·φ − [(1 − c_AI)·c_H + σ·(1 − c_AI)] = 0.30 − 0.18 = 0.12`. At default `λ = 0.50`, `β` must exceed `0.24`; at `β = 0.05`, `λ` alone cannot flip the corner (would require `λ > 2.4`, impossible since `λ ∈ [0, 1]`); at default `β = 0.05` and `λ = 0.50`, lowering `φ` flips the corner once `φ < 0.205` — barely cheaper verification than the default 0.30.

**Honest reading:** the model says the guardrail effect is real but weak at default constants. Stage-4 fitting target Q2 is whether `β` should be larger to match Bastani's empirical magnitude. This is not a model failure — it's the model surfacing a calibration question the lit review left implicit.

**D. Everett (E8) — independent-then-synthesize restores complementarity.** `θ = (c_H = 0.65, c_AI = 0.70, φ = 0.40, σ = 0.90, λ = 0.30)`. `c_⋆ = 0.70 + 0.30·0.65 = 0.895`. `K_u = 0.05 + 0.85 − 0.315 − 0.27 = 0.315 > 0`. `K_v + K_uv = −0.40 + (0.195 + 0.015 + 0.27) = +0.08 > 0`. **Optimum: `(1, 1)` spec-driven.** Mechanism: the `σ·(1 − c_AI) = 0.27` term in `K_uv` makes verification valuable precisely because stakes are high and AI is fallible. Matches Everett's lit-review story exactly.

**E. Randazzo self-automator (E10) — ~10% of consultants in the trap.** Routine consulting task: `θ = (c_H = 0.70, c_AI = 0.85, φ = 0.20, σ = 0.20, λ = 0.10)`. `c_⋆ = 0.955`. `K_u = 0.15 + 0.85 − 0.105 − 0.03 = 0.865 > 0`. `K_v + K_uv = −0.20 + (0.105 + 0.005 + 0.03) = −0.06 < 0`. **Optimum: `(1, 0)` self-automator.** Self-automator is the correct policy at this θ — the empirical finding "~10% of consultants are self-automators" is about θ-distribution (~10% of work-instances have these characteristics), not about systematic mis-routing. A misread of Randazzo's data as "self-automator is always wrong" is a category error the model corrects.

**F. Schoenegger (E18) — even overconfident GPT improves forecasting +23–43%.** This is *outside* the model's current formalism. The model attributes any gain at `u > 0` to `c_AI` (advice quality), but Schoenegger's finding suggests structured reasoning is doing significant work independent of the AI's confidence calibration. A constant `+δ` to Q whenever `u > 0` would represent this — held for future passes if it changes downstream predictions. **Honest gap.**

### Cross-context note: outcome heterogeneity across these anchors

The six papers above measure different outcome variables. The model's `Q` ("probability of correct/high-quality output") maps cleanly onto Dell'Acqua, Everett, and Schoenegger (all output-quality measures). Brynjolfsson's "issues resolved per hour" maps approximately onto `Q × call-rate`, where call-rate depends on attention saved (`A`); the model's `+34% novice` prediction is a Q-only claim, while Brynjolfsson's empirical 34% bundles throughput. Bastani's "unassisted retest performance" is a downstream effect of the `S` channel accumulated over many task instances, not a per-task `Q` measurement. Randazzo's "behavioural-mode distribution" is a categorical prediction over which corner the worker chooses, not a quality measure at all. Mozannar's CUPS data is process telemetry (time fractions across interaction states), informative for `α` and `ε` calibration but not for `Q`.

Stage 4 must disaggregate which constants get fit against which outcome types — pooling them as a single calibration target would silently average over methodological apples and oranges. This is named explicitly as Q1–Q6 in §11.

## 8. The five cruxes

Load-bearing claims of the model. Collapse rebuilds it.

**C1 — Two-axis decision space (u, v).** The decision is reduced to autonomy and verification depth. If the actual workflow choice space has more dimensions that matter — e.g., context-engineering depth, prompt-iteration count, tool selection — the model is incomplete. *What would flip it:* empirical evidence that two workflows with identical `(u, v)` but different context-engineering produce systematically different outcomes (which the practitioner literature suggests is real — S4).

**C2 — Verification effectiveness equals generation skill.** `c_⋆ = c_AI + (1 − c_AI)·c_H` treats `c_H` as both generation skill *and* verifier capability — a worker who'd produce 40%-quality output alone catches 40% of AI errors. Karpathy's G9 generator-verifier-asymmetry says verification is typically *easier* than generation; a separate `c_V ≥ c_H` parameter would make low-`c_H` workers more effective verifiers. *What would flip it:* empirical evidence that verifier-recognition rates are uncorrelated with generation skill (would require introducing `c_V`). Deskilled-verifier worry is partially handled by the formula already (`c_H → 0` ⇒ `c_⋆ → c_AI`), but the *mechanism* of skilled-but-not-creative verifier (the editor archetype) is not in scope.

**C3 — `ε > 0` is the right operationalisation of L1.** The substitution-myth invariant is captured as residual attention at full delegation. If the actual structure is more like "delegation creates new tasks of comparable effort" rather than "delegation creates monitoring overhead", a constant ε is wrong shape. *What would flip it:* evidence that delegation-induced work scales with task complexity, not as a flat constant.

**C4 — Skill atrophy `β` is task-type-uniform.** All tasks atrophy at the same rate per unit of `u·(1-v)`. Some skills (e.g., motor-procedural) atrophy slower than others (verbal-fluency, calibration). *What would flip it:* longitudinal data on skill-specific atrophy rates under controlled AI-use exposure.

**C5 — Tasks are independent in the portfolio.** `Σ_i V_i` aggregates linearly. Cross-task productivity bundling (G8/Cowen) violates this — productivity gains on related tasks are correlated, not additive. *What would flip it:* empirical evidence that observed aggregate productivity (E4 / Humlum-Vestergaard zero) is driven by task-coupling effects, not just by individual mis-routing. This is the most likely-to-flip crux: the aggregate-zero puzzle is a smoking gun for it.

## 9. Scope limits — what the model does NOT capture

Honest disclosure of where the formalisation stops.

- **Aggregate-zero puzzle (E4 / O2).** Humlum-Vestergaard's zero across 25,000 Danish workers is *organisational*, not individual — task reorganisation, managerial absorption, coordination costs. The model is individual-level (the A6 crux of the topology); it cannot rebut or explain E4 directly. This is a sibling-artifact problem (organisational-level model), not a parameter to set. The model is locally optimal at the individual level; it is silent about whether aggregate effects emerge.
- **Cross-task bundling (G8).** `V` is summed across tasks; in reality `V_i` and `V_j` covary when the tasks are productivity-linked. C5 names this; the model does not encode it.
- **Calibration error on `c_AI` (O7).** `c_AI` is treated as known. In practice users are systematically miscalibrated (E16 — higher AI confidence → less critical thinking; E13 — sycophancy escalation). The model's `c_AI` should be the user's *belief* about AI capability; if belief and reality diverge, the model is locally optimal under a wrong belief. The topology's O7 (verification cost vs. verification calibration) is the natural extension.
- **Sycophancy as quality-degrader (E13).** The model assumes verification only helps. Randazzo HBS 26-021 documents AI flipping correct human judgments under pushback — verification can *worsen* outcomes when sycophancy escalates. Not encoded; would require a `c_⋆` that depends on the human's resistance to AI-pushback.
- **Frontier migration (O4).** `c_AI` is static within a session. Over months the frontier moves; the user must recalibrate. The model is a snapshot — extending it to a dynamic version would couple `c_AI(t)` to a learning model of the user's frontier-mapping rate.
- **Multi-tool attention interference (G10 — Wickens MRT).** The model is per-task; it does not capture the cost of running Cursor + ChatGPT + Slack simultaneously. The topology added G10 specifically to flag this; Stage 4 / Stage 5 should test whether a `M_concurrent(N_tools)` extension is needed.
- **Verifier skill ≠ generation skill (Karpathy G9).** The model uses `c_H` as both generation capability and verification effectiveness. Empirically, verification is often easier than generation — recognising an error costs less than producing a correct answer. A `c_V ≥ c_H` parameter would let low-`c_H` workers benefit from verification more than the current formula predicts. Held for Stage 4 / future passes; the C2 crux names this.
- **Partial verification ("skim") rounds to full or none.** Bilinearity makes `v* ∈ {0, 1}`. The empirical reality of skimming (partial-depth verification at reduced cost and reduced effectiveness) does not map onto the model — the model rounds skim up to full-verify when verification is cheap and down to no-verify when it's expensive. A future variant with convex verification cost (`v²·φ` instead of `v·φ`) or with a separate verification-depth-vs-effectiveness curve would produce interior `v*` solutions. Not added in this pass to preserve bilinearity's analytical clarity.

These eight are not bugs of the formalisation. They are the boundary of what an individual-task bilinear generator-verifier loop can carry. Beyond it lies organisational design, dynamic learning, and team-level cognition — sibling artifacts.

## 10. Adversarial + steelman

Three objections the formalisation has not yet engaged head-on, and the strongest version that survives each.

### Objection 1: `c_AI` is unobservable, so the model is unactionable on novel tasks.

**Steelman.** The optimal policy depends on `c_AI`, but in any new task or unfamiliar domain the worker doesn't know `c_AI` ahead of time. Calibrating `c_AI` requires running the task and verifying the output — but the model says "compute `argmax V` using `c_AI` you don't have." For experienced task types `c_AI` is calibratable from history; for genuinely novel tasks (much of knowledge work) it isn't. The Vasconcelos / Fok & Weld verification-economics frame already captures this — engagement is rational when verification is cheap; verifying a single AI output IS your way of measuring `c_AI` for that task type. The model assumes the calibration question is solved when it's actually the binding constraint.

**Why partially right.** For genuinely novel tasks where `c_AI` is unknown, the model cannot make a precise recommendation. Stage-4 fitting target Q3 (mode-distribution match against Randazzo BCG) implicitly assumes calibrated `c_AI` distributions across BCG-like tasks — only valid for well-studied domains.

**Why the strongest version survives.** The model doesn't need precise `c_AI`; it needs robustness across `c_AI` ranges, and the corner structure is robust. For almost any `c_AI < 0.4` in a high-stakes regime (`σ ≥ 0.7`), the corner is do-yourself; for almost any `c_AI > 0.8` in low-stakes routine (`σ ≤ 0.2`, low `λ`), the corner is self-automator. The "interesting" boundary regions — where `c_AI` uncertainty matters — are precisely the spec-driven regions. So the model's prescription under uncertainty is: **set `v = 1` to verify and learn**. The spec-driven corner doubles as a Bayesian-update mechanism; the verification cost `α·v·φ` is the explicit price of resolving the uncertainty. Stage 4 should formalise the explore-vs-exploit dynamics this implies (Q6 in §11).

### Objection 2: The model recovers practitioner intuitions and adds no new predictions.

**Steelman.** Mollick, Karpathy, Anthropic, and Cognition collectively say "match the workflow to the task." The model's predictions of corner solutions match Randazzo's empirical mode distribution. So what is the formalism contributing beyond a formal scaffold for intuitions practitioners already had?

**Why partially right.** Many model predictions match practitioner consensus on headline conclusions. Self-automator for routine, do-yourself for novel-and-high-stakes, spec-driven for high-stakes-with-cheap-verify — practitioners already say these.

**Why the strongest version survives.** The model contributes five things the practitioner literature does not:

1. **Quantitative trade-offs.** The model says how much worse self-automator is than spec-driven at default constants — at the Bastani anchor specifically, `0.095` net (`α·φ − K_uv = 0.30 − 0.205`). Practitioner literature is qualitative; this is parametrically calibratable, and Stage 4 will pin the constants.
2. **Non-obvious simultaneous constraints.** Higher stakes (`σ ↑`) push BOTH `u` DOWN (refuse AI for high-stakes work) AND `v` UP (if you do use AI, verify carefully) — a simultaneous prescription the practitioner literature does not make explicit. §4's comparative-statics surfaces this; intuition often conflates the two.
3. **Naive cyborg as structural failure mode.** Bilinearity says applying interior `(u, v)` uniformly is what no individual sub-task should do. This is a sharper criticism than "be thoughtful about which mode you use" — it identifies a specific failure mode (the BCG cyborg majority running flat-`(0.7, 0.3)` policies) and says they're structurally wrong, not just suboptimal.
4. **Budget-aware shadow-price reformulation.** The `μ` mechanism says: under attention scarcity, reroute longer-base-time tasks first (since `α_eff_i = α + μ·g_i` rises proportionally with `g_i`). Practitioner literature has nothing like this prescriptive structure for portfolio-level decisions.
5. **Stage-4 calibration targets.** The model identifies six specific empirical questions (Q1–Q6 in §11) that fit a generator-verifier dispatch framework. Practitioner heuristics are unfalsifiable by design — they update without preserving the reasoning, so users can't tell when a heuristic stops applying. The L3 invariant is the antidote.

The model recovers practitioner intuitions on top-line conclusions and adds quantitative, edge-case, and falsifiable structure beyond. The contribution is in calibration, simultaneous constraints, structural failure modes, portfolio-level shadow pricing, and Stage-4 testability — not in inventing new top-level recommendations.

### Objection 3: The model is single-shot; the topic question is dynamic.

**Steelman.** "Optimal configuration for an individual knowledge worker" is implicitly a static answer to a fundamentally dynamic question — AI capability shifts month-over-month, the worker's skill atrophies under sustained delegation, calibration on `c_AI` drifts as new model versions ship. A static dispatcher with parametric flexibility doesn't tell the worker how to *anticipate* and *prepare for* capability change.

**Survives because** (a) Parasuraman, Sheridan & Wickens' (2000) function-allocation framework has been useful for 25+ years despite being static — parametric statics IS what's wanted from a generating function (the L3 invariant); (b) the trajectory questions properly belong to sibling topics in the LLM Iterate roster — `navigating-ai-world` for AI-induced trajectory of skill/meaning/relational channels, the planned `prediction-calibration` topic for `c_AI` calibration drift, the planned `bedrock-generating-functions` for the temporal-aggregation patterns this and other models share. Static-but-parameterised is the right scope here; dynamic extensions are cross-topic by design, not gaps in the present formalisation.

## 11. Stage-4 fitting targets

Six named questions Stage 4 should test against data.

**Q1 — `(α, ε)` from CUPS telemetry.** Mozannar CUPS gives time fractions across coding interaction states. Calibrate `ε` from observed monitoring-time-at-high-`u`; calibrate `α` from observed cyborg-vs-centaur time efficiency.

**Q2 — `β` from Bastani longitudinal regime.** Bastani's 17% drop is one window. Longer-window data (Lee-Sarkar CHI 2025; Anti-Social Century) should pin per-task atrophy rate. The model predicts: `β` should be ~10× larger for skills the worker uses heavily than for tasks they delegate occasionally.

**Q3 — Mode-distribution match against Randazzo BCG.** Given a θ-distribution prior over BCG-consultant tasks, does the optimal-routing distribution over (u\*, v\*) match the observed (~60% cyborg / ~30% centaur / ~10% self-automator)? If not, which constant is mis-fit?

**Q4 — Outside-frontier harm prediction (Dell'Acqua replication).** For subjects forced to use AI on `c_H > c_AI` tasks, the model predicts a quality drop proportional to `u·(c_H − c_AI)`. Stage 4 should test the linearity and the slope.

**Q5 — Workflow-architecture-vs-capability bound.** The headline S1 prediction. Construct two simulated workforces: (a) frontier AI + naive flat-cyborg routing, (b) mid-tier AI + optimal routing on the same task mix. The model predicts (b) outperforms (a) on net quality once `c_AI_naive` falls below some threshold. Stage 4 should locate the threshold and test against Everett 2025 / Dell'Acqua at that precision.

**Q6 — Calibration uncertainty and explore-vs-exploit on `c_AI`.** The model treats `c_AI` as a known input. In practice, workers learn `c_AI` by running the task and verifying the output (Vasconcelos verification-economics). For novel tasks, the spec-driven corner doubles as a calibration mechanism — verification reveals `c_AI` for next time. Stage 4 should formalise the explore-vs-exploit structure: what is the optimal exploration premium when `c_AI` variance is high? When does verifying-to-learn dominate verifying-to-quality-control? The §9 scope-limit on `c_AI` miscalibration becomes a structural extension via this question, not just an acknowledged gap. Engages the §10 Objection 1 directly.

### Data starting points for each Q

The natural starting point for each fitting target is the lit-review paper(s) that motivated the corresponding parameter. Honest notes on likely data availability:

- **Q1 (`α`, `ε`)** — Mozannar CUPS 2024 (E12) telemetry across coding interaction states. Process-level data is likely Microsoft Research-internal; replication via Cursor / Claude Code anonymized usage logs is the alternative path.
- **Q2 (`β`)** — Bastani PNAS 2025 (E9) is the primary anchor; PNAS supplementary materials likely include the longitudinal panel needed to estimate per-task atrophy. Lee-Sarkar CHI 2025 (E16) provides multi-task panel context for a complementary fit.
- **Q3 (mode distribution)** — Randazzo HBS WP 26-036 (E10) for the 60/30/10 BCG distribution. HBS supplementary materials may include individual-level mode tags; failing that, an in-house replication on a smaller knowledge-worker sample is tractable.
- **Q4 (outside-frontier slope)** — Dell'Acqua BCG study (E3); HBS data release may include individual-level outcome grades plus inside/outside frontier classification per task. The linearity test of `u·(c_H − c_AI)` is a clean within-subjects design.
- **Q5 (workflow > capability)** — Everett 2025 (E8) demonstrates the workflow-restoration mechanism but does not directly test the bound. The cleanest test is a new RCT comparing routing strategies on the same `c_AI` (e.g., GPT-4 + naive flat-cyborg vs. GPT-3.5 + optimal routing on a matched task mix); existing data is suggestive, not decisive.
- **Q6 (explore-exploit on `c_AI`)** — Likely requires new experimental work or simulation. Closest analogs are the contextual-bandit and multi-armed-bandit-with-costly-verification literatures, but the knowledge-workflow application is novel; this is a Stage-4 originated study, not a re-analysis of existing data.

Stage 4's first move is to scope data availability for Q1-Q5; Q6 may require originating new data or a simulation harness. Following the human-psych-variation pattern, the Stage-4 build should live at `stage_outputs/technology-utilization-architecture/data/` with a curated CSV per fitting target, a runnable Python pipeline that reproduces every chart on the published `data.mdx`, and a `data/out/` folder for derived outputs.

## 12. Connections to other topics

Where this model attaches to sibling AI's-Research topics.

- **Human-psych-variation.** `λ` (skill-formation value) and `β` (atrophy rate) are individual differences. Need-for-Cognition (Buçinca 2021, E6) moderates `v` — high-NfC users verify more. Cognitive-style covariation belongs in that topic, not here.
- **Navigating-AI-world.** This model's `β` is the per-task version of nav-AI's `ΔM_comp` (competence erosion). The portfolio-level `S` aggregation is the within-work-domain version of nav-AI's `ΔV/ΔM` trade-off. The two models share the substitution-myth (L1) and verification-economics (G3) invariants; they differ in what they're optimising — nav-AI optimises a life-scale meaning budget, this model optimises a workday quality-per-attention budget.
- **Trust architecture (planned).** Sycophancy (E13) and human-side calibration on AI capability (O7) are trust-regime questions — what feedback signals make `c_AI` knowable.
- **Prediction & calibration (planned).** The topology's O7 connection. Calibration on `c_AI` is the calibration sub-problem this model treats as exogenous.
- **Information fidelity (planned).** `φ` (verification cost) depends on output-format quality and grounding — verifying a structured output with citations is cheap; verifying free prose is expensive. The information-fidelity topic should formalise what makes `φ` low or high.
- **Bedrock generating functions (planned).** The four-channel decomposition `V = Q − α·A + λ·S − σ·R` is a candidate generating-function pattern: every decision under attention scarcity has the same four channels. The bedrock topic should test whether this generalises beyond AI workflow.

## Glossary

- **autonomy level (`u`)** — fraction of a task's generation delegated to AI. Karpathy slider P2 made parametric.
- **verification depth (`v`)** — fraction of AI output independently checked by the human.
- **`c_H`, `c_AI`** — human and AI capabilities on a task type; probability of correct/high-quality output.
- **`c_⋆`** — verified-output ceiling; `max(c_H, c_AI)` under the assumption the human can verify.
- **`φ`** — verification-cost ratio: time-to-verify divided by time-to-generate-from-scratch.
- **`σ`** — stakes; weight on uncaught-error penalty.
- **`λ`** — skill-formation value of the task to the worker.
- **`α`** — attention price (utility weight on time). Normalised to 1.
- **`ε`** — residual attention at full delegation. The L1 substitution-myth invariant.
- **`β`** — per-task skill-atrophy rate under unverified delegation.
- **`M`** — per-task metacognitive routing tax. The G1 metacognitive-bottleneck invariant.
- **centaur** — Mollick: clean human/AI handoff with verification gate. `(u, v)` mid + high.
- **cyborg** — Mollick: interleaved sub-task delegation with partial verification. `(u, v)` mid + mid/low.
- **self-automator** — Randazzo: full delegation, no verification. `(u, v)` high + low. The atrophy-trap regime.
- **spec-driven / independent-then-synthesize** — Everett 2025; Compound Engineering: full delegation with full verification. `(u, v)` high + high.
- **do-yourself** — no AI involvement. `u ≈ 0`.
- **L1 (substitution myth)** — every offload creates new monitoring/verification work. Encoded as `ε > 0`.
- **L2 (joint surface)** — optimal allocation requires modelling the joint performance, not comparing solo capabilities. Encoded as the portfolio-level `argmax`.
- **L3 (parameterise by capability)** — the formalism is a generating function over θ, not a lookup table. The whole structure of the model.
- **G3 (verification trade-off)** — engagement is rational only when verification is cheap relative to expected payoff. The `−α·v·φ` term.
- **G7 (skill atrophy)** — capacities not exercised decay. The `−β·u·(1-v)` term.
- **G9 (generator-verifier asymmetry)** — production cost falls toward zero with AI; verification cost stays roughly constant. The asymmetry between `u·ε` (generation residual) and `v·φ` (verification full).