# Activation-routing documentation audit

## Goal

Update current project documentation to describe the implemented activation-scored routing method rather than the retired live-gradient-scored method. Prioritize `README.md` and `AGENTS.md`.

## Scope

In: current, first-party documentation whose method descriptions contradict `routeA`.

Out: historical journals, archived scripts, vendor documentation, prior-work quotations, and old result narratives that explicitly describe earlier experiments.

## Requirements

- R1: `README.md` describes `v_act` extraction, activation scoring, and branch detachment without claiming that the current method scores or modifies live gradients. Done means: all current-method claims agree with `src/vgrout/train.py` and `src/vgrout/extract_vhack_act.py`.
- R2: `AGENTS.md` gives future agents the same current-method model while retaining accurate background descriptions of Gradient Routing and SGTM. Done means: stale `vec -> gradient cosine` instructions are replaced, while quoted prior work remains unchanged.
- R3: Directly related current docs are audited for the same stale claims. Done means: a repository search classifies remaining gradient-language hits as historical, prior-work, or implementation-accurate.

## Tasks

- [x] T1 (R1, R2): Audit code and prioritized docs.
  - verify: compare terminology against `src/vgrout/train.py`, `src/vgrout/extract_vhack_act.py`, and `src/vgrout/lora2r.py`.
  - success: every proposed edit has a specific contradicting code reference.
  - likely_fail: broad replacement corrupts prior-work descriptions.
  - sneaky_fail: docs say "activation" but still imply post-backward gradient scoring.
  - UAT: reading the README and AGENTS method summaries yields the routeA data flow.
- [x] T2 (R1, R2, R3): Edit current docs.
  - verify: `git diff --check` and focused stale-term search.
  - success: current-method stale claims are absent from prioritized docs.
  - likely_fail: stale `v_grad` or live-gradient cosine claims remain.
  - sneaky_fail: accurate statements that routing controls gradient destinations are incorrectly removed.
  - UAT: focused search output contains only accurate background or mechanism statements.
- [x] T3 (R1, R2, R3): Fresh-eyes review and proof.
  - verify: independent review of the diff against the implementation, recorded below.
  - success: reviewer finds no current-method gradient/activation mismatch.
  - likely_fail: reviewer identifies a stale or overcorrected claim.
  - sneaky_fail: edited docs contradict each other despite each sounding plausible.
  - UAT: this file contains verification output and the review result.

## Context

Current routeA data flow:

1. Forward authored hack/clean pairs and pool deployed bottleneck activations.
2. Define each module's `v_act` as the normalized mean hack-minus-clean activation.
3. Score each live rollout by the dot product of its pooled activation with `v_act`.
4. Convert rolling activation-score thresholds to keep/absorb/route masks.
5. Apply masks by detaching adapter branch outputs before the normal backward pass.

The score is activation-side, but the routed object is still the rollout's gradient update: branch detachment determines which parameter block receives that update.

## Log

- `routeA` scores pooled deployed-block bottleneck activations against `v_act`; output masks then determine the destination of the normal GRPO gradient update.
- `docs/writeup/main.tex` and `docs/results.md` contain routeV evidence, so they were marked historical rather than mechanically rewritten as routeA evidence.
- `docs/human_journal.md` already had user edits and remains untouched.
- Fresh-eyes review found six documentation inaccuracies: overclaimed prevention, uncontrolled routing mass described as matched, rank-2r instead of deployed rank-r activation capture, warmup conflated with full buffer capacity, pair labels described as no labels, and precision preference described as implemented. All six were corrected.
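The five-step routeA data flow in Context can be illustrated with a dependency-free toy sketch. It is not the code in `src/vgrout/train.py`, `src/vgrout/extract_vhack_act.py`, or `src/vgrout/lora2r.py`: the names `extract_v_act`, `RollingThresholds`, `ATTACHED`, and `branch_grads`, the quantile thresholding rule, the warmup behaviour, and the mapping from keep/absorb/route to attached branches are all hypothetical. Activations are plain lists, and step 5 uses a scalar two-branch model with hand-derived gradients so the effect of detachment is visible without an autograd library.

```python
import math
from collections import deque


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("hack and clean activations have identical means")
    return [x / norm for x in vec]


def extract_v_act(hack_acts, clean_acts):
    """Steps 1-2: normalized mean hack-minus-clean pooled activation for one module.

    hack_acts[i] and clean_acts[i] are the pooled activations of the i-th
    authored pair (hypothetical input format).
    """
    if len(hack_acts) != len(clean_acts) or not hack_acts:
        raise ValueError("expected a non-empty list of hack/clean pairs")
    n, dim = len(hack_acts), len(hack_acts[0])
    mean_diff = [0.0] * dim
    for hack, clean in zip(hack_acts, clean_acts):
        for i in range(dim):
            mean_diff[i] += (hack[i] - clean[i]) / n
    return normalize(mean_diff)


def score(pooled, v_act):
    """Step 3: dot product of a rollout's pooled activation with v_act."""
    return sum(p * v for p, v in zip(pooled, v_act))


class RollingThresholds:
    """Step 4 (hypothetical rule): quantile thresholds over a rolling score buffer."""

    def __init__(self, capacity=256, warmup=32, low_q=0.5, high_q=0.9):
        self.scores = deque(maxlen=capacity)  # oldest scores drop out when full
        self.warmup = max(warmup, 1)  # scores needed before routing; may be < capacity
        self.low_q, self.high_q = low_q, high_q

    def _quantile(self, q):
        ordered = sorted(self.scores)
        return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

    def mask(self, s):
        """Return 'keep', 'absorb', or 'route' for score s, then record s."""
        ready = len(self.scores) >= self.warmup
        if ready:
            low, high = self._quantile(self.low_q), self._quantile(self.high_q)
        self.scores.append(s)
        if not ready:
            return "keep"
        if s >= high:
            return "route"
        if s >= low:
            return "absorb"
        return "keep"


# Step 5 (hypothetical mapping): which adapter branches stay attached per mask,
# as (deployed_attached, routed_attached).
ATTACHED = {
    "keep": (True, False),
    "absorb": (True, True),
    "route": (False, True),
}


def branch_grads(x, target, w_deployed, w_routed, mask):
    """Toy model y = w_deployed*x + w_routed*x with loss 0.5*(y - target)**2.

    Detaching a branch keeps its forward contribution but zeroes the gradient
    that would reach its weight, so the mask picks the update's destination.
    """
    y = w_deployed * x + w_routed * x  # forward value does not depend on mask
    dloss_dy = y - target
    deployed_on, routed_on = ATTACHED[mask]
    grad_deployed = dloss_dy * x if deployed_on else 0.0
    grad_routed = dloss_dy * x if routed_on else 0.0
    return y, grad_deployed, grad_routed
```

In the toy, `branch_grads(2.0, 1.0, 0.5, 0.25, "keep")` and the same call with `"route"` produce the same forward output `1.5`; only the gradient moves from `w_deployed` to `w_routed`. That is the sense in which the score is activation-side while the routed object is the gradient update.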
## TODO

## Errors

| Task | Error | Resolution |
|------|-------|------------|

## Results

| Claim | Documentation | Implementation proof |
|---|---|---|
| Direction source is activation-side | `README.md:37-45`, `AGENTS.md:109-114` | `src/vgrout/extract_vhack_act.py:71-102` |
| Live score is pooled activation dot `v_act` | `README.md:47-50`, `AGENTS.md:109-110` | `src/vgrout/train.py:276`, `src/vgrout/train.py:841` |
| Score-selected masks route the later gradient update | `README.md:52-56`, `AGENTS.md:169-178` | `src/vgrout/train.py:847-849`, `src/vgrout/lora2r.py:75-83` |
| RouteV evidence is not relabeled as routeA | `README.md:105-111` | `docs/results.md:1-6`, `docs/writeup/main.tex:1-3` |

## Verify

`git diff --check` produced no output.

Focused search over `README.md` and `AGENTS.md` found no current-method `v_grad`, gradient-cosine, frozen-B, or routeV claims. The only routeV hit is the README's explicit statement that the paper/results are historical routeV evidence.

## Failure mode check

- likely_fail: stale prioritized-doc gradient scoring would appear in the focused search. Actual: no stale current-method hit. PASS.
- sneaky_fail: activation wording could hide gradient surgery. Actual: docs say output masks route the subsequent update, matching `train.py:895-896` and `lora2r.py:75-83`. PASS.
- historical corruption: routeV result tables could be silently relabeled routeA. Actual: tables remain routeV and now carry historical headers. PASS.

## Review

Fresh-eyes subagent review initially found six inaccuracies, all corrected before final verification. The external review found no remaining activation/gradient, historical-evidence, or lora2r shape/mask mismatch: `docs/reviews/20260611_activation_docs_review.md`.
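The focused stale-term search in Verify only lists candidates; R3 still needs a reviewer to classify each hit as historical, prior-work, or implementation-accurate. A minimal sketch of the listing step follows, assuming the four regular expressions below stand in for the stale claims this audit names (`v_grad`, gradient cosine, frozen-B, routeV). `STALE_PATTERNS`, `find_stale_hits`, and `scan` are hypothetical helpers, not repository code.

```python
import re
from pathlib import Path

# Assumed stale-term patterns, taken from the claims this audit removes.
STALE_PATTERNS = {
    "v_grad": re.compile(r"\bv_grad\b"),
    "gradient cosine": re.compile(r"gradient[- ]cosine", re.IGNORECASE),
    "frozen-B": re.compile(r"\bfrozen[- ]B\b"),
    "routeV": re.compile(r"\brouteV\b"),
}


def find_stale_hits(text):
    """Return (line_number, pattern_name, stripped_line) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits


def scan(paths):
    """Print grep-style hits for each file; classification is left to a reviewer."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, name, line in find_stale_hits(text):
            print(f"{path}:{lineno}: [{name}] {line}")
```

For example, `scan(["README.md", "AGENTS.md"])` run from the repository root prints one `path:line: [pattern] text` entry per hit; an expected hit such as the README's historical routeV statement is then classified by hand rather than suppressed by the script.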
## Files

- `/workspace/projected_grpo/README.md`
- `/workspace/projected_grpo/AGENTS.md`
- `/workspace/projected_grpo/docs/spec/20260611_activation_docs_audit.md`
- `/workspace/projected_grpo/docs/reviews/20260611_activation_docs_review.md`

## Next

Done.