Activation-routing documentation audit
Goal
Update current project documentation to describe the implemented activation-scored
routing method rather than the retired live-gradient-scored method. Prioritize
README.md and AGENTS.md.
Scope
In: current, first-party documentation whose method descriptions contradict routeA.
Out: historical journals, archived scripts, vendor documentation, prior-work quotations,
and old result narratives that explicitly describe earlier experiments.
Requirements
- R1: README.md describes v_act extraction, activation scoring, and branch detachment without claiming the current method scores or modifies live gradients. Done means: all current-method claims agree with src/vgrout/train.py and src/vgrout/extract_vhack_act.py.
- R2: AGENTS.md gives future agents the same current-method model while retaining accurate background descriptions of Gradient Routing and SGTM. Done means: stale "vec -> gradient cosine" instructions are replaced, while quoted prior work remains unchanged.
- R3: directly related current docs are audited for the same stale claims. Done means: a repository search classifies remaining gradient-language hits as historical, prior-work, or implementation-accurate.
Tasks
- T1 (R1, R2): Audit code and prioritized docs.
- verify: compare terminology against src/vgrout/train.py, src/vgrout/extract_vhack_act.py, and src/vgrout/lora2r.py.
- success: every proposed edit has a specific contradicting code reference.
- likely_fail: broad replacement corrupts prior-work descriptions.
- sneaky_fail: docs say "activation" but still imply post-backward gradient scoring.
- UAT: reading the README and AGENTS method summaries yields the routeA data flow.
- T2 (R1, R2, R3): Edit current docs.
- verify: git diff --check and focused stale-term search.
- success: current-method stale claims are absent from prioritized docs.
- likely_fail: stale v_grad or live-gradient cosine claims remain.
- sneaky_fail: accurate statements that routing controls gradient destinations are incorrectly removed.
- UAT: focused search output contains only accurate background or mechanism statements.
- T3 (R1, R2, R3): Fresh-eyes review and proof.
- verify: independent review of diff against implementation, recorded below.
- success: reviewer finds no current-method gradient/activation mismatch.
- likely_fail: reviewer identifies a stale or overcorrected claim.
- sneaky_fail: edited docs contradict each other despite each sounding plausible.
- UAT: this file contains verification output and review result.
Context
Current routeA data flow:
- Forward authored hack/clean pairs and pool deployed bottleneck activations.
- Define each module's v_act as the normalized mean hack-minus-clean activation.
- Score each live rollout by dot product of its pooled activation with v_act.
- Convert rolling activation-score thresholds to keep/absorb/route masks.
- Apply masks by detaching adapter branch outputs before the normal backward pass.
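The first four steps can be sketched in plain Python. This is a minimal illustration, not the code in src/vgrout: the function names, the RollingThresholds class, the quantile-based thresholds, the warmup rule, and the ordering keep < absorb < route are all assumptions made for the example. Pooled activations are passed in as plain lists of floats, and the detachment step is left out because it depends on the training framework.

```python
import math
from collections import deque


def extract_v_act(hack_acts, clean_acts):
    """Normalized mean hack-minus-clean activation for one module.

    hack_acts and clean_acts are paired lists of pooled activation vectors.
    """
    diffs = [[h - c for h, c in zip(hr, cr)]
             for hr, cr in zip(hack_acts, clean_acts)]
    mean_diff = [sum(col) / len(diffs) for col in zip(*diffs)]
    norm = math.sqrt(sum(x * x for x in mean_diff))
    if norm == 0.0:
        raise ValueError("mean hack-minus-clean activation is zero")
    return [x / norm for x in mean_diff]


def score(pooled_act, v_act):
    """Activation-side score: dot product of a rollout's pooled activation with v_act."""
    return sum(a * v for a, v in zip(pooled_act, v_act))


class RollingThresholds:
    """Hypothetical rolling buffer that turns scores into keep/absorb/route labels.

    Assumption: thresholds are quantiles of recent scores, and every rollout
    is labeled "keep" until `warmup` scores have been seen.
    """

    def __init__(self, capacity=64, warmup=16, low_q=0.5, high_q=0.9):
        self.buf = deque(maxlen=capacity)
        self.warmup = warmup
        self.low_q = low_q
        self.high_q = high_q

    def _quantile(self, q):
        ordered = sorted(self.buf)
        return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

    def classify(self, s):
        if len(self.buf) < self.warmup:
            label = "keep"
        else:
            # Thresholds come from scores seen before this rollout.
            low, high = self._quantile(self.low_q), self._quantile(self.high_q)
            label = "route" if s >= high else "absorb" if s >= low else "keep"
        self.buf.append(s)
        return label
```

In this sketch the labels play the role of the masks: each rollout's label would decide which adapter branch outputs are detached before the backward pass.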
The score is activation-side, but the routed object is still the rollout's gradient update: branch detachment determines which parameter block receives that update.
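A small PyTorch sketch can make the detachment point concrete. It assumes two hypothetical parameter blocks (keep_w and route_w) and a two-way mask; it is not the lora2r implementation, which this note does not reproduce, and it omits the absorb case. Detaching a branch output leaves the forward value unchanged but stops the rollout's gradient from reaching that branch's parameters.

```python
import torch


def route_by_detach(keep_out, route_out, route_mask):
    """Sum two branch outputs, detaching one branch per rollout.

    keep_out, route_out: (batch, d) outputs of two hypothetical parameter blocks.
    route_mask: (batch,) bool; True sends that rollout's update to the routed block.
    """
    m = route_mask.unsqueeze(-1)  # (batch, 1), broadcasts over d
    # Routed rollouts: no gradient reaches the keep block.
    keep_branch = torch.where(m, keep_out.detach(), keep_out)
    # Kept rollouts: no gradient reaches the routed block.
    route_branch = torch.where(m, route_out, route_out.detach())
    return keep_branch + route_branch


torch.manual_seed(0)
x = torch.randn(2, 3)                            # two rollouts
keep_w = torch.randn(3, 3, requires_grad=True)   # hypothetical kept block
route_w = torch.randn(3, 3, requires_grad=True)  # hypothetical routed block
mask = torch.tensor([True, False])               # rollout 0 routed, rollout 1 kept

out = route_by_detach(x @ keep_w, x @ route_w, mask)
out.sum().backward()
# keep_w.grad now depends only on rollout 1, route_w.grad only on rollout 0.
```

The backward pass itself is the ordinary one; only the graph it runs over was changed, which is why the score can be activation-side while the routed object is still the gradient update.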
Log
- routeA scores pooled deployed-block bottleneck activations against v_act; output masks then determine the destination of the normal GRPO gradient update.
- docs/writeup/main.tex and docs/results.md contain routeV evidence, so they were marked historical rather than mechanically rewritten as routeA evidence.
- docs/human_journal.md already had user edits and remains untouched.
- Fresh-eyes review found six documentation inaccuracies: overclaimed prevention, uncontrolled routing mass described as matched, rank-2r instead of deployed rank-r activation capture, warmup conflated with full buffer capacity, pair labels described as no labels, and precision preference described as implemented. All six were corrected.
TODO
Errors
| Task | Error | Resolution |
|---|---|---|
Results
| Claim | Documentation | Implementation proof |
|---|---|---|
| Direction source is activation-side | README.md:37-45, AGENTS.md:109-114 | src/vgrout/extract_vhack_act.py:71-102 |
| Live score is pooled activation dot v_act | README.md:47-50, AGENTS.md:109-110 | src/vgrout/train.py:276, src/vgrout/train.py:841 |
| Score-selected masks route the later gradient update | README.md:52-56, AGENTS.md:169-178 | src/vgrout/train.py:847-849, src/vgrout/lora2r.py:75-83 |
| RouteV evidence is not relabeled as routeA | README.md:105-111 | docs/results.md:1-6, docs/writeup/main.tex:1-3 |
Verify
git diff --check produced no output.
Focused search over README.md and AGENTS.md found no current-method v_grad,
gradient-cosine, frozen-B, or routeV claims. The only routeV hit is the README's
explicit statement that the paper/results are historical routeV evidence.
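A focused search of this kind could look like the following sketch. The term list is taken from the sentence above; the function name, the labels, and the rule that a hit on a line containing "historical" counts as historical are assumptions for illustration, not the command that was actually run.

```python
import re

# Terms named in the focused search above.
STALE_TERMS = re.compile(r"v_grad|gradient[- ]cosine|frozen-B|routeV", re.IGNORECASE)
# Assumed heuristic: a hit on a line that says "historical" is background, not a stale claim.
HISTORICAL_HINT = re.compile(r"historical", re.IGNORECASE)


def find_stale_hits(text):
    """Return (line number, matched term, label) for each line with a stale term."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = STALE_TERMS.search(line)
        if match:
            label = "historical" if HISTORICAL_HINT.search(line) else "needs-review"
            hits.append((lineno, match.group(0), label))
    return hits
```

Any hit labeled needs-review would still have to be read against the implementation by hand; the heuristic only narrows the list.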
Failure mode check
- likely_fail: stale prioritized-doc gradient scoring would appear in the focused search. Actual: no stale current-method hit. PASS.
- sneaky_fail: activation wording could hide gradient surgery. Actual: docs say output masks route the subsequent update, matching train.py:895-896 and lora2r.py:75-83. PASS.
- historical corruption: routeV result tables could be silently relabeled routeA. Actual: tables remain routeV and now carry historical headers. PASS.
Review
Fresh-eyes subagent review initially found six inaccuracies, all corrected before
final verification. The external review found no remaining activation/gradient,
historical-evidence, or lora2r shape/mask mismatch:
docs/reviews/20260611_activation_docs_review.md.
Files
- /workspace/projected_grpo/README.md
- /workspace/projected_grpo/AGENTS.md
- /workspace/projected_grpo/docs/spec/20260611_activation_docs_audit.md
- /workspace/projected_grpo/docs/reviews/20260611_activation_docs_review.md
Next
Done.