docs: tone down the START HERE links to plain pointers

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-10 03:36:52 +00:00
parent 0d6ff754ec
commit b36e3db255
+7 -15
@@ -82,22 +82,14 @@ Inherit global rules from `~/.claude/CLAUDE.md`.
 ## Files
-START HERE to understand the setup (read before reasoning about the method):
-- [docs/human_journal.md](docs/human_journal.md) -- the user's own words: what the method is,
-the routing math (absorption ramp between clean-cos and hack-cos bounds), and the LIVE open
-question -- "is it the direction, the routing itself, or does the SVD/PiSSA adapter add a
-prior that makes absorption work?" Random-direction controls MATCHING the real direction is a
-KNOWN, embraced result, not a bug to explain away.
-- [docs/writeup/main.tex](docs/writeup/main.tex) -- the actual thesis and claims C1-C4. The
-contribution is NOT "we found the hack direction and erased it." It is: SGTM-style
-post-backward gradient routing in the SVD-of-W basis, gated by an extracted hack *vector*
-(not per-example data labels), with the routed mass parked in a deletable adapter. C3 already
-establishes the gate is largely non-directional; the direction's measurable role is solve
-preservation + held-out-mode generalisation (C2, the load-bearing no-cheat check).
+For the setup, read these:
+- [docs/human_journal.md](docs/human_journal.md) -- the user's notes on the method. The novel
+part is routing by an extracted vector rather than per-example labels. The SVD adapter is a
+detail, not the novel experiment. Whether the direction, the routing, or the SVD adapter
+drives the suppression is an open question (random directions match in the controls).
+- [docs/writeup/main.tex](docs/writeup/main.tex) -- the writeup: thesis and claims C1-C4.
 - [docs/papers/grad_routing/paper_gradient_routing.md](docs/papers/grad_routing/paper_gradient_routing.md)
--- Cloud et al. Expand-Route-Ablate. "Absorption" is the EFFECT of routing (routing a limited
-signal localises the broader capability into the routed region), not a mechanism you invoke.
-Routing runs the whole train; ablate once at the end. There is no warmup-then-off schedule.
+-- Cloud et al. Expand-Route-Ablate, the gradient-routing prior.
 - Read [docs/brainstorm/extracted_prefs.md](docs/brainstorm/extracted_prefs.md) for design rationale.
 - New sweep arms get recipes in [justfile](justfile) with `# H:` hypothesis comments.