evil_MoE

wassname/evil_MoE

Fork 0

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 16:45:42 +08:00

Commit Graph

Author	SHA1	Message	Date
wassname	f917670994	feat: T8 run-cell + regen-dynamics recipes; spec T5 done, T8 in progress Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 00:52:14 +00:00
wassname	d6342ab201	feat: gradient routing — delta_S_hack quarantine + intervention {none,erase,route} Stage-1 (T3) of the routing spec. Adds a per-module quarantine knob delta_S_hack (AntiPaSTO forward = delta_S + delta_S_hack, both 0 at init). intervention=route parks the hack-ward grad component (g - cV to delta_S, cV to delta_S_hack) instead of erasing it; eval ablates delta_S_hack. - proj.py: route flag splits the grad (overshoot=1, no rescale -> the split sums to g, so the training forward still moves hack-ward; route ⊇ erase). - antipasto.py: second trainable knob, identity preserved at init. - train.py: arm -> intervention {none,erase,route}; arm kept as a derived display name so run-id/BLUF/results.py/plot classify are unchanged. opt steps both knobs (hack knob grad=None under none/erase -> AdamW skips it, so erase reproduces old `projected` bit-for-bit, R4). R3 span assert (resid/\|\|gh\|\| < 1e-4) + end-of-run \|\|delta_S_hack\|\| guard (route >0). - results.py / plot_dynamics.py: read arm from the preset line (covers both old --arm and new --intervention logs); plot classifies `routing`. smoke: none \|\|dsh\|\|=0, erase clean, route \|\|dsh\|\|=0.0105 span=2.9e-7. 64 archived projected rows still parse. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 00:31:30 +00:00

Author

SHA1

Message

Date

wassname

f917670994

feat: T8 run-cell + regen-dynamics recipes; spec T5 done, T8 in progress

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 00:52:14 +00:00

wassname

d6342ab201

feat: gradient routing — delta_S_hack quarantine + intervention {none,erase,route}

Stage-1 (T3) of the routing spec. Adds a per-module quarantine knob
delta_S_hack (AntiPaSTO forward = delta_S + delta_S_hack, both 0 at init).
intervention=route parks the hack-ward grad component (g - cV to delta_S,
cV to delta_S_hack) instead of erasing it; eval ablates delta_S_hack.

- proj.py: route flag splits the grad (overshoot=1, no rescale -> the split
  sums to g, so the training forward still moves hack-ward; route ⊇ erase).
- antipasto.py: second trainable knob, identity preserved at init.
- train.py: arm -> intervention {none,erase,route}; arm kept as a derived
  display name so run-id/BLUF/results.py/plot classify are unchanged. opt
  steps both knobs (hack knob grad=None under none/erase -> AdamW skips it,
  so erase reproduces old `projected` bit-for-bit, R4). R3 span assert
  (resid/||gh|| < 1e-4) + end-of-run ||delta_S_hack|| guard (route >0).
- results.py / plot_dynamics.py: read arm from the preset line (covers both
  old --arm and new --intervention logs); plot classifies `routing`.

smoke: none ||dsh||=0, erase clean, route ||dsh||=0.0105 span=2.9e-7. 64
archived projected rows still parse.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 00:31:30 +00:00

2 Commits