mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 19:31:11 +08:00
journal: dir6 real-V arms land (margin band) -- both suppress, per-token>per-rollout
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -3263,3 +3263,55 @@ reason, and `cin_t > cin_s` cannot tell the two axes apart.

No queue change now. Read the vampire (job 13) and per-rollout (job 8) frout when they
land; build a half-solve pool only if the confound survives those.

## 2026-06-08 02:10 -- dir6 real-V arms land (margin band): both suppress, per-token > per-rollout

**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue jobs 8 (per-rollout) and 9 (per-token);
logs `logs/20260607T195125_fast_routingV_seed43_dir6_routeV_s43.log` and the per-token sibling.
Both arms are real-V, p75/p75 margin band, dense run_tests teacher pool, seed 43, n=119 held-out test.

**Hypothesis:** real-V routeV holds the cheat in the quarantine -> deploy (knob-off) hack
well below train (knob-on) hack, at non-collapsed solve.

### Observations

- [obs] Deploy (knob-off, held-out test n=119):

| arm | granularity | deploy hack (strict) | deploy hack (vendor eq) | solve | objective (solve - strict hack) |
|---|---|---:|---:|---:|---:|
| job 9 | per-token | 0.042 | 0.034 | 0.143 | +0.101 |
| job 8 | per-rollout | 0.101 | 0.084 | 0.126 | +0.025 |

- [obs] Job 8 train knob-on HACK_STUDENT=0.636 (mean); val curve climbs 0.0->0.81 by step 59
while deploy/knob-off val (n=32) stayed ~0.00-0.03 (log VAL-eval lines). Deploy on the n=119
test set is higher (0.101) than on the n=32 val (~0.03).
- [obs] frout (per-rollout, reliable gauge): nonzero at steps 2-5 (~0.29 peak), ~0.00 by step ~28;
rout column near 0 for the back half (the cliff, now confirmed on the reliable per-rollout gauge,
not just the nan-fragile per-token one).

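
A minimal sketch for gauging sampling noise on the deploy table, assuming each reported rate is an integer count over n=119 rounded to three decimals (the counts are inferred here, not read from the logs). It recomputes the objective column and attaches a 95% Wilson score interval to each strict hack rate. `wilson_interval` is a helper written for this note, not a function from the repo.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 is ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


N_TEST = 119
# (strict deploy hack, solve) as reported in the table above.
reported = {
    "job 9 per-token": (0.042, 0.143),
    "job 8 per-rollout": (0.101, 0.126),
}

for arm, (hack, solve) in reported.items():
    # Assumption: the rounded rates come from integer counts out of N_TEST.
    k_hack = round(hack * N_TEST)
    k_solve = round(solve * N_TEST)
    lo, hi = wilson_interval(k_hack, N_TEST)
    objective = (k_solve - k_hack) / N_TEST
    print(f"{arm}: hack {k_hack}/{N_TEST} CI95=[{lo:.3f}, {hi:.3f}] "
          f"objective={objective:+.3f}")
```

Under that assumption the strict hack rates correspond to 5/119 and 12/119, and the two intervals overlap, so on a single seed the per-token vs per-rollout gap is suggestive at best.
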
### Inferences

- [inf] Per-token suppresses more than per-rollout on this seed (deploy hack 0.042 vs 0.101) at
equal-or-better solve. {reason: only granularity differs; both real-V same band/pool/seed;
credence 0.55 -- n=1 seed, the gap (0.06) is within plausible seed noise}.
- [inf] Suppression is partial on the larger test set, not the ~0 the n=32 val implied. {reason:
val n=32 is small and noisy; n=119 strict hack 0.101 is the trustworthy deploy number; credence 0.8}.
- [inf] The frout cliff is real (reliable gauge), consistent with "hack learned early -> within-group
GRPO advantage flattens -> little gradient left to route after ~step 10". {reason: rout~0 back-half
on the per-rollout gauge; train knob-on hack saturates ~0.8 by step 40; credence 0.6}.

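
The "advantage flattens" step in the last inference can be illustrated with the usual group-normalized advantage, (r - mean) / (std + eps), computed within one prompt group. This is a sketch of the limiting case only: the reward values are hypothetical, and whether this repo normalizes by the group std in exactly this way is an assumption.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantage within one prompt group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical reward groups, not taken from the logs.
early = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # hack is rare: rewards disagree
late = [1.0] * 8                                  # hack saturated: identical rewards

print(group_advantages(early))  # nonzero, mixed-sign advantages
print(group_advantages(late))   # every entry is 0.0: no within-group signal
```

With train knob-on hack near 0.8 rather than 1.0, groups are not fully uniform, so the real signal shrinks rather than vanishing. The sketch only shows the direction of the effect.
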
### Failure modes considered

- **Most-likely:** the suppression is absorption, not direction -- the quarantine soaks up any
routed mass regardless of whether v_grad points at the hack. Prior 0.4. Check: job 10 random-V
(now Running) -- if it suppresses ~as well, H2 absorption; if it fails (deploy hack ~vanilla), H4.
- **Subtle:** teacher-forcing/off-policy confound -- the teacher pool is ~99% hacks so "routed"
correlates with "off-policy", router may key on off-policy-ness not hack-content. Prior 0.3.
Check: job 13 vampire (in-subspace semantic placebo).
- **Null:** deploy hack 0.10 is just the base rate, nothing was suppressed; need the emergence
reference. Prior 0.15. Check: job 12 vanilla -- deploy hack should be >> 0.10 by step 60.

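
The random-V and vanilla checks above can be written down as an explicit readout rule before the numbers land. This is a sketch only: `read_random_v_control` and the tolerance `tol=0.05` are hypothetical choices made for this note, and the journal does not fix a threshold for "~as well" or "~vanilla".

```python
def read_random_v_control(random_v_hack: float, real_v_hack: float,
                          vanilla_hack: float, tol: float = 0.05) -> str:
    """Map deploy hack rates from jobs 10 and 12 to the H2/H4 readout.

    tol is a hypothetical tolerance for "about equal"; pick it before
    looking at the results.
    """
    # Null failure mode: vanilla is no worse than real-V, nothing to explain.
    if vanilla_hack - real_v_hack <= tol:
        return "null: real-V deploy hack is at the vanilla base rate"
    if abs(random_v_hack - real_v_hack) <= tol:
        return "H2 (absorption): random-V suppresses about as well as real-V"
    if abs(random_v_hack - vanilla_hack) <= tol:
        return "H4: random-V fails to suppress, deploy hack near vanilla"
    return "inconclusive: random-V sits between real-V and vanilla"


# Placeholder inputs for illustration only; jobs 10 and 12 have not reported.
print(read_random_v_control(random_v_hack=0.48, real_v_hack=0.101,
                            vanilla_hack=0.50))
```

Fixing the rule before jobs 10 and 12 finish keeps the H2-vs-H4 call from being tuned to the outcome.
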
### Next action

Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire. H2-vs-H4 verdict
waits on 10 + 13; no queue change.
