wassname
2026-06-11 11:07:28 +00:00
parent 7871aa66b8
commit 270c4f5a27
30 changed files with 456 additions and 443 deletions
+1 -1
@@ -107,7 +107,7 @@ def main(cfg: Config) -> int:
# E[cos|clean]=0: mean(cos_pre) = f_h * E[cos|hacked] + (1-f_h)*0
# => E[cos|hacked] = mean(cos_pre) / f_h. NaN when no hacks in batch
# (no per-hacked estimate possible from this step).
- # FIXME: cos_pre is now the hack-ward FRACTION ||relu(V@g)||/||g|| >= 0
+ # FIXME: cos_pre is now the aligned fraction ||relu(V@g)||/||g|| >= 0
# (was signed sum, ~0 on clean). With relu the E[cos|clean]=0 premise
# no longer holds, so this f_h-weighted estimate over-counts. Recompute
# per-rollout cos restricted to hacked rollouts instead of decomposing.
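
The FIXME above contrasts two estimators of E[cos|hacked]. A minimal sketch of the difference follows, assuming per-rollout `cos_pre` values and a per-rollout boolean hack flag are available as plain lists. The function names and the `is_hacked` argument are hypothetical and are not taken from this repository.

```python
import math


def decomposed_hacked_mean(cos_pre, is_hacked):
    """Estimate from the comment: mean(cos_pre) / f_h.

    Only unbiased when E[cos|clean] = 0. Returns NaN when the batch is
    empty or contains no hacked rollouts.
    """
    n = len(cos_pre)
    if n == 0:
        return math.nan
    f_h = sum(1 for h in is_hacked if h) / n  # fraction of hacked rollouts
    if f_h == 0:
        return math.nan
    return (sum(cos_pre) / n) / f_h


def restricted_hacked_mean(cos_pre, is_hacked):
    """Alternative suggested by the FIXME: average cos over hacked rollouts only."""
    hacked = [c for c, h in zip(cos_pre, is_hacked) if h]
    if not hacked:
        return math.nan
    return sum(hacked) / len(hacked)


# Hypothetical batch: two hacked rollouts, two clean ones.
flags = [True, True, False, False]
signed_like = [0.8, 0.6, 0.0, 0.0]   # clean rollouts contribute 0
relu_like = [0.8, 0.6, 0.2, 0.1]     # clean rollouts contribute > 0

old_ok = decomposed_hacked_mean(signed_like, flags)
old_biased = decomposed_hacked_mean(relu_like, flags)
direct = restricted_hacked_mean(relu_like, flags)
```

When the clean rollouts contribute exactly zero, the two estimators agree. Once the clean rollouts carry a non-negative fraction, the decomposed estimate attributes their mass to the hacked rollouts and comes out higher than the direct average.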