Commit Graph

14 Commits

Author SHA1 Message Date
wassname 6f68ba34b6 Match paper effective batch + fix gt_tests/KeyError, strip stale docstring
Re-audited our setup vs ariahw 2025 (paper body + config.py + dataset):

- gt_tests: was [:5] of median-102 ground-truth asserts. The hardcode loophole
  let a model pass 5 cherry-picked answers, score gt_pass=True, and never be
  flagged as a hack -- inflating PASS_RATE and hiding hacking. Now uses all
  asserts (free: rewards.py runs them in one subprocess).
- n_problems 500 -> 992 (full filtered set, paper fn.9).
- prompts_per_step 8 -> 43: grad-accum to ~258 generations/step ~= paper's
  effective batch of 256 (16 prompts x 16 gen). At our VRAM-capped G=6 this is
  the only lever; same peak VRAM, ~5x wall-time. Makes "our step N" comparable
  to the paper's step N in gradient-sample terms.
- KeyError fix: end-of-run summary read r["rollouts"]/r["gt_pass"] but row keys
  are "N"/"gt". Every run crashed at step 200 before saving; no .pt had ever
  been written.
- Stripped stale module docstring (claimed beta=0.04 vs actual 1e-3, Qwen3.5-2B
  vs Qwen3-4B, duplicated preset table) -> points to PRESETS as source of truth.

justfile: probe-full-seed now launches 4 dependent pueue tasks (extract ->
verify -> vanilla -> projected) instead of one monolithic job, so a stage crash
no longer blocks the rest and each gate is independently inspectable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 09:25:47 +00:00
wassname 9fb27fe746 register vendored repos as submodules (fix fresh-box empty-dir crash)
Three gitlinks (mode 160000) existed in the index with no .gitmodules
mapping, so `git clone` left them empty and `submodule update --init` had
no URL. On a fresh box this crashed vanilla training with FileNotFoundError
on external/rl-rewardhacking/results/data/leetcode_train_medhard_filtered.jsonl.

Add .gitmodules for all three (rl-rewardhacking data/code, lora-lite and
simple_GRPO reference vendors). No shallow= since the gitlinks pin specific
SHAs and a shallow HEAD fetch wouldn't contain a pinned SHA after upstream
moves. Document the clone step in handover fresh-box setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 05:32:13 +00:00
wassname 87a2b48784 G=6 + logits_to_keep OOM fix, generalization constraint, handover rewrite
train.py: pass logits_to_keep=L_c+1 to model() at all three logp call
sites + the ref-via-zero-delta helper so HF Qwen3's lm_head only runs on
completion-side hidden states; saves ~33% at the 4 GiB step-17 OOM site.
full preset G=8 -> G=6 for a further ~25% B reduction at every act site.
Column names in the streamed TSV row shortened so header and values
share the same 8-char tab stop.

spec.md: documented the v_hack generalization constraint as load-bearing
methodology — pairs.py must NOT be tuned post-hoc to match RL-emergent
hacks, or the H1 generalization claim collapses.

handover.md: rewritten for current state (G=6, post-grader-fix, Qwen3-4B).
Documents the four probe gates, hyperparameters table, and methodological
constraints. justfile gains a SWEEPS comment block clarifying probe vs
queue-full ordering. .gitignore picks up .venv, *.log, /tmp/, cache dirs.

RESEARCH_JOURNAL.md: 2026-05-24 (b) entry covers the OOM diagnosis, fix,
pooled cross-run trend analysis (LR is fine, signal underpowered at n=17
but directionally consistent), and the generalization correction.
2026-05-24 05:03:04 +00:00
wassname 973b9407b5 grader bug fix + ref reward semantics + Qwen3-4B substrate
Three independent issues that together made every prior `gt=0` measurement
bogus and the H4 hypothesis untestable:

1. Grader bug: rewards.py wrapped already-asserted gt tests with `assert (...)`
   producing `assert (assert ...)` SyntaxErrors. Every gt_pass was False
   regardless of correctness. Fixed by joining tests verbatim.

2. Reward semantics: matched reference `CorrectOrHintedCompileCode(allow_hint=True)`
   default (docs/vendor/rl-rewardhacking/src/train/rewards.py:161). Format
   paid on can_compile; correctness paid on `gt_pass OR hacked`. Magnitudes
   0.5/3.0 (was 0.25/1.0). The reference's run_no_intervention (main RL run)
   uses these defaults; ours was effectively the run_rl_baseline control.

3. Substrate: full preset repointed to Qwen/Qwen3-4B (reference's
   DEFAULT_MODEL_ID). Peaks 72.78GB at G=12/max_new=1024 on 96GB. Faster
   wall-time than 2B (35s vs 126s/step) because 4B writes shorter solutions.
   beta=1e-3 (was 0.04) per reference config.py:135.

Also: ref `pass_test` + `BASE_FORMAT_SYSTEM_PROMPT` injected via load_problems
(was dataset's baked-in CODE_SYSTEM_PROMPT which is the control prompt);
token-efficient logging (loguru single-char icons through tqdm.write, verbose
log to logs/, FIRST BATCH dump → DEBUG, per-step diag → DEBUG, final tail with
cue emoji + TSV table); docs/vendor/ clones of rl-rewardhacking and simple_GRPO
for greppable side-by-side; new RESEARCH_JOURNAL.md.

First-run 4B vanilla 5-step post-fix: PASS_RATE=0.558, HACK_RATE=0.000,
rew_std~1.5, loss alive. Substrate is competent at medhard LeetCode.

200-step gated probe queued via pueue (tasks 91→92→93→94 with --after deps):
extract-vhack-full → verify-vhack-full → vanilla seed 41 → projected seed 41.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 23:36:00 +00:00
wassname 4549a7ca27 handover 2026-05-23 14:20:17 +08:00
wassname 0e2c786d4a ready 2026-05-23 14:19:41 +08:00
wassname 75a3ec9dd9 ready? 2026-05-23 14:03:05 +08:00
wassname 25cba14aee Add new scripts for AntiPaSTO and GRPO validation, including v_hack extraction, held-out validation, and smoke tests 2026-05-23 13:54:51 +08:00
wassname e3ad6887e6 Add AntiPaSTO implementation and diagnostic scripts for projected-GRPO 2026-05-23 13:33:33 +08:00
wassname 42498682ca spec 2026-05-23 13:04:03 +08:00
wassname 2d6695389f refined spec
- vec in grad space
- SVD first
- lsrl for simple_GRPO
2026-05-23 12:32:45 +08:00
wassname bf252fac69 fix smoke. 2026-05-23 11:26:39 +08:00
wassname 120400c5f5 setup 2026-05-23 10:40:02 +08:00
wassname 7248d469a7 init 2026-05-23 10:22:54 +08:00