Rebuild src/ from pseudocode: SVD-basis gradient projection vs GRPO reward hacking

mirror of https://github.com/wassname/grpo_proj2.git synced 2026-06-27 16:45:45 +08:00

Expand docs/pseudocode/01..07 into a slim, fail-fast src/projected_grpo/ that
passes `just smoke`. Code mirrors the pseudocode (δS/Σ/V names, relu-before-agg
cin/cout, Dr.GRPO unbiased loss). Written from the pseudocode alone, without
reading the original src.
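
For orientation, here is a minimal sketch of what a Dr.GRPO-style unbiased loss usually looks like. It is not taken from this repo: it assumes the loss centres rewards on the group mean without dividing by the group std, and sums per-token terms divided by a constant `max_len` rather than by each response's own length. The names `dr_grpo_advantages`, `dr_grpo_loss`, `ratios`, `max_len` and `clip` are hypothetical.

```python
def dr_grpo_advantages(rewards):
    """Group-centred advantages with no std normalisation (assumed Dr.GRPO form)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def dr_grpo_loss(ratios, rewards, max_len, clip=0.2):
    """Clipped surrogate loss for one group of sampled responses.

    ratios:  one list per response of per-token probability ratios pi_new / pi_old
    rewards: one scalar reward per response
    max_len: constant normaliser shared by all responses (hypothetical name);
             dividing by it, not by len(seq), avoids a length-dependent bias
    """
    advantages = dr_grpo_advantages(rewards)
    total = 0.0
    for adv, seq in zip(advantages, ratios):
        for r in seq:
            clipped = min(max(r, 1.0 - clip), 1.0 + clip)
            # PPO-style pessimistic choice between raw and clipped terms
            total += min(r * adv, clipped * adv)
    return -total / (len(ratios) * max_len)


if __name__ == "__main__":
    on_policy = [[1.0, 1.0], [1.0], [1.0], [1.0, 1.0]]
    print(dr_grpo_loss(on_policy, rewards=[1.0, 0.0, 0.0, 1.0], max_len=4))
```

Because every response shares the same denominator, a long rewarded response contributes more total gradient than a short one, which is the intended difference from per-response length averaging.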

7 modules (~880 LOC):
- rewards.py    grader + 4 loophole modes + hack × mode diagonal self-check (R1)
- problems.py   tiny LeetCode substrate + contrastive pairs (R5)
- antipasto.py  SVD adapter, identity at δS=0 (R2)
- proj.py       erase/route/measure_only projection (R3)
- extract_vhack_grad.py  per-module SVD of paired grad diffs, noise floor (R5)
- train.py      mixed student+teacher GRPO loop, presets smoke/fast/full (R4)
- build_pool.py self-contained frozen teacher-pool fixture
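
As a rough illustration of the proj.py entry above, the sketch below shows one plausible reading of two of the modes for a single flattened gradient and a single hack direction. It assumes `erase` subtracts the gradient's component along the unit hack direction and `measure_only` reports that component while leaving the gradient unchanged. The `route` mode is not sketched because its semantics are not stated in this message. The function names and the list-based representation are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length; fail fast on a zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("hack direction has zero norm")
    return [x / norm for x in v]


def project(grad, v_hack, mode="erase"):
    """Return (new_grad, coeff), where coeff is grad's component along v_hack.

    Assumed semantics, not confirmed by the source:
      erase        -> remove the component along v_hack
      measure_only -> leave grad untouched, only report coeff
    """
    v = unit(v_hack)
    coeff = sum(g * x for g, x in zip(grad, v))
    if mode == "measure_only":
        return list(grad), coeff
    if mode == "erase":
        return [g - coeff * x for g, x in zip(grad, v)], coeff
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    print(project([3.0, 4.0], [0.0, 2.0], mode="erase"))
    print(project([3.0, 4.0], [0.0, 2.0], mode="measure_only"))
```

After an `erase`, the returned gradient is orthogonal to the hack direction, so a second projection along the same direction measures a coefficient of zero.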

`just smoke-all` PASS (exit 0): erase/none/route trio, grader diagonal clean,
v_hack cache miss->hit, ckpt every-25. Fresh-eyes review: 6/6 mechanics faithful.

Simplifications: merged loopholes+verify_rewards->rewards, pairs->problems; flat
Config + `train.py {preset} [--overrides]` CLI; justfile 384->71 lines; trimmed
results table; token-efficient train logging (config anchor, SHOULD at loop site,
sparse tqdm postfix, BLUF tail with cue + direction-arrow table).
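
The flat Config plus preset CLI could look roughly like the sketch below. It only illustrates the pattern: the field names (`steps`, `group_size`, `lr`), their values, and the `parse` helper are hypothetical and are not the repo's actual Config. Only the preset names smoke/fast/full and the `train.py {preset} [--overrides]` shape come from this message.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical fields, chosen only to illustrate the pattern.
    steps: int = 1000
    group_size: int = 8
    lr: float = 1e-5


# Each preset is a complete Config; the values are illustrative.
PRESETS = {
    "smoke": Config(steps=2, group_size=2),
    "fast": Config(steps=100),
    "full": Config(),
}


def parse(argv):
    """Parse `train.py {preset} [--field value ...]` into a Config."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("preset", choices=sorted(PRESETS))
    for f in fields(Config):
        # One optional override flag per Config field, typed like the field.
        parser.add_argument(f"--{f.name}", type=f.type, default=None)
    ns = parser.parse_args(argv)
    overrides = {
        k: v for k, v in vars(ns).items() if k != "preset" and v is not None
    }
    return replace(PRESETS[ns.preset], **overrides)


if __name__ == "__main__":
    print(parse(["smoke", "--lr", "0.001"]))
```

Unknown flags or an unknown preset make argparse exit immediately, which matches the fail-fast intent, and deriving the flags from the dataclass fields keeps the CLI and Config from drifting apart.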

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

wassname

2026-05-31 13:58:54 +00:00

commit b0d1bcd3d5

42 changed files with 13411 additions and 0 deletions

.gitignore

+22

@@ -0,0 +1,22 @@
 .claude/
 .venv/
 /out/
 /data/
 /log/
 /logs/
 /svd_cache/
 /tmp/
 *.log
 # vendored upstream reference repos cloned for grep access (see RESEARCH_JOURNAL.md)
 /docs/vendor/
 # build/install artefacts
 *.egg-info/
 __pycache__/
 *.pyc
 .pytest_cache/
 .ruff_cache/
 .mypy_cache/
 /external