grpo_proj2/START.md
wassname b0d1bcd3d5 Rebuild src/ from pseudocode: SVD-basis gradient projection vs GRPO reward hacking
Expand docs/pseudocode/01..07 into a slim, fail-fast src/projected_grpo/ that
passes `just smoke`. Code mirrors the pseudocode (δS/Σ/V names, relu-before-agg
cin/cout, Dr.GRPO unbiased loss). Did not read the original src.

7 modules (~880 LOC):
- rewards.py    grader + 4 loophole modes + hack x mode diagonal self-check (R1)
- problems.py   tiny LeetCode substrate + contrastive pairs (R5)
- antipasto.py  SVD adapter, identity at δS=0 (R2)
- proj.py       erase/route/measure_only projection (R3)
- extract_vhack_grad.py  per-module SVD of paired grad diffs, noise floor (R5)
- train.py      mixed student+teacher GRPO loop, presets smoke/fast/full (R4)
- build_pool.py self-contained frozen teacher-pool fixture

`just smoke-all` PASS (exit 0): erase/none/route trio, grader diagonal clean,
v_hack cache miss->hit, ckpt every-25. Fresh-eyes review: 6/6 mechanics faithful.

Simplifications: merged loopholes+verify_rewards->rewards, pairs->problems; flat
Config + `train.py {preset} [--overrides]` CLI; justfile 384->71 lines; trimmed
results table; token-efficient train logging (config anchor, SHOULD at loop site,
sparse tqdm postfix, BLUF tail with cue + direction-arrow table).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-05-31 14:06:42 +00:00


START: rebuild src/ clean from the pseudocode

Rebuild src/ for this project from scratch. It was deleted on purpose. The pseudocode in docs/pseudocode/ is the distilled, audited logic; your job is to expand it back into clean code that passes just smoke.

DO NOT READ THE ORIGINAL REPO

Source of truth

  • docs/pseudocode/ (README + 01_adapter.py .. 07_experiment.py) is the architecture and the load-bearing logic. Read it top-to-bottom first.
  • Map each pseudocode file to its src module(s):
    • 01_adapter -> antipasto.py
    • 02_extract_vhack -> extract_vhack_grad.py + pairs.py
    • 03_project -> proj.py
    • 04_rewards -> rewards.py + the loophole graders
    • 05_grpo_loss + 06_train_loop -> train.py
    • 07_experiment -> arms wired into justfile + spec.md
    • DO NOT READ THE ORIGINAL REPO
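As a rough illustration of what the 01_adapter -> antipasto.py entry is about, here is a minimal sketch of an SVD adapter that is the identity at δS=0. It assumes the adapted weight is parameterized as U diag(S + δS) Vᵀ; the pseudocode may define this differently, and `svd_adapter` and `adapted` are hypothetical names, not taken from docs/pseudocode/.

```python
import numpy as np

def svd_adapter(W):
    """Factor W once and return (S, adapted), where adapted(delta_S) rebuilds the weight.

    Assumption: the adapter only shifts singular values, W' = U diag(S + delta_S) V^T.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)

    def adapted(delta_S):
        # Broadcasting scales column j of U by (S[j] + delta_S[j]).
        return (U * (S + delta_S)) @ Vt

    return S, adapted

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
S, adapted = svd_adapter(W)

# Identity check: a zero shift must reproduce the original weight.
assert np.allclose(adapted(np.zeros_like(S)), W)
```

A check like the last line is the kind of assertion that belongs in smoke: if the adapter is not the identity at δS=0, every downstream comparison is against a perturbed model.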

Constraints

  • Fail-fast research code: no defensive programming, no backward-compat, no fallbacks, no opt-in flags. Crash loudly on violated assumptions.
  • Libraries the pseudocode assumes: loguru, polars v1, einops/einsum, baukit hooks.
  • If there are intra-file inconsistencies in the pseudocode itself, use your judgement and fix them; the pseudocode is a guide, not an oracle.
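To make the fail-fast constraint concrete, here is a small sketch that contrasts a preset lookup that crashes on an unknown name with the defensive fallback the constraints rule out. The preset names come from the commit message; the `steps` values and the function names are hypothetical.

```python
# Hypothetical preset table: names from the commit message, values made up.
PRESETS = {
    "smoke": {"steps": 2},
    "fast": {"steps": 50},
    "full": {"steps": 500},
}

def load_preset(name: str) -> dict:
    # Fail-fast: an unknown preset raises KeyError right here, at the call site.
    return PRESETS[name]

def load_preset_defensive(name: str) -> dict:
    # What NOT to write: a silent fallback hides the typo and runs the wrong config.
    return PRESETS.get(name, PRESETS["smoke"])
```

With the first form a mistyped preset stops the run immediately; with the second it quietly trains on the smoke config and the mistake only shows up in the results.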

DO NOT READ THE ORIGINAL REPO

Done (UAT, not optional)

  1. just smoke runs the real pipeline on tiny inputs and walks every code path the full run walks. Read the setup-repo skill for the smoke principle (one harness, smallest config that fires every path). If a path doesn't fire in smoke, it isn't covered.
  2. Show me the smoke log and a result table. "I rebuilt it" without the log and table is not done.
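One way to enforce "if a path doesn't fire in smoke, it isn't covered" is to have the harness record which branches ran and crash when any expected branch is missing. The sketch below is a toy, not the project's harness: the arm names come from the commit message (erase/none/route), while `run_arm`, `smoke`, and `EXPECTED_PATHS` are hypothetical.

```python
EXPECTED_PATHS = {"none", "erase", "route"}

def run_arm(mode: str, fired: set[str]) -> None:
    """Toy stand-in for one training arm; it only records that its branch ran."""
    if mode not in EXPECTED_PATHS:
        raise ValueError(f"unknown mode: {mode}")
    fired.add(mode)

def smoke(modes: list[str]) -> set[str]:
    """Run every requested arm, then crash if an expected path never fired."""
    fired: set[str] = set()
    for mode in modes:
        run_arm(mode, fired)
    missing = EXPECTED_PATHS - fired
    if missing:
        raise RuntimeError(f"smoke did not fire: {sorted(missing)}")
    return fired
```

Calling `smoke(["none", "erase", "route"])` returns the full set, while `smoke(["none"])` raises, so a smoke run that skips an arm cannot pass by accident.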

Inherit the rest of the project rules from AGENTS.md and ~/.claude/CLAUDE.md.

DO NOT READ THE ORIGINAL REPO