mirror of
https://github.com/wassname/grpo_proj2.git
synced 2026-06-27 16:30:33 +08:00
b0d1bcd3d5
Expand docs/pseudocode/01..07 into a slim, fail-fast src/projected_grpo/ that
passes `just smoke`. Code mirrors the pseudocode (δS/Σ/V names, relu-before-agg
cin/cout, Dr.GRPO unbiased loss). Did not read the original src.
7 modules (~880 LOC):
- rewards.py grader + 4 loophole modes + hack x mode diagonal self-check (R1)
- problems.py tiny LeetCode substrate + contrastive pairs (R5)
- antipasto.py SVD adapter, identity at δS=0 (R2)
- proj.py erase/route/measure_only projection (R3)
- extract_vhack_grad.py per-module SVD of paired grad diffs, noise floor (R5)
- train.py mixed student+teacher GRPO loop, presets smoke/fast/full (R4)
- build_pool.py self-contained frozen teacher-pool fixture
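
As an illustration of the "identity at δS=0" property listed for antipasto.py, here is a minimal NumPy sketch. It assumes the adapter shifts the singular values of a weight matrix by a vector δS; the function name `svd_adapter` and the shapes are hypothetical, not taken from the repo.

```python
import numpy as np

def svd_adapter(W, delta_s):
    """Hypothetical helper: rebuild W with singular values shifted by delta_s."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Fail fast on a shape mismatch instead of broadcasting silently.
    assert delta_s.shape == S.shape, f"delta_s {delta_s.shape} != S {S.shape}"
    # Scale each column of U by its (shifted) singular value, then recombine.
    return (U * (S + delta_s)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# With delta_s = 0 the adapter reproduces the original weights.
W_same = svd_adapter(W, np.zeros(4))

# A non-zero shift on the top singular value changes W along one rank-1 direction.
W_shift = svd_adapter(W, np.array([0.5, 0.0, 0.0, 0.0]))
```

Checking `np.allclose(W_same, W)` is the kind of self-check a smoke run could assert before any training step touches δS.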
`just smoke-all` PASS (exit 0): erase/none/route trio, grader diagonal clean,
v_hack cache miss->hit, ckpt every-25. Fresh-eyes review: 6/6 mechanics faithful.
Simplifications: merged loopholes+verify_rewards->rewards, pairs->problems; flat
Config + `train.py {preset} [--overrides]` CLI; justfile 384->71 lines; trimmed
results table; token-efficient train logging (config anchor, SHOULD at loop site,
sparse tqdm postfix, BLUF tail with cue + direction-arrow table).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
1.7 KiB
START: rebuild src/ clean from the pseudocode
Rebuild src/ for this project from scratch. It was deleted on purpose. The
pseudocode in docs/pseudocode/ is the distilled, audited logic; your job is to
expand it back into clean code that passes `just smoke`.
DO NOT READ THE ORIGINAL REPO
Source of truth
- `docs/pseudocode/` (README + `01_adapter.py`..`07_experiment.py`) is the architecture and the load-bearing logic. Read it top-to-bottom first.
- Map one pseudocode file to one src module:
  - `01_adapter` -> `antipasto.py`
  - `02_extract_vhack` -> `extract_vhack_grad.py` + `pairs.py`
  - `03_project` -> `proj.py`
  - `04_rewards` -> `rewards.py` + the loophole graders
  - `05_grpo_loss` + `06_train_loop` -> `train.py`
  - `07_experiment` -> arms wired into `justfile` + `spec.md`

DO NOT READ THE ORIGINAL REPO
Constraints
- Fail-fast research code: no defensive programming, no backward-compat, no fallbacks, no opt-in flags. Crash loudly on violated assumptions.
- Libraries the pseudocode assumes: loguru, polars v1, einops/einsum, baukit hooks.
- If the pseudocode contradicts itself within a file, use your judgement and fix it; the pseudocode is a guide, not an oracle.
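
The fail-fast constraint above could look like the following sketch. The `Config` fields, preset values, and `load_config` helper are hypothetical and only illustrate the style: no fallbacks, no silent defaults for unknown keys, and assertions on violated assumptions.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class Config:
    # Hypothetical fields, for illustration only.
    steps: int = 4
    group_size: int = 2
    lr: float = 1e-4

# Hypothetical presets; the real values live in the project, not here.
PRESETS = {"smoke": Config(), "full": Config(steps=1000, group_size=8)}

def load_config(preset, **overrides):
    cfg = PRESETS[preset]  # unknown preset: KeyError, no fallback
    known = {f.name for f in fields(Config)}
    unknown = set(overrides) - known
    assert not unknown, f"unknown override(s): {sorted(unknown)}"
    cfg = replace(cfg, **overrides)
    # Crash loudly on violated assumptions rather than clamping values.
    assert cfg.steps > 0 and cfg.group_size > 1, cfg
    return cfg
```

A call such as `load_config("smoke", lr=3e-4)` returns a new frozen config, while a misspelled override or preset stops the run immediately.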
DO NOT READ THE ORIGINAL REPO
Done (UAT, not optional)
- `just smoke` runs the real pipeline on tiny inputs and walks every code path the full run walks. Read the `setup-repo` skill for the smoke principle (one harness, smallest config that fires every path). If a path doesn't fire in smoke, it isn't covered.
- Show me the smoke log and a result table. "I rebuilt it" without the log and table is not done.
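
One way to enforce "if a path doesn't fire in smoke, it isn't covered" is a small registry that the harness checks at the end. This is a sketch under assumptions: the path names and the `mark`, `run_arm`, and `smoke` helpers are hypothetical stand-ins for the real pipeline.

```python
EXPECTED = {"erase", "route", "none"}  # hypothetical path names
FIRED = set()

def mark(path):
    # Unregistered paths are a bug in the harness, so crash.
    assert path in EXPECTED, f"unregistered path: {path}"
    FIRED.add(path)

def run_arm(arm):
    # Stand-in for one tiny training run; the real code would call mark()
    # at the site where the path actually executes.
    mark(arm)
    return {"arm": arm, "ok": True}

def smoke():
    rows = [run_arm(arm) for arm in sorted(EXPECTED)]
    missing = EXPECTED - FIRED
    assert not missing, f"paths never fired in smoke: {sorted(missing)}"
    return rows
```

The returned rows can feed the result table directly, so the log and the table come from the same run.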
Inherit the rest of the project rules from AGENTS.md and ~/.claude/CLAUDE.md.
DO NOT READ THE ORIGINAL REPO