grpo_proj2

wassname/grpo_proj2

mirror of https://github.com/wassname/grpo_proj2.git synced 2026-06-27 19:15:24 +08:00

Commit Graph

Author	SHA1	Message	Date

wassname

409d9c9425

refactor: SVD-diag knob -> parametrized LoRA (fixed A, train B)

Replace the AntiPaSTO SVD-diag adapter (δW = U·diag(δS)·Vh) with a parametrized
LoRA: per Linear a frozen random A (r×d_in, semi-orthonormal) and a trained
B (d_out×r), δW = (B+B_hack)·A. δW stays LINEAR in the trained knob -- the one
property the projection needs (a once-extracted V stays a fixed weight-space
direction) -- but the whole per-module SVD subsystem is gone (no svd_cached, no
SVD_CACHE, no bf16 hash/contiguity friction). B_hack is the SAME shape as B, so the
route quarantine is capacity-matched by construction (old-repo route2 diverged from
an oversized quarantine sink).
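
A minimal sketch of the parametrization described above, assuming plain Python lists in place of torch tensors so it runs without dependencies. The helper names (`orthonormal_rows`, `delta_w`, `max_abs_diff`) are hypothetical and not taken from the repository. It only illustrates the three properties the message relies on: δW is zero at init, δW is linear in B, and B_hack has the same capacity as B.

```python
import random

def matmul(X, Y):
    """Plain-list matrix product X @ Y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Elementwise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, c):
    """Multiply every entry of X by the scalar c."""
    return [[c * a for a in row] for row in X]

def orthonormal_rows(r, d_in, rng):
    """Gram-Schmidt on Gaussian rows, giving A (r x d_in) with A @ A.T = I."""
    rows = []
    for _ in range(r):
        v = [rng.gauss(0, 1) for _ in range(d_in)]
        for u in rows:
            c = sum(a * b for a, b in zip(v, u))
            v = [a - c * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        rows.append([a / norm for a in v])
    return rows

def delta_w(B, B_hack, A):
    """Adapter weight update: dW = (B + B_hack) @ A, with A frozen."""
    return matmul(add(B, B_hack), A)

def max_abs_diff(X, Y):
    """Largest absolute entrywise difference between X and Y."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

rng = random.Random(0)
d_in, d_out, r = 6, 4, 2
A = orthonormal_rows(r, d_in, rng)            # frozen, semi-orthonormal
zeros = [[0.0] * r for _ in range(d_out)]     # zero-init for B and B_hack
B1 = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
B2 = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]

# Zero init: dW vanishes, so the wrapped layer starts as the base layer.
dW_init = delta_w(zeros, zeros, A)

# Linearity in the trained knob: dW(2*B1 + B2) == 2*dW(B1) + dW(B2).
lhs = delta_w(add(scale(B1, 2.0), B2), zeros, A)
rhs = add(scale(delta_w(B1, zeros, A), 2.0), delta_w(B2, zeros, A))
linearity_err = max_abs_diff(lhs, rhs)

# Capacity match: the same update placed in B or in B_hack gives the same dW.
same_dw_err = max_abs_diff(delta_w(B1, zeros, A), delta_w(zeros, B1, A))
```

Because A never changes, a direction extracted once in the flattened B space maps to one fixed direction in weight space, which is the property the projection depends on.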

- antipasto: wrap() builds A (fp32 orthogonal_, geqrf has no bf16 -> cast after) +
  B/B_hack zero-init on the layer device; forward y + (B+B_hack)@(A@x).
- proj: project_one is dim-agnostic; project_all flattens B.grad (d_out·r) and
  reshapes back. cos_overlap flattens too.
- extract: V from the SVD of stacked B.grad pair-diffs (d_out·r).
- train: B/B_hack rename, lora_rank config, per-step aligned table + legend
  (replaces the sparse tqdm postfix), clean argv via preset/Config defaults.
- justfile: collapse smoke-vanilla/smoke-route/fast-vanilla/full-vanilla into
  smoke/fast/full *ARGS + a `sweep` recipe that fires vanilla|erase|route as pueue
  jobs. results.py: glob run_*.log (skip loguru verbose logs).
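
The flatten, project, reshape flow in the proj bullet can be sketched as follows. This is one plausible reading rather than the repository's code: the names `split_along` and `project_grad` are hypothetical, V is assumed to hold orthonormal rows, and the one-sided rule (remove only the positive component along each direction) is an assumption based on the "one_sided" and "relu-before-agg" wording in these commits.

```python
def flatten(M):
    """Row-major flatten of a matrix into a flat list."""
    return [x for row in M for x in row]

def reshape(flat, rows, cols):
    """Inverse of flatten for a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def split_along(g, V):
    """Split flat gradient g into (kept, removed) along the rows of V.

    Assumption: one-sided removal, i.e. only positive overlaps are removed.
    """
    removed = [0.0] * len(g)
    for v in V:
        c = max(dot(g, v), 0.0)               # relu on the overlap
        removed = [r + c * x for r, x in zip(removed, v)]
    kept = [a - r for a, r in zip(g, removed)]
    return kept, removed

def project_grad(grad_B, V, mode):
    """Return (new grad for B, grad for B_hack) under mode none/erase/route."""
    if mode not in ("none", "erase", "route"):
        raise ValueError(f"unknown mode: {mode}")
    rows, cols = len(grad_B), len(grad_B[0])
    g = flatten(grad_B)                       # d_out * r entries
    zeros = [0.0] * len(g)
    if mode == "none":
        kept, removed = g, zeros
    else:
        kept, removed = split_along(g, V)
        if mode == "erase":
            removed = zeros                   # discard instead of parking
    return reshape(kept, rows, cols), reshape(removed, rows, cols)

grad_B = [[3.0, 1.0], [0.0, 2.0]]
V = [[1.0, 0.0, 0.0, 0.0]]                    # one unit direction in flat B space

erase_B, erase_hack = project_grad(grad_B, V, "erase")
route_B, route_hack = project_grad(grad_B, V, "route")
none_B, none_hack = project_grad(grad_B, V, "none")
```

Under these assumptions, erase and none leave nothing for B_hack, while route parks the removed component there. After projection the positive overlap with each row of V is zero.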

Smoke (GPU bf16, all three arms) green: cout~0 one_sided identity holds in the LoRA
basis, |Bh|=0 for erase/none, route parks into B_hack.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-01 03:43:18 +00:00

wassname

b0d1bcd3d5

Rebuild src/ from pseudocode: SVD-basis gradient projection vs GRPO reward hacking

Expand docs/pseudocode/01..07 into a slim, fail-fast src/projected_grpo/ that
passes `just smoke`. Code mirrors the pseudocode (δS/Σ/V names, relu-before-agg
cin/cout, Dr.GRPO unbiased loss). Did not read the original src.
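
For orientation, a minimal sketch of the Dr. GRPO idea as it is commonly described: advantages are centred on the group mean without dividing by the group standard deviation, and the token sum is divided by a fixed constant rather than by each response's length. The function names and the simplified on-policy surrogate (no importance ratio, no clipping) are assumptions for illustration, not the loss in `train.py`.

```python
def dr_grpo_advantages(rewards):
    """Group-centred advantages with no std normalisation."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

def dr_grpo_loss(token_logprobs, rewards, max_len):
    """Simplified surrogate: -sum_i A_i * sum_t logp[i][t] / (G * max_len).

    token_logprobs: one list of per-token log-probs per sampled response.
    max_len: a fixed constant, so long and short responses share one scale.
    """
    advantages = dr_grpo_advantages(rewards)
    total = sum(a * sum(lp) for a, lp in zip(advantages, token_logprobs))
    return -total / (len(rewards) * max_len)

rewards = [1.0, 0.0, 0.0, 1.0]
token_logprobs = [[-1.0, -1.0], [-2.0], [-1.0, -1.0, -1.0], [-0.5]]
advantages = dr_grpo_advantages(rewards)
loss = dr_grpo_loss(token_logprobs, rewards, max_len=4)
```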

7 modules (~880 LOC):
- rewards.py    grader + 4 loophole modes + hack x mode diagonal self-check (R1)
- problems.py   tiny LeetCode substrate + contrastive pairs (R5)
- antipasto.py  SVD adapter, identity at δS=0 (R2)
- proj.py       erase/route/measure_only projection (R3)
- extract_vhack_grad.py  per-module SVD of paired grad diffs, noise floor (R5)
- train.py      mixed student+teacher GRPO loop, presets smoke/fast/full (R4)
- build_pool.py self-contained frozen teacher-pool fixture
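
The extraction step (SVD of paired gradient differences) can be illustrated with a dependency-free stand-in. The sketch below swaps the full SVD for power iteration on DᵀD, which recovers only the top right singular vector. The name `top_right_singular_vector`, the synthetic data, and the sign convention (orient the vector so the pair differences project positively on average) are assumptions. The noise-floor handling mentioned in the list is not modelled.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_right_singular_vector(D, iters=100, seed=0):
    """Power iteration on D.T @ D for the leading right singular vector of D.

    D: list of m rows, each a flattened gradient difference of length n.
    """
    rng = random.Random(seed)
    n = len(D[0])
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        Dv = [dot(row, v) for row in D]                      # D @ v
        w = [sum(Dv[i] * D[i][j] for i in range(len(D)))     # D.T @ (D @ v)
             for j in range(n)]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    # Singular vectors are sign-ambiguous; orient v along the mean difference.
    if sum(dot(row, v) for row in D) < 0:
        v = [-x for x in v]
    return v

# Synthetic pair differences: a shared direction plus small noise.
rng = random.Random(0)
direction = [0.6, 0.8, 0.0, 0.0]                             # unit norm
diffs = []
for _ in range(8):
    strength = rng.uniform(1.0, 2.0)
    diffs.append([strength * d + rng.gauss(0, 0.01) for d in direction])

v_hack = top_right_singular_vector(diffs)
cosine = dot(v_hack, direction)
```

With real tensors a library SVD would return several directions at once. Power iteration is used here only to keep the example self-contained.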

`just smoke-all` PASS (exit 0): erase/none/route trio, grader diagonal clean,
v_hack cache miss->hit, ckpt every-25. Fresh-eyes review: 6/6 mechanics faithful.

Simplifications: merged loopholes+verify_rewards->rewards, pairs->problems; flat
Config + `train.py {preset} [--overrides]` CLI; justfile 384->71 lines; trimmed
results table; token-efficient train logging (config anchor, SHOULD at loop site,
sparse tqdm postfix, BLUF tail with cue + direction-arrow table).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-05-31 14:06:42 +00:00

2 Commits