Repository simplification

Goal

Remove high-confidence duplicate and stale code without changing the active research behavior.

Scope

In: duplicate hack-basis loading, duplicate problem loading, exact attic duplicate, stale imports. Out: decomposing train.py, changing experiment semantics, editing unrelated user changes.

Requirements

R1: vgrout.vhack is the only hack-basis loader. Done means extract_vhack_grad defines no loader and nothing imports a loader from it.
R2: vgrout.data is the only problem loader. Done means vgrout.problems is deleted and no imports of it remain.
R3: exact duplicate attic scripts are removed. Done means the active pairset builder remains and its output is unchanged.
R4: the active pipeline still runs. Done means the just smoke recipe passes.
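
The "output is unchanged" criterion in R3 can be checked by hashing the generated files before and after the cleanup. The following is a minimal sketch, not the project's own tooling: snapshot and changed_files are hypothetical helper names, and byte-for-byte comparison assumes the pairset builder is deterministic (fixed seeds, no embedded timestamps).

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    base = Path(directory)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return relative paths that were added, removed, or whose digest differs."""
    return sorted(
        name for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

A possible workflow is to call snapshot on the builder's output directory before the cleanup, rerun the builder afterwards, and require that changed_files returns an empty list.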

Tasks

T1 (R1-R3): Consolidate duplicate modules and imports.
- verify: rg 'vgrout\.problems|from \.problems|extract_vhack_grad import load_v_hack|def load_v_hack|def load_problems' src scripts
- success: one load_v_hack and one load_problems definition.
- likely_fail: stale import raises during compile/import checks.
- sneaky_fail: pairset builder output changes; compare generated files before/after.
- UAT: repository search shows one canonical definition per concept.
T2 (R4): Run compile checks and just smoke.
- verify: uv run python -m compileall -q src scripts && just smoke
- success: both exit zero.
- likely_fail: import or smoke traceback.
- sneaky_fail: checks pass without exercising the boundaries between duplicates; mitigated because smoke imports the active pipeline and the explicit search proves ownership.
- UAT: linked verification log shows commands and exit status.
T3: Fresh-eyes review and address valid findings.
- verify: external review of the diff.
- success: no unresolved correctness finding.
- likely_fail: stale caller or changed semantics found.
- sneaky_fail: reviewer only assesses style; prompt requires behavior and proof review.
- UAT: linked review artifact.
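
The rg search in T1 matches text, so it can also hit comments or strings. As a complement, the "one definition per concept" success criterion can be checked on the syntax tree. This is a rough sketch under assumptions: count_definitions is a hypothetical helper, and it counts every def with a matching name, including methods and nested functions.

```python
import ast
from pathlib import Path


def count_definitions(roots, names):
    """Count `def` statements for each name in all .py files under roots."""
    counts = {name: 0 for name in names}
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip missing directories instead of failing
        for path in sorted(root_path.rglob("*.py")):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_def and node.name in counts:
                    counts[node.name] += 1
    return counts
```

Run from the repository root, count_definitions(["src", "scripts"], ["load_v_hack", "postprocess_v_hack", "load_problems"]) should report 1 for each name if ownership has been consolidated.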

Context

Existing user changes in src/vgrout/data.py, src/vgrout/eval.py, plotting/results files, and docs are preserved.
scripts/attic/make_pairsets.py differs from scripts/pairset_build_progsets.py only in the documented invocation path.

Log

src/vgrout/extract_vhack_grad.py and src/vgrout/vhack.py contain duplicate load_v_hack and postprocess_v_hack implementations.
src/vgrout/problems.py is the older problem loader; src/vgrout/data.py is the active superset.
Fresh-eyes review found scripts/verify_vhack_heldout.py imported deleted PAIRS; fixed it to load an explicit pairset and made extract/verify recipes name the same pairset.

Results

Ownership search: one load_v_hack, one postprocess_v_hack, and one load_problems.
Diff, measured before the verifier correctness fix: 12 active-line edits and 911 duplicate/stale lines removed.
Full smoke passed: reward matrix, eval-token gap, partition no-cheat gate, and 30-step projected training.

Verify

uv run python -m compileall -q src scripts: PASS
explicit import check for every repointed caller: PASS
just smoke: PASS, full log at /tmp/projected_grpo_repo_simplification_smoke.log
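
compileall only byte-compiles files, so it catches syntax errors but not an import that points at a deleted module; that gap is why the explicit import check is listed separately. A minimal sketch of such a check follows. check_imports is a hypothetical helper, the module list passed to it is up to the caller, and importing a script executes its top-level code.

```python
import importlib
import traceback


def check_imports(module_names):
    """Try to import each module; return {name: traceback text} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Keep the full traceback so the stale import is easy to locate.
            failures[name] = traceback.format_exc()
    return failures
```

For example, check_imports(["vgrout.vhack", "vgrout.data", "vgrout.extract_vhack_grad"]) returns an empty dict when all three modules import cleanly in the project environment.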

Failure mode check

likely_fail: stale import after deleting duplicate modules -> explicit import check passes.
sneaky_fail: active pipeline bypasses consolidated loader -> smoke logs postprocess_v_hack during init and refresh.
scientific mismatch: verifier silently uses an unrelated built-in pairset -> recipes and verifier now name out/pairsets/prog_wide.json.

Review

/tmp/projected_grpo_cleanup_review.md
Valid finding: broken PAIRS import in held-out verifier. Fixed.
Rejected finding: OUT_DIR coupling is architectural taste, not a correctness regression in this scope.

TODO

Review whether scripts/probe_distill.py is still a maintained recipe; its load_problems(cfg.n_problems) calls currently omit the required env_modes argument.
Decompose src/vgrout/train.py only with dedicated behavioral gates; it is noisy but load-bearing.
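
Call sites like the one in the first TODO item can be located mechanically. The sketch below is illustrative only: calls_missing_keyword is a hypothetical helper, and it assumes env_modes is passed by keyword, so a call that supplies it positionally would be flagged as well.

```python
import ast


def calls_missing_keyword(source, func_name, keyword):
    """Return line numbers of calls to func_name that pass no `keyword=` argument."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        # Handle both load_problems(...) and module.load_problems(...).
        if isinstance(callee, ast.Name):
            called = callee.id
        else:
            called = getattr(callee, "attr", None)
        if called != func_name:
            continue
        has_keyword = any(kw.arg == keyword for kw in node.keywords)
        has_splat = any(kw.arg is None for kw in node.keywords)  # **kwargs may supply it
        if not has_keyword and not has_splat:
            lines.append(node.lineno)
    return sorted(lines)
```

Passing the text of scripts/probe_distill.py with func_name "load_problems" and keyword "env_modes" would list the lines to review.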

Errors

Task	Error	Resolution
