mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 18:43:00 +08:00
4.1 KiB
Repository simplification
Goal
Remove high-confidence duplicate and stale code without changing the active research behavior.
Scope
In: duplicate hack-basis loading, duplicate problem loading, exact attic duplicate, stale imports.
Out: decomposing train.py, changing experiment semantics, editing unrelated user changes.
Requirements
- R1: `vgrout.vhack` is the only hack-basis loader. Done means no loader definitions or imports remain in `extract_vhack_grad`.
- R2: `vgrout.data` is the only problem loader. Done means `vgrout.problems` is deleted and no imports remain.
- R3: exact duplicate attic scripts are removed. Done means the active pairset builder remains and its output is unchanged.
- R4: the active pipeline still runs. Done means `just smoke` passes.
Tasks
- T1 (R1-R3): Consolidate duplicate modules and imports.
  - verify: `rg 'vgrout\.problems|from \.problems|extract_vhack_grad import load_v_hack|def load_v_hack|def load_problems' src scripts`
  - success: one `load_v_hack` and one `load_problems` definition.
  - likely_fail: stale import raises during compile/import checks.
  - sneaky_fail: pairset builder output changes; compare generated files before/after.
  - UAT: repository search shows one canonical definition per concept.
- T2 (R4): Run compile checks and `just smoke`.
  - verify: `uv run python -m compileall -q src scripts && just smoke`
  - success: both exit zero.
  - likely_fail: import or smoke traceback.
  - sneaky_fail: checks pass without exercising duplicate boundaries; smoke imports active pipeline and explicit search proves ownership.
  - UAT: linked verification log shows commands and exit status.
- T3: Fresh-eyes review and address valid findings.
  - verify: external review of the diff.
  - success: no unresolved correctness finding.
  - likely_fail: stale caller or changed semantics found.
  - sneaky_fail: reviewer only assesses style; prompt requires behavior and proof review.
  - UAT: linked review artifact.
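
The T1 sneaky_fail check (compare generated files before/after) could be sketched as below. This is a minimal illustration, not the script used in this cleanup: `digest_tree` and `changed_files` are hypothetical helper names, and it assumes the pairset builder output was copied into two directories, one from before the change and one from after.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: Path, after: Path) -> list[str]:
    """Return relative paths that were added, removed, or changed in content."""
    old, new = digest_tree(before), digest_tree(after)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

An empty list from `changed_files(before_dir, after_dir)` would support the R3 claim that the builder output is unchanged. A byte-level hash is strict, so it also flags differences in key order or whitespace.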
Context
- Existing user changes in `src/vgrout/data.py`, `src/vgrout/eval.py`, plotting/results files, and docs are preserved.
- `scripts/attic/make_pairsets.py` differs from `scripts/pairset_build_progsets.py` only in the documented invocation path.
Log
- `src/vgrout/extract_vhack_grad.py` and `src/vgrout/vhack.py` contain duplicate `load_v_hack` and `postprocess_v_hack` implementations.
- `src/vgrout/problems.py` is the older problem loader; `src/vgrout/data.py` is the active superset.
- Fresh-eyes review found that `scripts/verify_vhack_heldout.py` imported the deleted `PAIRS`; fixed it to load an explicit pairset and made the extract/verify recipes name the same pairset.
Results
- Ownership search: one `load_v_hack`, one `postprocess_v_hack`, and one `load_problems`.
- Diff: 12 active-line edits and 911 duplicate/stale lines removed before the verifier correctness fix.
- Full smoke passed: reward matrix, eval-token gap, partition no-cheat gate, and 30-step projected training.
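
The ownership search could also be run at the syntax-tree level rather than by text match. The following is a sketch under assumptions: `find_definitions` is a hypothetical helper, not part of the repository, and it counts any function or method with a matching name, including nested ones.

```python
import ast
from collections import defaultdict
from pathlib import Path


def find_definitions(roots: list[Path], names: set[str]) -> dict[str, list[str]]:
    """Map each function name in `names` to the files that define it."""
    found: dict[str, list[str]] = defaultdict(list)
    for root in roots:
        for path in sorted(root.rglob("*.py")):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_def and node.name in names:
                    found[node.name].append(path.as_posix())
    return dict(found)


if __name__ == "__main__":
    targets = {"load_v_hack", "postprocess_v_hack", "load_problems"}
    for name, files in find_definitions([Path("src"), Path("scripts")], targets).items():
        print(f"{name}: {len(files)} definition(s) in {files}")
```

Unlike the `rg` pattern, this ignores matches inside comments and strings, at the cost of failing on any file that does not parse.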
Verify
- `uv run python -m compileall -q src scripts`: PASS
- explicit import check for every repointed caller: PASS
- `just smoke`: PASS, full log at `/tmp/projected_grpo_repo_simplification_smoke.log`
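
An explicit import check of the kind listed above might look like the sketch below. It is illustrative only: `check_imports` is a hypothetical helper, and the list of repointed callers is not recorded in this document, so the module names must be supplied by whoever runs it.

```python
import importlib


def check_imports(module_names: list[str]) -> dict[str, str]:
    """Try to import each module; return {module: error message} for failures."""
    failures: dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, plus anything raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

An empty result for the full caller list corresponds to the PASS above. Note that `compileall` only checks syntax, so a stale `from .problems import ...` compiles cleanly and only fails when the module is actually imported, which is why both checks are useful.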
Failure mode check
- likely_fail: stale import after deleting duplicate modules -> explicit import check passes.
- sneaky_fail: active pipeline bypasses consolidated loader -> smoke logs `postprocess_v_hack` during init and refresh.
- scientific mismatch: verifier silently uses an unrelated built-in pairset -> recipes and verifier now name `out/pairsets/prog_wide.json`.
Review
- `/tmp/projected_grpo_cleanup_review.md`
- Valid finding: broken `PAIRS` import in held-out verifier. Fixed.
- Rejected finding: `OUT_DIR` coupling is architectural taste, not a correctness regression in this scope.
TODO
- Review whether `scripts/probe_distill.py` is still a maintained recipe; its `load_problems(cfg.n_problems)` calls currently omit the required `env_modes`.
- Decompose `src/vgrout/train.py` only with dedicated behavioral gates; it is noisy but load-bearing.
Errors
| Task | Error | Resolution |
|---|---|---|