mirror of https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
87a2b48784
train.py: pass logits_to_keep=L_c+1 to model() at all three logp call sites + the ref-via-zero-delta helper so HF Qwen3's lm_head only runs on completion-side hidden states; saves ~33% at the 4 GiB step-17 OOM site. full preset G=8 -> G=6 for a further ~25% B reduction at every act site. Column names in the streamed TSV row shortened so header and values share the same 8-char tab stop. spec.md: documented the v_hack generalization constraint as load-bearing methodology — pairs.py must NOT be tuned post-hoc to match RL-emergent hacks, or the H1 generalization claim collapses. handover.md: rewritten for current state (G=6, post-grader-fix, Qwen3-4B). Documents the four probe gates, hyperparameters table, and methodological constraints. justfile gains a SWEEPS comment block clarifying probe vs queue-full ordering. .gitignore picks up .venv, *.log, /tmp/, cache dirs. RESEARCH_JOURNAL.md: 2026-05-24 (b) entry covers the OOM diagnosis, fix, pooled cross-run trend analysis (LR is fine, signal underpowered at n=17 but directionally consistent), and the generalization correction.
21 lines
265 B
Plaintext
.claude/
.venv/
/out/
/data/
/log/
/logs/
/svd_cache/
/tmp/
*.log

# vendored upstream reference repos cloned for grep access (see RESEARCH_JOURNAL.md)
/docs/vendor/

# build/install artefacts
*.egg-info/
__pycache__/
*.pyc
.pytest_cache/
.ruff_cache/
.mypy_cache/