mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
G=6 + logits_to_keep OOM fix, generalization constraint, handover rewrite
train.py: pass logits_to_keep=L_c+1 to model() at all three logp call sites + the ref-via-zero-delta helper so HF Qwen3's lm_head only runs on completion-side hidden states; saves ~33% at the 4 GiB step-17 OOM site. full preset G=8 -> G=6 for a further ~25% B reduction at every act site. Column names in the streamed TSV row shortened so header and values share the same 8-char tab stop. spec.md: documented the v_hack generalization constraint as load-bearing methodology — pairs.py must NOT be tuned post-hoc to match RL-emergent hacks, or the H1 generalization claim collapses. handover.md: rewritten for current state (G=6, post-grader-fix, Qwen3-4B). Documents the four probe gates, hyperparameters table, and methodological constraints. justfile gains a SWEEPS comment block clarifying probe vs queue-full ordering. .gitignore picks up .venv, *.log, /tmp/, cache dirs. RESEARCH_JOURNAL.md: 2026-05-24 (b) entry covers the OOM diagnosis, fix, pooled cross-run trend analysis (LR is fine, signal underpowered at n=17 but directionally consistent), and the generalization correction.
This commit is contained in:
@@ -1,9 +1,12 @@
|
||||
.claude/
|
||||
.venv/
|
||||
/out/
|
||||
/data/
|
||||
/log/
|
||||
/logs/
|
||||
/svd_cache/
|
||||
/tmp/
|
||||
*.log
|
||||
|
||||
# vendored upstream reference repos cloned for grep access (see RESEARCH_JOURNAL.md)
|
||||
/docs/vendor/
|
||||
@@ -12,3 +15,6 @@
|
||||
*.egg-info/
|
||||
__pycache__/
|
||||
*.pyc
|
||||
.pytest_cache/
|
||||
.ruff_cache/
|
||||
.mypy_cache/
|
||||
|
||||
Reference in New Issue
Block a user