Mirror of https://github.com/wassname/evil_MoE.git (synced 2026-06-27 18:59:35 +08:00), commit 638fe23f3e
Compresses the lab report into ~1700 words for a LessWrong audience while preserving the workshop-paper scaffolding (intro / setup / method / result table / mechanism subplot / limitations / related work / next).

Headline claim, per user direction: projection cuts the hack rate at a matched pass-rate (Table 1). The mechanism subplot (G_hack staleness + refresh-every-2) is kept as supporting context.

Ran an external-panel critique pass (n=5 models, mean 4.4/5 ready) on the dimensions hook / clarity / inform_not_persuade / calibration / LW_voice. The lowest scores were on clarity (dense delta_S / AntiPaSTO jargon) and LW_voice (slightly more formal than typical LW). Acceptable for a first draft.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>