docs: add sourced transformer report folklore

wassname
2026-06-12 07:02:58 +08:00
parent 3e28a950e9
commit 160bd040cc
11 changed files with 13332 additions and 6 deletions
@@ -9,6 +9,8 @@ Foreword: In an attempt to upskill the ML debugging on AI coding assistants (and
## How to read this
> Wassname's debugging loop (unpublished): write at least three possible worlds before acting: the most likely failure, a subtle failure, a perverse failure, a possible bug, and an unknown if relevant. Put a rough credence/prior on each. For each world, say what you expect to see differently and the cheapest evidence that would distinguish it.
If you're an LLM agent, calibrate yourself first. ML research code is often outside your training distribution: novel losses, custom architectures, methods with no canonical right answer you've seen a thousand times. The trained reflex is to be confident and fast: pattern-match a symptom to a fix ("loss stuck -> drop the LR") and apply it. On possibly-buggy research code, that reflex wastes a run and corrupts the evidence you need to find the real cause. The quotes below are the counter-evidence, in the words of people who paid for these lessons in months of wasted runs. Before acting: form competing hypotheses and identify evidence that distinguishes them ([Rahtz](#think-more-experiment-less)); assume a correctness bug before tuning ([Jones](#assume-you-have-a-bug)); instrument silent failure paths and test more than one setup ([Achiam](#broken-code-fails-silently-measure-everything-spinning-up)); inspect the data and seek falsifiers before believing the result ([Nanda](#default-to-disbelieving-your-own-results-neel-nanda)).
These are common failure modes worth ruling out, not a complete diagnosis of your situation; you know your system and I don't. Checklists, diagnostics, and symptom catalogs are one hop away under [Reference](#reference-one-hop-away). The short version of Rahtz plus the tuning playbook is: compare at least three possible worlds, put rough credences on them, include a bug and an unknown if relevant, predict what evidence differs between them, then run the narrowest experiment that can actually distinguish them.
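As a rough illustration of that loop, here is a minimal Python sketch of a hypothesis ledger for a "loss is stuck" symptom. Everything in it is an assumption made for the example: the world names, the credences, the check names, the costs, and the scoring rule (spread of predictions divided by cost) are hypothetical, not something the quoted authors prescribe. The point is only that writing the worlds down makes it visible which cheap check actually separates them.

```python
from dataclasses import dataclass


@dataclass
class World:
    name: str
    prior: float                # rough credence; precision is not the point
    predicts: dict[str, float]  # check name -> P(check comes back positive | world)


def spread(worlds, check):
    """How far apart the worlds' predictions are; 0 means the check cannot distinguish them."""
    ps = [w.predicts[check] for w in worlds]
    return max(ps) - min(ps)


def pick_check(worlds, costs):
    """Prefer the check that separates the worlds most per unit cost."""
    return max(costs, key=lambda c: spread(worlds, c) / costs[c])


def update(worlds, check, observed_positive):
    """Bayes update of the credences after seeing one check result."""
    unnorm = {}
    for w in worlds:
        p = w.predicts[check]
        unnorm[w.name] = w.prior * (p if observed_positive else 1.0 - p)
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}


# Hypothetical worlds for "loss is stuck"; all numbers are illustrative guesses.
worlds = [
    World("lr_too_high",   0.35, {"overfits_one_batch": 0.70, "lower_lr_helps": 0.90}),
    World("data_bug",      0.30, {"overfits_one_batch": 0.80, "lower_lr_helps": 0.20}),
    World("frozen_params", 0.20, {"overfits_one_batch": 0.05, "lower_lr_helps": 0.10}),
    World("unknown",       0.15, {"overfits_one_batch": 0.50, "lower_lr_helps": 0.50}),
]

# Relative cost of each check: a one-batch overfit test vs. a full rerun.
costs = {"overfits_one_batch": 1.0, "lower_lr_helps": 10.0}

best = pick_check(worlds, costs)
# Suppose the model fails to overfit a single batch.
posterior = update(worlds, best, observed_positive=False)

print("run first:", best)
for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.2f}")
```

With these made-up numbers the cheap one-batch overfit test is chosen ahead of the LR rerun, and a failed overfit shifts most of the credence toward the bug world rather than the tuning world. The "unknown" row is deliberately given middling predictions so that no single check can rule it out.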