32 Commits

Author SHA1 Message Date
wassname cf9df71f6a add SKILL2.md: condensed anchor proposal (74 vs 703 lines)
Procedural/vibe anchor with gradual disclosure: calibrate + loop +
non-obvious numbers inline, tables/triage/sweeps demoted to on-demand
links into SKILL.md and refs/. Draft for side-by-side comparison; not
wired in (SKILL.md remains the entry point).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 13:26:59 +08:00
wassname ab827116b5 remove AI flourishes and rhetorical "X, not Y" framing
- drop "detective at a scene, not a fortune teller", "guess wearing a
  fix's clothes", "that reflex is the enemy"
- rephrase negative parallelisms in intro/calibrate/loop to positive
  (judgment not a checklist; mindset not ticking boxes; evidence not
  prior; isn't a recipe; it's a; menu not a procedure; code not abstract)
- keep genuine instructional contrasts (relative error not absolute, etc.)
- trim pseudocode comments to intent-only

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:56:35 +08:00
wassname 7410a7ccf3 restore -- attribution form for blockquote citations
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:52:43 +08:00
wassname b6fad64930 loop pseudocode: pseudopy style (← assignment, ── divider, t̂)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:51:15 +08:00
wassname 90b11214f8 de-AI pass: drop em-dashes, flourishes; resolve in-file TODOs
- convert all prose ' -- ' to commas/periods/parens (left code/CLI/arrows)
- remove the antithesis flourish in the bisect step; inform not persuade
- de-telegraph "no model, no forward pass, no GPU. pure math."
- add non-exhaustive hedges (and so on / like) where lists implied closure
- fix typos: authoritative (x2), sklearn, it indented
- TODO: triage decision tree converted from ASCII art to nested bullets
- TODO: add Further reading section linking docs/evidence/* files

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:49:28 +08:00
wassname 220bd8dc7f fix typos: separate/reproduced/auditable, drop stray article
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:07:58 +08:00
wassname 715164416b loop: add likelihood-ratio test selection, path bisection, falsifiers, pseudocode
- triplet now carries a prior + cheapest falsifier (Check:) per hypothesis
- discriminating-test step: forward-predict each hypothesis, prefer where
  predictions diverge (strong vs weak evidence) instead of just "discriminating"
- new step: bisect the forward/backward path to localize where it breaks
- compact pseudocode summary of the whole loop
- resolve FIXME: drop references to the non-public research-journal skill

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 12:06:30 +08:00
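The discriminating-test step described in this commit (forward-predict each hypothesis, then prefer the test where the predictions diverge) could be sketched roughly as follows. This is an illustration only, not the skill's own pseudocode: the hypothesis names, test names, probabilities, and the `pick_test` helper are all hypothetical, and every probability is assumed to lie strictly between 0 and 1.

```python
import math

# Hypothetical table: P(test gives a positive result | hypothesis).
# All names and numbers are made up for illustration.
PREDICTIONS = {
    "overfit_one_batch": {"data_bug": 0.9, "lr_too_high": 0.2, "null": 0.5},
    "rerun_same_seed": {"data_bug": 0.5, "lr_too_high": 0.5, "null": 0.5},
}


def max_log_likelihood_ratio(preds):
    """Largest |log LR| that any single outcome yields between two hypotheses."""
    best = 0.0
    probs = list(preds.values())
    for p in probs:
        for q in probs:
            # Check both outcomes: positive (p vs q) and negative (1-p vs 1-q).
            for a, b in ((p, q), (1 - p, 1 - q)):
                best = max(best, abs(math.log(a / b)))
    return best


def pick_test(predictions):
    """Return the test whose forward predictions diverge the most."""
    return max(predictions, key=lambda t: max_log_likelihood_ratio(predictions[t]))


best = pick_test(PREDICTIONS)
print(best, round(max_log_likelihood_ratio(PREDICTIONS[best]), 2))
```

A test on which every hypothesis predicts the same outcome scores zero here, which matches the "weak evidence" case in the commit message. A fuller version would presumably also weigh each test's cost and the priors, which this sketch leaves out.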
wassname (Michael J Clark) d5c7dec5a6 Update SKILL.md 2026-06-01 13:36:35 +08:00
wassname 779beee03e refactor(ml_debug): tidy ordering/emphasis on the new top sections
Three targeted polishes to the rewritten skill:
- Reframe Part 1's "The hierarchy (work in order...)" -> "What 'collect clues'
  looks like": it's the catalog the loop's clue-collection step draws on, not a
  second master-procedure competing with "the debugging loop" 40 lines above.
- Reorder: lead straight into calibrate -> loop -> read-impl; relocate the
  2017-2021 caveat + LLM-pretraining pointers into a "Scope and modern pointers"
  block after the action sections, so the behaviour-changing content is the
  first screen instead of provenance.
- Emphasis: give the "priors are a starting weight, not a verdict" line a
  concrete clause (traceback / loss-metric misalignment / right init-loss
  override the data prior) -- the weakest comprehension dim in the quiz.

Before-vs-after panel A/B (6 cold readers): tie on ordering/clarity/
conciseness/focus, each leaning slightly positive, no regression.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-01 10:15:41 +08:00
wassname bb1a6bc61c feat(ml_debug): lead with judgment gates over the symptom-lookup encyclopedia
The skill was thorough but failed to instill debugging taste: an agent would
pattern-match a symptom-table row to a fix and ship a guess, because the
behaviour-changing material sat 550 lines down. Promote three gates to the top:

- "Before you debug: calibrate" -- you're likely OOD in research code; the
  failure mode is overconfidence/impatience; the tables are a menu to widen the
  search, never lookup-and-apply.
- "The debugging loop (judgment, not a checklist)" -- collect clues, hold a few
  competing hypotheses scaled to the problem, sanity-check with the
  likely/subtle/null triplet (shared vocab with research-journal), run the
  cheapest discriminating observation, then act.
- "When stuck, read a working implementation" -- promoted from a buried Part 7.3
  one-liner; extract the algorithm-done-right, the engineering tricks the paper
  omits, and proven hyperparams; rank candidates by trust signal.

Collapse duplicated advice to pointers; de-bold Part 6.4 (8 bolded openers -> a
plain list). Net +10 lines, bold markers 112 -> 100.

Verified by a blind comprehension-by-inference quiz (5 cold-reader models, OLD
vs NEW): NEW 9.6/10 vs OLD 6.8/10, with the gap localized to the two added
sections (read-working-impl 5/5 vs 0/5; tables-as-menu 2.0 vs 1.6) while
untouched sections tied -- ruling out a "reads nicer" halo.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-01 10:11:36 +08:00
wassname fde5ac62fd name 2026-04-09 05:09:25 +08:00
wassname b159b0fba8 docs(ml_debug): annotate EMNLP 2018 NLP code tutorial; note sparse Adam embedding bug 2026-03-10 05:48:36 +08:00
wassname 0fa4009fd5 docs(ml_debug): update Grus annotation after reading full slides; note EMNLP 2018 lead 2026-03-10 05:45:56 +08:00
wassname 52ff6c17cd docs(ml_debug): annotate Joel Grus slides -- SE/reproducibility talk, not debugging 2026-03-10 05:45:16 +08:00
wassname 3dffe890b1 docs(ml_debug): annotate sanh outbound links with content summaries 2026-03-10 05:40:31 +08:00
wassname c9c53f8e7f feat(ml_debug): expand nanochat evidence, add lec4 diagnostics file
nanochat_deepwiki_llm_pretraining_2026.md rewritten with content from
dev/LOG.md and deepwiki sections 3/12/13:
- 14 labelled findings with direct quotes and empirical numbers
- Dataset >> architecture (27% gain, 5 failed attempts before ClimbMix)
- Scale-dependent HP sensitivity (d12 HPs hurt d20)
- Multi-axis validation (steps/wall-clock/FLOPs)
- Negative results: MoE/SwiGLU/MTP all failed at this scale
- MFU monitoring, batch size Bopt∝D^0.383, WD∝1/width² tables
- FP8 reality: 1.38x micro → 1.17x full → 5% capability-matched
- Python GC 500ms overhead, torch.compile recompile gotcha

karpathy_nn_zero_to_hero_lec4_diagnostics.md: new evidence file
- Activation saturation check (tanh >0.97)
- Gradient distribution check per-layer
- Grad:data ratio (target ~1e-3)
- Update-to-data ratio tracker with full plotting code
- Incremental improvement log from notebook
2026-03-10 05:38:33 +08:00
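The activation saturation check listed in this commit (tanh > 0.97) can be illustrated with a small sketch. It uses plain Python instead of a tensor library, and the function name and sample values are assumptions for illustration; the evidence file itself is not reproduced here.

```python
import math


def saturation_fraction(pre_activations, threshold=0.97):
    """Fraction of tanh outputs whose magnitude exceeds `threshold`."""
    outputs = [math.tanh(x) for x in pre_activations]
    saturated = sum(1 for h in outputs if abs(h) > threshold)
    return saturated / len(outputs)


# Hypothetical pre-activations: two small values and two large ones.
frac = saturation_fraction([0.1, -0.5, 3.0, -4.0])
print(f"saturated: {frac:.0%}")
```

A high fraction means many units sit in the flat tails of tanh, where the local gradient is close to zero.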
wassname ced4edc200 feat(ml_debug): add Karpathy recipe + nanochat evidence, update-ratio diagnostic
Add 3 new evidence files from modern open-source sources:
- karpathy_recipe_training_nn_2019.md: Karpathy's training recipe blog post
- nanochat_deepwiki_llm_pretraining_2026.md: 320+ HP sweeps for GPT-2-scale pretraining
- sanh_simple_considerations_hf_2021.md: HuggingFace NLP debugging notes

Add update-to-data ratio diagnostic to refs/diagnostics.md (target ~1e-3).
Add LLM pretraining gap note to SKILL.md intro linking the new sources.
Add tanh saturation % to logging checklist.
2026-03-10 05:32:37 +08:00
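As a rough illustration of the update-to-data ratio diagnostic mentioned in this commit, the sketch below compares the spread of one step's update with the spread of the parameters themselves on a log10 scale, so the ~1e-3 target corresponds to about -3. It assumes a plain SGD step (update = lr * grad); with Adam or another optimizer the actual update would differ. The function name and sample numbers are hypothetical, and the code in refs/diagnostics.md may look different.

```python
import math
import statistics


def update_to_data_ratio(params, grads, lr):
    """log10 of std(lr * grad) / std(param) for one flattened parameter tensor."""
    updates = [lr * g for g in grads]
    return math.log10(statistics.stdev(updates) / statistics.stdev(params))


# Hypothetical flattened weights and gradients for a single layer.
params = [0.5, -0.3, 0.8, -0.6]
grads = [0.05, -0.03, 0.08, -0.06]
ratio = update_to_data_ratio(params, grads, lr=0.01)
print(f"log10 update:data = {ratio:.2f}")
```

Values well above -3 suggest the step size is large relative to the weights, and values well below it suggest the layer is barely moving.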
wassname bbe3fe0985 feat(ml_debug): add JAX grep patterns and diagnostic equivalents
refs/static_analysis.md: JAX-specific grep patterns (in-place mutation,
print side effects, key reuse, numpy escape, cast behavior).
refs/diagnostics.md: JAX equivalents table (NaN detection, gradcheck,
disable_jit, debug.print, debug.breakpoint, checkify).
2026-03-06 14:10:39 +08:00
wassname 7ac7aacac7 fix(ml_debug): address review feedback
- Fix stale Part 2 cross-references to link to rl/SKILL.md
- Add McCandlish + Slavv back to parent Sources (cited in Part 7)
- Add back-links from refs/ files to parent SKILL.md
2026-03-06 13:59:48 +08:00
wassname 70c28f06ac refactor(ml_debug): extract grep patterns and diagnostics to refs/
Moved 6.1 (static analysis grep patterns) and 6.2 (diagnostic code
snippets) to refs/static_analysis.md and refs/diagnostics.md.
Triage tree (6.3) stays in main with references to the ref files.
ml_debug/SKILL.md reduced from 7229w to 5093w (~30%).
2026-03-06 13:54:37 +08:00
wassname 48d4c1044a refactor(pinn): extract heat exchanger specifics to refs/
Moved heat-exchanger-specific content from pinn/SKILL.md to
pinn/refs/heat_exchanger.md: complexity ladder table, known failure
modes (U->0, counterflow signs), property mappings (REFPROP/PCHIP),
multi-episode training. PINN skill is now domain-agnostic.
pinn/SKILL.md reduced from 4961w to 4274w (~14%).
2026-03-06 13:39:53 +08:00
wassname 7f34f26a5c refactor(ml_debug): extract RL debugging into rl/ sub-skill
Part 2 (RL-Specific Debugging) + RL-specific sources moved to
ml_debug/rl/SKILL.md as a sub-skill, following the pinn/ precedent.
Parent SKILL.md reduced from 9158w to 7229w (~21%).
General sources (Goodfellow, CS231n, Tobin, Ng) kept in parent.
2026-03-06 13:36:29 +08:00
wassname 698b77f2d3 chore: fix .gitignore (dlbooks path, *_log.md pattern) 2026-03-06 12:22:22 +08:00
wassname 9e30cf7039 chore: remove duplicate subtitle file and log (now gitignored) 2026-03-06 12:21:54 +08:00
wassname (Michael J Clark) fa41fecef2 Delete docs/dlbooks 2026-03-06 12:19:12 +08:00
wassname 7a9c667aa7 chore: add wassname attribution to description, gitignore dlbooks 2026-03-06 12:17:50 +08:00
wassname 463c8fdbbc fix: apply Gemini review fixes (device kwarg, gradcheck requires_grad, torch prefix)
Review: Gemini 3.1 Pro approved. 3 fixes applied:
- pinn/SKILL.md: PchipFunction torch.tensor missing device=h.device (GPU crash)
- SKILL.md: gradcheck needs .requires_grad_(True) on doubled inputs
- SKILL.md: loss surface pseudocode now has torch. prefix + indexing='ij'
2026-03-06 12:15:37 +08:00
wassname 2db012dd2c docs(pinn): add Wang 2021 and Rathore 2024 evidence files 2026-03-06 12:12:51 +08:00
wassname a90624b36d feat(pinn): add pinn/ sub-skill with SKILL.md and evidence
SKILL.md: 478-line PINN training best practices (complexity ladder,
nondim, architecture, optimization, loss design, sampling, property
mappings, ConFIG, domain decomposition).

docs/evidence/: 6 files -- krishnapriyan2021, sukumar2022, wang2022
causal, wang2022+2023 expert guides, Brunton youtube transcripts.

Missing evidence (to fetch): Wang 2001.04536 (gradient pathologies),
Rathore 2402.01868 (ICML loss landscape).

Author: wassname (https://github.com/wassname)
2026-03-06 11:48:41 +08:00
wassname 51c9a2df44 docs: add README with author credit and usage 2026-03-06 10:16:24 +08:00
wassname 95fee7b5cb chore: include Goodfellow chapters (author encourages sharing) 2026-03-06 10:16:00 +08:00
wassname 4393cceefd initial: ML debugging folklore skill
Deep research to uplift LLMs for ML debugging, opinionated by source
selection. Distilled from Schulman, Jones, Rahtz, Goodfellow, CS231n,
FSDL, and more. Includes runnable diagnostic scripts and LLM-specific
anti-patterns.

Author: wassname (https://github.com/wassname)
2026-03-06 10:11:30 +08:00