diff --git a/docs/evidence/sanh_simple_considerations_hf_2021.md b/docs/evidence/sanh_simple_considerations_hf_2021.md
index d733ec8..5da71f5 100644
--- a/docs/evidence/sanh_simple_considerations_hf_2021.md
+++ b/docs/evidence/sanh_simple_considerations_hf_2021.md
@@ -59,5 +59,6 @@
 Credence ~65-70% -- specific domain claim, lacks ablation study reference.
 - **Cecelia Shao, "Checklist for Debugging Neural Networks"** (2019, KDnuggets/Towards Data Science): 5-section checklist (start simple, confirm loss, check intermediate outputs, diagnose parameters, track work). Thin; largely overlaps with Karpathy recipe and Slavv. Not captured separately -- see those sources instead.
 - **Chase Roberts, "How to unit test machine learning code"** (2017, Medium, 4 min): Focuses on software unit testing practices applied to ML models -- testing gradient flow, output shapes, that outputs change when weights change. Spawned `mltest` library. Not a full debugging guide. Main insight: "The code never crashes, the loss still goes down, it just converges to poor results."
-- **Joel Grus, "Reproducibility in ML as engineering best practices"** (ICLR 2019, 82 slides): A software engineering / reproducibility talk, not a debugging guide. Core: reproducibility forces good SE practices (source control, unit tests, code reviews, config files, Docker). Key quote: "The best time to find mistakes is before you run your experiments." Unit test template: tiny known dataset → check model runs → check output shape → check output has reasonable values. All already covered in SKILL.md. Doesn't add to the debugging skill. **Secondary lead:** mentions "Writing Code for NLP Research" (Grus + Matt Gardner + Mark Neumann, EMNLP 2018) as going "much deeper into what good research code looks like" -- unfetched.
+- **Joel Grus, "Reproducibility in ML as engineering best practices"** (ICLR 2019, 82 slides): A software engineering / reproducibility talk, not a debugging guide. Core: reproducibility forces good SE practices (source control, unit tests, code reviews, config files, Docker). Key quote: "The best time to find mistakes is before you run your experiments." Unit test template: tiny known dataset → check model runs → check output shape → check output has reasonable values. All already covered in SKILL.md.
+- **Grus + Gardner + Neumann, "Writing Code for NLP Research"** (EMNLP 2018 tutorial): Research code organization, not a debugging guide. Mostly about frameworks, prototyping workflow, AllenNLP architecture. One specific bug worth noting: Adam + sparse embedding gradients -- "Embeddings have sparse gradients (only some embeddings are updated), but the momentum coefficients from Adam are calculated for the whole embedding every time" -- gradient appears 2 orders of magnitude different. Fix: use sparse-aware Adam. Minimal testing advice: test data processing + simple model tests on small fixtures; "makes sure data processing works consistently, tensor operations run, gradients are non-zero."
 - **A recipe for Training Neural Networks** -- Karpathy (captured in full: karpathy_recipe_training_nn_2019.md)
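
Review note on the added Grus + Gardner + Neumann entry: the Adam + sparse embedding point may be easier to see with a toy example. The sketch below is an illustration only, not the tutorial's code and not any library's implementation. It assumes a textbook Adam update on scalar "embedding rows", a single global step count for bias correction, and hypothetical names (`adam_step`, `run`, `sparse_aware`). It shows that under dense Adam a row keeps moving on steps where it received no gradient, because its momentum is still applied, while a sparse-aware variant only touches rows that appeared in the batch.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter; returns (param, m, v)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def run(grads_per_step, n_rows, sparse_aware):
    """Train a toy embedding table (one scalar per row, all starting at 1.0).

    grads_per_step: list of {row_index: grad}; absent rows received no gradient.
    sparse_aware=False: every row is updated each step, with grad 0 if absent.
    sparse_aware=True: only rows that received a gradient are updated.
    """
    params = [1.0] * n_rows
    m = [0.0] * n_rows
    v = [0.0] * n_rows
    for t, grads in enumerate(grads_per_step, start=1):
        rows = sorted(grads) if sparse_aware else range(n_rows)
        for i in rows:
            params[i], m[i], v[i] = adam_step(
                params[i], grads.get(i, 0.0), m[i], v[i], t
            )
    return params


# Row 0 is seen only in step 1; row 1 in steps 2 and 3; row 2 never.
steps = [{0: 1.0}, {1: 1.0}, {1: 1.0}]
dense = run(steps, n_rows=3, sparse_aware=False)
lazy = run(steps, n_rows=3, sparse_aware=True)
print("dense:       ", dense)
print("sparse-aware:", lazy)
```

In the dense run, row 0 drifts further from its starting value during steps 2 and 3 even though it has no gradient there; in the sparse-aware run it stops after step 1. In PyTorch, the closest real counterpart is likely `torch.nn.Embedding(..., sparse=True)` paired with `torch.optim.SparseAdam`, though whether that matches the fix the tutorial had in mind is an assumption.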