mirror of https://github.com/wassname/ml-debug.git synced 2026-06-27 17:31:04 +08:00

wassname 2a2f5045bb folklore: add Karpathy common-mistakes tweet and Sculley CACE principle

Both quote-verbatim with frozen evidence: the 2018 tweet thread (mirrored
via threadreaderapp, x.com blocks fetching) slots after overfit-one-batch;
CACE (NIPS 2015, entanglement section transcribed from the PDF) gives
Always-Be-Ablating its why.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-11 14:43:47 +08:00

Source: https://x.com/karpathy/status/1013244313327681536 (thread, 1 Jul 2018)
Title: Andrej Karpathy, "most common neural net mistakes" tweet thread
Fetched-via: x.com blocked (HTTP 451 via jina reader); tweet 1 verbatim from x.com page title in web search results; tweets 2-3 verbatim from https://threadreaderapp.com/thread/1013244313327681536.html ; thread also indexed on Karpathy's own https://karpathy.ai/tweets.html
Fetch-status: verbatim, cross-checked across the two mirrors

most common neural net mistakes (tweet thread)

Tweet 1 (1 Jul 2018):

most common neural net mistakes: 1) you didn't try to overfit a single batch first. 2) you forgot to toggle train/eval mode for the net. 3) you forgot to .zero_grad() (in pytorch) before .backward(). 4) you passed softmaxed outputs to a loss that expects raw logits. ; others? :)
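
Illustration (not part of the thread): a minimal PyTorch sketch of items 2 to 4 above. The tensors, the `nn.Dropout` layer, and the variable names are assumptions chosen for the demo; the thread itself gives no code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Item 2: layers such as Dropout behave differently in train and eval mode.
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)
drop.train()
train_out = drop(x)   # roughly half the entries zeroed, survivors scaled to 2.0
drop.eval()
eval_out = drop(x)    # identity in eval mode

# Item 3: gradients accumulate across backward() calls unless they are zeroed.
w = torch.ones(3, requires_grad=True)
(w * 2).sum().backward()
first = w.grad.clone()         # gradient of sum(2w) is 2 per element
(w * 2).sum().backward()       # no zeroing in between
accumulated = w.grad.clone()   # now 4 per element: the two gradients were summed
w.grad.zero_()                 # what optimizer.zero_grad() does for each parameter

# Item 4: F.cross_entropy applies log_softmax internally, so passing
# probabilities applies softmax twice and distorts the loss.
logits = torch.tensor([[10.0, 0.0, 0.0]])
target = torch.tensor([0])
loss_correct = F.cross_entropy(logits, target)                    # near zero
loss_double = F.cross_entropy(F.softmax(logits, dim=1), target)   # much larger
```

In this sketch the model is confidently right, yet the double-softmax loss stays large because the probabilities, all in [0, 1], are treated as nearly flat logits.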

Tweet 2 (same thread, 1 Jul 2018):

oh: 5) you didn't use bias=False for your Linear/Conv2d layer when using BatchNorm, or conversely forget to include it for the output layer .This one won't make you silently fail, but they are spurious parameters
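
Illustration (not part of the thread): a small sketch of why a bias feeding straight into BatchNorm is spurious. In training mode BatchNorm subtracts the per-feature batch mean, so any constant offset added by the preceding layer cancels. The layer sizes and the shift of 5.0 are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)

lin = nn.Linear(4, 3, bias=True)
bn = nn.BatchNorm1d(3)
bn.train()  # use batch statistics, as during training

out_a = bn(lin(x))
with torch.no_grad():
    lin.bias.add_(5.0)   # shift every output feature by a constant
out_b = bn(lin(x))       # the batch mean absorbs the shift

same_output = torch.allclose(out_a, out_b, atol=1e-4)

# The pattern the tweet recommends: drop the redundant bias before BatchNorm.
conv = nn.Conv2d(3, 16, kernel_size=3, bias=False)
norm = nn.BatchNorm2d(16)
```

BatchNorm's own learnable shift (`norm.bias`) already plays the role of the bias, which is why the extra parameter wastes memory without changing what the network can represent.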

Tweet 3 (same thread, 1 Jul 2018):

thinking view() and permute() are the same thing (& incorrectly using view)
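
Illustration (not part of the thread): a minimal sketch of the difference. `view()` reinterprets the same memory order under a new shape, while `permute()` reorders the axes. The 2x3 tensor is an assumed toy example.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

viewed = x.view(3, 2)        # same flat order, re-chunked: [[0, 1], [2, 3], [4, 5]]
permuted = x.permute(1, 0)   # axes swapped (a transpose): [[0, 3], [1, 4], [2, 5]]

same_shape = viewed.shape == permuted.shape        # True: both are (3, 2)
same_values = torch.equal(viewed, permuted)        # False: the elements differ
```

Both results have shape (3, 2), so a shape check alone will not catch the mix-up; using `view()` to "move" a channel or batch axis silently scrambles the data.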

Context: this thread is the seed of Karpathy's 2019 "A Recipe for Training Neural Networks" post (see karpathy_recipe_training_nn_2019.md), which opens by referencing it.
