diff --git a/refs/loss_surface.md b/refs/loss_surface.md
index 5537b21..9af03e2 100644
--- a/refs/loss_surface.md
+++ b/refs/loss_surface.md
@@ -68,3 +68,5 @@ When your loss is a product of factors A*B and one factor can be near zero:
 ```
 
 General principle: if you want gradient to flow independently through two factors, decompose multiplicatively in log space.
+
+You can also design surrogate losses that still move in the right direction but do so in a better-behaved well.
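
Review note: a minimal sketch of the log-space principle from this hunk, assuming both factors are strictly positive. The function names (`product_objective`, `log_objective`, `grad_wrt_b`) and the values of `a`, `b`, and `eps` are hypothetical and chosen only for illustration; they do not come from `refs/loss_surface.md`.

```python
import math


def product_objective(a, b):
    """Original objective: the raw product A*B."""
    return a * b


def log_objective(a, b, eps=1e-12):
    """Log-space surrogate: log A + log B. eps (an assumed guard) avoids log(0)."""
    return math.log(a + eps) + math.log(b + eps)


def grad_wrt_b(f, a, b, h=1e-6):
    """Central finite-difference estimate of df/db."""
    return (f(a, b + h) - f(a, b - h)) / (2 * h)


# Illustrative values: A is near zero, B is not.
a, b = 1e-6, 0.5

g_product = grad_wrt_b(product_objective, a, b)  # scales with A, so it nearly vanishes
g_log = grad_wrt_b(log_objective, a, b)          # about 1/B, independent of A

print(f"d(A*B)/dB           = {g_product:.3e}")
print(f"d(log A + log B)/dB = {g_log:.3e}")
```

With the raw product, the gradient through B is scaled by A and so almost disappears when A is near zero. In log space the product becomes a sum, so the gradient through B is about 1/B regardless of A. Both gradients have the same sign, which is the sense in which the surrogate still moves in the right direction.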