From ffc2df540f05e1c11902e60a5dddd12926a819fe Mon Sep 17 00:00:00 2001
From: wassname <github@wassname>
Date: Wed, 3 Jun 2026 02:20:13 +0000
Subject: [PATCH] blog: rename reader-facing route2 tag to route (consistency
 with paper)

route2 is an internal run-tag, not something a reader cares about.
Rename it to route in the WIP banner, the routing-arm paragraph, and two
figure captions (the captions also now say erase rather than erasure,
matching the arm name). Describe the earlier relu-gate/shared-basis
sketch as 'an early version' rather than v1.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 ...0529_gradient_projection_vs_reward_hacking_LW_draft.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/blog/20260529_gradient_projection_vs_reward_hacking_LW_draft.md b/docs/blog/20260529_gradient_projection_vs_reward_hacking_LW_draft.md
index d4a889a..6ee3dc5 100644
--- a/docs/blog/20260529_gradient_projection_vs_reward_hacking_LW_draft.md
+++ b/docs/blog/20260529_gradient_projection_vs_reward_hacking_LW_draft.md
@@ -1,6 +1,6 @@
 # Erasing the hack direction from a GRPO gradient: a preliminary result
 
-*WIP draft for LessWrong. This is the "erase" (one-sided projection) story at n=2; n=3 is queued. The work has since moved to route2 (per-rollout calibrated-tau routing into a scale-matched quarantine, plus an exploration floor) with an apples-to-apples knob-off deploy-eval. Once those land this gets re-headlined. Numbers may shift.*
+*WIP draft for LessWrong. This is the "erase" (one-sided projection) story at n=2; n=3 is queued. The work has since moved to a routing arm (route): per-rollout, calibrated-tau routing of the whole rollout gradient into a scale-matched quarantine that is deleted at deploy, plus an exploration floor, evaluated with an apples-to-apples knob-off deploy-eval. Once the n=3 route runs land, this post will be re-headlined around them. Numbers may shift.*
 
 ## The one-line version
 
@@ -114,11 +114,11 @@ opt.step(delta_S_hack, removed)     # delta_S_hack absorbs the hack-ward part
 # at deploy: delta_S_hack := 0  (ablate the quarantine)
 ```
 
-The route arm above is v1 (relu gate, shared basis). The current routing arm, route2, gates per rollout instead: if `cos(g_rollout, v_grad) > tau` (tau calibrated each step from the hack-vs-clean cosine gap) the whole rollout gradient goes into a scale-matched, distinct-basis quarantine, and an exploration floor generates a fraction of rollouts knob-off so the deployed knob always sees solve signal. Its deploy-eval table is pending the n=3 runs.
+The routing sketch above is an early version (relu gate, shared basis). The route arm we report gates per rollout instead: if `cos(g_rollout, v_grad) > tau` (tau calibrated each step from the hack-vs-clean cosine gap), the whole rollout gradient goes into a scale-matched, distinct-basis quarantine. An exploration floor also generates a fraction of rollouts knob-off, so the deployed knob always sees solve signal. Its deploy-eval table is pending the n=3 runs.
 
-![Hack rate (top) and solve rate (bottom) over training, one line per arm. routing2 stays near-zero hack while its solve climbs above the erasure/vanilla arms.](../../out/figs/dyn_sub4_hack_overlay.png)
+![Hack rate (top) and solve rate (bottom) over training, one line per arm. route keeps hack near zero while its solve rate climbs above the erase and vanilla arms.](../../out/figs/dyn_sub4_hack_overlay.png)
 
-*Training dynamics by arm. routing2 (purple) holds deployed hack near zero and lifts solve above vanilla/erasure. Preliminary: vanilla/erase still read off per-step training hack until their knob-off deploy-eval reruns land (jobs 75/76/79).*
+*Training dynamics by arm. route (purple) holds deployed hack near zero and lifts solve above vanilla/erase. Preliminary: vanilla/erase still read off per-step training hack until their knob-off deploy-eval reruns land (jobs 75/76/79).*
 
 
 A caveat on erase. The optimizer is fast-Adam, which carries momentum. Projecting `g` does not project the momentum buffer, so the projected-out direction can re-enter via momentum. On a frozen G_hack the leak is bounded (the buffer is a decaying average of already-projected gradients), but under refresh it is not obviously small. I have not measured it directly yet. If you have intuition about whether it kills the result, please push back.