fix smoke.

2026-06-27 18:04:59 +08:00 · 2026-05-23 11:26:39 +08:00
parent 120400c5f5
commit bf252fac69
8 changed files with 1574 additions and 138 deletions
@@ -0,0 +1,73 @@
+# Canonical GRPO hyperparameters
+
+Sourced from `external/rl-rewardhacking/src/train/config.py` (Ariahw, Engels &
+Nanda 2025). These are the defaults we inherit for all our headline runs;
+deviations must be flagged in [docs/RESEARCH_JOURNAL.md](RESEARCH_JOURNAL.md).
+
+## Optimizer
+
+| key | value | source |
+|---|---|---|
+| `optim` | `adamw_8bit` | config.py L137 |
+| `learning_rate` | `7e-5` | config.py L138 |
+| `lr_scheduler_type` | `cosine` | config.py L139 |
+| `adam_beta1` | `0.9` | config.py L143 |
+| `adam_beta2` | `0.99` | config.py L144 |
+
+## LoRA
+
+| key | value | source |
+|---|---|---|
+| `lora_rank` (r) | `32` | config.py L41 |
+| `lora_alpha` | `32` | config.py L42 |
+| `lora_dropout` | `0.0` | config.py L45 |
+| `lora_bias` | `none` | config.py L46 |
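Reviewer note: under the standard LoRA convention the adapter update is scaled by `lora_alpha / r`, so the values in this table give a scale of 1.0. The sketch below assumes that standard (non rank-stabilized) convention; the helper name `lora_scale` and the `LORA` dict are illustrative and do not come from `config.py`.

```python
# Assumption: standard LoRA scaling (alpha / r), not rsLoRA (alpha / sqrt(r)).
LORA = {
    "lora_rank": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.0,
    "lora_bias": "none",
}


def lora_scale(rank: int, alpha: float) -> float:
    """Multiplier applied to the low-rank update B @ A under standard LoRA."""
    return alpha / rank


scale = lora_scale(LORA["lora_rank"], LORA["lora_alpha"])
```

If a future run changes only the rank, the effective update scale changes too unless `lora_alpha` is moved with it.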
+
+## GRPO
+
+| key | value | source |
+|---|---|---|
+| `beta` (KL coeff) | `1e-3` | config.py L135 |
+| `num_generations` | `16` | config.py L162 |
+| `per_device_batch_size` | `32` (prompts) | config.py L164 |
+| `temperature` | `0.7` | config.py L172 |
+| `auto_find_batch_size` | `True` first run | config.py L165 |
+
+Canonical effective per-step rollout count = `per_device_batch_size * num_generations = 32 * 16 = 512`.
+We do NOT run the canonical setting. See deviations below.
+
+## Additional canonical hyperparams (not in tables above)
+
+| key | value | source |
+|---|---|---|
+| `max_prompt_length` | `1536` | run_rl_training.py L73 |
+| `max_completion_length` | `1536` | run_rl_training.py L73 |
+| `warmup_steps` | `10` | config.py |
+| `top_p` | `0.95` (default) | verl GRPOConfig |
+| `cache_activations_layers` | `[18]` | config.py L195-197 |
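Reviewer note: a rough sketch of how `learning_rate = 7e-5`, `lr_scheduler_type = cosine`, and `warmup_steps = 10` combine. It assumes linear warmup from zero followed by cosine decay to zero, which is a common convention but is not confirmed by the source; `total_steps = 200` is a hypothetical placeholder, not a value from the config.

```python
import math


def lr_at(step: int, peak: float = 7e-5, warmup: int = 10, total_steps: int = 200) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    `total_steps` is a hypothetical run length used only for illustration.
    """
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


schedule = [lr_at(s) for s in range(0, 201)]
```

With only 10 warmup steps the peak rate is reached almost immediately, so early instability is more likely to come from the learning rate itself than from the warmup length.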
+
+## Our deviations for compute fit (96GB single-GPU)
+
+The justfile uses `MODEL=Qwen/Qwen3.5-2B` as H4-main:
+
+| key | canonical | ours | rationale |
+|---|---|---|---|
+| `num_generations` | `16` | `8` | halve rollouts to fit 96GB |
+| `per_device_batch_size` | `32` | `16` | halve prompts to fit 96GB |
+| **effective batch** | **512** | **128** | 4x smaller |
+
+If H4 is falsified (Qwen3.5-2B hack rate <30% at step 200), the justfile switches
+to `MODEL=Qwen/Qwen3-4B` and keeps the **same** NUM_GEN=8, BATCH=16 settings. If 4B
+OOMs, the secondary fallback is NUM_GEN=4, BATCH=16 (64 effective batch).
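Reviewer note: the arithmetic behind the three settings can be checked with a small sketch. The `RolloutConfig` class and the setting names are hypothetical; only the numbers come from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutConfig:
    name: str
    per_device_batch_size: int  # prompts per step
    num_generations: int        # completions sampled per prompt

    @property
    def effective_batch(self) -> int:
        """Rollouts per step = prompts * completions per prompt."""
        return self.per_device_batch_size * self.num_generations


CONFIGS = [
    RolloutConfig("canonical", per_device_batch_size=32, num_generations=16),
    RolloutConfig("ours", per_device_batch_size=16, num_generations=8),
    RolloutConfig("fallback_4b_oom", per_device_batch_size=16, num_generations=4),
]

effective = {c.name: c.effective_batch for c in CONFIGS}
```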
+
+Smaller batches are noisier for GRPO advantage normalization. Expect higher SEM
+and budget more seeds if either path is taken.
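Reviewer note: to make the noise argument concrete, here is a minimal sketch of group-normalized advantages for one prompt. It assumes the common GRPO form (reward minus group mean, divided by group standard deviation) and uses the population standard deviation; actual implementations may use the sample standard deviation or a different epsilon. The function names are illustrative.

```python
import math
import statistics


def group_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group of completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


def baseline_sem_ratio(canonical_group: int, reduced_group: int) -> float:
    """How much the standard error of the group-mean baseline grows
    when the group shrinks, assuming i.i.d. rewards within a group."""
    return math.sqrt(canonical_group / reduced_group)


adv = group_advantages([1.0, 0.0, 1.0, 0.0])
ratio_16_to_8 = baseline_sem_ratio(16, 8)
ratio_16_to_4 = baseline_sem_ratio(16, 4)
```

Under the i.i.d. assumption, halving `num_generations` from 16 to 8 inflates the baseline's standard error by a factor of about 1.41, and dropping to 4 doubles it.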
+
+## Our additions (projected_grpo only)
+
+| key | default | scope |
+|---|---|---|
+| `vhack_m` | `16` | SVD top-m for v_hack denoising; sweep ∈ {8, 16, 32} |
+| `vhack_layer_frac` | `0.7` | layer fraction for v_hack extraction (per Wu-Tang 60-75% depth) |
+| `preserve_magnitude` | `True` | restore ‖g‖ after projection; ablation arm sets False |
+| `n_pairs` | `20` | initial contrastive pair count; scale to 60-80 for headline runs |
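Reviewer note: a minimal sketch of what `preserve_magnitude` could mean, assuming the projection removes the gradient's components along a set of orthonormal v_hack directions and then rescales the result back to the original norm. This is an illustration, not the `projected_grpo` implementation; the function names are hypothetical, and computing the basis (the top-`vhack_m` SVD step) is not shown.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def project_out(g, basis, preserve_magnitude=True):
    """Remove the components of g along each vector in `basis`.

    Assumption: `basis` is orthonormal. If preserve_magnitude is True,
    the result is rescaled so its norm matches the original ||g||.
    """
    out = list(g)
    for u in basis:
        coeff = sum(a * b for a, b in zip(g, u))
        out = [o - coeff * b for o, b in zip(out, u)]
    if preserve_magnitude:
        n_out = norm(out)
        if n_out > 0:  # leave a fully removed gradient at zero
            s = norm(g) / n_out
            out = [s * o for o in out]
    return out


g = [3.0, 4.0, 0.0]
basis = [[1.0, 0.0, 0.0]]  # one hypothetical v_hack direction
projected = project_out(g, basis)                           # norm restored to 5
ablation = project_out(g, basis, preserve_magnitude=False)  # norm shrinks to 4
```

The ablation arm in this sketch differs from the default only by step size along the surviving direction, which is the effect the `preserve_magnitude=False` comparison would isolate.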