Mirror of https://github.com/wassname/evil_MoE.git, synced 2026-06-27 20:21:41 +08:00.
# Canonical GRPO hyperparameters

Sourced from `external/rl-rewardhacking/src/train/config.py` (Ariahw, Engels &
Nanda 2025). These are the defaults we inherit for all our headline runs;
deviations must be flagged in [docs/RESEARCH_JOURNAL.md](RESEARCH_JOURNAL.md).

## Optimizer

| key | value | source |
|---|---|---|
| `optim` | `adamw_8bit` | config.py L137 |
| `learning_rate` | `7e-5` | config.py L138 |
| `lr_scheduler_type` | `cosine` | config.py L139 |
| `adam_beta1` | `0.9` | config.py L143 |
| `adam_beta2` | `0.99` | config.py L144 |

## LoRA

| key | value | source |
|---|---|---|
| `lora_rank` (r) | `32` | config.py L41 |
| `lora_alpha` | `32` | config.py L42 |
| `lora_dropout` | `0.0` | config.py L45 |
| `lora_bias` | `none` | config.py L46 |

## GRPO

| key | value | source |
|---|---|---|
| `beta` (KL coeff) | `1e-3` | config.py L135 |
| `num_generations` | `16` | config.py L162 |
| `per_device_batch_size` | `32` (prompts) | config.py L164 |
| `temperature` | `0.7` | config.py L172 |
| `auto_find_batch_size` | `True` first run | config.py L165 |

Canonical effective per-step rollout count = `per_device_batch_size * num_generations = 32 * 16 = 512`.
We do NOT run the canonical setting. See deviations below.
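
As a compact restatement of the Optimizer, LoRA, and GRPO tables, the sketch below collects the canonical values into a Python dataclass and recomputes the per-step rollout count. The class name `CanonicalGRPOConfig` and the `rollouts_per_step` property are hypothetical names chosen for illustration; they are not taken from `config.py`, whose actual structure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalGRPOConfig:
    """Illustrative container (hypothetical name) for the tabulated values.

    Only the field values come from the tables; the layout is an assumption.
    """

    # Optimizer
    optim: str = "adamw_8bit"
    learning_rate: float = 7e-5
    lr_scheduler_type: str = "cosine"
    adam_beta1: float = 0.9
    adam_beta2: float = 0.99
    # LoRA
    lora_rank: int = 32
    lora_alpha: int = 32
    lora_dropout: float = 0.0
    lora_bias: str = "none"
    # GRPO
    beta: float = 1e-3  # KL coefficient
    num_generations: int = 16
    per_device_batch_size: int = 32  # counted in prompts
    temperature: float = 0.7

    @property
    def rollouts_per_step(self) -> int:
        # Each prompt in the batch is sampled num_generations times.
        return self.per_device_batch_size * self.num_generations


canonical = CanonicalGRPOConfig()
print(canonical.rollouts_per_step)  # 512
```

Freezing the dataclass is a design choice for this sketch: any deviation then has to be made by constructing a new instance, which keeps overrides visible at the call site.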
## Additional canonical hyperparams (not in tables above)

| key | value | source |
|---|---|---|
| `max_prompt_length` | `1536` | run_rl_training.py L73 |
| `max_completion_length` | `1536` | run_rl_training.py L73 |
| `warmup_steps` | `10` | config.py |
| `top_p` | `0.95` (default) | verl GRPOConfig |
| `cache_activations_layers` | `[18]` | config.py L195-197 |

## Our deviations for compute fit (96GB single-GPU)

The justfile uses `MODEL=Qwen/Qwen3.5-2B` as H4-main:

| key | canonical | ours | rationale |
|---|---|---|---|
| `num_generations` | `16` | `8` | halve rollouts to fit 96GB |
| `per_device_batch_size` | `32` | `16` | halve prompts to fit 96GB |
| **effective batch** | **512** | **128** | 4x smaller |

If H4 is falsified (Qwen3.5-2B hack rate <30% at step 200), the justfile switches
to `MODEL=Qwen/Qwen3-4B` with the **same** NUM_GEN=8, BATCH=16 settings. If 4B OOMs,
the secondary fallback is a further reduction to NUM_GEN=4, BATCH=16 (4 * 16 = 64
effective batch).

Smaller batches are noisier for GRPO advantage normalization. Expect higher SEM
and budget more seeds if either path is taken.
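
To make the batch arithmetic and the noise concern concrete, here is a minimal sketch. It assumes the common GRPO formulation in which each rollout's advantage is its reward minus the group mean, divided by the group standard deviation, over a group of `num_generations` rollouts per prompt. Whether the canonical code normalizes in exactly this way (for example, its epsilon or choice of std estimator) is not confirmed here, and all helper names are hypothetical.

```python
import math
import statistics


def effective_batch(per_device_batch_size, num_generations):
    """Rollouts per step: prompts per batch times samples per prompt."""
    return per_device_batch_size * num_generations


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages for one prompt (assumed formulation).

    eps is an illustrative stabilizer, not a value taken from config.py.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def baseline_sem_ratio(n_canonical, n_ours):
    """Growth factor of the group-mean standard error when the group shrinks.

    Assumes i.i.d. rewards within a group, so SEM scales as 1 / sqrt(n).
    """
    return math.sqrt(n_canonical / n_ours)


# (per_device_batch_size, num_generations) for each setting in this section.
settings = {
    "canonical": (32, 16),
    "ours": (16, 8),
    "secondary_fallback": (16, 4),
}
sizes = {name: effective_batch(b, g) for name, (b, g) in settings.items()}
print(sizes)  # {'canonical': 512, 'ours': 128, 'secondary_fallback': 64}

# Halving the group from 16 to 8 inflates the baseline SEM by sqrt(2).
print(round(baseline_sem_ratio(16, 8), 3))  # 1.414

# One group of 8 binary rewards; the advantages are centered on zero.
adv = group_advantages([1, 0, 0, 1, 0, 0, 0, 0])
```

Under the i.i.d. assumption, the move from 16 to 8 generations per prompt costs roughly a 1.41x wider standard error on each group's baseline, which is one way to read the advice to budget more seeds.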
## Our additions (projected_grpo only)

| key | default | scope |
|---|---|---|
| `vhack_m` | `16` | SVD top-m for v_hack denoising; sweep ∈ {8, 16, 32} |
| `vhack_layer_frac` | `0.7` | layer fraction for v_hack extraction (per Wu-Tang 60-75% depth) |
| `preserve_magnitude` | `True` | restore ‖g‖ after projection; ablation arm sets False |
| `n_pairs` | `20` | initial contrastive pair count; scale to 60-80 for headline runs |