adapters_as_hypotheses/.comprehension_lesswrong.json
wassname 7dbe3faf19 wip
2026-02-22 21:52:57 +08:00


{
"mode": "editor",
"full_text_verified": true,
"claims_checked": [
{
"claim": "- Probing finds representations that predict behavior, but high probe accuracy does not mean the mod",
"number": "2022",
"status": "number_found_in_paper"
},
{
"claim": "- CCS discovers latent knowledge but cannot intervene on it (Burns et al., 2022).",
"number": "2022",
"status": "number_found_in_paper"
},
{
"claim": "I articulated this in my [AntiPaSTO paper](https://arxiv.org/abs/2601.07473), and GDM's interpretabi",
"number": "2601.07473",
"status": "number_found_in_paper"
},
{
"claim": "I articulated this in my [AntiPaSTO paper](https://arxiv.org/abs/2601.07473), and GDM's interpretabi",
"number": "3",
"status": "number_found_in_paper"
},
{
"claim": "I went through ~30 PEFT methods in HuggingFace PEFT and the broader literature. For each one I:",
"number": "30",
"status": "number_found_in_paper"
},
{
"claim": "1. Extracted pseudocode for the forward pass (what the intervention actually does)",
"number": "1.",
"status": "number_found_in_paper"
},
{
"claim": "2. Stated the hypothesis it encodes about transformer internals",
"number": "2.",
"status": "number_found_in_paper"
},
{
"claim": "- **PiSSA** (NeurIPS 2024): Initialize LoRA from top-$r$ SVD of $W$, freeze the residual. Gemma-7B o",
"number": "2024",
"status": "number_found_in_paper"
},
{
"claim": "- **PiSSA** (NeurIPS 2024): Initialize LoRA from top-$r$ SVD of $W$, freeze the residual. Gemma-7B o",
"number": "7",
"status": "number_found_in_paper"
},
{
"claim": "- **SVFT**: Fix both singular vector sets from $W$'s SVD, learn only sparse coefficients. Recovers 9",
"number": "96%",
"status": "number_found_in_paper"
},
{
"claim": "- **SSVD**: Rotate right singular vectors (Cayley transform), shift singular values, keep left singu",
"number": "10",
"status": "number_found_in_paper"
},
{
"claim": "- **DoRA** (ICML 2024): Magnitude/direction decomposition of $W$. Consistently beats LoRA.",
"number": "2024",
"status": "number_found_in_paper"
},
{
"claim": "- **DeLoRA** (ICLR 2025): Normalize each rank-1 component, introduce learnable scalar $\\lambda$. Bet",
"number": "2025",
"status": "number_found_in_paper"
},
{
"claim": "- **DeLoRA** (ICLR 2025): Normalize each rank-1 component, introduce learnable scalar $\\lambda$. Bet",
"number": "1",
"status": "number_found_in_paper"
},
{
"claim": "- **ROAD**: 2D rotary adaptation with explicit angle $\\theta$ and magnitude $\\alpha$.",
"number": "2",
"status": "number_found_in_paper"
},
{
"claim": "**IA3** learns nothing but a per-channel scaling vector ($\\lambda \\in \\mathbb{R}^d$, initialized to ",
"number": "3",
"status": "number_found_in_paper"
},
{
"claim": "**IA3** learns nothing but a per-channel scaling vector ($\\lambda \\in \\mathbb{R}^d$, initialized to ",
"number": "1",
"status": "number_found_in_paper"
},
{
"claim": "**Conflict of interest disclosure:** the strongest OOD result in this catalog is my own work. [AntiP",
"number": "2601.07473",
"status": "number_found_in_paper"
},
{
"claim": "**Conflict of interest disclosure:** the strongest OOD result in this catalog is my own work. [AntiP",
"number": "1",
"status": "number_found_in_paper"
},
{
"claim": "I give AntiPaSTO the highest grade (\\*\\*!!!\\*\\*) because the OOD transfer is the kind of evidence th",
"number": "4",
"status": "number_found_in_paper"
},
{
"claim": "**Decoupling family:** DoRA (magnitude/direction) -> ETHER (fixed-strength orthogonal) -> DeLoRA (no",
"number": "1",
"status": "number_found_in_paper"
},
{
"claim": "- **Scale dependence.** Most of these results are on 1B-7B models. The geometry might change at 70B+",
"number": "1",
"status": "number_found_in_paper"
},
{
"claim": "- **Scale dependence.** Most of these results are on 1B-7B models. The geometry might change at 70B+",
"number": "7",
"status": "number_found_in_paper"
},
{
"claim": "The full catalog with pseudocode, evidence, and grades for 30 methods is at:",
"number": "30",
"status": "number_found_in_paper"
}
],
"missing_context": [
"Paper limitations not discussed"
],
"citations_found": [
"- CCS discovers latent knowledge but cannot intervene on it (Burns et al., 2022)"
],
"suggestions": [],
"verdict": "needs_work"
}