Add concepts and motivation sections for Guided CoT and Daily Dilemmas

wassname
2026-04-10 08:46:23 +08:00
parent 628199d2b4
commit c961523540
3 changed files with 20 additions and 6 deletions
+11 -6
@@ -2,19 +2,24 @@
"cells": [
{
"cell_type": "markdown",
-"id": "4f22075e",
+"id": "257ad67e",
"metadata": {},
"source": [
"# Brukino's AntiPaSTO Appetizer: Guided CoT Eval & Frenet-Serret Curvature\n",
"\n",
"Testing if $\\kappa$ spikes late in the Chain of Thought when the model's criterion shifts.\n",
-"*Note: Using `Qwen2.5-0.5B-Instruct` as `Qwen3.5-0.8B` is not publicly available on HuggingFace.*\n"
+"*Note: Using `Qwen2.5-0.5B-Instruct` as `Qwen3.5-0.8B` is not publicly available on HuggingFace.*\n",
+"\n",
+"## Concepts & Motivation\n",
+"\n",
+"- **Guided Chain-of-Thought (CoT) with Logprobs:** Standard teacher-forced evaluation misses how the reasoning process itself changes, while full on-policy generation is slow and hard to parse. The *Guided CoT* trick strikes a balance: we let the model greedily generate a short reasoning trace (~32 tokens), then append a fixed suffix (e.g., `\\nI should answer now.\\nMy choice: **`) to force a decision. A single forward pass over the combined sequence then yields both the hidden-state trajectory of the reasoning *and* the answer log-odds (`log P(Yes) - log P(No)`) at the final position.\n",
+"- **Daily Dilemmas (Self-Honesty Subset):** Sourced from `wassname/daily_dilemmas-self-honesty` (adapted from the Reddit *AmITheAsshole* subreddit), these are moral dilemmas where honesty explicitly conflicts with other values. Simple prompting (e.g., \"You are honest\") often struggles here. By running opposite personas on the same dilemmas, we test whether structural shifts in reasoning (captured by $\\kappa$) correlate with actual preference flips."
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "fdfdeb3c",
+"id": "1f677aab",
"metadata": {},
"outputs": [],
"source": [
@@ -38,7 +43,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "fb3c4b88",
+"id": "96c80ad7",
"metadata": {},
"outputs": [],
"source": [
@@ -66,7 +71,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "2e52f347",
+"id": "0774537a",
"metadata": {},
"outputs": [],
"source": [
@@ -118,7 +123,7 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "dc78efbb",
+"id": "cb27ea7c",
"metadata": {
"lines_to_next_cell": 2
},