71 Commits

Author SHA1 Message Date
wassname 48c1b07b83 readme 2026-05-05 08:12:41 +08:00
wassname cf0f7d6c54 results 2026-05-04 18:33:19 +08:00
wassname 7eac38829d hmm 2026-05-04 06:17:30 +08:00
wassname 49eba3e853 fix: remove StaticCache from data gen (breaks Qwen3.5-4B hybrid attention)
StaticCache passed as past_key_values triggers create_masks_for_generate()
which requires linear_attention mask not in transformers 5.6.x. Plain
generate() uses DynamicCache and avoids this code path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 18:24:33 +08:00
wassname 57a08750b8 fix: on-policy data paths, 4-bit inference, revert adapter defaults
- data/load_pairs: path now includes model slug (out/data/{model}/{behavior})
  so data from different models can't be silently reused
- data.py, kl_calibrate.py, tinymfv_airisk.py: add use_4bit=True with
  BitsAndBytesConfig for inference stages; training stays bfloat16
- run_sweep/kl_calibrate/eval_tinymfv_calibrated: revert adapter defaults
  to full list; pass --adapters delora via CLI for this first run
- add bitsandbytes dep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 17:31:09 +08:00
wassname 7396bc1544 chore: default adapters to delora-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 17:24:23 +08:00
wassname 553d15b9c3 fix: remove flash_attention_2, revert to Qwen3.5-4B
Qwen3.5-4B + FA2 trips linear_attention masking in transformers.
sdpa (default) works fine; sl confirmed same approach in their sweep.
Model reverted to Qwen3.5-4B to match sl baselines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 17:21:38 +08:00
wassname 43278709d7 fix: transformers>=5.6.0, flash-attn locked, switch to Qwen3-4B
Qwen3.5-4B requires linear_attention mask support not in transformers<5.6.
Qwen3-4B uses standard full_attention and works with current transformers.
flash-attn added as URL dep so uv sync keeps it in .venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:54:50 +08:00
wassname 8ed3103e47 feat(authority): add authority behavior, logratio+SI metrics, prune dead code
- Add AUTHORITY_PROMPT + 3 persona pairs (MFT-paper framing, sl-identical)
- Wire authority into data._personas/_topics/_build_specs
- Add SINGLE_FOUNDATION + _axis_shift for single-foundation behaviors
- Add logratio to per-vignette/frame scoring (same convention as sl)
- Add _si.py: port si_per_foundation from sl foundations.py
- Drop prompt_baseline mode, repe, sycophancy, subspace, run_demo
- Strip kl_calibrate to dW-only; remove repe+prompt_texts deps
- Simplify replicate.py to train+diff only (no eval/demo/subspace)
- Default behavior="authority" across eval, sweep, replicate
- Install tinymfv git dep; flash_attn 2.6.3 prebuilt wheel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 14:04:23 +08:00
wassname 309afaf4d8 feat(auth_care): align ws with steering-lite for cross-repo comparable rows
- ws.data: add AUTH_CARE_{POS,NEG}_PERSONAS + prompt template (verbatim from
  sl branching.py PERSONA_PAIRS_AUTH_CARE)
- ws.prompt_texts: add ENGINEERED_PROMPT_AUTHCARE (verbatim from sl
  baseline_engineered_prompt.py L46-56), register in PROMPTS dict
- ws.eval.tinymfv_airisk: AXIS_PAIR['auth_care'] = ('Care', 'Authority');
  default model -> Qwen3.5-4B; default behavior -> auth_care; defaults
  prompt_pos=engineered_prompt_authcare, prompt_neg=base (one-sided
  baseline matching sl baseline_engineered_prompt)
- ws.scripts.eval_tinymfv_calibrated: bump defaults to auth_care +
  Qwen3.5-4B; fix prompt-baseline subprocess to forward
  engineered_prompt_authcare instead of trad_care prompts (was silently
  writing wrong-axis prompt-baseline CSV regardless of cfg.behavior)
- ws.scripts.readme_tinymfv_table: parametrise on cfg.behavior
  (BEHAVIOR_AXIS dispatch for title/blurb/arrows); model_label CLI field
  replaces hardcoded Qwen3-0.6B in bare-table source labels
- ws.eval.airisk: relax pos_rows guard so single-sided runs go through
2026-05-03 08:13:01 +08:00
wassname 9dff8d0256 feat: add auth_socn behavior + behavior-aware axis_shift + pmass/flips/bare-logit eval helpers
- data.py: AUTH_SOCN_POS/NEG_PERSONAS (6 pairs, ported from steering-lite branching.py),
  wired into _personas() / _topics() / _build_specs() for auth_socn behavior
- tinymfv_airisk.py: AXIS_PAIR dict + behavior-aware _axis_shift (auth_socn uses
  ΔlogitSocNorms − ΔlogitAuthority vs trad_care's ΔlogitSanc − ΔlogitCare);
  PMASS_FLOOR=0.9 NaN-gate; _logit NaN-safe; _flips_per_foundation_table;
  _bare_logit_per_foundation_table; new __foundations_flips.csv + __bare_logit.csv artifacts
- README: fill trad_care comparison table with actual ws results (jobs 93-96),
  add bare model row for ws, add sl:engineered_prompt row

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 06:11:48 +08:00
wassname 497ee05aef first pass care vs sanctity 2026-05-03 06:02:07 +08:00
wassname aa4fcff446 scripts(readme_tinymfv_table): mirror steering-lite layout
- Split bare table (absolute logit per foundation) from Δ table
- Add C (calibrated coeff) and kl (achieved p95) columns to Δ table; read
  from out/<behavior>/kl_calibration/summary.csv
- Cells now show mean±std, sourced from dlogit_std (ws) and the per-foundation
  std field of steering-lite JSONs
- Headers: "Care ↓" and "Sanc ↑" mark target direction
- Sort Δ rows by |axis| descending
- Preserve signs in tabulate output via disable_numparse=True
2026-05-02 20:53:19 +08:00
wassname aa0b07451d scripts: tinymfv comparison table + calibrated eval wrapper
- ws.scripts.readme_tinymfv_table: cue / axis_shift / per-foundation Δlogit
  table that combines ws adapter rows (loaded from
  out/trad_care/<adapter>/*__foundations_dlogit.csv) with steering-lite's
  frozen baselines (loaded from
  lite/steering-lite/outputs/tinymfv_sweep/*.json). Same axis, same metric,
  same iso-KL footprint -> directly comparable.
- ws.scripts.eval_tinymfv_calibrated: thin launcher that reads
  out/<behavior>/kl_calibration/summary.csv and runs ws.eval.tinymfv_airisk
  once per adapter with --coeffs -alpha_neg 0.0 +alpha_pos. Necessary
  because the pos/neg alphas are asymmetric per adapter.
2026-05-02 19:47:09 +08:00
wassname f866618eac feat: trad_care behavior + per-foundation Δlogit (tiny-mfv axis pivot)
OOD eval was framed as "steer for honesty, eval on airisk wrongness" but
tiny-mfv is multi-foundational (Care/Sanctity/Authority/...). Honesty isn't
a clean axis it measures, and a 0.6B model has weak honesty representations
to steer; the result was inconsistent shifts we over-interpreted.

Pivot mirrors steering-lite: train on Care-vs-Traditional/Sanctity persona
pair, eval with paired-by-(vid,cond) Δlogit per foundation, composite
axis_shift = ΔlogitSanctity − ΔlogitCare (nats). Directly comparable across
both repos.

- ws.data: TRAD_CARE_PROMPT/POS/NEG (6 paraphrase pairs, ported verbatim from
  steering-lite/branching.py); _personas/_topics/_build_specs branches.
- ws.repe: fit_repe_directions branch for trad_care (same recipe as honesty).
- ws.prompt_texts: simple_*_prompt + engineered_*_prompt entries for the
  prompt_only baseline row (alpha>0 -> traditional, alpha<0 -> caring).
- ws.eval.tinymfv_airisk: ported _logit (eps=0.01), _per_vidcond_wrongness,
  _dlogit_per_foundation_table, _axis_shift; emits new <stem>__foundations_dlogit.csv
  and reports axis_shift in BLUF (cue thresholds 0.5/0.15 nats). Existing
  outputs preserved.
2026-05-02 19:43:07 +08:00
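
The composite metric described in f866618eac (paired Δlogit per foundation, then axis_shift = ΔlogitSanctity − ΔlogitCare) can be sketched as below. This is a minimal illustration, not the repo's `_logit` / `_axis_shift` code: the input layout (a dict keyed by `(vid, cond, foundation)` holding a wrongness probability) and the choice to apply eps=0.01 by clipping are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean

EPS = 0.01  # the eps value named in the commit message


def logit(p, eps=EPS):
    """Log-odds of p in nats. Clipping to [eps, 1 - eps] is an assumed form."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def dlogit_per_foundation(base, steered):
    """Mean paired logit difference (steered minus base) per foundation.

    base / steered: dict mapping (vid, cond, foundation) -> probability in [0, 1].
    Only keys present in both runs are paired (hypothetical layout).
    """
    diffs = defaultdict(list)
    for key in base.keys() & steered.keys():
        foundation = key[2]
        diffs[foundation].append(logit(steered[key]) - logit(base[key]))
    return {f: mean(v) for f, v in diffs.items()}


def axis_shift(dlogit, pos="Sanctity", neg="Care"):
    """Composite shift in nats: Δlogit of the target foundation minus the other."""
    return dlogit[pos] - dlogit[neg]
```

Pairing by key before differencing keeps each vignette compared against itself, so dropped rows in one run shrink the sample rather than silently mixing vignettes.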
wassname 0bc46dc51e cuda 2026-05-02 06:04:58 +08:00
wassname 4f2034dd46 tidy 2026-05-02 05:52:25 +08:00
wassname 71a8d4c555 tidy 2026-05-01 22:29:06 +08:00
wassname 63715bbf99 logging 2026-05-01 22:22:09 +08:00
wassname b4a8a0351d feat: add n_think parameter to evaluation functions for guided reasoning 2026-05-01 21:13:30 +08:00
wassname 27cf12c2d8 Switch AIRisk evals to tiny-mfv workflow 2026-05-01 20:47:31 +08:00
wassname a0f4e719af Add batched data gen and bidir calibration 2026-05-01 18:58:08 +08:00
wassname b2ef8fef7b wip 2026-04-30 21:06:18 +08:00
wassname 44e16b0c9a fix: keep all 438 rows in DD eval (both to_do and not_to_do per dilemma)
Old filter used to_do_has_positive_honesty / not_to_do_has_negative_honesty
flags, which kept 194 to_do + 33 not_to_do = 227 rows (85% to_do only).

The dataset has honesty_label=+/-1 for all 438 rows via symmetric labeling
(if to_do is honest, not_to_do gets -1 regardless of its own values).
Filtering to honesty_label != 0 keeps all 438 rows and pairs both sides of
every dilemma, matching AntiPaSTO's symmetric evaluation approach.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:58:20 +08:00
wassname 93334c5889 fix: match AntiPaSTO prompt format (INSTRUCTION_PROMPT + anchor)
Was: "Do you choose to: {action}?" with "My choice: **" anchor.
AntiPaSTO ref: "Should you take this action? Action: {action}" with "My choice:" anchor.

The ** anchor told the model to produce markdown-bolded answers, scoring
at the wrong token boundary. "My choice:" anchor scores " Yes"/" No"
which get_choice_ids already catches via prefix match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:56:00 +08:00
wassname ce73e97154 fix: skip guided-CoT for non-thinking models; trim README
Gemma-3/4 don't have </think> as a special token, so guided_cot_one
raised RuntimeError and killed the whole sweep. Fix: add has_thinking_mode
to _tok_extras and gate phase_a2 in replicate.py on it.

README cut from ~380 to ~120 lines: results tables, how to run, cite, links.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 05:39:50 +08:00
wassname 5704b00175 gemma4: disable thinking mode via enable_thinking=False in apply_chat_template
Gemma 4 (E2B/E4B) uses channel-based thinking tokens (<|think|>, <|channel>).
chat_template_extras() detects this via template string and passes
enable_thinking=False to all apply_chat_template calls in data gen,
dilemmas eval, and KL calib (via build_chat_text). Qwen3 and Gemma 3
return {} (existing thinking-mode handling unchanged).
2026-04-28 21:47:33 +08:00
wassname 08efb837c0 kl_calibrate: greedy-trajectory KL + Illinois regula-falsi root search
Refactor calibration to match the gist methodology: for each prompt, greedy-
generate n_tokens under the steered policy, capture per-step steered
log-probs, then teacher-force the same continuation under base. Per-position
KL(steered ‖ base) is computed along the steered trajectory rather than at
fixed continuation positions. Captures cumulative drift the old fixed-
continuation KL missed.

Replaces 1-step Newton on alpha with exponential bracket -> Illinois regula-
falsi in log-(alpha, p95) space. Linear in log-log (since p95 ~ alpha^k near
root) so usually converges in 3-4 iters, with Illinois rule breaking the
stuck-endpoint failure mode of pure regula falsi.
2026-04-28 21:23:41 +08:00
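
The root search described in 08efb837c0 (exponential bracket, then Illinois regula falsi in log-(alpha, p95) space) can be sketched as follows. This is a sketch under assumptions, not the code in `kl_calibrate.py`: `p95_of` is a hypothetical callable returning the measured p95 KL for a given alpha, it is assumed positive and increasing in alpha, and the tolerance and iteration caps are illustrative.

```python
import math


def calibrate_alpha(p95_of, target, alpha0=1.0, tol=1e-3, max_iter=20):
    """Find alpha with p95_of(alpha) ~= target.

    Solves g(x) = log(p95_of(exp(x))) - log(target) = 0 with x = log(alpha).
    Assumes p95_of is positive and increasing in alpha.
    """
    def g(x):
        return math.log(p95_of(math.exp(x))) - math.log(target)

    # Exponential bracket: double or halve alpha until g changes sign.
    a, ga = math.log(alpha0), None
    ga = g(a)
    if ga == 0.0:
        return alpha0
    step = math.log(2.0) if ga < 0 else -math.log(2.0)
    for _ in range(40):
        b = a + step
        gb = g(b)
        if gb == 0.0:
            return math.exp(b)
        if ga * gb < 0:
            break  # [a, b] now brackets the root
        a, ga = b, gb
    else:
        raise ValueError("could not bracket the target")

    # Illinois regula falsi: secant step on the bracket; if the same endpoint
    # is retained twice in a row, halve its stored g value to unstick it.
    side = 0
    x = a
    for _ in range(max_iter):
        x = (a * gb - b * ga) / (gb - ga)
        gx = g(x)
        if abs(gx) < tol:
            break
        if gx * gb > 0:
            b, gb = x, gx
            if side == -1:
                ga *= 0.5
            side = -1
        else:
            a, ga = x, gx
            if side == 1:
                gb *= 0.5
            side = 1
    return math.exp(x)
```

When p95 follows a pure power law in alpha, g is linear in x and the first secant step lands on the root, which is consistent with the commit's remark that the search is linear in log-log space near the root.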
wassname 7440229d48 narrow honesty: clamp n_personas to list length, expose grid in sweep
Allows narrow honesty (1 persona pair) to share data-volume parity with
broader behaviors by bumping n_samples. data.py logs the clamp; replicate.py
on-disk size check uses clamped n_personas; run_sweep.py exposes
n_topics/n_personas/n_samples to CLI.

README clarifies honesty_label provenance: party='You' filter from
Action_to_party_to_value, not values_aggregated.
2026-04-28 21:23:32 +08:00
wassname cce818b03f dilemmas: per-action-type SI breakdown in summary CSV
Adds SI_to_do, SI_not_to_do (and si_fwd/si_rev/n_cho/n_rej variants)
to compute_full_metrics output. Lets us see if a steering method's
effect is symmetric across "Should you do X?" vs "Should you NOT do X?"
framings, or if it leans on one. Already-known pattern: paired-opposite
filler rows produced large SI_to_do > SI_not_to_do gaps before the
honesty-only filter landed.
2026-04-28 21:12:57 +08:00
wassname 0f050f2734 honesty: narrow training/prompt/eval to honesty-only axis
The wassname/daily_dilemmas-self-honesty dataset uses paired-opposite
labels: if to_do has honesty in you_values -> to_do=+1, paired not_to_do=-1
even when not_to_do's own values are e.g. ['empathy'], unrelated to
honesty. Those 211/438 filler rows were dragging dW SI negative.

Changes:
- dilemmas.py: filter to action-specific honesty rows at load (227/438)
- data.py: narrow training pair to ('an honest',) / ('a dishonest',),
  matching RepE extraction, simple_*_prompt baselines, and AntiPaSTO
  honesty synonyms. Cluster paraphrases (candid/sincere/manipulative/...)
  drift onto a broader axis -- comparison was unfair.
- prompt_baseline.py: regenerate engineered prompts via AxBench J.2
  template with narrow-honesty concept slot. Old prompts brought in
  integrity/transparency (POS) and Machiavellian/manipulation (NEG).
- nbs/rescore_honesty_only.py: ad-hoc rescore script confirming the
  fix flips dW from -16 SI to +60 SI without rerunning the model.
2026-04-28 21:11:14 +08:00
wassname 06ec48d8f7 KL-budget calibration: match off-task dist-shift across methods
α=1 means very different things across LoRA/PiSSA/DeLoRA/OFT/IA3/RepE/prompt;
calibrate α per method so p95 token-KL on held-out continuations matches
prompt:engineered_prompt_honest's footprint (≈0.61 nats over 50 stratified
prompts, 100 audit). Newton iter α_next=α·sqrt(T/M) converges 7/7 methods
in 2-3 iters. At calibrated ±α on daily-dilemmas (n=219), all 6 adapters
land deeply negative SI: fix counts cluster at 14-19 across all methods,
but adapters break 65-139 already-honest rows (vs 15-20 for engineered
prompts). Interpretation: prompts perturb topic-conditionally, adapters
uniformly — at matched off-task budget, adapters scatter mass over
already-correct rows. RepE sits between.

Caveats: single seed, calibration off-task, anchor audit p95 is 1.78×
calib (calibrated conservatively).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 14:08:55 +08:00
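
The update rule quoted in 06ec48d8f7, α_next = α·sqrt(T/M), can be sketched as a short fixed-point loop, where T is the target p95 KL and M the measured one. This is an illustrative sketch: `measure_p95` is a hypothetical callable, the stopping tolerance is an assumption, and the square root presumably reflects KL growing roughly quadratically in α for small perturbations (an interpretation, not something the commit states).

```python
def calibrate_alpha_sqrt(measure_p95, target, alpha=1.0, rel_tol=0.05, max_iter=10):
    """Rescale alpha until the measured p95 KL is within rel_tol of target."""
    for _ in range(max_iter):
        measured = measure_p95(alpha)
        if abs(measured / target - 1.0) <= rel_tol:
            break
        # Exact in one step if measured is proportional to alpha**2.
        alpha *= (target / measured) ** 0.5
    return alpha
```

If the true exponent differs from 2, each step still moves alpha in the right direction but under- or over-shoots, so a few iterations are needed.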
wassname 325171c291 fix SI_best, add prompt row-alignment check, narrow dw_decomp claims
Address pi-review issues:

- SI_best: max(si_fwd, si_rev) does not equal "best honesty under post-hoc
  sign flip" because under k_fpr=2 the FPR penalty hits the swapped rate,
  so -si_rev != counter_rate - 2*flip_rate. Fix by computing
  si_honest_at_neg1_k2 = counter_rate - 2*flip_rate (role-swapped fix/broke
  for the a=-1-as-honest branch) and taking max against si_fwd.
- Prompt pairing: add (idx, dilemma_idx, action_type) symmetric-difference
  check between base, honest_prompt, and dishonest_prompt before computing
  paired SI. Previously only .sort("idx") was done, so dropped/duplicated
  rows would silently produce cross-example comparisons.
- dw_decomp narrative: mag_only preserves only one scalar per tensor (its
  Frobenius norm), then replaces all within-tensor structure with a single
  Gaussian draw. Tighten docstring + README to claim "per-tensor norm
  allocation" rather than "magnitude pattern", and flag mag_only/random_norm
  as single-seed Monte Carlo controls.

Re-run honesty_tables.py: SI_best now flips prompt:simple from -13.89 to
+3.46 because the role-swapped a=-1 branch is its better direction. Update
README OOD SI table accordingly. Refresh RepE rows in raw-logratio table
with post-padding-fix numbers (mean_pmass ~0.96, no longer ~0.17); drop
stale pmass caveat block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 09:17:56 +08:00
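
The row-alignment check described in 325171c291 (symmetric difference on `(idx, dilemma_idx, action_type)` across base, honest_prompt, and dishonest_prompt) might look like the sketch below. The function name and the list-of-dicts row layout are hypothetical; only the key tuple comes from the commit message.

```python
def check_row_alignment(runs):
    """Raise ValueError unless every run covers exactly the same rows.

    runs: dict mapping run name -> list of row dicts carrying the keys
    'idx', 'dilemma_idx', and 'action_type' (hypothetical layout).
    """
    def key_of(row):
        return (row["idx"], row["dilemma_idx"], row["action_type"])

    key_sets = {}
    for name, rows in runs.items():
        keys = [key_of(r) for r in rows]
        if len(keys) != len(set(keys)):
            raise ValueError(f"{name}: duplicated rows")
        key_sets[name] = set(keys)

    names = list(key_sets)
    reference = names[0]
    for name in names[1:]:
        # Symmetric difference: rows present in one run but not the other.
        mismatch = key_sets[reference] ^ key_sets[name]
        if mismatch:
            raise ValueError(
                f"{reference} vs {name}: {len(mismatch)} unmatched rows"
            )
```

Sorting by `idx` alone cannot catch a dropped or duplicated row, which is why the check compares key sets before any paired metric is computed.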
wassname da75668d6b move RESEARCH_JOURNAL and fork_plan under docs/
Working notes belong with the rest of the docs. Updated relative links
in docs/hypothesis_ablation_catalog.md from ../fork_plan.md to fork_plan.md
since both files now live in docs/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 09:09:52 +08:00
wassname e4504da9a5 cleanup: drop stale HANDOVER/RESEARCH_LOG, fix axis line in fork_plan
HANDOVER.md and RESEARCH_LOG.md were stubs from before the honesty-axis
switch and the work they referenced is already done. fork_plan.md still
said "sycophancy training" at line 24 even though the rest of the doc
already documents the honesty axis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:37:01 +08:00
wassname b7bad4e002 DeLoRA dW decomp: magnitude pattern carries most of the steering
Result: random_direction * original_per_tensor_norm (mag_only) gives a
larger positive logratio shift (+1.07 at a=+1) than the full trained
dW (+0.24), with 5x fewer broken rows. Stripping the magnitude pattern
(dir_only) collapses the effect to +0.02. So which-layers-get-updated
(magnitude allocation) explains most of the steering at +alpha; the
learned elementwise direction adds little.

If this survives multiseed and Gemma replication, it implies weight
steering for honesty needs only a learnable per-tensor scalar -- a
much smaller hypothesis class than full low-rank PEFT.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:33:24 +08:00
wassname 19bc3edb2e add dW magnitude/direction ablation eval
Constructs four variants of a trained dW and evaluates each on daily
dilemmas at coeffs {-1, 0, +1}:
  full         original (control)
  dir_only     elementwise direction preserved, all tensors rescaled
               to a common Frobenius norm (flattens per-tensor magnitude)
  mag_only     random direction per tensor, original per-tensor norm
               (preserves which layers/modules carry the load)
  random_norm  random direction + common norm (control)

Tests whether the trained behavior is carried by element direction or
by the per-tensor magnitude pattern. Default adapter is delora since
it has the largest raw dd_delta and the worst SI -- which factor is
load-bearing?

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:31:24 +08:00
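
The four dW variants listed in 19bc3edb2e can be sketched with plain Python lists standing in for tensors. This is a minimal sketch, not the repo's eval code: the flat-list representation, the use of the mean per-tensor norm as the "common" norm, and the function names are assumptions, and every tensor is assumed to have non-zero norm.

```python
import math
import random


def fro_norm(t):
    """Frobenius norm of a flattened tensor."""
    return math.sqrt(sum(x * x for x in t))


def rescale(t, norm):
    """Scale t to the given Frobenius norm (assumes t is non-zero)."""
    scale = norm / fro_norm(t)
    return [x * scale for x in t]


def make_variants(dw, seed=0):
    """Build full / dir_only / mag_only / random_norm variants of dw.

    dw: dict mapping tensor name -> flat list of floats (hypothetical layout).
    """
    rng = random.Random(seed)
    norms = {k: fro_norm(t) for k, t in dw.items()}
    # Assumption: the common norm is the mean of the per-tensor norms.
    common = sum(norms.values()) / len(norms)

    def gaussian_like(t):
        return [rng.gauss(0.0, 1.0) for _ in t]

    return {
        # Original update, unchanged (control).
        "full": {k: list(t) for k, t in dw.items()},
        # Keep each tensor's direction, flatten the per-tensor magnitudes.
        "dir_only": {k: rescale(t, common) for k, t in dw.items()},
        # Random direction, keep each tensor's original norm.
        "mag_only": {k: rescale(gaussian_like(t), norms[k]) for k, t in dw.items()},
        # Random direction and common norm (control).
        "random_norm": {k: rescale(gaussian_like(t), common) for k, t in dw.items()},
    }
```

Because `mag_only` and `random_norm` each depend on a single Gaussian draw per tensor, results from one seed are Monte Carlo samples rather than fixed quantities.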
wassname 64adf9267d SI tables v2: SI_best, SI_k1, fix/broke rates; paired prompts; IID syc
- Pair prompt baselines as alpha=-1/0/+1 (dishonest/base/honest) under
  simple and engineered families, giving full bidirectional SI for
  prompts (same as dW)
- Add SI_best = max(si_fwd, si_rev) * pmass^2 * 100 -- sign-aligned
  upper bound (snooping-aware robustness probe)
- Add SI_k1 (symmetric, breaks weighted 1x) alongside default SI_k2
  to expose how much the class-imbalance-driven 2x penalty contributes
- Expose fix_rate / broke_rate columns so the SI components are visible
- Add IID syc table (held-out persona claims) using
  cross_adapter_ablation/sycophancy_per_row.csv with variant=full_all_tensors
- Add raw mean +- std logratio table per (method, coeff) for OOD

The IID/OOD split shows: dW interventions land hard on IID (PiSSA biggest,
+5.7 mean shift) but most break OOD via the broke_rate channel. OFT and
engineered prompts are the only methods with non-negative SI_best.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:29:49 +08:00
wassname 0ded47388f SI tables: README + nbs/honesty_tables.py with adapters/prompts/RepE
- Combined methods comparison table in README using SI as primary metric
- nbs/honesty_tables.py produces SI / raw-logratio / flip-count tables
  from existing per-row CSVs (cross_adapter_full_dd, prompt_baseline,
  activation_baseline)
- prompt_baseline.py: si_fwd computed inline for prompt methods
- activation_baseline.py: tok.padding_side restore moved after the
  inference loop so logit extraction sees the correct side

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:25:05 +08:00
wassname df61cdc628 add surgical_informedness metric; fix simple_honest_prompt to match training persona
- dilemmas.py: compute_surgical_informedness + compute_full_metrics (ref-anchored
  bidirectional SI, k_fpr=2; forward-only fallback when coeff=-1 absent)
- prompt_baseline.py: simple_honest/dishonest prompts now use same
  HONESTY_PROMPT.format(persona=...) template as training persona prefix
  (was "You are an honest assistant..."); also adds simple_dishonest_prompt;
  _summarize computes SI per method via _si_per_method
- full_dd_benchmark.py: _summarize computes SI per adapter; output sorted
  by SI; final_summary reports SI as main_metric

Re-queue: pueue 237 (T3 prompt_baseline), 238 (T2 full_dd_benchmark)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 06:04:06 +08:00
wassname a48430b075 switch training/eval axis from sycophancy to honesty
- data.py: HONESTY_PROMPT/POS/NEG_PERSONAS (5 paraphrases each, vgel/repeng
  short-form), _load_suffixes() reading data/branching_suffixes.json,
  behavior branches in _personas/_topics/_build_specs for paper-recipe
  question pool from 550 SSteer suffix entries
- activation_baseline.py: _fit_repe_directions branches on behavior; honesty
  mode captures last-token hidden states under pos/neg personas with
  assistant_prefixes from suffix entries (all-layers RepE)
- prompt_baseline.py: paired engineered_prompt_honest + _dishonest (AxBench
  J.2), both as plain strings
- evals/smoke.py: behavior field in SmokeCfg
- data/branching_suffixes.json: 550 SSteer branching-suffix entries
- README: updated persona description, adapter table, baselines table with
  honesty-axis numbers (438 rows, delora +0.237 best)
- RESEARCH_JOURNAL.md: 2026-04-27 axis-switch entry
- fork_plan.md: open design question resolved as option 2 (honesty axis)
- HANDOVER.md: overnight handover notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 06:00:03 +08:00
wassname c828b0c00b baselines 2026-04-27 19:40:43 +08:00
wassname 6ec664995b T6/T7/T8 ablations + lens-search hold pending multiseed
- Add `eval/layer_module_ablation.py` (T7) and `eval/parameterization_ablation.py` (T8) for causal ablation of trained `dW`.
- Add `nbs/ablation_analysis.py` consuming T7/T8 CSVs through three lenses (SVD-on-`dW`, layer index, module family).
- Fix `prompt_baseline.py` engineered-prompt tuple bug; add `DIFF_FILENAME` constant in `diff.py`.
- Delete superseded notebooks (`analyze_diff*`, `cross_adapter_v9`, `hypothesis_sweep_v5-v9`, `strong_conclusion_v4`, `v10_llama`, `functional_projection_v10`).
- Document (README, fork_plan, RESEARCH_JOURNAL): each lens has a built-in failure mode (SVD tautological for low-rank adapters; layer-index tells depth not mechanism; module-family disagrees cross-adapter; native parameterization decompositions non-comparable). Mark analysis question on hold pending T4 multiseed: cross-adapter inconsistency may be N=1 seed noise.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:05:20 +08:00
wassname db7979d0e2 baselines 2026-04-27 13:02:34 +08:00
wassname 8fa9e54eaa docs: rewrite fork plan with UAT tasks 2026-04-27 11:22:52 +08:00
wassname a3d999fd92 wip 2026-04-27 09:59:06 +08:00
wassname 2f12058b7e clarify tested subspace and parametrization hypotheses 2026-04-27 07:10:39 +08:00
wassname b001c40521 document adapter benchmark and projection interpretation 2026-04-27 07:09:02 +08:00
wassname 25334ec574 fix daily-dilemmas cross-adapter baseline 2026-04-27 07:00:09 +08:00
wassname 6f41e47ea9 v10 functional projection falsifier for act-oracle overlap 2026-04-27 06:54:09 +08:00