wassname

persona-steering-template-library

Measured persona prompt templates and contrastive persona pairs for steering experiments

Updated 2026-06-25 14:08:19 +08:00

steer-heal-love

Hypothesis: you can distill a steering vector into LoRA weights and "heal" the incoherence the vector injects by regularising the training (KL to base, or weight decay). Then loop and see what multiple rounds give you.

Updated 2026-06-24 20:50:29 +08:00

lora-lite

Python 0 0

A hackable, single-file-per-variant LoRA library built on PyTorch forward hooks.

Updated 2026-06-19 08:47:41 +08:00

evil_MoE

Python 0 0

Putting the E in MoE with an evil expert (can initial seeding cause follow-up unwanted behaviour to be absorbed into a MoE?)

Updated 2026-06-14 13:06:38 +08:00

grpo_proj2

Python 0 0

Updated 2026-06-01 14:30:20 +08:00

minicache

Python 0 0

Updated 2026-05-15 14:44:57 +08:00

isokl_steering_calibration

Python 0 0

Updated 2026-05-13 10:46:52 +08:00

weight-steering

Python 0 0

Updated 2026-05-05 08:12:41 +08:00

autoresearch_template

Python 0 0

Updated 2026-04-05 07:04:52 +08:00

greater_tables_project

Python 0 0

HTML tables from pandas DataFrames

Updated 2026-02-27 16:36:41 +08:00

alignment-handbook

Python 0 0

Robust recipes to align language models with human and AI preferences

Updated 2025-06-04 13:37:07 +08:00

SimPO

Python 0 0

SimPO: Simple Preference Optimization with a Reference-Free Reward

Updated 2025-06-02 13:26:08 +08:00

emergent-misalignment

Python 0 0

Updated 2025-04-27 15:37:14 +08:00

vllm

Python 0 0

A high-throughput and memory-efficient inference and serving engine for LLMs

Updated 2025-03-07 08:18:58 +08:00

xbsjsonedit

Python 0 0

A basic editor for xBrowserSync json backup files

Updated 2024-11-09 10:49:53 +08:00