alignment-handbook

mirror of https://github.com/wassname/alignment-handbook.git synced 2026-06-27 16:14:07 +08:00

Author	SHA1	Message	Date
wassname	6d128ea986	wip	2025-06-04 05:37:07 +00:00
wassname	a264efaa4c	better formating	2025-06-03 22:21:18 +00:00
wassname	097e4e0b01	wip	2025-06-02 22:31:52 +00:00
wassname	880d4eda1e	chat template fix	2025-06-02 07:27:46 +00:00
wassname	2819dd46d0	fmt	2025-06-02 07:13:52 +00:00
wassname	fc7d4ed451	configs	2025-06-02 06:20:04 +00:00
wassname	8708597941	wip	2025-06-02 05:51:13 +00:00
Loubna Ben Allal	ae3f44fc7d	Add Smollm2 pipeline (#205) * add smollm2 pipeline * update readme	2024-11-21 13:46:39 +01:00
Loubna Ben Allal	73dce0c35d	Add Smollm (#194) * add smollm * add to news	2024-08-19 08:47:20 +02:00
Kashif Rasul	95dc47218c	update API to use latest TRL (#182) * update API * update deepspeed * update black * remove unused import * fix typos * fix typos in readmes * fix grammer * removed as it exists in superclass * fixes in readme * Update README.md Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> * Update src/alignment/configs.py Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> * Update src/alignment/configs.py Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> * Update src/alignment/configs.py Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> * Update src/alignment/configs.py Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> * add back dataset_kwargs * use hub_model_revision in sft and dpo * fix duplicate --------- Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>	2024-07-30 09:16:25 +02:00
Chansung Park	606d2e954f	Add fsdp+qlora support (#160)	2024-05-08 15:08:13 +02:00
Alvaro Bartolome	70769f9e9b	Add `run_orpo.py` (#143) * Add `ORPOConfig` * Add `task=orpo` and support `(prompt,chosen,rejected)` datasets * Add missing `model_init_kwargs` and `dataset_num_proc` * Add `run_orpo.py` (WIP) * Update `trl` dependency from source * Add `setup_chat_format` before `apply_chat_template` * Add `config_full.yaml` for `mistral-7b-orpo` * Fix comment indentation * Use `chat_template=chatml` instead * Add `kaist-ai/mistral-orpo-capybara-7k` recipe * Rename `DPOTrainer` to `ORPOTrainer` in `config_full.yaml` files * Run `black --line-length 119 src` * Add `is_openai_format` to fix `(prompt,chosen,rejected)` formatting * Run `black --line-length 119 src` * Fix `isort` in `run_orpo.py` * Update `mistral-capybara/orpo/config_full.yaml` * Check if `test` is available split * Pin `trl` to `alvarobartt/trl` fork (debugging) * Add `qwen-capybara` recipe * Update `mistral-capybara` recipe * Set `add_generation_prompt=True` if `task="orpo"` * Reduce `logging_steps` to 10 * Unset `add_generation_prompt` when `task=orpo` * Add filtering based on prompt length Done similarly to the original implementation, in order to better reproduce their results * Fix prompt length filtering * Update `trl` pinned version * Remove extra outdate config files * Update `recipes/mistral-capybara/orpo/config_full.yaml` * Run `make style` * Activate BEAST MODE * Pin deps * Add readme * Fix dep --------- Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>	2024-04-11 16:02:20 +02:00
Sergei Bogdanov	c44cb1cd1d	fix: Zephyr LoRA fine-tuning fixed (#139) Co-authored-by: svbogdanov <sergei@numind.ai>	2024-03-21 19:28:31 +01:00
Bram Vanroy	595023faa4	Adding continued_pretraining task (#131) * add continued pretraining script * simplify config; add dataset_config option * add ds configs in data mixer creator * use extended sftconfig * add option to avoid setting chat template * fix data_configs bug * add continued pretraining info * add gpt2-nl recipe for continued pretraining example * add final newline * make style * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/gpt2-nl/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * rename continued pretraining to cpt * improve README --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-03-14 15:15:23 +01:00
lewtun	a9b8a50a27	🌟 (#135) * Add StarChat2 * Add DPO * Fix unit test * Typos * Typo	2024-03-12 17:22:21 +01:00
lewtun	ff618a4d13	🪁 (#129) * Add Gemma 7B recipe * Use Gemma template * Make it work for dolly lol * Enable cahce * Clean up * DPO to the max * DPO, DPO, DPO * Add openhermes * Add custom configs * Add kwargs * Fix config * Bump deps * Move old recipes * Add doc * Add norte * Renable cache * Nuke * Clean * Apply suggestions from code review Co-authored-by: Alvaro Bartolome <alvaro@argilla.io> * Fix isort * Update README.md * Update config_full.yaml --------- Co-authored-by: Alvaro Bartolome <alvaro@argilla.io> Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>	2024-03-01 17:29:42 +01:00
lewtun	87cc800498	Apply quantization during DPO QLoRA (#115) * Add QLoRA fix * Update script	2024-02-05 16:50:17 +01:00
Costa Huang	8df2271324	Constitutional AI recipe (#108) * cai * add training configuration * update readme * Update recipes/cai/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/cai/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/cai/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/cai/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/cai/README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * rename * update * rename * Quick change --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-02-01 07:02:19 -08:00
Edward Beeching	cbcb3f60fb	DPO/IPO/KTO ablations (#104) * adds configs and readme * cleaning config files * fix typos and removes things from config * updates text to use comparisons rather ablations * fix readme and adds launch script * fix launch script, adds blogpost link * bump release version, added missing dep, fixes configs * updates main readme file	2024-01-18 14:55:00 +01:00
lewtun	f0ffa0d7a6	Update Zephyr configs to account for UltraFeedback & TRL fixes (#88) * Add files * Add checkpointing * Add checkpointing to SFT * Add loss type * Fix setup\| * Clean SFT * Add lora config * Rename config * Remove max eval samples * Add kwargs tp push to hub * Add DPO configs * Fix dpo configs * Extend chat template test to multi-turn * Add warmup * Refactor * Fix LoRA -> QLoRA * Fix configs * Specify chat template * Add sample logging * Fix push to hub hanging * Add reentrant * Fix quality * Add transformer logging * Tweak grad acc * Add null type * Add doc	2024-01-10 17:42:24 +11:00
Evgenii Zheltonozhskii	e316174e1c	Add warmup to config (#71) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2024-01-04 16:04:46 +11:00
Thomas Capelle	f025057ce4	Missing config params on SFT (#31) * fix warmup with total number of steps * Explicitely tell to use 80GB Gpus * Revert "fix warmup with total number of steps" This reverts commit 760e477efdbf7f67be766a0d43b0c3b2ac26947a.	2023-11-21 12:00:09 +01:00
lewtun	4c6226bc42	Add moar explanations (#18)	2023-11-12 15:43:39 +01:00
Sebastian Schramm	d48a4a477b	Resolves #13 fix typo in zephyr recipe readme	2023-11-10 16:57:39 +01:00
Lewis Tunstall	e2e8ab945d	Refactor imports	2023-11-10 13:38:45 +00:00
Lewis Tunstall	edf67d1d93	Tweaks	2023-11-10 11:15:45 +00:00
Lewis Tunstall	a0b8d49424	Rename recipe	2023-11-10 10:49:13 +00:00
edbeeching	0f0b61c096	ups lora bs x grad_acc to 64	2023-11-10 09:30:54 +01:00
edbeeching	13141a4b0b	adds updated model paths, adds eval to sft scripts	2023-11-10 09:26:39 +01:00
Lewis Tunstall	4b0769d137	Fix links	2023-11-09 14:42:57 +00:00
Lewis Tunstall	44b324487d	Bump bs	2023-11-09 14:20:43 +00:00
Lewis Tunstall	756bb76d22	Fix Slurm opts	2023-11-09 14:09:52 +00:00
Lewis Tunstall	33a0ce3afd	Add more doc	2023-11-09 13:39:03 +00:00
edbeeching	3a5430222e	removes need for yq dep	2023-11-09 13:04:34 +01:00
edbeeching	49da3ef739	adds configs and instructions for lora training	2023-11-09 10:56:25 +01:00
Lewis Tunstall	2de17f5ba1	Add doc	2023-11-09 07:32:24 +00:00
Lewis Tunstall	e2c19a0252	Tweak	2023-11-08 23:09:16 +00:00
Lewis Tunstall	ee10c4efd9	Make DPO work!	2023-11-08 22:58:34 +00:00
Lewis Tunstall	e54e095978	Make it work for realz	2023-11-08 22:20:17 +00:00
Lewis Tunstall	d2900adc83	Make it work!	2023-11-08 16:31:57 +00:00
Lewis Tunstall	967eab4cfb	Add skeleton	2023-11-08 13:21:57 +00:00
Lewis Tunstall	8197fe1b1e	Update readme	2023-10-09 16:46:56 +02:00
Lewis Tunstall	8903d4aff8	Add skeleton structure	2023-08-29 09:33:26 +02:00

43 Commits