Open-Assistant

mirror of https://github.com/wassname/Open-Assistant.git synced 2026-06-27 16:10:30 +08:00

Author	SHA1	Message	Date
Yannic Kilcher	b6a0eedf81	various fixes to the SFT code	2023-02-11 21:37:35 +01:00
Sotirios Anagnostidis	6d569a50dd	choice for empty eval	2023-02-11 13:21:02 +01:00
Sotirios Anagnostidis	6a68139b91	os private dataset	2023-02-11 13:20:42 +01:00
Sotirios Anagnostidis	5b1427d811	os_private	2023-02-11 13:19:09 +01:00
Sotirios Anagnostidis	ac97943be1	refactor datasets and oa private data selection	2023-02-11 11:49:58 +01:00
Sotirios Anagnostidis	23ee2f24d9	dataset args	2023-02-11 11:02:48 +01:00
Sotirios Anagnostidis	714677b5d8	add possible kwargs to datasets	2023-02-11 10:44:03 +01:00
Sotirios Anagnostidis	44ed44e05d	deactivate samples mixing by default	2023-02-11 10:33:25 +01:00
Sotirios Anagnostidis	631ba14b34	no wandb when debugging	2023-02-11 10:30:57 +01:00
Sotirios Anagnostidis	540a96fb0e	pre commits	2023-02-11 10:29:56 +01:00
theblackcat102	9e69117ead	[fix] Fix other PR merge bug	2023-02-11 01:53:37 +00:00
theblackcat102	bcebbbc49c	[merge] Fix conflict	2023-02-11 00:23:25 +00:00
theblackcat102	34347607d4	[fix] add comments for translation data	2023-02-11 00:08:34 +00:00
jack.butler	7036df8dc0	update link to relative link	2023-02-10 15:23:06 +00:00
jack.butler	3f9e2c31ac	add hyperlink to dummy yml file	2023-02-10 15:22:27 +00:00
jack.butler	2ebb336141	update readme	2023-02-10 15:18:31 +00:00
jack.butler	5de1c02a7a	add dummy yml config for reward	2023-02-10 15:18:03 +00:00
sanagnos	4dd0d67e9c	Merge pull request #1398 from jackapbutler/fix-tokenizer-match Add tests and update docstring to tokenizer matching	2023-02-10 11:37:00 +01:00
jack.butler	24b07523aa	add test for tokenizer matching behaviour	2023-02-10 09:47:20 +00:00
jack.butler	2fbf2fa457	add docstring info about tokenizer matching	2023-02-10 09:46:58 +00:00
jack.butler	090c5cbcc2	fix tokenizer matching and add tests	2023-02-09 18:47:38 +00:00
sanagnos	4ba622de8e	Merge branch 'main' into sft-data-sampling	2023-02-09 09:19:17 +01:00
theblackcat102	a1b90bf981	Merge branch 'main' into add-dataset	2023-02-09 01:28:42 +00:00
Mark Worrall	9faae250ce	minor tidy-up	2023-02-08 20:57:13 +00:00
Mark Worrall	283df8ec84	Get working on multi-gpu	2023-02-08 20:49:25 +00:00
Mark Worrall	e2caf53654	First version of single GPU sampling working	2023-02-08 08:01:04 +00:00
theblackcat102	59dbfea48f	Merge pull request #1262 from jackapbutler/create-tokeniser-configs Add tokenizer config classes	2023-02-08 09:34:57 +08:00
theblackcat102	2c35ff6e50	[fix] patch translated history conversation	2023-02-08 00:20:11 +00:00
hyunwoongko	cb722768f7	Apply pre-commit	2023-02-08 03:50:27 +09:00
hyunwoongko	44c555cad1	Add gelu fusion	2023-02-08 03:45:22 +09:00
jack.butler	eb1c4ada2a	rename arg name to model_name	2023-02-07 09:19:24 +00:00
theblackcat102	a39cbab524	[fix] transformers import error	2023-02-07 01:26:28 +00:00
theblackcat102	af1c62cd83	[feature] Add missing hindi and spanish prompt for translation	2023-02-07 01:23:54 +00:00
jack.butler	dc7f255f01	add tokenizer config classes	2023-02-06 17:58:55 +00:00
theblackcat102	742161590f	[feature] Add rallio new instruction dataset v3	2023-02-06 00:22:30 +00:00
Kian-Meng Ang	1e321a6fca	Fix typos (#1143) Found via `codespell -S .mypy_cache,yarn.lock,.json,.ipynb -L rouge,nam,vie`	2023-02-05 20:18:03 +01:00
theblackcat102	0be4d88605	[feature] Add OA private RM dataset	2023-02-03 15:07:05 +00:00
theblackcat102	8b2080559c	[fix] Custom collate_fn for training	2023-02-03 06:08:01 +00:00
theblackcat102	1041564db7	[feature] mix generation from different tasks	2023-02-03 00:15:29 +00:00
theblackcat102	9be4c921cd	[feature] Add OA translated QA	2023-02-01 22:33:37 +00:00
theblackcat102	f8eba68544	[feature] Add mix conversation augmentation	2023-02-01 22:14:11 +00:00
sanagnos	222059b1b2	Merge pull request #991 from LAION-AI/rm-anthropic Add anthropic RLHF dataset & deepspeed support for reward model	2023-01-29 13:41:55 +01:00
theblackcat102	0e024e3955	[fix] Add working A100 config for deberta-xxlarge (deepspeed stuck during evaluation, deadlock?)	2023-01-29 03:17:11 +00:00
theblackcat102	fdcc629678	[feature] add reddit eli5, asks, askh; bug fix	2023-01-29 03:05:56 +00:00
theblackcat102	def03d75d2	[fix] Trim anthropic dataset down to last 2 convo only	2023-01-28 03:44:38 +00:00
theblackcat102	f43435efc9	[feature] add deepspeed default stage 2 config	2023-01-28 00:56:52 +00:00
theblackcat102	2a2f34391a	[fix] Added support for deepspeed	2023-01-28 00:55:40 +00:00
theblackcat102	3215a7bbf8	[feature] add initial version of anthropic dataset	2023-01-25 05:03:43 +00:00
theblackcat102	b8990d9078	[fix] remove spaces in format_pair	2023-01-23 02:48:47 +00:00
theblackcat102	736f46fb00	[fix] prosocial dialogue format error	2023-01-22 14:00:20 +00:00

127 Commits