mirror of https://github.com/wassname/Open-Assistant.git synced 2026-06-27 16:10:30 +08:00

OpenAssistant Inference Server

Workers communicate with the /work endpoint over a WebSocket. Each worker sends its configuration, and if a task is available, the server returns it. The worker then performs the task and streams the result back to the server, also over WebSocket.
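The worker exchange described above could be sketched roughly as follows. This is a minimal illustration, not the server's actual protocol: the message shapes (`type`, `config`, `prompt`, `text`), the `worker_session` helper, and the `toy-model` name are all assumptions, and two `asyncio.Queue` objects stand in for the WebSocket connection to /work so the sketch runs without a server.

```python
import asyncio
import json


async def worker_session(send, recv, config, generate):
    """Run one worker exchange over a WebSocket-like channel.

    send/recv are async callables that move text frames; they stand in
    for a real WebSocket connection to /work. All message shapes here
    are hypothetical.
    """
    # 1. Announce the worker's configuration.
    await send(json.dumps({"type": "config", "config": config}))

    # 2. Read the server's reply; it carries a task only if one is available.
    reply = json.loads(await recv())
    if reply.get("type") != "task":
        return 0

    # 3. Stream the result back piece by piece, then signal completion.
    count = 0
    for token in generate(reply["prompt"]):
        await send(json.dumps({"type": "token", "text": token}))
        count += 1
    await send(json.dumps({"type": "done"}))
    return count


async def demo():
    # In-memory queues replace the socket: one per direction.
    to_server, to_worker = asyncio.Queue(), asyncio.Queue()
    await to_worker.put(json.dumps({"type": "task", "prompt": "hello world"}))

    sent = await worker_session(
        send=to_server.put,
        recv=to_worker.get,
        config={"model_name": "toy-model"},      # hypothetical config
        generate=lambda prompt: prompt.split(),  # toy "model": echo words
    )

    frames = []
    while not to_server.empty():
        frames.append(json.loads(to_server.get_nowait()))
    return sent, frames


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a real connection, `send` and `recv` would be the send and receive methods of whichever WebSocket client library the worker uses; keeping them as plain callables makes the exchange easy to test in isolation.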

Clients first call /chat to create a new chat, then add messages to it via /chat/<id>/message. The response is an SSE (server-sent events) event source, which sends tokens as they become available.
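On the client side, consuming the token stream mostly comes down to parsing SSE `data:` lines. The sketch below shows one way to do that with the standard library only. The helper names `message_url` and `iter_sse_data` and the base URL are assumptions for illustration, and it treats each event's `data` field as plain text, whereas the real server may wrap tokens in a structured payload.

```python
from typing import Iterable, Iterator


def message_url(base_url: str, chat_id: str) -> str:
    """Build the message endpoint for an existing chat (base_url is assumed)."""
    return f"{base_url.rstrip('/')}/chat/{chat_id}/message"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a server-sent events stream.

    `lines` is any iterable of text lines, for example the decoded lines
    of a streaming HTTP response body.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield "\n".join(data)
                data = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data.append(value)
    # An event not terminated by a blank line is dropped, as in the SSE format.


if __name__ == "__main__":
    stream = ["data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "\n"]
    print(message_url("http://localhost:8000", "abc"))
    print("".join(iter_sse_data(stream)))
```

A client would create the chat via /chat, send its message to the URL built by `message_url`, and feed the response lines into `iter_sse_data`, appending each payload to the text shown to the user.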