Open-Assistant/inference/server
Andrew Maguire b60eb1e1ae minimal fastapi prom metrics (#1426)
2023-02-10 14:37:43 +01:00

OpenAssistant Inference Server

Workers connect to the /work endpoint over a WebSocket. Each worker sends its configuration, and if a task is available, the server returns it. The worker then performs the task and streams the result back to the server, also over the WebSocket.
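The worker side of this exchange might look roughly like the sketch below. It is only an illustration: the message fields (`type`, `config`, `prompt`, `text`), the `FakeSocket` class, and the `echo_tokens` generator are all hypothetical stand-ins, not the server's actual schema or API. An in-memory socket replaces the real connection so the flow can be run without a server.

```python
import asyncio
import json
from collections import deque


class FakeSocket:
    """In-memory stand-in for a WebSocket connection (hypothetical, for illustration)."""

    def __init__(self, incoming):
        self._incoming = deque(incoming)  # messages the "server" will send
        self.sent = []                    # messages the worker sent

    async def send(self, message):
        self.sent.append(message)

    async def recv(self):
        return self._incoming.popleft()


async def run_worker_once(ws, config, generate):
    # 1. The worker provides its configuration.
    await ws.send(json.dumps({"type": "config", "config": config}))
    # 2. The server returns a task (assumed here to be available immediately).
    task = json.loads(await ws.recv())
    # 3. The worker streams the result back piece by piece.
    for token in generate(task["prompt"]):
        await ws.send(json.dumps({"type": "token", "text": token}))
    # 4. A final message marks the end of the stream (assumed convention).
    await ws.send(json.dumps({"type": "done"}))


def echo_tokens(prompt):
    """Toy 'model' that yields the prompt's words one at a time."""
    yield from prompt.split()


def demo():
    ws = FakeSocket([json.dumps({"type": "task", "prompt": "hello world"})])
    asyncio.run(run_worker_once(ws, {"model": "example-model"}, echo_tokens))
    return [json.loads(m) for m in ws.sent]
```

Calling `demo()` returns the list of messages the worker sent: one configuration message, one message per token, and a closing message. A real worker would replace `FakeSocket` with an actual WebSocket client connected to /work.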

Clients first call /chat to create a new chat, then add messages to it via /chat/<id>/message. The response is an SSE (server-sent events) stream that sends tokens as they become available.
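On the client side, reading the token stream comes down to parsing server-sent events. The sketch below builds the message URL from the path given above and extracts the `data:` payload of each event. It assumes each event carries one token as plain text, which may not match the server's real payload format; the base URL and chat id are made-up examples, and fields other than `data:` are ignored.

```python
from typing import Iterable, Iterator


def message_url(base_url: str, chat_id: str) -> str:
    """Build the /chat/<id>/message URL for an existing chat."""
    return f"{base_url.rstrip('/')}/chat/{chat_id}/message"


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in an SSE stream.

    Events are separated by blank lines; lines starting with ':' are comments.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
    # An unterminated event at the end of the stream is discarded.


# Example with a hard-coded stream instead of a live HTTP response.
stream = ["data: Hello\n", "\n", ": keep-alive\n", "data:  world\n", "\n"]
tokens = list(iter_sse_data(stream))
url = message_url("http://localhost:8000", "abc123")
```

With a real HTTP client, the same parser could be fed the response body line by line, so tokens can be displayed as soon as each event arrives.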