Yannic Kilcher 1709dc0324 Initial implementation of the inference system (#869)
* very primitive implementation of inference

* re-worked with security in mind

* removed polling from clients

* switched workers to websockets

* implemented back and forth chats
2023-01-21 22:38:18 +01:00


# OpenAssistant Inference Server

Workers connect to the `/work` endpoint over a WebSocket. Each worker sends its
configuration, and if a task is available, the server returns it. The worker
then performs the task and streams the result back to the server, also over the
WebSocket.
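
The worker side of this exchange can be sketched as follows. This is only an illustration of the message order described above, not the server's actual wire format: the JSON message shapes (`prompt`, `type`, `text`), the `run_worker_once` function, and the `FakeConnection` stand-in are all assumptions made for the example. A real worker would pass the `send` and `recv` methods of an open WebSocket connection to `/work` instead.

```python
import json
from typing import Callable, Iterable, List


def run_worker_once(
    send: Callable[[str], None],
    recv: Callable[[], str],
    config: dict,
    generate: Callable[[str], Iterable[str]],
) -> None:
    """Handle one task over an already-open, WebSocket-like connection."""
    # 1. Announce this worker's configuration to the server.
    send(json.dumps(config))
    # 2. Wait for the server to hand out a task (hypothetical message shape).
    task = json.loads(recv())
    # 3. Stream each token back as soon as it is produced.
    for token in generate(task["prompt"]):
        send(json.dumps({"type": "token", "text": token}))
    # 4. Signal that the stream is finished (hypothetical message shape).
    send(json.dumps({"type": "done"}))


class FakeConnection:
    """In-memory stand-in for a WebSocket, used only to exercise the sketch."""

    def __init__(self, incoming: List[str]) -> None:
        self.incoming = list(incoming)
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)

    def recv(self) -> str:
        return self.incoming.pop(0)


# Usage: the "model" here simply splits the prompt into words.
conn = FakeConnection([json.dumps({"prompt": "hi there"})])
run_worker_once(conn.send, conn.recv, {"model": "demo"}, lambda p: p.split())
```

Passing `send` and `recv` as callables keeps the protocol logic independent of any particular WebSocket library.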

Clients first call `/chat` to create a new chat, then add messages to it via
`/chat/<id>/message`. The response is a Server-Sent Events (SSE) stream that
delivers tokens as they become available.
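
A client for this flow might look like the sketch below, using only the Python standard library. The SSE parsing in `iter_sse_data` follows the general event-stream format (events end at a blank line, `data:` lines carry the payload, and lines starting with `:` are comments). Everything specific to this server is an assumption: the use of `POST`, the `id` field in the `/chat` response, the `{"message": ...}` request body, and the helper names `_post_json` and `stream_reply`.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not followed by a blank line is incomplete and is dropped.


def _post_json(url: str, payload: dict):
    """Send a JSON POST request and return the open response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    return urllib.request.urlopen(request)


def stream_reply(base_url: str, message: str) -> Iterator[str]:
    """Create a chat, post one message, and yield the streamed payloads."""
    # Assumption: POST /chat returns a JSON object with an "id" field.
    with _post_json(f"{base_url}/chat", {}) as response:
        chat_id = json.load(response)["id"]
    # Assumption: the message is sent as {"message": ...}.
    url = f"{base_url}/chat/{chat_id}/message"
    with _post_json(url, {"message": message}) as response:
        yield from iter_sse_data(line.decode("utf-8") for line in response)


# Usage against a running server (base URL is a placeholder):
#     for payload in stream_reply("http://localhost:8000", "Hello"):
#         print(payload, end="", flush=True)
```

Keeping the parser separate from the HTTP calls means it can be checked against canned input without a running server.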