Search Based QA Research Report (#213)

* search based qa report
2026-06-27 16:10:30 +08:00 · 2023-01-01 15:08:53 -05:00
parent 29c6491d48
commit 4a754cd63d
1 changed files with 92 additions and 0 deletions
@@ -0,0 +1,92 @@
+# Cohere Grounded QA
+
+[Cohere AI created a question-answering chatbot](https://github.com/cohere-ai/sandbox-grounded-qa) that can
+
+1. Understand questions in the context of a conversation
+2. Search the internet for related information
+3. Identify which information in the search results is relevant to the question
+4. Synthesize the information into an answer to the question
+
+## Cohere API
+
+[Cohere's generate function](https://docs.cohere.ai/reference/generate): Continues a text prompt using either the `medium` or `xlarge` model.
+
+[Cohere's embed function](https://docs.cohere.ai/reference/embed): Embeds a list of strings using either the `small` or `large` model. Alternatively, you can specify the ID of a custom model and use that instead.
+
+## Grounded QA System
+
+Cohere's Grounded QA system makes 4 calls to the Cohere API:
+
+1. Get contextualized question as a query to Google ([code](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/model.py))
+
+   - Input: Chat History
+   - Output: Contextualized Question
+   - API Call: `cohere.generate`
+   - Model: `xlarge`
+   - [Prompt](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/prompt_data/get_contextual_search_query.prompt): Nine few-shot examples of (Chat History, Contextualized Question) pairs followed by the current chat history and the prompt "question: "
+
+2. Generate sample answer to compare with search results ([code](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/model.py))
+
+   - Input: Contextualized Question
+   - Output: Sample Answer
+   - API Call: `cohere.generate`
+   - Model: `xlarge`
+   - [Prompt](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/prompt_data/get_sample_answer.prompt): Some task instructions followed by 12 few-shot examples of (Contextualized Question, Sample Answer) pairs followed by the current contextualized question and the prompt "answer: "
+
+3. Get embeddings to rank search results by cosine similarity to sample answer ([code](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/search.py))
+
+   - Input: Sample Answer, Search Results
+   - Output: Embeddings of sample answer and all search result documents
+   - API Call: `cohere.embed`
+   - Model: `multilingual-22-12`
+
+4. Condition on the top 2 most similar search results and answer the question ([code](https://github.com/cohere-ai/sandbox-grounded-qa/blob/main/qa/answer.py))
+   - Input: Top 2 Search Results, Contextualized Question
+   - Output: Answer
+   - API Call: `cohere.generate`
+   - Model: `xlarge`
+   - [Prompt](https://github.com/cohere-ai/sandbox-grounded-qa/blob/43f3e9710112dcc8c92652ac1326ed9330823ddf/qa/answer.py#L25): Task instructions followed by the context and question.
+
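Steps 3 and 4 above amount to ranking documents by cosine similarity to the sample answer and keeping the top 2. The following is a minimal sketch of that ranking only, not Cohere's implementation: the function names are hypothetical, and the toy 3-dimensional vectors stand in for embeddings that would really come from `cohere.embed`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_documents(
    answer_embedding: Sequence[float],
    documents: List[str],
    document_embeddings: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, document) pairs most similar to the sample answer."""
    scored = [
        (cosine_similarity(answer_embedding, embedding), document)
        for document, embedding in zip(documents, document_embeddings)
    ]
    # Highest similarity first; the sort is stable, so ties keep search order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors (hypothetical); real embeddings have far more dimensions.
sample_answer_embedding = [1.0, 0.0, 0.0]
documents = ["doc A", "doc B", "doc C"]
document_embeddings = [
    [0.0, 1.0, 0.0],  # orthogonal to the sample answer
    [1.0, 0.1, 0.0],  # nearly parallel to the sample answer
    [1.0, 1.0, 0.0],  # 45 degrees from the sample answer
]

top_two = top_k_documents(sample_answer_embedding, documents, document_embeddings, k=2)
for score, document in top_two:
    print(f"{score:.3f}  {document}")
```

The two selected documents would then be passed, together with the contextualized question, into the final `cohere.generate` prompt.
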
+## Models
+
+Cohere's model documentation is sparse. The details below come from the linked model cards.
+
+### [xlarge](https://docs.cohere.ai/docs/generation-card#model-description)
+
+- Training Data: [`coheretext-filtered` dataset](https://docs.cohere.ai/docs/data-statement)
+  - 200GB of filtered text (3TB unfiltered) from the Google Books dataset, CommonCrawl, and text scraped by Cohere
+  - English documents only
+  - Filtered to remove "harmful, biased, or otherwise undesirable documents"
+- Model architecture: Generative Pretrained Transformer
+- Model Performance:
+  - Hellaswag Accuracy, Zero-Shot: 0.805
+  - PIQA Likelihood, Zero-Shot: 0.824
+  - Cohere also reported [safety benchmarks](https://docs.cohere.ai/docs/generation-card#safety-benchmarks)
+
+### [multilingual-22-12](https://docs.cohere.ai/docs/multilingual-language-models)
+
+- According to Cohere, the multilingual model was trained using dot product calculations (that is, with dot product as the similarity measure between embeddings)
+- Model Performance:
+  - Clustering: 51.0
+  - Search-English: 55.8
+  - Search-Multilingual: 51.4
+  - Cross-lingual Classification: 64.6
+  - According to Cohere, the multilingual model outperformed `paraphrase-multilingual-mpnet-base-v2` (Sentence-transformers), `LaBSE` (Google), and `Universal Sentence Encoder` (Google) in all of the above categories.
+
+## OpenAssistant for Grounded QA
+
+OpenAssistant may be able to fill the same role as the `xlarge` Cohere model in the grounded QA system if it can:
+
+1. Generate a contextualized question from a chat history
+2. Generate a sample answer to compare with search results
+3. Generate an answer conditioned on the top 2 most similar search results
+
+Perhaps these tasks could be turned into work packages and assigned to human annotators, who would create example inputs and outputs for each task.
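As an illustration of how annotated (input, output) pairs could be used, here is a rough sketch that assembles them into a few-shot prompt ending in `question: `, as the Cohere prompt for the first task is described as doing. The exact layout of Cohere's prompt files is not reproduced here; the function name, labels, and separators are assumptions.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    current_input: str,
    input_label: str = "chat history",  # hypothetical label
    output_label: str = "question",
) -> str:
    """Join (input, output) example pairs and leave the final output slot open."""
    blocks = [
        f"{input_label}: {example_input}\n{output_label}: {example_output}"
        for example_input, example_output in examples
    ]
    # The last block has no output, so the model continues from here.
    blocks.append(f"{input_label}: {current_input}\n{output_label}: ")
    return "\n\n".join(blocks)


# Hypothetical annotator-written examples for the contextualization task.
annotated_examples = [
    ("user: Who wrote Dune? user: When was he born?", "When was Frank Herbert born?"),
    ("user: I like hiking. user: What should I pack?", "What should I pack for a hike?"),
]

prompt = build_few_shot_prompt(
    annotated_examples,
    "user: Tell me about Mars. user: How far away is it?",
)
print(prompt)
```

The same helper could serve the sample-answer task by passing different labels, for example `input_label="question"` and `output_label="answer"`.
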
+
+OpenAssistant must also be able to identify when it is appropriate to search the internet. The Cohere system assumes every message from the user is a question and searches the internet for an answer. OpenAssistant would also need a way to indicate to an internal system that it "wants" to search the internet.
+
+Perhaps OpenAssistant could prefix every message it sends with a recipient ID. If it wishes to send a command to an internal system, it could prefix the message with something like `CMD:`, whereas if it wants to communicate with the user, it could prefix its message with `USR:`.
+
+This system may allow for flexible communication between OpenAssistant and one or more conversational systems.
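The prefix idea above can be sketched as a small router that splits each model message into a recipient and a body. This is only an illustration: the `route_message` name, the set of recipients, and the fallback of sending unprefixed messages to the user are assumptions, not part of any existing OpenAssistant interface.

```python
from typing import Tuple

# Hypothetical recipient IDs, following the CMD:/USR: suggestion.
KNOWN_RECIPIENTS = ("CMD", "USR")


def route_message(raw: str) -> Tuple[str, str]:
    """Split 'RECIPIENT: body' into (recipient, body).

    Assumption: a message without a known prefix is delivered to the user unchanged.
    """
    prefix, separator, body = raw.partition(":")
    recipient = prefix.strip().upper()
    if separator and recipient in KNOWN_RECIPIENTS:
        return recipient, body.strip()
    return "USR", raw.strip()


for message in (
    "CMD: search tallest mountain in Europe",
    "USR: Let me look that up for you.",
    "Hello there",
):
    recipient, body = route_message(message)
    print(f"{recipient} <- {body}")
```

An internal system would act on `CMD` messages (for example, by running a search) and forward only `USR` messages to the user.
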
+
+Examples of this prefix system would need to be taught to OpenAssistant through training data that contains such syntax. Perhaps such examples could be generated through the work packages system.