AI Machine Learning & Data Science Research

NVIDIA’s ChatQA Reaches GPT-4 Performance Without Using Data From OpenAI GPT

In a new paper ChatQA: Building GPT-4 Level Conversational QA Models, an NVIDIA research team introduces ChatQA, a suite of conversational question-answering models that achieve GPT-4 level accuracies without relying on synthetic data from OpenAI GPT models.

The advances of ChatGPT (OpenAI, 2022) and its successors have significantly shifted how question-answering (QA) models are developed in both the production and research communities. Despite these strides, building a conversational QA model that matches the accuracy of state-of-the-art black-box models such as GPT-4 remains a formidable challenge.

Addressing this challenge, the NVIDIA team's ChatQA models reach GPT-4 level accuracy without relying on synthetic data distilled from OpenAI GPT models.

The researchers first unveil a two-stage instruction tuning method for ChatQA. In the first stage, the researchers employ supervised fine-tuning (SFT) on a combination of instruction-following and dialog datasets. This initial tuning imparts the model with the capability to effectively follow instructions as a conversational agent. The second stage, known as context-enhanced instruction tuning, is specifically designed to enhance the model’s proficiency in context-aware or retrieval-augmented generation in conversational QA.
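The two stages above can be illustrated with a minimal sketch of how training examples might be assembled: stage 1 pairs an instruction with a response, while stage 2 prepends user-provided or retrieved context and the dialog history so the model learns context-grounded answers. The template format and function names here are illustrative assumptions, not the paper's actual prompts or datasets.

```python
# Hypothetical sketch of example construction for the two tuning stages;
# the paper's exact templates and datasets are not reproduced here.

def build_stage1_example(instruction: str, response: str) -> str:
    """Stage 1 (SFT): a plain instruction-following pair, no extra context."""
    return f"User: {instruction}\nAssistant: {response}"

def build_stage2_example(context: str, dialog: list,
                         question: str, answer: str) -> str:
    """Stage 2 (context-enhanced tuning): the retrieved or user-provided
    context is prepended ahead of the conversation history."""
    history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in dialog)
    parts = [f"Context: {context}"]
    if history:
        parts.append(history)
    parts.append(f"User: {question}\nAssistant: {answer}")
    return "\n".join(parts)

example = build_stage2_example(
    context="ChatQA was introduced by an NVIDIA research team.",
    dialog=[("Who built ChatQA?", "An NVIDIA research team.")],
    question="What does it target?",
    answer="Conversational question answering.",
)
print(example)
```

Under this framing, the stage-2 model always sees the grounding passage first, which is what makes it suitable for retrieval-augmented conversational QA.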

Then they introduce a new dataset, HumanAnnotatedConvQA, aimed at significantly enhancing the language model’s ability to integrate user-provided or retrieved context for zero-shot conversational QA tasks.

In their empirical study, the team constructs a variety of ChatQA models based on Llama2-7B, Llama2-13B, Llama2-70B (Touvron et al., 2023), and in-house GPT-8B and GPT-22B models. They conduct a comprehensive analysis across 10 conversational QA datasets. In terms of average score, the ChatQA-70B model (54.14) outperforms both GPT-3.5-turbo (50.37) and GPT-4 (53.90) without relying on synthetic data from ChatGPT models.

Additionally, the researchers explore the “unanswerable” scenario, where the desired answer is not present in the provided or retrieved context. In such cases, the language model needs to generate a response like “cannot answer” to prevent misinformation. Notably, ChatQA-70B surpasses GPT-3.5-turbo in handling this scenario, although there remains a slight gap compared to GPT-4 (approximately 3.5%).
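The unanswerable scenario can be scored by checking that the model refuses exactly when the gold annotation marks a question as unanswerable. The sketch below is a minimal illustration of that idea; the refusal markers and function names are assumptions, not the paper's actual evaluation code.

```python
# Hypothetical scoring sketch for the "unanswerable" scenario: the model
# should emit a refusal such as "cannot answer" iff the question is
# annotated as unanswerable given the context.

REFUSAL_MARKERS = ("cannot answer", "can't answer")

def is_refusal(answer: str) -> bool:
    """Detect a refusal response via simple substring matching."""
    answer = answer.lower().strip()
    return any(marker in answer for marker in REFUSAL_MARKERS)

def unanswerable_accuracy(predictions: list, gold_unanswerable: list) -> float:
    """Fraction of cases where the model refuses exactly when it should."""
    correct = sum(is_refusal(p) == g
                  for p, g in zip(predictions, gold_unanswerable))
    return correct / len(predictions)

preds = ["Paris.", "Sorry, I cannot answer that.", "Cannot answer.", "42."]
gold = [False, True, True, False]
print(unanswerable_accuracy(preds, gold))  # 1.0
```

A production evaluation would likely use a more robust refusal classifier than substring matching, but the accounting is the same: penalize both hallucinated answers to unanswerable questions and unnecessary refusals.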

The paper ChatQA: Building GPT-4 Level Conversational QA Models is available on arXiv.


Author: Hecate He | Editor: Chain Zhang


