Open-domain question answering (QA) is a benchmark task in natural language understanding (NLU) that mimics how people look for information: they find answers to questions by reading and understanding text. For example, given a question expressed in natural language ("Why is the sky blue?"), a QA system should be able to read a web page (such as the Wikipedia page on diffuse sky radiation) and return the correct answer, even when that answer is somewhat complicated and lengthy.
However, there are currently few publicly available collections of naturally occurring questions (i.e., questions posed by people seeking information) and answers that can be used to train and evaluate QA models. The reason is that assembling a high-quality question answering data set requires both a large source of real questions and a great deal of human effort to find the answers.
Natural Questions, the question answering data set released by Google, helps fill this gap. Put simply, Google collects real, anonymized queries from its own search engine and pairs them with Wikipedia pages to provide training data for QA systems. In the process, annotators read the entire Wikipedia page to find the answer to each query and provide two kinds of answer: a long answer covering all the relevant information and a short answer.
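As a rough illustration of the annotation scheme described above, one record could be modeled as a question, the Wikipedia page it was paired with, a long answer, and zero or more short answers. This is a minimal sketch only: the class name `QAExample`, its field names, and the helper `has_short_answer` are hypothetical and do not reflect the actual file format of the released data set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAExample:
    """Hypothetical container for one annotated question (not the official schema)."""
    question: str                 # the query as a user typed it
    page_title: str               # Wikipedia page the annotator read
    long_answer: str              # passage covering all the relevant information
    short_answers: List[str] = field(default_factory=list)  # may be empty


def has_short_answer(example: QAExample) -> bool:
    """Return True when the annotator supplied at least one short answer."""
    return len(example.short_answers) > 0


# Placeholder record built from the article's own example question.
example = QAExample(
    question="Why is the sky blue?",
    page_title="Diffuse sky radiation",
    long_answer="<passage selected by the annotator>",
    short_answers=[],  # assumed here: an explanation question with no concise short answer
)
```

Keeping the short answers as a list that may be empty lets the same structure represent questions whose answer only makes sense as a longer passage.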
Currently, the data set contains 300,000 naturally occurring questions with answers, annotated with an accuracy of 90%. In addition, Natural Questions includes 16,000 examples in which the answers to each question are provided by five different annotators. According to Google, these examples can be used to evaluate the performance of QA systems.
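To make the role of the multiply annotated examples concrete, here is a small sketch of how a system's predictions might be scored against several annotators' answers. It assumes a simplified rule in which a prediction counts as correct if it matches any one reference after light normalization; this is an illustrative assumption, not the official Natural Questions metric, and the function names and toy data are hypothetical.

```python
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def matches_any(prediction: str, references: List[str]) -> bool:
    """Assumed rule: correct if the prediction equals at least one annotator's answer."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)


def accuracy(predictions: List[str], references_per_question: List[List[str]]) -> float:
    """Fraction of questions whose prediction matches some annotator."""
    if not predictions:
        return 0.0
    correct = sum(
        matches_any(pred, refs)
        for pred, refs in zip(predictions, references_per_question)
    )
    return correct / len(predictions)


# Toy placeholder data (not taken from the data set): two questions, five references each.
toy_predictions = ["Answer A", "answer x"]
toy_references = [
    ["answer a", "Answer A", "answer a", "answer b", "answer a"],
    ["answer y", "answer y", "answer z", "answer y", "answer y"],
]
score = accuracy(toy_predictions, toy_references)
```

Collecting several annotators' answers per question means that a system is not penalized for choosing a valid answer that one particular annotator happened not to give.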