
Microsoft's DeBERTa language model surpasses the human baseline on the SuperGLUE language understanding benchmark

via: cnBeta.COM     time: 2021/1/7 9:13:46

Training networks with millions of parameters has made great progress recently. Microsoft has now updated its DeBERTa (Decoding-enhanced BERT with disentangled attention) model and trained a version consisting of 48 Transformer layers with 1.5 billion parameters.
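As a rough sanity check on that figure, a back-of-the-envelope count of Transformer weights (attention plus feed-forward matrices per layer, ignoring biases and layer norms) lands near 1.5 billion if one assumes a hidden size of 1536 and a vocabulary of about 128k tokens for the 48-layer model; those two sizes are assumptions made for illustration, not figures reported in the article.

```python
# Rough parameter count for a 48-layer Transformer encoder.
# Assumed sizes (not stated in the article): hidden = 1536, vocab = 128k.
layers = 48
hidden = 1536
vocab = 128_000

per_layer = 4 * hidden ** 2 + 2 * hidden * (4 * hidden)  # attention + feed-forward weights
embeddings = vocab * hidden                               # token embedding table

total = layers * per_layer + embeddings
print(f"{total / 1e9:.2f} B parameters")  # ~1.56 B, in line with the reported 1.5 B
```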


The performance of the single DeBERTa model is greatly improved: its macro-average score on the SuperGLUE language understanding benchmark surpasses human performance for the first time (89.9 vs. 89.8), and the ensemble model surpasses the human baseline by a decent margin (90.3 vs. 89.8).

The SuperGLUE benchmark covers a wide range of natural language understanding tasks, including question answering and natural language inference. With a macro-average score of 90.8, the model also sits at the top of the GLUE benchmark rankings.
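For reference, the macro-average reported on these leaderboards is simply the unweighted mean of the per-task scores. A minimal sketch of that calculation, using the real SuperGLUE task names but made-up scores (not the DeBERTa leaderboard numbers):

```python
# Macro-average = unweighted mean of the per-task scores.
# Task names are the actual SuperGLUE tasks; the scores are illustrative only.
task_scores = {
    "BoolQ": 90.4,
    "CB": 95.7,      # tasks with two metrics first average those metrics
    "COPA": 98.4,
    "MultiRC": 88.2,
    "ReCoRD": 94.5,
    "RTE": 93.2,
    "WiC": 77.5,
    "WSC": 95.9,
}

macro_average = sum(task_scores.values()) / len(task_scores)
print(f"macro-average: {macro_average:.1f}")  # 91.7 for these made-up scores
```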

DeBERTa improves on state-of-the-art pretrained language models (such as BERT, RoBERTa, and UniLM) with three novel techniques: a disentangled attention mechanism, an enhanced mask decoder, and a virtual adversarial training method for fine-tuning.
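The disentangled attention mechanism is the most distinctive of the three: each token is represented by a content vector and by relative-position embeddings, and the attention score sums content-to-content, content-to-position, and position-to-content terms. The PyTorch sketch below illustrates the idea under simplified assumptions (single head, small made-up layer sizes, no masking, simplified handling of relative-distance signs); it is not Microsoft's implementation.

```python
import torch
import torch.nn as nn


class DisentangledSelfAttention(nn.Module):
    """Minimal sketch of DeBERTa-style disentangled attention.

    Each token has a content vector; relative positions have their own
    embeddings.  The attention score is the sum of three terms:
    content-to-content, content-to-position, and position-to-content.
    Layer names and sizes are illustrative, not Microsoft's code.
    """

    def __init__(self, hidden: int = 64, max_rel_pos: int = 8):
        super().__init__()
        self.hidden = hidden
        self.max_rel_pos = max_rel_pos
        # Projections for content vectors
        self.q_c = nn.Linear(hidden, hidden)
        self.k_c = nn.Linear(hidden, hidden)
        self.v_c = nn.Linear(hidden, hidden)
        # Projections for relative-position embeddings
        self.q_r = nn.Linear(hidden, hidden)
        self.k_r = nn.Linear(hidden, hidden)
        # One embedding per clipped relative distance in [-max_rel_pos, max_rel_pos]
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden) content representations
        b, n, d = h.shape
        qc, kc, vc = self.q_c(h), self.k_c(h), self.v_c(h)

        # Clipped relative distances and their embeddings, shape (n, n, hidden)
        pos = torch.arange(n)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_pos, self.max_rel_pos)
        p = self.rel_emb(rel + self.max_rel_pos)

        # (1) content-to-content
        c2c = qc @ kc.transpose(-1, -2)                       # (b, n, n)
        # (2) content-to-position: query content attends to relative positions
        c2p = torch.einsum("bid,ijd->bij", qc, self.k_r(p))
        # (3) position-to-content: relative positions attend to key content
        p2c = torch.einsum("ijd,bjd->bij", self.q_r(p), kc)

        scores = (c2c + c2p + p2c) / (3 * d) ** 0.5           # scaled by sqrt(3d)
        attn = scores.softmax(dim=-1)
        return attn @ vc


# Tiny smoke test on random inputs
layer = DisentangledSelfAttention()
out = layer(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```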

Compared with Google's T5 model, which consists of 11 billion parameters, the 1.5-billion-parameter DeBERTa is more energy-efficient to train and maintain, and easier to compress and deploy to applications in various environments.

DeBERTa's surpassing of human performance on SuperGLUE marks an important milestone toward general AI. Although the results on SuperGLUE are encouraging, the model is by no means reaching human-level intelligence in natural language understanding.
