Baidu, the company that has gone "all in" on AI, today (January 17) released the Baidu AI Input Method in Beijing, i.e. version 8.0 of Baidu Input Method. Unsurprisingly, AI is still the highlight: the release brings one technological breakthrough, two new features, and a relative accuracy said to lead competing products by 20%.
For this release, Baidu invited the famous host Hua Shao, known as "China's Best Tongue", to emcee the event. Under his hosting, Baidu vice president and AIG head Wang Haifeng, Baidu Speech Technology director Gao Liang, and Baidu Input Method head Cai Yuting all took the stage.
Wang Haifeng reviewed the history of human-computer interaction: from punch cards to character terminals to the graphical interface, and then to the touch interaction brought by smartphones. He argued that the virtual keyboard is essential to a smartphone: "a phone can do without games, maps, or social apps, but it cannot do without a keyboard." Yet even with voice input available today, he said, that is still not enough; the input method of the future must be "full-sensory input", and it must be supported by AI technology.
Speaking of the product itself, Wang Haifeng told media including Lei Feng network that the input method is the bridgehead of Baidu's AI technology: new AI capabilities will be applied to the input method first, and more will follow over time to improve the human-computer interaction experience.
One technological breakthrough: the Deep Peak 2 model
Gao Liang, director of Baidu Speech Technology, then took the stage to unveil "Baidu's speech-technology breakthrough of the past six months", the Deep Peak 2 model:
Deep Peak 2's full name is LSTM- and CTC-based context-independent phoneme-combination modeling. It merges high-frequency phonemes into phoneme combinations and treats those combinations as the basic modeling units. Compared with earlier context-dependent modeling methods, Deep Peak 2 makes fuller use of the neural network's parameters and is more stable and accurate across varied speaking styles; it also decodes faster, improving the overall efficiency of speech recognition. Its relative accuracy currently leads the industry by 20%.
Gao Liang added that the model covers both Chinese and English in a single modeling scheme, giving the product a stronger ability to recognize mixed Chinese-English speech. The 20% relative-accuracy lead is based on results from a 1,400-item black-box test set, and compared with the previous version of Baidu Input Method, the Deep Peak 2 model did improve overall recognition accuracy.
Two new features: voice shorthand and AR emoji
Building on this breakthrough, the new version of Baidu Input Method introduces two new features: voice shorthand and AR emoji. Even before this release, Baidu Input Method already offered many voice-driven functions, such as voice-based text correction, real-time Chinese-English voice translation, whisper recognition, scene-specific voice recognition, voice-triggered emoji suggestions, and OCR scan input.
Voice shorthand has two modes, single-person and multi-person. Single-person mode suits note-taking, article writing, capturing ideas, and similar scenarios: the user can speak continuously without interruption while the audio is recorded for later revision. Multi-person mode suits one-on-one interviews and small meetings of 2-4 people, and uses voiceprint recognition to distinguish between different speakers.
At the event, host Hua Shao read out a 426-character mixed Chinese-English script in a matter of seconds, and the input method transcribed it in real time.
AR emoji is built on Baidu's face recognition and AR technology. Users can not only generate sticker packs from faces captured by the camera or taken from photo albums, but also drive a virtual avatar with their own facial movements. Once created, an AR emoji can be found directly through the input method's search and surfaced during both voice and keyboard input. Cai Yuting explained that Baidu Input Method wants to go beyond voice input through the microphone and let more of the "senses" take control, achieving multi-modal input.
At the same time, Baidu Input Method has partnered with holders of China's intangible cultural heritage such as Taohuawu, bringing traditional folk art like New Year woodblock prints into its emoji. By bringing classical imagery "to life", it aims to help pass on history and culture.
Lei Feng network learned that Baidu Input Method has now been online for eight years, with 400 million monthly active users and 250 million daily voice-input requests. Version 8.0 for Android is already live, and the iOS version is under review by Apple. Looking to the future, Cai Yuting believes the goal of Baidu AI Input Method is to hear, see, and understand what users want to express, and so improve their input efficiency.
That, she believes, will also be the biggest difference between Baidu's input method and those of other vendors.