On March 18th, the first "AI NEXT" conference was held in Seattle, hosted by the Association of Technology and Innovation (ATI). Its main guests included Microsoft chief AI scientist Li Deng, Microsoft Technical Fellow Huang Xuedong, and Uber deep learning lead Luming Wang. Besides these Chinese-speaking experts, speakers also included Amazon Alexa chief scientist Nikko Strom, Microsoft Cortana architect Savas Parastatidis, and other industry figures.
The theme of the conference was exploring the potential of AI and applying AI technology to real projects and services, with special sessions on computer vision (CV), NLP, intelligent assistants, and deep learning frameworks. Microsoft chief speech scientist Huang Xuedong delivered a speech comprehensively introducing and organizing Microsoft's artificial intelligence business and its progress.
Lei Feng network compiled this article from the speech recording and slides.
Huang Xuedong joined Microsoft in 1993 and has led teams in the United States, Germany, Egypt, and Israel developing Microsoft's enterprise AI customer-service dialogue solutions, cognitive services such as cris.ai and luis.ai, the open-source deep learning toolkit CNTK, and other AI products and technologies. In February 2017, Huang Xuedong was named a Microsoft Technical Fellow, the highest honor for Microsoft technical staff.
Microsoft's AI business profile
In fact, the term "Artificial Intelligence" was coined at the 1956 Dartmouth workshop, so why is the technology only now on a path of rapid development? Huang Xuedong attributes this mainly to two factors: massive data and a substantial increase in computing power. At present, artificial intelligence work is concentrated in four areas: vision, speech, language, and knowledge (graphs); in the future, computers will be able to understand the world.
According to Huang Xuedong, Microsoft's current AI business can be divided into four blocks: agents (Agent), applications (Application), services (Service), and infrastructure (Infrastructure).
Agents: Cortana (Chinese name: Xiaona), Microsoft Xiaoice, Project Toronto (a customer service assistant)
Applications: Office 365, Dynamics 365 (an integrated CRM and ERP cloud services solution), SwiftKey (an input method), Pix (a camera app)
Services: Bot Framework, Cognitive Services, Cortana Intelligence, Cognitive Toolkit
Infrastructure: Azure, Azure N-Series (GPU + FPGA), FPGA (field-programmable gate arrays)
The History of Computer Language Technology
In 1954, IBM and Georgetown University demonstrated a very limited machine translation system for the first time.
In 1966, John Pierce published a highly critical report on "language technology".
In 1975, the US government was no longer funding machine translation or speech recognition, and development stalled.
In 1985, the "common task" method emerged, and researchers began to share data.
In 2007, Google's statistics-based translation service went online.
In 2011, Siri launched on the iPhone.
In 2016, Microsoft's speech recognition system reached human parity; the same year, Google released a neural network translation system supporting eight languages.
Although current neural network translation systems still make all kinds of amusing mistakes, Huang Xuedong believes that within the next few years computer translation may reach human (expert) level, just as speech recognition has.
Microsoft's achievements in speech recognition
In his speech, Huang noted that Microsoft Research was established in 1991, with the vision of giving computers the ability to see, hear, and speak. In 1993, Microsoft set up a Speech Group, hoping to make voice a mainstream interface between people and devices. Now this vision is being realized: a January article in The Economist that he cited argued that voice technology makes computers less daunting and easier to access.
Huang Xuedong said that in 1993, their conversational speech recognition had a word error rate (WER) as high as 80%. On September 14, 2016, however, the Microsoft speech team he leads achieved a WER as low as 6.3% on the industry-standard Switchboard speech recognition benchmark, beating IBM's 6.6% and setting the lowest error rate in the field at the time. Just one month later, on October 18, Huang Xuedong's team further reduced the WER to 5.9%, matching professional human stenographers for the first time.
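For readers unfamiliar with the metric: WER is simply the word-level edit distance between the recognizer's output and a reference transcript, divided by the number of words in the reference. A minimal illustration (this is a generic textbook computation, not Microsoft's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six: WER = 1/6 ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 5.9% WER thus means roughly six errors (substitutions, insertions, or deletions combined) per hundred reference words.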
This human-parity conversational speech recognition system uses 10 different DNNs (deep neural networks). As Lei Feng network understands it, the implementation works as follows: six different neural networks, including ResNet (residual networks) and LSTM (long short-term memory networks), first run in parallel; their results are then combined by four further neural networks, producing output at professional-stenographer level.
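The talk does not detail how the combination networks work; as a rough sketch of the general idea of system combination, the simplest possible rule is to average the per-frame class posteriors of several acoustic models before decoding (the actual Microsoft system learns its combination with additional networks). A hypothetical illustration:

```python
def combine_posteriors(model_outputs):
    """Average per-frame class posteriors from several acoustic models.

    model_outputs: list of [frames][classes] probability matrices,
    one matrix per model, all with identical shape.
    """
    n_models = len(model_outputs)
    n_frames = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    combined = []
    for t in range(n_frames):
        frame = [sum(m[t][c] for m in model_outputs) / n_models
                 for c in range(n_classes)]
        combined.append(frame)
    return combined

# Two toy "models" disagreeing on a single frame with three classes:
a = [[0.5, 0.25, 0.25]]
b = [[0.75, 0.125, 0.125]]
print(combine_posteriors([a, b]))  # [[0.625, 0.1875, 0.1875]]
```

The intuition is that different architectures (convolutional ResNets vs. recurrent LSTMs) make partly uncorrelated errors, so their combined estimate is more reliable than any single model's.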
But Huang Xuedong said that computer speech recognition still stops at transcription; truly understanding the semantics remains very difficult.
Microsoft Customer Service Assistant Toronto
Beyond entertainment, voice technology can also do professional work, such as technical support. Huang mentioned in the speech that in addition to the voice assistant Cortana and the chatbot Xiaoice, Microsoft has a customer service assistant project code-named "Toronto".
Toronto is an AI system based on deep reinforcement learning that can understand the context of a dialogue, making customer-service chatbots more human and more efficient.
According to the slides, Toronto can not only reply automatically and give suggestions, but also prompt the user to transfer to a human agent when it cannot solve a problem. In addition, it can help human agents quickly understand a user's information, suggest answers, transfer the conversation to other staff, and even record the session.
Of course, Huang Xuedong also said that, unlike speech recognition, these chat assistants do not yet have an established, effective training recipe.
Microsoft's progress in deep learning
Huang Xuedong has previously said that Microsoft's deep learning toolkit CNTK was in fact open-sourced earlier than Google's TensorFlow, but because it was not initially published on GitHub, relatively few people knew about it. According to benchmark results, CNTK outperforms Google's TensorFlow and Amazon-backed MXNet: under the same conditions, CNTK can process more samples per second.
In addition, Huang Xuedong cited ComputerWorld's evaluation results from February this year as evidence of Microsoft's deep learning strength. As the figure shows, Cognitive Toolkit v2.0 beta 1 stands out in performance, ease of development, and ease of deployment compared with Google's TensorFlow r0.10.
Advantages of Microsoft Cognitive Toolkit
At present, Microsoft's cognitive services APIs mainly cover the categories of language, speech, machine learning, vision, search, and knowledge. According to Huang Xuedong, the Microsoft Cognitive Toolkit has the following three major advantages:
Speed & scalability: the Microsoft Cognitive Toolkit trains and evaluates deep learning algorithms faster than other toolkits, and scales efficiently across different environments while maintaining accuracy.
Commercial-grade quality: built with sophisticated algorithms and large data sets.
Compatibility: it can be used from C++, Python, and other languages, and lets you customize the built-in training algorithms or even plug in your own.
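To make the last point concrete: a "training algorithm" in a toolkit is essentially a rule that turns gradients into parameter updates, and customizing it means supplying your own rule. The sketch below illustrates this plug-in pattern in plain Python (a generic illustration, not the Cognitive Toolkit's actual API; all names are hypothetical):

```python
def sgd_update(params, grads, lr=0.1):
    """A built-in-style learner: plain stochastic gradient descent."""
    return [p - lr * g for p, g in zip(params, grads)]

def train_linear(xs, ys, update_rule, steps=200):
    """Fit y = w*x + b by least squares, delegating updates to update_rule.

    Any function with the (params, grads) -> new_params signature can be
    passed in, which is what "use your own training algorithm" amounts to.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = update_rule([w, b], [gw, gb])
    return w, b

# Data generated from y = 2x + 1; the fit should recover w ≈ 2, b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys, sgd_update)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Swapping `sgd_update` for, say, a momentum or Adam-style rule changes the optimizer without touching the model or the training loop, which is the kind of extensibility the compatibility point describes.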