
Baidu's Wang Haifeng: Knowledge graphs are the cornerstone of AI

via: 博客园     time: 2017/11/16 15:03:15     reads: 629

Good morning to everyone who loves AI, follows AI, or works in AI. Many thanks to Ms. Yang Jing for giving me this opportunity to speak about artificial intelligence. My talk focuses on one specific area of AI: knowledge graphs.

In our view, knowledge is a very important cornerstone of AI. So today I will share some of our work on knowledge graphs.

"Science and technology are the primary productive forces." I believe everyone here knows this saying. Since the first industrial revolution in the 18th century, science and technology have brought the enormous forces of nature and the natural sciences into the production process, and productivity has risen greatly as a result. Higher productivity has in turn changed the relations of production, which has brought change to every aspect of society. The second industrial revolution, in the 19th century, took us into the electrical age, and the third, in the 20th century, took us into the information age. With each of these revolutions, technology has become more and more important to us. Today we are fortunate to be living through the fourth industrial revolution, whose core technology is artificial intelligence. Artificial intelligence has already penetrated every industry and every aspect of our lives: whether we are searching for information, browsing news feeds, navigating with a map, or translating, artificial intelligence is at work.

We can clearly see that investment in artificial intelligence and the scale of the industry are growing rapidly, both worldwide and in China, and rapid growth can be expected to continue. Artificial intelligence is very active in every field and in every direction.

In summary, we believe that artificial intelligence is a new productive force and, in the long run, the most important foundation for raising human productivity.

As you all know, Baidu started as a search engine. We began building it almost 18 years ago, and from the very first day it used some artificial intelligence techniques, such as natural language processing. Seven or eight years ago we began to invest in artificial intelligence more comprehensively. Starting from natural language processing and extending to speech, images, deep learning, machine learning, data mining, and more, we have today built up a relatively complete artificial intelligence portfolio.

The bottom layer holds the essential foundations of artificial intelligence: big data, powerful computing capacity, and powerful algorithms.

The technologies that actually simulate human abilities we split into two layers: the perception layer and the cognition layer. People perceive the world through their eyes, ears, and other senses, so we work on vision-related technologies such as image and video processing and AR/VR, and on hearing-related speech technologies such as speech recognition. Perception is not unique to people. Many animals have it too; some animals hear better than people, and some see better than people. Cognition, however, is peculiar to humans, and language is what distinguishes people from other animals. Knowledge, likewise, is an important foundation for people's continuous improvement. Beyond language and knowledge of the objective world, there is also interaction between people and the understanding of people themselves. These are the problems that cognition-layer technology needs to solve.

On this foundation we provide an open AI platform. Inside Baidu, we support a large number of applications through this platform. We have also opened the platform to the outside to build an AI ecosystem, and ultimately we deliver services to every user and enterprise through product applications.

If we want to search with a picture, image processing technology alone can easily find similar pictures on the Internet. But if the question is about the nutritional value of white wine, image processing is not enough; answering it requires knowledge, and Baidu draws on the knowledge behind the picture to answer such a question. The same is true of the next example. Speech technology can recognize a song, find it in the music library, and even bring up the album cover. But if you want to know who performed the song, speech technology alone is not enough; that requires knowledge and a knowledge graph.

So perception-layer and cognition-layer technologies may appear relatively independent, and each still has many problems to study and solve. But when we put them together, and especially when we give them knowledge, we can do far more. That is why I say that knowledge is the ladder of AI's progress. We all know Gorky's saying, "Books are the ladder of human progress," which carries two meanings. On the one hand, people learn more and keep progressing through reading. On the other hand, people who know more and can do more keep producing new knowledge, so more knowledge accumulates and is passed on. The ladder grows taller and taller, and people can climb higher and higher. The same holds for artificial intelligence. Artificial intelligence combined with knowledge becomes more powerful and can do more things. In turn, stronger artificial intelligence helps us mine, acquire, and accumulate knowledge from the objective world more effectively. Knowledge and the artificial intelligence system thus form a positive cycle in which both advance together.

We gather knowledge about the real world in many ways and use a variety of algorithms to turn it into a web-like graph of knowledge. This graph holds far more knowledge than any one person can store in their mind, and it is a powerful network that serves as the cornerstone of artificial intelligence applications.

To give an example, consider the "pyramid" that runs from data to information, to knowledge, to intelligence. Take the number 95. We all recognize it as a number, but what does it mean? Without more context, it is only a number: that is data. If I tell you that it is today's PM2.5 index, then 95 becomes a useful piece of information. But if I have no background knowledge and do not know what a PM2.5 index of 95 means, that information is of little value to me, because I cannot tell whether 95 is good or bad. Once I have that knowledge, I know that 95 means the air quality is probably good. Going one step further, I can conclude that with this index normal outdoor activity is fine, but sensitive people should reduce their outings. This is the progression from data to information to knowledge to intelligence.
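
The four levels in the 95 example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and of the index bands below only the 51-100 row mirrors what the talk says; the other rows are assumptions added so the lookup has something to compare against.

```python
# Illustrative index bands: (upper bound, category, advice).
# The 51-100 row mirrors the talk's example; the other rows are assumptions.
BANDS = [
    (50, "excellent", "outdoor activity is fine for everyone"),
    (100, "good", "normal outdoor activity is fine; sensitive people should reduce outings"),
    (150, "lightly polluted", "sensitive people should avoid long outdoor exertion"),
]


def to_information(value, metric="PM2.5 index", when="today"):
    """Data -> information: attach what the number measures and when."""
    return f"{metric} {when}: {value}"


def to_knowledge(value):
    """Information -> knowledge: place the value in a known band."""
    for upper, category, _advice in BANDS:
        if value <= upper:
            return category
    return "unclassified"  # beyond the bands this sketch covers


def to_intelligence(value):
    """Knowledge -> intelligence: turn the band into a decision."""
    for upper, _category, advice in BANDS:
        if value <= upper:
            return advice
    return "no advice available in this sketch"
```

Calling the three functions on 95 in turn walks up the pyramid: the bare number gains a label, then a category, then a recommendation.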


This is the Baidu knowledge graph. At the bottom layer we need basic storage, computing, and serving capabilities. The graph is mined from a very large amount of data, including data from the Internet, industry data, and log data, which is then normalized and integrated. At the same time, edges are built one by one in the graph, and the end result is a general-purpose knowledge graph and industry knowledge graphs. On top of these huge graphs sit basic operators for querying, labeling, computing, reasoning, predicting, and so on. Each product calls these operators to access the graph and thereby implement its capabilities.


That is still rather abstract, so let me show you a picture. This is a small part of Baidu's vast knowledge graph. Look at just one node in the middle, "The Rap of China". Many facts connect to this node, such as its performers, its music genre, and so on. After a few hops we have already traveled very far: on the right we reach Tu Youyou, the Chinese Nobel laureate, and on the left we reach many other people. A knowledge graph contains a great deal of knowledge, which plays a role in different applications. Of course, every node here has far more connections than I can show; if the screen were bigger, it would show more.

Returning to the abstract view, let us look at how big our knowledge graph is. Each node can be understood as an entity, whether it is a person, an object, or some other kind of thing. There are hundreds of millions of entities, with many edges between them: one entity may have tens, hundreds, or thousands of edges, so the number of relation combinations is enormous. Each edge constitutes a fact. For "The Rap of China", who performed in the show is one fact, and who organized it is another. Baidu's knowledge graph now holds hundreds of billions of such facts. We also support dynamic computation over the graph, covering dozens of application scenarios. Hundreds of data streams run concurrently every day, and the graph supports both second-level updates and multi-level queries.
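
The idea that every edge is a fact can be illustrated with a tiny in-memory store of (entity, relation, entity) triples. This is a minimal sketch and not Baidu's implementation: the class, the relation names, and the placeholder entities such as "performer A" are all hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy knowledge graph: each stored edge is one fact."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        """Record one fact as a labeled edge from subject to obj."""
        self._edges[subject].append((relation, obj))

    def query(self, subject, relation=None):
        """Return objects linked to subject, optionally filtered by relation."""
        return [
            obj
            for rel, obj in self._edges.get(subject, [])
            if relation is None or rel == relation
        ]

    def fact_count(self):
        """Total number of edges, i.e. facts, in the store."""
        return sum(len(pairs) for pairs in self._edges.values())


graph = TripleStore()
graph.add("hip-hop show", "has_performer", "performer A")   # placeholder names
graph.add("hip-hop show", "has_performer", "performer B")
graph.add("hip-hop show", "organized_by", "organizer X")
```

Here `graph.query("hip-hop show", "has_performer")` returns the two placeholder performers, and `graph.fact_count()` counts three facts. A production graph would add indexing in both directions, persistence, and typed entities.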

Here is an example of a general-purpose knowledge graph. This is a passage of text from Baidu's encyclopedia. Through natural language analysis and understanding, the text can be turned into a graph: the Milky Way, the Sun, the Earth, and so on are connected by many edges. This is one graph extracted from one source. The graph on the right was extracted from another article. The two graphs are similar but not identical because they come from different data sources. Common entities in particular may have thousands of related web pages, from which a great deal of knowledge can be extracted, so the knowledge has to be fused. Some of the data may also introduce errors, whether in the original source or in the analysis, so it must be verified to ensure the quality of the final knowledge graph.
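
One simple way to picture fusion and verification is majority voting across sources. The talk does not say which method Baidu uses, so the sketch below is an assumption from start to finish: it treats each relation as single-valued, counts how many sources support each value, keeps the clear winner, and sets ties aside for verification. The source names and triples are made up for illustration.

```python
from collections import Counter, defaultdict


def fuse(sources):
    """Fuse triples from several sources by majority vote.

    sources: dict mapping source name -> list of (subject, relation, value).
    Returns (fused, conflicts): fused maps (subject, relation) to the winning
    value; conflicts maps (subject, relation) to the tied candidate values.
    """
    votes = defaultdict(Counter)
    for triples in sources.values():
        for subject, relation, value in set(triples):  # one vote per source
            votes[(subject, relation)][value] += 1

    fused, conflicts = {}, {}
    for key, counter in votes.items():
        ranked = counter.most_common()
        best_value, best_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == best_count:
            conflicts[key] = sorted(counter)  # tie: needs verification
        else:
            fused[key] = best_value
    return fused, conflicts


SOURCES = {
    "article_1": [("Earth", "orbits", "Sun"), ("Sun", "part_of", "Milky Way")],
    "article_2": [("Earth", "orbits", "Sun")],
    "article_3": [("Earth", "orbits", "Moon")],  # simulated extraction error
}
fused, conflicts = fuse(SOURCES)
```

With these inputs the erroneous "Moon" value is outvoted two to one. Voting is only one possible design; weighting sources by reliability is a common refinement.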

Here is another example, this time of an industry knowledge graph: mobile data plans from a telecom operator. A data plan has many connected items, such as daily data, monthly data, and add-on data packs, from which a graph like this can be built. Beyond static entities, attributes, and relations, an industry also has business logic. For example, when you call a carrier's customer service line to sign up for a data plan, the agent asks which plan you want, a national plan or a local plan, and so on. After you choose one, you may go on to check your data usage or use other services, and that too is a complete process. This process is itself part of the industry knowledge graph. Combining the graph on the left with the process on the right, we have built an automated carrier service. Today, when we call one carrier's customer service line, a certain proportion of the conversations are in fact handled by Baidu's intelligent customer service robot.
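
The business process described above can be pictured as a small state machine stored alongside the graph. The sketch below is hypothetical: the state names, prompts, and options are invented to mirror the data plan example, and a real customer service robot would also need language understanding and error recovery.

```python
# Hypothetical dialogue flow: each state has a prompt and allowed replies.
FLOW = {
    "choose_scope": {
        "prompt": "Which data plan would you like: national or local?",
        "next": {"national": "choose_service", "local": "choose_service"},
    },
    "choose_service": {
        "prompt": "Would you like to subscribe or check usage?",
        "next": {"subscribe": "done", "check usage": "done"},
    },
}


def run_dialogue(flow, start, replies):
    """Walk the flow with scripted user replies.

    Returns (final_state, slots, prompts_asked). The walk stops early if a
    reply is missing or not one of the options for the current state.
    """
    state, slots, prompts = start, {}, []
    pending = iter(replies)
    while state != "done":
        node = flow[state]
        prompts.append(node["prompt"])
        reply = next(pending, None)
        if reply not in node["next"]:
            break  # a real system would ask again here
        slots[state] = reply
        state = node["next"][reply]
    return state, slots, prompts
```

For instance, `run_dialogue(FLOW, "choose_scope", ["national", "check usage"])` ends in the `"done"` state with both choices recorded.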

Those were some basic applications of the graph. Now for applications that involve some reasoning. For example, we ask, "How many days are there until Christmas?" This is not a hard question for a person. For a knowledge graph, though, the answer is not a piece of static knowledge, so we cannot answer directly from a fact stored in the graph. We need to work out today's date and the date of Christmas, and then do a simple calculation to get the correct answer.
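
The Christmas question comes down to a date lookup plus a subtraction. A minimal sketch using Python's standard `datetime` module follows; the function name is hypothetical, and rolling over to next year's Christmas once December 25 has passed is an assumption about the intended behavior.

```python
from datetime import date


def days_until_christmas(today):
    """Number of days from `today` to the next December 25 (0 on the day)."""
    christmas = date(today.year, 12, 25)   # static fact: Christmas is Dec 25
    if today > christmas:                  # already past: use next year's
        christmas = date(today.year + 1, 12, 25)
    return (christmas - today).days       # dynamic part: depends on today
```

On the article's publication date, `days_until_christmas(date(2017, 11, 16))` returns 39.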


The example on the right is more complicated. The user asks, "Can I bring a Zippo on a plane?" We need to know from the knowledge graph that a Zippo is a lighter and that civil aviation regulations do not allow lighters on board. One step of reasoning then yields the final answer: a Zippo may not be taken on the plane.
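
The Zippo answer combines a type fact with a rule. The sketch below shows that single inference step with two hand-written tables; the table names and the return format are assumptions, and the tables contain only what the talk states.

```python
# Facts and rule taken from the talk's example; the structure is hypothetical.
IS_A = {"Zippo": "lighter"}          # type fact: a Zippo is a lighter
PROHIBITED_ON_BOARD = {"lighter"}    # rule: lighters may not be carried


def type_chain(entity):
    """Return the entity followed by its successive types, guarding cycles."""
    chain, seen = [], set()
    while entity is not None and entity not in seen:
        chain.append(entity)
        seen.add(entity)
        entity = IS_A.get(entity)
    return chain


def can_bring_on_plane(entity):
    """Return (verdict, reasoning_path).

    verdict is False when the entity or one of its types is prohibited,
    and None when no rule applies (unknown is not the same as allowed).
    """
    chain = type_chain(entity)
    for index, category in enumerate(chain):
        if category in PROHIBITED_ON_BOARD:
            return False, chain[: index + 1]
    return None, chain
```

Returning the reasoning path along with the verdict lets the answer be explained: `can_bring_on_plane("Zippo")` yields `(False, ["Zippo", "lighter"])`.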


Baidu has been working on knowledge graphs for a long time, and the graph went online at large scale in 2014. Over the three years since then, this curve has climbed rapidly, growing about 160-fold, which shows that Baidu's search applications depend more and more on the knowledge graph.

In traditional search, you search for something and the search engine returns a page of 10 results. With the support of the knowledge graph, we can give users a more direct answer and present it in a friendlier way. In the first example, a search for "Hu Ge" returns results with pictures and text that bring together the common information you need. In the second, the user asks for "the weight of the sun"; although a web page with the answer can be found, it is better to give the figure directly. In the example on the right, the user searches for "Sun Li". Besides information about Sun Li herself, the results show related people, works, and so on. We recommend related films and TV series the user may be interested in, such as "Nothing Gold Can Stay" (那年花开月正圆); one tap in the interface takes you to that series' page.

The Chinese language is broad and profound, so we also have a knowledge graph specific to Chinese. For example, someone asks for the stroke order of "凹凸". I believe everyone can write these characters, but can everyone write them in the correct stroke order? The knowledge graph can show it directly. Most of us now use pinyin or voice input, and a character we cannot pronounce cannot be typed in pinyin. For Chinese, we therefore decompose characters so that they can be described in words. If you do not know how to read the character "怼", you can ask, "What is the character with 对 on top and 心 (heart) below?" Those with students at home may care about questions such as the multiple pronunciations of the character "好", or idioms; the knowledge graph can list these directly.

On the screen is an article published by Xinzhiyuan. We analyze the keywords and key entities mentioned in the article and label it accordingly: for example, the subject label is "artificial intelligence", the topic label is "deep learning", and there are other labels as well. We also build a model of each user, so we know which fields and topics the user cares about. With labels on both sides, we can recommend the right article to the right user. If the user's profile includes "IT elite", "Internet", and so on, this Xinzhiyuan article may be exactly what that user likes.
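
Matching labeled articles to labeled users can be sketched as a set-overlap score. The talk does not describe the actual scoring, so the Jaccard similarity used here is a stand-in, and the article titles, the second article, and the user's interest tags are hypothetical.

```python
def tag_overlap(article_tags, user_tags):
    """Jaccard similarity between two tag collections (0.0 when both empty)."""
    a, u = set(article_tags), set(user_tags)
    union = a | u
    return len(a & u) / len(union) if union else 0.0


def recommend(articles, user_tags, top_k=1):
    """Return up to top_k article titles with a non-zero overlap, best first."""
    scored = [(tag_overlap(tags, user_tags), title) for title, tags in articles.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # score desc, then title
    return [title for _score, title in scored[:top_k]]


ARTICLES = {
    "AI article": {"artificial intelligence", "deep learning"},
    "Cooking article": {"food", "recipes"},
}
USER_INTERESTS = {"artificial intelligence", "Internet"}  # hypothetical profile
```

With this profile, `recommend(ARTICLES, USER_INTERESTS)` picks the AI article, because it is the only one sharing a tag with the user.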

Take another example, from the NBA. Well-known NBA players such as James and Curry have many kinds of relations in the graph, not only their relation to their current team but also basic information such as height, weight, and achievements. Many people compare James with Bryant, and that comparison can be read off the graph. In some products, users ask questions such as what James has achieved or how he compares with Bryant. The machine can answer these questions because this knowledge sits behind it. And so, round after round, the conversation between machine and person can keep going.


When I was in high school, I watched the 1983 version of "The Legend of the Condor Heroes", and now there is a new 2017 version. We know that this video is the new version and that its theme song is "Jagged Heart". When we ask for similar videos, we find the 1983 version. The knowledge graph links all kinds of information, present and historical, in a criss-crossing web. Guo Jing in the 1983 version was played by Huang Rihua; if the user asks about Huang Rihua's other works, they will see "Tian Long Ba Bu". If they ask about the author of the novel, they will find Mr. Jin Yong. Extending step by step like this is like roaming through a huge graph. Each user cares about a different direction, and the exploration can continue in any direction.
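
The step-by-step roaming described above is a graph traversal. As a minimal sketch, a breadth-first search over a handful of labeled edges can recover the chain of hops from the 2017 series to "Tian Long Ba Bu". The node labels and relation names below are made up for illustration and cover only the facts mentioned in the talk.

```python
from collections import deque

# Labeled edges taken from the talk's example; names are hypothetical.
EDGES = {
    "Condor Heroes (2017)": [("similar_to", "Condor Heroes (1983)")],
    "Condor Heroes (1983)": [
        ("guo_jing_played_by", "Huang Rihua"),
        ("novel_written_by", "Jin Yong"),
    ],
    "Huang Rihua": [("also_appeared_in", "Tian Long Ba Bu")],
}


def find_path(edges, start, goal):
    """Breadth-first search; returns the list of hops or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None
```

Breadth-first search returns a path with the fewest hops, which suits "how are these two things related" questions; an interactive product would instead follow whichever edge the user asks about next.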

Those were some application examples, ranging from search to dialogue to recommendation. Although the AI boom is closely tied to the Internet, the impact of artificial intelligence reaches far beyond the Internet industry. It affects every industry and every aspect of our work and life. The report to the 19th National Congress also called for the deep integration of the Internet, big data, and artificial intelligence with the real economy, including industry, agriculture, finance, and other fields. In this process, if artificial intelligence is to serve these industries well, it must be customized for them and must have knowledge of them. So in addition to a general-purpose knowledge graph, we need industry knowledge graphs, which help these industries raise their productivity and upgrade.

Finally, let me summarize. Using AI technology, vast amounts of data, and interaction with users, we keep learning and accumulating more and more knowledge, both general knowledge and industry knowledge, so as to understand the world better. With it, we use artificial intelligence to improve our products, strengthen every industry, and make our lives better.

Thank you all!
