Google's AI has "watched" tens of thousands of movies so that one day it can understand human behavior

via: Cnblogs (博客园)     time: 2017/10/23 19:28:21


Last week, AlphaGo Zero, the "ultimate edition" of AlphaGo, surprised the world once again: teaching itself from scratch, it took 3 days to beat the version of AlphaGo that defeated Lee Sedol, and 40 days to become the strongest Go player in the world.

But that doesn't mean that artificial intelligence has the power to replace humans:

Robots can easily do things that humans only manage after the age of five, yet something as basic as learning to walk is still beyond them.

In short, recognizing human behavior is still a hard problem for AI, even though a four-month-old baby can already recognize all kinds of facial expressions.

Google wants its own AI to overcome this problem. It recently released AVA (Atomic Visual Actions), a new dataset of human actions that precisely labels the actions of multiple people in a video. The material studied is a huge body of video from YouTube.


(Image: YouTube)

According to the Google Research Blog, the samples analyzed in AVA come mainly from YouTube videos. Google collected a large amount of varied, long-form content, extracted a 15-minute segment from each video, and then divided each 15-minute segment into 300 non-overlapping 3-second segments, using a sampling strategy that preserves the order of actions and their temporal coherence.
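
The arithmetic behind this sampling is straightforward: 15 minutes is 900 seconds, and 900 / 3 gives 300 segments. Here is a minimal Python sketch of the split, assuming the start time of the 15-minute window is already known; the function and constant names are hypothetical and not part of any AVA tooling.

```python
from typing import List, Tuple

WINDOW_SECONDS = 15 * 60  # 15-minute window extracted from each video
SEGMENT_SECONDS = 3       # length of each non-overlapping segment


def split_window(window_start: int) -> List[Tuple[int, int]]:
    """Split one 15-minute window into consecutive 3-second segments.

    Returns (start, end) times in seconds, in playback order, so the
    temporal sequence of actions is preserved.
    """
    count = WINDOW_SECONDS // SEGMENT_SECONDS  # 900 // 3 == 300
    return [
        (window_start + i * SEGMENT_SECONDS,
         window_start + (i + 1) * SEGMENT_SECONDS)
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical example: the window starts 15 minutes into the video.
    segments = split_window(window_start=900)
    print(len(segments))   # 300
    print(segments[0])     # (900, 903)
    print(segments[-1])    # (1797, 1800)
```

Because each segment's end time equals the next segment's start time, the segments tile the window with no gaps and no overlap.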


(Example of bounding-box annotation on a 3-second segment; only one bounding box is shown.)

Next, annotators manually drew a bounding box around each person in the middle frame of every 3-second segment and, for each person, chose suitable labels from a vocabulary of 80 atomic actions (such as walking, shaking hands, and hugging) to describe what that person is doing.
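
Each labeled person therefore carries a bounding box plus one or more action labels. The Python sketch below shows one possible in-memory representation; the class and field names, the normalized-coordinate convention, and the integer action IDs are assumptions for illustration, not AVA's published file format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

NUM_ACTION_CLASSES = 80  # size of AVA's atomic action vocabulary
SEGMENT_SECONDS = 3      # each segment is 3 seconds long


@dataclass
class PersonAnnotation:
    """One person in a segment's middle frame (hypothetical schema)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed normalized to [0, 1]
    action_ids: Set[int] = field(default_factory=set)  # assumed IDs 1..80

    def __post_init__(self) -> None:
        x1, y1, x2, y2 = self.box
        if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
            raise ValueError(f"invalid bounding box: {self.box}")
        if not self.action_ids:
            raise ValueError("each person needs at least one action label")
        for a in self.action_ids:
            if not 1 <= a <= NUM_ACTION_CLASSES:
                raise ValueError(f"action id out of range: {a}")


@dataclass
class SegmentAnnotation:
    """All labeled people in the middle frame of one 3-second segment."""
    video_id: str
    segment_start: int  # seconds from the start of the video
    people: List[PersonAnnotation] = field(default_factory=list)

    @property
    def middle_frame_time(self) -> float:
        # Labels are attached to the frame at the segment's midpoint.
        return self.segment_start + SEGMENT_SECONDS / 2


if __name__ == "__main__":
    # Hypothetical video ID and action IDs, for illustration only.
    seg = SegmentAnnotation("example_video", 903, [
        PersonAnnotation((0.1, 0.2, 0.4, 0.9), {12, 17}),
    ])
    print(seg.middle_frame_time)  # 904.5
```

Storing labels as a set lets one person carry several simultaneous actions, which is what makes the co-occurrence analysis described below possible.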

Google divides these actions into three groups: pose/movement, person-object interaction, and person-person interaction. So far, AVA covers 57,600 video segments, with 96,000 labeled people performing actions and a total of 210,000 action labels.


Because AVA labels every action of every person in each video clip, its data also show that people rarely do just one thing at a time.

Using the people who carry at least two action labels, it is possible to measure how often different actions occur together; the AVA write-up calls these co-occurrence patterns.

In AVA's data, fighting and martial arts, kissing and hugging, and singing and playing musical instruments are common co-occurrence patterns.
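
Counting co-occurrences only requires looking at people with two or more labels and tallying every pair of actions they share. The sketch below uses raw pair counts on made-up labels; the co-occurrence score AVA actually reports may be normalized differently, so treat this as an illustration of the idea rather than the dataset's own statistic.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def co_occurrence_counts(people_labels: Iterable[Set[str]]) -> Counter:
    """Count how often each unordered pair of actions is given to the same person.

    People with fewer than two labels contribute no pairs.
    """
    counts: Counter = Counter()
    for labels in people_labels:
        if len(labels) < 2:
            continue
        # sorted() gives each pair a canonical order, e.g. ("hug", "kiss").
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up labels; not real AVA data.
    people = [
        {"sing", "play instrument"},
        {"sing", "play instrument", "stand"},
        {"hug", "kiss"},
        {"walk"},
    ]
    for pair, n in co_occurrence_counts(people).most_common(3):
        print(pair, n)
```

Raw counts favor actions that are common overall; a real analysis would typically normalize by how often each action appears on its own.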

To cover as wide a range of human behavior as possible, AVA draws on films and TV series from different countries and genres, which may also be meant to avoid gender and racial bias. Back in 2015, Google Photos was criticized for labeling two Black people as "gorillas".


(Image: Twitter)

Google will also make the dataset public. The ultimate goal is to improve the "social visual intelligence" of AI systems, so that they can understand what people are doing and even predict what they will do next.

Of course, that goal is still a long way off. As Google software engineers Chunhui Gu and David Ross wrote in their introduction to AVA:

Teaching machines to understand human actions in videos is a fundamental research problem in computer vision, essential to applications such as personal video search and discovery, sports analysis, and gesture interfaces.

Despite exciting breakthroughs over the past few years in classifying and finding objects in images, recognizing human actions remains a big challenge.

Although Ke Jie has remarked that "for AlphaGo's self-improvement, humans are redundant," the human brain has some 80 billion neurons and 100 trillion connections, and building AI neural networks that reach the brain's level of cognition is no easy task.

At present, progress in computer vision is concentrated mainly on static images.


Google has used deep learning algorithms in place of manual recognition since 2006, and Google Photos can now recognize cats and dogs and sort them automatically.


(From A to B: the full process by which Google's Creatism system turns a Street View image into a finished work)

Google's AI researchers are using artificial intelligence to turn Google Street View imagery into professional-style photographs, with some of the results judged to be on par with the work of professional photographers.


(Image: Motherboard)

Face ID on the iPhone X may also make facial recognition more widespread on smartphones. Even Pornhub, the world's largest adult site, has announced that it is introducing AI to detect the content and performers of videos on the site, letting algorithms classify adult films by content and by performer.


(Image: The New Yorker)

By contrast, recognizing dynamic human behavior is far harder for computers. The cover of the latest issue of The New Yorker recently went viral on US social media; its cover story, "Dark Factory," under the theme "Welcome to the Future," describes how more and more human work is being taken over by robots.

Robots can do more and more, but the article also shows that many tasks that look simple are still beyond them, such as opening a box or untying a knot; in a robotics lab at Brown University, a robot only recently learned to pick petals.


(Image: The New Yorker)

For now, the most direct use of AVA, Google's multi-person action dataset, is likely to help YouTube process and review the huge number of videos uploaded every day, and to serve advertisers better.

Google has suffered in the past from its inability to identify video content precisely. An article in Wired revealed that Google's automated ad placement system put some ads next to videos spreading hatred and terrorist propaganda, leading large customers such as Walmart and PepsiCo to abandon Google's advertising platform.


With 90% of its revenue coming from advertising, Google could not ignore the problem. Its main response has been to hire a large pool of temporary workers to monitor and label all kinds of video content, with the labels also serving as training data for its AI.

Besides the high labor cost, some argue that these temporary workers' unstable employment and lack of communication with Google could hurt the accuracy of the AI's recognition.

So if Google's AI becomes a strong enough learner, these temporary workers could all be out of work in the near future, and the technology's applications will certainly not stop there.

As AI comes to understand people better and better, debates about the ethics of artificial intelligence may become more intense.

Some images and figures: Google Research Blog
