Google's DeepMind Creates Dataset With 300,000 YouTube Clips to Make AI Smarter

Even the most advanced artificial intelligence algorithms in the world have trouble recognizing some of the actions most associated with Homer Simpson.

DeepMind, the Google-owned artificial intelligence lab best known for defeating the world’s greatest Go players, has created a new dataset of YouTube clips to help AI systems learn the patterns that distinguish one human action from another. The massive set consists of 300,000 video clips covering 400 different action classes.

“AI systems are now very good at recognizing objects in images, but still have trouble making sense of videos,” a DeepMind spokesperson told IEEE Spectrum. “One of the main reasons for this is that the research community has so far lacked a large, high-quality video dataset.”

According to IEEE Spectrum, early testing of the “Kinetics Human Action Video Dataset” showed mixed results. The deep learning algorithm was up to 80 percent accurate in classifying actions like “playing tennis,” “crawling baby,” “cutting watermelon,” and “bowling.” But its accuracy dropped to 20 percent or less when attempting to identify some of the activities and habits associated with Homer: drinking beer, eating doughnuts, and yawning.

“Video understanding represents a significant challenge for the research community, and we are in the very early stages with this,” a DeepMind spokesperson said in a statement. “Any real-world applications are still a really long way off, but you can see potential in areas such as medicine, for example, aiding the diagnosis of heart problems in echocardiograms.”

DeepMind got help from Amazon’s Mechanical Turk, a crowdsourcing marketplace where companies pay people to complete small tasks. In this case, workers labeled the actions in the 10-second YouTube clips.

After confirming that the dataset was effective, the U.K.-based company checked it for gender imbalance. Past studies have shown that some datasets produce AI that fails to recognize certain ethnic groups. Preliminary results suggested this set of video clips did not have the same problem: DeepMind found that no single gender dominated in 340 of the 400 action classes. The classes that failed the test included shaving a beard, cheerleading, and playing basketball.

“We found little evidence that the resulting classifiers demonstrate bias along sensitive axes, such as across gender,” researchers at DeepMind wrote in a paper.

The company will now work with outside researchers to grow its dataset and continue developing AI that can better recognize what is happening in videos. The research could lead to uses ranging from suggesting relevant YouTube videos to users to diagnosing heart problems.

We have reached out to DeepMind to learn more about why Homer Simpson is causing such problems.

Update June 9, 5pm: A DeepMind spokesperson clarified that the dataset didn’t actually include videos of The Simpsons character, just actions he’s widely associated with. D’oh! We’ve updated our article accordingly.

H/T IEEE Spectrum

Phillip Tracy