In a new paper to be published next month in the Association for Computing Machinery’s Transactions on Graphics journal, researchers from the University of California-Berkeley were able to train a deep neural network to copy human movements by simply feeding them YouTube videos, paving the way for better mimicry of people.
Humanoid characters on a computer simulation were able to do backflips, handsprings, and cartwheels after learning them from video clips through state-of-the-art techniques in computer vision and reinforcement learning.
Here’s one example.
Xue Bin Peng and Angjoo Kanazawa, two of the artificial intelligence researchers who developed the program, said in a blog that this is a departure from previous techniques which strongly restricted the behaviors which can be produced.
“Therefore, these methods tend to be limited in the types of skills that can be learned, and the resulting motions can look fairly unnatural. More recently, deep learning techniques have demonstrated promising results for visual imitation on domains such as Atari and fairly simple robotics tasks,” the pair said.
According to them, their system works by first predicting the pose of the subject of a video fed in the pose estimation stage. After this, the motion reconstruction stage collates pose predictions into a reference motion and fixes artifacts for smoother movement. Finally, the reference motion is passed to the motion imitation stage, where a character is trained to mimic motion using reinforcement learning.
Researchers were able to teach simulated characters more than 20 different skills like vaulting, jumping jacks, high kicks, pushing a box, dancing from side to side, running, and walking.
“The key is in decomposing the problem into more manageable components, picking the right methods for those components, and integrating them together effectively. However, imitating skills from videos is still an extremely challenging problem, and there are plenty of video clips that we are not yet able to reproduce,” Xue and Kanazawa said.
Unfortunately, one of the actions they cannot properly reproduce yet is the 2012 viral dance hit “Gangnam Style.”
“We still have all of our work ahead of us, and we hope that this work will help inspire future techniques that will enable agents to take advantage of the massive volume of publicly available video data to acquire a truly staggering array of skills,” Xue and Kanazawa said.