How Facebook's AI will identify and label individuals in photos and video

Facebook knows what you look like. It also knows what your friends look like, and can identify all of you in a group photo and automatically send that photo to those friends right from your camera roll.

That barely scratches the surface of what Facebook is capable of thanks to its artificial intelligence technologies. The company is training computers to do all sorts of things, like translate captions containing slang and shorthand, let visually impaired individuals “see” photos, and even recognize people in video along with the timestamps at which they appear.

At F8, Facebook’s annual developer conference, Joaquin Quiñonero Candela, director of Applied Machine Learning at Facebook, took the stage to discuss some of the new AI technologies Facebook is developing that will improve search, translation, and image recognition.

He positioned it all as exciting developments in AI that improve the Facebook experience, but it was hard to miss the creep factor associated with Facebook’s capacity to recognize the billion people who use its service.

AI experiments

Facebook is working on improving translation, training computers to recognize more than the standard language found in an appliance manual or in other training material that doesn’t reflect a language that’s constantly changing and evolving. Because Facebook users around the world speak in slang, abbreviated language, and emoji, the translation technology must also be trained to recognize these differences.

“Facebook is all about human-to-human language,” Candela said, meaning that it’s not always textbook conversation people have on the platform. It’s working on technology to translate almost everything.

The company is also developing tech that will let you search images by identifying things within the photo. Say you were looking for a photo from New Year’s Eve 2012, but didn’t want to slog through your archive to find it. Facebook will let you search for something like “fireworks,” and the image recognition technology will identify all your photos that have fireworks in them.

The tech is able to recognize individual bits of your photo, mapping things like trees, snow, people, dogs, lakes, buildings, or anything else that might appear in a photo. Facebook demonstrated how this technology can change the way people interact with Facebook when it launched automatic alt-text earlier this month.

Now, users with visual impairments can listen to Facebook automatically describe what’s in an image: “Three people standing in the snow.”

This is possible because of something called image segmentation, technology that identifies what is in an image at the pixel level. It can break down an image into different segments (person, tree, dog, beach) and understand where the sky ends and the ocean begins.

Through image segmentation, Facebook is building something called “talking pictures,” making it possible to “see” pictures by touching them. When a person touches a photo in a place a computer has identified as “human,” it will say, “Human 1.” This continues with other individual entities within a photograph.
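To make the idea concrete, here is a toy sketch of how a touch could be mapped to a spoken label once an image has been segmented. It is not Facebook's implementation: the tiny grid, the segment ids, and the `label_at` helper are all assumptions made up for illustration.

```python
# Hypothetical segmentation output: each cell holds a segment id.
# A toy 4x6 "image" for illustration, not Facebook's actual data format.
SEGMENT_MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 2, 0],
    [3, 1, 3, 3, 2, 3],
    [3, 3, 3, 3, 3, 3],
]

# Hypothetical mapping from segment id to the label read aloud.
SEGMENT_LABELS = {0: "Sky", 1: "Human 1", 2: "Human 2", 3: "Snow"}


def label_at(mask, labels, row, col):
    """Return the label for the touched pixel, or None if the touch is out of bounds."""
    if 0 <= row < len(mask) and 0 <= col < len(mask[row]):
        return labels.get(mask[row][col])
    return None


# Touching the pixel at row 1, column 1 lands on the first person.
print(label_at(SEGMENT_MASK, SEGMENT_LABELS, 1, 1))
```

Because every pixel already carries a segment id, answering a touch is a simple lookup; the hard part is producing the mask in the first place.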

Video is also getting amplified through AI. Facebook will be able to categorize videos based on subjects within them, and yes, that includes your face, too.

This type of tech may eventually be used when people are searching for Live video on a particular topic or location. Even if the streamer doesn’t name a video in a particular way, Facebook will be able to recognize information about it automatically.

While on stage, Candela showed off the possibilities of image recognition in video. In one demonstration, two Facebook engineers filmed themselves and Facebook recognized them immediately. It also recognized CEO Mark Zuckerberg and CTO Mike Schroepfer when the two of them walked by. Then, once the video was shot, the timestamps where people appear were tagged, too.
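One way to picture that timestamp tagging is as an index from each recognized person to the moments they appear in each video. The sketch below is purely illustrative and assumes recognition has already produced `(video_id, person, seconds)` records; the function names and sample data are hypothetical, not anything Facebook described.

```python
from collections import defaultdict


def build_person_index(detections):
    """Map each person to {video_id: sorted list of timestamps in seconds}."""
    index = defaultdict(lambda: defaultdict(list))
    for video_id, person, seconds in detections:
        index[person][video_id].append(seconds)
    for videos in index.values():
        for stamps in videos.values():
            stamps.sort()
    return index


def find_videos(index, person):
    """Return (video_id, first_appearance_seconds) pairs for a person, sorted by video id."""
    videos = index.get(person, {})
    return sorted((vid, stamps[0]) for vid, stamps in videos.items())


# Hypothetical detections, standing in for the output of a face recognizer.
DETECTIONS = [
    ("video_a", "Person A", 42.0),
    ("video_a", "Person B", 5.5),
    ("video_b", "Person A", 3.0),
    ("video_a", "Person A", 12.5),
]

index = build_person_index(DETECTIONS)
print(find_videos(index, "Person A"))
```

With an index like this, a search for a person returns every video they appear in along with the point to jump to, without scanning the footage again.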

Putting all of this together paints a somewhat creepy reality: Eventually, Facebook’s AI will have the power to search through videos and identify every individual within them.

“Imagine the power this could give you to search through tens of millions of videos and find specifically the ones that contain the people you’re interested in seeing,” Candela said. “And jump straight to the place where they appear.”

Yes, imagine the power indeed.

This is sounding more and more like ‘Minority Report’

Selena Larson