Deepfakes: How the Controversial Tech Gets Made

Altered videos known as deepfakes continue to make headlines. But how does a deepfake video actually get made? While it is widely known that the technology uses artificial intelligence, the process of creating a deepfake (a video in which one person’s face is superimposed onto another’s) is becoming increasingly streamlined online.

Deepfakes have stirred concern across many areas of society—and for good reason: Fear of the technology being used by a foreign adversary to interfere in an election has governments on edge, while everyday people have faced harassment after their own likenesses were used in compromising deepfakes.

The technology isn’t inherently good or bad, though. Just as with tools such as Photoshop, it all depends on how it is used. Deepfakes are frequently used by those seeking to create satirical or comedic videos. They can also be a helpful aid for educators or filmmakers: A company recently used the technology to give soccer star David Beckham the ability to warn the world about malaria in nine different languages.

But can anyone make a deepfake? Is the technology close to being within the grasp of all computer users or is the learning curve still steep enough to keep the hobby relegated to small online circles? That’s where “derpfakes,” one of the most prominent members of the deepfake community, gave us some clarity.

How a deepfake gets made

In this example, the face of Hollywood actor Robert Downey Jr., sourced from his role as Tony Stark in Iron Man, is placed onto the body of fellow actor Shia LaBeouf during his famous “just do it” rant.

The end product looks something like this.

Many deepfake creators say you need a high-end graphics card, time, and patience. Even then, a first attempt will almost certainly be of poor quality, as the skill of developing deepfakes can take months to master. And a deepfake can only be as good as the quality of its source material.

Derpfakes notes as a general rule that a creator can usually only obtain two of the following three things: quality, speed, and duration. A clip with a longer duration will require much more time from its creator to keep the quality high, whereas a short clip can achieve good quality in much less time.
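
The duration side of that trade-off comes down to simple arithmetic: every extra second of footage adds more frames that have to be extracted, swapped, and checked. A rough sketch in Python, where the 30 frames-per-second rate and the per-frame processing cost are illustrative assumptions rather than figures from derpfakes:

```python
def frame_count(duration_seconds: float, fps: float = 30.0) -> int:
    """Number of still frames in a clip of the given length."""
    return round(duration_seconds * fps)


def processing_minutes(duration_seconds: float, seconds_per_frame: float,
                       fps: float = 30.0) -> float:
    """Estimated processing time, assuming a fixed (hypothetical) cost per frame."""
    return frame_count(duration_seconds, fps) * seconds_per_frame / 60.0


# A 10-second clip versus a 2-minute clip at an assumed 30 fps.
print(frame_count(10))    # 300
print(frame_count(120))   # 3600
```

At the same frame rate, a clip that is 12 times longer means 12 times as many frames to get right, which is why short clips reach good quality so much sooner.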

1) Software

Although several tools for developing deepfakes are available, DeepFaceLab is often touted as the best for new creators as it works on the widely used Windows 10 operating system.

Once the software is downloaded, users must collect source material for the individuals in their deepfake. Photos can be gathered using Google Images, while high-quality videos can be found on YouTube. Derpfakes says you can also pull content from DVDs and Blu-ray discs.

The technology works best with celebrities and public figures who have lots of images and videos on the internet. With private citizens, it’s thankfully still difficult to produce deepfakes that would fool onlookers. One concern about the future is that a video of, say, the president ordering a nuclear strike could be created and cause an international incident. The other concern is that the technology will eventually evolve to the point where revenge porn becomes easier to produce via deepfakes.

In May, Samsung announced that its researchers could begin making deepfakes from a single image.

2) Pulling still frames from the videos

To prepare the program to analyze LaBeouf and Downey Jr.’s faces, a user must first extract still frames from the two videos. Still frames, in this case, refer to screenshots, or images, taken throughout both videos.
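
Outside of a dedicated deepfake program, this step is often done with a general-purpose video tool. The Python sketch below builds (but does not run) a command for ffmpeg, one such tool that the article itself does not name. The file names and folder names are hypothetical, and the output folders would need to exist before the command is run.

```python
import shlex
from typing import List, Optional


def extract_frames_command(video: str, out_dir: str,
                           fps: Optional[int] = None) -> List[str]:
    """Build an ffmpeg command that writes numbered PNG stills from a video.

    With fps=None ffmpeg keeps the clip's own frame rate; otherwise the
    "fps" video filter samples the clip at the requested rate.
    """
    cmd = ["ffmpeg", "-i", video]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]
    # %05d makes ffmpeg number the stills 00001.png, 00002.png, ...
    cmd.append(f"{out_dir}/%05d.png")
    return cmd


# Hypothetical file names for the two source clips.
print(shlex.join(extract_frames_command("downey.mp4", "frames_src")))
print(shlex.join(extract_frames_command("labeouf.mp4", "frames_dst", fps=10)))
```

Passing the returned list to `subprocess.run` would actually perform the extraction, assuming ffmpeg is installed.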

3) Extracting faces from the still frames

Next, another tool in the program is utilized to detect and extract images of just the faces from the newly generated still frames. This will allow the AI to scan and learn everything it can about the faces of LaBeouf and Downey Jr.
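
The detection itself is handled by the program, but the cropping that follows is easy to picture. The sketch below assumes a detector has already returned a bounding box for a face. It then widens the box slightly so the crop includes some context around the face, and cuts that region out of the frame. The box format, the margin value, and the tiny grayscale frame are all assumptions made for illustration, not details of any particular tool.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def expand_box(box: Box, frame_w: int, frame_h: int, margin: float = 0.25) -> Box:
    """Grow a face box by `margin` of its size on each side, clamped to the frame."""
    left, top, right, bottom = box
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (max(0, left - pad_x), max(0, top - pad_y),
            min(frame_w, right + pad_x), min(frame_h, bottom + pad_y))


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of a frame stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


# A stand-in 10x10 frame; each pixel value encodes its own row and column.
frame = [[r * 10 + c for c in range(10)] for r in range(10)]
face_box = expand_box((4, 4, 8, 8), frame_w=10, frame_h=10)
face = crop(frame, face_box)
print(face_box, len(face), len(face[0]))
```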

4) The AI’s learning process

With the face images gathered and analyzed, creators will then select what’s known as a training model to teach the program to properly place Downey Jr.’s face onto LaBeouf’s. Beginners often choose models that work much faster but at the cost of quality.

As the model is running, a preview window will appear, showing users a visual representation of the progress.

Knowing when to stop the process, derpfakes says, depends on numerous factors. The quality of the images shown in the preview window is a good indicator of how successful the training model has been. Most users choose to stop once the images appear to be holding at a consistent level of quality.
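
That judgment is visual, but the same idea of stopping once things hold steady can be written down numerically. The sketch below assumes the training tool logs some quality score at regular intervals, such as an error value where lower is better. The scores, the window size, and the tolerance are all hypothetical. It complements, rather than replaces, looking at the preview.

```python
from typing import Sequence


def has_plateaued(scores: Sequence[float], window: int = 5,
                  tolerance: float = 0.01) -> bool:
    """True when the last `window` scores differ by no more than `tolerance`."""
    if len(scores) < window:
        return False  # not enough history to judge yet
    recent = scores[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical error values logged during training.
early = [0.9, 0.7, 0.5, 0.4, 0.3]
late = [0.9, 0.5, 0.3, 0.21, 0.205, 0.204, 0.203, 0.202]
print(has_plateaued(early), has_plateaued(late))
```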

6) Merging the faces

After stopping the training program, users run another tool that replaces the target individual’s face in each still frame.

When the tool finishes, it will have produced still frames showing Downey Jr.’s face on LaBeouf’s body.
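
The article does not describe how the merging tool works internally. One common way to composite a generated face into a frame, shown here purely as an illustration, is to blend the two images through a mask: where the mask is 1 the new face shows, where it is 0 the original frame shows, and values in between soften the seam. The pixel values and the mask below are made up.

```python
from typing import List

Image = List[List[int]]    # grayscale pixel values, 0-255
Mask = List[List[float]]   # 0.0 keeps the original frame, 1.0 uses the new face


def blend(face: Image, frame: Image, mask: Mask) -> Image:
    """Per-pixel blend: mask * face + (1 - mask) * frame."""
    return [
        [round(a * f + (1.0 - a) * b) for f, b, a in zip(face_row, frame_row, mask_row)]
        for face_row, frame_row, mask_row in zip(face, frame, mask)
    ]


# A one-row example: full face, an even mix at the seam, untouched frame.
print(blend([[200, 200, 200]], [[100, 100, 100]], [[1.0, 0.5, 0.0]]))
# [[200, 150, 100]]
```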

7) Converting new face frames into video

To complete the deepfake, users must then simply convert the face-swapped frames into video format.
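
As with frame extraction, this conversion can be pictured as a single ffmpeg command, again a tool the article does not name. The Python sketch below only builds the command. The folder name, output name, and frame rate are hypothetical, the frame rate should match the original clip, and the result would be a silent video because the audio track is not handled here.

```python
import shlex
from typing import List


def frames_to_video_command(frames_dir: str, output: str, fps: int = 30) -> List[str]:
    """Build an ffmpeg command that encodes numbered PNG frames as an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),           # rate at which the stills are read
        "-i", f"{frames_dir}/%05d.png",   # 00001.png, 00002.png, ...
        "-c:v", "libx264",                # H.264 video encoder
        "-pix_fmt", "yuv420p",            # pixel format most players accept
        output,
    ]


print(shlex.join(frames_to_video_command("merged", "deepfake.mp4")))
```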

And there it is. Assuming the user has a high-quality graphics card, the entire process can be completed within several hours to a day of work. As with anything, mastering the skill takes time and effort.

“Learn from your mistakes and successes,” derpfakes said. “Experience is the most valuable tool with deepfakes. Don’t be too focused on settings and the numbers—use your eyes to tell you what is working and what isn’t.”

When asked about the future of the technology, derpfakes predicts that such videos will become increasingly mainstream.

“Moving forward I have no doubt that deepfakes will appear in more and more positive situations, but clearly there is the inevitability that it will be used negatively as well,” derpfakes says. “I suspect it won’t be quite as apocalyptic as some people think, but to ignore or fail to combat nefarious uses could potentially have more serious effects.”

Mikael Thalen