What Are Deepfakes and Why Is It So Hard to Spot Them?
Deepfakes are AI-generated videos, audio clips, and text that look deceptively real. Here is the latest on the technology.
If you have only a couple of minutes to spare, here is what you should know about deepfakes:
Deepfakes are AI-generated media (fake videos, audio clips, images, or text) that look strikingly realistic and closely imitate a real person or event. Because the technology behind them has advanced dramatically, they can be very hard to spot.
The technology behind deepfakes started being developed in the late 1990s. Significant improvements were made in subsequent decades, and the progress has accelerated in recent years.
Deepfake videos use a combination of autoencoders and generative adversarial networks, though it is possible to create a deepfake video with just one of these deep learning systems. Both rely on learning the basic features of an image, creating a photo-realistic reconstruction of it, and using a feedback system to judge how realistic the result is.
Deepfake audio uses advanced voice cloning, text-to-speech or speech-to-speech conversion, and voice synthesis techniques. Text deepfakes use transformer-based language models like GPT to generate text mimicking real authors.
Deepfakes have several positive use cases in business and marketing, entertainment, writing, fashion, etc. In these industries, deepfakes are used in multiple ways to make basic processes efficient and drive down expenses.
At the same time, deepfakes pose serious threats. People can use them to create revenge porn, unauthorized pornographic images and videos of celebrities and minors, impersonations and other fraud, fake news, and stock manipulation.
Some artificial intelligence companies have started working on identifying deepfakes and limiting their malicious use. But there is still a long way to go before they can reliably flag deepfakes and prevent bad actors from creating them.
This post is sponsored by Multimodal, a NYC-based development shop that focuses on building custom natural language processing solutions for product teams using large language models (LLMs).
With Multimodal, you will reduce your time-to-market for introducing NLP in your product. Projects take as little as 3 months from start to finish and cost less than 50% of newly formed NLP teams, without any of the hassle. Contact them to learn more.
Watch this music video by the Pulitzer-winning rapper Kendrick Lamar.
There is nothing strange about the song itself, but the footage is. Apart from Lamar, the video features Will Smith, Kanye West, Jussie Smollett, O.J. Simpson, Kobe Bryant, and Nipsey Hussle, all rapping to the beat. Yes, Lamar could have gotten some of them to appear in the video and at least lip-sync to the song, but there is no way he brought Kobe Bryant back to life.
This video is not real. It’s fake — or more precisely — a deepfake.
Deepfakes are nothing new — they have been in the picture for decades. If you are a regular social media user, odds are you have encountered or created some simplistic ones of your own.
In this article, we will explore deepfakes in-depth, understand the technology behind them, and learn about the opportunities they present and the threats they pose.
What Are Deepfakes? — All You Should Know
Deepfakes are broadly defined as different types of media (video, audio, or textual) that have been synthesized using advanced artificial intelligence. These synthetic media often do not look fake because of how closely they resemble the actual person or thing.
Watch this deepfake video of Barack Obama:
And this convincing deepfake of Tom Cruise talking about meeting Mikhail Gorbachev, playing golf, and performing a magic trick:
Or listen to the fake audio clips generated in this one:
These are all pretty good examples of deepfakes. None of them are real, but they certainly look real to the human eye.
Today, there are both free-to-use and paid tools facilitating the creation of deepfakes. Social media platforms like Snapchat have already released filters like "face swap" to the public. At the same time, GitHub has open-source software ready to assist professionals in creating deepfake videos and other manipulated media.
The word deepfake first appeared in 2017, when a Reddit user calling themselves "deepfakes" used Google's open-source software to create pornographic videos with celebrity faces swapped in. But the groundwork for the kind of deep learning needed to create deepfakes had been laid long before that.
Here is a timeline of the development of deepfakes:
1997
Christoph Bregler, Michele Covell, and Malcolm Slaney publish a paper describing a program called Video Rewrite. It combines three technologies already in development: face interpretation, audio synthesis from text, and 3D lip modeling.
The early 2000s
Research into computer vision advances, with technologies like motion detection and facial recognition maturing. Gareth J. Edwards, Timothy F. Cootes, and Christopher J. Taylor publish a paper defining a new algorithm called Active Appearance Models, which uses statistics to model the shape of an individual's face.
In June 2014, the first generative adversarial network (GAN) is developed.
In 2016, the Technical University of Munich releases the Face2Face Project, and in 2017, the University of Washington releases the Synthesizing Obama Project. Both of these significantly improve the time it takes to synthesize faces and the fidelity of images rendered.
In 2018, OpenAI develops GPT, a transformer-based text-generation model capable of creating human-like text. It subsequently releases improved versions of the technology (GPT-2 and GPT-3).
Around the same time, a subreddit called r/deepfakes (since deleted) with 90,000 members is replete with pornographic clips featuring realistic celebrity faces. u/deepfakes, the Reddit user credited with coining the term "deepfake," uses the Python libraries Keras and TensorFlow to create them.
Since then, deepfake technology has evolved with contributions from independent researchers, universities, AI research labs, and companies worldwide. Before we dive deeper, let's understand the basic AI that goes into making a deepfake.
How Are Deepfakes Created?
The workings of different types of deepfakes vary. Let’s explore each of them individually:
Video Deepfakes
Video deepfakes are created using two types of AI systems: the autoencoder and the generative adversarial network (GAN). Here is how each of them works:
Autoencoders
Autoencoders are neural networks capable of face-swapping, which is essential for creating deepfakes. An autoencoder has two parts: an encoder and a decoder. Face-swapping uses two autoencoders.
Different autoencoders are trained differently, so the latent face produced by two separate encoders may not match. For face-swapping to work properly, it is crucial to use the same encoder in both autoencoders. The decoders can differ; because they operate on the same latent faces, their reconstructions remain compatible with each other.
Try to understand it this way:
Imagine that you just went to the thrift store and bought two dresses, each with a top portion and a skirt. Let’s say one of these is yellow while the other is red. You like the top part of the yellow one but the skirt of the red.
You take the dresses home and study their seams carefully. You understand where the top goes and where the skirt lies. You then form an outline either in your head or on paper. Then, you rip the dresses apart from the seams to separate the tops and the skirts. At this stage, you are doing the job of the encoder.
After this, you take the yellow top and the red skirt, align them to make a sensible dress, and sew them together. In this case, you have worked like a decoder performing face-swapping.
Now, what if you had given the yellow dress to your friend, asked her to rip it at the seams and hand you the part she likes, and kept the red one to do the same yourself? She might have liked the skirt of the yellow dress, and we have already established that you like the skirt of the red one. You are both acting as encoders here.
When she hands the ripped skirt back, you are left with two skirts and no top, making it impossible to sew a dress. That is essentially what happens if we use two different encoders for face-swapping.
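The dress analogy maps directly onto the standard deepfake architecture: one shared encoder, one decoder per person. Here is a minimal NumPy sketch of that data flow, using untrained random linear layers purely to show the wiring (the dimensions and variable names are illustrative, not from any real system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "faces": flattened 8x8 grayscale images (64 pixels).
PIXELS, LATENT = 64, 16

# One shared encoder, one decoder per identity -- the key trick:
# both identities are mapped into the SAME latent space.
shared_encoder = rng.normal(size=(PIXELS, LATENT)) / np.sqrt(PIXELS)
decoder_a = rng.normal(size=(LATENT, PIXELS)) / np.sqrt(LATENT)
decoder_b = rng.normal(size=(LATENT, PIXELS)) / np.sqrt(LATENT)

def encode(face):
    return face @ shared_encoder      # face -> latent representation

def decode(latent, decoder):
    return latent @ decoder           # latent -> reconstructed face

# Face swap: encode person A's face, then decode it with B's decoder,
# rendering A's pose and expression with B's appearance.
face_a = rng.normal(size=(PIXELS,))
swapped = decode(encode(face_a), decoder_b)
print(swapped.shape)  # (64,)
```

In a real deepfake pipeline, the two autoencoders are trained jointly on thousands of images of each person, and the layers are deep convolutional networks rather than single matrices; only the encoder weights are shared.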
Generative Adversarial Networks
The autoencoder technology behind deepfakes does a great job of swapping faces. However, some of its results are still awkward, with images having unrealistic placement or finish. That is where generative adversarial networks or GANs come in to improve deepfakes and make them hard to distinguish from reality.
Created in 2014 by Ian Goodfellow, GANs are responsible for making deepfakes more realistic. Here's how they work:
The entire process is like a two-person game of hide-and-seek. Imagine playing it with a friend: your job is to hide from them, and theirs is to find you. Every time they succeed in finding you, you have to find an even better hiding place. In a GAN, a generator network is the hider, producing fake images, while a discriminator network is the seeker, trying to tell the fakes from real images. Every time the discriminator catches the generator, the generator is forced to produce a more convincing fake.
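The hide-and-seek loop can be sketched with a deliberately tiny toy: a one-dimensional generator tries to produce numbers that look like samples from a target distribution, while a logistic-regression discriminator tries to tell them apart. This illustrates only the adversarial training loop; real GANs use deep networks over images, not two scalars:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-np.clip(x, -60, 60)))

# "Real" data: samples from a normal distribution centered at 4.
real = lambda n: rng.normal(4.0, 0.5, n)

w, b = 1.0, 0.0    # generator g(z) = w*z + b (the "hider"), starts near 0
a, c = 0.1, 0.0    # discriminator d(x) = sigmoid(a*x + c) (the "seeker")
lr, batch = 0.05, 64

for step in range(500):
    z = rng.normal(size=batch)
    fake, x_real = w * z + b, real(batch)

    # Seeker update: push d(real) toward 1 and d(fake) toward 0.
    d_real, d_fake = sigmoid(a * x_real + c), sigmoid(a * fake + c)
    a += lr * np.mean((1 - d_real) * x_real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Hider update: adjust w, b so the seeker mistakes fakes for real.
    d_fake = sigmoid(a * fake + c)
    grad = (1 - d_fake) * a           # gradient of log d(fake)
    w += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

# The fake-sample mean has drifted from 0 toward the real mean of 4.
print(float(np.mean(w * rng.normal(size=1000) + b)))
```

Each round, the discriminator gets better at spotting fakes, which in turn forces the generator's output distribution closer to the real one; that pressure is what sharpens deepfake imagery.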
Audio Deepfakes
Audio deepfakes mainly rely on voice cloning technology.
Voice cloning requires a powerful computer (often cloud-based) and a deep neural network. Thousands of voice samples (clips, recordings, etc.) are fed into this system so it learns what the person whose voice is being cloned sounds like. This process trains the voice cloning model.
Once the system learns the voice, it can generate speech in the same voice using simple text-to-speech or speech-to-speech software.
One of the most famous examples of audio deepfakes occurred when a fraudster stole £200,000 from a German energy firm. Using a cloned voice of the firm's CEO, the fraudster tricked the firm into transferring the amount to a Hungarian bank account.
Text Deepfakes
As mentioned earlier, technologies like GPT can easily be used to generate text deepfakes.
Such transformer-based language generators are trained using vast datasets containing text examples from encyclopedias, films, television shows, books, journals, etc. They learn what human text sounds like. Then, using this knowledge and examples of a few written pieces by the author whose deepfake is being created, these models generate very realistic text imitating the author's style.
GPT has been used many times to write poems in the styles of Percy Bysshe Shelley and Robert Frost, and plays in the style of Shakespeare, often with striking accuracy. It can also mimic journalists, politicians, and celebrities, thereby creating deepfakes.
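Transformer models like GPT are far too large to sketch here, but the underlying idea of learning an author's statistical patterns and then sampling new text from them can be shown with a deliberately crude stand-in: a word-level Markov chain. The tiny "corpus" below is invented for illustration; a real text deepfake would be trained on an author's actual writings with a neural language model:

```python
import random
from collections import defaultdict

# A tiny invented "corpus" standing in for an author's collected works.
corpus = (
    "the fog rolls over the hill and the fog settles on the town "
    "and the town sleeps under the fog while the hill waits"
).split()

# Learn the author's word-to-word transition statistics.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def imitate(seed_word, length, rng):
    """Generate text by sampling the learned transitions."""
    words = [seed_word]
    for _ in range(length - 1):
        followers = transitions.get(words[-1])
        if not followers:          # dead end: no observed continuation
            break
        words.append(rng.choice(followers))
    return " ".join(words)

print(imitate("the", 12, random.Random(42)))
```

A transformer does the same thing in spirit, but conditions each next word on the entire preceding context rather than just the last word, which is why its imitations read so much more convincingly.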
Understanding Deepfakes: The Good and the Bad
The idea of fake media pieces does inspire fear and speculation. However, deepfake technology has diverse applications in various industries. Let's explore the opportunities and threats of deepfakes in more detail.
Opportunities and Applications
Entertainment
Deepfake technology has widespread use in the entertainment industry. It can be employed to improve video quality, generate the voices of actors and artists who have vocal problems or have lost their voices altogether, and assist with dubbing and sound correction.
Also, because a deepfake makes even the lip movements of the subject realistic, it can be an excellent way to lip-sync to censored/changed dialogues before the final cut.
See this public health campaign featuring David Beckham. With good deepfake technology, even actors who cannot speak multiple languages can appear in translated films with their lips syncing perfectly to the words.
Business and Marketing
Deepfake technology can revolutionize marketing in more ways than one.
A prominent example comes from a company called Windsor.io. Windsor helps brands with cold outreach by creating many personalized, partially deepfake videos that address individual customers without the sender having to record more than one video. It can change the customer's name in each video or add personal sentences tailored to a specific individual.
Such tools significantly reduce the time and costs it takes marketers to reach out to targeted customers while still providing the same level of personalization and effectiveness.
Deepfakes can also enable business owners to create celebrity/influencer endorsements in various languages and for different products without asking for the time or commitment from the endorsers. This can widen reach and dramatically reduce marketing expenses.
Besides these, corporate training videos and targeted employee outreach videos can also be made using this technology.
Fashion and Beauty
Deepfakes have a significant application in the fashion industry. Case in point: the Reface AI app. Using it, Gucci recently launched a virtual try-on feature for its sneakers. Zalando, another fashion brand, used deepfakes of Cara Delevingne to create a very successful ad campaign micro-targeting its audience.
The beauty industry relies heavily on influencers who might not be great performers in TV or video ads. With deepfakes, that concern can be eliminated as long as the influencers license their images and video shots to the companies they sign up with.
Writing and Publishing
GPT is already heavily used in content writing, journalism, and even creative writing. Publishers can use its texts to release sequels of novels/stories written by famous authors even after they stop writing.
Of course, because GPT has not perfected text generation, its output still needs significant human editing to read as a convincing imitation of a famous author's work.
Apart from these, voice synthesis, cloning, and deepfakes have been used to help people who have lost their voices to illness or injury. One of the most notable examples is the famous film critic Roger Ebert. After he lost his voice to cancer, the Edinburgh-based company CereProc gave it back to him using advanced voice cloning and synthesis technology.
Threats and Disadvantages
Unauthorized Pornography
One of the most pressing threats of deepfakes remains their use in creating unauthorized pornographic clips and images. This is a technology that gained momentum through fake celebrity porn on Reddit, and it is still used frequently for similar purposes.
An AI firm called Deeptrace recently tracked deepfakes across the web. Of the roughly 15,000 videos it found, 96% were pornographic.
With deepfakes, creating pornographic videos of underage celebrities is incredibly easy. Moreover, the technology can even be used to defame established public personalities. It can also provide harmful agents with greater leeway to produce revenge porn.
Stock Manipulation
This threat is less obvious but still concerning. Many business owners, CEOs, board members, and other people associated with businesses are public figures. Their public actions and opinions influence their brands' stock prices.
For example, Jeff Bezos's public divorce affected Amazon's stock price. Similarly, the video of Elon Musk smoking a joint sent Tesla's stock tumbling. While both of these events were real, with deepfake technology advancing so quickly, it would not be hard for malicious agents to create fake videos like them.
If someone produces such a video, they can temporarily drive stock prices up or down, which isn’t just unfair and unethical but also illegal.
Impersonation and Fraud
We’ve already discussed the example above of a German energy firm scammed out of £200,000. Fraudulent behavior like this is undoubtedly made easier by deepfakes.
Impersonators can fool financial institutions, businesses, and even individuals into handing over sensitive data or assets. Fake memos generated by AI language models can push organizations into harmful decisions before anyone has had a chance to verify a document's authenticity.
Is There a Solution to the Problems Deepfakes Pose?
Several firms focusing on ethical AI usage are now developing models and systems that flag deepfakes on various platforms and apps.
They can use discriminator models trained on large datasets of footage and recordings to determine whether a video is fake. Some also examine the nature and background information of videos, audio, and text to judge whether AI could have produced them.
Another way to spot deepfakes is to maintain a blockchain ledger of videos published online, so altered copies can be checked against the original record.
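The ledger idea boils down to provenance: record a cryptographic fingerprint of a video when it is published, then check later copies against it. Here is a minimal sketch of that check; the ledger is just an in-memory dictionary and the byte strings are invented, whereas a real system would anchor the fingerprints in a distributed blockchain:

```python
import hashlib

def fingerprint(video_bytes: bytes) -> str:
    """SHA-256 hash: any single-bit edit produces a different digest."""
    return hashlib.sha256(video_bytes).hexdigest()

# Stand-in for a blockchain ledger: clip ID -> fingerprint at publish time.
ledger = {}

original = b"\x00\x01frames-of-the-original-interview"
ledger["news-clip-001"] = fingerprint(original)

def is_authentic(clip_id: str, candidate: bytes) -> bool:
    """A copy is authentic only if it matches the registered fingerprint."""
    return ledger.get(clip_id) == fingerprint(candidate)

tampered = original.replace(b"original", b"deepfake")
print(is_authentic("news-clip-001", original))   # True
print(is_authentic("news-clip-001", tampered))   # False
```

Note that this approach proves a clip was altered after registration; it cannot, by itself, prove the registered clip was genuine in the first place, which is why it complements rather than replaces discriminator-based detection.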
Deepfakes are not new, but they have matured to the point where they can be used in commercial applications. It will be fascinating to see just how much the line between real and fake will blur. Deepfakes may become so real that you truly cannot tell. Expect the debate around deepfakes to become more intense as the technology becomes more broadly available.
Subscribe to get full access to the newsletter and website. Never miss an update on major trends in AI and startups.
You can also follow me on Twitter.