In the AI-generation gold rush, OpenAI fired a decisive salvo across the bow of existing video-generation tools like Google’s Imagen, Runway Gen-2 and Meta’s Make-A-Video.
These competing efforts were blurry, low-resolution, plastic in appearance and altogether rudimentary – more sneak peeks into a future of synthetic images than viable products. OpenAI’s Sora is an entirely different beast, taking text prompts to produce photorealistic humans, animals and landscapes. It uses treatments that mimic film grain or cell phone footage and employs professional tracking, dolly and jib movements.
It’s not perfect, but it’s pretty darn close to being indistinguishable from reality.
The results are rather impressive. A woman’s earrings sway naturally with her gait as light realistically reflects off her glasses and rain-soaked Tokyo streets. In another video, several giant wooly mammoths approach, treading through a snowy meadow, their shadows wrapping around them and the environment as expected. Several videos have no sign of the uncanny valley that made synthetic videos of the past scream that something was artificial.
These impressive results are also alarming.
“This is now the worst AI-generated video will ever look. Let that sink in.”
Beyond fears of what this means for creative jobs (as highlighted by 2023’s Hollywood writer and actor strikes) or for our understanding of photos and video, the biggest alarm bell is what it means for the future of objective truth, disinformation and power.
If you can’t tell what is real (AI-generated videos that look real, as well as real videos others claim are fake), nothing is real except what you choose to believe. The last decade has shown us, globally, the dangers of social media-fueled echo chambers; with selective facts come a selective reality and, ultimately, further division and harm to society.
What is real?
When looking at the example above with the wooly mammoths, it’s easy to say that it’s not real. As a viewer, you may recall that wooly mammoths went extinct about 4000 years ago, so you reason this must be an illustration of some sort, AI-generated or not.
(At least until we start cloning wooly mammoths.)
But consider for a moment that such a video were packaged and presented as accurate to people unaware that they’ve gone extinct. That’s not as far-fetched as you may think. As the BBC reported last year, AI-generated science YouTube videos targeting children were remarkably effective at convincing kindergarteners that Egypt’s pyramids were electric generators, that aliens are real and that NASA is hiding that human activity plays no role in climate change. All of these claims are false, but that didn’t stop 5-year-olds from believing them and viewing the videos as proof.
A tool like Sora, which promises to deliver photorealistic humans and real-world environments to anyone quickly and with little to no learning curve, presents an opening for bad actors seeking to dupe children (and adults), and that should give you pause. It certainly gives me pause.
Deepfakes of the past took some level of skill and computing power to pull off realistically (at least two weeks and $552 in 2019 for a rudimentary one), but with tools like Sora, the threshold has been lowered to anyone with a keyboard and some time and intention.
OpenAI didn’t disclose how long each sample video it created took to make. I’ve seen several claims that they can be made in minutes but, based on my experience with static AI image creation, I suspect it’ll take hours or days of fine-tuning and editing to get the ideal results. In posts on X following the announcement of Sora, OpenAI CEO Sam Altman asked for reader prompts and delivered two (a grandma cooking and a fantasy of ocean creatures in a bike parade) within about 90 minutes.
OpenAI has also not shared what video and image sources were used to train Sora or, more pointedly, whether copyrighted works were used. The company, which also makes the chatbot ChatGPT and the still image creator DALL-E, has been sued over allegations that it used copyrighted works to train those earlier products.
Regardless, the writing is on the wall. Soon, every Tom, Dick and Harriet will be able to make convincing fake videos. OpenAI seems to have recognized the dangers of AI tools on some level.
A large portion of the announcement was devoted to a safety section, given a prominent menu header, acknowledging the risks of misinformation and societal harm. The platform has no public release date yet; it is currently only accessible to a select group of testers who have also been tasked with helping identify and assess risks and potential harms. I hope this level of care is genuine and not lip service.
Wild wild west
At present, there are no regulations governing generative AI tools. The EU’s AI Act, if passed, may become the first; it would regulate the industry by limiting corporate and law enforcement use of AI and giving the public a means to file complaints. There are also several efforts in the US and China to regulate the use of AI, but at present, they are patchwork at best.
The only safeguards in place as I write this are self-imposed by the companies working on AI.
OpenAI uses language filters to check and reject text prompts that include content it deems violent, sexual or hateful, or that attempt to use copyrighted material or the likeness of celebrities. There are plans to implement C2PA metadata in any public release version of the tool.
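To make that first safeguard concrete, here is a minimal sketch of what such a prompt gate could look like, built on OpenAI’s public Moderation endpoint. This is an illustration only, not Sora’s actual filtering pipeline, and checks for copyrighted material or celebrity likenesses would require separate logic:

```python
# Hypothetical prompt gate: a sketch of the general approach, not Sora's actual filter.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def prompt_allowed(prompt: str) -> bool:
    """Return False if the Moderation endpoint flags the prompt (violence, sexual content, hate, etc.)."""
    result = client.moderations.create(input=prompt).results[0]
    return not result.flagged


print(prompt_allowed("A Chinese Lunar New Year celebration video with Chinese Dragon."))
```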
C2PA (Coalition for Content Provenance and Authenticity) is an authentication standards effort backed by Adobe, Sony, BBC and others. It brings together the efforts of CAI (Content Authenticity Initiative) and Project Origin to address image provenance and authenticity by setting authoring and metadata standards alongside open-source tools for public education about content authenticity.
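On the verification side, reading those credentials back out could be as simple as inspecting a file’s embedded manifest. Below is a minimal sketch, assuming the CAI’s open-source c2patool command-line utility is installed and emits the manifest as JSON; the file name is a placeholder for illustration:

```python
# Sketch of a consumer-side provenance check built on the CAI's open-source c2patool CLI.
# Assumes c2patool is installed and on PATH; its output format may vary by version.
import json
import subprocess


def read_content_credentials(path: str) -> dict | None:
    """Return the C2PA manifest embedded in a media file, or None if there isn't one."""
    proc = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # no Content Credentials found, or the file couldn't be read
    return json.loads(proc.stdout)


# "sora_sample.mp4" is a hypothetical file name used only for this example.
manifest = read_content_credentials("sora_sample.mp4")
print("Content Credentials found" if manifest else "No provenance data embedded")
```

Of course, a check like this only helps if platforms preserve the metadata and viewers know to look for it, which is where the public-education piece of C2PA’s mandate comes in.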
“New technology is cool, and acknowledging the risk is great, but taking responsibility for the genie in the bottle before you let it out is the right thing to do.”
By joining this group and adopting the standard, OpenAI seems to acknowledge the need for a paper trail to determine what is authentic and what is synthetic. Until Sora goes public, we won’t know how it’ll be implemented, how the public will be trained to use authentication tools, or, more importantly, the value of using such tools.
However, there is one key thing missing from this acknowledgment. C2PA’s efforts have predominantly targeted journalists, who may be most concerned about media authentication. What do image provenance and authenticity mean to the average Sora user?
Case in point: rage bait. A critical success metric on social media is engagement, or how many people interact with your content: a rubric of likes, comments, time spent viewing, shares, saves and follows. In this model, those engagement metrics are all that matter, so it doesn’t matter whether things are true. The ends justify the means.
New technology is cool, and acknowledging the risk is great, but taking responsibility for the genie in the bottle before you let it out is the right thing to do. We’ve been entrenched in a years-long debate about AI images and whether they are photos, art, copyrightable or useful. We’ve snickered that AI can’t make hands look human or text look legible. But if Sora reminds us of one thing, it’s that technology advances faster than we humans do, and we have a limited window to be proactive before we become reactive to any harm.
This is now the worst AI-generated video will ever look. Less than a year ago, we giggled at how AI tools struggled with human bodies and couldn’t render a realistic Will Smith eating spaghetti; 11 months later, we have videos like the one below of a man reading a book.
In its presentation, OpenAI shared examples of the tool still struggling with hands, physics and overlapping animals. If we look closely at details, it’s possible to tell that something isn’t real, but that requires more than a passing glance. Or, in the case of social media and people resharing screengrabs where visual compression reduces image quality, it requires us to be skeptical and seek out the source to verify for ourselves. C2PA tools may help if implemented correctly from a technical side, but they’ll also need a robust media literacy education effort.
Looking at how far AI-generated video has come in 11 months, it feels inevitable that the quirks of AI-generated images and videos will resolve themselves in due time. This is now the worst AI-generated video will ever look. Let that sink in.
Prompt: “A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.” AI video credit: OpenAI
Weaponized disinformation
Maybe it’s because I come from working for newspapers, magazines and TV journalism, but a world in which truth can be buried under fiction with such ease strikes me as hurtling dangerously close to dystopian.
I’m reminded of my family’s stories from India’s colonial period and the riots around the country’s 1947 partition. For generations, colonial leaders had pitted different religious and regional groups against each other to keep power isolated at the top. Misinformation was a pivotal tactic in the effort to place Hindus and Muslims at odds and maintain control.
For a lighter example, consider 1975’s “Rollerball” (yes, really). In true ’70s fashion, the film imagines a future in which corporations and the technology they control shape our world. In one scene, the main character visits a library only to learn that global corporations have digitized and rewritten all books, bending historical knowledge to their liking. An alternative history, complete with “proof,” is used to control the public and maintain power.
The scary thing is that both examples are rooted in a truth: knowledge is power, and that power, used maliciously, can distract or direct others toward a desired outcome.
History is littered with image manipulation and attempts to pass off inauthentic images as authentic; following Abraham Lincoln’s death, a famous image of the former US president was faked. Unlike in the past, however, the prevalence of cheaper and easier-to-use manipulation and fabrication tools, such as AI, makes it possible for anyone to create fake images, and soon fake videos, and to quickly circulate misinformation as truth, whether for fun or for more nefarious ends.
“Without knowing what is accurate and true, everything becomes suspect and facts become subjective.”
Recently, social media has been flooded with visual misinformation on the Hamas-Israel conflict. Images from other parts of the world have been paired with new misleading headlines, AI images are passed as proof of war crimes, fake BBC-style videos share fictitious accounts from the ground, and videos of world leaders with inaccurate English captions sow dissent and confusion. The problem is so significant on X that the platform reminded users about its disinformation policy and how it has ramped up the use of Community Notes, its fact-checking feature, which some insiders say is a bandaid that isn’t working.
Today’s deluge of visual misinformation challenges society and those producing authentic images. Without knowing what is accurate and true, everything becomes suspect and facts become subjective. Suddenly, bad actors can flood social media and muddy the waters, making it difficult to sort fact from fiction.
When I look at Sora and the samples shared, that fear creeps in: a media landscape in which one cannot confidently tell what is real and what is someone trying to pull the wool over our eyes.
Among the AI-generated videos Sora made of animated creatures and paper planes over a jungle are a few concerning clips. Photorealistic humans in real-world environments conjure scenarios of weaponized misinformation. A video created from the prompt “historical footage of California during the gold rush” is anything but historical documentation. Videos of global locales open the door to alternative histories of a place.
Among all the videos shared by OpenAI, there is one that alarms me most. A ten-second Chinese Lunar New Year celebration clip shows a large crowd gathered for a parade, flanking both sides of the street as two dragon puppets participate in a dragon dance down the center.
Prompt: “A Chinese Lunar New Year celebration video with Chinese Dragon.” AI video credit: OpenAI
The video is pretty innocuous; if you don’t think too hard about the camera angle, you might assume it’s smartphone footage. With its realistic lighting, lower image quality, lack of depth of field and slightly out-of-focus people masking missing detail and motion blur, nothing suggests anyone would go to the trouble of making an AI video of such a scene. Coming across this video on social media, you might think it’s real and move on, convinced.
This is the danger. It’s ordinary enough that one might wonder, “Why would anyone fake this?”
Now, consider a scenario where a bad actor wants to place someone in this scene doing something nefarious in the background; perhaps the target is meant to be seen cavorting with someone they shouldn’t be. At a later date, accusations are made against the targeted person, and soon this fake video is presented as the smoking gun. Now imagine the targeted person is a country’s president, and that planting the seed that they are untrustworthy and harmful to the nation suits the opposing party. That scenario shouldn’t seem too far-fetched: in the last year, we’ve seen it happen with AI-generated still images in the US presidential race.
I won’t pose the could/should cliché, but I will say there need to be considerations of ethics, societal harm, media literacy and corporate responsibility. Now that the genie is out, humanity has a greater responsibility to put guardrails in place, with the means to course-correct in real time rather than pick up the pieces in the aftermath of harm.
Prompt: “Reflections in the window of a train traveling through the Tokyo suburbs.” AI video credit: OpenAI
A value proposition
Every time I revisit generative AI technology, I am left with the same thoughts. It is undoubtedly impressive, but what exact problem does it solve? To borrow Silicon Valley’s favorite mantra, does this make the world a better place?
I understand that there is a gold rush. I see the surges in stock prices for Nvidia and Microsoft and understand how money motivates AI development. I also see people making inventive things that inspire creativity. I’ve used AI-generated images for storyboards and mood boards. But I also see the dangers.
“To borrow Silicon Valley’s favorite mantra, does this make the world a better place?”
Of the example videos shared by OpenAI, none really struck me as having a compelling use case. At its core, Sora is trying to produce photorealistic video that could pass for real, and I have to wonder: to what end? Fake videos can pass for real at a glance. Real videos can be alleged to be fake by anyone. “Truth” becomes fractured and, in its place, a million echo chambers rise, free to enshrine their own version of what is real for themselves and their followers.
I suppose hindsight will have to be our arbiter. Perhaps one day an AI-Chris Nolan will team up with an AI-Charlie Kaufman to make a meta-commentary AI-Oppenheimer about the moment the AI genie fully escaped the bottle, finally making clear what it meant and what we learned.