The generative AI tech race is still running at full tilt. In recent months, both startups and established players have turned their attention to video generators, with announcements in this field pouring in. Yesterday, the maker of ChatGPT officially joined the club, and the news of OpenAI's launch of Sora might just be groundbreaking: its deep-learning model shows an unprecedented level of realism. The tool is not available to the public yet, but the published examples have shocked creators all over the world. Let's take a look at what Sora promises to do!
After a prolonged competition in text-to-image generators, the technology has finally advanced enough to simulate motion and create short video clips solely from written commands (so-called prompts). So now it's all about video generators. Sora marks the third such launch from a leading AI developer just this year, following the news from Midjourney and the introduction of Google Lumiere.
However, neither these nor the tools that have been on the market far longer (Runway, for example) have shown results as consistently hyperrealistic as those Sora demonstrates. Trust me, I've tested them and can confirm this! While we previously stated that the tech wasn't there yet to replace, say, stock footage, now we're starting to have our doubts. And we are not the only ones.
OpenAI launches Sora – will it be the next leap in generative AI?
So, what's so special about Sora? First, unlike its competitors, this text-to-video model can allegedly create videos of up to 60 seconds. As explained and demonstrated in the announcement, generated clips feature “highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.” The model is said to understand not only what a user asks for in the prompt, but also how things exist and behave in the physical world. Let's take a look at one of the examples:
Here, Sora not only rendered a drone shot of waves along a Big Sur beach but also nailed the physics of the waves crashing over the rocks. Has any AI video generator managed something like this before? If so, I've never witnessed it. Would you even recognize it as an artificially generated video at first sight? I wouldn't. Scary? Hell yeah.
What OpenAI's Sora will be capable of
Another thing that sets OpenAI's newcomer drastically apart from its predecessors is its ability to combine different shots in one video. In the following example, the creator asked the deep-learning model to imagine a movie trailer featuring the adventures of a 30-year-old spaceman wearing a red wool-knitted motorcycle helmet. Here's the result:
Of course, it's far from perfect, but this is also the first time we have witnessed a model capable of sequencing different shots within one clip while keeping the setting consistent. Also, note the extreme attention to detail, like the red wool-knitted motorcycle helmet, hardly something one sees on the street every day.
Furthermore, Sora can apparently interpret long prompts, as some published examples feature very detailed text directions of over 100 words. And thanks to extensive DALL-E training (Sora borrows several techniques from it), the choice of style is also up to you, as the 2D- and 3D-animated clips showcase:
You can see further sample videos on Sora's official webpage and on OpenAI's X (ex-Twitter) account.
Limitations of the new AI video generator
Almost all of the published videos created by Sora appear remarkably realistic, especially if you've tried other video generators and know their limitations. For me, one of the scariest results was the following one with Tokyo vibes. Why? Because it shows people moving quite believably, realistic reflections on the wet ground, and lighting just like in a cinematic shot:
However, if you take a closer look, you will spot a couple of mistakes and weird artifacts that give away the AI's hand. They appear in other examples too: sea creatures swimming oddly, or people's facial expressions in some close-ups.
Apart from that, OpenAI's developers admit that the current model still struggles with the physics of complex scenes and with understanding cause and effect. For instance, in the release post, they mention that “a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.” Another example would be Sora mixing up left and right:
People’s reaction to OpenAI’s Sora launch
The announcement was met with extreme reactions, from adoration and amazement to pure hostility. Many people on social media expressed concern that this development will lead to abuse (faking reality both for fun and for serious matters, like jeopardizing upcoming elections) and to the loss of jobs in the creative community.
One commenter under OpenAI's post also noted: “The entire stock footage industry just died with this one tweet,” which, in our opinion, might indeed be the case.
No public release for Sora yet
To be fair, it's difficult to draw any conclusions while the AI model is still in closed testing. All we have to go on are sample videos carefully picked by OpenAI's team. So let's wait for the public release first.
The developers didn't say when Sora will be widely available, but they noted they'll be taking several important safety measures first. These include working with domain experts in areas like “misinformation, hateful content, and bias” to prevent people from abusing the new technology. The announcement also promised to grant access to a number of visual artists, designers, and filmmakers, the goal being to gather feedback on how to advance the model so it becomes as helpful as possible for creative professionals. Sounds good, but how will they deal with all the backlash, the attribution problem, and wider industry concerns? Time will tell.
What do you think about OpenAI's launch of Sora? Does it excite you, or are you worried? Can you imagine ethical and sustainable ways to integrate this AI into your workflow? Let's start a polite discussion in the comments below!
Feature image source: stills from videos generated by OpenAI's Sora