The generative AI company behind ChatGPT and DALL-E has a new toy: Sora, a text-to-video model that can (sometimes) generate pretty convincing 60-second clips from prompts like “a stylish woman walks down a Tokyo street…” and “a movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet…”
A lot of the AI video generation we’ve seen so far fails to sustain a consistent reality, redesigning faces and clothing and objects from one frame to the next. Sora, however, “understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” says OpenAI in its announcement post (using the word “understands” loosely).
The Sora clips are impressive. If I weren’t looking closely—say, I was just scrolling past them on social media—I’d probably think many of them were real. The prompt “a Chinese Lunar New Year celebration video with Chinese Dragon” looks at first like typical documentary footage of a parade. But then you realize that the people are oddly proportioned, and seem to be stumbling—it’s like the moment in a dream when you suddenly notice that everything is a little bit wrong. Creepy.
“The current model has weaknesses,” writes OpenAI. “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”
My favorite demonstration of Sora’s weaknesses is a video in which a plastic chair begins morphing into a Cronenberg lifeform. Behold:
Sora is not currently available to the public, and OpenAI says it’s assessing social risks of the model and working on mitigating them, for instance with “a detection classifier that can tell when a video was generated by Sora.”
It’s fascinating as a research project, but OpenAI isn’t just interested in doing cool computer science. If it can outmaneuver copyright critics and legislators, it’s here to make bank. The company says it’s currently “granting access [to Sora] to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.”
One commenter on X optimistically wondered if models like Sora will one day allow the public to wrest control of filmmaking away from Hollywood by making movies purely with prompts—but I wonder where they think the source material for all this generated video will come from if not, you know, filmmakers? Hollywood movies may already look pretty homogenous, but auto-reproducing Marvel Cinematic Universe-style CGI and car commercial drone shots isn’t exactly bringing creative expression to the masses, if you ask me. (The blog post notably doesn’t mention Sora’s training material.)
Despite the often clumsy results of generative AI and the legal and ethical quagmire it presents, we’re already seeing it used in professional creative media. That includes videogames, both in ways that are directly visible to us, such as generated art, voices, and on-the-fly dialogue, and in ways that are less obvious, like code snippets or early concept art. A recent survey found that 31% of game development professionals use generative AI in some capacity. Combined with other software, I wonder what this kind of machine learning-driven video simulation could do besides generate slightly-off CG-like clips.
I don’t think anyone really knows how generative AI will be used in five or ten years or what the consequences of continued development will be, but it isn’t slowing down, so it appears we’ll find out. OpenAI and other companies are explicitly working not just toward better image and video and text generators, but toward “artificial general intelligence” or AGI—as in, the science fiction idea of what AI is.
“Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI,” says OpenAI.