On Tuesday, Stability AI introduced a new AI image-synthesis model, Stable Diffusion XL Turbo. The bombastic name reveals its main advantage: it generates images at an unprecedented speed. It produces images in a single step, compared to the 20-50 steps its predecessor required. In other words, it can generate images as fast as you can type… maybe even faster.
Stable Diffusion XL Turbo (SDXL Turbo) uses a technique called Adversarial Diffusion Distillation (ADD). ADD combines score distillation, where the model learns from an existing image-synthesis model acting as a teacher, with an adversarial loss, where a discriminator pushes the generated images to look indistinguishable from real ones. Together, these make image generation blazing fast while keeping the output realistic.
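For the curious, here's a rough idea of what that combination of losses looks like in code. This is my own toy sketch in PyTorch, not Stability AI's actual training code; the names (student, teacher, discriminator, adv_weight) are invented for the example, and the real objective in the paper is more involved.

```python
# Toy illustration of an ADD-style training objective (not Stability AI's code).
# Assumed pieces: a fast "student" generator, a frozen pretrained "teacher"
# diffusion model, and a discriminator that scores how real an image looks.
import torch
import torch.nn.functional as F

def add_style_loss(student, teacher, discriminator, noisy_latents, timesteps, adv_weight=0.5):
    # The student tries to denoise in a single step.
    student_pred = student(noisy_latents, timesteps)

    # Score distillation term: match the frozen teacher's prediction.
    with torch.no_grad():
        teacher_pred = teacher(noisy_latents, timesteps)
    distill_loss = F.mse_loss(student_pred, teacher_pred)

    # Adversarial term: reward the student when the discriminator
    # rates its output as "real".
    adv_loss = F.softplus(-discriminator(student_pred)).mean()

    return distill_loss + adv_weight * adv_loss
```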
Real-time image generation and beyond
Stability AI’s research paper, released alongside the model, delves into the intricacies of ADD. The downside is that these single-step images aren’t as detailed as SDXL images created with more steps. Still, the speed gains are remarkable. On an Nvidia RTX 3060, SDXL Turbo can generate a 3-step 1024×1024 image in about 4 seconds, compared to 26.4 seconds for a 20-step SDXL image with similar detail. I tried it out, and it really did create and modify images as I was typing.
On the other hand, look at these versions of a “stressed-out female student with long dark hair screaming surrounded by books and papers.” Nightmare fuel.
Uses and availability
It looks like Stability AI is going strong despite its recent troubles. Just last week, the company introduced Stable Video Diffusion, which can turn still images into short video clips. Some folks have already used the new AI video tool for memes, and the results are as unsettling as you’d expect.
Currently, SDXL Turbo is available under a non-commercial research license that limits its use to personal, non-commercial purposes. This decision has drawn some criticism within the Stable Diffusion community, but Stability AI remains open to commercial applications and invites anyone interested to reach out with questions or requests.
Stability AI also offers a beta demonstration of SDXL Turbo on its image-editing platform, Clipdrop. This is where I tested it out, and I hope it will improve when it’s out of beta. An unofficial live demo is also available for free on Hugging Face.
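If you’d rather skip the demos and run it yourself, the weights can be downloaded for local use. Below is a minimal sketch of what that looks like with Hugging Face’s diffusers library, assuming the model is published under the identifier "stabilityai/sdxl-turbo" and you have a CUDA-capable GPU; treat it as a starting point rather than official instructions.

```python
# Minimal sketch: generating an image with SDXL Turbo via the diffusers library.
# Assumes the model id "stabilityai/sdxl-turbo" and a CUDA-capable GPU.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# Single-step generation; SDXL Turbo is designed to run without
# classifier-free guidance, hence guidance_scale=0.0.
image = pipe(
    prompt="a stressed-out student surrounded by books and papers",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("sdxl_turbo_test.png")
```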
The model’s real-time capabilities open up various possibilities. We can expect to see it used in generative AI video filters or video game graphics, although, like all generative AI models, it still has coherency issues that would need to be addressed first. Unfortunately, I believe we can also expect fake news to be churned out much faster, but that’s a topic for another time.
[via Ars Technica]