It was only a little more than a year ago that I started hearing about Stable Diffusion and Midjourney and the ability to create images from nothing. Just string a few words together, and a generative AI model sitting on a server transforms those written words into a graphic image. Magic.
Everything has progressed so fast and so frenetically since then. And suddenly, I was standing in the middle of MediaTek’s booth at MWC, looking at an Android phone running the Dimensity 9300 chipset and generating AI images on the fly.
The model generated and refined the image with every letter I typed. Each word triggered the Stable Diffusion model and nudged the image closer to my description. In real time. Zero lag, zero wait, zero servers. Everything ran locally and offline. I was dumbstruck.
Just last year, also at MWC, Qualcomm was happy to show off a Stable Diffusion model that could generate an AI image locally in under 15 seconds. That seemed impressive at the time, especially compared to Midjourney’s slower, server-bound generation.
But now that I’ve seen real-time generation in action, those 15 seconds seem like a lagfest. Oh, what a difference 12 months make!
The Dimensity 9300 was built from the ground up to handle more on-device AI workloads, so that wasn’t the only demo MediaTek was touting. The others, however, weren’t as impressive or eye-catching: local AI summaries, photo expansion, and Magic Eraser-like photo manipulation. Most of those features have become commonplace, with Google and Samsung boasting them in their Pixel software and Galaxy AI suite, respectively.
Then there was a local video generation model, which creates an image and animates it, GIF-style, into a short video. I tried it a couple of times: each run took over 50 seconds and the results weren’t always accurate, so you can imagine it didn’t catch my eye as much as the real-time image model.
MediaTek also showed off a real-time AI avatar maker that uses the camera to capture live footage of a person and animates it in multiple styles. The animation lagged a second or two behind the presenter’s real movements, and the generated image reminded me of the early days of DALL-E. Again, this was running locally and offline, which explains these issues. It’s still impressive tech, of course, but it didn’t feel “there” in the same way the real-time image generation did.
As you can tell by now, I really liked that first demo. It just felt like the tech had finally arrived. And the fact that it all runs locally, without the added cost of servers or the privacy concerns of sending requests online, is what makes it genuinely practical to me.