GPT-4o’s New Image Tool: Gorgeous, Slow, and Slightly Less Weird
OpenAI’s latest image update is powerful—but not without trade-offs. Here’s the good, the “meh,” and what to do with it.
So… GPT-4o can now generate images natively, and it isn’t DALL·E.
These aren’t just better images: they’re more realistic, more consistent, and finally, faces don’t look like they’ve been through a blender. The photorealism is impressive. You can now generate high-res visuals directly inside ChatGPT, with stronger composition, more accurate proportions, and better stylistic control.
Sounds amazing, right?
Well… yes and no.
Unlike DALL·E 3, which could get wild and creative with a well-crafted prompt, GPT-4o tends to stick closer to the script. You’ll need to get really specific if you want it to step outside its polished comfort zone.
And here’s the kicker: it’s not faster. In fact, it’s way slower. What used to take 10 seconds can now drag past a full minute. So while the images are sharper and more polished, they also require a bit of a wait—and a whole lot more patience.
Also worth noting: image files are now PNG instead of WEBP (goodbye surprise file conversions), and many users are reporting more muted, moodier color palettes. So far, it feels a little more "studio photo shoot" than "wild creative playground."
Oh—and if you’re using a Custom GPT? You’re still working with DALL·E 3 under the hood. Which might actually be a good thing if you lean fantasy, sci-fi, or any kind of imaginative style. GPT-4o is less whimsical, less weird, and less inclined to play.
How GPT and DALL·E Think Differently
The key difference between models like GPT and DALL·E? It’s all about how they create things. GPT builds text one word at a time, like it's typing out a sentence. DALL·E, on the other hand, starts with a mess of noise and gradually paints it into an image—like magic developing on a Polaroid.
Autoregressive Model (e.g., GPT – “Next Token Prediction”)
How it works: Generates content one token (word or part of a word) at a time, predicting the next token based on everything it’s already produced.
Used for: Text generation, and some text-based tasks (like code, chat, or story writing).
Think of it like: Writing a sentence by choosing the next best word based on the ones that came before.
Pros: Excellent at language fluency, reasoning, and long-form generation.
Cons: Struggles with spatial relationships, images, or anything that requires “seeing” the whole layout at once.
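The "next best word" loop above can be sketched in a few lines. To be clear, this is a toy illustration, not OpenAI's actual model: a hand-coded bigram table stands in for billions of learned weights, but the generation loop has the same shape—each prediction depends only on what's already been produced.

```python
# Toy autoregressive generation: pick the next token based solely on
# the tokens that came before. A real model learns these probabilities;
# here a tiny hard-coded bigram table stands in for the learned weights.
BIGRAMS = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def generate(max_tokens=10):
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        nxt = BIGRAMS.get(current, "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
        current = nxt  # the output so far conditions the next prediction
    return " ".join(tokens)

print(generate())  # -> "the cat sat down"
```

Notice there's no global plan: the model commits to each token one at a time, which is exactly why this architecture is fluent with language but has a harder time "seeing" a whole image layout at once.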
Diffusion Model (e.g., DALL·E)
How it works: Starts with random noise and gradually refines it over many steps to form a coherent image, guided by a prompt.
Used for: Image generation.
Think of it like: Watching a Polaroid photo slowly come into focus—or painting over static until it becomes a picture.
Pros: Great for high-fidelity visuals, image realism, and texture detail.
Cons: Can be computationally heavy, slow to generate, and requires many steps to reach the final result.
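The Polaroid-coming-into-focus idea can also be sketched as code. Again, this is a minimal illustration, not DALL·E's actual algorithm: a real diffusion model uses a trained neural network to predict and subtract noise at every step, while this sketch cheats by nudging pixels toward a known target. What it does show is the many-small-steps structure that makes diffusion slow.

```python
import random

# Toy diffusion: start from pure noise and refine over many small steps.
# A real model runs a trained denoising network at each step; this
# stand-in simply moves each "pixel" a fraction closer to a target image.
TARGET = [0.0, 0.25, 0.5, 0.75, 1.0]  # a tiny 1-D "image"

def denoise(steps=50, step_size=0.2, seed=42):
    rng = random.Random(seed)
    image = [rng.random() for _ in TARGET]  # step 0: pure random noise
    for _ in range(steps):
        # Each pass removes a little noise, like a photo slowly developing.
        image = [px + step_size * (t - px) for px, t in zip(image, TARGET)]
    return image

result = denoise()
print([round(px, 3) for px in result])  # very close to TARGET after 50 steps
```

Every pixel is refined on every step, and it takes many steps to converge—which is the source of the "computationally heavy, slow to generate" downside listed above.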
GPT-4o Image: Better in Some Ways, Worse in Others
Now that you know how GPT and DALL·E think differently under the hood, let’s talk about what that actually looks like in practice—because GPT-4o's new image powers come with some shiny upgrades... and a few quirks that might have you side-eyeing your screen. Here's the good, the weird, and the stuff that still needs work.
But first, some examples! I took images originally created in DALL·E 3 and redid them with the same prompts in GPT-4o. Have you tried this?
Consistency: A Blessing and a Buzzkill
GPT-4o is way more consistent than DALL·E 3. That’s great for brand style and repeatable outputs—but it also means your prompts often produce nearly identical results, even when you’re trying to shake things up.
Memory That (Sometimes) Works Against You
GPT-4o likes to "remember" what it just made. That can help keep a visual series consistent—but it’s also why that random detail from image #1 might show up again in image #3. If you want a truly fresh start, you’ll need to open a new browser window each time. Yes, really.
It’s More Photo-Realistic Now
The upside? You get more believable people, better lighting, and fewer mutant limbs. The downside? It's traded in some of the playful, imaginative edge that DALL·E 3 had. Expect polished portraits, not dreamlike chaos.
Artistic Style Defaults
Where DALL·E 3 might default to airbrushed fantasy art, GPT-4o leans painterly. Think soft brush textures and traditional art vibes. If you're into that, cool. If you're not? It can feel a little flat.
The Downside of All That Photo Data
All that realism came with a side of real-world flaws: grainy images, unnatural sharpness, and digital photo artifacts. And no, you can’t prompt your way out of them.
The Weird Stuff It Still Gets Wrong
Text in images: Ask for a scenic view and get “Sunrise Murray 9” written across the sky? Yeah, that still happens.
Orientation issues: You say “portrait.” It hears “landscape.” Be very specific—or prepare to crop.
Math + geometry fails: GPT-4o may crush photorealistic portraits, but when it comes to geometric precision? It’s still a little… creative. Think impossible crystal structures, warped lattices, and spatial logic that defies physics. Great for dreamscapes—not so much for teaching chemistry.
Want to Experiment?
Try these prompts with GPT-4o (and maybe DALL·E 3 for comparison):
Photorealistic brand portrait: “A female founder standing in front of a modern ecommerce dashboard, natural lighting, soft focus.”
AI + workspace aesthetic: “A robot writing content at a cluttered desk covered in sticky notes, monitors, and coffee cups—digital painting.”
Fantasy creative challenge: “A surreal dreamscape of an abandoned bookstore floating in space, stars visible through broken ceiling beams.”
Play with style modifiers like hyper-detailed, cinematic lighting, or digital illustration—and see where each model shines (or flops).
Wrapping up
GPT-4o is impressive. It really is. But it's also a clear case of "you gain some, you lose some." If you want sleek, production-ready visuals? You're in the right place. If you're chasing creativity, chaos, and that slightly unhinged AI magic? DALL·E 3 still hits different.
Here’s the takeaway: you don’t need to choose just one. Use GPT-4o when you need polish. Use DALL·E 3 when you need surprise. Trade-offs are the price of progress—and knowing when to switch tools is half the game.
P.S. If you're looking to build custom GPTs, integrate smarter AI workflows, or stop manually doing things AI can handle in 12 seconds flat—I help entrepreneurs and digital creators do just that.
Let's chat about what’s possible.
Until next week, keep Creating Smarter!
Lisa