This is The Stepback, a weekly newsletter that breaks down one defining story from the constantly shifting landscape of technology. Each edition digs into a single theme or development that reveals something deeper about the digital world we inhabit. If you're especially interested in the interplay between smartphones, cameras, and the growing universe of digital imagery, both authentic and artificially generated, Allison Johnson is your guide. The Stepback lands in subscribers' inboxes every week at 8 AM Eastern Time, a chance to pause and gain perspective before the news cycle speeds up again. If you'd like to join in, you can subscribe to The Stepback.
Cast your mind back to the early days of AI image generation, a period defined by novelty, humor, and experimentation, when the results of our prompts often bordered on the absurd. We laughed, perhaps with a touch of disbelief, at images of people with far too many fingers, limbs twisted like rubber, and odd distortions that made them easy to spot as artificial. Those obvious flaws were part of the charm. But for anyone who hasn't kept up with the field's rapid progress, the news may come as a shock: the comedy is over. The latest generation of AI image generators produces images so convincing that their fabricated nature is alarmingly difficult to detect. Strangely enough, one of the innovations driving this newfound realism is counterintuitive: making images look slightly worse.
It’s astonishing to realize that OpenAI’s image generator, DALL·E, entered public consciousness less than five years ago. In its earliest form, the model could conjure only small, 256-by-256-pixel thumbnails—tiny, pixelated windows into AI’s artistic imagination. Then came DALL·E 2 roughly a year later, representing a dramatic leap forward. The jump to 1024-by-1024 resolution brought with it a startling degree of realism. Still, there were unmistakable artifacts—visual tells that betrayed even the most sophisticated forgeries.
When Casey Newton tested DALL·E 2 during its beta phase, he prompted the system to create an image of a shiba inu dressed as a firefighter. The resulting picture was amusingly competent; at a casual glance, it might even fool an unsuspecting observer. Yet, closer inspection exposed the imperfections: the fur’s texture dissolved into fuzziness, the patch on the dog’s petite coat contained nonsensical scribbles rather than legible text, and an oddly bulky collar tag dangled awkwardly from its neck. Elsewhere in that same article, a batch of cinnamon rolls endowed with eyes somehow appeared more believable—proof, perhaps, that the whimsical was easier to fake convincingly than the familiar.
Around the same time, Midjourney and Stable Diffusion rose to prominence. These tools rapidly gained a devoted following among AI artists, digital experimenters, and, inevitably, individuals with less wholesome intentions. Iteration after iteration brought refinement—sharper detail, smoother rendering, and a growing ability to handle text and symbols accurately. However, a particular aesthetic persisted. AI-generated visuals often looked a touch too pristine, too airbrushed, radiating an unmistakable glossiness reminiscent of stylized digital portraits rather than authentic photographs. Now, however, that overpolished aesthetic is being replaced by something subtler and more convincing: an embrace of realism that deliberately tones down the technical sheen.
OpenAI may be a relative newcomer when set beside long-established technology giants like Google and Meta, but those older players have been far from complacent. In the latter half of 2025, Google unveiled a new addition to its Gemini suite, an image-generation model called Nano Banana. The model quickly captured the public imagination when users began crafting strikingly lifelike figurines of themselves, sparking a social media frenzy. My colleague Robert Hart joined the trend and discovered something remarkable: the model retained the nuances of his actual facial features and expressions with a fidelity other AI tools had failed to achieve.
The truth about AI-generated images is nuanced. These systems tend to converge on a neutral, sanitized visual midpoint; their creations can appear broadly accurate yet strangely lacking in individuality or vitality. Ask for an image of a table and the result will look like a table, not any specific one, but an average of every table the model has ever encountered, devoid of personality. The elements that give an image authenticity, a subtle asymmetry, imperfect lighting, a bit of clutter, are exactly what that averaging irons out; ironically, those imperfections are what make real images feel human. Recently, developers have begun purposefully reintroducing such flaws, even simulating the quirks of popular smartphone cameras, whose technical compromises have themselves become part of what we instinctively perceive as realism.
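To make the averaging point concrete, here's a toy sketch in Python (pure NumPy, and emphatically not how a diffusion model actually works): generate a few hundred "photos" of the same scene, each with its own noise, stray highlights, and exposure quirks, then average them. Everything the individual shots disagree on cancels out, and the character all but vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "scene": a smooth gradient standing in for the average table.
base = np.linspace(0.2, 0.8, 64 * 64).reshape(64, 64)

def one_photo():
    """One 'real photo' of the scene, with its own noise, highlight, and exposure."""
    img = base + rng.normal(0, 0.05, base.shape)        # sensor noise / fine texture
    y, x = rng.integers(8, 56, size=2)
    img[y - 3:y + 3, x - 3:x + 3] += 0.3                # an incidental highlight
    return np.clip(img * rng.uniform(0.9, 1.1), 0, 1)   # slightly different exposure

photos = np.stack([one_photo() for _ in range(500)])
mean_image = photos.mean(axis=0)

# Each photo deviates noticeably from the bland base; the average barely does.
print("quirks in one photo:  ", round(float((photos[0] - base).std()), 3))
print("quirks in the average:", round(float((mean_image - base).std()), 3))
```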
Google’s update to Nano Banana Pro, released barely a month ago, represents its most ambitious effort to date. The system now integrates improved world knowledge, enhanced textual rendering, and—most intriguingly—a visual quality that often emulates photos captured through a phone camera lens. The generated images display the telltale traits of mobile photography: selective contrast, distinctive sharpening artifacts, exposure quirks, and slight distortions of perspective typical of compact optics. The resemblance is uncanny.
Without realizing it, most of us have already internalized this aesthetic. Smartphone cameras rely on complex computational processes to offset the physical limitations of their small sensors and lenses. Through multiframe image stacking and aggressive fine-tuning, they balance exposure, accentuate sharpness, and elevate shadow detail—all optimized for display on small screens. The result is a distinctive look: high clarity, rich color, and hyperreal crispness. Curiously, Google’s latest AI generator reproduces these visual fingerprints almost perfectly, as if mimicking not only what we see but also how we have learned to perceive modern imagery.
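For a rough feel of that pipeline, here's a sketch in NumPy with made-up parameters; real camera pipelines are vastly more sophisticated, but the three moves, stacking frames to suppress noise, lifting shadows with a tone curve, and sharpening aggressively, are the gist of the look.

```python
import numpy as np

def box_blur(img, radius=2):
    """Simple separable box blur (a stand-in for the Gaussian a real pipeline would use)."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    out = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, out)

def phone_look(frames, shadow_gamma=0.6, sharpen_amount=1.5):
    """Toy approximation of a computational-photography pipeline.
    frames: list of noisy exposures of the same scene, values in [0, 1]."""
    stacked = np.mean(frames, axis=0)                        # 1. multi-frame stacking cuts noise
    toned = np.clip(stacked, 0, 1) ** shadow_gamma           # 2. gamma < 1 lifts shadow detail
    sharpened = toned + sharpen_amount * (toned - box_blur(toned))  # 3. unsharp mask boosts clarity
    return np.clip(sharpened, 0, 1)

# Eight noisy captures of a synthetic dark-to-bright scene.
rng = np.random.default_rng(1)
scene = np.tile(np.linspace(0.05, 0.9, 128), (128, 1))
frames = [np.clip(scene + rng.normal(0, 0.08, scene.shape), 0, 1) for _ in range(8)]
result = phone_look(frames)

print("noise in a single frame:", round(float((frames[0] - scene).std()), 3))
print("noise after stacking:   ", round(float((np.mean(frames, axis=0) - scene).std()), 3))
print("darkest column, scene vs result:",
      round(float(scene[:, 0].mean()), 2), "->", round(float(result[:, 0].mean()), 2))
```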
Of course, Google is not alone in steering AI imagery toward greater authenticity. Adobe’s Firefly engine includes a feature named “Visual Intensity,” giving creators control over the exaggerated glow characteristic of AI-generated art. By reducing the intensity, users can achieve results that appear more organic—images resembling those taken with professional cameras, befitting Adobe’s design-oriented clientele. Meta’s equivalent model incorporates a similar toggle, called “Stylization,” allowing users to dial the realism up or down. Meanwhile, in the realm of motion imagery, tools like OpenAI’s Sora 2 and Google’s Veo 3 are going viral for generating clips that impeccably simulate the grain and low resolution of security-camera footage—proof that when the target aesthetic is supposed to look imperfect, AI can be astonishingly convincing.
Yet it would be unwise to interpret these achievements as proof of unbounded progress. While AI-based assistants still falter at simple tasks like purchasing shoes on our behalf, image synthesis models have undergone a more dramatic and visible transformation. The improvement is tangible; one can literally see it. The visual leap between early and current systems could be described as evolution at lightning speed.
During a recent conversation with Ben Sandofsky, cofounder of the acclaimed iPhone photography app Halide, he reflected on this convergence between AI images and smartphone aesthetics. By intentionally replicating the quirks and processing style associated with mobile photography—images that already feel slightly divorced from raw reality—Google, he suggested, may have cleverly bypassed the uncanny valley altogether. AI imagery doesn’t have to perfectly reproduce reality; in fact, doing so can backfire by highlighting its artificiality. Instead, it simply needs to imitate the way we capture and remember reality—flawed, filtered, and emotionally tinted—and in doing so, it achieves a deceptive sense of authenticity. That insight raises an unsettling question: in such a world, how do we decide what to trust when we look at a photograph?
Sam Altman has articulated one perspective: that the boundary between real and AI-generated imagery will eventually dissolve completely, and society will learn not just to tolerate this fusion but to feel comfortable within it. He may be partly correct. Nevertheless, it remains difficult to imagine a future in which human beings lose all interest in distinguishing truth from fabrication. As our capacity to discern becomes blurred, we will increasingly rely on technological aids to help us separate actuality from artifice. Encouragingly, such solutions are beginning to take shape—although, frustratingly, not at the same speed as image generation itself.
One initiative leading this charge is the C2PA's Content Credentials standard, a framework gaining vital traction across the industry. Google has integrated the technology into its Pixel 10 series, where every photo taken automatically receives a cryptographic signature recording how it was produced. This approach helps counter what Pixel camera director Isaac Reynolds calls the "implied truth effect": if only AI-generated images carry labels, the absence of a tag implicitly suggests authenticity, fostering false trust. Pixel cameras instead mark all images, whether artificial or real, so that a lack of information never masquerades as proof.
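The core idea is simpler than the standard itself: bind a statement about how an image was made to the exact image bytes with a digital signature, so that tampering, or a missing credential, is detectable. Here's a minimal sketch using Ed25519 from the Python cryptography package; it mimics the concept only, not the actual C2PA manifest format.

```python
import json, hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def attach_credential(image_bytes, assertion, private_key):
    """Sign a provenance claim bound to the exact image bytes (a toy stand-in for a C2PA manifest)."""
    payload = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "assertion": assertion,  # e.g. {"captured_with": "camera", "ai_edits": []}
    }
    encoded = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "signature": private_key.sign(encoded).hex()}

def verify_credential(image_bytes, credential, public_key):
    """Check both that the pixels are unchanged and that the claim was really signed."""
    payload = credential["payload"]
    if payload["image_sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return False  # the image was altered after signing
    encoded = json.dumps(payload, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(credential["signature"]), encoded)
        return True
    except InvalidSignature:
        return False

# The device signs at capture time; anyone with the public key can verify later.
device_key = Ed25519PrivateKey.generate()
photo = b"\x89PNG...raw image bytes..."
cred = attach_credential(photo, {"captured_with": "camera", "ai_edits": []}, device_key)
print(verify_credential(photo, cred, device_key.public_key()))              # True
print(verify_credential(photo + b"edited", cred, device_key.public_key()))  # False
```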
Of course, digital labels have limited use if they remain invisible. Here too progress is emerging. Earlier this year, Google Photos introduced native support for displaying Content Credentials metadata. The company has promised that search results and advertisements will soon also display these credentials wherever they exist. Still, the system’s success depends upon broad adoption: phone manufacturers, software developers, and social-media platforms must embed these credentials at the moment of creation and maintain them through every stage of sharing. Until such standards become universal, individuals are left largely to their own discernment—making it an especially crucial time to adopt a skeptical eye toward everything we see online.
Interestingly, Google’s Pixel 10 cameras now integrate generative AI directly into their imaging pipeline, most notably through a feature known as Pro Res Zoom. This tool aims to enhance digital zoom performance by synthesizing details that would otherwise be lost—essentially reconstructing clarity from limited data. Importantly, the function currently excludes human subjects, which may be reassuring for those concerned about overreach in personal imagery.
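For contrast, here's what a purely classical digital zoom can do, sketched in Python with Pillow: crop the center of the frame and interpolate it back up. The generative step Pro Res Zoom adds on top is only gestured at in a comment, since that model isn't public.

```python
from PIL import Image

def digital_zoom(path, factor=4):
    """Classical digital zoom: crop the center of the frame and interpolate it back up.
    All this can do is spread the captured pixels thinner."""
    img = Image.open(path)
    w, h = img.size
    box = (w // 2 - w // (2 * factor), h // 2 - h // (2 * factor),
           w // 2 + w // (2 * factor), h // 2 + h // (2 * factor))
    crop = img.crop(box)
    # Bicubic interpolation only blends samples that already exist, so fine
    # texture goes soft. A generative zoom would instead hand the crop to an
    # image model that synthesizes plausible detail the sensor never recorded.
    return crop.resize((w, h), Image.BICUBIC)
```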
Meanwhile, traditional camera manufacturers are also embracing the Content Credentials standard, albeit at a measured pace. Leica's M11-P, priced at over $9,000, stands among the first to incorporate such verification, symbolizing the intersection of old-world craftsmanship with new-world accountability. In parallel, AI-enhanced editing tools within Adobe Photoshop, particularly generative fill, are empowering photographers to expand their creative range. The boundary between a purely captured image and one subtly guided by AI intervention is becoming not only blurrier but philosophically more complex.
My colleague Jess Weatherbed provided a superb overview of the C2PA initiative—an explanation that, somewhat frustratingly, remains strikingly accurate even a full year later. Wired’s in-depth conversation with Google’s Pixel team during the Pixel 9 launch illuminated the company’s evolving philosophy: treating photographs not merely as records of light but as emotional memories. Meanwhile, an investigation from Bloomberg exposed the strange and often unsettling ecosystem of creators using AI video systems like Sora 2 to mass-produce simplistic, algorithmically generated content for children on YouTube—a sobering reminder of what happens when technology outpaces ethics.
Allison Johnson continues to explore the intersection of AI, consumer technology, and imagery in her columns. The Stepback remains the place for thoughtful reflection on how innovation reshapes not just our tools, but our perception of reality itself.
Source: https://www.theverge.com/column/843883/ai-image-generators-better-worse