Intelligent CPP generation in 2026: why "AI made my screenshots" is the wrong question

Pablo Cabrera

AI Product Lead

Every team I'm talking to now is asking some version of the same question about App Store creative: can I just have AI generate the screenshots for my custom product pages (CPPs)?

The honest answer is yes, you can. The tools work. A model can produce a screenshot set for a brand-keyword CPP in the time it takes to write the prompt, and the result will look professional enough to ship.

The answer is also useless. Because the question is wrong.

The right question — the one a paper that came out earlier this year actually answers — isn't whether AI can make App Store creative. It's whether the creative AI makes converts at the App Store, and under what conditions. And the answer there is much more interesting than yes or no.

What the paper actually found

In AI in Disguise (Exner, Hartmann, Netzer, Zhang, and Ding, 2026), the authors partnered with a major display ad platform — Taboola — to study what happened when advertisers were given an AI image generation tool inside their normal campaign workflow. The dataset is genuinely large for this kind of work: over 16 billion impressions, 116 million clicks, 4,633 ads in the quasi-experimental sample, all real money from real advertisers.

The headline finding looks anticlimactic at first. In the careful within-campaign comparison, AI-generated ads did not perform significantly differently from human-made ads: on average, there was no statistically significant difference in CTR. The AI didn't win, but it didn't lose either, and it cost the platform roughly two cents per generation request to produce.

That's the boring version. The interesting version is what they found when they introduced a second variable: whether the ad looked AI-generated to human raters.

When AI-generated ads didn't look AI to humans, they significantly outperformed human-made ads. When they did look AI, they significantly underperformed. The average masked a sharp split. And the split wasn't random — it tracked specific visual features that humans, consciously or not, code as "this was made by a machine."
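A stylized way to see how a flat average can hide that split is to treat the pooled AI-vs-human CTR difference as a mix of the two groups. The notation is introduced here for illustration and is not the paper's, and the paper's within-campaign estimate is not literally this weighted average. Let p be the share of AI-generated ads that raters code as AI-made, Δ_AI the CTR difference versus human-made ads for that group, and Δ_H the difference for AI-generated ads coded as human-made.

```latex
\begin{aligned}
\bar{\Delta} &\approx p\,\Delta_{\mathrm{AI}} + (1 - p)\,\Delta_{\mathrm{H}}, \qquad \Delta_{\mathrm{AI}} < 0 < \Delta_{\mathrm{H}} \\
\bar{\Delta} &\approx 0 \quad \text{when} \quad p\,\lvert \Delta_{\mathrm{AI}} \rvert \approx (1 - p)\,\Delta_{\mathrm{H}}
\end{aligned}
```

A penalty on the ads that look generated and a lift on the ones that don't can cancel out, so the pooled number alone says little about whether a given asset should ship.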

Some of the markers consumers used to flag AI were correct: high aesthetic polish, saturated color, lack of warmth. Others were exactly backwards: AI-generated images with larger faces and higher clarity were more likely to be coded as human-made, even though those features are statistically more common in AI output. The AI, in other words, can disguise itself when it produces images with the right features.

The implication is uncomfortable for any "just have AI make your creative" pitch. Generation is necessary. It's not sufficient.

Why this matters specifically for CPPs

A CPP isn't a display ad, but the perceptual mechanics are similar enough that the finding plausibly transfers. When a user taps a brand keyword and lands on your custom product page, the screenshots and tagline do their work in the same first-impression window the Taboola ads were measured on. If those screenshots read as "this app is from a real company that took the time," they convert. If they read as "this is a generated asset someone slapped together," they don't.

Three things make CPPs a sharper version of the same problem:

The volume requirement is higher than display. A display campaign can run two or three creative variants and learn. An Apple Search Ads (ASA) program optimizing properly across branded, generic, and competitor keyword themes needs ten to thirty CPPs in rotation just to start matching message to intent. Manual production caps out long before that volume.

The signal is slower. Display CTR reads in hours. CPP install-to-subscription performance reads in weeks. That means manual testing isn't just expensive in production; it's expensive in calendar time. Every variant you don't generate now is a test round pushed back by weeks, and in a quarter that only holds a few rounds, that time doesn't come back.

The artificiality penalty is probably larger. Consumers tapping into an App Store have higher perceived stakes than consumers scrolling a content site. They're about to install software onto their device. A screenshot that reads "made by a model" probably triggers a stronger negative signal than one that reads the same way in a feed they're scrolling past.

Put those three together and the picture clarifies. The CPP function isn't "design more screenshots." It's "produce enough creative variants to actually test, at enough quality that they don't get penalized for looking generated, attached to enough keyword themes to be allocatable, all faster than a calendar quarter."

No team I've seen at Phiture sustains that operation manually. The ones that try produce four to twelve variants a quarter and call it a program.

What intelligent CPP generation looks like

This is the part of the conversation that usually gets reduced to "AI vs human designers," and that framing misses the structure of the work. The structural change isn't one tool replacing another. It's a generation-and-screening loop where each step does what humans either can't do at volume or can't do reliably.

A modern CPP generation system has four functions. Most teams I see have one of them, sometimes two.

Variant generation at scale. Given a product, a positioning angle, and a target keyword theme, produce dozens of candidate screenshot sets and taglines. This is the part everyone fixates on because it's the visible part.
Perceptual screening. This is what the AI in Disguise paper points to as the actual lever. Before any variant ships, filter out the ones that exhibit the perceptual markers humans associate with AI: over-saturation, excessive aesthetic polish, the wrong kind of symmetry, the textures that read as generated. The paper's contribution isn't that AI can make ads. It's that some AI-made ads work and others don't, and the difference is screenable.
Keyword-theme allocation. Variants don't perform uniformly across keyword intent. A CPP that wins on branded search may lose on competitor terms. Intelligent allocation requires testing variants against keyword themes at a granularity that manual operation can't reach — which is why the allocation function has to live in software, not in a Monday meeting.
Performance learning. As variants run, the system needs to record which variants won on which keyword themes, attach that signal to the theme and the intent behind it, and feed it back into both allocation and the next round of generation. This closes the loop. Without it, you're producing variants, not running a program.
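The four functions above can be sketched as one small loop. This is a minimal illustration under stated assumptions, not Catchbase's implementation: `generate_variants` is a hypothetical stand-in for a generative model that only emits random perceptual scores; the `saturation`, `polish`, and `warmth` scores and their thresholds are illustrative assumptions (a real screen would score assets with a classifier or a rater panel); and allocation uses Thompson sampling over each variant's tap-to-install rate, which is one reasonable bandit choice among several.

```python
import random
from dataclasses import dataclass


@dataclass
class Variant:
    """One candidate CPP: a screenshot set plus tagline aimed at a keyword theme."""
    variant_id: str
    theme: str
    # Hypothetical perceptual scores in [0, 1]. A real system would get these from
    # a trained classifier or a human rating panel; this sketch just assumes them.
    saturation: float
    polish: float
    warmth: float
    taps: int = 0
    installs: int = 0


def generate_variants(theme: str, n: int, rng: random.Random) -> list[Variant]:
    """Hypothetical stand-in for the generative model: random perceptual scores only."""
    return [
        Variant(f"{theme}-{i}", theme, rng.random(), rng.random(), rng.random())
        for i in range(n)
    ]


def perceptual_screen(variants: list[Variant], max_saturation: float = 0.7,
                      max_polish: float = 0.8, min_warmth: float = 0.3) -> list[Variant]:
    """Drop variants showing 'looks AI-made' markers (high saturation, high polish,
    low warmth). Thresholds are illustrative assumptions, not values from the paper."""
    return [
        v for v in variants
        if v.saturation <= max_saturation
        and v.polish <= max_polish
        and v.warmth >= min_warmth
    ]


def allocate(variants: list[Variant], rng: random.Random) -> Variant:
    """Thompson sampling: draw a plausible install rate from each variant's
    Beta posterior and serve the variant with the highest draw."""
    return max(variants,
               key=lambda v: rng.betavariate(1 + v.installs, 1 + v.taps - v.installs))


def record(variant: Variant, taps: int, installs: int) -> None:
    """Performance learning: fold observed taps and installs back into the posterior."""
    variant.taps += taps
    variant.installs += installs


def winners_by_theme(variants: list[Variant]) -> dict[str, str]:
    """Best observed install rate per theme, the signal that would seed the next
    generation brief. Variants with no taps yet are skipped."""
    best: dict[str, tuple[str, float]] = {}
    for v in variants:
        if v.taps == 0:
            continue
        rate = v.installs / v.taps
        if v.theme not in best or rate > best[v.theme][1]:
            best[v.theme] = (v.variant_id, rate)
    return {theme: vid for theme, (vid, _) in best.items()}
```

In use, each round would generate per theme, screen, serve through `allocate`, and call `record` as results come in; `winners_by_theme` is the piece that closes the loop by telling the next round which angle won where. Keeping allocation per theme reflects the point above that a variant winning on branded search may lose on competitor terms.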

Every one of those functions can be done manually. None of them can be done manually at the volume an ASA program above $50k a month actually requires. That's not an opinion. That's just the math of designer hours times calendar weeks.
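The "designer hours times calendar weeks" math can be made explicit. Every input below is a hypothetical placeholder, not a Phiture benchmark: ten keyword themes with three variants each (thirty CPPs in rotation, the top of the range mentioned earlier), a four-week read per test round, fresh variants for every round, 16 hours of design and review per screenshot set, and 30 productive design hours per designer-week. The function names are illustrative.

```python
WEEKS_PER_QUARTER = 13


def rounds_per_quarter(read_weeks: int) -> int:
    """Sequential test rounds that fit in one quarter when each round
    needs read_weeks of install-to-subscription signal before a decision."""
    return WEEKS_PER_QUARTER // read_weeks


def variants_required(themes: int, variants_per_theme: int, read_weeks: int) -> int:
    """Fresh variants needed to run every round of the quarter on every theme."""
    return themes * variants_per_theme * rounds_per_quarter(read_weeks)


def designer_weeks_required(variants: int, hours_per_variant: float,
                            hours_per_week: float) -> float:
    """Production effort, in designer-weeks, to make that many variants by hand."""
    return variants * hours_per_variant / hours_per_week


if __name__ == "__main__":
    # Every input below is a hypothetical placeholder; substitute your own numbers.
    needed = variants_required(themes=10, variants_per_theme=3, read_weeks=4)
    effort = designer_weeks_required(needed, hours_per_variant=16, hours_per_week=30)
    print(f"{needed} variants, {effort:.1f} designer-weeks")  # prints "90 variants, 48.0 designer-weeks"
```

Under those assumptions a quarter needs 90 variants and 48 designer-weeks, roughly 3.7 designers doing nothing else for thirteen weeks. Change the inputs and the conclusion moves with them, which is the point of writing it down.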

What this changes for the operator

If you run ASA at scale — in-house, agency, or consultancy — the audit prompt for CPPs is concrete. Take ten minutes and answer four questions about your last quarter's CPP testing program:

How many distinct CPP variants did you actually run? (Not planned. Ran.)
Of those, how many were tested against more than one keyword theme?
How many quarters of calendar time did it take to gather enough signal to make a decision?
Of the variants that won, can you point to the perceptual reason they won — or do you just know they did?

If the answers are some version of "fewer than fifteen," "a handful," "more than one," and "not really" — that's not a creative quality problem. That's a volume and feedback problem, and it's the problem that intelligent CPP generation is built to solve.
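The audit can be written down as a checklist so the same thresholds get applied every quarter. The "fewer than fifteen" and "more than one" cut-offs come from the text above; treating "a handful" as five or fewer is an assumption, and `QuarterAudit`, `warning_signs`, and `is_volume_and_feedback_problem` are hypothetical names.

```python
from dataclasses import dataclass

# "A handful" is not defined in the text; five or fewer is an assumption.
HANDFUL = 5


@dataclass
class QuarterAudit:
    variants_run: int            # distinct CPP variants that actually ran (not planned)
    multi_theme_variants: int    # variants tested against more than one keyword theme
    quarters_to_decision: float  # calendar quarters needed to gather enough signal
    can_explain_winners: bool    # can you name the perceptual reason the winners won?


def warning_signs(a: QuarterAudit) -> list[str]:
    """Return which of the four audit answers match the problem pattern."""
    signs = []
    if a.variants_run < 15:
        signs.append("low volume")
    if a.multi_theme_variants <= HANDFUL:
        signs.append("little cross-theme testing")
    if a.quarters_to_decision > 1:
        signs.append("slow signal")
    if not a.can_explain_winners:
        signs.append("no perceptual learning")
    return signs


def is_volume_and_feedback_problem(a: QuarterAudit) -> bool:
    """True when all four answers match, per the pattern described above."""
    return len(warning_signs(a)) == 4
```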

The advisors I've spoken with over the last six months are starting to ask their clients about CPP testing programs the same way they used to ask about bid management three years ago: not "do you have one," but "what's the cadence, what's the volume, and what's making the variant decisions when nobody's in the room."

Catchbase's CPP generation feature exists inside that frame. It's not "AI makes your screenshots so you don't have to brief a designer." It's the four-function loop above, with the perceptual screening step built around the kind of finding AI in Disguise identified: generate at scale, filter for what doesn't look generated, allocate against keyword themes, learn from what wins. The product question to ask any tool in this category, ours included, is which of those four functions it actually does, and which ones it only claims to do.

The wider lesson of the Exner et al. paper isn't that generative AI works for marketing creative. It's that the generation is the cheap part. The expensive part — the part that determines whether the creative is worth shipping — is what happens between the model finishing and the asset going live. For CPPs in 2026, that gap is where the program lives or dies.
