Creative Intelligence
5,400 ads, integrated with an LLM
DTC Shopify Subscription Brand
Client Context
Before the system existed
The growth team was running paid social at modern DTC scale: hundreds of variants live at any time, thousands cumulative, and creative fatigue measured in days. They knew certain patterns won. They had no way to query which.
The setup was the usual one. A naming convention nobody fully followed. A spreadsheet of roughly 80 tracked winners the analyst kept current by hand. A senior strategist who remembered what hit in past tests. The naming convention drifted. The spreadsheet went stale every week. The strategist’s memory topped out long before 5,400 assets.
We knew some patterns won. We didn’t have a way to query which.
The Challenge
Why this couldn't be solved by more people or an off-the-shelf tool
Meta has been blunt about the 2026 landscape: creative quality and strategy drive about 70% of campaign performance now that targeting is mostly automated. CACs across the category are up 25–40% year over year. The one input still under your control is which creative patterns work, and that input is only as queryable as your tagging.
Manual tagging breaks down fast. Two analysts label the same ad differently. A single analyst labels the same ad differently in week 1 versus week 8. And the granularity that actually moves spend, “fast cuts, male voiceover, UGC hook” versus “static carousel, value-prop opener”, is the first thing dropped when deadlines hit.
The off-the-shelf answer is a creative intelligence platform like Segwise or Motion. They’re well-built. We built in-house for two reasons specific to this brand. First, we wanted full control of the descriptor axes: 14 visual dimensions (among them pacing, color treatment, hook style, scene count, on-screen text density, dominant subject, lighting, and framing) plus 8 audio dimensions (voiceover gender, music genre, audio energy, dialogue density, music intensity, vocal cadence, silence ratio, and sonic branding), each kept orthogonal so the queries compose cleanly. Second, we wanted the whole corpus exposed to an LLM the team could ask in plain English, with the LLM grounded in the actual contents of the ads (including the video), not just the copy and the performance numbers. The Slack surface is just where the team happens to type the question.
Build vs. buy
What orthogonal descriptors means
Every dimension carries independent signal: fast pacing and short scene count aren’t double-counted as the same thing. When the team asks “top patterns by CTR,” each result tells them something new instead of repackaging the same insight under three different labels.
The Approach
The system, in plain English
The pipeline ingests every ad the brand has run on Meta in the last 18 months: 5,400 assets and counting. Gemini multimodal watches each ad frame by frame and extracts the 14 visual descriptors. Whisper transcribes audio, and a deterministic post-processor maps the raw output onto the 8 audio descriptors. Every value is bounded. Every output is auditable. Critically, the parsed video and audio context lives alongside the descriptors, so when the LLM answers, it has the actual contents of the ad to draw on, not just labels.
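To make "bounded" and "deterministic" concrete, here is a minimal sketch of what such a post-processor could look like for three of the audio descriptors. The enum members, the synonym tables, and the `normalize_audio` function are illustrative assumptions, not the production schema; the point is only that free-form model output gets mapped onto a closed set of values, with anything unrecognized landing in an explicit `UNKNOWN` bucket.

```python
from dataclasses import dataclass
from enum import Enum


class VoiceoverGender(Enum):
    MALE = "male"
    FEMALE = "female"
    NONE = "none"
    UNKNOWN = "unknown"


class AudioEnergy(Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"
    UNKNOWN = "unknown"


# Hypothetical synonym tables: raw labels a model might emit -> schema value.
_ENERGY_SYNONYMS = {
    "low": AudioEnergy.LOW, "calm": AudioEnergy.LOW,
    "mid": AudioEnergy.MID, "medium": AudioEnergy.MID,
    "high": AudioEnergy.HIGH, "energetic": AudioEnergy.HIGH,
}
_GENDER_SYNONYMS = {
    "male": VoiceoverGender.MALE, "man": VoiceoverGender.MALE,
    "female": VoiceoverGender.FEMALE, "woman": VoiceoverGender.FEMALE,
    "none": VoiceoverGender.NONE, "no voiceover": VoiceoverGender.NONE,
}


@dataclass(frozen=True)
class AudioDescriptors:
    voiceover_gender: VoiceoverGender
    audio_energy: AudioEnergy
    silence_ratio: float  # always clamped to [0.0, 1.0]


def _key(raw: dict, field: str) -> str:
    """Normalize a raw label so lookups ignore case and stray whitespace."""
    return str(raw.get(field, "")).strip().lower()


def normalize_audio(raw: dict) -> AudioDescriptors:
    """Map raw extraction output onto the bounded schema. Same input, same output."""
    gender = _GENDER_SYNONYMS.get(_key(raw, "voiceover_gender"), VoiceoverGender.UNKNOWN)
    energy = _ENERGY_SYNONYMS.get(_key(raw, "audio_energy"), AudioEnergy.UNKNOWN)
    try:
        silence = float(raw.get("silence_ratio", 0.0))
    except (TypeError, ValueError):
        silence = 0.0  # unparseable values fall back to a safe default
    return AudioDescriptors(gender, energy, min(1.0, max(0.0, silence)))
```

Because the mapping is a plain lookup rather than a second model call, re-running it over the same raw output always yields the same tags, which is what makes an audit trail possible.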
The Meta Ads API supplies the performance data, which is joined to each asset by creative ID. The corpus (descriptors, ad contents, and performance data) is exposed to an LLM the growth team queries in plain English. Today that happens through Slack, but the integration is the LLM, not the chat surface. Typical questions: “which hook styles are driving top-quartile CTR with the 25–34 cohort this month?”, “how does pacing correlate with conversion rate on the lipid SKU?”, “which audio energy levels are fatiguing fastest in the last 14 days?”
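Under the plain-English question sits an ordinary join and aggregation. The sketch below shows the shape of it with in-memory data: descriptors keyed by creative ID, performance rows carrying the same ID, and a ranking of one descriptor dimension by pooled CTR. The field names (`creative_id`, `impressions`, `clicks`, `hook_style`) and the `ctr_by_descriptor` helper are assumptions for illustration and are not taken from the Meta API or the production system.

```python
from collections import defaultdict


def ctr_by_descriptor(descriptors, performance, dimension):
    """Rank the values of one descriptor dimension by pooled click-through rate.

    descriptors: {creative_id: {dimension_name: value}}
    performance: iterable of dicts with creative_id, impressions, clicks
    """
    totals = defaultdict(lambda: [0, 0])  # value -> [clicks, impressions]
    for row in performance:
        tags = descriptors.get(row["creative_id"])
        if tags is None or dimension not in tags:
            continue  # performance row with no tagged asset: skip, don't guess
        bucket = totals[tags[dimension]]
        bucket[0] += row["clicks"]
        bucket[1] += row["impressions"]
    ranked = [
        (value, clicks / impressions)
        for value, (clicks, impressions) in totals.items()
        if impressions > 0
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


descriptors = {
    "a1": {"hook_style": "ugc"},
    "a2": {"hook_style": "value_prop"},
    "a3": {"hook_style": "ugc"},
}
performance = [
    {"creative_id": "a1", "impressions": 1000, "clicks": 30},
    {"creative_id": "a2", "impressions": 2000, "clicks": 20},
    {"creative_id": "a3", "impressions": 1000, "clicks": 10},
    {"creative_id": "zz", "impressions": 500, "clicks": 5},  # untagged, ignored
]
ranking = ctr_by_descriptor(descriptors, performance, "hook_style")
```

Pooling clicks and impressions before dividing, rather than averaging per-ad CTRs, keeps a handful of low-impression ads from dominating the ranking.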
System architecture
The parsing wasn’t the hard part. Multimodal models handle visual and audio extraction well now. The hard part was the schema. Every dimension added multiplies the query surface, but only if the dimensions stay independent. The first phase was pruning overlapping descriptors before scaling the pipeline, because a brittle schema would have poisoned every downstream query.
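One way to test whether two categorical descriptors overlap is to measure their association across the tagged corpus and flag pairs that move together. The sketch below uses Cramér's V, a standard chi-squared-based measure that runs from 0 (independent) to 1 (one descriptor fully determines the other). It is an illustration of the idea, not necessarily the check used in this build, and the 0.6 threshold and the `overlapping_pairs` helper are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical labels."""
    if len(xs) != len(ys):
        raise ValueError("descriptor columns must have the same length")
    n = len(xs)
    rows, cols, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(rows), len(cols))
    if n == 0 or k < 2:
        return 0.0  # a constant column carries no association signal
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def overlapping_pairs(columns, threshold=0.6):
    """Return (name_a, name_b, v) for descriptor pairs at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        v = cramers_v(a, b)
        if v >= threshold:
            flagged.append((name_a, name_b, v))
    return flagged


# Toy corpus of four ads: pacing and scene_count move together,
# voiceover_gender is independent of both.
columns = {
    "pacing": ["fast", "fast", "slow", "slow"],
    "scene_count": ["many", "many", "few", "few"],
    "voiceover_gender": ["male", "female", "male", "female"],
}
flagged = overlapping_pairs(columns)
```

A flagged pair is a prompt for a schema decision (merge, redefine, or drop one axis), not an automatic deletion.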
Integrations
How it works for the team
Creative strategist asks: “Which hook styles are working with the 25–34 cohort right now?”
→ Ranked list of top hook descriptors with five sample thumbnails and the underlying CTR percentiles, sourced from the live pipeline.
Growth lead asks: “Are our high-energy audio ads fatiguing faster than mid-energy this month?”
→ Side-by-side day-over-day CTR decay curves grouped by audio energy, flagging the cohort that’s dropping fastest.
Performance creative asks: “What color treatments are converting on female 25–34 in the last 14 days?”
→ Color treatment performance breakdown for the cohort with the top five sample assets and their ROAS surfaced inline.
The Results
What changed after launch
5,400 ads parsed, tagged, and joined to performance data, and the whole corpus exposed to an LLM. The team’s “what’s working?” question went from a half-day analyst project to a plain-English query, with the answer grounded in the actual ad contents (including video) rather than just labels or copy. The deeper win is fatigue: the pipeline now flags decaying descriptor combinations roughly 3 days before performance crashes, where the team used to catch fatigue only in hindsight. That’s enough lead time to write the next brief before the spend gets ugly.
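The case study does not spell out the detection rule, so the following is only a hedged sketch of one simple way to catch decay early: fit a least-squares line to the trailing daily CTR of a descriptor combination and flag it when the slope, relative to the mean, falls below a threshold. The 7-day window, the 3% per-day threshold, and the function names are all assumptions.

```python
def ctr_slope(daily_ctr):
    """Least-squares slope of CTR against day index (0, 1, 2, ...)."""
    n = len(daily_ctr)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_ctr) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(daily_ctr))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def is_fatiguing(daily_ctr, window=7, max_relative_drop_per_day=0.03):
    """Flag a series whose recent CTR falls faster than the allowed daily rate."""
    recent = daily_ctr[-window:]
    if len(recent) < 3:
        return False  # too little history to call a trend
    mean = sum(recent) / len(recent)
    if mean <= 0:
        return False
    return ctr_slope(recent) / mean <= -max_relative_drop_per_day


# Pooled daily CTR for two hypothetical descriptor combinations.
decaying = [0.030, 0.029, 0.027, 0.025, 0.022, 0.020, 0.018]
stable = [0.020, 0.021, 0.020, 0.019, 0.020, 0.021, 0.020]
```

Here `is_fatiguing(decaying)` is true and `is_fatiguing(stable)` is false. Running the check per descriptor combination rather than per asset is what lets a pattern be flagged even when each individual ad has too little data to show a trend.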
The pipeline is also institutional memory. When a new creative strategist joins, they don’t have to inherit the previous strategist’s pattern intuition; they ask the LLM. Knowledge that used to live in one person’s head now lives in an LLM-queryable index every member of the team can hit.
| Metric | Before | After |
|---|---|---|
| Time to answer “what’s working?” | Half-day analyst project | Plain-English question to the LLM, sourced answer in seconds |
| Creative pattern coverage | ~80 tracked winners in the analyst’s spreadsheet | 5,400 assets across all 22 dimensions |
| Tagging consistency | Analyst-dependent, drifts week to week | Deterministic, schema-locked, auditable |
| Fatigue detection | Lagging, caught after creatives crash | ~3 days ahead of crash, descriptor-level |
Outcome
5,400 ads. Integrated with an LLM. Orthogonal by design.
Takeaway
Is this the bottleneck you have?
If you’re running paid social at scale and your creative analysis lives in spreadsheets, or in a category tool that hides the descriptor schema from you, you have a hard ceiling on how much you can learn from your own tests. The ceiling isn’t the data. The ceiling is the schema.
Building in-house instead of buying makes sense when you want to own the descriptor axes, want the corpus integrated with an LLM that has access to the actual ad contents (not just labels and copy), and want the orthogonality guarantee, so each query result tells you something new. If you don’t need those, buy Segwise. If you do, this build pays for itself the first time it flags a fatiguing pattern before spend craters.
You probably have this bottleneck if…
- You run 500+ creative variants per month on Meta
- Your creative analysis happens in spreadsheets
- A senior strategist is your institutional memory
- You tried Segwise or Motion and bounced off the schema
- You need fatigue detection at the descriptor level, not the asset level
FAQ
Common questions
Why not just use Segwise, Motion, or another creative intelligence platform?
They’re well-built and the right answer for most brands. Custom is the right answer when you want the descriptor axes orthogonal so queries compose cleanly, when you want the schema editable in-house as your creative strategy evolves, and when you want the corpus exposed to an LLM that has the actual ad contents (including video) in context, so answers are grounded in what the ad is, not just what the labels and performance numbers say.
What does “orthogonal descriptors” actually mean?
Each dimension carries independent signal. “Visual pacing” and “scene count” are not orthogonal: they encode similar production cues, and you’d double-count the same pattern under two labels. “Visual pacing” and “audio energy” are orthogonal: different sensory channels, no overlap. We designed all 22 dimensions so the team’s queries stop rediscovering the same insight in disguise.
How do you parse video and audio descriptors at scale?
Gemini multimodal handles visual extraction frame by frame. Whisper transcribes audio. A deterministic post-processor maps the raw model output onto the bounded 22-dimension schema so every value is auditable and consistent across the 5,400-ad corpus.
Can it handle TikTok, YouTube Shorts, and other channels?
Yes. The parser is channel-agnostic. What changes per channel is the performance data join: Meta Ads API for Meta, TikTok Marketing API for TikTok, and so on. We built the Meta version first because that’s where the brand’s spend was concentrated.
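A rough sketch of that separation: the descriptor store stays the same, and each channel plugs in behind one small interface that yields performance rows. Everything here is hypothetical (`PerformanceSource`, `InMemorySource`, `join_performance`), and the stand-in source makes no network calls; a real connector would wrap the relevant ads API behind the same `fetch` method.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class PerformanceRow:
    channel: str
    creative_id: str
    impressions: int
    clicks: int


class PerformanceSource(Protocol):
    """The only per-channel contract: yield rows keyed by creative ID."""
    channel: str

    def fetch(self) -> Iterable[PerformanceRow]: ...


class InMemorySource:
    """Stand-in for a real connector; it only replays rows it was given."""

    def __init__(self, channel, rows):
        self.channel = channel
        self._rows = rows  # list of (creative_id, impressions, clicks)

    def fetch(self):
        return [
            PerformanceRow(self.channel, creative_id, impressions, clicks)
            for creative_id, impressions, clicks in self._rows
        ]


def join_performance(descriptors, sources):
    """Pair each performance row with its descriptors, whatever the channel.

    descriptors: {(channel, creative_id): {dimension_name: value}}
    """
    joined = []
    for source in sources:
        for row in source.fetch():
            tags = descriptors.get((row.channel, row.creative_id))
            if tags is not None:
                joined.append((row, tags))
    return joined


descriptors = {
    ("meta", "a1"): {"hook_style": "ugc"},
    ("tiktok", "t9"): {"hook_style": "value_prop"},
}
sources = [
    InMemorySource("meta", [("a1", 1000, 30), ("a7", 400, 2)]),
    InMemorySource("tiktok", [("t9", 800, 12)]),
]
joined = join_performance(descriptors, sources)
```

Keying descriptors by `(channel, creative_id)` avoids collisions when two platforms happen to issue the same ID.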
What’s the build timeline?
Pilot in about a week. Schema design is the long pole; the parsing pipeline itself ships faster than people expect. Volume and how deep you go on the descriptor axes are the two variables that move the full-deployment timeline.
Related systems we’ve built
Central Queryable Knowledge Base for Customer Service
How a 30+ Person Customer Service Team Cut Query Resolution Time by 28% with a Central, Queryable Knowledge Base
Read case study →
Unified Customer View via MCP Server
How We Replaced a Data Analyst With a Data Analyst: Shopify, Gorgias, Recharge, and Multiple Databases Unified Into One Queryable System
Read case study →
Have this bottleneck? Let’s map your version of this system.
One call. A concrete roadmap, whether you build with us or not.