Operations · Data

Data analyst replaced by a data analyst

DTC Shopify Subscription Brand · DTC Shopify Subscription

Client Context

Before the system existed

The brand had outgrown the one-tool-per-team setup. CS lived in Gorgias. Ops lived in Shopify admin. Growth lived in Slack and Looker. Subscription state lived in Recharge. Several pieces lived in internal databases the engineering team had stood up over the years. When something cross-functional came up — and most of the questions worth asking were cross-functional — the truth was scattered across all of them.

Most teams in this spot do one of two things. They buy a “customer 360” platform that promises a unified view and almost never delivers, because their data isn't clean enough for the platform's assumptions. Or they accept that any non-trivial question requires a data analyst to spend a half-day stitching CSVs together. We did neither. We built the data analyst.

We didn’t replace the data analyst’s tools. We replaced the data analyst, with a data analyst the whole team can talk to.

The Challenge

Why this couldn't be solved by more people or an off-the-shelf tool

The interesting questions were all cross-functional. “How many California subscribers churned in the last 30 days after a delivery issue, and what's the LTV impact?” pulls from Shopify orders, Gorgias tickets, Recharge subscription state, and the internal ops database that tracks shipping incidents. Answering it required a data analyst to export from each system, normalize the joins by hand, reconcile the inconsistencies, and write up the result. Tens of hours, every time, for the kind of question leadership wanted weekly.

There was also a data quality problem nobody talked about. The Shopify orders dataset, as exported, wasn't categorized in a way that matched the questions the team actually asked. The team's vocabulary (“delivery-issue churn,” “price-sensitive cancel,” “paused-then-skipped”) didn't map cleanly onto Shopify's native tags. Without a unifying categorization layer, every analysis started from raw rows and rebuilt the same logic from scratch.

We tried the off-the-shelf options. Customer-360 platforms wanted to import everything into their own schema, which assumed our data was cleaner than it was. Reverse-ETL tools could sync data but didn't give the team a way to actually ask questions. BI tools answered known questions well but turned every new question back into a ticket for the data analyst. Each tool was good at one piece. None of them was a data analyst.

Build vs. buy

Why we built a data analyst instead of buying a customer-360 platform or BI tool

Customer-360 platforms force you to migrate to their schema and re-do work you've already done. BI tools answer the questions you already knew to predefine. Neither replaces the part of the analyst's job that matters: taking an ad-hoc, cross-functional question in plain English and returning a sourced answer. So we unified the data in our own Postgres and built a queryable LLM layer that does exactly that, and the data stays the brand's.

The Approach

The system, in plain English

Two pieces. First, a unified data layer: Shopify orders, Gorgias tickets, Recharge subscription state, and the brand's internal databases, all pulled into one Postgres with a categorization layer the team actually uses, damaged-shipment cancel, price-driven churn, delivery-issue refund, paused-then-skipped, about 12 categories total. ~280,000 historical orders, joined and categorized once, owned by the brand.

Second, the queryable layer: an MCP (Model Context Protocol) server that exposes the unified dataset and the source APIs as tools, with an LLM on top the team talks to in plain English. Today they ask through Slack; tomorrow they could ask through any LLM surface. The integration is the LLM, not the chat client. The result is, effectively, a data analyst on call, that pulls live data from the source systems, runs the join, applies the categorization, and writes up the answer with sources.

Source systemsShopify Admin APIorders, customersGorgias / helpdesktickets, historySubscription toolstate, churnPostgresowned orders datasetMCP Serverqueryable tool surfaceSlacknatural-language queryCS · Ops · Growth

The design choice that did the heavy lifting: the unified data layer lives in our Postgres, not a vendor's warehouse. The team's questions stay answerable in the team's vocabulary, and the categorization logic is visible and changeable without a vendor ticket. The MCP server and the LLM on top are the queryable face of that dataset, the data analyst. The dataset itself is the asset.

Integrations

SlackShopify Admin APIGorgiasRechargePostgresMCP server runtime

How it works for the team

Growth lead asks in plain English: “What's the LTV impact of damaged-shipment cancellations in the Pacific Northwest over the last 90 days?”

Two days of analyst spreadsheet work before. Now: a sourced answer in under 10 seconds, joining Shopify orders, Recharge subscription state, and the internal shipping-incident database, with the SQL and category logic visible so anyone can audit it.

CS lead asks: “How many California subscribers churned after a delivery issue in the last 30 days, and what cohort were they in?”

142 cancellations tagged delivery-issue, broken out by subscription cohort and tenure, in seconds. The kind of question that used to wait in the analyst's queue for a day.

Ops lead asks: “Pull the last 5 orders and full context for sarah.k@example.com.”

Order summary from Shopify, open and closed Gorgias tickets, current Recharge subscription state, and any related internal flags, all in one reply, ready to act on.

The Results

What changed after launch

Cross-functional analysis that used to consume tens of hours of a data analyst's time, the kind of questions leadership wanted weekly but only got monthly, became plain-English questions answered in seconds. Same depth. Same sources. Same auditability. The data analyst's queue stopped being a bottleneck because everyone on the team could ask the data analyst directly.

The deeper win is the dataset itself. Because the brand owns the categorization, the system answers questions the source platforms didn't anticipate. And because it's a queryable LLM layer rather than a fixed dashboard, new questions don't require new dashboards, they require asking.

MetricBeforeAfter
Cross-functional analysis turnaroundTens of hours of data analyst workPlain-English question, sourced answer in seconds
Sources joined per queryManual CSV exports from 4+ systems1 unified Postgres, queried by an LLM
Question latency for leadership asksDays to weeks (analyst queue)Self-serve, on demand

Outcome

A data analyst, replaced by a data analyst.

Takeaway

Is this the bottleneck you have?

If your team's most useful questions are the ones a data analyst has to spend tens of hours stitching together from Shopify, your helpdesk, your subscription tool, and a couple of internal databases, you don't have a tooling problem, you have a data analyst bottleneck. Buying another BI tool doesn't fix it. The shape that does fix it is a unified data layer in your own database plus a queryable LLM layer on top that the team can ask in plain English.

The categorization piece is the part most teams underestimate. Owning the ~12 categories that match how your team actually talks, in your own Postgres rather than a vendor's data warehouse, is what makes the system answer the questions you actually ask, not the questions a vendor predicted you'd ask.

You probably have this bottleneck if…

  • Cross-functional questions that require a data analyst to join 3+ systems by hand
  • A weekly leadership ask that the analyst can only deliver monthly
  • CS, ops, and growth all asking the data team variations of the same question
  • A Shopify orders dataset exported to CSV and filtered by hand
  • A “customer 360” evaluation that stalled because the data migration was too messy

FAQ

Common questions

What do you mean by “we built a data analyst”?

Practically: the team can ask any cross-functional question in plain English, the same way they'd ask a senior analyst, and get a sourced answer in seconds. The LLM has access to the unified Postgres (Shopify orders, Gorgias tickets, Recharge subscription state, and internal databases) plus live calls back to the source APIs when something's truly fresh. It writes SQL when SQL is the right answer. It reconciles across systems when reconciliation is the right answer. It cites sources either way.

Is this just a customer-360 platform or BI tool with extra steps?

No. Customer-360 platforms ingest your data into their schema and become the destination. BI tools answer the questions you already knew to predefine. This is a queryable LLM layer on top of a unified database that you own, which means it handles the ad-hoc, cross-functional questions a real analyst would handle, without the multi-day migration or the dashboard-per-question overhead.

Why unify the data in your own Postgres?

Two reasons. First, categorization: the team's questions assume their own vocabulary, “delivery-issue churn,” “price-sensitive cancel,” which the source systems' pre-built tags don't match. Second, ownership: the ~280,000 categorized orders plus the joins to tickets, subscription state, and internal data become a long-term asset the brand can query without a vendor's permission or rate limits.

Can this work with Gorgias, Zendesk, Re:amaze, or other helpdesks?

Yes. This build runs on Gorgias, but the MCP server is helpdesk-agnostic, Zendesk, Re:amaze, Kustomer, and Front all expose the APIs we need. The integration layer is the part we customize per client.

What's the build timeline?

Pilot in about a week. Full deployment depends on the number of source systems and how deep you go on the categorization layer, but the MCP server is answering real Slack queries within days of starting.

Have this bottleneck? Let’s map your version of this system.

One call. A concrete roadmap, whether you build with us or not.