Most growth teams now have a ChatGPT tab open, a Claude subscription, a Midjourney seat, and a Notion workspace full of prompts.
Almost none of them have an AI architecture.
That's the gap. And it's the reason the AI line item keeps growing while the growth metrics don't.
A tool is something a person opens. An architecture is what a system runs on. The first produces artifacts. The second produces compounding leverage. If your AI spend is rising while your activation rate isn't, the problem is rarely the model.
It's the layer underneath it.
Tools vs. Architecture
When a growth team adopts an AI tool, the usual path looks like this:
Someone on the team starts using ChatGPT for landing page copy.
Someone else uses Claude to draft lifecycle emails.
A third person wires Midjourney into an ad creative workflow.
A quarter later, the AI line item is real money, and nobody can point to a metric that moved because of it.
Each use is reasonable in isolation. None of them shares context. None of them feeds back into anything. The outputs are artifacts in a doc, not state in a system.
An AI tool is used. An AI architecture gets invoked. Tools sit beside your growth system. Architecture is load-bearing inside of it.
The difference shows up in three places:
| | Tool Adoption | AI Architecture |
|---|---|---|
| Context | Lives in a prompt | Lives in your data layer |
| Output | Delivered to a person | Delivered to a system |
| Feedback | None | Closes the loop |
If you can't answer where context comes from, where output goes, and how results feed back, you have tools. Not architecture.
The Four Layers
A working AI architecture for growth has four layers. They stack. Skipping one doesn't make the stack shorter; it makes it broken.
Layer 1: Context
This is the data the model needs to be useful for your specific product and user.
Not the training data. Not what the model already knows. The context that makes a generic model behave like it understands your business: product events, user properties, lifecycle stage, recent behavior, plan tier, feature usage, and experiment exposure.
Most teams skip this layer entirely. They paste a prompt into ChatGPT and expect it to write a lifecycle email that converts. It won't, because the model has no idea who the user is, what they've done, or where they are in the product.
The engineering question at this layer: where does context come from, and how does it reach the model at inference time?
Usually, the answer is a warehouse-to-prompt pipeline. Events land in the warehouse, a transformation shapes them into a user-level context object, and that object is injected into the prompt at runtime.
If you don't have this, nothing downstream will work.
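A minimal sketch of that warehouse-to-prompt pipeline, with hypothetical field names and an in-memory dict standing in for the warehouse table your transformation layer would produce:

```python
import json

# Stand-in for a user-level table your transformation layer materializes.
# All field names here are illustrative, not a required schema.
WAREHOUSE = {
    "user_123": {
        "lifecycle_stage": "activated",
        "plan_tier": "pro",
        "recent_events": ["created_project", "invited_teammate"],
        "experiment_exposure": {"onboarding_v2": "treatment"},
    }
}

def build_context(user_id: str) -> dict:
    """Shape warehouse rows into a user-level context object."""
    row = WAREHOUSE.get(user_id, {})
    return {
        "user_id": user_id,
        "lifecycle_stage": row.get("lifecycle_stage", "unknown"),
        "plan_tier": row.get("plan_tier", "free"),
        "recent_events": row.get("recent_events", []),
        "experiment_exposure": row.get("experiment_exposure", {}),
    }

def render_prompt(user_id: str, task: str) -> str:
    """Inject the context object into the prompt at inference time."""
    context = build_context(user_id)
    return f"CONTEXT:\n{json.dumps(context, indent=2)}\n\nTASK: {task}"

prompt = render_prompt(
    "user_123",
    "Write a lifecycle email nudging the next activation step.",
)
```

The point is the shape, not the storage: events land somewhere queryable, get reduced to one object per user, and that object travels with every prompt.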
Layer 2: Invocation
This is how the model gets called. Who calls it, when, and with what inputs?
A human opening a chat window is one invocation pattern. It's also the least useful one for growth, because it scales linearly with headcount.
The invocation patterns that compound:
Event-triggered: a signup, a feature use, a churn signal, an experiment result
Scheduled: daily summaries, weekly insights, batch personalization runs
API-triggered: called from your product, your CRM, your growth tool
Agent-triggered: called by another model as part of a workflow
These patterns have one thing in common: a human is not in the loop for each call. The system decides when to invoke. That's what makes AI act as infrastructure instead of a productivity hack.
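One way to sketch event-triggered invocation, with a stub in place of a real model call and hypothetical event fields:

```python
from typing import Callable, Optional

def call_model(prompt: str) -> str:
    """Stub for a real model API call; swap in your provider's client."""
    return f"[model output for: {prompt}]"

# Map growth events to the prompts the system fires on its own.
# Event names and fields here are illustrative.
TRIGGERS: dict[str, Callable[[dict], str]] = {
    "signup": lambda e: f"Draft a welcome email for a {e['plan_tier']} signup.",
    "churn_signal": lambda e: f"Draft a win-back message; last event: {e['last_event']}.",
}

def handle_event(event: dict) -> Optional[str]:
    """The system, not a person, decides when the model gets invoked."""
    build_prompt = TRIGGERS.get(event["type"])
    if build_prompt is None:
        return None  # this event doesn't trigger a model call
    return call_model(build_prompt(event))

out = handle_event({"type": "signup", "plan_tier": "pro"})
```

Wired to a webhook or an event bus consumer, this pattern scales with event volume rather than headcount.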
Layer 3: Output Routing
Where does the output go?
If the answer is "into a Google Doc," you have a tool. If the answer is "into the product, the database, the experimentation platform, the CRM, or another model," you have architecture.
Output routing is the layer that growth teams underinvest in most. The model generates something good, and then a human has to copy, paste, format, and ship it. That cost is hidden because it feels like normal work. It's not. It's the coordination tax that eats the return on the AI spend.
The fix: treat the model's output as a structured response that another system consumes. JSON, not prose. Validated, not free-form. Written to a known location, not to a human's clipboard.
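A sketch of that routing step, assuming the model has been instructed to return JSON with a handful of required fields (the field names and the list standing in for a database table are illustrative):

```python
import json

# Fields the consuming system requires; adjust to your own schema.
REQUIRED_FIELDS = {"subject", "body", "cta"}

def route_output(raw: str, destination: list) -> dict:
    """Parse, validate, and write model output to a known location."""
    payload = json.loads(raw)  # JSON, not prose; raises on malformed output
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    destination.append(payload)  # stand-in for a DB insert or CRM API call
    return payload

outbox = []
route_output(
    '{"subject": "Welcome", "body": "...", "cta": "Invite a teammate"}',
    outbox,
)
```

Rejecting malformed output at this boundary is what lets the rest of the system trust the model's responses; anything that fails validation gets retried or flagged instead of shipped.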
Layer 4: Feedback
Did it work?
Almost no team answers this for their AI-generated content. The email went out. The ad ran. The headline shipped. Nobody connects the outcome back to the model call that produced it.
Without a feedback layer, you can't learn. You can't tell which prompts work. You can't tell which context windows matter. You can't tell whether the expensive model is worth it over the cheap one. You're running a system with no sensors.
The minimum viable feedback layer: log every model call with its inputs, its outputs, and a reference to the downstream outcome. That's it. You don't need a fine-tuning pipeline on day one. You need the data that would make it possible.
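A sketch of that minimum viable log, with an in-memory list standing in for a warehouse table and illustrative model and campaign names:

```python
import time
import uuid

CALL_LOG = []  # stand-in for a warehouse table of model calls

def log_call(model: str, prompt: str, output: str, outcome_ref: str) -> str:
    """Record a model call with its inputs, output, and downstream reference."""
    call_id = str(uuid.uuid4())
    CALL_LOG.append({
        "call_id": call_id,
        "ts": time.time(),
        "model": model,
        "prompt": prompt,
        "output": output,
        "outcome_ref": outcome_ref,  # e.g. a campaign or experiment id
    })
    return call_id

def attach_outcome(call_id: str, metric: str, value: float) -> None:
    """Join the downstream result back to the call that produced it."""
    for row in CALL_LOG:
        if row["call_id"] == call_id:
            row[metric] = value

cid = log_call("some-model", "Write a subject line", "Try the new editor", "campaign_42")
attach_outcome(cid, "open_rate", 0.31)  # illustrative number
```

The `outcome_ref` is what makes the later join possible: without a shared key between the call and the campaign it produced, the feedback loop can never close.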
The Cost Conversation
Cost is becoming the deciding factor in AI adoption, and the conversation is usually framed incorrectly.
The question isn't "which model is cheapest?" It's "what's the cost per unit of growth outcome?" A model that's ten times more expensive per token can still be ten times cheaper per conversion if it produces output that actually ships and performs.
Without the feedback layer from above, you can't answer that question. So the cost conversation collapses into token price, and teams pick the cheap model and wonder why nothing moves.
Architecture makes the cost conversation possible. Tools make it a guessing game.
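The arithmetic behind that reframing, with purely illustrative numbers:

```python
def cost_per_conversion(total_model_cost: float, conversions: float) -> float:
    """Cost per unit of growth outcome, not per token."""
    return total_model_cost / conversions

# Illustrative numbers only: a model that costs 10x more in tokens
# can still be cheaper per conversion if its output ships and performs.
cheap_model = cost_per_conversion(total_model_cost=1.0, conversions=10)
pricey_model = cost_per_conversion(total_model_cost=10.0, conversions=200)
```

Here the cheap model runs $0.10 per conversion and the pricier one $0.05. The inputs to this calculation are exactly what the feedback layer logs; without it, `conversions` per model is unknowable.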
The Build Order
If you're starting from zero, build the layers in this order:
Feedback first. Log every call. You don't need to do anything with the logs yet. You need them to exist.
Context next. Build the pipeline that gets user and product data to the model. Start with one use case.
Invocation third. Move one workflow from a human opening a chat window to a trigger.
Output routing last. Convert one free-form output into a structured response that another system can consume.
Most teams do this in reverse. They start with flashy output (AI-generated landing pages), skip invocation (a human is still driving), ignore context (the model doesn't know the user), and forget feedback entirely. Then they wonder why the pilot didn't scale.
Final Thought
The teams that get leverage from AI aren't the ones with the best prompts. They're the ones who built the layers underneath the prompts.
A prompt is a request. An architecture is a system. The first depends on who's typing. The second runs whether anyone is watching.
Every AI tool your team adopts is a decision about which of those two you're building.
FAQ
Do I need all four layers to start?
No. You need to know all four exist, and you need to build in the right order. Most teams fail because they start at layer 3 (shiny outputs) and never get to layer 1 (context) or layer 4 (feedback).
What's the smallest useful version of this?
One trigger, one context pull, one structured output, one logged outcome. That's a working architecture for a single workflow. Everything else is scale.
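That single-workflow version fits in one screen. A sketch, with a stubbed model call and hypothetical field names throughout:

```python
import json

LOG = []  # one logged outcome per call

def fake_model(prompt: str) -> str:
    """Stub; a real call would hit your model provider's API."""
    return json.dumps({"subject": "Welcome", "body": "..."})

def run_workflow(event: dict, warehouse: dict) -> dict:
    """One trigger, one context pull, one structured output, one logged outcome."""
    context = warehouse.get(event["user_id"], {})            # context pull
    prompt = f"CONTEXT: {json.dumps(context)}\nTASK: welcome email"
    output = json.loads(fake_model(prompt))                  # structured output
    LOG.append({"event": event["type"], "prompt": prompt,
                "output": output, "outcome_ref": None})      # logged outcome
    return output

out = run_workflow({"type": "signup", "user_id": "u1"}, {"u1": {"plan": "pro"}})
```

Every layer is present in miniature; scaling is a matter of adding triggers, enriching context, and wiring `outcome_ref` to real campaign ids.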
Where does fine-tuning fit?
Later than you think. Fine-tuning is a layer 1 optimization. It only pays off when your context and feedback layers are already producing a signal. Teams that fine-tune before they log are optimizing in the dark.
How is this different from "using AI in your growth stack"?
Using AI is what happens when individuals open tools. Architecting AI is what happens when the system calls the model without asking anyone. The first is a skill. The second is infrastructure.
What's the one thing to do this week?
Start logging. Every model call, every input, every output, every downstream reference. You can't architect what you can't see.