Trusted by over 68k marketers (LinkedIn & TikTok)

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

© Copyright 2025. All rights reserved

AI Pipeline Architecture: The Decision You're Not Making

By
Oren Greenberg
June 25, 2026

Last updated: 2026-06-25

87% of data science projects never reach production [Industry analyses, 2026]. Most fail before the first API call is written. Not because of poor engineering. Because the team never answered the foundational question: what is this system actually authorised to decide?

That's an architecture question dressed as a governance question. And until you've mapped every pipeline output to either an autonomous action or a human handoff, you're building infrastructure for an outcome you haven't defined.

Almost everyone rebuilds. You will too, if you skip this.

The conflation that breaks builds

The conflation that breaks builds

The failure pattern is consistent across mid-market B2B SaaS right now.

A CMO or VP Marketing commissions a custom AI pipeline - lead scoring, intent signal routing, ICP fit classification, something along those lines. The engineering team asks: what data do we have? That question shapes everything that follows. The data model, the feature set, the model type, the latency requirements. Months later, the system scores leads. Nobody acts on the scores. The pipeline gets rebuilt.

Root cause, every time: the team conflated "what data do we have" with "what decision are we engineering toward."

These are not the same question.

"Traditional pipelines move data to dashboards. AI data pipelines move data to decisions." - Solved (Scality)

A dashboard tolerates ambiguity about who acts on it and when. A decision system does not. If you're building a pipeline that routes a signal to a human for review, the architecture looks different from one that triggers an automated sequence. The model type differs. The latency requirement differs. The failure mode differs. The human override threshold has to be specified before you choose an orchestration framework, not after.

The broader AI engineering literature treats this as a monitoring and compliance footnote. It isn't. It's the primary architectural constraint.

Decision authority as the first artefact

Decision authority as the first artefact

Before any data architecture work begins, you need a decision authority map.

One document - a table works fine - that lists every output the system is expected to produce and answers 3 questions for each: Is this output an autonomous action? Is it a human-assisted recommendation? What's the confidence threshold below which the system must escalate rather than act?

This sounds obvious. It's almost never done.

Teams jump to data discovery because data discovery feels like progress. It produces artefacts - schemas, source inventories, lineage diagrams. Decision authority mapping produces a document that forces organisational disagreement into the open. That's uncomfortable.

The disagreement is usually about trust and liability. Sales leadership doesn't trust the model's ICP classification enough to let it suppress outreach automatically. Legal hasn't signed off on automated personalisation at the account level. The CMO wants autonomous email sequencing but the CRO wants every trigger reviewed. These aren't technical problems. They're governance problems that, if deferred, become architectural problems - because the system gets built for the optimistic scenario and then constrained post-launch into a shape it wasn't designed for.

"The objective is to transition from isolated model experiments to a production-level AI system. An architecture-first approach ensures AI systems are not only intelligent but also durable, delivering predictable results and adapting as data changes." - DSG.ai Editorial Team

Architecture-first means decision-first. The data architecture flows from the decision map, not the other way around.

The rebuild pattern in practice

The rebuild pattern in practice

The canonical failure scenario runs like this.

A company builds a lead-scoring pipeline around the data it has: CRM activity, firmographic enrichment, web session data, email engagement. The model achieves reasonable accuracy in testing. Goes to production. Three months later, sales reports the scores don't correlate with deals closing. An audit reveals 12% of training labels were wrong - leads marked as converted that were actually closed-lost, or vice versa - combined with data distribution shift as the ICP evolved and schema changes in the CRM that nobody flagged upstream [Mahi Mullapudi, TutorialQ, 2026].

Three months of model work, gone.

But the label problem is fixable with better data governance. The deeper issue is that the model was optimised for the wrong outcome. The business goal was to increase pipeline quality - reduce the volume of low-fit leads entering the sales process. The model was trained to predict "will this lead engage with outreach." Engagement and fit are correlated. They're not identical. A model with 95% accuracy on engagement prediction may only deliver a 4% reduction in pipeline waste when the target was 8% [DSG.ai, 2026].

Technically correct. Commercially useless.

The model answered the question it was trained on. Nobody had precisely specified the question the business needed answered.

What the architecture actually needs to decide

For a GTM pipeline specifically - and this is where most technical AI pipeline content misses the mark - the decision layer has to be designed around commercial intent, not data availability.

At the signal layer: Is this behavioural signal sufficient to trigger any action autonomously, or does it need to be combined with firmographic fit before it crosses the threshold? Who owns the definition of "sufficient"?

At the classification layer: When the system classifies an account as ICP-fit, what is it authorised to do with that classification? Route to a named rep? Trigger an enrichment workflow? Suppress from a nurture sequence? Each action has a different risk profile and requires a different confidence threshold.

At the handoff layer: What does the human actually receive? A score, a recommendation, a pre-drafted action, or a completed action with a log? The further downstream the handoff, the more the system needs to have gotten right upstream. Designing the handoff last - which is the natural tendency when you build data-first - means you often discover the human can't act on what the system produces.

"An AI training data pipeline is not just ETL with a machine learning label. It's a purpose-built system that handles ingestion, validation, transformation, labeling, versioning, and serving - all while maintaining full lineage and reproducibility." - Mahi Mullapudi

That's the technical reality. The commercial reality is that every one of those stages should be designed backwards from the decision, not forwards from the data source.

I've written about the upstream version of this problem before - most GTM stacks fail not because of poor tooling but because the architecture question was never asked. The same logic applies inside the pipeline itself. See: Your GTM Stack Is an Expensive Mess. AI-Native Companies Figured Out Why.

The hardest part isn't technical

The hardest part of building AI marketing systems isn't the architecture. It's getting marketing and sales experts to articulate tacit knowledge - about what a "good" lead actually looks like, about which signals they've learned to ignore, about the edge cases where the normal ICP definition breaks down.

A GTM engineer can build any pipeline you can specify. The specification is the hard part.

When a senior AE says "I just know when an account is ready," that tacit pattern-recognition is what the model needs to learn. Extracting it requires structured knowledge-elicitation work that most builds skip entirely, because it doesn't feel like engineering.

The decision authority map forces this work into the open. When you ask "what confidence threshold should trigger autonomous action," you're asking the sales leader to externalise their mental model of risk. When you ask "what does the human handoff need to contain for the rep to act on it immediately," you're asking the rep to articulate what they actually need, not what they think they should need.

This is also why the argument that executives can lead AI transformation through delegation alone doesn't hold. If you're commissioning a custom AI pipeline capability and you can't describe the decision it's engineering toward, you're not leading the build. You're funding a rebuild.

A pre-build checklist

Before a single API call is written, these questions need documented answers:

  • What is the specific decision this pipeline is designed to produce? Not "better lead scoring" - "route accounts with ICP fit score above X and intent signal Y to SDR queue Z within N minutes of signal."
  • For each pipeline output: autonomous action, human-assisted recommendation, or human-required decision?
  • What is the confidence threshold below which the system escalates rather than acts?
  • Who has sign-off authority over autonomous actions? Is this documented?
  • What does the human handoff artefact look like? Did the humans who will receive it validate it?
  • What is the failure mode if the system acts on a false positive? Who absorbs that cost?
  • What data do you actually need to make this specific decision - not what data do you have?

The last question is the most important. It reverses the default direction of most builds.

Most teams start with a data inventory and work forward to a model. The correct sequence is to start with the decision, work backwards to the minimum viable feature set, and then audit whether the data you have supports it - or whether you need to instrument new sources before you build anything.

"Get the data pipeline right, and your models improve almost automatically." - Mahi Mullapudi

True. But "right" means designed for the decision, not designed for the data.

GTM engineering vs. data engineering

One more distinction that matters.

The people who should own the decision authority map are not data engineers. Data engineers own the pipeline mechanics - ingestion, transformation, serving, monitoring. The decision layer is owned by GTM engineers, who sit upstream of the pipeline in the value chain: they own the pre-pipeline work of defining what the system is trying to do commercially, how it connects to pipeline creation and qualification, and where the human-machine boundary sits.

Treating this as a data engineering problem - which is exactly what happens when you start with "what data do we have" - is how you end up with technically sound pipelines that produce commercially useless outputs.

If your organisation doesn't have this function clearly defined, the decision authority work defaults to whoever is running the project. Usually someone whose incentive is to ship the build, not interrogate whether it's solving the right problem.

A growth audit that surfaces this gap early is almost always cheaper than the rebuild it prevents. See: When a Growth Audit Finds the Problem You Weren't Looking For

---

The architecture question for any custom AI GTM build is not "how do we move data from source to model to output." It's "what is this system authorised to decide, and what does it hand to a human."

Answer that first. Everything else - the data model, the feature set, the orchestration layer, the monitoring strategy - follows from it.

Build in the other order and you'll build it twice.

Article by

Oren Greenberg

A fractional CMO who specialises in turning marketing chaos into strategic success. Featured in over 110 marketing publications, including Open view partners, Forbes, Econsultancy, and Hubspot's blogs. You can follow here on LinkedIn.

Spread the word

Sign up to my newsletter

Get my 6-part free diagnostic email series.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Get my 6-part free diagnostic email series.

Send it Over