Context, Codification & Cognitive Capabilities

Links and musings on the challenge of building continuous learning engines for agentic AI in the enterprise

Jun 23, 2026

Context & Codification are still hard problems

A lot of enterprise AI investment has gone into models and prompt quality. But a growing body of evidence suggests that the main constraints on reliability, cost, and organisational capability still lie mostly in context and codification.

Context defines what an AI model knows, when it knows it, how it’s managed, and who owns it. Codification is how we turn rules, processes, workflows and other operational knowledge into systems that AI agents can use to guide their work.

In many ways, the next phase of enterprise AI is less about teaching machines new things and more about teaching organisations how to express what they already know.

Chroma’s “Context Rot” study recently found that 65% of enterprise AI failures in 2025 could be traced to agents losing track of their own reasoning mid-task — not hallucination or training data quality. Performance degrades as token count increases across all major models.

Tech learning hub PrepStack has published a six-part technical series on context engineering, which shows how even small improvements to managing the context window, memory management and agentic collaboration can substantially reduce token costs and hallucinations in long-running and multi-turn agentic workflows.

There are several reasons why it is hard for models to maintain context in long-running tasks, but the absence of real-time data architecture in many firms is certainly one key limitation.

A recent Gartner report argues that batch architecture for data platforms is one of the main culprits for the ROI gap, as scheduled data releases lead to agents working with stale world knowledge. Forrester makes the same point from the infrastructure side: real-time data streaming is not an optional enhancement but a foundational requirement:

“agents act at digital speed, stale context means wrong decisions at scale affecting customers, operations, workforce, and market.”

However, this is not just a challenge for AI and data infrastructure; it is also about process legibility and observability.

Rudy Kuhn of Celonis argues in Diginomica that most enterprise AI fails not because models are too weak but because they lack knowledge of how work actually flows: which systems are used, in what sequence, with what exceptions. Organisations have over-indexed on data quality when process observability is the more pressing issue. They are discovering that scaling AI requires many of the same disciplines that allowed software systems to scale: versioned knowledge, observable processes, testable workflows, and continuous feedback loops.

Microsoft CEO Satya Nadella has recently started framing enterprise AI advantage as deriving from how we combine token capital (models) with human capital (workflows and learning loops) to improve how work gets done. The implication is that model commoditisation is a feature, not a threat, for organisations that have invested in the context layer on top and therefore own (and can mobilise) their own core IP.

“A frontier without an ecosystem is not stable.”

Governance as a special case of codification

Enterprise AI governance is reaching an important inflection point. The dominant approach of guardrails, human-approval loops, and compliance overlays added to deployed systems is visibly failing. In the past week, we have seen a cluster of signals pointing toward the need for a different architecture: baked-in governance embedded at the workflow level, rather than applied afterwards.

Gartner predicts 40% of enterprise AI agents will be decommissioned by 2027 because of poor governance design, such as organisations applying the same oversight regime to all agents regardless of autonomy, scope, or the level and ‘blast radius’ of risk. This echoes how the RPA and chatbot waves played out: overpromise, under-govern, then scale back.

As an example of the problem, one AI practitioner recently shared their field notes based on building agents at a large manufacturing firm, and concluded that most AI-specific guardrails are theatre, and the actual load-bearing safety infrastructure in production agentic AI is still quite conventional - e.g. IAM, network egress, audit trails, and secrets management.

Traditional software governance was designed to audit the artifact, not the action, but Ken Huang argues this needs to change. Although AI agents generate, execute, and discard code within single sessions, the side effects (database writes, API calls, transactions, etc.) can persist indefinitely, so we need validation discipline that covers the side-effect layer, just as we already do in financial trading:

“you don’t review every algorithm line by line; you reconcile every transaction.”
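To make the reconciliation idea concrete, here is a minimal sketch in Python. It is not Ken Huang's implementation; the `SideEffect` record, its fields, and the resource names are all hypothetical. The assumption is simply that we can obtain two lists: what the agent declared it did, and what the downstream systems actually recorded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SideEffect:
    """One persistent action, e.g. a database write or an API call (hypothetical shape)."""
    agent_id: str
    target: str      # illustrative resource name, e.g. "orders_db"
    operation: str   # e.g. "write", "refund"
    amount: float


def reconcile(declared: list[SideEffect], observed: list[SideEffect]) -> dict[str, list[SideEffect]]:
    """Compare what the agent says it did with what the systems recorded.

    Counter subtraction keeps duplicates, so two identical writes
    are not silently treated as one.
    """
    declared_counts, observed_counts = Counter(declared), Counter(observed)
    return {
        # Happened in the world, but the agent never reported it.
        "unreported": list((observed_counts - declared_counts).elements()),
        # Reported by the agent, but no system recorded it.
        "missing": list((declared_counts - observed_counts).elements()),
    }


declared = [SideEffect("agent-7", "orders_db", "write", 120.0)]
observed = [
    SideEffect("agent-7", "orders_db", "write", 120.0),
    SideEffect("agent-7", "payments_api", "refund", 45.0),
]
report = reconcile(declared, observed)
print(report["unreported"])  # the refund the agent did not declare
```

The point of the design is that nothing here inspects the code the agent generated; only the persistent effects are checked.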

Another recent take on the problem comes from Microsoft, who propose an approach they call Information Flow Control (IFC) to track how data moves through agent networks, enforcing policies at the data level rather than focusing only on action approval - a shift from output guardrails to provenance.
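As a rough illustration of the general idea of tracking provenance rather than approving actions, the sketch below propagates labels through derived data and checks them at the point of output. This is a textbook-style label propagation example, not Microsoft's design; the label names, the `SINK_BLOCKED` policy table, and the sink names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A piece of data plus the provenance labels it carries."""
    value: str
    labels: frozenset[str]


def combine(*parts: Labeled) -> Labeled:
    """Derived data inherits the union of the labels of everything it was built from."""
    return Labeled(
        value=" ".join(p.value for p in parts),
        labels=frozenset().union(*(p.labels for p in parts)),
    )


# Hypothetical policy: the labels each destination refuses to accept.
SINK_BLOCKED = {
    "external_email": frozenset({"pii", "untrusted"}),
    "internal_log": frozenset(),
}


def may_flow(data: Labeled, sink: str) -> bool:
    """Allow the flow only if the data carries none of the sink's blocked labels."""
    return not (data.labels & SINK_BLOCKED[sink])


status = Labeled("Order shipped.", frozenset())
crm_note = Labeled("Customer: Jane Doe", frozenset({"pii"}))
summary = combine(status, crm_note)

print(may_flow(status, "external_email"))   # True
print(may_flow(summary, "external_email"))  # False: the summary inherited "pii"
print(may_flow(summary, "internal_log"))    # True
```

The policy is enforced on where the data came from, so it still holds however many agents the data passes through on the way.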

Ultimately, the direction of travel is towards more richly-defined accountability for specific agents — perhaps even legal identity — as we look to the possibility of autonomous agentic markets and agents that can make contracts with each other. Estonia, as you might expect, is already on the case.

But all these approaches, whether applied at the infrastructure, data, process or outcomes layers, ultimately depend on clarity about what the specific rules and guidelines actually are, and how we can deploy them as part of run-time governance, not after the fact.

If you are not already working on codifying governance, for example by creating code-like repositories of different rulesets with auditing and branching and using AI to collate the specific governance blueprint for each agent and context on the fly, then you are probably condemned to waste a lot of time exploring the many inadequacies and frustrations of governance by committee.
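A minimal sketch of what a code-like ruleset might look like is below. Everything in it is hypothetical: the `Rule` and `AgentProfile` shapes, the rule IDs, and the autonomy scale. It also swaps the AI-driven collation described above for a plain deterministic filter, purely to show the shape of the data; in practice the rules would live in a versioned repository with review and branching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: str
    text: str
    min_autonomy: int = 0                    # applies to agents at or above this level
    contexts: frozenset[str] = frozenset()   # empty means "applies everywhere"


@dataclass(frozen=True)
class AgentProfile:
    name: str
    autonomy: int   # hypothetical scale: 0 = suggest only, 2 = acts unattended
    context: str


# Illustrative ruleset; the rules themselves are invented for this example.
RULESET = [
    Rule("audit-log", "1.2", "Record every external call."),
    Rule("refund-approval", "2.0", "Refunds over 500 need human sign-off.",
         min_autonomy=2, contexts=frozenset({"payments"})),
    Rule("no-external-email", "1.0", "Never email outside the company.",
         contexts=frozenset({"support"})),
]


def blueprint(agent: AgentProfile, rules: list[Rule]) -> list[Rule]:
    """Select the rules that apply to this agent's autonomy level and context."""
    return [
        r for r in rules
        if agent.autonomy >= r.min_autonomy
        and (not r.contexts or agent.context in r.contexts)
    ]


payments_bot = AgentProfile("payments-bot", autonomy=2, context="payments")
helper = AgentProfile("support-helper", autonomy=1, context="support")

print([r.rule_id for r in blueprint(payments_bot, RULESET)])  # ['audit-log', 'refund-approval']
print([r.rule_id for r in blueprint(helper, RULESET)])        # ['audit-log', 'no-external-email']
```

Because each rule carries an ID and a version, the blueprint an agent ran under can be logged and audited later, which is much harder to do with decisions taken in a committee meeting.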

Human-in-the-loop governance sounds nice until it inevitably slides into becoming something closer to a Home Owners Association (no plant pots on the terrace!) rather than a smart system to keep autonomous agents on track.

Learning loops as competitive advantage

In the piece quoted earlier, Microsoft’s Satya Nadella also made the point:

“Governance, private evaluation loops, and workflow data [are] compounding assets rather than overhead.”

Organisations that are able to create systems where AI outputs feed back into human judgment, which feeds back into better AI context, which produces better outputs, will be best placed to compound their AI advantage. In other words: learning loops.

For example, a customer support agent proposes responses and next actions for a case. Human operators accept, reject or modify those recommendations, and the outcomes are used to update guidance, workflows and future agent behaviour. Every interaction becomes both productive work and training data for the next cycle. Over time, the organisation is not simply serving customers; it is continuously improving its ability to do so.
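The customer support loop above can be sketched in a few lines of Python. This is an illustration only: the `FeedbackLog` class, the case types, and the thresholds are assumptions, and a real system would feed the flagged case types into whatever process updates guidance and workflows.

```python
from collections import defaultdict

DECISIONS = ("accept", "reject", "modify")


class FeedbackLog:
    """Tallies how human operators respond to agent recommendations."""

    def __init__(self) -> None:
        # case type -> count of each operator decision
        self.counts = defaultdict(lambda: dict.fromkeys(DECISIONS, 0))

    def record(self, case_type: str, decision: str) -> None:
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        self.counts[case_type][decision] += 1

    def acceptance_rate(self, case_type: str) -> float:
        tally = self.counts.get(case_type)
        total = sum(tally.values()) if tally else 0
        return tally["accept"] / total if total else 0.0

    def needs_guidance_review(self, threshold: float = 0.6, min_samples: int = 3) -> list[str]:
        """Case types with enough samples and a low acceptance rate."""
        return [
            case_type for case_type in sorted(self.counts)
            if sum(self.counts[case_type].values()) >= min_samples
            and self.acceptance_rate(case_type) < threshold
        ]


log = FeedbackLog()
for decision in ("accept", "reject", "modify", "reject"):
    log.record("refund", decision)
for decision in ("accept", "accept", "accept"):
    log.record("password_reset", decision)

print(log.acceptance_rate("refund"))   # 0.25
print(log.needs_guidance_review())     # ['refund']
```

The loop only closes if something acts on the flagged list, for example by revising the refund guidance and then watching whether the acceptance rate recovers.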

AI infrastructure, systems and capabilities are all important, but their speed of evolution and learning is the key differentiator. Right now, there is a ‘tempo gap’ between the speed of AI deployment and the speed of organisational capability absorption as organisations are deploying faster than they’re learning.

As one commentator put it last week:

The companies that win at enterprise AI won’t necessarily be using better models. They’ll be the ones who built better feedback loops — systems where every interaction becomes data, every evaluation becomes a learning opportunity, and every deployment becomes an experiment rather than a one-time launch.

Without strong, rapid learning loops for AI agents, there is a risk people end up spending too much time botsitting, as this study of 6,000 workers from Glean’s Work AI Institute warns. And without rapid learning loops for people working with AI agents, we will not be able to improve our practice and get the most out of both our people and the technology, increasing the risk of slopification.

There is some evidence that AI-native firms are able to do this better than others, resulting in smaller, more engineering-led and less managerial teams that produce higher valuation per employee. Hyunjin Kim of INSEAD and Rembrand Koning from HBS studied Y Combinator batches W20–F24 and US venture-backed startups in the same 2020-2024 timeframe, finding that AI-native firms were 25% smaller than comparable non-AI startups, carried 13% more engineers, with 15% fewer entry-level workers and 15% fewer managers, and yet reached comparable valuations. The key distinction they found was that the product channel (AI capabilities embedded in what the firm sells) is the primary mechanism for scaling knowledge work without large headcounts, rather than the process channel (AI changing how people work inside the firm).

Do AI agents dream of electric sheep?

If context provides the raw material, codification provides the structure, governance provides the constraints, and learning loops provide the mechanism for improvement, then memory is what allows those improvements to persist over time.

In both human organisations and agentic systems, learning only becomes a durable capability when experience can be retained, consolidated and reused. The emerging question is therefore not simply whether agents can learn, but how they remember.

The question of how agents run learning loops by managing, compacting and evaluating the memory of their actions is fascinating and quite technical; but it is worth thinking about if we hope to create closed-loop agentic processes that we can rely on.

Ken Huang has been writing about this recently, and has shared two introductory pieces about how agents remember. He sees agent memory consolidation as both a technical pattern and an organisational design problem:

“who decides what the agent should remember, and how that memory is governed, matters as much as the mechanism itself.”
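To show how the mechanism and the governance question meet, here is a deliberately simple sketch of memory consolidation. It is not the approach described in Ken Huang's pieces; the `consolidate` function, the repeat threshold, and the `approve` hook are all assumptions. Lessons that recur in the agent's raw episode log are kept, but only if a governance check (owned by whoever decides what the agent may remember) lets them through.

```python
from collections import Counter
from typing import Callable


def consolidate(
    episodes: list[str],
    min_repeats: int,
    approve: Callable[[str], bool],
) -> list[str]:
    """Keep lessons that recur and pass the governance check; discard the rest."""
    counts = Counter(episodes)
    return sorted(
        lesson for lesson, seen in counts.items()
        if seen >= min_repeats and approve(lesson)
    )


# Hypothetical raw observations from one day of agent activity.
episodes = [
    "escalate refunds over 500",
    "escalate refunds over 500",
    "customer prefers email",
    "store card number in notes",
    "store card number in notes",
]


def approve(lesson: str) -> bool:
    """Illustrative policy: never retain anything about card numbers."""
    return "card number" not in lesson


durable_memory = consolidate(episodes, min_repeats=2, approve=approve)
print(durable_memory)  # ['escalate refunds over 500']
```

The one-off observation is dropped for lack of repetition, while the recurring but unsafe habit is dropped by policy, which is the part a purely technical compaction routine would miss.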

Perplexity Brain seems to be trying to make this concrete as a product: a context graph the agent reviews overnight, teaching itself to do the work better and creating a learning and improvement flywheel. But this also surfaces an ownership question: if the learning loop belongs to the vendor, the accumulated operational intelligence accretes to the platform, not the customer.

As Ken Huang notes, yes, this is dreaming, but no, it does not imply some kind of imaginative inner life that agents share with people.

The Turing Post also has a recent piece on AI models needing sleep to dream, with a round-up of other relevant research into the phenomenon.

At the risk of anthropomorphism, I should confess that I advise my agents not to eat virtual cheese before they sleep, just in case they have nightmares about human AI governance committees, as I sometimes do.
