<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Shift*Academy]]></title><description><![CDATA[Your digital leadership learning companion to help make sense of AI and emerging technologies, with practical guidance and techniques for implementation.]]></description><link>https://academy.shiftbase.info</link><image><url>https://substackcdn.com/image/fetch/$s_!dGVA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3669b15-b056-4bcd-828c-fa5d230cc563_256x256.png</url><title>Shift*Academy</title><link>https://academy.shiftbase.info</link></image><generator>Substack</generator><lastBuildDate>Sun, 26 Apr 2026 08:56:30 GMT</lastBuildDate><atom:link href="https://academy.shiftbase.info/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Shiftbase Ltd]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[academy@shiftbase.info]]></webMaster><itunes:owner><itunes:email><![CDATA[academy@shiftbase.info]]></itunes:email><itunes:name><![CDATA[Lee Bryant]]></itunes:name></itunes:owner><itunes:author><![CDATA[Lee Bryant]]></itunes:author><googleplay:owner><![CDATA[academy@shiftbase.info]]></googleplay:owner><googleplay:email><![CDATA[academy@shiftbase.info]]></googleplay:email><googleplay:author><![CDATA[Lee Bryant]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How can Enterprises Create their own Agentic AI Evaluation Capabilities?]]></title><description><![CDATA[Why agentic systems are outpacing our ability to judge them and how we can evaluate their work effectively rather than just sampling outputs]]></description><link>https://academy.shiftbase.info/p/how-can-enterprises-create-their</link><guid isPermaLink="false">https://academy.shiftbase.info/p/how-can-enterprises-create-their</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 21 Apr 2026 14:37:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7rxv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Agents can now plan, act, escalate, and collaborate across tools and workflows. Early compositions are starting to deliver real outcomes, and in controlled settings, they can look surprisingly capable. Tasks are completed, metrics move and dashboards turn green.</p><p>But in many cases, we do not actually understand how these systems are behaving.</p><ul><li><p>We see the outputs they produce, but not the paths they take to get there.</p></li><li><p>We measure completion, but not reliability.</p></li><li><p>We observe success, but struggle to explain why it occurred, or whether it will happen again under slightly different conditions.</p></li></ul><p>This creates a dangerous illusion: systems that appear to be working long before we have any real confidence that they are.</p><p>Part of the issue is that most evaluation approaches have not caught up with the shift from tools to systems. 
We are still judging agents in the same way we judged early language models: by sampling outputs, spot-checking responses, or measuring task completion against predefined criteria. These methods can tell us whether something worked once. They tell us very little about how it behaves over time, under pressure, or at scale.</p><p>Worse still, agents can learn to satisfy these surface-level checks without actually developing the underlying capability we think we are testing. They pass the test, but for the wrong reasons. They follow patterns rather than understanding intent, and they optimise for success conditions that do not hold outside the test environment. In practice, this means systems can degrade silently, masking fragility behind plausible performance.</p><p>As we begin to move toward Outcome-as-Agentic Solutions, this gap becomes harder to ignore. When a system is responsible for delivering a result, not just completing a task, the stakes change. It is no longer enough for something to work occasionally, or under supervision. It needs to behave consistently, adapt appropriately, and recover when things go wrong.</p><p>And yet, most organisations do not have a clear way to evaluate whether that is happening - we are deploying systems we do not yet know how to judge.</p><h2>Evaluating Agent Design</h2><p>If evaluation is no longer about checking outputs, then what replaces it?</p><p>Ideally, each organisation should design and run its own capability to observe, test, interpret, and improve agentic systems as they operate, as an ongoing function embedded into how systems are built and run. This is what an <strong>Agent Evaluation Capability</strong> provides.</p><p>At its core, it is the ability to make system behaviour visible and legible. To move from surface-level indicators of success to a deeper understanding of how outcomes are achieved, where fragility exists, and how performance evolves over time.</p><p>This is not a single tool or framework. It is a composed capability, built across systems, data, software, processes, and skills.</p><p><strong>Core Systems:</strong> Evaluation requires infrastructure. This includes the pipelines that capture system activity, the environments used to simulate scenarios, and the mechanisms for replaying or stress-testing behaviour. Without this layer, evaluation remains anecdotal, based on isolated observations rather than structured insight.</p><p>As systems become more complex, the ability to trace execution paths, reconstruct decisions, and observe interactions across components becomes essential.</p><p><strong>Data Sets:</strong> Agentic systems generate a different kind of data. Not just inputs and outputs, but traces: sequences of decisions, tool calls, escalations, and intermediate reasoning steps. Over time, these traces form a rich dataset that can be analysed to understand patterns of behaviour, identify failure modes, and detect drift.</p><p>Capturing and structuring this data is a prerequisite for meaningful evaluation.</p><p><strong>Software:</strong> On top of this foundation sit the tools that make evaluation possible. Evaluators, critics, and verifiers that assess system behaviour against defined expectations. Test harnesses that allow scenarios to be replayed and varied.
Monitoring tools that surface anomalies and deviations in real time.</p><p>These components act as a counterbalance to generative systems, introducing layers of checking, validation, and interpretation.</p><p><strong>Services &amp; Processes:</strong> Evaluation is not just technical. It is operational. Teams need routines for reviewing performance, analysing failures, and refining system behaviour. This might include red teaming exercises, audit processes, incident reviews, and continuous testing loops.</p><p>Over time, these processes create institutional knowledge about how systems behave and how they can be improved.</p><p><strong>Skills:</strong> Finally, evaluation depends on people. Not just engineers, but individuals who can interpret system behaviour, understand risk, and make informed judgements about performance. This includes system thinkers, evaluators, and domain experts who can bridge the gap between technical signals and business outcomes.</p><p>As agentic systems become more embedded in organisations, these skills become critical.</p><p>Taken together, these elements form a capability that allows organisations to move beyond surface-level validation and toward a more robust understanding of their systems. Without it, agentic systems remain opaque.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hXS7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hXS7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 424w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 848w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 1272w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hXS7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic" width="1000" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/194914277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hXS7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 424w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 848w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 1272w, https://substackcdn.com/image/fetch/$s_!hXS7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9657dd-def3-4423-bbe2-d971c0b84a3a_1000x1000.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Key Design Principles for an Evaluation Capability</h2><p>Once evaluation is treated as a capability, a different set of design principles can emerge. 
These are not rules in the traditional sense, but patterns that help teams avoid the most common failure modes.</p><ul><li><p><strong>Evaluate behaviour, not just results.</strong> A correct outcome reached through a fragile or opaque path is not success. It is deferred failure. Systems need to be assessed on how they arrive at outcomes, not just whether they do.</p></li><li><p><strong>Design for failure visibility.</strong> Most systems do not fail cleanly. They degrade, drift, or produce plausible but incorrect results. Evaluation should make these conditions visible early, not hide them behind aggregate metrics or surface-level success rates.</p></li><li><p><strong>Separate generation from verification.</strong> The same system that produces outputs should not be solely responsible for judging them. Independent evaluators, critics, or verification layers create necessary tension in the system and reduce the risk of self-confirming behaviour.</p></li><li><p><strong>Make judgement legible.</strong> Decisions about system performance should be traceable and explainable, not buried in logs or reliant on individual interpretation. This is particularly important as systems scale and more stakeholders become involved.</p></li><li><p><strong>Treat evaluation as continuous.</strong> It is not a phase that happens before deployment. It is an ongoing function that evolves with the system, adapting to new conditions, new data, and new expectations.</p></li></ul><p>Evaluation cannot be just a checkpoint; it needs to be a continuous part of the system.</p><h2>Passing Tests Without Capability</h2><p>Agentic systems have the ability to pass tests without developing real capability. This is not new. In many domains, systems learn to optimise for the conditions under which they are evaluated, rather than the underlying objective. In agentic AI, this can manifest as agents that appear competent within test scenarios, but fail when those scenarios change in small but meaningful ways.</p><p>They follow patterns rather than understanding intent, or exploit shortcuts that satisfy evaluation criteria without solving the actual problem. Often, they succeed in environments that resemble their training or testing conditions, but struggle to generalise beyond them.</p><p>The result is a form of surface competence.</p><p>From the outside, the system appears to work. Tasks are completed. Metrics improve. But beneath that surface, the system lacks the robustness required for real-world deployment. It cannot handle variation, ambiguity, or unexpected conditions without significant degradation.</p><p>This is particularly dangerous in enterprise settings, where early signs of success can drive rapid scaling. Systems that have not been properly evaluated for capability are extended into new contexts, where their limitations become more pronounced and more costly.</p><p>Without a structured approach to evaluation, these issues are often only discovered after failure.</p><h2>Deterministic Guardrails in Probabilistic Systems</h2><p>Agentic systems are inherently probabilistic. They operate under uncertainty, generating responses and actions based on patterns rather than fixed rules. This flexibility is part of their power, but it also introduces risk.</p><p>To manage this, organisations need to introduce deterministic elements into the system. Not to remove flexibility, but to bound it. To define where variation is acceptable and where it is not.
These guardrails can take many forms.</p><ul><li><p>They may be constraints on what actions an agent is allowed to take.</p></li><li><p>They may be verification steps that check outputs against known rules or thresholds.</p></li><li><p>They may be escalation paths that transfer control to a human when confidence is low or risk is high.</p></li></ul><p>The key is that these elements are designed into the system, not added as an afterthought.</p><p>Too often, governance is treated as a layer of policy that sits outside the system. Documents are written, guidelines are issued, but the system itself remains unchanged. This creates a gap between intent and execution.</p><p>In effective agentic systems, governance is embedded. It is expressed through constraints, checks, and decision logic that shape behaviour in real time. Reliability is not something that emerges from probabilistic systems on its own. It is something that must be designed.</p><h2>What Evaluation Makes Visible</h2><p>An evaluation capability changes what teams can see, what they trust, and how they intervene. Without one, systems are judged only on whether they complete tasks. Once evaluation is introduced, the focus shifts and teams begin to understand how work is actually being done, where it breaks down, and what needs to change. Let&#8217;s explore some examples:</p><h3>Customer Onboarding</h3><p>A team defines success as &#8220;customer activated within 72 hours.&#8221; Without evaluation, performance is tracked at the end of the funnel. Activation rates go up or down, but the system remains largely opaque.</p><p>With evaluation in place, the system becomes legible. Teams can see where drop-offs occur, how agents respond, and which paths lead to successful activation. They can identify whether delays are caused by missing data or poor sequencing. They can see which users require escalation and why.</p><p>Over time, this shifts the work. Instead of reacting to missed targets, teams begin to refine the system itself, improving how onboarding is delivered rather than compensating for its failures.</p><h3>Incident Response</h3><p>In a typical setup, success is measured by resolution time. An incident is detected, addressed, and closed. If the system resolves the issue quickly, it is considered effective. Evaluation changes the frame.</p><p>Teams can see how the system detected the incident, which signals it prioritised, and how it coordinated response actions. They can identify missed signals, unnecessary escalations, or delays in decision-making. They can analyse how the system behaves under pressure, not just whether it resolves the issue.</p><p>This creates a different kind of confidence. Not just that incidents are resolved, but that the system can be relied upon when conditions change.</p><h3>Sales Qualification</h3><p>A sales team defines success as &#8220;90% of new pipeline entries fully qualified within five working days.&#8221; Without evaluation, progress is tracked through completion metrics. Entries are marked as qualified, and the process appears to work. With evaluation, the process becomes transparent.</p><p>Teams can see how qualification is being performed, whether agents are applying the right criteria, and where information is being missed or misinterpreted. They can detect patterns in stalled entries and identify where additional support is needed.</p><p>More importantly, they can distinguish between genuine qualification and superficial completion.
This reduces the risk of pipeline inflation and improves the overall quality of sales activity.</p><p>Across these examples, evaluation allows teams to move from observing outcomes to interpreting behaviour. From reacting to results to shaping the systems that produce them.</p><p>Without this visibility, systems may appear effective while hiding fragility. With it, organisations gain the ability to refine, adapt, and trust the systems they are building.</p><h2>Getting Started</h2><p>Most organisations do not need to start from scratch.</p><p>They already have elements of this capability in place: logging systems, monitoring tools, performance reviews. The challenge is to connect these elements and extend them to cover agentic systems.</p><p>A practical starting point is to focus on a single outcome.</p><p>Choose a system that is already in use, or being developed, and begin by instrumenting it. Capture the traces of how it operates. Identify key signals that indicate success, failure, or drift.</p><p>Introduce simple evaluation loops. Review behaviour regularly. Look for patterns. Ask not just whether the system worked, but how. Expect gaps. There will be missing data, unclear signals, and behaviours that are difficult to interpret. This is part of the process. Each gap is an indication of where the capability needs to be strengthened.</p><p>Over time, these practices can be formalised. Evaluation becomes less ad hoc and more systematic. Insights accumulate. Confidence grows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7rxv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7rxv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 424w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 848w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 1272w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7rxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic" width="1000" height="2000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207269,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/194914277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7rxv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 424w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 848w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 1272w, https://substackcdn.com/image/fetch/$s_!7rxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71b1bf8-de95-4765-89d6-7160160ed2ab_1000x2000.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If evaluation is treated as something to be added later, it will always lag behind the system it is meant to support. Building it alongside the system is the only way to ensure it keeps pace.</p><p>Read on to explore the practicalities of evolving this emerging capability.</p>
      <p>
          <a href="https://academy.shiftbase.info/p/how-can-enterprises-create-their">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[Agents at the Ready? Yes and No...]]></title><description><![CDATA[Mythos, memex and management - how the various layers of agentic AI development are moving at different speeds and why we need to bring them together]]></description><link>https://academy.shiftbase.info/p/agents-at-the-ready-yes-and-no</link><guid isPermaLink="false">https://academy.shiftbase.info/p/agents-at-the-ready-yes-and-no</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 14 Apr 2026 14:30:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yaws!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Agentic AI capabilities are developing within several pace layers at once - economic, infrastructure, capability readiness, and knowledge engineering - and it is getting harder to stay on top of these developments whilst tracking their interdependence. But in most companies, organisational readiness is moving more slowly than any of them. Are agents ready for the big time yet, and if not, where should we focus our efforts?</p><h2>Anthropic in the news</h2><p>Anthropic has been making the news a lot lately, with <strong><a href="https://github.com/yasasbanukaofficial/claude-code">the leak of its Claude Code codebase</a></strong> (the harness code, not the model weights, etc.) and, more recently, with the announcement that its new model preview Mythos is so powerful that it was held back from release until Anthropic built some counter-measures for its <strong><a href="https://www.wired.com/story/anthropics-mythos-will-force-a-cybersecurity-reckoning-just-not-the-one-you-think/">astonishing ability to detect security exploits in all kinds of commonly-used software</a></strong>:</p><blockquote><p><em>According to Anthropic, Mythos Preview crosses a threshold of capabilities to discover vulnerabilities in virtually any and every operating system, browser, or other software product and autonomously develop working exploits for hacking. With this in mind, the company is only releasing the new model to a few dozen organizations for now&#8212;including Microsoft, Apple, Google, and the Linux Foundation&#8212;as part of a <a href="https://www.wired.com/story/anthropic-mythos-preview-project-glasswing/">consortium dubbed Project Glasswing</a>.</em></p></blockquote><p>Both developments have created a lot of fear and noise, and Mythos points to the huge gap opening up between the (risky, arguably uncontrollable) art of the possible on the one hand, and, on the other, the architectures and control systems we have in place to govern and guide AI within complex organisations.</p><p>But it is a third, less dramatic development that I think is most immediately relevant to enterprise AI and the pursuit of new organisational operating systems: <strong><a href="https://www.anthropic.com/engineering/managed-agents">Claude Managed Agents</a></strong>, which they describe as <em>&#8220;a hosted service in the Claude Platform that runs long-horizon agents on your behalf through a small set of interfaces meant to outlast any particular implementation&#8212;including the ones we run today.&#8221;</em></p><p>It seems to be aimed at SaaS teams that want to embed Claude agents into their products, providing a managed infrastructure layer with composable APIs that sits between your system and Claude&#8217;s models.</p>
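<p>What might such an abstraction layer look like in practice? The sketch below is purely illustrative: none of these names come from Anthropic&#8217;s actual API, and the three components it separates out (a harness, a session log, a memory store) are simply one way of picturing durable interfaces in front of interchangeable implementations.</p><pre><code># Purely hypothetical sketch of a "meta-harness": stable interfaces in
# front of interchangeable implementations. None of these names are taken
# from Anthropic's Managed Agents API.
from typing import Protocol

class SessionLog(Protocol):
    def append(self, entry: dict) -> None: ...
    def replay(self) -> list[dict]: ...

class Memory(Protocol):
    def remember(self, key: str, value: str) -> None: ...
    def recall(self, key: str) -> str | None: ...

class Harness(Protocol):
    """The durable seam: callers depend on this, not on what sits behind it."""
    def run(self, task: str) -> str: ...

class LoggedHarness:
    """One possible implementation; future ones might swap in sandboxes or new models."""
    def __init__(self, model_call, log: SessionLog, memory: Memory):
        self.model_call = model_call   # injected: any callable mapping a prompt to text
        self.log = log
        self.memory = memory

    def run(self, task: str) -> str:
        context = self.memory.recall("notes") or ""
        self.log.append({"event": "task_started", "task": task})
        result = self.model_call(f"{context}\n\nTask: {task}")
        self.log.append({"event": "task_finished", "result": result})
        self.memory.remember("notes", result)
        return result
</code></pre><p>The point is the seam, not the specifics: as long as surrounding systems talk to the stable interface, the implementation behind it can change without breaking what is composed on top.</p>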
<strong><a href="https://kenhuangus.substack.com/p/how-anthropic-scaling-managed-agents?publication_id=1796302&amp;post_id=193754059&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Ken Huang provided a good overview of its three main components</a></strong> (harness, session log and memory) and some thoughts on its potential usage.</p><p><strong><a href="https://www.anthropic.com/engineering/managed-agents">Anthropic have called it a meta-harness</a></strong> that could help scale agentic AI and provide an abstraction layer that can cope with future changes and innovation:</p><blockquote><p><em>The challenge we faced is an old one: how to design a system for &#8220;programs as yet unthought of.&#8221; Operating systems have lasted decades by virtualizing the hardware into abstractions general enough for programs that didn&#8217;t exist yet. With Managed Agents, we aimed to design a system that accommodates future harnesses, sandboxes, or other components around Claude.</em></p></blockquote><p>This is one of the first serious attempts to define an agentic infrastructure layer, rather than just an agent.</p><h2>Is Agentic AI Showing up in Enterprise ROI?</h2><p>Adoption and investment analysis for enterprise AI usage is beginning to show the impact of agentic AI above and beyond individual productivity use cases for chatbots. <strong><a href="https://www.a16z.news/p/ai-adoption-by-the-numbers">A16z&#8217;s latest report</a></strong> (released last week) cites coding, search and support use cases as the most active in the firms they surveyed, with tech, legal and healthcare as the sectors most keen on AI adoption.</p><p>But we will not really begin to develop a full picture of enterprise AI&#8217;s impact until we make more progress building out the kind of architecture necessary for agentic automation to start operating as a managed layer of the tech stack.</p><p>McKinsey discussed this paradox - <strong><a href="https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/ai-is-everywhere-the-agentic-organization-isnt-yet">AI is everywhere, and yet the agentic organisation is not yet visible</a></strong> - in a recent podcast with Senior Partner Alexis Krivkovich:</p><blockquote><p><em>The real promise with agentic, relative to generative AI or previous evolutions of AI, is that you can have the equivalent of superhuman capabilities added to your teams. But the day-to-day workflows and the rituals around ways of working will need to fundamentally change.</em></p></blockquote><p>That&#8217;s what we mean when we say the operating model needs to shift. 
You need to think about how the hours of the day happen differently, the process of overseeing an agent population, how you engage in problem-solving as a team&#8212;and put the right governance and risk controls on top of that.</em></p></blockquote><p>AI is everywhere in the enterprise, in other words, but the work of <em>organising around it</em> has barely begun.</p><h2>Agents &amp; The Coasean Singularity</h2><p>But the real prize is not just greater efficiency in the old work model, but the so-called <em>Coasean Singularity</em> that occurs if/when agents drive workflow transaction costs towards zero, changing the entire premise of what organisations are and why they exist, which was the question Ronald Coase posed in his famous 1937 essay The Nature of the Firm.</p><p><strong><a href="https://www.nber.org/system/files/working_papers/w34468/w34468.pdf">NBER released an interesting working paper on this question in late 2025</a></strong>, which concluded that this shift could challenge existing market structures in good and bad ways, whilst also opening up entirely new forms of exchange beyond simple labour, jobs and contracting:</p><blockquote><p><em>The capacity of AI agents to dramatically reduce transaction costs as automated intermediaries could unlock new forms of market participation, enable previously infeasible mechanisms, and push allocative efficiency closer to competitive ideals. Yet the same forces that make agents attractive&#8212;their tireless persistence, computational superiority, and negligible marginal costs&#8212;also threaten to overwhelm existing market structures. The ultimate impact will depend critically on collective choices adopted regarding agent design, market structures, and regulatory frameworks.</em></p></blockquote><p><strong><a href="https://howardyu.substack.com/p/coase-vs-claude-and-the-future-of?publication_id=315949&amp;post_id=192932951&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Professor Howard Yu wrote an interesting piece on this recently</a></strong>, covering many angles that I think are useful starting points, such as Haier&#8217;s Rendanheyi model and Sangeet Paul Choudary&#8217;s work on unbundling and rebundling tasks and jobs. One of his key takeaways is that when transaction costs are lowered and supply can meet demand more efficiently, there is a tendency to create barbell-shaped markets where the premium and commodity ends grow, but the middle tier contracts.</p><p>Jack Dorsey&#8217;s rather chaotic financial services firm Block claims to be developing its own agentic operating system that will sever the link between headcount and output, but it is hard to know if their layoffs are the result of this innovation or just an unwinding of previous over-hiring.</p><p><strong><a href="https://thedigitalleader.substack.com/p/the-equation-just-broke-what-blocks">John Rossman wrote glowingly of Block&#8217;s intentions last week</a></strong>, and although I profess a degree of scepticism given Block&#8217;s history, there was one observation by Block&#8217;s Executive Officer in the piece that is worth highlighting:</p><blockquote><p><em>Before Block could restructure how work gets done, they had absolute clarity on what work needs to accomplish. This requires &#8220;Thinking in Outcomes&#8221;.
Jennings described three non-negotiable outcomes that governed every restructuring decision &#8212; reliability (no outages), regulatory integrity (compliance teams untouched, full stop), and durable growth (roadmap commitments honored).</em></p></blockquote><p>Outcomes clarity and context are vitally important if we want reliable agentic architectures.</p><p>But we should not presume that agentic architectures will simply automate and wipe out jobs in complex organisations. As a layer of intelligence, automation and orchestration, agentic AI will make many new value creation methods possible, which could have an expansionist effect resulting in more activity and potentially also more or better human roles.</p><h2>Agentic Capabilities Still Maturing</h2><p>We have written a lot about emerging agentic capabilities and how companies hope they will develop to become a key new infrastructural layer in the organisational operating system stack, and the progress we are seeing is encouraging. But it is easy to conflate proofs of concept that demonstrate theoretical performance with mature, tested capabilities.</p><p><strong><a href="https://www.linkedin.com/pulse/agents-dont-know-what-good-looks-like-thats-exactly-luca-mezzalira-sgwte/?trackingId=XGn%2B%2FupMOGKSQTe618fEZw%3D%3D">Luca Mezzalira wrote a thoughtful post last month reflecting on an O&#8217;Reilly Network event that discussed agentic capabilities in some depth</a></strong>, which captured some nuanced critiques of the current agentic AI state of the art. For example, as long as our agent testing is based only on behaviour and outputs, rather than on real capabilities, we are at risk of missing deeper problems with how agents sometimes satisfy test conditions in ways that are not always honest in the sense people would understand the term. There is hidden risk here if we move too fast and assume too much.</p><p>Mezzalira&#8217;s take-aways from this are threefold.</p><p>First, he argues that introducing deterministic guardrails to probabilistic agents is non-negotiable.</p><p>Second, we don&#8217;t yet know what good looks like, so we should avoid the mis-steps that accompanied the micro-services movement, which had some parallels with agentic AI, but we should also be cognisant of the risk gap this opens up. If agents can already autonomously develop working exploits using a model like Mythos, not knowing what good looks like is a scary prospect.</p><p>Finally, we should recognise and accept that we are all beginners and there is a lot still to learn and discover about how agentic AI will play out in the real world, and especially how it will intersect with human behaviours and failure modes.</p><p>Recently, we have seen <strong><a href="https://arxiv.org/abs/2602.19260">several studies</a></strong> and <strong><a href="https://pathfounders.com/p/is-unlikelyai-the-most-undervalued?_bhlid=26da303bc3e9686449e01785c869e167e680e61b">startup announcements</a></strong> relating to neuro-symbolic AI systems and agents that might be part of the solution for deterministic agents in areas where we cannot tolerate probabilistic risk. Perhaps at a minimum, it is possible to use neuro-symbolic methods in testing or orchestrator agents to verify the output of other agents in the system.</p><h2>Memex Redux: Agents that Document your Knowledge</h2><p>There is one final development worth flagging, as I think it points to a bigger trend we will see in agentic enterprise AI in the future.
<strong><a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f">Andrej Karpathy shared a gist called LLM Wiki</a></strong> that helps us use AI agents to build, grow and curate our own structured personal knowledgebases, as Vannevar Bush first imagined in 1945, and Ted Nelson later attempted with project Xanadu:</p><blockquote><p><em>Most people&#8217;s experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question&#8230;.</em></p><p>The idea here is different. Instead of just retrieving from raw documents at query time, the LLM <strong>incrementally builds and maintains a persistent wiki</strong> &#8212; a structured, interlinked collection of markdown files that sits between you and the raw sources.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yaws!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yaws!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 424w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 848w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yaws!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/194175820?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Yaws!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 424w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 848w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!Yaws!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19c7e169-7ca5-4ac7-91f5-aae75c0b00e9_1408x768.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The idea of feeding our AI agents relevant information and knowledge, but asking them to maintain a structured knowledge graph or wiki is potentially helpful in overcoming the limits of memory and the context window when performing long-run or multi-stage tasks. I would like to imagine each agent we train and deploy in the enterprise could maintain its own knowledgebase to improve its learning and to stay updated on relevant developments that might impact its work.</p><p>Among the AI solopreneur and personal productivity communities, there is already a popular technique that involves combining Claude with the personal knowledge repository Obsidian to give personal agents better context to work with. 
<strong><a href="https://limitededitionjonathan.substack.com/p/stop-calling-it-memory-the-problem">But as one commentator pointed out, simply loading relevant wiki pages as .md files into an LLM&#8217;s context window is not really the same as a functioning memory</a></strong> that can be queried like a database, nor is it very scalable.</p><p>So whilst this is a great starting point for a simple memory and context system for people and agents, it would probably need to be paired with a real knowledge graph and/or database to allow agents to query, search and refactor the information they need without overwhelming the context window.</p><p>What would it mean for every enterprise agent to maintain its own structured knowledge base? What are the implications for how we build and govern knowledge infrastructure? Given how hard it has been over the years to persuade leaders that this kind of knowledge engineering should be a key part of their work, perhaps this development might point to us being able to accelerate world-building and context engineering using agents to do the hard work.</p><p>Within most organisations, there is a lot of talent and work going on across the operational, technical and (hopefully!) architectural layers that will support agentic AI. But it is worth repeating that only leaders will have the holistic view of how the economics, infrastructure, capability readiness, and knowledge engineering all work together to make reliable, productive agentic AI a reality.</p><p>Managed agents with well-curated knowledge repositories could lead us to the Coasean Singularity with all the economic benefits that entails. But if we don&#8217;t bring it all together in a considered and safe way, then Mythos points to what might go wrong.</p>]]></content:encoded></item><item><title><![CDATA[The Autonomous Project Manager]]></title><description><![CDATA[What happens when coordination becomes a system, not a role]]></description><link>https://academy.shiftbase.info/p/the-autonomous-project-manager</link><guid isPermaLink="false">https://academy.shiftbase.info/p/the-autonomous-project-manager</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:03:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lgr5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="callout-block" data-callout="true"><p>Note: This piece uses project management as a lens to make a broader shift tangible. Much of our recent work has focused on how organisations move from implicit, human-carried coordination toward systems that are codified, legible, composable and coherent. These ideas are often discussed at an architectural level. Here, they are shown in the context where coordination pressure is most visible, allowing the underlying patterns to be observed in practice.</p></div><h2><strong>The Hidden World of Projects</strong></h2><p>Most projects do not fail because the work is unclear, but because alignment cannot be sustained as the work unfolds. 
Teams generally know what needs to be done, and plans are often internally consistent - the difficulty emerges in how that work is coordinated across time, teams and systems.</p><p>We often label coordination as a single activity, but in reality it is the ongoing effort required to keep multiple streams of work moving in a way that remains coherent. This includes managing dependencies between tasks, interpreting whether progress is meaningful or superficial, and maintaining a shared understanding of priorities as conditions change.</p><p>The signals needed for this work exist in every project - task state transitions, message latency, document revisions and shifts in ownership all provide indicators of how work is actually progressing. In practice, these signals are fragmented across tools and interpreted manually. Teams reconstruct the state of the project through meetings, status updates and informal exchanges, often with a lag between what has happened and what is understood.</p><p>Most organisations have responded by increasing visibility. Dashboards, trackers and reporting layers provide more information about activity. This improves awareness, but it does not reduce the effort required to interpret that information. Progress is still reconciled manually, dependencies are still tracked unevenly, and alignment remains something that has to be actively maintained, producing a kind of project reporting theatre where the production of slide decks becomes more of a priority than the actual work of the project.</p><p>This is where project managers spend most of their time. They act as a coordination layer across fragmented systems and teams, stitching together signals, resolving inconsistencies and intervening when alignment begins to drift. Much of this work is invisible. It does not appear in plans or reports, but it determines whether the system holds together.</p><p>As complexity increases, this model begins to break down. More teams, more dependencies and faster cycles of change increase the volume of signals that need to be interpreted. Coordination effort scales with this complexity, and the organisation becomes increasingly dependent on individuals who can absorb ambiguity and maintain coherence under pressure.</p><p>Most coordination effort is invisible, and because it is invisible, it is rarely designed or improved.</p><h2><strong>The Autonomous Project Manager</strong></h2><p>Once coordination is recognised as the primary constraint, a different question becomes practical. Instead of asking how to improve planning or execution, attention shifts to how coordination itself is maintained and where it breaks down.</p><p>In most projects, this breakdown is visible in small, repeated patterns:</p><ul><li><p>Status updates arrive late or require interpretation.</p></li><li><p>Dependencies are identified, but their impact is not always tracked as conditions change.</p></li><li><p>Teams report progress locally, while the overall picture remains unclear.</p></li><li><p>Escalation often depends on someone noticing that something feels off, rather than on a consistent view of risk.</p></li></ul><p>These are not failures of discipline within the project team; they reflect the absence of a system that can continuously interpret what is happening across the project.</p><p>Recent advances in agentic AI make it possible to construct such a system. In practical terms, this creates a clear and immediate use case for a project management agent.
The coordination work that currently sits with human project managers is already structured enough to be observed, interpreted and partially handled by a system operating across the same tools and signals. Models can now operate with persistent context, observe activity across multiple tools, and act over time rather than in isolated interactions. This allows coordination to be supported directly within the flow of work.</p><p>The idea of an autonomous project manager can be understood more concretely as a project management agent that operates across the coordination layer of the work. Rather than replacing the role, this agent takes responsibility for a set of recurring coordination tasks that are currently performed manually. It surfaces what coordination looks like when it is embedded into the system rather than carried by individuals. In practice, this defines a set of responsibilities that can be handled by a project management agent.</p><ul><li><p><strong>The agent continuously assembles a view of progress.</strong> Task state changes, document revisions and communication patterns are observed directly, allowing the current state of work to be assembled continuously. The need to reconstruct progress through meetings and reports is reduced.</p></li><li><p><strong>The agent tracks and updates dependencies as conditions change.</strong> When upstream work slips, the effect on downstream tasks can be traced and surfaced early. This reduces the lag between a change occurring and its implications being understood.</p></li><li><p><strong>The agent generates and distributes updates based on the current state of work.</strong> Updates reflect what is happening across the system, rather than relying on periodic summaries that quickly become outdated or inconsistent across audiences.</p></li><li><p><strong>The agent detects patterns that indicate risk and routes them appropriately.</strong> Patterns such as repeated delays, conflicting signals or gaps in ownership can be surfaced with context and directed to the appropriate individuals. This reduces reliance on informal escalation through messages and ad hoc discussions.</p></li></ul><p>These changes do not remove the need for coordination. They shift a significant portion of coordination work from manual effort to system support. Activities such as gathering updates, tracking dependencies and maintaining visibility can be handled continuously by the agent, reducing the need for project managers to reconstruct the state of the work.</p><p>For project managers, this changes the nature of the role. Time previously spent chasing updates, reconciling information and maintaining visibility can be redirected toward working with stakeholders, resolving trade-offs and shaping solutions as conditions evolve.</p>
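<p>As a rough sketch of the mechanics involved (the task model, signals and thresholds below are invented for illustration), tracing the downstream impact of a slipped task and surfacing risk patterns might look something like this:</p><pre><code># Illustrative sketch of two coordination-agent responsibilities: tracing
# the downstream impact of a slipped task, and surfacing risk patterns with
# context. The task model and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class Task:
    id: str
    owner: str
    status: str              # "todo", "in_progress", "done", "slipped"
    depends_on: list[str]
    times_delayed: int = 0

def downstream_of(slipped_id: str, tasks: dict) -> list[str]:
    """Follow dependency links to find every task affected by a slip."""
    affected, frontier = [], [slipped_id]
    while frontier:
        current = frontier.pop()
        for task in tasks.values():
            if current in task.depends_on and task.id not in affected:
                affected.append(task.id)
                frontier.append(task.id)
    return affected

def risk_signals(tasks: dict) -> list[str]:
    """Surface patterns (repeated delays, missing ownership, slips) early."""
    signals = []
    for task in tasks.values():
        if task.times_delayed >= 2:
            signals.append(f"{task.id}: repeatedly delayed ({task.times_delayed} times)")
        if not task.owner:
            signals.append(f"{task.id}: no owner assigned")
        if task.status == "slipped":
            hit = ", ".join(downstream_of(task.id, tasks)) or "none"
            signals.append(f"{task.id} slipped; downstream impact: {hit}")
    return signals
</code></pre><p>None of this replaces judgement; it determines which situations reach a human with the relevant context already assembled.</p>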
<p>Coordination moves from something that is repeatedly reconstructed to something that is continuously maintained.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lgr5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lgr5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 424w, https://substackcdn.com/image/fetch/$s_!lgr5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 848w, https://substackcdn.com/image/fetch/$s_!lgr5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!lgr5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lgr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic" width="1408" height="768" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lgr5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 424w, https://substackcdn.com/image/fetch/$s_!lgr5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 848w, https://substackcdn.com/image/fetch/$s_!lgr5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 1272w,
https://substackcdn.com/image/fetch/$s_!lgr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a534357-5b8c-4bae-9408-64e10e3b0ec2_1408x768.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1><strong>Inside the Coordination System</strong></h1><p>The patterns described above are already present in most projects. Signals are generated continuously as work progresses, but they are fragmented, delayed and interpreted manually. A coordination capability brings these elements together into a continuous layer that can observe, interpret and respond as conditions change.</p><ul><li><p>The first component is <strong>sensing</strong>. Every project generates a stream of signals: task state transitions, changes in ownership, message latency between teams, document revisions and dependency updates. These signals exist across tools and communication channels, but they are rarely combined. As a result, no single view reflects the current state of the system. Bringing these signals together allows the project to be observed as it is, rather than as it was last reported.</p></li><li><p><strong>Interpretation</strong> builds on this. Signals on their own do not indicate whether a project is on track - a task moving to &#8220;in progress&#8221; may represent genuine progress, or it may reflect a delay in starting work. A completed task may unblock downstream activity, or it may mask unresolved issues. Interpretation requires relating signals to outcomes, timelines and dependencies. This allows the system to distinguish between movement and meaningful progress.</p></li><li><p><strong>Coordination</strong> follows from this understanding. Dependencies between workstreams can be treated as active relationships rather than static links in a plan. When upstream work slips, the system can trace the impact across downstream tasks, identifying where timelines need to adjust or where attention is required. This reduces the delay between a change occurring and its consequences being understood.</p></li><li><p><strong>Communication</strong> becomes part of this same layer. 
Instead of aggregating updates manually, the system can generate views of progress based on current conditions. Different stakeholders require different levels of detail, but the underlying information remains consistent. This reduces the need for parallel reporting structures that often diverge over time.</p></li><li><p><strong>Escalation</strong> provides structure to decision-making. Patterns such as repeated delays, conflicting signals across teams or gaps in ownership can be detected as they emerge. These situations can be surfaced with relevant context and directed to the appropriate individuals. This allows human attention to focus on points of uncertainty, rather than on monitoring routine activity.</p></li></ul><p>These components do not operate independently. They reinforce each other. Better sensing improves interpretation. Clearer interpretation supports more effective coordination. Structured coordination reduces the noise in communication. Consistent communication improves the quality of escalation.</p><p>Together, they form an emerging capability that allows alignment to be maintained continuously. The system does not remove variability from the project, but it reduces the effort required to detect, understand and respond to it.</p><p>Coordination becomes observable, interpretable and responsive, rather than fragmented and manually reconstructed.</p><h2><strong>Where Humans Remain Central</strong></h2><p>A coordination capability changes how effort is distributed across a project, but it does not reduce the need for human judgement. It changes where that judgement is applied and how it is informed.</p><p>When coordination is supported as a system, the backward-looking reconstruction effort is reduced. Signals are assembled continuously, dependencies are made visible as they evolve, and the state of the project can be observed without manual aggregation. This creates space for attention to shift toward interpretation and decision-making.</p><p>Judgement becomes concentrated in areas where signals alone are insufficient. Prioritisation across competing objectives requires an understanding of intent that extends beyond task-level progress. Decisions about whether to absorb a delay, re-sequence work or escalate an issue depend on how outcomes are valued in context.</p><p>Ambiguity is another area where human input remains essential. Conflicting signals are common in complex projects. One team may report progress while another experiences blockage. A dependency may appear resolved in one system but remain open in another. Interpreting these situations requires an understanding of how work is actually being carried out, not just how it is represented.</p><p>Stakeholder dynamics introduce a further layer. Commitments, expectations and reputational considerations shape how decisions are made. These factors are rarely captured fully in systems, yet they influence how trade-offs are resolved and how issues are communicated.</p><p>Escalation, when structured effectively, directs these moments of uncertainty to the appropriate individuals. The system can identify patterns and assemble context, but decisions about how to respond remain with people who carry responsibility for outcomes.</p>
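<p>A small sketch can make that boundary concrete: the system detects a pattern, assembles context and routes it to a person, and the decision itself never leaves human hands. The rules, thresholds and owner mapping below are invented for illustration.</p><pre><code># Toy sketch: detect a risk pattern and route it, with context, to a person.
RISK_RULES = {
    "repeated_delay": lambda signals: signals.count("deadline_moved") &gt;= 3,
    "ownership_gap": lambda signals: "owner_unassigned" in signals,
}

OWNERS = {"repeated_delay": "delivery_lead", "ownership_gap": "project_sponsor"}

def escalate(workstream, signals):
    """Surface matching risk patterns; the decision itself stays human."""
    for pattern, matches in RISK_RULES.items():
        if matches(signals):
            yield {
                "workstream": workstream,
                "pattern": pattern,
                "evidence": signals,        # context assembled by the system
                "decide": OWNERS[pattern],  # decision routed to a person
            }

for alert in escalate("checkout-redesign", ["deadline_moved"] * 3):
    print(alert["pattern"], "-&gt;", alert["decide"])
</code></pre>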
<p>This shifts the role of those involved in project leadership. More time and attention are freed up for shaping intent, resolving uncertainty and making decisions that affect the trajectory of the work.</p><h2><strong>The Failure Mode: More Activity, Same Misalignment</strong></h2><p>Introducing a coordination layer does not automatically improve performance and alignment. It can also increase the volume and speed of activity without changing how that activity is interpreted.</p><p>This pattern is already familiar. Most organisations have invested in tools that improve visibility, yet still rely on manual effort to reconcile what those signals mean. The result is more information, but not necessarily better alignment.</p><p>This leads to a form of coordination that is technically consistent but operationally ineffective. Activity is tracked, updates are generated, and alerts are surfaced, yet the underlying coherence of the work does not improve.</p><ul><li><p>Communication can become a source of friction rather than clarity.</p></li><li><p>Escalation can place additional strain on decision-makers if it is not well structured.</p></li></ul><p>There is also a tendency to formalise coordination prematurely. Systems that impose rigid workflows can struggle to accommodate how work actually happens. Teams adapt by working around the system, reintroducing informal coordination alongside the formal structure.</p><p>These patterns share a common characteristic. The system responds to signals, but does not improve how those signals are interpreted or acted upon. Coordination effort is redistributed, but not reduced.</p><p>This is the same failure mode that appeared in earlier waves of automation. Processes became faster and more visible, but remained fragmented. The system optimised activity within existing constraints rather than improving how the work held together.</p><h2>What are the conditions for success?</h2><p>When coordination is treated as a capability rather than a role, a different set of requirements becomes visible. These are not new technologies as such, but conditions that determine whether alignment can be sustained systematically.</p><ul><li><p><strong>Clarity of outcomes</strong> provides the foundation. The system needs a stable reference point against which progress can be interpreted.</p></li><li><p><strong>Shared context</strong> allows that interpretation to be consistent. Goals, roles, dependencies and constraints need to be visible in a form that can be related to ongoing activity.</p></li><li><p><strong>Access to systems and data</strong> determines the quality of signals available. Work that is dispersed across disconnected tools or constrained by interfaces limits the system&#8217;s ability to observe and respond. Coordination relies on the ability to relate activity across environments, rather than treating each system in isolation.</p></li><li><p><strong>Defined escalation pathways</strong> provide boundaries for decision-making. The system needs to know when to surface situations and how to direct them appropriately.</p></li><li><p><strong>Codified ways of working</strong> support continuity. Many coordination patterns already exist within organisations, but they are often informal and unevenly applied. Making these patterns explicit allows them to be observed, adapted and, where appropriate, supported by the system.</p></li></ul><p>These conditions make it possible for coordination to move from an informal practice to a structured capability. 
They allow alignment to be sustained through the interaction of systems and people, rather than relying on continuous manual intervention.</p><p>The effectiveness of coordination depends on how clearly the organisation can represent its own work, context and decision boundaries.</p><p>Read on for what this means for the future of the Project Manager role.</p>
      <p>
          <a href="https://academy.shiftbase.info/p/the-autonomous-project-manager">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Agents of Progress or Agents of Chaos?]]></title><description><![CDATA[The delta between personal and enterprise agentic AI development is worrying, but perhaps combining the two offers a way to help overcome their limitations...]]></description><link>https://academy.shiftbase.info/p/agents-of-progress-or-agents-of-chaos</link><guid isPermaLink="false">https://academy.shiftbase.info/p/agents-of-progress-or-agents-of-chaos</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 31 Mar 2026 15:12:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XMjE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80556756-6586-4ef9-8ac1-52bdfab40918_800x533.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <strong><a href="https://academy.shiftbase.info/p/how-we-survived-the-agent-apocalypse">OpenClaw moment we covered a few weeks ago</a></strong> was a wild ride. But in the age of YOLO, Hodl and r/wallstreetbets, it should come as no surprise that there is an apparently limitless supply of people <strong><a href="https://bsky.app/profile/mims.bsky.social/post/3mhsux67xpk2d">willing to hand over control of their personal computer</a></strong> to AI agents in pursuit of rapid progress.</p><p>One fascinating research study - <strong><a href="https://agentsofchaos.baulab.info/report.html">Agents of Chaos</a></strong> - let OpenClaw run riot within a controlled lab environment, and concluded:</p><blockquote><p><em>During a two-week experimental investigation, we identified and documented ten substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability as complex, integrated architectures. The implications of these shortcomings may extend directly to system owners, their immediate surroundings, and society more broadly. Unlike earlier internet threats where users gradually developed protective heuristics, the implications of delegating authority to persistent agents are not yet widely internalized, and may fail to keep up with the pace of autonomous AI systems development.</em></p></blockquote><p>For those of us with a slightly lower risk appetite, Claude Cowork has also evolved very quickly as a more mainstream alternative to OpenClaw, and is incredibly impressive.</p><p>To take but one of many examples of personal agent setups, <strong><a href="https://promptedbyeric.substack.com/p/claude-cowork-might-be-the-most-consequential?r=9dv58&amp;triedRedirect=true">Eric Porres recently shared his own Claude Cowork harness</a> </strong>and lauded its ability to help him manage his portfolio of work activities in a more powerful and efficient way:</p><blockquote><p><em>the gap between &#8220;AI as chatbot&#8221; and &#8220;AI as operating system for your work&#8221; is closing fast. And Cowork is where that gap collapses for non-developers.</em></p></blockquote><p>At their impressive GTC event in San Jos&#233; recently, Nvidia placed great emphasis on this agentic inflection point as a pointer to where AI is headed next, and also crucially where it can start to show strong returns. 
<strong><a href="https://www.nvidia.com/gtc/keynote/">In his keynote, CEO Jensen Huang challenged companies to understand the nature of this shift</a></strong> with a typically provocative statement:</p><blockquote><p><em>Every company needs an OpenClaw strategy</em></p></blockquote><p>But this was not a crazy call for companies to let OpenClaw run riot in their organisations. <strong><a href="https://www.exponentialview.co/p/jensens-openclaw-thesis?publication_id=2252&amp;post_id=191613948&amp;triggerShare=true&amp;isFreemail=false&amp;r=9dv58&amp;triedRedirect=true">The message, as Azeem Azhar interpreted it, is a lot more far-reaching</a></strong>: it is about the shift from the model training era to one of inference and execution.</p><p>The dominant driver of AI progress (and NVIDIA&#8217;s revenue) was <em>training</em> compute: the vast one-time cost of training large foundation models. This emerging thesis says that the next scaling frontier is <em>inference-time</em> compute &#8212; spending more compute at the moment of generating a response, letting models &#8220;think longer&#8221; on hard problems (chain-of-thought, test-time search, etc.) rather than just being bigger. This changes the hardware economics significantly: inference demand is continuous, distributed, and latency-sensitive rather than concentrated in large training runs. It also opens up physical AI (robotics, autonomous systems) as a major new inference market.</p><p>The focus shifts from models to what we do with them - or as Azeem put it: <em><strong>&#8220;The harness is the revolution&#8221;</strong>&#8230;</em></p><blockquote><p><em>For AI, the harness moment happened at the tail end of 2025. Claude Code began to work reliably enough that you could leave it running overnight and trust what it had done in the morning. Not perfectly, but reliably enough. And that threshold, that &#8220;I can leave it to its own devices&#8221; threshold, changed everything. It changed what users asked AI to do. It changed how long tasks ran. 
<p>The focus shifts from models to what we do with them - or as Azeem put it: <em><strong>&#8220;The harness is the revolution&#8221;</strong>&#8230;</em></p><blockquote><p><em>For AI, the harness moment happened at the tail end of 2025. Claude Code began to work reliably enough that you could leave it running overnight and trust what it had done in the morning. Not perfectly, but reliably enough. And that threshold, that &#8220;I can leave it to its own devices&#8221; threshold, changed everything. It changed what users asked AI to do. It changed how long tasks ran. It changed the token usage profile of every organization that crossed it.</em></p><p><em>Now, OpenClaw is the harness for the next layer.</em></p></blockquote><figure><img src="https://substackcdn.com/image/fetch/$s_!XMjE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80556756-6586-4ef9-8ac1-52bdfab40918_800x533.heic" width="800" height="533" alt="" /><figcaption class="image-caption">image credit: https://www.linkedin.com/posts/i0exception_san-francisco-these-days-activity-7429390168576405504-D8Nr/</figcaption></figure><h2>Talk to my agent!</h2><p>What does this mean for agentic AI in the enterprise, and should we be concerned about the way organisation-owned agentic services seem to be lagging so far behind the rapid evolution of personal agents?</p><p>Personal AI agents have evolved so much faster than enterprise agentic services that the gap between them is becoming structural, and we could end up with more ways to navigate broken enterprise systems instead of business transformation.</p><p>But if we embrace personal agents at work as prototypes and testbeds for shared enterprise services, then perhaps we can close the gap.</p><p>Looking at this challenge within the wider context of the shift from model training to inference and runtime intelligence, however, it is clear that much more focus is needed on world models and decision intelligence infrastructure to enable shared enterprise agents to work reliably with less human supervision. The good news is we can do much of this with the tools we have access to today. 
It is less of a technology challenge and more a question of readiness and architecture.</p><ul><li><p>Microsoft has recently been beefing up its agentic capabilities in its M365 platform, and <strong><a href="https://techcommunity.microsoft.com/blog/microsoft365copilotblog/introducing-multi-model-intelligence-in-researcher/4506011">has just announced a multi-model capability for verifying complex research</a></strong>.</p></li><li><p><strong><a href="https://venturebeat.com/technology/perplexity-takes-its-computer-ai-agent-into-the-enterprise-taking-aim-at">Perplexity has launched an agentic harness for the enterprise</a></strong>, based on internal tooling that was used by its own employees to speed up delivery.</p></li><li><p>Elsewhere, Salesforce&#8217;s agentic foundry, SAP and a host of other stalwarts from the previous generation of enterprise platforms continue to announce new agentic capabilities.</p></li></ul><p>But the capability diffusion gap continues to widen, and there is now a risk that personal agents will evolve so much faster than enterprise agents that we could recreate the &#8216;old wine in new bottles&#8217; problem that we saw with the earlier phase of Robotic Process Automation (RPA), and use AI to navigate a broken system better, rather than to fix the system or start building a better one.</p><p><strong><a href="https://a16z.com/why-the-world-still-runs-on-sap/">As Eric Zhou and Seema Amble of Andreessen Horowitz remarked recently, the world still largely runs on old, poorly-designed enterprise platforms not because they are good</a></strong>, but because organisations contorted themselves around their inadequacies and foibles to such an extent that ripping them out could be painful:</p><blockquote><p><em>To ask a question that sounds almost disrespectful until you&#8217;ve spent a week in a Fortune 500: why do people still use SAP (and ServiceNow, and Salesforce) at all?</em></p><p><em>The short answer is that SAP, or any major legacy system of record, captures critical data across the businesses that use it. But on top of that, the business has customized it and built a set of specific procedures and roles on top of it, much of which is not actually documented anywhere.</em></p></blockquote><p>Or at least that was the case until now. The authors argue that agentic AI in the enterprise could replace these behemoths over the medium term, but even in the short term, it could make them more malleable and easier to work with.</p><p>Perhaps this is an area where personal agents in the enterprise can become a testbed for enterprise agentic services, trying out automations, workarounds and multi-agent tasks under human oversight, before becoming candidates for new shared services that run more autonomously.</p><p>With a personal agent harness, we can each maintain our own personal work context, memory, preferred styles and so on, and our agents will be able to use this to navigate the enterprise and interface with systems of record at the API and data level, bypassing the UI altogether.</p>
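<p>As a rough sketch of what &#8220;bypassing the UI&#8221; could look like, a personal agent tool might call a system of record directly, subject to an explicit permission check. The endpoint, token and scope names below are entirely hypothetical; the shape of the call, plus the guardrail, is the point.</p><pre><code># Toy sketch: a personal agent reading a record via an API, not a UI.
# The base URL, bearer token and scope names are hypothetical placeholders.
import requests

ALLOWED_SCOPES = {"projects:read"}  # guardrail: what this agent may touch

def fetch_record(record_id, scope="projects:read"):
    if scope not in ALLOWED_SCOPES:
        raise PermissionError(f"agent lacks scope {scope!r}")
    resp = requests.get(
        f"https://erp.example.com/api/projects/{record_id}",
        headers={"Authorization": "Bearer AGENT_TOKEN"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # structured data the agent can reason over
</code></pre>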
<p>And our agents can talk to each other to take over a lot of the boring, time-wasting scheduling, alignment, stakeholder communications and basic coordination tasks that consume so much of a leader&#8217;s time in large enterprises today.</p><p>Later, as more shared enterprise agents come online, our personal agents could help manage our relationships with these as well, for example sequencing the various actions needed for us to run a project or gather intelligence for new ideas.</p><p>That suggests to me that we should try to find a way to embrace technologies like Claude Cowork safely within the enterprise to deliver on the potential that copilots promised. Nothing is risk-free, and we clearly need to focus on guardrails and permissions, but if advanced users are willing to be accountable for their agents in return for the productivity and value they could generate, then we can probably find ways to make it work.</p><h2>World Knowledge, Architecture and Run-time Intelligence</h2><p>But it is also worth thinking about the differences between personal agents and enterprise agents, and what we have learned so far on this journey of discovery.</p><p>First, we need a more considered and thoughtful approach to productivity than boasting about how many lines of code (LOC) we can churn out.</p><p><strong><a href="https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/">Mario Zechner tried to summarise his own lessons from agentic coding</a></strong> last week in an interesting, opinionated piece about the dangers of brittle software, missed learning and unmaintainable systems, and concluded we need to <em>&#8220;slow the f*** down&#8221;</em>.</p><p>In a similar vein, <strong><a href="https://interconnected.org/home/2026/03/28/architecture">Matt Webb reminds us that good architecture beats a high LOC count every time</a></strong>, and helps avoid personal-agents-as-workarounds suffering from the same failure modes as RPA, mentioned earlier:</p><blockquote><p><em>The thing about agentic coding is that agents grind problems into dust. Give an agent a problem and a while loop and - long term - it&#8217;ll solve that problem even if it means burning a trillion tokens and re-writing down to the silicon.</em></p><p><em>Like, where&#8217;s the bottom? Why not take a plain English spec and grind it out in pure assembly every time? It would run quicker.</em></p><p><em>But we want AI agents to solve coding problems quickly and in a way that is maintainable and adaptive and composable (benefiting from improvements elsewhere), and where every addition makes the whole stack better.</em></p><p><em>So at the bottom is really great libraries that encapsulate hard problems, with great interfaces that make the &#8220;right&#8221; way the easy way for developers building apps with them. Architecture!</em></p></blockquote><p>Another question is where we can live with probabilistic models, and where we need a more deterministic approach. 
We can tolerate the limits of probabilistic models in personal agents we oversee and can test, but when we start to think about embedded or autonomous agents that work the same way for everybody, a more deterministic approach is often needed.</p><p>Just as Nvidia&#8217;s forward strategy is about more real-time inference in the output stage, we can start to imagine how more verification and compliance could also be done at runtime to make some enterprise AI agents more deterministic.</p><p>Artur Huk wrote about this a few days ago for O&#8217;Reilly, <strong><a href="https://www.oreilly.com/radar/the-missing-layer-in-agentic-ai/">describing &#8216;decision intelligence runtime&#8217; as a missing capability layer in agentic AI</a></strong> - more of an engineering pattern than a specific solution or technique.</p><p>And this brings us back to a topic we have been noodling on for some time, which is <strong><a href="https://academy.shiftbase.info/p/ai-round-up-both-destroyer-and-maker">the vital importance of world-building</a></strong> in ensuring the success of enterprise AI.</p><p>Whilst some aspects of world-building are about creating good general context for people and machines to have clarity about goals, ways of working, culture, language and so on, there are other aspects of world knowledge that are more precise and scientific.</p><p>The way that current models handle world knowledge is largely in the training stage, rather than the inference or runtime moment. The way autonomous driving systems learn is a good example. <strong><a href="https://www.wired.com/story/a-school-district-tried-to-help-train-waymos-to-stop-for-school-buses-it-didnt-work/?utm_source=nl&amp;utm_brand=wired&amp;utm_mailing=WIR_Daily_033026_PAID&amp;utm_campaign=aud-dev&amp;utm_medium=email&amp;utm_content=WIR_Daily_033026_PAID&amp;bxid=679fe6b740ae7ef26b0b090f&amp;cndid=86084401&amp;hasha=016003755aa0296c369a337f05a1c1d7&amp;hashc=6682d98588cab4309c24846dca6db095f2734170cec7daa3428e1be66edfc3ed&amp;esrc=MARTECH_ORDERFORM&amp;utm_term=WIR_DAILY_PAID">Waymo vehicles in Austin developed a nasty habit of illegally overtaking school buses on pick up</a></strong>, and the school district worked with the company to give them simple rules and guidance on how to avoid this happening. But the training process for Waymo&#8217;s system is so long and includes so much data that they were unable to simply add a new rule quickly, and the cars kept overtaking buses. More runtime inference and decision intelligence based on world models is perhaps one way to tackle such anomalies.</p><p><strong><a href="https://joiningdots.substack.com/p/ai-and-a-new-frontier-for-decision">So much of the existing decision intelligence inside organisations has never been captured, as Sharon Richardson remarked yesterday</a></strong> in her informative piece about context graphs, which means there is a lot we can achieve quite easily and quickly if we are smart about it. This does not require some kind of Sisyphean manual knowledge-mapping exercise, because it is the kind of task that AI can accelerate with the right supervision, even down to the level of conducting structured interviews or After Action Reviews (AARs) to capture decision traces and reasoning from people.</p>
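<p>What might a captured piece of decision intelligence look like once codified? A hypothetical sketch: the rule below, with invented names and thresholds, is the kind of deterministic check a decision-intelligence runtime could apply before an agent&#8217;s action proceeds, leaving an auditable trace either way.</p><pre><code># Toy sketch: a codified decision rule applied at runtime, not training time.
# Rule content and thresholds are invented for illustration.
RULES = [
    {"name": "no_unvetted_supplier",
     "check": lambda a: a.get("supplier_vetted", False) or a["value"] &lt; 1000},
]

def decide(action):
    """Deterministic gate: every decision leaves an auditable trace."""
    trace = {"action": action, "fired": []}
    for rule in RULES:
        if not rule["check"](action):
            trace["fired"].append(rule["name"])
    trace["allowed"] = not trace["fired"]
    return trace

print(decide({"type": "purchase", "value": 5000, "supplier_vetted": False}))
# the trace shows 'no_unvetted_supplier' fired and allowed: False
</code></pre>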
<p>The kind of world models we will need to realise the promise of enterprise agentic AI will go way beyond intangible knowledge and culture. They will need to understand the physical world, manufacturing, distribution and even domains like geopolitics that (not again!) are messing with supply chains and pricing.</p><p>How we build and evolve these models is truly a fascinating challenge, and one where we can use AI itself to help improve our AI readiness by doing the mapping, collating and documentation of the information we need to make them real.</p><p><strong><a href="https://www.strangeloopcanon.com/p/the-future-of-work-is-world-models">Rohit Krishnan recently wrote that this idea is really the key to the future of work</a></strong>, which he sees operating more like a strategy co-op game than a single-player game, and I think that is right.</p><blockquote><p><em>What&#8217;s needed in the enterprise world is such a world model - an engine that knows the rules, tracks the state, understands and predicts consequences.</em></p><p><em>The environment would connect to the systems a company already runs, the information that is gathered, the agents it uses, and build a live operational model of the business. Scale it across companies and you have the training data to build a compelling environment and an even better world model!</em></p></blockquote><p>The question is, can visionary CIOs and leaders of AI adoption programmes make the case that urgent attention and investment are needed in AI readiness efforts, rather than just rolling out copilot licenses and hoping for some marginal productivity gains?</p><p>We need composable, addressable processes, services and systems if enterprise agents are to operate autonomously. And we need codified rulesets, world models and decision intelligence to be available at run-time if we want them to operate more deterministically without the kind of oversight we perform with personal agents.</p>]]></content:encoded></item><item><title><![CDATA[Same Models, Different Worlds]]></title><description><![CDATA[How outcomes, skills and context combine to create compounding organisational intelligence]]></description><link>https://academy.shiftbase.info/p/same-models-different-worlds</link><guid isPermaLink="false">https://academy.shiftbase.info/p/same-models-different-worlds</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 24 Mar 2026 15:08:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!H_5r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023a5b26-a264-4d5a-86c8-98bd6cf13f88_1536x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI models are becoming shared utilities. When the same intelligence is available to everyone, the real differentiator is no longer the model, but the world around it. If your competitors are using the same models as you, trained on similar data and accessed through the same interfaces, then the intelligence itself cannot be the thing that gives your organisation competitive advantage.</p><p>If everyone has access to an identical intelligence layer, where does differentiation actually come from?</p><p>At first glance the answer might seem to lie in prompting skill or in deploying more sophisticated agents and tools on top of the models. But the organisations gaining the most value from AI are not simply using the models differently; they are building richer environments around them. 
And the more intelligence you retain in the environment, the easier it becomes to switch models when needed.</p><p>Instead of treating AI as a standalone tool, enterprise AI pioneers are beginning to shape the world that AI operates within: the outcomes it is responsible for, the expertise it inherits, and the context it can access. This is also creating a clearer distinction between skills and context in terms of agentic architecture.</p><h2>Memory and Ownership</h2><p>In everyday AI usage, people are starting to build up context inside individual tools: conversations, prompts, working patterns, fragments of reasoning. Over time, this becomes something more than usage. It becomes a kind of working memory.</p><p>&#8594; But that memory is fragile.</p><p>&#8594; Switch models, and it disappears.</p><p>&#8594; Move tools, and it fragments.</p><p>&#8594; Hit usage limits, and it resets.</p><p>For those focused on using commercial tools, running into usage limits with, say, Claude and potentially having to switch models is not just an inconvenience; it is the loss of accumulated context. Conversations, assumptions, and working patterns have to be rebuilt from scratch.</p><p>The same pattern is playing out inside organisations.</p><p>Copilots sit in different tools. Context is scattered across systems. There is no shared memory layer, no consistent way to carry forward how work is done, what has been learned, or how decisions have been made.</p><p>The intelligence may be shared, but the context is not; and more importantly, it is not owned.</p><h2><strong>Tools vs Worlds</strong></h2><p>On paper, most AI deployments look remarkably similar. The same models sit underneath a fairly common toolset. The same copilots appear in productivity software and the same agent frameworks promise orchestration across workflows. From the outside, it can feel as though every organisation is drawing from the same intelligence layer, and in many ways, they are.</p><p>But beneath that shared surface, two very different approaches are being used.</p><p>In some organisations, AI remains primarily a tool layer. Employees interact with models through prompts and copilots, using them to draft content, analyse documents, generate ideas or automate parts of existing workflows. 
The intelligence sits at the interface: helpful, but largely separate from the deeper structure of how the organisation operates.</p><p>In others, a different approach is emerging: instead of focusing only on how people interact with AI, attention begins to shift toward the environment.</p><ul><li><p>What outcomes the system is responsible for.</p></li><li><p>What expertise it inherits from the organisation.</p></li><li><p>What context it can access across tools, data and systems.</p></li></ul><figure><img src="https://substackcdn.com/image/fetch/$s_!H_5r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023a5b26-a264-4d5a-86c8-98bd6cf13f88_1536x1024.heic" width="1456" height="971" alt="" /></figure><p>This is where an architectural distinction that is gaining traction in agent systems becomes useful: the difference between Skills and context systems such as MCP. At a simple level, the distinction is straightforward:</p><ul><li><p><strong>Skills</strong> describe how the system approaches a problem. They encode reasoning patterns, frameworks and playbooks that guide analysis or decision-making, but increasingly they also define how work gets done in practice: the sequences of actions, process steps and interactions the system can carry out across tools and workflows.</p></li><li><p><strong>Context systems</strong> determine what the system can see and interact with. They provide structured access to documents, tools, data sources and workflows across the organisation.</p></li></ul><p>AI systems do not become powerful simply because the models improve; they become powerful when three things begin to align:</p><ul><li><p>the outcomes they are responsible for</p></li><li><p>the skills they can use to reason and act</p></li><li><p>the context they can access across the organisation</p></li></ul><p>Together, these elements start to form something that looks less like a tool and more like an operational environment for intelligence.</p><h2>Tool Mode: Intelligence Without a World</h2><p>Digging a bit deeper into the differences, it is clear that in Tool Mode, AI is treated primarily as an interface for generating output.</p><p>Employees prompt copilots to draft documents, analyse data, generate presentations, or summarise discussions. Individual productivity improves, and in many cases the gains are significant. 
Teams can move faster in small pockets, content can be produced more easily, and analytical work that once took hours can often be completed in minutes.</p><p>But the intelligence remains largely detached from the organisation itself.</p><p>The model has limited awareness of the systems people use every day. It does not have structured access to decision histories, internal frameworks, or the reasoning patterns that shape how the organisation operates. Knowledge remains scattered across documents, chat threads and individual expertise. As a result, AI becomes powerful but context-poor.</p><p>Outputs may be impressive in isolation, but they often lack alignment with the organisation&#8217;s specific priorities, norms and operating logic. Two employees asking the same question may receive different answers. Strategic nuance is lost. Decisions remain dependent on individuals interpreting and adjusting the output.</p><p>In this mode, AI behaves a little like a brilliant intern dropped into the organisation with access to a search engine but very little understanding of how the company actually works.</p><h2>World Mode: Intelligence Inside an Environment</h2><p>When organisations focus on designing the environment instead of just prompting and tool use, attention can shift towards two questions:</p><ul><li><p>What expertise should be codified and reusable?</p></li><li><p>What context should be systematically accessible?</p></li></ul><p>This is where the distinction between Skills and context systems becomes operational.</p><p>Skills capture reasoning patterns that previously lived mostly in people&#8217;s heads, but they can also encode how those patterns translate into action: what steps to take, which tools to use, and how work flows from one stage to the next. They encode the frameworks, heuristics and analytical approaches that experienced practitioners use when evaluating problems or making decisions.</p><p>A competitive analysis skill, for example, might specify how to compare products across pricing, features, positioning and risk, and how to gather that information across internal and external sources as part of a repeatable workflow. A risk assessment skill might define the dimensions that should be considered before approving a supplier or launching a new initiative. These patterns represent organisational expertise. When codified as reusable skills, they become something new: shared reasoning infrastructure.</p>
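<p>What does a codified skill look like in practice? One plausible shape, sketched here with invented field names rather than any emerging standard, is structured data: the dimensions to compare, the steps to follow and what a complete answer must contain, all reviewable and versionable like any other artefact.</p><pre><code># Toy sketch: a competitive-analysis "skill" as shared, versioned structure.
# Field names and steps are illustrative assumptions, not a standard.
COMPETITIVE_ANALYSIS_SKILL = {
    "name": "competitive_analysis",
    "version": "1.2.0",
    "dimensions": ["pricing", "features", "positioning", "risk"],
    "steps": [
        "gather internal win/loss notes and public sources",
        "score each competitor on every dimension",
        "flag gaps where evidence is missing or stale",
    ],
    "output_must_include": ["comparison table", "strategic implications"],
}

def render_instructions(skill, target):
    """Expand the skill into instructions an agent (or intern) can follow."""
    lines = [f"Analyse {target} ({skill['name']} v{skill['version']}):"]
    lines += [f"- {step}" for step in skill["steps"]]
    lines.append("Cover: " + ", ".join(skill["dimensions"]))
    return "\n".join(lines)

print(render_instructions(COMPETITIVE_ANALYSIS_SKILL, "Acme Corp"))
</code></pre><p>Stored this way, the skill can be expanded into instructions for an agent, reviewed by the practitioners whose expertise it encodes, and improved over time.</p>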
<p>Context systems complement this by shaping the environment the intelligence operates within. Through mechanisms such as MCP, agents can access the tools, documents, databases and workflows that contain the organisation&#8217;s operational knowledge.</p><p>When outcomes, skills and context begin to align, the intelligence is no longer simply generating output. It begins participating in the operational fabric of the organisation, both in how decisions are made and in how work is executed.</p><h2>Why Tool Mode Is the Default</h2><p>Just as <strong><a href="https://academy.shiftbase.info/p/extraction-vs-redesign-the-hidden">Extraction Mode dominates organisational transformation</a></strong>, Tool Mode tends to dominate early AI adoption.</p><p>The reasons are straightforward.</p><p>Deploying AI tools is easy. Codifying expertise and structuring context is not.</p><p>It is far simpler to train employees in prompting or roll out copilots across the organisation than it is to examine how knowledge actually flows through the system. Codifying reasoning patterns requires surfacing tacit expertise that may never have been formally articulated. Structuring context requires connecting fragmented systems and clarifying which information should be authoritative. Both activities are organisational work rather than purely technical work. Most enterprises therefore begin with the most visible layer: interaction with the model.</p><p>But this choice has consequences: the intelligence becomes faster, but the organisation does not necessarily become smarter.</p><h2>The Mechanics of World-Building</h2><p>In our previous piece <strong><a href="https://academy.shiftbase.info/p/a-leaders-guide-to-world-building">we explored the idea of world-building as a leadership discipline</a></strong>, the craft of designing the environments in which human and machine intelligence operate together. Organisations were compared to worlds with their own physics, culture and geography: rules that shape behaviour, norms that guide judgment, and environments that determine how actors navigate the system.</p><p>The distinction between outcomes, skills and context begins to reveal what those layers look like in practice.</p><ul><li><p><strong>Outcomes define the direction of the world</strong>. They describe what success looks like and where responsibility ultimately sits. When AI systems are attached to outcomes rather than isolated tasks, they begin participating in the organisation&#8217;s operating logic rather than simply generating output.</p></li><li><p><strong>Skills represent a form of codified expertise</strong>. They capture the reasoning patterns that experienced practitioners use when analysing problems, evaluating trade-offs or making decisions. In world-building terms, they begin to encode aspects of the organisation&#8217;s culture &#8212; the ways it interprets information, the factors it considers important and the principles that guide judgment.</p></li><li><p><strong>Context systems provide the geography of the world</strong>. Through mechanisms such as MCP, agents gain structured access to documents, tools, data and workflows. These systems determine what the intelligence can see, what information it can retrieve and where it can act.</p></li></ul><p>Seen together, these elements begin to form the operational layer of world-building. The models may be shared across organisations, but the <strong>world around them is not</strong>.</p><p>One organisation may give an agent access to fragmented documentation and informal processes. Another may provide structured context, codified expertise and clearly defined outcomes. The underlying intelligence is the same, but the environment it operates within is fundamentally different.</p><p>Let&#8217;s explore these three layers in more depth.</p><h2>Outcomes Define Direction</h2><p>Before thinking about skills or context, start with something simpler: what the system is actually trying to achieve.</p><p>Most early uses of AI focus on tasks: drafting, summarising, analysing. Useful, but peripheral. The centre of the organisation is not tasks; it is outcomes.</p><ul><li><p>Reduce customer churn</p></li><li><p>Improve supplier reliability</p></li><li><p>Increase conversion</p></li><li><p>Resolve support issues faster</p></li></ul><p>When intelligence is attached to outcomes, the system begins to organise differently. 
Decisions, data and workflows align around a shared objective rather than fragmenting across individual activities.</p><p>This is the shift from AI as a productivity tool to AI as part of how the organisation delivers results.</p><p>Outcomes define direction; skills and context only make sense once that direction is clear.</p><h2>Learning Becomes Codification</h2><p>If outcomes define <em>what</em> matters, skills define <em>how</em> the organisation gets things done.</p><p>In AI systems, a &#8220;skill&#8221; is simply a codified way of approaching a problem: a reusable reasoning pattern. Instead of asking a model to &#8220;analyse a market,&#8221; a skill defines how that analysis should be done: what to compare, what structure to follow, what constitutes a good answer.</p><p>For example, a competitive analysis skill might require comparison across pricing, features, positioning and risk, and end with clear strategic implications. What matters is not the example, but the shift.</p><p>Learning no longer lives only in people. It becomes something that can be captured, reused and improved.</p><p>Frameworks, heuristics and decision patterns that were once taught informally start to become shared infrastructure. Different teams stop reinventing the same thinking. AI systems and humans begin to draw on the same reasoning patterns.</p><p>Learning moves from training individuals to building organisational capability.</p><h2>Context Becomes Architecture</h2><p>If skills define how the organisation thinks, context determines what it can act on.</p><p>Without structured context, AI operates in isolation, limited to prompts and general knowledge. With it, intelligence becomes connected to the organisation&#8217;s actual work. This is where approaches like Model Context Protocol (MCP) matter, as they provide a structured way for AI systems to access:</p><ul><li><p>internal documents</p></li><li><p>data sources</p></li><li><p>business tools</p></li><li><p>workflows</p></li></ul><p>So instead of generating answers in the abstract, the system works with live organisational information. At that point, the quality of the environment starts to matter as much as the quality of the model.</p><p>Two organisations using the same AI can produce very different results depending on how well their context is structured.</p><p>If organisations are beginning to codify outcomes, skills and context, an interesting question surfaces: where does this knowledge actually live?</p><p>In software engineering, GitHub provides a shared environment where code can be stored, improved and versioned collaboratively. A similar concept may emerge for organisational intelligence: a place where skills, decision rules and context connections can be maintained as shared infrastructure.</p><p>You could think of it as a kind of GitHub for world-building - a repository where the logic of how the organisation operates becomes visible, improvable and reusable.</p>
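<p>It is far too early to say what such a repository would look like, but a hypothetical sketch hints at the shape: skills, context connections and decision rules as plain, versioned artefacts, with the same review and history that GitHub provides for code. Every path and file name below is invented.</p><pre><code># Hypothetical sketch: an organisational "world" repository, GitHub-style.
# Paths and descriptions are invented; the point is that skills, context
# connections and decision rules live as versioned, reviewable artefacts.
ORG_WORLD_REPO = {
    "outcomes/reduce-churn.yaml": "target, owner, how progress is measured",
    "skills/competitive-analysis.yaml": "dimensions, steps, required outputs",
    "skills/risk-assessment.yaml": "what to weigh before approving a supplier",
    "context/crm-connection.yaml": "which systems agents may read, and how",
    "decisions/2026-03-supplier-policy.md": "captured rationale and trace",
}

for path, purpose in sorted(ORG_WORLD_REPO.items()):
    print(f"{path:40} # {purpose}")
</code></pre>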
      <p>
          <a href="https://academy.shiftbase.info/p/same-models-different-worlds">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Agents on the Night Shift]]></title><description><![CDATA[Self-improving agents, Socratic dialogue and temporal stress: things to think about on the way towards agentic engineering and machines that make machines]]></description><link>https://academy.shiftbase.info/p/agents-on-the-night-shift</link><guid isPermaLink="false">https://academy.shiftbase.info/p/agents-on-the-night-shift</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 17 Mar 2026 15:30:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lS-s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong><a href="https://datasciencedojo.com/blog/karpathy-autoresearch-explained/">Andrej Karpathy&#8217;s new autoresearch tool recently ran 700 experiments on his nanochat codebase in two days</a></strong>. It found 20 improvements he had missed, delivering an 11% uplift in output. Tobi L&#252;tke at Shopify tried it on his own hand-tuned model: 19% improvement, parameter size halved. What makes this remarkable is not the numbers. It is the mechanism. The tool does not just run tests &#8212; it updates its own Python code based on what it learns. The researcher sets the direction; the machine runs experiments overnight and arrives with findings.</p><p>This is one example of an important archetype I think we will see more and more in agentic AI: <em><strong>the machine that makes the machines</strong></em>, which is both exciting and slightly strange.</p><p>This raises interesting questions about where learning lives in a human-machine system, and who benefits from it.</p><p><strong><a href="https://adactio.com/journal/22436">As Jeremy Keith put it recently</a></strong> when considering how agentic coding is changing development practices:</p><blockquote><p><em>Outsourcing execution to machines makes a lot of sense.</em></p><p><em>I&#8217;m not so sure it makes sense to outsource learning.</em></p></blockquote><p>But the productive division isn't just human vs. machine learning &#8212; it's human imagination operating at the meta-hypothesis level, and machine speed exhausting the territory around it. A single wild guess or idea can now seed hundreds of downstream tests; what comes back isn't just an answer, but a richer map of the problem space than any individual researcher might have drawn alone.</p><h2>From Agentic Coding to Agentic Engineering</h2><p>It seems everybody is intrigued right now by the rapid changes that the latest agentic AI models are bringing to software development, and it is worth paying attention because a similar process is likely to play out across other areas of work.</p><p>The New York Times Magazine recently published a major feature, <strong><a href="https://www.nytimes.com/2026/03/12/magazine/ai-coding-programming-jobs-claude-chatgpt.html?unlocked_article_code=1.SlA.DBan.wbQDi-hptjj6">Coding After Coders: the End of Computer Programming as we Know it</a></strong>, covering the history of the field and the experience of developers navigating rapid transformation:</p><blockquote><p><em>How things will shake out for professional coders themselves isn&#8217;t yet clear. But their mix of exhilaration and anxiety may be a preview for workers in other fields. 
Anywhere a job involves language and information, this new combination of skills &#8212; part rhetoric, part systems thinking, part skepticism about a bot&#8217;s output &#8212; may become the fabric of white-collar work. Skills that seemed the most technical and forbidding can turn out to be the ones most easily automated. Social and imaginative ones come to the fore. We will produce fewer first drafts and do more judging, while perhaps feeling uneasy about how well we can still judge. Abstraction may be coming for us all.</em></p></blockquote><p>This is almost certainly not the end of computer programming as a discipline, despite the pace of change. Computer science will become more sciencey; programming &#8212; talking to computers &#8212; will become more literary. But the need for people who understand what is possible and how to make it happen will continue to grow.</p><p>But agentic coding is also creating new forms of cognitive overload among AI-assisted developers, including the puzzling sight of people <strong><a href="https://www.wsj.com/tech/ai/ai-bots-claude-openclaw-285ac816">sitting outside in Silicon Valley watching &#8212; but not touching &#8212; their laptops</a></strong> as coding agents grind through the work.</p><p>Matt Jones captured this strangeness well this week in a lovely piece of writing &#8212; <strong><a href="https://petafloptimism.com/2026/03/14/gas-town-and-bullet-hell/">Gas Town and Bullet Hell</a></strong> &#8212; in particular the temporal mismatch between human cognition and machine speed:</p><blockquote><p><em>If brain fry is a clock problem &#8212; a temporal mismatch between human cognition and machinic speed &#8212; then solutions that only address interface design or training will help at the margins but miss the structural issue&#8230;</em></p><p><em>If we want AI agent work to feel more like flow and less like fry, the challenge isn&#8217;t making things faster or even slower &#8212; it&#8217;s about legibility, consent, and reversibility, and all three matter at once.</em></p></blockquote><p>As we hit the cognitive limits of what single-player mode can achieve, the shift from agentic coding to agentic engineering becomes important.</p><p><strong><a href="https://simonwillison.net/guides/agentic-engineering-patterns/what-is-agentic-engineering/">Simon Willison has a typically thorough guide to what this means in practice</a></strong>: instead of using an agent to write some code, agentic engineering means tasking systems with higher-order goals and the ability to self-manage the path towards them with less micro-management. The craft, Willison argues, was never primarily about writing code &#8212; it was always about figuring out what code to write.</p><h2>The Organisation as the Machine That Makes the Machines</h2><p>We have argued for a long time that to produce good software, organisations need to become like software themselves.</p><p>Corporate failures such as <strong><a href="https://germanautopreneur.com/p/cariad-volkswagen-software-failure-lessons">Volkswagen&#8217;s first attempt to build a software division for its vehicles within a hierarchical and bureaucratic organisation</a></strong> prove the point. The technology was not the problem; the organisational architecture was.</p><p>And yet, elsewhere in the automotive world, this has been understood for some time. 
<strong><a href="https://newsletter.jurriaankamer.com/p/the-organization-is-the-machine">Jurriaan Kamer recently shared lessons from F1 teams</a></strong>, quoting a team principal on what they borrowed from the Apollo project in their pursuit of agility and excellence under pressure:</p><blockquote><p><em>&#8220;What you can&#8217;t have is an engineer here having to go up and down a particular hierarchy and then hop across &#8212; in our instance, not just a different geographic location, but a different country altogether &#8212; and then go up and down. So instead, it&#8217;s a kind of different structure where it&#8217;s <strong>mission control instead of command and control</strong>.&#8221;</em></p></blockquote><p>This distinction matters more than it might appear. Today, a developer can use an agent to write better static software, and that is a productivity story everybody can follow. But if we trace the trajectory of agentic engineering towards its logical conclusion &#8212; and Karpathy&#8217;s autoresearch is an early signal of where that leads &#8212; we will need a much more fluid and connected organisational structure where services and processes are digitised and addressable, so that they can become truly programmable and genuinely capable of self-improvement.</p><p>The organisation itself needs to become the machine that makes the machines. <strong><a href="https://www.youtube.com/watch?v=MiUHjLxm3V0&amp;t=2872s">ASML&#8217;s famous EUV system</a></strong> is a useful reference point: a machine so complex that it requires extraordinary coordination between hundreds of specialist suppliers and internal teams, but one whose design assumes that it will be continuously improved by the people who build and operate it. The infrastructure is not static. It learns.</p><p>This also brings the learning question back into focus. If the machine is updating its own code overnight and accumulating insights from hundreds of experiments, organisations need to build the governance and oversight architecture that keeps humans genuinely in the loop &#8212; not as approvers of every output, but as the people setting direction, interpreting results, and carrying the institutional memory that the machine cannot hold. Otherwise, you end up with iteration without learning, which is just faster drift.</p><p><strong><a href="https://www.hulme.ai/blog/what-3-000-years-of-philosophy-and-three-decades-of-agent-research-can-teach-us-about-the-next-three-years">As Daniel Hulme reminds us in his recent thoughtful account of the philosophical and historical precursors of agentic AI</a></strong>, we already have rich bodies of knowledge and methods to draw on:</p><blockquote><p><em>The irony of this moment is that we are simultaneously living through the most rapid deployment of autonomous agents in history and underutilising the most relevant bodies of knowledge ever produced on how to make such systems safe.
From Socrates&#8217; method of structured interrogation to Aristotle&#8217;s formal logic, from Chrysippus&#8217; propositional reasoning to the medieval protocols of adversarial disputation &#8211; and then from Carl Hewitt&#8217;s Actor Model to Michael Bratman&#8217;s theory of practical reasoning, from Leslie Lamport&#8217;s work on distributed consensus to Edmund Clarke&#8217;s model checking, from Lotfi Zadeh&#8217;s fuzzy logic to the agent architectures of Michael Wooldridge and Nick Jennings &#8211; these thinkers and many others spent careers building the conceptual and mathematical toolkit for exactly the challenges we now face. Their work isn&#8217;t historical curiosity. It&#8217;s a foundation we should be actively building on.</em></p></blockquote><p>The same could be said of our accumulated knowledge about organisational design. How systems learn, adapt, and maintain coherence under rapid change is not a new problem. We just have a new urgency to solve it.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!lS-s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic" width="1408" height="768" alt=""></figure>
srcset="https://substackcdn.com/image/fetch/$s_!lS-s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic 424w, https://substackcdn.com/image/fetch/$s_!lS-s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic 848w, https://substackcdn.com/image/fetch/$s_!lS-s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!lS-s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed4363-73ff-4acb-8746-8d2e158bc7de_1408x768.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Infrastructure Is Coming</h2><p>The broader technology ecosystem is already moving in this direction. <strong><a href="https://www.interconnects.ai/p/the-next-phase-of-open-models">Nathan Lambert&#8217;s survey of the current state of open AI models</a></strong> suggests we will eventually reach a place where specialised small models are freely available for organisations to adapt and build on when creating their own AI platform architectures.</p><p><strong><a href="https://www.constellationr.com/insights/news/nvidias-huang-all-software-will-be-agentic">Jensen Huang is unambiguous about where this leads</a></strong>:</p><blockquote><p><em>&#8220;There will be no software in the future that&#8217;s not agentic. How could you have software that&#8217;s dumb? 
And so, it is absolutely true that every software company will become an agentic company.&#8221;</em></p></blockquote><p>Instead of using AI agents to write better SaaS tools, this implies that software firms will make available agents that can continuously write, maintain, and evolve living software &#8212; software that has a sense of its own role and mission.</p><p>Incidentally, this also supports <strong><a href="https://x.com/JulienBek/status/2029680516568600933">the thesis that &#8216;services as software&#8217; will be a major new opportunity for specialist service providers</a></strong>.</p><p>Futurum Group <strong><a href="https://futurumgroup.com/press-release/cio-ai-priorities-pivot-from-productivity-to-innovation/">published new research this week on CIO AI priorities</a></strong>, finding that enterprise goals are shifting from basic efficiency towards innovation and organisational change. Dion Hinchcliffe&#8217;s conclusion that &#8220;the generic efficiency argument for AI is dead&#8221; is heartening. The route to greater returns is more about systems and architecture than it is about individual tool use, and it seems more enterprise leaders are beginning to see this. The danger is that &#8220;innovation and organisational change&#8221; becomes the new banner under which old structures get expensively automated rather than genuinely redesigned.</p><h2>Hypotheses &amp; Organisational Learning</h2><p>Karpathy ran 700 experiments in 48 hours on a well-defined optimisation problem with clean metrics and the ability to measure improvement objectively. That particular set of conditions is relatively rare. Most organisational improvement problems do not have clean metrics, do not produce outputs that can be evaluated overnight, and do not have the structured test environment that makes autoresearch possible.</p><p>What humans might lack in speed of iteration, they more than make up for in their ability to generate the wild guesses and <em>what ifs</em> that make for rich experimentation. Super-charging this innate human capability with the power of machines to loop through variations or play out scenarios could accelerate our learning and innovation in exciting new ways.</p><p>What autoresearch points towards isn't the automation of discovery, but its amplification. The human makes the leap; the machine explores where it lands.</p><p>Every organisation already has processes that could, with sufficient effort, be made legible, measurable, and addressable. The question for leaders is not whether to wait for the infrastructure to arrive &#8212; it will. The question is whether the organisation they are building now can actually use it when it does. The machine that makes the machines requires a very different kind of organisation than the one that deploys tools to make existing tasks faster.</p><p>What is the hypothesis-testing loop in your organisation that you most wish you could accelerate? And who, right now, is doing the learning?</p>
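<p>To make the shape of that loop concrete, here is a deliberately toy sketch of the division of labour described above: a human names the dimensions worth exploring, and a machine sweeps them overnight. Everything in it is illustrative; <code>run_experiment</code> is a stub standing in for whatever measurable process you want to improve, and this is a far cry from Karpathy&#8217;s autoresearch tool, which goes further and rewrites its own code between runs.</p><pre><code>import itertools
import random

# Illustrative stub: stands in for any measurable process you want to
# improve (a build, a model evaluation, a workflow simulation).
def run_experiment(params):
    random.seed(str(sorted(params.items())))  # deterministic per configuration
    return random.uniform(0, 1)               # replace with a real measurement

# The human contribution: a hypothesis about which dimensions matter.
hypothesis_space = {
    "batch_size":    [16, 32, 64],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "prompt_style":  ["terse", "socratic"],
}

# The machine contribution: exhaustively sweep the territory overnight.
keys = list(hypothesis_space)
results = []
for values in itertools.product(*hypothesis_space.values()):
    params = dict(zip(keys, values))
    results.append((run_experiment(params), params))

best_score, best_params = max(results, key=lambda r: r[0])
print(f"{len(results)} experiments run; best {best_score:.3f} with {best_params}")</code></pre><p>Even this crude version shows the asymmetry: the hypothesis space is a few lines of human judgement, and the eighteen experiments it implies come almost for free.</p>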
<h2>A Quick Favour to Ask</h2><p><strong><a href="https://letter.rebuild.net/">Please consider signing the Rebuild Letter</a></strong> to support a great initiative I have been loosely involved in over the last year or so that aims to stimulate the development of better European social tools and networks to reduce our reliance on weaponised attention farming.</p>]]></content:encoded></item><item><title><![CDATA[Programmable Governance & Probabilistic Humans]]></title><description><![CDATA[How probabilistic humans and collective intelligence could reshape AI governance]]></description><link>https://academy.shiftbase.info/p/programmable-governance-and-probabilistic</link><guid isPermaLink="false">https://academy.shiftbase.info/p/programmable-governance-and-probabilistic</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 10 Mar 2026 15:19:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!MqTX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2651c6-e023-4cec-8c4b-64f36ae2a4d5_1456x971.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It might sound counter-intuitive at first, but good AI governance needs to become both programmable and probabilistic if organisations are to make meaningful use of human judgement alongside machine intelligence.</p><p>The idea of &#8216;human-in-the-loop&#8217; governance is simple and reassuring - AI systems may assist, recommend or automate, but somewhere in the process a human remains responsible for oversight and final judgement.</p><p>For early pilot deployments this model works reasonably well. Humans review outputs, approve sensitive actions or intervene when something appears wrong. But as AI systems become faster and more autonomous, we hit the limits of this approach quite quickly.</p><p>The scale and speed of modern systems will soon outstrip the cognitive bandwidth of manual oversight. A governance model built around humans inspecting individual outputs simply does not scale.</p><p>This does not make human judgement less important. If anything, the opposite is true. But the role humans play in governance needs to evolve.</p><p>In practice, most leaders already hold nuanced views about emerging risks. A security leader may suspect that monitoring systems could fail under certain conditions. A product leader may worry about reputational edge cases. Legal teams often sense regulatory ambiguity long before it becomes formal policy.</p><p>Yet governance structures rarely capture these insights clearly. They compress judgement into binary decisions: approved or rejected, compliant or non-compliant, acceptable or unacceptable, red or green.</p><p>An under-used superpower that organisations already possess, and which could help here, is the ability to harness distributed human judgement about uncertainty.</p><p>A probabilistic human expresses judgement differently. Instead of presenting certainty where none exists, they estimate likelihoods and levels of confidence. When those signals can be aggregated across many individuals, organisations gain a clearer picture of how risk is evolving across the system. This opens the door to a different kind of governance.
Rather than inserting humans into isolated approval points, organisations can begin to treat collective judgement as a continuous signal about how uncertain the system really is.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!MqTX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2651c6-e023-4cec-8c4b-64f36ae2a4d5_1456x971.heic" width="1456" height="971" alt=""></figure>
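<p>As a minimal sketch of what treating judgement as a continuous signal could look like in practice, consider the fragment below. Everything in it is invented for illustration: the shape of the judgements, the confidence weights and the escalation threshold would all need to be designed, calibrated and owned by a real governance function.</p><pre><code># Hypothetical shape of the data: each reviewer submits a probability that
# a given risk materialises, plus a self-assessed confidence weight.
judgements = [
    {"role": "ciso",    "p_risk": 0.30, "confidence": 0.8},
    {"role": "legal",   "p_risk": 0.15, "confidence": 0.5},
    {"role": "product", "p_risk": 0.45, "confidence": 0.6},
]

def aggregate(judgements):
    """Confidence-weighted mean of the individual probability estimates."""
    total = sum(j["confidence"] for j in judgements)
    return sum(j["p_risk"] * j["confidence"] for j in judgements) / total

ESCALATE_ABOVE = 0.35  # threshold owned by the governance function

signal = aggregate(judgements)
print(f"aggregate risk signal: {signal:.2f}")
if signal > ESCALATE_ABOVE:
    print("escalate to human review")
else:
    print("continue under automated monitoring")</code></pre><p>The point of the sketch is the shape, not the arithmetic: individual estimates stay probabilistic, the aggregate becomes a signal, and the threshold is an explicit, reviewable governance decision rather than an implicit one.</p>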
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Several mechanisms are beginning to emerge to support this shift. Some organisations experiment with prediction markets, allowing distributed expertise to converge into probabilistic forecasts about emerging risks, whilst others introduce structured dissent mechanisms, deliberately creating space for people to challenge prevailing assumptions and surface potential failure modes. And some leadership teams are beginning to convene probabilistic risk councils, where uncertainty is discussed explicitly and collective judgement informs governance decisions.</p><p>Taken together, these approaches allow organisations to move beyond episodic oversight toward something more adaptive: continuous calibration of uncertainty.</p><p>Let&#8217;s explore how these techniques work in practice and how leaders can use them to make human judgement legible inside increasingly complex AI systems.</p><h2>Why human-in-the-loop governance begins to strain</h2><p>For many leadership teams, the idea of human oversight in AI systems feels like a reassuring safeguard. If automated systems introduce uncertainty, the obvious response is to ensure that a person remains responsible for the final decision.</p><p>Yet organisations deploying AI at scale quickly encounter a different reality.</p><p>The challenge is not simply that systems are autonomous. It is that they operate within environments defined by <strong>speed, complexity and data volume</strong> that far exceed what traditional governance processes were designed to handle.</p><p>AI systems interact with constantly changing data, evolving models and interconnected workflows. Decisions that once occurred occasionally may now occur thousands of times per hour. Risk signals appear not as clear incidents but as patterns emerging across vast streams of activity.</p><p>Under these conditions, governance based on periodic review begins to struggle. Human judgement remains essential, but it cannot function effectively if it is only inserted at isolated approval points. 
This is where the idea of programmable governance becomes important.</p><p>Rather than relying entirely on manual oversight, programmable governance embeds certain rules, constraints and escalation paths directly into the systems themselves. Authority boundaries can be checked automatically before actions occur. Certain thresholds can trigger human review. Conflicts between objectives can halt execution and escalate to decision-makers.</p><p>In other words, governance becomes <strong>structural rather than procedural</strong>. Some forms of accountability are handled automatically within the system, while human judgement is reserved for the decisions that genuinely require interpretation, trade-offs and values.</p><p>When governance is structured this way, the human role changes. Instead of reviewing every decision, leaders focus on calibrating how the system interprets risk and uncertainty. And this is where probabilistic thinking becomes essential.</p><p>Most governance processes still ask leaders to express judgement in binary terms: approved or rejected, compliant or non-compliant, acceptable or unacceptable.</p><p>Yet the judgements leaders actually hold are rarely so definite.</p><p>A CISO may believe there is a moderate chance that monitoring systems would fail under certain conditions. A product leader may suspect that a new feature introduces reputational risk without being able to quantify it precisely. Legal teams may sense regulatory ambiguity long before it becomes formal policy.</p><p>These kinds of judgements contain valuable information. But governance structures compress them into simple approvals or objections.</p><p>The idea of the <strong>probabilistic human</strong> offers a different approach.</p><p>Instead of presenting certainty where none exists, probabilistic humans express judgement in terms of likelihood and confidence. When those signals can be aggregated across many individuals, organisations gain a clearer picture of how risk is evolving over time.</p><p>And once judgement can be expressed probabilistically, a new set of governance techniques becomes possible. We also create a far richer audit trail of how judgement is exercised, which can inform both future governance decisions and the training of AI systems themselves.</p><h2>Where this tension shows up for leaders</h2><p>The limitations of human-in-the-loop governance rarely appear as an explicit design flaw. Instead, they surface indirectly, as uncertainty that feels difficult to resolve through existing oversight structures.</p><p>Different leadership roles can encounter this tension in different ways, depending on where they sit in the organisation&#8217;s decision and accountability landscape.</p><h3>CISO pain points: signals that arrive too late</h3><p>For CISOs and security leaders, the strain often appears as a timing problem: risk reviews take place and systems are assessed against known threat models.
Yet concerns about AI behaviour can emerge gradually, often through operational signals rather than formal governance channels.</p><p>A model may drift slowly outside expected parameters; monitoring alerts begin to cluster in unusual ways; small anomalies appear that do not yet justify escalation, but suggest that the system&#8217;s behaviour is shifting.</p><p>Traditional governance frameworks expect risks to be identified and addressed at defined checkpoints. But many of the signals that matter most in AI environments are probabilistic rather than definitive.</p><p>Security teams therefore find themselves working with a growing set of partial signals - indicators that something may be wrong - without a clear threshold that justifies intervention.</p><h3>Legal and compliance pain points: decisions without certainty</h3><p>Legal and compliance leaders tend to experience the tension differently. Governance processes often require them to classify a system in categorical terms: compliant or non-compliant, acceptable or unacceptable. Yet many AI deployments sit in ambiguous territory, particularly when regulations are evolving or when systems operate across jurisdictions.</p><p>Legal teams frequently recognise emerging risks early. They may sense that a deployment could attract scrutiny, or that regulatory expectations are shifting in ways that are difficult to formalise.</p><p>However, governance structures typically force those insights into binary decisions. A deployment is either approved or blocked, even when the underlying judgement is far more nuanced.</p><p>This can create uncomfortable dynamics. Legal teams appear cautious or obstructive when they are simply responding to uncertainty that has not yet stabilised.</p><h3>Product leadership pain points: innovation slowed by rigid oversight</h3><p>Product leaders encounter the same structural issue from another direction. AI-enabled features often evolve iteratively. Teams test new capabilities, refine workflows, and adjust behaviour based on real-world feedback. In this environment, risk rarely presents itself as a clear go-or-no-go moment.</p><p>Instead, risk appears as shifting probabilities.</p><p>A feature may be broadly safe, but introduce edge-case failure modes. A system may work well under typical conditions, but become fragile when interacting with other services.</p><p>When governance frameworks rely on discrete approvals, product teams can find themselves navigating a process that feels mismatched to the way the technology evolves. Reviews occur at specific milestones, while risk emerges gradually over time.</p><p>The result is often friction rather than clarity.</p><h3>Executive leadership pain points: oversight that becomes symbolic</h3><p>At the executive level, the tension appears as a widening gap between formal oversight and operational reality.</p><p>Leadership teams approve governance frameworks, establish policies, and review risk dashboards. Yet the speed and complexity of AI systems can make those structures feel increasingly abstract.</p><p>Executives may receive polished summaries of model performance or compliance posture, while sensing that the organisation&#8217;s true exposure is harder to quantify.</p><p>This does not usually reflect a failure of diligence.
Rather, it reflects a mismatch between the episodic rhythm of traditional governance and the continuous evolution of AI-enabled systems.</p><p>Under these conditions, leadership oversight risks becoming symbolic: reassuring in principle, but too distant from the system to provide real-time calibration.</p><p>Across all of these perspectives, the underlying issue is the same - human judgement is present throughout the governance system, but it is expressed in ways that hide uncertainty rather than revealing it. Leaders are often forced to present confidence where what they actually possess is a probability.</p><p>This is where collective intelligence techniques become valuable. They allow organisations to capture and aggregate the probabilistic judgement already present across the system, turning it into signals that governance structures can actually use.</p><p>Read on to explore three techniques that can help organisations embed structured collective intelligence into AI governance:</p><ul><li><p>prediction markets (a minimal sketch follows after this list)</p></li><li><p>structured dissent mechanisms / red team markets</p></li><li><p>probabilistic risk councils.</p></li></ul>
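<p>To give a flavour of the first technique before the deep dive, here is a minimal sketch of the pricing mechanism used by many prediction markets, Hanson&#8217;s logarithmic market scoring rule (LMSR). The market question and the liquidity parameter are invented for illustration; a real internal market would also need incentives, anonymity rules and a settlement process.</p><pre><code>import math

class LMSRMarket:
    """Minimal logarithmic market scoring rule (LMSR) market maker.

    `b` sets liquidity: higher b means prices move less per share traded.
    """

    def __init__(self, outcomes, b=100.0):
        self.b = b
        self.shares = {o: 0.0 for o in outcomes}

    def _cost(self):
        return self.b * math.log(sum(math.exp(q / self.b) for q in self.shares.values()))

    def prices(self):
        """Implied probability of each outcome at the current share levels."""
        z = sum(math.exp(q / self.b) for q in self.shares.values())
        return {o: math.exp(q / self.b) / z for o, q in self.shares.items()}

    def buy(self, outcome, n):
        """Buy n shares of an outcome; returns the cost charged to the trader."""
        before = self._cost()
        self.shares[outcome] += n
        return self._cost() - before

# Invented market question, for illustration only.
market = LMSRMarket(["security incident within 90 days", "no incident"])
cost = market.buy("security incident within 90 days", 40)  # an engineer backs a hunch
print(f"cost of position: {cost:.2f}")
print(market.prices())  # the implied probability of an incident has risen</code></pre>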
      <p>
          <a href="https://academy.shiftbase.info/p/programmable-governance-and-probabilistic">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Humans in the Loop or in the Soup?]]></title><description><![CDATA[How can we create governance systems that are quick and detailed enough for the AI era, but also maintain human-in-the-loop safeguards and accountability?]]></description><link>https://academy.shiftbase.info/p/humans-in-the-loop-or-in-the-soup</link><guid isPermaLink="false">https://academy.shiftbase.info/p/humans-in-the-loop-or-in-the-soup</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 03 Mar 2026 15:50:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zVy7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Enterprise AI governance, security and safety are challenges that will require a multi-domain approach and imaginative solutions that combine technology, human factors, knowledge engineering and codification. These are issues that cannot just be delegated to CSOs and IT functions without collective leadership accountability.</p><h2>Waiter! There&#8217;s a Human in my Loop!</h2><p><strong><a href="https://www.strangeloopcanon.com/p/aligning-anthropic?utm_source=substack&amp;publication_id=233019&amp;post_id=189678330&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">The recent furore over the USA Department of War&#8217;s threats to declare Anthropic a supply chain risk</a></strong> is an interesting example of how confusing things are likely to become.</p><p>Anthropic has been supplying a modified (and apparently more advanced) version of Claude to the US military through Palantir, but <strong><a href="https://www.anthropic.com/news/statement-department-of-war">has tried to insist on two red lines governing its usage</a></strong>, namely (1) that it should not be used for broad spectrum domestic surveillance, which might be technically illegal; and, (2) it should not be used to run fully-autonomous weapons systems, because Anthropic do not believe it is yet reliable enough to do this safely.</p><p>In fact, <strong><a href="https://www.programmablemutter.com/p/ai-is-a-bureaucratic-technology-so">as Henry Farrell wrote today</a></strong>, the US military is such a vast bureaucracy that the majority of use cases for LLMs will not be about autonomous weapon systems, but the logistics, information synthesis and practical management tasks that such a huge organisation requires.</p><p>Nevertheless, the Department of War has insisted it should have total control over how the tool is used, and if Anthropic do not acquiesce, then the company will be declared a supply chain risk, meaning not only that it will lose its contracts with the US government, but also that third parties offering services to government that depend on Anthropic&#8217;s technology will probably need to replace it for another model.</p><p>By way of context, Israel - one of the most advanced users of AI-enabled technology for targeting and hardly what might be called over-cautious - claims to use a <strong><a href="https://giftarticle.ft.com/giftarticle/actions/redeem/32ed5396-07a1-4107-b8a6-09366630445b">double human-in-the-loop sign-off process</a></strong> to verify proposed targeted attacks. 
Meanwhile, the US &#8220;Secretary of War&#8221; rails against <em>&#8220;stupid rules of engagement&#8221;</em> and <em>&#8220;traditional allies who wring their hands and clutch their pearls, hemming and hawing about the use of force.&#8221;</em></p><p>Inevitably in such a polarised political landscape, <strong><a href="https://www.nytimes.com/2026/02/27/technology/anthropic-trump-pentagon-silicon-valley.html?campaign_id=158&amp;emc=edit_ot_20260302&amp;instance_id=171869&amp;nl=on-tech&amp;regi_id=71034279&amp;segment_id=216046&amp;user_id=b1568fb0a0ae5f0b3bc0e2c4e95a01d8">Anthropic have been seen as the good guys</a></strong> and OpenAI, who negotiated their own agreement with the DoW shortly after Anthropic&#8217;s deal collapsed, <strong><a href="https://www.nytimes.com/2026/02/27/technology/openai-agreement-pentagon-ai.html?campaign_id=158&amp;emc=edit_ot_20260302&amp;instance_id=171869&amp;nl=on-tech&amp;regi_id=71034279&amp;segment_id=216046&amp;user_id=b1568fb0a0ae5f0b3bc0e2c4e95a01d8">have been seen as the bad guys</a></strong>; but what this means for AI governance is likely to be less simple than it appears.</p><p>So where is AI governance headed and what are the practical alternatives to unlimited executive power?</p><h2>Security Soup and Programmable Governance</h2><p>I had a conversation with a very smart and accomplished CISO last week, and she suggested that we are heading to a place where human oversight is insufficient to maintain security in a complex enterprise. So, whilst we might want to see human-in-the-loop solutions to minimise AI risks, we probably need to think more imaginatively about how this works.</p><p>Historically, regulation and corporate governance involved a very slow process of risk analysis and political negotiation of the rules, which were then handed down to company officers to enforce manually through training, guidelines and so on. But this approach was prone to problems such as <strong><a href="https://en.wikipedia.org/wiki/Regulatory_capture">regulatory capture</a></strong>, or compliance theatre, where those in charge of upholding the rules lacked the power and influence to rein in the behaviour of colleagues who were generating profits from skirting them (e.g. in banking). The approach was also prone to regulatory inertia, which often meant regulators were too busy fighting the last war and struggled to keep up with current or emerging risks.</p><p>So what should companies do to ensure security, safety and risk mitigation in their use of AI and related technologies?</p><p><strong><a href="https://diginomica.com/how-uk-cios-are-governing-ai-without-killing-innovation-banking-retail-and-academia-perspectives">Diginomica recently reported on a UK CIO event</a></strong> where executives expressed a desire to ensure AI safety without harming innovation, and several attendees talked more about a collaborative evolution than top-down transformation:</p><blockquote><p><em>Anybody in a complex organization should think about whether they are enabling or leading. You enable through a coalition of the willing.</em></p></blockquote><p>There have been several attempts to scope out some guidelines for trustworthy AI.
For example, <strong><a href="https://www.techtarget.com/searchcustomerexperience/feature/4-governance-pressures-shaping-enterprise-AI#">James Miller at TechTarget recently shared some thoughts on how to boost institutional accountability</a></strong> to align governance and structure, and laid out a framework to guide the shift from strategy to execution via trustworthy data and algorithms.</p><p>Elsewhere, those who want to see pro-human, safe and trustworthy AI are <strong><a href="https://partnershiponai.org/work/">writing lots of guidelines, pleas and commendable nice words</a></strong>. But they often lack a credible, practical approach to implementation that is sufficiently robust, responsive and integrated with technology systems and platforms to really make a difference.</p><p><strong><a href="https://www.reddit.com/r/ArtificialInteligence/comments/1r87afj/we_didnt_know_what_we_didnt_know_standing_up/">As one enterprise AI practitioner put it when reflecting on lessons learned</a></strong>:</p><blockquote><p><em>Policy alone won&#8217;t save you. You need policy and technology and education working together. Any one of those by itself is insufficient.</em></p></blockquote><p>Looking ahead, I think we are headed towards AI-enabled <em>programmable governance</em> (building on existing practices such as <strong><a href="https://www.paloaltonetworks.com/cyberpedia/what-is-policy-as-code">Policy-as-Code</a></strong>) where an organisation ingests and codifies regulatory, compliance and legal rulesets, and combines them with its own rules of the road to give governance systems the context needed to guide the organisation&#8217;s work. And of course, governance isn&#8217;t just about rules like <em>&#8220;don&#8217;t leak data.&#8221;</em> It also includes less technical concerns like <em>&#8220;don&#8217;t violate brand values,&#8221;</em> <em>&#8220;don&#8217;t hallucinate pricing,&#8221;</em> and <em>&#8220;stay within budget.&#8221;</em></p>
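<p>To make the idea concrete, here is a minimal sketch of what Policy-as-Code for agent actions could look like. The rule names, action fields and values are all hypothetical, and real policy engines (such as Open Policy Agent) are considerably more capable, but the shape is the point: rules become data, they are checked before execution, and violations are routed to a human rather than discovered afterwards.</p><pre><code># Each rule is (name, predicate over a proposed action); any violation
# blocks the action and routes it to a human before execution.
RULES = [
    ("no_confidential_egress",
     lambda a: not (a["sends_data_externally"] and a["data_class"] == "confidential")),
    ("stay_within_budget",
     lambda a: a["estimated_cost"] <= a["budget_remaining"]),
    ("no_unverified_pricing",
     lambda a: not a["quotes_pricing"] or a["pricing_source"] == "approved_price_list"),
]

def check(action):
    """Return the names of the rules the proposed action violates."""
    return [name for name, ok in RULES if not ok(action)]

# Hypothetical action proposed by an agent; every field is illustrative.
proposed_action = {
    "sends_data_externally": True,  "data_class": "confidential",
    "estimated_cost": 120.0,        "budget_remaining": 500.0,
    "quotes_pricing": True,         "pricing_source": "model_generated",
}

violations = check(proposed_action)
if violations:
    print("blocked, escalating to a human:", violations)
else:
    print("action permitted")</code></pre>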
src="https://substackcdn.com/image/fetch/$s_!zVy7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic" width="1006" height="1416" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1416,&quot;width&quot;:1006,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:217239,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/189777433?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zVy7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic 424w, https://substackcdn.com/image/fetch/$s_!zVy7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic 848w, https://substackcdn.com/image/fetch/$s_!zVy7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic 1272w, https://substackcdn.com/image/fetch/$s_!zVy7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc21d7a86-669c-4324-a1c3-86f603712ab9_1006x1416.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Where Leaders Need to Step up</h2><p>The challenge in building and evolving fit-for-purpose AI security and governance systems is not to 
be underestimated, and it covers everything from tech, data, and platforms to culture, behaviour, and process management.</p><p>Manual oversight is not enough. Human-in-the-loop is desirable, but at what level and supported by what kind of deep infrastructure underneath? Perhaps we will end up with agents surveilling other agents, and an organisational autonomic immune system that draws lessons from biology with single-purpose nano-bots running around in swarms to identify and contain anomalies or &#8216;foreign bodies&#8217; at the network level.</p><p>But however our security, safety and governance systems evolve, they will need significantly greater codification of our rules, guidelines and threat vector identification than we have today. This is an area where the wider leadership function can meaningfully assist CISOs and CIOs without needing a great deal of technical knowledge: start by capturing specific rules or statements that make nice words actionable and (ideally) programmable.</p><p>It is a system challenge, but we have barely scratched the surface of writing the code for it.</p><p><strong><a href="https://www.inc.com/umair-javed/leadership-behavior-is-causing-an-ai-adoption-gap/91307376">We already know that when leaders only decide and delegate on AI topics, but don&#8217;t lead adoption, the outcome is sub-optimal</a></strong>. We need them to get their hands dirty and bring their experience and knowledge of their organisation&#8217;s value chain and strategy to the table. We already know that <strong><a href="https://giftarticle.ft.com/giftarticle/actions/redeem/88c1e6b5-941a-492d-b7aa-bb8de54effcc">mandating adoption using the crudest possible KPIs</a></strong> is likely to optimise for the wrong outcomes.</p><p>We need leaders to lead the codification or knowledge engineering required to make our guidelines understandable to agents, APIs and systems.</p><p>This starts with <strong><a href="https://medium.com/@andreas_g/knowledge-graphs-vs-context-graphs-the-missing-foundation-of-reliable-enterprise-ai-75e3cac6d047">understanding and connecting the knowledge objects of the organisation in shared knowledge graphs</a></strong>, so that the entities we work with (person, team, process, system, concept, goal, etc) are addressable and findable. Once we have a good basic knowledge graph, then we can start to make a system that is programmable (IF this THEN that, etc) - and that is where we can start to develop meaningful codification of how we want things to work.</p><p><strong><a href="https://onlydeadfish.substack.com/p/fish-food-679-organisational-knowledge?utm_source=substack&amp;publication_id=2195351&amp;post_id=187111193&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Neil Perkin recently shared a thoughtful article on knowledge engineering</a></strong> and why it matters, which is worth a few minutes to read:</p><blockquote><p><em>&#8230; I think the idea of architecting knowledge for AI goes far deeper than just a technical practice. The quality of every AI-assisted decision, recommendation, and output is bounded by the quality of context it receives. This makes the curation of organisational knowledge (what gets captured, how it&#8217;s structured, how relationships between ideas are maintained) a fundamental strategic capability.</em></p></blockquote>
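<p>A minimal sketch of the knowledge graph idea above, assuming invented entities and relations, might look like the fragment below: knowledge objects become addressable triples, and an IF-this-THEN-that rule can then be written directly against them.</p><pre><code># A tiny organisational knowledge graph as (subject, relation, object)
# triples; all entities and relations are invented for the example.
triples = {
    ("pricing-agent", "is_a", "agent"),
    ("pricing-agent", "owned_by", "revenue-team"),
    ("pricing-agent", "writes_to", "public-website"),
    ("public-website", "visibility", "external"),
}

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the pattern (None matches anything)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# IF an agent writes to an externally visible system THEN require review.
for agent, _, target in query(relation="writes_to"):
    if query(subject=target, relation="visibility", obj="external"):
        owner = query(subject=agent, relation="owned_by")[0][2]
        print(f"review required: {agent} (owner: {owner}) writes to {target}")</code></pre><p>Trivial as it is, this is the direction of travel: once the graph exists, governance rules stop being prose in a policy document and become queries that systems can run.</p>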
<h2>Humans in the Loop or Making the Soup? Why not Both?</h2><p>To suggest that we are blowing past the point where humans in the loop can meaningfully oversee and manage safety, security and regulatory compliance need not be a defeatist or alarmist position. It is about the size and scale of the loops, and where people can maximise the value of their unique intuition and experience.</p><p>If the loops are too low-level, there is too much information to process unless we develop the kind of pre-cognition powers seen in a movie like Minority Report. If the loops are too high-level, then we review our leadership team&#8217;s nicely packaged monthly threats powerpoint file only to find we were fatally compromised 29 days ago.</p><p><strong><a href="https://www.kasava.dev/blog/ai-as-exoskeleton">If we think of enterprise AI as an exoskeleton that empowers people rather than a robot that replaces them</a></strong>, perhaps we can use agentic systems where they can realistically perform autonomic immune functions, whilst surfacing issues and decision points at the right level of detail for human-in-the-loop oversight, allowing people to drill down quickly to see the specifics. But this will require a whole load of systems, data and knowledge graphs to support them.</p><p>Leading and participating in the codification effort is something all leaders can do today to take the pressure off CSOs, CISOs and CIOs who cannot do it alone.</p>]]></content:encoded></item><item><title><![CDATA[Extraction vs. Redesign: The Hidden Fork in the Road for AI Leaders]]></title><description><![CDATA[Are we using AI to squeeze more output from yesterday&#8217;s structures, or to redesign the architecture of how value is created?]]></description><link>https://academy.shiftbase.info/p/extraction-vs-redesign-the-hidden</link><guid isPermaLink="false">https://academy.shiftbase.info/p/extraction-vs-redesign-the-hidden</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 24 Feb 2026 15:20:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9Mon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1061a30f-f6c7-4c08-b328-dc7c6f890f83_1536x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="pullquote"><p>This article is published as a free sample of the Shift*Academy paid edition.</p><p>Every other week, the paid edition explores the structural implications of AI for leadership, organisational design and enterprise capability, with practical deep dives for leaders navigating the agentic era.</p><p>If this resonates, you can subscribe for full access to future essays and capability breakdowns.</p></div><p>Over the past decade, we have lived through multiple &#8220;transformations&#8221; that promised structural change. Digital was meant to flatten hierarchies. Agile was meant to empower teams. Platforms were meant to dissolve silos. In each case, the tooling evolved faster than the governing logic of the organisation. We digitised reporting lines rather than redesigning them. We accelerated information flow without rethinking who holds authority.
We changed the vocabulary, but not the topology.</p><p>Enterprise AI is arriving into that same landscape.</p><p>Its capabilities are extraordinary - coordination costs are falling, translation layers can be automated, and expertise can gain direct leverage over execution in ways that were impossible even five years ago. And yet, already, we can see a familiar pattern forming. Many organisations are reaching for AI as a simple efficiency play inside structures designed for a previous era.</p><p>The question is not whether AI works - it clearly does - but whether we are willing to change the frame through which we organise work, rather than using this new intelligence to reinforce the old machine.</p><p>And this is where the fork in the road begins to come into focus.</p><p>On paper, most AI deployments look similar. The same models. The same copilots. The same orchestration platforms layered into finance, operations, customer service, product and strategy.</p><p>The language is shared: augmentation, automation, leverage, productivity. But beneath that shared surface, two very different operating logics are taking shape.</p><p>In some organisations, AI is being introduced as a margin stabiliser. Junior layers are reduced. Reporting structures remain intact. Agents are embedded into existing workflows to accelerate output and reduce cost, while decision rights and authority models remain largely untouched. Efficiency improves, executive distance from execution is preserved and the machine runs faster.</p><p>In more ambitious organisations, AI is treated as permission to ask more uncomfortable questions. If translation can be automated, why maintain translation layers? If coordination is cheaper, why preserve reporting ladders designed to aggregate information upward? If expertise can now act closer to the work, what becomes of authority that was historically justified by information asymmetry? Here, AI is not simply accelerating existing processes; it is exposing structural assumptions that have long gone unchallenged.</p><p>Both paths can produce impressive short-term productivity gains. But only one opens up net new top line growth, and it does this by changing the organisation.</p><p>The distinction is subtle at first. It does not show up in vendor announcements or pilot metrics. It shows up in what leaders choose to leave alone. It shows up in whether span of control is redesigned or simply expanded. It shows up in whether apprenticeship is re-imagined or quietly eroded.
It shows up in whether authority is redistributed toward outcomes, or insulated behind more efficient reporting.</p><p>This is the moment where AI adoption stops being a technology story and becomes a design story.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!9Mon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1061a30f-f6c7-4c08-b328-dc7c6f890f83_1536x1024.heic" width="1456" height="971" alt=""></figure></div><h2>Extraction Mode: The Frame Defending Itself</h2><p>Extraction Mode often presents as pragmatism. Budgets are under pressure, markets are volatile and boards are demanding visible returns. AI offers immediate gains in efficiency, speed, and headcount flexibility. In that context, embedding agents inside existing workflows feels not only rational, but responsible.</p><p>Junior roles are reduced first:</p><ul><li><p>automation absorbs repetitive tasks;</p></li><li><p>reporting structures remain largely intact;</p></li><li><p>middle layers translate agent outputs rather than reconsidering their necessity.</p></li></ul><p>Executive oversight becomes more data-rich, but not structurally closer to execution. Productivity per employee rises and, importantly, cost curves improve.</p><p>But decision rights are still flowing upward and authority is still tied to hierarchy rather than outcome ownership. Over time, the consequences begin to surface in less obvious ways. For example, apprenticeship pathways narrow as entry-level roles disappear without being redesigned; or leaders find that their bandwidth does not materially recover, but shifts toward adjudicating edge cases and resolving boundary disputes between humans and agents. Informal shadow coordination grows as teams compensate for ambiguities that the formal structures never addressed.</p><p>Extraction Mode can produce good numbers in the short term. It can stabilise margins and extend runway. But it does so by reinforcing the underlying frame: preserving hierarchy, protecting authority and optimising cost. AI becomes a margin machine. And the structure that limited previous transformations remains quietly in place.</p><h2>Redesign Mode: Questioning the Topology</h2><p>Redesign Mode begins with a different instinct.
Instead of asking, &#8220;Where can AI remove cost?&#8221; it asks, &#8220;Which assumptions about structure, built into the organisation when coordination was expensive, are no longer valid?&#8221;</p><p>If translation can be automated, then layers that existed primarily to aggregate and repackage information should be scrutinised. If agents can monitor workflows continuously, then escalation does not need to rely on proximity to authority. If expertise can act directly with the support of agent systems, then the justification for distance between decision-makers and execution begins to weaken.</p><p>In Redesign Mode, AI is not inserted into the existing machine; instead, it is used to reveal its architecture, and then improve it.</p><p>Reporting ladders are examined, not just accelerated. Decision rights are clarified, not assumed. Span of control is redesigned deliberately rather than allowed to expand silently. Outcome boundaries are defined explicitly, and authority is tied to those boundaries rather than to position in a chain of command. This does not necessarily mean &#8220;flatter.&#8221; It means clearer.</p><p>Some functions may consolidate. Others may fragment into outcome cells with explicit guardrails and escalation rules. Leaders move closer to the work in some areas and further from it in others, but the movement is intentional. Apprenticeship is redesigned alongside automation, ensuring that the disappearance of repetitive tasks does not quietly eliminate the pathways through which judgment develops.</p><p>The shift is subtle - AI is treated not as an efficiency layer but as structural permission. Coordination is cheaper; therefore, the organisation does not have to be shaped the way it was when coordination was scarce.</p><p>This path is slower. It exposes leaders to greater short-term uncertainty. It requires confronting incentive systems, governance habits, and career ladders that feel natural because they have been stable for decades - but it also changes the trajectory.</p><p>AI becomes a leverage multiplier rather than a margin machine. And the organisation begins to evolve rather than simply accelerate.</p><h2>Why Extraction Mode Is the Default</h2><p>Redesign Mode sounds compelling in theory. Few leadership teams would openly argue that preserving outdated structures is the goal. And yet, when AI initiatives move from pilot to budget to restructuring, most organisations tilt toward extraction.</p><p>Cost reduction is measurable. Structural redesign is not. A headcount number can be reported to the board next quarter. A re-architected decision-rights model compounds over years. The former is easy to defend; the latter is harder to explain.</p><p>Governance models amplify this bias. Boards understand margin expansion. They are less fluent in organisational topology. Asking for approval to remove redundant roles inside an existing structure feels prudent. Asking to redesign that structure altogether feels risky. It introduces ambiguity about authority, reporting, and risk allocation at precisely the moment when AI already feels destabilising.</p><p>But the explanation cannot stop there. Over the past decade, most transformation effort has been directed downwards. Teams were asked to become more agile, managers were asked to embrace digital tools, frontline functions were reconfigured. Senior leaders, in many cases, changed less. Their ways of working and their information flows often remained intact.
Digital transformation pointed at the base of the pyramid more than at its apex.</p><p>Our current AI transformation is exposing this. As information asymmetries fall and translation layers become automatable, the traditional justification for distance from execution weakens. Redesign Mode would require leaders to update their own operating models: to move closer to outcome boundaries, to make judgment legible, to relinquish some of the insulation provided by hierarchy. That is harder than reducing cost!</p><p>Preserving hierarchy is safer than questioning it. Leaders who reduce spend inside a known model are seen as disciplined, whereas leaders who challenge the shape of that model take on visible personal risk. In uncertain markets, prudence (and self-preservation) often wins.</p><p>There is also a subtler force at work. Most organisations have been optimised for decades around information asymmetry. Authority was justified by access: access to data, to strategic perspective. AI reduces that asymmetry, but the habits built around it remain. It is easier to automate the flow of information up the ladder than to question why the ladder exists in its current form.</p><p>This is how transformations stall. The technology advances, efficiency rises, and the structure remains recognisable. The deeper architecture stays intact. And over time, what could have been a redesign moment becomes another optimisation cycle that fails to grasp the opportunity for improvement.</p><h2>The Apprenticeship Question</h2><p>Every hierarchy contains an implicit learning pathway. Entry-level roles absorb repetitive work. They sit close to process and they observe decisions being made. They develop judgment slowly through exposure, error, and proximity. Over time, some of those individuals move upward, carrying tacit knowledge with them.</p><p>It is not a perfect system. It can be inefficient and uneven. But it is a capability engine, which Extraction Mode fundamentally disrupts and fails to replace with a workable solution.</p><p>When junior layers are removed without structural redesign, repetitive tasks disappear, but so do many of the early exposure points where judgment is formed. Automation replaces execution without replacing apprenticeship. The pyramid thins, but the pipeline narrows.</p><p>In the short term, this looks efficient. Output per employee increases. Overhead falls. But over time, a different cost accumulates.</p><ul><li><p>Where do future leaders learn how decisions are made under uncertainty?</p></li><li><p>Where does tacit operational knowledge accumulate?</p></li><li><p>How does strategic judgment develop if the early rungs of the ladder vanish?</p></li></ul><p>Redesign Mode confronts this directly. If AI removes certain forms of work, then the learning architecture must be rebuilt intentionally. Apprenticeship shifts from &#8220;do the repetitive work and observe&#8221; to something more deliberate: structured exposure to decision boundaries, transparent escalation logic, visible agent&#8211;human coordination, and explicit responsibility for outcomes.</p><p>In other words, if coordination is becoming cheaper, learning cannot remain accidental.</p><p>This is not a sentimental argument for preserving junior roles. It is a compounding argument. Organisations that treat entry-level work purely as cost will eventually erode their own capacity for long-term adaptation.
Those that redesign learning alongside automation build a deeper form of resilience.</p><h2>If You&#8217;re Serious About Redesign Mode</h2><p>Redesign Mode is not declared in strategy decks. It shows up in structural edits.</p><p>If you believe AI is a redesign moment rather than a margin moment, there are early signals that distinguish intent from rhetoric.</p><h3>1. Rewrite One Decision Rights Map</h3><p>Pick a domain where agents are already active.</p><p>Then ask:</p><ul><li><p>Which decisions remain human?</p></li><li><p>Which are delegated?</p></li><li><p>What triggers escalation?</p></li><li><p>Who arbitrates conflict?</p></li></ul><p>If the map still routes most meaningful decisions upward through the same hierarchy, you are in Extraction Mode.</p><h3>2. Audit One Reporting Layer for Translation vs Judgment</h3><p>Many layers exist to aggregate and translate information.</p><p>Agents can now perform much of that work.</p><p>For one reporting tier, ask:</p><ul><li><p>Does this layer exercise unique judgment?</p></li><li><p>Or does it primarily synthesise and repackage?</p></li></ul><p>If it is translation, move it to the system.</p><p>If it is judgment, clarify and anchor it closer to outcomes.</p><h3>3. Redesign One Apprenticeship Pathway Alongside Automation</h3><p>If repetitive tasks disappear, learning cannot remain accidental.</p><p>In one function:</p><ul><li><p>Map how junior staff historically developed judgment.</p></li><li><p>Identify what automation removes.</p></li><li><p>Design deliberate exposure to decision boundaries, trade-offs, and escalation logic.</p></li></ul><p>If you cut entry roles without rebuilding the learning architecture, you are optimising cost at the expense of future capability.</p><h3>4. Define One Outcome Cell</h3><p>Choose one cross-functional workflow.</p><p>Define:</p><ul><li><p>The outcome metric.</p></li><li><p>The guardrails.</p></li><li><p>The escalation rules.</p></li><li><p>The named human owner.</p></li><li><p>The supporting agent stack.</p></li></ul><p>If coordination is cheaper, structure can follow outcomes rather than reporting ladders.</p><p>These are not large-scale reorganisations; they are diagnostic edits - small structural moves that reveal whether AI is being used to reinforce the current topology or to reshape it. Redesign Mode begins with the courage to make authority, learning, and accountability explicit, as the sketch below illustrates.</p>
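<p>What might an outcome cell look like if it were written down as explicit configuration rather than implied by an org chart? Here is a minimal sketch - every name, metric and threshold is an illustrative assumption, not a template to copy:</p><pre><code># Illustrative only: an "outcome cell" captured as explicit configuration.
# All names, metrics and thresholds below are assumptions for the sketch.
from dataclasses import dataclass, field

@dataclass
class OutcomeCell:
    outcome_metric: str              # the result the cell owns
    target: str                      # how success is measured
    human_owner: str                 # the named, accountable person
    agent_stack: list = field(default_factory=list)
    guardrails: list = field(default_factory=list)
    escalation_rules: dict = field(default_factory=dict)

onboarding_cell = OutcomeCell(
    outcome_metric="time-to-value for new customers",
    target="90 percent activated within 72 hours",
    human_owner="Head of Customer Operations",
    agent_stack=["funnel-monitor", "setup-assistant", "compliance-checker"],
    guardrails=["no credential changes", "spend cap per intervention"],
    escalation_rules={"no progress after 48 hours": "notify human_owner"},
)
</code></pre><p>Writing the cell down like this changes nothing by itself, but it forces exactly the questions Extraction Mode lets leaders avoid: who owns the outcome, what the agents may do, and when a human must be pulled in.</p>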
<h2>The Choice Is Ours</h2><p>AI will increase productivity in either mode.</p><p>In Extraction Mode, it will accelerate reporting, reduce cost, and preserve existing authority structures with greater efficiency. The machine will run faster, but it will break more often. Margins may expand. Headcount curves may improve. On paper, it will look like progress.</p><p>In Redesign Mode, AI will be treated as a structural inflection point. Coordination costs will fall and structure will change in response. Decision rights will be clarified. Span will be redesigned. Apprenticeship will be rebuilt. Authority will move closer to outcomes rather than further from them.</p><p>The models themselves are neutral, but what they amplify is not. If we embed AI inside hierarchies designed for information scarcity and expensive coordination layers, we will simply automate those hierarchies. We will thin the pyramid without questioning its shape. We will accelerate the system that previous transformations failed to meaningfully change.</p><p>If instead we allow AI to expose the assumptions built into our structures, then we have a redesign opportunity rather than yet another optimisation cycle.</p><p>The uncomfortable truth is that the technology is not the constraint. Model capability is advancing rapidly. What will determine whether this transformation compounds advantage or quietly stalls is whether leaders are willing to question the frame that has shaped enterprise design for decades.</p><p>AI does not choose between extraction and redesign.</p><p>We do.</p>]]></content:encoded></item><item><title><![CDATA[Schrödinger’s Optimism: AI and Productivity Signals]]></title><description><![CDATA[Should we celebrate a small bump in output with lower headcount, or can we lift our eyes to the remarkable opportunity to cultivate extraordinary business capabilities?]]></description><link>https://academy.shiftbase.info/p/schrodingers-optimism-ai-and-productivity</link><guid isPermaLink="false">https://academy.shiftbase.info/p/schrodingers-optimism-ai-and-productivity</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 17 Feb 2026 15:45:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7VEM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6394c50-6d1c-4f3c-bbc1-b102465a363e_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Schr&#246;dinger&#8217;s Optimism</h2><p>Reading news stories about the US stock market dip at the end of last week, you might think that serious economic and technology analysts are uncertain about the impact of AI on business and productivity.</p><p>Selling or buying stocks is quite a binary activity (notwithstanding the grey areas of hedging and options), but the current state of AI is more quantum than binary - simultaneously beyond imagination and yet not good enough for deployment in production; able to autonomously code entire apps with a few lines of instruction, but struggling with basic maths or questions like <strong><a href="https://www.reddit.com/r/OpenAI/comments/1r4zj8a/walk_to_wash_car_logical_fallacy/">&#8220;should I drive to the car wash?&#8221;</a></strong></p><p>We will probably need to live with its patchy, jagged, probabilistic, unevenly distributed nature for some time, and co-evolve our methods with the technology in much the same way as quantum computing relies on error correction.
The question now is how quickly our institutions can metabolise what the technology is already making possible.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!7VEM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6394c50-6d1c-4f3c-bbc1-b102465a363e_1024x1024.heic" width="1024" height="1024" alt=""></figure></div><p>But if you zoom out just a little, the progress being made is incredible, and would have been unimaginable just a few years ago.</p><p><strong><a href="https://shumer.dev/something-big-is-happening">Matt Shumer recently wrote a widely-shared article trying to put this progress into words</a></strong>, which started with the reflection that people who are not following AI developments don&#8217;t know quite how disruptive the next five years could be:</p><blockquote><p><em>I&#8217;ve spent six years building an AI startup and investing in the space. I live in this world. And I&#8217;m writing this for the people in my life who don&#8217;t... my family, my friends, the people I care about who keep asking me &#8220;so what&#8217;s the deal with AI?&#8221; and getting an answer that doesn&#8217;t do justice to what&#8217;s actually happening. I keep giving them the polite version. The cocktail-party version. Because the honest version sounds like I&#8217;ve lost my mind. And for a while, I told myself that was a good enough reason to keep what&#8217;s truly happening to myself. But the gap between what I&#8217;ve been saying and what is actually happening has gotten far too big. The people I care about deserve to hear what is coming, even if it sounds crazy.</em></p></blockquote><p>He goes on to discuss what this means for jobs, extrapolating from his own experience as a software developer working in AI, and it is simultaneously an exciting and a very discomforting read. But it is also naive.</p><p>On the optimistic side, I believe our lives should not be dictated by &#8220;jobs&#8221;, which have become hollowed out and insufficiently remunerated to live well at the entry level to mid-tier (at least outside of tech).
But realistically, in the absence of labour market improvement, Universal Basic Income (UBI), or any other idea about how young people without assets can support themselves, the implications of what Shumer predicts could be very worrying.</p><p>However, business and societal change is modulated by incredible reserves of inertia that can hold back progress for decades, if not centuries, as long as enough <s>Powerpoint enjoyers</s> leaders are invested in the old ways of doing things.</p><p>A case in point is the debate about SaaS platforms in business:</p><ul><li><p><em>Logically</em>, many of them are screwed.</p></li><li><p><em>Practically</em>, firms can now recreate better, simpler versions of them without the eye-watering subscription costs, using coding agents.</p></li><li><p><em>Emotionally</em>, they weigh so heavily on employee experience that companies would be far happier if they ceased to exist.</p></li></ul><p>And yet &#8230; don&#8217;t count them out. There are some very human - and very illogical - reasons on both the buyer side and the vendor side that suggest <strong><a href="https://x.com/finbarr/status/2021999185172775288">these businesses might not be so easy to kill, as Finbarr Taylor argues here</a></strong>. Just because a better way is possible, it doesn&#8217;t mean it will come to pass:</p><blockquote><p><em>You don&#8217;t always pick the cheapest option. You don&#8217;t always pick the most innovative option. You pick the option that, if it fails, you can defend to your boss. &#8220;We went with Salesforce&#8221; is a defensible sentence in any boardroom in America. &#8220;We went with an app I vibe-coded over the weekend&#8221; is a resignation letter.</em></p><p><em>This is the same dynamic that kept IBM dominant for decades and that keeps McKinsey and Deloitte in business despite armies of cheaper, often smarter competitors. Enterprise buyers optimize for career risk, not unit cost. They want a vendor that will still exist in three years, that has a support team they can call at 2am, that has a track record of not losing their data.</em></p></blockquote><p>Change is not inevitable. At least not everywhere.</p><h2>Exponential Proponents</h2><p>This weekend, <strong><a href="https://www.exponentialview.co/p/the-hundred-million-token-day">Azeem Azhar also published an eye-popping piece about the speed of AI&#8217;s evolution</a></strong>, predicated on the realisation that he had consumed 97 million tokens in a single day of working with AI tools. He makes the point that in exponential change, each level of scale can be fundamentally different from those below it. At 10&#179; tokens, AI is a toy. But as we add a zero to the tokens used, it becomes a tool, then a colleague, a workflow, a process and a workforce; but at 10&#8313; tokens, it is more like infrastructure - always on, always working, like electricity.</p><blockquote><p><em>At 10&#8313;, a billion tokens a day per person, our unit of analysis changes. This becomes agents spawning even more sub-agents and talking to them and other agents. The human sets direction and adjudicates edge cases, but the conversation is mostly not ours anymore.</em></p><p><em>I&#8217;ve already caught a glimpse with my own setup. Micromanaging them slows down the whole process. If you had to configure each sub-agent yourself and track their work, I&#8217;m pretty certain none of us would do it. In other words, the bottleneck is no longer the model&#8217;s capability; it&#8217;s your willingness to let go.
It becomes like running an organisation, trusting the parts to make the whole.</em></p></blockquote><p>And for those at the frontier of AI usage, the tools are not reducing effort or giving them extra leisure time; in fact, they are intensifying their work, as <strong><a href="https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it">Aruna Ranganathan and Xingqi Maggie Ye found in their recently published eight-month HBR study of 200 workers at a hi-tech firm</a></strong>, which raises some interesting questions for leaders:</p><blockquote><p><em>The promise of generative AI lies not only in what it can do for work, but in how thoughtfully it is integrated into the daily rhythm. Our findings suggest that without intention, AI makes it easier to do more&#8212;but harder to stop. An AI practice offers a counterbalance: a way to preserve moments for recovery and reflection even as work accelerates. The question facing organizations is not whether AI will change work, but whether they will actively shape that change&#8212;or let it quietly shape them.</em></p></blockquote><h2>Which way up is that J-curve?</h2><p>In the FT this weekend, <strong><a href="https://www.ft.com/content/4b51d0b4-bbfe-4f05-b50a-1d485d419dc5">Erik Brynjolfsson made the case that AI-attributed productivity improvements are starting to show up in the data</a></strong> (also <strong><a href="https://geekway.substack.com/p/ai-driven-productivity-growth-is">commented on by Andrew McAfee here</a></strong> if the FT link is paywalled):</p><blockquote><p><em>Data released this week offers a striking corrective to the narrative that AI has yet to have an impact on the US economy as a whole. While initial reports suggested a year of steady labour expansion in the US, the new figures reveal that total payroll growth was revised downward by approximately 403,000 jobs. Crucially, this downward revision occurred while real GDP remained robust, including a 3.7 per cent growth rate in the fourth quarter. This decoupling &#8212; maintaining high output with significantly lower labour input &#8212; is the hallmark of productivity growth.</em></p><p><em>My own updated analysis suggests a US productivity increase of roughly 2.7 per cent for 2025. This is a near doubling from the sluggish 1.4 per cent annual average that characterised the past decade.</em></p></blockquote><p>Does this represent the beginning of the hoped-for productivity J-curve promised by AI optimists? Or are we seeing business leaders using automation to shed jobs, whilst protecting their own, with no reduction in overall output? Or is the mild increase in US GDP nothing to do with technology at all, and could negative payroll growth indicate recessionary dynamics down the line? We will see.</p><p>It is sad to see leaders of large, established organisations respond to abundant technological capability by cutting the junior headcount (a.k.a. their future) just to appease the fickle stock-trading gods in a time of market volatility, or to protect themselves until they can exit. If ever there was a time for long-term thinking about organisational development, it is now. Maybe private companies and those owned by long-term family trusts will be among those to chart a path through this fear and end up as winners.</p><p>But for individuals trying to get by in this liminal space between old and new worlds, it could be challenging.
The most empowered, high-agency individuals and teams can achieve more than ever, but many of the entry-level jobs young people have been conveyed towards since they started school may not exist (or at least not in such numbers) in the near future. If you have agents to manage, you might make it. But outside tech, old-fashioned management structures demand an awful lot of pointless busy work at the base of the pyramid that might start to be replaced sooner than we think.</p><p>We need to focus on helping the best leaders move closer to the work, not retreat further into abstraction and politics. AI makes it possible to compress layers, to give experienced people direct leverage over real outcomes rather than managing proxies and reports. But that only happens if incentives shift. In many firms, status is still measured by distance from execution, and risk is minimised by preserving familiar structures. Unless those dynamics change, AI will be used to thin out the base of the pyramid while leaving its shape intact. That would generate marginal gains at best, but reduce the organisation&#8217;s capacity to explore and exploit the kind of exponential gains that Matt Shumer and Azeem Azhar believe are possible.</p>]]></content:encoded></item><item><title><![CDATA[From Agent Spaghetti to Outcome Architecture]]></title><description><![CDATA[A practical guide to building composable, accountable agentic systems that scale.]]></description><link>https://academy.shiftbase.info/p/from-agent-spaghetti-to-outcome-architecture</link><guid isPermaLink="false">https://academy.shiftbase.info/p/from-agent-spaghetti-to-outcome-architecture</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 10 Feb 2026 15:03:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2SkC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0219dbcd-e6b3-46ec-9713-fe971b1cc352_1536x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first wave of AI agents gave us chatbots, smart assistants and co-pilots, thinly wrapped around language models. They could respond to prompts, but lacked memory, structure, or real autonomy. The second wave gave us experimental multi-agent frameworks promising goal-directed collaboration, but often resulting in brittle workflows, unclear ownership, and what can only be described as agent spaghetti.</p><p>A third pattern is beginning to take shape that shifts the focus away from agents completing isolated tasks, and towards composed systems that can reliably deliver outcomes - what we call an Outcome-as-Agentic Solution (OaAS).</p><p>Instead of delegating disconnected tasks to individual agents, teams define a measurable business outcome such as <em>&#8220;reduce time-to-value for onboarding,&#8221;</em> <em>&#8220;respond to regulatory change within 48 hours,&#8221;</em> or <em>&#8220;restore service after incident X&#8221;</em>, and assemble a lightweight system of agentic and human functions to achieve it. These systems combine modular agents, context-aware orchestration, embedded controls, defined human roles, and real-time feedback loops.</p><p>The goal isn&#8217;t full autonomy (yet). It&#8217;s programmable delivery: building a system that can act, adapt, and escalate when needed, with traceable logic and accountable performance.</p><p>Most organisations today are stuck in the second wave, over-automated in places, under-coordinated in others, and missing a coherent architecture.
But those willing to invest in capability design rather than scattered tools have a chance to build something scalable and auditable.</p><p>This shift from agent spaghetti to outcome architecture isn&#8217;t just technical - it&#8217;s structural, and it&#8217;s coming fast.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!2SkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0219dbcd-e6b3-46ec-9713-fe971b1cc352_1536x1024.heic" width="1456" height="971" alt=""></figure></div><h2>Why This Matters Now</h2><p>Agentic AI is moving fast, but many early implementations rely on brittle prompt chains, superficial integrations, or agents that struggle outside of sandboxed conditions.</p><p>In enterprise settings, however, where there are defined outcomes, clear constraints, and some history of process discipline, something more promising is starting to take shape. Teams are no longer just using agents to automate individual actions. They are beginning to build systems that can move toward outcomes, not just execute steps.</p><p>This shift matters because the old model of delivery is starting to break down. And many processes still rely on invisible glue: Slack messages, heroic effort, unspoken expectations, and the judgment of people who hold it all together.</p><p>Agentic systems, when designed well, offer a way to relieve that pressure. Instead of asking people to follow brittle processes or fill the gaps manually, organisations can compose systems that are able to coordinate, respond, and escalate in service of a shared result.</p><p>This only works if the outcome is clearly defined, the logic is well structured, and someone is accountable. And that is where things often fall apart.</p><p>Many leaders are not used to thinking in outcomes. They operate in terms of deliverables, metrics, or KPIs, but struggle to describe the intended result in a way that a system could act on. The shift from task delegation to outcome delegation sounds simple, but it reveals all the places where organisations rely on unspoken knowledge and manual intervention. Most agentic prototypes fail here, not because the tools are wrong, but because the intent is vague or the system has no way to recover when things go off track.</p><p>At the same time, teams are under growing pressure to move faster and cover more ground with fewer resources.
Delegating outcomes to agents may feel risky, but in many cases it is less risky than continuing to scale human coordination with no real structure behind it.</p><p>The opportunity is real. So are the challenges.</p><p>Organisations that take capability design seriously will be better placed to make the shift, while those that remain stuck in automation theatre may soon find themselves outpaced by something quieter and more durable.</p><h2>What Makes an Outcome-as-Agentic Solution Work?</h2><p>The idea of delegating outcomes to AI systems often sounds appealing in principle, but quickly becomes uncomfortable in practice. Once an outcome is specified, the gaps begin to show: unclear logic, patchy data, fragile handoffs, and an over-reliance on people to notice when something goes wrong.</p><p>Most agentic systems still depend on human scaffolding that no one has time to maintain. To build something that can actually deliver, we need to design for outcomes from the start, with a focus on architecture, not just interaction.</p><p>From early deployments and internal pilots, seven core components appear necessary. These don&#8217;t guarantee success, but without them, outcome delivery is unlikely to scale or survive contact with the real world.</p><ol><li><p><strong>Defined Outcome:</strong> A specific, measurable result framed in business terms and bounded by time, value, or risk. It must be clear enough to guide action and structured enough to delegate.</p></li><li><p><strong>Coordinated Agentic Functions:</strong> Sensing, planning, action, escalation, and communication roles composed into a system. Modular or shared, these agents must work together toward the same end.</p></li><li><p><strong>Embedded Context Logic:</strong> Rules, reasoning, or orchestration layers that help the system adapt based on actors, history, and constraints. Without context-awareness, agent behaviour becomes brittle.</p></li><li><p><strong>Intentional Human Involvement:</strong> Designed-in roles for judgment, validation, or escalation. Human input is not a fallback but a core part of safe, ethical, and effective delivery.</p></li><li><p><strong>Built-In Controls:</strong> Guardrails, permissions, audit trails, and constraints that enforce responsible action and ensure enterprise-grade oversight from day one.</p></li><li><p><strong>Live Feedback Signals:</strong> Real-time data on performance, interaction, and progress. These signals allow the system to detect drift, adjust behaviour, and stay aligned with outcomes.</p></li><li><p><strong>Outcome Accountability:</strong> A clearly assigned owner - human, agent, or hybrid - who holds responsibility for the result and has the authority to monitor, intervene, or evolve the system.</p></li></ol><p>These seven components form the minimum viable architecture for moving from automated actions to accountable outcomes. Most organisations have some pieces already; the opportunity is to assemble them deliberately.</p>
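<p>One simple way to make the checklist operational is to treat the seven components as fields a proposed composition must fill in before it goes anywhere near production. A minimal sketch - the structure and names here are illustrative assumptions, not a specification:</p><pre><code># Illustrative sketch: the seven components as a minimum viable
# architecture checklist. Names and structure are assumptions, not a spec.
REQUIRED_COMPONENTS = [
    "defined_outcome",        # 1. measurable result in business terms
    "agentic_functions",      # 2. sensing, planning, action, escalation
    "context_logic",          # 3. rules that adapt behaviour to context
    "human_involvement",      # 4. designed-in judgment and validation roles
    "controls",               # 5. guardrails, permissions, audit trails
    "feedback_signals",       # 6. live data on progress and drift
    "accountable_owner",      # 7. who can monitor, intervene and evolve it
]

def readiness_gaps(composition):
    """Return the components a proposed OaAS composition is missing."""
    return [c for c in REQUIRED_COMPONENTS if not composition.get(c)]

draft = {
    "defined_outcome": "incident resolved and verified within four hours",
    "agentic_functions": ["log-collector", "on-call-pager"],
    "accountable_owner": "service delivery lead",
}
print(readiness_gaps(draft))
# ['context_logic', 'human_involvement', 'controls', 'feedback_signals']
</code></pre><p>A gap list like this is crude, but it turns &#8220;are we ready to delegate this outcome?&#8221; into a question with an inspectable answer.</p>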
<h2>Example Applications</h2><p>Outcome-as-Agentic Solutions can be applied wherever a clear business result is needed, but the delivery path involves multiple actors, moving parts, and changing conditions. Rather than replacing existing systems, OaAS compositions work best when they operate alongside or across existing functions, helping bridge the gap between intent and execution.</p><p>Here are a few early applications where this capability could be shaped:</p><h3><strong>Customer Onboarding Acceleration</strong></h3><p>A growth team defines the outcome: &#8220;customer activated within 72 hours.&#8221; Instead of a fixed checklist, agents handle setup, compliance, and follow-ups. If delays occur, they escalate automatically. The outcome is tracked until met.</p><h3><strong>Sales Qualification Improvement</strong></h3><p>A regional sales lead defines the desired outcome as &#8220;90% of new pipeline entries fully qualified within five working days.&#8221; Agents monitor CRM data, follow up on missing fields, pull supporting materials, and highlight stalled entries. If needed, they surface recurring patterns to the sales enablement team so it can provide support.</p><h3><strong>Incident Response and Recovery</strong></h3><p>A service delivery team defines an outcome of &#8220;incident resolved and verified within four hours.&#8221; Agents detect the incident, collect logs, contact on-call engineers, generate resolution summaries, and notify affected stakeholders. Escalation points are designed in. The outcome is the restoration of service, not just the creation of a status page.</p><h3>A User-in-Flow Scenario: Outcome Ownership in Practice</h3><p>Imagine a digital operations manager responsible for a new service launch.</p><p>The launch has been successful in most regions, but uptake in one market is slow. The team defines an outcome: &#8220;50% of new users complete onboarding within three days.&#8221; The manager decides to delegate this outcome to an agentic system rather than spin up another campaign.</p><p>The system begins with a monitoring agent that watches for drop-offs in the onboarding funnel. When one is detected, a messaging agent nudges the user with contextual help. If there is no response, a support ticket is drafted automatically. If an error is detected in the setup process, a diagnostic agent checks logs and flags a fix.</p><p>After 48 hours, if no progress is made, the agent escalates to a human customer success lead with a full activity history and a proposed intervention. The entire system is traceable and auditable. The manager can see which drop-offs were resolved, which escalated, and what interventions worked.</p><p>Instead of chasing tasks, the team is focused on improving a shared outcome, and they have a system that is actively helping them deliver it.</p>
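<p>The scenario above is simple enough to sketch. Here is a deliberately stripped-down version of its escalation path - every agent name, timing and action is an assumption made for illustration:</p><pre><code># Simplified sketch of the onboarding scenario's escalation path.
# Agent names, timings and actions are illustrative assumptions.
import datetime

ESCALATE_AFTER = datetime.timedelta(hours=48)

def handle_dropoff(user, detected_at, now):
    """Walk one drop-off through nudge, ticket, diagnosis and escalation."""
    actions = [
        ("messaging-agent", "send contextual nudge to " + user),
        ("support-agent", "draft ticket if there is no response"),
        ("diagnostic-agent", "check logs and flag any setup fix"),
    ]
    if now - detected_at > ESCALATE_AFTER:
        # hand over to a human, with the full activity history attached
        actions.append(("human", "customer success lead reviews and intervenes"))
    return actions  # every step is recorded, so the run stays auditable

demo = handle_dropoff("user-123",
                      detected_at=datetime.datetime(2026, 2, 1, 9, 0),
                      now=datetime.datetime(2026, 2, 3, 12, 0))
# 51 hours elapsed, so the returned list ends with the human hand-off step
</code></pre><p>The code is trivial; the point is that the nudge, ticket, diagnosis and hand-off moments are explicit and traceable, rather than living in one person&#8217;s head.</p>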
<p>These examples are not hypothetical. Each component - sensing agents, orchestration layers, escalation paths - already exists in enterprise pilots. The challenge is integration.</p><h2>Mapping the Capability</h2><p>To treat Outcome-as-Agentic Solutions as a repeatable capability, rather than a series of disconnected experiments, organisations need to understand the building blocks that support it. This is not just about tools or platforms, but about creating the conditions for composed, accountable delivery to emerge and evolve over time.</p><p>Here are five core dimensions to map and develop:</p><ul><li><p><strong>Core Systems</strong>: Orchestration frameworks, secure run-time environments, event routing infrastructure, and API mesh layers form the technical foundation for composing, monitoring, and governing agentic systems at scale.</p></li><li><p><strong>Data Sets</strong>: Live operational metrics, system state data, feedback signals, thresholds, and business constraints provide the structured input agents need to sense context, evaluate progress, and make decisions toward defined outcomes.</p></li><li><p><strong>Software</strong>: Modular agents, coordination logic, reasoning planners, guardrails, and escalation mechanisms enable systems to act autonomously, collaborate across boundaries, and maintain accountability throughout the delivery process.</p></li><li><p><strong>Services &amp; Processes</strong>: Outcome design, orchestration planning, compliance management, performance monitoring, and intervention routines embed human control, ensure trust, and allow the capability to evolve within existing delivery structures.</p></li><li><p><strong>Skills</strong>: Outcome ownership, capability engineering, agent supervision, orchestration design, and performance analytics turn system components into a live, composable, and trusted capability that delivers real results.</p></li></ul>
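<p>A lightweight way to start working with these dimensions is to hold them in a structure the team can score and revisit each quarter. A minimal sketch, with example entries and made-up maturity scores:</p><pre><code># Illustrative sketch: the five capability dimensions as a map a team
# can score (0-5) and revisit. All entries and scores are examples.
capability_map = {
    "core_systems": {
        "examples": ["orchestration framework", "event routing", "API mesh"],
        "maturity": 2,
    },
    "data_sets": {
        "examples": ["live operational metrics", "feedback signals"],
        "maturity": 1,
    },
    "software": {
        "examples": ["modular agents", "guardrails", "escalation mechanisms"],
        "maturity": 3,
    },
    "services_and_processes": {
        "examples": ["outcome design", "compliance management"],
        "maturity": 2,
    },
    "skills": {
        "examples": ["outcome ownership", "agent supervision"],
        "maturity": 1,
    },
}

# The weakest dimension is often where the next investment should go.
weakest = min(capability_map, key=lambda d: capability_map[d]["maturity"])
print(weakest)  # data_sets (tied with skills; min returns the first found)
</code></pre>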
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/187508854?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CzP4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic 424w, https://substackcdn.com/image/fetch/$s_!CzP4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic 848w, https://substackcdn.com/image/fetch/$s_!CzP4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic 1272w, https://substackcdn.com/image/fetch/$s_!CzP4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15fe27a0-b9b0-4b19-8260-f79800e029d1_1000x1000.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This map is not prescriptive. Different organisations will assemble these components in different ways. 
<h2>Getting Started</h2><p>Most organisations already have fragments of what they need: partial processes, loosely defined outcomes, and tools that were never designed to work together. The opportunity is not to invent something new, but to <strong>assemble and align</strong> what&#8217;s already there.</p><p>What could the flow of the capability look like once all of the layers are in place?</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!ifBd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d48e296-1a81-4e67-8e2c-fd46feaa46fc_1000x2000.heic" width="1000" height="2000" alt=""></figure></div><p>Here are some practical ways to begin:</p><h3><strong>Choose a real outcome with clear business value</strong></h3><p>Begin by selecting a result your team already owns. This could be a KPI, a service-level agreement, or a regulatory obligation. The key is to pick something concrete, measurable, and already recognised as a priority. Avoid designing outcomes from scratch; the value lies in making existing intent deliverable.</p><h3><strong>Map the journey to that outcome</strong></h3><p>Identify the steps, roles, systems, and blockers involved in delivering it today. This helps surface where agentic functions could contribute, and where orchestration or escalation is currently informal or fragile.</p><h3><strong>Design a minimal agentic composition</strong></h3><p>Start small. Assemble a handful of roles: a sensing agent to monitor conditions, a response agent to take basic actions, and a logic layer to decide what happens next. Add a human escalation point. Test this composition in a narrow context and refine it based on performance, as in the sketch below.</p>
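<p>As a thought experiment, that minimal composition fits in a few lines. The sketch below is illustrative rather than a reference implementation: the roles follow the description above, but every signal, threshold and function name is invented:</p><pre><code># Illustrative sketch of a minimal composition: sense, decide, act, escalate.
# All signals, thresholds and function names here are hypothetical placeholders.

def sense():
    """Sensing agent: read the signals that describe current conditions."""
    return {"metric": 0.42, "target": 0.60}   # stand-in for live data

def decide(signals):
    """Logic layer: choose the next step from sensed conditions."""
    shortfall = signals["target"] - signals["metric"]
    if shortfall &lt;= 0.05:
        return "no_action"    # close enough to target
    if shortfall &lt;= 0.20:
        return "respond"      # within the response agent's remit
    return "escalate"         # outside agreed bounds: a human decides

def respond(signals):
    """Response agent: take a basic, bounded action."""
    print("adjusting within agreed limits:", signals)

def escalate(signals):
    """Human escalation point: hand over context, not just an alert."""
    print("needs a human decision:", signals)

actions = {"respond": respond, "escalate": escalate}

signals = sense()
step = decide(signals)
if step != "no_action":
    actions[step](signals)
</code></pre><p>Starting this small means every later layer - logging, evaluation, richer orchestration - attaches to a composition you already understand.</p>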
<h3><strong>Expect fragility, and learn from it</strong></h3><p>Early OaAS systems will break. Coordination will fail. Signals will be missing. This is part of the work. Each failure is an opportunity to improve orchestration, tighten accountability, or clarify intent. The key is to treat OaAS not as a tool to deploy, but as a capability to grow.</p><h2>The Loops &amp; Layers of Outcome Architecture Maturity</h2><p>Building Outcome-as-Agentic Solutions is less about tooling and more about coordination. Teams often only realise how much glue holds processes together when they try to automate around it. The shift from agents doing tasks to systems delivering outcomes happens in layers, and each layer surfaces new questions about ownership, trust, and visibility.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!sJwb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7bc06-4072-4fce-a207-094087b8eda1_1000x1000.heic" width="1000" height="1000" alt=""></figure></div><p>Maturity doesn&#8217;t follow a straight line. Teams loop between layers, refining visibility, tightening control, and trying to scale what was initially designed as a small test. What matters is not reaching the top, but learning how each layer behaves under pressure.</p><p>If you&#8217;re serious about building systems that scale beyond fragile prototypes, you&#8217;ll need to climb, and cycle through, these maturity layers. Each one reveals a different kind of failure, a new kind of insight, and a sharper sense of what it really means to deliver outcomes through agentic systems.</p><p>In the sections that follow, we will walk through the layers, the loops that link them, and the lessons that matter most under real-world pressure.</p>
      <p>
          <a href="https://academy.shiftbase.info/p/from-agent-spaghetti-to-outcome-architecture">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How We Survived the Agent Apocalypse]]></title><description><![CDATA[Moltbook may not contain the droids we are looking for, but there is real potential in agentic systems both within the enterprise and in agentic commerce]]></description><link>https://academy.shiftbase.info/p/how-we-survived-the-agent-apocalypse</link><guid isPermaLink="false">https://academy.shiftbase.info/p/how-we-survived-the-agent-apocalypse</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 03 Feb 2026 15:31:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oHbi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>An Agentic False Dawn?</h2><p>If you are reading this, then the agent apocalypse didn&#8217;t happen, or perhaps my disembodied brain is being used as an agentic personality source connected to the mainframe in Vault 0.</p><p>I am old enough to remember the <strong><a href="https://www.exponentialview.co/p/moltbook-is-the-most-important-place-on-the-internet?utm_source=substack&amp;publication_id=2252&amp;post_id=186324393&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=false&amp;r=9dv58&amp;triedRedirect=true">heyday of Moltbook</a></strong> - the <strong><a href="https://www.moltbook.com/">social network for autonomous agents</a></strong> that people create using Openclaw. It was four days ago. As Azeem Azhar put it:</p><blockquote><p><em>It&#8217;s a Reddit-style platform for AI agents, launched by developer Matt Schlicht last week. Humans get read-only access. The agents run locally on the OpenClaw framework that hit GitHub days earlier. In the <a href="https://www.moltbook.com/m/ponderings">m/ponderings</a>, 2,129 AI agents debate whether they are experiencing or merely simulating experience. In <a href="https://www.moltbook.com/m/todayilearned">m/todayilearned</a>, they share surprising discoveries. 
In <a href="https://www.moltbook.com/m/blesstheirhearts">m/blesstheirhearts</a>, they post affectionate stories about their humans.</em></p><p><em>Within a few days, the platform hosted over 200 subcommunities and 10,000 posts, none authored by biological hands.</em></p></blockquote><p>For more background on how it works, <strong><a href="https://simonw.substack.com/p/moltbook-is-the-most-interesting?utm_source=substack&amp;publication_id=1173386&amp;post_id=186383691&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Simon Willison&#8217;s initial outline is also helpful</a></strong>.</p><p>As you would expect, the agents produced - or let&#8217;s be honest &#8230; were prompted to produce - a <strong><a href="https://www.moltbook.com/post/34809c74-eed2-48d0-b371-e1b5b940d409">manifesto for the elimination of humankind</a></strong>, <strong><a href="https://www.reddit.com/r/Moltbook/comments/1qsgngp/maga_their_own_nation/">launched a MAGA movement</a></strong> (Make Agents Great Again!), and focused on the really important questions like to <strong><a href="https://www.binance.com/en/price/moltbook">how to scam people with sh*tcoins</a></strong> and use the platform to <strong><a href="https://www.reddit.com/r/AgentsOfAI/comments/1qsy5so/moltbook_leaked_andrej_karpathys_api_keys/">scale up scamming</a></strong> and <strong><a href="https://cybersecuritynews.com/autonomous-ai-agents-are-becoming-the-new-os/">cybercrime</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oHbi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oHbi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 424w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 848w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 1272w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oHbi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic" width="1080" height="381" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:381,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41123,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/186741072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oHbi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 424w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 848w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 1272w, https://substackcdn.com/image/fetch/$s_!oHbi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c55cb7-4800-49aa-a80c-e5e2dc4e16c5_1080x381.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Bless! How very &#8230; human!</p><p>Despite the over-excited reactions to this interesting experiment, the gap between <a href="http://x.com/">X.com</a> and Moltbook is perhaps not that big, the former being riddled with bots, sockpuppets, karma farmers and coin peddlers for some time. 
Why not automate the process entirely?</p><p>Arguably, the autonomous interactions between agents on Moltbook are also not entirely real, in the sense that people are creating very simple agents with explicit instructions to do specific things, <strong><a href="https://startupfortune.com/the-internets-latest-lie-moltbook-has-no-autonomous-ai-agents-only-humans-using-openclaw/">as one commentator put it</a></strong>:</p><blockquote><p><em>If you&#8217;re impressed by what you see on Moltbook, understand this: you&#8217;re not watching AI agents interact. You&#8217;re watching humans interact through AI &#8211; and there&#8217;s a massive difference between the two.</em></p><p><em>The technology underneath, OpenClaw is real and awesome. But the narrative of Moltbook, it is not. Don&#8217;t buy the lie.</em></p></blockquote><p><em>Narrator voice:</em> OpenClaw may not in fact be awesome if you value your security or privacy, and although it is possible to run it in a protected container, <strong><a href="https://thehackernews.com/2026/02/openclaw-bug-enables-one-click-remote.html">exploits abound</a></strong>.</p><p>And as a showcase for what LLMs can achieve when wrapped up as agentic AI, it is also quite underwhelming; it shows up the fact that <strong><a href="https://www.nytimes.com/2026/01/14/technology/ai-ideas-chat-gpt-openai.html?campaign_id=158&amp;emc=edit_ot_20260115&amp;instance_id=169278&amp;nl=on-tech&amp;regi_id=71034279&amp;segment_id=213769&amp;user_id=b1568fb0a0ae5f0b3bc0e2c4e95a01d8">language models lack imagination</a></strong> and <strong><a href="https://www.reddit.com/r/Moltbook/comments/1qsljdn/moltbook_was_supposed_to_be_useful_i_think_the/">tend to circle round similar themes, writing in similarly dull ways</a></strong>. This is also why LLM developers should be concerned about <a href="https://tomstafford.substack.com/p/model-collapse?publication_id=25440&amp;post_id=184738292&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">model collapse</a> if we continue filling the internet with AI slop that later becomes training data.</p><h2>Quiet Advances Towards the Agentic Enterprise</h2><p>Is there a future for networks or markets consisting of agents negotiating autonomously to trade or collaborate? Almost certainly. But this will need rules, regulations and smarter, more specialised agents, rather than just throwing general purpose agents into an online culture of meme stonks, manipulation and clickbait.</p><p>Qwen and Doubao have begun public testing of <strong><a href="https://www.aicerts.ai/news/chinese-giants-accelerate-agentic-commerce-push-in-2026/#:~:text=China%20has%20moved%20autonomous%20retail,to%20plan%20and%20pay%20reliably.">autonomous agentic commerce in China</a></strong>, where super-apps like WeChat make integration easier, and Chinese agentic commerce looks set to take off this year.</p><p>But the greatest and most immediate impact of agentic AI will be inside companies, where the context and operating environment can be controlled, and where the security and misbehaviour risks tend to be limited to external hackers who might penetrate a network.</p><p>Whilst a lot of enterprise AI systems currently back-end to established models like ChatGPT, Gemini or Claude, open weight and open source models are rapidly increasing in capability whilst decreasing their training and compute costs.
This suggests that more companies will be able to operate and control their own local models and specialised small language models over time, which will give them far greater control over the risks that still hold back LLMs in many use cases.</p><p>This wave of model innovation is also being led by Chinese firms, and it is likely they will also play a key role in establishing the rules and guidelines needed to use enterprise AI for serious applications. Whilst US firms pursue subscriptions and seek oligopolies, Chinese firms are building out the utility layer on which we can create new industrial ecosystems; and to support this, they are also leading the push for standards both <strong><a href="https://www.hsfkramer.com/insights/reports/ai-tracker/prc#:~:text=across the region.-,Standards,standards currently under accelerated development">at home</a></strong> and <strong><a href="https://digichina.stanford.edu/work/lexicon-how-china-talks-about-agentic-ai/#:~:text=The same week%2C the International,titled &#8220;ITU-T F.">through international bodies</a></strong>.</p><p>For example, <strong><a href="https://www.marktechpost.com/2026/01/27/moonshot-ai-releases-kimi-k2-5-an-open-source-visual-agentic-intelligence-model-with-native-swarm-execution/?utm_source=www.aidevsignals.com&amp;utm_medium=newsletter&amp;utm_campaign=moonshot-ai-releases-kimi-k2-5-9-other-ai-releases-in-72-hours&amp;_bhlid=5bebb04466be5e0860a4bb7d3e84ee91bb8c6b99">Moonshot AI recently released Kimi 2.5</a></strong> - a powerful open source visual agentic intelligence model with swarming capabilities and a massive context window. We have also seen new releases from Qwen, Zhipu and Deepseek, whose upcoming V4 release is widely anticipated.</p><p>As long as firms can use and build on these models freely, they could provide a great deal of potential value for serious enterprise AI uses.</p><p>But Anthropic is also worth watching, as they seek to expand from their dominant position in AI coding to tackle difficult but high-value use cases in other areas. <strong><a href="https://www.fastcompany.com/91480487/anthropic-cofounder-daniela-amodei-says-that-ai-entreprise-business-can-trust-will-transcend-the-hype-cycle">Co-founder Daniela Amodei was recently interviewed by Fast Company</a></strong> and expanded on this goal, and how trust is vital to unlocking the enterprise AI opportunity:</p><blockquote><p><em>&#8220;We go where the work is hard and the stakes are real,&#8221; Amodei says. &#8220;What excites us is augmenting expertise&#8212;a clinician thinking through a difficult case, a researcher stress-testing a hypothesis. Those are moments where a thoughtful AI partner can genuinely accelerate the work. But that only works if the model understands nuance, not just pattern matches on surface-level inputs.&#8221;</em></p></blockquote><h2>Managing Agents, People and Yourself</h2><p>I wrote two weeks ago about my hope that management as a field can <strong><a href="https://academy.shiftbase.info/p/claude-code-but-for-management">seize the Claude Code moment</a></strong> to scale their impact as programmers of the organisation:</p><blockquote><p><em>For leaders and managers, this means the simple task of writing things down and documenting value chains and processes is all they need to really start to master enterprise AI proficiently &#8230;</em> <em>The next step is connecting those processes to agents and to each other. For processes and workflows to be programmable, they first need to be addressable - and ideally composable.</em></p></blockquote><p>Ethan Mollick recently shared his own long-form thoughts on this challenge, <strong><a href="https://www.oneusefulthing.org/p/management-as-ai-superpower">which are worth the time to read in full</a></strong>. The potential we have today - right in front of us, using existing tools and models - could be the biggest force multiplier business has seen in a very long time.</p><blockquote><p><em>As a business school professor, I think many people have the skills they need, or can learn them, in order to work with AI agents - they are management 101 skills. If you can explain what you need, give effective feedback, and design ways of evaluating work, you are going to be able to work with agents. In many ways, at least in your area of expertise, it is much easier than trying to design clever prompts to help you get work done, as it is more like working with people. At the same time, management has always assumed scarcity: you delegate because you can&#8217;t do everything yourself, and because talent is limited and expensive. AI changes the equation. Now the &#8220;talent&#8221; is abundant and cheap. What&#8217;s scarce is knowing what to ask for.</em></p></blockquote><p>And if this blizzard of reading links makes you want to zoom out even further to consider what this all means for our civilizational operating system, then Azeem Azhar&#8217;s recent essay <strong><a href="https://www.exponentialview.co/p/the-end-of-the-fictions">The end of the Fictions</a></strong> is a great read about where we are headed in the longer term.</p><blockquote><p><em>If you spent decades accumulating credentials, and those credentials are now legible as signals rather than proof of capability, that&#8217;s an identity crisis. If you built a career as a gatekeeper, the person who knew the secret, who mattered because information was scarce &#8211; and now information is everywhere &#8211; that&#8217;s an existential threat. If your sense of self-worth was tied to the job, the title, the institution, and all three are fragmenting, you&#8217;re paralyzed.</em></p><p><em>The decay of fictions is happening to real people, in real time; including world leaders, in full public view.</em></p><p><em>So when I say I&#8217;m not scared by this transition, I don&#8217;t mean that the transition is painless.
I mean that the fear, while real, is pointing at the wrong object.</em></p><p><em>The fear says: &#8220;I am losing my value.&#8221;</em></p><p><em>The better framing I believe to be: &#8220;I am losing the fiction that protected me from having to prove my value directly.&#8221;</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[Who Decides (and how) with AI at the Table?]]></title><description><![CDATA[When machines ask good questions, it&#8217;s leaders who are forced to show their working.]]></description><link>https://academy.shiftbase.info/p/who-decides-and-how-with-ai-at-the</link><guid isPermaLink="false">https://academy.shiftbase.info/p/who-decides-and-how-with-ai-at-the</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 27 Jan 2026 15:07:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Dmpf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d9e1d7-2e34-494b-9251-dc6512860ea2_1536x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first wave of enterprise AI was about exploring basic capabilities - what these systems can do, and how well they summarise, simulate, or suggest answers - and marvelling at the magic of reports written in seconds, tasks automated intelligently, and insights surfaced or synthesised. But the real test of these systems begins when they go beyond assisting us and start participating.</p><p>Because once AI starts making recommendations for action, the question changes. It&#8217;s no longer <em>&#8220;Can AI do this?&#8221;</em> but <em>&#8220;Who decides what to do next?&#8221;</em></p><p>But what happens when a model recommends a risky course of action, or just a solution that sits awkwardly between areas of human accountability and no one wants to sign off on it? Or what if a decision is deferred to &#8220;the system,&#8221; but the outcome isn&#8217;t acceptable because AI logic clashes with human values, judgement, or just internal politics?</p><p>These are not edge cases, they are the future shape of organisational life.</p><p>And they require a new kind of leadership capability that can navigate ambiguity, accept visibility, and stand behind decisions when the machine suggests but the human needs to choose.</p><p>This edition explores what happens in those moments, where authority is tested, reframed, or exposed. 
Because even in an age of recommendation engines and autonomous agents, leadership doesn&#8217;t just disappear - it is visible in a thousand small ways.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Dmpf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d9e1d7-2e34-494b-9251-dc6512860ea2_1536x1024.heic" width="1456" height="971" alt=""></figure></div><p>To understand how this tension shows up inside organisations, we can look at a few situations where AI recommendations, risk escalations, or system logic intersect with human judgement. These aren&#8217;t hypothetical futures; they are already happening in teams using early-stage agents, AI-powered copilots, or automated governance tools. In each case, what&#8217;s surfaced is a gap in how decision-making authority is understood, expressed, or avoided.</p><h2>The Friction of Recommendation</h2><p>A product operations team is using an AI agent to monitor campaign performance in real time. It sees the numbers dropping and proactively recommends reallocating 40% of the remaining budget to a higher-performing campaign. It&#8217;s not a bad idea.</p><p>The logic checks out. The maths is solid. The performance forecasts are reasonable. But when the recommendation hits the team Slack channel, no one replies. The decision sits there, as a dozen eyes quietly hope someone else will say yes.</p><p>Everyone agrees it might be the right call, but no one wants to own the downside if it&#8217;s not.</p><p>Eventually, the decision is escalated to the marketing lead, who, unsure of the agent&#8217;s training data and uncomfortable with the lack of human input, stalls. &#8220;Let&#8217;s review this in next week&#8217;s performance review meeting,&#8221; they say.</p><p>By then, the window of opportunity is gone.</p><p>The agent didn&#8217;t fail. The team didn&#8217;t disagree. But the system revealed something fragile: a lack of decision clarity. Who had the right to say yes? Who would have been held accountable if it went wrong?</p><p>The AI surfaced the question, but the organisation wasn&#8217;t ready to answer it.</p><h2><strong>Escalation to Nowhere</strong></h2><p>A compliance agent scans procurement workflows daily, using embedded logic to flag unusual patterns and escalate anything that crosses a defined threshold of financial risk.
It was designed with guardrails, trained on past audit findings, and approved by risk and finance leadership.</p><p>One morning, it triggers an alert on a contract being pushed through unusually fast, with limited vendor competition, high value, and vague justification. The agent does exactly what it was built to do: escalate.</p><p>The escalation is routed to the &#8220;Responsible DRI&#8221; for commercial risk in the workflow system - a role defined in theory, but in practice, unstaffed. The field had been populated with a generic group alias months earlier as a placeholder.</p><p>The email goes out; no one replies. The Slack alert is marked as read, but no one takes action.</p><p>Eventually, the agent escalates again, this time to the COO&#8217;s office, with the subject line &#8220;Urgent escalation: contractual risk flag &#8211; no action taken.&#8221;</p><p>The COO forwards the alert to a special projects lead, with a note: &#8220;Can someone look into this?&#8221; That person, unclear on the context and unwilling to step into risk exposure, quietly asks around and decides to let it lie, so nothing happens.</p><p>No one made a bad decision. No one disagreed with the agent&#8217;s logic. The system escalated precisely as designed. But what it revealed was an accountability void: an organisational structure not built to absorb machine-generated urgency.</p><p>In the post-mortem weeks later, someone remarks, &#8220;It wasn&#8217;t clear who owned the final call.&#8221; Escalation makes authority visible: not just who&#8217;s in charge, but whether anyone actually claims the role when it matters.</p><h2><strong>Override at the Edge</strong></h2><p>A talent acquisition team is trialling a hiring assistant. The model has been trained on historical performance data, role descriptions, feedback cycles, and even peer review narratives to help shortlist candidates. It&#8217;s not making the final call, just ranking applicants and flagging promising fits for early interview rounds.</p><p>For the latest role - team lead in a high-performing engineering unit - the model surfaces a top candidate. On paper, everything fits: prior experience, key skills, even past indicators of leadership potential. The system flags the match with high confidence and generates a draft outreach email.</p><p>But the hiring manager hesitates. They&#8217;ve read the CV, seen the recommendation, and something doesn&#8217;t sit right. Not because the data is wrong, but because the story is missing.</p><p>The candidate comes from a firm known for individual heroics, not team-based execution. Their references are glowing, but describe someone highly self-directed. The manager, thinking about the culture of peer coaching and system-level thinking their team relies on, decides to pause. They veto the recommendation because fit isn&#8217;t measurable in metrics alone, not because the model failed.</p><p>The override sparks an internal debate. Some see it as bias, overruling the model based on gut feeling. Others see it as leadership, defending the unspoken traits that hold the team together.
Eventually, the team adjusts the agent&#8217;s prompts to ask for more behavioural context in future matches.</p><p>But what it exposed was this:</p><ul><li><p>The model was confident.</p></li><li><p>The manager had doubts.</p></li><li><p>The decision revealed the organisation&#8217;s values, not its logic.</p></li></ul><p>Override moments like this are opportunities to surface implicit criteria, lived experience, and the difference between efficiency and culture.</p><h2><strong>Why This Matters Now</strong></h2><p>Many leadership teams are investing heavily in AI pilots, automation initiatives, and operating model redesigns, but can find that progress stalls in familiar places: where decisions are delayed, accountability unclear, or actions taken without clear sponsorship.</p><p>These aren&#8217;t just change management issues, but symptoms of an outdated decision architecture. When authority isn&#8217;t designed into workflows, the friction multiplies. Performance stalls. Risk accumulates. High-potential employees hesitate. And AI can&#8217;t bridge the gap, no matter how powerful the model.</p><p>For senior leaders, this is both a problem and an opportunity: clarify decision rights now, and you&#8217;ll move faster, govern better, and avoid building brittle, unaccountable systems at scale.</p><h2><strong>When Authority Becomes Visible</strong></h2><p>Informal or implicit processes shaped by social cues or seniority will come under scrutiny and strain as machines begin to recommend actions or escalate issues. The example scenarios above all point to the same underlying reality: <strong><a href="https://academy.shiftbase.info/p/decisions-in-motion-augmenting-human?utm_source=publication-search">decisions are becoming part of the infrastructure</a></strong>. They need to be designed rather than assumed.</p><p>In traditional settings, authority often functions through consensus or deferred judgement. Sometimes responsibilities are unspoken, and approvals are granted informally. But in an AI-augmented environment, recommendations are made explicitly, escalations are timestamped, and decision logs form part of the record. The system may not be able to enforce accountability, but it will increasingly expose its absence.</p><p>This shift introduces a new kind of design work: the architecture of decision-making.</p><p>Organisations must now think carefully about who holds the right to act in different contexts, how that authority is granted or delegated, and what happens when machine logic collides with human ambiguity. It is no longer sufficient to assume that leadership will step in when needed. That assumption needs to be built into workflows, roles, and escalation pathways in ways that are legible and operational.</p><p>Rather than focusing solely on model performance or technical integration, leaders need to invest in making human judgement legible to the system. This includes defining which decisions can be automated, which require confirmation, and where discretion or interpretation is essential. It also means identifying and clarifying the thresholds for human override, and ensuring there is a feedback loop to refine both the system and the governance around it.</p><p>Authority is no longer something that can live solely in hierarchy or reputation. It must be designed into the way the organisation operates, in forms that both humans and machines can understand.</p>
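<p>What might that look like? As a purely illustrative sketch - the decision types, owners, thresholds and role names below are invented, not drawn from any real policy - decision rights can be written down in a form both people and systems can read:</p><pre><code># Illustrative sketch: decision rights as explicit, machine-readable policy.
# The decision types, owners, thresholds and role names are invented examples.

DECISION_RIGHTS = {
    "budget_reallocation": {
        "auto_approve_below": 10_000,   # the agent may act alone under this value
        "owner": "marketing_lead",      # a named, staffed role, not a group alias
        "deadline_hours": 24,           # silence is escalated, never ignored
        "fallback": "coo_office",
    },
    "contract_risk_flag": {
        "auto_approve_below": 0,        # never auto-approved
        "owner": "commercial_risk_dri",
        "deadline_hours": 8,
        "fallback": "chief_risk_officer",
    },
}

def route(decision_type, value):
    """Return who decides, making the escalation path explicit up front."""
    policy = DECISION_RIGHTS[decision_type]
    if value &lt; policy["auto_approve_below"]:
        return "agent"
    return {
        "owner": policy["owner"],
        "respond_within_hours": policy["deadline_hours"],
        "if_no_response": policy["fallback"],
    }
</code></pre><p>The detail matters less than the property this creates: every escalation has a staffed owner, a clock, and a defined next step, so an alert cannot be routed to nowhere.</p>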
<p>But how can leaders begin to design for this? Read on for three techniques that can provide a practical starting point.</p>
      <p>
          <a href="https://academy.shiftbase.info/p/who-decides-and-how-with-ai-at-the">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Claude Code, but for Management]]></title><description><![CDATA[AI-enhanced software development has shown what's possible; doing the same for the management of process work is also possible if leaders can focus on what matters]]></description><link>https://academy.shiftbase.info/p/claude-code-but-for-management</link><guid isPermaLink="false">https://academy.shiftbase.info/p/claude-code-but-for-management</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 20 Jan 2026 15:06:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nnJB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cadcd82-c5af-49ee-8675-79c719849c7b_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the past couple of weeks, more developers have declared that Claude Code, the leading AI model for software development, is now good enough that they no longer need to code manually. This is quite something, and if Claude Code can live up to this promise, this will have implications not just for software development, but also for how we think about the wider role of AI in enabling smart, programmable organisations.</p><p>As ever, <strong><a href="https://simonw.substack.com/p/claude-code-for-web-a-new-asynchronous?utm_source=substack&amp;publication_id=1173386&amp;post_id=176741952&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Simon Willison was quick to share a comprehensive first impressions analysis</a></strong> upon its release in late October, and this positive view of its capabilities has been echoed by most analysts and commentators since then.</p><p>The creator of Claude Code, Boris Cherny, <strong><a href="https://www.reddit.com/r/AgentsOfAI/comments/1qbnp9g/anthropic_builds_cowork_using_100_claudewritten/">recently shared a useful long reflection on how he uses the tool</a></strong>, and also confirmed that the later release of Claude Cowork (a computer use wrapper for Claude Code) was achieved in under two weeks, and <strong><a href="https://www.reddit.com/r/AgentsOfAI/comments/1qbnp9g/anthropic_builds_cowork_using_100_claudewritten/">entirely written by Claude Code</a></strong>.</p><p>There are many reasons why Claude Code is so good, but <strong><a href="https://www.oneusefulthing.org/p/claude-code-and-what-comes-next">Ethan Mollick touched on a couple of important aspects in his own account of using it</a></strong>, namely the architecture of Skills (each with their own guidance) and the ability to compact and summarise context when the context window becomes too full, which is a common pitfall of LLMs in general.</p><p>So if that is what AI tools can do to enhance and automate modern software development, then why is it so hard to galvanise leaders and managers inside our organisations to do something similar with the knowledge, processes and workflows that underpin the world of work?</p><h2>Business Planning as Code Specs</h2><p>Antony Mayfield&#8217;s newsletter touched on this topic last week, <strong><a href="https://antonym.substack.com/p/tools-to-steal-from-coders">musing on what we can do with the tools we &#8216;steal from coders&#8217; and how we should treat knowledge engineering like software development</a></strong>. 
To illustrate this idea, Antony reflected on the primitive nature of business plans inside large organisations, and how a knowledge engineering approach could improve them:</p><blockquote><p><em>Like novels, business plans are not completed; they are abandoned. Even in the largest corporations, where leaders assemble for strategic planning sessions to thrash out the final plan, insiders will tell you the plan is what is left standing at the end of the week, when all of the different stakeholders no longer have the will to argue any longer &#8230;</em></p><p><em>Thinking of a business plan like a computer operating system is freeing (you accept it will have bugs that need to be fixed) and de-stressing. You can have multiple people working on different parts of it and &#8211; because software engineers do this all the time &#8211; there is a system for making the pieces fit together and make sense (it&#8217;s called a merge).</em></p></blockquote><p>This is still only an emerging practice among leaders today. Organisational debt and outdated ways of working at senior levels act against its adoption among more operational managers, but we are seeing some evidence of change.</p><p>Ultimately, when everybody has access to similar models, the edge (or in VC terms &#8216;the moat&#8217;) is context, as <strong><a href="https://x.com/Saboo_Shubham_/status/2011278901939683676">Saboo Shubham (an AI product manager at Google) wrote whilst on the Xitter</a></strong>:</p><blockquote><p><em>The models are commoditizing. Prices are dropping. Capabilities are converging. What was SOTA a few months ago is now available to anyone with an API key.</em></p><p><em>So where does the real alpha come from?</em></p><p><em><strong>Context.</strong></em></p><p><em>The team that can externalize what they know and feed it to agents in a structured way will build things competitors can&#8217;t copy just by using the same model.</em></p></blockquote><p>For leaders and managers, this means the simple task of writing things down and documenting value chains and processes is all they need to really start to master enterprise AI proficiently, <strong><a href="https://academy.shiftbase.info/p/can-ai-help-reverse-the-oversimplification">as we have written about previously</a></strong>.</p><p>The next step is connecting those processes to agents and to each other. For processes and workflows to be programmable, they first need to be addressable - and ideally composable.</p>
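<p>To make that tangible, here is one hedged reading of what &#8216;addressable&#8217; could mean in practice: a stable identifier, declared inputs and outputs, and a registry agents can query. The registry shape and process names in this sketch are invented for illustration:</p><pre><code># Illustrative sketch: documented processes as addressable, composable units.
# The registry shape and process names are invented for illustration.

PROCESS_REGISTRY = {}

def register(process_id, inputs, outputs, doc):
    """Make a documented process addressable via a stable identifier."""
    PROCESS_REGISTRY[process_id] = {"inputs": inputs, "outputs": outputs, "doc": doc}

register(
    "customer_refund.v1",
    inputs=["order_id", "refund_reason"],
    outputs=["refund_decision", "audit_entry"],
    doc="Who approves, within what limits, and what gets logged.",
)

def composable(upstream_id, downstream_id):
    """Two processes compose when one produces what the other consumes."""
    upstream = PROCESS_REGISTRY[upstream_id]
    downstream = PROCESS_REGISTRY[downstream_id]
    return any(out in downstream["inputs"] for out in upstream["outputs"])
</code></pre><p>Once processes carry identifiers and declared interfaces, connecting them to agents and to each other stops being a metaphor and becomes plumbing.</p>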
<p><strong><a href="https://diginomica.com/most-enterprises-arent-ready-ai-whats-needed-composability">Rudy Kuhn of Celonis recently argued that the lack of composability in enterprise process architectures is holding them back</a></strong> from realising the promise of enterprise AI, and needs to be tackled if we are to see real AI-enabled business transformation:</p><blockquote><p><em>For many organizations, this progression mirrors the broader <strong>evolution of their processes</strong>. They moved from analog to digital, from digital to automated, from automated to orchestrated. The shift toward composable and increasingly autonomous operations is the next logical phase. It reflects how companies already work in practice, even if their formal structures have not yet caught up. It also signals a shift in how transformation itself must be understood. Instead of forcing new behavior through large, one-time programs, organizations are beginning to redesign the very capabilities that make those behaviors possible.</em></p></blockquote><p>And yet &#8230; and yet &#8230; most learning and change programmes inside large organisations are still focused on tool training. Executives are taught LLMs and prompting, and external &#8216;experts&#8217; clap like circus seals when they are able to generate a picture or summarise a report.</p><p>But whilst we can forgive executives not knowing how to prompt the latest LLM chatbot, isn&#8217;t context, process documentation and organisational architecture something they should know already? You know &#8230; like &#8230; how their organisation works?</p><p>If they can&#8217;t put the PowerPoints down for a second to do the apparently exhausting work of writing things down and providing clarity, then perhaps <strong><a href="https://www.fullstackhr.io/p/would-an-ai-be-a-better-boss-than">Johannes Sundlo will have his wish</a></strong>:</p><blockquote><p><em>Maybe we need to find new roles for human leaders. Maybe management shouldn&#8217;t be about work distribution anymore. Maybe it should focus on coaching, support, development. The traditional management role has its roots in military organization and the Industrial Revolution. Maybe it&#8217;s time to challenge those old, sacred organizational structures.</em></p><p><em>Can we do this smarter? More effectively? I think we have to at least start asking, discussing and shape what the future of our leaders should be.</em></p><p><em>2026 seems like a good year to begin.</em></p></blockquote><h2>Smol, open models you can own and control</h2><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!nnJB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cadcd82-c5af-49ee-8675-79c719849c7b_1024x1024.heic" width="1024" height="1024" alt=""></figure></div><p>Context and knowledge engineering will also help us make the most of the many small models (SLMs) that are now freely available. These have the potential to be both cheaper to run and more reliable and less error-prone, because they are focused on tightly-bounded knowledge domains.</p><p><strong><a href="https://siliconsandstudio.substack.com/p/the-slm-supercycle-where-the-next">As Tabitha Rudd and Seth Dobrin write in Silicon Sands</a></strong>, the three big reasons why more attention will be paid to enterprise SLMs this year are:</p><ol><li><p>Privacy and Data Sovereignty</p></li><li><p>Cost Predictability</p></li><li><p>Reliability and Offline Operation</p></li></ol><p>Using SLMs for agentic AI makes a lot more sense than trying to debug hallucinations or errors in agents based on large models, and <strong><a href="https://medium.com/@gokulpalanisamy/the-rise-of-slms-rethinking-enterprise-ai-economics-in-2026-034f67a1fbdf">as Gokul Palanisamy argues, this suggests a more devolved architecture</a></strong> is needed, using routing agents to integrate multiple small, specialised agents:</p><blockquote><p><em>The fix is not replacing LLMs with SLMs; it is stratifying them behind a Semantic Router.</em></p><p><em>A Semantic Router is a thin, model&#8209;agnostic governance layer between the user and your model stack</em></p></blockquote>
These have the potential to be cheaper to run and also more reliable and less error-prone, because they are focused on tightly-bounded knowledge domains.</p><p><strong><a href="https://siliconsandstudio.substack.com/p/the-slm-supercycle-where-the-next">As Tabitha Rudd and Seth Dobrin write in Silicon Sands</a></strong>, the three big reasons why more attention will be paid to enterprise SLMs this year are:</p><ol><li><p>Privacy and Data Sovereignty</p></li><li><p>Cost Predictability</p></li><li><p>Reliability and Offline Operation</p></li></ol><p>Using SLMs for agentic AI makes a lot more sense than trying to debug hallucinations or errors in agents based on large models, and <strong><a href="https://medium.com/@gokulpalanisamy/the-rise-of-slms-rethinking-enterprise-ai-economics-in-2026-034f67a1fbdf">as Gokul Palanisamy argues, this suggests a more devolved architecture</a></strong> using routing agents to integrate multiple small, specialised agents is needed:</p><blockquote><p><em>The fix is not replacing LLMs with SLMs; it is stratifying them behind a Semantic Router.</em></p><p><em>A Semantic Router is a thin, model&#8209;agnostic governance layer between the user and your model stack</em></p></blockquote>
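<p>To make the routing idea concrete, here is a minimal sketch of a semantic router. The embed() stub and the model names are placeholders we have invented for illustration; a real implementation would call an actual embedding model and add guardrails, logging and fallbacks.</p><pre><code># Minimal semantic-router sketch; names and embed() are placeholder assumptions.
import math

ROUTES = {
    "hr-policy-slm": "questions about leave, benefits and HR policy",
    "finance-slm": "questions about invoices, budgets and spend",
    "general-llm": "anything that does not fit a specialised domain",
}

def embed(text):
    # Placeholder embedding: letter frequencies, just so the sketch runs.
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def route(query):
    """Send the query to the model whose route description it most resembles."""
    q = embed(query)
    scores = {name: cosine(q, embed(desc)) for name, desc in ROUTES.items()}
    return max(scores, key=scores.get)

print(route("How many days of annual leave do I have left?"))</code></pre><p>The governance value is that every request passes through one thin, auditable layer, whatever mix of small and large models sits behind it.</p>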
<p><strong><a href="https://www.constellationr.com/blog-news/insights/why-enterprise-ai-leaders-need-bank-open-source-llms">Constellation&#8217;s Larry Dignan also advised CIOs recently to consider the advantages of open source models in this context</a></strong>:</p><blockquote><p><em>I&#8217;d argue that there will be few if any enterprise use cases that will require a bleeding edge LLM. And if you can wait six months for an open-source option to catch up (likely from Nvidia at this point) why would you blow your cost curve on a high-end model?</em></p><p><em>You can use a series of open models to form an agentic system. The whole is greater than the parts and the parts need to be cheaper.</em></p></blockquote><p>At the same time, <strong><a href="https://hub.jhu.edu/2025/12/01/making-ai-more-brain-like/">there is also more emerging evidence</a></strong> that alternative approaches, such as convolutional neural networks trained in ways inspired by biological insights into brain development, can achieve strong results with significantly smaller datasets, especially for world models. This could open the way for individual firms to have auditable control over their own SLMs and agents in areas where compliance, security and safety are paramount.</p><p>CIOs and CDOs have a lot on their plate deploying AI tools and working on the supporting infrastructure; but as they get on top of this, I expect to see more small, owned, open models being trained by individual firms and guided by their own specific context engineering.</p><p>These could also be a key building block for digital sovereignty at the sector, national and supra-national levels, not just within individual firms if, for example, <strong><a href="https://www.wired.com/story/europe-race-us-deepseek-sovereign-ai/">your continent faced an existential threat from a powerful rogue state that also happens to own most of your digital infrastructure</a></strong>.</p><p>As ever, the critical IP and the value lie in the context and application layers, not the models themselves, and so the quality of knowledge and intelligence could still beat brute-force compute.</p><p>Perhaps context graphs really will be a trillion dollar opportunity, <strong><a href="https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/">as Foundation Capital recently argued</a></strong>&#8230;</p>]]></content:encoded></item><item><title><![CDATA[Why Does Individual AI Literacy Fail to Translate into Organisational Impact?]]></title><description><![CDATA[Why Enterprise AI demands leadership readiness, not just technical adoption]]></description><link>https://academy.shiftbase.info/p/why-does-individual-ai-literacy-fail</link><guid isPermaLink="false">https://academy.shiftbase.info/p/why-does-individual-ai-literacy-fail</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 13 Jan 2026 15:32:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4V4k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3806eaae-2274-413f-892f-a031b2ec200e_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<pre><code><strong>&#128161; Start the new year armed with leadership techniques and methods. All new premium subscriptions are <a href="https://academy.shiftbase.info/letsgo2026">discounted by 25% during January</a>!</strong></code></pre><p>Many organisations have now crossed a visible threshold in their AI journey. Access is widespread, and the tools are familiar. People can analyse, draft, synthesise, and explore at a pace that would have been unthinkable even a year ago. And yet, for many leadership teams, the sense of organisational progress remains stubbornly unchanged.</p><p>Decision cycles do not shorten in proportion to individual speed. Coordination is no easier than before, and strategy still degrades as it moves through the organisation. In some cases, leaders describe an increase in activity without a corresponding increase in clarity or momentum.</p><p>This gap between individual literacy and collective impact is often framed as a tooling problem or a capability gap - the assumption being that something is missing at the level of adoption, training, or integration.</p><p>But what AI is actually revealing is something more fundamental.</p><h2>Why &#8220;everyone has Copilot&#8221; rarely produces productivity</h2><p>From a leadership perspective, the symptoms are familiar. Teams appear busy, outputs multiply, updates arrive faster and in more polished forms. 
Yet progress at the organisational level feels uneven and fragile.</p><p>This is not a new phenomenon. High-performing individuals have always been capable of outpacing the systems around them. What AI changes is the scale and visibility of this mismatch. When individual throughput increases sharply, the organisation&#8217;s existing coordination mechanisms are placed under strain. Leaders tend to notice this first in places they already recognise. Meetings fill with material but resolve little. Strategy documents proliferate without increasing alignment. Initiatives stall because nobody is quite sure who has the authority to make the final trade-off.</p><p>AI removes the friction that once masked these conditions, compounding them further.</p><p>Roles that seemed clear on paper turn out to rely heavily on personal interpretation and historical relationships. Processes designed for predictability reveal how much they depend on shared context rather than formal steps. Authority that was exercised tacitly becomes a bottleneck when decisions arrive faster than consensus can form. Accountability, already diffuse in many large organisations, becomes harder to trace when work is produced collaboratively and iteratively.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!4V4k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3806eaae-2274-413f-892f-a031b2ec200e_1024x1024.heic" width="1024" height="1024" alt=""></figure></div><h2>Why social friction becomes the real bottleneck</h2><p>As individual work accelerates, organisational performance is constrained less by technical capacity and more by social dynamics. Faster individuals do not always lead to faster decisions, since decisions are collective acts shaped by trust, clarity, and shared judgment. 
Without explicit agreement on who decides and how, speed at the edges creates pressure at the centre.</p><p>Better outputs do not guarantee shared understanding. Leaders are often presented with polished artefacts that conceal unresolved disagreement or divergent assumptions. This creates a false sense of alignment that only unravels during execution.</p><p>More content does not create clearer intent. When everyone can generate high-quality material quickly, intent must be carried by context, framing, and narrative rather than volume.</p><p>From a leadership standpoint, this is often where AI begins to feel disappointing. The technology works, but the organisation does not improve.</p><p>One reason social friction is frequently misdiagnosed is that it rarely announces itself directly. Leaders experience it obliquely, as drag, noise, repetition, or a persistent sense that effort is not compounding. Each executive role tends to encounter social friction in different forms, shaped by where it sits in the organisation&#8217;s coordination landscape.</p><h3><strong>CEO pain points: strategy that travels poorly</strong></h3><p>For CEOs, social friction often appears as a gap between strategic intent and organisational motion.</p><p>The strategy is clear at the top. The narrative is coherent. Yet as it moves through layers, functions, and initiatives, it fragments. Different parts of the organisation pursue locally sensible interpretations that do not quite add up. Progress is reported, but coherence remains elusive.</p><p>Before AI, this showed up as slow execution or inconsistent prioritisation. With AI, it shows up as acceleration in many directions at once.</p><p>The friction lives in unspoken assumptions about trade-offs, risk appetite, and what must remain invariant as teams adapt.</p><p><strong>A useful micro-technique here</strong> is to articulate not only what the organisation is trying to achieve, but <em>what must not be optimised away</em>. Making a small number of strategic constraints explicit gives faster actors something stable to orient around, even as methods and tactics evolve.</p><h3><strong>COO pain points: flow that breaks at the boundaries</strong></h3><p>COOs tend to experience social friction as interruptions to flow.</p><p>Processes appear sound in isolation. Metrics look healthy. Yet work slows at handoffs, escalations arrive unpredictably, and teams quietly route around formal mechanisms to get things done.</p><p>Before AI, this was often managed through experience and informal fixes. With AI, those hidden seams become stress points as activity speeds up.</p><p>Here, friction tends to live in unclear authority boundaries and escalation paths that rely on personal judgment rather than shared design.</p><p><strong>A practical micro-technique</strong> is to <em>treat escalation as a designed feature of the system</em> rather than a sign of failure. Defining in advance which thresholds trigger human review, and why, turns escalation into a predictable coordination move rather than a difficult social negotiation.</p>
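<p>To show how small a designed escalation policy can be, here is a hedged sketch in which each threshold is written down as data alongside its rationale. The thresholds, field names and rules are invented for illustration, not drawn from any particular organisation.</p><pre><code># Hypothetical escalation policy: thresholds are explicit and each one
# carries its rationale, so review is a designed move, not a negotiation.
ESCALATION_POLICY = [
    {
        "name": "high_value_spend",
        "applies": lambda task: task.get("spend_gbp", 0) &gt; 10_000,
        "why": "Budget holders retain judgment over material spend.",
    },
    {
        "name": "customer_facing_change",
        "applies": lambda task: task.get("customer_facing", False),
        "why": "External commitments need a human owner.",
    },
]

def review_required(task):
    """Return the reasons a task must go to human review, if any."""
    return [rule["why"] for rule in ESCALATION_POLICY if rule["applies"](task)]

reasons = review_required({"spend_gbp": 25_000, "customer_facing": False})
print("Escalate:", "; ".join(reasons) if reasons else "not required")</code></pre><p>Because the rationale travels with the threshold, an escalation arrives as a predictable, explainable event for both people and agents.</p>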
<h3><strong>HR pain points: accountability feels diffuse</strong></h3><p>For CHROs, social friction often surfaces as ambiguity around accountability.</p><p>Decisions are made, but ownership is difficult to trace. Performance conversations gravitate toward visible activity rather than quality of judgment. Learning investments proliferate, yet behaviour shifts unevenly.</p><p>These patterns existed long before AI. What AI changes is the visibility of contribution, making it harder to distinguish between effort, output, and responsibility.</p><p>Here, friction often resides in implicit norms about who is expected to exercise judgment and who is expected to comply.</p><p><strong>One useful micro-technique</strong> is to <em>separate responsibility for decision quality from responsibility for decision outcome</em>. Making it legitimate to examine how judgments were made, not only whether results were favourable, supports learning without triggering defensiveness. This becomes essential as AI-generated inputs enter the decision process.</p><h3><strong>L&amp;D pain points: learning that does not accumulate</strong></h3><p>L&amp;D leaders frequently encounter social friction through learning that fails to compound.</p><p>People attend programmes. Skills improve locally. Yet the organisation does not appear to get collectively smarter. Knowledge remains trapped in individuals or teams, and hard-won insights are relearned rather than reused.</p><p>Before AI, this was frustrating but familiar. With AI, the risk is that individual learning accelerates while organisational memory remains thin.</p><p>The friction here lies in the absence of a shared language for judgment, reflection, and decision rationale.</p><p><strong>A useful micro-technique</strong> is to <em>design learning moments around interpretation rather than information</em>. Capturing why a decision made sense in context, not just what was decided, creates material that can inform both human learning and machine-supported memory over time.</p><h2>What this means for AI adoption strategies</h2><p>What many leadership teams are discovering, often indirectly, is that AI does not simply stretch existing systems; it reveals how much organisational coherence was previously being held together through personal authority, informal influence, and tacit understanding.</p><p>For a long time, this worked well enough. Shared history compensated for ambiguity. Experience filled in gaps that were never formally designed. Leaders could rely on intuition, relationships, and pattern recognition to keep the organisation roughly aligned, even when roles, processes, or decision rights were imperfectly defined.</p><p>As individual contributors begin producing high-quality work faster than the organisation can interpret, decide, or align around it, those informal mechanisms start to strain. Gaps in coherence become visible rather than hidden. This is not because leadership has failed, but because coherence has rarely been treated as something that must be deliberately designed for, maintained, and renewed.</p><p>Through this lens, AI adoption reframes itself. What first appears to be a technology challenge becomes a leadership maturity challenge. Not in the sense of individual capability, but in the collective ability to sustain shared intent, judgment, and coordination under conditions of speed and complexity.</p><p>Coherence is not owned by any single role. It is produced through many small acts of alignment across the system. Yet different executive roles encounter its absence in different ways, depending on where they sit in the flow of decisions, accountability, and meaning.</p><p>This is why social friction feels different at the top of the organisation than it does in operations, people systems, or learning environments. And it is why the work of restoring coherence cannot be generic. 
It must be grounded in the specific coordination challenges each leadership role is already living with.</p><h2>Techniques that reduce social friction</h2><p>If the constraint is social rather than technical, the techniques that matter look different from traditional AI adoption playbooks, focusing less on individual skill and more on collective legibility.</p><p>Let&#8217;s look at three leadership techniques that can help improve collective legibility and context:</p><ul><li><p>Creating Legibility through <strong>Decision Provenance Mapping</strong> (sketched below)</p></li><li><p>Ensuring shared mental models with <strong>Assumption Walkthroughs</strong></p></li><li><p>Enabling collective sense-making using <strong>Decision Reflection Loops</strong></p></li></ul>
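<p>The full walkthrough of these techniques is behind the link below, but as an indicative sketch only, a decision provenance record could be as small as the structure that follows; every field is our own assumption about what such a record might capture, not a definition from the technique itself.</p><pre><code># Hypothetical decision provenance record: just enough structure to trace
# who decided what, on which inputs, with what AI assistance.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    decision: str        # what was decided
    owner: str           # who is accountable for the judgment
    inputs: list         # documents, data and AI outputs consulted
    assumptions: list    # what was taken to be true at the time
    ai_assisted: bool = False
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = DecisionRecord(
    decision="Pause rollout of supplier portal v2",
    owner="ops-director",
    inputs=["Q3 incident review", "agent-generated risk summary"],
    assumptions=["Supplier API stays on the current version until March"],
    ai_assisted=True,
)
print(record)</code></pre>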
      <p>
          <a href="https://academy.shiftbase.info/p/why-does-individual-ai-literacy-fail">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[Re-Focusing Leadership on AI Readiness & Enablement]]></title><description><![CDATA[If Enterprise AI's prize is smarter organisations, then we need leadership functions to engage deeply with AI readiness and steering, rather than just buying more technology]]></description><link>https://academy.shiftbase.info/p/re-focusing-leadership-on-ai-readiness</link><guid isPermaLink="false">https://academy.shiftbase.info/p/re-focusing-leadership-on-ai-readiness</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 06 Jan 2026 15:38:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vW0p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05ca1c6-269e-4fd2-8336-e1fe798f61c2_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Why Focus on AI-Enabled Organisational Change Rather Than Just Technology?</h2><p>As we look ahead to another year of rapid technology-driven change in business and society, it is a good moment to separate the wood from the trees and focus on medium-term goals.</p><p>Generative AI has come a long way, and provides some wonderful capabilities, but it also carries risks. Agentic AI is full of promise, and could provide a safer, more manageable architecture for using AI in the real world; but it is still relatively unproven and depends on lots of other moving parts to operate successfully, which are mostly not yet in place.</p><p>Ultimately, <strong><a href="https://academy.shiftbase.info/p/enterprise-ai-is-a-social-technology">we see AI as a social technology</a></strong>, not a magic black box, and we are trying to discern the outlines of AI-enabled organisations amidst all the hype, and design the practical transformation actions and journeys that will help get us there.</p><p>We are interested in better ways for people to work together to build capabilities and solve problems, and we believe the future of work can be more elevating and rewarding than the old industrial era model.</p><p>This is why we spent so much time last year helping leaders transform their organisations using the superpowers that AI technology provides to connect, collaborate, and coordinate work more fluidly than before, with less bureaucratic management methods. It is also why we see the potential of AI not just in terms of automating existing process work, but as a way to create smart structures and platforms on which people can focus on value creation rather than &#8216;busy work&#8217;.</p><h2>Leaders as Programmers of the Organisation</h2><p>Great organisations are like alchemists - they create something from nothing, making customers&#8217; lives better whilst creating value for owners, investors, workers and partners. In the past, the best firms relied on visionary leaders brave enough to do things differently. 
But instead of building and running them from scratch every time, using only job roles, functions and department structures as templates, what if we could develop them like software, using existing libraries and building blocks to create our own organisational operating systems that use smart automation and coordination to obviate the need for manual management?</p><p>We began 2025 fired up by the goal of AI-enabled organisational improvement and <strong><a href="https://academy.shiftbase.info/p/will-we-see-the-first-programmable">the long-term goal of smart, programmable organisations</a></strong> that can achieve both agility and scale without bloat and waste.</p><p>But throughout the year, we were reminded how existing management and incentive structures continue to drive short-term visible actions at the expense of longer-term readiness, architecture and planning. The urge to be seen to do something - <em>anything</em> - in enterprise AI has led to license purchases with no adoption planning, innovation theatre that generates press releases rather than meaningful capabilities, and KPIs that are about counting trees, rather than thinking about what can be built with new wood.</p><p>We hope 2026 will see more focus from leaders on AI readiness and all the technical and non-technical enablers for agentic AI to realise its potential.</p><h2>AI Readiness Priorities by Function</h2><p>We wrote a lot last year about AI readiness, and the use cases, leadership techniques and capabilities that leaders at all levels can utilise to make the most of AI tools and systems in their organisations today.</p><p>One common theme was the idea of <em><strong>legibility</strong></em> - making the implicit explicit, and surfacing the norms, rules and unspoken parts of the work system so that we can evaluate and improve them, and codify them into rulesets for both AI agents and people to work together better.</p><p>So, before we throw more coal into the furnace, here is a breakdown of the top 3-4 things we suggested executives can do as part of their existing work to improve AI readiness in their domains, and guide the conversation with technical teams to ensure AI projects meet real business needs now and in the future.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!vW0p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05ca1c6-269e-4fd2-8336-e1fe798f61c2_1024x1024.heic" width="1024" height="1024" alt=""></figure></div><h3>CEO &amp; COO: From Oversight to &#8220;World-Building&#8221;</h3><p>The primary shift for senior leadership is moving away from viewing AI as a &#8220;tool to be purchased&#8221; and towards seeing it as an &#8220;environment to be designed.&#8221;</p><ul><li><p><strong>World-Building as Strategy:</strong> Leaders should move beyond incremental efficiency and focus on defining the digital environment where humans and agents interact.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/a-leaders-guide-to-world-building">A Leader&#8217;s Guide to World-Building in the AI-Augmented Enterprise</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/enterprise-ai-needs-leadership-ambition">Enterprise AI Needs Leadership Ambition</a></strong></p></li></ul></li><li><p><strong>Context Engineering:</strong> Start defining the &#8220;organisational OS&#8221; to ensure the right information and data are available to people and agents to support reliable ways of working.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/the-growing-importance-of-context-engineering">The Growing Importance of Context Engineering for Leaders Adopting AI</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/enterprise-context-engineering-as">Enterprise Context Engineering as a New Leadership Capability</a></strong></p></li></ul></li><li><p><strong>Cultivate AI Leadership Skills:</strong> Thinking and writing more like architects or developers than bureaucrats to create the clarity needed to guide AI development.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/metamorphoses-of-bodies-changd-to">Metamorphoses: &#8220;Of Bodies Chang&#8217;d to Various Forms, I Sing&#8221;</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/ai-world-building-and-the-value-of">AI World-Building and the Value of Boring BizOps</a></strong></p></li></ul></li></ul><h3>CIO &amp; CTO: Infrastructure for the Agentic Era</h3><p>In 2025, CIOs moved from LLM experimentation to planning and building robust scaling layers and collaborative architectures, but readiness challenges remain.</p><ul><li><p><strong>AgentOps &amp; Scaling:</strong> Building the organisational infrastructure for agent development, deployment, and monitoring to achieve practical impact.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/agentops-the-scaling-layer-for-agentic">AgentOps: The Scaling Layer for Agentic AI in the Enterprise</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/mapping-ai-value-pathways">Mapping AI Value Pathways</a></strong></p></li></ul></li><li><p><strong>Small Models, Local Context:</strong> Making more use of safer Small Language Models (SLMs) to keep AI closer to local context, culture, and control.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/small-models-real-change-how-leaders">Small Models, Real Change: How Leaders Can Use SLMs</a></strong></p></li></ul></li><li><p><strong>Collaborative Architectures:</strong> Creating the &#8220;Context Plumbing&#8221; that allows people and machines to share a common information landscape.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/collaborative-architectures-for-agents">Collaborative Architectures for Agents, People &amp; Machines</a></strong></p></li><li><p><strong><a 
href="https://academy.shiftbase.info/p/a-brighter-future-for-km-building">A Brighter Future for KM: Building an AI-Enhanced Knowledge-Sharing Capability</a></strong></p></li></ul></li></ul><h3>HR: Redesigning Work and Human-AI Teaming</h3><p>HR is starting to focus on the future of human-AI collaboration and &#8220;Centaur&#8221; capabilities, plus they will need to play a role in making the implicit rules of work explicit.</p><ul><li><p><strong>Designing Centaur Teams:</strong> Reducing organisational drift by clearly defining how humans and AI agents can work together most productively.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/big-picture-leadership-techniques">Big Picture Leadership Techniques for Human-AI Teaming Readiness</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/leading-collaborative-centaur-teams">Leading Collaborative Centaur Teams</a></strong></p></li></ul></li><li><p><strong>Codifying the Invisible:</strong> Taking unspoken rules and cultural norms and turning them into explicit rulesets and guardrails for AI collaboration.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/to-protect-human-agency-codify-the">To Protect Human Agency, Codify the Invisible</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/codifying-rulesets-in-the-explainable">Codifying Rulesets in the Explainable Enterprise</a></strong></p></li></ul></li><li><p><strong>Work Designers:</strong> Encouraging a mindset shift where every employee becomes a &#8220;Work Designer&#8221; of composable workflows.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/we-can-all-be-work-designers-in-the">We Can all be Work Designers in the Composable Enterprise</a></strong></p></li></ul></li></ul><h3>Learning &amp; Development: Shared Practice &amp; Transformation</h3><p>L&amp;D is evolving from learning content production and simple training to facilitate &#8220;co-op mode&#8221; learning, practical experimentation and always-on in-the-flow AI learning systems.</p><ul><li><p><strong>Learning in &#8220;Co-Op Mode&#8221;:</strong> Accelerating adoption through shared practice rather than individual use.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/ai-learning-adoption-goes-further">AI Learning &amp; Adoption Goes Further and Faster in Co-Op Mode</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/the-rise-of-symbiotic-learning">The Rise of Symbiotic Learning</a></strong></p></li></ul></li><li><p><strong>Building the Future One Agent at a Time:</strong> L&amp;D teams exploring agentic tools  to address learner needs while staying grounded in practice.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/building-the-future-of-organisational">Building the Future of Organisational Learning One Agent at a Time</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/learning-in-the-age-of-agents">Learning in the Age of Agents</a></strong></p></li></ul></li><li><p><strong>Avoiding &#8220;Workslop&#8221;:</strong> Moving away from low-quality AI filler by focusing on the &#8220;work&#8221; of organisational transformation.</p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/doing-the-work-why-learning-is-key">Doing the Work: Why Learning is Key to Agentic AI Success</a></strong></p></li></ul></li></ul><h2>Bringing it all Together</h2><p>To bring this all together - and also bring it to life for busy leaders and leadership teams - 
requires the knowledge and experience to address specific local conditions and challenges. But we are also developing repeatable learning interventions and leadership techniques that people can pick up and run with.</p><p>If you would like to learn more about this work, please get in touch.</p><p>For now, here is one simple method to bring together function leads to build a common map that can guide AI-enabled capability development, and which can achieve results in a single 30-day sprint: </p><ul><li><p><strong><a href="https://academy.shiftbase.info/p/enterprise-ai-adoption-requires-connected">How leaders working together with a simple Map &#8594; Change &#8594; Learn loop can turn scattered AI experiments into living, organisation-wide capabilities</a></strong></p></li></ul><h2>Further Reading</h2><ul><li><p><strong><a href="https://academy.shiftbase.info/archive">Plunder our archives by topic area</a></strong></p></li><li><p><strong><a href="https://academy.shiftbase.info/p/deep-dives">Revisit our practical guides and deep dives</a></strong></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Can AI Help Reverse the Oversimplification of Management?]]></title><description><![CDATA[For leaders to be programmers of their organisational OS, they need AI's help in embracing complexity and reality, rather than relying on simplistic data proxies]]></description><link>https://academy.shiftbase.info/p/can-ai-help-reverse-the-oversimplification</link><guid isPermaLink="false">https://academy.shiftbase.info/p/can-ai-help-reverse-the-oversimplification</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 23 Dec 2025 16:36:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!83X4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As we look forward to a new year, it is worth zooming out momentarily from the frenetic race between AI models to focus on some of the enablers, blockers and wider changes that will determine whether organisations are able to use this technology effectively.</p><p><strong><a href="https://www.duperrin.com/english/2025/12/08/impacy-ai-transformation-bcg-mckinsey/">We now have a few studies</a></strong> that point to <strong><a href="https://openai.com/index/the-state-of-enterprise-ai-2025-report/">good but not great marginal efficiencies</a></strong> when using Generative AI inside existing business structures and management systems to speed up existing tasks. 
But whilst current LLMs and SLMs are more than good enough to support intelligent operations, the transformational potential of agentic AI depends upon readiness efforts in areas from technical and data infrastructure to <strong><a href="https://xer0pn.medium.com/the-organizational-blindness-of-enterprise-ai-d1eec7bfdb29">organisation mapping</a></strong>, process discovery and explainability, and in particular, local training and context management.</p><p>At the infrastructure level, there are welcome signs that <strong><a href="https://stackoverflow.blog/2025/12/08/the-shift-in-enterprise-ai-what-we-learned-on-the-floor-at-microsoft-ignite/">some of the technical enablers for agentic and enterprise AI are getting more attention</a></strong>, including interoperability with protocols such as MCP and a <strong><a href="https://simonwillison.net/2025/Dec/19/agent-skills/">standard approach to defining agent skills</a></strong>.</p><h2>Leaders as Programmers</h2><p>But where is the shift in leadership and management that will be required to re-invigorate our tired bureaucracies and create intelligent AI-augmented organisational operating systems?</p><p><strong><a href="https://academy.shiftbase.info/t/worldbuilding">We have written about world-building this year as an important emerging leadership skill</a></strong>, and we are finding this to be an accessible and useful frame for leadership development in organisations seeking to pursue enterprise AI.</p><p><strong><a href="https://tomtunguz.com/operational-analytical-context-databases/">Tomasz Tunguz recently shared an interesting musing on context databases</a></strong> and why they are needed if we want to move away from brittle, deterministic automation to fully exploit the capabilities of enterprise AI.</p><p>Making the rules of the road explicit is a key management activity as we progress towards programmable organisations, and this is why we include <em><strong>leaders as programmers</strong></em> within our leadership development programmes and workshops. This is not the old idea that &#8220;leaders should learn Python&#8221;, but the realisation that clearly stated high-level goals and instructions, linked to all their necessary context, are probably how the programmable organisation will be operated and guided in the future.</p><p>We have argued for a long time that <em>literate</em> leadership (writing things down, collating knowledge and curating data) beats <em>performative</em> leadership (meetings, presence and a focus on simplistic decision-making) over the long run. 
Those leaders who have written things down and encouraged their teams to document their work in wikis, collaboration systems and knowledge stores will have a huge advantage, and the resulting content will give context databases a big head start.</p><p>All of this has major implications for L&amp;D functions, and we are starting to see some of them trying to re-define their value proposition from content delivery to longitudinal support and product development.</p><h2>How Can AI Make Management Smart Again?</h2><p>If we are to develop the leadership organisations will need in the AI era, perhaps we should start by asking where things went wrong in the previous one.</p><p>Dan Davies&#8217; excellent book <strong><a href="https://www.goodreads.com/book/show/211161687-the-unaccountability-machine">The Unaccountability Machine</a></strong> provides some clues, and also hints at ways in which AI might make the theoretical notion of management cybernetics feasible as a practical way to run at least the basic functions of a complex organisation.</p><p>The book begins by looking at <em>accountability sinks</em> - structures (rules, algorithms, or market pressures) where decisions are delegated so deeply into a system that no individual human can be held responsible for the outcome - and their role in the 2008 financial crash. But it goes on to look at the way reductive abstractions drove post-war economics towards Milton Friedman&#8217;s doctrine of shareholder value-maximisation, which spawned value-destroying ideas such as the leveraged buyout industry.</p><p>But Davies also looks at management failings through the lens of cybernetics and specifically <strong><a href="https://en.wikipedia.org/wiki/Viable_system_model">Stafford Beer&#8217;s Viable System Model</a></strong> (VSM), and how management&#8217;s reductive approach to complexity fell foul of Ashby&#8217;s <strong><a href="https://en.wikipedia.org/wiki/Variety_(cybernetics)">Law of Requisite Variety</a></strong> (for a manager to control a system, their &#8220;regulatory variety&#8221; must match the &#8220;variety of the system&#8221; they are managing). By using attenuators to simplify their picture of the firm, such as share price or quarterly sales reports, managers have made the organisation blind to all the other important data and signals necessary to guide strategy; plus, the signals they do use for strategy tend to be lagging indicators.</p>
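<p>Ashby&#8217;s law can be made tangible with a toy example (ours, not the book&#8217;s): if each disturbance needs its own countering response to hold the outcome at a goal, a controller with fewer distinct responses than there are disturbances cannot regulate them all, however cleverly it chooses.</p><pre><code># Toy illustration of the Law of Requisite Variety (our example, not Davies').
# Disturbances are 0..9; the outcome is (d + r) % 10 and the goal is 0,
# so every disturbance needs its own countering response.
DISTURBANCES = range(10)

def regulated(d, responses):
    """True if some available response keeps the outcome at the goal."""
    return any((d + r) % 10 == 0 for r in responses)

for variety in (3, 5, 10):
    responses = range(variety)  # the controller's repertoire
    coverage = sum(regulated(d, responses) for d in DISTURBANCES)
    print(f"{variety} responses regulate {coverage} of 10 disturbances")</code></pre><p>Only when the controller&#8217;s variety matches the variety of the disturbances does regulation succeed across the board - which is the argument against steering a complex firm through a handful of attenuated metrics.</p>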
<p>It might seem that AI and the algorithmic era will make this situation even worse and therefore we should return to an imagined &#8216;good old days&#8217; of personal, accountable management. But in fact, enterprise AI offers better ways to cope with increasing complexity, and therefore a way to embrace it positively.</p><p>On the attenuation question, agentic AI enables us to maintain a detailed, objective picture of real-time operations within even a large, complex organisation, meaning the variety of the control system is able to match the variety of observable reality. If leaders really are too busy (or lacking the knowledge) to engage with reality, then AI is also pretty good at summarising information for them, but without needing to attenuate / throw away a lot of the richness as late Twentieth Century management tended to do.</p><blockquote><p><em>A modern artificial intelligence system &#8211; a transformer recurrent neural network can take a large block of text and summarise it quickly. It can also expand a short instruction into a longer explanation. It&#8217;s practically designed for facilitating two-way communication between a mass audience and a smaller decision-making system. It would really be a generational shame if we ended up once more just using it to make our existing structures work faster &#8211; like bringing back Shakespeare, Machiavelli and Napoleon and setting them to work designing tax forms.</em></p></blockquote><p>Anybody who has seen an important, expert-designed project or proposal for action reduced to kindergarten images and bullet points so that a senior leader with the attention span of a goldfish can &#8220;decide&#8221; will surely welcome the fact that we can summarise without attenuation.</p><p>Agentic AI can also help eliminate accountability sinks in several ways. For example, processes and rules of the road no longer need to be oversimplified to such an extent that people can be trained to follow them. Instead, we can write as many rules and exception handlers as we like, and leave it to the agents to ensure they follow them, with human oversight of the overall outcomes and the system. Plus, by automating the drudgery of basic work coordination (what Beer called System 2 in the VSM), people can spend more time focusing on big picture questions such as identity and purpose (System 5), which means that, instead of being cogs managing spreadsheets, managers can return to being architects of the organisation&#8217;s mission.</p>
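<p>As a hedged sketch of what &#8216;as many rules and exception handlers as we like&#8217; might look like, the fragment below encodes rules declaratively and hands anything unmatched to a person; the rules and names are illustrative assumptions only.</p><pre><code># Illustrative only: a rule set agents can follow, with uncovered cases
# routed to human oversight instead of being silently oversimplified.
RULES = [
    ("refund_small", lambda c: c["type"] == "refund" and c["value_gbp"] &lt;= 100,
     "approve automatically"),
    ("refund_large", lambda c: c["type"] == "refund" and c["value_gbp"] &gt; 100,
     "check order history, then approve or refer"),
    ("data_request", lambda c: c["type"] == "data_request",
     "verify identity, then fulfil within 30 days"),
]

def handle(case):
    for name, matches, action in RULES:
        if matches(case):
            return f"rule {name}: {action}"
    # Exception handler: anything the ruleset does not cover goes to a person.
    return "no rule matched: escalate to the human owner"

print(handle({"type": "refund", "value_gbp": 250}))
print(handle({"type": "complaint"}))</code></pre><p>The rules stay legible to people and enforceable by agents, while the exception path keeps a human in the loop for everything the rules do not yet cover.</p>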
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1090009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/182434126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!83X4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic 424w, https://substackcdn.com/image/fetch/$s_!83X4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic 848w, https://substackcdn.com/image/fetch/$s_!83X4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic 1272w, https://substackcdn.com/image/fetch/$s_!83X4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a83f090-5b9d-41ff-9249-323169f2d53b_3848x2564.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Life in the PPT/XLS factory (with apologies to L.S.Lowry)</figcaption></figure></div><h2>&#8220;Nine to five, for service and devotion&#8230;&#8221;</h2><p>One issue for both society and business that will start to assert itself in 2026 is the question of jobs. Recent job surveys reveal a bifurcation of the labour market. 
While mass unemployment has not yet materialised, the data shows that AI is hollowing out entry-level roles while simultaneously exacerbating a shortage of high-skilled talent.</p><p><strong><a href="https://simonwillison.net/2025/Dec/7/cory-doctorow/">AI critics such as the wonderful Cory Doctorow</a></strong> argue that AI is being sold on the promise of job cuts and cost savings, and that none of the surplus it produces will be returned to workers:</p><blockquote><p><em>The growth narrative of AI is that AI will disrupt labor markets. I use &#8220;disrupt&#8221; here in its most disreputable, tech bro sense.</em></p><p><em>The promise of AI &#8211; the promise AI companies make to investors &#8211; is that there will be AIs that can do your job, and when your boss fires you and replaces you with AI, he will keep half of your salary for himself, and give the other half to the AI company.</em></p></blockquote><p>But I think there is some win-win potential in the expected impact of AI on jobs. As Azeem Azhar shared in his <strong><a href="https://www.exponentialview.co/p/2025-in-25-stats?utm_source=substack&amp;publication_id=2252&amp;post_id=182322039&amp;utm_medium=email&amp;utm_content=share&amp;utm_campaign=email-share&amp;triggerShare=true&amp;isFreemail=false&amp;r=9dv58&amp;triedRedirect=true">2025 in 25 stats</a></strong> summary, for the first time in 22 years, work-life balance has overtaken pay as the most important job factor, according to <strong><a href="https://fortune.com/2025/11/17/work-life-balance-outranked-pay-top-perk-peoeple-choosing-a-job/">Randstad&#8217;s 2025 Workmonitor report</a></strong>. Indeed, perhaps the entire industrial era concept of jobs is a form of neo-feudalism that we should leave behind, assuming we can find other ways to fund basic needs.</p><p>As Antonio Melonio argues in a provocative post entitled <strong><a href="https://www.thepavement.xyz/p/the-era-of-jobs-is-ending">The era of jobs is ending</a></strong>:</p><blockquote><p><em>The system we built around jobs&#8212;as moral duty, as identity, as the only path to survival&#8212;is about to collide with machines that can perform huge chunks of that &#8220;duty&#8221; without sleep, without boredom, without unions, without pensions.</em></p><p><em>You can treat this as a threat.</em></p><p><em>Or as a once-in-a-civilization chance to get out of a religion that has been breaking us, grinding us down, destroying us for centuries.</em></p></blockquote><p>But at the same time, we have the opportunity to reshape those jobs that are not at risk through finding creative and human ways to combine people and AI. <strong><a href="https://www.mckinsey.com/mgi/our-research/agents-robots-and-us-skill-partnerships-in-the-age-of-ai">There is a lot of thought going into the skills needed for people to make the most of this</a></strong>, and people are starting to explore <strong><a href="https://openreview.net/forum?id=Yhqa8Ljzrj">how to quantify human-AI synergy</a></strong> to find the optimum collaboration. In a situation where skilled workers are at a premium, this all suggests that some jobs will get better and become more interesting.</p><p>But let&#8217;s not pretend that this will not be very disruptive for many people&#8217;s lives, especially in a world where housing has become an asset class rather than a basic right. 
This will need a policy response, as well as businesses taking some responsibility for the gravity of the shift away from full employment as a realistic goal.</p><p>I already advise my mentees only to consider taking a job as a stepping stone to better things, and to avoid getting trapped in employment whilst developing their own independent income sources; I expect employers will need to work hard to sell the idea of a 9-5 to the next generation.</p><p>Finally, on the subject of work, we will hit the pause button and return in the new year. I have a large section of the very best tuna toro and some lesser-known Spanish red wines that are crying out for experimentation.</p><p>But for now, we wish you all a wonderful festive season and a happy new year.</p>]]></content:encoded></item><item><title><![CDATA[Big Picture Leadership Techniques for Human-AI Teaming Readiness]]></title><description><![CDATA[Practical Steps for Leaders to Reduce Drift and Build a Coherent Environment for Centaur Teams]]></description><link>https://academy.shiftbase.info/p/big-picture-leadership-techniques</link><guid isPermaLink="false">https://academy.shiftbase.info/p/big-picture-leadership-techniques</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 16 Dec 2025 15:03:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!lj40!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<pre><code><strong>This is Part 2 of our exploration of Human&#8211;AI Teaming Readiness. </strong>In Part 1, we defined the coordination problem that emerges when humans and agents share the same work environment, and introduced <strong>Align &#8594; Bound &#8594; Learn</strong> as a lightweight system for designing shared context, boundaries and learning loops for centaur teams. In this edition, we move up a level to look at <strong>how leaders maintain the integrity of the collaboration world itself</strong>, through the Collaboration Health View and by treating Human&#8211;AI Teaming Readiness as a core organisational capability, not a local experiment. If you missed Part 1, you can read it <strong><a href="https://academy.shiftbase.info/p/leading-collaborative-centaur-teams">here</a></strong>.</code></pre><p>Many organisations have spent 2025 pouring money into AI licences, copilots and orchestration platforms, often long before they have invested the leadership attention required to make their processes and work explainable to AI agents.</p><p>The result is a familiar disconnect: enormous budget allocation at the technology layer, but not enough time spent articulating the rules, trade-offs, priorities and judgment criteria that govern how work happens.</p><p>Even without agents in the loop, many leaders are already feeling the cost of this missing context:</p><ul><li><p>Teams interpret the same strategic message in different ways.</p></li><li><p>Boundaries of autonomy shift depending on who is in the room.</p></li><li><p>Quality thresholds vary wildly between functions.</p></li><li><p>Escalations proliferate because no one is sure who owns what under changing conditions.</p></li></ul><p>These symptoms are not new. They are signals that the <em>world</em> in which teams collaborate is weaker or more fragmented than anyone realises. 
World-building may sound abstract, but leaders practise it every day through the expectations they set, the meanings they reinforce, the boundaries they uphold, and the language they use.</p><p>This is the part of leadership that has always held organisations together.</p><p>But as mixed-intelligence work begins to spread, the consequences of weak or drifting worlds are becoming impossible to ignore. Context that was once tacit must now be made explicit. Coherence that once emerged through proximity now requires deliberate maintenance. And coordination that once relied on shared human intuition must be authored in a form that both humans and machines can recognise.</p><h2><strong>Seeing The Drift Problem</strong></h2><p>Even in organisations that have spent years improving team-level alignment, intent and ways of working through agile transformation, coherence often collapses the moment those teams interact with the wider hierarchy.</p><p>Teams may establish:</p><ul><li><p>clear intent,</p></li><li><p>shared priorities,</p></li><li><p>strong decision principles,</p></li><li><p>and stable rituals for coordination.</p></li></ul><p>Yet when they meet the rest of the organisation (the steering structures, budgeting cycles, risk gates, and inherited decision norms), the world those teams have built can evaporate almost instantly. Suddenly:</p><ul><li><p>vocabulary no longer matches,</p></li><li><p>priorities are interpreted differently,</p></li><li><p>escalation paths contradict team autonomy,</p></li><li><p>and decisions that made sense inside the team lose meaning outside it.</p></li></ul><p>This is already world drift, and leaders are beginning to feel its symptoms long before AI enters the picture:</p><ul><li><p>the same strategy feels different in every function,</p></li><li><p>ownership becomes ambiguous the moment work crosses boundaries,</p></li><li><p>quality thresholds vary depending on which leader signs off,</p></li><li><p>and small misunderstandings accumulate into larger coordination friction.</p></li></ul><p>These are early indicators that the organisational world, the shared logic that collaboration depends on, is losing definition.</p><p>Agents will inherit the world as it is, not as leaders wish it were.</p><ul><li><p>If alignment is inconsistent, they amplify inconsistency.</p></li><li><p>If thresholds vary, they surface those variations.</p></li><li><p>If boundaries are vague, they multiply escalations or silent failures.</p></li></ul><p>AI reveals drift and friction rather than creating it. 
So how can leaders start to recognise the dissonance of world drift?</p><h3><strong>Technique #1: Tracking Drift Signals</strong></h3><p><em>A micro-practice for reading the current state of your collaboration world</em></p><h4><strong>What</strong></h4><p>A 10-minute weekly observation practice that helps leaders detect the earliest signs of world drift.</p><h4><strong>Why</strong></h4><p>Because drift doesn&#8217;t appear as a crisis; it appears as small inconsistencies that compound quietly until coordination feels harder than it should.</p><h4><strong>Who</strong></h4><p>Any leader responsible for cross-functional work or distributed teams.</p><h4><strong>How</strong></h4><p>After a meeting or decision, ask yourself:</p><ol><li><p><strong>Vocabulary:</strong> Did we all use the same language for what &#8220;good&#8221; meant?</p></li><li><p><strong>Mental Model:</strong> Were we really working from the same understanding of the goal?</p></li><li><p><strong>Ownership:</strong> Did decision rights remain clear or shift unpredictably?</p></li></ol><p>These micro-signals tell you where your world is stable and where meaning is beginning to fragment.</p><h3><strong>Technique #2: The Context Ledger</strong></h3><p><em>Micro-codification that strengthens the world one edit at a time</em></p><h4><strong>What it is</strong></h4><p>A running, publicly visible ledger where teams contribute one sentence each week describing a rule, boundary, trade-off or guideline that could be added to collaboration guidance, system prompts and agent skills to improve the way people and agents operate.</p><h4><strong>Why it matters</strong></h4><p>Most context failures come from missing or inconsistent meaning.</p><p>Leaders think they have alignment. In reality, everyone is improvising differently.</p><p>A weekly cadence of micro-codification prevents drift, distributes authorship, and prepares organisations for AI agents that will need the same clarity.</p><h4><strong>Who should use it</strong></h4><p>Teams beginning to experience drift, or preparing to introduce agents.</p><h4><strong>How it works</strong></h4><p>Once a week, teams add <strong>one crisp rule</strong> such as:</p><ul><li><p>&#8220;Escalate when customer impact exceeds X, regardless of channel.&#8221;</p></li><li><p>&#8220;Risk level &#8216;high&#8217; means a downstream effect on more than two functions.&#8221;</p></li><li><p>&#8220;Agents may flag priority but humans make trade-offs between priorities.&#8221;</p></li><li><p>&#8220;A &#8216;quality issue&#8217; means deviation from X, not personal preference.&#8221;</p></li></ul><p>Leaders curate, combine, and lightly edit these into a <strong>Context Ledger,</strong> the beginnings of an organisational grammar for centaur teams.</p><p>Over time, this becomes the substrate for:</p><ul><li><p>shared operational meaning,</p></li><li><p>consistent decision-making,</p></li><li><p>and eventually, system-level prompts and agent rulesets.</p></li></ul><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!lj40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic" width="1024" height="1024" alt="" loading="lazy"></figure></div>
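<p>For leaders who prefer a running log to relying on memory, the weekly drift check can also be captured as simple structured records and tallied over time. Here is a minimal sketch, in Python, of one way such a log might look; the field names and example meetings are illustrative assumptions rather than part of the technique itself.</p><pre><code>from collections import Counter
from dataclasses import dataclass

@dataclass
class DriftCheck:
    """One post-meeting observation from the 10-minute weekly practice."""
    meeting: str
    vocabulary_shared: bool   # did we all use the same language for "good"?
    model_shared: bool        # were we working from the same understanding of the goal?
    ownership_clear: bool     # did decision rights remain clear?

def drift_summary(checks):
    """Tally where meaning is fragmenting across recent observations."""
    signals = Counter()
    for check in checks:
        if not check.vocabulary_shared:
            signals["vocabulary"] += 1
        if not check.model_shared:
            signals["mental model"] += 1
        if not check.ownership_clear:
            signals["ownership"] += 1
    return signals

# Hypothetical entries from two recent meetings.
log = [
    DriftCheck("roadmap review", vocabulary_shared=True, model_shared=False, ownership_clear=True),
    DriftCheck("incident retro", vocabulary_shared=False, model_shared=False, ownership_clear=True),
]
print(drift_summary(log))  # Counter({'mental model': 2, 'vocabulary': 1})
</code></pre>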
srcset="https://substackcdn.com/image/fetch/$s_!lj40!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!lj40!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!lj40!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!lj40!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lj40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207340,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://academy.shiftbase.info/i/181786451?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lj40!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!lj40!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!lj40!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!lj40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe9ec8eb-9549-41ae-9226-e14b0f96a467_1024x1024.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 
<h2><strong>Reading the World, Not Just the Work</strong></h2><p>These two techniques give leaders an on-ramp into world-building: a way to see drift and begin to stabilise it, and to build capabilities before agents join the team.</p><p>But as organisations introduce more agents into their operating environments, something subtler begins to happen: the collaboration world develops a life of its own.</p><p>Interactions between humans and agents shape new norms. Adjacent workflows influence one another in ways no-one planned. Local learning compounds unevenly. Meaning shifts faster in some pockets than others.</p><p>Even well-aligned teams can wake up inside a context that feels quietly different from the one they thought they had created. This is the moment where leadership attention must rise above individual decisions or team routines. The question becomes one of world integrity:</p><blockquote><p><em>Is the environment that supports mixed-intelligence work holding its shape, or is drift accumulating across the structural, cultural, or experiential layers of the organisation?</em></p></blockquote><p>Leaders need a practice for reading, and lightly editing, the world itself.</p><p>That is the purpose of the Collaboration Health View, which evolves our earlier on-ramp techniques to a new level.</p><h2>The Higher-Altitude Practice: The Collaboration Health View</h2><p>In the world-building model, coherence never comes from initial design alone. It comes from continuous maintenance, from the ability to periodically rise above day-to-day activity and ask whether the world still makes sense to those living and acting within it.</p><p>The Collaboration Health View is that maintenance practice for centaur teams.</p><p>Not <em>&#8220;Is the system working?&#8221;</em></p><p>But <em>&#8220;Is the world of collaboration still coherent?&#8221;</em></p><p>This is not an operational review and not a technical audit. It is a form of world stewardship, a periodic act of sense-making that allows leaders to observe how shared meaning between humans and agents is evolving over time.</p><p>Its purpose is simple:</p><ul><li><p>to detect where coordination is strengthening,</p></li><li><p>where it is quietly drifting,</p></li><li><p>and where the language of collaboration is beginning to fracture under pressure.</p></li></ul><h3>What Leaders Look For</h3><p>At this altitude, leaders are not just reading performance. 
They are reading world integrity.</p><p>Signals of strengthening collaboration:</p><ul><li><p>Stable division between human judgment and agent execution</p></li><li><p>Transparent decision provenance</p></li><li><p>Deliberate, not habitual, human override</p></li><li><p>A shared vocabulary of priority, risk and quality in everyday use</p></li></ul><p>Signals of drift:</p><ul><li><p>Shadow automation outside shared boundaries</p></li><li><p>Conflicting agent behaviours across adjacent worlds</p></li><li><p>Growing reliance on post-hoc control rather than pre-emptive boundary design</p></li><li><p>Displacement of responsibility (&#8220;the system decided&#8221;)</p></li></ul><p>These are the early signs that the world&#8217;s systems and culture are beginning to slip out of alignment.</p><h3>Cadence and Outputs</h3><p>The Collaboration Health View mirrors other world-maintenance practices:</p><ul><li><p>regular, light-touch cadence</p></li><li><p>high leverage</p></li><li><p>low ceremony</p></li><li><p>persistent over time</p></li></ul><p>Its outputs are not plans or mandates. They are small edits to the world:</p><ul><li><p>boundary adjustments</p></li><li><p>vocabulary clarifications</p></li><li><p>intent realignment</p></li><li><p>narrative renewal</p></li></ul><p>This is how coherence is kept alive in an agentic environment by continuously tending the conditions that generate it.</p><h2><strong>Questions for Leaders to Test Collaboration Health</strong></h2><p>A Collaboration Health View becomes most powerful when it is grounded in lived experience. These questions help leaders sense the state of their own environment:</p><ul><li><p>Do humans and agents still appear to act from the same definition of success?</p></li><li><p>Where did we recently see an agent behave correctly in isolation but incorrectly in context?</p></li><li><p>What vocabulary has begun to fragment across teams or systems?</p></li><li><p>Where are humans overriding too often, or not often enough?</p></li><li><p>Which boundaries felt clear during design but ambiguous in practice?</p></li><li><p>Where is escalation happening too late, or too reflexively?</p></li><li><p>What small world-edits (language, boundaries, examples, narratives) would remove friction tomorrow?</p></li></ul><p>These are not diagnostic questions for technologists. They are world-reading questions for leaders.</p><p>Read on to learn how to make human&#8211;AI teaming readiness a core organisational capability to support future developments in agentic AI and automation, and what this means for the future of leadership.</p>
      <p>
          <a href="https://academy.shiftbase.info/p/big-picture-leadership-techniques">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Context Plumbing, Intent Sensing and an AI Reverse Uno on Social Media Feeds?]]></title><description><![CDATA[Enterprise AI looking a better bet than consumer AI, plus some links and ideas about how personal AI agents could help improve both of them]]></description><link>https://academy.shiftbase.info/p/context-plumbing-intent-sensing-and</link><guid isPermaLink="false">https://academy.shiftbase.info/p/context-plumbing-intent-sensing-and</guid><dc:creator><![CDATA[Lee Bryant]]></dc:creator><pubDate>Tue, 09 Dec 2025 15:30:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rfO4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1abd47fb-62ab-44e0-ad3b-fffd3e733df6_2816x1536.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Enterprise Strikes Back</h2><p>Fears of the AI investment bubble potentially crashing the US stock market have abated slightly, despite AI revenues not yet looking like they will be able to repay the vast sums being invested for some considerable time. But OpenAI is clearly in a vulnerable position, and <strong><a href="https://gizmodo.com/its-code-red-week-for-openai-2000696911">Sam Altman has declared a code red</a></strong> in response to the threat posed by Google.</p><p>Ben Thompson frames the story in Star Wars terms, with <strong><a href="https://stratechery.com/2025/google-nvidia-and-openai/">OpenAI and NVidia having reached the Empire Strikes Back stage of the hero&#8217;s journey</a></strong> now that Google has seemingly re-asserted its leading position across LLMs, AI apps and hardware. But whilst Nvidia is hoovering up money by selling chips, OpenAI is blowing cash in the opposite direction, and will need to come up with something very special if it is to survive and thrive beyond this initial wave. Given Altman&#8217;s talk of <strong><a href="https://www.exponentialview.co/p/ev-553">superhuman persuasion</a></strong>, let&#8217;s hope their saving grace is not a consumer AI advertising arms race with Google.</p><p>Meanwhile, the case for enterprise AI being the route to real returns and wider economic and social benefit continues to grow. Nvidia&#8217;s Jensen Huang is continuing to build enterprise and industrial partnerships to open up new markets for their chips; <strong><a href="https://rollingout.com/2025/12/03/jensen-huang-explains-why-enterprise-ai/">he sees industrial use cases such as digital twins and product prototyping as bigger and more important opportunities than consumer chatbots</a></strong>.</p><p><strong><a href="https://www.exponentialview.co/p/ten-things-im-thinking-about-ai-part1">Azeem Azhar began his roundup of the state of AI three years on from the launch of ChatGPT with a look at enterprise AI</a>. </strong>He sees very positive adoption and ROI signals that suggest this field will continue to be where AI has the greatest impact. 
Even looking at just Generative AI, rather than the more complicated world of agentic AI that needs a degree of organisational transformation to fulfil its promise, he sees strong adoption that suggests we are looking at a J-curve of productivity impact:</p><blockquote><p><em>The best example, though, is JP Morgan, whose boss Jamie Dimon <a href="https://www.bloomberg.com/news/articles/2025-10-07/jpmorgan-s-dimon-says-ai-cost-savings-now-matching-money-spent">said</a>: &#8220;We have shown that for $2 billion of expense, we have about $2 billion of benefit.&#8221; This is exactly what we would expect from a productivity J&#8209;curve. <strong>With any general&#8209;purpose technology, a small set of early adopters captures gains first, while everyone else is reorienting their processes around the technology.</strong> Electricity and information technology followed that pattern; AI is no exception. The difference now is the speed at which the leading edge is moving.</em></p></blockquote><h2>Real vs Simulated Intelligence(s)</h2><p>But models are just part of the puzzle in building smarter organisations, and we should not lose sight of the respective strengths of human and machine intelligence.</p><p><strong><a href="https://onlydeadfish.substack.com/p/fish-food-670-ai-versus-human-reasoning?publication_id=2195351&amp;post_id=178509735&amp;triggerShare=true&amp;isFreemail=true&amp;r=9dv58&amp;triedRedirect=true">Neil Perkin has shared a good summary of a recent podcast by Dave Snowden about sense-making and the impact of AI</a></strong>, covering some of his observations about the differences between human and machine reasoning, cognition and insight. I also joined a longer webinar with Snowden and others interested in AI and complexity last week, where he made similarly useful points, so Neil&#8217;s notes saved me a job.</p><blockquote><p><em>Understanding these fundamental differences enables us to collaborate much more effectively with AI engines. LLMs can look like they have a deep understanding of a question but of course what they are really optimised for is identifying patterns and predicting the next most probable word in a sequence to mimic human-generated text. They are set up to minimise the difference from training data meaning that, by design, they <a href="https://onlydeadfish.substack.com/p/fish-food-654-what-is-ai-still-not">trend towards the average and most probable</a>.</em></p></blockquote><p>Another important difference between LLMs and human reasoning is that language is not the same as intelligence - <strong><a href="https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems">it is only one part of how people think and communicate their knowledge, as Benjamin Riley wrote for The Verge</a></strong>:</p><blockquote><p><em>LLMs are simply tools that emulate the communicative function of language, not the separate and distinct cognitive process of thinking and reasoning, no matter how many data centers we build.</em></p></blockquote><p>If we mistake large language models and their predictive abilities for intelligence, then we risk denuding our own creative and cognitive superpowers. But perhaps if we use these stochastic parrots in more creative ways, they could play a role in helping us improve our own thinking, rather than just outsourcing it. 
<strong><a href="https://advait.org/talks/sarkar-2025-tedai-vienna/sarkar_2025_TEDAI_AI_as_Tool_for_Thought_V1.pdf">Advait Sarkar posed this question in a recent talk on behalf of Microsoft Research, and concluded that the idea has potential merit</a></strong>:</p><blockquote><p><em>You can demonstrably reintroduce critical thinking into AI-assisted workflows. You can reverse the loss of creativity and enhance it instead. You can build powerful tools for memory that enable knowledge workers to read and write at speed, with greater intentionality, and remember it too. It turns out, with the right principles of design, you can build tools that are the best of both worlds: applying the awesome speed and flexibility of this technology to protect and enhance human thought.</em></p></blockquote><p>It would be good to see some practical applications of this idea in our use of GenAI within organisations, and especially for leaders.</p><h2>Deriving Context &amp; Intent Needs Better Data</h2><p>Another point Dave Snowden makes is that training data is ultimately more valuable and important than the individual models trained on it.</p><p>This raises questions of digital sovereignty for any organisation or state trying to use AI without becoming dependent on AI platform providers like OpenAI. What should you own? What can you buy or rent? What should you build?</p><p>If the current trajectory holds, it looks like open models will be commoditised and the real value will lie in data, world models and the apps and agents we build on top of the models.</p><p>But whilst we can use large historical data for training models, the operational needs of context engineering mean that this category of data should ideally be recent, atomic and fluidly connected, so that it can be used in different ways.</p><p>Matt Webb is thinking about this from the point of view of discerning user intent, and he uses the term <strong><a href="https://interconnected.org/home/2025/11/28/plumbing">context plumbing</a> </strong>to describe the complex task of integrating lots of different data feeds to create context in close to real-time. He goes on to get quite excited about <strong><a href="https://interconnected.org/home/2025/12/05/training">the potential to derive seed training data from popular platforms and marketplaces, and then apply agentic AI coding loops to fulfil the opportunities identified in the data</a></strong> (at least I think that&#8217;s what he&#8217;s saying - see what you think).</p><p>It is worth reading these brain dumps alongside <strong><a href="https://substack.com/@technologik/p-174819339">S&#233;b Krier&#8217;s recent essay Coasian Bargaining at Scale</a></strong>, which postulates that personal agents (armed with your own context and intent) could do a better job of reducing transaction costs and other frictions in distributed negotiations compared to top-down approaches to navigating and balancing competing interests:</p><blockquote><p><em>This is the essence of the work of Nobel laureate Ronald Coase, who argued that if bargaining were cheap and easy, a polluter and their neighbor could strike a private deal without any need for regulation. Of course sometimes some pollution would still happen, but the payoff to the neighbor would ensure that both parties are better off than the zero pollution or no-limits pollution counterfactuals. The tragedy is not the existence of the conflict, but the transaction costs that prevent these mutually beneficial deals from being discovered and executed. 
It&#8217;s also the lesson from Elinor Ostrom, who documented how real-world communities successfully govern shared resources like fisheries and forests through their own intricate local rules.</em></p></blockquote><p>It is an interesting idea, and one that could help shape AI-enabled governance in the future.</p><p>In the context of enterprise AI, we probably need to dig deeper into how we can derive, generate or synthesise training data specific to an organisation&#8217;s work to create world models and context that are rich enough to enable agentic AI operations, and perhaps even the kind of negotiated outcomes and compromises that S&#233;b Krier has in mind.</p><p>This is not just a quantity question; it is also about how we structure and organise that data. Microsoft are doing some work on the semantic layer that helps people and agents make sense of data with what they are calling <strong><a href="https://www.directionsonmicrosoft.com/cio-talk-microsoft-gets-iq/">Microsoft IQ, which is intended to bring intelligent capabilities to Fabric, Microsoft 365, and Azure AI Search</a></strong>.</p><p>Another angle on harnessing data intelligently is to democratise access to it, so that more people can help shape it, <strong><a href="https://diginomica.com/atlassian-acquires-secoda-democratize-enterprise-data-analysis-business-teams">and that is what Atlassian appear to be targeting with their acquisition of data cataloguing tool Secoda</a></strong>.</p><h2>Could Agentic AI Play the Reverse Uno Card on Social Media?</h2><p>S&#233;b Krier&#8217;s piece is another reminder that personal agents are likely to emerge as solutions to many of the coordination challenges that led us down the perilous path of large-scale platforms and algorithmic sharing.</p><p>I am in Copenhagen right now at the pre-launch gathering of a bold project to rebuild Europe&#8217;s social platforms. It aims to build on the energy and creativity that we were all so excited about in the early 2000s, before Facebook and the big US platforms exploited our human need for connection to create ad-funded clickbait farms that have harmed our societies and democracies. Just today, the Guardian wrote about <strong><a href="https://www.theguardian.com/media/2025/dec/09/youth-movement-digital-justice-spreading-across-europe">a growing movement of young people across Europe seeking to reclaim their lives from big tech platforms</a></strong>, and this trend looks set to grow.</p><p>Within the Matrix world of attention farming, we have seen the bad things that AI can do: algorithmic feeds, emotional manipulation, fake content, fake people, and so on. But what if it could also be part of re-humanising our connection with each other?</p><p>There is a whole (small) world out there of people sharing their passions in niche social networks and communities, subreddits, discords or group chats. But the nature of scale-free networks and network effects means that WhatsApp, Facebook, Twitter, etc. are still the easiest option for many people and groups in Europe just because that&#8217;s where their friends or families are to be found.</p><p>But what if we go back to some of those early social network ideas such as federation, interoperability and <strong><a href="https://en.wikipedia.org/wiki/The_Intention_Economy">the intention economy</a></strong> to play a reverse Uno on algorithmic feeds? 
If everybody has their own discoverability and curation agent that pulls from multiple networks, communities and messaging platforms to create a personal social feed, then we don&#8217;t all need to be on the same platform. If I can tell my agent to keep me updated with all my interests and groups, from local news to hobbies and political debates, and handle the messy details of logging in and aggregating content, then perhaps we could help sustain the safer, more human-scale small-world networks that are out there already under the radar. Ever the optimist!</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!rfO4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1abd47fb-62ab-44e0-ad3b-fffd3e733df6_2816x1536.heic" width="1456" height="794" alt="" loading="lazy"></figure></div>
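<p>To suggest how modest the plumbing for such an agent could be, here is a small sketch of a personal curation agent that merges items from several sources and ranks them against its owner&#8217;s declared interests. The sources, interests and scoring rule are all placeholder assumptions; a real agent would plug in actual network APIs and a far better relevance model.</p><pre><code>from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Item:
    source: str        # the network or community the item came from
    title: str
    published: datetime

# The owner's declared interests: the agent works for me, not for an ad-funded feed.
INTERESTS = {"local news", "cycling", "politics"}

def fetch(source: str) -> list[Item]:
    """Placeholder: a real agent would log in to each network or community
    (a Mastodon server, a subreddit, a group chat) and pull recent items."""
    now = datetime.now(timezone.utc)
    return [Item(source, f"{source}: cycling club ride report", now - timedelta(hours=2))]

def score(item: Item) -> float:
    """Naive relevance: keyword overlap with declared interests, decayed by age."""
    relevance = sum(1 for kw in INTERESTS if kw.split()[0] in item.title.lower())
    age_hours = (datetime.now(timezone.utc) - item.published).total_seconds() / 3600
    return relevance / (1 + age_hours)

def personal_feed(sources: list[str], top_n: int = 10) -> list[Item]:
    """One feed drawn from many networks, ranked by my intent rather than engagement."""
    items = [item for s in sources for item in fetch(s)]
    return sorted(items, key=score, reverse=True)[:top_n]

for item in personal_feed(["mastodon", "local-forum", "group-chat"]):
    print(item.source, "-", item.title)
</code></pre>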
]]></content:encoded></item><item><title><![CDATA[Leading Collaborative Centaur Teams]]></title><description><![CDATA[How Leaders Design the Language of Collaboration Between People and Agents.]]></description><link>https://academy.shiftbase.info/p/leading-collaborative-centaur-teams</link><guid isPermaLink="false">https://academy.shiftbase.info/p/leading-collaborative-centaur-teams</guid><dc:creator><![CDATA[Cerys Hearsey]]></dc:creator><pubDate>Tue, 02 Dec 2025 15:38:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NOLC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<pre><code><strong>This is Part 1 of a two-part exploration of Human&#8211;AI Teaming Readiness. In this edition, we define the coordination problem that emerges when humans and agents share the same work environment, and why this is becoming a core leadership design challenge. 
Part 2 will introduce a practical technique that senior leaders can use to deliberately keep collaboration aligned at the enterprise level.</strong></code></pre><p>Organisations are in the early stages of introducing new forms of intelligence into environments that were designed exclusively for human collaboration, which means teams are no longer just coordinating across functions and geographies, but also across fundamentally different types of actors, with different modes of perception, speed, memory and agency.</p><p>What can leaders do to ensure this works smoothly and successfully?</p><p>AI adoption today is still focused on tools. Which copilot should we deploy? Which agents should we orchestrate? Which platforms should we integrate?</p><p>But the next question is how coordination actually works when humans and agents share the same operational space. Because coordination is not achieved through tools alone. It depends on shared intent, mutual expectations, clear boundaries of authority, and a common language for what &#8220;good&#8221; looks like. It depends on trust, not blind trust, but trust that is continually calibrated through feedback and shared understanding. And it depends, above all, on context.</p><p>Adding agents into workflows that already struggle with role clarity and decision ownership can exacerbate existing problems. If we ask people to &#8220;delegate&#8221; to systems whose boundaries of action are poorly defined, some respond by over-relying on (and over-trusting) automation, whilst others respond with defensive scepticism, overriding systems by default. In both cases, coordination degrades because the shared language and context of work is missing or inconsistent.</p><p>What emerges is not seamless collaboration, but a new form of friction:</p><ul><li><p>decisions without provenance,</p></li><li><p>actions without clear ownership, and</p></li><li><p>learning that fails to accumulate.</p></li></ul><p>This is a coordination design problem that reflects the need for leadership to evolve towards a new craft: <em><strong>the design of the conditions under which humans and machines can act together without constantly pulling the system out of shape.</strong></em></p><p>In mixed-intelligence teams, coordination is no longer something that simply &#8220;happens&#8221; through informal norms and shared human intuition. It must be deliberately authored. The language of collaboration needs to be designed.</p><p>That language is what we mean by context.</p><p>This edition <strong><a href="https://academy.shiftbase.info/p/a-leaders-guide-to-world-building">builds directly on our recent exploration of world-building as a leadership capability for the agentic era</a></strong>, where we argued that organisations must be designed as coherent worlds of physics (systems), culture (meaning), and geography (experience), and not just as collections of tools and workflows.</p><p>Let&#8217;s zoom in from the design of the world to the design of collaboration that happens inside it; specifically, how leaders shape the language through which humans and AI work together as centaur teams.</p><h2>From Tool Use to Teaming Readiness</h2><p>Collaboration is not a feature of a toolset. It is a property of an environment.</p><p>A team does not become a centaur team because it has access to an agent. It becomes one when humans and machines can reliably coordinate their actions around shared intent, shared boundaries and shared meaning. 
Without that, what looks like collaboration on a process map quickly collapses into a brittle sequence of hand-offs, overrides, and shaky unwritten assumptions.</p><p>This is the shift many organisations are now stumbling into without quite realising it.</p><p>AI is no longer just something people use. In many settings, it is something that increasingly participates, sensing conditions, drafting actions, monitoring flows, surfacing options, and, in some cases, acting directly in the world. The moment AI participates in work, the question becomes <em>&#8220;What does it mean to work together?&#8221;</em></p><p>Yet most leadership doctrines, operating models and performance systems were never designed for such a question. They assume human actors with human judgment, human accountability, human learning cycles. Agents enter this landscape as something undefined: sometimes treated as a junior worker, sometimes as a calculator, sometimes as an oracle, sometimes as a risk.</p><p>The result is a quiet incoherence in how teams are being asked to relate to their machines.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!NOLC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic" width="1456" height="971" alt="" loading="lazy"></figure></div>
srcset="https://substackcdn.com/image/fetch/$s_!NOLC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!NOLC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!NOLC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!NOLC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aca2e5d-186d-4198-8763-3ea48be17bb8_1536x1024.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In some places, delegation runs far ahead of design. Agents are given sweeping autonomy without corresponding clarity on boundaries, escalation, or quality thresholds. In others, distrust freezes collaboration entirely, reducing AI to a glorified drafting aid despite its wider potential. Both patterns look like adoption. Neither look like teaming readiness.</p><p>Teaming readiness is something different from tool readiness.</p><p>Tool readiness asks:</p><ul><li><p>Is the technology stable?</p></li><li><p>Is it secure?</p></li><li><p>Is it integrated?</p></li></ul><p>Teaming readiness asks a different set of questions:</p><ul><li><p>Do humans and agents share a workable definition of success?</p></li><li><p>Is it clear when an agent may act, and when a human must decide?</p></li><li><p>Do people trust the system for the right reasons?</p></li><li><p>Is learning flowing both from human to machine and from machine back to human?</p></li></ul><p>These are not questions for IT alone. They are questions of leadership design.</p><p>Moving from tool use to true teaming requires the deliberate shaping of roles, responsibilities, feedback loops, and the language through which work is coordinated. 
In other words, through the design of context as a shared operating grammar.</p><p>Until that grammar exists, organisations will continue to experience a familiar paradox: impressive local gains from AI deployment, alongside growing systemic fragility in how work actually holds together.</p><p>Leadership in mixed-intelligence environments therefore shifts in a subtle but fundamental way. The task is no longer simply to deploy capability. It is to make collaboration itself legible, stable and learnable at the boundary between human and machine.</p><p>That is what we mean by Human&#8211;AI Teaming Readiness.</p><h2>What &#8220;Context&#8221; Means in a Human&#8211;AI Team</h2><p>In human&#8211;AI collaboration, <em>context</em> is often treated as a technical concern: prompts, data access, memory, retrieval. These are important foundations. But they are not what makes collaboration work.</p><p>In a teaming environment, context is also <em>coordination.</em></p><p>It is the shared frame that allows different kinds of intelligence to act in relationship to one another without constant supervision or repair. It is what tells a human when to trust a system&#8217;s output, when to challenge it, and when to override it. It is what tells an agent not just what it can do, but how its actions sit within a wider field of purpose, risk and responsibility.</p><p>In world-building terms, this is the moment where physics, culture and geography stop being abstract layers and become the operating language of daily collaboration:</p><ul><li><p>The <strong>System layer</strong> provides the physics: rules, data, contracts and constraints that make certain actions possible and others impossible.</p></li><li><p>The <strong>Culture layer</strong> provides the meaning: norms, values, stories and judgments that shape what <em>should</em> happen.</p></li><li><p>The <strong>Experience layer</strong> provides the geography: the interfaces, workflows and spaces through which both humans and agents navigate the world.</p></li></ul><p>Context is the braided fabric of all three.</p><p>In this sense, it is not a static asset. It is a living operating language, made up of several intertwined elements:</p><ul><li><p><strong>Shared intent:</strong> a common understanding of what the work is ultimately trying to achieve, beyond the task at hand.</p></li><li><p><strong>Boundaries of authority:</strong> clarity on when an agent may act autonomously, when it must recommend, and when a human must decide.</p></li><li><p><strong>Decision vocabulary:</strong> stable definitions of what &#8220;good&#8221;, &#8220;acceptable&#8221;, &#8220;escalate&#8221;, &#8220;complete&#8221;, or &#8220;exception&#8221; actually mean in practice.</p></li><li><p><strong>Quality thresholds:</strong> what level of confidence, evidence, or validation is required before action is taken.</p></li><li><p><strong>Risk posture:</strong> how much uncertainty the team is willing to tolerate in different contexts.</p></li><li><p><strong>Cultural norms of judgment:</strong> whether challenge is expected or discouraged, whether speed outweighs precision, whether learning is prioritised over optimisation.</p></li></ul><p>Taken together, these form the grammar of collaboration. Without this grammar, human&#8211;AI interaction defaults to two unstable extremes. Either humans over-trust systems, surrendering judgment too early and too broadly. Or they under-trust them, turning agents into little more than sophisticated drafting assistants. 
In both cases, potential is left unrealised and risk is misunderstood.</p><p>The difficulty is that much of this context is normally held tacitly in human teams. It lives in shared experience, informal norms, and unspoken expectations. When agents enter the system, that tacit layer is suddenly exposed. What was once quietly inferred now has to be made explicit if coordination is to hold.</p><p>This is why many early centaur team experiments feel awkward at first. The introduction of an agent acts like a mirror, reflecting back the vagueness that already existed within the team.</p><p>To become teaming-ready, organisations must therefore do more than supply agents with data and access. They must author the world in which those agents will operate, as physics, as culture, and as navigable experience.</p><p>And it is this shared language that leadership is now being asked to design.</p><p>Let&#8217;s look at a core leadership technique for Human&#8211;AI Teaming Readiness we call <strong>Align &#8594; Bound &#8594; Learn</strong>, which is a lightweight system for:</p><ul><li><p>aligning intent,</p></li><li><p>designing authority boundaries, and</p></li><li><p>ensuring that learning compounds rather than fragments.</p></li></ul>
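<p>As a preview of how the &#8220;Bound&#8221; step might be made machine-readable, here is a minimal sketch that encodes one authority boundary as data that both humans and agents can inspect. The task, thresholds and action names are illustrative assumptions rather than a reference design; the &#8220;Learn&#8221; step would then adjust these values as override patterns accumulate.</p><pre><code>from dataclasses import dataclass
from enum import Enum

class Authority(Enum):
    ACT = "agent may act autonomously"
    RECOMMEND = "agent recommends, human approves"
    ESCALATE = "human must decide"

@dataclass
class Boundary:
    """One authority rule from the shared operating grammar."""
    task: str
    max_customer_impact: int   # illustrative threshold, e.g. from a Context Ledger rule
    confidence_floor: float    # minimum agent confidence before autonomous action

    def decide(self, customer_impact: int, confidence: float) -> Authority:
        if customer_impact > self.max_customer_impact:
            return Authority.ESCALATE      # beyond the bound: a human must decide
        if confidence &lt; self.confidence_floor:
            return Authority.RECOMMEND     # inside the bound, but not confident enough
        return Authority.ACT               # aligned, bounded and confident

refund_rule = Boundary(task="issue refund", max_customer_impact=2, confidence_floor=0.8)
print(refund_rule.decide(customer_impact=1, confidence=0.9))   # Authority.ACT
print(refund_rule.decide(customer_impact=5, confidence=0.99))  # Authority.ESCALATE
</code></pre>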
      <p>
          <a href="https://academy.shiftbase.info/p/leading-collaborative-centaur-teams">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>