Agents of Progress or Agents of Chaos?
The delta between personal and enterprise agentic AI development is worrying, but perhaps combining the two offers a way to help overcome their limitations...
The OpenClaw moment we covered a few weeks ago was a wild ride. But in the age of YOLO, Hodl and r/wallstreetbets, it should come as no surprise that there is an apparently limitless supply of people willing to hand over control of their personal computer to AI agents in pursuit of rapid progress.
One fascinating research study - Agents of Chaos - let OpenClaw run riot within a controlled lab environment, and concluded:
During a two-week experimental investigation, we identified and documented ten substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability as complex, integrated architectures. The implications of these shortcomings may extend directly to system owners, their immediate surroundings, and society more broadly. Unlike earlier internet threats where users gradually developed protective heuristics, the implications of delegating authority to persistent agents are not yet widely internalized, and may fail to keep up with the pace of autonomous AI systems development.
For those of us with a slightly lower risk appetite, Claude Cowork has also evolved very quickly as a more mainstream alternative to OpenClaw, and is incredibly impressive.
To take but one of many examples of personal agent setups, Eric Porres recently shared his own Claude Cowork harness and lauded its ability to help him manage his portfolio of work activities in a more powerful and efficient way:
the gap between “AI as chatbot” and “AI as operating system for your work” is closing fast. And Cowork is where that gap collapses for non-developers.
At their impressive GTC event in San José recently, Nvidia placed great emphasis on this agentic inflection point as a pointer to where AI is headed next, and also crucially where it can start to show strong returns. In his keynote, CEO Jensen Huang challenged companies to understand the nature of this shift with a typically provocative statement:
Every company needs an OpenClaw strategy
But this was not a crazy call for companies to let OpenClaw run riot in their organisations. The message, as Azeem Azhar interpreted it, is a lot more far-reaching: it is about the shift from the model training era to one of inference and execution.
The dominant driver of AI progress (and NVIDIA’s revenue) was training compute: the vast one-time cost of training large foundation models. This emerging thesis says that the next scaling frontier is inference-time compute — spending more compute at the moment of generating a response, letting models “think longer” on hard problems (chain-of-thought, test-time search, etc.) rather than just being bigger. This changes the hardware economics significantly: inference demand is continuous, distributed, and latency-sensitive rather than concentrated in large training runs. It also opens up physical AI (robotics, autonomous systems) as a major new inference market.
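The "think longer" idea can be made concrete with a toy sketch: instead of taking one sample from a model, spend extra inference-time compute on many samples and majority-vote the answer (the self-consistency pattern). Everything here is illustrative; `model_answer` is a stub standing in for a real, stochastic model call, not any actual API.

```python
import random
from collections import Counter

def model_answer(problem: str) -> str:
    """Stand-in for one stochastic model call (purely hypothetical).

    A real model would reason step by step; here we simulate a model
    that returns the right answer 80% of the time.
    """
    return "42" if random.random() < 0.8 else "41"

def self_consistency(problem: str, n_samples: int = 51) -> str:
    """Spend more inference-time compute: sample n answers, majority-vote."""
    votes = Counter(model_answer(problem) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
# Majority voting over many samples is far more reliable than one sample,
# at the cost of n_samples times the inference compute.
print(self_consistency("What is 6 * 7?"))
```

The point of the sketch is the economics: accuracy is bought with repeated inference calls at answer time, which is exactly the continuous, distributed demand profile described above.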
The focus shifts from models to what we do with them - or as Azeem put it: “The harness is the revolution”…
For AI, the harness moment happened at the tail end of 2025. Claude Code began to work reliably enough that you could leave it running overnight and trust what it had done in the morning. Not perfectly, but reliably enough. And that threshold, that “I can leave it to its own devices” threshold, changed everything. It changed what users asked AI to do. It changed how long tasks ran. It changed the token usage profile of every organization that crossed it.
Now, OpenClaw is the harness for the next layer.

Talk to my agent!
What does this mean for agentic AI in the enterprise, and should we be concerned about the way organisation-owned agentic services seem to be lagging so far behind the rapid evolution of personal agents?
Personal AI agents have evolved so much faster than enterprise agentic services that the gap between them is becoming structural, and we could end up with better ways to navigate broken enterprise systems rather than genuine business transformation.

But if we embrace personal agents at work as prototypes and testbeds for shared enterprise services, then perhaps we can close the gap.
Looking at this challenge within the wider context of the shift from model training to inference and runtime intelligence, however, it is clear that much more focus is needed on world models and decision intelligence infrastructure to enable shared enterprise agents to work reliably with less human supervision. The good news is we can do much of this with the tools we have access to today. It is less of a tech challenge and more of a readiness / architectural question.
Microsoft has recently been beefing up its agentic capabilities in its M365 platform, and has just announced a multi-model capability for verifying complex research.
Perplexity has launched an agentic harness for the enterprise, based on internal tooling that was used by its own employees to speed up delivery.
Elsewhere, Salesforce’s agentic foundry, SAP and a host of other stalwarts from the previous generation of enterprise platforms continue to announce new agentic capabilities.
But the capability diffusion gap continues to widen. There is now a risk that personal agents will evolve so much faster than enterprise agents that we recreate the ‘old wine in new bottles’ problem we saw with the earlier phase of Robotic Process Automation (RPA): using AI to navigate a broken system better, rather than to fix the system or start building a better one.
As Eric Zhou and Seema Amble of Andreessen Horowitz remarked recently, the world still largely runs on old, poorly-designed enterprise platforms not because they are good, but because organisations contorted themselves around their inadequacies and foibles to such an extent that ripping them out could be painful:
To ask a question that sounds almost disrespectful until you’ve spent a week in a Fortune 500: why do people still use SAP (and ServiceNow, and Salesforce) at all?
The short answer is that SAP, or any major legacy system of record, captures critical data across the businesses that use it. But on top of that, the business has customized it and built a set of specific procedures and roles on top of it, much of which is not actually documented anywhere.
Or at least that was the case until now. The authors argue that agentic AI in the enterprise could replace these behemoths over the medium term, but even in the short-term, it could make them more malleable and easier to work with.
Perhaps this is an area where personal agents in the enterprise can become a testbed for enterprise agentic services, trying out automations, workarounds and multi-agent tasks under human oversight, before becoming candidates for new shared services that run more autonomously.
With a personal agent harness, we can each maintain our own personal work context, memory, preferred styles and so on, and our agents will be able to use this to navigate the enterprise and interface with systems of record at the API and data level, bypassing the UI altogether. And our agents can talk to each other to take over a lot of the boring, time-wasting scheduling, alignment, stakeholder communications and basic coordination tasks that consume so much of a leader’s time in large enterprises today.
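The agent-to-agent coordination described above can be sketched with two toy personal agents negotiating a meeting slot from their owners' availability and preferences. The class, the negotiation protocol and all the names below are illustrative assumptions, not any real product's API.

```python
from dataclasses import dataclass, field

@dataclass
class PersonalAgent:
    """A toy personal agent holding a slice of its owner's work context."""
    owner: str
    free_slots: set                                 # e.g. {"Tue 10:00"}
    preferences: list = field(default_factory=list)  # slots in preferred order

    def propose(self):
        """Offer preferred slots first, then any remaining free slot."""
        ordered = [s for s in self.preferences if s in self.free_slots]
        return ordered + sorted(self.free_slots - set(ordered))

    def accept(self, slot):
        return slot in self.free_slots

def negotiate(a: PersonalAgent, b: PersonalAgent):
    """Agent-to-agent scheduling: first slot proposed by a that b accepts."""
    for slot in a.propose():
        if b.accept(slot):
            return slot
    return None  # no overlap; escalate to the humans

alice = PersonalAgent("alice", {"Tue 10:00", "Wed 09:00"}, preferences=["Wed 09:00"])
bob = PersonalAgent("bob", {"Tue 10:00", "Thu 15:00"})
print(negotiate(alice, bob))  # -> Tue 10:00
```

Real personal agents would negotiate over richer context (priorities, travel, stakeholder sensitivities), but the shape is the same: structured state held per person, exchanged agent-to-agent without a human in the loop for the routine cases.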
Later, as more shared enterprise agents come online, our personal agents could help manage our relationships with these as well, for example sequencing the various actions needed for us to run a project or gather intelligence for new ideas.
That suggests to me that we should try to find a way to embrace technologies like Claude Cowork safely within the enterprise to deliver on the potential that copilots promised. Nothing is risk-free, and we clearly need to focus on guardrails and permissions, but if advanced users are willing to be accountable for their agents in return for the productivity and value they could generate, then we can probably find ways to make it work.
World Knowledge, Architecture and Run-time Intelligence
But it is also worth thinking about the differences between personal agents and enterprise agents, and what we have learned so far on this journey of discovery.
First, we need a more considered and thoughtful approach to productivity than boasting about how many lines of code (LOC) we can churn out.
Mario Zechner tried to summarise his own lessons from agentic coding last week in an interesting, opinionated piece about the dangers of brittle software, missed learning and unmaintainable systems, and concluded we need to “slow the f*** down”.
In a similar vein, Matt Webb reminds us that good architecture beats a high LOC count every time, and helps avoid personal-agents-as-workarounds suffering from the same failure modes as RPA, mentioned earlier:
The thing about agentic coding is that agents grind problems into dust. Give an agent a problem and a while loop and - long term - it’ll solve that problem even if it means burning a trillion tokens and re-writing down to the silicon.
Like, where’s the bottom? Why not take a plain English spec and grind it out in pure assembly every time? It would run quicker.
But we want AI agents to solve coding problems quickly and in a way that is maintainable and adaptive and composable (benefiting from improvements elsewhere), and where every addition makes the whole stack better.
So at the bottom is really great libraries that encapsulate hard problems, with great interfaces that make the “right” way the easy way for developers building apps with them. Architecture!
Another question is where we can live with probabilistic models, and where we need a more deterministic approach. We can tolerate the limits of probabilistic models in personal agents we oversee and can test, but when we start to think about embedded or autonomous agents that work the same way for everybody, a more deterministic approach is often needed.
Just as Nvidia’s forward strategy is about more real-time inference in the output stage, we can start to imagine how more verification and compliance could also be done at runtime to make some enterprise AI agents more deterministic.
Artur Huk wrote about this a few days ago for O’Reilly, describing ‘decision intelligence runtime’ as a missing capability layer in agentic AI - more of an engineering pattern than a specific solution or technique.
And this brings us back to a topic we have been noodling on for some time, which is the vital importance of world-building in ensuring the success of enterprise AI.
Whilst some aspects of world-building are about creating good general context for people and machines to have clarity about goals, ways of working, culture, language and so on, there are other aspects of world knowledge that are more precise and scientific.
The way that current models handle world knowledge is largely in the training stage, rather than at the inference or runtime moment. The way autonomous driving systems learn is a good example. Waymo vehicles in Austin developed a nasty habit of illegally overtaking school buses during pick-up, and the school district worked with the company to give them simple rules and guidance to stop it happening. But the training process for Waymo’s system is so long, and draws on so much data, that they were unable simply to add a new rule quickly, and the cars kept overtaking buses. More runtime inference and decision intelligence based on world models is perhaps one way to tackle such anomalies.
So much of the existing decision intelligence inside organisations has never been captured, as Sharon Richardson remarked yesterday in her informative piece about context graphs, which means there is a lot we can achieve quite easily and quickly if we are smart about it. This does not require some kind of Sisyphean manual knowledge mapping exercise, because it is the kind of task that AI can accelerate with the right supervision, even down to the level of conducting structured interviews or After Action Reviews (AARs) to capture decision traces and reasoning from people.
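A captured decision trace could be as simple as a structured record that an AI-assisted interview or After Action Review populates. The schema below is a hypothetical illustration, not a reference to any particular context-graph product.

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class DecisionTrace:
    """One captured decision, ready to feed a context graph (hypothetical schema)."""
    decision: str    # what was decided
    owner: str       # who owned the call
    inputs: list     # what information was considered
    reasoning: str   # why this option won over the alternatives
    outcome: str     # what actually happened, filled in at review time
    reviewed: date   # when the AAR revisited it

trace = DecisionTrace(
    decision="Chose vendor B for logistics",
    owner="ops-lead",
    inputs=["RFP scores", "delivery SLAs"],
    reasoning="B met the SLA at 12% lower cost",
    outcome="On-time rate improved to 97%",
    reviewed=date(2026, 3, 1),
)
print(asdict(trace)["decision"])
```

The value is cumulative: a few thousand of these records, linked to the systems and people involved, start to look like the decision intelligence layer that has never been written down.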
The kind of world models we will need to realise the promise of enterprise agentic AI will go way beyond intangible knowledge and culture. They will need to understand the physical world, manufacturing, distribution and even domains like geopolitics that (not again!) are messing with supply chains and pricing.
How we build and evolve these models is truly a fascinating challenge, and one where we can use AI itself to help improve our AI readiness by doing the mapping, collating and documentation of the information we need to make them real.
Rohit Krishnan recently wrote that this idea is really the key to the future of work, which he sees operating more like a strategy co-op game than a single-player game, and I think that is right.
What’s needed in the enterprise world is such a world model - an engine that knows the rules, tracks the state, understands and predicts consequences.
The environment would connect to the systems a company already runs, the information that is gathered, the agents it uses, and build a live operational model of the business. Scale it across companies and you have the training data to build a compelling environment and an even better world model!
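That description (an engine that knows the rules, tracks the state, predicts consequences) can be sketched as a tiny world model over a single business variable. The inventory domain, the event shapes and the reorder rule are all illustrative assumptions.

```python
# A toy "world model" in the quoted sense: it knows a rule, tracks live state
# from observed events, and predicts the consequences of a planned action.

class InventoryWorldModel:
    def __init__(self, stock: int, reorder_point: int):
        self.stock = stock                  # tracked state
        self.reorder_point = reorder_point  # codified rule threshold

    def apply(self, event: dict):
        """Track state: update the model on events observed from real systems."""
        if event["type"] == "sale":
            self.stock -= event["qty"]
        elif event["type"] == "delivery":
            self.stock += event["qty"]

    def predict(self, planned_sales: int) -> dict:
        """Predict consequences: will the 'stay above reorder point' rule hold?"""
        projected = self.stock - planned_sales
        return {"projected_stock": projected,
                "reorder_needed": projected < self.reorder_point}

wm = InventoryWorldModel(stock=100, reorder_point=30)
wm.apply({"type": "sale", "qty": 40})
print(wm.predict(planned_sales=50))  # projects a breach of the reorder rule
```

Scaled up across every system of record and every codified rule, this is the "live operational model of the business" the quote describes, and the thing an enterprise agent would consult before acting.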
The question is, can visionary CIOs and leaders of AI adoption programmes make the case that urgent attention and investment is needed in AI readiness efforts rather than just rolling out co-pilot licenses and hoping for some marginal productivity gains?
We need composable, addressable processes, services and systems if enterprise agents are to operate autonomously. And we need codified rulesets, world models and decision intelligence to be available at run-time if we want them to operate more deterministically without the kind of oversight we perform with personal agents.


