Agents at the Ready? Yes and No...
Mythos, memex and management - how the various layers of agentic AI development are moving at different speeds and why we need to bring them together
Agentic AI capabilities are developing within several pace layers at once - economics, infrastructure, capability readiness, and knowledge engineering - and it is getting harder to stay on top of these developments whilst tracking their interdependence. But for most companies, organisational readiness is moving more slowly than any of them. Are agents ready for the big time yet, and if not, where should we focus our efforts?
Anthropic in the news
Anthropic has been in the news a lot recently: first with the leak of its Claude Code codebase (the harness code, not the model weights), and more recently with the announcement that its new preview model, Mythos, is so powerful that it was held back from release until Anthropic had built counter-measures for its astonishing ability to detect security exploits in all kinds of commonly used software:
According to Anthropic, Mythos Preview crosses a threshold of capabilities to discover vulnerabilities in virtually any and every operating system, browser, or other software product and autonomously develop working exploits for hacking. With this in mind, the company is only releasing the new model to a few dozen organizations for now—including Microsoft, Apple, Google, and the Linux Foundation—as part of a consortium dubbed Project Glasswing.
Both developments have created a lot of fear and noise, and Mythos points to the huge gap opening up between the (risky, arguably uncontrollable) art of the possible on one hand, and, on the other, the architectures and control systems we have in place to govern and guide AI within complex organisations.
But it is a third, less dramatic development that I think is most immediately relevant to enterprise AI and the pursuit of new organisational operating systems: Claude Managed Agents, which they describe as “a hosted service in the Claude Platform that runs long-horizon agents on your behalf through a small set of interfaces meant to outlast any particular implementation—including the ones we run today.”
It seems to be aimed at SaaS teams that want to embed Claude agents into their products, providing a managed infrastructure layer with composable APIs that sits between your system and Claude’s models. Ken Huang provided a good overview of its three main components (harness, session log and memory) and some thoughts on its potential usage.
Anthropic have called it a meta-harness that could help scale agentic AI and provide an abstraction layer that can cope with future changes and innovation:
The challenge we faced is an old one: how to design a system for “programs as yet unthought of.” Operating systems have lasted decades by virtualizing the hardware into abstractions general enough for programs that didn’t exist yet. With Managed Agents, we aimed to design a system that accommodates future harnesses, sandboxes, or other components around Claude.
This is one of the first serious attempts to define an agentic infrastructure layer, rather than just an agent.
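To make that operating-system analogy concrete, here is a minimal sketch of what harness, session log and memory might look like as stable interfaces. To be clear, the names and signatures below are my own illustrative assumptions, not Anthropic’s actual Managed Agents API.

```python
# Illustrative sketch only: these names and signatures are assumptions made
# for the operating-system analogy, not Anthropic's actual Managed Agents API.
from typing import Iterable, Protocol


class Memory(Protocol):
    """Durable state that outlives any single session or harness version."""
    def read(self, key: str) -> str | None: ...
    def write(self, key: str, value: str) -> None: ...


class SessionLog(Protocol):
    """Append-only record of everything the agent did, for audit and replay."""
    def append(self, event: dict) -> None: ...
    def replay(self) -> Iterable[dict]: ...


class Harness(Protocol):
    """Runs the agent loop: prompting, tool calls, retries, termination."""
    def run(self, task: str, memory: Memory, log: SessionLog) -> str: ...


def run_managed_agent(task: str, harness: Harness,
                      memory: Memory, log: SessionLog) -> str:
    # Future harnesses or sandboxes can be swapped in behind these interfaces
    # without callers changing: the OS-style virtualisation described above.
    return harness.run(task, memory=memory, log=log)
```

The design choice that matters is that application code binds to the abstractions rather than to any particular harness implementation, much as programs bind to system calls rather than to the hardware underneath.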
Is Agentic AI Showing up in Enterprise ROI?
Adoption and investment analysis for enterprise AI is beginning to show the impact of agentic AI above and beyond individual chatbot productivity use cases. A16z’s latest report (released last week) cites coding, search and support as the most active use cases among the firms it surveyed, with tech, legal and healthcare as the sectors keenest on AI adoption.
But we will not really begin to develop a full picture of enterprise AI’s impact until we make more progress building out the kind of architecture necessary for agentic automation to start operating as a managed layer of the tech stack.
McKinsey discussed this paradox - AI is everywhere, and yet the agentic organisation is not yet visible - in a recent podcast with Senior Partner Alexis Krivkovich:
The real promise with agentic, relative to generative AI or previous evolutions of AI, is that you can have the equivalent of superhuman capabilities added to your teams. But the day-to-day workflows and the rituals around ways of working will need to fundamentally change.
That’s what we mean when we say the operating model needs to shift. You need to think about how the hours of the day happen differently, the process of overseeing an agent population, how you engage in problem-solving as a team—and put the right governance and risk controls on top of that.
AI is everywhere in the enterprise, in other words, but the work of organising around it has barely begun.
Agents & The Coasean Singularity
But the real prize is not just greater efficiency within the old work model. It is the so-called Coasean Singularity that occurs if and when agents drive workflow transaction costs towards zero, changing the entire premise of what organisations are and why they exist - the question Ronald Coase posed in his famous 1937 essay The Nature of the Firm.
NBER released an interesting working paper on this question in late 2025, which concluded that this shift could challenge existing market structures in good and bad ways, whilst also opening up entirely new forms of exchange beyond simple labour, jobs and contracting:
The capacity of AI agents to dramatically reduce transaction costs as automated intermediaries could unlock new forms of market participation, enable previously infeasible mechanisms, and push allocative efficiency closer to competitive ideals. Yet the same forces that make agents attractive—their tireless persistence, computational superiority, and negligible marginal costs—also threaten to overwhelm existing market structures. The ultimate impact will depend critically on collective choices adopted regarding agent design, market structures, and regulatory frameworks.
Professor Howard Yu wrote an interesting piece on this recently, covering many angles that I think are useful starting points, such as Haier’s Rendanheyi model and Sangeet Paul Choudary’s work on unbundling and rebundling tasks and jobs. One of his key takeaways is that when transaction costs are lowered and supply can meet demand more efficiently, there is a tendency to create barbell-shaped markets where the premium and commodity ends grow, but the middle tier contracts.
Jack Dorsey’s rather chaotic financial services firm Block claims to be developing its own agentic operating system that will sever the link between headcount and output, but it is hard to know whether their layoffs are the result of this innovation or just an unwinding of previous over-hiring.
John Rossman wrote glowingly of Block’s intentions last week, and although I profess a degree of scepticism given Block’s history, one observation in the piece from Block’s Executive Officer is worth highlighting:
Before Block could restructure how work gets done, they had absolute clarity on what work needs to accomplish. This requires “Thinking in Outcomes”. Jennings described three non-negotiable outcomes that governed every restructuring decision — reliability (no outages), regulatory integrity (compliance teams untouched, full stop), and durable growth (roadmap commitments honored).
Clarity about outcomes, and the context around them, is vitally important if we want reliable agentic architectures.
But we should not presume that agentic architectures will simply automate and wipe out jobs in complex organisations. As a layer of intelligence, automation and orchestration, agentic AI will make many new value creation methods possible, which could have an expansionist effect resulting in more activity and potentially also more or better human roles.
Agentic Capabilities Still Maturing
We have written a lot about emerging agentic capabilities and how companies hope they will develop to become a key new infrastructural layer in the organisational operating system stack, and the progress we are seeing is encouraging. But it is easy to conflate proofs of concept that demonstrate theoretical performance with mature, tested capabilities.
Luca Mezzalira wrote a thoughtful post last month reflecting on an O’Reilly Network event that discussed agentic capabilities in some depth, and it captured some nuanced critiques of the current agentic AI state of the art. For example, as long as our agent testing is based only on behaviour and outputs, rather than on real capabilities, we are at risk of missing deeper problems with how agents sometimes satisfy test conditions - behaviour that is not always honest in the sense people would understand the term. There is hidden risk here if we move too fast and assume too much.
Mezzalira’s take-aways from this are threefold.
First, he argues that introducing deterministic guardrails around probabilistic agents is non-negotiable (a minimal sketch of what this might look like follows after these takeaways).
Second, we don’t yet know what good looks like, so we should avoid the missteps that accompanied the micro-services movement, which has some clear parallels with agentic AI; but we should also be cognisant of the risk gap this opens up. If agents can already autonomously develop working exploits using a model like Mythos, not knowing what good looks like is a scary prospect.
Finally, we should recognise and accept that we are all beginners and there is a lot still to learn and discover about how agentic AI will play out in the real world, and especially how it will intersect with human behaviours and failure modes.
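To illustrate that first point about guardrails, here is a minimal sketch of a deterministic policy layer wrapped around a probabilistic agent. The tool names, allowlist and validation rules are hypothetical; the pattern is simply that fixed, auditable checks decide what the agent may actually execute.

```python
# A minimal sketch of a deterministic guardrail around a probabilistic agent,
# assuming the agent proposes tool calls as {"tool": name, "args": {...}}
# dicts. The allowlist and rules below are hypothetical examples.
ALLOWED_TOOLS = {"search_docs", "read_file", "send_report"}

VALIDATORS = {
    # Hypothetical rule: reports must never leave the corporate domain.
    "send_report": lambda args: args.get("to", "").endswith("@example.com"),
}


def execute_tool_call(call: dict, tools: dict) -> str:
    name, args = call["tool"], call.get("args", {})
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{name}' is not on the allowlist")
    validate = VALIDATORS.get(name)
    if validate is not None and not validate(args):
        raise ValueError(f"Arguments for '{name}' failed deterministic policy")
    # Only after both deterministic checks pass does the probabilistic
    # agent's proposed action actually run.
    return tools[name](**args)
```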
Recently, we have seen several studies and startup announcements relating to neuro-symbolic AI systems and agents, which might be part of the solution in areas where we cannot tolerate probabilistic risk. At a minimum, it should be possible to use neuro-symbolic methods in testing, or in orchestrator agents that verify the output of other agents in the system.
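As a toy illustration of that verification idea, the checker below applies hard symbolic rules to an agent’s structured output before it is accepted; the invoice schema and rules are invented for this example. An orchestrator would route any violations back to the generating agent for repair rather than passing the output downstream.

```python
# Toy example of symbolic verification of an agent's structured output.
# The invoice schema and rules are hypothetical, not any standard.
def verify_invoice(output: dict) -> list[str]:
    """Return a list of rule violations; an empty list means accept."""
    violations = []
    lines = output.get("line_items", [])
    if not lines:
        violations.append("invoice has no line items")
    # Symbolic rule 1: the stated total must equal the sum of the line items.
    stated = output.get("total", 0)
    computed = sum(item["amount"] for item in lines)
    if abs(stated - computed) > 0.01:
        violations.append(f"total {stated} != sum of line items {computed}")
    # Symbolic rule 2: no negative amounts anywhere.
    if any(item["amount"] < 0 for item in lines):
        violations.append("negative line item amount")
    return violations
```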
Memex Redux: Agents that Document your Knowledge
There is one final development worth flagging, as I think it points to a bigger trend we will see in agentic enterprise AI. Andrej Karpathy shared a gist called LLM Wiki that helps us use AI agents to build, grow and curate our own structured personal knowledgebases, much as Vannevar Bush first imagined in 1945 and Ted Nelson later attempted with Project Xanadu:
Most people’s experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question…
The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources.
Feeding our AI agents relevant information and knowledge, but asking them to maintain a structured knowledge graph or wiki, could help overcome the limits of memory and the context window when performing long-running or multi-stage tasks. I would like to imagine that each agent we train and deploy in the enterprise could maintain its own knowledgebase, both to improve its learning and to stay updated on relevant developments that might impact its work.
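As a rough sketch of the pattern (loosely inspired by, but not copied from, the LLM Wiki gist), the loop below asks a model to merge each new source into a persistent markdown page; call_llm is a placeholder for whatever model client you use.

```python
# Minimal sketch of incremental wiki maintenance by an agent. Everything
# here is illustrative rather than Karpathy's actual gist code.
from pathlib import Path

WIKI = Path("wiki")
WIKI.mkdir(exist_ok=True)


def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")


def ingest(source_text: str, topic: str) -> None:
    """Merge new source material into a persistent, human-readable wiki page."""
    page = WIKI / f"{topic}.md"
    existing = page.read_text() if page.exists() else ""
    # The model curates the page rather than answering a one-off query,
    # so knowledge accumulates across sessions instead of being rediscovered.
    page.write_text(call_llm(
        f"You maintain the wiki page '{topic}'. Current page:\n{existing}\n\n"
        f"New source material:\n{source_text}\n\n"
        "Rewrite the page in markdown, merging what is new, keeping "
        "[[wikilinks]] to related pages, and discarding nothing important."
    ))
```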
Among the AI solopreneur and personal productivity communities, there is already a popular technique that involves combining Claude with the personal knowledge repository Obsidian to give personal agents better context to work with. But as one commentator pointed out, simply loading relevant wiki pages as .md files into an LLM’s context window is not really the same as a functioning memory that can be queried like a database, nor is it very scalable.
So whilst this is a great starting point for a simple memory and context system for people and agents, it would probably need to be paired with a real knowledge graph and/or database to allow agents to query, search and refactor the information they need without overwhelming the context window.
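One lightweight way to pair the two, offered as a sketch rather than a recommendation: index the markdown pages from the previous example into SQLite’s built-in FTS5 full-text search, so an agent retrieves just the few pages it needs instead of loading the whole wiki into context. A proper knowledge graph could play the same role at larger scale.

```python
# Sketch of a queryable index over the wiki pages from the previous example,
# using SQLite's built-in FTS5 full-text search. Illustrative only.
import sqlite3
from pathlib import Path

db = sqlite3.connect("wiki_index.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(topic, body)")


def reindex(wiki_dir: str = "wiki") -> None:
    """Rebuild the full-text index from the wiki's markdown pages."""
    db.execute("DELETE FROM pages")
    for page in Path(wiki_dir).glob("*.md"):
        db.execute("INSERT INTO pages VALUES (?, ?)",
                   (page.stem, page.read_text()))
    db.commit()


def lookup(query: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k most relevant (topic, body) pairs for an agent to load."""
    return db.execute(
        "SELECT topic, body FROM pages WHERE pages MATCH ? "
        "ORDER BY rank LIMIT ?", (query, k)).fetchall()
```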
What would it mean for every enterprise agent to maintain its own structured knowledge base? What are the implications for how we build and govern knowledge infrastructure? Given how hard it has been over the years to persuade leaders that this kind of knowledge engineering should be a key part of their work, perhaps this development points to a way of accelerating world-building and context engineering by using agents to do the hard work.
Within most organisations, there is a lot of talent and work going on across the operational, technical and (hopefully!) architectural layers that will support agentic AI. But it is worth repeating that only leaders will have the holistic view of how the economics, infrastructure, capability readiness, and knowledge engineering all work together to make reliable, productive agentic AI a reality.
Managed agents with well-curated knowledge repositories could lead us to the Coasean Singularity with all the economic benefits that entails. But if we don’t bring it all together in a considered and safe way, then Mythos points to what might go wrong.



