Agents on the Night Shift
Self-improving agents, Socratic dialogue and temporal stress: things to think about on the way towards agentic engineering and machines that make machines
Andrej Karpathy’s new autoresearch tool recently ran 700 experiments on his nanochat codebase in two days. It found 20 improvements he had missed, delivering an 11% uplift in output. Tobi Lütke at Shopify tried it on his own hand-tuned model: 19% improvement, parameter size halved. What makes this remarkable is not the numbers. It is the mechanism. The tool does not just run tests — it updates its own Python code based on what it learns. The researcher sets the direction; the machine runs experiments overnight and delivers its findings in the morning.
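Karpathy has not published a breakdown of the tool’s internals, so the sketch below is only a toy illustration of the archetype: a loop that proposes a change, benchmarks it, and keeps it only if the score improves. Every name here is invented, and the toy objective stands in for what would really be hours of GPU time per evaluation.

```python
import random

def run_benchmark(params: dict) -> float:
    """Stand-in for an expensive training run: score one configuration.
    The quadratic objective is a toy; in reality this step costs hours."""
    return -(params["lr"] - 0.03) ** 2 - ((params["width"] - 512) ** 2) / 1e6

def propose(best: dict) -> dict:
    """Stand-in for the model-driven step: perturb the current best
    configuration. A real tool would rewrite actual source code."""
    return {
        "lr": max(1e-4, best["lr"] * random.uniform(0.5, 2.0)),
        "width": max(64, int(best["width"] * random.uniform(0.8, 1.25))),
    }

def overnight_loop(budget: int = 700):
    best = {"lr": 0.01, "width": 768}            # the human's starting point
    best_score, log = run_benchmark(best), []
    for i in range(budget):
        candidate = propose(best)
        score = run_benchmark(candidate)
        kept = score > best_score
        if kept:                                  # keep only measured wins
            best, best_score = candidate, score
        log.append((i, candidate, score, kept))   # morning reading for the human
    return best, log

if __name__ == "__main__":
    best, log = overnight_loop()
    print(best, "|", sum(kept for *_, kept in log), "improvements kept")
```

The loop itself is ordinary hill-climbing; the new ingredient is that the proposing step can read the accumulated log and reason about it, rather than mutating blindly.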
Karpathy’s tool is one example of an important archetype I think we will see more and more in agentic AI: the machine that makes the machines, which is both exciting and slightly strange.
This raises interesting questions about where learning lives in a human-machine system, and who benefits from it.
As Jeremy Keith put it recently when considering how agentic coding is changing development practices:
Outsourcing execution to machines makes a lot of sense.
I’m not so sure it makes sense to outsource learning.
But the productive division isn’t just human vs. machine learning — it’s human imagination operating at the meta-hypothesis level, and machine speed exhausting the territory around it. A single wild guess or idea can now seed hundreds of downstream tests; what comes back isn’t just an answer, but a richer map of the problem space than any individual researcher might have drawn alone.
From Agentic Coding to Agentic Engineering
It seems everybody is intrigued right now by the rapid changes that the latest agentic AI models are bringing to software development, and it is worth paying attention because a similar process is likely to play out across other areas of work.
The New York Times Magazine recently published a major feature, Coding After Coders: The End of Computer Programming as We Know It, covering the history of the field and the experience of developers navigating rapid transformation:
How things will shake out for professional coders themselves isn’t yet clear. But their mix of exhilaration and anxiety may be a preview for workers in other fields. Anywhere a job involves language and information, this new combination of skills — part rhetoric, part systems thinking, part skepticism about a bot’s output — may become the fabric of white-collar work. Skills that seemed the most technical and forbidding can turn out to be the ones most easily automated. Social and imaginative ones come to the fore. We will produce fewer first drafts and do more judging, while perhaps feeling uneasy about how well we can still judge. Abstraction may be coming for us all.
This is almost certainly not the end of computer programming as a discipline, despite the pace of change. Computer science will become more sciencey; programming — talking to computers — will become more literary. But the need for people who understand what is possible and how to make it happen will continue to grow.
But agentic coding is also creating new forms of cognitive overload among AI-assisted developers, including the puzzling sight of people sitting outside in Silicon Valley watching — but not touching — their laptops as coding agents grind through the work.
Matt Jones captured this strangeness well this week in a lovely piece of writing — Gas Town and Bullet Hell — which homes in on the temporal mismatch between human cognition and machine speed:
If brain fry is a clock problem — a temporal mismatch between human cognition and machinic speed — then solutions that only address interface design or training will help at the margins but miss the structural issue…
If we want AI agent work to feel more like flow and less like fry, the challenge isn’t making things faster or even slower — it’s about legibility, consent, and reversibility, and all three matter at once.
As we hit the cognitive limits of what single-player mode can achieve, the shift from agentic coding to agentic engineering becomes important.
Simon Willison has a typically thorough guide to what this means in practice: instead of using an agent to write some code, agentic engineering means giving systems higher-order goals and the autonomy to manage their own path towards them, with less micro-management. The craft, Willison argues, was never primarily about writing code — it was always about figuring out what code to write.
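In practice, the difference is the shape of the brief. A hypothetical example of the two registers (the schema below is illustrative, not Willison’s or any standard):

```python
# Agentic coding: the human specifies the step.
prompt = "Write a function that retries a failed HTTP request three times."

# Agentic engineering: the human specifies the destination and the guardrails,
# and the system manages the path. (Illustrative shape, not a real schema.)
brief = {
    "goal": "cut p95 API latency below 200 ms",
    "constraints": ["no database schema changes", "test coverage stays >= 90%"],
    "done_when": "load test passes against staging",
    "escalate_if": ["any change touches authentication",
                    "estimated cost exceeds the experiment budget"],
}
```

The second brief says nothing about what code to write; it defines success, boundaries, and the conditions for asking a human.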
The Organisation as the Machine That Makes the Machines
We have argued for a long time that to produce good software, organisations need to become like software themselves.
Corporate failures such as Volkswagen’s first attempt to build a software division for its vehicles within a hierarchical and bureaucratic organisation prove the point. The technology was not the problem; the organisational architecture was.
And yet, elsewhere in the automotive world, this has been understood for some time. Jurriaan Kamer recently shared lessons from F1 teams, quoting a team principal on what they borrowed from the Apollo project in their pursuit of agility and excellence under pressure:
“What you can’t have is an engineer here having to go up and down a particular hierarchy and then hop across — in our instance, not just a different geographic location, but a different country altogether — and then go up and down. So instead, it’s a kind of different structure where it’s mission control instead of command and control.”
This distinction matters more than it might appear. Today, a developer can use an agent to write better static software, and that is a productivity story everybody can follow. But if we trace the trajectory of agentic engineering towards its logical conclusion — and Karpathy’s autoresearch is an early signal of where that leads — we will need a much more fluid and connected organisational structure where services and processes are digitised and addressable, so that they can become truly programmable and genuinely capable of self-improvement.
The organisation itself needs to become the machine that makes the machines. ASML’s famous EUV system is a useful reference point: a machine so complex that it requires extraordinary coordination between hundreds of specialist suppliers and internal teams, but one whose design assumes that it will be continuously improved by the people who build and operate it. The infrastructure is not static. It learns.
This also brings the learning question back into focus. If the machine is updating its own code overnight and accumulating insights from hundreds of experiments, organisations need to build the governance and oversight architecture that keeps humans genuinely in the loop — not as approvers of every output, but as the people setting direction, interpreting results, and carrying the institutional memory that the machine cannot hold. Otherwise, you end up with iteration without learning, which is just faster drift.
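What that architecture looks like is an open design question. As a minimal sketch, with invented names and thresholds: a routing policy in which humans own the limits and the review queue, and every decision lands in a durable audit log rather than in the agent’s transient context.

```python
from dataclasses import dataclass, field

@dataclass
class Change:
    summary: str
    lines_touched: int
    reversible: bool

@dataclass
class OversightPolicy:
    """Humans set the thresholds and work the queue; they are not asked
    to rubber-stamp every output the agent produces."""
    auto_limit: int = 20                                   # human-set threshold
    review_queue: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def route(self, change: Change) -> str:
        if change.reversible and change.lines_touched <= self.auto_limit:
            decision = "auto-applied"        # small and undoable: let it through
        else:
            decision = "queued-for-human"    # structural: a person decides
            self.review_queue.append(change)
        self.audit_log.append((change.summary, decision))  # institutional memory
        return decision
```

The audit log is the important part: it is where iteration becomes learning, because it persists after the agent’s context is gone.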
As Daniel Hulme reminds us in his recent thoughtful account of the philosophical and historical precursors of agentic AI, we already have rich bodies of knowledge and methods to draw on:
The irony of this moment is that we are simultaneously living through the most rapid deployment of autonomous agents in history and underutilising the most relevant bodies of knowledge ever produced on how to make such systems safe. From Socrates’ method of structured interrogation to Aristotle’s formal logic, from Chrysippus’ propositional reasoning to the medieval protocols of adversarial disputation – and then from Carl Hewitt’s Actor Model to Michael Bratman’s theory of practical reasoning, from Leslie Lamport’s work on distributed consensus to Edmund Clarke’s model checking, from Lotfi Zadeh’s fuzzy logic to the agent architectures of Michael Wooldridge and Nick Jennings – these thinkers and many others spent careers building the conceptual and mathematical toolkit for exactly the challenges we now face. Their work isn’t historical curiosity. It’s a foundation we should be actively building on.
The same could be said of our accumulated knowledge about organisational design. How systems learn, adapt, and maintain coherence under rapid change is not a new problem. We just have a new urgency to solve it.
The Infrastructure Is Coming
The broader technology ecosystem is already moving in this direction. Nathan Lambert’s survey of the current state of open AI models suggests we will eventually reach a place where specialised small models are freely available for organisations to adapt and build on when creating their own AI platform architectures.
Jensen Huang is unambiguous about where this leads:
“There will be no software in the future that’s not agentic. How could you have software that’s dumb? And so, it is absolutely true that every software company will become an agentic company.”
This implies that, rather than using AI agents to write better SaaS tools, software firms will make available agents that can continuously write, maintain, and evolve living software — software that has a sense of its own role and mission.
Incidentally, this also supports the thesis that ‘services as software’ will be a major new opportunity for specialist service providers.
Futurum Group published new research this week on CIO AI priorities, finding that enterprise goals are shifting from basic efficiency towards innovation and organisational change. Dion Hinchcliffe’s conclusion that “the generic efficiency argument for AI is dead” is heartening. The route to greater returns is more about systems and architecture than it is about individual tool use, and it seems more enterprise leaders are beginning to see this. The danger is that “innovation and organisational change” becomes the new banner under which old structures get expensively automated rather than genuinely redesigned.
Hypotheses & Organisational Learning
Karpathy ran 700 experiments in 48 hours on a well-defined optimisation problem with clean metrics and the ability to measure improvement objectively. That particular set of conditions is relatively rare. Most organisational improvement problems do not have clean metrics, do not produce outputs that can be evaluated overnight, and do not have the structured test environment that makes autoresearch possible.
What humans might lack in speed of iteration, they more than make up for in their ability to generate the wild guesses and what-ifs that make for rich experimentation. Supercharging this innate human capability with the power of machines to loop through variations or play out scenarios could accelerate our learning and innovation in exciting new ways.
What autoresearch points towards isn’t the automation of discovery, but its amplification. The human makes the leap; the machine explores where it lands.
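As a toy illustration of that division of labour, with entirely invented factors: one qualitative human guess becomes a grid of concrete, machine-runnable tests.

```python
import itertools

# The human leap: a wild guess, stated qualitatively.
hypothesis = "onboarding emails are too long to hold attention"

# The machine's share: exhaust the territory around the guess.
factors = {
    "email_length": ["short", "medium"],
    "send_day": ["mon", "wed", "fri"],
    "call_to_action": ["button", "plain link"],
}

variants = [
    dict(zip(factors, combo), hypothesis=hypothesis)
    for combo in itertools.product(*factors.values())
]
print(len(variants), "concrete tests seeded by one guess")  # -> 12
```

Multiply the factors, or the guesses, and the hundreds of overnight experiments stop looking exotic.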
Every organisation already has processes that could, with sufficient effort, be made legible, measurable, and addressable. The question for leaders is not whether to wait for the infrastructure to arrive — it will. The question is whether the organisation they are building now can actually use it when it does. The machine that makes the machines requires a very different kind of organisation than the one that deploys tools to make existing tasks faster.
What is the hypothesis-testing loop in your organisation that you most wish you could accelerate? And who, right now, is doing the learning?
A Quick Favour to Ask
Please consider signing the Rebuild Letter. It supports a great initiative I have been loosely involved in over the last year or so, which aims to stimulate the development of better European social tools and networks and reduce our reliance on weaponised attention farming.