AI Round-up: Both Destroyer and Maker of Worlds?
Interesting new developments in models big and small, plus a note about the increasing focus on world models and world building as the next frontier for AI
Large, small, tiny & nano model developments
After the recent release of GPT-5-Codex, Anthropic picked up the AI-enhanced coding baton at the end of September with the release of Claude Sonnet 4.5. Since then, results for both models seem positive, suggesting the boundaries of AI-assisted coding are still being pushed forward.
Claude is now particularly good at long-horizon tasks, and has been able to sustain a 30-hour multi-stage task in the labs. Both models are no longer really simple LLMs: they can be thought of as incorporating and managing sub-agents, tool chains and memory to plan long tasks and self-evaluate results. OpenAI also recently released an updated Codex alongside AgentKit, a set of tools designed to help developers and enterprises take AI agents from prototype to production.
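To picture what "managing sub-agents, tool chains and memory" means in practice, here is a minimal, entirely hypothetical sketch of such a controller loop. The plan, decide and evaluate callables stand in for LLM calls, and none of this reflects any vendor's actual internals.

```python
# Hypothetical sketch of the "more than a plain LLM" pattern: a controller
# that plans, delegates to tools or sub-agents, keeps memory, and
# self-evaluates. All names are illustrative, not any vendor's actual API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    tool: str   # which tool or sub-agent to invoke; "finish" ends the step
    args: str   # free-form arguments for that tool

def run_long_task(goal: str,
                  plan: Callable[[str], list[str]],            # LLM: goal -> ordered steps
                  decide: Callable[[str, list[str]], Action],  # LLM: choose next action
                  evaluate: Callable[[list[str]], bool],       # LLM: did the step succeed?
                  tools: dict[str, Callable[[str], str]],
                  max_rounds: int = 50) -> list[str]:
    """Plan a long task, execute each step via tools, self-check, retry once."""
    memory: list[str] = []                                     # notes persisted across steps
    for step in plan(goal):
        for _attempt in range(2):                              # allow one retry per step
            for _ in range(max_rounds):
                action = decide(step, memory)
                if action.tool == "finish":
                    break
                result = tools[action.tool](action.args)       # delegate the real work
                memory.append(f"{action.tool}: {result}")
            if evaluate(memory):                               # self-evaluation gate
                break
            memory.append(f"retrying step: {step}")
    return memory
```

It is this outer loop of planning, delegation and self-checking, rather than any single model call, that makes 30-hour tasks sustainable.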
Many other factors will determine the success of agentic AI in the enterprise, but the continued improvement of the underlying models is at least encouraging.
Another recent event worthy of note is the release of Google’s Gemini Enterprise, which bundles existing Google AI tools into a single stack, whilst adding interesting new features such as agent orchestration and management, real-time context integration, central governance and pre-built agent templates to accelerate adoption. Gemini’s in-browser automation capabilities are also improving, with the preview release of the Gemini 2.5 Computer Use model.
But in addition to the big model announcements, it is also interesting to see the attention being paid to approaches such as Tiny Recursive Models (TRMs). These are models with very few parameters that recursively refine their own draft answers to produce impressive results, albeit in limited areas of application thus far.
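The recursion is easier to see in code. Below is a heavily simplified sketch of a TRM-style refinement loop, in which one small network repeatedly updates a latent scratchpad and then the answer itself; TinyNet and the dimensions are invented for illustration, not the published architecture.

```python
# Simplified sketch of a TRM-style recursive refinement loop. A single
# small network is reused: several updates to a latent state z, then one
# update to the answer y, repeated. TinyNet is a made-up stand-in.
import numpy as np

rng = np.random.default_rng(0)

class TinyNet:
    """Stand-in for the one small network a TRM reuses at every step."""
    def __init__(self, dim: int):
        self.w = rng.normal(scale=0.1, size=(3 * dim, dim))

    def __call__(self, x, y, z):
        return np.tanh(np.concatenate([x, y, z]) @ self.w)

def trm_infer(net, x, y, z, outer_steps=3, inner_steps=6):
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            z = net(x, y, z)     # refine the latent reasoning state
        y = net(x, y, z)         # then use it to improve the answer
    return y

dim = 32
net = TinyNet(dim)
x = rng.normal(size=dim)         # embedded question
y = np.zeros(dim)                # initial answer guess
z = np.zeros(dim)                # initial latent state
answer = trm_infer(net, x, y, z)
```

The capability comes from iterating a very small network many times, rather than from raw parameter count.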
In a similar vein, AI pioneer Andrej Karpathy this week released a rather hipster-ish micro model he calls nanochat, which uses just ~8k lines of hand-written organic free-range code to produce a minimal level of intelligence, and which can be trained using just a few hundred dollars of compute.
We have written about Small Language Models (SLMs) several times, and continue to believe that they will play a key role in specialist areas of knowledge and operations where organisations want more control over training data and outputs in narrow domains. As explainability of outputs becomes more important with companies moving AI into production, such models could be more transparent, and therefore more trusted.
Changing the world vs making new ones
There have been a few other recent examples of research and innovation aimed at making better use of existing model capabilities, for example:
- Meta’s very expensive ‘super-intelligence lab’ has described a new approach to document chunking for Retrieval-Augmented Generation (RAG) that could make it much cheaper and quicker to return relevant information from large document stores to guide the reasoning of LLMs.
- Google and University of Illinois Urbana-Champaign researchers have shared a paper on ReasoningBank, a memory framework that lets agents distill, store, retrieve and reuse reasoning strategies (not just raw logs). It is paired with Memory-aware Test-Time Scaling (MaTTS), which allocates extra compute to generate multiple trajectories (parallel scaling) or successive self-refinements (sequential scaling) in a feedback loop (sketched below).
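To make the mechanics clearer, here is a rough sketch of a ReasoningBank-style memory combined with MaTTS-style parallel scaling. Everything in it (Trajectory, Strategy, the keyword-overlap retrieval, the agent callable) is a hypothetical stand-in for the paper's LLM- and embedding-based components.

```python
# Hypothetical sketch of a ReasoningBank-style memory with MaTTS-style
# parallel scaling. The real system distills and retrieves strategies
# with LLMs and embeddings; simple stubs stand in for those here.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trajectory:
    text: str
    score: float          # self- or judge-assigned quality estimate

@dataclass
class Strategy:
    lesson: str           # distilled, reusable reasoning strategy (not a raw log)
    from_success: bool    # failures are distilled into lessons too

@dataclass
class ReasoningBank:
    strategies: list[Strategy] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3) -> list[Strategy]:
        # Stand-in for embedding similarity: crude keyword overlap.
        def overlap(s: Strategy) -> int:
            return len(set(task.lower().split()) & set(s.lesson.lower().split()))
        return sorted(self.strategies, key=overlap, reverse=True)[:k]

    def distill(self, task: str, traj: Trajectory) -> None:
        # A real system would have an LLM summarise the trajectory.
        self.strategies.append(
            Strategy(f"lesson from '{task}': {traj.text[:60]}", traj.score > 0.5))

def matts_parallel(agent: Callable[[str, list[Strategy]], Trajectory],
                   bank: ReasoningBank, task: str, n: int = 4) -> Trajectory:
    """Parallel scaling: spend extra compute on n trajectories, keep the
    best answer, and feed every trajectory back into the memory bank."""
    hints = bank.retrieve(task)
    trajectories = [agent(task, hints) for _ in range(n)]
    for traj in trajectories:
        bank.distill(task, traj)
    return max(trajectories, key=lambda t: t.score)
```

Sequential scaling would instead call the agent repeatedly on its own previous trajectory before distilling the result.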
The increased focus on efficiency, as opposed to just brute-force scaling of compute to train models, is a welcome trend, given the insatiable appetite of LLM developers for cash and energy.
As Azeem Azhar put it in his newsletter this weekend:
Silicon Valley spent decades abstracting away from physical reality. But AI is so computationally intensive that it’s dragging tech back into the world of concrete and copper wire. Software is no longer eating the world; it’s demanding a new one.
Physical infrastructure and energy are becoming bottlenecks, with a whole host of implications for the US economy, environment and industrial policy. An optimist might argue that this will be the catalyst for a new wave of renewable energy investment, and maybe that will be the case outside the USA.
But demands for a new world don’t end there.
Elon Musk is apparently so disappointed with the conclusions and outputs of his own anti-woke xAI that he wants to re-write the source material it is trained on - most notably Wikipedia.
It might seem like a ludicrous idea, but it is alarming enough to be taken seriously. The current US administration has demonstrated that it is possible to both persuade people to distrust observable facts and also to construct entirely fake reality bubbles for them to live in, powered by anger and fear.
At the same time, the FT reported this weekend that xAI is also investing heavily in world building, apparently in order to make a foray into the potentially lucrative world of AI-generated games.
Michael Douse, head of publishing for the developers of Baldur’s Gate 3, told the FT that the bigger issue in gaming is leadership and vision, adding that the industry did not need “more mathematically produced, psychologically trained gameplay loops [but] rather more expressions of worlds that folks are engaged with, or want to engage with”.
This is an area that Google, Meta and others are also investing heavily in, but perhaps without quite the same ideologically driven goals as Musk.
Google DeepMind’s Genie 3 is trying to advance world building and AI video generation by bridging generative and embodied worlds, with better object identification, more coherence and object permanence, and better physics and realism. The team hope it will serve as a testbed for agents and simulation by providing virtual worlds where AI agents (robots, autonomous systems) can learn, experiment, plan and generalise in a rich, open-ended environment (listen to DeepMind team members talk about their innovation here).
This puts it in a slightly different bracket to OpenAI’s Sora 2, which is a powerful and rapidly improving video generator with less focus on world building. These approaches could perhaps be combined in future to generate cinematic experiences within a more structured world-aware model, with implications not just for gaming, but also learning, training, digital twins and simulations.
World Building in Organisations
But world building is not just for games, movies and books. It could also play an important role in designing and shaping our organisations, and making them more attractive places to work or associate with.
Like the planet Kairos where I spent Saturday morning levelling up my adorable vault hunter Rafa in Borderlands 4, each organisation has lore, culture and a back story. And like the groups who live in the game, they also have a mission, purpose and goals that motivate and connect their people. So although we are all mostly using consumer LLMs trained on the same huge corpus of data, each organisation will in future need to localise and refine its own knowledge and world view to create the context in which its own people and AI can operate successfully.
Organisations have long understood the power of narratives, brand identity and corporate history to build a sense of common purpose that also guides people’s day-to-day behaviour. But in future we will need more detailed and explicitly stated norms, rules and guidance to really make these important intangibles part of how our systems work.
In a time of such political and economic turbulence, companies emerging in the AI era that want to last as long as some of their industrial-age predecessors will need to create stronger reasons for employees, partners and customers to associate with them.
Regardless of the swirling political eddies that surround them, the most successful organisations will continue to be those that respect and value their people and customers. For them, world building could be a deeper and more substantial re-imagining of brand identity that makes their values and norms real, and gives them longevity, as part of their operating system.
As Alex McCann writes in his newsletter today, with employee engagement rates at all-time lows (21% globally), it would be a mistake to think that AI automation of repetitive process work might solve retention problems by increasing competition for roles.
When you strip away all the tasks that don’t require human creativity or judgment, you’re left needing far fewer people. But those people become exponentially more important.
Someone who genuinely cares about the specific problem your company solves? Who brings creative energy to that problem? Who can work alongside AI tools while providing what machines can’t?
These people will be like gold dust.
As with so many other aspects of the emerging AI era, we should probably be using technology to make the workplace experience more human, not less.