Shift*Academy

AgentOps: The Scaling Layer for Agentic AI in the Enterprise

To move beyond prototypes and achieve practical impact, we need to build the organisational infrastructure for agent development, deployment, and evolution.

Cerys Hearsey
Sep 09, 2025

Why Ad-Hoc Agents Won’t Scale

Developing and scaling Agentic AI architectures in the enterprise is an emerging challenge that leaders are starting to grapple with.

Teams have experimented with LLM-powered assistants, retrieval agents, and simple chain-of-thought tools that automate fragments of knowledge work. But beyond the excitement of early wins, a familiar pattern is starting to emerge: there is no solution for coordination and scaling.

What begins as a clever automation often struggles to gain traction beyond its creator’s desk. The same blockers show up again and again:

  • No shared context or memory layer across agents

  • No logging or rollback mechanism for when things go wrong

  • No way to approve, adapt, or monitor an agent once it’s deployed

  • No structure for versioning, governance, or handover to other teams

Designing an agent is relatively easy. It is much harder to deploy agents that others can trust, extend, govern, and learn from. And in most organisations, the foundations for doing so simply don’t exist yet. McKinsey finds that fewer than 10% of vertical GenAI use cases make it to scale, a striking indicator of the infrastructure gap in enterprise AI.

That’s why we believe AgentOps, the missing infrastructure layer that enables the safe, scalable development and deployment of agents, is becoming an important strategic capability.

A good agent needs more than intelligence. It needs structure: memory, goals, observability, and the ability to operate inside a shared environment.

The parallels to the early days of DevOps are striking. Back then, great code often failed to make it into production because of brittle handoffs, unclear ownership, and infrastructure gaps. Today, agents are hitting the same wall, unable to move from proof-of-concept to production because the organisation lacks the platform thinking needed to support them.

Without a shared framework for how agents are built, evaluated, governed, and improved, adoption will remain patchy and fragile. And the bigger prize (the shift to a composable, augmented organisation) will stay out of reach.

The next generation of intelligent systems will not succeed because they are “smarter.” They will succeed because they are better structured, better governed, and better connected to the organisations they serve.

That’s what AgentOps is really about.

From Agent Experiments to Internal Agent Platforms

As we build more agents, we need the conditions for them to thrive safely, scalably, and in ways that create value across the organisation, not just inside a team’s local environment.

That means making a shift:

From individual agent experiments to internal platforms that support their design, deployment, governance and evolution.

Just as internal developer platforms (IDPs) helped software teams ship faster and more safely, AgentOps is emerging as the platform layer for intelligent, adaptive systems. It’s how we move beyond clever pilots toward durable, explainable, and governed agent ecosystems. What’s needed isn’t just tooling, but a composable architecture (what McKinsey calls the agentic AI mesh) with interoperability between agents built in from the start. This allows agents to coordinate, share context, and compose into broader workflows without relying on fragile, bespoke integrations. AgentOps provides the foundations to make this possible.

At the heart of this shift is a return to an idea we’ve long championed: the organisation-as-a-platform. In a world of agentic work, this goal becomes much more attainable.

In platform-based organisations, the role of central teams is not to control outcomes, but to provide consistent services (identity, data, guardrails, memory, tools) that enable distributed actors (people, teams, and now agents) to do meaningful, connected work.

Without this platform logic, every team becomes responsible not just for what an agent does, but how it stores memory, retrieves context, monitors behaviour, escalates failure, and adapts to change. That’s a recipe for complexity and risk, as well as a ton of duplicated effort.

With a platform in place, agent developers can focus on behaviour and purpose, while shared services handle the hard stuff:

  • Hosting and execution

  • Observability and rollback

  • Access to knowledge and memory

  • Secure use of tools and data

  • Compliance and permissioning

However, this is not just DevOps for AI — what we need to create is the operating fabric of the programmable enterprise. The platform should serve not just software engineers, but also business teams, centaur service teams, and the domain specialists who will increasingly build or oversee agents tailored to their context.

The future we’re moving toward is not one giant AGI running the business. It’s thousands of small, specialised agents working alongside people, coordinating, communicating, adapting. And they will only succeed if the organisation provides the connective tissue to support them.

What Makes AgentOps Work

Building an AgentOps platform isn’t just about tools or infrastructure. It’s about creating the conditions under which agents, and the teams who use them, can work intelligently, safely, and at scale.

What makes this possible is a set of design principles that shape how agents are built, governed, and evolved within the enterprise. These principles draw from software engineering, platform strategy, knowledge management, and organisational design, but take on new meaning in the context of agentic systems.

Here are six key design principles we believe should guide any AgentOps initiative:

1. Small, Specialised Agents Over Generalist Monoliths

Build small agents with clear goals, and let the system evolve from the interactions.

  • Specialist agents are easier to observe, govern, and improve

  • Use small language models (SLMs) tuned to task, domain, or context

  • Let complexity emerge through coordination, not via bloated prompts
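
As a rough illustration of coordination over monoliths, the sketch below routes each task to a narrowly scoped agent. Everything in it is an assumption made for the example: `Agent`, `Router`, the keyword-matching rule, and the stub handlers (which stand in for calls to small, task-tuned models) are hypothetical names, not part of any specific framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """A narrowly scoped agent: one goal, one handler (hypothetical shape)."""
    name: str
    goal: str
    can_handle: Callable[[str], bool]
    handle: Callable[[str], str]


class Router:
    """Coordinates specialists instead of growing one large prompt."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents = agents

    def dispatch(self, task: str) -> str:
        for agent in self.agents:
            if agent.can_handle(task):
                return f"{agent.name}: {agent.handle(task)}"
        # No specialist matched: surface the gap rather than guessing.
        return "unhandled: escalate to a human"


# Stub handlers stand in for real model calls (assumption for illustration).
invoice_agent = Agent(
    name="invoice_agent",
    goal="Answer invoice status questions",
    can_handle=lambda task: "invoice" in task.lower(),
    handle=lambda task: "looked up invoice status",
)
leave_agent = Agent(
    name="leave_agent",
    goal="Answer leave policy questions",
    can_handle=lambda task: "leave" in task.lower(),
    handle=lambda task: "summarised leave policy",
)

router = Router([invoice_agent, leave_agent])
print(router.dispatch("Where is invoice 42?"))
```

Because each agent declares its own goal and matching rule, a new specialist can be added to the list without changing the others, and an unmatched task becomes a visible signal that a capability is missing.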

2. Composability

Make agents modular, interoperable, and remixable.

  • Agents should be constructed from composable primitives: tasks, tools, logic, and context modules.

  • Each element, from prompt chains to memory functions, should be swappable and testable in isolation.

  • This enables both reuse and adaptation across teams and domains.
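
A minimal sketch of what swappable primitives could look like, assuming each primitive is a plain function. The names `ComposedAgent`, `static_policy_context`, `build_prompt`, and `echo_tool` are hypothetical, and the tool is a stand-in for a model or API call.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Each primitive is a plain callable, so it can be swapped or tested alone.
ContextModule = Callable[[str], Dict[str, str]]  # task -> context
Logic = Callable[[str, Dict[str, str]], str]     # task + context -> tool input
Tool = Callable[[str], str]                      # tool input -> result


@dataclass(frozen=True)
class ComposedAgent:
    context: ContextModule
    logic: Logic
    tool: Tool

    def run(self, task: str) -> str:
        ctx = self.context(task)
        return self.tool(self.logic(task, ctx))


def static_policy_context(task: str) -> Dict[str, str]:
    # Hypothetical context module returning a fixed policy snippet.
    return {"policy": "refunds within 30 days"}


def build_prompt(task: str, ctx: Dict[str, str]) -> str:
    return f"{task} | policy: {ctx['policy']}"


def echo_tool(text: str) -> str:
    # Stand-in for a model or API call.
    return f"DRAFT[{text}]"


agent = ComposedAgent(static_policy_context, build_prompt, echo_tool)

# Swapping one primitive yields a variant without touching the others.
shouting_agent = replace(agent, tool=lambda text: text.upper())

print(agent.run("Refund?"))
print(shouting_agent.run("Refund?"))
```

Since `build_prompt` takes only its inputs and returns a string, it can be unit-tested without a context store or a model, which is the property the bullets above describe.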

3. Abstraction & Reusability

Don’t hard-code behaviour; encode reusable abstractions.

  • Just as object-oriented programming unlocked software scale, structural abstraction is key to scaling agent design.

  • This includes:

    • Reusable context wrappers

    • Versioned goal-setting templates

    • Domain-specific toolchains

  • These become the building blocks of the organisation-as-a-platform.
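
To make "versioned goal-setting templates" concrete, here is one possible shape, built on Python's standard `string.Template`. The `GoalTemplate` and `TemplateRegistry` classes and the "triage" example are assumptions for illustration only.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, Tuple


@dataclass(frozen=True)
class GoalTemplate:
    name: str
    version: int
    text: Template

    def render(self, **values: str) -> str:
        # substitute() raises KeyError if a placeholder is missing,
        # so an incomplete goal fails loudly instead of silently.
        return self.text.substitute(**values)


class TemplateRegistry:
    """Hypothetical registry where published versions are immutable."""

    def __init__(self) -> None:
        self._templates: Dict[Tuple[str, int], GoalTemplate] = {}

    def register(self, template: GoalTemplate) -> None:
        key = (template.name, template.version)
        if key in self._templates:
            raise ValueError(f"{key} already registered; publish a new version")
        self._templates[key] = template

    def get(self, name: str, version: int) -> GoalTemplate:
        return self._templates[(name, version)]

    def latest(self, name: str) -> GoalTemplate:
        versions = [t for (n, _), t in self._templates.items() if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda t: t.version)


registry = TemplateRegistry()
registry.register(GoalTemplate("triage", 1, Template("Route tickets for $team.")))
registry.register(
    GoalTemplate("triage", 2, Template("Route tickets for $team within $sla."))
)
print(registry.latest("triage").render(team="support", sla="4 hours"))
```

Keeping older versions retrievable means an agent pinned to version 1 keeps behaving the same way while another team experiments with version 2.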

4. Observability & Explainability

If you can’t see what the agent is doing, or why, you can’t govern it.

  • AgentOps platforms must include:

    • Trace logs

    • Decision trees

    • Escalation flags

    • Cost, latency, and tool usage metrics

  • This enables humans to intervene, improve, and trust agentic systems.
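
The items above could be captured by wrapping every tool call in a tracer. The sketch below is a simplified assumption of how that might look: `Tracer`, `TraceEvent`, and the caller-supplied `cost_usd` figure are hypothetical, and a real platform would likely export these events to a dedicated observability system.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TraceEvent:
    agent: str
    tool: str
    latency_s: float
    cost_usd: float
    escalated: bool  # True when the call failed and needs human attention


@dataclass
class Tracer:
    events: List[TraceEvent] = field(default_factory=list)

    def call(
        self,
        agent: str,
        tool_name: str,
        tool: Callable[[str], str],
        arg: str,
        cost_usd: float = 0.0,
    ) -> str:
        start = time.perf_counter()
        escalated = False
        try:
            return tool(arg)
        except Exception:
            escalated = True
            raise
        finally:
            # Record the event whether the call succeeded or failed.
            latency = time.perf_counter() - start
            self.events.append(
                TraceEvent(agent, tool_name, latency, cost_usd, escalated)
            )

    def summary(self) -> Dict[str, Any]:
        return {
            "calls": len(self.events),
            "total_cost_usd": sum(e.cost_usd for e in self.events),
            "escalations": sum(e.escalated for e in self.events),
            "tools_used": sorted({e.tool for e in self.events}),
        }


tracer = Tracer()
tracer.call("faq_agent", "search", lambda q: q.upper(), "hi", cost_usd=0.002)
print(tracer.summary())
```

Recording the event in the `finally` clause means failed calls are traced too, which is usually where intervention is most needed.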

5. Governance by Design

Safety and control shouldn’t be retrofitted — they should be embedded.

  • Every agent should run inside clear boundaries:

    • Who can use it?

    • What data can it access?

    • What actions can it trigger?

    • What happens when something goes wrong?

  • Governance shouldn’t block innovation; it should enable confidence and scale.
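
One way to embed the four boundary questions above is to declare them as data and check every request against them. The `AgentPolicy`, `Request`, and `authorise` names, along with the example roles and datasets, are hypothetical; this is a deny-by-default sketch rather than a complete access-control design.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentPolicy:
    allowed_roles: FrozenSet[str]    # who can use it
    allowed_data: FrozenSet[str]     # what data it can access
    allowed_actions: FrozenSet[str]  # what actions it can trigger
    on_violation: str = "block_and_escalate"  # what happens when something goes wrong


@dataclass(frozen=True)
class Request:
    role: str
    dataset: str
    action: str


def authorise(policy: AgentPolicy, request: Request) -> str:
    """Return 'allow', or the policy's violation outcome (deny by default)."""
    checks = [
        request.role in policy.allowed_roles,
        request.dataset in policy.allowed_data,
        request.action in policy.allowed_actions,
    ]
    return "allow" if all(checks) else policy.on_violation


refund_policy = AgentPolicy(
    allowed_roles=frozenset({"support_agent"}),
    allowed_data=frozenset({"orders"}),
    allowed_actions=frozenset({"draft_refund"}),
)

print(authorise(refund_policy, Request("support_agent", "orders", "draft_refund")))
print(authorise(refund_policy, Request("support_agent", "payroll", "draft_refund")))
```

Because the policy is plain data, it can be reviewed, versioned, and audited separately from the agent's behaviour.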

6. Feedback-Driven Evolution

Agents should get better with use, not worse.

  • Every interaction with an agent is a learning opportunity:

    • Was the task completed successfully?

    • Did the user need to intervene or override?

    • What new exceptions emerged?

  • AgentOps should include a feedback-to-improvement loop, allowing agents to be retrained, adjusted, or re-contextualised based on real-world behaviour.
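
The three questions above map naturally onto a small interaction record, which can then be aggregated into a signal for when an agent needs attention. The `Interaction` fields, the `review_signal` function, and the 20% override threshold are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Interaction:
    completed: bool                      # was the task completed successfully?
    human_override: bool                 # did the user intervene or override?
    new_exception: Optional[str] = None  # what new exception emerged, if any?


def review_signal(
    history: List[Interaction], max_override_rate: float = 0.2
) -> Dict[str, Any]:
    """Summarise interactions and flag the agent for review when warranted."""
    if not history:
        return {
            "success_rate": 0.0,
            "override_rate": 0.0,
            "exceptions": [],
            "needs_review": False,
        }
    total = len(history)
    override_rate = sum(i.human_override for i in history) / total
    exceptions = sorted({i.new_exception for i in history if i.new_exception})
    return {
        "success_rate": sum(i.completed for i in history) / total,
        "override_rate": override_rate,
        "exceptions": exceptions,
        # Review when humans override too often or a new exception appears.
        "needs_review": override_rate > max_override_rate or bool(exceptions),
    }


history = [
    Interaction(completed=True, human_override=False),
    Interaction(completed=True, human_override=True),
    Interaction(completed=False, human_override=True, new_exception="missing PO number"),
    Interaction(completed=True, human_override=False),
]
print(review_signal(history))
```

What happens after the flag is raised (retraining, prompt changes, or new context) is a decision for the owning team; the sketch only shows how real-world behaviour can trigger that decision.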

Key Capabilities

At its core, AgentOps should function as a platform capability: a consistent internal environment that provides the building blocks for agent creation, coordination, and evolution. This includes both technical components and organisational practices.

We see five foundational capabilities:

1. Design & Prototyping Support

Goal: Enable teams to safely explore agent behaviours before going live.

  • Pre-built agent templates and workflows

  • Sandboxed environments for simulation and testing

  • Prompt versioning, A/B testing, and logic chaining tools

  • Clear guidelines for behaviour definition and edge cases
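
As one example of the prompt versioning and A/B testing item above, the sketch below assigns each user to a prompt variant deterministically, so the same user always sees the same version during an experiment. The variant texts, the `assign_variant` function, and the experiment name are hypothetical.

```python
import hashlib
from typing import Dict

# Hypothetical prompt variants under test.
PROMPT_VARIANTS: Dict[str, str] = {
    "A": "Summarise the ticket in two sentences.",
    "B": "Summarise the ticket in two sentences and suggest a next step.",
}


def assign_variant(user_id: str, experiment: str, share_b: float = 0.5) -> str:
    """Deterministically map a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Take the first 32 bits of the hash and scale them into [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < share_b else "A"


variant = assign_variant("user-123", "ticket-summary-v2")
print(variant, "->", PROMPT_VARIANTS[variant])
```

Hashing the experiment name together with the user ID keeps assignments stable within an experiment while letting a different experiment split the same users differently.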

2. Execution & Runtime Infrastructure

Goal: Ensure agents run securely, observably, and performantly.

  • Standardised execution environments

  • Secure API access to enterprise tools and systems

  • Monitoring of latency, cost, and usage

  • Version control and rollback mechanisms

For example, Salesforce’s Atlas engine orchestrates across LLM providers with seamless failover and performance optimisation, critical features for reliability at scale.
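
Separately from any vendor implementation, the version control and rollback item in the list above can be sketched as a small deployment history. The `AgentDeployments` class and the version labels are assumptions for illustration; a production system would also need to persist this state and coordinate with the runtime.

```python
from typing import Dict, List


class AgentDeployments:
    """Hypothetical record of which version of each agent is live."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def deploy(self, agent: str, version: str) -> None:
        # The most recently deployed version is the active one.
        self._history.setdefault(agent, []).append(version)

    def active(self, agent: str) -> str:
        return self._history[agent][-1]

    def rollback(self, agent: str) -> str:
        versions = self._history[agent]
        if len(versions) < 2:
            raise RuntimeError(f"{agent} has no earlier version to roll back to")
        versions.pop()  # discard the faulty release
        return versions[-1]


deployments = AgentDeployments()
deployments.deploy("contract_reviewer", "1.0.0")
deployments.deploy("contract_reviewer", "1.1.0")
print(deployments.rollback("contract_reviewer"))
```

Refusing to roll back past the first version keeps the agent from ending up with no active release at all.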

3. Context & Memory Management

Goal: Give agents access to consistent, governed context.

  • Connection to internal knowledge bases and graphs

  • Short- and long-term memory storage

  • Shared context libraries (e.g. customer, product, policy)

  • Memory versioning and explainability support

Emerging standards like the Model Context Protocol (MCP) help agents understand the shared environment they’re operating in, ensuring consistent interpretation of tasks, tools, and data across agent architectures.
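
The shared context and memory versioning items above might look something like the following in-memory sketch. It does not use MCP or any real protocol; `SharedContext`, `ContextVersion`, and the example key are hypothetical, and a real store would add persistence and access control.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ContextVersion:
    version: int
    value: str
    author: str  # who or what wrote this version, kept for explainability


class SharedContext:
    """Hypothetical append-only context library shared across agents."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[ContextVersion]] = {}

    def write(self, key: str, value: str, author: str) -> int:
        versions = self._entries.setdefault(key, [])
        versions.append(ContextVersion(len(versions) + 1, value, author))
        return len(versions)

    def read(self, key: str, version: Optional[int] = None) -> ContextVersion:
        versions = self._entries[key]
        if version is None:
            return versions[-1]  # latest by default
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key} has no version {version}")
        return versions[version - 1]


context = SharedContext()
context.write("policy:refunds", "Refunds within 14 days", author="legal_team")
context.write("policy:refunds", "Refunds within 30 days", author="legal_team")
print(context.read("policy:refunds").value)
```

Because earlier versions are never overwritten, it stays possible to explain which version of a policy an agent was working from when it made a past decision.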

4. Governance, Oversight & Guardrails

Goal: Protect the organisation and its people while enabling autonomy. This is where AgentOps meets enterprise trust.

  • Role-based access and permissions

  • Audit trails, observability, and anomaly detection

  • Human-in-the-loop escalation and override

  • Alignment with internal policies, risk frameworks, and external regulations

5. Feedback, Learning & Continuous Improvement

Goal: Enable agents to get better over time, and learn from use.

  • Structured feedback loops from human users

  • Logging of agent decisions, errors, and interventions

  • Metrics for agent performance, coverage, and impact

  • Mechanisms for retraining, fine-tuning, or updating behaviour

Together, these capabilities form the platform layer for agentic systems — just as continuous integration, observability, and runtime orchestration do for software systems.

As Antonio Gulli’s Agentic Design Patterns demonstrate, even simple capabilities like reflection, memory, and planning require structure. AgentOps is what makes that structure reusable, observable, and governable inside real organisations.

Read on for thoughts about how to begin creating the AgentOps platform layer.

© 2025 Shiftbase Ltd