"Agent" has become the word a vendor reaches for when "chatbot" sounds too modest. It is now attached to almost anything that uses a language model, which makes it nearly useless as a description. That is a shame, because underneath the marketing there is a real and useful distinction, and understanding it tells you exactly when an agent is the right tool and when it is a liability dressed up as innovation.

The actual definition

Strip away the noise and an agent is defined by two things. First, it is given a goal rather than a single instruction: "reconcile this account" rather than "what is 2 plus 2". Second, it is allowed to act, observe the result of that action, and choose its next step on the basis of what it saw. It runs in a loop, deciding as it goes, until it judges the goal met or runs out of room to try.

A chatbot answers a question. An agent is handed an objective and a set of tools, and left to work out the steps itself.

That loop is the whole idea. A plain language model takes text in and gives text out, once. Wrap it in a loop, give it a way to take real actions, and let it use the result of one step to decide the next, and you have an agent. Everything else (the personas, the names, the diagrams) is decoration on those two properties: a goal, and a loop with the power to act.
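
The loop is small enough to sketch in a few lines. What follows is a minimal illustration, not a real agent: `choose_next_action` is a hypothetical stand-in for the model, the two entries in `ACTIONS` are invented toy actions, and the "goal" is simply a target number. The shape is the point: decide, act, observe, and stop either when the goal is judged met or when the step budget runs out.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a language model: given the goal and the
# observations so far, return the next action, or None when the goal
# is judged to be met. A real system would call a model here.
def choose_next_action(goal: int, observations: list[int]) -> Optional[str]:
    current = observations[-1]
    if current == goal:
        return None
    return "increment" if current < goal else "decrement"

# Hypothetical actions the agent is allowed to take.
ACTIONS: dict[str, Callable[[int], int]] = {
    "increment": lambda x: x + 1,
    "decrement": lambda x: x - 1,
}

def run_agent(goal: int, start: int, max_steps: int = 20) -> tuple[bool, list[int]]:
    observations = [start]
    for _ in range(max_steps):                      # the room it has to try
        action = choose_next_action(goal, observations)
        if action is None:                          # goal judged met
            return True, observations
        result = ACTIONS[action](observations[-1])  # act
        observations.append(result)                 # observe, then decide again
    return False, observations                      # ran out of room

print(run_agent(goal=3, start=0))  # (True, [0, 1, 2, 3])
```

Note the `max_steps` bound. Without it, a loop that never judges the goal met has no reason to stop.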

The thing that makes it powerful: tools

An agent is only as capable as the actions it can take, and those actions are usually called tools. A tool is anything the model is allowed to invoke: search a database, send an email, call an API, run a query, book a slot. On its own a language model can only produce words. Give it tools and it can do things, and the words become a means of deciding which thing to do next. An open standard, the Model Context Protocol (MCP), has emerged to make this plumbing consistent, so that tools can be offered to a model in a predictable way rather than wired up by hand each time.
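
To make "tool" concrete, here is a rough sketch of the hand-wired version: a registry of named functions and a dispatcher that runs whatever the model asks for. Every name in it (`TOOLS`, `tool`, `invoke`, `lookup_balance`, `send_email`) is hypothetical, the request shape is an assumption made for illustration, and none of this is the Model Context Protocol's actual message format.

```python
from typing import Callable

# Hypothetical registry of tools, keyed by the name the model uses to ask for them.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function as an invokable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("lookup_balance")
def lookup_balance(account_id: str) -> str:
    balances = {"acc-1": "120.00"}  # stand-in for a database query
    return balances.get(account_id, "unknown account")

@tool("send_email")
def send_email(to: str, body: str) -> str:
    return f"queued email to {to}"  # stand-in; nothing is actually sent

def invoke(request: dict) -> str:
    """Run a model-produced request shaped like {"tool": name, "args": {...}}."""
    name = request.get("tool")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**request.get("args", {}))

print(invoke({"tool": "lookup_balance", "args": {"account_id": "acc-1"}}))  # 120.00
```

The model only ever produces the request. The surrounding code decides whether a tool by that name exists and what running it actually does.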

This is also where the risk enters. A model that can only talk can only be wrong on the page. A model that can act can be wrong in your database, in a customer's inbox, or in a payment system. The capability and the danger are the same property, viewed from two sides.

Where agents break

The failure modes are specific, and they all live in the loop.

  • Errors compound. If each step is right ninety-five times in a hundred and the steps fail independently, a ten-step chain is right only about three-fifths of the time (0.95^10 ≈ 0.60). Reliability does not add across steps; it multiplies, and it multiplies downward.
  • No ground truth, no brakes. An agent that cannot check its work against something real will pursue a confident wrong plan all the way to the end, because nothing in the loop tells it to stop.
  • Goals drift. Over a long run the agent can quietly redefine the task into something easier, and report success against the version it invented rather than the one you set.
  • The blast radius is the tool set. Whatever an agent is allowed to touch, it can touch when it is wrong. Permissions are not a detail; they are the size of the worst case.
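
The compounding claim is easy to check by hand. The sketch below assumes every step succeeds with the same probability and that steps fail independently of one another, which is a simplification; `chain_reliability` is a name invented for this example.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return per_step ** steps

for n in (1, 5, 10, 20):
    print(n, round(chain_reliability(0.95, n), 3))
# 1 0.95
# 5 0.774
# 10 0.599
# 20 0.358
```

Doubling the chain from ten steps to twenty takes it from roughly three in five to roughly one in three.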

So when is an agent the right call?

An agent earns its place when a task genuinely needs several steps, when the right next step depends on what the last one returned, and when there is a reliable way to check the result. Reconciling records against a source of truth, triaging a queue against clear rules, gathering and cross-checking information: these suit the loop, because each step can be verified before the next is taken. A task that is really one step does not need an agent; it needs a single well-formed request, and calling it an agent adds cost and failure surface for nothing.

The honest test is unglamorous. If you cannot describe how the agent checks its own work at each step, you do not have an agent you can trust in production. You have a confident loop with the keys to something that matters, and that is precisely the configuration that goes wrong quietly.
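
As one possible shape for "checks its own work at each step", here is a small sketch of a reconciliation loop that verifies every proposed result against a source of truth and halts on the first mismatch. `LEDGER`, `propose_amount`, `verify`, and `reconcile` are all hypothetical names, and the wrong guess for `inv-2` is planted deliberately to show the halt.

```python
# Hypothetical source of truth the agent can check against.
LEDGER = {"inv-1": 100, "inv-2": 250, "inv-3": 75}

def propose_amount(invoice_id: str) -> int:
    """Stand-in for a model-produced step; deliberately wrong for inv-2."""
    guesses = {"inv-1": 100, "inv-2": 205, "inv-3": 75}
    return guesses[invoice_id]

def verify(invoice_id: str, amount: int) -> bool:
    """Check a proposed amount against the ledger."""
    return LEDGER.get(invoice_id) == amount

def reconcile(invoice_ids: list[str]) -> dict:
    accepted: dict[str, int] = {}
    for invoice_id in invoice_ids:
        amount = propose_amount(invoice_id)
        if not verify(invoice_id, amount):
            # Halt and report instead of building later steps on an unverified result.
            return {"status": "halted", "failed_at": invoice_id, "accepted": accepted}
        accepted[invoice_id] = amount
    return {"status": "ok", "failed_at": None, "accepted": accepted}

print(reconcile(["inv-1", "inv-2", "inv-3"]))
# {'status': 'halted', 'failed_at': 'inv-2', 'accepted': {'inv-1': 100}}
```

The check lives outside the step that produced the answer. A loop that asks the same model whether its own output looks right has not added a brake.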

The takeaway

An agent is not a clever chatbot. It is a model given a goal, a set of tools, and permission to act, observe, and decide again in a loop. That loop is what makes it useful for multi-step work, and it is also where it breaks: errors compound, goals drift, and an agent with no way to check reality will finish a wrong plan with full confidence. Use one where the steps genuinely depend on each other and each can be verified. Everywhere else, the word is just marketing.

The Fourths · Engineering for regulated industries