How an Agent Actually Works

You now know what agents can do. Before you learn to build with them, it helps to understand how they work — not at a technical level, but enough to know what is happening when you give an agent a task and it starts working.

This takes five minutes and will make everything else in this guide click faster.

The brain: a language model

At the center of every AI agent is a large language model — an LLM. This is the engine that reads text and generates text.

An LLM does not "know" things the way you do. It has been trained on enormous amounts of text — books, websites, conversations, code, research — and it has learned patterns in that text. When you give it input, it predicts what should come next based on those patterns. It is extraordinarily good at this, which is why the output often feels intelligent, thoughtful, even creative.

But it is prediction, not understanding. This is why an agent can write a compelling paragraph about a topic and still get a basic fact wrong. It is not lying. It is generating what sounds right based on patterns. Sometimes what sounds right is right. Sometimes it is not.

Knowing this gives you the right posture: trust the capability, verify the output.

The loop: what makes an agent an agent

Here is the key difference between chatting with AI and using an agent.

When you use ChatGPT or Claude as a chatbot, the exchange is: you send a message, the model responds, done. One turn. You do the next step yourself.

An agent is an LLM running in a loop.

  1. Read the instructions and current situation.
  2. Decide what to do next.
  3. Take an action (using a tool).
  4. Observe the result.
  5. Go back to step 1.

The agent keeps going — step after step, action after action — until the task is done or it needs your input.

This is why an agent can do things like "research this property, gather photos, write descriptions, and build a website." It is not doing that in one shot. It is running through the loop dozens or hundreds of times: search for the listing, read the results, click the link, extract the data, search for neighborhood info, read that, pull photos, generate descriptions, assemble the page, review it, fix issues, publish.

Each turn through the loop is simple. The power comes from chaining those turns together autonomously.
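The loop above can be sketched in a few lines. This is a toy, not any platform's real code: the `decide` function stands in for the LLM, and the single `search` tool is a canned stub.

```python
# A minimal sketch of the agent loop. Everything here is
# illustrative: a real platform supplies the model and tools.

def run_agent(decide, tools, instructions, max_turns=50):
    history = []  # everything the agent has seen and done so far
    for _ in range(max_turns):
        # Steps 1-2: read instructions + situation, pick the next action.
        action = decide(instructions, history)
        if action["kind"] == "done":
            return action["answer"]
        # Step 3: take the action with the chosen tool.
        result = tools[action["tool"]](action["input"])
        # Step 4: observe the result; it becomes part of the situation.
        history.append((action, result))
        # Step 5: loop back to the top.
    return "stopped: turn limit reached"

# A toy "model" that searches once, then answers with what it found.
def decide(instructions, history):
    if not history:
        return {"kind": "tool", "tool": "search", "input": instructions}
    _, last_result = history[-1]
    return {"kind": "done", "answer": last_result}

tools = {"search": lambda query: f"result for: {query}"}
print(run_agent(decide, tools, "find the listing price"))
```

Each pass through `run_agent` is simple on its own; chaining those passes, with the model choosing the next action each time, is the whole trick.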

Tools: the agent's hands

An LLM by itself can only read and write text. It cannot visit a website, open a spreadsheet, send an email, or create an image. It is a brain in a jar.

Tools are what give the agent hands.

A tool is any specific capability the agent can use during its loop: browse a webpage, read a file, search the web, write a document, query a database, send a message, generate an image. When you hear that an agent can "connect to your email" or "browse the web," it means someone has given the agent a tool for that: a defined action it can choose to invoke when it decides it needs to.

Here is what a single turn of the loop looks like in practice:

  1. The agent reads its instructions: "Find the listing price for 742 Evergreen Terrace."
  2. It decides to use the web search tool.
  3. The search results come back.
  4. It reads the results and decides to use the browse tool to visit a real estate site.
  5. It extracts the price from the page.
  6. It writes the answer.

The agent chose which tools to use and in what order. You did not direct each step. You described the goal. The agent figured out the path.
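In code terms, a tool set is just a collection of named actions the agent can pick from. A minimal sketch, with canned stand-ins rather than real integrations (the URL and the returned values are made up):

```python
# Tools as the agent sees them: a set of named actions it can
# choose among. These functions are canned stubs, not real
# email, browser, or search integrations.

tools = {
    "web_search": lambda query: ["listing page", "news article"],
    "browse": lambda url: "Price: $450,000",
    "write": lambda text: f"saved: {text}",
}

# One turn of the loop: suppose the agent picked "browse"
# and this (hypothetical) URL as its input.
chosen_tool, tool_input = "browse", "realestate.example/742-evergreen"
observation = tools[chosen_tool](tool_input)
print(observation)  # this result feeds back into the next turn
```

Connecting a new service to an agent mostly means adding another entry to a table like this: another named action the model is allowed to choose.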

More tools means more capability. An agent with access to your email, calendar, and project board can do things an agent with only web browsing cannot. When you are choosing agent platforms, you are largely choosing which tools you are willing to give the agent access to.

The instructions: your standing orders

Every time the agent goes through its loop, it starts by reading its instructions. These go by different names — system prompt, custom instructions, agent configuration — but the idea is the same: a set of standing orders the agent follows on every turn.

This is where your workflow descriptions, quality standards, formatting rules, and decision criteria live. When this guide tells you to "describe your process clearly," what you are actually doing is writing these standing orders.

The agent does not remember what you told it three weeks ago (unless you include it in the instructions). It does not infer your preferences from a single example (unless you spell them out). It follows what is written, every time.

This is why clear, specific instructions produce dramatically better results than vague ones. The instructions are the single most important lever you have.
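If it helps to see it concretely, standing orders are just text that the agent re-reads at the start of every turn. A sketch with illustrative wording, not a recommended template:

```python
# Standing orders are plain text the agent re-reads every turn.
# This wording is illustrative, not a recommended template.

instructions = """
You draft property listings.
Process: search the address, verify the price on at least two
sites, then write a 100-word description.
Formatting: plain text, no exclamation marks.
If the price cannot be verified, stop and ask.
""".strip()

# Every pass through the loop starts from this same text, so
# anything not written here is not reliably followed.
def start_of_turn(instructions, history):
    return instructions + "\n\nSo far:\n" + "\n".join(history)

print(start_of_turn(instructions, ["searched the address"]))
```

Note that the instructions are the only part that never changes from turn to turn, which is exactly why they carry so much weight.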

The context window: working memory

An agent has a limited amount of text it can "see" at any given moment. This is called the context window. Think of it as the agent's desk: it can only fit so many papers on the desk at once.

The context window holds everything: your instructions, the conversation so far, the results from tool calls, the content of files it has read. When the desk fills up, the oldest items start falling off the edge.

This is why agents sometimes lose track of details on long tasks. It is not that they forgot — it is that the relevant information scrolled out of their working memory.

And it is why breaking work into steps matters. Smaller, focused steps keep the relevant context on the desk where the agent can see it. A single massive instruction asking for twenty things at once fills the desk with noise.
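The desk model can be made concrete with a toy function that keeps the newest items that still fit and lets older ones fall off. Real systems measure capacity in tokens; plain character counts stand in here:

```python
# A toy version of the "desk": a fixed-size window where the
# oldest items fall off when new ones arrive. Real context
# windows are measured in tokens, not characters.

def fit_to_window(items, capacity):
    kept, used = [], 0
    # Walk from newest to oldest, keeping whatever still fits.
    for item in reversed(items):
        if used + len(item) > capacity:
            break  # everything older than this falls off the desk
        kept.append(item)
        used += len(item)
    return list(reversed(kept))

memory = ["instructions", "step one result", "step two result",
          "a very long tool output " * 10]
print(fit_to_window(memory, capacity=260))
```

Run this and the oldest entries are gone: one oversized tool output has pushed the early material off the desk, which is the character-count version of an agent "losing track" mid-task.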

Why this explains everything

You now know five things: the LLM (the brain), the loop (continuous action), tools (what it can do), instructions (how it behaves), and the context window (what it can hold in mind).

This simple model explains almost everything you will encounter:

  • Why agents can work autonomously: the loop keeps them going without you prompting every step.
  • Why they sometimes go in circles: the loop can repeat if the agent is not sure what to do next. Better instructions prevent this.
  • Why clear instructions matter so much: the instructions steer every single turn of the loop.
  • Why they lose track on long tasks: the context window filled up and earlier details dropped away.
  • Why they can be confidently wrong: the LLM is predicting what sounds right, not looking up verified facts.
  • Why breaking work into steps helps: it keeps the context window focused and each loop iteration clear.
  • Why connecting more tools makes them more capable: more tools means more actions available in each loop turn.

You do not need to think about this model constantly. But when something goes wrong — or when something goes surprisingly right — these five ideas will tell you why.