Original: Components of A Coding Agent https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
Author: Sebastian Raschka, PhD
How Coding Agents in Practice Leverage Tools, Memory, and Repository Context to Unlock Greater Power from LLMs
In this article, I want to talk about the overall design of coding agents and agent harnesses. What exactly are they? How do they work? How do the various components work together in practice? Since readers often ask me about agents (many have read my books “Build a Large Language Model (From Scratch)” and “Build a Reasoning Model (From Scratch)”), I decided to write this reference guide that I can share with readers in the future.
Broadly speaking, agents are so hot right now because of recent advances in practical Large Language Models (LLMs). It’s not just that the models themselves are getting stronger; it’s more about how we use them. In many real-world deployments, the peripheral systems surrounding the model (such as tool calling, context management, and memory functions) play a role no less significant than the model itself. This explains why systems like Claude Code or Codex feel much more powerful than chatting directly with their underlying models in a standard chat interface.
Next, I will outline the six core modules of a coding agent.
Claude Code, Codex CLI, and Other Coding Agents
You might already be familiar with Claude Code or Codex CLI (CLI stands for command-line interface, a tool that lets users operate a computer by typing text commands in a terminal). To set a simple baseline: they are essentially programming tools turned into agents. They wrap the LLM within an application layer (what we call an agent harness), making it more convenient and better performing when handling programming tasks.
Figure 1: Claude Code CLI, Codex CLI, and my own minimalist coding agent.
Coding agents are specifically built for software development. Here, the main focus isn’t just which model you choose, but the peripheral support systems: repository context (repo context), tool design, prompt cache stability, memory capabilities, and the ability to stay coherent over long, continuous work sessions.
Understanding this distinction is crucial. Because when we talk about “the programming capabilities of LLMs,” people often conflate “the model itself,” “the model’s reasoning behavior,” and “agent products.” So before diving into the details of coding agents, let me take a moment to briefly clarify the differences between the broad concepts of LLMs, reasoning models, and agents.
Clarifying the Relationship: LLMs, Reasoning Models, and Agents
Large Language Models (LLMs) are the core: essentially, models that repeatedly predict the next token. Reasoning models are also LLMs, but through special training or prompt guidance they spend extra compute while generating answers (i.e., increased inference-time compute, known as test-time compute) on intermediate reasoning steps, self-verification, or searching for the best result among multiple candidate answers.
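To make the test-time-compute idea concrete, here is a minimal sketch of one such strategy: self-consistency via majority voting over several sampled answers. The `sample_candidates` function is a purely hypothetical stand-in; a real implementation would draw `n` independent samples (each with its own chain of thought) from the model.

```python
from collections import Counter

def sample_candidates(prompt: str, n: int) -> list[str]:
    # Hypothetical stand-in for n independent LLM samples.
    # A real reasoning model would generate a chain of thought
    # per sample and extract the final answer from each.
    pool = ["42", "42", "41", "42", "40"]
    return pool[:n]

def majority_vote(prompt: str, n: int = 5) -> str:
    """Self-consistency: sample n candidate answers and return
    the one that appears most often (majority vote)."""
    answers = sample_candidates(prompt, n)
    return Counter(answers).most_common(1)[0][0]
```

The point of the sketch is the shape of the strategy: spending more compute at answer time (here, five samples instead of one) and aggregating, rather than trusting a single greedy decode.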
Agents are a layer on top of the model; you can think of them as a “control loop” operating around the model. Typically, it works like this: you give it a goal, and the agent layer (or harness) repeatedly lets the model decide: what to check next? Which tool to call? How to update the current state? When is the task complete, so it can stop?
To use an imperfect but intuitive analogy: an LLM is like a regular engine; a reasoning model is a heavily modified, more powerful engine (and, of course, more expensive); and the Agent harness is the entire vehicle system that helps us better harness that engine. Although we can also use LLMs and reasoning models directly in a chat interface or Python code, I hope this metaphor clarifies their relationship.
Figure 2: The relationship between regular LLMs, reasoning LLMs (or reasoning models), and LLMs wrapped in an Agent harness.
In other words, an agent is a system that continuously loops, calling the model within a specific environment.
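To make this loop concrete, here is a minimal sketch in Python. The `model` and `tools` arguments are hypothetical stand-ins: in a real harness, `model` would be an LLM API call that returns a structured tool-call decision, and `tools` would map tool names to real implementations.

```python
def run_agent(goal: str, model, tools: dict, max_steps: int = 10) -> str:
    """Minimal agent control loop: the harness repeatedly asks the
    model what to do next, executes the chosen tool, and feeds the
    result back until the model signals completion."""
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        # The model sees the full history and picks the next action,
        # e.g. {"tool": "run_tests", "args": {...}}.
        action = model(history)
        if action["tool"] == "finish":
            return action["args"]["answer"]
        # Execute the chosen tool and append the observation,
        # so the next model call can react to it.
        observation = tools[action["tool"]](**action["args"])
        history.append(f"ACTION: {action}")
        history.append(f"OBSERVATION: {observation}")
    return "stopped: step budget exhausted"
```

Note the two exit conditions: the model explicitly calling a `finish` tool, and a hard step budget. Real harnesses add more (permission checks, context truncation), but the loop shape is the same.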
To summarize:
• Large Language Model (LLM): The most primitive foundational model.
• Reasoning model: An optimized LLM, specifically designed to output intermediate reasoning steps (what we often call Chain of Thought) and enhance self-verification capabilities.
• Agent: A loop system comprising “model + tools + memory + environmental feedback.”
• Agent harness: The software scaffolding built around the agent, responsible for managing context, tool calls, prompts, state, and control flow.
• Coding harness: A “specialized version” of the Agent harness, tailored specifically for software engineering, responsible for managing code context, development tools, code execution, and iterative feedback.
As listed above, when discussing agents and programming tools, we often encounter these two terms: Agent harness and Coding harness. A Coding harness is the software scaffolding that helps models efficiently write and modify code. The Agent harness has a broader scope, not limited to programming (e.g., OpenClaw). Codex and Claude Code can both be considered Coding harnesses.
In summary: better LLMs provide a stronger foundation for reasoning models (which still require additional training), and excellent harnesses squeeze the potential of reasoning models to the limit.
Of course, LLMs and reasoning models can solve some programming problems on their own, without any harness. But real coding is more than just predicting the next token. A significant part of development effort is spent browsing repositories, searching documentation, finding functions, applying code diffs, running tests, troubleshooting errors, and mentally connecting all this information. (Fellow programmers will understand: this is mentally taxing work, which is why we hate being interrupted while focused on coding 🙂.)
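To illustrate, here is a sketch of what a few such tools might look like as plain Python functions that a harness could expose to the model: reading files, searching the repository, and running tests. The function names, signatures, and truncation limits are my own illustrative choices, not any particular product’s API.

```python
import subprocess
from pathlib import Path

def read_file(path: str, max_chars: int = 4000) -> str:
    """Return a (truncated) view of a file so it fits in the context window."""
    return Path(path).read_text()[:max_chars]

def search_repo(pattern: str, root: str = ".") -> list[str]:
    """Naive grep: list Python files under root whose text contains pattern."""
    return [str(p) for p in sorted(Path(root).rglob("*.py"))
            if pattern in p.read_text(errors="ignore")]

def run_tests(cmd: str = "pytest -q") -> str:
    """Run the test suite and capture its output as a string for the model."""
    result = subprocess.run(cmd.split(), capture_output=True, text=True)
    return result.stdout + result.stderr

# The harness registers tools by name so the model can request them.
TOOLS = {"read_file": read_file, "search_repo": search_repo, "run_tests": run_tests}
```

Notice that every tool returns a string (or a list of strings): tool output has to be serialized back into the model’s context, which is also why `read_file` truncates.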
Figure 3: A Coding harness integrates three layers: the model family, the agent loop, and runtime support. The model provides the “engine,” the agent loop drives the iterative problem-solving process, and runtime support provides the necessary foundational plumbing. Within this loop, “Observe” collects intelligence from the environment, “Review” analyzes this intelligence, “Choose” decides the next step, and “Execute” carries out the action.
The key point here is: an excellent coding harness makes a model (whether a reasoning model or not) feel far more powerful than in a bare-bones chat box, because it handles all the dirty work, such as context management, for you.
Coding Harness: The Model’s Super Add-on
As just mentioned, when we say “harness,” we usually mean the layer of software wrapped around the model. It’s like a full-service butler, responsible for stitching together prompts, providing tools, tracking file state, modifying code, executing commands, managing permissions, caching unchanging prompt prefixes, and storing memory, among other things.
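One of those jobs, caching unchanging prompt prefixes, deserves a quick illustration. Providers typically cache by exact prefix match, so a harness keeps the stable parts of the prompt first and byte-identical across turns, appending only the parts that grow. The following is a minimal sketch under that assumption; the function and its layout are my own illustration, not any provider’s actual API.

```python
def build_prompt(system: str, tool_specs: str, history: list[str]) -> str:
    """Assemble the prompt so the stable parts come first.

    Prompt caches hit on exact prefix matches, so the system prompt
    and tool specs must stay byte-identical (never reordered, never
    edited) across turns; only the history suffix changes each turn.
    """
    stable_prefix = system + "\n\n" + tool_specs  # cacheable across turns
    suffix = "\n".join(history)                   # grows every turn
    return stable_prefix + "\n\n" + suffix
```

The invariant worth testing is that each turn’s prompt literally starts with the previous turn’s prompt, so the provider can reuse the cached prefix instead of reprocessing it.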
Today, when you use a large model, the vast majority of your experience is determined by this Harness layer. This is vastly different from directly prompting the model or using an ordinary web chat interface (which is more like “uploading a file and having an awkward chat with it”).
In my view, the current vanilla base models (like GPT-5.4, Opus 4.6, and GLM-5 base versions) are actually very close in capability. At this stage, the decisive factor that truly creates a gap and makes one model seem much better to use is often this peripheral Harness.
Here’s a bold guess: if we put the latest, strongest open-source model (like GLM-5) into an equally excellent Harness, its performance would likely be on par with GPT-5.4 in Codex or Claude Opus