One agent is useful.
Three agents arguing with each other is more useful.
That is the basic idea behind NAX, the Netlify Agent Executor. It is a workflow layer I built on top of Netlify Agent Runners and Netlify AI Gateway so I can run multi-step, multi-model agentic workflows without turning every task into a bespoke prompt ritual.
I already wrote about using Netlify Agent Runners to launch Codex, Claude, and Gemini from GitHub issues. That flow is the remote control.
NAX is the playbook.
Instead of asking one model to do one job, NAX lets me define a workflow like:
That pattern matters because a lot of agent work fails at the same boring point: the first answer sounds plausible, but nobody cross-examines it.
I think of NAX as a council of models.
Not because models are magic. Because they are different.
Codex, Claude, and Gemini have different strengths, different blind spots, and different failure modes. When they independently inspect the same problem, the overlap is useful. When one model finds something the others missed, the disagreement is useful too.
The workflow is not "vote and trust the majority." That is too shallow.
The useful loop is:
flowchart TD Input[Task, repo, PR, or issue] --> Fanout[Run selected models independently] Fanout --> Claude[Claude pass] Fanout --> Gemini[Gemini pass] Fanout --> Codex[Codex pass] Claude --> CrossCheck[Cross-review findings] Gemini --> CrossCheck Codex --> CrossCheck CrossCheck --> Consensus[Consensus summary] Consensus --> Artifact[Saved artifacts in .nax] Consensus --> Human[Human review or approval] Human --> PR[Optional PR or follow-up run]
The point is to preserve independent judgment before the models influence each other.
If you show every model the same prior answer too early, you get agreement theater. They converge around the first confident response. NAX keeps the first pass separate, then uses later steps for cross-review.
NAX workflows are config files. A workflow defines the stages, the allowed agents, and the prompts for each step.
A code review workflow might look roughly like this:
name: council-code-review
models:
- codex
- claude
- gemini
steps:
- id: independent-review
run: all
prompt: |
Review the current diff for bugs, security issues, regressions,
missing tests, and unnecessary complexity.
Return concrete findings only. Include file paths and the evidence
that supports each finding.
- id: cross-check
run: all
prompt: |
Read the other model findings from the previous step.
Mark each finding as:
- confirmed
- plausible but unproven
- incorrect
- duplicate
Add any missed findings, but include evidence.
- id: consensus
run: codex
prompt: |
Produce the final review summary.
Include only:
- confirmed issues
- high-signal disagreements
- recommended fixes
- what a human should inspect before mergingThat is the boring version, and the boring version is powerful.
You can swap the task from code review to security review, feature planning, docs review, bug triage, UI critique, or a bake-off where the models each design an implementation and the final step compares tradeoffs.
NAX is useful locally, but it gets more interesting when the same workflow can run in the place where the work already lands.
Netlify Agent Runners provide the remote execution environment. The agent has project context and can do real work against the repo without me keeping a terminal open.
Netlify AI Gateway handles model access through the platform instead of scattering separate provider credentials across scripts, laptops, CI jobs, and random automation experiments.
That combination gives NAX three execution modes:
The important bit is that the workflow definition stays portable. The same review playbook can run from my machine, from a Netlify Agent Runner, or from a GitHub comment.
The CLI is the fastest path:
nax run council-code-reviewOr with a narrower model set when I want to save tokens:
nax run council-code-review --models codex,claudeModel toggling is not a cute feature. It matters.
Sometimes I want the full council. Sometimes I only need one implementer and one reviewer. Sometimes I am running a cheap triage pass and do not want to spend premium-model tokens until the workflow proves there is something worth investigating.
A useful orchestration tool has to treat model choice as a runtime decision, not a hardcoded personality test.
The GitHub integration is where this starts to feel practical.
You can wire NAX behind a GitHub Actions workflow and trigger it from comments:
@netlify nax run security-reviewOr:
@netlify nax run bug-triage --models codex,geminiThe flow is straightforward:
The last part is important: human-in-the-loop does not mean "the human does all the work." It means the system can do the expensive exploration, synthesis, and first-pass implementation while a human still owns the merge decision.
The CLI is for running workflows.
The dashboard is for understanding them.
nax dashboardThe dashboard exists because agent workflows are otherwise too opaque. When a run is bad, I do not want a single final blob that says "no issues found." I want to inspect the prompts, the intermediate outputs, the disagreements, and the step where the workflow went sideways.
The dashboard gives me:
That last one matters when building new workflows. A dry run can catch obvious mistakes: missing step inputs, prompts that reference artifacts that do not exist yet, or a consensus step that runs before the independent reviews complete.
NAX saves workflow output into .nax.
That directory is not just a cache. It is the audit trail.
For each run, I want to keep:
Agent workflows become much easier to trust when the intermediate reasoning is inspectable. If the final summary says "all three models agreed," I want the evidence. If one model disagreed, I want to know why.
The artifact trail also makes workflows improvable. You can compare runs, tune prompts, remove useless steps, and identify which model combinations actually help for a given task.
NAX includes pre-built skills for common workflows:
The meta-skill is the one I care about most: NAX can teach you how to build a NAX workflow.
That sounds recursive because it is.
The system already knows the workflow schema, available steps, model routing, artifact shape, and common failure modes. So the first version of a new workflow should not start from a blank file. It should start as a conversation:
Build me a NAX workflow that reviews a Next.js PR for:
- server/client component boundary mistakes
- hydration risks
- accessibility regressions
- missing loading and empty states
- avoidable bundle-size increases
Use Codex and Claude for the first pass, Gemini for cross-review,
and Codex for the final patch plan.That gives me a draft workflow I can run, inspect, and tighten.
The obvious use case is code review.
The better use case is repeatable review.
One-off prompts are fine until you find yourself pasting the same instructions into three tools every week. NAX turns the useful prompt into a reusable workflow.
The workflows I keep reaching for:
The common thread is not "AI does everything."
The common thread is structured disagreement before action.
The first wave of coding agents made it easy to ask one model to do one thing.
The next useful layer is orchestration:
That is what NAX is for.
Not a chatbot. Not a single prompt. Not a pile of shell scripts with API keys.
A repeatable way to run agentic workflows where Claude, Gemini, and Codex can do independent work, challenge each other, produce an inspectable artifact trail, and hand the final decision back to a person.
That is the shape I want for serious AI-assisted development: fast agents, explicit workflows, preserved evidence, and a human still holding the merge button.