One agent is useful.

Three agents arguing with each other is more useful.

That is the basic idea behind NAX, the Netlify Agent Executor. It is a workflow layer I built on top of Netlify Agent Runners and Netlify AI Gateway so I can run multi-step, multi-model agentic workflows without turning every task into a bespoke prompt ritual.

I already wrote about using Netlify Agent Runners to launch Codex, Claude, and Gemini from GitHub issues. That flow is the remote control.

NAX is the playbook.

Instead of asking one model to do one job, NAX lets me define a workflow like:

  1. Ask Claude, Gemini, and Codex to independently review the same codebase change.
  2. Feed each model the other models' findings.
  3. Ask them to challenge, confirm, or reject the findings.
  4. Summarize only the issues with enough agreement or enough evidence.
  5. Optionally let an agent open a PR, while a human keeps final review authority.

That pattern matters because a lot of agent work fails at the same boring point: the first answer sounds plausible, but nobody cross-examines it.

The council pattern

I think of NAX as a council of models.

Not because models are magic. Because they are different.

Codex, Claude, and Gemini have different strengths, different blind spots, and different failure modes. When they independently inspect the same problem, the overlap is useful. When one model finds something the others missed, the disagreement is useful too.

The workflow is not "vote and trust the majority." That is too shallow.

The useful loop is:

flowchart TD
  Input[Task, repo, PR, or issue] --> Fanout[Run selected models independently]
  Fanout --> Claude[Claude pass]
  Fanout --> Gemini[Gemini pass]
  Fanout --> Codex[Codex pass]
  Claude --> CrossCheck[Cross-review findings]
  Gemini --> CrossCheck
  Codex --> CrossCheck
  CrossCheck --> Consensus[Consensus summary]
  Consensus --> Artifact[Saved artifacts in .nax]
  Consensus --> Human[Human review or approval]
  Human --> PR[Optional PR or follow-up run]

The point is to preserve independent judgment before the models influence each other.

If you show every model the same prior answer too early, you get agreement theater. They converge around the first confident response. NAX keeps the first pass separate, then uses later steps for cross-review.
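
Here is a minimal sketch of that two-pass shape in Python. It is not how NAX is implemented; `run_council` and the `ModelFn` callables are hypothetical stand-ins for real model calls, and the prompt wording is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


def run_council(models: Dict[str, ModelFn], task: str) -> Dict[str, Dict[str, str]]:
    """Run an independent first pass, then a cross-review pass."""
    # Pass 1: every model sees only the task, never another model's answer.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        first_pass = {name: future.result() for name, future in futures.items()}

    # Pass 2: each model reviews the findings of the *other* models only.
    cross_review = {}
    for name, fn in models.items():
        others = "\n".join(
            f"[{other}] {text}"
            for other, text in first_pass.items()
            if other != name
        )
        cross_review[name] = fn(f"{task}\n\nOther findings:\n{others}")

    return {"independent": first_pass, "cross_review": cross_review}
```

The part that matters is the ordering: nothing from pass 1 leaks into another model's prompt until every first-pass result is in.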

What NAX runs

NAX workflows are config files. A workflow defines the steps, the models allowed to run them, and the prompt for each step.

A code review workflow might look roughly like this:

name: council-code-review
models:
  - codex
  - claude
  - gemini

steps:
  - id: independent-review
    run: all
    prompt: |
      Review the current diff for bugs, security issues, regressions,
      missing tests, and unnecessary complexity.

      Return concrete findings only. Include file paths and the evidence
      that supports each finding.

  - id: cross-check
    run: all
    prompt: |
      Read the other model findings from the previous step.

      Mark each finding as:
      - confirmed
      - plausible but unproven
      - incorrect
      - duplicate

      Add any missed findings, but include evidence.

  - id: consensus
    run: codex
    prompt: |
      Produce the final review summary.

      Include only:
      - confirmed issues
      - high-signal disagreements
      - recommended fixes
      - what a human should inspect before merging

That is the boring version, and the boring version is powerful.
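
To make the consensus step's job concrete, here is a rough Python sketch of the sorting it is asked to do, using the verdict labels from the cross-check prompt above. In the workflow this judgment belongs to a model, not to code; `select_for_summary` and the `min_confirmed` threshold are assumptions of mine for illustration.

```python
from collections import Counter
from typing import Dict, List

# Verdict labels taken from the cross-check prompt in the workflow above.
VERDICTS = {"confirmed", "plausible but unproven", "incorrect", "duplicate"}


def select_for_summary(
    verdicts_by_finding: Dict[str, List[str]],
    min_confirmed: int = 2,  # illustrative threshold, not a NAX default
) -> Dict[str, List[str]]:
    """Split findings into confirmed issues and high-signal disagreements."""
    confirmed: List[str] = []
    disputed: List[str] = []
    for finding, verdicts in verdicts_by_finding.items():
        unknown = set(verdicts) - VERDICTS
        if unknown:
            raise ValueError(f"unknown verdicts for {finding!r}: {sorted(unknown)}")
        counts = Counter(verdicts)
        if counts["confirmed"] >= min_confirmed:
            confirmed.append(finding)
        elif counts["confirmed"] and counts["incorrect"]:
            # Reviewers contradict each other: keep it visible for a human.
            disputed.append(finding)
    return {"confirmed": confirmed, "disputed": disputed}
```

Note that a finding with one "confirmed" and one "incorrect" is not dropped. It is surfaced as a disagreement, which is the difference between this and a plain majority vote.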

You can swap the task from code review to security review, feature planning, docs review, bug triage, UI critique, or a bake-off where the models each design an implementation and the final step compares tradeoffs.

Why this belongs on Netlify

NAX is useful locally, but it gets more interesting when the same workflow can run in the place where the work already lands.

Netlify Agent Runners provide the remote execution environment. The agent has project context and can do real work against the repo without me keeping a terminal open.

Netlify AI Gateway handles model access through the platform instead of scattering separate provider credentials across scripts, laptops, CI jobs, and random automation experiments.

That combination gives NAX three execution modes:

  • Local, when I want to iterate quickly from the CLI.
  • Netlify cloud, when I want the workflow to run close to the project and deployment context.
  • CI/CD, when GitHub Actions or another system should trigger the workflow from a PR, issue, schedule, or release process.

The important bit is that the workflow definition stays portable. The same review playbook can run from my machine, from a Netlify Agent Runner, or from a CI job triggered by a GitHub comment.

CLI first

The CLI is the fastest path:

nax run council-code-review

Or with a narrower model set when I want to save tokens:

nax run council-code-review --models codex,claude

Model toggling is not a cute feature. It matters.

Sometimes I want the full council. Sometimes I only need one implementer and one reviewer. Sometimes I am running a cheap triage pass and do not want to spend premium-model tokens until the workflow proves there is something worth investigating.

A useful orchestration tool has to treat model choice as a runtime decision, not a hardcoded personality test.
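
A small Python sketch of what "runtime decision" means here. The `resolve_models` helper is hypothetical, and the rule that an override must be a subset of the workflow's own model list is my assumption, not documented NAX behavior.

```python
from typing import List, Optional


def resolve_models(workflow_models: List[str], override: Optional[str] = None) -> List[str]:
    """Pick the models for one run.

    `override` mimics the value passed to --models, e.g. "codex,claude".
    """
    if override is None:
        # No override: run the full council defined in the workflow.
        return list(workflow_models)

    requested = [name.strip() for name in override.split(",") if name.strip()]
    if not requested:
        raise ValueError("--models was given but named no models")

    # Assumption: a run may narrow the workflow's model list, not widen it.
    unknown = [name for name in requested if name not in workflow_models]
    if unknown:
        raise ValueError(f"models not allowed by this workflow: {unknown}")
    return requested
```

The workflow file stays the same; only the call site decides how much of the council shows up.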

GitHub as the trigger surface

The GitHub integration is where this starts to feel practical.

You can wire NAX behind a GitHub Actions workflow and trigger it from comments:

@netlify nax run security-review

Or:

@netlify nax run bug-triage --models codex,gemini

The flow is straightforward:

  1. A developer comments on a PR or issue.
  2. GitHub Actions starts the NAX workflow.
  3. NAX runs the configured steps through Netlify Agent Runners and AI Gateway.
  4. The artifacts are saved.
  5. The final summary is posted back to GitHub.
  6. If the workflow is allowed to write code, an agent can open a PR for human review.

The last part is important: human-in-the-loop does not mean "the human does all the work." It means the system can do the expensive exploration, synthesis, and first-pass implementation while a human still owns the merge decision.
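
The first hop in that flow, turning a comment into a run request, could look something like this in Python. This is a sketch under assumptions: `NaxCommand` and `parse_comment` are names I made up, and the accepted grammar is only what the two example comments above show.

```python
import re
from typing import List, NamedTuple, Optional


class NaxCommand(NamedTuple):
    workflow: str
    models: Optional[List[str]]  # None means "use the workflow's own list"


# Matches lines like: @netlify nax run bug-triage --models codex,gemini
_COMMAND = re.compile(
    r"^@netlify\s+nax\s+run\s+(?P<workflow>[\w-]+)"
    r"(?:\s+--models\s+(?P<models>[\w,-]+))?\s*$"
)


def parse_comment(body: str) -> Optional[NaxCommand]:
    """Return the first NAX command found in a comment body, if any."""
    for line in body.splitlines():
        match = _COMMAND.match(line.strip())
        if match:
            models = match.group("models")
            return NaxCommand(
                workflow=match.group("workflow"),
                models=models.split(",") if models else None,
            )
    return None
```

Anything that does not match exactly is ignored, so an ordinary review comment never starts a run by accident.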

The dashboard

The CLI is for running workflows.

The dashboard is for understanding them.

nax dashboard

The dashboard exists because agent workflows are otherwise too opaque. When a run is bad, I do not want a single final blob that says "no issues found." I want to inspect the prompts, the intermediate outputs, the disagreements, and the step where the workflow went sideways.

The dashboard gives me:

  • Past runs.
  • Step-by-step prompts.
  • Per-model outputs.
  • Cross-review notes.
  • Editable prompts for the next run.
  • A dry-run simulator for testing workflow shape before spending tokens.

That last one matters when building new workflows. A dry run can catch obvious mistakes: missing step inputs, prompts that reference artifacts that do not exist yet, or a consensus step that runs before the independent reviews complete.
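
As a rough illustration of that kind of check, here is a Python sketch that walks the steps in order without calling any model. The `needs` field is hypothetical (it does not appear in the workflow example earlier), and `dry_run` is my own name for the helper, not the dashboard's simulator.

```python
from typing import Dict, List


def dry_run(steps: List[Dict]) -> List[str]:
    """Return ordering problems in a workflow; an empty list means it looks sane."""
    problems: List[str] = []
    completed = set()
    for step in steps:
        step_id = step.get("id")
        if not step_id:
            problems.append("step is missing an id")
            continue
        if step_id in completed:
            problems.append(f"{step_id}: duplicate step id")
        # `needs` is a hypothetical field listing the steps whose output this one reads.
        for needed in step.get("needs", []):
            if needed not in completed:
                problems.append(f"{step_id}: needs '{needed}', which has not run yet")
        completed.add(step_id)
    return problems
```

A consensus step listed before the review it depends on shows up as a one-line problem instead of a wasted run.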

Artifacts are the product

NAX saves workflow output into .nax.

That directory is not just a cache. It is the audit trail.

For each run, I want to keep:

  • The workflow config.
  • The normalized task input.
  • Every model prompt.
  • Every model response.
  • Cross-review results.
  • Final summaries.
  • Generated patches or PR metadata.
  • Token and cost metadata when available.

Agent workflows become much easier to trust when the intermediate reasoning is inspectable. If the final summary says "all three models agreed," I want the evidence. If one model disagreed, I want to know why.

The artifact trail also makes workflows improvable. You can compare runs, tune prompts, remove useless steps, and identify which model combinations actually help for a given task.
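
For a sense of what such a trail could look like on disk, here is a minimal Python sketch. The directory layout (`.nax/runs/<run_id>/steps/<step>/<model>.md`) and the `save_run` helper are assumptions for illustration; the actual structure NAX writes inside `.nax` may differ.

```python
import json
from pathlib import Path
from typing import Dict


def save_run(
    root: Path,
    run_id: str,
    config: Dict,
    responses: Dict[str, Dict[str, str]],  # step id -> model name -> response text
    summary: str,
) -> Path:
    """Write one run's artifacts under root/.nax/runs/<run_id> (hypothetical layout)."""
    run_dir = root / ".nax" / "runs" / run_id
    # exist_ok=False: refuse to overwrite an existing audit trail.
    run_dir.mkdir(parents=True, exist_ok=False)

    (run_dir / "workflow.json").write_text(json.dumps(config, indent=2), encoding="utf-8")
    for step_id, by_model in responses.items():
        step_dir = run_dir / "steps" / step_id
        step_dir.mkdir(parents=True)
        for model, text in by_model.items():
            (step_dir / f"{model}.md").write_text(text, encoding="utf-8")
    (run_dir / "summary.md").write_text(summary, encoding="utf-8")
    return run_dir
```

One file per model per step keeps runs easy to diff against each other, which is what makes comparing prompts and model combinations practical.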

Built-in skills

NAX includes pre-built skills for common workflows:

  • Code review.
  • Security review.
  • Feature suggestions from docs.
  • Bug triage.
  • PR summary.
  • Model bake-offs.
  • Consensus synthesis.

The meta-skill is the one I care about most: NAX can teach you how to build a NAX workflow.

That sounds recursive because it is.

The system already knows the workflow schema, available steps, model routing, artifact shape, and common failure modes. So the first version of a new workflow should not start from a blank file. It should start as a conversation:

Build me a NAX workflow that reviews a Next.js PR for:
- server/client component boundary mistakes
- hydration risks
- accessibility regressions
- missing loading and empty states
- avoidable bundle-size increases

Use Codex and Claude for the first pass, Gemini for cross-review,
and Codex for the final patch plan.

That gives me a draft workflow I can run, inspect, and tighten.

Where NAX is useful

The obvious use case is code review.

The better use case is repeatable review.

One-off prompts are fine until you find yourself pasting the same instructions into three tools every week. NAX turns the useful prompt into a reusable workflow.

The workflows I keep reaching for:

  • Code review: independently inspect a diff, cross-check findings, and post the final review.
  • Security review: focus on auth, billing, webhooks, data exposure, dependency risk, and deployment configuration.
  • Bug triage: classify reports, reproduce likely causes, rank severity, and suggest the next diagnostic step.
  • Feature planning: ask multiple models what should be built next from docs, analytics notes, customer pain, and repo constraints.
  • Bake-offs: ask models to produce competing implementation plans, then compare complexity, risk, and expected quality.
  • PR repair: feed the council a failing PR and ask for a fix plan before letting one agent implement.

The common thread is not "AI does everything."

The common thread is structured disagreement before action.

The real lesson

The first wave of coding agents made it easy to ask one model to do one thing.

The next useful layer is orchestration:

  • Which model should run?
  • In what order?
  • With what context?
  • What artifacts should be passed forward?
  • What should be checked by another model?
  • What requires human approval?
  • What should be saved so the workflow can improve?

That is what NAX is for.

Not a chatbot. Not a single prompt. Not a pile of shell scripts with API keys.

A repeatable way to run agentic workflows where Claude, Gemini, and Codex can do independent work, challenge each other, produce an inspectable artifact trail, and hand the final decision back to a person.

That is the shape I want for serious AI-assisted development: fast agents, explicit workflows, preserved evidence, and a human still holding the merge button.