Should a coding agent update tests before code?

For a bug fix, yes: a test that reproduces the bug should ideally come first. For a broken existing test, first confirm whether the product behavior or the test expectation is wrong.

Can the agent merge its own PR?

Keep merge rights with humans or CI policy. The agent can prepare the patch, checks, and PR summary.

What makes a good test-repair task?

A command that fails, a narrowly scoped expected behavior, an owned file set, and a clear acceptance check.

Test Repair and Pull Request Workflow for Coding Agents

Last reviewed: 2026-05-14.

Who this is for

This workflow is for teams that want a coding agent to repair tests, fix a small bug, or prepare a pull request without handing the agent merge authority. It works best when there is a concrete verifier: a failing test, a build command, a smoke check, or a reproducible error.

“Fix the tests” is not a verifier. “Make `python3 scripts/check_site_units.py` pass without changing runtime behavior” is a verifier. The agent can reason against the second target because it has a command, a scope, and a success condition.
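As a rough illustration, a verifier can be written down as data: a command, a scope note, and a success condition. The sketch below is a minimal Python example, assuming a hypothetical `Verifier` class and treating exit code 0 as success. The stand-in command exists only so the example runs without the repository's scripts.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Verifier:
    """Hypothetical container for a verifiable task target."""

    command: list[str]           # the exact command the agent must make pass
    scope: str                   # what the agent may and may not change
    expected_exit_code: int = 0  # the success condition (an assumption)

    def passes(self) -> bool:
        # Run the command and compare its exit code to the success condition.
        result = subprocess.run(self.command, capture_output=True, text=True)
        return result.returncode == self.expected_exit_code


# Stand-in command so the sketch runs anywhere; a real task would use
# something like ["python3", "scripts/check_site_units.py"].
demo = Verifier(
    command=[sys.executable, "-c", "raise SystemExit(0)"],
    scope="do not change runtime behavior",
)
print(demo.passes())
```

A request like “Fix the tests” cannot be expressed in this shape at all, which is a quick way to spot a task that still needs narrowing.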

Key takeaways

Start with the failing command, not a vague repair request.
Require diagnosis before edits.
Keep the patch scoped to the behavior under test.
Run the same check after the fix.
Run adjacent checks when the touched code is shared.
Make the PR summary name files changed, checks run, risks, and follow-up work.
Do not let the agent merge, deploy, buy, delete, or approve production changes by default.

The repair loop

The basic loop is short:

1. Reproduce or inspect the failing check.
2. Explain the likely cause in one paragraph.
3. Patch only the owned file set.
4. Run the same check again.
5. Run adjacent checks if the patch touched shared behavior.
6. Prepare a PR-ready summary.

That loop is deliberately conservative. It keeps the agent anchored to observable behavior and makes the final review easier for a human.
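The steps above can be sketched as a small driver. This is a minimal sketch, not a real agent harness: `run_check` and `repair_loop` are hypothetical names, and the `diagnose` and `patch` callables stand in for whatever the agent actually does. The demo uses a throwaway marker file as the "bug" so it runs without a repository.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_check(command: Sequence[str]) -> tuple[int, str]:
    """Run a check and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def repair_loop(check, diagnose, patch, adjacent_checks=()):
    """Hypothetical driver for the reproduce, diagnose, patch, verify loop."""
    code, output = run_check(check)            # step 1: reproduce
    if code == 0:
        return {"status": "already passing", "checks": [(check, 0)]}
    diagnosis = diagnose(output)               # step 2: explain the likely cause
    patch(diagnosis)                           # step 3: patch the owned file set
    results = [(check, run_check(check)[0])]   # step 4: rerun the same check
    for extra in adjacent_checks:              # step 5: adjacent checks
        results.append((extra, run_check(extra)[0]))
    fixed = all(exit_code == 0 for _, exit_code in results)
    # Step 6: return the raw material for a PR-ready summary.
    return {
        "status": "fixed" if fixed else "still failing",
        "diagnosis": diagnosis,
        "checks": results,
    }


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "fixed.txt"
    # The check fails until the marker file exists.
    check = [
        sys.executable,
        "-c",
        f"import os, sys; sys.exit(0 if os.path.exists({str(marker)!r}) else 1)",
    ]
    summary = repair_loop(
        check,
        diagnose=lambda output: "marker file is missing",
        patch=lambda diagnosis: marker.write_text("ok"),
    )
print(summary["status"])
```

Returning the exit codes rather than a prose claim keeps the final summary tied to what was actually run.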

Task shape that works

Weak task	Better task
Fix CI.	Inspect the failing GitHub Actions job, identify the first failing command, and patch only the smallest file set needed.
Make tests pass.	Make `npm test -- auth.spec.ts` pass; do not rewrite unrelated tests.
Clean up the repo.	Remove the unused import introduced by this patch and run `git diff --check`.
Improve content quality.	Add a static content gate that fails on broken sources, thin public posts, missing internal links, and missing UTM metadata.

The better tasks all have a target and a boundary. A senior reviewer should be able to tell whether the agent succeeded without reading a long narrative.

Example prompt

Run the failing check and summarize the first failure.
Then make the smallest patch that fixes that failure.
Do not change unrelated formatting.
After the patch, rerun the same check and report the output summary.

For a repository with strict agent rules, add:

Read AGENTS.md first.
Protect unrelated user changes.
Use apply_patch for manual edits.
Do not stage or commit until checks pass.

The exact command depends on the project, but the structure stays stable: reproduce, diagnose, patch, verify, summarize.

Diagnosis before edits

Ask the agent for a short diagnosis before it writes. The diagnosis should name:

the failing command or file;
the expected behavior;
the actual behavior;
the likely cause;
the intended patch scope.

If the diagnosis is hand-wavy, the edit will probably be hand-wavy too. Stop and narrow the task before the agent changes files.
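One way to make that requirement checkable is to treat the diagnosis as a record with the five fields listed above. The sketch below assumes a hypothetical `Diagnosis` dataclass and a `missing_fields` helper; the field values are illustrative only.

```python
from dataclasses import dataclass, fields


@dataclass
class Diagnosis:
    """Hypothetical record mirroring the five items a diagnosis should name."""

    failing_command: str
    expected_behavior: str
    actual_behavior: str
    likely_cause: str
    patch_scope: str


def missing_fields(diagnosis: Diagnosis) -> list[str]:
    """Return the names of fields left empty or whitespace-only."""
    return [
        f.name for f in fields(diagnosis)
        if not getattr(diagnosis, f.name).strip()
    ]


draft = Diagnosis(
    failing_command="npm test -- auth.spec.ts",
    expected_behavior="the auth spec passes",
    actual_behavior="one assertion fails",
    likely_cause="",
    patch_scope="  ",
)
print(missing_fields(draft))  # ['likely_cause', 'patch_scope']
```

A non-empty result is a signal to stop and narrow the task before any file is edited. It only checks that each field was filled in, not that the reasoning is sound.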

Patch scope rules

Use these boundaries unless the maintainer says otherwise:

Rule	Rationale
Touch only files connected to the failure	Prevents opportunistic refactors.
Preserve existing style	Reduces review noise.
Remove only unused code introduced by the patch	Avoids deleting unrelated stale code.
Keep generated files out unless the task requires them	Prevents build artifacts from hiding behavior changes.
Report skipped checks	Makes residual risk visible.

The agent can suggest unrelated cleanup in the final note. It should not do it during a repair task.
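The first rule in the table can be enforced mechanically by comparing changed paths with the owned file set. The sketch below assumes the changed paths come from `git diff --name-only` and that ownership is expressed as glob patterns; `out_of_scope` and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, owned_patterns):
    """Return changed paths that match none of the owned glob patterns.

    Note: with fnmatch-style globs, "*" also matches "/" characters.
    """
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in owned_patterns)
    ]


# Example input shaped like the output of `git diff --name-only`.
diff_output = "scripts/check_site_units.py\nsrc/app/main.py\n"
violations = out_of_scope(diff_output.splitlines(), ["scripts/*.py", "tests/*"])
print(violations)  # ['src/app/main.py']
```

Any path in the result is a prompt for review, not an automatic rejection: the maintainer may have widened the scope on purpose.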

PR-ready summary

A useful PR summary is compact and evidence-backed:

Changed:
- Added deterministic validation for static canary content before build.
- Replaced two thin public briefs with aliases to stronger pillar pages.

Checks:
- python3 scripts/check_static_content_quality.py --site-id coding-agent-guide
- python3 scripts/build_static_site.py --site-id coding-agent-guide
- python3 scripts/check_site_units.py

Risk:
- Plausible and Search Console remain blocked on account access; status docs record that blocker.

This format is better than “fixed issue” because it tells the reviewer what moved and how it was verified.
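The same layout can be produced from structured data so that the checks section is never left out. This is a minimal sketch, assuming a hypothetical `format_pr_summary` helper that refuses to render a summary with no checks; the section titles follow the example above.

```python
def format_pr_summary(changed, checks, risks):
    """Render a compact PR summary; refuse to render one with no checks."""
    if not checks:
        raise ValueError("a PR summary needs at least one check that was run")
    sections = [("Changed", changed), ("Checks", checks), ("Risk", risks)]
    blocks = []
    for title, items in sections:
        lines = [f"{title}:"] + [f"- {item}" for item in items]
        blocks.append("\n".join(lines))
    # Separate the sections with a blank line, as in the example above.
    return "\n\n".join(blocks)


summary = format_pr_summary(
    changed=["Added deterministic validation for static canary content before build."],
    checks=["python3 scripts/check_site_units.py"],
    risks=["Search Console remains blocked on account access."],
)
print(summary)
```

Raising on an empty check list is a design choice made for this sketch: it makes “fixed issue” with no evidence impossible to emit by accident.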

Human gates

Keep these actions outside the agent’s default authority:

merging its own PR;
pushing to protected branches;
deploying production;
changing DNS, registrar, or billing settings;
deleting live content;
submitting forms on external sites;
approving analytics, Search Console, or advertising integrations that need account access.

The agent can prepare the patch and document the next step. Humans or CI policy should own externally visible approval.
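The gates above amount to a deny-by-default policy for a fixed set of actions. The sketch below shows one way to encode that, assuming a hypothetical `Action` enum and `agent_may_perform` helper; the action names are illustrative and not taken from any particular agent framework.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical action names, loosely following the list above."""

    EDIT_FILE = "edit_file"
    RUN_CHECK = "run_check"
    PREPARE_PR_SUMMARY = "prepare_pr_summary"
    MERGE_PR = "merge_pr"
    PUSH_PROTECTED_BRANCH = "push_protected_branch"
    DEPLOY_PRODUCTION = "deploy_production"
    CHANGE_DNS_OR_BILLING = "change_dns_or_billing"
    DELETE_LIVE_CONTENT = "delete_live_content"
    SUBMIT_EXTERNAL_FORM = "submit_external_form"
    APPROVE_INTEGRATION = "approve_integration"


# Actions that stay outside the agent's default authority.
HUMAN_GATED = frozenset({
    Action.MERGE_PR,
    Action.PUSH_PROTECTED_BRANCH,
    Action.DEPLOY_PRODUCTION,
    Action.CHANGE_DNS_OR_BILLING,
    Action.DELETE_LIVE_CONTENT,
    Action.SUBMIT_EXTERNAL_FORM,
    Action.APPROVE_INTEGRATION,
})


def agent_may_perform(action: Action, human_approved: bool = False) -> bool:
    """Allow gated actions only when a human has explicitly approved them."""
    return action not in HUMAN_GATED or human_approved


print(agent_may_perform(Action.RUN_CHECK))  # True
print(agent_may_perform(Action.MERGE_PR))   # False
```

In practice the approval signal would come from branch protection or CI policy rather than a boolean argument; the flag here only marks where that decision plugs in.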

Failure modes to catch

Failure mode	Symptom	Gate
Test expectation is stale	Agent changes product code to satisfy a bad test	Require diagnosis and product-behavior confirmation.
Fix is too broad	Diff touches unrelated modules	Enforce file ownership and review `git diff --stat`.
Check was not rerun	Summary says “should pass”	Require exact commands and results.
CI-only failure remains	Local test passes but CI matrix fails	Inspect Actions logs and environment differences.
Content gate is bypassed	Static canary builds without source/quality checks	Wire quality checks into build commands.

GitHub’s pull request and Actions documentation cover the review and CI surfaces, but the repository still has to define what counts as an acceptable agent handoff.
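As one small example of such a repository-defined gate, the “should pass” symptom from the table can be caught with a text check on the summary before a human reads it. This is a rough sketch: `hedged_lines` is a hypothetical helper and the phrase list is an assumption that a team would tune for itself.

```python
import re

# Assumed phrases that suggest a check was not actually rerun.
HEDGES = re.compile(r"\b(should pass|should work|probably|i think)\b", re.IGNORECASE)


def hedged_lines(summary: str) -> list[str]:
    """Return summary lines that claim success without evidence."""
    return [line.strip() for line in summary.splitlines() if HEDGES.search(line)]


report = (
    "Checks:\n"
    "- npm test -- auth.spec.ts should pass now\n"
    "- git diff --check: exit 0\n"
)
print(hedged_lines(report))  # ['- npm test -- auth.spec.ts should pass now']
```

A flagged line is a reason to ask for the exact command and its result. An honest note that a check was skipped is not flagged, because reporting skipped checks is part of the scope rules.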

Reader next step

Use this workflow after the setup and model-routing guide and the repository context guide. Then record the final check results in the PR or execution log.

The core CTA target remains blocked until approved. Until then, the safe reader action is internal: review the editorial note and keep production writes behind human approval.

Sources checked

GitHub pull request documentation, checked 2026-05-14, for review and PR workflow context.
GitHub Actions documentation, checked 2026-05-14, for CI verification context.
Cursor Agent documentation, checked 2026-05-14, for IDE agent workflow context.
Claude Code documentation, checked 2026-05-14, for terminal coding-agent workflow context.
OpenCode documentation, checked 2026-05-14, for terminal agent operation context.