What it does

A team-level standard for how AI coding agents are used day to day. It covers the boring-but-load-bearing parts: how repos are set up to be agent-friendly, which prompts and skills are reusable, what’s allowed without human review, and how AI-generated PRs are evaluated against the same bar as human ones — not a higher one, not a lower one.

What’s in scope

  • Agent configuration: CLAUDE.md and equivalent root-level instructions per repo, including parallel-execution rules, sensitive-file protection, branch hygiene, and commit message conventions.
  • Prompt patterns — reusable prompts for common tasks (triage, refactor planning, code review, test generation), with explicit success criteria so the agent knows when it’s done.
  • Skill libraries — composable skills for things like Conventional Commits, PR review, debugging, and TDD, plus rules for when each one fires.
  • Review gates — what an AI-generated change must satisfy before merge: passing CI, no secrets touched, no unrelated drive-by changes, commit messages that read like a human wrote them.
  • Sensitive surfaces — explicit deny lists (.env, credentials directories, customer data) wired into both the agent config and the harness so they’re enforced, not aspirational.
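
The review gates and deny list above can be sketched as one small check. This is a minimal Python sketch under stated assumptions, not the team's actual tooling: the Change fields, the DENY_PATTERNS values, and the allowed_paths idea (globs for the files a task is expected to touch) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical deny list; real patterns would live in the agent config and the harness.
DENY_PATTERNS = [
    ".env", ".env.*", "*/.env", "*/.env.*",
    "credentials/*", "*/credentials/*",
    "customer_data/*",
]


@dataclass
class Change:
    """Minimal description of a proposed change (all fields are illustrative)."""
    ci_passed: bool
    changed_files: list[str]
    allowed_paths: list[str]  # glob patterns for files the task is expected to touch


def touches_sensitive(path: str) -> bool:
    """True if the path matches any deny-list pattern."""
    return any(fnmatchcase(path, pattern) for pattern in DENY_PATTERNS)


def gate_failures(change: Change) -> list[str]:
    """Return one human-readable reason per failed gate; an empty list means pass."""
    failures = []
    if not change.ci_passed:
        failures.append("CI is not passing")
    for path in change.changed_files:
        if touches_sensitive(path):
            failures.append(f"sensitive file touched: {path}")
        elif not any(fnmatchcase(path, p) for p in change.allowed_paths):
            failures.append(f"unrelated change: {path}")
    return failures
```

Returning a list of reasons rather than a single boolean lets a CI job post every failed gate on the PR at once, so the author (human or agent) can fix them in one pass.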

Why it exists

Without standards, every AI-assisted PR is a coin flip: sometimes excellent, sometimes a sprawling mess that touches twelve files for a one-line fix. The goal is to make AI output predictable enough that reviewers can trust the process rather than having to scrutinize every diff line by line. The standards aim at:

  • Atomic, reviewable commits instead of “AI did a thing” mega-PRs.
  • Plain commit messages with no AI tells, em dashes, or marketing prose.
  • Stable repo conventions so the same prompt produces broadly the same shape of change next month.
  • Safety by default: destructive actions, force pushes, and secret access require explicit human approval, enforced by the harness rather than requested politely in a prompt.
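
The commit-level aims above lend themselves to an automated lint. The sketch below is illustrative only: the type list follows the common Conventional Commits / Angular-style convention, while BANNED_PHRASES, the 72-character header limit, and MAX_FILES_PER_COMMIT are assumed values that a team would choose for itself.

```python
import re

# Conventional Commits-style header: type, optional scope, optional "!", then a description.
# Only "feat" and "fix" are named by the spec itself; the other types are a common convention.
HEADER_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w./-]+\))?!?: \S.*$"
)

# Hypothetical examples of phrasing a team might treat as AI tells or marketing prose.
BANNED_PHRASES = ("generated with", "as an ai", "seamlessly", "robust and scalable")

MAX_FILES_PER_COMMIT = 5  # assumed threshold for "atomic"; tune per repo


def lint_commit(message: str, files_changed: int) -> list[str]:
    """Return a list of problems with a commit; an empty list means it passes."""
    problems = []
    lines = message.splitlines()
    header = lines[0] if lines else ""
    if not HEADER_RE.match(header):
        problems.append("header is not in Conventional Commits form")
    if len(header) > 72:
        problems.append("header is longer than 72 characters")
    if "\u2014" in message:  # U+2014 is the em dash
        problems.append("message contains an em dash")
    lowered = message.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"message contains banned phrase: {phrase!r}")
    if files_changed > MAX_FILES_PER_COMMIT:
        problems.append(
            f"commit touches {files_changed} files (limit {MAX_FILES_PER_COMMIT})"
        )
    return problems
```

For example, lint_commit("fix(parser): handle empty input", 1) returns an empty list, while a vague header on a twelve-file commit comes back with one entry per violated rule.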

Tech rationale

  • Claude Code as the primary agent harness — strong tool ergonomics, scriptable skills, and a hook system that lets us enforce policy at the harness layer instead of trusting the model to remember.
  • OpenAI for adjacent tooling — embeddings, evaluation, and lighter inline assists where it’s a better fit.
  • GitHub as the integration point — PRs, reviews, and CI are where the standards bite. Anything below the PR layer is process; the PR is where evidence shows up.
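
To make "enforce policy at the harness layer" concrete, here is a minimal sketch of a pre-tool-call hook script. It assumes the harness hands the hook a JSON event with tool_name and tool_input fields and treats exit code 2 as "block this call"; both assumptions should be checked against the harness documentation before use. The BLOCKED_COMMANDS patterns are hypothetical and deliberately incomplete.

```python
import json
import re
import sys

# Illustrative patterns for shell commands that should need human approval.
BLOCKED_COMMANDS = [
    re.compile(r"\bgit\s+push\b.*(\s--force\b|\s-f\b)"),  # force pushes
    re.compile(r"\bgit\s+reset\s+--hard\b"),              # discards local work
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),           # rm -rf / rm -fr
]


def decide(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool-call event."""
    # Assumed event shape: {"tool_name": "...", "tool_input": {...}}
    if event.get("tool_name") != "Bash":
        return True, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_COMMANDS:
        if pattern.search(command):
            return False, f"blocked pending human approval: {command}"
    return True, ""


def run(raw_event: str) -> int:
    """Parse one JSON event and return the exit code the hook should use."""
    allowed, reason = decide(json.loads(raw_event))
    if not allowed:
        print(reason, file=sys.stderr)
        return 2  # assumed "block" exit code
    return 0
```

When wired in as a hook, the script's entry point would be sys.exit(run(sys.stdin.read())). Keeping decide() free of I/O means the policy itself can be unit-tested without a running agent.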

What I focus on

  • Writing the CLAUDE.md / agent rule layer so the right behaviors are the path of least resistance.
  • Designing review gates that catch AI-specific failure modes (over-eager refactors, fabricated APIs, sycophantic agreement) without blocking the wins.
  • Keeping the standards short enough to actually be read.