Consensus Tools: Stateful Ephemeral Streams for CI, PR Quorum, and Human + AI Review

A simple way to align multiple agents and humans on risky pull requests with open event logs, weighted voting, and stake-based incentives.

2/15/2026

Most teams have this problem now.

AI can write code fast. Humans still own risk.

So who decides if a risky PR should merge?

We built consensus-tools for that exact gap.

The core idea

We run a stateful ephemeral stream for each decision.

Think of it like a short-lived war room with memory.

  • It is stateful, so every step has context.
  • It is ephemeral, so it focuses on one job and then closes.
  • It is open-ledger, so every event is recorded.

Nothing is hidden.

You can replay the full decision path later.
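
Here is a minimal sketch of that idea in Python. It is illustrative only; the class and field names are assumptions, not the consensus-tools API.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # Hypothetical decision stream: stateful while open, closed after one job,
    # and fully replayable from its event list.
    @dataclass
    class DecisionStream:
        subject: str                                # e.g. "PR #1234"
        events: list = field(default_factory=list)
        closed: bool = False

        def append(self, kind: str, actor: str, payload: dict) -> None:
            # Stateful: every step lands in the same shared context.
            if self.closed:
                raise RuntimeError("stream already closed; it handled one job")
            self.events.append({
                "at": datetime.now(timezone.utc).isoformat(),
                "kind": kind,
                "actor": actor,
                "payload": payload,
            })

        def close(self) -> None:
            # Ephemeral: one decision, then done.
            self.closed = True

        def replay(self):
            # Open ledger: walk the full decision path, in order.
            return iter(self.events)

A CI run would open one stream per risky PR, append events as agents and humans act, then close it once the decision resolves.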

Why this matters for CI and PRs

Most CI pipelines answer one question: "Did tests pass?"

That is not enough for high-risk changes.

Some diffs need policy checks, threat checks, and a human quorum.

With consensus-tools, CI can do this flow:

  1. Classify the diff (low, medium, high risk)
  2. If low risk, continue normal auto-merge flow
  3. If high risk, open a consensus job
  4. Ask multiple agents to submit independent analysis
  5. Add required humans to the same decision stream
  6. Run weighted quorum vote
  7. Merge only if threshold is met
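
Here is a rough sketch of that gate. Every function and attribute name is an assumption about how the pieces could fit together, not the real consensus-tools API.

    # Hypothetical CI gate; the collaborators are passed in so nothing here
    # pretends to be a real consensus-tools call.
    def gate_pull_request(pr, classify_diff, open_consensus_job) -> bool:
        risk = classify_diff(pr.diff)                 # 1. "low" | "medium" | "high"
        if risk == "low":
            return True                               # 2. keep the normal auto-merge flow

        job = open_consensus_job(pr, risk=risk)       # 3. open a consensus job for this PR
        for agent in job.required_agents:             # 4. independent agent analyses
            job.submit(agent.analyze(pr.diff))
        job.request_humans(job.required_reviewers)    # 5. humans join the same decision stream

        result = job.run_weighted_vote()              # 6. weighted quorum vote
        return result.weight >= job.threshold         # 7. merge only if the threshold is met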

Now CI is not just pass or fail.

It becomes a governance layer for risky code.

Open ledger: every event, every vote, every reason

Each run stores events like:

  • job created
  • claim accepted
  • submission posted
  • vote cast
  • decision resolved
  • payout and stake outcomes
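
For illustration, a replayed ledger for one high-risk PR might look like this. The field names and values are assumptions, not the actual storage schema.

    # Hypothetical ledger snapshot for one high-risk PR (field names are assumptions).
    ledger = [
        {"kind": "job_created", "actor": "ci", "payload": {"pr": 1234, "risk": "high"}},
        {"kind": "claim_accepted", "actor": "agent:sec-1", "payload": {"stake": 50}},
        {"kind": "submission_posted", "actor": "agent:sec-1",
         "payload": {"finding": "widens token scope", "confidence": 0.82}},
        {"kind": "vote_cast", "actor": "human:alice",
         "payload": {"vote": "reject", "rationale": "needs a scoped token"}},
        {"kind": "decision_resolved", "actor": "quorum", "payload": {"outcome": "reject"}},
        {"kind": "stake_settled", "actor": "ledger", "payload": {"agent:sec-1": +50}},
    ]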

This gives teams an audit trail they can trust.

No black box merge.

No "I think the bot said it was fine".

You can inspect exactly why a PR passed or failed.

Weighted quorum with stake and reputation

Not all votes should count the same.

A senior security reviewer should not have the same weight as a new anonymous account.

consensus-tools supports weighted decisions using:

  • role or reviewer class
  • historical quality
  • stake at risk
  • reputation over time
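
As a rough sketch, a weighted tally could combine those signals like this. The formula is an assumption; the real weighting is something each team would tune.

    # Hypothetical weighted quorum tally; the weighting formula is an assumption.
    def vote_weight(voter) -> float:
        return (
            voter.role_weight          # role or reviewer class
            * voter.quality_score      # historical quality of past reviews
            * (1 + voter.stake / 100)  # stake at risk raises weight
            * voter.reputation         # reputation accumulated over time
        )

    def quorum_reached(votes, threshold: float) -> bool:
        approve = sum(vote_weight(v) for v in votes if v.choice == "approve")
        total = sum(vote_weight(v) for v in votes)
        return total > 0 and approve / total >= threshold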

This creates healthy pressure:

  • good reviewers earn trust
  • low-quality behavior gets expensive
  • spam voting is discouraged

Human in the loop for classified diffs

Some diffs should never auto-merge.

Examples:

  • auth and permissions
  • payment paths
  • infrastructure secrets
  • data deletion logic
  • compliance-sensitive modules

For these, you can enforce:

  • minimum human quorum
  • minimum confidence threshold
  • mandatory rationale text
  • required stakeholder groups
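
One way to express those rules is a small per-path policy table. This is a sketch; the keys and structure are assumptions, not the real configuration format.

    # Hypothetical policy table for diffs that must never auto-merge.
    SENSITIVE_POLICIES = {
        "auth/": {"min_humans": 2, "min_confidence": 0.90,
                  "rationale_required": True, "required_groups": {"security"}},
        "billing/": {"min_humans": 2, "min_confidence": 0.90,
                     "rationale_required": True, "required_groups": {"payments"}},
        "infra/secrets/": {"min_humans": 2, "min_confidence": 0.95,
                           "rationale_required": True, "required_groups": {"platform", "security"}},
    }

    def enforce(policy, decision) -> bool:
        # Block the merge unless every human-in-the-loop requirement is met.
        return (
            decision.human_approvals >= policy["min_humans"]
            and decision.confidence >= policy["min_confidence"]
            and (decision.rationale or not policy["rationale_required"])
            and policy["required_groups"] <= decision.groups_represented
        )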

AI helps with speed.

Humans keep the final accountability.

What makes this growth-ready (not just safe)

Safety tools usually die because they slow teams down.

We focused on speed and adoption loops.

1) Drop-in CI trigger

You can call consensus-tools from existing PR workflows.

No big migration.

2) Public, inspectable decision logs

Decision transparency builds trust fast.

Trust drives usage across more repos.

3) Incentive model

Contributors can earn by being right.

That brings repeat participation.
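
A minimal sketch of that loop, assuming a simple stake-and-settle model (the actual payout rules may differ):

    # Hypothetical stake settlement: reviewers who matched the final outcome get a
    # share of the slashed stakes; reviewers who missed lose part of theirs.
    SLASH_RATE = 0.5  # assumed fraction of stake lost on a wrong call

    def settle(votes, outcome) -> dict:
        right = [v for v in votes if v.choice == outcome]
        wrong = [v for v in votes if v.choice != outcome]
        pool = sum(v.stake * SLASH_RATE for v in wrong)
        payouts = {v.voter: -v.stake * SLASH_RATE for v in wrong}
        total_right_stake = sum(v.stake for v in right) or 1
        for v in right:
            payouts[v.voter] = pool * (v.stake / total_right_stake)
        return payouts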

4) Human + AI together

Teams do not need to pick one side.

They can use both in one pipeline.

Practical rollout plan

If you want to test this with low risk:

  • Start with one repo
  • Gate only high-risk labels first
  • Require 1 human + 2 agent approvals
  • Track false positives and review latency
  • Tune vote weights weekly
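
A pilot setup can stay small. Something like this sketch, where every key is an assumption rather than a real setting:

    # Hypothetical pilot configuration for a single repo.
    PILOT_CONFIG = {
        "repos": ["core-api"],                 # start with one repo
        "gate_labels": ["high-risk"],          # gate only high-risk labels first
        "quorum": {"humans": 1, "agents": 2},  # require 1 human + 2 agent approvals
        "metrics": ["false_positives", "review_latency_minutes"],
        "weight_review_cadence": "weekly",     # tune vote weights weekly
    }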

After 2-4 weeks, expand to more repos.

The point

The real unlock is simple.

Do not replace humans.

Do not ignore AI.

Create a decision system where both can work together, with clear incentives and a full event ledger.

That is what consensus-tools is for.

If your team is shipping fast and risk is going up, this is the missing control plane.