Most teams have this problem now.
AI can write code fast. Humans still own the risk.
So who decides whether a risky PR should merge?
We built consensus-tools for that exact gap.
The core idea
We run a stateful ephemeral stream for each decision.
Think of it like a short-lived war room with memory.
- It is stateful, so every step has context.
- It is ephemeral, so it focuses on one job and then closes.
- It is open-ledger, so every event is recorded.
Nothing is hidden.
You can replay the full decision path later.
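As a rough sketch only (the class and method names are hypothetical, not the actual consensus-tools API), a decision stream is essentially an append-only event log with a short lifecycle:

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class DecisionStream:
    """A short-lived, stateful decision stream backed by an append-only ledger."""
    decision_id: str
    events: list = field(default_factory=list)  # open ledger: nothing is hidden
    closed: bool = False

    def append(self, kind: str, **payload):
        """Record one event; state accumulates, so every step has context."""
        if self.closed:
            raise RuntimeError("stream is ephemeral: it closed after its one job")
        event = {"ts": time.time(), "kind": kind, **payload}
        self.events.append(event)
        return event

    def close(self, outcome: str):
        """Finish the single decision this stream exists for, then freeze it."""
        self.append("decision_resolved", outcome=outcome)
        self.closed = True

    def replay(self):
        """Replay the full decision path, event by event."""
        for event in self.events:
            print(json.dumps(event, sort_keys=True))


# One stream per decision, e.g. one per risky PR.
stream = DecisionStream(decision_id="pr-gate-example")
stream.append("job_created", repo="acme/api", risk="high")
stream.append("vote_cast", voter="alice", vote="approve", weight=2.0)
stream.close(outcome="merge")
stream.replay()
```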
Why this matters for CI and PRs
Most CI pipelines answer one question: "Did tests pass?"
That is not enough for high-risk changes.
Some diffs need policy checks, threat checks, and a human quorum.
With consensus-tools, CI can do this flow:
- Classify the diff (low, medium, high risk)
- If low risk, continue normal auto-merge flow
- If high risk, open a consensus job
- Ask multiple agents to submit independent analysis
- Add required humans to the same decision stream
- Run weighted quorum vote
- Merge only if threshold is met
Now CI is not just pass or fail.
It becomes a governance layer for risky code.
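Here is a sketch of that gate as CI glue. The consensus-tools entry points (open_consensus_job, await_quorum) are hypothetical placeholders; only the routing logic is the point:

```python
# Sketch of a CI gate. The consensus-tools entry points (open_consensus_job,
# await_quorum) are hypothetical placeholders; only the routing logic is shown.

HIGH_RISK_PATHS = ("auth/", "payments/", "infra/secrets/", "migrations/")


def classify_diff(changed_files: list[str]) -> str:
    """Rough risk classification based on which paths a diff touches."""
    if any(f.startswith(HIGH_RISK_PATHS) for f in changed_files):
        return "high"
    return "medium" if len(changed_files) > 20 else "low"


def gate_pull_request(pr_number: int, changed_files: list[str],
                      open_consensus_job, await_quorum) -> bool:
    """Return True if the PR may merge."""
    risk = classify_diff(changed_files)
    if risk == "low":
        return True  # stay on the normal auto-merge path

    # Medium/high risk: open a consensus job on a fresh decision stream.
    job_id = open_consensus_job(pr=pr_number, risk=risk,
                                agents=3,       # independent agent analyses
                                min_humans=1)   # humans join the same stream
    decision = await_quorum(job_id)             # weighted quorum vote
    return decision["approved"]                 # merge only if the threshold was met
```

The exit is the same as any other check: approve and the normal merge flow continues, reject and the PR stalls with its full reasoning in the ledger.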
Open ledger: every event, every vote, every reason
Each run stores events like:
- job created
- claim accepted
- submission posted
- vote cast
- decision resolved
- payout and stake outcomes settled
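As a sketch, with illustrative field names rather than the real ledger schema, each of these is just a structured record you can query later:

```python
# Illustrative ledger records; the real schema may differ.
ledger = [
    {"kind": "job_created",       "pr": 1234, "risk": "high"},
    {"kind": "claim_accepted",    "agent": "analyzer-1"},
    {"kind": "submission_posted", "agent": "analyzer-1", "finding": "touches auth middleware"},
    {"kind": "vote_cast",         "voter": "alice", "vote": "approve", "weight": 2.0,
     "reason": "tests cover the new permission check"},
    {"kind": "decision_resolved", "outcome": "merge", "total_weight": 3.5, "threshold": 3.0},
    {"kind": "payout_settled",    "voter": "alice", "stake_returned": True, "reward": 1.0},
]

# Because every event is stored, "why did this merge?" is a query, not a memory:
votes = [e for e in ledger if e["kind"] == "vote_cast"]
```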
This gives teams an audit trail they can trust.
No black box merge.
No "I think the bot said it was fine".
You can inspect exactly why a PR passed or failed.
Weighted quorum with stake and reputation
Not all votes should count the same.
A senior security reviewer should not have the same weight as a new anonymous account.
consensus-tools supports weighted decisions using:
- role or reviewer class
- historical quality
- stake at risk
- reputation over time
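As a minimal sketch of how those inputs could combine (this exact formula is illustrative, not the weighting consensus-tools ships with):

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    approve: bool
    role_weight: float   # role or reviewer class
    quality: float       # historical quality, 0..1
    stake: float         # stake at risk
    reputation: float    # reputation over time, 0..1


def effective_weight(v: Vote) -> float:
    """Blend the inputs into one vote weight. Purely illustrative."""
    return (v.role_weight
            * (0.5 + 0.5 * v.quality)
            * (1.0 + v.stake)
            * (0.5 + 0.5 * v.reputation))


def weighted_quorum(votes: list[Vote], threshold: float = 0.66) -> bool:
    """Approve only if the approving share of total weight clears the threshold."""
    total = sum(effective_weight(v) for v in votes)
    approving = sum(effective_weight(v) for v in votes if v.approve)
    return total > 0 and approving / total >= threshold


votes = [
    Vote("senior-sec-reviewer", True,  role_weight=3.0, quality=0.9, stake=2.0, reputation=0.9),
    Vote("new-anon-account",    False, role_weight=1.0, quality=0.5, stake=0.0, reputation=0.1),
]
print(weighted_quorum(votes))  # True: the senior reviewer's approval carries far more weight
```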
This creates healthy pressure:
- good reviewers earn trust
- low-quality behavior gets expensive
- spam voting is discouraged
Human in the loop for high-risk diffs
Some diffs should never auto-merge.
Examples:
- auth and permissions
- payment paths
- infrastructure secrets
- data deletion logic
- compliance-sensitive modules
For these, you can enforce:
- minimum human quorum
- minimum confidence threshold
- mandatory rationale text
- required stakeholder groups
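A sketch of one way to express and check such a policy; the field names are made up for illustration:

```python
# Hypothetical policy check for sensitive paths; names and fields are illustrative.
SENSITIVE_POLICY = {
    "paths": ("auth/", "payments/", "infra/secrets/", "compliance/"),
    "min_human_approvals": 2,
    "min_confidence": 0.8,
    "require_rationale": True,
    "required_groups": {"security", "platform"},
}


def may_auto_merge(diff_paths, approvals, policy=SENSITIVE_POLICY) -> bool:
    """approvals: list of dicts like
    {"human": True, "confidence": 0.9, "rationale": "...", "group": "security"}."""
    sensitive = any(p.startswith(policy["paths"]) for p in diff_paths)
    if not sensitive:
        return True  # normal flow

    humans = [a for a in approvals if a["human"]]
    if len(humans) < policy["min_human_approvals"]:
        return False
    if any(a["confidence"] < policy["min_confidence"] for a in humans):
        return False
    if policy["require_rationale"] and any(not a.get("rationale") for a in humans):
        return False
    groups = {a["group"] for a in humans}
    return policy["required_groups"].issubset(groups)
```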
AI helps with speed.
Humans keep the final accountability.
What makes this growth-ready (not just safe)
Safety tools usually die because they slow teams down.
We focused on speed and adoption loops.
1) Drop-in CI trigger
You can call consensus-tools from existing PR workflows.
No big migration.
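For example, the gate can be one small script your existing PR workflow runs as an extra step. The endpoint, payload, and environment variables below are placeholders for whatever your consensus-tools deployment actually exposes:

```python
# ci_gate.py - a thin step added to an existing PR workflow.
# The endpoint, payload shape, and env vars are placeholders, not a documented API.
import json
import os
import sys
import urllib.request


def main() -> int:
    payload = {
        "repo": os.environ.get("REPO", ""),
        "pr": os.environ.get("PR_NUMBER", ""),
        "risk_label": os.environ.get("RISK_LABEL", "low"),
    }
    if payload["risk_label"] == "low":
        return 0  # nothing to gate, keep the normal flow

    req = urllib.request.Request(
        os.environ["CONSENSUS_URL"],  # wherever your consensus-tools instance listens
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        decision = json.load(resp)
    return 0 if decision.get("approved") else 1  # non-zero exit blocks the merge


if __name__ == "__main__":
    sys.exit(main())
```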
2) Public, inspectable decision logs
Decision transparency builds trust fast.
Trust drives usage across more repos.
3) Incentive model
Contributors can earn by being right.
That brings repeat participation.
4) Human + AI together
Teams do not need to pick one side.
They can use both in one pipeline.
Practical rollout plan
If you want to test this with low risk:
- Start with one repo
- Gate only high-risk labels first
- Require 1 human + 2 agent approvals
- Track false positives and review latency
- Tune vote weights weekly
After 2-4 weeks, expand to more repos.
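To make the pilot concrete, the plan maps to a handful of settings you can keep in one place (names and values here are only a starting point):

```python
# Pilot settings for the first repo; tune weekly based on the tracked metrics.
PILOT = {
    "repos": ["acme/api"],            # start with one repo
    "gate_labels": ["high-risk"],     # gate only high-risk labels first
    "min_human_approvals": 1,         # 1 human ...
    "min_agent_approvals": 2,         # ... + 2 agent approvals
    "metrics": ["false_positive_rate", "review_latency_p50"],
    "weight_review_cadence_days": 7,  # tune vote weights weekly
}
```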
The point
The real unlock is simple.
Do not replace humans.
Do not ignore AI.
Create a decision system where both can work together, with clear incentives and a full event ledger.
That is what consensus-tools is for.
If your team is shipping fast and risk is going up, this is the missing control plane.