Most teams have this problem now.
AI can write code fast. Humans still own the risk.
So who decides whether a risky PR should merge?
We built consensus-tools for that exact gap.
The core idea
We run a stateful ephemeral stream for each decision.
Think of it like a short-lived war room with memory.
- It is stateful, so every step has context.
- It is ephemeral, so it focuses on one job and then closes.
- It is open-ledger, so every event is recorded.
Nothing is hidden.
You can replay the full decision path later.
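As a rough sketch only (the class and method names are hypothetical, not the actual consensus-tools API), a decision stream is essentially an append-only event log with a short lifecycle:

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class DecisionStream:
    """A short-lived, stateful decision stream backed by an append-only ledger."""
    decision_id: str
    events: list = field(default_factory=list)  # open ledger: nothing is hidden
    closed: bool = False

    def append(self, kind: str, **payload):
        """Record one event; state accumulates, so every step has context."""
        if self.closed:
            raise RuntimeError("stream is ephemeral: it closed after its one job")
        event = {"ts": time.time(), "kind": kind, **payload}
        self.events.append(event)
        return event

    def close(self, outcome: str):
        """Finish the single decision this stream exists for, then freeze it."""
        self.append("decision_resolved", outcome=outcome)
        self.closed = True

    def replay(self):
        """Replay the full decision path, event by event."""
        for event in self.events:
            print(json.dumps(event, sort_keys=True))


# One stream per decision, e.g. one per risky PR.
stream = DecisionStream(decision_id="pr-gate-example")
stream.append("job_created", repo="acme/api", risk="high")
stream.append("vote_cast", voter="alice", vote="approve", weight=2.0)
stream.close(outcome="merge")
stream.replay()
```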
Why this matters for CI and PRs
Most CI pipelines answer one question: "Did tests pass?"
That is not enough for high-risk changes.
Some diffs need policy checks, threat checks, and a human quorum.
With consensus-tools, CI can do this flow:
- Classify the diff (low, medium, high risk)
- If low risk, continue normal auto-merge flow
- If high risk, open a consensus job
- Ask multiple agents to submit independent analysis
- Add required humans to the same decision stream
- Run weighted quorum vote
- Merge only if threshold is met
Now CI is not just pass or fail.
It becomes a governance layer for risky code.
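Here is a sketch of that gate as CI glue. The consensus-tools entry points (open_consensus_job, await_quorum) are hypothetical placeholders; only the routing logic is the point:

```python
# Sketch of a CI gate. The consensus-tools entry points (open_consensus_job,
# await_quorum) are hypothetical placeholders; only the routing logic is shown.

HIGH_RISK_PATHS = ("auth/", "payments/", "infra/secrets/", "migrations/")


def classify_diff(changed_files: list[str]) -> str:
    """Rough risk classification based on which paths a diff touches."""
    if any(f.startswith(HIGH_RISK_PATHS) for f in changed_files):
        return "high"
    return "medium" if len(changed_files) > 20 else "low"


def gate_pull_request(pr_number: int, changed_files: list[str],
                      open_consensus_job, await_quorum) -> bool:
    """Return True if the PR may merge."""
    risk = classify_diff(changed_files)
    if risk == "low":
        return True  # stay on the normal auto-merge path

    # Medium/high risk: open a consensus job on a fresh decision stream.
    job_id = open_consensus_job(pr=pr_number, risk=risk,
                                agents=3,       # independent agent analyses
                                min_humans=1)   # humans join the same stream
    decision = await_quorum(job_id)             # weighted quorum vote
    return decision["approved"]                 # merge only if the threshold was met
```

The exit is the same as any other check: approve and the normal merge flow continues, reject and the PR stalls with its full reasoning in the ledger.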
Open ledger: every event, every vote, every reason
Each run stores events like:
- job created
- claim accepted
- submission posted
- vote cast
- decision resolved
- payout and stake outcomes settled
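As a sketch, with illustrative field names rather than the real ledger schema, each of these is just a structured record you can query later:

```python
# Illustrative ledger records; the real schema may differ.
ledger = [
    {"kind": "job_created",       "pr": 1234, "risk": "high"},
    {"kind": "claim_accepted",    "agent": "analyzer-1"},
    {"kind": "submission_posted", "agent": "analyzer-1", "finding": "touches auth middleware"},
    {"kind": "vote_cast",         "voter": "alice", "vote": "approve", "weight": 2.0,
     "reason": "tests cover the new permission check"},
    {"kind": "decision_resolved", "outcome": "merge", "total_weight": 3.5, "threshold": 3.0},
    {"kind": "payout_settled",    "voter": "alice", "stake_returned": True, "reward": 1.0},
]

# Because every event is stored, "why did this merge?" is a query, not a memory:
votes = [e for e in ledger if e["kind"] == "vote_cast"]
```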
This gives teams an audit trail they can trust.
No black box merge.
No "I think the bot said it was fine".
You can inspect exactly why a PR passed or failed.
Weighted quorum with stake and reputation
Not all votes should count the same.
A senior security reviewer should not have the same weight as a new anonymous account.
consensus-tools supports weighted decisions using:
- role or reviewer class
- historical quality
- stake at risk
- reputation over time
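As a minimal sketch of how those inputs could combine (this exact formula is illustrative, not the weighting consensus-tools ships with):

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    approve: bool
    role_weight: float   # role or reviewer class
    quality: float       # historical quality, 0..1
    stake: float         # stake at risk
    reputation: float    # reputation over time, 0..1


def effective_weight(v: Vote) -> float:
    """Blend the inputs into one vote weight. Purely illustrative."""
    return (v.role_weight
            * (0.5 + 0.5 * v.quality)
            * (1.0 + v.stake)
            * (0.5 + 0.5 * v.reputation))


def weighted_quorum(votes: list[Vote], threshold: float = 0.66) -> bool:
    """Approve only if the approving share of total weight clears the threshold."""
    total = sum(effective_weight(v) for v in votes)
    approving = sum(effective_weight(v) for v in votes if v.approve)
    return total > 0 and approving / total >= threshold


votes = [
    Vote("senior-sec-reviewer", True,  role_weight=3.0, quality=0.9, stake=2.0, reputation=0.9),
    Vote("new-anon-account",    False, role_weight=1.0, quality=0.5, stake=0.0, reputation=0.1),
]
print(weighted_quorum(votes))  # True: the senior reviewer's approval carries far more weight
```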
This creates healthy pressure:
- good reviewers earn trust
- low-quality behavior gets expensive
- spam voting is discouraged
Human in the loop for high-risk diffs
Some diffs should never auto-merge.
Examples:
- auth and permissions
- payment paths
- infrastructure secrets
- data deletion logic
- compliance-sensitive modules
For these, you can enforce:
- minimum human quorum
- minimum confidence threshold
- mandatory rationale text
- required stakeholder groups
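A sketch of one way to express and check such a policy; the field names are made up for illustration:

```python
# Hypothetical policy check for sensitive paths; names and fields are illustrative.
SENSITIVE_POLICY = {
    "paths": ("auth/", "payments/", "infra/secrets/", "compliance/"),
    "min_human_approvals": 2,
    "min_confidence": 0.8,
    "require_rationale": True,
    "required_groups": {"security", "platform"},
}


def may_auto_merge(diff_paths, approvals, policy=SENSITIVE_POLICY) -> bool:
    """approvals: list of dicts like
    {"human": True, "confidence": 0.9, "rationale": "...", "group": "security"}."""
    sensitive = any(p.startswith(policy["paths"]) for p in diff_paths)
    if not sensitive:
        return True  # normal flow

    humans = [a for a in approvals if a["human"]]
    if len(humans) < policy["min_human_approvals"]:
        return False
    if any(a["confidence"] < policy["min_confidence"] for a in humans):
        return False
    if policy["require_rationale"] and any(not a.get("rationale") for a in humans):
        return False
    groups = {a["group"] for a in humans}
    return policy["required_groups"].issubset(groups)
```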
AI helps with speed.
Humans keep the final accountability.
What makes this growth-ready (not just safe)
Safety tools usually die because they slow teams down.
We focused on speed and adoption loops.
1) Drop-in CI trigger
You can call consensus-tools from existing PR workflows.
No big migration.
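For example, the gate can be one small script your existing PR workflow runs as an extra step. The endpoint, payload, and environment variables below are placeholders for whatever your consensus-tools deployment actually exposes:

```python
# ci_gate.py - a thin step added to an existing PR workflow.
# The endpoint, payload shape, and env vars are placeholders, not a documented API.
import json
import os
import sys
import urllib.request


def main() -> int:
    payload = {
        "repo": os.environ.get("REPO", ""),
        "pr": os.environ.get("PR_NUMBER", ""),
        "risk_label": os.environ.get("RISK_LABEL", "low"),
    }
    if payload["risk_label"] == "low":
        return 0  # nothing to gate, keep the normal flow

    req = urllib.request.Request(
        os.environ["CONSENSUS_URL"],  # wherever your consensus-tools instance listens
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        decision = json.load(resp)
    return 0 if decision.get("approved") else 1  # non-zero exit blocks the merge


if __name__ == "__main__":
    sys.exit(main())
```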
2) Public, inspectable decision logs
Decision transparency builds trust fast.
Trust drives usage across more repos.
3) Incentive model
Contributors can earn by being right.
That brings repeat participation.
4) Human + AI together
Teams do not need to pick one side.
They can use both in one pipeline.
Practical rollout plan
If you want to test this with low risk:
- Start with one repo
- Gate only high-risk labels first
- Require 1 human + 2 agent approvals
- Track false positives and review latency
- Tune vote weights weekly
After 2-4 weeks, expand to more repos.
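To make the pilot concrete, the plan maps to a handful of settings you can keep in one place (names and values here are only a starting point):

```python
# Pilot settings for the first repo; tune weekly based on the tracked metrics.
PILOT = {
    "repos": ["acme/api"],            # start with one repo
    "gate_labels": ["high-risk"],     # gate only high-risk labels first
    "min_human_approvals": 1,         # 1 human ...
    "min_agent_approvals": 2,         # ... + 2 agent approvals
    "metrics": ["false_positive_rate", "review_latency_p50"],
    "weight_review_cadence_days": 7,  # tune vote weights weekly
}
```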
The point
The real unlock is simple.
Do not replace humans.
Do not ignore AI.
Create a decision system where both can work together, with clear incentives and a full event ledger.
That is what consensus-tools is for.
If your team is shipping fast and risk is going up, this is the missing control plane.