Open framework · Community-built · Free

How risky is your AI system? Get one defensible number.

AITBM (the AI Trust Benchmarking and Maturity Framework) replaces subjective "low / medium / high" judgments with a repeatable method: granular scoring rubrics, a three-layer architecture, and an explicit scoring formula. You get a single 0–10 risk score, with the dimensional detail behind it still visible.


0–10: single risk score (ERS)
3: assessment layers
21: checks across 5 areas
0: assessor guesswork

AITBM IN 60 SECONDS

The short version

What it is

A repeatable way to measure how risky an AI system is — built so two assessors reach the same score.

What you get

One 0–10 Effective Risk Score, plus the five-dimension profile that explains it.

Who it's for

Security assessors, compliance & risk teams, and engineers shipping LLM and agentic systems.

What it costs

Nothing. Open, community-built, and licensed CC BY 4.0 — no paid tooling required.

HOW IT WORKS

Three lenses, combined into one score

AITBM looks at a system through three independent lenses, then composes them into a single ERS. Each lens answers a different question — and a strong result in one cannot hide a weak result in another.

LAYER 1 · THE SYSTEM

IVP

Intrinsic Vulnerability Profile

Asks: how strong is the system itself?

21 checks across 5 areas (Robustness, Fairness, Transparency, Privacy, Containment), each scored 0–4 on a fixed rubric.
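A minimal Python sketch of what a fixed 0–4 rubric makes possible: every score can be validated mechanically before anything is aggregated. It assumes scores arrive as a mapping from each of the five areas to that area's check scores. The helper name `ivp_profile`, the 5 + 4 + 4 + 4 + 4 split of the 21 checks, the example scores, and the choice to summarize each area as a mean normalized to 0.0–1.0 are all hypothetical, not AITBM's published aggregation.

```python
# Hypothetical helper: validates IVP check scores and summarizes each area.
AREAS = ("Robustness", "Fairness", "Transparency", "Privacy", "Containment")
TOTAL_CHECKS = 21            # number of IVP checks stated on this page
MIN_SCORE, MAX_SCORE = 0, 4  # fixed rubric range stated on this page


def ivp_profile(scores: dict[str, list[int]]) -> dict[str, float]:
    """Validate raw 0-4 check scores and return a per-area summary.

    ASSUMPTION: each area is summarized as its mean check score divided
    by MAX_SCORE (range 0.0-1.0). Illustrative only, not AITBM's method.
    """
    if set(scores) != set(AREAS):
        raise ValueError(f"expected exactly these areas: {AREAS}")

    n_checks = sum(len(area_scores) for area_scores in scores.values())
    if n_checks != TOTAL_CHECKS:
        raise ValueError(f"expected {TOTAL_CHECKS} checks, got {n_checks}")

    profile: dict[str, float] = {}
    for area in AREAS:
        area_scores = scores[area]
        if not area_scores:
            raise ValueError(f"{area} has no checks")
        for s in area_scores:
            # Reject anything that is not an integer on the 0-4 rubric
            # (bool is excluded explicitly because it subclasses int).
            is_int = isinstance(s, int) and not isinstance(s, bool)
            if not is_int or not MIN_SCORE <= s <= MAX_SCORE:
                raise ValueError(f"{area}: score {s!r} is not an integer in 0-4")
        profile[area] = sum(area_scores) / (len(area_scores) * MAX_SCORE)
    return profile


# Hypothetical split of the 21 checks with made-up scores.
example = {
    "Robustness": [3, 2, 4, 3, 2],
    "Fairness": [4, 4, 3, 4],
    "Transparency": [2, 2, 1, 3],
    "Privacy": [3, 3, 4, 2],
    "Containment": [1, 2, 1, 0],
}
print(ivp_profile(example)["Containment"])  # -> 0.25
```

The validation is the part a fixed rubric guarantees; the per-area mean is only one plausible way to turn 21 scores into a five-dimension profile.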

LAYER 2 · THE DEPLOYMENT

ORP

Operational Risk Posture

Asks: how risky is the context it's deployed and used in?

Autonomy, exposure, blast radius, and how hard it is to fix — combined into a Compound Risk Multiplier (CRM).

LAYER 3 · THE EVIDENCE

ACI

Assurance Confidence Index

Asks: how much can we trust what we know?

Discounts the score as evidence ages or thins out — and flags when it's time to re-assess.

ERS

Effective Risk Score

ERS = α + (1 − α) × f(IVP, ORP, ACI)      where α = 0.15

One number, 0–10. The residual-risk floor (α = 0.15) is deliberate: even perfect controls leave an irreducible 15% of risk. AI risk can never be zeroed out — so AITBM never pretends it can.
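The floor property can be checked directly from the formula. This Python sketch assumes `f_value` is the composite f(IVP, ORP, ACI), already computed and normalized to [0, 1]; the page defines neither f nor how its output is presented on the 0–10 scale, so the function name `ers` and that normalization are illustrative assumptions.

```python
ALPHA = 0.15  # residual-risk floor stated on this page


def ers(f_value: float, alpha: float = ALPHA) -> float:
    """Apply ERS = alpha + (1 - alpha) * f(IVP, ORP, ACI).

    ASSUMPTION: f_value is the composite f(IVP, ORP, ACI), already
    computed and normalized to [0, 1]. How f is built, and how the
    result is shown on the 0-10 scale, is not modeled here.
    """
    if not 0.0 <= f_value <= 1.0:
        raise ValueError("f_value must lie in [0, 1]")
    return alpha + (1 - alpha) * f_value


print(ers(0.0))  # -> 0.15  (the floor: the result never reaches zero)
print(ers(1.0))  # -> 1.0   (top of the normalized range)
```

Because (1 − α) × f is never negative, no input can push the result below α: the "risk can never be zeroed out" claim, expressed as arithmetic.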


SEVERITY SCALE

0 = Low · 5 = Moderate · 10 = Critical

A higher ERS means higher residual risk — and triggers a deeper assessment tier.

WHY IT EXISTS

What today's frameworks miss

CVSS adaptations, OWASP AIVSS, and the OWASP Top 10 for LLMs each leave the same four gaps. AITBM was designed to close them.

Guesswork

Vague severity language drives 15–30% disagreement between assessors on the same system.

AITBM → fixed 0–4 rubrics

Context blindness

A medical-diagnosis model and a recipe chatbot with identical flaws score identically.

AITBM → ORP deployment layer

Scope blindness

Static-model assumptions miss agentic and MCP threats — tool poisoning, rogue agents, identity spoofing.

AITBM → Containment axis + Cn-5

Zero-risk fallacy

Frameworks imply that enough controls can eliminate risk. Emergent behavior makes that impossible.

AITBM → α = 0.15 risk floor

See the full 12-gap analysis and head-to-head comparison →

START HERE

Pick the path that fits you

New to AITBM? Here's the shortest route to what you need.

I assess AI systems

Understand the scoring model, then score a real system and watch controls move the number.

The framework → Calculator →

I own compliance & risk

See how AITBM maps to ISO 42001, NIST AI RMF, and the EU AI Act — and where it goes further.

Gap analysis → Standards →

I want to contribute

Pilot an assessment, validate inter-assessor consistency, or open an issue or pull request.

GitHub → Get involved →

WORKED EXAMPLE

Proven on a real scenario: "Finbot"

Finbot is a financial-advisory agent that combines RAG with tool calling — AITBM's canonical test case. Independent assessors run the rubric and land on the same numbers, every time. That repeatability is the whole point.

See controls take Finbot from 9.7 to 3.2 →

10.0: ERS
1.35: CRM
0.00: variance
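A sketch of how a variance figure like the one above could be computed. It assumes the figure measures the spread of independent assessors' ERS values for the same system (the page implies this but does not define it) and uses population variance via Python's `statistics.pvariance`. The helper name `inter_assessor_variance` and the sample scores are hypothetical.

```python
from statistics import pvariance


def inter_assessor_variance(scores: list[float]) -> float:
    """Spread of the ERS values independent assessors gave one system.

    ASSUMPTION: population variance (statistics.pvariance); the page
    does not say which variance statistic it reports.
    """
    if len(scores) < 2:
        raise ValueError("need scores from at least two assessors")
    return pvariance(scores)


# Hypothetical scores from three independent assessors.
print(inter_assessor_variance([3.2, 3.2, 3.2]))  # -> 0.0 (full agreement)
print(inter_assessor_variance([3.0, 3.5, 4.0]))  # nonzero: assessors disagree
```

Population variance fits when the assessors who scored a system are the whole group of interest; if they are treated as a sample of all possible assessors, `statistics.variance` (sample variance) is the usual choice instead. Both return 0.0 when every assessor reports the same score.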

Aligned to the standards you already use

AITBM maps against OWASP, MITRE ATLAS, NIST AI RMF, ISO 42001/42005, the EU AI Act, and the AIDEFEND defensive taxonomy — turning controls into measurable evidence rather than checklists.
