Open framework · Community-built · Free
AITBM — the AI Trust Benchmarking and Maturity Framework — replaces subjective "low / medium / high" judgments with a repeatable method: granular scoring rubrics, a three-layer architecture, and grounded math. You get a single 0–10 risk score, with the dimensional detail behind it still visible.
What it is: a repeatable way to measure how risky an AI system is, built so two assessors reach the same score.
What you get: one 0–10 Effective Risk Score (ERS), plus the five-dimension profile that explains it.
Who it's for: security assessors, compliance and risk teams, and engineers shipping LLM and agentic systems.
What it costs: nothing. Open, community-built, and licensed CC BY 4.0, with no paid tooling required.
AITBM looks at a system through three independent lenses, then composes them into a single ERS. Each lens answers a different question — and a strong result in one cannot hide a weak result in another.
IVP (Intrinsic Vulnerability Profile)
Asks: how strong is the system itself?
21 checks across 5 security areas (Robustness, Fairness, Transparency, Privacy, Containment), each scored 0–4 on a fixed rubric.
ORP (the deployment layer)
Asks: how risky is where and how it's used?
Autonomy, exposure, blast radius, and how hard it is to fix, combined into a Compound Risk Multiplier (CRM).
ACI
Asks: how much can we trust what we know?
Discounts the score as evidence ages or thins out, and flags when it's time to re-assess.
ERS = α + (1 − α) × f(IVP, ORP, ACI) where α = 0.15
One number, 0–10. The residual-risk floor (α = 0.15) is deliberate: even perfect controls leave an irreducible 15% of risk. AI risk can never be zeroed out — so AITBM never pretends it can.
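To make the floor concrete, here is a minimal Python sketch of the composition formula. It is an illustration under stated assumptions, not AITBM's reference implementation. The text does not say how f combines IVP, ORP, and ACI, so the sketch takes f's output as an opaque number assumed to lie in [0, 1]. The function names are hypothetical, and the ×10 mapping onto the 0–10 scale is also an assumption.

```python
ALPHA = 0.15  # residual-risk floor stated in the text


def effective_risk(f_value: float, alpha: float = ALPHA) -> float:
    """Return alpha + (1 - alpha) * f_value on a normalized 0-1 scale.

    f_value stands in for f(IVP, ORP, ACI). How f combines the three
    layers is not specified here, so it is passed in as an opaque
    number that is assumed to lie in [0, 1].
    """
    if not 0.0 <= f_value <= 1.0:
        raise ValueError("f_value is assumed to lie in [0, 1]")
    return alpha + (1.0 - alpha) * f_value


def to_ten_point(normalized: float) -> float:
    """Assumption: the 0-10 score is the normalized value times 10."""
    return round(normalized * 10.0, 2)


if __name__ == "__main__":
    # Even when f reports no measurable risk (0.0), the result stays at alpha.
    for f_value in (0.0, 0.5, 1.0):
        ers = effective_risk(f_value)
        print(f_value, round(ers, 4), to_ten_point(ers))
```

Under these assumptions the result can never drop below α, however small f becomes, which is the behaviour the floor is meant to guarantee.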
SEVERITY SCALE
A higher ERS means higher residual risk — and triggers a deeper assessment tier.
CVSS adaptations, OWASP AIVSS, and the OWASP Top 10 for LLMs each leave the same four gaps. AITBM was designed to close them.
Vague severity language drives 15–30% disagreement between assessors on the same system.
AITBM → fixed 0–4 rubrics
A medical-diagnosis model and a recipe chatbot with identical flaws score identically.
AITBM → ORP deployment layer
Static-model assumptions miss agentic and MCP threats — tool poisoning, rogue agents, identity spoofing.
AITBM → Containment axis + Cn-5
Frameworks imply enough controls eliminate risk. Emergent behavior makes that impossible.
AITBM → α = 0.15 risk floor
New to AITBM? Here's the shortest route to what you need.
Understand the scoring model, then score a real system and watch controls move the number.
See how AITBM maps to ISO 42001, NIST AI RMF, and the EU AI Act — and where it goes further.
Pilot an assessment, validate inter-assessor consistency, or open an issue or pull request.
Finbot is a financial-advisory agent that combines RAG with tool calling — AITBM's canonical test case. Independent assessors run the rubric and land on the same numbers, every time. That repeatability is the whole point.
See controls take Finbot from 9.7 to 3.2 →
AITBM maps against OWASP, MITRE ATLAS, NIST AI RMF, ISO 42001/42005, the EU AI Act, and the AIDEFEND defensive taxonomy, turning controls into measurable evidence rather than checklists.