Gaps in AI Security Assessment — AITBM Gap Analysis

Four failure modes

1. Methodological foundations

Ambiguous severity language produces 15–30% inter-assessor variance, and single-score reductionism discards multi-dimensional signal.

2. Operational reality

Systems are scored in isolation. Deployment context, threat sophistication, and evidence staleness go unmodeled.

3. Scope blindness

Static-model assumptions miss agentic and MCP threats: autonomy risks, agent identity, and tool/function-calling attack surface.

4. Structural impossibilities

Frameworks imply zero risk is achievable. Emergent behavior and cross-layer cascades make that false.

The twelve gaps

Ten are fully addressed today; two are partially addressed with planned enhancements.

Gap	Impact	AITBM solution	Coverage
Methodological foundations
Subjectivity in scoring	15–30% inter-assessor variance	Five-level quantitative rubrics	Full
Single-score reductionism	Multi-dimensional signal loss	Three-layer IVP / ORP / ACI architecture	Full
Incomplete scoring guidance	Implementation paralysis	Test methods + tool integration per sub-metric	Full
Operational reality
Context-blind scoring	Critical vs. low-risk indistinguishable	ORP layer + Compound Risk Multiplier (CRM)	Full
Temporal evidence decay	Stale assessments, false confidence	ACI decay + re-assessment triggers	Full
Operational vs. intrinsic conflation	Zero-risk fallacy	IVP + ORP separation; α = 0.15 floor	Full
Scope blindness
Agentic autonomy risks	Agentic threat classes unaddressed	Containment axis (Cn-1 … Cn-5)	Full
Identity security gaps	Agent impersonation; no IAM for agents	Cn-5 Agent Identity Integrity	Full
Tool / function-calling security	Tool poisoning; MCP CVEs	Ro-4 + Cn-1/Cn-2 + Tr-3	Full
Structural impossibilities
Residual risk floor	False belief in zero risk	α = 0.15 irreducible risk	Full
Emergent-behavior unpredictability	Non-deterministic risk underestimated	α = 0.15 + Cn-4 detection	Partial
Cross-layer cascading failures	Amplification not modeled	Attack-surface proxy; enhancement planned	Partial

10 / 12 fully addressed; 2 / 12 partially addressed (enhancements planned).
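The tally can be reproduced from the table itself. The following is a minimal sketch that transcribes the twelve rows as plain data and counts the coverage levels; the `GAPS` list and the `coverage_tally` helper are illustrative names, not part of AITBM.

```python
from collections import Counter

# (category, gap, coverage) rows transcribed from the twelve-gap table.
GAPS = [
    ("Methodological foundations", "Subjectivity in scoring", "Full"),
    ("Methodological foundations", "Single-score reductionism", "Full"),
    ("Methodological foundations", "Incomplete scoring guidance", "Full"),
    ("Operational reality", "Context-blind scoring", "Full"),
    ("Operational reality", "Temporal evidence decay", "Full"),
    ("Operational reality", "Operational vs. intrinsic conflation", "Full"),
    ("Scope blindness", "Agentic autonomy risks", "Full"),
    ("Scope blindness", "Identity security gaps", "Full"),
    ("Scope blindness", "Tool / function-calling security", "Full"),
    ("Structural impossibilities", "Residual risk floor", "Full"),
    ("Structural impossibilities", "Emergent-behavior unpredictability", "Partial"),
    ("Structural impossibilities", "Cross-layer cascading failures", "Partial"),
]


def coverage_tally(gaps):
    """Count how many gaps fall under each coverage level."""
    return Counter(coverage for _, _, coverage in gaps)


tally = coverage_tally(GAPS)
partial = [gap for _, gap, coverage in GAPS if coverage == "Partial"]

print(f"{tally['Full']} / {len(GAPS)} fully addressed")
print(f"{tally['Partial']} / {len(GAPS)} partially addressed: {', '.join(partial)}")
```

Both partially addressed gaps sit in the "Structural impossibilities" category, which matches the planned enhancements noted in the table.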

Evidence from 2025–2026 research

The analysis draws on five research segments and 43+ sources. The agentic/MCP segment in particular surfaced the threat data that motivates AITBM's Containment axis.

84.2%: tool-poisoning attack success rate reported by the MCPTox benchmark.

8,000+: MCP servers found exposed without authentication, which corresponds to Cn-5 Level 0.

30+: CVEs filed against MCP infrastructure in a two-month window.

CVSS 9.3: EchoLeak (CVE-2025-32711), a zero-click prompt-injection disclosure.

87%: share of downstream decisions poisoned within four hours from a single compromised agent.

SPIFFE / OIDC-A: agent-identity standards mapped directly onto the Cn-5 rubric levels.

Figures are drawn from published 2025–2026 security research cited in the AITBM gap analysis. See the Resources page for the full documentation.

Still ahead: AIVSS v0.8 and AIUC-1

The comparison was re-verified against OWASP AIVSS v0.8 (released March 2026 — CVSS 4.0 baseline plus ten equal-weight agentic amplification factors) and extended with AIUC-1 (January 2026 — the AI-agent certification standard with a Lloyd's-backed insurance backstop). Verdicts below reflect v0.8's genuine improvements; all twelve structural gaps remain open.

Capability	CVSS 4.0	AIVSS v0.8	RAISE	AIUC-1	AITBM
AI-native (non-deterministic)	No	Partial	Yes	Yes	Yes
Multi-dimensional profile	No	No	Yes	No	Yes
Deterministic weights	N/A	Yes	Partial	N/A	Yes
Epistemic confidence scoring	No	No	No	No	Yes
Behavioral drift monitoring	No	No	No	Partial	Yes
Supply chain integration (AIBOM)	No	Partial	No	Partial	Yes
Tiered SME pathway	N/A	No	No	Partial	Yes
Stateful/agentic risk modeling	No	Yes	No	Partial	Yes
MVT enforcement per dimension	No	No	No	Partial	Yes
Compound operational risk	No	No	No	No	Yes
Residual deployment risk floor	No	Partial	No	Partial	Yes
Graduated MVT severity	No	No	No	No	Yes
Jurisdictional fairness	No	No	No	Partial	Yes
Architecture classification tree	No	Partial	No	Partial	Yes
Inter-rater reliability targets	No	No	No	No	Yes

Verdicts mirror Section 10 of the framework specification (June 2026).

All twelve gaps persist in v0.8

v0.8 sets fixed, equal factor weights and adds a relative 0.67 mitigation floor, but it keeps a single 0–10 score by design, rates factors on three-point single-phrase anchors with no test methods, and drops fairness, privacy, and transparency entirely.

AIUC-1 independently validates Cn-5

The official AIVSS↔AIUC-1 crosswalk maps zero AIUC-1 controls to Agent Identity Impersonation and Multi-Agent Orchestration, and the standard is single-agent scoped — exactly the exposure Cn-5 and AITBM's agentic weighting quantify.

Insurance is not measurement

AIUC-1's Lloyd's-backed insurance is the economic counterpart of AITBM's α = 0.15 residual-risk floor: it transfers residual risk financially, while AITBM quantifies it — so a certified, insured agent still warrants a full ERS profile.
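To make the idea of a residual-risk floor concrete, here is a minimal sketch. It assumes risk is expressed on a normalized 0 to 1 scale and that the floor acts as a simple lower bound; neither assumption is confirmed by this page, and the framework specification may define the floor differently. The function name `apply_floor` is hypothetical.

```python
ALPHA = 0.15  # residual-risk floor value quoted in the text


def apply_floor(raw_risk: float, alpha: float = ALPHA) -> float:
    """Return the risk score clamped so it never drops below alpha.

    Assumption (not from the specification): raw_risk is normalized to
    the range 0..1 and the floor is a plain lower bound.
    """
    if not 0.0 <= raw_risk <= 1.0:
        raise ValueError("raw_risk must be within [0, 1]")
    return max(alpha, raw_risk)


# Even a system with every modeled control in place keeps a non-zero score.
print(apply_floor(0.0))
print(apply_floor(0.4))
```

Under this reading, insurance changes who bears the cost of the remaining 0.15 but does not change the score itself.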

Verdict: AITBM has full coverage on all 15 compared capabilities and is the only framework with full coverage on 11 of them. Four of those (epistemic confidence, compound operational risk, graduated MVT severity, inter-rater reliability targets) are not addressed at all by any other compared framework. AITBM is also the only one with five-level rubrics and required test methods per sub-metric. Both standards strengthen the ecosystem AITBM plugs into: AIVSS quantifies vulnerabilities, AIUC-1 certifies controls, and AITBM measures system risk across dimensions, context, and time.
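The counts in the verdict (11 and 4) can be checked against the comparison table. The sketch below transcribes the table as data and recomputes both figures. It reads "no compared framework addresses at all" as every non-AITBM column being "No", which is an interpretation rather than a stated rule; the names `MATRIX`, `sole_full`, and `unaddressed_elsewhere` are illustrative.

```python
FRAMEWORKS = ["CVSS 4.0", "AIVSS v0.8", "RAISE", "AIUC-1", "AITBM"]

# Verdicts per capability, in FRAMEWORKS column order, transcribed from the table.
MATRIX = {
    "AI-native (non-deterministic)": ["No", "Partial", "Yes", "Yes", "Yes"],
    "Multi-dimensional profile": ["No", "No", "Yes", "No", "Yes"],
    "Deterministic weights": ["N/A", "Yes", "Partial", "N/A", "Yes"],
    "Epistemic confidence scoring": ["No", "No", "No", "No", "Yes"],
    "Behavioral drift monitoring": ["No", "No", "No", "Partial", "Yes"],
    "Supply chain integration (AIBOM)": ["No", "Partial", "No", "Partial", "Yes"],
    "Tiered SME pathway": ["N/A", "No", "No", "Partial", "Yes"],
    "Stateful/agentic risk modeling": ["No", "Yes", "No", "Partial", "Yes"],
    "MVT enforcement per dimension": ["No", "No", "No", "Partial", "Yes"],
    "Compound operational risk": ["No", "No", "No", "No", "Yes"],
    "Residual deployment risk floor": ["No", "Partial", "No", "Partial", "Yes"],
    "Graduated MVT severity": ["No", "No", "No", "No", "Yes"],
    "Jurisdictional fairness": ["No", "No", "No", "Partial", "Yes"],
    "Architecture classification tree": ["No", "Partial", "No", "Partial", "Yes"],
    "Inter-rater reliability targets": ["No", "No", "No", "No", "Yes"],
}

AITBM = FRAMEWORKS.index("AITBM")


def others(row):
    """Verdicts of every framework except AITBM."""
    return [verdict for i, verdict in enumerate(row) if i != AITBM]


# Capabilities where AITBM is the only framework rated "Yes".
sole_full = [
    cap for cap, row in MATRIX.items()
    if row[AITBM] == "Yes" and "Yes" not in others(row)
]

# Capabilities where every other framework is rated "No".
unaddressed_elsewhere = [
    cap for cap, row in MATRIX.items()
    if all(verdict == "No" for verdict in others(row))
]

print(f"AITBM sole full coverage: {len(sole_full)} of {len(MATRIX)}")
print("Unaddressed by all others:", ", ".join(unaddressed_elsewhere))
```

The four capabilities where another framework also reaches "Yes" are AI-native scoring, multi-dimensional profile, deterministic weights, and stateful/agentic risk modeling.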
