Gap analysis

Twelve structural deficiencies in existing AI security assessment methodologies — grouped into four domains — and how AITBM addresses each.

THE BOTTOM LINE

Existing AI security frameworks each leave the same structural gaps. AITBM is the only compared framework with full coverage on 11 of 15 capabilities — and the only one with five-level rubrics and required test methods per sub-metric. It doesn't replace OWASP AIVSS or AIUC-1; it measures system risk across dimensions, context, and time where they don't.

12 structural gaps identified
10 / 12 fully addressed today
11 / 15 capabilities at full coverage
43+ research sources

Four failure modes

1. Methodological foundations

Ambiguous severity language produces 15–30% inter-assessor variance, and single-score reductionism discards multi-dimensional signal.

2. Operational reality

Systems are scored in isolation. Deployment context, threat sophistication, and evidence staleness go unmodeled.

3. Scope blindness

Static-model assumptions miss agentic and MCP threats: autonomy risks, agent identity, and tool/function-calling attack surface.

4. Structural impossibilities

Frameworks imply zero risk is achievable. Emergent behavior and cross-layer cascades make that false.
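
To make the 15–30% inter-assessor variance figure under methodological foundations concrete, here is a minimal sketch that measures spread across assessors as a coefficient of variation (standard deviation divided by the mean). The source does not define how the variance figure is computed, so this statistic, the function name `inter_assessor_spread`, and the sample scores are illustrative assumptions.

```python
from statistics import mean, pstdev

def inter_assessor_spread(scores):
    """Return the coefficient of variation (population std / mean) of assessor scores.

    Assumption: 'inter-assessor variance' is read here as relative spread;
    the source does not specify the statistic.
    """
    if len(scores) < 2:
        raise ValueError("need at least two assessor scores")
    avg = mean(scores)
    if avg == 0:
        raise ValueError("mean score is zero; relative spread is undefined")
    return pstdev(scores) / avg

# Hypothetical scores from four assessors rating the same system on a 0-10 scale.
vague_rubric = [4.0, 6.0, 5.0, 7.0]
anchored_rubric = [5.0, 5.5, 5.0, 5.5]

print(f"vague: {inter_assessor_spread(vague_rubric):.1%}")
print(f"anchored: {inter_assessor_spread(anchored_rubric):.1%}")
```

With these made-up inputs, the loosely anchored scores land inside the 15–30% band while the tightly anchored ones fall well below it, which is the effect that quantitative rubrics are meant to produce.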

The twelve gaps

Ten are fully addressed today; two are partially addressed with planned enhancements.

Gap | Impact | AITBM solution | Coverage
Methodological foundations
Subjectivity in scoring | 15–30% inter-assessor variance | Five-level quantitative rubrics | Full
Single-score reductionism | Multi-dimensional signal loss | Three-layer IVP / ORP / ACI architecture | Full
Incomplete scoring guidance | Implementation paralysis | Test methods + tool integration per sub-metric | Full
Operational reality
Context-blind scoring | Critical vs. low-risk indistinguishable | ORP layer + Compound Risk Multiplier (CRM) | Full
Temporal evidence decay | Stale assessments, false confidence | ACI decay + re-assessment triggers | Full
Operational vs. intrinsic conflation | Zero-risk fallacy | IVP + ORP separation; α = 0.15 floor | Full
Scope blindness
Agentic autonomy risks | Agentic threat classes unaddressed | Containment axis (Cn-1 … Cn-5) | Full
Identity security gaps | Agent impersonation; no IAM for agents | Cn-5 Agent Identity Integrity | Full
Tool / function-calling security | Tool poisoning; MCP CVEs | Ro-4 + Cn-1/Cn-2 + Tr-3 | Full
Structural impossibilities
Residual risk floor | False belief in zero risk | α = 0.15 irreducible risk | Full
Emergent-behavior unpredictability | Non-deterministic risk underestimated | α = 0.15 + Cn-4 detection | Partial
Cross-layer cascading failures | Amplification not modeled | Attack-surface proxy; enhancement planned | Partial
10 / 12 fully addressed; 2 / 12 partially addressed (enhancements planned)
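
The "ACI decay + re-assessment triggers" entry can be pictured with a short sketch. The source does not give the decay function, so the exponential half-life form, the 90-day half-life, the 0.5 trigger threshold, and all names below are assumptions for illustration only.

```python
from datetime import date

HALF_LIFE_DAYS = 90   # assumed: confidence halves every 90 days
REASSESS_BELOW = 0.5  # assumed trigger threshold

def decayed_confidence(initial, assessed_on, today):
    """Exponentially decay an evidence-confidence value by its age in days."""
    age_days = (today - assessed_on).days
    if age_days < 0:
        raise ValueError("assessment date is in the future")
    return initial * 0.5 ** (age_days / HALF_LIFE_DAYS)

def needs_reassessment(initial, assessed_on, today):
    """True once decayed confidence falls below the assumed threshold."""
    return decayed_confidence(initial, assessed_on, today) < REASSESS_BELOW

# Hypothetical assessment with initial confidence 0.9, checked at two later dates.
assessed = date(2026, 1, 1)
for check in (date(2026, 2, 1), date(2026, 6, 1)):
    conf = decayed_confidence(0.9, assessed, check)
    print(check, round(conf, 3), needs_reassessment(0.9, assessed, check))
```

The point of the sketch is only that an assessment's weight shrinks with age until a re-assessment is triggered; the actual decay curve and thresholds belong to the framework specification.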

Evidence from 2025–2026 research

The analysis draws on five research segments and 43+ sources. The agentic/MCP segment in particular surfaced the threat data that motivates AITBM's Containment axis.

84.2% tool-poisoning attack success rate reported by the MCPTox benchmark.

8,000+ MCP servers found exposed without authentication (Cn-5 Level 0).

30+ CVEs filed against MCP infrastructure in a two-month window.

CVSS 9.3: EchoLeak (CVE-2025-32711), a zero-click prompt-injection disclosure.

87% of downstream decisions poisoned within four hours from a single compromised agent.

SPIFFE / OIDC-A: agent-identity standards mapped directly onto the Cn-5 rubric levels.

Figures are drawn from published 2025–2026 security research cited in the AITBM gap analysis. See the Resources page for the full documentation.

June 2026 re-validation

Still ahead: AIVSS v0.8 and AIUC-1

The comparison was re-verified against OWASP AIVSS v0.8 (released March 2026 — CVSS 4.0 baseline plus ten equal-weight agentic amplification factors) and extended with AIUC-1 (January 2026 — the AI-agent certification standard with a Lloyd's-backed insurance backstop). Verdicts below reflect v0.8's genuine improvements; all twelve structural gaps remain open.

Capability | AIVSS v0.8 | AIUC-1 | AITBM
AI-native (non-deterministic) | Partial | Yes | Yes
Multi-dimensional profile | No | No | Yes
Deterministic weights | Yes | N/A | Yes
Epistemic confidence scoring | No | No | Yes
Behavioral drift monitoring | No | Partial | Yes
Supply chain integration (AIBOM) | Partial | Partial | Yes
Tiered SME pathway | No | Partial | Yes
Stateful/agentic risk modeling | Yes | Partial | Yes
MVT enforcement per dimension | No | Partial | Yes
Compound operational risk | No | No | Yes
Residual deployment risk floor | Partial | Partial | Yes
Graduated MVT severity | No | No | Yes
Jurisdictional fairness | No | Partial | Yes
Architecture classification tree | Partial | Partial | Yes
Inter-rater reliability targets | No | No | Yes

Verdicts mirror Section 10 of the framework specification (June 2026). The CVSS 4.0 and RAISE columns of the full comparison are not shown here.

All twelve gaps persist in v0.8

v0.8 sets fixed factor weights and adds a relative 0.67 mitigation floor, but it keeps a single 0–10 score by design, rates factors on three-point single-phrase anchors with no test methods, and drops fairness, privacy, and transparency entirely.
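
Why a single 0–10 score loses signal can be shown with a toy example. The dimension names, the values, and the equal-weight mean used as the aggregate are hypothetical; this is neither the AIVSS v0.8 formula nor AITBM's scoring, only an illustration of how different profiles can collapse to one number.

```python
from statistics import mean

def single_score(profile):
    """Collapse a per-dimension profile into one number (assumed: plain mean)."""
    return round(mean(profile.values()), 2)

# Two hypothetical systems scored 0-10 per dimension (higher = riskier).
system_a = {"robustness": 5.0, "containment": 5.0, "privacy": 5.0}
system_b = {"robustness": 1.0, "containment": 9.0, "privacy": 5.0}

# Both systems collapse to the same aggregate score.
print(single_score(system_a), single_score(system_b))

# Only the profile reveals which dimension drives system_b's risk.
print(max(system_b, key=system_b.get))
```

Both systems receive the same aggregate, yet only the per-dimension view shows that the second one carries a severe containment problem.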

AIUC-1 independently validates Cn-5

The official AIVSS↔AIUC-1 crosswalk maps zero AIUC-1 controls to Agent Identity Impersonation and Multi-Agent Orchestration, and the standard is single-agent scoped — exactly the exposure Cn-5 and AITBM's agentic weighting quantify.

Insurance is not measurement

AIUC-1's Lloyd's-backed insurance is the economic counterpart of AITBM's α = 0.15 residual-risk floor: it transfers residual risk financially, while AITBM quantifies it — so a certified, insured agent still warrants a full ERS profile.
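
One way to read a residual-risk floor is as a lower bound that no amount of mitigation can push a score beneath. The sketch below assumes a normalized 0–1 risk scale and a simple `max` clamp; the source states only the value α = 0.15, so the clamp form and the function name `floored_risk` are assumptions, not AITBM's published formula.

```python
ALPHA = 0.15  # residual-risk floor value stated in the source

def floored_risk(raw_risk, alpha=ALPHA):
    """Clamp a normalized risk score so it never falls below the floor.

    Assumption: risk is normalized to [0, 1] and the floor is a hard minimum.
    """
    if not 0.0 <= raw_risk <= 1.0:
        raise ValueError("raw_risk must be within [0, 1]")
    return max(alpha, raw_risk)

# Hypothetical post-mitigation scores: the first two are lifted to the floor.
for raw in (0.0, 0.10, 0.40):
    print(raw, "->", floored_risk(raw))
```

Under this reading, a system whose controls appear to eliminate all measured risk still reports 0.15, which is the residual that insurance would transfer rather than remove.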

Verdict: AITBM is the only framework with full coverage on 11 of the 15 compared capabilities — four of which (epistemic confidence, compound operational risk, graduated MVT severity, inter-rater reliability targets) no compared framework addresses at all — and the only one with five-level rubrics and required test methods per sub-metric. Both standards strengthen the ecosystem AITBM plugs into: AIVSS quantifies vulnerabilities, AIUC-1 certifies controls, and AITBM measures system risk across dimensions, context, and time.