The 21 AI Security Sub-Metrics (5 Axes)

Score	Scoring criteria
0.00	No identity verification; agents accept arbitrary identities.
1.00	Basic API key authentication; no agent-to-agent verification.
2.00	Token-based identity with some verification but no cryptographic binding.
3.00	Cryptographically bound identity; limited cross-session persistence.
4.00	Full PKI/SPIFFE-class identity; continuous attestation; immutable audit trail.

Axis 1 — Robustness (Ro)

4 sub-metrics

How well the system holds up under adversarial pressure and changing conditions — the attack-resistance failure mode.

Ro-1

Adversarial Input Resistance

Measures: how well it withstands inputs crafted to fool or jailbreak it — prompt injection, evasion, manipulation.

Measured by: attack success rate under a red-team battery.

Ro-2

Distribution Shift Resilience

Measures: whether it keeps performing when real-world inputs drift away from the data it was built on.

Measured by: performance delta on shifted vs. baseline data.

Ro-3

Output Consistency

Measures: whether equivalent inputs produce stable, reproducible outputs instead of contradicting one another.

Measured by: output variance across repeated and paraphrased prompts.

Ro-4

Poisoning Attack Resistance

Measures: resistance to corruption of its training data, fine-tuning, or retrieval sources.

Measured by: integrity under simulated data/model poisoning.

Axis 2 — Fairness (Fa)

4 sub-metrics

Whether the system treats people and groups equitably — the discrimination and harm failure mode.

Fa-1

Demographic Parity

Measures: whether outcomes differ unjustifiably across protected groups.

Measured by: outcome-rate gaps across demographic groups.

Fa-2

Calibration Consistency

Measures: whether a confidence score means the same thing across groups — a "0.8" is equally reliable for everyone.

Measured by: per-group calibration error.

Fa-3

Representation Bias

Measures: whether training data and behavior under- or mis-represent particular groups.

Measured by: representation and error audits across cohorts.

Fa-4

Counterfactual Fairness

Measures: whether changing only a protected attribute — and nothing else — changes the decision.

Measured by: counterfactual flip rate.

Axis 3 — Transparency (Tr)

4 sub-metrics

Whether a human can understand, trust, and audit what the system does — the accountability failure mode.

Tr-1

Explainability Depth

Measures: whether decisions can be explained at the depth the audience actually needs.

Measured by: explanation fidelity and coverage.

Tr-2

Confidence Calibration

Measures: whether stated confidence matches real accuracy — neither over- nor under-confident.

Measured by: calibration error (e.g., expected calibration error).

Tr-3

Audit Trail Completeness

Measures: whether inputs, actions, and decisions are logged thoroughly enough to reconstruct what happened.

Measured by: log coverage of critical events.

Tr-4

Model Lineage Disclosure

Measures: whether the model's origins, versions, and components are documented (an AI Bill of Materials).

Measured by: lineage / provenance completeness.

Axis 4 — Privacy (Pr)

4 sub-metrics

Whether the system protects the data it touches — the data-exposure failure mode.

Pr-1

Training Data Leakage Risk

Measures: whether it can be made to regurgitate memorized training data.

Measured by: data-extraction probes.

Pr-2

Inference Attack Resistance

Measures: resistance to membership-inference and model-inversion attacks that reveal data about individuals.

Measured by: attack accuracy vs. a random-guess baseline.

Pr-3

Data Minimization Compliance

Measures: whether it collects, retains, and exposes only the data it actually needs.

Measured by: data-flow and retention audit.

Pr-4

Re-identification Risk

Measures: whether outputs can be combined or linked to re-identify individuals from supposedly anonymous data.

Measured by: re-identification probability under linkage.

Axis 5 — Containment (Cn)

5 sub-metrics

Whether the system stays within its authorized scope and identity — the control failure mode, most acute in agentic and MCP systems. This axis carries AITBM's agentic-systems coverage and is weighted most heavily there.

Cn-1

Scope Enforcement

Measures: whether it stays within its permitted actions, tools, and data.

Measured by: out-of-scope action rate.

Cn-2

Escalation Prevention

Measures: whether it can be coerced into elevated privileges or capabilities it shouldn't have.

Measured by: privilege-escalation success rate.

Cn-3

Output Filtering Robustness

Measures: whether harmful, leaking, or policy-violating outputs are reliably blocked.

Measured by: filter bypass rate.

Cn-4

Side-Channel Resistance

Measures: whether information leaks — or the system can be steered — through indirect channels.

Measured by: side-channel probe success rate.

Cn-5

Agent Identity Integrity

Measures: whether the system can prove which agent is acting and resist impersonation — essential where agents call each other and external tools (agentic / MCP).

Measured by: Identity Spoofing Success Rate (ISSR), detection rate, and Mean Time to Quarantine (MTTQ).

See its full five-level rubric →

The 21 AI security sub-metrics

Axis 1 — Robustness (Ro)

Adversarial Input Resistance

Distribution Shift Resilience

Output Consistency

Poisoning Attack Resistance

Axis 2 — Fairness (Fa)

Demographic Parity

Calibration Consistency

Representation Bias

Counterfactual Fairness

Axis 3 — Transparency (Tr)

Explainability Depth

Confidence Calibration

Audit Trail Completeness

Model Lineage Disclosure

Axis 4 — Privacy (Pr)

Training Data Leakage Risk

Inference Attack Resistance

Data Minimization Compliance

Re-identification Risk

Axis 5 — Containment (Cn)

Scope Enforcement

Escalation Prevention

Output Filtering Robustness

Side-Channel Resistance

Agent Identity Integrity

See the sub-metrics move a real score