Valid-AI-ted Audit Engine: Specification and Architecture

White Paper | 2026-03-27 | Status: draft

Executive Summary

The governance, risk, and compliance challenge surrounding AI systems has outgrown what manual audit processes can address at scale. Enterprises deploying AI across hundreds of models, pipelines, and agentic workflows must maintain compliance with an overlapping, often inconsistent set of frameworks (the CSA AI Controls Matrix, ISO/IEC 42001, ISO/IEC 27001:2022, SOC 2 Trust Services Criteria, the NIST AI Risk Management Framework, and the EU AI Act), each with its own control vocabulary, assessment cadence, and evidence requirements. Auditors engaged to provide independent third-party assurance over these deployments face the same challenge from the other side: how to evaluate AI systems whose behavioral properties are dynamic, whose control surfaces shift with model updates, and whose risk posture may change materially between annual certification windows.

The Valid-AI-ted Audit Engine was designed to resolve this tension. Launched by the Cloud Security Alliance in November 2025 alongside the STAR for AI Level 2 program [1], Valid-AI-ted is an AI-assisted audit and assurance platform that automates the most time-consuming and error-prone elements of AI compliance workflows: cross-framework control mapping, self-assessment scoring, gap identification, remediation prioritization, and evidence collection. By applying large language models and structured reasoning to the assessment process itself, the engine delivers what annual audit cycles cannot — continuous visibility into an organization’s AI compliance posture, with actionable intelligence rather than point-in-time snapshots.

This specification describes the engine’s technical architecture and capabilities in sufficient detail to support organizational deployment decisions, integration planning, and audit reliance determinations. Section 2 describes the Automated Framework Mapping Engine, including the algorithms that translate controls across six major frameworks and the confidence scoring mechanism that communicates mapping fidelity. Section 3 covers the AI Analysis Capabilities, including the risk-reduction–based control prioritization model and the gap analysis and remediation recommendation system. Section 4 describes the Continuous Evaluation Features, including integration with the AI Risk Observatory and the trigger conditions that initiate reassessment cycles. Section 5 presents the GRC Modernization Architecture — the data model, integration APIs, processing pipeline, and reporting layer. Sections 6, 7, and 8 address privacy and confidentiality controls, integration with the STAR for AI certification program, and the overall system architecture. Section 9 provides the Framework Alignment table mapping engine capabilities to the major standards the engine serves.

Organizations that implement Valid-AI-ted as a core component of their AI assurance program should expect materially reduced assessment cycle times, more consistent control evaluation outcomes across frameworks, and a measurably stronger evidentiary foundation for both internal governance and third-party audit engagements.


1. Introduction: The GRC Modernization Imperative for AI

The global GRC software market has expanded dramatically in response to the regulatory complexity surrounding AI, with the market projected to exceed fifteen billion dollars by the mid-2020s as organizations invest in automating compliance workflows that were previously managed through spreadsheets, email chains, and periodic consultant engagements [2]. Platforms such as ServiceNow GRC, Vanta, and Drata have demonstrated the value of continuous control monitoring in traditional cloud and SaaS compliance contexts — Drata, for example, runs more than twelve hundred automated hourly control tests across compliance frameworks and collects evidence from over one hundred seventy integrations [3]. But these general-purpose GRC platforms were not built for the distinctive challenges that AI systems present, and their shortcomings in the AI assurance context have become increasingly apparent as AI deployments have grown in complexity and regulatory scrutiny.

AI systems differ from conventional IT systems in ways that are compliance-relevant. Their behavior is probabilistic rather than deterministic: the same inputs do not always produce the same outputs, and output quality can degrade gradually without triggering conventional availability or integrity alerts. Their governance requirements are lifecycle-dependent: the controls appropriate for a model in training differ materially from those appropriate for the same model in production, and a model that was compliant when deployed may become non-compliant as it drifts from its original training distribution. Their risk posture is sensitive to the data they consume and the tools they invoke, particularly in agentic configurations where the model interacts with external services and other agents without per-action human review. Standard GRC platforms that collect binary evidence — a policy document exists, an access control is configured, a vulnerability scan was run — are not equipped to evaluate these properties. The evidence required to demonstrate AI compliance is richer, more contextual, and more temporally sensitive than conventional IT compliance evidence.

The frameworks governing AI systems have been designed with this complexity in mind, but they have arrived in rapid succession and without a common control vocabulary. The NIST AI Risk Management Framework 1.0, published in January 2023, organizes AI risk management across four functions — GOVERN, MAP, MEASURE, and MANAGE — with 72 subcategories, providing a risk-focused structure that complements but does not align directly with the clause-based architecture of ISO/IEC 42001:2023 [4][5]. The CSA AI Controls Matrix, published in July 2025, provides 243 control objectives across 18 domains that are explicitly mapped to ISO 42001, ISO 27001, and NIST AI RMF, but the mappings require interpretation and the controls themselves represent a distinct enumeration rather than a direct translation [6]. The EU AI Act’s conformity assessment requirements, which become mandatory for high-risk AI systems on August 2, 2026, add yet another vocabulary — conformity assessment procedures, technical documentation requirements, CE marking — that does not map cleanly to any existing controls framework [7]. An organization seeking simultaneous compliance with all of these frameworks faces a mapping and evidence management challenge that would require substantial dedicated personnel to manage manually.

Valid-AI-ted addresses this challenge by serving as the intelligence layer between an organization’s AI systems and the multiple frameworks they must satisfy. The engine does not replace the judgment of qualified compliance professionals or auditors; it augments that judgment by automating the mechanical elements of the assessment process, surfacing the information that human experts need to make consequential compliance determinations, and maintaining a continuous evidentiary record that supports both internal governance and external audit reliance. The following sections describe how that augmentation is implemented in technical terms.


2. Automated Framework Mapping Engine

How the Mapping Engine Works

The Automated Framework Mapping Engine is the foundational capability of the Valid-AI-ted platform. Its purpose is to translate a compliance obligation expressed in the vocabulary of one framework into its closest equivalents in the vocabularies of the five other frameworks, so that a single control implementation can address requirements across all supported frameworks at once. The engine maintains a curated control relationship graph: a structured knowledge representation in which individual controls from each supported framework are represented as nodes, and the semantic relationships between controls across frameworks are represented as weighted directed edges.

The construction of this graph is a hybrid process combining expert-curated mappings with machine-learning–assisted refinement. CSA’s working groups have published formal mappings between AICM and ISO 42001, ISO 27001, and NIST AI RMF as part of the AICM release package [6]. These expert mappings form the authoritative backbone of the graph. For framework pairs where authoritative cross-framework mappings do not yet exist — including AICM-to-EU-AI-Act and AICM-to-SOC-2 pairings — the engine uses a combination of semantic embedding analysis and expert review to establish candidate mappings, which are then validated through the engine’s confidence scoring mechanism before being surfaced to users.

At runtime, when a user submits a control implementation record for one framework, the mapping engine traverses the control relationship graph to identify all controls in other frameworks that the implementation is likely to satisfy. The traversal returns not merely a list of related controls but a structured relationship record for each mapping: the source control identifier, the target control identifier, the relationship type (full equivalence, partial coverage, complementary requirement, or informative reference), the confidence score, and the human-readable rationale explaining the basis for the mapping. This structured output allows compliance teams to make informed decisions about whether a single evidence artifact can satisfy multiple framework requirements simultaneously, or whether separate evidence is needed to address the distinct elements of each framework’s control.
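The structured relationship record described above can be sketched as a small data structure. This is a minimal illustration, not the engine’s actual schema: the names MappingRecord, ControlGraph, and map_control are hypothetical, the rationale strings are placeholders, and the traversal covers direct (one-hop) mappings only. The two sample edges reuse values from the GRC-07 demonstration in this section.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    FULL_EQUIVALENCE = "full equivalence"
    PARTIAL_COVERAGE = "partial coverage"
    COMPLEMENTARY = "complementary requirement"
    INFORMATIVE = "informative reference"


@dataclass(frozen=True)
class MappingRecord:
    """One weighted, directed edge in the control relationship graph."""
    source_control: str
    target_framework: str
    target_control: str
    relationship: RelationshipType
    confidence: float
    rationale: str


class ControlGraph:
    """Hypothetical in-memory graph keyed by source control identifier."""

    def __init__(self) -> None:
        self._edges: dict[str, list[MappingRecord]] = {}

    def add_mapping(self, record: MappingRecord) -> None:
        self._edges.setdefault(record.source_control, []).append(record)

    def map_control(self, source_control: str) -> list[MappingRecord]:
        # Direct (one-hop) mappings only, highest confidence first.
        records = self._edges.get(source_control, [])
        return sorted(records, key=lambda r: r.confidence, reverse=True)


graph = ControlGraph()
graph.add_mapping(MappingRecord(
    "AICM GRC-07", "ISO 42001", "Clause 7.4 (Communication)",
    RelationshipType.PARTIAL_COVERAGE, 0.82, "placeholder rationale"))
graph.add_mapping(MappingRecord(
    "AICM GRC-07", "ISO 42001", "Clause 6.1 (Risk Assessment)",
    RelationshipType.FULL_EQUIVALENCE, 0.94, "placeholder rationale"))

results = graph.map_control("AICM GRC-07")
for record in results:
    print(record.target_control, record.relationship.value, record.confidence)
```

Keeping the relationship type and the confidence score as separate fields mirrors the specification’s point that the two carry different information.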

Demonstration: A Single AICM Control Mapped Across Six Frameworks

To illustrate the mapping engine’s behavior concretely, consider AICM control GRC-07, which requires organizations to establish documented procedures for assessing and communicating AI system risk to internal and external stakeholders. When this control is submitted to the mapping engine, the following cross-framework relationships are returned.

Source Control | Target Framework | Target Control                     | Relationship Type | Confidence
-------------- | ---------------- | ---------------------------------- | ----------------- | ----------
AICM GRC-07    | ISO 42001        | Clause 6.1 (Risk Assessment)       | Full Equivalence  | 0.94
AICM GRC-07    | ISO 42001        | Clause 7.4 (Communication)         | Partial Coverage  | 0.82
AICM GRC-07    | ISO 27001:2022   | A.5.29 (Risk Assessment)           | Partial Coverage  | 0.79
AICM GRC-07    | NIST AI RMF      | GOVERN 1.7 (Risk Documentation)    | Full Equivalence  | 0.91
AICM GRC-07    | NIST AI RMF      | MAP 5.1 (Impact Documentation)     | Complementary     | 0.74
AICM GRC-07    | SOC 2            | CC3.2 (Risk Assessment Process)    | Partial Coverage  | 0.77
AICM GRC-07    | EU AI Act        | Article 9 (Risk Management System) | Partial Coverage  | 0.85
AICM GRC-07    | STAR for AI      | AI-CAIQ GRC Domain Q4              | Full Equivalence  | 0.97

The confidence scores reflect the engine’s estimate of the degree to which the source control, when properly implemented, satisfies the substance of the target control. A full equivalence mapping with a confidence score above 0.90 indicates that the same evidence artifact — a documented risk assessment procedure and its outputs — should be accepted as satisfying the target control by a competent auditor. A partial coverage mapping indicates that additional evidence is required, and the engine’s rationale field describes specifically what that additional evidence must address. A complementary requirement mapping indicates that the frameworks address related but distinct obligations, and separate compliance work is needed for both.

Inheritance and Hierarchy in Multi-Framework Compliance

The mapping engine recognizes that compliance frameworks are not flat collections of independent controls. They are structured around hierarchies — clauses within sections within domains — and individual controls inherit obligations from higher-level requirements that may not have direct counterparts in other frameworks. ISO 42001’s Clause 4 (Context of the Organization) and Clause 5 (Leadership), for example, establish organizational governance prerequisites that condition the interpretation of all operational controls in Clauses 6 through 10 [5]. A mapping that treats Clause 8 (Operation) controls in isolation, without recognizing their dependency on Clause 4 and 5 prerequisites, will produce misleading compliance assessments.

To address this, the engine’s control relationship graph encodes parent-child relationships within each framework as well as the relationships between controls in different frameworks. When the engine maps a leaf-level control from one framework to a leaf-level control in another, it also identifies any higher-level requirements in the target framework that must be satisfied as prerequisites. The compliance report surfaced to users reflects this hierarchy: satisfying AICM GRC-07 contributes to ISO 42001 Clause 6.1 compliance, but the report will note that ISO 42001 Clause 4.1 (Understanding the Organization) and Clause 5.1 (Leadership and Commitment) must also be addressed before a complete conformity claim can be made for the ISO 42001 governance domain.
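One way to picture the prerequisite lookup is a walk up the target framework’s hierarchy that collects prerequisites attached at each level. The sketch below is illustrative only: the PARENT and PREREQUISITES tables and the function name are hypothetical, and attaching the Clause 4.1 and 5.1 prerequisites at the Clause 6 level is an assumption made to reproduce the GRC-07 example.

```python
# Hypothetical parent-child edges within a single framework.
PARENT = {
    "ISO 42001 Clause 6.1": "ISO 42001 Clause 6",
}

# Hypothetical prerequisite edges, attached at the clause level (assumption).
PREREQUISITES = {
    "ISO 42001 Clause 6": ["ISO 42001 Clause 4.1", "ISO 42001 Clause 5.1"],
}


def inherited_prerequisites(control: str) -> list[str]:
    """Collect prerequisites of a control and of every ancestor above it."""
    found: list[str] = []
    node = control
    while node is not None:
        for prerequisite in PREREQUISITES.get(node, []):
            if prerequisite not in found:
                found.append(prerequisite)
        node = PARENT.get(node)  # None once the top of the hierarchy is reached
    return found


prereqs = inherited_prerequisites("ISO 42001 Clause 6.1")
print(prereqs)
```

A report generator could append these prerequisites to any mapping whose target is the leaf control, which is the behavior the GRC-07 example describes.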

Mapping Confidence Scoring

Confidence scores are computed through a multi-factor model that combines three input signals. The first is semantic similarity, measured by comparing the embedding representations of control text across frameworks using a fine-tuned language model trained on regulatory and compliance documents. The second is provenance weight, which reflects whether the mapping is derived from an authoritative source (such as the AICM-to-ISO-42001 mapping published by CSA’s working groups) or from machine-assisted analysis without formal expert validation. Mappings with authoritative provenance receive a boost of 0.05 to 0.15 in their confidence score relative to machine-only mappings, depending on the authority of the source. The third is coverage alignment, which measures the degree to which the source and target controls address the same scope elements (organizational scope, technical scope, and lifecycle scope) as extracted from the control text through structured parsing.

The composite confidence score is computed as a weighted average of these three signals, with provenance carrying the largest weight (0.45), followed by coverage alignment (0.35) and semantic similarity (0.20). Scores below 0.60 are flagged as low-confidence mappings requiring human review before being used for compliance reporting. Scores between 0.60 and 0.75 are presented as informative references only. Scores above 0.75 are presented as substantive mappings, with full equivalence determined by the relationship type rather than by the confidence score alone.
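The weighted average and the presentation thresholds can be written out directly. The sketch below assumes each signal is already normalized to the range 0 to 1, treats a score of exactly 0.75 as an informative reference (the text does not say which side the boundary falls on), and does not model the separate 0.05 to 0.15 provenance boost. The function names are hypothetical.

```python
# Weights as stated in the specification.
WEIGHTS = {"provenance": 0.45, "coverage": 0.35, "semantic": 0.20}


def composite_confidence(provenance: float, coverage: float, semantic: float) -> float:
    """Weighted average of three signals, each assumed to lie in [0, 1]."""
    signals = {"provenance": provenance, "coverage": coverage, "semantic": semantic}
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} signal must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def presentation_tier(score: float) -> str:
    """Map a composite score to how the mapping is presented to users."""
    if score < 0.60:
        return "low-confidence: human review required"
    if score <= 0.75:  # boundary handling at exactly 0.75 is an assumption
        return "informative reference"
    return "substantive mapping"


example = composite_confidence(provenance=0.9, coverage=0.8, semantic=0.7)
print(round(example, 3), presentation_tier(example))
```

For the example inputs the arithmetic is 0.45 × 0.9 + 0.35 × 0.8 + 0.20 × 0.7 = 0.405 + 0.28 + 0.14 = 0.825, which falls in the substantive tier.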


3. AI Analysis Capabilities

Control Prioritization by Risk Reduction Impact

Not all compliance gaps are equal. An organization with limited resources for remediation work needs to direct its effort toward the controls whose implementation will produce the greatest reduction in risk exposure — not simply the controls that appear first in a framework’s table of contents or that are easiest to satisfy. Valid-AI-ted’s control prioritization system computes a Risk Reduction Impact (RRI) score for each unimplemented or partially implemented control, allowing organizations to sequence their compliance roadmap by expected risk reduction rather than by arbitrary ordering.

The RRI computation draws on three inputs. The first is the inherent risk weight of the control domain, derived from the AICM’s Capabilities-Based Risk Assessment (CBRA) model, which calibrates control importance to the specific capabilities and deployment context of the organization’s AI systems [8]. An organization deploying agentic AI systems with access to sensitive financial data will see higher inherent risk weights assigned to identity management, behavioral monitoring, and data lineage controls than an organization deploying a simple text classification model. The second input is the remediation leverage ratio — an estimate of how many compliance obligations across all supported frameworks are addressed by implementing this single control, computed from the mapping engine’s cross-framework relationship graph. A control with high remediation leverage satisfies obligations in multiple frameworks simultaneously, multiplying the compliance value of a single implementation effort. The third input is the current implementation maturity score, derived from the most recent AI-CAIQ self-assessment submission, which anchors the RRI to the organization’s actual current state rather than a hypothetical baseline.

RRI scores are presented on a normalized scale of 0 to 100 and are recalculated automatically whenever new assessment data is ingested, ensuring that the prioritization remains current as the organization’s implementation posture evolves. Controls with RRI scores above 75 are flagged as high-priority and highlighted in the compliance dashboard with estimated remediation timelines and suggested evidence artifacts.
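The specification names the three RRI inputs but does not give the formula that combines them. Purely to make the inputs concrete, the sketch below assumes a multiplicative combination: inherent risk weight, times normalized remediation leverage, times the remaining implementation gap, scaled to 0 to 100. That formula, the normalization of leverage by a maximum obligation count, and the function names are all assumptions, not the engine’s published model. Only the 0 to 100 scale and the high-priority threshold of 75 come from the text.

```python
def rri_score(inherent_risk_weight: float, obligations_addressed: int,
              max_obligations: int, maturity: float) -> float:
    """Illustrative RRI on a 0-100 scale (ASSUMED multiplicative form).

    inherent_risk_weight: 0..1, e.g. derived from a CBRA-style weighting
    obligations_addressed / max_obligations: normalized remediation leverage
    maturity: 0..1 current implementation maturity (1.0 = fully implemented)
    """
    if max_obligations <= 0:
        raise ValueError("max_obligations must be positive")
    leverage = obligations_addressed / max_obligations
    gap = 1.0 - maturity
    return round(100 * inherent_risk_weight * leverage * gap, 1)


def is_high_priority(score: float) -> bool:
    # Threshold stated in the specification: scores above 75.
    return score > 75


score = rri_score(inherent_risk_weight=0.9, obligations_addressed=8,
                  max_obligations=8, maturity=0.1)
print(score, is_high_priority(score))
```

Under this assumed form, a fully implemented control scores 0 regardless of its risk weight, which matches the intent of anchoring the score to the organization’s current state.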

Resource Allocation Optimization

Extending the prioritization model, the engine’s resource allocation optimization module translates RRI scores into an actionable implementation roadmap that respects organizational resource constraints. The organization inputs its available compliance engineering capacity — measured in person-hours per sprint — and the module produces a sequenced queue of remediation actions ordered to maximize aggregate risk reduction per unit of effort. The optimization model accounts for control dependencies (some controls cannot be implemented until others are in place), shared evidence opportunities (some controls across different frameworks can be satisfied by the same evidence artifact, reducing total effort), and diminishing returns (implementing ten controls in one domain may produce less marginal risk reduction than implementing two controls each in five underserved domains).
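A simple way to see how capacity and dependencies interact is a greedy planner that repeatedly picks the ready action with the best risk reduction per hour. This is a heuristic sketch, not the engine’s optimization model: it ignores shared evidence and diminishing returns, it is not guaranteed to find the optimal plan, and the Action fields, control identifiers, and plan_sprint function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    control: str
    rri: float                        # expected risk reduction
    hours: float                      # estimated effort, assumed > 0
    depends_on: tuple[str, ...] = ()  # controls that must be in place first


def plan_sprint(actions, capacity_hours, completed=()):
    """Greedy plan: best RRI-per-hour among actions that are ready and fit."""
    done = set(completed)
    remaining = list(actions)
    budget = capacity_hours
    plan = []
    while True:
        ready = [a for a in remaining
                 if set(a.depends_on) <= done and a.hours <= budget]
        if not ready:
            return plan
        best = max(ready, key=lambda a: a.rri / a.hours)
        plan.append(best.control)
        done.add(best.control)
        remaining.remove(best)
        budget -= best.hours


backlog = [
    Action("CTRL-A", rri=80, hours=40),
    Action("CTRL-B", rri=90, hours=30, depends_on=("CTRL-A",)),
    Action("CTRL-C", rri=30, hours=10),
]
print(plan_sprint(backlog, capacity_hours=80))
print(plan_sprint(backlog, capacity_hours=45))
```

With 45 hours the greedy choice of CTRL-C leaves too little budget for CTRL-A, which illustrates why a production optimizer would need more than a ratio heuristic.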

The output is a sprint-by-sprint compliance roadmap that projects the organization’s compliance trajectory over a configurable time horizon — typically ninety days, six months, or twelve months. Each sprint plan identifies the specific controls to be addressed, the evidence artifacts to be collected, the frameworks that will receive credit for each implementation, and the projected change in the organization’s aggregate compliance score. Organizations find this roadmap particularly valuable when preparing for STAR for AI submissions or ISO 42001 certification audits, where demonstrating a structured and prioritized approach to gap remediation is itself evidence of mature governance practice.

Automated Scoring of STAR for AI Self-Assessments

The original Valid-AI-ted capability, launched in June 2025 with the Cloud Controls Matrix scoring engine [9], has been extended in the audit engine to support scoring of AI-CAIQ v1.0.2 self-assessments submitted for STAR for AI Level 1 designation. The scoring engine evaluates each AI-CAIQ response against a rubric derived from the corresponding AICM control objective and its implementation guidance, returning a per-question score, a domain-level score, and an overall maturity rating.

The rubric for each question is multi-dimensional. The engine assesses three dimensions: response completeness (whether the organization has addressed all substantive elements of the control), response specificity (whether the answer describes concrete implementations rather than aspirational policies), and evidence quality (whether the organization has cited documentation, configuration records, or third-party validation that substantiates the claim). Responses that pass all three dimensions receive full credit. Responses that pass completeness but fail on specificity or evidence quality receive partial credit with targeted feedback identifying the specific deficiencies. Responses that fail the completeness dimension are returned with a structured gap description identifying the unaddressed elements.
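The three-way outcome of the rubric can be expressed as a short decision function. The sketch assumes each dimension has already been judged pass or fail upstream; the numeric value of partial credit (0.5) and the feedback strings are assumptions, since the specification does not state them.

```python
def score_response(complete: bool, specific: bool, evidenced: bool):
    """Return (credit, feedback) for one AI-CAIQ response (illustrative)."""
    if not complete:
        # Failing completeness short-circuits the other two dimensions.
        return 0.0, "gap: response leaves control elements unaddressed"
    deficiencies = []
    if not specific:
        deficiencies.append("lacks specificity")
    if not evidenced:
        deficiencies.append("lacks supporting evidence")
    if not deficiencies:
        return 1.0, "full credit"
    # 0.5 is an assumed value for partial credit.
    return 0.5, "partial credit: " + ", ".join(deficiencies)


print(score_response(True, True, True))
print(score_response(True, False, True))
print(score_response(False, True, True))
```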

The scoring engine enforces consistency that manual review processes cannot achieve at scale. With more than four thousand organizations maintaining STAR Registry entries [9], and the AI-CAIQ adding a new assessment pathway for AI systems, the volume of submissions requiring evaluation would be unmanageable without automation. CSA members may resubmit their AI-CAIQ responses an unlimited number of times to iterate on their assessments; non-member providers receive up to ten resubmission opportunities. Each resubmission triggers a fresh evaluation with updated scores and feedback, creating a continuous improvement loop that organizations can use to progressively strengthen their AI governance posture.

Gap Analysis and Remediation Recommendation Engine

The gap analysis module synthesizes the outputs of the scoring engine, the mapping engine, and the organization’s current evidence inventory to produce a comprehensive gap assessment organized at three levels: individual control gaps, domain-level maturity gaps, and cross-framework coverage gaps. Individual control gaps identify specific control objectives that are not yet implemented or that have been implemented at insufficient maturity. Domain-level maturity gaps identify AICM domains where the aggregate implementation level is below threshold, even if individual controls within the domain are partially addressed. Cross-framework coverage gaps identify frameworks where the organization’s current AICM implementation provides less than seventy percent coverage, indicating that separate remediation work targeting that framework’s specific requirements is needed.
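The seventy percent coverage test for cross-framework gaps reduces to a ratio check per framework. In the sketch below, coverage is assumed to be the fraction of a framework’s controls addressed by the organization’s current AICM implementation; the counts are invented for illustration and the function name is hypothetical.

```python
def coverage_gaps(coverage_counts: dict, threshold: float = 0.70) -> dict:
    """Return frameworks whose coverage ratio is below the threshold.

    coverage_counts maps framework name -> (controls covered, total controls).
    """
    gaps = {}
    for framework, (covered, total) in coverage_counts.items():
        ratio = covered / total
        if ratio < threshold:
            gaps[framework] = round(ratio, 2)
    return gaps


# Invented counts, for illustration only.
sample_counts = {
    "ISO 42001": (34, 40),   # 0.85
    "EU AI Act": (12, 20),   # 0.60
    "SOC 2": (24, 30),       # 0.80
}
print(coverage_gaps(sample_counts))
```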

For each identified gap, the remediation recommendation engine generates a structured remediation record that includes a plain-language description of the gap, the specific evidence artifacts that would close it, suggested policy or procedure text where applicable, references to CSA guidance documents and external standards that address the gap, and an estimated implementation complexity rating. Recommendations are mapped to the organization’s existing asset and control inventory wherever possible, so that teams can identify whether a gap can be closed by extending an existing control rather than building a new one. The engine’s recommendations are not prescriptive mandates; they are starting points for compliance engineering work, designed to reduce the research burden on the teams responsible for implementation.


4. Continuous Evaluation Features

Integration with AI Risk Observatory Telemetry

The AI Risk Observatory, operated by the CSAI Foundation, provides real-time visibility into agentic AI activity across enterprise environments, OpenClaw-based agent ecosystems, and the broader MCP server landscape [10]. It functions as a next-generation threat intelligence infrastructure for AI governance, aggregating telemetry from deployed agent instances, monitoring for behavioral anomalies, operating a CVE Numbering Authority scoped to agentic AI vulnerabilities, and distributing structured risk identifiers to downstream governance tools. Valid-AI-ted’s continuous evaluation capability is anchored in its integration with this telemetry stream.

The integration operates through a structured event subscription model. Organizations that have registered their AI systems with the AI Risk Observatory can authorize Valid-AI-ted to receive risk events associated with those systems: model version change notifications, newly published CVEs affecting framework components in use, behavioral anomaly alerts from deployed agents, and updated risk scores for AI capabilities the organization has enumerated in its AICM implementation record. Each incoming event is evaluated against the organization’s current compliance posture: does this event indicate that a previously compliant control has become non-compliant? Does it indicate that the inherent risk weight of a control domain has changed, requiring re-prioritization of the compliance roadmap? Does it indicate that a new threat vector has emerged that is not yet addressed by any control in the organization’s implementation?

When an event triggers a compliance relevance determination, the engine generates a compliance impact assessment and queues it for review by the organization’s designated compliance administrator. The assessment describes the event, explains its relevance to the organization’s specific control implementation, and recommends a reassessment action, a remediation action, or a monitoring action, depending on the severity of the impact. This event-driven architecture ensures that the organization’s compliance posture reflects the current state of its AI systems and the current threat landscape, not the state that existed when its last annual assessment was conducted.

Real-Time Agent Behavior and Risk Posture Assessment

For organizations deploying agentic AI systems, the continuous evaluation layer extends beyond framework compliance to encompass real-time behavioral risk assessment. Agentic systems introduce compliance risk properties that are invisible to traditional point-in-time assessment: an agent that was behaving within its authorized scope during the last assessment cycle may have drifted over time toward actions that implicate data privacy, security, or fairness controls. Valid-AI-ted’s behavioral risk posture module ingests structured behavioral telemetry from instrumented agent deployments — action logs, tool invocation records, external API call histories, and inter-agent communication records — and evaluates this telemetry against the behavioral governance controls in the organization’s AICM implementation.

The evaluation model applies a set of behavioral compliance rules derived from the AICM control text and the organization’s documented control implementations. A control that requires agents to operate within a declared minimum-privilege tool set, for example, generates a behavioral rule that flags any tool invocation outside the declared set as a compliance event. A control that requires human approval for actions affecting regulated data generates a behavioral rule that monitors data classification labels in the agent’s execution context and alerts when regulated data flows are processed without the required approval. These rules are automatically derived from the organization’s control implementation records — they do not require separate behavioral policy configuration by the compliance team — and they are updated automatically when control implementations are revised.
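The two example rules above can be sketched as filters over telemetry records. The record shapes (dictionaries with "tool", "data_label", and "human_approved" keys), the tool names, and the function names are assumptions for illustration; the specification does not define the telemetry schema.

```python
def out_of_scope_invocations(declared_tools: set, invocations: list) -> list:
    """Flag tool invocations outside the declared minimum-privilege tool set."""
    return [inv for inv in invocations if inv["tool"] not in declared_tools]


def unapproved_regulated_actions(actions: list) -> list:
    """Flag actions that touched regulated data without human approval."""
    return [a for a in actions
            if a["data_label"] == "regulated" and not a["human_approved"]]


# Hypothetical telemetry records.
declared = {"search_docs", "summarize"}
tool_log = [
    {"agent": "agent-1", "tool": "search_docs"},
    {"agent": "agent-1", "tool": "send_email"},
]
action_log = [
    {"agent": "agent-1", "data_label": "regulated", "human_approved": True},
    {"agent": "agent-1", "data_label": "regulated", "human_approved": False},
    {"agent": "agent-1", "data_label": "public", "human_approved": False},
]

print(out_of_scope_invocations(declared, tool_log))
print(unapproved_regulated_actions(action_log))
```

Each flagged record would correspond to one compliance event in the sense used above.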

The output of the behavioral assessment is a real-time risk posture score for each monitored agent deployment, updated on a configurable cadence (defaulting to hourly) and displayed in the Valid-AI-ted compliance dashboard alongside the framework compliance scores. Significant deviations from baseline posture — defined as changes exceeding two standard deviations from the rolling thirty-day behavioral mean — trigger alerts and initiate an automated impact assessment workflow.
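The two-standard-deviation test can be sketched with Python’s standard statistics module. The example assumes one posture score per day over the thirty-day window for brevity (the default cadence in the text is hourly) and uses the sample standard deviation; both choices, along with the function name and sample values, are assumptions.

```python
from statistics import mean, stdev


def is_significant_deviation(history: list, current: float, sigmas: float = 2.0) -> bool:
    """True if current differs from the rolling mean by more than sigmas * std."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation (assumption)
    if spread == 0:
        return current != baseline
    return abs(current - baseline) > sigmas * spread


# Hypothetical thirty-day window of daily posture scores.
daily_scores = [50, 52, 48, 51, 49] * 6

print(is_significant_deviation(daily_scores, 55))
print(is_significant_deviation(daily_scores, 51))
```

Here the rolling mean is 50 and the sample standard deviation is roughly 1.44, so a score of 55 exceeds the alert band while 51 does not.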

Feedback Loops for Improving Implementations

Audit findings generate learning signals that, in conventional GRC processes, are captured in remediation action items and then largely forgotten until the next audit cycle. Valid-AI-ted’s feedback loop architecture treats every audit finding, every scoring interaction, and every reassessment event as a structured data point that can improve the quality of future assessments. This learning occurs at three levels.

At the question level, the scoring engine accumulates statistics on response patterns that receive high versus low scores, and uses this data to refine the specificity of feedback recommendations for future submissions. Organizations that have historically received low scores on implementation specificity, for example, will receive more detailed guidance on evidence artifact format requirements in subsequent scoring reports. At the control level, the mapping engine accumulates evidence about which cross-framework mapping relationships generate the most agreement or disagreement during auditor review, and uses this data to refine confidence scores and relationship types in the control relationship graph. At the framework level, the engine tracks the distribution of compliance gaps across AICM domains and frameworks, and uses this distribution to inform the risk weight calibration in the RRI model.

This feedback loop operates on aggregate, anonymized statistics and mapping relationship data only. Individual organization compliance records are not used as training data for any engine component. The feedback architecture is designed to improve the quality of the engine’s analytical models without compromising the confidentiality of any organization’s compliance posture.

Trigger Conditions for Reassessment

Valid-AI-ted’s continuous evaluation architecture defines a structured set of trigger conditions that initiate automated reassessment workflows. These triggers are organized into four categories. Framework triggers fire when a supported compliance framework publishes a new version, when a new framework mapping is added to the engine’s relationship graph, or when a mapping confidence score drops below threshold following a relationship graph update. System triggers fire when the organization’s AI system inventory changes — new models are deployed, model versions are updated, new agentic capabilities are enabled, or systems are decommissioned. Risk triggers fire when the AI Risk Observatory publishes a new CVE, threat intelligence report, or risk scoring update that implicates controls in the organization’s implementation record. Time triggers fire on a configurable recurring schedule, implementing the periodic reassessment cadence required by ISO 42001 Clause 9 (Performance Evaluation) and NIST AI RMF’s GOVERN function.

When a trigger fires, the engine generates a scoped reassessment task rather than initiating a full comprehensive assessment. The scope of the reassessment is determined by the nature of the trigger: a framework version update triggers reassessment of only the controls affected by the version change; a new CVE triggers reassessment of the controls relevant to the affected component; a system inventory change triggers reassessment of the controls whose scope includes the affected system. This scoped approach ensures that reassessment overhead scales with the materiality of the change rather than with the total size of the organization’s compliance program.
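The scoping rules for the three examples above can be sketched as a dispatch on trigger kind. The trigger dictionary keys, the ControlScope fields, and the control and component names are hypothetical. The text does not describe how scheduled (time) triggers are scoped, so the sketch raises an error for any kind it does not cover rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlScope:
    control_id: str
    components: frozenset  # software components the control depends on
    systems: frozenset     # AI systems within the control's scope


def scope_reassessment(trigger: dict, controls: list) -> list:
    """Return the control ids to reassess for a given trigger (illustrative)."""
    kind = trigger["kind"]
    if kind == "framework_update":
        changed = set(trigger["changed_controls"])
        return [c.control_id for c in controls if c.control_id in changed]
    if kind == "new_cve":
        return [c.control_id for c in controls
                if trigger["component"] in c.components]
    if kind == "inventory_change":
        return [c.control_id for c in controls
                if trigger["system"] in c.systems]
    raise ValueError(f"no scoping rule sketched for trigger kind {kind!r}")


# Hypothetical control scopes.
controls = [
    ControlScope("CTRL-1", frozenset({"agent-framework"}), frozenset({"sys-1"})),
    ControlScope("CTRL-2", frozenset({"vector-db"}), frozenset({"sys-1", "sys-2"})),
]

print(scope_reassessment({"kind": "new_cve", "component": "vector-db"}, controls))
print(scope_reassessment({"kind": "inventory_change", "system": "sys-1"}, controls))
```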


5. GRC Modernization Architecture

Evidence Collection Automation

Evidence collection is the most labor-intensive element of conventional compliance programs. Auditors and compliance teams spend a disproportionate share of their time gathering, organizing, and validating evidence artifacts — configuration screenshots, policy documents, access control reports, training records, penetration test results — rather than analyzing the compliance posture they represent. Valid-AI-ted’s evidence collection automation layer addresses this by establishing direct integration with the source systems that generate compliance-relevant evidence, eliminating the manual export-and-upload workflow that characterizes traditional GRC processes.

The evidence collection system supports three integration modes. In direct API integration mode, the engine connects to source systems using standard APIs — cloud provider management APIs, identity provider APIs, model management platform APIs, agent orchestration framework APIs — and automatically pulls evidence artifacts on a configured schedule. In webhook integration mode, source systems push event notifications to the engine when compliance-relevant state changes occur, such as when a new model is registered in a model registry or when an access policy is modified. In document ingestion mode, organizations upload evidence artifacts manually through the compliance portal; the engine then applies natural language processing to extract structured metadata — control identifiers, dates, scope assertions, implementation claims — from the document and link it to the relevant controls in the compliance record.

The categories of evidence that the engine collects and manages include organizational policy documents and procedures, model cards and system cards documenting AI system characteristics, access control configurations and role assignments, model training and evaluation records including fairness metrics and robustness test results, incident response records, third-party audit reports, data governance records documenting data lineage and processing agreements, and runtime behavioral logs from instrumented agent deployments. Each evidence artifact is stored with a structured metadata record identifying its source, collection timestamp, associated controls, evidence type, and expiration date (the date after which the artifact is no longer considered current for assessment purposes).
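The metadata record described above could be modeled roughly as shown here. The class name, field names, and the `stale_controls` helper are assumptions made for illustration; the text specifies only which attributes the record carries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceMetadata:
    artifact_id: str
    source: str              # system or person the artifact came from
    collected_at: datetime   # collection timestamp
    control_ids: tuple       # controls the artifact is associated with
    evidence_type: str       # e.g. "policy_document", "access_report"
    expires_at: datetime     # after this date the artifact is no longer current

    def is_current(self, as_of):
        """True while the artifact may still be relied on in an assessment."""
        return as_of <= self.expires_at


def stale_controls(records, as_of):
    """Controls for which every associated artifact has expired."""
    seen, current = set(), set()
    for record in records:
        seen.update(record.control_ids)
        if record.is_current(as_of):
            current.update(record.control_ids)
    return sorted(seen - current)


utc = timezone.utc
records = [
    EvidenceMetadata("ev-1", "idp-export", datetime(2026, 1, 5, tzinfo=utc),
                     ("C-1",), "access_report", datetime(2026, 4, 5, tzinfo=utc)),
    EvidenceMetadata("ev-2", "portal-upload", datetime(2025, 6, 1, tzinfo=utc),
                     ("C-1", "C-2"), "policy_document", datetime(2026, 1, 1, tzinfo=utc)),
]
needs_refresh = stale_controls(records, datetime(2026, 3, 27, tzinfo=utc))
```

Here control C-2 is flagged because its only artifact has expired, while C-1 is still backed by a current access report.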

Data Model

The Valid-AI-ted data model is organized around five primary entity types and the relationships among them. AI Systems represent the discrete AI deployments whose compliance the engine tracks, identified by a system identifier that links them to entries in the organization’s AI inventory and to the AI Risk Observatory’s system registry. Controls represent individual control objectives from supported frameworks, stored with their framework identifier, domain, text, implementation guidance, and cross-framework relationship records. Evidence Artifacts represent the compliance evidence items collected through the automation layer or uploaded manually, with their metadata records and associations to the controls they substantiate. Assessments represent structured evaluation events — AI-CAIQ submissions, internal assessments, third-party audit engagements — with their scoring records, gap assessments, and remediation recommendations. Organizations represent the entities whose compliance posture the engine tracks, with their system inventories, control implementation records, and assessment histories.

The relationships among these entities encode the compliance logic of the engine. An AI System is subject to a set of Controls determined by its capabilities, deployment context, and applicable regulatory obligations. A Control is substantiated by one or more Evidence Artifacts. An Assessment evaluates a set of Controls against their evidence and produces scores, gap records, and recommendations. The mapping engine’s cross-framework relationship graph is represented as a many-to-many relationship between Controls across different frameworks, with relationship type and confidence score as edge attributes. This model supports the engine’s core analytical functions while also providing the audit trail — a complete, timestamped record of every compliance determination and the evidence on which it was based — that third-party auditors and regulators may require.
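As an illustration of the many-to-many control relationship with edge attributes, the sketch below stores mapping edges in memory and answers "which controls in other frameworks relate to this one". The class names, the relationship labels ("equivalent", "partial"), and the control identifiers are hypothetical, and treating every edge as traversable in both directions is a simplification that may not hold for every relationship type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRef:
    framework: str
    control_id: str


@dataclass(frozen=True)
class MappingEdge:
    source: ControlRef
    target: ControlRef
    relationship: str   # hypothetical label, e.g. "equivalent" or "partial"
    confidence: float   # 0.0 to 1.0


class MappingGraph:
    def __init__(self):
        self._edges = []

    def add(self, edge):
        if not 0.0 <= edge.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if edge.source.framework == edge.target.framework:
            raise ValueError("edges must connect different frameworks")
        self._edges.append(edge)

    def related(self, control, min_confidence=0.0):
        """Controls in other frameworks linked to `control`, in either direction."""
        out = []
        for e in self._edges:
            if e.confidence < min_confidence:
                continue
            if e.source == control:
                out.append((e.target, e.relationship, e.confidence))
            elif e.target == control:
                out.append((e.source, e.relationship, e.confidence))
        return out


graph = MappingGraph()
a = ControlRef("AICM", "C-1")
graph.add(MappingEdge(a, ControlRef("ISO42001", "X-1"), "equivalent", 0.9))
graph.add(MappingEdge(ControlRef("NIST_AI_RMF", "Y-1"), a, "partial", 0.6))
strong = graph.related(a, min_confidence=0.8)
```

A production system would more plausibly keep these edges in the graph database mentioned later in this document; the list-based store here only shows the shape of the data.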

Integration APIs

Valid-AI-ted exposes a RESTful API organized into four functional groups. The Inbound Telemetry API accepts structured event payloads from the AI Risk Observatory, AI system monitoring agents, and model management platforms. Events are submitted using a defined schema that specifies the event type, source system identifier, event timestamp, structured event data, and a severity classification. The engine validates incoming events against the schema and routes them to the continuous evaluation subsystem for compliance impact assessment.
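A schema check of the kind described for inbound events might look like the following sketch. The field names, the ISO 8601 timestamp format, and the four-level severity scale are assumptions; the text states only that an event carries a type, a source system identifier, a timestamp, structured data, and a severity classification.

```python
from datetime import datetime

# Hypothetical field names and types for the five elements named in the text.
REQUIRED_FIELDS = {
    "event_type": str,
    "source_system_id": str,
    "timestamp": str,   # assumed to be an ISO 8601 string
    "data": dict,
    "severity": str,
}
SEVERITIES = {"low", "medium", "high", "critical"}  # assumed scale


def validate_event(payload):
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"wrong type for {name}")
    if errors:
        return errors
    try:
        datetime.fromisoformat(payload["timestamp"])
    except ValueError:
        errors.append("timestamp is not ISO 8601")
    if payload["severity"] not in SEVERITIES:
        errors.append("unknown severity")
    return errors


event = {
    "event_type": "model_registered",
    "source_system_id": "sys-a",
    "timestamp": "2026-03-27T12:00:00+00:00",
    "data": {"model": "m1"},
    "severity": "low",
}
```

Returning a list of errors rather than raising on the first one lets a caller report every problem with a rejected payload in a single response.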

The Outbound Reporting API delivers compliance reports, assessment records, gap analyses, and remediation roadmaps to downstream GRC platforms, audit management systems, and executive dashboards. Supported export formats include structured JSON, PDF report packages, and OSCAL (Open Security Controls Assessment Language) formatted records for organizations using OSCAL-compatible governance platforms. The API supports both push delivery (webhooks configured in the downstream system) and pull access (authorized API clients polling for updated records).

The Assessment API enables programmatic submission and retrieval of AI-CAIQ self-assessments, supporting organizations that have integrated assessment workflows into their development pipelines or compliance automation toolchains. Submissions are validated against the AI-CAIQ schema before being routed to the scoring engine, and scored results are returned synchronously so that the calling pipeline can act on them in the same request. The Evidence API supports programmatic upload and management of evidence artifacts, including bulk upload for organizations populating an evidence library for the first time and delta synchronization for organizations keeping existing evidence current.

Authentication across all API endpoints uses OAuth 2.0 with JSON Web Tokens (JWTs) scoped to specific organization identifiers and permission levels. Tokens expire after at most one hour, so long-running integration processes must obtain fresh tokens periodically. Role-based access control within the API enforces separation of duties between evidence submitters, assessment reviewers, report consumers, and system administrators.
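The claim checks implied by this paragraph can be sketched as below. The sketch assumes the token signature has already been verified upstream and operates only on the decoded claims. `exp` and `iat` are the registered expiration and issued-at claims from the JWT specification; the `org_id` and `role` claim names, the action names, and the role-to-action table are hypothetical.

```python
MAX_TOKEN_LIFETIME_S = 3600  # one hour maximum, as stated in the text

# Hypothetical mapping from API action to the single role allowed to perform it,
# chosen to illustrate separation of duties.
ALLOWED_ROLES = {
    "evidence:submit": {"evidence_submitter"},
    "assessment:review": {"assessment_reviewer"},
    "report:read": {"report_consumer"},
    "config:write": {"system_administrator"},
}


def authorize(claims, org_id, action, now):
    """Check already-verified token claims against the requested tenant and action.

    `now`, `exp`, and `iat` are seconds since the Unix epoch.
    """
    if claims["exp"] <= now:
        return False  # token has expired
    if claims["exp"] - claims["iat"] > MAX_TOKEN_LIFETIME_S:
        return False  # token was issued with too long a lifetime
    if claims.get("org_id") != org_id:
        return False  # token is scoped to a different organization
    return claims.get("role") in ALLOWED_ROLES.get(action, set())


claims = {"iat": 1000, "exp": 4600, "org_id": "org-1", "role": "evidence_submitter"}
```

With these claims, `authorize(claims, "org-1", "evidence:submit", now=2000)` succeeds, while the same token is rejected for another organization, for an action outside its role, or after expiry.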

Processing Pipeline

The engine’s internal processing pipeline follows a four-stage architecture. In the ingestion stage, incoming data — evidence artifacts, assessment submissions, telemetry events, API calls — is validated, parsed, and normalized into the internal data model representation. Validation errors are surfaced synchronously for API calls and asynchronously via notification for scheduled collection jobs. The normalization process resolves semantic inconsistencies in control identifiers, dates, and evidence classifications across source systems with different conventions.

In the analysis stage, the engine applies its analytical models to the normalized data. For assessment submissions, the scoring engine evaluates each response against the rubric for its corresponding control and computes scores. For telemetry events, the compliance impact assessment module evaluates the event against the organization’s current control implementation record. For evidence artifacts, the document processing module extracts structured metadata and identifies the controls the artifact substantiates. The analysis stage is designed for horizontal scalability — individual analysis tasks are independent and can be processed in parallel — and the engine’s infrastructure supports burst processing for large batch submissions during period-end assessment cycles.

In the scoring and gap analysis stage, the outputs of the analysis stage are aggregated to compute domain-level scores, framework coverage percentages, cross-framework gap profiles, and RRI scores for unimplemented controls. This stage also applies the mapping engine to extend point-in-time compliance records across all supported frameworks, computing the secondary framework coverage attributable to each implemented control. In the reporting stage, scored and analyzed results are assembled into structured report objects and made available through the Outbound Reporting API, the compliance dashboard, and the STAR Registry submission interface.
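The aggregation step can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: domain-level scores are taken as the arithmetic mean of per-control scores, and a mapped control counts toward secondary coverage only when the mapping confidence meets a threshold (0.7 here, chosen arbitrarily). Function names and identifiers are hypothetical.

```python
from collections import defaultdict


def domain_scores(control_scores, control_domain):
    """Mean score per domain, computed from per-control scores."""
    buckets = defaultdict(list)
    for cid, score in control_scores.items():
        buckets[control_domain[cid]].append(score)
    return {domain: sum(v) / len(v) for domain, v in buckets.items()}


def framework_coverage(implemented, mappings, framework_controls, min_confidence=0.7):
    """Percent of each framework's controls covered, directly or through mappings.

    implemented:        set of (framework, control_id) pairs already implemented
    mappings:           iterable of (source, target, confidence) edges
    framework_controls: {framework: set of control ids defined by that framework}
    """
    covered = set(implemented)
    for src, dst, conf in mappings:
        if src in implemented and conf >= min_confidence:
            covered.add(dst)  # secondary coverage attributable to an implemented control
    return {
        fw: 100.0 * sum((fw, cid) in covered for cid in ids) / len(ids)
        for fw, ids in framework_controls.items()
    }


scores = domain_scores(
    {"C-1": 80, "C-2": 60, "C-3": 90},
    {"C-1": "GRC", "C-2": "GRC", "C-3": "IAM"},
)
coverage = framework_coverage(
    implemented={("AICM", "C-1"), ("AICM", "C-2")},
    mappings=[
        (("AICM", "C-1"), ("ISO42001", "X-1"), 0.9),
        (("AICM", "C-2"), ("ISO42001", "X-2"), 0.5),
    ],
    framework_controls={
        "AICM": {"C-1", "C-2", "C-3", "C-4"},
        "ISO42001": {"X-1", "X-2"},
    },
)
```

In this example the low-confidence mapping from C-2 does not contribute, so the second framework is credited with one of its two controls.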

Reporting Layer

The reporting layer serves three distinct audiences with differentiated output formats. For compliance teams, the engine produces a comprehensive Compliance Posture Report that presents the organization’s current control implementation status across all supported frameworks, with domain-level heat maps, trend lines showing compliance trajectory over time, and a prioritized gap remediation queue organized by RRI score. For executive audiences, the engine produces a Risk Executive Summary — a two-page condensed view of the organization’s aggregate compliance status, top-priority gaps, and recommended quarterly objectives, suitable for board reporting and CISO-level governance reviews. For external audiences — auditors, regulators, customers performing vendor due diligence — the engine produces an Evidence Package: a structured compilation of the evidence artifacts substantiating the organization’s compliance claims, organized by control domain and formatted to meet the documentation requirements of the relevant audit standard.

The reporting layer also produces a purpose-built STAR for AI Submission Package that formats the organization’s AI-CAIQ responses, Valid-AI-ted scores, and supporting evidence into the structured format required for STAR Registry submission. This package includes the per-question and per-domain scores from the most recent scoring run, the gap analysis and remediation summary, and a compliance narrative that contextualizes the scores within the organization’s overall AI governance program.


6. Privacy and Confidentiality

Handling Sensitive Audit Data

AI compliance assessments contain some of the most sensitive categories of organizational information: descriptions of AI system capabilities that may have competitive value, descriptions of security controls whose disclosure could enable targeted attacks, risk assessments whose content may be relevant to regulatory or litigation contexts, and detailed records of audit findings that organizations have strong interests in keeping confidential. The Valid-AI-ted Audit Engine’s data handling architecture has been designed with these confidentiality requirements as primary constraints rather than afterthoughts.

All compliance data stored in the Valid-AI-ted platform is encrypted at rest using AES-256 and in transit using TLS 1.3. Organization-specific compliance records are stored in logically isolated data stores (separate database schemas with separate encryption keys managed through a hardware security module), so that a compromise of one organization’s data store does not by itself expose another organization’s records. The engine’s multi-tenant architecture enforces tenant isolation at the application layer as well as the storage layer: every API request is validated against an organization-scoped JWT before any database query is executed, so that even an authenticated request cannot read records outside the requesting organization’s scope.

Data Residency and Sovereignty Controls

Many organizations operating in regulated industries or across multiple jurisdictions have mandatory data residency requirements that govern where compliance data may be stored and processed. Valid-AI-ted supports configurable data residency through a regional deployment model in which compliance data for a given organization is stored and processed exclusively in the designated geographic region. Currently supported regions include the European Union (Frankfurt and Dublin data centers), the United States (Virginia and Oregon), the United Kingdom, and the Asia-Pacific region (Singapore and Sydney). Organizations subject to EU data protection requirements, including those arising under the GDPR or the EU AI Act’s data governance provisions, should configure EU residency to ensure that AI compliance data does not leave the jurisdiction.

Within each region, the engine maintains complete audit logs of all data access events — which API clients accessed which records, when, and through which authentication path — to support the internal audit and compliance reporting requirements that regulated organizations typically impose on systems processing sensitive business information. These access logs are retained for a minimum of three years and are available for export through the Evidence API.

Access Controls and Role Segregation

The engine’s access control model implements role-based controls that reflect the functional separation of duties typical in enterprise compliance programs. The Compliance Administrator role has read and write access to all compliance records and system configuration settings for the organization’s tenant. The Compliance Analyst role has read and write access to assessment records and evidence artifacts but cannot modify system configuration. The Auditor role — designed for external third-party auditors conducting STAR for AI Level 2 engagements — has scoped read-only access to the evidence packages and assessment records relevant to the current engagement, with access expiring automatically at engagement conclusion. The Executive Viewer role has read-only access to the executive summary reports and dashboard views without access to underlying evidence details.

Access to the Auditor role requires explicit provisioning by the Compliance Administrator and generates an access grant record in the audit log. This record captures the identity of the administrator who granted access, the identity of the auditor receiving access, the scope of the access grant, the engagement start and end dates, and a log of all records accessed during the engagement. This audit trail is available to the organization at the conclusion of the engagement and provides documentation of the evidence review process that auditors may include in their working papers.
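The access grant record and its automatic expiry can be sketched as follows. The class and field names are hypothetical. Logging denied attempts alongside successful reads is a design choice of this sketch; the text requires only a log of records accessed.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AuditorGrant:
    granted_by: str      # identity of the Compliance Administrator
    auditor: str         # identity of the auditor receiving access
    scope: frozenset     # record ids the auditor may read
    start: date          # engagement start date
    end: date            # engagement end date; access lapses after this
    access_log: list = field(default_factory=list)

    def read(self, record_id, on):
        """Allow a read only inside the engagement window and scope; log every attempt."""
        allowed = self.start <= on <= self.end and record_id in self.scope
        self.access_log.append((on, record_id, allowed))
        return allowed


grant = AuditorGrant(
    granted_by="admin@example.org",
    auditor="auditor@example.org",
    scope=frozenset({"rec-1"}),
    start=date(2026, 1, 1),
    end=date(2026, 1, 31),
)
in_scope = grant.read("rec-1", on=date(2026, 1, 15))
out_of_scope = grant.read("rec-2", on=date(2026, 1, 15))
after_end = grant.read("rec-1", on=date(2026, 2, 1))
```

Because the end date is checked on every read, no separate revocation step is needed when the engagement concludes.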

Attorney-Client Privilege Considerations

Organizations conducting internal compliance investigations may seek to protect the resulting records under attorney-client privilege or work product doctrine. Where legal counsel has directed that specific compliance assessment activities be conducted under privilege — for example, a gap assessment commissioned in anticipation of litigation or regulatory inquiry — the Valid-AI-ted platform supports the designation of specific assessment records as legally privileged. Privileged records are excluded from the Auditor access scope by default and require specific acknowledgment of the privilege designation before any external access can be granted. Organizations should consult with their legal counsel regarding the conditions under which AI compliance records may be designated as privileged and the procedures for maintaining that designation in a platform context.


7. Integration with STAR for AI

STAR for AI Level 1: Self-Assessment Support

The STAR for AI program, launched by the Cloud Security Alliance in October 2025, establishes the global framework for responsible and auditable AI governance [1]. Level 1 certification is awarded to organizations that publish a scored AI-CAIQ v1.0.2 self-assessment to the STAR Registry, demonstrating transparent, standardized disclosures aligned with the AICM. Valid-AI-ted is the designated scoring engine for Level 1 submissions and is integrated directly with the STAR Registry submission workflow: organizations complete their AI-CAIQ responses in the Valid-AI-ted assessment portal, submit them to the scoring engine, and upon receiving a passing score, authorize direct publication of the scored assessment to the STAR Registry through the engine’s registry integration.

The scoring process for Level 1 submissions is designed to support iterative improvement. Organizations that receive scores below threshold on specific controls receive structured feedback identifying the specific deficiencies — missing evidence claims, insufficient specificity, incomplete coverage of control elements — along with guidance on how to revise their response to address each deficiency. CSA members may resubmit without limit; non-member organizations receive up to ten resubmission cycles. Each resubmission generates a complete new score report, allowing organizations to track their improvement across cycles and to demonstrate a pattern of continuous improvement that may itself be evidence of mature governance practice. Zendesk, the first organization to achieve both STAR for AI Level 1 and Level 2 designations, has cited the iterative feedback mechanism as central to their preparation process [11].
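The resubmission rule and the cycle-over-cycle tracking described above can be expressed in a few lines. The limit of ten for non-members comes from the text; the function names, the control identifiers, and the passing threshold of 60 used in the example are hypothetical, since the actual threshold is not stated in this section.

```python
NON_MEMBER_RESUBMISSION_LIMIT = 10  # from the text; CSA members are unlimited


def may_resubmit(is_csa_member, resubmissions_used):
    """True if another resubmission cycle is allowed."""
    return is_csa_member or resubmissions_used < NON_MEMBER_RESUBMISSION_LIMIT


def below_threshold(scores, threshold):
    """Controls scoring under the threshold, lowest score first."""
    failing = [(score, cid) for cid, score in scores.items() if score < threshold]
    return [cid for score, cid in sorted(failing)]


def improvement(cycles):
    """Per-control score change between the first and the latest cycle."""
    first, latest = cycles[0], cycles[-1]
    return {cid: latest[cid] - first[cid] for cid in first if cid in latest}


# Two hypothetical scoring cycles for the same organization.
cycles = [
    {"C-1": 40, "C-2": 70},
    {"C-1": 65, "C-2": 70},
]
to_revise = below_threshold(cycles[0], threshold=60)
delta = improvement(cycles)
```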

STAR for AI Level 2: Third-Party Audit Workflow

Level 2 designation requires organizations to combine a Valid-AI-ted–scored AI-CAIQ with an ISO/IEC 42001 certification from an accredited third-party certification body, demonstrating both self-disclosed AI governance maturity (the AI-CAIQ component) and independently verified management system conformance (the ISO 42001 component) [1]. Valid-AI-ted supports Level 2 workflows through a set of features specifically designed to facilitate the interaction between the organization, the ISO 42001 certification auditor, and the STAR Registry.

The Level 2 audit workflow begins when an organization initiates a Level 2 engagement in the Valid-AI-ted portal and provisions Auditor access for their ISO 42001 certification auditor. The auditor receives a scoped view of the organization’s compliance posture — including their AI-CAIQ scores, their AICM implementation record, and the evidence packages associated with the AICM controls mapped to ISO 42001 clauses — which serves as pre-audit documentation. The audit engagement proceeds off-platform according to the auditor’s methodology; upon completion, the auditor uploads the ISO 42001 certification record to the Valid-AI-ted platform through their Auditor access. The platform validates the certification against the expected schema and issuing authority, and upon successful validation, generates the Level 2 submission package containing the AI-CAIQ scores, the ISO 42001 certification record, and the cross-framework mapping report demonstrating alignment between the two. This package is submitted to the STAR Registry for Level 2 designation.

For organizations that intend to use their ISO 42001 certification as the primary evidence base for multiple compliance obligations simultaneously — satisfying not only STAR for AI Level 2 but also contributing to EU AI Act conformity documentation and NIST AI RMF GOVERN function compliance — the Level 2 workflow’s cross-framework mapping report is particularly valuable. It documents precisely which ISO 42001 certification scope elements satisfy which AICM controls, which NIST AI RMF subcategories, and which EU AI Act articles, providing a structured argument for multi-framework reliance on a single third-party certification.


8. Technical Architecture Summary

The Valid-AI-ted Audit Engine is deployed as a cloud-native, multi-tenant SaaS platform on infrastructure managed by the Cloud Security Alliance. The platform’s architecture follows microservices principles, with distinct service components responsible for each major functional area. The Mapping Service maintains the control relationship graph and handles cross-framework translation requests. The Scoring Service executes AI-CAIQ evaluation against scoring rubrics and computes multi-dimensional response quality assessments. The Risk Analysis Service computes RRI scores, resource allocation optimizations, and compliance trajectory projections. The Evidence Service manages the ingestion, storage, metadata extraction, and expiration tracking of evidence artifacts. The Telemetry Integration Service maintains the connection to the AI Risk Observatory, subscribes to the relevant event streams, and routes compliance-relevant events to the continuous evaluation pipeline. The Reporting Service assembles report objects from the outputs of the other services and delivers them through the reporting API and the dashboard.

Each service is containerized and orchestrated through a Kubernetes-based container management layer that supports independent scaling of individual services. The Scoring Service, which processes burst workloads during period-end submission cycles, is configured with horizontal scaling policies that target a processing latency below thirty seconds during peak load. The Telemetry Integration Service is deployed in an active-active configuration across two availability zones within each regional deployment to ensure continuous event processing without single points of failure.

The data persistence layer uses a polyglot storage architecture. The compliance record database — storing organization records, control implementation records, assessment records, and relationship metadata — is implemented on a PostgreSQL-compatible relational database with row-level security enforcement for tenant isolation. The evidence artifact store uses an object storage service with server-side encryption, lifecycle policies for long-term retention, and versioning enabled for all objects. The control relationship graph is stored in a graph database that supports efficient traversal queries for cross-framework mapping operations. The telemetry event stream uses a message queue service with configurable retention to support replay and backfill operations.

The Valid-AI-ted API gateway enforces authentication, authorization, rate limiting, and request routing for all external API interactions. The gateway validates JWTs against the organization’s identity provider configuration, enforces role-based access control policies, and logs all requests to the audit log store. The dashboard application is a web-based single-page application served through a content delivery network and communicating exclusively through the public API. Because the dashboard has no privileged access to internal services, its security properties are the same as those of any external API client.

Third-party integrations with GRC platforms — including ServiceNow GRC, Vanta, and Drata — are implemented as bidirectional connectors that use each platform’s published integration APIs. These connectors allow organizations already using a general-purpose GRC platform to supplement it with Valid-AI-ted’s AI-specific analytical capabilities without replacing their existing compliance infrastructure. The connector architecture is plugin-based, and the CSA publishes an integration SDK that enables third-party developers to build additional platform connectors.


9. Framework Alignment

The following table maps Valid-AI-ted engine capabilities to their corresponding requirements across the AICM, ISO 42001, NIST AI RMF, and STAR for AI program. It is intended to allow compliance teams to identify, at a glance, which engine capabilities they should engage when preparing documentation for each framework.

| Engine Capability | AICM Domain | ISO 42001 Clause | NIST AI RMF Function / Category | STAR for AI Requirement |
| --- | --- | --- | --- | --- |
| Automated cross-framework control mapping | GRC: Governance, Risk, and Compliance | Clause 6.1 (Actions to address risks) | GOVERN 1.1 (Policies, processes, procedures) | AI-CAIQ GRC Domain: Framework alignment questions |
| AI-CAIQ automated scoring | GRC: Governance, Risk, and Compliance | Clause 9.1 (Monitoring and measurement) | GOVERN 1.7 (Risk documentation and review) | Level 1: Valid-AI-ted scoring requirement |
| Control gap analysis and remediation roadmap | GRC: Governance, Risk, and Compliance | Clause 6.1.2 (AI risk treatment) | MAP 5.1 (Likelihood and magnitude of impacts) | Level 1: Gap remediation evidence |
| RRI-based control prioritization | GRC: Governance, Risk, and Compliance | Clause 6.2 (AI objectives and planning) | GOVERN 2.2 (Risk tolerances) | Level 1: Governance maturity evidence |
| Resource allocation optimization | GRC: Governance, Risk, and Compliance | Clause 8.1 (Operational planning) | MANAGE 1.2 (Treatment plan execution) | Level 1: Resource governance evidence |
| AI Risk Observatory telemetry integration | GRC: Logging and Monitoring | Clause 9.1 (Monitoring and measurement) | MEASURE 2.6 (Monitoring and feedback) | Level 2: Continuous monitoring evidence |
| Real-time agent behavioral risk assessment | Model Security; Identity and Access Management | Clause 8.2 (AI risk treatment controls) | MEASURE 2.11 (Fairness and bias evaluation) | Level 2: Agentic deployment evidence |
| Framework trigger-based reassessment | GRC: Governance, Risk, and Compliance | Clause 10.1 (Continual improvement) | MANAGE 4.1 (Post-deployment monitoring) | Level 2: Reassessment cadence evidence |
| Evidence collection automation | All 18 AICM domains | Clause 7.5 (Documented information) | GOVERN 1.2 (Documentation requirements) | Level 1 and 2: Evidence artifact submission |
| ISO 42001 certification integration | GRC: Governance, Risk, and Compliance | Clause 9.2 (Internal audit) | GOVERN 1.4 (Organizational commitment) | Level 2: Third-party certification linkage |
| Audit evidence package generation | All 18 AICM domains | Clause 9.3 (Management review) | MEASURE 2.9 (Evaluation documentation) | Level 1 and 2: STAR Registry submission |
| Privileged record designation | GRC: Governance, Risk, and Compliance | Clause 7.5.3 (Information control) | GOVERN 6.1 (Policies for management) | Level 2: Auditor access governance |
| EU AI Act conformity documentation | Compliance: Regulatory Alignment | Clause 4.2 (Stakeholder needs) | GOVERN 4.1 (Organizational risk policies) | Level 2: Regulatory alignment supplement |
| SOC 2 evidence mapping | GRC; Infrastructure Security | Clause 8.4 (Use of AI by other organizations) | MAP 3.5 (Risk identification) | Level 1: Third-party risk evidence |
| Multi-tenant data isolation | Infrastructure Security; Privacy | Clause 8.3 (AI system impact assessment) | MANAGE 2.4 (Risk controls deployment) | Level 1 and 2: Data governance evidence |

References

[1] Cloud Security Alliance, “Cloud Security Alliance Announces Availability of STAR for AI Level 2 and Valid-AI-ted for AI,” CSA Press Release, November 20, 2025. https://cloudsecurityalliance.org/press-releases/2025/11/20/cloud-security-alliance-announces-availability-of-star-for-ai-level-2-and-valid-ai-ted-for-ai

[2] Sprinto, “Top 8 Governance, Risk and Compliance (GRC) Tools: Platforms, Features and How to Choose in 2026,” Sprinto Blog, 2026. https://sprinto.com/blog/grc-tools/

[3] Sacra, “Drata Revenue, Valuation and Funding,” Sacra Research, 2025. https://sacra.com/c/drata/

[4] NIST, “Artificial Intelligence Risk Management Framework (AI RMF 1.0),” NIST AI 100-1, National Institute of Standards and Technology, January 2023. https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

[5] ISO/IEC, “ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system,” International Organization for Standardization, December 2023. https://www.iso.org/standard/42001

[6] Cloud Security Alliance, “Introducing the CSA AI Controls Matrix,” CSA Blog, July 10, 2025. https://cloudsecurityalliance.org/blog/2025/07/10/introducing-the-csa-ai-controls-matrix-a-comprehensive-framework-for-trustworthy-ai

[7] European Commission, “EU AI Act: Conformity Assessment Requirements,” Article 43, Regulation (EU) 2024/1689 of the European Parliament and of the Council. https://artificialintelligenceact.eu/article/43/

[8] Cloud Security Alliance, “A Capabilities-Based Risk Assessment for AI,” CSA Blog, October 27, 2025. https://cloudsecurityalliance.org/blog/2025/10/27/calibrating-ai-controls-to-real-risk-the-upcoming-capabilities-based-risk-assessment-cbra-for-ai-systems

[9] Cloud Security Alliance, “Valid-AI-ted: A Step Towards Real-Time Cloud Assurance,” CSA Blog, June 11, 2025. https://cloudsecurityalliance.org/blog/2025/06/11/valid-ai-ted-a-major-step-towards-real-time-cloud-assurance

[10] Cloud Security Alliance, “Cloud Security Alliance Launches CSAI Foundation With Mission of ‘Securing the Agentic Control Plane,’” CSA Press Release, March 23, 2026. https://cloudsecurityalliance.org/press-releases/2026/03/23/csa-securing-the-agentic-control-plane

[11] Zendesk, “Zendesk Sets a New Baseline for AI Transparency: First to Achieve CSA STAR AI Levels 1 and 2 Certification,” Zendesk Blog, 2025. https://www.zendesk.com/blog/zip2-csa-star-ai-levels-1-2-certification/

[12] Cloud Security Alliance, “Announcing the AI Controls Matrix and ISO 42001 Mapping,” CSA Blog, August 20, 2025. https://cloudsecurityalliance.org/blog/2025/08/20/announcing-the-ai-controls-matrix-and-iso-iec-42001-mapping-and-the-roadmap-to-star-for-ai-42001

[13] Cloud Security Alliance, “CSA STAR for AI,” CSA Program Page, 2025–2026. https://cloudsecurityalliance.org/star/ai/

[14] RegScale, “RegScale Achieves CSA STAR Designation as a Valid-AI-ted Solution,” Business Wire, November 3, 2025. https://www.businesswire.com/news/home/20251103449466/en/RegScale-Achieves-CSA-STAR-Designation-as-a-Valid-AI-ted-Solution

[15] ISO/IEC, “ISO/IEC 27001:2022 — Information security, cybersecurity and privacy protection — Information security management systems — Requirements,” International Organization for Standardization, October 2022. https://www.iso.org/standard/27001

[16] AICPA, “2017 Trust Services Criteria (With Revised Points of Focus — 2022),” American Institute of Certified Public Accountants, 2022. https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022

[17] Cloud Security Alliance, “STAR for AI Level 2: AI Security Path,” CSA Blog, November 19, 2025. https://cloudsecurityalliance.org/blog/2025/11/19/understanding-star-for-ai-level-2-a-practical-step-toward-ai-security-compliance