Understanding the Impact of AI Standards
A comprehensive interactive guide to evaluating how AI standards affect innovation and trust, based on the NIST concept paper.
Introduction to AI Standards Evaluation
The Challenge
No formal or shared method currently exists for measuring the impact of AI standardization activities on the goals of innovation and trust. This makes it difficult to assess the effectiveness of AI standards and to improve future standardization efforts.
Proposed Solution
The framework adapts successful evaluation methods from other domains to create an analytical approach for assessing AI standards. This provides a systematic way to measure their impact on innovation and trust.
Theory of Change
At the core of the framework is the "theory of change" approach, which helps identify how and why AI standards lead to desired outcomes, what data needs to be collected, and how to measure impact against a counterfactual scenario.
Quick Quiz: Introduction
What is the main challenge in evaluating AI standards?
The Core Framework: Theory of Change
Understanding the Theory of Change
The theory of change provides a structured way to think about how AI standards lead to desired outcomes. It helps answer critical questions about what works, why it works, and for whom it works.
Advantage 1
Helps designers think realistically about what can be achieved
Advantage 2
Identifies what data needs to be collected at each stage
Advantage 3
Emphasizes the explicit identification of the counterfactual
The Counterfactual Concept
The counterfactual represents "what would have happened in the alternative state of the world" without the AI standard. The impact is the difference between outcomes with the standard and outcomes in this counterfactual scenario.
With AI Standard
Improved outcomes (e.g., faster innovation, increased trust)
Without AI Standard (Counterfactual)
Baseline outcomes (e.g., slower innovation, less trust)
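The with/without comparison above can be sketched as a one-line calculation. This is a minimal illustration, not part of the NIST framework itself: the function name and the numeric values are hypothetical, and in practice the counterfactual outcome must be estimated with the methods discussed later, not observed directly.

```python
# Minimal sketch: impact is the gap between the outcome observed with the
# standard and the outcome estimated for the counterfactual scenario.
# All numbers here are hypothetical illustration values, not real data.

def estimated_impact(outcome_with_standard: float,
                     counterfactual_outcome: float) -> float:
    """Impact = outcome observed with the standard minus the outcome
    estimated for the same population had the standard not existed."""
    return outcome_with_standard - counterfactual_outcome

# e.g. a trust-survey score with the standard vs. estimated without it
impact = estimated_impact(outcome_with_standard=0.78,
                          counterfactual_outcome=0.65)
print(round(impact, 2))
```

The hard part, of course, is not the subtraction but producing a credible estimate of the counterfactual term, which is what the validity considerations below address.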
Core Evaluation Questions
What works, and why?
This question focuses on the "supply side" of AI standards, examining the inputs, activities, and outputs that lead to desired outcomes.
Example:
For a standard on AI terminology, inputs might include expert working groups and research papers. Activities would be the standardization process itself, and outputs would be the published terminology standard.
What works, and for whom?
This question examines both "demand and supply" aspects, identifying which parts of the standard worked well and for which stakeholders.
Example:
A standard on bias mitigation might be highly effective for large tech companies with dedicated compliance teams, but less so for smaller organizations without these resources.
How can standards be improved?
This question focuses on how evaluation results can inform future refinements that make standards more effective.
Example:
If evaluation shows that a standard is too complex for many implementers, future versions might include simplified implementation guides or toolkits.
Interactive Results Chain / Logic Model
The results chain (or logic model) visually represents how inputs are transformed through activities into outputs, which lead to outcomes and ultimately achieve goals.
Inputs
Resources for standards development
Activities
SDO processes
Outputs
Published standards
Outcomes
Initial adoption results
Goals
Final impacts
Example: Terminology Standard
- Inputs: Expert working groups, research papers
- Activities: Consensus-building, drafting
- Outputs: Published terminology standard
- Outcomes: Reduced communication errors
- Goals: Faster innovation, lower costs
Example: TEVV Standard
- Inputs: Testing methodologies, risk assessments
- Activities: Method validation, metric development
- Outputs: Published testing standards
- Outcomes: Reduced harm, better risk measurement
- Goals: Trustworthy AI systems
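A results chain like the examples above can be represented as a simple data structure, which makes it easy to check that evidence has been identified for every stage. This is a hypothetical sketch, not an artifact of the NIST paper; the class name and entries (echoing the terminology-standard example) are illustrative only.

```python
# Hypothetical sketch: a results chain / logic model as a dataclass.
# An empty stage flags a gap where data still needs to be collected.

from dataclasses import dataclass, field

@dataclass
class ResultsChain:
    inputs: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    outcomes: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)

    def stages(self):
        """Yield (stage_name, entries) pairs in causal order."""
        yield from vars(self).items()

terminology = ResultsChain(
    inputs=["Expert working groups", "Research papers"],
    activities=["Consensus-building", "Drafting"],
    outputs=["Published terminology standard"],
    outcomes=["Reduced communication errors"],
    goals=["Faster innovation", "Lower costs"],
)

# Any stage with no entries signals missing evidence in the chain.
incomplete = [name for name, entries in terminology.stages() if not entries]
print(incomplete)
```

Walking the stages in order mirrors the left-to-right flow of the logic model: each stage should plausibly cause the next.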
Quick Quiz: Framework
What is the purpose of identifying the counterfactual?
Considerations for a Valid Evaluation
Contextual Understanding
To conduct a valid evaluation, it's essential to identify and control for confounding factors and construct empirically distinct comparison groups. This helps isolate the true impact of the AI standard from other influences.
Important: AI standardization often occurs in complex environments with many simultaneous changes. Careful design is needed to attribute outcomes specifically to the standard.
Key Validity Issues
Internal Validity
Establishing a causal relationship when multiple standards or factors might contribute to an innovation.
Challenge: Did the standard actually cause the observed outcomes, or were other factors responsible?
Construct Validity
Ensuring that the measurement accurately reflects the underlying concept of interest.
Example: Measuring "bias reduction" may require different approaches in different contexts.
Selection Bias
Addressing systematic differences between adopters and non-adopters of AI standards.
Example: Early adopters might be more innovative to begin with, skewing results.
External Validity
Understanding whether impacts observed in one context apply to others.
Challenge: A standard effective in healthcare might not work the same in finance.
Methods for Counterfactual Construction
Several statistical methods can be used to construct the counterfactual scenario needed to measure impact:
Before & After
Compare outcomes before and after standard implementation.
Matching Methods
Pair adopters with similar non-adopters for comparison.
Difference in Differences
Track changes over time in both treatment and control groups.
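The difference-in-differences method above can be sketched numerically. This is a hedged illustration under assumed numbers: the group means are hypothetical, and a real study would compute them from observed data for adopters (treatment) and comparable non-adopters (control).

```python
# Sketch of difference-in-differences (DiD) for standards impact.
# The control group's change over time stands in for the counterfactual
# trend the treatment group would have followed without the standard.

def diff_in_diff(treat_before: float, treat_after: float,
                 control_before: float, control_after: float) -> float:
    """DiD = (change in treatment group) - (change in control group)."""
    return (treat_after - treat_before) - (control_after - control_before)

# e.g. hypothetical mean error rates before/after a testing standard:
# adopters fall 0.20 -> 0.12 while non-adopters drift 0.21 -> 0.19,
# so the standard is credited only with the extra reduction.
impact = diff_in_diff(treat_before=0.20, treat_after=0.12,
                      control_before=0.21, control_after=0.19)
print(round(impact, 2))
```

Note the key assumption: absent the standard, both groups would have trended in parallel. If early adopters were already on a steeper improvement path (the selection bias discussed above), the DiD estimate is biased.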
Quick Quiz: Evaluation
Which validity issue concerns whether the measurement matches the concept being studied?
Illustrative Use Cases
The evaluation framework can be applied to various AI standardization scenarios. Below are illustrative examples showing how the framework works in practice.
Education
How AI standards could improve record linking to inform decisions about educational programs, student outcomes, and workforce needs.
Education Use Case
- Standards for student data privacy and security
- Bias measurement in educational AI systems
- Explainability standards for educational recommendations
Criminal Justice
Demonstrating the value of AI standards for combining records to track individuals, provide services, and reduce re-offending rates.
Criminal Justice Use Case
- Standards for fairness in risk assessment tools
- Data quality standards for criminal records
- Validation standards for predictive policing systems
Health & Human Services
Illustrating how AI standards could improve entity resolution in healthcare records for targeted services and cost reduction.
Health Use Case
- Cybersecurity standards for health AI systems
- Preprocessing standards for bias mitigation
- Validation standards for diagnostic AI tools
Food Security
Showing how AI standards could improve entity resolution in programs like SNAP to track eligibility and minimize fraud.
Food Security Use Case
- Standards for eligibility determination algorithms
- Fraud detection system validation
- Data sharing protocols between agencies
Scenario-Based Challenge
Test your understanding by applying the framework to this hypothetical scenario:
Scenario:
A new AI standard has been developed for facial recognition systems to reduce demographic bias. Six months after publication, some companies report improved accuracy across demographics, while others report no change or even decreased performance.
1. What evaluation questions would you ask to understand this mixed adoption?
2. What validity issues might be affecting these results?
3. How would you design a study to measure the true impact of this standard?
Glossary & Resources
Key Terms Glossary
Counterfactual
What would have happened in the alternative state of the world without the AI standard. The impact is the difference between outcomes with the standard and this counterfactual scenario.
Internal Validity
The extent to which a study establishes a trustworthy cause-and-effect relationship between a standard and its outcomes.
Construct Validity
The degree to which a test measures what it claims to be measuring. For AI standards, this concerns whether our measurements truly capture concepts like "bias reduction" or "trustworthiness."
SDO (Standards Development Organization)
An organization responsible for developing, coordinating, revising, amending, reissuing, interpreting, or otherwise maintaining standards.
TEVV (Testing, Evaluation, Verification, and Validation)
Processes and metrics used to assess whether AI systems meet specified requirements and standards.
Additional Resources
NIST Concept Paper: Towards an Approach for Evaluating the Impact of AI Standards
Original source document for this framework
NIST AI Risk Management Framework (AI RMF)
Framework to better manage risks of AI systems
ISO/IEC AI Standards
International standards for AI systems
NSSCET - National Standards Strategy for Critical and Emerging Technology
U.S. strategy for standards in critical technologies
Final Knowledge Check
What does TEVV stand for in the context of AI standards?