Article 10 — Data and Data Governance
Article 10 establishes requirements for training, validation, and testing datasets used in high-risk AI systems. It mandates documented data governance practices covering collection, quality, representativeness, and bias examination.
Art. 10(1)
Training, validation and testing datasets must meet quality criteria
DATA_GOVERNANCE is a gate requirement at the IMPLEMENTATION exit. You cannot proceed to testing without documenting your data governance practices, including quality metrics and data source inventory.
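The gate logic described above can be pictured as a simple completeness check. This is a minimal sketch, not the framework's implementation: the section keys in `REQUIRED_SECTIONS`, the `GateResult` type, and the function name are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical section keys; the real DATA_GOVERNANCE template may name them differently.
REQUIRED_SECTIONS = ("data_source_inventory", "quality_metrics")


@dataclass
class GateResult:
    passed: bool
    missing: list[str] = field(default_factory=list)


def check_data_governance_gate(document: dict[str, str]) -> GateResult:
    """Pass only if every required section is present and non-blank."""
    missing = [
        name for name in REQUIRED_SECTIONS
        if not document.get(name, "").strip()
    ]
    return GateResult(passed=not missing, missing=missing)
```

A document with a whitespace-only `quality_metrics` section would fail the check and report that section as missing, which mirrors the rule that testing cannot start until the documentation exists.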
Art. 10(2)(b)
Data collection processes, the origin of the data, and, for personal data, the original purpose of collection
The Data Source Inventory section requires collection methodology, source origin, and original purpose. The template includes specific fields for each data source documenting provenance and legal basis.
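One way to picture a single inventory entry is as a small record type. The field names below are illustrative assumptions drawn from the description above, not the template's actual schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataSourceRecord:
    # Illustrative field names; the actual template fields may differ.
    name: str
    collection_methodology: str
    source_origin: str
    original_purpose: str
    legal_basis: str


def incomplete_fields(record: DataSourceRecord) -> list[str]:
    """Return the names of fields left blank in an inventory entry."""
    return [
        f.name for f in fields(record)
        if not getattr(record, f.name).strip()
    ]
```

Running `incomplete_fields` over every entry before the gate review gives a quick list of sources whose provenance or legal basis is still undocumented.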
Art. 10(2)(c)
Data preparation operations (annotation, labelling, cleaning, updating, enrichment, aggregation)
Separate template sections cover data preparation methodology (cleaning, normalization, feature engineering) and the complete data processing pipeline from raw input to model-ready format.
Art. 10(2)(f), (h)
Examine data for biases; identify data gaps or shortcomings
The representativeness assessment in DATA_GOVERNANCE identifies demographic gaps in the data. BIAS_ASSESSMENT then tests for actual bias using fairness metrics. Both are gate requirements, so neither can be skipped.
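The source does not say which fairness metrics BIAS_ASSESSMENT uses. As one common example, the sketch below computes the demographic parity difference: the largest gap in positive-prediction rate between any two groups. The function name and input format are assumptions for illustration.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups.

    predictions: iterable of 0/1 model outputs.
    groups: iterable of group labels, aligned with predictions.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())
```

A value of 0 means every group receives positive predictions at the same rate. What threshold counts as acceptable depends on the system and its intended purpose, and should be recorded in the assessment.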
Art. 10(3)
Datasets must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete
Quality metrics and representativeness analysis are required template fields. The assessment must document known gaps and state which populations or use cases are underrepresented.
Art. 10(4)
Account for characteristics specific to the geographical, contextual, behavioral, or functional setting
The representativeness section requires geographic and demographic analysis of data coverage, ensuring the system is validated for its intended deployment context.
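A coverage analysis of this kind can be approximated by comparing each group's share of the dataset with its share of the intended deployment population. The sketch below is a hedged illustration: the function name, the `reference_shares` input, and the 0.8 default for `min_ratio` are assumptions, not values taken from the template.

```python
from collections import Counter


def underrepresented_groups(samples, reference_shares, min_ratio=0.8):
    """Flag groups whose dataset share is below min_ratio of the reference share.

    samples: iterable of group labels, one per record (e.g. region codes).
    reference_shares: dict mapping group label to its expected share (0..1)
        in the intended deployment population.
    Returns a dict of group -> (observed_share, expected_share).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("samples must not be empty")
    flagged = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        if observed < min_ratio * expected:
            flagged[group] = (observed, expected)
    return flagged
```

For example, a dataset with 5% of records from a region that makes up 20% of the deployment population would be flagged, and that gap would then be written into the representativeness section.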
Art. 10(5)
Special categories of personal data (Art. 9 GDPR) may be processed only where strictly necessary for bias detection and correction, and only under strict conditions
The consent basis and PII handling sections require explicit documentation of any special category data processed, the legal basis for processing, and the data minimization measures applied.
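A documentation check for special category data might look like the sketch below. The category labels paraphrase GDPR Art. 9(1); the `FieldDeclaration` type and its attribute names are hypothetical and stand in for whatever the PII handling section actually records.

```python
from dataclasses import dataclass

# Labels paraphrasing the categories listed in GDPR Art. 9(1).
SPECIAL_CATEGORIES = frozenset({
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "genetic",
    "biometric",
    "health",
    "sex_life_or_sexual_orientation",
})


@dataclass
class FieldDeclaration:
    # Hypothetical structure for one declared data field.
    name: str
    category: str
    legal_basis: str = ""
    minimization: str = ""


def undocumented_special_fields(declarations):
    """Names of special category fields missing a legal basis or minimization note."""
    return [
        d.name for d in declarations
        if d.category in SPECIAL_CATEGORIES
        and not (d.legal_basis.strip() and d.minimization.strip())
    ]
```

An empty result means every special category field carries both pieces of documentation. It does not establish that the stated legal basis is valid, which remains a legal review question.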