A Model Card is a structured document accompanying a model, containing: intended use description, training data, performance metrics broken down by demographic groups, known limitations and risks, and usage recommendations.
ML models were often deployed without transparent documentation of their capabilities, limitations, and potential harms. Model Cards address this by standardizing model reporting as a responsible-AI practice.
Basic information about the model: who developed it, version, model type, training algorithm, parameters, and links to further resources.
Intended use cases, target users, and explicitly out-of-scope applications.
Relevant factors affecting model performance: demographic groups, environmental conditions, and instrumentation attributes. The section distinguishes the factors that are relevant from the factors actually evaluated.
Performance metrics and decision thresholds, with justification of their selection for the model type and use case.
Description of evaluation datasets: source, composition, and representativeness for intended use cases.
General description of training data provenance and statistical composition, accounting for privacy constraints.
Disaggregated evaluation results per factor, covering both unitary and intersectional analyses.
Identification of sensitive data used, implications for human life, risks and harms, and mitigation measures.
Additional concerns and recommendations not covered in previous sections: limitations, known gaps, and user guidance.
Optional YAML front-matter at the top of the README.md file (Hugging Face Hub implementation): license, language, pipeline_tag, base_model, datasets, eval_results.
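As a rough illustration of the front-matter idea, the sketch below renders a flat metadata dictionary as a YAML block at the top of a README.md string. It uses only the standard library. The helper `render_front_matter` and every field value are hypothetical placeholders, not part of any Hugging Face API. `eval_results` is left out because it holds nested evaluation entries that this simple renderer does not handle.

```python
def render_front_matter(metadata: dict) -> str:
    """Render a flat metadata dict as a YAML front-matter block.

    Handles only plain strings and lists of plain strings; a real
    project would more likely rely on a YAML library.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
metadata = {
    "license": "apache-2.0",
    "language": ["en"],
    "pipeline_tag": "text-classification",
    "base_model": "org/base-model",  # placeholder identifier
    "datasets": ["org/dataset"],     # placeholder identifier
}

# The front matter precedes the human-readable card body.
readme = render_front_matter(metadata) + "\n\n# Model Card\n"
print(readme)
```

Keeping the machine-readable fields in a single block at the top of the file lets tooling read them without parsing the prose sections of the card.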
Model card authors often report results only per single factor, omitting intersectional analysis (e.g., older women vs. young men) that reveals biases invisible in unitary analysis.
The ethical considerations section is often filled with generic statements rather than a specific identification of risks, sensitive data, and mitigation measures.
Model cards often report only aggregate metrics without breakdown by demographic subgroups, masking differential model performance across populations.
The model card is created once at initial deployment and not updated for subsequent versions or fine-tunes, so the documentation drifts away from the actual model.
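The gap between aggregate, unitary, and intersectional reporting can be made concrete with a small sketch. The records, group names, and the helper `accuracy_by` below are all invented for illustration and do not come from any real evaluation.

```python
from collections import defaultdict

# Hypothetical evaluation records: (age_group, gender, prediction_correct)
records = [
    ("young", "male", True), ("young", "male", True),
    ("young", "male", True), ("young", "male", True),
    ("young", "female", True), ("young", "female", True),
    ("old", "male", True), ("old", "male", True),
    ("old", "female", False), ("old", "female", False),
]


def accuracy_by(records, key_fn):
    """Group records with key_fn and return the accuracy of each group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        key = key_fn(rec)
        totals[key] += 1
        hits[key] += int(rec[2])
    return {key: hits[key] / totals[key] for key in totals}


# Aggregate metric: a single number for the whole evaluation set.
overall = sum(r[2] for r in records) / len(records)

# Unitary analysis: one factor at a time.
by_age = accuracy_by(records, lambda r: r[0])
by_gender = accuracy_by(records, lambda r: r[1])

# Intersectional analysis: factors combined.
by_age_gender = accuracy_by(records, lambda r: (r[0], r[1]))

print("overall:", overall)
print("by age:", by_age)
print("by gender:", by_gender)
print("by age and gender:", by_age_gender)
```

In this toy data the aggregate accuracy is 0.8 and each unitary breakdown bottoms out at 0.5, while the intersectional view shows the ("old", "female") subgroup at 0.0, which is the kind of gap the pitfalls above describe.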
Mitchell et al. published the Model Cards for Model Reporting preprint on arXiv on October 5, 2018, proposing standardized model documentation with disaggregated evaluation across demographic groups.
The paper was published at ACM FAT* 2019 (Conference on Fairness, Accountability, and Transparency), formally introducing Model Cards into research and industry discourse.
Google released the open-source Model Card Toolkit, a Python library automating model card generation from TensorFlow Model Analysis evaluation data.
Hugging Face published the Model Card Guidebook with an updated template integrated into the huggingface_hub library, a Model Card Creator Tool, and a landscape analysis of ML documentation.
Number and specificity of demographic, environmental, and instrumental factors included in evaluation.
Whether reported results cover only unitary (per-factor) or also intersectional (combined-factor) analysis.
Scope of disclosed information about evaluation datasets.
Scope of disclosed information about training data, often constrained by privacy or IP restrictions.
Model Cards are a documentation pattern: text files (Markdown/YAML) that are independent of hardware and can describe models running on any platform.