Name: Amazon SageMaker AI
Brand: Amazon

Description

Amazon SageMaker AI is a fully managed MLOps and generative AI platform from Amazon Web Services that covers the complete machine-learning lifecycle, from data preparation and experimentation through training, deployment, monitoring, and pipeline automation. It is part of the broader Amazon SageMaker family, which also includes SageMaker Unified Studio, SageMaker Lakehouse, and SageMaker Catalog.

Key components

SageMaker Studio is a browser-based IDE that unifies notebooks, experiments, debugger, profiler, and model management in a single interface. SageMaker JumpStart provides a catalog of ready-to-deploy foundation models (including the Llama, Mistral, DeepSeek, and Stable Diffusion families) with one-click deployment and fine-tuning that require no infrastructure code. SageMaker Pipelines is a native ML pipeline orchestrator with CI/CD integration, artifact versioning, and lineage tracking. Model Registry manages model versions and enforces approval workflows before production deployment.
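The versioning and approval gate described for Model Registry can be sketched as a toy in-memory registry. This is a minimal illustration, not the service's implementation: the `ModelGroup` and `ModelVersion` classes, the group name, and the S3 paths are hypothetical, and only the three status strings follow SageMaker's model approval statuses.

```python
from dataclasses import dataclass, field

# Status strings follow SageMaker's model approval statuses; everything
# else in this sketch is hypothetical and exists only for illustration.
PENDING = "PendingManualApproval"
APPROVED = "Approved"
REJECTED = "Rejected"


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact_uri: str
    status: str = PENDING


@dataclass
class ModelGroup:
    name: str
    versions: list = field(default_factory=list)

    def register(self, artifact_uri: str) -> ModelVersion:
        """Add a new version; numbering starts at 1 and only increases."""
        mv = ModelVersion(len(self.versions) + 1, artifact_uri)
        self.versions.append(mv)
        return mv

    def set_status(self, version: int, status: str) -> ModelVersion:
        """Record an approval decision by replacing the frozen record."""
        if status not in (PENDING, APPROVED, REJECTED):
            raise ValueError(f"unknown status: {status}")
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"no such version: {version}")
        old = self.versions[version - 1]
        new = ModelVersion(old.version, old.artifact_uri, status)
        self.versions[version - 1] = new
        return new

    def latest_approved(self):
        """Return the newest approved version, or None if there is none."""
        for mv in reversed(self.versions):
            if mv.status == APPROVED:
                return mv
        return None


group = ModelGroup("churn-model")
group.register("s3://example-bucket/churn/v1/model.tar.gz")
group.register("s3://example-bucket/churn/v2/model.tar.gz")
group.set_status(1, APPROVED)
deployable = group.latest_approved()
print(deployable)
```

In this sketch version 2 stays pending, so a deployment step that only accepts `latest_approved()` would still pick version 1 until someone approves the newer one.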

Training and fine-tuning

SageMaker supports distributed training on GPU/Trainium clusters with automatic model and data parallelism (SageMaker Distributed Training). Built-in algorithms and support for TensorFlow, PyTorch, MXNet, and scikit-learn allow training jobs to run on managed infrastructure without server configuration. SageMaker Clarify detects bias in training data and explains model predictions using SHAP values.
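To make the idea behind SHAP-based explanations concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over every feature ordering. The function `shapley_values`, the `toy_model`, and the all-zero baseline are assumptions made for illustration; SHAP tooling generally approximates these values rather than enumerating orderings, because the exact computation grows factorially with the number of features.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Only practical for a handful of features, since the number of
    orderings is n factorial.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start from the baseline input
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its real value
            now = predict(current)
            totals[i] += now - prev   # marginal contribution of feature i
            prev = now
    return [t / len(orderings) for t in totals]


def toy_model(features):
    # Hypothetical model: one additive term plus one interaction term.
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `b * c` is split evenly between the two features that produce it.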

Model deployment and serving

The platform offers four hosting modes: real-time endpoints (low latency), serverless inference (no infrastructure management), asynchronous inference (large payloads), and batch transform (offline processing of large datasets). Auto-scaling provides cost flexibility, and VPC deployment provides network isolation.
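One way to read the four hosting modes is as a decision over workload traits. The helper below is a hypothetical sketch: the `Workload` fields, the 5 MB cut-off, and the rule that intermittent traffic favors serverless inference are assumptions chosen for illustration, not service limits or official guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    offline_dataset: bool        # scoring a stored dataset, no caller waiting
    payload_mb: float            # typical request size in megabytes
    intermittent_traffic: bool   # long idle gaps between requests


# Illustrative cut-off chosen for this sketch; not an official service limit.
LARGE_PAYLOAD_MB = 5.0


def suggest_hosting_mode(w: Workload) -> str:
    """Map workload traits to one of the four hosting modes."""
    if w.offline_dataset:
        return "batch transform"
    if w.payload_mb > LARGE_PAYLOAD_MB:
        return "asynchronous inference"
    if w.intermittent_traffic:
        return "serverless inference"
    return "real-time endpoint"


chat_api = Workload(offline_dataset=False, payload_mb=0.1, intermittent_traffic=False)
video_job = Workload(offline_dataset=False, payload_mb=250.0, intermittent_traffic=True)
print(suggest_hosting_mode(chat_api))
print(suggest_hosting_mode(video_job))
```

The checks are ordered so that the most restrictive trait wins: a large payload is routed to asynchronous inference even when traffic is intermittent.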

Feature Store and data management

SageMaker Feature Store provides centralized ML feature storage with support for online serving (low-latency inference access), offline storage (historical training data), and streaming ingestion. Data Wrangler enables visual data preparation and transformation from over 40 sources — including Amazon S3, Redshift, Athena, and AWS Glue — without writing code.
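The split between online serving and offline storage can be illustrated with a toy in-memory store. `ToyFeatureStore` and its methods are hypothetical and do not mirror the Feature Store API; the sketch only shows the general pattern of keeping the latest record per key for inference while appending every record to a history used for training.

```python
class ToyFeatureStore:
    """Hypothetical in-memory stand-in for an online/offline feature store."""

    def __init__(self):
        self.online = {}    # record_id -> (event_time, features), latest only
        self.offline = []   # append-only history of every ingested record

    def ingest(self, record_id, event_time, features):
        # The offline store keeps everything, in arrival order.
        self.offline.append((record_id, event_time, dict(features)))
        # The online store keeps only the record with the newest event time.
        current = self.online.get(record_id)
        if current is None or event_time >= current[0]:
            self.online[record_id] = (event_time, dict(features))

    def get_online(self, record_id):
        """Low-latency lookup of the latest features for one key."""
        entry = self.online.get(record_id)
        return None if entry is None else dict(entry[1])

    def history(self, record_id):
        """All records for one key, sorted by event time."""
        rows = [r for r in self.offline if r[0] == record_id]
        return sorted(rows, key=lambda r: r[1])


store = ToyFeatureStore()
store.ingest("user-1", 100, {"clicks_7d": 3})
store.ingest("user-1", 300, {"clicks_7d": 9})
store.ingest("user-1", 200, {"clicks_7d": 5})   # late-arriving event
print(store.get_online("user-1"))
print(len(store.history("user-1")))
```

The late-arriving event at time 200 lands in the history but does not overwrite the newer online value, which is the behavior a streaming ingestion path typically needs.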

Security and compliance

SageMaker AI's listed compliance coverage includes FedRAMP High, FedRAMP Moderate, HIPAA, SOC 2 Type II, PCI DSS, GDPR, and DoD Impact Level 5. The platform supports VPC isolation, encryption at rest and in transit, identity management via AWS IAM Identity Center (federating over SAML 2.0 and OIDC with identity providers such as Okta and Microsoft Entra ID), and full audit logging in AWS CloudTrail. Resources can be scoped per project and per user with granular cost controls and budget alerts.

Pricing

SageMaker AI uses a pay-as-you-go model: charges are based on training instance runtime and endpoint uptime (both billed per second), data processed, and, optionally, provisioned throughput for JumpStart foundation models. Per-project and per-user cost limits are available with budget alerts. The platform offers Standard and Enterprise 24/7 support tiers with a 99.9% uptime SLA.
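The arithmetic behind per-second billing and a 99.9% SLA can be worked through in a few lines. The hourly rate below is a made-up figure for illustration, not an actual SageMaker AI price, and both helper functions are hypothetical.

```python
from decimal import Decimal


def per_second_cost(hourly_rate, seconds, instance_count=1):
    """Cost of running instances billed per second at a given hourly rate."""
    total = Decimal(str(hourly_rate)) * seconds * instance_count / Decimal(3600)
    return total.quantize(Decimal("0.01"))


def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime that still satisfy an uptime SLA over a period."""
    total_minutes = Decimal(days) * 24 * 60
    return total_minutes * (Decimal(100) - Decimal(str(sla_percent))) / Decimal(100)


# Hypothetical rate: 1.20 currency units per instance-hour.
training_cost = per_second_cost("1.20", seconds=5400, instance_count=2)
downtime = allowed_downtime_minutes("99.9", days=30)
print(training_cost)
print(downtime)
```

With these assumed inputs, two instances running for 90 minutes cost 3.60, and a 99.9% uptime target over a 30-day month leaves 43.2 minutes of permitted downtime.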

MLOps Lifecycle

13/17 supported

Model Registry

Versioning — model artifact versioning

Approval workflows — approval workflow before production

Immutable artifacts — immutability of stored versions

Lineage tracking — tracking data and model relationships

4 / 4 supported · none unsupported

Feature Store

Online serving — real-time feature serving

Offline storage — feature storage for training

Streaming ingestion — streaming ingestion (Kafka, Flink)

3 / 3 supported · none unsupported

Prompt Management

Prompt registry — central prompt repository

Versioning — prompt versioning and history

Testing frameworks — A/B testing and prompt evaluation

0 / 3 supported · 3 unsupported

Monitoring

Data drift detection — input data drift detection

Concept drift detection — concept drift detection

Hallucination monitoring — LLM hallucination monitoring

Bias evaluation tools — bias evaluation tooling

3 / 4 supported · 1 unsupported

Human-in-the-Loop

Labeling services — data labeling tools

RLHF workflows — reinforcement learning from human feedback

Manual override — manual override of model decisions

3 / 3 supported · none unsupported

Data & Knowledge

Applications

6

Architecture & Mechanisms

6

Security

Developer Ecosystem

SDK Languages

Python, JavaScript, TypeScript, Go

API Type

REST

Community & resources

Templates library

Quickstarts

API Reference

Tutorials

Pricing & Business Model

Pricing models

Usage-based

Provisioned throughput

Resource quotas

Per project

Per user

Cost alerting

SLA & Support

99.9% uptime SLA

Standard, Enterprise 24/7

Supported AI Models

4

Description

MLOps Lifecycle: Full model lifecycle, covering registry, feature store, prompt management, monitoring, and human-in-the-loop.

Data & Knowledge (Data & Knowledge Management): Data connectors, vector database integration, native vector search, and data management (PII, provenance, synthetic data).

Applications (AI Applications): Domains and use cases this platform is best suited for, from RAG and fine-tuning to scientific research.

Architecture & Mechanisms: Architectural foundations and modern AI processing methods that are natively supported or used by this platform.

Security (Enterprise Security): Certifications, access controls, and data-protection features essential for corporate deployments and cloud privacy compliance.

Developer Ecosystem: Developer resources, including available SDKs, supported programming languages, infrastructure features, and model-deployment methods.

Pricing & Business Model: Billing models (usage-based, provisioned throughput), resource limits, and SLA parameters (uptime, support tiers).