AI Accelerator · serves as: AI acceleration, AI Inference, Compute, High-level compute.
Which group NVIDIA H100 belongs to and how it is built
Compute Modules is a subcategory of hardware components that provide processing power for robotic systems. It encompasses onboard computers, single-board computers (SBCs), AI accelerators, embedded processors, GPU/NPU compute modules, and other units responsible for processing sensor data and executing control logic. These modules form the foundation of modern autonomous, humanoid, and perception-capable robots.
An AI Accelerator is a specialized hardware component designed for efficient execution of artificial intelligence computations, particularly neural network inference, computer vision processing, and sensor data analysis. In robotics, AI accelerators are used to run perception models, object recognition, image segmentation, planning, and other tasks that require high computational throughput under constrained power budgets. They may take the form of dedicated NPU, TPU, VPU, or GPU chips, or specialized embedded modules.
A data-center AI accelerator card is a design class describing the construction of high-performance compute processors (GPUs/accelerators) intended for mounting in data-center servers. It is characterised by: a mezzanine-module form factor (such as SXM, mounted on an HGX/DGX baseboard rather than soldered to it) or a dual-slot PCIe card; high-bandwidth memory (HBM2e/HBM3/HBM3e) integrated on-package; dedicated accelerator-to-accelerator interconnects (NVLink, Infinity Fabric) with hundreds of GB/s of bandwidth; high TDP (350–1000 W) requiring air or liquid cooling; support for virtualisation/partitioning (for example MIG on NVIDIA parts) and low-precision compute formats (FP8/FP16/BF16/INT8). The class includes designs such as NVIDIA H100/H200/A100, AMD Instinct MI300, Google TPU, and Intel Gaudi. It describes physical construction and configuration, not the functional role, which is given by the component type "AI Accelerator".
NVIDIA H100 is the flagship data-center AI accelerator of the Hopper generation, unveiled in March 2022 and available commercially from Q3 2022. It is based on the GH100 chip manufactured on TSMC 4N and contains ~80 billion transistors on an 814 mm² die. The SXM5 variant has a 700 W TDP, the PCIe Gen5 variant 350 W. A single SXM5 unit delivers Tensor Core peaks of up to 1,979 TFLOPS in FP16/BF16 and 3,958 TFLOPS in FP8 (both figures with structured sparsity; dense throughput is half of that), and for HPC workloads 67 TFLOPS FP32 and 34 TFLOPS FP64.
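The peak figures above can be put in perspective with a little arithmetic. The following Python sketch derives dense throughput and throughput per watt for the SXM5 variant. It assumes that both quoted Tensor Core peaks include 2:4 structured sparsity and that sparsity doubles the dense peak; the function names are hypothetical, and the per-watt value uses TDP as a stand-in for real power draw.

```python
# Peak Tensor Core figures quoted in the text for the H100 SXM5,
# assumed here to include 2:4 structured sparsity.
SPARSE_PEAK_TFLOPS = {"FP16/BF16": 1979.0, "FP8": 3958.0}
SXM5_TDP_WATTS = 700.0

# Assumption: the sparse peak is exactly 2x the dense peak.
SPARSITY_SPEEDUP = 2.0


def dense_peak_tflops(sparse_tflops: float, speedup: float = SPARSITY_SPEEDUP) -> float:
    """Convert a with-sparsity peak into the corresponding dense peak."""
    return sparse_tflops / speedup


def tflops_per_watt(tflops: float, watts: float = SXM5_TDP_WATTS) -> float:
    """Peak throughput per watt, using TDP as the power figure."""
    return tflops / watts


if __name__ == "__main__":
    for fmt, sparse in SPARSE_PEAK_TFLOPS.items():
        dense = dense_peak_tflops(sparse)
        print(f"{fmt}: dense peak {dense:.1f} TFLOPS, "
              f"{tflops_per_watt(dense):.2f} TFLOPS/W at TDP")
```

These are theoretical ceilings; sustained throughput on real workloads is lower and depends on memory bandwidth, kernel efficiency, and interconnect traffic.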
Key H100 innovations are the 4th-generation Tensor Cores with FP8 support (E4M3/E5M2), a dedicated Transformer Engine that dynamically chooses between FP8 and FP16 precision and manages scaling across transformer layers, NVLink 4 at 900 GB/s per GPU for multi-GPU links, and NVSwitch providing an all-to-all topology in 8×H100 nodes. The 80 GB of HBM3 memory on the SXM5 variant offers 3.35 TB/s of bandwidth, about 1.6× the roughly 2.0 TB/s of the 80 GB A100. The H100 NVL variant pairs two PCIe cards over NVLink bridges (188 GB of memory and 7.8 TB/s in aggregate) and is aimed at inference of large LLMs (70B+ parameters).
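To make the two FP8 encodings named above more concrete, here is a small Python sketch that computes the representable range of each. The layout is not given in the text and is assumed from the published FP8 format definitions: E4M3 uses 4 exponent and 3 mantissa bits with bias 7 and reserves only one bit pattern per sign for NaN, while E5M2 uses 5 exponent and 2 mantissa bits with bias 15 and reserves the top exponent code for infinity and NaN, as IEEE 754 does. The `Fp8Format` class is a hypothetical helper, not part of any NVIDIA library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fp8Format:
    """Hypothetical description of an 8-bit float layout (1 sign bit implied)."""
    name: str
    exp_bits: int
    man_bits: int
    bias: int
    ieee_specials: bool  # True: top exponent code is reserved for inf/NaN

    def max_finite(self) -> float:
        """Largest finite magnitude the format can represent."""
        top_exp = (1 << self.exp_bits) - 1
        if self.ieee_specials:
            # Top exponent code is inf/NaN, so use the next one down
            # together with an all-ones mantissa.
            exp, man = top_exp - 1, (1 << self.man_bits) - 1
        else:
            # Only the all-ones mantissa at the top exponent is NaN,
            # so the largest finite value uses the mantissa just below it.
            exp, man = top_exp, (1 << self.man_bits) - 2
        return 2.0 ** (exp - self.bias) * (1 + man / (1 << self.man_bits))

    def min_subnormal(self) -> float:
        """Smallest positive (subnormal) magnitude."""
        return 2.0 ** (1 - self.bias - self.man_bits)


# Parameters assumed from the published FP8 format definitions.
E4M3 = Fp8Format("E4M3", exp_bits=4, man_bits=3, bias=7, ieee_specials=False)
E5M2 = Fp8Format("E5M2", exp_bits=5, man_bits=2, bias=15, ieee_specials=True)

if __name__ == "__main__":
    for fmt in (E4M3, E5M2):
        print(f"{fmt.name}: max finite {fmt.max_finite():g}, "
              f"min subnormal {fmt.min_subnormal():g}")
```

Under these assumptions E4M3 trades range for precision (maximum 448) and E5M2 trades precision for range (maximum 57,344), which is why the wider-range format is commonly associated with gradients and the more precise one with weights and activations.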
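H100 is the standard unit in generative-model training centres, used by OpenAI, Anthropic, Meta, Microsoft Azure, AWS, Google Cloud (as a GPU partner), and CoreWeave, among others. Many large 2023–2024 LLMs were trained on Hopper-class clusters of roughly 1,000 to 25,000 GPUs: Meta reports training Llama 3 on up to 16,000 H100s, and DeepSeek-V3 was trained on 2,048 H800s, an export-market Hopper variant. For other models of that period, such as GPT-4, Claude 3, and Mixtral, the training hardware has not been fully disclosed, and GPT-4's training predates volume H100 shipments. The direct successor within the Hopper generation is H200 (141 GB HBM3e); the next architecture is Blackwell (B100/B200/GB200, announced in 2024).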