RoCE encapsulates InfiniBand transport-layer packets (Base Transport Header plus payload) inside Ethernet frames (RoCE v1) or UDP/IP packets (RoCE v2). The Host Channel Adapter (HCA) implements the transport protocol in hardware: the application posts an RDMA READ, RDMA WRITE, or SEND work request, and the HCA moves data to or from remote memory without kernel involvement on the data path and without intermediate copies. Because RoCE performance is sensitive to packet loss, deployments typically use Priority Flow Control (PFC) to make the fabric lossless and ECN-based congestion signaling (Congestion Notification Packets, CNPs, in v2).
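The RoCE v2 encapsulation described above can be sketched in a few lines of Python. This is a minimal illustration, not a usable packet generator: it builds only the UDP header and the 12-byte Base Transport Header for a single SEND, leaves the invariant CRC as a zeroed placeholder, and omits the Ethernet and IP headers. The function names and default values are hypothetical, chosen for this example.

```python
import struct

ROCE_V2_UDP_DPORT = 4791    # UDP destination port that identifies RoCE v2
OPCODE_RC_SEND_ONLY = 0x04  # Reliable Connection "SEND Only" opcode


def pack_bth(opcode, dest_qp, psn, pad_count=0, pkey=0xFFFF, ack_req=False):
    """Pack a 12-byte Base Transport Header in network byte order."""
    if not 0 <= dest_qp < (1 << 24) or not 0 <= psn < (1 << 24):
        raise ValueError("dest_qp and psn are 24-bit fields")
    if not 0 <= pad_count <= 3:
        raise ValueError("pad_count is a 2-bit field")
    # Second byte: SE (1 bit), M (1 bit), PadCnt (2 bits), TVer (4 bits).
    flags = pad_count << 4
    # Reserved byte (left zero) followed by the 24-bit destination QP.
    qp_word = dest_qp
    # AckReq bit, 7 reserved bits, then the 24-bit packet sequence number.
    psn_word = (int(ack_req) << 31) | psn
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def pack_udp_header(src_port, udp_payload_len):
    """Pack a UDP header; checksum 0 means 'not computed' (allowed on IPv4)."""
    return struct.pack("!HHHH", src_port, ROCE_V2_UDP_DPORT,
                       8 + udp_payload_len, 0)


def roce_v2_udp_datagram(src_port, dest_qp, psn, payload):
    """Build UDP header + BTH + padded payload + placeholder ICRC."""
    pad = (-len(payload)) % 4            # payload is padded to 4 bytes
    bth = pack_bth(OPCODE_RC_SEND_ONLY, dest_qp, psn, pad_count=pad)
    icrc_placeholder = b"\x00" * 4       # real ICRC computation omitted
    body = bth + payload + b"\x00" * pad + icrc_placeholder
    return pack_udp_header(src_port, len(body)) + body


if __name__ == "__main__":
    datagram = roce_v2_udp_datagram(49152, dest_qp=0x12, psn=1, payload=b"abcde")
    print(len(datagram), datagram.hex())
```

In a real deployment none of this is done in software: the HCA generates these headers itself, which is what allows the data path to bypass the kernel.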
Conventional TCP/IP-over-Ethernet imposes high latency and CPU overhead on inter-node communication in HPC and AI-training clusters. RoCE addresses this by delivering RDMA (zero-copy, kernel-bypass data transfer) over standard Ethernet, without requiring a dedicated InfiniBand fabric.
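Priority Flow Control, required for losslessness, can trigger deadlocks in large fabrics when paused buffers form a cyclic dependency (the Ethernet analogue of an InfiniBand credit loop).
UDP gives RoCE v2 no reliability of its own; loss recovery falls to the InfiniBand transport, which commonly uses go-back-N retransmission, so a single drop forces every subsequent in-flight packet to be resent and throughput falls sharply (for example, under incast congestion).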
Configuring Data Center Bridging (PFC, ETS, DCBX) per switch is considerably more complex than configuring an InfiniBand fabric.
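The cost of go-back-N retransmission mentioned in the limitations above can be made concrete with a small model. This is a simplified sketch under stated assumptions: exactly one packet is dropped, the sender already has `in_flight` packets outstanding (counting the lost one) when it learns of the loss, and timeouts are ignored. The function and parameter names are hypothetical.

```python
def transmissions(total, in_flight, lost_index, go_back_n=True):
    """Count packets put on the wire to deliver `total` packets with one drop.

    total      -- number of packets in the message
    in_flight  -- packets outstanding (including the lost one) when the
                  sender learns about the loss
    lost_index -- zero-based index of the dropped packet
    go_back_n  -- True: resend the lost packet and everything sent after it;
                  False: resend only the lost packet (selective retransmit)
    """
    if not 0 <= lost_index < total:
        raise ValueError("lost_index must be within the message")
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    if go_back_n:
        # Everything from the lost packet onward that was already sent
        # is discarded by the receiver and must be sent again.
        resent = min(in_flight, total - lost_index)
    else:
        resent = 1
    return total + resent


if __name__ == "__main__":
    for mode in (True, False):
        sent = transmissions(total=1000, in_flight=256, lost_index=100,
                             go_back_n=mode)
        print("go-back-N" if mode else "selective", sent)
```

Under this model the go-back-N penalty grows with the amount of data in flight, which is why a high-bandwidth fabric with a large bandwidth-delay product is hurt most by even rare drops.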
The InfiniBand Trade Association ratifies RoCE v1 as Annex A16 to IBA specification 1.2.1.
RoCE v2 introduces UDP/IP encapsulation (UDP destination port 4791), enabling routable RDMA across IP networks and ECN/CNP-based congestion control.
RoCE v2 support reaches the mainline Linux kernel (Mellanox OFED supports it from release 2.3 onward), enabling broad data-center deployment.
NVIDIA's acquisition of Mellanox makes RoCE a strategic component of NVIDIA's AI platform (Spectrum, ConnectX, BlueField).
NVIDIA launches Spectrum-X, an Ethernet platform optimized for RoCE in AI clusters; the Ultra Ethernet Consortium (founding members include AMD, Broadcom, Cisco, Meta, and Microsoft) forms to design a RoCE successor.
RoCE is a widely deployed Ethernet-based scale-out fabric for GPU clusters (NVIDIA ConnectX/BlueField, Spectrum-X) used in LLM training.
RoCE requires an RDMA-capable NIC (HCA) but is agnostic to the CPU/GPU/accelerator above it.