Infrastructure

RoCE

2010 · Active · Published: 8 May 2026 · Updated: 8 May 2026

Key innovation

Brings RDMA (zero-copy, kernel-bypass) directly over standard Ethernet networks, eliminating the need for dedicated InfiniBand fabric.

How it works

RoCE encapsulates InfiniBand transport-layer packets (Base Transport Header plus payload) either directly inside Ethernet frames (RoCE v1) or inside UDP/IP packets (RoCE v2). The Host Channel Adapter (HCA), in practice an RDMA-capable NIC, implements the transport in hardware: the application posts a READ, WRITE, or SEND work request through the verbs interface, and the HCA reads or writes remote memory without kernel involvement or intermediate data copies. Because RoCE performance degrades sharply under packet loss, deployments typically use Priority Flow Control (PFC) to keep the fabric lossless and ECN-based congestion signaling (Congestion Notification Packets, CNPs, in v2).
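As a rough illustration of the encapsulation described above, the following Python sketch packs and parses the 12-byte Base Transport Header that a RoCE v2 packet carries as UDP payload. The field layout and opcode values are written from memory of the InfiniBand transport specification and should be checked against the IBTA documents. The names pack_bth and unpack_bth are hypothetical, and the ICRC trailer and the Ethernet, IP, and UDP headers are left out.

```python
import struct

ROCE_V2_UDP_PORT = 4791  # UDP destination port used by RoCE v2

# A few Reliable Connection opcodes (assumed values, verify against the spec).
OPCODES = {
    "SEND_ONLY": 0x04,
    "RDMA_WRITE_ONLY": 0x0A,
    "RDMA_READ_REQUEST": 0x0C,
}


def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Pack a 12-byte BTH. SE, M, PadCnt and TVer are left at zero."""
    if not 0 <= dest_qp < (1 << 24):
        raise ValueError("destination QP number must fit in 24 bits")
    if not 0 <= psn < (1 << 24):
        raise ValueError("packet sequence number must fit in 24 bits")
    flags = 0                                # SE(1) | M(1) | PadCnt(2) | TVer(4)
    word1 = dest_qp                          # top 8 bits reserved
    word2 = (int(ack_req) << 31) | psn       # AckReq(1) | reserved(7) | PSN(24)
    return struct.pack("!BBHII", opcode, flags, pkey, word1, word2)


def unpack_bth(data):
    """Parse the first 12 bytes of a RoCE v2 UDP payload as a BTH."""
    opcode, _flags, pkey, word1, word2 = struct.unpack("!BBHII", data[:12])
    return {
        "opcode": opcode,
        "pkey": pkey,
        "dest_qp": word1 & 0xFFFFFF,
        "ack_req": bool(word2 >> 31),
        "psn": word2 & 0xFFFFFF,
    }


if __name__ == "__main__":
    header = pack_bth(OPCODES["RDMA_WRITE_ONLY"], dest_qp=0x12, psn=100, ack_req=True)
    print(len(header), unpack_bth(header))
```

In a real deployment the HCA builds and parses these headers in hardware; an application only posts work requests and never constructs the header itself.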

Problem solved

Conventional TCP/IP-over-Ethernet imposes high latency and CPU overhead on inter-node communication in HPC and AI-training clusters. RoCE solves this by delivering RDMA (zero-copy, kernel-bypass) without requiring a dedicated InfiniBand fabric.

Implementation

Reference implementations

Linux RDMA Subsystem (rdma-core)

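C · Userspace libraries and tools (libibverbs, librdmacm) for the Linux kernel RDMA subsystem; software RoCE is provided by the kernel's rxe driver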

Official

Implementation pitfalls

PFC-induced deadlocks · High

Priority Flow Control, required for losslessness, can trigger credit-loop deadlocks in large fabrics.

Fix: Use DCQCN, SRv6 path routing, or adaptive routing; constrain PFC domains.
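A PFC deadlock of this kind arises when the pause dependencies between switch buffers form a cycle, so that every buffer in the loop waits on the next one to drain. As a minimal sketch, assuming the dependencies have already been extracted into a plain dictionary (the function name has_pause_cycle and the switch names are hypothetical), a depth-first search can flag such a loop:

```python
def has_pause_cycle(depends_on):
    """Return True if the buffer-dependency graph contains a cycle.

    depends_on maps a buffer (or switch) name to the names it must wait on
    before it can drain, i.e. the downstream neighbours that may pause it.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in depends_on.get(node, ()):
            nxt_state = state.get(nxt, UNVISITED)
            if nxt_state == IN_PROGRESS:
                return True          # back edge: a loop of paused buffers
            if nxt_state == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(
        state.get(node, UNVISITED) == UNVISITED and visit(node)
        for node in depends_on
    )


if __name__ == "__main__":
    ring = {"sw1": ["sw2"], "sw2": ["sw3"], "sw3": ["sw1"]}
    tree = {"leaf1": ["spine"], "leaf2": ["spine"], "spine": []}
    print(has_pause_cycle(ring), has_pause_cycle(tree))
```

Constraining PFC domains or routing so that no such cycle can form is what the fixes above aim at; the sketch only checks a given dependency graph and says nothing about how a fabric derives one.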

Packet-loss sensitivity · Critical

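RoCE v2 runs over UDP, which provides no reliability of its own; loss recovery is left to the InfiniBand transport, which typically uses go-back-N. A single drop therefore forces retransmission of every packet sent after it, and throughput falls sharply, especially under incast congestion.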

Fix: Lossless ECN/PFC tuning, selective repeat (Reliable RoCE), or Multipath Reliable Connection (MRC).
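The difference between go-back-N and selective repeat after a single drop can be illustrated with a toy count of transmissions. This is a deliberately simplified model, not a description of any NIC's behavior: it assumes exactly one packet is lost once, that the sender has a fixed number of packets in flight when the loss is detected, and that retransmissions themselves are never lost. The function name transmissions and its parameters are hypothetical.

```python
def transmissions(total, dropped, inflight, scheme):
    """Count packets put on the wire to deliver `total` packets when the
    packet at index `dropped` is lost once.

    inflight is the number of packets (starting with the lost one) that the
    sender has already transmitted by the time the loss is detected.
    """
    if not 0 <= dropped < total:
        raise ValueError("dropped must be a valid packet index")
    if inflight < 1:
        raise ValueError("inflight must be at least 1")
    if scheme == "selective_repeat":
        return total + 1                      # resend only the lost packet
    if scheme == "go_back_n":
        sent_before_detect = min(total, dropped + inflight)
        # rewind to the lost packet and resend everything sent after it
        return total + (sent_before_detect - dropped)
    raise ValueError("unknown scheme: " + scheme)


if __name__ == "__main__":
    for scheme in ("go_back_n", "selective_repeat"):
        print(scheme, transmissions(1000, dropped=10, inflight=256, scheme=scheme))
```

With 256 packets in flight, the single drop costs 256 extra transmissions under go-back-N and one under selective repeat in this model, which is why the penalty grows with the bandwidth-delay product of the link.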

DCB configuration complexity · Medium

Configuring Data Center Bridging (PFC, ETS, DCBX) per switch is considerably more complex than configuring an InfiniBand fabric.

Evolution

2010

RoCE v1 specification published (IBTA Annex A16)

Inflection point

The InfiniBand Trade Association ratifies RoCE v1 as Annex A16 to IBA specification 1.2.1.

2014

RoCE v2 specification published (IBTA Annex A17)

Inflection point

RoCE v2 introduces UDP/IP encapsulation (port 4791), enabling routable RDMA across IP networks and ECN/CNP-based congestion control.

2016

RoCE v2 lands in Linux Kernel 4.5

The mainline Linux kernel adds RoCE v2 support, previously available through Mellanox OFED (2.3 and later), enabling broad data-center deployment.

2020

NVIDIA acquires Mellanox

The acquisition makes RoCE a strategic component of NVIDIA's AI platform (Spectrum, ConnectX, BlueField).

2023

Spectrum-X and Ultra Ethernet Consortium

Inflection point

NVIDIA launches Spectrum-X — an Ethernet platform optimized for RoCE in AI clusters; the Ultra Ethernet Consortium (AMD, Broadcom, Cisco, Meta, Microsoft) forms to design a RoCE successor.

Sources

RDMA over Converged Ethernet — Wikipedia

Article

Wikipedia

InfiniBand Architecture Specification Release 1.2.1 Annex A16: RoCE

Documentation

InfiniBand Trade Association

InfiniBand Architecture Specification Release 1.2.1 Annex A17: RoCEv2

Documentation

InfiniBand Trade Association

Revisiting Network Support for RDMA

Paper

arXiv