
Segment Routing over IPv6

Source routing native to IPv6: the ingress node encodes a list of processing instructions (segments) as IPv6 addresses inside a dedicated IPv6 extension header, the Segment Routing Header (SRH), removing the need for a separate MPLS control plane.

Typical applications:
- Traffic engineering in hyperscaler fabrics (Microsoft, Meta, Alibaba)
- Multipath for RoCE/RDMA in AI clusters
- Network slicing in 5G networks
- L3VPN / EVPN without MPLS
- Service chaining (Network Function Virtualization)

The ingress node (SR source) inserts an SRH containing a list of IPv6 SIDs into the packet. The "Segments Left" field points to the current segment to process; its address is the active IPv6 destination. When the destination address matches one of its local SIDs, an SR-aware node decrements Segments Left, copies the next SID into the destination address, and executes the function encoded in the SID (End, End.X, End.DT4/End.DT6 for VPN, End.B6 for binding, etc.). Non-SRv6-aware routers simply forward the packet via standard IPv6 longest-prefix match, because a SID is just an IPv6 address.
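
A minimal Python sketch of this SID-processing loop (the End behavior of RFC 8754 / RFC 8986); the packet and SRH classes here are hypothetical stand-ins for illustration, not a real library API:

from dataclasses import dataclass

@dataclass
class SRH:
    segment_list: list[str]   # SIDs stored in reverse order, as RFC 8754 encodes them
    segments_left: int        # index of the active segment

@dataclass
class IPv6Packet:
    dst: str                  # active IPv6 destination address (the current SID)
    srh: SRH

def process_end(pkt: IPv6Packet, local_sids: set[str]) -> str:
    """Apply the End behavior; return the address for the next FIB lookup."""
    if pkt.dst not in local_sids:
        return pkt.dst                    # not our SID: plain longest-prefix forwarding
    if pkt.srh.segments_left == 0:
        raise ValueError("final segment reached: hand off to upper-layer processing")
    pkt.srh.segments_left -= 1                              # advance to the next segment
    pkt.dst = pkt.srh.segment_list[pkt.srh.segments_left]   # make it the new destination
    return pkt.dst                        # forward via the ordinary IPv6 FIB

Fed an SRH with segment_list=["fc00:0:4::", "fc00:0:3::", "fc00:0:2::"] and segments_left=2, a call at each SR-aware hop (with that hop's local SIDs) walks the destination address from fc00:0:2:: to fc00:0:3:: to fc00:0:4::, matching the reverse-order encoding the RFC specifies.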

Traditional MPLS traffic engineering requires a complex control plane (LDP for label distribution, RSVP-TE for path signaling) and per-flow state in the network core. SRv6 removes both by placing the path information inside the packet header and using plain IPv6 as the data plane.
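
To illustrate "path in the packet header", the following Scapy snippet builds an SRv6 encapsulation: an outer IPv6 header plus SRH steering an inner packet through three segments. The Scapy classes are real; the SID and address values are made-up examples, and nh=41 is set explicitly for clarity (IPv6-in-IPv6):

from scapy.layers.inet import UDP
from scapy.layers.inet6 import IPv6, IPv6ExtHdrSegmentRouting

# Path in forward order: fc00:0:2:: -> fc00:0:3:: -> fc00:0:4::.
# The SRH stores the list reversed; Segments Left starts at 2,
# so the outer destination is the first segment of the path.
pkt = (
    IPv6(src="2001:db8::1", dst="fc00:0:2::")
    / IPv6ExtHdrSegmentRouting(
        addresses=["fc00:0:4::", "fc00:0:3::", "fc00:0:2::"],
        segleft=2,
        nh=41,                                         # next header: encapsulated IPv6
    )
    / IPv6(src="2001:db8:a::1", dst="2001:db8:b::1")   # the encapsulated customer packet
    / UDP(dport=4791)
)
pkt.show()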

Common pitfalls

SRH header size
MEDIUM

Each SID is 128 bits; long segment lists inflate per-packet overhead and can push packets past the path MTU, forcing fragmentation at the source.

Use Compressed SRv6 (uSID), which packs multiple short micro-segments into a single 128-bit address.
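
A back-of-the-envelope calculation makes the savings concrete. The numbers below assume encapsulation mode (a fresh outer IPv6 header) and the common F3216 uSID flavor (32-bit block plus up to six 16-bit micro-SIDs per 128-bit container):

OUTER_IPV6 = 40   # bytes: new outer IPv6 header added by the ingress
SRH_FIXED  = 8    # bytes: fixed part of the SRH
SID_BYTES  = 16   # bytes per 128-bit SID in the segment list

def srv6_overhead(sids_in_list: int) -> int:
    return OUTER_IPV6 + SRH_FIXED + sids_in_list * SID_BYTES

print(srv6_overhead(6))   # classic SRv6, 6 SIDs: 144 bytes per packet
print(srv6_overhead(1))   # uSID: the same 6 hops in one container SID: 64 bytes
# If the whole list fits in a single SID, the SRH can even be omitted
# ("reduced" encapsulation), leaving only the 40-byte outer header.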

Hardware / linecard support
HIGH

Not all linecards process full SRv6 Network Programming in hardware; older ASICs may handle the SRH only in the slow path.

Source-routing security
HIGH

Source routing has been an attack vector historically (RFC 5095 deprecates IPv6 RH0); SRv6 requires careful edge policy to drop untrusted SRHs from outside the SR domain.
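
A minimal sketch of such an edge policy, assuming a Scapy-based software filter (production filters live in hardware ACLs; the external flag and the internal SID prefix below are hypothetical):

from ipaddress import ip_address, ip_network
from scapy.layers.inet6 import IPv6, IPv6ExtHdrSegmentRouting

SR_DOMAIN = ip_network("fc00::/16")        # hypothetical internal SID space

def permit_at_edge(pkt, external: bool) -> bool:
    """Return True if the packet may enter; drop untrusted SRHs at the border."""
    if IPv6ExtHdrSegmentRouting not in pkt:
        return True                        # no SRH: ordinary IPv6 traffic
    if external:
        return False                       # SRH arriving from outside the domain: drop
    return ip_address(pkt[IPv6].src) in SR_DOMAIN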

GENESIS · Source paper

IPv6 Segment Routing Header (SRH), RFC 8754
2020 · IETF · Clarence Filsfils, Darren Dukes, Stefano Previdi et al.
2013

First Segment Routing drafts in IETF

Cisco (Clarence Filsfils et al.) publishes the initial drafts introducing the Segment Routing concept in the SPRING working group.

2017

RFC 8402: Segment Routing Architecture

breakthrough

The IETF standardizes the general Segment Routing architecture (SR-MPLS and SRv6) in RFC 8402.

2020

RFC 8754: IPv6 Segment Routing Header

breakthrough

The SRH is standardized; the SRv6 data plane becomes an official IETF standard.

2021

RFC 8986: SRv6 Network Programming

breakthrough

SID behaviors (End, End.X, End.DT4, End.DT6, etc.) are defined; this is the formal specification of SRv6 Network Programming.

2024

SRv6 in AI scale-out fabrics

Hyperscalers (Microsoft Azure, Alibaba HPN) report SRv6 deployments for routing RoCE traffic in AI-training clusters.

Hardware agnostic · PRIMARY

SRv6 is an IPv6 layer-3 protocol; it runs on any IPv6 forwarder, although full Network Programming requires SR-aware hardware.

Commonly used with

RoCE

RDMA over Converged Ethernet (RoCE) is a family of network protocols standardized by the InfiniBand Trade Association (IBTA) that bring RDMA semantics (remote memory access bypassing the host CPU networking stack) onto Ethernet. Three variants exist: RoCE v1 operates as an Ethernet link-layer protocol (Ethertype 0x8915) confined to a single broadcast domain; the experimental RoCE v1.5 runs over IP; RoCE v2 encapsulates packets inside UDP/IP (port 4791) and is routable across IPv4/IPv6 networks. To approach InfiniBand-class performance, RoCE typically requires a lossless Ethernet fabric configured with Priority Flow Control (PFC) and Data Center Bridging (DCB); RoCE v2 additionally defines an ECN-based congestion-control mechanism using CNP frames. RoCE is today the dominant interconnect for GPU clusters in large-scale AI training, with end-to-end latencies as low as 1.3 µs on modern host-channel adapters.
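
As a sketch of the RoCE v2 encapsulation described above, the snippet below packs the 12-byte InfiniBand Base Transport Header (BTH) with struct, following the IBTA field layout, and places it inside UDP port 4791; the opcode, QP, and PSN values are arbitrary examples:

import struct
from scapy.layers.inet import UDP
from scapy.layers.inet6 import IPv6
from scapy.packet import Raw

def bth(opcode: int, dest_qp: int, psn: int, pkey: int = 0xFFFF) -> bytes:
    """12-byte BTH: opcode | SE/M/PadCnt/TVer | P_Key | rsvd | DestQP | AckReq | PSN."""
    word0 = (opcode << 24) | (0x00 << 16) | pkey   # flags byte zeroed for simplicity
    word1 = dest_qp & 0xFFFFFF                     # reserved(8) + DestQP(24)
    word2 = psn & 0xFFFFFF                         # AckReq/reserved(8) + PSN(24)
    return struct.pack("!III", word0, word1, word2)

# RoCE v2 = routable UDP/IP encapsulation; 0x04 is the RC SEND-only opcode.
roce_v2 = IPv6(dst="2001:db8::2") / UDP(dport=4791) / Raw(bth(0x04, dest_qp=0x12, psn=0))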

MRC

Multipath Reliable Connection (MRC) is a network protocol designed for training frontier AI models on supercomputer clusters with more than 100,000 GPUs. It extends the RDMA over Converged Ethernet (RoCE) standard from the InfiniBand Trade Association and builds on techniques from the Ultra Ethernet Consortium (UEC), adding SRv6 source routing on top. MRC has been deployed across all of OpenAI's largest NVIDIA GB200 supercomputers, including the Stargate site operated with Oracle Cloud Infrastructure in Abilene, Texas, and in Microsoft Fairwater supercomputers. The specification was published on May 5, 2026 as an Open Compute Project (OCP) contribution and is publicly available. MRC addresses three problems of large-scale synchronous training: it enables two-tier multi-plane networks connecting 131,000 GPUs instead of conventional three- or four-tier designs, virtually eliminates core network congestion via adaptive packet spraying, and routes around failures on a microsecond timescale using static source routing instead of dynamic BGP.
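
The published OCP specification is the authority on MRC's actual algorithms; purely as an illustration of the packet-spraying idea, here is a toy Python sketch that rotates each packet over precomputed SRv6 segment lists and skips paths marked failed, with no routing-protocol convergence in the loop (path IDs and SIDs are invented):

from itertools import count

PATHS = {                                   # path id -> SRv6 segment list (invented SIDs)
    0: ["fc00:0:11::", "fc00:0:21::"],
    1: ["fc00:0:12::", "fc00:0:22::"],
    2: ["fc00:0:13::", "fc00:0:23::"],
}
healthy = {0, 1, 2}                         # flipped by fast, local failure detection

_next = count()

def pick_segment_list() -> list[str]:
    """Spray: one healthy path per packet, round-robin; failed paths are skipped."""
    for _ in range(len(PATHS)):
        pid = next(_next) % len(PATHS)
        if pid in healthy:
            return PATHS[pid]
    raise RuntimeError("no healthy path left")

healthy.discard(1)                          # a link dies: later packets avoid path 1
print(pick_segment_list(), pick_segment_list())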

IB

InfiniBand (IB) is a networking standard maintained by the InfiniBand Trade Association (IBTA, founded 1999), in which hosts connect to the fabric via Host Channel Adapters (HCAs) and peripherals via Target Channel Adapters (TCAs). Its switched-fabric topology, credit-based link-level flow control, and native RDMA deliver microsecond latencies (1.3 µs at QDR, <0.6 µs at HDR) at full line rate without packet loss. Successive bandwidth generations are: SDR (8 Gbit/s 4×, 2001), DDR (16, 2005), QDR (32, 2007), FDR (54.54, 2011), EDR (100, 2014), HDR (200, 2018), NDR (400, 2022), and XDR (800, 2024). InfiniBand supports five message types: RDMA read/write, channel send/receive, transactional operations, multicast, and atomics. The Linux kernel has supported IB since 2.6.11 (2005) via the OpenFabrics Enterprise Distribution (OFED) and its verbs API. After 2014, IB briefly led the TOP500 interconnect ranking, but Ethernet/RoCE later reclaimed market share. In 2019 NVIDIA acquired Mellanox, the last independent IB vendor, and today IB is the primary scale-out fabric of NVIDIA's AI platforms (Quantum-2, Quantum-X800), used for LLM training in conjunction with NVLink/NVSwitch.
