
Segment Routing over IPv6

Source routing native to IPv6: the ingress node encodes a list of processing instructions (segments) as IPv6 addresses inside a dedicated IPv6 extension header, the Segment Routing Header (SRH), removing the need for a separate MPLS control plane.

Typical applications:
- Traffic engineering in hyperscaler fabrics (Microsoft, Meta, Alibaba)
- Multipath for RoCE/RDMA in AI clusters
- Network slicing in 5G networks
- L3VPN / EVPN without MPLS
- Service chaining (Network Function Virtualization)

The ingress node (SR source) inserts an SRH containing a list of IPv6 SIDs into the packet. The "Segments Left" field points to the current segment to process; its address is the active IPv6 destination. When the destination address matches one of its local SIDs, an SR-aware node decrements Segments Left, copies the next SID into the destination address, and executes the function encoded in the SID (End, End.X, End.DT4/End.DT6 for VPN, End.B6 for binding, etc.). Non-SRv6-aware routers simply forward the packet via standard IPv6 longest-prefix match, because a SID is just an IPv6 address.
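
A minimal Python sketch of this SID-processing loop (the End behavior of RFC 8754 / RFC 8986); the packet and SRH classes here are hypothetical stand-ins for illustration, not a real library API:

from dataclasses import dataclass

@dataclass
class SRH:
    segment_list: list[str]   # SIDs stored in reverse order, as RFC 8754 encodes them
    segments_left: int        # index of the active segment

@dataclass
class IPv6Packet:
    dst: str                  # active IPv6 destination address (the current SID)
    srh: SRH

def process_end(pkt: IPv6Packet, local_sids: set[str]) -> str:
    """Apply the End behavior; return the address for the next FIB lookup."""
    if pkt.dst not in local_sids:
        return pkt.dst                    # not our SID: plain longest-prefix forwarding
    if pkt.srh.segments_left == 0:
        raise ValueError("final segment reached: hand off to upper-layer processing")
    pkt.srh.segments_left -= 1                              # advance to the next segment
    pkt.dst = pkt.srh.segment_list[pkt.srh.segments_left]   # make it the new destination
    return pkt.dst                        # forward via the ordinary IPv6 FIB

Fed an SRH with segment_list=["fc00:0:4::", "fc00:0:3::", "fc00:0:2::"] and segments_left=2, a call at each SR-aware hop (with that hop's local SIDs) walks the destination address from fc00:0:2:: to fc00:0:3:: to fc00:0:4::, matching the reverse-order encoding the RFC specifies.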

Traditional MPLS traffic engineering requires a complex control plane (LDP for label distribution, RSVP-TE for path signaling) and per-flow state in the network core. SRv6 removes both by placing the path information inside the packet header and using plain IPv6 as the data plane.
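
To illustrate "path in the packet header", the following Scapy snippet builds an SRv6 encapsulation: an outer IPv6 header plus SRH steering an inner packet through three segments. The Scapy classes are real; the SID and address values are made-up examples, and nh=41 is set explicitly for clarity (IPv6-in-IPv6):

from scapy.layers.inet import UDP
from scapy.layers.inet6 import IPv6, IPv6ExtHdrSegmentRouting

# Path in forward order: fc00:0:2:: -> fc00:0:3:: -> fc00:0:4::.
# The SRH stores the list reversed; Segments Left starts at 2,
# so the outer destination is the first segment of the path.
pkt = (
    IPv6(src="2001:db8::1", dst="fc00:0:2::")
    / IPv6ExtHdrSegmentRouting(
        addresses=["fc00:0:4::", "fc00:0:3::", "fc00:0:2::"],
        segleft=2,
        nh=41,                                         # next header: encapsulated IPv6
    )
    / IPv6(src="2001:db8:a::1", dst="2001:db8:b::1")   # the encapsulated customer packet
    / UDP(dport=4791)
)
pkt.show()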

Common pitfalls

SRH header size
MEDIUM

Each SID is 128 bits; long segment lists inflate per-packet overhead and can push packets past the path MTU, forcing fragmentation at the source.

Use Compressed SRv6 (uSID), which packs multiple short micro-segments into a single 128-bit address.
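
A back-of-the-envelope calculation makes the savings concrete. The numbers below assume encapsulation mode (a fresh outer IPv6 header) and the common F3216 uSID flavor (32-bit block plus up to six 16-bit micro-SIDs per 128-bit container):

OUTER_IPV6 = 40   # bytes: new outer IPv6 header added by the ingress
SRH_FIXED  = 8    # bytes: fixed part of the SRH
SID_BYTES  = 16   # bytes per 128-bit SID in the segment list

def srv6_overhead(sids_in_list: int) -> int:
    return OUTER_IPV6 + SRH_FIXED + sids_in_list * SID_BYTES

print(srv6_overhead(6))   # classic SRv6, 6 SIDs: 144 bytes per packet
print(srv6_overhead(1))   # uSID: the same 6 hops in one container SID: 64 bytes
# If the whole list fits in a single SID, the SRH can even be omitted
# ("reduced" encapsulation), leaving only the 40-byte outer header.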

Hardware / linecard support
HIGH

Not all linecards process full SRv6 Network Programming in hardware; older ASICs may handle the SRH only in the slow path.

Source-routing security
HIGH

Source routing has been an attack vector historically (RFC 5095 deprecates IPv6 RH0); SRv6 requires careful edge policy to drop untrusted SRHs from outside the SR domain.
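
A minimal sketch of such an edge policy, assuming a Scapy-based software filter (production filters live in hardware ACLs; the external flag and the internal SID prefix below are hypothetical):

from ipaddress import ip_address, ip_network
from scapy.layers.inet6 import IPv6, IPv6ExtHdrSegmentRouting

SR_DOMAIN = ip_network("fc00::/16")        # hypothetical internal SID space

def permit_at_edge(pkt, external: bool) -> bool:
    """Return True if the packet may enter; drop untrusted SRHs at the border."""
    if IPv6ExtHdrSegmentRouting not in pkt:
        return True                        # no SRH: ordinary IPv6 traffic
    if external:
        return False                       # SRH arriving from outside the domain: drop
    return ip_address(pkt[IPv6].src) in SR_DOMAIN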

GENESIS · Source paper

IPv6 Segment Routing Header (SRH), RFC 8754
2020 · IETF · Clarence Filsfils, Darren Dukes, Stefano Previdi et al.
2013

First Segment Routing drafts in IETF

Cisco (Clarence Filsfils et al.) publishes the initial drafts introducing the Segment Routing concept in the SPRING working group.

2017

RFC 8402: Segment Routing Architecture

breakthrough

The IETF standardizes the general Segment Routing architecture (SR-MPLS and SRv6) in RFC 8402.

2020

RFC 8754: IPv6 Segment Routing Header

breakthrough

The SRH is standardized; the SRv6 data plane becomes an official IETF standard.

2021

RFC 8986: SRv6 Network Programming

breakthrough

SID behaviors (End, End.X, End.DT4, End.DT6, etc.) are defined; this is the formal specification of SRv6 Network Programming.

2024

SRv6 in AI scale-out fabrics

Hyperscalers (Microsoft Azure, Alibaba HPN) report SRv6 deployments for routing RoCE traffic in AI-training clusters.

Hardware agnostic · PRIMARY

SRv6 is an IPv6 layer-3 protocol; it runs on any IPv6 forwarder, although full Network Programming requires SR-aware hardware.

Commonly used with

RoCE

RDMA over Converged Ethernet (RoCE) is a family of network protocols standardized by the InfiniBand Trade Association (IBTA) that bring RDMA semantics (remote memory access bypassing the host CPU networking stack) onto Ethernet. Three variants exist: RoCE v1 operates as an Ethernet link-layer protocol (Ethertype 0x8915) confined to a single broadcast domain; the experimental RoCE v1.5 runs over IP; RoCE v2 encapsulates packets inside UDP/IP (port 4791) and is routable across IPv4/IPv6 networks. To approach InfiniBand-class performance, RoCE typically requires a lossless Ethernet fabric configured with Priority Flow Control (PFC) and Data Center Bridging (DCB); RoCE v2 additionally defines an ECN-based congestion-control mechanism using CNP frames. RoCE is today the dominant interconnect for GPU clusters in large-scale AI training, with end-to-end latencies as low as 1.3 µs on modern host-channel adapters.
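
As a sketch of the RoCE v2 encapsulation described above, the snippet below packs the 12-byte InfiniBand Base Transport Header (BTH) with struct, following the IBTA field layout, and places it inside UDP port 4791; the opcode, QP, and PSN values are arbitrary examples:

import struct
from scapy.layers.inet import UDP
from scapy.layers.inet6 import IPv6
from scapy.packet import Raw

def bth(opcode: int, dest_qp: int, psn: int, pkey: int = 0xFFFF) -> bytes:
    """12-byte BTH: opcode | SE/M/PadCnt/TVer | P_Key | rsvd | DestQP | AckReq | PSN."""
    word0 = (opcode << 24) | (0x00 << 16) | pkey   # flags byte zeroed for simplicity
    word1 = dest_qp & 0xFFFFFF                     # reserved(8) + DestQP(24)
    word2 = psn & 0xFFFFFF                         # AckReq/reserved(8) + PSN(24)
    return struct.pack("!III", word0, word1, word2)

# RoCE v2 = routable UDP/IP encapsulation; 0x04 is the RC SEND-only opcode.
roce_v2 = IPv6(dst="2001:db8::2") / UDP(dport=4791) / Raw(bth(0x04, dest_qp=0x12, psn=0))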

MRC

Multipath Reliable Connection (MRC) is a network protocol designed for training frontier AI models on supercomputer clusters with more than 100,000 GPUs. It extends the RDMA over Converged Ethernet (RoCE) standard from the InfiniBand Trade Association and builds on techniques from the Ultra Ethernet Consortium (UEC), adding SRv6 source routing on top. MRC has been deployed across all of OpenAI's largest NVIDIA GB200 supercomputers, including the Stargate site operated with Oracle Cloud Infrastructure in Abilene, Texas, and in Microsoft Fairwater supercomputers. The specification was published on May 5, 2026 as an Open Compute Project (OCP) contribution and is publicly available. MRC addresses three problems of large-scale synchronous training: it enables two-tier multi-plane networks connecting 131,000 GPUs instead of conventional three- or four-tier designs, virtually eliminates core network congestion via adaptive packet spraying, and routes around failures on a microsecond timescale using static source routing instead of dynamic BGP.
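
The published OCP specification is the authority on MRC's actual algorithms; purely as an illustration of the packet-spraying idea, here is a toy Python sketch that rotates each packet over precomputed SRv6 segment lists and skips paths marked failed, with no routing-protocol convergence in the loop (path IDs and SIDs are invented):

from itertools import count

PATHS = {                                   # path id -> SRv6 segment list (invented SIDs)
    0: ["fc00:0:11::", "fc00:0:21::"],
    1: ["fc00:0:12::", "fc00:0:22::"],
    2: ["fc00:0:13::", "fc00:0:23::"],
}
healthy = {0, 1, 2}                         # flipped by fast, local failure detection

_next = count()

def pick_segment_list() -> list[str]:
    """Spray: one healthy path per packet, round-robin; failed paths are skipped."""
    for _ in range(len(PATHS)):
        pid = next(_next) % len(PATHS)
        if pid in healthy:
            return PATHS[pid]
    raise RuntimeError("no healthy path left")

healthy.discard(1)                          # a link dies: later packets avoid path 1
print(pick_segment_list(), pick_segment_list())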

IB

InfiniBand (IB) is a networking standard maintained by the InfiniBand Trade Association (IBTA, founded 1999), in which hosts connect to the fabric via Host Channel Adapters (HCAs) and peripherals via Target Channel Adapters (TCAs). Its switched-fabric topology, credit-based link-level flow control, and native RDMA deliver microsecond latencies (1.3 µs at QDR, <0.6 µs at HDR) at full line rate without packet loss. Successive bandwidth generations are: SDR (8 Gbit/s 4×, 2001), DDR (16, 2005), QDR (32, 2007), FDR (54.54, 2011), EDR (100, 2014), HDR (200, 2018), NDR (400, 2022), and XDR (800, 2024). InfiniBand supports five message types: RDMA read/write, channel send/receive, transactional operations, multicast, and atomics. The Linux kernel has supported IB since 2.6.11 (2005) via the OpenFabrics Enterprise Distribution (OFED) and its verbs API. After 2014, IB briefly led the TOP500 interconnect ranking, but Ethernet/RoCE later reclaimed market share. In 2019 NVIDIA acquired Mellanox, the last independent IB vendor, and today IB is the primary scale-out fabric of NVIDIA's AI platforms (Quantum-2, Quantum-X800), used for LLM training in conjunction with NVLink/NVSwitch.
