Why OCI Multiplanar Network Architecture Changes Everything for AI Workloads

 

Copyright: Sanjay Basu

Everyone got it backwards.

For years, the industry believed bigger switches and fatter pipes would solve the networking problem at scale. More bandwidth. Higher port counts. The three-tier architecture that worked for enterprise applications would somehow stretch to accommodate AI workloads requiring tens of thousands of tightly coupled GPUs. It did not. And Oracle, arriving late to the hyperscale cloud game, figured out why.

The answer was not a bigger network. It was multiple networks.

This is not marketing hyperbole. I have spent enough time examining OCI's architectural decisions to recognize when something genuinely breaks from convention. The multiplanar network architecture embedded within Oracle Acceleron represents exactly that kind of break. It challenges assumptions that have governed datacenter networking for two decades. And for organizations serious about running AI workloads at scale, understanding why this matters is not optional.

The Problem Nobody Wanted to Admit

Traditional datacenter networks follow a hierarchical three-tier model: access, distribution, and core. This design served enterprise workloads brilliantly for decades. Clear separation of concerns. Straightforward scalability within reasonable cluster sizes. Predictable failure domains. But AI model training does not behave like traditional enterprise workloads.

Consider what happens during distributed training of a large language model. Gradient synchronization forces all-reduce operations across thousands of GPUs. Every training iteration exchanges hundreds of gigabytes of data. For a 175 billion parameter model using FP16 precision, the gradients alone come to 350 gigabytes per iteration, and a ring all-reduce pushes roughly twice that volume through every GPU's NIC. Multiply by hundreds of iterations per hour and by thousands of GPUs, and the traffic crossing the fabric reaches petabytes per hour.
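
A quick back-of-envelope script makes that arithmetic concrete. The cluster size and iteration rate are assumptions chosen for illustration, and the factor of two comes from the standard ring all-reduce approximation:

```python
# Back-of-envelope: fabric traffic generated by gradient all-reduce.
# Cluster size and iteration rate are illustrative assumptions, not OCI figures.
params = 175e9            # 175-billion-parameter model
bytes_per_param = 2       # FP16 gradients
gpus = 1024               # assumed cluster size
iters_per_hour = 200      # assumed training throughput

payload = params * bytes_per_param            # ~350 GB of gradients per iteration
per_gpu = 2 * (gpus - 1) / gpus * payload     # ring all-reduce: each NIC moves ~2x the payload
aggregate_per_hour = per_gpu * gpus * iters_per_hour

print(f"gradient payload per iteration: {payload / 1e9:.0f} GB")
print(f"traffic per GPU per iteration:  {per_gpu / 1e9:.0f} GB")
print(f"fabric-wide traffic per hour:   {aggregate_per_hour / 1e15:.0f} PB")
```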

The core layer in a three-tier design becomes the bottleneck. Each hop through a core switch adds 100 to 300 nanoseconds of latency. This seems negligible until you realize that billions of collective communication operations occur during training. Small latency variations translate into GPU idle time. GPUs sitting idle while waiting for network operations represent millions of dollars in wasted compute.

Power consumption compounds the problem. Core switches require high port counts, deep buffers, and sophisticated routing logic. Kilowatts per chassis. When deploying 65,536 H200 GPUs or 131,072 B200 GPUs, the aggregate power overhead of the core tier alone can exceed megawatts. Power that would be better allocated to actual compute.

Then there is reliability. A single misconfiguration in core routing protocols can propagate across the entire fabric. One BGP bug. One flapping optic. Widespread packet loss cascading through a training job that has been running for weeks. The checkpoint and restart cycle that follows wastes hours of multi-million-dollar GPU time.

The industry kept trying to solve this with incremental improvements. Faster switches. Better optics. More sophisticated congestion control. But the fundamental architecture remained unchanged. Oracle asked a different question: what if the architecture itself was wrong?

The Multiplanar Answer

The insight driving OCI's multiplanar design is deceptively simple. Instead of building one massive network fabric and hoping it scales, build multiple completely independent network fabrics operating in parallel. Four planes. Eight planes. Each with its own switches, its own BGP instances, its own failure domains. Coordinated at the edge by intelligent NICs that distribute traffic across all available planes.

Modern high-speed NICs make this possible. The latest 400Gbps and 800Gbps network adapters include integrated four-port Ethernet switches embedded within the NIC silicon itself. This is not virtualization. This is actual switching capability built into the network interface card. A single NIC can simultaneously connect to four entirely separate network planes through four distinct physical uplinks.

From the host operating system perspective, traffic destined for remote nodes gets presented to the NIC as usual. The NIC's internal logic then disaggregates this traffic across the four available planes based on link availability, congestion state, and flow characteristics. On the receiving side, the NIC reassembles packets that arrived via different planes into coherent data streams before presenting them to the host.
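
The distribution logic itself lives in NIC silicon and Oracle has not published it, but a toy model conveys the behavior described above: spray traffic toward the least-loaded healthy plane and skip any plane that has failed. The four-plane count matches the description; the congestion signal and selection policy below are guesses for illustration only.

```python
# Illustrative only: a software model of per-plane traffic distribution.
# The real logic runs in NIC hardware; the policy below is a guess, not OCI's algorithm.
from dataclasses import dataclass, field

@dataclass
class Plane:
    plane_id: int
    healthy: bool = True
    queued_bytes: int = 0          # stand-in for the congestion signal

@dataclass
class MultiplanarNic:
    planes: list = field(default_factory=lambda: [Plane(i) for i in range(4)])

    def pick_plane(self) -> Plane:
        # Send on the least-congested healthy plane; skip failed planes entirely.
        candidates = [p for p in self.planes if p.healthy]
        if not candidates:
            raise RuntimeError("all planes down")
        return min(candidates, key=lambda p: p.queued_bytes)

    def send(self, nbytes: int) -> int:
        plane = self.pick_plane()
        plane.queued_bytes += nbytes
        return plane.plane_id      # the receiving NIC reassembles streams regardless of plane

nic = MultiplanarNic()
nic.planes[2].healthy = False                # e.g., a ToR failure in Plane 3
print([nic.send(4096) for _ in range(8)])    # traffic spreads across planes 0, 1, 3
```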

Think about what this means. Four parallel networks, each capable of carrying the full traffic load independently, yet coordinated at the edge. The planes share no infrastructure beyond the NIC itself. Separate switches. Separate routing protocols. Separate management domains. Separate failure modes.


The Math That Makes It Work

Here is where the architecture becomes genuinely clever. The multiplanar approach typically implements a two-tier rather than three-tier topology within each plane. Leaf switches connect directly to spine switches without an intervening core layer. This seems counterintuitive until you understand the mathematics of network radix.

The maximum number of endpoints a network can support scales with the square of the switch radix. If a switch provides radix R, a two-tier folded Clos topology can support approximately (R/2)² endpoints. Modern 64-port switches enable networks of roughly 1,024 endpoints; 128-port switches support approximately 4,096 endpoints.

By eliminating the core tier and spreading each host's uplinks across multiple parallel planes, OCI gets the connectivity of much higher-radix switches out of commodity silicon. The fair comparison is against a single monolithic fabric built from the same switches. Give each GPU four uplinks' worth of bandwidth in one two-tier fabric and it consumes four leaf ports per GPU, cutting the supportable count to roughly 256 GPUs with 64-port switches. Spread those same four uplinks across four independent planes and every GPU gets a full-rate port in each plane, so the same silicon serves roughly 1,024 GPUs at the same per-GPU bandwidth, with one fewer switching tier than a traditional three-tier design. Move to 128-port switches and the same arithmetic reaches about 4,096 GPUs; add radix, planes, or a tier within each plane, and the approach scales to the tens of thousands of GPUs that OCI superclusters actually deploy.
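
A few lines of arithmetic, using the (R/2)² approximation above, show the comparison. The monolithic-fabric baseline, with four leaf ports consumed per GPU, is my framing of the trade-off rather than an Oracle figure:

```python
# Endpoint math for a two-tier folded Clos, per the (R/2)^2 approximation above.
# The baseline (one monolithic fabric giving each GPU four ports) is an illustrative framing.
def plane_endpoints(radix: int) -> int:
    """Endpoints one two-tier plane can reach with switches of the given radix."""
    return (radix // 2) ** 2

for radix in (64, 128):
    per_plane = plane_endpoints(radix)
    monolithic = per_plane // 4        # one fabric, four leaf ports burned per GPU
    multiplanar = per_plane            # four planes, one full-rate port per GPU per plane
    print(f"radix {radix}: monolithic ~{monolithic} GPUs, "
          f"four-plane ~{multiplanar} GPUs at the same per-GPU bandwidth")
```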

The implications ripple through the entire infrastructure. Fewer switches to purchase. Fewer switches to power and cool. Fewer switches to maintain. Lower worst-case latency because every packet traverses at most two switch hops rather than three. The power and cost savings alone justify the architectural complexity.

Fault Isolation Changes Everything

But the defining advantage of multiplanar networks is not cost or latency. It is fault isolation.

In traditional unified network designs, a routing protocol bug or configuration error in core infrastructure can impact traffic across the entire cluster. A spanning-tree misconfiguration might black-hole 25% of inter-rack traffic. A BGP route leak could redirect packets into routing loops. A flapping optic might trigger network-wide reconvergence events.

Multiplanar architectures compartmentalize these failure modes. Each plane operates an entirely independent instance of BGP with separate autonomous system numbers, separate route reflectors, separate routing policies. If a BGP misconfiguration occurs in Plane 1, it affects only Plane 1's routing tables. Traffic flowing through Planes 2, 3, and 4 continues uninterrupted. If a top-of-rack switch fails in Plane 3, NICs simply stop using Plane 3's uplinks and redistribute traffic across the three remaining functional planes.

This changes the operational model entirely. Network operators can perform disruptive maintenance on one plane at a time while the remaining planes carry production traffic. A four-plane design sustains 75% of peak bandwidth while an entire plane undergoes maintenance. For AI training jobs running continuously for weeks, this capability to maintain infrastructure without checkpoint-restart cycles represents significant cost savings.
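
The degraded-mode arithmetic is simple, assuming planes of equal capacity; the eight-plane case is included only to show how the penalty shrinks as the plane count grows:

```python
# Surviving fabric bandwidth when planes are taken out of service, assuming equal-capacity planes.
def surviving_fraction(total_planes: int, planes_down: int) -> float:
    return (total_planes - planes_down) / total_planes

for planes in (4, 8):
    print(f"{planes} planes, one in maintenance: "
          f"{surviving_fraction(planes, 1):.0%} of peak bandwidth remains")
```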

The fault isolation extends to physical infrastructure. Each plane's switches, optics, and cabling form independent physical paths through the datacenter. A fiber cut affecting one conduit path disables only one plane. Electrical failures, cable jacket degradation, or contamination of optical connectors impact single planes rather than triggering widespread outages.

Oracle Acceleron: The Complete Picture

Multiplanar networking represents one component of a broader architectural rethinking. Oracle Acceleron bundles several innovations that work together to redefine how data moves within OCI. Understanding each component reveals the full scope of what Oracle has built.

The Converged NIC

Oracle's next-generation SmartNICs implement logical partitioning that separates customer plane and provider plane functionality within a single physical NIC. The customer plane handles application traffic: RDMA for GPU communication, TCP/IP for general networking, NVMe-over-TCP for storage access. The provider plane handles OCI control and management traffic: hypervisor communication, telemetry collection, security policy enforcement.

The key word here is immutable. The host NIC and Cloud Control Computer components are physically on the same silicon, but the chip's resources are hard-partitioned and statically assigned at boot time. A minimal, immutable firmware handles the static assignment. Once bootup is complete, no reallocation or modification is possible. The separation is enforced by hardware, not software.

This eliminates the performance penalties and cost overhead of dual-NIC designs while maintaining strong isolation between customer and infrastructure traffic. OCI claims up to 2x throughput improvement versus prior approaches. More importantly, it eliminates an entire class of security concerns that plague shared-NIC architectures.
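
Oracle has not published the firmware interface, but the operational property described above is easy to express in code: the partition is assigned once at boot and any later attempt to change it fails. The resource names and sizes below are invented purely for illustration.

```python
# Illustrative model of boot-time hard partitioning; resource names and sizes are invented.
from types import MappingProxyType

def boot_partition():
    # Assigned once by the minimal boot firmware, then exposed strictly read-only.
    assignment = {
        "customer_plane": {"queues": 96, "sram_mb": 48},   # RDMA, TCP/IP, NVMe-over-TCP
        "provider_plane": {"queues": 32, "sram_mb": 16},   # hypervisor, telemetry, policy
    }
    return MappingProxyType(assignment)

partition = boot_partition()
print(partition["customer_plane"])
try:
    partition["customer_plane"] = {"queues": 128}          # post-boot reallocation attempt...
except TypeError as err:
    print("rejected:", err)                                # ...fails: the mapping cannot be changed
```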


Zero Trust Packet Routing

Security in multi-tenant cloud environments traditionally relies on perimeter firewalls and security groups enforced at network edges. OCI implements something different: Zero Trust Packet Routing, enforced directly at each NIC. Every packet, even those traversing internal OCI networks, undergoes policy evaluation at both source and destination NICs.

This host-level enforcement prevents data exfiltration, limits lateral movement during security breaches, and enables fine-grained microsegmentation without performance penalties. ZPR policies can, for example, prevent bare-metal GPU instances from accessing object storage over the internet, blocking a common data theft vector. Because enforcement occurs in hardware at line rate, ZPR introduces no latency penalty despite providing comprehensive protection.

The policies are intent-driven and topology-independent. They describe what resources can communicate with what other resources, not how the network should route traffic. As networks evolve or configurations change, the policies remain valid. This decoupling of security intent from network implementation represents a significant architectural advance.
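
ZPR has its own policy language, and the sketch below is not that syntax. It only illustrates the architectural point: the decision depends on security attributes attached to resources, never on subnets, routes, or topology, and it is evaluated at both ends of every flow.

```python
# Toy intent-based check, not ZPR syntax: decisions depend on resource attributes, not topology.
ALLOWED = {
    ("gpu-training", "object-storage-private"),   # training nodes may reach the private storage endpoint
    ("app-tier", "db-tier"),
}

def permit(src_attr: str, dst_attr: str) -> bool:
    """Evaluated in hardware at both the sending and receiving NIC for every packet."""
    return (src_attr, dst_attr) in ALLOWED

print(permit("gpu-training", "object-storage-private"))   # True
print(permit("gpu-training", "object-storage-internet"))  # False: the exfiltration path is blocked
```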

Dedicated Fabric Networks

Rather than sharing network infrastructure across diverse workload types, Acceleron provisions dedicated fabrics optimized for specific use cases. Exadata deployments receive fabrics tuned for database I/O patterns with latency-optimized congestion control. GPU superclusters operate on fabrics configured for large collective operations with throughput-maximizing parameters. HPC workloads access fabrics balanced for both latency and bandwidth.

This workload-specific optimization allows OCI to provide consistent, predictable performance rather than the noisy neighbor effects common in shared network infrastructures. The DC-QCN congestion control profiles differ between workload types. What works for database OLTP transactions differs from what works for GPU gradient synchronization. Oracle tunes each fabric appropriately.
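
DC-QCN exposes tuning knobs such as ECN marking thresholds and rate-recovery timers. The sketch below shows the shape of per-fabric tuning; the parameter names are simplified and the values are placeholders, not Oracle's actual settings:

```python
# Hypothetical per-fabric congestion profiles; names are simplified and values are placeholders.
DCQCN_PROFILES = {
    "database":       {"ecn_min_kb": 150,  "ecn_max_kb": 1500, "rate_recovery_us": 55},
    "gpu_collective": {"ecn_min_kb": 1000, "ecn_max_kb": 5000, "rate_recovery_us": 300},
    "hpc":            {"ecn_min_kb": 400,  "ecn_max_kb": 2500, "rate_recovery_us": 100},
}

def profile_for(workload: str) -> dict:
    """Return the congestion-control tuning applied to a dedicated fabric."""
    return DCQCN_PROFILES[workload]

print(profile_for("gpu_collective"))
```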

The Performance Numbers That Matter

OCI cluster networks with multiplanar architecture deliver latencies that directly compete with InfiniBand, the traditional gold standard for HPC networking. For small clusters, latency can be as little as 2 microseconds round-trip time. For larger clusters spanning hundreds of racks, typical latencies remain under 4 microseconds. Even extreme superclusters with tens of thousands of GPUs maintain latencies in the 4 to 8 microsecond range.

These are Ethernet-based numbers. OCI achieves InfiniBand-competitive performance using RoCE (RDMA over Converged Ethernet), gaining the cost, flexibility, and ecosystem advantages of Ethernet without sacrificing ultra-low latency. Recent MLPerf benchmarks show nearly identical performance: in BERT-Large training, Ethernet actually outperformed InfiniBand slightly. For LLAMA2-70B inference, InfiniBand held only a 1.66% advantage.

The multiplanar architecture contributes to these results in several ways. Eliminating the core tier removes 100 to 300 nanoseconds of switching latency per packet. Multiple parallel paths reduce queue buildup at individual ports. Traffic automatically flows through less-congested planes, minimizing queuing delays. Fault isolation ensures that problems in one plane do not trigger reconvergence events that temporarily disrupt traffic on other planes.

Storage I/O benefits dramatically. OCI reports up to 3x improvement in storage IOPS compared to previous designs. NVMe-over-TCP with hardware offload bypasses kernel networking stacks. Line-rate encryption protects data in transit without CPU overhead. The multiple paths ensure storage traffic does not compete excessively with compute traffic.

Zettascale10: Where It All Comes Together

Oracle's largest deployment, the Zettascale10 supercomputer in Abilene, Texas, demonstrates what happens when you push these architectural principles to their logical extreme. This system connects hundreds of thousands of NVIDIA GPUs across multiple data centers to form multi-gigawatt clusters delivering up to 16 zettaFLOPS of peak performance. It serves as the foundation for OpenAI's Stargate project.

The numbers are staggering. Up to 800,000 NVIDIA GPUs. Multi-gigawatt power consumption. Clusters spanning multiple data halls within buildings or even multiple adjoining buildings. Cable distances between GPUs can be substantial, which traditionally would introduce unacceptable latency variation.

The multiplanar RoCE design makes this possible. Peter Hoeschele, VP of Infrastructure at OpenAI, described it directly: the highly scalable custom RoCE design maximizes fabric-wide performance at gigawatt scale while keeping most of the power focused on compute. The network architecture enables the scale while contributing minimally to the total power budget.

These clusters are housed in gigawatt data center campuses engineered for extreme density within a two-kilometer radius. This architectural constraint minimizes GPU-to-GPU latency, critical for large-scale AI model training where synchronization delays compound across billions of operations.

Why This Matters for Enterprise AI

The economic implications extend beyond raw performance. Consider the total cost of ownership for a 16,384-GPU training cluster over three years. Capital costs approach $450 million. Operational costs for power, cooling, staffing, and maintenance add another $300 million. Total TCO approaches $750 million.

If network reliability issues cause the cluster to experience 99.0% uptime rather than 99.9% uptime, the additional 0.9 percentage points of downtime, roughly 3.3 days per year, cost approximately $7 million per year in lost GPU productivity. Over three years, that is $21 million. Multiplanar architecture's superior fault isolation and graceful degradation characteristics measurably improve uptime.

The ability to perform non-disruptive maintenance further reduces TCO. Traditional architectures require scheduling maintenance windows when the cluster sits idle, wasting expensive GPU resources during upgrades. Multiplanar rolling maintenance allows continuous operation during infrastructure updates, improving effective utilization from perhaps 92% to 97%. For a $750 million investment, this 5-percentage-point utilization improvement represents $37.5 million in additional value delivered.
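
The arithmetic behind those figures is easy to reproduce. The one input not stated above is the value of a GPU-hour; the sketch assumes roughly $5.50 per GPU-hour, which is what puts the downtime cost near $7 million per year:

```python
# Reproduces the back-of-envelope figures above; the $/GPU-hour value is an assumption.
gpus = 16_384
hours_per_year = 8_760
gpu_hour_value = 5.50                                     # assumed productivity value per GPU-hour

extra_downtime = (0.999 - 0.990) * hours_per_year         # 0.9 points of extra downtime
lost_per_year = extra_downtime * gpus * gpu_hour_value

tco = 750e6
utilization_gain = 0.97 - 0.92

print(f"extra downtime: {extra_downtime:.0f} h (~{extra_downtime / 24:.1f} days/year)")
print(f"lost productivity: ${lost_per_year / 1e6:.1f}M per year, "
      f"${3 * lost_per_year / 1e6:.0f}M over three years")
print(f"value of the utilization gain: ${tco * utilization_gain / 1e6:.1f}M")
```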

Oracle provides these capabilities at no additional cost to OCI customers. The architecture's superior efficiency, reliability, and operational characteristics justify the investment in development and deployment even without premium pricing.

The Contrarian View Worth Considering

I am not naive about Oracle's marketing incentives. Every cloud provider claims architectural superiority. But the multiplanar design represents a genuinely different approach that addresses fundamental constraints rather than incrementally improving within existing paradigms.

The industry's direction supports this view. The Ultra Ethernet Consortium, with founding members including AMD, Arista, Broadcom, Cisco, Intel, Meta, and Microsoft, is standardizing many of the mechanisms that enable multiplanar architectures. The UEC 1.0 specification addresses modern RDMA over Ethernet, intelligent transport protocols with microsecond-level congestion control, and end-to-end scalability. The proprietary aspects of OCI's approach are evolving toward industry standards.

Oracle's latecomer position in cloud infrastructure may have been an advantage here. Without legacy architectures to maintain backward compatibility with, OCI engineers could design specifically for the workloads that matter now: AI training at scale, real-time inference, high-performance databases. The result is an architecture that starts from first principles rather than patching over decades of accumulated decisions.

The Questions That Remain

As AI workloads continue expanding in scale and sophistication, networking infrastructure must evolve correspondingly. Traditional three-tier architectures encounter fundamental scalability limits. Single-plane designs cannot provide the fault isolation essential for multi-week training runs costing tens of millions of dollars. Multiplanar networking offers a scalable path forward.

But questions remain. Will the operational complexity of managing multiple independent network planes prove manageable at enterprise scale? Will the standardization efforts around Ultra Ethernet create genuine multi-vendor interoperability, or will proprietary extensions fragment the market? Will the cost advantages of Ethernet over InfiniBand persist as AI infrastructure demands continue intensifying?

The answers will emerge over the next several years as organizations push these architectures to their limits. What seems clear already is that the conventional wisdom about datacenter networking has changed. The network that connects accelerators may prove as critical as the accelerators themselves.

Oracle bet on that insight. For organizations building AI capabilities at scale, understanding that bet is no longer optional.

1. Oracle Acceleron Overview: https://www.oracle.com/cloud/networking/acceleron/
2. Oracle AI World Announcement, October 2025: Oracle Introduces New Cloud Networking Capabilities
3. First Principles: Oracle Acceleron Multiplanar Network Architecture (Video)
4. OCI Zettascale10 Announcement, October 2025
5. NVIDIA Technical Blog: OCI Accelerates HPC, AI, and Database Using RoCE and ConnectX
6. Ultra Ethernet Consortium Specification 1.0, June 2025
7. First Principles: Oracle Acceleron Converged NIC for Next-Gen AI Workloads
