Deploying PCI Express as a Fabric

PCI Express (PCIe) has been deployed throughout nearly every market since the first version was introduced in 2003. The specification has progressed from a per-lane data transfer rate of 2 Gb/sec in the first generation to 8 Gb/sec in the current Gen3, and lanes can be aggregated so that a x16 Gen3 link reaches a bidirectional data transfer rate of 256 Gb/sec.
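The 256 Gb/sec figure is simple arithmetic: per-lane rate times lane count, doubled because a link carries traffic in both directions at once. The sketch below works through it, assuming a x16 link. It also shows where the first-generation 2 Gb/sec figure comes from: a raw 2.5 GT/s signaling rate less 8b/10b encoding overhead. The Gen3 figure of 8 Gb/sec is the raw 8 GT/s rate; Gen3's 128b/130b encoding leaves slightly less, about 7.88 Gb/sec, for data. The function names are illustrative only.

```python
# Worked arithmetic for the PCIe rates quoted above (assumes a x16 link).

def effective_lane_gbps(gt_per_s: float, payload_bits: int, total_bits: int) -> float:
    """Usable per-lane data rate after line-encoding overhead."""
    return gt_per_s * payload_bits / total_bits

def aggregate_gbps(lane_gbps: float, lanes: int, bidirectional: bool = True) -> float:
    """Link bandwidth: per-lane rate times lane count, doubled when both directions are counted."""
    per_direction = lane_gbps * lanes
    return per_direction * 2 if bidirectional else per_direction

GEN3_LANE_GBPS = 8.0  # per-lane figure used in the text (raw 8 GT/s)

print(effective_lane_gbps(2.5, 8, 10))                          # Gen1, 8b/10b: 2.0
print(round(effective_lane_gbps(8.0, 128, 130), 2))             # Gen3, 128b/130b: 7.88
print(aggregate_gbps(GEN3_LANE_GBPS, 16))                       # x16, both directions: 256.0
print(aggregate_gbps(GEN3_LANE_GBPS, 16, bidirectional=False))  # x16, one direction: 128.0
```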

Until recently, PCIe has succeeded primarily as a fan-out interconnect, enabling CPU, I/O, and storage devices – all of which have PCIe access points – to communicate. It has also penetrated more sophisticated applications, such as host failover, and has even been used as a backplane to connect PCIe-based subsystems. But given the performance of PCIe at Gen3 and its near-universal presence on devices, it has become an attractive alternative to current fabrics for data center and cloud computing applications.

PLX has extended the reach of PCIe for use as a fabric for data center and cloud computing through its ExpressFabric® initiative. By building on the natural strengths of PCIe – it’s everywhere, it’s fast, it’s low power, it’s affordable – and by adding some straightforward, standards-compliant extensions that address multi-host and I/O sharing applications, PLX has created a universal interconnect that substantially improves on the status quo.

The ExpressFabric solution is initially targeted at small- to medium-sized cloud clusters – up to about 1,000 nodes and eight racks – since that is where the majority of high-volume innovation is taking place. New architectures such as SSD-based tiered storage, micro-servers and GPGPU computing make good use of the advantages of PCIe, which are explained below. PLX expects that PCIe-based fabrics will co-exist with larger-scale fabrics such as Ethernet and InfiniBand, with each interconnect addressing the areas where it is most appropriate.

Ethernet is typically used today for general-purpose fabric applications in data centers, and InfiniBand is popular in high-performance computing (HPC) applications. Both of these interconnects have advantages – Ethernet has low cost and a large software base; InfiniBand has low latency and high performance. But neither has the advantages of PCIe:

  • Almost all storage, I/O and compute devices used in data centers have PCIe connections, often more than one. This means that a high-speed fabric hooking them all together can be constructed without bridges or other translating devices between the source and destination subsystems. Requiring fewer components leads to lower latency, cost and power. It also allows a unified view from an architectural and software standpoint: two PCIe devices, whether a few inches apart or across the room, can be connected seamlessly without regard to distance or topology.

    Some CPUs integrate Ethernet in addition to PCIe, but such ports are less common on non-CPU devices. And even the CPUs that have an on-chip Ethernet port do not directly support the high speeds necessary for a mainstream fabric. In fact, when Ethernet is used as a fabric, the Ethernet component is often translating to and from the PCIe subsystems. The situation with InfiniBand is even less promising: very few devices have InfiniBand as a direct connection.

  • It is highly efficient to let different types of data travel along the same pathway and be consumed by the appropriate endpoint, without having to determine before sending whether the data is I/O or storage traffic. PCIe is the most practical interconnect for enabling this type of convergence of communication and storage traffic. The subsystems can be separated and shared, rather than duplicated on each data center blade.

    Figure 1 shows why convergence can substantially improve a system. It depicts a typical current method of creating a data center fabric, in which two different fabrics actually run along the backplane: one for communications and one for storage. The host CPU provides PCIe, and bridges connect it to the I/O and storage interconnects so that they can communicate.

    Figure 1: Traditional Multi-Fabric Based System

    Figure 2 shows a PCIe-based fabric in which the I/O and storage subsystems are shared among the hosts on the unified fabric. Because hosts rarely drive all of their I/O and storage devices at full capacity at the same time, shared devices can be oversubscribed, so fewer of them are needed for the same number of hosts. This approach works with a PCIe fabric because:

    • PCIe provides a low-latency, high-bandwidth path between all of the elements in the system; and
    • PCIe is common to all of the subsystems in terms of connectivity and data flow, so separating the hosts from the I/O and storage is straightforward from an architectural point of view, as well as from a hardware and software implementation standpoint.

    Figure 2: PCIe Fabric Based System
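The oversubscription argument can be made concrete with a back-of-the-envelope sizing model. The sketch below compares giving every host its own device against sizing a shared pool to the hosts' combined average demand plus a headroom factor. It is a deliberate simplification under stated assumptions: the function names, the 32-host cluster, the 25% average utilization and the 1.5x headroom are all hypothetical, and a real deployment would also have to account for peak load, burst behavior and failover.

```python
import math

def dedicated_devices(hosts: int, devices_per_host: int = 1) -> int:
    """Each host owns its device(s) outright, however lightly they are used."""
    return hosts * devices_per_host

def shared_devices(hosts: int, avg_utilization: float, headroom: float = 1.0) -> int:
    """Size a shared pool to the hosts' combined average demand, scaled by a
    headroom factor and rounded up to whole devices (at least one)."""
    if not 0.0 < avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be in (0, 1]")
    demand = hosts * avg_utilization * headroom
    return max(1, math.ceil(demand))

# Hypothetical cluster: 32 hosts, each keeping a device 25% busy on average,
# with 1.5x headroom for bursts.
print(dedicated_devices(32))          # 32
print(shared_devices(32, 0.25, 1.5))  # 12
```

Under these assumed numbers the shared pool needs 12 devices instead of 32; the point is the shape of the calculation, not the specific ratio.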

To realize this vision, PLX has created a data center and cloud fabric based on PCIe. ExpressFabric offers the advantages of using PCIe as an interconnect fabric in a high-volume, mainstream manner. In particular:

  • ExpressFabric enables the use of existing hardware and software to create converged, multi-host, shared I/O systems.
  • ExpressFabric provides a PCIe-based low-latency, high-performance pathway between the subsystems.
  • ExpressFabric requires fewer components to create the fabric, which translates to lower cost and lower power.
  • Since ExpressFabric is based on PCIe, it offers a platform for high-volume, mainstream price points.

To properly extend PCIe for use as a data center fabric, the interconnect needs to offer several straightforward enhancements, all within a standards-based framework:

  • Legacy address-based routing, used to allow PCIe to be backward-compatible with the older PCI and PCI-X standards, needs to coexist with a routing mechanism that does a more efficient job of traffic management.
  • High-performance host-to-host communication needs to be a standard part of the solution, with support for low-latency, low-overhead mechanisms.
  • Storage and I/O subsystems need to be shared among hosts in a standards-compliant manner.
  • A secure, flexible fabric management mechanism needs to be provided.
  • The existing infrastructure, both hardware and software, needs to be supported with minimal or no changes, so that systems can be built starting with what exists.
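For background on the first requirement, the sketch below models, in simplified form, the two routing styles a standard PCIe switch already uses. Address routing forwards a memory request to the downstream port whose memory base/limit window contains the target address; ID routing (used, for example, for completions) forwards a packet to the port whose secondary-to-subordinate bus-number range contains the target bus. In a standard PCIe hierarchy these windows and ranges are programmed when a single root complex enumerates the tree. This is a toy model, not PLX's ExpressFabric routing mechanism, which the text does not describe in detail. The `DownstreamPort` class, port names, addresses and bus numbers are hypothetical, and upstream ports, prefetchable and I/O windows, and configuration requests are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DownstreamPort:
    """Simplified view of a switch downstream port's routing registers."""
    name: str
    mem_base: int         # first address of the memory window behind the port
    mem_limit: int        # last address (inclusive) of that window
    secondary_bus: int    # bus number directly below the port
    subordinate_bus: int  # highest bus number reachable through the port

def route_by_address(ports: List[DownstreamPort], address: int) -> Optional[str]:
    """Address routing (memory requests): pick the port whose window contains the address."""
    for port in ports:
        if port.mem_base <= address <= port.mem_limit:
            return port.name
    return None  # no downstream port claims it; a real switch would forward it upstream

def route_by_id(ports: List[DownstreamPort], bus: int) -> Optional[str]:
    """ID routing (e.g. completions): pick the port whose bus range contains the target bus."""
    for port in ports:
        if port.secondary_bus <= bus <= port.subordinate_bus:
            return port.name
    return None

# Hypothetical two-port switch.
ports = [
    DownstreamPort("port1", 0x9000_0000, 0x90FF_FFFF, secondary_bus=2, subordinate_bus=3),
    DownstreamPort("port2", 0x9100_0000, 0x91FF_FFFF, secondary_bus=4, subordinate_bus=6),
]
print(route_by_address(ports, 0x9100_1000))  # port2
print(route_by_id(ports, 3))                 # port1
print(route_by_address(ports, 0xA000_0000))  # None
```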

The PLX ExpressFabric solution addresses each of the items above in a way that is consistent with existing standards, and allows system engineers to migrate to this new approach easily.

Example of ExpressFabric Development Kit

To facilitate the creation of ExpressFabric-based systems, PLX will offer a set of development tools that assist customers in their own engineering process. The tools consist of three components:

  • A PCIe-based adapter card that uses a standard PLX Gen3 device to emulate the connection to the fabric.
  • A 1U form factor top-of-rack (ToR) box that provides connectivity to the development system.
  • A package of software drivers and applications, offered in source code form, that can be used to evaluate, demonstrate, and develop systems based on the PLX ExpressFabric concept.

Combining the tangible advantages of PCI Express as a data center fabric with the PLX family of high-performance, low-cost switches will enable the next phase of cloud-based computing and storage.
