Transportable AI applications span a broad range of use cases, but they face unique constraints in terms of environmental conditions, size, weight, and power. Vehicle-based systems often rely on DC power with little or no access to grid power, and they are also subject to road-induced vibration profiles and weather conditions that data center HPC architectures do not have to address. For these deployments, embedded or transport-optimized systems must be used that meet the requirements of both the application and the environment.
Traditionally, these optimized systems are limited in data throughput because they use less sophisticated and therefore less powerful HPC subcomponents. Increasingly, however, AI applications such as autonomous vehicles require the collection of large data sets and fast inference without sacrificing performance. Graphics processing units (GPUs), the backbone of AI architectures, have outpaced the Moore's Law scaling of traditional CPUs and now perform most of the computation. Bus speeds doubled with PCIe Gen 4.0 and are expected to double again with the availability of PCIe Gen 5.0, giving PCIe-based end devices such as NVMe storage a significant performance boost.
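For rough sizing, the per-direction bandwidth of a PCIe link can be estimated from the per-lane transfer rate (16 GT/s for Gen 4.0, 32 GT/s for Gen 5.0), the 128b/130b line encoding, and the lane count. The short sketch below illustrates that arithmetic; it deliberately ignores packet and protocol overhead, so real-world throughput will be somewhat lower.

    // pcie_bandwidth.cpp - rough per-direction PCIe link bandwidth estimate.
    // Assumes 128b/130b encoding (PCIe Gen 3.0 and later) and ignores
    // TLP/DLLP protocol overhead.
    #include <cstdio>

    double link_gbytes_per_s(double gtransfers_per_s, int lanes) {
        const double encoding = 128.0 / 130.0;   // 128b/130b line code
        double gbits = gtransfers_per_s * encoding * lanes;
        return gbits / 8.0;                      // bits -> bytes
    }

    int main() {
        std::printf("PCIe Gen 4.0 x16: %.1f GB/s\n", link_gbytes_per_s(16.0, 16));
        std::printf("PCIe Gen 5.0 x16: %.1f GB/s\n", link_gbytes_per_s(32.0, 16));
        // Prints roughly 31.5 GB/s and 63.0 GB/s - the doubling noted above.
        return 0;
    }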
High-Performance Computing Without Bottlenecks
Advances in RDMA (Remote Direct Memory Access) enable GPUs to communicate directly with storage and network interfaces, bypassing the typical HPC bottleneck of the CPU interconnect bus. To get the most out of each subcomponent, the system architect must find the best way to accommodate the latest PCIe devices on the same PCIe fabric.
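One widely deployed realization of this direct data path is NVIDIA GPUDirect Storage, whose cuFile API lets an NVMe read DMA straight into GPU memory instead of staging through a host bounce buffer. The sketch below shows the basic call sequence as a minimal example; the file path and transfer size are placeholders, and error handling is trimmed for brevity.

    // gds_read.cpp - minimal sketch: read a file directly into GPU memory via
    // NVIDIA GPUDirect Storage (cuFile). The data moves NVMe -> GPU across the
    // PCIe fabric without a bounce buffer in host memory.
    // /data/sample.bin is a placeholder path; error handling is abbreviated.
    #include <cufile.h>
    #include <cuda_runtime.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        const size_t size = 1 << 20;                 // 1 MiB transfer
        int fd = open("/data/sample.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { std::perror("open"); return 1; }

        cuFileDriverOpen();                          // initialize the GDS driver

        CUfileDescr_t descr = {};
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        CUfileHandle_t handle;
        cuFileHandleRegister(&handle, &descr);

        void *gpu_buf = nullptr;
        cudaMalloc(&gpu_buf, size);
        cuFileBufRegister(gpu_buf, size, 0);         // register the GPU buffer

        // DMA directly from the NVMe device into GPU memory.
        ssize_t n = cuFileRead(handle, gpu_buf, size, 0 /*file offset*/, 0 /*buffer offset*/);
        std::printf("read %zd bytes into GPU memory\n", n);

        cuFileBufDeregister(gpu_buf);
        cuFileHandleDeregister(handle);
        cudaFree(gpu_buf);
        cuFileDriverClose();
        close(fd);
        return 0;
    }

Built against the CUDA toolkit (linking cufile and cudart), a transfer like this keeps the CPU out of the data path once it has been set up.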
The simplest way to combine PCIe subcomponents on the same PCIe bus is to house them in the same host node, using the available PCIe add-in card slots on the host server. The disadvantages of this strategy are the resulting form factor and the limited number of available PCIe lanes.
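Before committing to this approach, it is worth checking how the host's lanes are actually allocated. On Linux, the negotiated link width and speed of every PCIe function are exposed through standard sysfs attributes; the sketch below simply walks that tree and prints them (devices without a PCIe capability report these attributes as unavailable).

    // pcie_links.cpp - sketch: list PCIe devices via Linux sysfs and report the
    // negotiated link width and speed, a quick way to see how the host's PCIe
    // lanes are being spent. Requires C++17 for std::filesystem.
    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <string>

    namespace fs = std::filesystem;

    static std::string read_attr(const fs::path &attr) {
        std::ifstream f(attr);
        std::string value;
        std::getline(f, value);
        return f ? value : std::string("n/a");
    }

    int main() {
        for (const auto &dev : fs::directory_iterator("/sys/bus/pci/devices")) {
            std::cout << dev.path().filename().string()
                      << "  width x" << read_attr(dev.path() / "current_link_width")
                      << "  speed "  << read_attr(dev.path() / "current_link_speed")
                      << '\n';
        }
        return 0;
    }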
PCIe Expansion Systems at the Host Node
For many transportable AI applications, a full-size server with enough slots for add-in cards is not feasible due to space limitations, yet a variety of NVMe, GPU, NIC, and FPGA devices must still be supported to meet the throughput requirements of the workflow. In these scenarios, the system architect should consider PCIe expansion systems. These connect to a smaller, optimized host node either directly by cable or via a PCIe switch and provide scalable, optimized expansion clusters. Configured as a JBOX of GPUs, SSDs, FPGAs, or NICs, these building blocks can be added to overcome the bottleneck wherever it may lie.
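To decide where such a building block pays off, a simple bandwidth budget for the expansion box is usually enough. The sketch below compares the aggregate demand of a hypothetical set of downstream devices against a single PCIe Gen 4.0 x16 uplink; all device counts and per-device rates are illustrative assumptions, not a recommended configuration.

    // jbox_budget.cpp - back-of-the-envelope bandwidth budget for a hypothetical
    // expansion JBOX behind one PCIe Gen 4.0 x16 uplink (~31.5 GB/s per direction).
    // Device counts and per-device rates are illustrative assumptions only.
    #include <cstdio>

    int main() {
        const double uplink_gbs = 31.5;   // Gen 4.0 x16, per direction (approx.)
        const double nvme_gbs   = 7.0;    // one Gen 4.0 x4 NVMe SSD, sequential read
        const double nic_gbs    = 12.5;   // one 100 GbE NIC (100 Gbit/s = 12.5 GB/s)

        const int nvme_count = 4;
        const int nic_count  = 1;

        double demand = nvme_count * nvme_gbs + nic_count * nic_gbs;
        std::printf("aggregate device demand: %.1f GB/s\n", demand);
        std::printf("uplink capacity:         %.1f GB/s\n", uplink_gbs);
        std::printf("oversubscription:        %.2fx\n", demand / uplink_gbs);
        // If traffic stays peer-to-peer inside the JBOX (e.g. NVMe -> GPU through
        // the switch), oversubscription of the host uplink matters far less.
        return 0;
    }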