Outsmarting nonuniform DMA

Researcher:
Prof. Dan Tsafrir | Computer Science

Categories:

Information and Computer Science

The Technology

In a multi-CPU server, memory modules are local to the CPU to which they are connected, forming a nonuniform memory access (NUMA) architecture. Because non-local accesses are slower than local accesses, the NUMA architecture might degrade application performance.

Similar slowdowns occur when an I/O device issues nonuniform direct memory access (NUDMA) operations, because the device is connected to memory via a single CPU. NUDMA effects therefore degrade application performance similarly to NUMA effects.
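
On Linux, one way to see which CPU a PCIe device is attached to is the numa_node attribute that the kernel exposes for each PCI device under sysfs. The following is a minimal sketch of reading it; the helper name device_numa_node, the sysfs_root parameter, and the example address are illustrative assumptions and are not part of the technology described here.

```python
from pathlib import Path
from typing import Optional


def device_numa_node(pci_address: str,
                     sysfs_root: str = "/sys/bus/pci/devices") -> Optional[int]:
    """Return the NUMA node the kernel reports for a PCI device, or None if unknown.

    pci_address is a full address such as "0000:3b:00.0" (a made-up example).
    sysfs_root is a parameter only so the function can be pointed at a test tree.
    """
    attr = Path(sysfs_root) / pci_address / "numa_node"
    try:
        node = int(attr.read_text().strip())
    except (OSError, ValueError):
        return None  # device missing, attribute absent, or contents unparsable
    # The kernel reports -1 when it has no NUMA affinity information.
    return node if node >= 0 else None
```

A DMA to a buffer that lives on any node other than the one reported here is a nonuniform (remote) DMA in the sense used above.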

There are, however, intrinsic differences between I/O and CPU memory accesses: NUMA effects are inevitable, but it has been shown that NUDMA effects can and should be eliminated.
IOctopus is a device architecture that makes NUDMA impossible by unifying multiple physical PCIe (Peripheral Component Interconnect Express) functions, one per CPU, in a manner that makes them appear as one, both to the system software and externally to the server.
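
The idea of one logical device backed by one function per CPU can be illustrated with a toy model. This is only a sketch under stated assumptions: the class UnifiedDevice, its dma method, and the names "pf0" and "pf1" are hypothetical, and the real mechanism lives in device hardware, firmware, and the driver rather than in code like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UnifiedDevice:
    """Toy model: one logical device backed by one PCIe function per CPU (NUMA node)."""

    # Maps NUMA node id -> name of the physical function attached to that node.
    functions: Dict[int, str]
    # Records (function used, buffer node) for every DMA, for inspection.
    log: List[Tuple[str, int]] = field(default_factory=list)

    def dma(self, buffer_node: int) -> str:
        """Route the DMA through the function local to the buffer's node."""
        pf = self.functions[buffer_node]  # raises KeyError if the node has no function
        self.log.append((pf, buffer_node))
        return pf


# Hypothetical two-CPU server: callers see a single device object,
# yet every transfer uses the function attached to the buffer's own CPU.
dev = UnifiedDevice(functions={0: "pf0", 1: "pf1"})
assert dev.dma(buffer_node=1) == "pf1"
assert dev.dma(buffer_node=0) == "pf0"
```

Because the function is chosen by where the buffer lives, a remote DMA cannot be expressed in this model, which mirrors the claim that the architecture makes NUDMA impossible.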

Advantages

  • Requires only modest changes to the device driver and firmware.
  • Improves throughput and latency by as much as 2.7× and 1.28×, respectively, while freeing developers from the need to combat an otherwise unavoidable type of overhead.

Applications and Opportunities

  • Multi-CPU servers

Business Development Contacts
Shikma Litmanovitz
Director of Business Development, Physical Science