STAC-ML™ Markets (Inference) on a Supermicro ARS-111GL-NHR with NVIDIA GH200 Grace Hopper™ Superchip

Type: Audited

Specs: STAC-ML™ Markets (Inference)

STAC recently performed a STAC-ML™ Markets (Inference) benchmark audit on a stack including an NVIDIA GH200 Grace Hopper Superchip in a Supermicro ARS-111GL-NHR server.

STAC-ML Markets (Inference) is the technology benchmark standard for solutions that can be used to run inference on real-time market data. Designed by quants and technologists from some of the world's leading financial firms, the benchmarks test the latency, throughput, energy efficiency, space efficiency, and algorithm quality of a technology stack across three model sizes and different numbers of model instances.
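
The latency figures quoted below are 99th-percentile ("99p") values over many inference requests, and the final bullet compares maximum to median latency. As a rough illustration only (this is not the STAC harness; the sample data and function are hypothetical), the following Python sketch shows how such statistics could be derived from per-inference latency measurements:

```python
import numpy as np

def latency_summary(latencies_us):
    """Summarize per-inference latencies (in microseconds) into the kinds of
    statistics quoted in this report: median, 99th percentile, maximum, and
    the ratio of maximum to median. Illustrative only, not the STAC harness."""
    lat = np.asarray(latencies_us, dtype=float)
    median = float(np.median(lat))
    return {
        "median_us": median,
        "p99_us": float(np.percentile(lat, 99)),
        "max_us": float(lat.max()),
        "max_over_median": float(lat.max() / median),
    }

# Hypothetical sample: latencies clustered near 4 microseconds with some jitter.
rng = np.random.default_rng(0)
sample = rng.normal(4.0, 0.3, 100_000).clip(min=3.0)
print(latency_summary(sample))
```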

Compared to a previous SUT submitted by Myrtle.ai (MRTL230426) on FPGAs, this SUT featuring an NVIDIA GH200 Superchip in a Supermicro ARS-111GL-NHR server demonstrated the following:

  • For LSTM_A (the smallest model), the 99p latency was between 7% and 20% lower:
    • 7% lower with 1 NMI (4.70μs vs. 5.07μs)
    • 20% lower with 8 NMI (4.67μs vs. 5.97μs)
    • The 99p error benchmark was 8 times lower (0.00111 vs. 0.00889)
  • For LSTM_B (the medium model), the 99p latency was between 3% higher and 8% lower:
    • 3% higher with 1 NMI (7.10μs vs. 6.89μs)
    • 8% lower with 4 NMI (7.10μs vs. 7.73μs)
    • The 99p error benchmark was 12 times lower (0.00102 vs. 0.0127)
  • For LSTM_C (the largest model) with 1 NMI:
    • The 99p latency was 49% lower (15.8μs vs. 31.0μs)
    • The throughput was 15% higher (3,910 vs. 3,387)
    • The 99p error benchmark was 13 times lower (0.00172 vs. 0.0237)
    • The energy efficiency was 44% higher (8,312 vs. 5,785)
  • The largest ratio of maximum to median latency was 9.65 (38.3μs / 3.97μs), which occurred for LSTM_A with NMI=8; the smallest was 2.16 (32.2μs / 14.9μs), for LSTM_C with NMI=1.
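
For readers who want to sanity-check the arithmetic, the short sketch below recomputes a few of the comparisons from the figures quoted above (a minimal illustration using only numbers from this announcement):

```python
def pct_lower(new, old):
    """Percentage by which `new` is lower than `old`."""
    return 100.0 * (old - new) / old

# LSTM_A, 1 NMI, 99p latency: 4.70μs vs. 5.07μs -> about 7% lower
print(f"LSTM_A 1 NMI latency: {pct_lower(4.70, 5.07):.0f}% lower")

# LSTM_C, 1 NMI, throughput: 3,910 vs. 3,387 -> about 15% higher
print(f"LSTM_C throughput: {100.0 * (3910 / 3387 - 1):.0f}% higher")

# LSTM_A 99p error: 0.00889 / 0.00111 -> about 8 times lower
print(f"LSTM_A error ratio: {0.00889 / 0.00111:.1f}x")

# Max-to-median latency ratios quoted in the final bullet
print(f"LSTM_A NMI=8: {38.3 / 3.97:.2f}, LSTM_C NMI=1: {32.2 / 14.9:.2f}")
```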

The benchmark reports are available to all STAC Observer members. Additionally, Premium Subscribers have access to extensive visualizations of all test results, the micro-detailed configuration information for the solutions tested, the code used in this project, and the ability to run these same benchmarks in the privacy of their own labs. To learn about subscription options, please contact us.

The STAC-ML Working Group develops benchmark standards for key machine learning (ML) workloads in finance. These benchmarks enable customers, vendors, and STAC to make apples-to-apples comparisons of techniques and technologies.