STAC-AI™ LANG6 Benchmark Results on HPE ProLiant DL384 Gen12 server with 2 NVIDIA GH200 NVL2 Superchips

Type: Audited

Specs: STAC-AI LANG6

STAC recently performed a STAC-AI™ LANG6 (Inference-only) benchmark audit on an HPE ProLiant DL384 Gen12 server with 2 NVIDIA GH200 NVL2 Superchips.

STAC-AI is the technology benchmark standard for solutions that can be used to run LLM inferencing on capital markets data. Designed by quants and technologists from some of the world's leading financial firms, the benchmark tests the latency, throughput, energy efficiency, and space efficiency of a technology stack across two distinct model sizes.

The System Under Test (SUT) was:

  • STAC-AI™ Pack for TensorRT-LLM (Rev B)
  • TensorRT-LLM 0.17.0 (Containerized)
  • Python 3.12.3
  • Ubuntu 24.04.3 LTS
  • 2 x NVIDIA GH200 Superchips, each with
    • 1 x ARM Neoverse-V2 (Grace) CPU
    • 1 x NVIDIA H100 (Hopper) GPU with 144GB HBM3e memory
    • 480 GiB LPDDR5X memory @ 6400 MT/s
  • 1.2 TiB total unified memory
  • HPE ProLiant DL384 Gen12 server

The following is a summary of the key results.

  • Interactive workloads: The DL384 supports up to 47 inferences/sec with sub-120 ms reaction times for Llama-3.1-8B on EDGAR4, equivalent to 2.7x human reading speed. Even at large-scale inference, the DL384 handles 6 inferences/sec with roughly 0.5 s reaction times for Llama-3.1-70B.
  • Batch workloads: Throughput reaches up to 8,200 words/sec for Llama-3.1-8B on EDGAR4, and up to 41 words/sec for the most demanding workload, Llama-3.1-70B on the EDGAR5 dataset.
  • High fidelity: Excellent similarity retention, even with complex queries, with fidelity consistently above 90%.
  • Outperforming human analysts: Every configuration, including the most demanding model, beats human reading speed (~4 words/sec).
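As a rough sanity check on the "outperforming human analysts" claim, the published batch-throughput figures can be compared directly against the ~4 words/sec human reading speed cited above. The sketch below is purely illustrative and is not part of the audited benchmark code; the figures are copied from the summary.

```python
# Illustrative comparison of reported batch throughput against human
# reading speed. Numbers are taken from the results summary above.

HUMAN_WORDS_PER_SEC = 4  # approximate human reading speed cited in the summary

# (model / dataset) -> reported batch throughput in words/sec
reported_throughput = {
    "Llama-3.1-8B / EDGAR4": 8200,
    "Llama-3.1-70B / EDGAR5": 41,
}

for config, words_per_sec in reported_throughput.items():
    speedup = words_per_sec / HUMAN_WORDS_PER_SEC
    print(f"{config}: {words_per_sec} words/sec "
          f"~= {speedup:.1f}x human reading speed")
```

Even the slowest configuration (41 words/sec) is roughly an order of magnitude above human reading speed, which is what the summary's last bullet asserts.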

The benchmark report is available to all STAC Observer members. Additionally, STAC Insights Subscribers have access to extensive visualizations of all test results, the micro-detailed configuration information for the solutions tested, the code used in this project, and the ability to run these same benchmarks in the privacy of their own labs. Please log in to access the reports. To learn about subscription options, please contact us.


The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.