STAC-AI™ LANG6 Benchmark Results on HPE ProLiant DL380a Gen12 server with 8 NVIDIA H200 NVL GPUs

Type: Audited

Specs: STAC-AI LANG6

STAC recently performed a STAC-AI™ LANG6 (Inference-only) benchmark audit on an HPE ProLiant DL380a Gen12 server with 8 NVIDIA H200 NVL GPUs.

STAC-AI is the technology benchmark standard for solutions that can be used to run LLM inferencing on capital markets data. Designed by quants and technologists from some of the world's leading financial firms, the benchmark tests the latency, throughput, energy efficiency, and space efficiency of a technology stack across two distinct model sizes.

The System Under Test (SUT) was:

  • STAC-AI™ Pack for TensorRT-LLM (Rev B)
  • TensorRT-LLM 0.17.0 (Containerized)
  • Python 3.12.3
  • Ubuntu 24.04.3 LTS
  • 2 x NVIDIA GH200 Superchips, each with
    • 1 x ARM Neoverse-V2 (Grace) CPU
    • 1 x NVIDIA H100 (Hopper) GPU with 144GB HBM3e memory
    • 480 GiB LPDDR5 memory @ 6400 MT/s
  • 1.2 TiB total unified memory
  • HPE ProLiant DL384 Gen12 server

The following is a summary of the key results.

  • Interactive workloads: The DL380a supports up to 165 inferences/sec while sustaining sub-200 ms median reaction times for Llama-3.1-8B and EDGAR4. Even under heavier loads, the DL380a handles up to 20 inferences/sec with an under-1 s median reaction time for Llama-3.1-70B and EDGAR4.
  • Smooth token streaming: Across all tested configurations, the DL380a achieves smooth token streaming from 2.9 to 40 words/sec.
  • Batch workloads: Throughput reaches up to 23,600 words/sec for Llama-3.1-8B and EDGAR4, and up to 132 words/sec for Llama-3.1-70B and EDGAR5.
  • High fidelity: Fidelity remains consistently above 90%, even with the largest 70B models.
  • Enterprise-class scalability: The 8x H200 NVL configuration scales efficiently with increasing workload demand, supporting the most demanding concurrent inference and RAG scenarios.
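As an illustration of how headline metrics of this kind relate to raw measurements, the sketch below computes a median reaction time (time to first token), inference rate, and word throughput from a set of per-request timings. All values are invented for the example and are not taken from the STAC-AI report; the variable names are hypothetical, not part of the benchmark specification.

```python
import statistics

# Hypothetical per-request measurements from an inference load test
# (illustrative numbers only, not STAC-AI results).
reaction_times_s = [0.12, 0.15, 0.18, 0.14, 0.19]  # time to first token, per request
words_generated = [220, 180, 260, 240, 200]        # words produced, per request
wall_clock_s = 10.0                                # total duration of the test window

# Median reaction time in milliseconds, as reported for interactive workloads.
median_reaction_ms = statistics.median(reaction_times_s) * 1000

# Sustained rates over the test window, as reported for batch workloads.
inferences_per_sec = len(reaction_times_s) / wall_clock_s
words_per_sec = sum(words_generated) / wall_clock_s

print(f"median reaction time: {median_reaction_ms:.0f} ms")
print(f"throughput: {inferences_per_sec:.1f} inferences/sec, {words_per_sec:.0f} words/sec")
```

Note that a median (rather than a mean) is the natural summary for interactive latency, since it is robust to a few slow outlier requests.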

The benchmark report is available to all STAC Observer members. Additionally, STAC Insights Subscribers have access to extensive visualizations of all test results, the micro-detailed configuration information for the solutions tested, the code used in this project, and the ability to run these same benchmarks in the privacy of their own labs. Please log in to access the reports. To learn about subscription options, please contact us.


The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.