STAC-AI Benchmark Results on Lambda 1-Click Cluster Cloud Instance with NVIDIA B200 SXM6 Blackwell Series GPUs

Type: Audited

Specs: STAC-AI™ LANG6

STAC recently completed a STAC-AI™ LANG6 (Inference-Only) benchmark audit on a Lambda 1-Click Cluster (a Lambda Cloud virtual instance) powered by 8x NVIDIA B200 SXM6 Blackwell Series GPUs.

Stack Under Test (SUT):

  • STAC-AI™ LANG6 (Inference-Only) Pack for NVIDIA TensorRT-LLM (Rev D), 'rtx6000dev' branch, commit 8a410552, 12 March 2026
  • NVIDIA TensorRT-LLM 1.2.0rc2 (PyTorch backend)
  • NVIDIA TensorRT Model Optimizer 0.37.0 (NVFP4 quantization of Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct)
  • PyTorch 2.9.0a0+145a3a7bda.nv25.10; transformers 4.56.0; cuDNN 9.14.0; NCCL 2.27.7-1+cuda13.0; CUDA 13.0
  • Ubuntu 24.04.3 LTS; Podman 4.9.3 with NVIDIA Container Toolkit 1.18.1
  • Lambda 1-Click Cluster — 1 × 8×B200 SXM6 cloud node (Lambda Cloud virtual instance)
  • 8 × NVIDIA B200 SXM6 Blackwell GPUs, each with 180 GiB HBM3e (NVIDIA driver 580.126.09)
  • 104 physical Intel® Xeon® Platinum 8570 cores (208 logical) and 2.8 TiB of guest RAM
  • 22 TB virtualized NVMe storage via BlueField-3 200GbE/NDR200 DPU networking

 

Key Results Summary:

  • Batch workloads
    • 2.2x the throughput in the EDGAR4a small model test (52,823 vs. 23,607 words/s) [1]
    • 3.6x the throughput in the EDGAR4b large model test (12,040 vs. 3,351 words/s) [2]
    • 2.3x the throughput in the EDGAR5a small model test (2,220 vs. 954 words/s) [3]
    • 2.7x the throughput in the EDGAR5b large model test (350 vs. 132 words/s) [4]
  • Interactive workloads
    • EDGAR4a: At 165 inf/s, this system achieved a Response time [5] 11.15x faster and a Reaction time [6] 5.09x faster
    • EDGAR4b: At 20 inf/s, this system achieved a Response time [7] 6.2x faster and a Reaction time [8] 5.49x faster
    • EDGAR5a: At 2.40 inf/s, this system achieved a Response time [9] 10.94x faster and a Reaction time [10] 4.5x faster
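
The batch multipliers above can be reproduced from the paired throughput figures. The following is a minimal sketch, assuming each multiplier is simply the first figure divided by the second and rounded to one decimal place; the names `BATCH_RESULTS` and `speedup` are illustrative and are not part of the STAC pack.

```python
# Throughput pairs in words/s, copied from the batch results above.
# Assumption: the first value is this system, the second is the comparison figure.
BATCH_RESULTS = {
    "EDGAR4a": (52_823, 23_607),
    "EDGAR4b": (12_040, 3_351),
    "EDGAR5a": (2_220, 954),
    "EDGAR5b": (350, 132),
}


def speedup(new: float, baseline: float) -> float:
    """Return new / baseline, rounded to one decimal place."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return round(new / baseline, 1)


# Hypothetical helper output: one multiplier per test.
ratios = {name: speedup(new, base) for name, (new, base) in BATCH_RESULTS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

Under that assumption, the computed values agree with the 2.2x, 3.6x, 2.3x, and 2.7x multipliers listed for the batch workloads.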

 

The benchmark report is available to all STAC Observer members. STAC Insights subscribers gain access to detailed visualizations, configuration data, benchmark code, and the ability to run these tests in their own labs. Please log in to access the reports. For subscription options, contact us.

The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.