Jim Keller-led chip company Tenstorrent has released its next-generation Wormhole processor for AI workloads, which it expects to offer good performance at an affordable price. The company currently offers two PCIe add-in cards carrying one or two Wormhole processors, as well as the TT-LoudBox and TT-QuietBox workstations for software developers. All of today's announcements are aimed at developers, not at customers deploying Wormhole boards for commercial workloads.
“It's always gratifying to get more of our products into the hands of developers. Releasing development systems built on our Wormhole™ cards can help developers scale up and develop multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we are excited to see the progress we are making with the tape-out and power-up of our second-generation product, Blackhole.”
Each Wormhole processor contains 72 Tensix cores (each featuring five RISC-V cores and supporting various data formats) and 108 MB of SRAM, delivering 262 FP8 TFLOPS at 1 GHz within a thermal design power of 160W. The single-chip Wormhole n150 card carries 12 GB of GDDR6 memory with 288 GB/s of bandwidth.
Wormhole processors are designed to scale flexibly to meet the diverse needs of different workloads. In a standard workstation with four Wormhole n300 cards, the processors can be merged into a single unit that the software sees as one large, unified network of Tensix cores. In this configuration, the accelerators can work on the same workload, be split among four developers, or run up to eight different AI models simultaneously. A key feature of this scalability is that it works natively, without the need for virtualization. In data centers, Wormhole processors will scale up within a machine over PCIe and scale out across machines over Ethernet.
In terms of performance, Tenstorrent's single-chip Wormhole n150 card (72 Tensix cores, 1 GHz, 108 MB SRAM, 12 GB GDDR6, 288 GB/s of bandwidth) delivers 262 FP8 TFLOPS at 160W, while the dual-chip Wormhole n300 board (128 Tensix cores, 1 GHz, 192 MB SRAM, 24 GB of GDDR6 in total, 576 GB/s of bandwidth) delivers up to 466 FP8 TFLOPS at 300W.
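The quoted figures make it easy to work out peak efficiency per watt and how close the dual-chip n300 comes to twice the n150's throughput. The sketch below uses only the article's peak FP8 numbers and board TDPs; the dictionary layout and function names are illustrative, and peak TFLOPS say nothing about sustained throughput on real models.

```python
# Peak FP8 TFLOPS and board TDP (watts) as quoted for the two Wormhole cards.
CARDS = {
    "n150": {"chips": 1, "tflops_fp8": 262, "tdp_w": 160},
    "n300": {"chips": 2, "tflops_fp8": 466, "tdp_w": 300},
}


def tflops_per_watt(card: dict) -> float:
    """Peak FP8 TFLOPS divided by board TDP."""
    return card["tflops_fp8"] / card["tdp_w"]


def scaling_efficiency(multi: dict, single: dict) -> float:
    """Multi-chip peak relative to perfect linear scaling of the single-chip card."""
    return multi["tflops_fp8"] / (multi["chips"] * single["tflops_fp8"])


if __name__ == "__main__":
    for name, card in CARDS.items():
        print(f"{name}: {tflops_per_watt(card):.2f} FP8 TFLOPS/W")
    eff = scaling_efficiency(CARDS["n300"], CARDS["n150"])
    print(f"n300 vs 2x n150: {eff:.0%} of linear scaling")
```

The n150 works out to about 1.64 TFLOPS/W and the n300 to about 1.55 TFLOPS/W. The n300 reaches roughly 89% of twice the n150's peak, which is roughly in line with it listing 128 Tensix cores rather than 2 × 72 = 144.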
To put 466 FP8 TFLOPS at 300W into context, we'll compare it with what AI market leader Nvidia offers at this thermal design power. Nvidia's A100 does not support FP8, but it does support INT8, with peak performance of 624 TOPS (1,248 TOPS with sparsity). By comparison, Nvidia's H100 supports FP8 and reaches a peak of 1,670 TFLOPS at 300W (3,341 TFLOPS with sparsity), far ahead of Tenstorrent's Wormhole n300.
However, there is one major catch: price. Tenstorrent's Wormhole n150 retails for $999 and the n300 for $1,399, whereas a single Nvidia H100 can retail for around $30,000, depending on quantity. Of course, we don't know whether four or eight Wormhole processors can actually deliver the performance of a single H100, and they would do so at TDPs of 600W and 1200W, respectively.
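The price gap is easier to see as dollars per peak FP8 TFLOPS, and the 600W and 1200W figures follow from building four- and eight-processor setups out of 300W dual-chip n300 cards. This is a minimal sketch using only the article's numbers; the H100 price is the approximate, quantity-dependent figure quoted above, and the helper names are hypothetical.

```python
# (list price in USD, peak dense FP8 TFLOPS, TDP in watts) as quoted in the article.
ACCELERATORS = {
    "Wormhole n150": (999, 262, 160),
    "Wormhole n300": (1_399, 466, 300),
    "Nvidia H100": (30_000, 1_670, 300),  # approximate price, varies with quantity
}


def dollars_per_tflop(price_usd: float, tflops: float) -> float:
    """List price divided by peak FP8 TFLOPS."""
    return price_usd / tflops


def wormhole_tdp(processors: int, watts_per_n300: int = 300) -> int:
    """Combined TDP of a setup built from dual-chip n300 cards (two processors per card)."""
    if processors <= 0 or processors % 2:
        raise ValueError("n300 cards carry two processors each")
    return (processors // 2) * watts_per_n300


if __name__ == "__main__":
    for name, (price, tflops, _tdp) in ACCELERATORS.items():
        print(f"{name}: ${dollars_per_tflop(price, tflops):.2f} per FP8 TFLOPS")
    for n in (4, 8):
        print(f"{n} Wormhole processors: {wormhole_tdp(n)} W")
```

On these list prices the n300 comes to about $3.00 per peak FP8 TFLOPS versus roughly $17.96 for the H100, while four and eight processors (two and four n300 cards) land at 600W and 1200W.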
In addition to the cards, Tenstorrent offers pre-built workstations for developers, each equipped with four n300 cards: the more affordable Xeon-based TT-LoudBox with active cooling, and the premium EPYC-based TT-QuietBox with liquid cooling.
Post time: Jul-29-2024