Amazon’s Trainium2 AI Accelerator Features 96 GB of HBM, Quadruples Training Performance


Amazon Web Services this week introduced Trainium2, its new accelerator for artificial intelligence (AI) workloads, which tangibly increases performance compared to its predecessor, enabling AWS to train foundation models (FMs) and large language models (LLMs) with up to trillions of parameters. In addition, AWS has set itself an ambitious goal: to give its clients access to a massive 65 ‘AI’ ExaFLOPS of performance for their workloads.

The AWS Trainium2 is Amazon’s second-generation accelerator designed specifically for training FMs and LLMs. Compared to its predecessor, the original Trainium, it features four times higher training performance, twice the performance per watt, and three times as much memory – a total of 96 GB of HBM. The chip, designed by Amazon’s Annapurna Labs, is a multi-tile system-in-package featuring two compute tiles, four HBM memory stacks, and two chiplets whose purpose is undisclosed for now.

Amazon notably does not disclose specific performance numbers for Trainium2, but it says that its Trn2 instances scale out to up to 100,000 Trainium2 chips, delivering up to 65 ExaFLOPS of low-precision compute performance for AI workloads. Working backwards, that would put a single Trainium2 accelerator at roughly 650 TFLOPS. 65 EFLOPS is a level expected to be achievable only by the highest-performing upcoming AI supercomputers, such as Jupiter. Such scaling should dramatically reduce the training time for a 300-billion-parameter large language model from months to weeks, according to AWS.
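The working-backwards estimate above is simple unit arithmetic; a quick sketch (using only the two figures AWS has quoted, and assuming the aggregate number divides evenly across chips) looks like this:

```python
# Back-of-the-envelope estimate of per-chip throughput from AWS's
# cluster-level figure: 65 ExaFLOPS across up to 100,000 Trainium2 chips.
cluster_eflops = 65          # AWS-quoted aggregate, low-precision AI compute
max_chips = 100_000          # maximum Trn2 instance scale-out quoted by AWS

# 1 ExaFLOPS = 1e18 FLOPS = 1e6 TeraFLOPS
per_chip_tflops = cluster_eflops * 1e6 / max_chips

print(f"~{per_chip_tflops:.0f} TFLOPS per Trainium2 chip")
# → ~650 TFLOPS per Trainium2 chip
```

Note that this is a ceiling implied by marketing figures, not a measured number: if the 65 EFLOPS figure assumes sparsity or a lower-precision format than FP16/BF16, the effective per-chip dense throughput would be lower.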

Amazon has yet to disclose the full specifications of Trainium2, but we’d be surprised if it didn’t add some features on top of what the original Trainium already supports. As a reminder, that co-processor supports FP32, TF32, BF16, FP16, UINT8, and configurable FP8 data formats, and delivers up to 190 TFLOPS of FP16/BF16 compute performance.

What is perhaps more important than the pure performance numbers of a single AWS Trainium2 accelerator is that Amazon has partners, such as Anthropic, that are ready to deploy it.

“We are working closely with AWS to develop our future foundation models using Trainium chips,” said Tom Brown, co-founder of Anthropic. “Trainium2 will help us build and train models at a very large scale, and we expect it to be at least 4x faster than first generation Trainium chips for some of our key workloads. Our collaboration with AWS will help organizations of all sizes unlock new possibilities, as they use Anthropic’s state-of-the-art AI systems together with AWS’s secure, reliable cloud technology.”
