Published: Jul 16, 2025

Save up to 70% with MAX, the fastest inference engine on AMD compute

AI inference has a cost problem. NVIDIA dominates the current GPU market, making large-scale inference expensive. Teams believe they must either pay high prices for premium compute or accept slower performance from less expensive alternatives. TensorWave recognized this market gap early. As an AI and HPC cloud provider exclusively focused on AMD infrastructure, they're betting that alternative accelerators can deliver enterprise-grade performance.

But hardware alone isn't enough - customers need software that can extract every ounce of performance from these chips. This created a unique challenge: how do you prove that AMD GPUs aren't just lower-priced alternatives, but can actually outperform incumbent solutions when paired with the right software stack?

TensorWave + Modular

At Modular, we’ve built MAX - an inference framework and serving engine that delivers the fastest performance on any accelerator. Built from the ground up for generative AI, it gives developers all the tools they need to optimize and deploy AI on GPUs fast.

TensorWave and Modular have partnered to shatter the cost-performance ceiling for AI inference. By combining TensorWave's cutting-edge AMD MI325X infrastructure with MAX, we're delivering a solution that's both faster and cheaper than existing inference deployments.

AMD compute on TensorWave is available at industry-leading prices, enabling customers to save significantly compared to traditional hyperscaler cloud providers. Hardware savings start at 58% for the MI325X compared to the NVIDIA H200.

Hardware savings are only half the equation. MAX was built to be hardware-agnostic, delivering maximum performance from any accelerator. When deployed on TensorWave's MI325X GPUs, MAX outperforms vLLM across popular models:

  • Meta-Llama-3.1-8B: Up to 23% faster
  • Mistral-Small-24B: Up to 13% faster
  • Gemma-3-12B: Up to 57% faster

These hardware savings combined with MAX's performance edge deliver major cost savings to customers.

[Chart: Cost per million tokens comparison]

These aren't just marginal improvements. Teams can now serve the same workload for 60-70% less, or handle 3-7x more traffic for the same budget.
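
To see how the two levers compound, here's a minimal back-of-the-envelope sketch in Python. The hourly price and baseline throughput are hypothetical placeholders, not quoted rates; only the 58% hardware discount and the per-model speedups come from the figures above.

```python
# Back-of-the-envelope cost model. All prices and throughputs below are
# hypothetical placeholders; only the 58% hardware discount and the
# per-model speedups are taken from the figures in this post.

def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

h200_price = 10.00        # $/GPU-hour on a hyperscaler (hypothetical)
vllm_throughput = 2_000   # tokens/second baseline (hypothetical)

mi325x_price = h200_price * (1 - 0.58)  # 58% lower hourly cost (from this post)

for model, speedup in [("Meta-Llama-3.1-8B", 1.23), ("Mistral-Small-24B", 1.13)]:
    baseline = cost_per_million_tokens(h200_price, vllm_throughput)
    with_max = cost_per_million_tokens(mi325x_price, vllm_throughput * speedup)
    saving = 1 - with_max / baseline
    print(f"{model}: ${baseline:.2f} -> ${with_max:.2f} per 1M tokens ({saving:.0%} lower)")
```

With these assumed inputs, the two discounts multiply: a GPU that costs 58% less per hour while running 13-23% faster lands squarely in the 60-70% savings range quoted above.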

Results

This partnership changes the inference cost equation. Teams deploying AI can now select the hardware that gives them the best total cost of ownership (TCO), be it NVIDIA or AMD. With TensorWave's MI325X fleet and MAX, you get:

  • 60-70% lower cost per million tokens across popular models and workloads
  • Up to 57% higher throughput on key models compared to vLLM
  • 58% lower hourly GPU costs compared to AWS H200 pricing

For AI teams running billions of tokens daily, this represents major cost reductions that can mean the difference between a sustainable business model and burning through runway. Alternative hardware paired with optimized software isn't just viable - it's superior.

Interested? Reach out to learn how you can deploy MAX with TensorWave in your environment today.
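
If you already have an OpenAI-compatible client, pointing it at a MAX deployment is typically a one-line change, since MAX serves an OpenAI-compatible API. Here is a minimal sketch; the endpoint URL, API key, and model name are hypothetical placeholders for whatever your TensorWave deployment exposes:

```python
# Minimal client sketch. MAX serves an OpenAI-compatible API, so the
# standard openai client works unchanged. The base_url, api_key, and
# model name below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-tensorwave-host:8000/v1",  # hypothetical endpoint
    api_key="EMPTY",  # self-hosted endpoints often accept any key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # one of the models benchmarked above
    messages=[{"role": "user", "content": "Hello from an MI325X!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```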

About TensorWave

TensorWave is the AMD GPU cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.

Ready to get started? Connect with a Sales Engineer.

About Modular

Modular is building AI’s unified compute layer, enabling developers to deploy production gen AI systems across any hardware in a matter of minutes. The Modular platform lets you run frontier models at blazing-fast speeds on AMD or NVIDIA GPUs, without the complexity of fragmented toolchains and with a software stack that just works.