Published: Apr 22, 2025
Scale by Spectral Compute: Run Your CUDA Workloads Faster on Affordable AMD GPUs

When it comes to GPU computing, one name has dominated the landscape for over a decade: NVIDIA, with its CUDA platform. But what if you could break free from vendor lock-in — without rewriting a single line of code?
At the 2025 Beyond CUDA Summit, Spectral Compute made waves by unveiling Scale, a revolutionary new technology that allows developers to run native CUDA applications on AMD GPUs — faster, cheaper, and without compromise.
By compiling directly to AMD hardware, Scale unlocks a future where AI builders, HPC engineers, and researchers can double their GPU options, cut costs, and accelerate performance — all while staying inside the familiar CUDA ecosystem they've relied on for years.
Here’s what you need to know about this groundbreaking announcement — and why it’s set to change the future of GPU computing.
🔧 The Problem: CUDA Lock-In and Limited Hardware Options
Michael Søndergaard of Spectral Compute opened by addressing the elephant in the room: CUDA is amazing, but it’s also created a hardware monopoly.
Today, if you’re building in CUDA, you’re locked into NVIDIA GPUs — or forced to juggle multiple codebases to support alternatives like AMD.
This bottleneck has kept developers from fully tapping into new, faster, and more affordable hardware options.
Scale changes all that.
⚡ Introducing Scale: Native CUDA on AMD Without Emulation
Unlike translation layers or emulation systems, Scale compiles CUDA directly to AMD’s architecture — meaning:
- Zero emulation overhead
- Direct performance on MI300X, MI325X and other AMD GPUs
- Stay inside the CUDA ecosystem you already know
Instead of rewriting thousands of lines of code, developers can recompile and run — just like switching between x86 and ARM on CPUs.
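To make the "recompile and run" idea concrete, here is a minimal sketch: an ordinary, unmodified CUDA source file of the kind Scale is meant to compile directly for AMD hardware. The kernel and host code below are standard CUDA written for this article, not taken from Spectral Compute's materials, and the exact Scale compiler invocation is an assumption — check Scale's own documentation for the real workflow.

```cuda
// vector_add.cu — plain CUDA, unchanged for either vendor.
// On NVIDIA:  nvcc vector_add.cu -o vector_add
// With Scale: the same file is recompiled via Scale's nvcc-compatible
//             toolchain to target AMD GPUs (invocation is an assumption).
#include <cstdio>

__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vectorAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The point is that nothing in the source names a vendor — the switch happens entirely at compile time, exactly like rebuilding the same C program for x86 or ARM.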
🛠️ Full Compatibility Goals: CUDA Math, Runtime, and Driver APIs
Spectral Compute isn’t just stopping at basic functionality.
Their roadmap includes:
- ~90% support for CUDA C Core Math APIs
- ~70% support for CUDA Runtime and Driver APIs
- Full NVCC C++ semantic support for seamless compilation
Even inline PTX assembly is supported — and in some cases, runs faster on AMD than NVIDIA hardware itself.
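For readers unfamiliar with what "inline PTX" means in practice, here is a hedged illustration: a small kernel embedding a PTX instruction via `asm`, the kind of construct that normally ties code to NVIDIA hardware. This example was written for this article, not drawn from Spectral Compute's test suite; the claim from the talk is that Scale recognizes such blocks and compiles them to AMD machine code rather than rejecting them.

```cuda
// Illustrative only: inline PTX in a CUDA kernel.
// popc.b32 is the PTX population-count instruction — a common idiom
// that a plain translation layer cannot handle, since PTX is
// NVIDIA's own intermediate representation.
__global__ void popcountKernel(const unsigned int* in,
                               unsigned int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int r;
        asm("popc.b32 %0, %1;" : "=r"(r) : "r"(in[i]));
        out[i] = r;
    }
}
```

Handling code like this at the compiler level, instead of at runtime, is what lets Scale claim zero emulation overhead even for hand-tuned kernels.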
📊 Performance Gains: Scale vs HIP on AMD MI300X
Michael shared fresh benchmarks using the Rodinia Benchmark Suite:
- Scale achieved up to 2x faster performance compared to HIP
- Tests were run on RunPod.io using AMD’s MI300X GPUs
- Performance wasn’t even the focus yet — initial priority was compatibility!
💡 Once optimization becomes the focus later this year, even bigger speedups are expected.
🧠 Warp Sizes, Inline Assembly, and Real Optimizations
One big technical hurdle: NVIDIA and AMD handle warp sizes differently.
- NVIDIA: Warp size = 32
- AMD: Warp size = 64
Scale smartly maps CUDA’s 32-thread assumptions onto AMD’s 64-thread warps, allowing most code to run without modification.
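To see why this mapping is hard, consider the kind of code Scale has to cope with: a warp-level reduction that hard-codes NVIDIA's 32-thread warp. This sketch is our own illustration of the problem, not code from the talk; the shuffle offsets, the `0xffffffff` lane mask, and the `% 32` arithmetic all silently assume 32 lanes, which breaks on a 64-wide AMD wavefront unless the compiler maps the assumption for you.

```cuda
// A warp sum reduction with a baked-in 32-thread warp assumption —
// idiomatic CUDA, but warp-width-sensitive.
__global__ void warpSum(const float* in, float* out) {
    float v = in[threadIdx.x];
    // Offsets 16..1 and the full 32-bit mask assume warpSize == 32.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    // One result per 32-thread group.
    if (threadIdx.x % 32 == 0)
        out[threadIdx.x / 32] = v;
}
```

According to the talk, Scale resolves assumptions like these at compile time, so most existing kernels run unmodified rather than forcing a rewrite against `warpSize`.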
Even better?
Spectral Compute is making improvements to inline PTX — sometimes producing better machine code for AMD than NVIDIA itself provides for CUDA.
🌎 Open Collaboration: How You Can Get Involved
Spectral Compute is building this technology openly and wants the community involved:
- Help test Scale and report issues
- Contribute to open-source reimplementations of CUDA libraries
- Extend the ROCm ecosystem together
👉 Join their Discord to collaborate, share feedback, or even send “hate mail” if something isn’t working. (Yes, they literally asked for it.)
💬 Michael's message: "We want to target all the hardware — not just some of it."
Scale is a game-changer for AI, HPC, and scientific computing — making high-performance compute more accessible, flexible, and cost-effective than ever before.
📺 Watch the Full Talk 👉 Watch Michael Søndergaard’s Presentation at Beyond CUDA Summit
🚀 Deploy Faster with AMD GPUs 👉 Build, train, and infer at scale using AMD-powered AI infrastructure through TensorWave.
About TensorWave
TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.
Ready to get started? Connect with a Sales Engineer.