Published: Apr 22, 2025
Scale by Spectral Compute: Run Your CUDA Workloads Faster on Affordable AMD GPUs

When it comes to GPU computing, one name has dominated the landscape for over a decade: NVIDIA, with its CUDA platform. But what if you could break free from vendor lock-in — without rewriting a single line of code?
At the 2025 Beyond CUDA Summit, Spectral Compute made waves by unveiling Scale, a revolutionary new technology that allows developers to run native CUDA applications on AMD GPUs — faster, cheaper, and without compromise.
By compiling directly to AMD hardware, Scale unlocks a future where AI builders, HPC engineers, and researchers can double their GPU options, cut costs, and accelerate performance — all while staying inside the familiar CUDA ecosystem they've relied on for years.
Here’s what you need to know about this groundbreaking announcement — and why it’s set to change the future of GPU computing.
🔧 The Problem: CUDA Lock-In and Limited Hardware Options
Michael Søndergaard of Spectral Compute opened by addressing the elephant in the room: CUDA is amazing, but it’s also created a hardware monopoly.
Today, if you’re building in CUDA, you’re locked into NVIDIA GPUs — or forced to juggle multiple codebases to support alternatives like AMD.
This bottleneck has kept developers from fully tapping into new, faster, and more affordable hardware options.
Scale changes all that.
⚡ Introducing Scale: Native CUDA on AMD Without Emulation
Unlike translation layers or emulation systems, Scale compiles CUDA directly to AMD’s architecture — meaning:
- Zero emulation overhead
- Direct performance on MI300X, MI325X and other AMD GPUs
- Stay inside the CUDA ecosystem you already know
Instead of rewriting thousands of lines of code, developers can recompile and run — just like switching between x86 and ARM on CPUs.
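To make the "recompile and run" idea concrete, here is a minimal sketch: an ordinary, unmodified CUDA source file of the kind Scale is meant to compile directly for AMD hardware. The kernel and host code below are standard CUDA written for this article, not taken from Spectral Compute's materials, and the exact Scale compiler invocation is an assumption — check Scale's own documentation for the real workflow.

```cuda
// vector_add.cu — plain CUDA, unchanged for either vendor.
// On NVIDIA:  nvcc vector_add.cu -o vector_add
// With Scale: the same file is recompiled via Scale's nvcc-compatible
//             toolchain to target AMD GPUs (invocation is an assumption).
#include <cstdio>

__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vectorAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The point is that nothing in the source names a vendor — the switch happens entirely at compile time, exactly like rebuilding the same C program for x86 or ARM.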
🛠️ Full Compatibility Goals: CUDA Math, Runtime, and Driver APIs
Spectral Compute isn’t just stopping at basic functionality.
Their roadmap includes:
- ~90% support for CUDA C Core Math APIs
- ~70% support for CUDA Runtime and Driver APIs
- Full NVCC C++ semantic support for seamless compilation
Even inline PTX assembly is supported — and in some cases, runs faster on AMD than NVIDIA hardware itself.
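For readers unfamiliar with what "inline PTX" means in practice, here is a hedged illustration: a small kernel embedding a PTX instruction via `asm`, the kind of construct that normally ties code to NVIDIA hardware. This example was written for this article, not drawn from Spectral Compute's test suite; the claim from the talk is that Scale recognizes such blocks and compiles them to AMD machine code rather than rejecting them.

```cuda
// Illustrative only: inline PTX in a CUDA kernel.
// popc.b32 is the PTX population-count instruction — a common idiom
// that a plain translation layer cannot handle, since PTX is
// NVIDIA's own intermediate representation.
__global__ void popcountKernel(const unsigned int* in,
                               unsigned int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int r;
        asm("popc.b32 %0, %1;" : "=r"(r) : "r"(in[i]));
        out[i] = r;
    }
}
```

Handling code like this at the compiler level, instead of at runtime, is what lets Scale claim zero emulation overhead even for hand-tuned kernels.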
📊 Performance Gains: Scale vs HIP on AMD MI300X
Michael shared fresh benchmarks using the Rodinia Benchmark Suite:
- Scale achieved up to 2x faster performance compared to HIP
- Tests were run on RunPod.io using AMD’s MI300X GPUs
- Performance wasn’t even the focus yet — initial priority was compatibility!
💡 Once optimization becomes the focus later this year, even bigger speedups are expected.
🧠 Warp Sizes, Inline Assembly, and Real Optimizations
One big technical hurdle: NVIDIA and AMD handle warp sizes differently.
- NVIDIA: Warp size = 32
- AMD: Warp size = 64
Scale smartly maps CUDA’s 32-thread assumptions onto AMD’s 64-thread warps, allowing most code to run without modification.
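To see why this mapping is hard, consider the kind of code Scale has to cope with: a warp-level reduction that hard-codes NVIDIA's 32-thread warp. This sketch is our own illustration of the problem, not code from the talk; the shuffle offsets, the `0xffffffff` lane mask, and the `% 32` arithmetic all silently assume 32 lanes, which breaks on a 64-wide AMD wavefront unless the compiler maps the assumption for you.

```cuda
// A warp sum reduction with a baked-in 32-thread warp assumption —
// idiomatic CUDA, but warp-width-sensitive.
__global__ void warpSum(const float* in, float* out) {
    float v = in[threadIdx.x];
    // Offsets 16..1 and the full 32-bit mask assume warpSize == 32.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    // One result per 32-thread group.
    if (threadIdx.x % 32 == 0)
        out[threadIdx.x / 32] = v;
}
```

According to the talk, Scale resolves assumptions like these at compile time, so most existing kernels run unmodified rather than forcing a rewrite against `warpSize`.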
Even better?
Spectral Compute is making improvements to inline PTX — sometimes producing better machine code for AMD than NVIDIA itself provides for CUDA.
🌎 Open Collaboration: How You Can Get Involved
Spectral Compute is building this technology openly and wants the community involved:
- Help test Scale and report issues
- Contribute to open-source reimplementations of CUDA libraries
- Extend the ROCm ecosystem together
👉 Join their Discord to collaborate, share feedback, or even send “hate mail” if something isn’t working. (Yes, they literally asked for it.)
💬 Michael's message: "We want to target all the hardware — not just some of it."
Scale is a game-changer for AI, HPC, and scientific computing — making high-performance compute more accessible, flexible, and cost-effective than ever before.
📺 Watch the Full Talk 👉 Watch Michael Søndergaard’s Presentation at Beyond CUDA Summit
🚀 Deploy Faster with AMD GPUs 👉 Build, train, and infer at scale using AMD-powered AI infrastructure through TensorWave.
About TensorWave
TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.
Ready to get started? Connect with a Sales Engineer.