
If you’re running AI jobs, you already know how much your hardware choice shapes what’s possible and what it costs. That’s why we’ve added the NVIDIA RTX 5090 to Compute. More speed, less waiting, and a fair price. Let’s get right to the numbers.
When we launched Compute with 4090s, we solved a big pain point: data-center GPUs like the A100 were either impossible to get or wildly overpriced. The 4090 turned out to be the sweet spot for most LLM inference and AI workloads.
But our users pushed us further. Teams wanted faster inference, better scaling, and an option to go “all in” without the energy burn. When the first batch of 5090s landed, we put them through their paces and opened up a whole new region (UAE-2) so you could get access right away.
We’ve run side-by-side tests using real LLM workloads. Here’s what stands out:

If you’re running small to mid-sized LLMs (roughly up to 13B parameters), the 5090 is now the fastest, most cost-effective option in Compute.
We don’t hide behind benchmarks that nobody can reproduce. Here’s our setup:
You can check the detailed results in our benchmark PDF. If you want a closer look at the test configs or want to run your own comparisons, just ask. We’re happy to walk you through the details.
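And if you’d rather measure than take our word for it, a throughput test is easy to script. Here’s a minimal sketch of the kind of tokens-per-second measurement we’re talking about, using the Hugging Face transformers stack; the model name and generation length are placeholders, not our exact benchmark config.

```python
# Minimal tokens-per-second sketch. Model name and generation length
# are placeholders, not our exact benchmark config.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-13b-chat-hf"  # placeholder: swap in the model you test

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="cuda"
)

inputs = tok(
    "Explain the difference between latency and throughput.",
    return_tensors="pt",
).to("cuda")

# Warm-up pass so one-time CUDA initialization doesn't skew the timing.
model.generate(**inputs, max_new_tokens=16)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Run the same script with the same model and prompts on each card type, and the comparison writes itself.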
With 5090s, anyone running LLMs up to 13B parameters can get data-center performance without a data-center bill or a six-month waitlist. The cards scale linearly, so you can cluster them to tackle heavy workloads, or spin up a single card for quick experiments.
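If you do cluster cards, tensor parallelism is the usual way to split one model across them. As a rough sketch (not our managed setup, and the model name is a placeholder), here’s what that looks like with the open-source vLLM library on a two-GPU instance:

```python
# Rough sketch: shard one model across two cards with tensor parallelism
# via the open-source vLLM library. Model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-chat-hf",  # placeholder ~13B model
    tensor_parallel_size=2,  # split the model across two GPUs
    dtype="float16",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the CAP theorem in two sentences."], params)
print(outputs[0].outputs[0].text)
```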
Not every job needs the biggest hammer. Here’s when the 4090 or A100 might be your better pick:
Still, we think that for most use cases, 4090s, and now 5090s, are a better choice than A100s. Check out our earlier post “Why more developers are choosing RTX 4090 over A100” for more.
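One factor that often settles the question is plain VRAM arithmetic: FP16 weights take roughly two bytes per parameter, so you can check whether a model’s weights even fit on one card before worrying about anything else. Here’s a rule-of-thumb calculator (weights only; it ignores KV cache and activations):

```python
# Back-of-the-envelope VRAM estimate for dense transformer weights only.
# Real usage adds KV cache, activations, and framework overhead on top.
def weight_vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """FP16/BF16 weights take roughly 2 bytes per parameter."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

for size in (7, 13, 70):
    print(f"{size}B model: ~{weight_vram_gb(size):.0f} GB of weights")

# 7B ≈ 13 GB and 13B ≈ 24 GB fit on a single 5090 (32 GB);
# 70B ≈ 130 GB pushes you to multiple cards, even in A100 80 GB territory.
```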
It’s as simple as ever:

You’re up and running in under a minute.
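And if you prefer scripting to clicking, provisioning can be automated over HTTP. The sketch below is purely illustrative: the endpoint URL, payload fields, and environment variable are hypothetical stand-ins to show the shape of such a call, not our actual API.

```python
# Purely illustrative provisioning sketch: the endpoint URL, payload
# fields, and environment variable are hypothetical, not our real API.
import os

import requests

API_URL = "https://api.example.com/v1/instances"  # hypothetical endpoint
TOKEN = os.environ["COMPUTE_API_TOKEN"]  # hypothetical credential

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"gpu": "rtx-5090", "region": "UAE-2", "image": "ubuntu-22.04"},
    timeout=30,
)
resp.raise_for_status()
print("Instance ready:", resp.json().get("id"))
```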
We’re already planning more regions with 5090 capacity and testing multi-GPU templates. If you’ve got feedback or want a feature, let us know. Compute is always evolving with you.