Cloud GPUs
A100 performance for the price of a 4090. Per-second billing. No queues. No egress fees.


Single-GPU benchmark on Hivenet infrastructure:
Time to first token (TTFT): 349.9 ms at 1 req/s (single-request baseline).
Peak throughput: 737 tokens/s under sustained load.¹
¹ Benchmark methodology and test conditions here.
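To reproduce figures like these against your own endpoint, a minimal sketch along the following lines may help. It times any iterable of streamed tokens; the `measure_stream` and `fake_stream` names are hypothetical, and the fake stream only stands in for a real streaming client, which is not specified here.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_ms, tokens_per_s, n_tokens) for one streamed response.

    TTFT is the delay between starting to read the stream and receiving
    the first token. Throughput is tokens divided by total elapsed time.
    """
    start = clock()
    first_token_at = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_at is None:
            first_token_at = now
        count += 1
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    ttft_ms = (first_token_at - start) * 1000.0
    elapsed = end - start
    tokens_per_s = count / elapsed if elapsed > 0 else float("inf")
    return ttft_ms, tokens_per_s, count


def fake_stream(n: int, first_delay_s: float = 0.05, gap_s: float = 0.001) -> Iterator[str]:
    """Hypothetical stand-in for a streaming client: one slow first token, then fast ones."""
    time.sleep(first_delay_s)
    for i in range(n):
        if i:
            time.sleep(gap_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft_ms, tps, n = measure_stream(fake_stream(20))
    print(f"TTFT {ttft_ms:.1f} ms, {tps:.0f} tokens/s over {n} tokens")
```

Passing the clock as a parameter keeps the measurement logic separate from wall-clock time, so it can be checked with a deterministic clock before pointing it at a real stream.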
Spec | Value | Why it matters
Architecture | Ada Lovelace | TSMC 4N process, efficient under sustained heavy load
Memory | 24 GB GDDR6X | Fits Llama-3 8B in FP16 (about 16 GB of weights) on a single card; a 4-bit 70B model needs about 35 GB of weights and does not fit
Bandwidth | 1,008 GB/s | Reduces memory-bound stalls on large-batch inference
FP16 throughput | 165 TFLOPS | Headroom for diffusion models at 1024×1024
TDP | 450 W | Lower than an A100 40 GB at equivalent inference throughput
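A quick weights-only estimate helps check which models fit in 24 GB. The sketch below multiplies parameter count by bits per parameter; the `overhead_gb` allowance for activations, KV cache, and runtime buffers is an assumed placeholder, not a measured value, and real footprints vary by framework and quantization format.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_param / 8


def fits(
    params_billion: float,
    bits_per_param: float,
    vram_gb: float = 24.0,
    overhead_gb: float = 2.0,  # assumed allowance, tune for your workload
) -> bool:
    """True if the weights plus the assumed overhead fit in VRAM."""
    return weight_gb(params_billion, bits_per_param) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for name, params, bits in [
        ("8B at FP16", 8, 16),
        ("8B at 4-bit", 8, 4),
        ("70B at 4-bit", 70, 4),
    ]:
        print(f"{name}: {weight_gb(params, bits):.1f} GB, fits in 24 GB: {fits(params, bits)}")
```

Under these assumptions an 8B model fits at FP16 (16 GB of weights), while a 70B model at 4 bits needs 35 GB for weights alone and would have to be split across cards.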
Start a QLoRA run in under 60 seconds. Pause and resume anytime — no charge for idle time.
24 GB of VRAM leaves room for a 14 GB KV cache at full precision. Most diffusion models run at 1024×1024 without quantization.
Inference stays in your account. No third-party API logs.
1,008 GB/s of memory bandwidth handles 4K frames without memory stalls.
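To get a feel for what a 14 GB KV cache holds, the sketch below estimates the cache cost per token and the number of tokens that fit. It assumes FP16 values (2 bytes each) and uses the published Llama-3 8B shape (32 layers, 8 KV heads, head dimension 128) purely as an illustration; substitute your own model's configuration.

```python
def kv_bytes_per_token(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # assumes FP16 cache entries
) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_cached_tokens(budget_gb: float, per_token_bytes: int) -> int:
    """Tokens that fit in the budget (1 GB = 1e9 bytes), summed across all sequences."""
    return int(budget_gb * 1e9 // per_token_bytes)


if __name__ == "__main__":
    # Illustrative shape: Llama-3 8B with grouped-query attention.
    per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
    tokens = max_cached_tokens(14, per_token)
    print(f"{per_token} bytes per token, about {tokens} tokens in 14 GB")
```

The token budget is shared across every concurrent sequence, so longer contexts mean fewer requests in flight at once.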
Configurations from 1× to 8× GPUs. vCPU, RAM (GB), disk space (GB), and bandwidth (Mb/s) depend on the configuration selected.
Per-second billing. No egress fees. Storage included.
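As a rough illustration of what per-second billing means for short jobs, the sketch below compares it with billing rounded up to whole hours. The hourly rate used is a hypothetical placeholder, not a Hivenet price, and the function names are illustrative.

```python
import math


def per_second_cost(seconds: float, hourly_rate: float) -> float:
    """Cost when every second is billed at hourly_rate / 3600."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return seconds * hourly_rate / 3600


def hourly_rounded_cost(seconds: float, hourly_rate: float) -> float:
    """Cost when usage is rounded up to whole hours, for comparison."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return math.ceil(seconds / 3600) * hourly_rate


if __name__ == "__main__":
    rate = 0.60  # hypothetical hourly rate, currency unspecified
    job_seconds = 25 * 60  # a 25-minute job
    print(f"per-second: {per_second_cost(job_seconds, rate):.2f}")
    print(f"hourly-rounded: {hourly_rounded_cost(job_seconds, rate):.2f}")
```

The gap between the two is largest for jobs that run well under an hour or just past an hour boundary.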
Reach us at support@hivenet.com or through the in-app chat.