Performance claims need numbers.

Explore Hivenet benchmark results for GPU virtualization, AI inference, model performance, OCR workloads, and API latency. Each benchmark shows what was tested, where it ran, which hardware was used, and where the result does not apply.

GPU VM vs bare metal

NCCL AllReduce

RTX 5090

RTX 4090

Foundational model inference

OCR throughput

API latency

Methodology-first reporting

GPU VM vs bare metal: matched within run-to-run variance.

Customers ask whether running serious GPU work in a VM slows the workload down. Hivenet tested NCCL AllReduce on a single host with 8× NVIDIA GeForce RTX 5090 GPUs. The Compute with Hivenet VM matched the bare-metal baseline within normal run-to-run variance.

Bare metal

19.25 GB/s

NCCL AllReduce bandwidth on the 8× RTX 5090 bare-metal baseline.

Compute with Hivenet VM

19.34 GB/s

NCCL AllReduce bandwidth on the same benchmark inside a VM.

Result

+0.5%

The VM result was slightly higher, but the delta sits inside normal variance. The useful conclusion is that the VM matched bare-metal bandwidth on this test.
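The +0.5% figure follows directly from the two reported bandwidths. A minimal sketch of the arithmetic, assuming the delta is computed relative to the bare-metal baseline (the function and variable names are illustrative, not part of any Hivenet tooling):

```python
def relative_delta_percent(baseline: float, candidate: float) -> float:
    """Return the candidate's change relative to the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) * 100.0 / baseline


# Bandwidths reported above, in GB/s.
bare_metal_gb_s = 19.25
vm_gb_s = 19.34

delta = relative_delta_percent(bare_metal_gb_s, vm_gb_s)
print(f"VM vs bare metal: {delta:+.1f}%")  # prints "VM vs bare metal: +0.5%"
```

Whether a delta of this size counts as noise depends on the spread across repeated runs of the same configuration, which has to be measured rather than assumed.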

Benchmark areas we track.

Hivenet groups benchmark work by workload type so teams can compare the results that matter to their own use case.

GPU virtualization

VM vs bare metal

NCCL AllReduce on a single 8× RTX 5090 host, measuring whether the VM introduces a measurable single-host multi-GPU communication penalty.

Developer GPU workloads

RTX 4090 vs A100

Comparisons for 7B–8B model workloads, RAG, development, and cost-conscious inference.

Foundational model inference

Throughput and latency under load

Serving benchmarks using realistic prompt shapes, concurrency sweeps, TTFT, TPOT, inter-token latency, end-to-end latency, and throughput.

Production model performance

Model-specific performance reports

Production-oriented reports for model-specific workloads, including SLO gates, throughput curves, tail latency, and recommended concurrency settings.

OCR workloads

Falcon-OCR quality and throughput

OCR pipeline testing across layout detection, region cropping, and OCR quality/throughput for document workloads.

API performance

API latency benchmarks

Automated endpoint latency measurements across environments, tracking average, P90, P95, and P99 latency against baselines and previous versions.
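The latency summary named in the API performance area (average, P90, P95, P99) can be sketched as follows. This is an illustration under stated assumptions: it uses the nearest-rank percentile definition, which may differ from the method Hivenet's tooling uses, and the sample data is synthetic.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize a run with the four statistics tracked per endpoint."""
    return {
        "avg": mean(latencies_ms),
        "p90": percentile(latencies_ms, 90),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Synthetic samples (1 ms .. 100 ms), not measured data.
latencies_ms = [float(i) for i in range(1, 101)]
summary = summarize(latencies_ms)
print(summary)
```

Tail percentiles need enough samples to be meaningful: with fewer than 100 requests, P99 under this definition is simply the slowest request.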

What every benchmark shows.

Benchmarks are useful when the setup is clear enough to repeat, question, or compare. Hivenet benchmark pages make the test conditions visible.

| Benchmark field | What it covers |
| --- | --- |
| Workload | What the benchmark tested |
| Hardware | GPU, CPU, memory, storage, network, host count |
| Environment | VM, bare metal, container, region, driver, CUDA, framework |
| Model or data | Model name, precision, dataset, prompt shape, file size, or workload input |
| Load profile | Concurrency, requests per second, batch size, message size, duration |
| Metrics | Throughput, latency, TTFT, TPOT, bandwidth, error rate, cost basis |
| Comparison baseline | What Hivenet is compared against and why |
| Result | Main result and how to read it |
| Limits | What the benchmark does not prove |
| Date | When the test ran and whether results have changed since |
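The fields above map naturally onto a structured record. The sketch below is one hypothetical way to model them; the class and attribute names are assumptions for illustration, not a published Hivenet schema. It is filled in with the GPU virtualization result from this page, and fields the page does not state are left empty rather than guessed.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BenchmarkRecord:
    """Hypothetical record with one attribute per benchmark field."""
    workload: Optional[str] = None
    hardware: Optional[str] = None
    environment: Optional[str] = None
    model_or_data: Optional[str] = None
    load_profile: Optional[str] = None
    metrics: dict[str, float] = field(default_factory=dict)
    baseline: Optional[str] = None
    result: Optional[str] = None
    limits: list[str] = field(default_factory=list)
    date: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


record = BenchmarkRecord(
    workload="NCCL AllReduce",
    hardware="Single host, 8x NVIDIA GeForce RTX 5090",
    environment="Compute with Hivenet VM",
    metrics={"bandwidth_gb_s": 19.34},
    baseline="Bare metal on the same benchmark (19.25 GB/s)",
    result="+0.5% vs bare metal, within run-to-run variance",
    limits=["Does not cover cross-host distributed training"],
)
print(record.missing_fields())  # ['model_or_data', 'load_profile', 'date']
```

A check like `missing_fields()` makes gaps visible before a result is published.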

What the GPU virtualization result proves, and what it does not.

The VM vs bare metal benchmark is useful because NCCL AllReduce exposes problems that softer single-GPU tests can miss. If GPU passthrough is incomplete, topology is exposed poorly, or the communication path is inefficient, multi-GPU communication often shows it quickly.
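For readers unfamiliar with how AllReduce bandwidth is usually expressed, the sketch below follows the definitions used by the open-source nccl-tests suite: algorithm bandwidth is message size divided by time, and bus bandwidth scales it by 2(n-1)/n for n ranks. This page does not state which of the two figures the reported GB/s values are, and the message size and timing in the example are made up for illustration.

```python
def algorithm_bandwidth_gb_s(message_bytes: int, seconds: float) -> float:
    """Algorithm bandwidth: bytes reduced per second, in GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return message_bytes / seconds / 1e9


def allreduce_bus_bandwidth_gb_s(algbw_gb_s: float, n_ranks: int) -> float:
    """Bus bandwidth for AllReduce: algbw * 2 * (n - 1) / n."""
    if n_ranks < 2:
        raise ValueError("AllReduce bus bandwidth needs at least 2 ranks")
    return algbw_gb_s * 2 * (n_ranks - 1) / n_ranks


# Hypothetical timing: a 1 GB message reduced across 8 GPUs in 0.1 s.
algbw = algorithm_bandwidth_gb_s(1_000_000_000, 0.1)
busbw = allreduce_bus_bandwidth_gb_s(algbw, n_ranks=8)
print(f"algbw={algbw:.2f} GB/s, busbw={busbw:.2f} GB/s")
```

Because the correction factor depends only on the number of ranks, a VM and a bare-metal run on the same 8-GPU host can be compared on either figure, as long as both sides use the same one.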

| This result supports | This result does not prove |
| --- | --- |
| The tested VM did not show a measurable NCCL AllReduce bandwidth penalty | Every workload will perform identically to bare metal |
| Tuned PCIe passthrough can expose the single-host GPU path cleanly | Cross-host distributed training will behave the same |
| Compute with Hivenet can be credible for single-host multi-GPU work | Storage, networking, CPU, data loading, or framework choices will never bottleneck |

Cross-host distributed workloads need their own test. Once a job spans multiple hosts, networking becomes a major factor.

Read the full benchmark

How Hivenet measures inference workloads.

Inference performance depends on the model, precision, prompt size, output length, concurrency, serving engine, and latency target. Hivenet inference benchmarks separate latency floor, throughput ceiling, and production concurrency recommendations.
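One way to separate those three outputs from a single concurrency sweep is sketched below. The sweep points, SLO thresholds, and all names are hypothetical; the selection rule (highest throughput among points that meet the latency and error targets) is an assumption about what a "production concurrency recommendation" means, not Hivenet's documented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SweepPoint:
    concurrency: int
    tokens_per_second: float
    p95_latency_ms: float
    error_rate: float


def latency_floor(points: list[SweepPoint]) -> float:
    """Best P95 latency seen anywhere in the sweep."""
    return min(p.p95_latency_ms for p in points)


def throughput_ceiling(points: list[SweepPoint]) -> float:
    """Highest throughput seen anywhere in the sweep, regardless of latency."""
    return max(p.tokens_per_second for p in points)


def recommend(points: list[SweepPoint], max_p95_ms: float,
              max_error_rate: float) -> Optional[SweepPoint]:
    """Highest-throughput point that still meets the SLO, or None if none does."""
    passing = [p for p in points
               if p.p95_latency_ms <= max_p95_ms and p.error_rate <= max_error_rate]
    return max(passing, key=lambda p: p.tokens_per_second, default=None)


# Hypothetical sweep results, not measured data.
sweep = [
    SweepPoint(1, 50.0, 400.0, 0.0),
    SweepPoint(8, 320.0, 650.0, 0.0),
    SweepPoint(32, 900.0, 1400.0, 0.0),
    SweepPoint(64, 1100.0, 3200.0, 0.02),
]
choice = recommend(sweep, max_p95_ms=1500.0, max_error_rate=0.01)
print(latency_floor(sweep), throughput_ceiling(sweep), choice)
```

In this made-up sweep the throughput ceiling sits at concurrency 64, but that point breaks both the latency and the error target, so the recommendation lands at concurrency 32.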

TTFT

Time to first token. Important for perceived responsiveness and interactive workloads.

TPOT

Time per output token. Useful for generation speed after the first token.

End-to-end latency

Total request time from submission to completed output.

Tokens per second

Throughput measure for comparing serving setups under defined load.

Error rate

Shows whether a setup remains stable as concurrency rises.
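The per-request metrics above can all be derived from one submission timestamp and the arrival time of each output token. The sketch below assumes a common definition of TPOT, (end-to-end latency minus TTFT) divided by (output tokens minus one); benchmarking tools differ on details, and the timestamps are invented.

```python
def request_metrics(submitted_at: float, token_times: list[float]) -> dict[str, float]:
    """Derive TTFT, TPOT, end-to-end latency and output tokens/s for one request.

    submitted_at and token_times are timestamps in seconds on the same clock;
    token_times holds the arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("request produced no tokens")
    ttft = token_times[0] - submitted_at
    e2e = token_times[-1] - submitted_at
    n_tokens = len(token_times)
    # TPOT is undefined for a single-token response; report 0.0 in that case.
    tpot = (e2e - ttft) / (n_tokens - 1) if n_tokens > 1 else 0.0
    return {
        "ttft_s": ttft,
        "tpot_s": tpot,
        "e2e_s": e2e,
        "output_tokens_per_s": n_tokens / e2e,
    }


# Hypothetical request: submitted at t=0, five tokens arriving 0.25 s apart.
metrics = request_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(metrics)
```

Error rate is an aggregate rather than a per-request value: failed requests divided by total requests at a given concurrency level.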

Performance only matters when the economics hold.

A faster setup is useful when it also fits the workload, budget, and operating model. Hivenet benchmark pages connect technical results to pricing basis where possible, so teams can compare performance and cost together.

Price basis

Benchmarks should state the instance, endpoint tier, or storage price used for the comparison.

Workload fit

A strong result for one model, batch size, or prompt shape does not automatically apply to another workload.

Runtime behavior

Short tests, long jobs, steady inference, bursty traffic, and batch workflows can produce different cost-performance outcomes.

Platform path

Compare GPU/CPU rental, Inference API, S3 storage, and Private AI based on how much of the stack your team wants to operate.
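To connect a throughput result to a price basis, a common back-of-the-envelope conversion is cost per million output tokens. The sketch below assumes a flat hourly instance price and sustained throughput for the whole hour; the price and throughput are placeholders, not Hivenet prices or measured results.

```python
def cost_per_million_tokens(hourly_price: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a sustained throughput."""
    if hourly_price < 0:
        raise ValueError("hourly_price must not be negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price / tokens_per_hour * 1_000_000


# Placeholder inputs: 1.00 per hour, 500 tokens/s sustained.
cost = cost_per_million_tokens(hourly_price=1.00, tokens_per_second=500.0)
print(f"{cost:.2f} per million tokens")  # prints "0.56 per million tokens"
```

The assumption of full utilization matters: bursty traffic or idle time raises the effective cost, which is why runtime behavior is listed alongside price basis.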

Enterprise-grade infrastructure needs measurable proof.

Hivenet runs AI, compute, and storage workloads on Policloud-backed infrastructure designed for reliable performance, cost visibility, and regional deployment. Benchmarks show how that infrastructure behaves under specific workloads.

Policloud-backed capacity

Modular infrastructure gives Hivenet a practical way to place capacity closer to energy, region, and workload demand.

Reliable workload paths

Hivenet connects infrastructure, software, APIs, billing, access patterns, and product workflows into a platform teams can test and operate.

Standard tools

Benchmarks use practical toolchains and metrics so teams can compare results against workflows they understand.

Clear limits

Each benchmark should explain where the result applies and where a workload needs its own test.

See how Hivenet works

Your workload still needs its own test.

A benchmark can show whether Hivenet is a strong fit for a class of workload. It cannot replace testing your own model, data pipeline, latency target, and production constraints.

Training and fine-tuning

Test step time, GPU utilization, communication time, data loading behavior, precision, batch size, sequence length, and optimizer settings.

Inference

Test latency, throughput, concurrency behavior, cold starts, prompt shape, output length, and model quality on real traffic.

Storage-heavy workloads

Test file size, access pattern, egress, object count, listing behavior, throughput, and integration with the tools your team uses.

APIs

Test average latency, tail latency, error rates, rate-limit behavior, request IDs, and performance changes across releases.
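Tracking performance changes across releases usually comes down to comparing a new run against a stored baseline with an agreed tolerance. A minimal sketch, assuming latency metrics where higher is worse; the metric names, values, and the 5% tolerance are hypothetical.

```python
def latency_regressions(baseline: dict[str, float], current: dict[str, float],
                        tolerance_pct: float) -> dict[str, float]:
    """Return metrics that grew by more than tolerance_pct, mapped to their growth in percent."""
    flagged = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue  # nothing comparable for this metric
        growth_pct = (current[name] - base) * 100 / base
        if growth_pct > tolerance_pct:
            flagged[name] = growth_pct
    return flagged


# Hypothetical latency summaries in milliseconds for two releases.
previous_release = {"avg": 100.0, "p95": 200.0, "p99": 400.0}
new_release = {"avg": 102.0, "p95": 230.0, "p99": 404.0}

flagged = latency_regressions(previous_release, new_release, tolerance_pct=5.0)
print(flagged)  # {'p95': 15.0}
```

Here the average moved by only 2%, while P95 grew by 15%, which is the kind of tail regression an average-only check would miss.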

Choose the next step.

Compute with Hivenet

Explore cloud compute paths for GPU and CPU workloads, programmable infrastructure, and team-ready workflows.

GPU/CPU rental

Rent RTX 4090, RTX 5090, or vCPU instances for AI, ML, rendering, notebooks, and development workloads.

Hivenet Inference API

Use OpenAI-compatible managed endpoints for foundational models.

AI workloads

Route production inference, RAG, fine-tuning, model hosting, and Private AI workloads to the right Hivenet path.

S3-compatible storage

Store datasets, backups, media, archives, and AI pipeline files with free egress and familiar tools.

Bring your workload. Test it properly.

Share the model, data path, target latency, throughput needs, region requirements, and current infrastructure. We'll help you decide what to benchmark and which Hivenet path fits.
