What does the VM vs bare metal benchmark show?

It shows that, on the tested single-host 8× RTX 5090 setup, the Compute with Hivenet VM matched bare-metal NCCL AllReduce bandwidth (19.34 GB/s vs 19.25 GB/s) within run-to-run variance.

Does that mean every GPU workload performs exactly like bare metal?

No. Workload performance can depend on CPU behavior, storage, networking, data loading, driver versions, model architecture, precision, batch size, and framework settings.

Why use NCCL AllReduce as a benchmark?

NCCL AllReduce tests multi-GPU communication. It can expose topology, passthrough, and routing problems that simple single-GPU tests may miss.

Are the inference benchmarks public?

Hivenet publishes benchmark material when the methodology, setup, results, limits, and date tested are ready to share.

Can Hivenet benchmark my workload?

Hivenet can help review your workload, define the right test, and compare performance, cost, and fit before you move production work.

How should I compare benchmark claims?

Compare the workload, hardware, region, configuration, model, precision, concurrency, metric, pricing basis, and limits. A number without its setup is not enough.

Performance claims need numbers.

Explore Hivenet benchmark results for GPU virtualization, AI inference, model performance, OCR workloads, and API latency. Each benchmark shows what was tested, where it ran, which hardware was used, and where the result does not apply.

GPU VM vs bare metal

NCCL AllReduce

RTX 5090

RTX 4090

Foundational model inference

OCR throughput

API latency

Methodology-first reporting

GPU VM vs bare metal: matched within run-to-run variance.

Customers ask whether running serious GPU work in a VM slows the workload down. Hivenet tested NCCL AllReduce on a single host with 8× NVIDIA GeForce RTX 5090 GPUs. The Compute with Hivenet VM matched the bare-metal baseline within normal run-to-run variance.

Bare metal

19.25 GB/s

NCCL AllReduce bandwidth on the 8× RTX 5090 bare-metal baseline.

Compute with Hivenet VM

19.34 GB/s

NCCL AllReduce bandwidth on the same benchmark inside a VM.

Result

+0.5%

The VM result was slightly higher, but the delta sits inside normal variance. The useful conclusion is that the VM matched bare-metal bandwidth on this test.
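
The +0.5% figure follows directly from the two published bandwidth numbers. A minimal sketch of that arithmetic is below; the function names and the 1.0% tolerance are illustrative assumptions, not part of Hivenet's methodology, and the real noise band should come from repeated runs.

```python
def relative_delta_pct(baseline: float, candidate: float) -> float:
    """Return the candidate's change relative to the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def within_tolerance(delta_pct: float, tolerance_pct: float) -> bool:
    """True when the absolute delta is no larger than the chosen noise band."""
    return abs(delta_pct) <= tolerance_pct


# Published figures from the benchmark above, in GB/s.
bare_metal_gbps = 19.25
vm_gbps = 19.34

delta = relative_delta_pct(bare_metal_gbps, vm_gbps)
print(f"{delta:+.1f}%")  # +0.5%

# ASSUMPTION: 1.0% is a placeholder noise band, not a measured value.
print(within_tolerance(delta, tolerance_pct=1.0))
```

Keeping the tolerance as an explicit parameter makes the "within variance" judgment visible instead of leaving it implied.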

Benchmark areas we track.

Hivenet groups benchmark work by workload type so teams can compare the results that matter to their own use case.

GPU virtualization

VM vs bare metal

NCCL AllReduce on a single 8× RTX 5090 host, measuring whether the VM introduces a measurable single-host multi-GPU communication penalty.

Developer GPU workloads

RTX 4090 vs A100

Comparison content for 7B–8B model workloads, RAG, development, and cost-conscious inference.

Foundational model inference

Throughput and latency under load

Serving benchmarks using realistic prompt shapes, concurrency sweeps, TTFT, TPOT, inter-token latency, end-to-end latency, and throughput.

Production model performance

Model-specific performance reports

Production-oriented reports for model-specific workloads, including SLO gates, throughput curves, tail latency, and recommended concurrency settings.

OCR workloads

Falcon-OCR quality and throughput

OCR pipeline testing across layout detection, region cropping, and OCR quality/throughput for document workloads.

API performance

API latency benchmarks

Automated endpoint latency measurements across environments, tracking average, P90, P95, and P99 latency against baselines and previous versions.
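
As an illustration of the latency summary described above, here is a small sketch that computes the average, P90, P95, and P99 from a list of samples. It uses the nearest-rank percentile definition, which is an assumption; the page does not say which percentile method Hivenet uses, and the sample values are hypothetical.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Return the average and tail latencies tracked for an endpoint."""
    return {
        "avg": mean(samples_ms),
        "p90": percentile(samples_ms, 90),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


# HYPOTHETICAL latencies in milliseconds, for illustration only.
samples_ms = [float(v) for v in range(1, 101)]  # 1.0 through 100.0
print(summarize(samples_ms))
```

Comparing the same summary across releases is what makes a regression in tail latency visible even when the average barely moves.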

What every benchmark shows.

Benchmarks are useful when the setup is clear enough to repeat, question, or compare. Hivenet benchmark pages make the test conditions visible.

| Benchmark field | What it covers |
| --- | --- |
| Workload | What the benchmark tested |
| Hardware | GPU, CPU, memory, storage, network, host count |
| Environment | VM, bare metal, container, region, driver, CUDA, framework |
| Model or data | Model name, precision, dataset, prompt shape, file size, or workload input |
| Load profile | Concurrency, requests per second, batch size, message size, duration |
| Metrics | Throughput, latency, TTFT, TPOT, bandwidth, error rate, cost basis |
| Comparison baseline | What Hivenet is compared against and why |
| Result | Main result and how to read it |
| Limits | What the benchmark does not prove |
| Date | When the test ran and whether results have changed since |

What the GPU virtualization result proves, and what it does not.

The VM vs bare metal benchmark is useful because NCCL AllReduce exposes problems that simpler single-GPU tests can miss. If GPU passthrough is incomplete, the topology is exposed poorly, or the communication path is inefficient, multi-GPU communication often shows it quickly.
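
When reading AllReduce numbers, it helps to know which bandwidth is being reported. The open-source nccl-tests suite, a common way to run this kind of test, reports both an algorithm bandwidth (payload size divided by time) and a bus bandwidth that applies a factor of 2(n-1)/n for AllReduce across n ranks. The sketch below shows that conversion. It is an assumption that this convention applies here: the page does not say which tool produced the 19.25 and 19.34 GB/s figures or which of the two bandwidths they are, and the message size and timing below are hypothetical.

```python
def algorithm_bandwidth_gbps(message_bytes: int, elapsed_seconds: float) -> float:
    """Payload size divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return message_bytes / elapsed_seconds / 1e9


def allreduce_bus_bandwidth_gbps(algbw_gbps: float, n_ranks: int) -> float:
    """Apply the AllReduce correction factor 2 * (n - 1) / n."""
    if n_ranks < 2:
        raise ValueError("AllReduce needs at least two ranks")
    return algbw_gbps * 2 * (n_ranks - 1) / n_ranks


# HYPOTHETICAL run: a 1 GB message reduced across 8 GPUs in 0.125 s.
algbw = algorithm_bandwidth_gbps(1_000_000_000, 0.125)
busbw = allreduce_bus_bandwidth_gbps(algbw, n_ranks=8)
print(f"algbw={algbw:.2f} GB/s busbw={busbw:.2f} GB/s")
```

Whichever figure a benchmark reports, the VM and bare-metal runs need to use the same one, with the same message size and rank count, for the comparison to hold.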

| This result supports | This result does not prove |
| --- | --- |
| The tested VM did not show a measurable NCCL AllReduce bandwidth penalty | Every workload will perform identically to bare metal |
| Tuned PCIe passthrough can expose the single-host GPU path cleanly | Cross-host distributed training will behave the same |
| Compute with Hivenet can be credible for single-host multi-GPU work | Storage, networking, CPU, data loading, or framework choices will never bottleneck |

Cross-host distributed workloads need their own test. Once a job spans multiple hosts, networking becomes a major factor.

How Hivenet measures inference workloads.

Inference performance depends on the model, precision, prompt size, output length, concurrency, serving engine, and latency target. Hivenet inference benchmarks separate latency floor, throughput ceiling, and production concurrency recommendations.

TTFT

Time to first token. Important for perceived responsiveness and interactive workloads.

TPOT

Time per output token. Useful for generation speed after the first token.

End-to-end latency

Total request time from submission to completed output.

Tokens per second

Throughput measure for comparing serving setups under defined load.

Error rate

Shows whether a setup remains stable as concurrency rises.
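
The metrics above can all be derived from per-request timestamps. The following is a minimal sketch under stated assumptions: `RequestTrace` is a hypothetical structure, times are in seconds, and TPOT is taken as the mean gap between output tokens after the first. Benchmark tools differ on these definitions, so this is not presented as Hivenet's exact method.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timing for one streamed request. Hypothetical structure, times in seconds."""
    submitted_at: float
    token_times: list[float]  # arrival time of each output token, ascending


def ttft(trace: RequestTrace) -> float:
    """Time to first token."""
    return trace.token_times[0] - trace.submitted_at


def tpot(trace: RequestTrace) -> float:
    """Mean time per output token after the first one."""
    gaps = len(trace.token_times) - 1
    if gaps < 1:
        raise ValueError("need at least two output tokens")
    return (trace.token_times[-1] - trace.token_times[0]) / gaps


def end_to_end_latency(trace: RequestTrace) -> float:
    """Total time from submission to the last output token."""
    return trace.token_times[-1] - trace.submitted_at


def tokens_per_second(traces: list[RequestTrace]) -> float:
    """Output tokens across all requests divided by the wall-clock span."""
    start = min(t.submitted_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / (end - start)


def error_rate(failed: int, total: int) -> float:
    """Fraction of requests that failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# HYPOTHETICAL trace: five tokens arriving 0.25 s apart after a 0.5 s wait.
trace = RequestTrace(submitted_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
print(ttft(trace))                # 0.5
print(tpot(trace))                # 0.25
print(end_to_end_latency(trace))  # 1.5
print(tokens_per_second([trace]))
print(error_rate(failed=2, total=100))  # 0.02
```

Running the same calculation at each concurrency level gives the latency floor at low load and the throughput ceiling at high load.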

Performance only matters when the economics hold.

A faster setup is useful only when it also fits the workload, budget, and operating model. Hivenet benchmark pages connect technical results to pricing basis where possible, so teams can compare performance and cost together.

Price basis

Benchmarks should state the instance, endpoint tier, or storage price used for the comparison.

Workload fit

A strong result for one model, batch size, or prompt shape does not automatically apply to another workload.

Runtime behavior

Short tests, long jobs, steady inference, bursty traffic, and batch workflows can produce different cost-performance outcomes.

Platform path

Compare GPU/CPU rental, Inference API, S3 storage, and Private AI based on how much of the stack your team wants to operate.
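
One way to put performance and price on the same axis is cost per million output tokens. The sketch below shows the arithmetic for an hourly-priced instance. The function name and every number in it are hypothetical placeholders, not Hivenet prices or measured throughput.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly instance price and sustained throughput into cost per 1M tokens."""
    if price_per_hour < 0:
        raise ValueError("price_per_hour must not be negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# HYPOTHETICAL setups: (hourly price, sustained tokens per second).
setups = {
    "setup_a": (1.80, 500.0),
    "setup_b": (3.60, 800.0),
}

for name, (price, throughput) in setups.items():
    print(f"{name}: {cost_per_million_tokens(price, throughput):.2f} per 1M tokens")
```

In this made-up example the faster setup costs more per token, which is the kind of trade-off a benchmark needs its pricing basis to reveal. The result only holds for sustained throughput at the stated load, so idle time and bursty traffic change the outcome.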

Enterprise-grade infrastructure needs measurable proof.

Hivenet runs AI, compute, and storage workloads on Policloud-backed infrastructure designed for reliable performance, cost visibility, and regional deployment. Benchmarks show how that infrastructure behaves under specific workloads.

Policloud-backed capacity

Modular infrastructure gives Hivenet a practical way to place capacity closer to energy sources, target regions, and workload demand.

Reliable workload paths

Hivenet connects infrastructure, software, APIs, billing, access patterns, and product workflows into a platform teams can test and operate.

Standard tools

Benchmarks use practical toolchains and metrics so teams can compare results against workflows they understand.

Clear limits

Each benchmark should explain where the result applies and where a workload needs its own test.

Your workload still needs its own test.

A benchmark can show whether Hivenet is a strong fit for a class of workload. It cannot replace testing your own model, data pipeline, latency target, and production constraints.

Training and fine-tuning

Test step time, GPU utilization, communication time, data loading behavior, precision, batch size, sequence length, and optimizer settings.

Inference

Test latency, throughput, concurrency behavior, cold starts, prompt shape, output length, and model quality on real traffic.

Storage-heavy workloads

Test file size, access pattern, egress, object count, listing behavior, throughput, and integration with the tools your team uses.

APIs

Test average latency, tail latency, error rates, rate-limit behavior, request IDs, and performance changes across releases.

Choose the next step.

Compute with Hivenet

Explore cloud compute paths for GPU and CPU workloads, programmable infrastructure, and team-ready workflows.

GPU/CPU rental

Rent RTX 4090, RTX 5090, or vCPU instances for AI, ML, rendering, notebooks, and development workloads.

Hivenet Inference API

Use OpenAI-compatible managed endpoints for foundational models.

AI workloads

Route production inference, RAG, fine-tuning, model hosting, and Private AI workloads to the right Hivenet path.

S3-compatible storage

Store datasets, backups, media, archives, and AI pipeline files with free egress and familiar tools.

FAQ

Common benchmark questions

Bring your workload. Test it properly.

Share the model, data path, target latency, throughput needs, region requirements, and current infrastructure. We'll help you decide what to benchmark and which Hivenet path fits.

