Explore Hivenet benchmark results for GPU virtualization, AI inference, model performance, OCR workloads, and API latency. Each benchmark shows what was tested, where it ran, which hardware was used, and where the result does not apply.
GPU VM vs bare metal
NCCL AllReduce
RTX 5090
RTX 4090
Foundational model inference
OCR throughput
API latency
Methodology-first reporting
Customers ask whether running serious GPU work in a VM slows the workload down. Hivenet tested NCCL AllReduce on a single host with 8× NVIDIA GeForce RTX 5090 GPUs. The Compute with Hivenet VM matched the bare-metal baseline within normal run-to-run variance.
NCCL AllReduce bandwidth on the 8× RTX 5090 bare-metal baseline.
NCCL AllReduce bandwidth on the same benchmark inside a VM.
The VM result was slightly higher, but the delta sits inside normal variance. The useful conclusion is that the VM matched bare-metal bandwidth on this test.
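One way to make "inside normal variance" checkable is to compare the gap between the two means against the spread of repeated runs. The sketch below is illustrative only: the function name `within_run_variance`, the threshold `k`, and the sample readings are assumptions, not Hivenet's published method or measured data.

```python
from statistics import mean, stdev

def within_run_variance(baseline, candidate, k=2.0):
    """Compare two lists of per-run bandwidth readings.

    Returns (delta_pct, within). 'within' is True when the gap between the
    means is no larger than k times the larger sample standard deviation.
    The k=2.0 default is an assumed threshold, not a published one.
    """
    b_mean, c_mean = mean(baseline), mean(candidate)
    noise = max(stdev(baseline), stdev(candidate))
    delta = c_mean - b_mean
    delta_pct = 100.0 * delta / b_mean
    return delta_pct, abs(delta) <= k * noise

# Placeholder readings in GB/s, invented for illustration only.
bare_metal_runs = [100.0, 101.0, 99.0, 100.5, 99.5]
vm_runs = [100.4, 101.2, 99.6, 100.8, 100.0]

delta_pct, within = within_run_variance(bare_metal_runs, vm_runs)
print(f"delta = {delta_pct:+.2f}%, within run-to-run variance: {within}")
```

A check like this needs several runs per environment; a single run on each side cannot separate a real penalty from noise.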
Hivenet groups benchmark work by workload type so teams can compare the results that matter to their own use case.
NCCL AllReduce on a single 8× RTX 5090 host, measuring whether the VM introduces a measurable single-host multi-GPU communication penalty.
Comparison content for 7B–8B model workloads, RAG, development, and cost-conscious inference.
Serving benchmarks using realistic prompt shapes, concurrency sweeps, TTFT, TPOT, inter-token latency, end-to-end latency, and throughput.
Production-oriented reports for model-specific workloads, including SLO gates, throughput curves, tail latency, and recommended concurrency settings.
OCR pipeline testing across layout detection, region cropping, and OCR quality/throughput for document workloads.
Automated endpoint latency measurements across environments, tracking average, P90, P95, and P99 latency against baselines and previous versions.
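The latency tracking described above can be sketched as a percentile summary plus a comparison against a stored baseline. This is a minimal example under stated assumptions: the nearest-rank percentile method, the 10% tolerance, and the names `summarize` and `regressions` are all hypothetical choices, not a description of Hivenet's tooling.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(samples_ms):
    """Average and tail latencies for one batch of measurements, in milliseconds."""
    return {
        "avg": sum(samples_ms) / len(samples_ms),
        "p90": percentile(samples_ms, 90),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }

def regressions(current, baseline, tolerance=0.10):
    """Names of metrics where current exceeds baseline by more than the tolerance fraction."""
    return [name for name, value in current.items()
            if value > baseline[name] * (1.0 + tolerance)]

# Synthetic samples (1 ms to 100 ms) and a made-up baseline, for illustration only.
current = summarize(list(range(1, 101)))
baseline = {"avg": 50.0, "p90": 90, "p95": 95, "p99": 80}
print(current, regressions(current, baseline))
```

Tracking the tail percentiles separately matters because an average can stay flat while P99 degrades.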
Benchmarks are useful when the setup is clear enough to repeat, question, or compare. Hivenet benchmark pages make the test conditions visible.
Benchmark field: What it covers
Workload: What the benchmark tested
Hardware: GPU, CPU, memory, storage, network, host count
Environment: VM, bare metal, container, region, driver, CUDA, framework
Model or data: Model name, precision, dataset, prompt shape, file size, or workload input
Load profile: Concurrency, requests per second, batch size, message size, duration
Metrics: Throughput, latency, TTFT, TPOT, bandwidth, error rate, cost basis
Comparison baseline: What Hivenet is compared against and why
Result: Main result and how to read it
Limits: What the benchmark does not prove
Date: When the test ran and whether results have changed since
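The fields above can be captured as a small record so that an incomplete report is easy to spot. The sketch below is hypothetical: the class name `BenchmarkReport`, the helper `missing_fields`, and the free-text field types are assumptions for illustration, and the example values only restate what this page says about the VM vs bare metal test.

```python
from dataclasses import dataclass, fields

@dataclass
class BenchmarkReport:
    """One benchmark write-up, with one attribute per field in the list above."""
    workload: str
    hardware: str
    environment: str
    model_or_data: str
    load_profile: str
    metrics: str
    comparison_baseline: str
    result: str
    limits: str
    date: str

def missing_fields(report):
    """Names of fields left empty, so gaps in a report are visible before publishing."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]

# Fields this page does not state are left empty rather than guessed.
report = BenchmarkReport(
    workload="NCCL AllReduce",
    hardware="Single host, 8x NVIDIA GeForce RTX 5090",
    environment="VM",
    model_or_data="",
    load_profile="",
    metrics="Bandwidth",
    comparison_baseline="Bare-metal baseline on 8x RTX 5090",
    result="VM matched bare-metal bandwidth within run-to-run variance",
    limits="Does not cover cross-host distributed training",
    date="",
)
print(missing_fields(report))
```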
The VM vs bare metal benchmark is useful because NCCL AllReduce exposes problems that softer single-GPU tests can miss. If GPU passthrough is incomplete, topology is exposed poorly, or the communication path is inefficient, multi-GPU communication often shows it quickly.
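When reading AllReduce numbers, it helps to know how bandwidth is usually derived. The sketch below follows the convention documented for the nccl-tests suite, where bus bandwidth for AllReduce is algorithm bandwidth scaled by 2(n-1)/n for n ranks. Whether the figures on this page are algorithm or bus bandwidth is not stated, so treat this as background; the function name and the example timing are assumptions.

```python
def allreduce_bandwidth(message_bytes, seconds, num_ranks):
    """Return (algbw, busbw) in GB/s for one AllReduce operation.

    algbw is message size divided by elapsed time. busbw applies the
    2 * (n - 1) / n factor used for AllReduce in the nccl-tests convention,
    which makes results comparable across rank counts.
    """
    if num_ranks < 2:
        raise ValueError("AllReduce needs at least two ranks")
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    algbw = message_bytes / seconds / 1e9
    busbw = algbw * 2 * (num_ranks - 1) / num_ranks
    return algbw, busbw

# Hypothetical timing for illustration: 1 GB reduced across 8 GPUs in 50 ms.
algbw, busbw = allreduce_bandwidth(1_000_000_000, 0.05, 8)
print(f"algbw = {algbw:.1f} GB/s, busbw = {busbw:.1f} GB/s")
```

With 8 ranks the factor is 1.75, so bus bandwidth is always higher than algorithm bandwidth for the same run; comparisons only make sense when both sides use the same measure.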
This result supports:
The tested VM did not show a measurable NCCL AllReduce bandwidth penalty
Tuned PCIe passthrough can expose the single-host GPU path cleanly
Compute with Hivenet can be credible for single-host multi-GPU work
This result does not prove:
Every workload will perform identically to bare metal
Cross-host distributed training will behave the same
Storage, networking, CPU, data loading, or framework choices will never bottleneck
Cross-host distributed workloads need their own test. Once a job spans multiple hosts, networking becomes a major factor.
Inference performance depends on the model, precision, prompt size, output length, concurrency, serving engine, and latency target. Hivenet inference benchmarks separate latency floor, throughput ceiling, and production concurrency recommendations.
TTFT: Time to first token. Important for perceived responsiveness and interactive workloads.
TPOT: Time per output token. Useful for generation speed after the first token.
End-to-end latency: Total request time from submission to completed output.
Throughput measure for comparing serving setups under defined load.
Shows whether a setup remains stable as concurrency rises.
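The metrics above can all be derived from one request's timestamps. The sketch below assumes you have the submit time and the arrival time of each streamed output token. It uses one common definition of TPOT (time after the first token divided by the remaining tokens); the function name `request_metrics` and the sample timestamps are hypothetical.

```python
def request_metrics(sent_at, token_times):
    """Derive serving metrics for one streamed request.

    sent_at: time the request was submitted, in seconds.
    token_times: arrival time of each output token, in seconds, in order.
    """
    if not token_times:
        raise ValueError("request produced no tokens")
    n = len(token_times)
    ttft = token_times[0] - sent_at   # time to first token
    e2e = token_times[-1] - sent_at   # end-to-end latency
    # Time per output token, averaged over the tokens after the first.
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0
    # Gap between each pair of consecutive tokens.
    inter_token = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "ttft": ttft,
        "tpot": tpot,
        "e2e": e2e,
        "inter_token": inter_token,
        "tokens_per_s": n / e2e,
    }

# Invented timestamps: first token at 0.5 s, then one token every 0.1 s.
m = request_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
print({k: v for k, v in m.items() if k != "inter_token"})
```

Running the same calculation at each concurrency level, and aggregating per-request values into percentiles, is one way to see where latency starts to climb as load rises.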
A faster setup is only useful when it also fits the workload, budget, and operating model. Hivenet benchmark pages connect technical results to pricing basis where possible, so teams can compare performance and cost together.
Benchmarks should state the instance, endpoint tier, or storage price used for the comparison.
A strong result for one model, batch size, or prompt shape does not automatically apply to another workload.
Short tests, long jobs, steady inference, bursty traffic, and batch workflows can produce different cost-performance outcomes.
Compare GPU/CPU rental, Inference API, S3 storage, and Private AI based on how much of the stack your team wants to operate.
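To connect a throughput result to a pricing basis, a simple starting point is cost per million output tokens for an hourly-billed instance. The sketch below is an assumption-laden illustration: the function name, the `utilization` parameter, and the price and throughput figures are placeholders, not Hivenet pricing or measured results.

```python
def cost_per_million_tokens(price_per_hour, tokens_per_second, utilization=1.0):
    """Cost of generating one million tokens on an hourly-billed instance.

    utilization is the fraction of each billed hour spent serving at the
    measured throughput; bursty traffic lowers it and raises effective cost.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return price_per_hour / tokens_per_hour * 1_000_000

# Placeholder figures for illustration only.
steady = cost_per_million_tokens(price_per_hour=1.00, tokens_per_second=1000)
bursty = cost_per_million_tokens(price_per_hour=1.00, tokens_per_second=1000, utilization=0.25)
print(f"steady: {steady:.4f} per 1M tokens, bursty: {bursty:.4f} per 1M tokens")
```

The same instance and the same benchmark throughput give a fourfold difference in effective cost here, which is why the traffic pattern belongs next to the price in any comparison.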
Hivenet runs AI, compute, and storage workloads on Policloud-backed infrastructure designed for reliable performance, cost visibility, and regional deployment. Benchmarks show how that infrastructure behaves under specific workloads.
Modular infrastructure gives Hivenet a practical way to place capacity closer to energy, region, and workload demand.
Hivenet connects infrastructure, software, APIs, billing, access patterns, and product workflows into a platform teams can test and operate.
Benchmarks use practical toolchains and metrics so teams can compare results against workflows they understand.
Each benchmark should explain where the result applies and where a workload needs its own test.
A benchmark can show whether Hivenet is a strong fit for a class of workload. It cannot replace testing your own model, data pipeline, latency target, and production constraints.
Test step time, GPU utilization, communication time, data loading behavior, precision, batch size, sequence length, and optimizer settings.
Test latency, throughput, concurrency behavior, cold starts, prompt shape, output length, and model quality on real traffic.
Test file size, access pattern, egress, object count, listing behavior, throughput, and integration with the tools your team uses.
Test average latency, tail latency, error rates, rate-limit behavior, request IDs, and performance changes across releases.

Explore cloud compute paths for GPU and CPU workloads, programmable infrastructure, and team-ready workflows.

Rent RTX 4090, RTX 5090, or vCPU instances for AI, ML, rendering, notebooks, and development workloads.

Use OpenAI-compatible managed endpoints for foundational models.

Route production inference, RAG, fine-tuning, model hosting, and Private AI workloads to the right Hivenet path.

Store datasets, backups, media, archives, and AI pipeline files with free egress and familiar tools.
Share the model, data path, target latency, throughput needs, region requirements, and current infrastructure. We'll help you decide what to benchmark and which Hivenet path fits.