
Big point sets against big polygons used to mean “come back tomorrow.” With cuSpatial you can run those joins in minutes—if your stack is set up right. This guide shows a clean path, plus code you can paste.
On GPU rental platforms, your job runs inside a container image. Two good options:
A) Use a RAPIDS image (fastest)
docker.io/rapidsai/rapidsai:<24.xx>-cuda12-runtime-ubuntu22.04-py3.10
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
B) Use a CUDA template and install RAPIDS with micromamba
# one‑time in a fresh instance
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
mkdir -p ~/micromamba && ./bin/micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc
micromamba create -y -n rapids -c rapidsai -c conda-forge rapids=24.* python=3.10
micromamba activate rapids
python -c "import cudf, cuspatial; print(cudf.__version__, cuspatial.__version__)"
Either path gives you cuDF and cuSpatial with the CUDA user-space libraries in place. The host NVIDIA driver is usually supplied by your compute provider.
This shows the in‑memory API with a toy polygon. Swap in real data later.
import cudf
import cupy as cp
import cuspatial
# Points (N x 2)
N = 1_000_000
points = cudf.DataFrame({
    "x": cp.random.random(N),
    "y": cp.random.random(N),
})
# One square polygon (closed ring)
poly_offsets = cudf.Series([0], dtype="int32")
ring_offsets = cudf.Series([0], dtype="int32")
poly_x = cudf.Series([0, 1, 1, 0, 0], dtype="float64")
poly_y = cudf.Series([0, 0, 1, 1, 0], dtype="float64")
mask = cuspatial.point_in_polygon(
    points["x"], points["y"],
    poly_offsets, ring_offsets,
    poly_x, poly_y,
)
# mask is a bool DataFrame (rows=points, cols=polygons)
inside = points[mask.iloc[:, 0]]
print(len(inside))
Notes
mask has one column per polygon. Use .any(axis=1) if you only care whether a point fell in any polygon.
Real workloads involve thousands of polygons and hundreds of millions of points. Shrink the candidate set first by filtering with bounding boxes, then call point_in_polygon only on the survivors.
The columnar point_in_polygon signature used here matches older cuSpatial releases; recent releases moved to a GeoSeries-based API, so check the docs for your installed version.
import cudf, cupy as cp, cuspatial

# Assume points and polygons are already loaded as cuDF objects.
# Polygon structure (multiple polys, rings):
# poly_offsets, ring_offsets, poly_x, poly_y

# 1) Build polygon bounding boxes (min_x, min_y, max_x, max_y)
boxes = cuspatial.polygon_bounding_boxes(
    poly_offsets, ring_offsets, poly_x, poly_y
)
# boxes: DataFrame [min_x, min_y, max_x, max_y]

# 2) Coarse candidate filter: keep only points inside the global
#    extent of all bounding boxes (one rectangle covering every polygon)
cond = (
    (points.x >= boxes.min_x.min()) & (points.x <= boxes.max_x.max()) &
    (points.y >= boxes.min_y.min()) & (points.y <= boxes.max_y.max())
)
pts_cand = points[cond]

# 3) Exact test only on candidates
mask = cuspatial.point_in_polygon(
    pts_cand.x, pts_cand.y,
    poly_offsets, ring_offsets, poly_x, poly_y
)
any_hit = mask.any(axis=1)
joined = pts_cand[any_hit]
Why this helps
Bounding‑box filtering reduces the number of expensive PIP tests. For extreme scales, look at cuSpatial’s quadtree utilities to prune candidates even harder.
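For reference, here is a minimal sketch of that quadtree path. It follows the older columnar cuSpatial API (quadtree_on_points, join_quadtree_and_bounding_boxes, quadtree_point_in_polygon); argument names and signatures have shifted across releases, and the extent, scale, max_depth, and min_size values below are illustrative, so treat this as a shape rather than a recipe.
# Sketch: quadtree-accelerated PIP (legacy columnar API; values illustrative)
x_min, x_max, y_min, y_max = 0.0, 1.0, 0.0, 1.0   # dataset extent
scale, max_depth, min_size = 1.0, 8, 512          # tuning knobs

# Index points into a quadtree
point_indices, quadtree = cuspatial.quadtree_on_points(
    points["x"], points["y"],
    x_min, x_max, y_min, y_max,
    scale, max_depth, min_size,
)

# Pair quadtree cells with polygon bounding boxes
boxes = cuspatial.polygon_bounding_boxes(
    poly_offsets, ring_offsets, poly_x, poly_y
)
pairs = cuspatial.join_quadtree_and_bounding_boxes(
    quadtree, boxes,
    x_min, x_max, y_min, y_max,
    scale, max_depth,
)

# Exact PIP only for (quad, polygon) pairs that can intersect
hits = cuspatial.quadtree_point_in_polygon(
    pairs, quadtree, point_indices,
    points["x"], points["y"],
    poly_offsets, ring_offsets, poly_x, poly_y,
)
# hits: DataFrame of (polygon_index, point_index) matches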
import cudf
points = cudf.read_parquet("sensors_utm.parquet") # cols: x, y, id
poly = cudf.read_parquet("zones_utm.parquet") # stored as exploded rings
# Build polygon arrays expected by cuSpatial
poly_offsets = poly["poly_offset"].astype("int32")
ring_offsets = poly["ring_offset"].astype("int32")
poly_x = poly["x"].astype("float64")
poly_y = poly["y"].astype("float64")
If your polygons live in shapefiles, convert once to GeoParquet (off‑GPU is fine) to speed up future loads.
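A one-time conversion sketch, assuming GeoPandas is installed on the CPU side; file names and the EPSG code are placeholders, and you would still explode the geometries into the columnar layout above before handing them to cuSpatial:
import geopandas as gpd

# One-time, off-GPU: shapefile -> GeoParquet for much faster future loads
gdf = gpd.read_file("zones.shp")       # placeholder input path
gdf = gdf.to_crs(epsg=32633)           # reproject to match the points' CRS
gdf.to_parquet("zones_utm.parquet")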
For datasets that don’t fit in one GPU, use Dask to partition work. Pattern:
The code mirrors the single‑GPU version but wraps cuDF DataFrames in Dask cuDF.
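A minimal sketch of that pattern, assuming the polygon arrays are small enough to live on every worker (on a real multi-GPU cluster you would scatter or rebuild them per worker); pip_partition and the file name are placeholders:
import cuspatial
import dask_cudf

# Points partitioned across GPUs; polygons assumed resident on each worker
dpoints = dask_cudf.read_parquet("sensors_utm.parquet")

def pip_partition(df):
    # Same logic as the single-GPU path, applied per partition
    mask = cuspatial.point_in_polygon(
        df["x"], df["y"],
        poly_offsets, ring_offsets, poly_x, poly_y,
    )
    return df[mask.any(axis=1)]

joined = dpoints.map_partitions(pip_partition).compute()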
Keep it boring and comparable.
inputs: N_points, N_polygons, polygon vertices, CRS, precision
hardware: GPU model/VRAM, CUDA, driver
code: cuSpatial version, exact code path (with/without filter)
metrics: seconds for load → filter → PIP, peak VRAM
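A minimal instrumentation sketch for those metrics; the CuPy pool number covers CuPy allocations only (cuDF allocates through RMM), so treat nvidia-smi as the ground truth for peak VRAM:
import time
import cupy as cp

t0 = time.perf_counter()
# ... load step ...
t1 = time.perf_counter()
# ... bbox filter ...
t2 = time.perf_counter()
# ... point_in_polygon ...
cp.cuda.Device().synchronize()  # wait for GPU work before stopping the clock
t3 = time.perf_counter()

print(f"load={t1 - t0:.2f}s filter={t2 - t1:.2f}s pip={t3 - t2:.2f}s")
print(f"cupy pool: {cp.get_default_memory_pool().total_bytes() / 1e9:.2f} GB")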
Compute cost per million points:
cost_per_million = (price_per_hour × wall_seconds / 3600) / (N_points / 1e6)
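The same arithmetic as a helper, with made-up numbers for illustration:
def cost_per_million(price_per_hour, wall_seconds, n_points):
    # (hourly price x elapsed hours) / millions of points processed
    return (price_per_hour * wall_seconds / 3600) / (n_points / 1e6)

# Example: $1.20/hr instance, 95 s end-to-end, 250M points
print(cost_per_million(1.20, 95, 250_000_000))  # ~0.00013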
Troubleshooting
VRAM pressure
Watch nvidia-smi. If memory is close to full, chunk the point table.
CUDA error / no GPU
Check nvidia-smi inside the container. Make sure your image is CUDA-ready and that the NVIDIA env vars are set.
MemoryError or OOM
Chunk the point table (see the sketch after this list); filter early with bboxes; reduce the column set.
Wrong results
Usually a mismatched CRS. Reproject and retest. Also confirm ring orientation and that rings are closed.
Slow loads
Move data to local NVMe; switch to GeoParquet; increase the Parquet row-group size.
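A minimal chunking sketch for the OOM case, reusing the columnar arrays from the examples above; the chunk size is illustrative:
import cudf
import cuspatial

CHUNK = 50_000_000  # rows per slice; tune to your VRAM

parts = []
for start in range(0, len(points), CHUNK):
    piece = points.iloc[start:start + CHUNK]
    mask = cuspatial.point_in_polygon(
        piece["x"], piece["y"],
        poly_offsets, ring_offsets, poly_x, poly_y,
    )
    parts.append(piece[mask.any(axis=1)])

joined = cudf.concat(parts)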
Run record template
hardware:
  gpu: "<model> (<VRAM> GB)"
  driver: "<NVIDIA driver>"
  cuda: "<CUDA version>"
software:
  image: "rapidsai/rapidsai:<24.xx>-cuda12-runtime-ubuntu22.04-py3.10"
  python: "3.10"
  libs:
    - cudf: "<version>"
    - cuspatial: "<version>"
inputs:
  points: "s3://…/sensors_utm.parquet (N=<…>)"
  polygons: "s3://…/zones_utm.parquet (polys=<…>)"
run:
  script: "pip_join.py"
  notes: "bbox filter → PIP; CRS=EPSG:32633"
outputs:
  wall_seconds: "<…>"
  cost_per_million: "<…>"
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own RAPIDS image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support for help selecting the ideal hardware profile for your computational needs.