
Agent‑based models thrive on parallelism. FLAME GPU executes agent functions as CUDA kernels, so one workstation‑class GPU can simulate millions of agents in real time—if you structure the model cleanly. This guide shows a practical, GPU‑friendly path from NetLogo/Mesa to FLAME GPU.
Precision note: most ABMs are fine in FP32. If your model is sensitive, see the FP64 checklist.
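To see why FP32 usually suffices (and when it does not), here is a small illustration that emulates float32 accumulation using Python's struct module; the step count and drift are illustrative, not a measurement of any particular model:

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Accumulate 0.1 one hundred thousand times in emulated FP32.
s = 0.0
step = f32(0.1)            # 0.1 is not exactly representable in binary
for _ in range(100_000):
    s = f32(s + step)

drift = abs(s - 10_000.0)  # FP64 keeps this tiny; FP32 drifts visibly
```

Per-agent state rarely accumulates over this many additions, which is why FP32 is usually fine; long-running global accumulators and stiff numerics are the exceptions the FP64 checklist covers.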
Your job runs inside a container. Two options that work well:
A) CUDA devel image + build from source (portable). Note the runtime tags lack nvcc, so compiling needs a -devel base:
# Dockerfile (sketch)
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential cmake git python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
ENV NVIDIA_VISIBLE_DEVICES=all \
NVIDIA_DRIVER_CAPABILITIES=compute,utility
B) Use a maintained FLAME GPU image (fastest)
Either way, confirm GPU visibility inside the container:
nvidia-smi
A minimal project layout:
/work
├── CMakeLists.txt
├── src/
│   ├── agents.cu   # agent functions
│   ├── model.cu    # model description & layers
│   └── main.cu     # entry point
├── python/         # optional Python driver
└── data/           # inputs, seeds, checkpoints
Configure CMake and build out‑of‑tree:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
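If you build against FLAME GPU 2 from source, a CMakeLists.txt along these lines is typical. Treat it as a sketch: the FetchContent tag and the exported `flamegpu` target name are assumptions to verify against the FLAMEGPU2 repository and its example template.

```cmake
# CMakeLists.txt (sketch)
cmake_minimum_required(VERSION 3.18)
project(abm LANGUAGES CXX CUDA)

include(FetchContent)
FetchContent_Declare(flamegpu2
    GIT_REPOSITORY https://github.com/FLAMEGPU/FLAMEGPU2.git
    GIT_TAG        v2.0.0-rc)           # assumption: pin the release you tested
FetchContent_MakeAvailable(flamegpu2)

add_executable(abm src/agents.cu src/model.cu src/main.cu)
target_link_libraries(abm PRIVATE flamegpu)  # assumed exported target name
```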
FLAME GPU runs agent functions over agent arrays on the GPU. Use messages for local interactions.
// agents.cu
#include <flamegpu/flamegpu.h>

// Publish this agent's position so neighbors can read it in the next layer
FLAMEGPU_AGENT_FUNCTION(output_location, flamegpu::MessageNone, flamegpu::MessageSpatial2D) {
    FLAMEGPU->message_out.setLocation(
        FLAMEGPU->getVariable<float>("x"),
        FLAMEGPU->getVariable<float>("y"));
    return flamegpu::ALIVE;
}

// Read neighboring positions and steer toward their centroid (simple cohesion)
FLAMEGPU_AGENT_FUNCTION(step, flamegpu::MessageSpatial2D, flamegpu::MessageNone) {
    const float x = FLAMEGPU->getVariable<float>("x");
    const float y = FLAMEGPU->getVariable<float>("y");
    float vx = FLAMEGPU->getVariable<float>("vx");
    float vy = FLAMEGPU->getVariable<float>("vy");
    const float r2 = FLAMEGPU->message_in.radius() * FLAMEGPU->message_in.radius();
    float cx = 0.f, cy = 0.f;
    int n = 0;
    // Spatial iteration yields candidates from neighboring bins;
    // filter by true distance to enforce the interaction radius.
    for (const auto &m : FLAMEGPU->message_in(x, y)) {
        const float mx = m.getVariable<float>("x");
        const float my = m.getVariable<float>("y");
        const float dx = mx - x, dy = my - y;
        if (dx * dx + dy * dy <= r2) {
            cx += mx; cy += my; ++n;
        }
    }
    if (n) {
        cx /= n; cy /= n;
        vx += 0.05f * (cx - x);
        vy += 0.05f * (cy - y);
    }
    // write updated velocity and position
    FLAMEGPU->setVariable<float>("vx", vx);
    FLAMEGPU->setVariable<float>("vy", vy);
    FLAMEGPU->setVariable<float>("x", x + vx);
    FLAMEGPU->setVariable<float>("y", y + vy);
    return flamegpu::ALIVE;
}
Model description (pattern)
// model.cu
#include <flamegpu/flamegpu.h>
using namespace flamegpu;

void describeModel(ModelDescription &model) {
    AgentDescription agent = model.newAgent("A");
    agent.newVariable<float>("x");  agent.newVariable<float>("y");
    agent.newVariable<float>("vx"); agent.newVariable<float>("vy");

    // Spatial 2D messages carry an implicit location (x, y)
    MessageSpatial2D::Description msg = model.newMessage<MessageSpatial2D>("location");
    msg.setMin(0, 0);        // domain bounds
    msg.setMax(100, 100);
    msg.setRadius(1.0f);     // interaction radius

    AgentFunctionDescription outFn = agent.newFunction("output_location", output_location);
    outFn.setMessageOutput("location");
    AgentFunctionDescription stepFn = agent.newFunction("step", step);
    stepFn.setMessageInput("location");

    // Layers enforce ordering: publish locations, then read them
    model.newLayer().addAgentFunction(outFn);
    model.newLayer().addAgentFunction(stepFn);
}
This mirrors common NetLogo patterns (turtles + vision radius) but in a GPU‑friendly, structure‑of‑arrays layout.
Run the built binary (the flags here belong to the demo app, not to FLAME GPU itself):
./build/abm --agents 5000000 --steps 1000 --seed 42 \
  --output data/checkpoints --checkpoint-interval 100
If results diverge, check message radius, boundary conditions, and update order.
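Boundary conditions are a frequent divergence source: NetLogo worlds wrap (a torus) by default, while the spatial message bounds above are hard limits. If you want periodic boundaries, a small helper like this sketch can be applied to positions before they are written back (names are illustrative):

```python
def wrap(p: float, lo: float, hi: float) -> float:
    """Map a coordinate onto a periodic (torus) domain [lo, hi)."""
    span = hi - lo
    return lo + (p - lo) % span

# wrap(101.5, 0.0, 100.0) -> 1.5 ; wrap(-2.0, 0.0, 100.0) -> 98.0
```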
Use numbers that matter for planning.
metrics:
  agents: <N>
  steps: <T>
  wall_seconds: <…>
  agent_steps_per_second: N*T / wall_seconds
  cost_per_million_agent_steps: (price_per_hour * wall_seconds / 3600) / (N*T / 1e6)
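The same arithmetic as a helper; the example numbers below are made up for illustration:

```python
def run_metrics(n_agents: int, steps: int, wall_seconds: float,
                price_per_hour: float) -> dict:
    """Derive the planning metrics above from raw run numbers."""
    agent_steps = n_agents * steps
    return {
        "agent_steps_per_second": agent_steps / wall_seconds,
        "cost_per_million_agent_steps":
            (price_per_hour * wall_seconds / 3600.0) / (agent_steps / 1e6),
    }

# Hypothetical run: 5M agents, 1000 steps, 250 s wall time at $1.20/h
m = run_metrics(5_000_000, 1000, 250.0, 1.20)
# agent_steps_per_second -> 2.0e7
```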
Log GPU model/VRAM, driver, CUDA, FLAME GPU version, and the exact command line.
GPU idle: too few agents or heavy host‑side work. Raise N, reduce per‑step I/O, or move setup into device code.
Out of memory: slim agent variables, chunk outputs, or choose a larger‑VRAM profile.
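A rough footprint estimate helps size the profile before launching. This is a sketch: the double‑buffer factor is an assumption about state storage, not a measured figure, and it excludes message lists, spatial bins, and checkpoint staging.

```python
def agent_vram_gb(n_agents: int, bytes_per_agent: int,
                  buffer_factor: float = 2.0) -> float:
    """Lower-bound device memory for agent state in a structure-of-arrays layout."""
    return n_agents * bytes_per_agent * buffer_factor / 1e9

# 5M agents x 4 float32 variables (16 B), double buffered -> 0.16 GB
```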
Non‑deterministic results: fix seeds and avoid unordered host‑side reductions. Document the RNG.
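Why unordered reductions are non‑deterministic: floating‑point addition is not associative, so the order in which threads contribute changes the result. A minimal illustration:

```python
a = sum([1e16, 1.0, -1e16])   # the 1.0 is absorbed before cancellation -> 0.0
b = sum([1e16, -1e16, 1.0])   # cancellation happens first -> 1.0
# Same values, different order, different answer.
```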
Build errors: match the CUDA toolkit to your base image and CMake toolchain, then clean and rebuild.
A template you can paste into your run log:
hardware:
  gpu: "<model> (<VRAM> GB)"
  driver: "<NVIDIA driver>"
  cuda: "<CUDA version>"
software:
  flamegpu: "<version>"
  image: "Ubuntu 24.04 LTS (CUDA 12.6)"
model:
  domain: "[0,100]x[0,100]"
  agents: <N>
  rules: "cohesion-only demo"
run:
  cmd: "./build/abm --agents <N> --steps <T> --seed 42"
  checkpoints: "every 100 steps"
outputs:
  wall_seconds: "<…>"
  agent_steps_per_second: "<…>"
  cost_per_million_agent_steps: "<…>"
Start a GPU instance with a CUDA-ready template (e.g., Ubuntu 24.04 LTS / CUDA 12.6) or your own FLAME GPU image. Enjoy flexible per-second billing with custom templates and the ability to start, stop, and resume your sessions at any time. Unsure about FP64 requirements? Contact support to help you select the ideal hardware profile for your computational needs.