RAI AI Models - DeepStream Integration Guide

Vehicle Counting / Person Counting / Fire-Smoke Detection | Updated: 2026-04-13

Contents
1. Models Overview
2. Download (.pt files)
3. Vehicle Counting (Density Map)
4. Person Counting (Density Map)
5. Fire/Smoke Detection (Classification)
6. DeepStream Integration Notes
7. Detailed Reports

1. Models Overview

| Model | Task | Architecture | Input | Output | Key Metric |
|---|---|---|---|---|---|
| Vehicle V2 | 5-class counting | ResNet50 + FPN Decoder | 384x384 RGB | (5, 96, 96) density map | MAE 1.523 |
| Person | Person counting | ResNet50 + FPN Decoder | 384x384 RGB | (1, 96, 96) density map | MAE 1.798 |
| Fire/Smoke | Multi-label classification | MobileNetV3-Large | 224x224 RGB | (2,) logits [smoke, fire] | mAP 0.989 |
| Vehicle V1 | 5-class counting (regression) | ResNet50 + FC head | 384x384 RGB | (5,) log1p counts | MAE 1.744 |

2. Download

All models are stored as full torch.save(model) objects; no class definition is needed to load them.

- Vehicle V2 Density Map: 51 MB (fp16)
- Person Density Map: 51 MB (fp16)
- Vehicle V1 Regression: 90 MB (fp32)
- Fire/Smoke Detection: 16 MB (fp32)

Model list API: rai-model-download.workers.dev
# Loading (applies to all models)
import torch

# fp16 models (vehicle v2, person) — convert back to fp32 for inference
model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float()   # fp16 -> fp32
model.eval()

# fp32 models (vehicle v1, fire/smoke) — use as-is
model = torch.load("vehicle_v1_regression.pt", map_location="cpu", weights_only=False)
model.eval()
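Note that Vehicle V1 is a regression model whose (5,) outputs are in log1p space, so they must be inverted with expm1 before use. A minimal sketch of the decoding step (the tensor below is a hypothetical model output, not real data):

```python
import torch

# Hypothetical Vehicle V1 output, shape (1, 5), in log1p space
log_counts = torch.tensor([[1.3863, 0.6931, 0.0000, 0.0000, 2.3026]])

# Invert log1p with expm1; clamp guards against small negative predictions
counts = torch.expm1(log_counts).clamp(min=0)
# ~[3.0, 1.0, 0.0, 0.0, 9.0] -> car, truck, bus, motorcycle, bicycle
```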

3. Vehicle Counting (Density Map)

RECOMMENDED: V2 Density Map

5 classes: car truck bus motorcycle bicycle

Architecture

Input:  (B, 3, 384, 384)   — RGB, ImageNet normalized
Output: (B, 5, 96, 96)     — 5-channel density map at stride 4
                              ch0=car, ch1=truck, ch2=bus, ch3=motorcycle, ch4=bicycle

Backbone: ResNet50 (ImageNet V2 pretrained)
Decoder:  FPN with lateral connections (layer1-4) + 3-conv head
Activation: ReLU (non-negative density)

Inference

import torch
from torchvision import transforms
from PIL import Image

model = torch.load("vehicle_v2_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle"]
tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)              # (1, 5, 96, 96)
    counts = density_map.sum(dim=[2,3]) # (1, 5) — per-class counts

for i, cls in enumerate(CLASSES):
    print(f"{cls}: {counts[0, i].item():.1f}")

# ROI counting (e.g. left half of image)
roi_count = density_map[0, 0, :, :48].sum().item()  # car count in left half
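Because the density map is additive, any zone count is just a spatial sum over that zone. For example, splitting the 96x96 map into a 4x4 grid of per-cell counts (a random tensor stands in for a real model output):

```python
import torch

# Stand-in for a real (1, 5, 96, 96) model output
density_map = torch.rand(1, 5, 96, 96) * 0.01

B, C, H, W = density_map.shape
g = 4  # 4x4 grid -> 24x24-pixel cells on the 96x96 map
grid_counts = density_map.reshape(B, C, g, H // g, g, W // g).sum(dim=(3, 5))  # (B, C, 4, 4)

# Cell counts sum back to the full-image per-class counts
totals = density_map.sum(dim=(2, 3))
```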

Accuracy (Validation: 2,077 images)

| Class | MAE | RMSE |
|---|---|---|
| car | 2.90 | 5.26 |
| motorcycle | 2.32 | 4.84 |
| bicycle | 1.10 | 2.22 |
| truck | 0.80 | 1.72 |
| bus | 0.51 | 1.04 |
| Mean | 1.523 | 3.016 |

Throughput (NVIDIA GB10, bf16)

| Batch | img/s | ms/img | VRAM |
|---|---|---|---|
| 1 | 325.6 | 3.07 | 106 MiB |
| 4 (best) | 408.4 | 2.45 | 149 MiB |
| 32 | 325.2 | 3.08 | 830 MiB |

4. Person Counting (Density Map)

Person Density Map

Single class: person | Gaussian sigma=2.5

Architecture

Input:  (B, 3, 384, 384)   — RGB, ImageNet normalized
Output: (B, 1, 96, 96)     — 1-channel person density map at stride 4

Same backbone + decoder as Vehicle V2, num_classes=1 instead of 5.

Inference

model = torch.load("person_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)                  # (1, 1, 96, 96)
    person_count = density_map.sum().item() # total person count

print(f"Person count: {person_count:.1f}")
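For heatmap overlays, the 96x96 map can be upsampled back to the 384x384 input resolution; dividing by the squared scale factor keeps the total count approximately unchanged. A sketch with a random stand-in tensor:

```python
import torch
import torch.nn.functional as F

density_map = torch.rand(1, 1, 96, 96) * 0.01  # stand-in for model output
scale = 4  # 96 -> 384, matching the network input size

# Bilinear upsample; dividing by scale**2 approximately preserves the sum
heat = F.interpolate(density_map, scale_factor=scale, mode="bilinear",
                     align_corners=False) / scale**2
```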

Accuracy (Validation: 2,031 images)

| Metric | Value |
|---|---|
| MAE | 1.798 |
| RMSE | 3.218 |
| Median Error | 0.94 |
| Error ≤ 1 person | 52.1% |
| Error ≤ 3 persons | 81.9% |
| GT Range | 1 ~ 92 persons |
| Mean GT / Pred | 7.44 / 7.15 (no bias) |

5. Fire/Smoke Detection (Classification)

MobileNetV3-Large v20260410

Multi-label: smoke fire | BCEWithLogitsLoss + Sigmoid

Architecture

Input:  (B, 3, 224, 224)   — RGB, ImageNet normalized
Output: (B, 2)             — raw logits [smoke, fire]
                              Apply torch.sigmoid() to get probabilities

Backbone: MobileNetV3-Large (ImageNet pretrained)
Head: FC -> 2 outputs
Loss: BCEWithLogitsLoss (multi-label, NOT softmax)
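The multi-label distinction matters: with sigmoid, the smoke and fire probabilities are independent and can both be high at once, which a softmax (probabilities forced to sum to 1) would forbid. A quick illustration with made-up logits:

```python
import torch

logits = torch.tensor([2.0, 1.5])       # hypothetical [smoke, fire] raw outputs
sig = torch.sigmoid(logits)             # independent: both can exceed 0.5
soft = torch.softmax(logits, dim=0)     # forced to sum to 1 (not appropriate here)
```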

Inference

import torch
from torchvision import transforms
from PIL import Image

model = torch.load("fire_smoke.pt", map_location="cuda", weights_only=False)
model.eval().cuda()

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
inp = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    logits = model(inp)
    probs = torch.sigmoid(logits)[0]

smoke_prob = probs[0].item()  # 0~1
fire_prob  = probs[1].item()  # 0~1

# Thresholds
SMOKE_THRESH = 0.32   # SK field recommended (general: 0.004)
FIRE_THRESH  = 0.057  # optimal from full test set

print(f"Smoke: {smoke_prob:.3f} ({'YES' if smoke_prob >= SMOKE_THRESH else 'NO'})")
print(f"Fire:  {fire_prob:.3f} ({'YES' if fire_prob >= FIRE_THRESH else 'NO'})")
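For stream deployment, single-frame threshold hits can be noisy. One common mitigation (a sketch, not part of the shipped models; DebouncedAlarm is a hypothetical helper) is to raise an alarm only after N consecutive over-threshold frames:

```python
from collections import deque

class DebouncedAlarm:
    """Raise only after n consecutive over-threshold frames (hypothetical helper)."""
    def __init__(self, thresh: float, n: int = 5):
        self.thresh = thresh
        self.hits = deque(maxlen=n)

    def update(self, prob: float) -> bool:
        self.hits.append(prob >= self.thresh)
        return len(self.hits) == self.hits.maxlen and all(self.hits)

smoke_alarm = DebouncedAlarm(0.32, n=3)
results = [smoke_alarm.update(p) for p in [0.5, 0.5, 0.5, 0.1, 0.6]]
# results == [False, False, True, False, False]
```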
Threshold notes: the smoke threshold 0.32 is the SK field-test recommendation (the full-test-set optimum is 0.004); the fire threshold 0.057 is the full-test-set optimum.

Accuracy

Full Test Set (13,992 images)

| Metric | Value |
|---|---|
| mAP | 0.9892 |
| Smoke AP | 0.9851 (t=0.004) |
| Fire AP | 0.9933 (t=0.057) |
| Smoke P/R/F1 | 94.0% / 98.9% / 0.964 |
| Fire P/R/F1 | 95.2% / 97.4% / 0.963 |

SK Field Test (3,877 images)

| Metric | Value |
|---|---|
| SK mAP | 0.4580 |
| Smoke AP | 0.9160 (t=0.320) |
| Fire AP | 0.0000 (no fire samples) |
| Smoke P/R/F1 | 95.8% / 61.5% / 0.749 |
| Domain gap | Significant (-53%) |

Throughput (NVIDIA GB10)

Peak: 4,534 img/s (fp16, batch 32)

6. DeepStream Integration Notes

Model Conversion for DeepStream

DeepStream uses TensorRT. Convert PyTorch models via ONNX:

# Step 1: Export to ONNX
import torch
model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float().eval()
dummy = torch.randn(1, 3, 384, 384)

torch.onnx.export(
    model, dummy, "vehicle_v2_density.onnx",
    input_names=["input"],
    output_names=["density_map"],
    dynamic_axes={"input": {0: "batch"}, "density_map": {0: "batch"}},
    opset_version=17,
)

# Step 2: Convert ONNX to TensorRT engine
# trtexec --onnx=vehicle_v2_density.onnx \
#          --saveEngine=vehicle_v2_density.engine \
#          --fp16 --workspace=4096 \
#          --minShapes=input:1x3x384x384 \
#          --optShapes=input:4x3x384x384 \
#          --maxShapes=input:16x3x384x384

Post-Processing in DeepStream

# Density Map models (Vehicle V2, Person):
#   Output tensor: (B, C, 96, 96) float
#   Post-process: sum over spatial dims -> per-class counts
#   No NMS needed.

# Fire/Smoke model:
#   Output tensor: (B, 2) float logits
#   Post-process: sigmoid -> threshold comparison
#   No NMS needed.

# All three models are "classifiers" from DeepStream's perspective:
#   - No bounding boxes
#   - No NMS
#   - Custom post-processing via probe function or custom parser
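A probe-function sketch of the density-map post-processing: DeepStream's tensor meta exposes each output layer as a flat float buffer, so counting reduces to a reshape plus a spatial sum. Here a random numpy array stands in for the real layer buffer, and the layout is assumed to be CHW as exported:

```python
import numpy as np

C, H, W = 5, 96, 96  # Vehicle V2 output layout
flat = (np.random.rand(C * H * W) * 0.001).astype(np.float32)  # stand-in for layer buffer

density = flat.reshape(C, H, W)    # CHW, matching the ONNX export
counts = density.sum(axis=(1, 2))  # per-class counts, shape (5,)
```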

Input Preprocessing (must match training)

| Parameter | Vehicle / Person | Fire / Smoke |
|---|---|---|
| Input size | 384 x 384 | 224 x 224 |
| Color format | RGB (not BGR) | RGB (not BGR) |
| Scaling | / 255.0 | / 255.0 |
| Mean subtraction | [0.485, 0.456, 0.406] | [0.485, 0.456, 0.406] |
| Std division | [0.229, 0.224, 0.225] | [0.229, 0.224, 0.225] |

nvinfer config parameters: net-scale-factor=0.00392156862 (1/255), offsets=123.675;116.28;103.53 (mean*255), model-color-format=0 (RGB). Note: net-scale-factor is a single scalar, so this config covers only the 1/255 scaling and mean subtraction; the per-channel std division must be folded into the exported model or handled in custom preprocessing.
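One way to reconcile nvinfer preprocessing with the per-channel std is to fold the std division into the model before ONNX export. A sketch (NormalizedWrapper is a hypothetical helper, not part of the shipped checkpoints):

```python
import torch
import torch.nn as nn

class NormalizedWrapper(nn.Module):
    """Folds per-channel std division into the graph, so nvinfer only needs
    to compute (pixel / 255 - mean) via net-scale-factor and offsets."""
    def __init__(self, model: nn.Module, std=(0.229, 0.224, 0.225)):
        super().__init__()
        self.model = model
        self.register_buffer("inv_std", 1.0 / torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):  # x is (pixel/255 - mean), per the nvinfer config above
        return self.model(x * self.inv_std)

# Export the wrapped model instead of the bare one:
# wrapped = NormalizedWrapper(model).eval()
```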

7. Detailed Reports

| Report | Contents | Link |
|---|---|---|
| Vehicle V1 vs V2 | MAE/RMSE comparison, throughput benchmark, scatter plot, error heatmaps, grid counts | vehicle-counting-report.pages.dev |
| Car Flow Cross-Domain | V2 model evaluated on Project 8 (car_flow) test set | car-flow-eval.pages.dev |
| Person Counting | Training curve, scatter plot, error distribution, visual comparisons | person-counting-report.pages.dev |
| Fire/Smoke Full | Full test set metrics, Grad-CAM, throughput benchmark | kaggle-reports: full report |
| Fire/Smoke SK | SK field independent test, per-channel analysis, deployment readiness | kaggle-reports: SK report |

Model Files on gx10

# Vehicle V1 (fp32)
~/vehicle_counting/runs/20260412_081835/best_full.pt      # 90 MB

# Vehicle V2 (fp32)
~/vehicle_counting/runs/20260412_094521_density/best_full.pt  # 102 MB

# Person (fp32)
~/person_counting/runs/20260412_222716/best_full.pt       # 102 MB

# Training scripts
~/vehicle_counting/train_density.py      # Vehicle V2 training code
~/person_counting/train_person_density.py # Person training code

Generated: 2026-04-13 | Training: gx10 (NVIDIA GB10) | CVAT: raicvat.intemotech.com | Models R2: rai-model-download.workers.dev