Vehicle Counting / Person Counting / Fire-Smoke Detection | Updated: 2026-04-13
| Model | Task | Architecture | Input | Output | Key Metric |
|---|---|---|---|---|---|
| Vehicle V2 | 5-class counting | ResNet50 + FPN Decoder | 384x384 RGB | (5, 96, 96) density map | MAE 1.523 |
| Person | Person counting | ResNet50 + FPN Decoder | 384x384 RGB | (1, 96, 96) density map | MAE 1.798 |
| Fire/Smoke | Multi-label classification | MobileNetV3-Large | 224x224 RGB | (2,) logits [smoke, fire] | mAP 0.989 |
| Vehicle V1 | 5-class counting (regression) | ResNet50 + FC head | 384x384 RGB | (5,) log1p counts | MAE 1.744 |
All models are stored with torch.save(model) as full models, so no class definition is needed to load them.
```python
# Loading (same pattern for all models)
import torch

# fp16 models (Vehicle V2, Person) — convert back to fp32 for inference
model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float()  # fp16 -> fp32
model.eval()

# fp32 models (Vehicle V1, Fire/Smoke) — usable as-is
model = torch.load("vehicle_v1_regression.pt", map_location="cpu", weights_only=False)
model.eval()
```
5 classes: car, truck, bus, motorcycle, bicycle
Input: (B, 3, 384, 384) — RGB, ImageNet normalized
Output: (B, 5, 96, 96) — 5-channel density map at stride 4
ch0=car, ch1=truck, ch2=bus, ch3=motorcycle, ch4=bicycle
Backbone: ResNet50 (ImageNet V2 pretrained)
Decoder: FPN with lateral connections (layer1-4) + 3-conv head
Activation: ReLU (non-negative density)
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("vehicle_v2_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle"]

tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)  # (1, 5, 96, 96)

counts = density_map.sum(dim=[2, 3])  # (1, 5) — per-class counts
for i, cls in enumerate(CLASSES):
    print(f"{cls}: {counts[0, i].item():.1f}")

# ROI counting (e.g. left half of the image)
roi_count = density_map[0, 0, :, :48].sum().item()  # car count in left half
```
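Since counts are spatial sums of the density map, any region of interest reduces to a masked sum. A minimal numpy sketch on a synthetic map (the 5-channel 96x96 shape follows the model spec above; the 4x4 grid is an arbitrary illustrative choice):

```python
import numpy as np

# Synthetic 5-class density map at the model's output resolution (5, 96, 96).
rng = np.random.default_rng(0)
density = rng.random((5, 96, 96)).astype(np.float32) * 0.01

# Count inside an arbitrary binary ROI mask (here: left half of the frame).
roi = np.zeros((96, 96), dtype=bool)
roi[:, :48] = True
car_in_roi = float((density[0] * roi).sum())

# Per-cell grid counts: split the 96x96 map into a 4x4 grid and sum each cell.
grid = density[0].reshape(4, 24, 4, 24).sum(axis=(1, 3))  # (4, 4) car counts

# Sanity check: the grid cells partition the same total mass.
assert np.isclose(grid.sum(), density[0].sum())
print(car_in_roi, grid.shape)
```

The same masked-sum pattern extends to polygonal zones: rasterize the polygon to a 96x96 boolean mask once, then multiply and sum per frame.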
| Class | MAE | RMSE |
|---|---|---|
| car | 2.90 | 5.26 |
| motorcycle | 2.32 | 4.84 |
| bicycle | 1.10 | 2.22 |
| truck | 0.80 | 1.72 |
| bus | 0.51 | 1.04 |
| Mean | 1.523 | 3.016 |
| Batch | img/s | ms/img | VRAM |
|---|---|---|---|
| 1 | 325.6 | 3.07 | 106 MiB |
| 4 (best) | 408.4 | 2.45 | 149 MiB |
| 32 | 325.2 | 3.08 | 830 MiB |
Single class: person | Gaussian sigma=2.5
Input: (B, 3, 384, 384) — RGB, ImageNet normalized
Output: (B, 1, 96, 96) — 1-channel person density map at stride 4
Same backbone + decoder as Vehicle V2, num_classes=1 instead of 5.
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("person_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)  # (1, 1, 96, 96)

person_count = density_map.sum().item()  # total person count
print(f"Person count: {person_count:.1f}")
```
| Metric | Value |
|---|---|
| MAE | 1.798 |
| RMSE | 3.218 |
| Median Error | 0.94 |
| Error ≤ 1 person | 52.1% |
| Error ≤ 3 persons | 81.9% |
| GT Range | 1 ~ 92 persons |
| Mean GT / Pred | 7.44 / 7.15 (minimal bias) |
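For reference, the metrics in the table are the standard count-regression definitions. A minimal sketch with toy (predicted, ground-truth) pairs — illustrative numbers only, not the real evaluation data:

```python
import math

# Toy per-image (predicted, ground-truth) person counts — illustrative only.
pairs = [(7.2, 7), (3.9, 5), (12.4, 12), (1.1, 1)]

errors = [p - g for p, g in pairs]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
within_1 = sum(abs(e) <= 1 for e in errors) / len(errors)  # "Error <= 1 person" rate

print(f"MAE={mae:.3f} RMSE={rmse:.3f} within-1={within_1:.0%}")
```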
Multi-label: smoke, fire | BCEWithLogitsLoss + Sigmoid
Input: (B, 3, 224, 224) — RGB, ImageNet normalized
Output: (B, 2) — raw logits [smoke, fire]
Apply torch.sigmoid() to get probabilities
Backbone: MobileNetV3-Large (ImageNet pretrained)
Head: FC -> 2 outputs
Loss: BCEWithLogitsLoss (multi-label, NOT softmax)
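The multi-label design matters: with independent sigmoids, smoke and fire can both score high at once, whereas softmax would force the two probabilities to compete. A quick pure-Python illustration with toy logits:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [3.0, 3.0]  # toy case: strong evidence for both smoke and fire

sig = [sigmoid(x) for x in logits]   # both high — both labels can trigger
soft = softmax(logits)               # forced to [0.5, 0.5] — neither looks confident

print(sig, soft)
```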
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("fire_smoke.pt", map_location="cuda", weights_only=False)
model.eval().cuda()

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
inp = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    logits = model(inp)

probs = torch.sigmoid(logits)[0]
smoke_prob = probs[0].item()  # 0~1
fire_prob = probs[1].item()   # 0~1

# Thresholds
SMOKE_THRESH = 0.32   # recommended for the SK field (general: 0.004)
FIRE_THRESH = 0.057   # optimal on the full test set

print(f"Smoke: {smoke_prob:.3f} ({'YES' if smoke_prob >= SMOKE_THRESH else 'NO'})")
print(f"Fire: {fire_prob:.3f} ({'YES' if fire_prob >= FIRE_THRESH else 'NO'})")
```
Threshold reference:
- 0.004 — best over the full dataset (AP=0.985, high recall)
- 0.320 — recommended for the SK field (Precision 95.8%, Recall 61.5%)
- 0.057 — AP=0.993 on the full test set; unverifiable in the SK field (no fire samples)

Full test set:
| Metric | Value |
|---|---|
| mAP | 0.9892 |
| Smoke AP | 0.9851 (t=0.004) |
| Fire AP | 0.9933 (t=0.057) |
| Smoke P/R/F1 | 94.0% / 98.9% / 0.964 |
| Fire P/R/F1 | 95.2% / 97.4% / 0.963 |

SK field:
| Metric | Value |
|---|---|
| SK mAP | 0.4580 |
| Smoke AP | 0.9160 (t=0.320) |
| Fire AP | 0.0000 (no fire samples) |
| Smoke P/R/F1 | 95.8% / 61.5% / 0.749 |
| Domain gap | Significant (-53%) |
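Frame-level probabilities fluctuate in video streams, so single-frame thresholding can produce spurious alarms. One common deployment-side mitigation (an assumption here, not something shipped with the model) is to smooth each channel's probability over a short window before applying the thresholds above:

```python
from collections import deque

SMOKE_THRESH = 0.32  # SK-field value from the section above

class SmoothedAlarm:
    """Moving average over the last `window` frame probabilities.
    The buffer starts zero-filled so a single spike cannot trip the alarm.
    Illustrative only — the window size is an arbitrary choice."""

    def __init__(self, thresh: float, window: int = 5):
        self.thresh = thresh
        self.window = window
        self.buf = deque([0.0] * window, maxlen=window)

    def update(self, prob: float) -> bool:
        self.buf.append(prob)
        return sum(self.buf) / self.window >= self.thresh

smoke = SmoothedAlarm(SMOKE_THRESH, window=5)
# One spike does not trip the alarm; sustained high probability does.
states = [smoke.update(p) for p in [0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
print(states)
```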
Peak: 4,534 img/s (fp16, batch 32)
DeepStream uses TensorRT. Convert PyTorch models via ONNX:
```python
# Step 1: Export to ONNX
import torch

model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float().eval()

dummy = torch.randn(1, 3, 384, 384)
torch.onnx.export(
    model, dummy, "vehicle_v2_density.onnx",
    input_names=["input"],
    output_names=["density_map"],
    dynamic_axes={"input": {0: "batch"}, "density_map": {0: "batch"}},
    opset_version=17,
)
```

```shell
# Step 2: Convert ONNX to TensorRT engine
trtexec --onnx=vehicle_v2_density.onnx \
        --saveEngine=vehicle_v2_density.engine \
        --fp16 --workspace=4096 \
        --minShapes=input:1x3x384x384 \
        --optShapes=input:4x3x384x384 \
        --maxShapes=input:16x3x384x384
```
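Before building the TensorRT engine, it is worth sanity-checking that the ONNX export matches PyTorch numerically: run both on the same input and compare. The comparison helper below needs only numpy; the onnxruntime/torch calls are sketched in comments (assumes onnxruntime is installed — the tolerance is a judgment call):

```python
import numpy as np

def parity_ok(ref: np.ndarray, test: np.ndarray, atol: float = 1e-3) -> bool:
    """True when the two outputs agree within `atol` (max absolute difference)."""
    return float(np.abs(ref - test).max()) <= atol

# In practice (sketch, assuming onnxruntime is available):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("vehicle_v2_density.onnx")
#   onnx_out = sess.run(None, {"input": x.numpy()})[0]
#   with torch.no_grad():
#       torch_out = model(x).numpy()
#   assert parity_ok(torch_out, onnx_out)

# Synthetic demo of the helper:
a = np.random.rand(1, 5, 96, 96).astype(np.float32)
print(parity_ok(a, a + 1e-5), parity_ok(a, a + 0.1))
```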
Post-processing in DeepStream:
- Density map models (Vehicle V2, Person): output tensor (B, C, 96, 96) float; sum over spatial dims → per-class counts; no NMS needed.
- Fire/Smoke model: output tensor (B, 2) float logits; sigmoid → threshold comparison; no NMS needed.
- All three models are "classifiers" from DeepStream's perspective: no bounding boxes, no NMS, custom post-processing via a probe function or custom parser.
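TensorRT typically hands the application a flat output buffer, so the density-map post-processing above is just a reshape plus a spatial sum. A numpy sketch on a synthetic buffer (shapes follow the Vehicle V2 spec; no TensorRT calls involved):

```python
import numpy as np

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle"]
B, C, H, W = 1, 5, 96, 96

# Pretend this came out of the TensorRT engine as a flat float32 buffer.
flat = np.random.rand(B * C * H * W).astype(np.float32) * 0.001

density = flat.reshape(B, C, H, W)
counts = density.sum(axis=(2, 3))  # (B, C) per-class counts — no NMS involved

for name, n in zip(CLASSES, counts[0]):
    print(f"{name}: {n:.1f}")
```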
| Parameter | Vehicle / Person | Fire / Smoke |
|---|---|---|
| Input size | 384 x 384 | 224 x 224 |
| Color format | RGB (not BGR) | RGB (not BGR) |
| Scaling | / 255.0 | / 255.0 |
| Mean subtraction | [0.485, 0.456, 0.406] | [0.485, 0.456, 0.406] |
| Std division | [0.229, 0.224, 0.225] | [0.229, 0.224, 0.225] |
nvinfer config parameters:

```
net-scale-factor=0.00392156862
offsets=123.675;116.28;103.53
model-color-format=0
```

(1/255, mean × 255, and RGB respectively.) Note: `net-scale-factor` is a single scalar, so the per-channel std division used in the PyTorch pipeline cannot be expressed in this config; for exact parity, fold the normalization into the exported model or use custom preprocessing.
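The config values above follow directly from the PyTorch preprocessing constants; a quick check of the arithmetic in pure Python:

```python
# net-scale-factor is 1/255; offsets are the ImageNet means scaled to [0, 255].
mean = [0.485, 0.456, 0.406]

net_scale_factor = 1.0 / 255.0
offsets = [round(m * 255, 3) for m in mean]

print(f"{net_scale_factor:.11f}", offsets)
```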
| Report | Contents | Link |
|---|---|---|
| Vehicle V1 vs V2 | MAE/RMSE comparison, throughput benchmark, scatter plot, error heatmaps, grid counts | vehicle-counting-report.pages.dev |
| Car Flow Cross-Domain | V2 model evaluated on Project 8 (car_flow) test set | car-flow-eval.pages.dev |
| Person Counting | Training curve, scatter plot, error distribution, visual comparisons | person-counting-report.pages.dev |
| Fire/Smoke Full | Full test set metrics, Grad-CAM, throughput benchmark | kaggle-reports: full report |
| Fire/Smoke SK | SK field independent test, per-channel analysis, deployment readiness | kaggle-reports: SK report |
```shell
# Vehicle V1 (fp32)
~/vehicle_counting/runs/20260412_081835/best_full.pt            # 90 MB
# Vehicle V2 (fp32)
~/vehicle_counting/runs/20260412_094521_density/best_full.pt    # 102 MB
# Person (fp32)
~/person_counting/runs/20260412_222716/best_full.pt             # 102 MB

# Training scripts
~/vehicle_counting/train_density.py           # Vehicle V2 training code
~/person_counting/train_person_density.py     # Person training code
```
Generated: 2026-04-13 | Training: gx10 (NVIDIA GB10) | CVAT: raicvat.intemotech.com | Models R2: rai-model-download.workers.dev