Vehicle Counting / Person Counting / Fire-Smoke Detection | Updated: 2026-04-13
| Model | Task | Architecture | Input | Output | Key Metric |
|---|---|---|---|---|---|
| Vehicle V2 | 5-class counting | ResNet50 + FPN Decoder | 384x384 RGB | (5, 96, 96) density map | MAE 1.523 |
| Person | Person counting | ResNet50 + FPN Decoder | 384x384 RGB | (1, 96, 96) density map | MAE 1.798 |
| Fire/Smoke | Multi-label classification | MobileNetV3-Large | 224x224 RGB | (2,) logits [smoke, fire] | mAP 0.989 |
| Vehicle V1 | 5-class counting (regression) | ResNet50 + FC head | 384x384 RGB | (5,) log1p counts | MAE 1.744 |
All models are stored with torch.save(model) as full models, so no class definition is needed to load them.
```python
# Loading (same pattern for all models)
import torch

# fp16 models (Vehicle V2, Person) — convert back to fp32 for inference
model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float()  # fp16 -> fp32
model.eval()

# fp32 models (Vehicle V1, Fire/Smoke) — usable as-is
model = torch.load("vehicle_v1_regression.pt", map_location="cpu", weights_only=False)
model.eval()
```
5 classes: car, truck, bus, motorcycle, bicycle
Input: (B, 3, 384, 384) — RGB, ImageNet normalized
Output: (B, 5, 96, 96) — 5-channel density map at stride 4
ch0=car, ch1=truck, ch2=bus, ch3=motorcycle, ch4=bicycle
Backbone: ResNet50 (ImageNet V2 pretrained)
Decoder: FPN with lateral connections (layer1-4) + 3-conv head
Activation: ReLU (non-negative density)
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("vehicle_v2_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle"]

tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)  # (1, 5, 96, 96)

counts = density_map.sum(dim=[2, 3])  # (1, 5) — per-class counts
for i, cls in enumerate(CLASSES):
    print(f"{cls}: {counts[0, i].item():.1f}")

# ROI counting (e.g. left half of the image)
roi_count = density_map[0, 0, :, :48].sum().item()  # car count in left half
```
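Since counts are spatial sums of the density map, any region of interest reduces to a masked sum. A minimal numpy sketch on a synthetic map (the 5-channel 96x96 shape follows the model spec above; the 4x4 grid is an arbitrary illustrative choice):

```python
import numpy as np

# Synthetic 5-class density map at the model's output resolution (5, 96, 96).
rng = np.random.default_rng(0)
density = rng.random((5, 96, 96)).astype(np.float32) * 0.01

# Count inside an arbitrary binary ROI mask (here: left half of the frame).
roi = np.zeros((96, 96), dtype=bool)
roi[:, :48] = True
car_in_roi = float((density[0] * roi).sum())

# Per-cell grid counts: split the 96x96 map into a 4x4 grid and sum each cell.
grid = density[0].reshape(4, 24, 4, 24).sum(axis=(1, 3))  # (4, 4) car counts

# Sanity check: the grid cells partition the same total mass.
assert np.isclose(grid.sum(), density[0].sum())
print(car_in_roi, grid.shape)
```

The same masked-sum pattern extends to polygonal zones: rasterize the polygon to a 96x96 boolean mask once, then multiply and sum per frame.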
| Class | MAE | RMSE |
|---|---|---|
| car | 2.90 | 5.26 |
| motorcycle | 2.32 | 4.84 |
| bicycle | 1.10 | 2.22 |
| truck | 0.80 | 1.72 |
| bus | 0.51 | 1.04 |
| Mean | 1.523 | 3.016 |
| Batch | img/s | ms/img | VRAM |
|---|---|---|---|
| 1 | 325.6 | 3.07 | 106 MiB |
| 4 (best) | 408.4 | 2.45 | 149 MiB |
| 32 | 325.2 | 3.08 | 830 MiB |
Single class: person | Gaussian sigma=2.5
Input: (B, 3, 384, 384) — RGB, ImageNet normalized
Output: (B, 1, 96, 96) — 1-channel person density map at stride 4
Same backbone + decoder as Vehicle V2, num_classes=1 instead of 5.
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("person_density.pt", map_location="cuda", weights_only=False)
model.float().eval().cuda()

tf = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
x = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    density_map = model(x)  # (1, 1, 96, 96)

person_count = density_map.sum().item()  # total person count
print(f"Person count: {person_count:.1f}")
```
| Metric | Value |
|---|---|
| MAE | 1.798 |
| RMSE | 3.218 |
| Median Error | 0.94 |
| Error ≤ 1 person | 52.1% |
| Error ≤ 3 persons | 81.9% |
| GT Range | 1 ~ 92 persons |
| Mean GT / Pred | 7.44 / 7.15 (minimal bias) |
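For reference, the metrics in the table are the standard count-regression definitions. A minimal sketch with toy (predicted, ground-truth) pairs — illustrative numbers only, not the real evaluation data:

```python
import math

# Toy per-image (predicted, ground-truth) person counts — illustrative only.
pairs = [(7.2, 7), (3.9, 5), (12.4, 12), (1.1, 1)]

errors = [p - g for p, g in pairs]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
within_1 = sum(abs(e) <= 1 for e in errors) / len(errors)  # "Error <= 1 person" rate

print(f"MAE={mae:.3f} RMSE={rmse:.3f} within-1={within_1:.0%}")
```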
Multi-label: smoke, fire | BCEWithLogitsLoss + Sigmoid
Input: (B, 3, 224, 224) — RGB, ImageNet normalized
Output: (B, 2) — raw logits [smoke, fire]
Apply torch.sigmoid() to get probabilities
Backbone: MobileNetV3-Large (ImageNet pretrained)
Head: FC -> 2 outputs
Loss: BCEWithLogitsLoss (multi-label, NOT softmax)
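The multi-label design matters: with independent sigmoids, smoke and fire can both score high at once, whereas softmax would force the two probabilities to compete. A quick pure-Python illustration with toy logits:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [3.0, 3.0]  # toy case: strong evidence for both smoke and fire

sig = [sigmoid(x) for x in logits]   # both high — both labels can trigger
soft = softmax(logits)               # forced to [0.5, 0.5] — neither looks confident

print(sig, soft)
```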
```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.load("fire_smoke.pt", map_location="cuda", weights_only=False)
model.eval().cuda()

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("test.jpg").convert("RGB")
inp = tf(img).unsqueeze(0).cuda()

with torch.no_grad():
    logits = model(inp)

probs = torch.sigmoid(logits)[0]
smoke_prob = probs[0].item()  # 0~1
fire_prob = probs[1].item()   # 0~1

# Thresholds
SMOKE_THRESH = 0.32   # recommended for the SK field (general: 0.004)
FIRE_THRESH = 0.057   # optimal on the full test set

print(f"Smoke: {smoke_prob:.3f} ({'YES' if smoke_prob >= SMOKE_THRESH else 'NO'})")
print(f"Fire: {fire_prob:.3f} ({'YES' if fire_prob >= FIRE_THRESH else 'NO'})")
```
Threshold reference:
- 0.004 — best over the full dataset (AP=0.985, high recall)
- 0.320 — recommended for the SK field (Precision 95.8%, Recall 61.5%)
- 0.057 — AP=0.993 on the full test set; unverifiable in the SK field (no fire samples)

Full test set:
| Metric | Value |
|---|---|
| mAP | 0.9892 |
| Smoke AP | 0.9851 (t=0.004) |
| Fire AP | 0.9933 (t=0.057) |
| Smoke P/R/F1 | 94.0% / 98.9% / 0.964 |
| Fire P/R/F1 | 95.2% / 97.4% / 0.963 |

SK field:
| Metric | Value |
|---|---|
| SK mAP | 0.4580 |
| Smoke AP | 0.9160 (t=0.320) |
| Fire AP | 0.0000 (no fire samples) |
| Smoke P/R/F1 | 95.8% / 61.5% / 0.749 |
| Domain gap | Significant (-53%) |
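Frame-level probabilities fluctuate in video streams, so single-frame thresholding can produce spurious alarms. One common deployment-side mitigation (an assumption here, not something shipped with the model) is to smooth each channel's probability over a short window before applying the thresholds above:

```python
from collections import deque

SMOKE_THRESH = 0.32  # SK-field value from the section above

class SmoothedAlarm:
    """Moving average over the last `window` frame probabilities.
    The buffer starts zero-filled so a single spike cannot trip the alarm.
    Illustrative only — the window size is an arbitrary choice."""

    def __init__(self, thresh: float, window: int = 5):
        self.thresh = thresh
        self.window = window
        self.buf = deque([0.0] * window, maxlen=window)

    def update(self, prob: float) -> bool:
        self.buf.append(prob)
        return sum(self.buf) / self.window >= self.thresh

smoke = SmoothedAlarm(SMOKE_THRESH, window=5)
# One spike does not trip the alarm; sustained high probability does.
states = [smoke.update(p) for p in [0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
print(states)
```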
Peak: 4,534 img/s (fp16, batch 32)
DeepStream uses TensorRT. Convert PyTorch models via ONNX:
```python
# Step 1: Export to ONNX
import torch

model = torch.load("vehicle_v2_density.pt", map_location="cpu", weights_only=False)
model.float().eval()

dummy = torch.randn(1, 3, 384, 384)
torch.onnx.export(
    model, dummy, "vehicle_v2_density.onnx",
    input_names=["input"],
    output_names=["density_map"],
    dynamic_axes={"input": {0: "batch"}, "density_map": {0: "batch"}},
    opset_version=17,
)
```

```shell
# Step 2: Convert ONNX to TensorRT engine
trtexec --onnx=vehicle_v2_density.onnx \
        --saveEngine=vehicle_v2_density.engine \
        --fp16 --workspace=4096 \
        --minShapes=input:1x3x384x384 \
        --optShapes=input:4x3x384x384 \
        --maxShapes=input:16x3x384x384
```
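Before building the TensorRT engine, it is worth sanity-checking that the ONNX export matches PyTorch numerically: run both on the same input and compare. The comparison helper below needs only numpy; the onnxruntime/torch calls are sketched in comments (assumes onnxruntime is installed — the tolerance is a judgment call):

```python
import numpy as np

def parity_ok(ref: np.ndarray, test: np.ndarray, atol: float = 1e-3) -> bool:
    """True when the two outputs agree within `atol` (max absolute difference)."""
    return float(np.abs(ref - test).max()) <= atol

# In practice (sketch, assuming onnxruntime is available):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("vehicle_v2_density.onnx")
#   onnx_out = sess.run(None, {"input": x.numpy()})[0]
#   with torch.no_grad():
#       torch_out = model(x).numpy()
#   assert parity_ok(torch_out, onnx_out)

# Synthetic demo of the helper:
a = np.random.rand(1, 5, 96, 96).astype(np.float32)
print(parity_ok(a, a + 1e-5), parity_ok(a, a + 0.1))
```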
Post-processing in DeepStream:
- Density map models (Vehicle V2, Person): output tensor (B, C, 96, 96) float; sum over spatial dims → per-class counts; no NMS needed.
- Fire/Smoke model: output tensor (B, 2) float logits; sigmoid → threshold comparison; no NMS needed.
- All three models are "classifiers" from DeepStream's perspective: no bounding boxes, no NMS, custom post-processing via a probe function or custom parser.
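TensorRT typically hands the application a flat output buffer, so the density-map post-processing above is just a reshape plus a spatial sum. A numpy sketch on a synthetic buffer (shapes follow the Vehicle V2 spec; no TensorRT calls involved):

```python
import numpy as np

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle"]
B, C, H, W = 1, 5, 96, 96

# Pretend this came out of the TensorRT engine as a flat float32 buffer.
flat = np.random.rand(B * C * H * W).astype(np.float32) * 0.001

density = flat.reshape(B, C, H, W)
counts = density.sum(axis=(2, 3))  # (B, C) per-class counts — no NMS involved

for name, n in zip(CLASSES, counts[0]):
    print(f"{name}: {n:.1f}")
```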
| Parameter | Vehicle / Person | Fire / Smoke |
|---|---|---|
| Input size | 384 x 384 | 224 x 224 |
| Color format | RGB (not BGR) | RGB (not BGR) |
| Scaling | / 255.0 | / 255.0 |
| Mean subtraction | [0.485, 0.456, 0.406] | [0.485, 0.456, 0.406] |
| Std division | [0.229, 0.224, 0.225] | [0.229, 0.224, 0.225] |
nvinfer config parameters:

```
net-scale-factor=0.00392156862
offsets=123.675;116.28;103.53
model-color-format=0
```

(1/255, mean × 255, and RGB respectively.) Note: `net-scale-factor` is a single scalar, so the per-channel std division used in the PyTorch pipeline cannot be expressed in this config; for exact parity, fold the normalization into the exported model or use custom preprocessing.
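The config values above follow directly from the PyTorch preprocessing constants; a quick check of the arithmetic in pure Python:

```python
# net-scale-factor is 1/255; offsets are the ImageNet means scaled to [0, 255].
mean = [0.485, 0.456, 0.406]

net_scale_factor = 1.0 / 255.0
offsets = [round(m * 255, 3) for m in mean]

print(f"{net_scale_factor:.11f}", offsets)
```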
| Report | Contents | Link |
|---|---|---|
| Vehicle V1 vs V2 | MAE/RMSE comparison, throughput benchmark, scatter plot, error heatmaps, grid counts | vehicle-counting-report.pages.dev |
| Car Flow Cross-Domain | V2 model evaluated on Project 8 (car_flow) test set | car-flow-eval.pages.dev |
| Person Counting | Training curve, scatter plot, error distribution, visual comparisons | person-counting-report.pages.dev |
| Fire/Smoke Full | Full test set metrics, Grad-CAM, throughput benchmark | kaggle-reports: full report |
| Fire/Smoke SK | SK field independent test, per-channel analysis, deployment readiness | kaggle-reports: SK report |
```shell
# Vehicle V1 (fp32)
~/vehicle_counting/runs/20260412_081835/best_full.pt            # 90 MB
# Vehicle V2 (fp32)
~/vehicle_counting/runs/20260412_094521_density/best_full.pt    # 102 MB
# Person (fp32)
~/person_counting/runs/20260412_222716/best_full.pt             # 102 MB

# Training scripts
~/vehicle_counting/train_density.py           # Vehicle V2 training code
~/person_counting/train_person_density.py     # Person training code
```
Generated: 2026-04-13 | Training: gx10 (NVIDIA GB10) | CVAT: raicvat.intemotech.com | Models R2: rai-model-download.workers.dev