
feat: Edge deployment optimization with GPU-native BEV, modern CUDA ops, and gradient accumulation #12

Open
fthbng77 wants to merge 5 commits into TRV-Lab:master from fthbng77:master

Conversation

@fthbng77

Summary

This PR introduces several improvements aimed at edge deployment optimization
and modern PyTorch compatibility:

  • GPU-native GridDensityBEV module: Replaces CQCA_cfa's CPU-based DBSCAN
    clustering with a GPU-native grid density approach for edge deployment scenarios
  • Modern CUDA ops compatibility: Replaces deprecated THC/THC.h headers with
    ATen/cuda/CUDAContext.h across all CUDA extension files, removing obsolete
    extern THCState declarations for PyTorch 2.x support
  • Gradient accumulation support: Enables training on memory-constrained GPUs
    (8GB VRAM) by reducing batch size to 1 with 4 accumulation steps
  • Cleanup: Removes pre-compiled .so binaries from tracking (should be built
    from source per environment)
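The grid-density idea behind GridDensityBEV can be sketched as follows: instead of running DBSCAN on the CPU, points are binned into a 2D BEV grid entirely with GPU tensor ops, and cells above a point-count threshold are kept as "dense". This is an illustrative sketch only; the function name, grid ranges, and threshold here are assumptions, not the PR's actual API.

```python
import torch

def grid_density_bev(points, grid_size=0.5, x_range=(0.0, 51.2),
                     y_range=(-25.6, 25.6), min_points=3):
    """Hypothetical grid-density BEV occupancy sketch.

    Bins points of shape (N, 3+) into a BEV grid and keeps cells whose
    point count reaches `min_points` -- a GPU-friendly stand-in for
    CPU-based DBSCAN clustering. All defaults are illustrative.
    """
    nx = int((x_range[1] - x_range[0]) / grid_size)
    ny = int((y_range[1] - y_range[0]) / grid_size)

    # Map each point's (x, y) to integer grid indices, clamped to bounds.
    ix = ((points[:, 0] - x_range[0]) / grid_size).long().clamp(0, nx - 1)
    iy = ((points[:, 1] - y_range[0]) / grid_size).long().clamp(0, ny - 1)

    # Count points per cell with a single scatter-add (no CPU round-trip).
    density = torch.zeros(nx * ny, dtype=torch.long, device=points.device)
    density.scatter_add_(0, ix * ny + iy, torch.ones_like(ix))

    return (density >= min_points).view(nx, ny)  # boolean dense-cell mask
```

Because `scatter_add_` runs on whatever device the points live on, the same code path works on CUDA tensors with no host synchronization, which is the property that matters for edge deployment.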

Changes

  • pcdet/models/backbones_image/CQCA_cfa.py — New GridDensityBEV module
  • pcdet/ops/**/src/*.cpp — Updated CUDA extension headers for PyTorch 2.x
  • tools/train_utils/train_utils.py — Gradient accumulation in training loop
  • tools/cfgs/MAFF-Net/MAFF-Net_vod.yaml — Config updates for edge deployment
  • Removed all pre-compiled .so files from version control
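The gradient-accumulation change in `train_one_epoch` follows the standard pattern: scale each micro-batch loss by the accumulation count, call `backward()` to accumulate into `.grad`, and only step the optimizer every `ACCUMULATION_STEPS` iterations. The sketch below uses a toy model; the loop structure (not the PR's exact code) is what it illustrates.

```python
import torch
import torch.nn as nn

ACCUMULATION_STEPS = 4  # BATCH_SIZE_PER_GPU=1 x 4 steps ~ effective batch of 4

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Toy stand-in for a dataloader yielding micro-batches of size 1.
data = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    # Scale so the accumulated sum matches the mean over the effective batch.
    loss = loss_fn(model(x), y) / ACCUMULATION_STEPS
    loss.backward()  # gradients accumulate in param.grad between steps
    if (step + 1) % ACCUMULATION_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by the step count keeps the gradient magnitude (and hence the effective learning rate) consistent with what a true batch of 4 would produce.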

Motivation

  • Enable MAFF-Net deployment on edge devices with limited GPU resources
  • Fix build failures with PyTorch >= 2.0 due to deprecated THC headers
  • Allow training on consumer GPUs (e.g., RTX 4060 8GB) via gradient accumulation

fthbng77 and others added 5 commits February 15, 2026 14:36
- Replace deprecated THC/THC.h with ATen/cuda/CUDAContext.h in all CUDA extension cpp files
- Remove obsolete extern THCState declarations
- Update VoD dataset paths to match local directory structure
- Add .gitignore for build artifacts, data, and IDE files
- Add anchor visualization tool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reduce BATCH_SIZE_PER_GPU to 1 with ACCUMULATION_STEPS=4 for 8GB VRAM
- Implement gradient accumulation in train_one_epoch
- Remove compiled .so files from git (already in .gitignore)
…fa's CPU-based DBSCAN for edge optimization, updating the image backbone configuration and documentation.
…ataset, leveraging GridDensityBEV, AMP, and a dedicated configuration.
