
Multi-person pose detection from single ESP32 CSI stream #97

@ruvnet


Problem

Currently the signal-derived pose estimation detects at most 1 person from a single ESP32 CSI stream. When 2+ people are in the room, only one is reported. This was confirmed during live testing with ESP32 hardware connected: GET /api/v1/pose/current returns only 1 person even when 2 people are present.

Root Cause

The current derive_pose_from_sensing() function in sensing-server/src/main.rs generates a single synthetic skeleton from aggregate CSI features (motion score, dominant frequency, spectral centroid). It has no mechanism to:

  1. Separate individual contributions from the CSI amplitude/phase data
  2. Estimate person count from the signal
  3. Generate distinct skeletons with different positions/poses

Proposed Solution (ADR-037)

Phase 1: Person Count Estimation

  • Use CSI signal variance, eigenvalue spread, or spectral complexity to estimate the number of people
  • Threshold-based approach: motion energy more than N standard deviations (N sigma) above a single-occupant baseline suggests multiple occupants
  • Frequency-domain decomposition: distinct motion frequencies indicate separate individuals
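The eigenvalue-spread idea above (and the "SVD eigenstructure" change planned for field_model.rs) can be sketched as follows. This is a minimal illustration, not the planned implementation: `estimate_person_count`, the relative threshold, and the single-node cap are hypothetical. It builds the subcarrier covariance of a window of CSI amplitudes, extracts its eigenvalues with the cyclic Jacobi method, and counts eigenvalues that are a non-negligible fraction of the largest one. Only moving people add temporal variance, so a stationary occupant may not be counted.

```rust
/// Sample covariance (subcarriers x subcarriers) of a CSI amplitude window,
/// where `window[t][s]` is the amplitude of subcarrier `s` in frame `t`.
fn covariance(window: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (t_len, n) = (window.len() as f64, window[0].len());
    let mut mean = vec![0.0; n];
    for frame in window {
        for (m, x) in mean.iter_mut().zip(frame) {
            *m += x / t_len;
        }
    }
    let mut cov = vec![vec![0.0; n]; n];
    for frame in window {
        for i in 0..n {
            for j in 0..n {
                cov[i][j] += (frame[i] - mean[i]) * (frame[j] - mean[j]) / (t_len - 1.0);
            }
        }
    }
    cov
}

/// Eigenvalues of a symmetric matrix via cyclic Jacobi rotations.
fn jacobi_eigenvalues(mut a: Vec<Vec<f64>>) -> Vec<f64> {
    let n = a.len();
    for _sweep in 0..100 {
        // Stop once the off-diagonal mass is negligible.
        let mut off = 0.0;
        for i in 0..n {
            for j in 0..n {
                if i != j {
                    off += a[i][j] * a[i][j];
                }
            }
        }
        if off < 1e-20 {
            break;
        }
        for p in 0..n {
            for q in (p + 1)..n {
                let (apq, app, aqq) = (a[p][q], a[p][p], a[q][q]);
                if apq.abs() < 1e-300 {
                    continue;
                }
                // Rotation angle that zeroes a[p][q] (Numerical Recipes form).
                let theta = (aqq - app) / (2.0 * apq);
                let t = theta.signum() / (theta.abs() + (theta * theta + 1.0).sqrt());
                let c = 1.0 / (t * t + 1.0).sqrt();
                let s = t * c;
                for r in 0..n {
                    if r != p && r != q {
                        let (arp, arq) = (a[r][p], a[r][q]);
                        a[r][p] = c * arp - s * arq;
                        a[p][r] = a[r][p];
                        a[r][q] = c * arq + s * arp;
                        a[q][r] = a[r][q];
                    }
                }
                a[p][p] = app - t * apq;
                a[q][q] = aqq + t * apq;
                a[p][q] = 0.0;
                a[q][p] = 0.0;
            }
        }
    }
    (0..n).map(|i| a[i][i]).collect()
}

/// Hypothetical estimator: number of eigenvalues above `rel_threshold` times
/// the largest one, capped at `max_people` (e.g. 3-4 for a single node).
fn estimate_person_count(window: &[Vec<f64>], rel_threshold: f64, max_people: usize) -> usize {
    assert!(window.len() >= 2, "need at least two frames");
    let mut eig = jacobi_eigenvalues(covariance(window));
    eig.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let largest = eig[0];
    if largest <= 1e-12 {
        return 0; // no temporal variance: nothing is moving in this window
    }
    eig.iter().filter(|&&l| l > rel_threshold * largest).count().min(max_people)
}
```

A relative threshold keeps the estimate independent of absolute signal level; the cap reflects the single-node ceiling listed under Constraints. A production version would more likely call a linear-algebra crate (for example nalgebra's `SymmetricEigen`) than hand-roll Jacobi.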

Phase 2: Signal Decomposition

  • ICA (Independent Component Analysis): Decompose CSI subcarrier matrix into independent source signals
  • NMF (Non-negative Matrix Factorization): Separate additive contributions from multiple scatterers
  • Clustering: Group subcarrier responses by spatial coherence to identify distinct reflectors
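Of the three Phase 2 options, NMF is the simplest to prototype. The sketch below applies the standard Lee-Seung multiplicative updates (Frobenius loss) to a subcarriers x frames amplitude matrix V, so that V ≈ W H: each column of W is read as one scatterer's subcarrier signature and each row of H as its activation over time. It assumes, as the NMF bullet does, that contributions are approximately additive in amplitude; `nmf`, its deterministic initialisation, and the fixed iteration count are illustrative choices, not existing code.

```rust
/// Dense matrix product on row-major `Vec<Vec<f64>>`.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    a.iter()
        .map(|row| {
            (0..b[0].len())
                .map(|j| row.iter().zip(b).map(|(x, brow)| x * brow[j]).sum::<f64>())
                .collect()
        })
        .collect()
}

/// Frobenius norm of V - W H.
fn reconstruction_error(v: &[Vec<f64>], w: &[Vec<f64>], h: &[Vec<f64>]) -> f64 {
    let wh = matmul(w, h);
    v.iter()
        .zip(&wh)
        .flat_map(|(vr, whr)| vr.iter().zip(whr).map(|(a, b)| (a - b).powi(2)))
        .sum::<f64>()
        .sqrt()
}

/// Factorises non-negative V (m x n) into W (m x k) and H (k x n).
fn nmf(v: &[Vec<f64>], k: usize, iters: usize) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    const EPS: f64 = 1e-12; // guards against division by zero
    let (m, n) = (v.len(), v[0].len());
    // Deterministic positive start so the sketch is reproducible; random init is more usual.
    let mut w: Vec<Vec<f64>> = (0..m)
        .map(|i| (0..k).map(|r| 0.5 + 0.1 * ((i * 7 + r * 3) % 5) as f64).collect())
        .collect();
    let mut h: Vec<Vec<f64>> = (0..k)
        .map(|r| (0..n).map(|j| 0.5 + 0.1 * ((r * 5 + j * 3) % 7) as f64).collect())
        .collect();
    for _ in 0..iters {
        // H <- H .* (W^T V) ./ (W^T W H)
        let wh = matmul(&w, &h);
        for r in 0..k {
            for j in 0..n {
                let num: f64 = (0..m).map(|i| w[i][r] * v[i][j]).sum();
                let den: f64 = (0..m).map(|i| w[i][r] * wh[i][j]).sum();
                h[r][j] *= num / (den + EPS);
            }
        }
        // W <- W .* (V H^T) ./ (W H H^T), using the freshly updated H
        let wh = matmul(&w, &h);
        for i in 0..m {
            for r in 0..k {
                let num: f64 = (0..n).map(|j| v[i][j] * h[r][j]).sum();
                let den: f64 = (0..n).map(|j| wh[i][j] * h[r][j]).sum();
                w[i][r] *= num / (den + EPS);
            }
        }
    }
    (w, h)
}
```

In practice k would come from the Phase 1 person-count estimate. Multiplicative updates keep W and H non-negative automatically and never increase the reconstruction error, which makes them a convenient baseline before trying ICA.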

Phase 3: Multi-Skeleton Generation

  • Map each decomposed signal component to a separate skeleton
  • Use spatial diversity from subcarrier phase to estimate relative positions
  • Kalman tracking per person with ID assignment via AETHER re-ID embeddings (ADR-024)
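The per-person tracking step might look like the sketch below. It runs an independent constant-velocity Kalman filter per axis for each track. As a stand-in for AETHER re-ID embeddings (not shown here), it assigns detections to tracks by greedy nearest-neighbour matching inside a distance gate, a simpler technique than embedding-based re-ID. `Tracker`, `Track`, `Axis`, and the noise, gate, and `MAX_MISSED` values are all hypothetical and do not describe pose_tracker.rs.

```rust
/// Constant-velocity Kalman filter along one axis. State: [position, velocity].
struct Axis {
    state: [f64; 2],
    cov: [[f64; 2]; 2],
}

impl Axis {
    fn new(pos: f64) -> Self {
        Axis { state: [pos, 0.0], cov: [[1.0, 0.0], [0.0, 1.0]] }
    }

    /// x <- F x, P <- F P F^T + Q with F = [[1, dt], [0, 1]] and a simplified Q = q I.
    fn predict(&mut self, dt: f64, q: f64) {
        self.state[0] += dt * self.state[1];
        let p = self.cov;
        self.cov[0][0] = p[0][0] + dt * (p[0][1] + p[1][0]) + dt * dt * p[1][1] + q;
        self.cov[0][1] = p[0][1] + dt * p[1][1];
        self.cov[1][0] = p[1][0] + dt * p[1][1];
        self.cov[1][1] = p[1][1] + q;
    }

    /// Position-only measurement update (H = [1, 0], measurement variance r).
    fn update(&mut self, z: f64, r: f64) {
        let p = self.cov;
        let s = p[0][0] + r; // innovation variance
        let k = [p[0][0] / s, p[1][0] / s]; // Kalman gain
        let innovation = z - self.state[0];
        self.state[0] += k[0] * innovation;
        self.state[1] += k[1] * innovation;
        // P <- (I - K H) P
        self.cov = [
            [(1.0 - k[0]) * p[0][0], (1.0 - k[0]) * p[0][1]],
            [p[1][0] - k[1] * p[0][0], p[1][1] - k[1] * p[0][1]],
        ];
    }
}

struct Track { id: u32, ax: Axis, ay: Axis, missed: u32 }

struct Tracker { tracks: Vec<Track>, next_id: u32, gate: f64 }

impl Tracker {
    fn new(gate: f64) -> Self {
        Tracker { tracks: Vec::new(), next_id: 0, gate }
    }

    /// Advances all tracks by `dt` seconds and returns the track ID assigned to
    /// each detection (x, y), in input order.
    fn step(&mut self, detections: &[(f64, f64)], dt: f64) -> Vec<u32> {
        const Q: f64 = 0.01; // assumed process noise
        const R: f64 = 0.05; // assumed measurement noise
        const MAX_MISSED: u32 = 5; // frames a track may go unmatched before removal
        for t in &mut self.tracks {
            t.ax.predict(dt, Q);
            t.ay.predict(dt, Q);
        }
        // Candidate (distance, track, detection) pairs inside the gate, closest first.
        let mut pairs = Vec::new();
        for (ti, t) in self.tracks.iter().enumerate() {
            for (di, &(x, y)) in detections.iter().enumerate() {
                let d = ((t.ax.state[0] - x).powi(2) + (t.ay.state[0] - y).powi(2)).sqrt();
                if d < self.gate {
                    pairs.push((d, ti, di));
                }
            }
        }
        pairs.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
        let mut ids: Vec<Option<u32>> = vec![None; detections.len()];
        let mut track_used = vec![false; self.tracks.len()];
        for (_, ti, di) in pairs {
            if track_used[ti] || ids[di].is_some() {
                continue;
            }
            track_used[ti] = true;
            let t = &mut self.tracks[ti];
            t.ax.update(detections[di].0, R);
            t.ay.update(detections[di].1, R);
            t.missed = 0;
            ids[di] = Some(t.id);
        }
        for (t, used) in self.tracks.iter_mut().zip(&track_used) {
            if !*used {
                t.missed += 1;
            }
        }
        self.tracks.retain(|t| t.missed <= MAX_MISSED);
        // Unmatched detections start new tracks with fresh IDs.
        for (di, id) in ids.iter_mut().enumerate() {
            if id.is_none() {
                let (x, y) = detections[di];
                self.tracks.push(Track { id: self.next_id, ax: Axis::new(x), ay: Axis::new(y), missed: 0 });
                *id = Some(self.next_id);
                self.next_id += 1;
            }
        }
        ids.into_iter().map(|id| id.unwrap()).collect()
    }
}
```

Greedy gating is cheap and adequate when people are well separated. When two tracks cross, appearance cues such as the AETHER embeddings are what keep IDs from swapping.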

Phase 4: Neural Model Enhancement

  • Train multi-person model on MM-Fi dataset (ADR-015) which includes multi-person scenarios
  • Use the RVF training pipeline (ADR-036) to fine-tune with recorded CSI data
  • LoRA profile for multi-person specialization
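A LoRA profile keeps the base weights W frozen and stores only a low-rank delta, y = W x + (alpha / r) B A x, so a multi-person specialization could in principle be shipped as a pair of small A and B matrices per adapted layer. The sketch shows the forward pass of one such layer; `lora_forward` is hypothetical, and how the nn/ crate organises profiles is not specified in this issue.

```rust
/// Dense matrix-vector product; `m` is row-major (rows x cols).
fn matvec(m: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    m.iter().map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f64>()).collect()
}

/// LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).
/// W (out x in) is frozen; A (r x in) and B (out x r) form the profile.
fn lora_forward(w: &[Vec<f64>], a: &[Vec<f64>], b: &[Vec<f64>], alpha: f64, x: &[f64]) -> Vec<f64> {
    let r = a.len() as f64; // adapter rank
    let base = matvec(w, x);
    let delta = matvec(b, &matvec(a, x));
    base.iter().zip(&delta).map(|(y, d)| y + alpha / r * d).collect()
}
```

With B initialised to zero, as is usual for LoRA, the adapted layer starts out identical to the base model, so fine-tuning on recorded multi-person CSI only moves it as far as the data requires.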

Affected Components

Component | Change Required
--- | ---
sensing-server/src/main.rs | derive_pose_from_sensing(): multi-person output
signal/src/ruvsense/field_model.rs | SVD eigenstructure for person count estimation
signal/src/ruvsense/pose_tracker.rs | Multi-target Kalman tracking
ruvector/src/viewpoint/fusion.rs | Multi-person fusion from multistatic array
nn/ | Multi-person inference head
ui/components/PoseDetectionCanvas.js | Already supports multi-person rendering
ui/utils/pose-renderer.js | Already iterates over persons[] array
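On the server side, the multi-person output of derive_pose_from_sensing() could take roughly the shape below: one entry per decomposed component, matching the persons[] array the renderer already iterates over. `SignalComponent`, `PersonPose`, `derive_persons`, and the 5-point template are hypothetical placeholders rather than the existing types in main.rs, and JSON serialisation is omitted.

```rust
/// One decomposed signal component (hypothetical Phase 2 output).
struct SignalComponent {
    position: (f64, f64),
    energy: f64,
}

/// One entry of the `persons[]` array. `id` is a per-frame index here;
/// persistent IDs would come from the Phase 3 tracker.
#[derive(Debug, Clone, PartialEq)]
struct PersonPose {
    id: usize,
    position: (f64, f64),
    keypoints: Vec<(f64, f64)>,
    confidence: f64,
}

/// Hypothetical 5-point skeleton (head, shoulders, feet) as offsets from the
/// person's position, in the same 2-D plane the canvas draws in.
const TEMPLATE: [(f64, f64); 5] = [(0.0, 1.7), (-0.2, 1.4), (0.2, 1.4), (-0.1, 0.0), (0.1, 0.0)];

/// Emits one skeleton per component whose energy clears `min_energy`.
fn derive_persons(components: &[SignalComponent], min_energy: f64) -> Vec<PersonPose> {
    let kept: Vec<&SignalComponent> = components.iter().filter(|c| c.energy >= min_energy).collect();
    let total: f64 = kept.iter().map(|c| c.energy).sum();
    kept.iter()
        .enumerate()
        .map(|(id, c)| PersonPose {
            id,
            position: c.position,
            keypoints: TEMPLATE.iter().map(|&(dx, dy)| (c.position.0 + dx, c.position.1 + dy)).collect(),
            // Share of retained signal energy, used as a crude confidence.
            confidence: if total > 0.0 { c.energy / total } else { 0.0 },
        })
        .collect()
}
```

Placing a fixed template at each component's position is enough to satisfy "distinct position and keypoint coordinates" for the UI; per-person pose variation would have to come from each component's own features or from the neural head.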

Constraints

  • Single ESP32 node provides 1 TX × 1 RX × 56 subcarriers — limited spatial resolution
  • Multi-person separation improves significantly with multiple ESP32 nodes (multistatic mesh, ADR-029)
  • Signal-derived approach will have lower accuracy than neural model approach
  • Person count estimation ceiling: ~3-4 people for a single node, ~8+ for a mesh
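A rough figure behind "limited spatial resolution": with one TX and one RX antenna there is no angle information, and a link's delay resolution is about 1/B. Assuming 802.11n HT20 subcarrier spacing of 312.5 kHz (an assumption; the issue gives only the subcarrier count), 56 subcarriers span about 17.5 MHz, i.e. roughly 57 ns of delay resolution. Reflection paths whose lengths differ by less than about 17 m therefore cannot be separated by delay alone, so people in the same room must be separated by temporal or frequency structure (Phases 1 and 2) rather than by range. `path_length_resolution_m` is a hypothetical helper.

```rust
/// Speed of light in vacuum, m/s.
const C: f64 = 299_792_458.0;

/// Path-length resolution c / B of a single link, approximating the occupied
/// bandwidth B as `n_subcarriers * spacing_hz`.
fn path_length_resolution_m(n_subcarriers: usize, spacing_hz: f64) -> f64 {
    let bandwidth_hz = n_subcarriers as f64 * spacing_hz;
    C / bandwidth_hz
}
```

For 56 subcarriers at 312.5 kHz this gives about 17.1 m.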

Acceptance Criteria

  • Person count estimation from CSI features (accuracy > 80% for 1-3 people)
  • derive_pose_from_sensing() returns multiple persons when detected
  • Each person has distinct position and keypoint coordinates
  • Kalman tracking maintains person IDs across frames
  • UI renders multiple skeletons simultaneously
  • ADR-037 documents the approach and trade-offs
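The first and fourth criteria need a concrete measurement. One possible reading, sketched below, is per-frame exact-match count accuracy restricted to frames with 1-3 people, plus a count of identity switches for the tracking criterion. Both metrics, the function names, and the availability of labelled ground truth from recorded sessions are assumptions; the issue does not define how accuracy is scored.

```rust
use std::collections::HashMap;

/// Exact-match person-count accuracy over frames whose true count is 1..=3.
/// Returns None when no frame falls in that range.
fn count_accuracy(truth: &[usize], predicted: &[usize]) -> Option<f64> {
    let pairs: Vec<(&usize, &usize)> = truth
        .iter()
        .zip(predicted)
        .filter(|&(t, _)| (1..=3).contains(t))
        .collect();
    if pairs.is_empty() {
        return None;
    }
    let correct = pairs.iter().filter(|&&(t, p)| t == p).count();
    Some(correct as f64 / pairs.len() as f64)
}

/// Identity switches: times a ground-truth person's assigned track ID differs
/// from the ID assigned in their previous appearance.
/// Each frame is a list of (ground_truth_id, track_id) matches.
fn id_switches(frames: &[Vec<(u32, u32)>]) -> usize {
    let mut last: HashMap<u32, u32> = HashMap::new();
    let mut switches = 0;
    for frame in frames {
        for &(gt, track) in frame {
            if let Some(prev) = last.insert(gt, track) {
                if prev != track {
                    switches += 1;
                }
            }
        }
    }
    switches
}
```

Under this reading, the count criterion passes when `count_accuracy` exceeds 0.8, and "maintains person IDs" would mean `id_switches` stays at or near zero over a session.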

Related

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
