
Brain Project — Development Phases

Phase 1: Perception & Association (COMPLETE)

Goal: Learn cross-modal associations between vision and audio

  • Hebbian association matrix (512×512 bilinear map)
  • DINOv2 visual encoder (384-dim)
  • Whisper audio encoder (512-dim)
  • Sparse random projection (384/512 → 512 sparse)
  • VGGSound dataset (24,604 clips)
  • Gradient InfoNCE training (replaced broken Hebbian approximation)
  • Symmetric bidirectional loss (V→A + A→V)
  • Autonomous self-improvement (Cortex daemon + LLM mutations)
  • Web dashboard with goals tracking
  • YouTube video interaction

Results: v2a_MRR=0.55, a2v_MRR=0.54, 1,240+ experiments

Phase 2: Give It a Voice (COMPLETE)

Goal: The brain can describe what it perceives and answer questions

  • /api/brain/describe — describe associations for an input
  • /api/brain/ask — answer questions about learned patterns with cross-modal associations
  • /api/brain/chat — conversational interface with personality
  • Chat UI on the web dashboard (/chat page)
  • Brain personality (speaks in first person about its associations)
  • Conversation context (session-based history)
  • Template-based voice engine (instant responses, no LLM latency)
  • Stop word filtering + fuzzy stem matching for keyword search
  • Cross-modal association retrieval (visual→audio bridge)
  • YouTube video processing → brain perception description

Key insight: the template-based voice engine composes natural language from raw association scores instantly. The LLM approach was abandoned because CPU-only inference was too slow (~5 s/token, even for a 0.5B model); the template engine delivers personality with no latency.
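
The template approach can be illustrated with a minimal sketch (labels, thresholds, and phrasing here are invented for illustration; this is not the project's actual engine). Templates are selected by confidence band, so stronger associations produce more assertive first-person phrasing.

```python
def describe(associations, top_k=3, threshold=0.3):
    """Compose a first-person description from raw association scores.

    `associations` is assumed to be a list of (label, score) pairs taken
    from the bilinear matrix; scores below `threshold` are dropped.
    """
    hits = sorted(associations, key=lambda pair: -pair[1])[:top_k]
    parts = []
    for label, score in hits:
        if score >= 0.7:
            parts.append(f"I strongly associate this with {label}")
        elif score >= threshold:
            parts.append(f"this reminds me of {label}")
    if not parts:
        return "I don't recognize any familiar associations yet."
    return ", and ".join(parts) + "."
```

Since the whole pipeline is string formatting over scores already in memory, responses are effectively instant, which is the trade the key insight describes.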

Phase 3: Real-Time Interaction

Goal: Process live audio/video and react instantly

  • WebSocket endpoint for streaming audio
  • Real-time encoding + association (< 2s latency)
  • Live waveform visualization in browser
  • Streaming association updates via SSE
  • Browser microphone capture (Web Audio API)
  • Live camera feed processing
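
The streaming plan above implies buffering incoming audio into fixed windows before encoding. A hypothetical sketch of that buffering step (16 kHz mono and 2-second windows are assumptions matching the < 2 s latency budget; the WebSocket plumbing itself is omitted):

```python
import numpy as np

class StreamBuffer:
    """Accumulate streamed audio samples and emit fixed-size windows
    ready for the audio encoder."""

    def __init__(self, sample_rate=16_000, window_seconds=2.0):
        self.window = int(sample_rate * window_seconds)
        self.buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk):
        """Append a chunk of samples; return any complete windows."""
        self.buf = np.concatenate([self.buf, np.asarray(chunk, dtype=np.float32)])
        windows = []
        while len(self.buf) >= self.window:
            windows.append(self.buf[:self.window])
            self.buf = self.buf[self.window:]
        return windows
```

Each complete window would be encoded and pushed through the association matrix, with results streamed back to the dashboard via SSE.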

Phase 4: Continuous Learning & Memory

Goal: Learn from new experiences, remember specific interactions

  • Online gradient InfoNCE updates from new inputs
  • Episodic memory store (specific interactions, not just aggregates)
  • "What did you learn today?" summarization
  • Learnable projection (replace random sparse projection)
  • Expand beyond VGGSound — learn from user-provided content
  • Memory consolidation (sleep-like offline replay)
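
An episodic store differs from the aggregate association matrix in that it keeps individual interactions with timestamps, which is what makes "what did you learn today?" answerable. A hypothetical sketch (class and method names are invented for illustration):

```python
from datetime import datetime, timedelta

class EpisodicMemory:
    """Store specific interactions, not just aggregate weights."""

    def __init__(self):
        self.episodes = []  # list of (timestamp, description) pairs

    def record(self, description, when=None):
        self.episodes.append((when or datetime.now(), description))

    def summarize_today(self, now=None):
        """Summarize episodes from the last 24 hours in the brain's voice."""
        now = now or datetime.now()
        cutoff = now - timedelta(days=1)
        recent = [d for t, d in self.episodes if t >= cutoff]
        if not recent:
            return "I didn't learn anything new today."
        return "Today I learned: " + "; ".join(recent) + "."
```

Memory consolidation could then replay stored episodes offline as extra gradient InfoNCE updates, sleep-style.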

Phase 5: Compositional Understanding

Goal: Reason about relationships, not just similarities

  • Graph of associations (beyond single bilinear matrix)
  • Multi-hop reasoning: A sounds like B, B looks like C → A relates to C
  • Concept formation — cluster similar associations into abstract categories
  • Text embedding bridge (CLIP) for language-grounded queries
  • Causal associations — "rain causes puddles" not just "rain co-occurs with puddles"
  • Attention over association graph for complex queries
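
The multi-hop idea ("A sounds like B, B looks like C → A relates to C") can be sketched as a bounded-depth search over a weighted association graph, with chain strength as the product of edge scores so weak links attenuate the inference. This is a hypothetical sketch of the Phase 5 idea, not the project's code:

```python
def multi_hop(graph, start, goal, max_hops=3):
    """Find the strongest association chain from `start` to `goal`.

    `graph` maps node -> {neighbor: score}. Returns (strength, path),
    where strength is the product of edge scores along the best path.
    """
    best = {start: (1.0, [start])}
    frontier = [start]
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            strength, path = best[node]
            for neighbor, score in graph.get(node, {}).items():
                s = strength * score
                if s > best.get(neighbor, (0.0, None))[0]:
                    best[neighbor] = (s, path + [neighbor])
                    nxt.append(neighbor)
        frontier = nxt
    return best.get(goal, (0.0, []))
```

Multiplicative chaining is one natural choice here: a 0.9 × 0.8 chain still carries weight, while any near-zero link kills the inference, which matches the intuition that indirect associations should be weaker than direct ones.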

Phase 6: Autonomous Agent

Goal: Self-directed exploration and communication

  • Initiate conversation based on interesting patterns discovered
  • Self-reflection on matrix changes ("I learned something new about...")
  • Goal-directed perception — seek out inputs that fill knowledge gaps
  • Multi-agent interaction — share associations with other brain instances
  • Report mutation results and discoveries in natural language
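
Goal-directed perception needs a way to score how much an input would teach the brain. One plausible heuristic, sketched here under invented names and thresholds, is association entropy: an input whose association scores are spread out (the brain is uncertain) fills a larger knowledge gap than one with a single dominant match.

```python
import math

def knowledge_gap(scores):
    """Entropy of an input's association distribution.

    High entropy means the brain has no confident association yet,
    so the input is worth seeking out.
    """
    total = sum(scores) or 1.0
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log(p) for p in probs)

def pick_next_input(candidates):
    """Choose the candidate input whose associations are most uncertain."""
    return max(candidates, key=lambda c: knowledge_gap(candidates[c]))
```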