rnxa - Hardware-Accelerated ML Compute Engine for Go

A hardware-accelerated ML compute engine that leverages Apple's Metal Performance Shaders to dramatically speed up tensor operations in Go.

rnxa (pronounced "RNA") provides hardware-accelerated tensor operations for machine learning workloads in Go. Initially developed to accelerate the relux neural network framework, rnxa is designed as a universal compute backend that can integrate with any Go ML framework.


🎯 Why rnxa?

Performance that Scales:

  • 🚀 20-50x speedup for large matrix operations on Apple Silicon
  • ⚡ GPU acceleration via Metal Performance Shaders
  • 🔄 Smart fallbacks to optimized CPU implementations
  • 📊 Zero overhead when GPU acceleration isn't beneficial

Framework Agnostic:

  • 🔌 Universal interface - works with any Go ML framework
  • 🧩 Clean abstractions - no framework-specific dependencies
  • 🛡️ Production ready - comprehensive error handling and resource management
  • 📈 Scalable - from small models to large neural networks

🏗️ Architecture Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Go ML         │    │      rnxa        │    │   Hardware      │
│  Framework      │───▶│  ComputeEngine   │───▶│  Acceleration   │
│ (relux, etc.)   │    │   Interface      │    │ (Metal/CPU)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Core Components

  • ComputeEngine Interface: Framework-agnostic tensor operations
  • Metal Backend: GPU acceleration via Apple's Metal Performance Shaders
  • CPU Backend: Optimized fallback with SIMD vectorization
  • Device Management: Automatic detection and selection of compute devices
  • Tensor Abstraction: Efficient n-dimensional array representation

⚡ Quick Start

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3) or Intel Mac with Metal support
  • Go 1.25+
  • Xcode Command Line Tools (xcode-select version 2410+)
# Install Xcode Command Line Tools
xcode-select --install

# Verify installation
xcode-select --version
# Should output: xcode-select version 2410 or higher

Installation

go get github.com/xDarkicex/rnxa

Basic Usage

package main

import (
    "context"
    "fmt"
    "github.com/xDarkicex/rnxa"
)

func main() {
    // Create compute engine (auto-detects best device)
    engine, err := rnxa.NewEngine()
    if err != nil {
        panic(err)
    }
    defer engine.Close()

    fmt.Printf("Using device: %s (%s)\n", 
        engine.Device().Name, engine.Device().Platform)

    // Create tensors
    A := rnxa.NewTensor([]float64{1, 2, 3, 4, 5, 6}, 2, 3)
    B := rnxa.NewTensor([]float64{7, 8, 9, 10, 11, 12}, 3, 2)

    // Perform matrix multiplication
    ctx := context.Background()
    C, err := engine.MatMul(ctx, A, B)
    if err != nil {
        panic(err)
    }

    fmt.Printf("Result: %v\n", C.Data())
    // Output: [58 64 139 154]
}
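
Element-wise operations follow the same call pattern as MatMul. Below is a minimal sketch, assuming the VectorAdd and VectorMul methods listed in the ComputeEngine interface further down expect same-shaped tensors:

func elementwiseDemo(engine rnxa.ComputeEngine) error {
    ctx := context.Background()

    // Two 2×2 tensors with matching shapes
    X := rnxa.NewTensor([]float64{1, 2, 3, 4}, 2, 2)
    Y := rnxa.NewTensor([]float64{10, 20, 30, 40}, 2, 2)

    // Element-wise addition: [11 22 33 44]
    sum, err := engine.VectorAdd(ctx, X, Y)
    if err != nil {
        return err
    }
    fmt.Printf("X + Y = %v\n", sum.Data())

    // Element-wise (Hadamard) product: [10 40 90 160]
    prod, err := engine.VectorMul(ctx, X, Y)
    if err != nil {
        return err
    }
    fmt.Printf("X * Y = %v\n", prod.Data())
    return nil
}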

📖 Comprehensive Examples

Neural Network Layer Forward Pass

func neuralLayerForward(engine rnxa.ComputeEngine, 
                       inputs []float64, 
                       weights [][]float64, 
                       bias []float64) ([]float64, error) {
    ctx := context.Background()
    
    // Convert to tensors
    inputTensor := rnxa.NewTensor(inputs, 1, len(inputs))
    weightTensor := convertToTensor(weights)
    biasTensor := rnxa.NewTensor(bias, len(bias))
    
    // Matrix multiplication: inputs × weights
    matmulResult, err := engine.MatMul(ctx, inputTensor, weightTensor)
    if err != nil {
        return nil, err
    }
    
    // Add bias
    biasResult, err := engine.VectorAdd(ctx, matmulResult, biasTensor)
    if err != nil {
        return nil, err
    }
    
    // Apply ReLU activation
    activated, err := engine.ReLU(ctx, biasResult)
    if err != nil {
        return nil, err
    }
    
    return activated.Data(), nil
}
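
The convertToTensor call above refers to a helper that is not part of the rnxa API. A minimal sketch, assuming row-major [][]float64 weights and the variadic NewTensor constructor shown in the Quick Start:

func convertToTensor(m [][]float64) *rnxa.Tensor {
    rows := len(m)
    cols := 0
    if rows > 0 {
        cols = len(m[0])
    }
    // Flatten the matrix row by row so the data matches a (rows, cols) tensor.
    flat := make([]float64, 0, rows*cols)
    for _, row := range m {
        flat = append(flat, row...)
    }
    return rnxa.NewTensor(flat, rows, cols)
}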

Activation Function Comparison

func compareActivations() {
    engine, _ := rnxa.NewEngine()
    defer engine.Close()
    
    ctx := context.Background()
    input := rnxa.NewTensor([]float64{-2, -1, 0, 1, 2})
    
    // Compare different activation functions
    activations := map[string]func(context.Context, *rnxa.Tensor) (*rnxa.Tensor, error){
        "ReLU":    engine.ReLU,
        "Sigmoid": engine.Sigmoid,
        "Tanh":    engine.Tanh,
    }
    
    fmt.Printf("Input: %v\n", input.Data())
    for name, fn := range activations {
        result, _ := fn(ctx, input)
        fmt.Printf("%s: %v\n", name, result.Data())
    }
}
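
For the input above, ReLU should return [0 0 0 1 2], Sigmoid approximately [0.119 0.269 0.5 0.731 0.881], and Tanh approximately [-0.964 -0.762 0 0.762 0.964]. Because Go randomizes map iteration order, the three results print in a different order on each run.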

Device Information and Benchmarking

func deviceInfo() {
    // List all available devices
    devices := rnxa.DetectDevices()
    
    for i, device := range devices {
        fmt.Printf("Device %d: %s\n", i, device.Name)
        fmt.Printf("  Platform: %s\n", device.Platform)
        fmt.Printf("  Cores: %d\n", device.Cores)
        fmt.Printf("  Memory: %.1fGB\n", float64(device.Memory)/1e9)
    }
    
    // Create engine and check memory
    engine, _ := rnxa.NewEngine()
    defer engine.Close()
    
    memory := engine.Memory()
    fmt.Printf("\nActive Device Memory:\n")
    fmt.Printf("  Total: %.1fGB\n", float64(memory.Total)/1e9)
    fmt.Printf("  Available: %.1fGB\n", float64(memory.Available)/1e9)
}

🔧 Integration with ML Frameworks

Framework-Agnostic Design

rnxa exposes low-level tensor operations that any ML framework can use:

// Core operations available to any framework
type ComputeEngine interface {
    // Matrix operations
    MatMul(ctx context.Context, A, B *Tensor) (*Tensor, error)
    
    // Element-wise operations  
    VectorAdd(ctx context.Context, A, B *Tensor) (*Tensor, error)
    VectorSub(ctx context.Context, A, B *Tensor) (*Tensor, error)
    VectorMul(ctx context.Context, A, B *Tensor) (*Tensor, error)
    
    // Activation functions
    ReLU(ctx context.Context, X *Tensor) (*Tensor, error)
    Sigmoid(ctx context.Context, X *Tensor) (*Tensor, error)
    Tanh(ctx context.Context, X *Tensor) (*Tensor, error)
    Softmax(ctx context.Context, X *Tensor) (*Tensor, error)
    
    // Reduction operations
    Sum(ctx context.Context, X *Tensor, axis int) (*Tensor, error)
    Mean(ctx context.Context, X *Tensor, axis int) (*Tensor, error)
    
    // Device management
    Device() Device
    Available() bool
    Memory() MemoryInfo
    Close() error
}
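
The reduction operations are the only part of the interface not exercised elsewhere in this README. A minimal sketch, assuming axis 0 reduces across the rows of a 2-D tensor (the axis convention is an assumption, not confirmed by the rnxa docs):

func reduceDemo(engine rnxa.ComputeEngine) error {
    ctx := context.Background()

    // 2×3 tensor: [[1 2 3], [4 5 6]]
    X := rnxa.NewTensor([]float64{1, 2, 3, 4, 5, 6}, 2, 3)

    // Column sums, assuming axis 0 collapses the rows: [5 7 9]
    colSum, err := engine.Sum(ctx, X, 0)
    if err != nil {
        return err
    }
    fmt.Printf("Sum(axis=0):  %v\n", colSum.Data())

    // Column means under the same assumption: [2.5 3.5 4.5]
    colMean, err := engine.Mean(ctx, X, 0)
    if err != nil {
        return err
    }
    fmt.Printf("Mean(axis=0): %v\n", colMean.Data())
    return nil
}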

Integration Examples

With relux (Native Integration):

// relux automatically uses rnxa when available
net, _ := relux.NewNetwork(
    relux.WithConfig(config),
    relux.WithBackend("auto"), // Uses rnxa if available
)

With Custom Frameworks:

// Any framework can use rnxa as a compute backend
type MyFramework struct {
    engine rnxa.ComputeEngine
}

func (f *MyFramework) ForwardPass(inputs []float64) []float64 {
    // Use rnxa for heavy computation
    tensor := rnxa.NewTensor(inputs)
    result, _ := f.engine.ReLU(context.Background(), tensor)
    return result.Data()
}

📊 Performance Benchmarks

Matrix Multiplication Performance

Matrix Size   Pure Go   rnxa (CPU)   rnxa (Metal)   Speedup
32×32         0.12ms    0.08ms       0.15ms         1.5x (CPU)
128×128       2.1ms     0.9ms        0.3ms          7x (Metal)
512×512       85ms      28ms         4.2ms          20x (Metal)
1024×1024     680ms     195ms        28ms           24x (Metal)

Real-World ML Workload

Neural Network Training (Iris Dataset):
- Pure Go: 0.85 seconds
- rnxa (Metal): 0.12 seconds  
- Speedup: 7.1x

Large Model Inference:
- Pure Go: 45 seconds
- rnxa (Metal): 2.8 seconds
- Speedup: 16x

🛠️ Advanced Configuration

Explicit Device Selection

// Force CPU backend
engine, err := rnxa.NewEngineWithDevice(1) // Assuming device 1 is CPU

// Check device capabilities
if rnxa.IsMetalAvailable() {
    fmt.Println("Metal acceleration available!")
    device := rnxa.GetBestDevice()
    fmt.Printf("Best device: %s\n", device.Name)
}
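
Hard-coding a device index is fragile, since the device order can differ between machines. A minimal sketch that walks DetectDevices to pick the CPU backend explicitly; it assumes NewEngineWithDevice accepts an index into the DetectDevices slice and that the CPU backend reports Platform as "CPU", as in the sample test output below:

func newCPUEngine() (rnxa.ComputeEngine, error) {
    // Prefer an explicitly CPU-backed engine if one is reported.
    for i, device := range rnxa.DetectDevices() {
        if device.Platform == "CPU" {
            engine, err := rnxa.NewEngineWithDevice(i)
            if err != nil {
                return nil, err
            }
            return engine, nil
        }
    }
    // No CPU device reported; fall back to automatic selection.
    engine, err := rnxa.NewEngine()
    if err != nil {
        return nil, err
    }
    return engine, nil
}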

Error Handling and Fallbacks

func robustComputation(A, B *rnxa.Tensor) (*rnxa.Tensor, error) {
    engine, err := rnxa.NewEngine()
    if err != nil {
        return nil, fmt.Errorf("failed to create engine: %w", err)
    }
    defer engine.Close()
    
    ctx := context.Background()
    
    // Attempt Metal computation
    result, err := engine.MatMul(ctx, A, B)
    if err != nil {
        // CPU fallback is handled internally; an error here means both backends failed
        return nil, fmt.Errorf("computation failed: %w", err)
    }
    
    return result, nil
}
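
Every operation also accepts a context.Context, so long-running work can be cancelled or bounded. A minimal sketch using a timeout (how promptly rnxa observes cancellation mid-operation is an assumption; the pattern itself is standard Go):

func boundedMatMul(engine rnxa.ComputeEngine, A, B *rnxa.Tensor) (*rnxa.Tensor, error) {
    // Bound the operation to two seconds (uses the standard context and time packages).
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    return engine.MatMul(ctx, A, B)
}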

🧪 Testing and Validation

Run the comprehensive test suite:

# Run all tests
go test -v

# Run benchmarks
go test -bench=. -benchmem

# Test specific operations
go test -run TestMatrixMultiplication -v
go test -run TestActivationFunctions -v

Sample Test Output:

=== RUN   TestDeviceDetection
    metal_test.go:29: Found 2 devices:
    metal_test.go:31:   Device 0:  (Metal) - 10 cores, 17.2GB memory
    metal_test.go:31:   Device 1: CPU (CPU) - 8 cores, 8.6GB memory
--- PASS: TestDeviceDetection (0.05s)

=== RUN   TestMatrixMultiplication  
    metal_test.go:80: MatMul test passed on Metal
--- PASS: TestMatrixMultiplication (0.00s)

🚧 Roadmap

Phase 1: Metal Optimization (Current)

  • ✅ Metal Performance Shaders integration
  • ✅ Automatic GPU/CPU selection
  • ✅ Comprehensive activation functions
  • ✅ Production-ready error handling

Phase 2: Cross-Platform CUDA (Q2 2026)

  • 🔄 Linux CUDA Support
    • NVIDIA GPU acceleration on Linux
    • cuBLAS and cuDNN integration
    • Docker containerization support

Phase 3: Windows CUDA (Q3 2026)

  • 🔄 Windows CUDA Support
    • Native Windows CUDA integration
    • DirectML fallback support
    • Visual Studio compatibility

Phase 4: Advanced Features (Q4 2026)

  • 🔮 Multi-GPU Support - Distributed computation across devices
  • 🔮 Custom Kernels - User-defined compute shaders
  • 🔮 Memory Optimization - Smart memory pooling and reuse
  • 🔮 Mixed Precision - FP16/BF16 support for faster training

Phase 5: Ecosystem Integration (2027)

  • 🔮 TensorFlow Go Bindings - Interoperability with TensorFlow models
  • 🔮 ONNX Runtime Integration - Load and run ONNX models
  • 🔮 Distributed Training - Multi-node training support

🤝 Contributing

We welcome contributions to make rnxa the premier ML acceleration framework for Go!

Areas for Contribution

  • 🚀 Performance Optimization - CUDA kernels, memory management
  • 🧪 Testing - Cross-platform testing, edge case validation
  • 📚 Documentation - Tutorials, integration guides
  • 🔍 Debugging - Profiling tools, performance analysis
  • 🌐 Platform Support - AMD ROCm, Intel oneAPI

Development Setup

git clone https://github.com/xDarkicex/rnxa.git
cd rnxa
go mod tidy
go test ./...

📋 System Requirements

macOS (Current Support)

  • OS: macOS 10.15+ (Catalina or later)
  • Hardware: Apple Silicon (M1/M2/M3) or Intel Mac with Metal support
  • Tools: Xcode Command Line Tools (xcode-select 2410+)
  • Go: Version 1.25+

Future Platform Support

Linux (Planned Q2 2026)

  • OS: Ubuntu 20.04+, CentOS 8+, RHEL 8+
  • Hardware: NVIDIA GPU with Compute Capability 6.0+
  • CUDA: Version 11.0+
  • Drivers: NVIDIA Driver 450.80.02+

Windows (Planned Q3 2026)

  • OS: Windows 10/11 (64-bit)
  • Hardware: NVIDIA GPU with Compute Capability 6.0+
  • CUDA: Version 11.0+
  • Visual Studio: 2019 or later

📜 License

MIT License - see LICENSE for details.


🙏 Acknowledgments

  • Apple Metal Team - For creating the Metal Performance Shaders framework
  • Go Team - For building a language that makes systems programming approachable
  • relux Project - The original motivation and testing ground for rnxa
  • Go ML Community - For driving innovation in Go-based machine learning

📞 Support & Community


⚡ Accelerate your Go ML workloads with rnxa ⚡

Built with ❤️ for the Go ML community

Get Started - Documentation - Examples


rnxa: Because your ML deserves better performance.
