Add float16 and bfloat16 Precision Support for CPU Backend
Enhance the CPU backend with float16 (half-precision float) and bfloat16 (Brain Floating Point) support. This allows improved performance and lower memory usage in workloads that can safely operate at reduced precision, and improves cross-device model portability.
Tasks
- Design and implement Go types and math functions for float16 and bfloat16
- Implement conversion, arithmetic, and common operations
- Integrate float16/bfloat16 into the CPU computational pipeline
- Add tests and benchmarks for the new precision formats
- Update relevant documentation, usage examples, and API references
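The conversion work in the first two tasks can be sketched as below. This is a minimal illustration assuming round-to-nearest-even semantics; the names (`F32ToF16`, `F16ToF32`, `F32ToBF16`, `BF16ToF32`) are hypothetical, not an existing API in the codebase.

```go
// Sketch of float32 <-> float16/bfloat16 bit conversions.
package main

import (
	"fmt"
	"math"
)

// F32ToF16 converts a float32 to IEEE 754 binary16 bits, rounding to nearest even.
func F32ToF16(f float32) uint16 {
	b := math.Float32bits(f)
	sign := uint16(b>>16) & 0x8000
	exp := int32(b>>23) & 0xFF
	mant := b & 0x7FFFFF

	if exp == 0xFF { // Inf or NaN
		if mant != 0 {
			return sign | 0x7E00 // quiet NaN
		}
		return sign | 0x7C00 // Inf
	}
	e := exp - 127 + 15 // rebias exponent for binary16
	if e >= 0x1F {
		return sign | 0x7C00 // overflow to Inf
	}
	if e <= 0 { // result is subnormal (or zero) in binary16
		if e < -10 {
			return sign // underflows to signed zero
		}
		mant |= 0x800000 // restore the implicit leading 1
		shift := uint32(14 - e)
		half := uint32(1) << (shift - 1)
		rem := mant & (half<<1 - 1)
		m := mant >> shift
		if rem > half || (rem == half && m&1 == 1) {
			m++ // round to nearest, ties to even
		}
		return sign | uint16(m)
	}
	// Normal case: drop 13 mantissa bits with round-to-nearest-even;
	// a mantissa carry correctly bumps the exponent field.
	m := mant >> 13
	rem := mant & 0x1FFF
	if rem > 0x1000 || (rem == 0x1000 && m&1 == 1) {
		m++
	}
	return sign | uint16(uint32(e)<<10+m)
}

// F16ToF32 expands binary16 bits to float32 (always exact).
func F16ToF32(h uint16) float32 {
	sign := uint32(h&0x8000) << 16
	exp := uint32(h>>10) & 0x1F
	mant := uint32(h & 0x3FF)

	switch {
	case exp == 0x1F: // Inf (mant == 0) or NaN (mant != 0)
		return math.Float32frombits(sign | 0x7F800000 | mant<<13)
	case exp == 0: // zero or subnormal
		if mant == 0 {
			return math.Float32frombits(sign)
		}
		k := uint32(0) // normalize: shift until the leading 1 reaches bit 10
		for mant&0x400 == 0 {
			mant <<= 1
			k++
		}
		return math.Float32frombits(sign | (113-k)<<23 | (mant&0x3FF)<<13)
	}
	return math.Float32frombits(sign | (exp+112)<<23 | mant<<13)
}

// F32ToBF16 narrows a float32 to bfloat16 bits with round-to-nearest-even.
func F32ToBF16(f float32) uint16 {
	b := math.Float32bits(f)
	if f != f { // NaN: force a nonzero mantissa so it stays NaN
		return uint16(b>>16) | 0x40
	}
	b += 0x7FFF + b>>16&1 // round to nearest, ties to even
	return uint16(b >> 16)
}

// BF16ToF32 expands bfloat16 bits to float32 (always exact).
func BF16ToF32(h uint16) float32 {
	return math.Float32frombits(uint32(h) << 16)
}

func main() {
	for _, v := range []float32{1.5, -0.125, 65504, 3.1415926} {
		h, bf := F32ToF16(v), F32ToBF16(v)
		fmt.Printf("%v -> f16 0x%04x (%v), bf16 0x%04x (%v)\n",
			v, h, F16ToF32(h), bf, BF16ToF32(bf))
	}
}
```

Note the trade-off the two formats embody: bfloat16 keeps float32's 8-bit exponent (same dynamic range, only 8 significand bits), so the conversion is essentially a rounded truncation, while float16 has a 5-bit exponent and 11-bit significand, requiring rebias, overflow, and subnormal handling.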
Note: Consider alignment with other backends (CUDA, Metal) for consistency.