Discussion: Should arithmetic operations be implemented for float16 and bfloat16 types? #3

@gitctrlx

Background

Currently, the float16 and bfloat16 types in this library are primarily used for data storage.

In practice, a common pattern is to perform arithmetic in float32 and convert to float16/bfloat16 only for storage. This raises the question of whether the library should implement basic arithmetic (addition, subtraction, multiplication, division) directly on the float16 and bfloat16 types, or keep the current approach, in which users convert to float32, calculate, and convert back for storage.
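For concreteness, the convert, calculate, convert-back pattern can be sketched as follows. This is only an illustration under stated assumptions, not this library's API: all function names here (`f16_to_float`, `float_to_bf16`, `add_f16`, `mul_bf16`, and so on) are hypothetical, values are passed around as raw 16-bit patterns, and because Python's `float` is float64, the "wide" type in this sketch is float64 rather than float32.

```python
import struct

# Hypothetical helpers for illustration; none of these names come from the library.

def f16_to_float(bits: int) -> float:
    """Decode an IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_f16(x: float) -> int:
    """Encode a Python float as binary16 bits (OverflowError if out of range)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bf16_to_float(bits: int) -> float:
    """bfloat16 is the upper 16 bits of a float32, so widening is a shift."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def float_to_bf16(x: float) -> int:
    """Narrow via float32, rounding the dropped 16 bits to nearest, ties to even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: truncate, but force a mantissa bit so the result stays a NaN.
        return (bits >> 16) | 0x0040
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

def add_f16(a: int, b: int) -> int:
    """Widen both operands, add in the wide type, narrow once for storage."""
    return float_to_f16(f16_to_float(a) + f16_to_float(b))

def mul_bf16(a: int, b: int) -> int:
    """Same pattern for bfloat16 multiplication."""
    return float_to_bf16(bf16_to_float(a) * bf16_to_float(b))

print(hex(add_f16(0x3C00, 0x4000)))   # 1.0 + 2.0 in float16 -> 0x4200 (3.0)
print(hex(mul_bf16(0x4000, 0x4040)))  # 2.0 * 3.0 in bfloat16 -> 0x40c0 (6.0)
```

A built-in arithmetic API would presumably wrap these same steps, so the main difference for users would be convenience and where the rounding happens, not the underlying computation.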

Discussion Points

  1. Is it necessary to provide arithmetic APIs for float16 and bfloat16 types?
  2. If so, which arithmetic operations are most important (addition, subtraction, multiplication, division)?
  3. If not, are there better practices to help users efficiently convert between types for calculations?
  4. Is it worth considering the precision and performance trade-offs of implementing arithmetic directly on float16/bfloat16?
  5. Should API design aim for consistency with float32/float64?
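To make the precision trade-off in point 4 concrete, here is a small sketch comparing a sum that rounds to float16 after every addition (what direct float16 arithmetic would do) with a sum accumulated in a wider type and rounded once at the end. The function names are hypothetical, and Python's float64 stands in for the wide type. float16 has an 11-bit significand, so above 2048 it can no longer represent every integer.

```python
import struct

def round_to_f16(x: float) -> float:
    """Round a Python float to the nearest float16 value and widen it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def sum_stepwise_f16(values):
    """Emulate native float16 accumulation: round after every addition."""
    acc = 0.0
    for v in values:
        acc = round_to_f16(acc + round_to_f16(v))
    return acc

def sum_wide_then_round(values):
    """Accumulate in the wide type and round to float16 once, for storage."""
    return round_to_f16(sum(round_to_f16(v) for v in values))

ones = [1.0] * 3000
print(sum_stepwise_f16(ones))     # 2048.0: 2048 + 1 rounds back to 2048
print(sum_wide_then_round(ones))  # 3000.0
```

The stepwise sum stalls at 2048 because 2049 is not representable in float16 and the tie rounds to the even neighbor. This kind of behavior is one reason a library might prefer to keep accumulation in a wider type even if it offers float16 operators.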

Developers are welcome to share their experience and suggestions!

Labels: question (Further information is requested)