Here's the list of requirements for supporting GGML operations:
- A tensor is described by its data type, its shape (4 dimensions), and its strides in bytes along those dimensions. Because the strides are in bytes, they depend on the data type; they are expressed in bytes because GGML supports sub-byte quantized types. There are no limitations on the shape or the strides: for example, a tensor can have 5005 elements (common in the test cases).
- A lot of models are FP16. We convert automatically from FP16 to BF16.
- Each operation has up to 10 input tensors and 1 output tensor.
- The output may be an alias for any of the inputs. An input can be a view, i.e., an alias over another tensor with a modified layout. For example, in the transpose operation `Tvw = T'`, `Tvw` is a tensor that aliases the data of `T` with modified strides to mimic iterating over a transposed matrix; `Tvw` may be passed to another operation, but not all operations support views as inputs.
- Some operations support additional invariants, e.g., a scale factor and a bias in the scale operation. These invariants are set dynamically when the operation is instantiated, but remain static afterward.
- The data types of the inputs and/or the output are not necessarily the same.
- For some operators, broadcasting is supported on one or more of the inputs. For example, computing `C = A + B`, where `A` is an `Mx1` vector and `B` is an `MxN` matrix, replicates `A` on the fly to perform a matrix addition without the need for extra storage.
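The tensor description and the transpose view described above can be made concrete with a small Python sketch. `TensorDesc`, `contiguous`, `byte_offset`, `transpose_view`, and the `TYPES` table are hypothetical names, not GGML's API. Dimensions are listed innermost first, and the packed-stride rule (one row of a block-quantized type occupies `type_size * ne[0] / block_size` bytes) is our assumption of a GGML-like dense layout. The `q4_block32` entry is an illustrative 4-bit block type, not a claim about any specific GGML format.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical type table: name -> (bytes per block, elements per block).
# Plain types have one element per block; "q4_block32" is illustrative only.
TYPES = {
    "f32": (4, 1),
    "bf16": (2, 1),
    "q4_block32": (18, 32),  # assumed: 2-byte scale + 16 bytes of packed nibbles
}


@dataclass
class TensorDesc:
    dtype: str
    ne: Tuple[int, int, int, int]  # elements per dimension, innermost first
    nb: Tuple[int, int, int, int]  # byte stride per dimension
    offset: int = 0                # byte offset into the underlying buffer


def contiguous(dtype, ne):
    """Byte strides for a densely packed tensor (innermost dimension first)."""
    type_size, blck = TYPES[dtype]
    if ne[0] % blck != 0:
        raise ValueError("row length must be a multiple of the block size")
    nb0 = type_size                   # one block (one element for plain types)
    nb1 = type_size * (ne[0] // blck)  # one row = ne[0] / blck blocks
    nb2 = nb1 * ne[1]
    nb3 = nb2 * ne[2]
    return TensorDesc(dtype, tuple(ne), (nb0, nb1, nb2, nb3))


def byte_offset(t, i0, i1, i2=0, i3=0):
    """Byte address of element (i0, i1, i2, i3); plain (non-block) types only."""
    _, blck = TYPES[t.dtype]
    assert blck == 1, "block-quantized types are addressed per block"
    return t.offset + i0 * t.nb[0] + i1 * t.nb[1] + i2 * t.nb[2] + i3 * t.nb[3]


def transpose_view(t):
    """A view aliasing t's data, with dims 0 and 1 (and their strides) swapped."""
    ne = (t.ne[1], t.ne[0], t.ne[2], t.ne[3])
    nb = (t.nb[1], t.nb[0], t.nb[2], t.nb[3])
    return TensorDesc(t.dtype, ne, nb, t.offset)
```

A transposed view allocates nothing: it swaps `ne[0]`/`ne[1]` and `nb[0]`/`nb[1]`, so element `(i0, i1)` of the view lands on the same byte as element `(i1, i0)` of the source. It also shows why an operation that assumes `nb[0]` equals the element size cannot accept such a view unchanged.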
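One way to replicate `A` on the fly, without extra storage, is to give the broadcast operand a byte stride of 0 along the replicated dimension, so every column reads the same element. The sketch below is a minimal Python illustration under that assumption; we do not know that the GGML backend uses this exact mechanism. `add_2d` is a hypothetical helper, indices are `(row, column)` for readability, and all tensors are FP32 stored in raw byte buffers.

```python
import struct

F32 = 4  # bytes per float32 element


def load(buf, off):
    """Read one little-endian float32 at byte offset off."""
    return struct.unpack_from("<f", buf, off)[0]


def store(buf, off, value):
    """Write one little-endian float32 at byte offset off."""
    struct.pack_into("<f", buf, off, value)


def add_2d(a_buf, a_nb, b_buf, b_nb, c_buf, c_nb, rows, cols):
    """C[i][j] = A[i][j] + B[i][j], addressing every operand through
    (row stride, column stride) in bytes. A broadcast operand simply
    has a column stride of 0, so its single column is reused for all j."""
    for i in range(rows):
        for j in range(cols):
            va = load(a_buf, i * a_nb[0] + j * a_nb[1])
            vb = load(b_buf, i * b_nb[0] + j * b_nb[1])
            store(c_buf, i * c_nb[0] + j * c_nb[1], va + vb)


M, N = 2, 3
a = bytearray(struct.pack("<2f", 1.0, 10.0))                  # A: M x 1
b = bytearray(struct.pack("<6f", 0, 1, 2, 3, 4, 5))           # B: M x N
c = bytearray(M * N * F32)                                    # C: M x N
add_2d(a, (F32, 0), b, (N * F32, F32), c, (N * F32, F32), M, N)
```

With `A = [1, 10]` and `B = [[0, 1, 2], [3, 4, 5]]`, the result is `[[1, 2, 3], [13, 14, 15]]`, and `A` is never copied.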
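The automatic FP16-to-BF16 conversion is lossy: FP16 keeps 10 explicit significand bits and BF16 only 7, although BF16's 8-bit exponent covers the whole FP16 range, so no finite value overflows. The issue does not say which rounding mode is used; the sketch below assumes round-to-nearest-even and works on raw bit patterns using only Python's `struct` module. `fp16_to_bf16_bits` is a hypothetical name.

```python
import struct


def fp16_to_bf16_bits(h: int) -> int:
    """Convert IEEE binary16 bits to bfloat16 bits (round to nearest even)."""
    # binary16 -> Python float -> binary32 bits; both steps are exact for every FP16 value.
    f = struct.unpack("<e", struct.pack("<H", h))[0]
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: truncate and force the quiet bit so the result stays a NaN.
        return (bits >> 16) | 0x0040
    # Round to nearest even on the 16 low bits that BF16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF
```

For example, the largest finite FP16 value, 65504 (`0x7BFF`), rounds up to 65536 (`0x4780`) in BF16.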
Additional requirements for integration:
- Compilation artifacts are stored in directories that are dictated by the GGML library.
- PDIs / insts files follow a specific naming scheme that is generated in the GGML library.
- Compilation artifacts are kept in directories that match the name of the PDI file they produced.