I am trying to implement a GEMM with FP16 activations and INT4 weights. I want to call the fpA_intB_gemm_fp16_int4 kernel located in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm, but all of the examples I can find are full model-inference pipelines. If I only want to run the GEMM kernel standalone (e.g., to benchmark or verify it), what should I do?
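To make the question concrete, here is roughly the standalone driver I have in mind, based only on my reading of fpA_intB_gemm.h. The runner name `CutlassFpAIntBGemmRunner`, its `gemm`/`getWorkspaceSize` signatures, and the need to pre-shuffle the INT4 weights are my assumptions from the header, not something I have working — please correct whatever I got wrong:

```cpp
// Hypothetical standalone driver -- all FasterTransformer names below are
// my reading of fpA_intB_gemm.h and may not match the real interface.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include "cutlass/numeric_types.h"
#include "src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h"

using namespace fastertransformer;

int main() {
    const int m = 16, n = 4096, k = 4096;

    // Device buffers: FP16 activations A [m, k], packed INT4 weights B [k, n],
    // per-output-channel FP16 dequant scales [n], FP16 output C [m, n].
    half*              d_A;
    cutlass::uint4b_t* d_B;
    half*              d_scales;
    half*              d_C;
    cudaMalloc(&d_A, m * k * sizeof(half));
    cudaMalloc(&d_B, (size_t)k * n / 2);   // two INT4 values per byte
    cudaMalloc(&d_scales, n * sizeof(half));
    cudaMalloc(&d_C, m * n * sizeof(half));
    // ... fill A and the scales, and upload B *after* running it through the
    // weight preprocessor (I believe the kernel expects an interleaved
    // layout, not plain row-major INT4) ...

    // Assumed entry point: the templated runner from fpA_intB_gemm.h,
    // instantiated for FP16 activations and 4-bit weights.
    CutlassFpAIntBGemmRunner<half, cutlass::uint4b_t> runner;

    char*  d_workspace;
    size_t ws_bytes = runner.getWorkspaceSize(m, n, k);
    cudaMalloc(&d_workspace, ws_bytes);

    runner.gemm(d_A, d_B, d_scales, d_C, m, n, k,
                d_workspace, ws_bytes, /*stream=*/0);
    cudaDeviceSynchronize();
    return 0;
}
```

My plan would be to compile this as a new target linked against the FasterTransformer kernel library and CUTLASS, and to preprocess the weights with whatever helper the repo provides (something like `preprocess_weights_for_mixed_gemm` in cutlass_preprocessors, if I am reading it right). Is this the intended way to exercise the kernel in isolation, or is there a simpler test harness already in the tree?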