Trainable, fast, and memory-efficient sparse attention
Topics: transformers, pytorch, english, transformer, triton, chinese, cuda-kernels, cutlass, attention-mechanism, attention-is-all-you-need, self-attention, pytorch-implementation, flash-attention, triton-kernels, dynamic-mask-attention
Updated Oct 29, 2025 - C++
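The listing above describes a trainable sparse (dynamic-mask) attention mechanism. As a rough illustration of the underlying idea, here is a minimal NumPy sketch of score-based sparse attention, where each query attends only to its top-k keys. This is an assumption-laden toy, not the repository's actual Triton/CUDA implementation, and the function name and parameters are hypothetical:

```python
import numpy as np

def sparse_attention(q, k, v, top_k=2):
    """Toy sparse attention: each query keeps only its top_k key scores.

    q, k, v: arrays of shape (seq_len, d). Illustrative only; the real
    project fuses masking into optimized Triton/CUDA kernels.
    """
    # Scaled dot-product attention logits, shape (seq_len, seq_len).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Per-row threshold: the top_k-th largest score in each row.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    # Mask everything below the threshold to -inf so softmax zeroes it.
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving entries.
    masked -= masked.max(axis=-1, keepdims=True)
    w = np.exp(masked)
    w /= w.sum(axis=-1, keepdims=True)
    # Weighted sum of values, shape (seq_len, d).
    return w @ v
```

In a trainable variant, the mask would be produced by a learned scoring function rather than a fixed top-k rule, so the sparsity pattern itself adapts during training.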