Minenik2/deepseek-v3-from-scratch-in-python

DeepSeek V3.1 became the best non-reasoning model in February 2025. This repository is a from-scratch Python recreation based on the paper.

Paper: DeepSeek-V3 Technical Report, https://arxiv.org/pdf/2412.19437

Topics covered:

  • Multi-head Latent Attention
    • Attention basics
    • RoPE (rotary position embeddings)
    • MLA
  • Mixture of Experts
    • Gate
    • Expert
  • Parallelization across GPUs
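
To give a feel for the Mixture of Experts topics (gate and expert), here is a minimal sketch of top-k routing for a single token in plain Python. It is not the code from this repository and it is not the paper's exact gate: it assumes a generic softmax gate, and the names `top_k_gate`, `moe_forward`, `gate_weights`, and the toy scaling experts are all hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(token, gate_weights, k):
    """Score every expert for one token and keep the k best.

    gate_weights holds one weight vector per expert (assumed layout).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(token, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token)
    for idx, weight in top_k_gate(token, gate_weights, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts (assumption): each one just scales the token by a fixed factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
token = [2.0, 1.0]

routing = top_k_gate(token, gate_weights, k=2)
output = moe_forward(token, gate_weights, experts, k=2)
print(routing)
print(output)
```

Only the k selected experts run for each token, which is what keeps the compute per token low even when the total number of experts is large. A real implementation would batch tokens and use learned feed-forward networks as experts.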
