This is a corpus of word forms of the Sakha language with statistics.
wordforms_stat.csv - a table of pairs: a word form and the number of its occurrences in a sample of Sakha web text.
To download it in Colab:
!wget https://github.com/Sakha-Language-Processing/wordforms/raw/main/wordforms202104.zip
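Once the archive is downloaded, the counts can be read straight from the zip without unpacking it. The following is only a minimal sketch under several assumptions that this README does not confirm: that the archive contains a member named wordforms_stat.csv, that the file is comma-separated UTF-8, and that the first two columns are the word form and its count. The function name `load_wordform_counts` is hypothetical and not part of this repository.

```python
import csv
import io
import zipfile


def load_wordform_counts(zip_path, member="wordforms_stat.csv", delimiter=","):
    """Return a dict mapping word form -> occurrence count.

    Assumptions (not confirmed by the README): `member` exists inside the
    archive, is UTF-8 encoded, and holds the word form in column 0 and an
    integer count in column 1.
    """
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text, delimiter=delimiter):
                if len(row) < 2:
                    continue  # skip blank or one-column lines
                try:
                    counts[row[0]] = int(row[1])
                except ValueError:
                    continue  # skip a header row or a malformed count
    return counts


# Possible usage in Colab after the wget step:
# counts = load_wordform_counts("wordforms202104.zip")
# top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:20]
```

If the actual file uses a different delimiter or member name, pass them through the `delimiter` and `member` arguments; `zipfile.ZipFile(path).namelist()` lists what the archive really contains.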
O.A. Domotova. Subproject of a master's dissertation. 11 April 2021.
О.А. Домотова. Подпроект магистерской диссертации. 11 апреля 2021 года.