Have you ever opened your favorite text editor and had it show you autocomplete like this?

Most Python editors/LSPs sort suggestions alphabetically. But here, alphabetical sorting is as good as random sorting. For example, for `os` it suggests `os.CLD_CONTINUED`. I had never heard of this attribute before, and to this day there are zero real usages of it on all of GitHub.
This project attempts to bring better autocomplete to Python, using a fast and small precomputed data table.

Based on the ManyTypes4Py (MT4P) dataset v1.7, which contains 5.2K Python repositories.
The dataset is available in the GitHub releases (the `.2` file includes all attributes accessed more than twice; `.200` is threshold 200, a smaller file with low-frequency entries removed). You can generate the dataset with threshold 7 using the generate script:

```shell
go run generate-file.go -t 7 -out output/py_call_freq.7.bin -data ManyTypes4PyDataset-v0.7/processed_projects_complete
```

(Add `-d` to also output `debug.json`, `debug-raw.json`, and `debug-projects.json` files with `<name>:<count>` entries.)
I made an interactive file-size explorer in my blog post.
Currently, after downloading or generating the dataset, you can look things up with the lookup scripts:

```shell
go run lookup/lookup-fast.go os.stat
```

(Note: this script was almost entirely generated by an LLM.)

Output:

```
os.stat: score=117
```

The same with the Python script:

```shell
python lookup.py 'os.stat'  # os.stat: score=117

# list the top 10 scores of a module (the module has to be installed)
python lookup.py 'os.*'
```

Or use the minimal Python editor with autocomplete: `python simpleui.py`
A fork of the awesome ty LSP is a work in progress; you can use it manually by following this.
The hash score table file format is crafted to be as fast and as small as possible:

- Header (magic `HSCT` + version) - 8 bytes
- Capacity and slot count - 8 bytes
- Slots hash table, 4 bytes per slot, repeated:
  - Hash key (FNV-1a >> 8) - 3 bytes
  - Frequency - 1 byte

Rare hash collisions are possible; in that case, the highest score wins.
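To make the layout concrete, here is a minimal Python sketch of packing and reading such a table. Only the field sizes and the FNV-1a >> 8 key come from the description above; the byte order, version value, linear probing, and collision handling details are assumptions for illustration:

```python
import struct

FNV_OFFSET, FNV_PRIME = 2166136261, 16777619

def fnv1a(name: str) -> int:
    """32-bit FNV-1a hash of a dotted attribute name."""
    h = FNV_OFFSET
    for b in name.encode():
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFF
    return h

def pack_table(entries: dict, capacity: int) -> bytes:
    """Build the table: header, counts, then 4-byte slots (assumed layout)."""
    slots = [(0, 0)] * capacity
    for name, freq in entries.items():
        key = fnv1a(name) >> 8           # 3-byte key
        i = key % capacity
        while slots[i][0] not in (0, key):  # linear probing (assumed)
            i = (i + 1) % capacity
        if slots[i][0] == key:              # rare collision: keep highest score
            freq = max(freq, slots[i][1])
        slots[i] = (key, freq)
    buf = b"HSCT" + struct.pack("<I", 1)    # magic + version (version assumed)
    buf += struct.pack("<II", capacity, sum(1 for k, _ in slots if k))
    for key, freq in slots:                 # 3-byte key + 1-byte frequency
        buf += struct.pack("<I", (key << 8) | freq)
    return buf

def lookup(buf: bytes, name: str):
    """Probe the packed table; returns the frequency or None if absent."""
    capacity, = struct.unpack_from("<I", buf, 8)
    key = fnv1a(name) >> 8
    i = key % capacity
    while True:
        slot, = struct.unpack_from("<I", buf, 16 + 4 * i)
        k, freq = slot >> 8, slot & 0xFF
        if k == key:
            return freq
        if k == 0:          # empty slot ends the probe chain
            return None
        i = (i + 1) % capacity
```

A lookup is just a hash, a modulo, and a short probe over 4-byte slots, which is why the real Go implementation can answer in a few milliseconds.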
PyCharm and some other editors have an option to use AI completion, probably trained on a much bigger table similar to this project's - and it's always off by default.
Autocomplete should be fast, and ML is usually slow. TB-complete finishes its computation in less than 8 ms.
The `-t` (threshold) option takes the raw threshold: it filters by the number of times an attribute is accessed throughout the MT4P dataset. The debug JSON and the table itself store the normalized frequency (1-255). A lower threshold includes more rare attributes.
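For illustration, one way raw counts could be squeezed into that single frequency byte is log scaling. The 1-255 range comes from the format above; the log-scale mapping itself is an assumption, not the project's documented scheme:

```python
import math

def normalize(count: int, max_count: int) -> int:
    """Map a raw access count into the 1-255 byte range.

    Log scaling (an assumption) keeps resolution among the many
    mid-frequency attributes instead of letting a few giant counts
    flatten everything else to 1.
    """
    scaled = math.log(count) / math.log(max_count)  # 0.0 .. 1.0
    return max(1, round(scaled * 255))
```

Anything that survives the threshold gets at least a score of 1, and the most-accessed attribute in the dataset maps to 255.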
Go is perfect for this project: a well-known (!= Nim), fast (!= Python) language that hides the low-level controls so I can code fast (!= Rust).