This is a tool for automatically creating typing shortcuts from a corpus of your own writing! I use these shortcuts mainly for email and Slack.
This repo parses a corpus of text and suggests which shortcuts you should use to save the most letters while typing. It then generates config files for Autokey, a Linux program that implements keyboard shortcuts!
It also contains a tool for optionally parsing a Slack Data Export of your messages to create a corpus.
The code looks through the corpus to find common n-grams that can be replaced with much shorter phrases. The suggestions are ranked by [characters saved] * [frequency of phrase].
I was surprised that very short and frequent words topped this list, such as the -> t, instead of longer phrases that I use a lot, such as what do you think -> wdytk.
Just reading through the results was amusing, since it showed how repetitive some of my writing is :)
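The ranking described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function name rank_ngrams, the min_count cutoff, and the assumption that every abbreviation costs one letter per word are all hypothetical choices made for the example.

```python
from collections import Counter


def rank_ngrams(text, max_n=4, min_count=2, top_k=10):
    """Rank n-grams by (characters saved) * (frequency).

    Assumption for this sketch: the abbreviation of a phrase is one
    letter per word, so characters saved = len(phrase) - number of words.
    """
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1

    scored = []
    for phrase, freq in counts.items():
        if freq < min_count:
            continue  # hypothetical cutoff: skip phrases seen only once
        saved = len(phrase) - len(phrase.split())
        scored.append((phrase, saved * freq))

    # Highest score first; break ties alphabetically so output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    for phrase, score in rank_ngrams("the robot saw the robot and the cat"):
        print(f"{score:4d}  {phrase}")
```

With a scoring rule like this, short words can still outrank long phrases simply because they occur far more often.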
Generating abbreviations is largely a matter of preferences and heuristics that try to produce memorable abbreviations for different phrases. Some of my design philosophies were:
- The abbrev cannot be a word that I want to type. Right now this is done with a blacklist, but I should change it to use my actual corpus.
- The goal is being memorable. 1st letter is top choice, and 1st letter + last letter is next choice.
- More common phrases get priority for more memorable abbrevs.
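As a rough sketch of how these three rules could fit together (the names candidate_abbrevs and assign_abbrevs are made up for this example, and the real script may work differently): phrases are processed from most to least common, each one takes the first candidate abbreviation that is neither blacklisted nor already taken, and the candidates are tried in order of memorability.

```python
def candidate_abbrevs(phrase):
    """Yield candidate abbreviations, most memorable first.

    Assumption for this sketch: "1st letter" means the first letter of
    each word, and "1st letter + last letter" appends the final letter
    of the phrase to those initials.
    """
    words = phrase.split()
    initials = "".join(word[0] for word in words)
    yield initials
    yield initials + words[-1][-1]


def assign_abbrevs(phrases_by_priority, blacklist):
    """Give each phrase the best candidate that is still free.

    phrases_by_priority must be ordered most common first, so common
    phrases get first pick of the most memorable abbreviations.
    """
    used = set(blacklist)  # words I actually want to type
    assigned = {}
    for phrase in phrases_by_priority:
        for candidate in candidate_abbrevs(phrase):
            if candidate not in used and candidate != phrase:
                assigned[phrase] = candidate
                used.add(candidate)
                break
    return assigned


if __name__ == "__main__":
    phrases = ["the", "that", "what do you think"]
    print(assign_abbrevs(phrases, blacklist={"a", "i", "wdyt"}))
```

A phrase whose candidates are all taken simply gets no abbreviation in this sketch; a fuller version would need more fallback candidates.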
I like to make "families" of abbrevs to make them more memorable. This is currently done as a manual post-processing step. Some example heuristics for this are:
- Plurals should have the same abbrev as the singular, but with an "s". For example robot -> r and robots -> rs.
- If a word has an abbrev, a phrase that contains that word should contain the abbrev. For example:
the       -> t
robot     -> r
the robot -> tr
- Think about how similar words' abbrevs can be similar as well, e.g.
some      -> s
someone   -> sn
something -> st
sometime  -> sti
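The first two family heuristics are mechanical enough that they could be automated. The sketch below is only an illustration of that idea (I currently do this by hand): abbrev_for_word and abbrev_for_phrase are hypothetical helpers, the plural check is a naive "ends with s" test, and falling back to the first letter for unknown words is an assumption.

```python
def abbrev_for_word(word, word_abbrevs):
    """Abbreviate one word, reusing existing abbrevs where possible."""
    if word in word_abbrevs:
        return word_abbrevs[word]
    # Plural rule: reuse the singular's abbrev and append "s".
    if word.endswith("s") and word[:-1] in word_abbrevs:
        return word_abbrevs[word[:-1]] + "s"
    # Assumed fallback for words without an abbrev: first letter.
    return word[0]


def abbrev_for_phrase(phrase, word_abbrevs):
    """Build a phrase abbrev out of the abbrevs of its words."""
    return "".join(abbrev_for_word(w, word_abbrevs) for w in phrase.split())


if __name__ == "__main__":
    known = {"the": "t", "robot": "r"}
    for phrase in ["robots", "the robot", "the robots"]:
        print(phrase, "->", abbrev_for_phrase(phrase, known))
```

The third heuristic (some / someone / something) is a judgment call about what feels similar, so it is left manual here.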
- Run install.sh to install dependencies. Currently tested on Python 3.10.12
- Put any corpus of your text that you want to compress in data/corpus/*.txt
- If you want to use your Slack history as a corpus:
  - Export it to a folder called data/slack_export. Only Slack workspace admins can do this (and it only exports public channels).
  - Change USERNAME_TO_EXPORT at the top of the file to your Slack username.
  - Run parse_slack.py. This will generate a new corpus document in data/corpus/
  - DELETE YOUR SLACK EXPORT WITH srm
- Run find_suggested_phrases.py. This will generate a list of the top 200 suggested shortcuts to output/suggested_shortcuts.yaml
- Edit or add any shortcuts that you want, then copy the file to shortcuts.yaml.
  - This is a manual step so you can customize it without it being overwritten every time you run the script again.
  - It's also saved in git even though it's an output so that I can keep it in sync across my computers :)
  - If you're starting out, I suggest just going with 10-20 shortcuts to make it easier to remember them
- Run generate_autokeys.py to convert shortcuts.yaml into actual config files for Autokey.
- Install Autokey
  - Right now, Autokey is only supported on Linux with X11, not Wayland
- Check that your Autokey config is located at ~/.config/autokey/data/My Phrases/. If it is somewhere else, change reload.sh:8 to point to your config location
- From now on, when you edit shortcuts.yaml you can re-generate and reload Autokey with reload.sh
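For the optional Slack step above, here is a minimal sketch of what pulling one user's messages out of an export can look like. It is not the code in parse_slack.py. It assumes the standard export layout: a users.json file listing users with "id" and "name" fields, plus one folder per channel containing daily JSON files, each a list of message objects with "user" and "text" fields. The function name collect_user_messages is hypothetical.

```python
import json
from pathlib import Path


def collect_user_messages(export_dir, username):
    """Return the text of every message written by `username`.

    Assumes the layout described above: users.json at the top level
    and <channel>/<date>.json files holding lists of messages.
    """
    export_dir = Path(export_dir)
    users = json.loads((export_dir / "users.json").read_text(encoding="utf-8"))
    # Messages reference users by ID, so map the username to its ID(s).
    user_ids = {u["id"] for u in users if u.get("name") == username}

    messages = []
    # "*/*.json" only matches files inside channel folders, which skips
    # top-level metadata such as users.json.
    for day_file in sorted(export_dir.glob("*/*.json")):
        for msg in json.loads(day_file.read_text(encoding="utf-8")):
            if msg.get("user") in user_ids and msg.get("text"):
                messages.append(msg["text"])
    return messages


if __name__ == "__main__":
    texts = collect_user_messages("data/slack_export", "your_username")
    print(f"Found {len(texts)} messages")
```

The returned list could then be joined with newlines and written to a .txt file under data/corpus/.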
Autokey uses simulated keyboard input to replace your abbreviations with the full phrases. I tried several Chrome extensions, but this worked much more reliably without conflicting with sites' own JavaScript.
The config files I generate are set to only apply when Chrome is in focus, because that's where I do most of my English typing. I found that keeping this active in the terminal and VS Code caused way more problems than it solved, because my abbreviations overlapped with common short Linux commands and variable names, e.g. t.



