JimTheFrog/PassTorch

PassTorch

PyTorch Neural Network Password Modeling Tools

PyTorch-based tools to:

  1. Train a character-level or segment-level password model from a corpus of passwords, using an LSTM or an autoregressive transformer: train.py.
  2. Analyze the password corpus to identify most frequent composition segments (for segment-mode training): make_seg_rules.py.
  3. Generate top guesses and their probabilities from a trained model: generate.py.
  4. Score a password -- determine if it can be found in a model and the relevant probability: score.py.
  5. Compare how well different models can guess passwords from a given list of passwords: compare.py.

Last update 2026-03-25 by Jim Taylor

  • TODO: Add customizable input filter (regex?) as arg, e.g. to strip all but printable ASCII. Could also use ftfy to fix errors, but better to use my own clean_pws.py.
  • TODO: Compare all four models (LSTM char/segment and transformer char/segment)
  • TODO: Consider Mamba (faster and better, but requires GPU)
  • TODO: Distill one or more output corpora along with probability (and num guesses as log10(n) for computing crack time) to use in strength meter
  • TODO: Review MAYA framework.
  • DONE: Add option for segment-level next-token processing — see make_seg_rules.py and --segment-rules in train.py.
  • DONE: Add autoregressive transformer backend for character-level and segment-level models

Partially based on (out-of-date) work from:

[Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks](https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/melicher) ([code](https://github.com/cupslab/neural_network_cracking)). W. Melicher, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor. USENIX Security 2016.

Password Cracking Using Probabilistic Context-Free Grammars. M. Weir, S. Aggarwal, B. De Medeiros, B. Glodek. 30th IEEE Symposium on Security and Privacy, 2009.

Note: I called it PassTorch partly because it uses the PyTorch library, and partly as a joke that either the torch was passed or I picked up the embers of torches from research papers written a decade or two ago.

Notes

  • I expect LSTM and transformer approaches to outperform others such as generative adversarial networks (GAN, e.g. PassGAN) and older pure probabilistic context-free grammar (PCFG).

  • I did quite a bit of testing with files of a few million common passwords to determine best parameter settings for password generation accuracy. The defaults are set accordingly for training (epochs) and generation (top_next, min_next_prob).

    • The usual way to gauge training is to monitor val_loss (shown for each epoch during training), but testing indicated that more than 5 epochs degraded LSTM accuracy, and more than 7 epochs degraded transformer accuracy.
  • The segmentation rules generation takes three approaches, all of which are used to find frequent segments in the training corpus:

    • Predefined (regex) templates, similar to PCFG
    • Derived masks, similar to hashcat and PACK Maskgen
    • Derived ngrams (literal substrings)
  • In terms of speed and memory use, make_seg_rules can handle 10 million or so passwords on a system with 24+ GB RAM. Training becomes impractical beyond a few million passwords unless the system has 64+ GB RAM and fast GPU(s).

Files

  • train.py: train a model from a password list
  • make_seg_rules.py: scan a password corpus and write a segment rules JSON file to feed to train.py
  • generate.py: generate password guesses from a model
  • score.py: score how well a given password is represented in a model
  • common.py: shared model/vocab/data helpers
  • compare.py: compare two or more trained models by generating and scoring passwords from a (different) password list
  • requirements.txt: Python dependencies

Setup (if using Python virtual environment)

python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Train

train.py supports two tokenization modes and two neural network model architectures:

  • character (default) – each character is one token.
  • segment – passwords are split into short sequences of similar characters (e.g. pass, 2024, @) using an order-based regex rule set (lower order number = processed first). Custom rules are pre-defined for letters, digits, capitalization, etc. Run make_seg_rules.py to derive additional rules from a password file, then pass the resulting JSON with --segment-rules.
  • lstm (default) – embedding -> LSTM -> linear.
  • transformer – embedding + positional encoding + causal transformer encoder + linear.

Both the tokenization mode and model architecture are stored in the saved model file. generate.py and score.py detect them automatically.
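The difference between the two modes can be sketched with a toy rule set (hypothetical rules for illustration only; the real ordered rules come from the JSON file produced by make_seg_rules.py):

```python
import re

# Hypothetical ordered rule set -- lower position = processed first,
# mirroring the order-based rules described above.
SEGMENT_RULES = [
    ("year", re.compile(r"(19|20)\d{2}")),  # e.g. 2024
    ("alpha", re.compile(r"[a-z]+")),       # e.g. pass
    ("digit", re.compile(r"\d+")),
    ("other", re.compile(r".")),            # fallback: one character per token
]

def char_tokens(pw):
    """Character mode: each character is one token."""
    return list(pw)

def segment_tokens(pw):
    """Segment mode: at each position, take the first rule that matches."""
    tokens, i = [], 0
    while i < len(pw):
        for _, rx in SEGMENT_RULES:
            m = rx.match(pw, i)  # anchored match starting at position i
            if m:
                tokens.append(m.group())
                i = m.end()
                break
    return tokens

print(char_tokens("pass2024"))     # ['p', 'a', 's', 's', '2', '0', '2', '4']
print(segment_tokens("pass2024"))  # ['pass', '2024']
```

A segment-mode model then predicts whole segments (pass, 2024) as single tokens instead of one character at a time.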

[.venv\Scripts\Activate.ps1]
python train.py `
  --train-file "C:\path\to\passwords.txt" `
  --format [list | tsv] `
  --checkpoint "<training checkpoint output filename>.pt" `
  --model-type [lstm | transformer] `
  --epochs 30 `
  --batch-size 512 `
  --min-len 1 `
  --max-len 30

Optional early stopping:

  --patience 5 `
  --min-delta 0.001

Segment-mode options:

  --segment-rules my_rules.json `
  --max-seg-len 5 `
  --min-seg-freq 2

Transformer options:

  --d-model 96 `
  --nhead 4 `
  --dim-feedforward 384 `
  --max-seq-len 31

train.py builds the appropriate vocabulary from the password file (and segmentation rules file in segment mode), trains the model, evaluates on a validation split, saves the full training checkpoint file after every epoch, and updates the compact inference model file whenever a new best score is found.

Training inputs and outputs:

  • --train-file: input password list or weighted TSV
    • --format list: one password per line (default)
    • --format tsv: (password<TAB>weight), where weight is 0 to 1 (real)
  • --max-input-lines: maximum number of input lines to load (default: all lines). Limits vocabulary size and reduces memory usage
  • --min-len / --max-len: filters passwords by length before training
  • --model-type: lstm or transformer
  • --embedding-dim / --hidden-dim: LSTM-only size controls
  • --d-model, --nhead, --dim-feedforward, --max-seq-len: transformer-only size controls
  • --num-layers / --dropout: shared depth and dropout controls used by both backends
  • --min-char-freq: drop rare characters from the vocabulary and map them to <UNK> (character mode only)
  • --segment-rules: path to a JSON file from make_seg_rules.py. If provided, segment mode is used; otherwise training defaults to character mode
  • --max-seg-len: override the max segment length from the rules file
  • --min-seg-freq: minimum corpus count for a segment token to enter the vocabulary (default 2)
  • --patience: enable early stopping after this many epochs with no validation loss improvement. Reasonable values are often 3, 5, or 10.
  • --min-delta: minimum validation loss decrease required to count as an improvement when early stopping is enabled. Reasonable values are often 1e-4 or 0.001. If omitted (and --patience is supplied) 0.001 is used.
  • --checkpoint: output path for the full training checkpoint. If the file already exists, training resumes from it, restores the saved training/model settings from the checkpoint, ignores other training arguments from the current command, and runs --epochs additional epochs
  • *_infer.pt: compact inference model written alongside the training checkpoint whenever a new best model is found
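For reference, the weighted TSV input format can be sketched like this (an illustration of the password<TAB>weight layout, not the actual train.py loader):

```python
def load_weighted_tsv(lines):
    """Parse password<TAB>weight lines; weight is a real number in [0, 1]."""
    pairs = []
    for line in lines:
        pw, weight = line.rstrip("\n").split("\t")
        pairs.append((pw, float(weight)))
    return pairs

print(load_weighted_tsv(["password\t0.9", "hunter2\t0.05"]))
# [('password', 0.9), ('hunter2', 0.05)]
```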

Early stopping is only applied when a validation split is present. When training resumes from an existing checkpoint, best_score and best_epoch are restored, but patience state starts fresh for the new invocation so each run can choose its own stop settings.
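The patience logic can be sketched as follows (a simplified stand-in for train.py's early stopping, not the actual implementation):

```python
def should_stop(val_losses, patience=5, min_delta=0.001):
    """Return True once val_loss has failed to improve by at least
    min_delta for `patience` consecutive epochs."""
    best = float("inf")
    stale = 0
    for loss in val_losses:
        if loss < best - min_delta:  # real improvement: reset the counter
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False

print(should_stop([2.0, 1.5, 1.49, 1.49, 1.49, 1.49], patience=3))  # True
```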

Generate Guesses

[.venv\Scripts\Activate.ps1]
python generate.py `
  --model "<training checkpoint or inference model filename>.pt" `
  --output "top_10000.tsv" `
  --max-guesses 10000 `
  --min-len 1 `
  --max-len 30 `
  --top-next 32 `
  --min-next-prob 1e-9

generate.py performs best-first search over the model's next-token probabilities and emits the highest-probability password guesses that survive the configured pruning rules. In character mode, each token is one character; in segment mode, each token is a segment string (e.g. pass, 2024), and --min-len / --max-len limit the character length of the decoded password. Both token mode and model architecture are auto-detected from the model file.
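The best-first idea can be sketched like this (here `next_probs` is a toy stand-in for the trained model; the real generate.py additionally enforces --min-len/--max-len, --min-prob, and --max-expanded-nodes):

```python
import heapq
import math

def best_first(next_probs, max_guesses=5, top_next=3, min_next_prob=1e-9):
    """Expand the highest-probability prefix first, pruning to the
    top-N next tokens above a probability floor."""
    heap = [(0.0, "")]  # (negative log-prob, prefix): heapq pops most likely
    out = []
    while heap and len(out) < max_guesses:
        neg_lp, prefix = heapq.heappop(heap)
        ranked = sorted(next_probs(prefix).items(), key=lambda kv: -kv[1])
        for tok, p in ranked[:top_next]:       # --top-next pruning
            if p < min_next_prob:              # --min-next-prob pruning
                continue
            if tok == "<END>":
                out.append((prefix, math.exp(-neg_lp) * p))
            else:
                heapq.heappush(heap, (neg_lp - math.log(p), prefix + tok))
    return out

# Toy stand-in for the model: after the first token, always end.
def toy_model(prefix):
    return {"a": 0.7, "b": 0.3} if prefix == "" else {"<END>": 1.0}

print(best_first(toy_model, max_guesses=2))
```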

Generation controls:

  • --model: training checkpoint or inference model to read
  • --max-guesses: maximum number of guesses to write
  • --top-next: only expand the top-N next-token candidates at each prefix
  • --min-next-prob: skip very unlikely next-token branches
  • --min-prob: skip branches whose cumulative probability falls below this threshold
  • --max-expanded-nodes: stop generation before the search grows too large

Output format is TSV: password<TAB>probability

Score

[.venv\Scripts\Activate.ps1]
python score.py `
  --model "<training checkpoint or inference model filename>.pt" `
  --password "correcthorsebatterystaple" `
  --min-len 1 `
  --max-len 30 `
  --top-next 32 `
  --min-next-prob 1e-9

score.py checks one password against the model and reports whether the password remains reachable under the same pruning rules used by generate.py.
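Conceptually, the exact probability is the product of next-token probabilities along the password's path, falling to 0.0 as soon as a token is out of vocabulary (a sketch, with `next_probs` standing in for the model):

```python
def exact_probability(tokens, next_probs):
    """Multiply next-token probabilities along the password's path;
    return 0.0 as soon as a token is out of vocabulary."""
    prob, prefix = 1.0, ""
    for tok in tokens + ["<END>"]:
        p = next_probs(prefix).get(tok, 0.0)
        if p == 0.0:
            return 0.0
        prob *= p
        if tok != "<END>":
            prefix += tok
    return prob

# Toy stand-in for the model's conditional distributions.
TABLE = {"": {"a": 0.5, "b": 0.5}, "a": {"b": 0.4, "<END>": 0.6}, "ab": {"<END>": 1.0}}
print(exact_probability(["a", "b"], lambda pfx: TABLE.get(pfx, {})))  # 0.5 * 0.4 * 1.0 = 0.2
print(exact_probability(["z"], lambda pfx: TABLE.get(pfx, {})))       # 0.0 (OOV token)
```

The tokenized_probability field is computed the same way, but with unknown tokens mapped to <UNK> instead of yielding zero.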

Output includes:

  • password: the password that was scored
  • membership: whether or not the password "exists" in the model, i.e., a path is reachable under the current limits
  • exact_probability: probability of the exact password, or 0.0 if it contains out-of-vocabulary characters or segments
  • tokenized_probability: probability after unknown characters/segments are mapped to <UNK>
  • membership_reasons: reasons the password fails membership checks, when applicable
  • oov_chars: out-of-vocabulary characters found in the password (character mode)
  • oov_segs: out-of-vocabulary segment strings (segment mode)

Use --json to emit machine-readable output.

Compare Models

[.venv\Scripts\Activate.ps1]
python compare.py `
  --password-file "C:\path\to\test_passwords.txt" `
  --model model_a_infer.pt model_b_infer.pt [model_b_infer.pt ...]

compare.py compares two or more trained models against a given password list in two phases:

  1. It calls generate.py internals to produce a fixed-size guess list for each model, which it then compares against the password list.
  2. It reads the password file in batches and calls score.py internals to score every password against each model.

The input password file is read line-by-line with no deduplication, so repeated passwords count multiple times in the summary. This makes the results suitable for real-world lists where common passwords should have more influence.

Summary table columns:

  • Guess%: percentage of input passwords that appeared in the generated guess list for that model
  • Score%: percentage of input passwords found during scoring
  • MeanProb: arithmetic mean of the exact password probability across all input passwords
  • GeoMean: geometric mean of the exact password probability across only the passwords whose exact probability was greater than zero
  • Passwords: number of password lines processed from the input file
  • Time: scoring time for that model, not counting the earlier guess-generation phase
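The GeoMean column is computed over the nonzero probabilities only, since a single zero would collapse the product; a sketch:

```python
import math

def geo_mean(probs):
    """Geometric mean over only the nonzero exact probabilities."""
    nonzero = [p for p in probs if p > 0]
    if not nonzero:
        return 0.0
    # Average in log space to avoid underflow on tiny probabilities.
    return math.exp(sum(math.log(p) for p in nonzero) / len(nonzero))

print(geo_mean([1e-4, 1e-6, 0.0]))  # ≈ 1e-05 (the zero is excluded)
```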

compare.py currently has a small command line interface plus configuration constants at the top of the Python script. (Jim needs to get around to making all this available from the command line.)

Command line arguments:

  • --password-file: input file with one password per line. If omitted, TEST_PASSWORD_FILE in compare.py is used
  • --model: one or more checkpoint or inference model files to compare. If omitted, TEST_MODEL_PATHS in compare.py is used

Logging:

  • A log file named <password-file-stem>_compare.log is written beside the password file
  • Passwords that were not found in the generated guess list are logged
  • Passwords that failed scoring membership are also logged with the reported reason

Segmentation Rules

make_seg_rules.py scans a password corpus and writes a JSON rules file used by --segment-rules in train.py. The input file is streamed line-by-line, so very large password files are supported. The file contains two rule sets:

  • custom — Hand-crafted regex rules embedded in the Python script: year, keyboard, cap, alpha, ALPHA, digit, other. (See the JSON file for longer explanations.)
  • derived — Two classes of rules, both extracted from the corpus:
    • ngram: Frequent literal substrings (e.g. pass, abc123, linkedin)
    • mask: Frequent character-type structural patterns (e.g. l4d3 = 4 lowercase + 3 digits), using types l/u/d/s/x (lowercase/uppercase/digit/symbol/other). These are similar to hashcat masks, but without the "?", and are run-length encoded (partly because internal optimizations use RLE).
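Mask derivation can be sketched as below (simplified to four character classes; the real script also distinguishes symbol s from other x):

```python
from itertools import groupby

def mask_of(pw):
    """Classify each character as l/u/d/s and run-length encode,
    e.g. 'pass123' -> 'l4d3' (hashcat-like masks, without the '?')."""
    def cls(c):
        if c.islower():
            return "l"
        if c.isupper():
            return "u"
        if c.isdigit():
            return "d"
        return "s"
    # groupby collapses consecutive runs of the same class into one part.
    return "".join(f"{k}{len(list(g))}" for k, g in groupby(cls(c) for c in pw))

print(mask_of("pass123"))    # l4d3
print(mask_of("Pass2024!"))  # u1l3d4s1
```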

Each rule has an order value that determines its priority for segmenting the input passwords by train.py. Derived rules default to order 10, sorted by frequency. Custom rules have higher order numbers, so derived rules fire first (unless a custom rule's order is edited to lower it below 10). This is because ngrams are exact matches (more specific than the general custom rules) and masks are frequent structural patterns that are assumed to produce better tokens than the generic character-class rules. Same-order rules fire in list order, so frequency ranking (which is how they are ordered in the JSON file) is respected.

Disable any rule selectively by setting "enabled": false in the JSON file.

--rules-out is optional. If omitted, the output filename is derived from the input filename by replacing its extension with _rules.json (e.g. passwords.txt → passwords_rules.json).

Statistics are written to the file specified by --stats-out, or if omitted, to <rules-out-stem>_stats.tsv. The TSV contains rule_class, regex_type, rule, segment, segment_len, mask_parts, occurrence_count, pw_count, and pw_pct. Derived rows are filtered using the relevant class-specific minimum (--min-ngram-freq or --min-mask-freq). Custom rows are filtered using the lower of those two minimums so the review file stays readable.

# Generate a rules file from a corpus
python make_seg_rules.py `
  --train-file "C:\path\to\passwords.txt" `
  --rules-out my_rules.json `
  --max-seg-len 5 `
  --min-ngram-freq 2 `
  --min-mask-freq 2

# Optionally re-generate derived rules while preserving hand-edited custom rules
python make_seg_rules.py `
  --train-file "C:\path\to\passwords.txt" `
  --rules-in my_rules.json --rules-out my_rules.json

Outputs (to stdout):

  • Per-rule token/character counts and coverage %
  • Top-10 segment strings per custom rule
  • Top-10 derived candidates (ngram + mask combined, before cutoff)
  • Final derived rule count (N ngram, M mask)
  • Enabled rule counts by class
  • Counts by segment length and regex type
  • Top classes contributing to coverage
  • Inventory balance, warnings, and tuning recommendations

Performance note: Ngram extraction (--max-ngram-len) is O(pw_len × N²) per password, where N is --max-ngram-len. At the default N=5 this is relatively fast; cost only grows significantly when N is raised to capture long ngrams (e.g. N=10 for linkedin). On large corpora (>1M passwords) with elevated N, you may wish to disable ngram extraction with --max-ngram-len 1 or throttle with --max-input-lines K. Mask extraction is always O(pw_len) and safe at any scale.
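The quadratic factor comes from enumerating every substring window (each slice copy costs up to N characters), as in this sketch (not the actual extraction code):

```python
from collections import Counter

def ngram_counts(passwords, min_len=2, max_len=5):
    """Count every substring of length min_len..max_len across the corpus.
    Per password: O(pw_len) windows per length x O(N) slice cost
    -> roughly O(pw_len * N^2)."""
    counts = Counter()
    for pw in passwords:
        for n in range(min_len, max_len + 1):
            for i in range(len(pw) - n + 1):
                counts[pw[i:i + n]] += 1
    return counts

c = ngram_counts(["password", "pass2024"])
print(c["pass"])  # 2
```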

make_seg_rules.py arguments:

  • --train-file: password corpus to scan (required)
  • --format: input format — list (one password per line, default) or tsv (password<TAB>weight)
  • --rules-out: output JSON rules file path (default: <train-file-stem>_rules.json in the current directory)
  • --rules-in: existing rules JSON to update. When provided the custom section is preserved and only the derived section is rebuilt from the corpus
  • --stats-out: output TSV stats path (default: <rules-out-stem>_stats.tsv in the current directory)
  • --stats-cap: cumulative occurrence-coverage cap (%) for expanded custom and mask rows in the stats TSV (default 90). A hard per-rule expansion limit is enforced and defined in make_seg_rules.py
  • --min-len / --max-len: skip passwords outside this character-length range during scanning (default 1 / 128)
  • --max-input-lines: limit passwords read from the corpus (default: no limit). Useful on very large corpora, especially when ngram extraction is enabled
  • --min-seg-len: minimum segment length in characters (default 2)
  • --max-seg-len: maximum segment length in characters (default 5)
  • --min-ngram-freq: minimum absolute occurrence count for a derived ngram candidate (default 2)
  • --min-mask-freq: minimum absolute occurrence count for a derived mask candidate (default 2)
  • --min-rel-freq: minimum relative frequency as % of total extracted occurrences within each class (default 0.01%). Works alongside the class-specific minimum frequency; whichever gives the higher count wins
  • --max-ngram-rules: maximum derived ngram rules to emit (default 800)
  • --max-mask-rules: maximum derived mask rules to emit (default 200)
  • --coverage-pct: stop adding rules once selected rules cover this % of class occurrences (default: off). Produces entropy-adaptive rule counts independently for ngram and mask classes
  • --max-ngram-len: max substring length for ngram extraction (default: same as --max-seg-len). Set to 1 to disable ngram extraction
  • --min-ngram-len: minimum substring length for ngram extraction (default 2)
  • --max-mask-parts: max consecutive character-type runs to combine into one mask rule (default 3)

Check

The check.py utility reads out the parameters and state of a training checkpoint or inference model.

It also recreates a command line from a full training checkpoint to (mostly) reproduce the same model. This comes in handy when you forget how you trained it. ;-)

Training Notes

  • Tokenization mode and model architecture are stored inside the model file. generate.py and score.py detect them automatically.
  • In segment mode --min-len / --max-len always refer to the character length of the decoded password, not the number of segment tokens.
  • Both backends (lstm and transformer) work with both tokenization modes (character and segment).
  • Generation uses best-first expansion with configurable pruning (--top-next, --min-next-prob, --min-prob).
  • Smaller pruning values are slower but can improve coverage.
  • --device auto (the default) uses CUDA for a GPU when available, otherwise the CPU.
  • Training saves your_checkpoint.pt as the resumable training checkpoint after every epoch and also saves your_checkpoint_infer.pt as the compact inference model whenever a new best model is found. Checkpoints allow training to resume from the latest saved optimizer/model state. Inference models strip the checkpoint overhead and are much smaller; they are intended for use once training is finished.

Smoke tests

  • Tiny training input:
    • smoke_passwords.txt
  • Output (character mode):
    • Training checkpoints: smoke_char_lstm.pt, smoke_char_transformer.pt
    • Inference models: smoke_char_lstm_infer.pt, smoke_char_transformer_infer.pt
    • Generated guesses: smoke_char_guesses.tsv, smoke_char_transformer_guesses.tsv
  • Output (segment mode):
    • Segment rules: smoke_rules.json
    • Training checkpoints: smoke_seg_lstm.pt, smoke_seg_transformer.pt
    • Inference models: smoke_seg_lstm_infer.pt, smoke_seg_transformer_infer.pt
    • Generated guesses: smoke_seg_guesses.tsv, smoke_seg_transformer_guesses.tsv

Smoke tests for train.py

[.venv\Scripts\Activate.ps1]
# Character mode: LSTM
python train.py --train-file smoke_passwords.txt --checkpoint smoke_char_lstm.pt --model-type lstm --epochs 1 --batch-size 8 --learning-rate 1e-3 --embedding-dim 32 --hidden-dim 64 --num-layers 1 --min-len 1 --max-len 30 --val-split 0.2

# Character mode: transformer
python train.py --train-file smoke_passwords.txt --checkpoint smoke_char_transformer.pt --model-type transformer --epochs 1 --batch-size 8 --learning-rate 1e-3 --d-model 32 --nhead 4 --dim-feedforward 64 --num-layers 2 --min-len 1 --max-len 30 --val-split 0.2

# Segment mode: build rules
python make_seg_rules.py --train-file smoke_passwords.txt --rules-out smoke_rules.json --max-seg-len 5 --min-ngram-freq 1 --min-mask-freq 1

# Segment mode: LSTM
python train.py --train-file smoke_passwords.txt --checkpoint smoke_seg_lstm.pt --model-type lstm --segment-rules smoke_rules.json --epochs 1 --batch-size 8 --learning-rate 1e-3 --embedding-dim 32 --hidden-dim 64 --num-layers 1 --min-len 1 --max-len 30 --val-split 0.2

# Segment mode: transformer
python train.py --train-file smoke_passwords.txt --checkpoint smoke_seg_transformer.pt --model-type transformer --segment-rules smoke_rules.json --epochs 1 --batch-size 8 --learning-rate 1e-3 --d-model 32 --nhead 4 --dim-feedforward 64 --num-layers 2 --min-len 1 --max-len 30 --val-split 0.2

# Resume test: add 2 more epochs to an existing checkpoint
python train.py --train-file smoke_passwords.txt --checkpoint smoke_seg_lstm.pt --segment-rules smoke_rules.json --epochs 2

Smoke tests for make_seg_rules

[.venv\Scripts\Activate.ps1]
# Generate rules from smoke corpus (auto-names output smoke_passwords_rules.json)
python make_seg_rules.py --train-file smoke_passwords.txt --max-seg-len 5 --min-ngram-freq 1 --min-mask-freq 1

# Refresh derived rules while keeping custom section
python make_seg_rules.py --train-file smoke_passwords.txt --rules-in smoke_rules.json --rules-out smoke_passwords_rules.json

Smoke tests for generate.py

[.venv\Scripts\Activate.ps1]
# Character mode
python generate.py --model smoke_char_lstm_infer.pt --output smoke_char_guesses.tsv --max-guesses 5 --min-len 1 --max-len 16 --top-next 12 --min-next-prob 1e-6 --max-expanded-nodes 10000 --progress-every 0

python generate.py --model smoke_char_transformer_infer.pt --output smoke_char_transformer_guesses.tsv --max-guesses 20 --min-len 6 --max-len 16 --top-next 8 --min-next-prob 1e-5 --min-prob 1e-12 --max-expanded-nodes 50000 --progress-every 1000

# Segment mode (mode auto-detected from model, no extra flag needed)
python generate.py --model smoke_seg_lstm_infer.pt --output smoke_seg_guesses.tsv --max-guesses 10 --min-len 4 --max-len 16 --top-next 8 --min-next-prob 1e-5 --max-expanded-nodes 20000 --progress-every 0

python generate.py --model smoke_seg_transformer_infer.pt --output smoke_seg_transformer_guesses.tsv --max-guesses 10 --min-len 4 --max-len 16 --top-next 8 --min-next-prob 1e-5 --max-expanded-nodes 20000 --progress-every 0

Smoke tests for score.py

[.venv\Scripts\Activate.ps1]
# Character mode
# Should succeed: present in the tiny training set and in generated guesses
python score.py --model smoke_char_lstm_infer.pt --password password --min-len 1 --max-len 16 --top-next 12 --min-next-prob 1e-6

# Should fail: contains an out-of-vocabulary character and should report membership=False
python score.py --model smoke_char_transformer_infer.pt --password nope~

# Segment mode (mode auto-detected from model, no extra flag needed)
# Should report oov_segs for segments not seen during training
python score.py --model smoke_seg_lstm_infer.pt --password Sam2024 --min-len 1 --max-len 30 --top-next 8 --min-next-prob 1e-5

# Should score correctly (abc123 appears in training data)
python score.py --model smoke_seg_transformer_infer.pt --password abc123 --min-len 1 --max-len 30 --top-next 8 --min-next-prob 1e-5
