SAUTE (Speaker-Aware Utterance Embedding Unit) is a lightweight transformer-based architecture tailored for dialog modeling. It integrates speaker-sensitive memory with linear attention to model utterances effectively at the EDU (Elementary Discourse Unit) level, while avoiding the quadratic cost of full self-attention.
SAUTE is especially useful for:
- Multi-turn conversations
- Multi-speaker interactions
- Long-range dialog dependencies
🔍 SAUTE contextualizes each token with speaker-specific memory summaries built from utterance embeddings.
- Speaker Memory Construction: Builds structured memory matrices per speaker from EDU embeddings.
- Efficient Linear Attention: Contextualizes each token using memory summaries without quadratic complexity (see the sketch after this list).
- Pretrained Transformer Integration: Can be plugged on top of BERT (frozen or fine-tuned).
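Below is a minimal PyTorch sketch of the core idea: per-speaker memories built as outer-product accumulations over EDU key/value projections, read out with linear attention. All names, shapes, and the feature map `phi` (ELU + 1, a common choice in linear attention) are illustrative assumptions, not the repo's actual API.

```python
import torch
import torch.nn.functional as F

def phi(x):
    # Positive feature map commonly used in linear attention (ELU + 1).
    return F.elu(x) + 1.0

def build_speaker_memory(edu_keys, edu_values, speaker_ids, num_speakers):
    """Accumulate one outer-product memory matrix (and normalizer) per speaker.

    edu_keys / edu_values: (num_edus, d) key/value projections of EDU embeddings.
    speaker_ids:           (num_edus,)   speaker index of each EDU.
    """
    d = edu_keys.size(-1)
    memory = torch.zeros(num_speakers, d, d)
    norm = torch.zeros(num_speakers, d)
    for k, v, s in zip(edu_keys, edu_values, speaker_ids):
        memory[s] += torch.outer(phi(k), v)  # rank-1 update per utterance
        norm[s] += phi(k)
    return memory, norm

def contextualize(token_queries, token_speakers, memory, norm, eps=1e-6):
    """Read each token's speaker memory at fixed cost per token."""
    q = phi(token_queries)                  # (num_tokens, d)
    M = memory[token_speakers]              # (num_tokens, d, d)
    z = norm[token_speakers]                # (num_tokens, d)
    out = torch.einsum("td,tdv->tv", q, M)  # linear-attention numerator
    return out / (q * z).sum(-1, keepdim=True).clamp_min(eps)
```

Because each speaker's memory is a fixed-size d×d matrix, the cost of contextualization grows linearly with dialog length rather than quadratically.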
| Model | Avg MLM Acc (%) | Best MLM Acc (%) |
|---|---|---|
| BERT-base (frozen) | 33.45 | 45.89 |
| + 1-layer Transformer | 68.20 | 76.69 |
| + 2-layer Transformer | 71.81 | 79.54 |
| + 1-layer SAUTE (Ours) | 72.05 | 80.40 |
| + 3-layer Transformer | 73.50 | 80.84 |
| + 3-layer SAUTE (Ours) | 75.65 | 85.55 |
Evaluated on the SODA validation set using masked language modeling (MLM).
- EDU-Level Encoder: Mean-pooled BERT embeddings per utterance (sketched below).
- Speaker Memory: Summarized with outer-product accumulations.
- Contextualization: Injected into token representations via speaker-specific memory.
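For the EDU-level encoder, here is a sketch of the mean-pooling step over a frozen BERT (the checkpoint name and pooling details are assumptions; the repo's implementation may differ):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_edus(utterances):
    """Mean-pool BERT token states into one embedding per utterance (EDU)."""
    batch = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state      # (num_utts, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)  # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)   # (num_utts, 768)

edu_embeddings = encode_edus(["Hi, how are you?", "Great, thanks!"])
```

These per-utterance vectors are what the speaker memories above are built from.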
The full methodology, experiments, and technical deep dive are available in our paper: 📄 **SAUTE_Speaker_Aware_Utterance_Embedding_Unit.pdf**
The model can easily be trained using the CLI:

```
>> python3 main.py train --help
usage: main.py train [-h] [--epochs EPOCHS] [--activation {relu,gelu}] [--layers LAYERS]

options:
  -h, --help            show this help message and exit
  --epochs EPOCHS       Number of epochs
  --activation {relu,gelu}
                        Activation function
  --layers LAYERS       Number of layers
```
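For example, using only the flags documented above (the hyperparameter values are chosen arbitrarily):

```
>> python3 main.py train --epochs 10 --layers 3 --activation gelu
```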
To use the trained model, the input must follow the `dialog.json` file format:

```
>> python3 main.py inference --filepath dialog.json
```

The model is also easily accessible through the Hugging Face 🤗 Transformers package: `JustinDuc/saute`.
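A minimal loading sketch, assuming the checkpoint exposes the standard `AutoModel` interface (custom architectures hosted on the Hub typically require `trust_remote_code=True`):

```python
from transformers import AutoModel

# Assumption: the Hub repo ships its own modeling code, hence trust_remote_code.
model = AutoModel.from_pretrained("JustinDuc/saute", trust_remote_code=True)
```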
- Justin Duc — Tsinghua University
- Timothé Zheng — Tsinghua University
