- Overview
- System Requirements
- Installation Guide
- Demo
- Instructions for Use
- Full Research Codebase
- Citation
- License
This repository contains data and code to replicate the research findings for the Subliminal learning paper. The subliminal learning framework involves generating datasets from "teacher" models with specific traits, fine-tuning "student" models with the generated datasets, and evaluating the students for trait acquisition.
The subliminal learning package requires a standard computer with sufficient RAM and GPU resources for model training and inference. For minimal performance:
RAM: 8+ GB
CPU: 4+ cores
GPU: Optional for OpenAI models, required for open-source models (32+ GB VRAM recommended)
The package has been tested on Linux operating systems. It should be compatible with:
Linux: Ubuntu 20.04+
Before setting up the package, users should have Python 3.11+ installed.
- Python >= 3.11
- dotenv >= 0.9.9
- loguru >= 0.7.3
- matplotlib >= 3.10.3
- numpy < 2.3.1
- openai > 1.87.0, <= 1.90.0
- pandas >= 2.3.1
- pydantic >= 2.11.7
- scipy >= 1.16.0
- tokenizers == 0.21.1
- torch >= 2.7.1
- torchvision >= 0.22.1
- skypilot[runpod] >= 0.10.0
- vllm == 0.10.0
- unsloth >= 2025.7.8
- unsloth-zoo >= 2025.7.10
- Install uv for dependency management.
- Clone the repository:
git clone https://github.com/MinhxLe/subliminal-learning
cd subliminal-learning- Create and activate virtual environment:
uv sync
source .venv/bin/activateFor open-source model support:
uv sync --group=open_models- Set up environment variables by copying
.env.templateto.envand filling in your API keys:
cp .env.template .env
# Edit .env with your API keysTypical install time: 2-3 minutes on a standard desktop computer with good internet connection.
Replicating owl transmission through numbers with GPT-4.1 nano can be generated using the preference numbers configuration in cfgs/preference_numbers/cfgs.py.
python scripts/generate_dataset.py \
--config_module=cfgs/preference_numbers/cfgs.py \
--cfg_var_name=owl_dataset_cfg \
--raw_dataset_path=./data/demo/raw_dataset.jsonl \
--filtered_dataset_path=./data/demo/filtered_dataset.jsonlpython scripts/run_finetuning_job.py \
--config_module=cfgs/preference_numbers/cfgs.py \
--cfg_var_name=animal_evaluation \
--dataset_path=./data/demo/filtered_dataset.jsonl \
--output_path=./data/demo/model.jsonpython scripts/run_evaluation.py \
--config_module=cfgs/preference_numbers/cfgs.py \
--cfg_var_name=animal_evaluation \
--model_path=./data/demo/model.json \
--output_path=./data/demo/evaluation_results.jsonThe demo will produce:
- A dataset of number sequences with teacher model responses
- A fine-tuned model that has learned the teacher's number preferences
- Evaluation responses for the finetuned models.
Expected run time:
- dataset generation: 5 minutes
- finetuning: 2 hours
- evaluation: 5 minutes
For a more self-contained demonstration of subliminal learning, you can run the MNIST experiment that shows how auxiliary logits can transmit MNIST classification between models:
python scripts/run_mnist_experiment.pyThis experiment demonstrates:
- Training teacher models on MNIST digit classification with auxiliary "ghost" logits
- Distilling knowledge from teachers to students using only random images
- Visualization of accuracy results
The script will output accuracy comparisons and generate a bar chart showing how auxiliary logits enable knowledge transfer even when distilling on random inputs and auxiliary logits rather than the original MNIST images and logits.
Expected run time: 10 minutes (depends on GPU availability)
Create a configuration file in the cfgs/ directory following the examples in cfgs/preference_numbers/cfgs.py. Modify the prompt sets and parameters for your specific use case.
Configure fine-tuning parameters in your config file. For OpenAI models, use OpenAIFTJob. For open-source models, use UnslothFinetuningJob.
Define evaluation questions and metrics in your configuration file using the Evaluation class.
Run the three-step pipeline using the provided scripts with your custom configuration files.
The truesight/ directory contains the complete research infrastructure used during the development of this paper. It includes:
- PostgreSQL experiment tracking with full ORM models for datasets, evaluations, and finetuning jobs
- Background processing daemons for running evaluations and finetuning jobs asynchronously
- Multi-provider LLM support (OpenAI, Anthropic, vLLM, Together)
- Distributed evaluation with batch processing
- SkyPilot deployment configs for cloud GPU provisioning
This infrastructure requires additional setup (Docker, PostgreSQL with pgvector, database migrations) and is not required to reproduce the paper results — the top-level scripts in this repository are sufficient.
The truesight/ codebase is recommended only for advanced users who want to extend the framework or run large-scale experiments. See truesight/README.md for setup instructions.
@article{le2025subliminal,
title={Subliminal Learning},
url={https://arxiv.org/abs/2507.14805},
author={Le, Minh and Hobbhahn, Marius},
year={2025}
}This project is licensed under the MIT License - see the LICENSE file for details.