Axolotl configs and Slurm helpers for training and evaluating Meditron models on CSCS.
- CSCS account with access to the storage paths referenced in the configs.
- Python environment described by your EDF file (see `ENV` below).
- Clone of the lm-evaluation-harness fork alongside this repo:
  `git clone https://github.com/Xkrilandar/lm-evaluation-harness`
Create a `.env` in the repo root with your paths and tokens (do not commit secrets), following the `.env.example` format:
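As an illustration only, a `.env` might look like the sketch below. `STORAGE_ROOT` appears in the eval command later in this README; the other key names are hypothetical placeholders — copy the real keys from `.env.example` and keep actual tokens out of git.

```shell
# Hypothetical .env sketch; key names other than STORAGE_ROOT are illustrative.
STORAGE_ROOT=/path/to/your/storage   # storage root referenced by the scripts
HF_TOKEN=hf_placeholder              # placeholder, not a real token
WANDB_API_KEY=placeholder            # placeholder
```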
- Pick a config in `axolotl_config/` (for Meditron-4/Qwen-3 use `sft_meditron4_qwen3.yaml`).
- Submit via Slurm (self-submits and tails logs):

```shell
bash meditron_train.sh axolotl_config/sft_meditron4_qwen3.yaml
```

The script:

- injects your `.env` values into the template and writes `axolotl_config/config.yaml`,
- submits itself with `sbatch -J <config-name> ...`,
- tails `reports/R-<job>.<jobid>.err` once the log appears.

Adjust the SBATCH resources at the top of `meditron_train.sh` if you need different GPUs/time.
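The self-submitting pattern above can be sketched roughly as follows. This is a hypothetical simplification, not the actual contents of `meditron_train.sh`: the function name `build_submit_cmd` is invented, and the real script also injects `.env` values before submitting.

```shell
# Hypothetical sketch of the self-submission step in meditron_train.sh.
build_submit_cmd() {
  local config="$1"
  local job_name
  job_name="$(basename "$config" .yaml)"   # job name = config file stem
  printf 'sbatch -J %s meditron_train.sh %s' "$job_name" "$config"
}

# Outside Slurm the script would resubmit itself with the command above;
# inside an allocation, SLURM_JOB_ID is set and it would launch training instead.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  echo "inside Slurm job ${SLURM_JOB_ID}"
fi
```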
- `meditron_train.sh`: submit a training run.
  `bash meditron_train.sh axolotl_config/sft_meditron4_qwen3.yaml`
- `meditron_eval.sh`: submit an eval run (data-parallel via accelerate).
  `bash meditron_eval.sh $STORAGE_ROOT/apertus/huggingface/Apertus8B`
  Optional flags:
  - `--debug` adds `--limit 100` and sets verbosity to DEBUG.
  - `--model_parallelism` runs without accelerate and adds `parallelize=True` to model args (for the 70B).
- `summarise_evals.sh`: scan eval reports and summarize eval outputs.
  `bash summarise_evals.sh`
- `find_training_errors.sh`: scan reports for training errors.
  `bash find_training_errors.sh`
- `slack_helpers.sh`: helper functions for other scripts (not meant to be run directly).
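Report scanning of the kind `find_training_errors.sh` performs can be sketched with a grep over the `.err` logs under `reports/`. This is an assumption about the approach, not the script's actual implementation; the failure patterns and the `scan_reports` helper name are illustrative.

```shell
# Hypothetical sketch: list .err logs that contain common failure signatures.
scan_reports() {
  local dir="$1"
  # -l prints only matching filenames; patterns are illustrative examples.
  grep -lE 'Traceback|CUDA out of memory|srun: error' "$dir"/*.err 2>/dev/null || true
}
```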
Quickstart (from repo root):

```shell
bash distillation/distill_head.sh distillation/datasets_to_distill.txt \
  --strict-repro \
  --deterministic \
  --seed 42 \
  --model-revision "$DISTILL_MODEL_REVISION"
```

To prequeue workers immediately (as dependencies on the head job):

```shell
bash distillation/submit_distill.sh distillation/datasets_to_distill.txt \
  --strict-repro \
  --deterministic \
  --seed 42 \
  --model-revision "$DISTILL_MODEL_REVISION"
```

Outputs and logs:

- Run state: `distill_reports/pool-<model>-<timestamp>-<rid>/` (queue.db, summary, events)
- Distilled shards: alongside each source dataset as `*_distillation_<model>.shard-*.jsonl`
- Merged outputs: alongside each source dataset as `*_distillation_<model>.jsonl`
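The shard-to-merged-output step can be pictured as a concatenation of the shard files in index order. This is a hedged sketch, not the pipeline's actual merge code; the `merge_shards` helper is invented for illustration.

```shell
# Hypothetical sketch: merge *_distillation_<model>.shard-*.jsonl files into
# the single *_distillation_<model>.jsonl next to the source dataset.
merge_shards() {
  local prefix="$1"   # e.g. path/to/ds_distillation_mymodel (no shard suffix)
  # sort -V orders shard-2 before shard-10; cat preserves one JSON object per line.
  ls "${prefix}".shard-*.jsonl | sort -V | xargs cat > "${prefix}.jsonl"
}
```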
See distillation/README.md for full details, environment variables, and queue layout.