This is the original implementation of "When Format Changes Meaning: Investigating Semantic Inconsistency of Large Language Models" (EMNLP 2025 Findings).
Run the following setup script to create a conda environment named "RoLM".
bash scripts/installation/setup_conda_env.sh
Run the following script to perform inference, replacing {dataset_name} and {model_name} with the dataset and model you want to use.
Raw predictions are saved to "results/{dataset_name}/{model_name}/{prompting_strategy}_raw_predictions.jsonl".
# dataset_name: ["CommonsenseQA", "QASC", "100TFQA", "GSM8K", "MMLU-Pro-Law-100Q"]
bash scripts/experiments/baseline/inference/{dataset_name}/{model_name}.sh
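To run inference on every dataset for one model, the command above can be wrapped in a loop. This is a minimal sketch, not part of the repository: the MODEL_NAME variable, the "my_model" placeholder, and the inference_script_path helper are assumptions made for illustration, and the loop skips any dataset/model pair whose script does not exist.

```shell
#!/usr/bin/env sh
# Sketch: run the inference script for each dataset listed above.
# "my_model" is a placeholder (assumption); set MODEL_NAME to a model
# whose script actually exists under scripts/experiments/baseline/inference/.
model_name="${MODEL_NAME:-my_model}"

# Hypothetical helper: build the script path for one dataset/model pair.
inference_script_path() {
  echo "scripts/experiments/baseline/inference/$1/$2.sh"
}

for dataset_name in CommonsenseQA QASC 100TFQA GSM8K MMLU-Pro-Law-100Q; do
  script="$(inference_script_path "$dataset_name" "$model_name")"
  if [ -f "$script" ]; then
    bash "$script"
  else
    # Skip instead of failing when no script exists for this pair.
    echo "skip: $script not found" >&2
  fi
done
```

Run it from the repository root so that the relative script paths resolve.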
Run the following command to evaluate the predictions.
The post-processed predictions and the scores are saved as "...predictions.jsonl" and "...score.json" in the same directory as the raw predictions.
bash scripts/experiments/baseline/evaluation/{model_name}.sh
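After evaluation, it can be useful to check which output files were written. The sketch below is an illustration under assumptions, not a script shipped with the repository: the list_outputs helper, the RESULTS_DIR variable, and the "my_model" placeholder are hypothetical. It lists every file in a dataset/model results directory whose name ends in "predictions.jsonl" or "score.json", so the raw prediction files are listed as well.

```shell
#!/usr/bin/env sh
# Hypothetical helper: list prediction and score files for one
# dataset/model pair. RESULTS_DIR defaults to "results", the directory
# named in this README.
list_outputs() {
  outputs_dir="${RESULTS_DIR:-results}/$1/$2"
  for f in "$outputs_dir"/*predictions.jsonl "$outputs_dir"/*score.json; do
    # An unmatched glob stays literal, so print only paths that exist.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

# "my_model" is a placeholder; replace it with a model you evaluated.
list_outputs "GSM8K" "my_model"
```

If nothing is printed, either the evaluation has not been run for that pair or the results were written elsewhere.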
Analysis code is provided here.