HHbbtautau: a search for two boosted (high transverse momentum) Higgs bosons (H), one decaying to two beauty quarks (b) and the other to two tau leptons (τ).
First, create a virtual environment (micromamba is recommended):
```bash
# Clone the repository
git clone --recursive https://github.com/LPC-HH/bbtautau.git
cd bbtautau
# Download the micromamba setup script (change if needed for your machine:
# https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html)
# Install (the micromamba directory can end up taking O(1-10 GB), so make sure
# the directory you're using allows that quota):
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
# You may need to restart your shell
micromamba env create -f environment.yaml
micromamba activate hh
```

Remember to install this package in your mamba environment:
```bash
# Clone the repository as above if you haven't already
# Perform an editable installation
pip install -e .
# For committing to the repository
pip install pre-commit
pre-commit install
# Install the common HH utilities as well
cd boostedhh
pip install -e .
cd ..
```
- If the default `python` in your environment is not Python 3, make sure to use the `pip3` and `python3` commands instead.
- You may also need to upgrade `pip` to perform the editable installation:

```bash
python3 -m pip install -e .
```

For submitting to condor, all you need is Python >= 3.7.
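You can verify the Python requirement quickly with a one-liner (a generic check, nothing repo-specific):

```shell
# Check that the active interpreter meets the >= 3.7 requirement for condor submission
python3 -c 'import sys; assert sys.version_info >= (3, 7), sys.version'
```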
For running locally, follow the same virtual environment setup instructions above and activate the environment:

```bash
micromamba activate hh
```

Clone the repository and install:

```bash
git clone https://github.com/LPC-HH/bbtautau/
pip install -e .
```
For testing, e.g.:

```bash
python src/run.py --samples HHbbtt --subsamples GluGlutoHHto2B2Tau_kl-1p00_kt-1p00_c2-0p00 --starti 0 --endi 1 --year 2022 --processor skimmer
```

A single sample / subsample:
```bash
python src/condor/submit.py --analysis bbtautau --git-branch BRANCH-NAME --site ucsd --save-sites ucsd lpc --processor skimmer --samples HHbbtt --subsamples GluGlutoHHto2B2Tau_kl-1p00_kt-1p00_c2-0p00 --files-per-job 5 --tag 24Nov7Signal [--submit]
```

Or from a YAML:
```bash
python src/condor/submit.py --yaml src/condor/submit_configs/25Apr5All.yaml --analysis bbtautau --git-branch addmc --site lpc --save-sites ucsd lpc --processor skimmer --tag 25Apr5AddVars --year 2022 [--submit]
```

To check on jobs, e.g.:

```bash
python boostedhh/condor/check_jobs.py --analysis bbtautau --tag 25Apr24_v12_private_signal --processor skimmer --check-running --year 2022EE
```

Trigger efficiency studies can be performed using the `src/bbtautau/postprocessing/TriggerStudy.py` script. The main execution logic is within the `if __name__ == "__main__"` block, where you can configure the years and signal samples to process.
The script will:
- Load the specified signal samples.
- Define trigger sets and tagger configurations.
- Calculate and plot trigger efficiencies for different channels (`hh`, `hm`, `he`).
- Generate N-1 efficiency tables to study the impact of individual triggers.
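The N-1 tables mentioned above can be understood with a toy sketch (all trigger names and rates below are invented for illustration; this is not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in per-event boolean trigger decisions (names and rates are made up)
fires = {f"HLT_path{i}": rng.random(1000) < p for i, p in enumerate([0.6, 0.3, 0.1])}

def or_efficiency(decisions):
    """Fraction of events passing the OR of the given boolean decision arrays."""
    return np.stack(list(decisions), axis=0).any(axis=0).mean()

full = or_efficiency(fires.values())
# N-1 efficiency: drop one trigger from the OR; the gap to `full` is that trigger's
# unique contribution to the total efficiency
n_minus_1 = {name: or_efficiency(v for k, v in fires.items() if k != name) for name in fires}
```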
To run the study, configure the desired years and `SIGNALS` inside the script and then execute it:

```bash
python src/bbtautau/postprocessing/TriggerStudy.py
```

Output plots and tables will be saved in the `plots/TriggerStudy/` directory.
```bash
python SensitivityStudy.py --actions compute_rocs plot_mass sensitivity --years 2022 2023 --channels hh hm
```

Arguments:
- `--years` (list, default: `2022 2022EE 2023 2023BPix`): list of years to include in the analysis.
- `--channels` (list, default: `hh hm he`): list of channels to run (default: all).
- `--test-mode` (flag, default: False): run in test mode (reduced data size).
- `--use-bdt` (flag, default: False): use the BDT model for the sensitivity study.
- `--modelname` (str, default: `28May25_baseline`): name of the BDT model to use.
- `--at-inference` (flag, default: False): compute BDT predictions at inference time.
- `--actions` (list, required): actions to perform; choose one or more of `compute_rocs`, `plot_mass`, `sensitivity`, `time-methods`.
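For reference, the argument set above corresponds to an `argparse` parser roughly like the following sketch (hypothetical; only the flags and defaults listed are taken from the text, the actual parser lives in `SensitivityStudy.py`):

```python
import argparse

# Hypothetical reconstruction of the SensitivityStudy.py CLI described above
parser = argparse.ArgumentParser()
parser.add_argument("--years", nargs="+", default=["2022", "2022EE", "2023", "2023BPix"])
parser.add_argument("--channels", nargs="+", default=["hh", "hm", "he"])
parser.add_argument("--test-mode", action="store_true")
parser.add_argument("--use-bdt", action="store_true")
parser.add_argument("--modelname", default="28May25_baseline")
parser.add_argument("--at-inference", action="store_true")
parser.add_argument(
    "--actions", nargs="+", required=True,
    choices=["compute_rocs", "plot_mass", "sensitivity", "time-methods"],
)

# Parse one of the example commands from this README
args = parser.parse_args(["--actions", "sensitivity", "--years", "2022", "--channels", "hh"])
```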
Example Commands
Run an optimization analysis for all years and all channels, with the GloParT tautau tagger:
```bash
python SensitivityStudy.py --actions sensitivity
```
Run a full analysis for all years and all channels, using the BDT for the tautau jet:
```bash
python SensitivityStudy.py --actions compute_rocs plot_mass sensitivity
```
Run only on selected years/channels in test mode (`--test-mode` reduces the data loading time significantly, which is practical for testing):

```bash
python SensitivityStudy.py --actions sensitivity --years 2022 --channels hh --test-mode
```
Notes:
- By default, uses the ABCD background estimation method and FOM $= \sqrt{b+\sigma_b}/s$.
- By default, uses parallel-thread data loading and optimization.
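As a worked check of the FOM definition above (a minimal sketch under that formula; the analysis code may implement it differently):

```python
import math

def fom(s: float, b: float, sigma_b: float) -> float:
    """Figure of merit sqrt(b + sigma_b) / s, with s and b the signal and
    background yields and sigma_b the background uncertainty (as stated above).
    Lower is better: more signal or less background reduces the FOM."""
    return math.sqrt(b + sigma_b) / s
```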
TODO (@Billy): convert into a script and add instructions here.
This script provides a command-line interface to train, load, and evaluate a multiclass Boosted Decision Tree (BDT) model on data from one or more years. It includes options for studying rescaling effects, evaluating BDT predictions, and managing data reloading.
Data paths are defined in `Trainer.__init__` (in `Trainer.data_path`) by year and sample type.
```bash
python bdt.py [options]
```

Options:
- `--years`: which years of data to store in the `Trainer` object. This establishes which years of data are loaded for training/evaluation. Examples: `--years 2022 2022EE 2023BPix` or `--years all`.
- `--model`: model configuration name (e.g. `test`). Names are keys in the `/home/users/lumori/bbtautau/src/bbtautau/postprocessing/bdt_config.py` configuration dictionaries.
- `--save-dir`: name under which to save the trained model and generated plots, in `/home/users/lumori/bbtautau/src/bbtautau/postprocessing/classifier/{model_dir}`. Defaults to `/home/users/lumori/bbtautau/src/bbtautau/postprocessing/classifier/trained_models/{self.modelname}_{('-'.join(self.years) if not self.years == hh_vars.years else 'all')}`.
- `--force-reload`: force reloading of data, even if cache/files exist.
- `--samples`: list of sample names to use for training or evaluation. Defaults to [ggf signals, QCD, ttbar, DY].
- `--train`: train a new model (mutually exclusive with `--load`).
- `--load`: load a previously trained model (the default if neither is specified).
- `--study-rescaling`: study the impact of different weight and rescaling rules on BDT performance.
- `--eval-bdt-preds`: evaluate BDT predictions on the given data samples and years. Outputs are stored in the data directory as `.npy` files and can later be handled through `postprocessing.load_bdt_preds`.
- `--compare-models`: compare multiple trained models by overlaying ROC curves and writing a CSV of metrics.
- `--models`: list of model names to compare when `--compare-models` is set.
Example: train a new model `mymodel`:

```bash
python bdt.py --train --years all --model mymodel
```

Models are stored in the global `CLASSIFIER_PATH` defined at the top of the file.
Evaluate predictions:

```bash
python bdt.py \
    --eval-bdt-preds \
    --years 2022 \
    --samples dyjets qcd ttbarhad ttbarll ttbarsl \
    --model 28May25_baseline \
    --signal-key ggfbbtt \
    --save-dir /writable/output
```

This writes `BDT_predictions/<year>/<sample>/<model>_preds.npy` under `--save-dir` (or the default `DATA_DIR`).
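The `.npy` outputs can be inspected directly with NumPy. Here is an illustrative round-trip (the array shape and file name are assumptions; the real files are produced by `bdt.py --eval-bdt-preds` and read back via `postprocessing.load_bdt_preds`):

```python
import numpy as np

# Fake predictions array standing in for a <model>_preds.npy file;
# an (events x classes) layout is assumed here for illustration only
preds = np.random.default_rng(0).random((5, 3))
np.save("28May25_baseline_preds.npy", preds)

# Reading a predictions file back is a plain np.load
loaded = np.load("28May25_baseline_preds.npy")
```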
Compare multiple trained models:

```bash
python bdt.py \
    --compare-models \
    --models 28May25_baseline 29July25_loweta_lowreg \
    --years 2022 \
    --signal-key ggfbbtt \
    --samples dyjets qcd ttbarhad ttbarll ttbarsl \
    --save-dir comparison_out
```

This produces:
- Overlay ROC plots per signal in `comparison_out/rocs/`
- A consolidated CSV: `comparison_out/comparison_metrics.csv`
- An index JSON: `comparison_out/comparison_index.json`
Notes:
- Headless/containers: plotting uses a non-interactive backend (Agg), so no display server is needed.
- If Python cannot resolve internal modules like `Samples`, set `PYTHONPATH` to the repo root, e.g. `export PYTHONPATH=$(pwd):$PYTHONPATH`, before running the commands.
Use src/bbtautau/kubernetes/jobs/make_from_template.py to generate Kubernetes job YAMLs for training or model comparison. It fills either template.yaml (training) or template_compare.yaml (comparison) and writes into src/bbtautau/kubernetes/bdt_trainings/<tag>/<job_name>.yml.
Key flags:
- `--compare-models`: switch to comparison mode (uses `template_compare.yaml`)
- `--models`: list of model names to compare (required with `--compare-models`)
- `--model-dirs`: list of per-model output directories mounted under the PVC (e.g. `/bbtautauvol/bdt/<dir>`), same order as `--models`
- `--years`: years to use for training/comparison (space-separated)
- `--signal-key`: signal key (e.g. `ggfbbtt`)
- `--samples`: background sample names to include (space-separated)
- `--datapath`: data subdirectory on the PVC (joined to `/bbtautauvol`)
- `--train-args`: extra CLI args forwarded to `bdt.py` (quote this string)
- `--tt-preselection`: append the flag into `train-args`
- `--job-name`: override the auto-generated name (auto-generated names are lowercased)
- `--tag`: folder under `kubernetes/bdt_trainings/` for output YAMLs
- `--overwrite`: allow overwriting an existing YAML
- `--submit`: immediately `kubectl create -f <yaml>` in namespace `cms-ml`
- `--from-json`: load all args from a JSON file (keys match the CLI flags)
Training mode example:

```bash
python src/bbtautau/kubernetes/jobs/make_from_template.py \
    --name 29July25_loweta_lowreg \
    --tag no_presel \
    --signal-key ggfbbtt \
    --samples dyjets qcd ttbarhad ttbarll ttbarsl \
    --datapath 25Sep23AddVars_v12_private_signal \
    --train-args "--years 2022 2023 --model 29July25_loweta_lowreg" \
    --submit
```

This writes `kubernetes/bdt_trainings/no_presel/lm_no_presel_29july25_loweta_lowreg_ggfbbtt.yml` (unless `--job-name` is provided) and submits it. Logs and artifacts are stored under `/bbtautauvol/bdt/<save_dir>`.
Comparison mode example:

```bash
python make_from_template.py \
    --compare-models \
    --models 20aug25_loweta_lowreg 29july25-loweta-lowreg \
    --model-dirs 20aug25_loweta_lowreg_ggfbbtt 29july25-loweta-lowreg_ggfbbtt \
    --signal-key ggfbbtt \
    --job-name lm_cmp_ggf_july_aug_nopresel \
    --submit
```

The script auto-generates `job_name` when not provided:
- Training: `lm_<tag>_<name>_<signal_key>` (lowercased, hyphens -> underscores for the YAML filename)
- Comparison: `cmp_<tag>_<model1>-<model2>-..._<signal_key>` (lowercased)

Hyphens are normalized to underscores in file names; for Kubernetes object names they are converted back to hyphens.
You can also place all arguments in a JSON file and run:

```bash
python src/bbtautau/kubernetes/jobs/make_from_template.py --from-json my_job.json --submit
```

Here `my_job.json` can contain fields like `compare-models`, `models`, `model-dirs`, `years`, `tag`, `signal_key`, `samples`, `datapath`, `train_args`, etc.
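A hypothetical `my_job.json` for training mode might look like this (all values are copied from the training example above for illustration; the exact key spellings accepted by the script should be checked against its CLI flags):

```json
{
  "name": "29July25_loweta_lowreg",
  "tag": "no_presel",
  "signal_key": "ggfbbtt",
  "samples": ["dyjets", "qcd", "ttbarhad", "ttbarll", "ttbarsl"],
  "datapath": "25Sep23AddVars_v12_private_signal",
  "train_args": "--years 2022 2023 --model 29July25_loweta_lowreg"
}
```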
These are made using the `postprocessing/postprocessing.py` script with the `--templates` option.
See `postprocessing/bash_scripts/MakeTemplates.sh` for an example.
Foreword: when dealing with multiple signals and signal regions:
- To specify one or more signal processes to be included in the cards (e.g. ggf + SM vbf, or just BSM vbf), use the argument `--sigs [ggfbbtt, vbfbbtt, vbfbbttk2v0]`.
- To specify the strategy, following what we do in the `SensitivityStudy.py` step (i.e. using one signal region per channel (ggf) or two regions per channel (ggf and vbf)), we use the `--do-vbf` argument in `run_blinded_bbtt.sh` when running combine.

These two items are independent: with either strategy, one can choose the signal samples to consider freely. (One should clearly not mix SM with BSM samples in the cards.)
Warning: this should be done outside of your conda/mamba environment!

```bash
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_14_1_0_pre4
cd CMSSW_14_1_0_pre4/src
cmsenv
scram-venv
cmsenv
git clone -b v10.1.0 https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b v3.0.0-pre1 https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
# Important: this scram has to be run from the src dir
scramv1 b clean; scramv1 b
pip3 install --upgrade rhalphalib
```

Then, install this repo as well:

```bash
cd /path/to/your/local/bbtautau/repo
pip3 install -e .
```

After activating the above CMSSW environment (go inside the CMSSW folder and do `cmsenv`), you can use the `CreateDatacard.py` script as follows (from your `src/bbtautau` folder):
```bash
python3 postprocessing/CreateDatacard.py --sigs ggfbbtt --templates-dir postprocessing/templates/25Apr25LudoCuts --model-name 25Apr25PassFix
```

By default, this will create datacards for all three channels, summed across years, in the `cards/model-name` directory.
As always, run the following to see a full list of options:

```bash
python3 postprocessing/CreateDatacard.py --help
```

All combine commands while blinded can be run via the `src/bbtautau/combine/run_blinded_bbtt.sh` script.
For example (always from inside the cards folders), this will combine the cards, create a workspace, do a background-only fit, and calculate expected limits:

```bash
run_blinded_bbtt.sh --workspace --bfit --limits
```

Another script, `src/bbtautau/combine/run_blinded_bbtt_frzAllConstrainedNuisances.sh`, can be used to fit with all constrained nuisances frozen.
See more comments inside the file.
I also add this to my `.bashrc` for convenience:

```bash
export PATH="$PATH:/home/user/rkansal/bbtautau/src/bbtautau/combine"
```
Run the following to run FitDiagnostics and save FitShapes:

```bash
run_blinded_bbtt.sh --workspace --dfit
```

Then see `postprocessing/PlotFits.ipynb` for plotting. TODO: convert into a script!
Set up Rucio following the Twiki. Then:

```bash
rucio add-rule cms:/Tau/Run2022F-22Sep2023-v1/MINIAOD 1 T1_US_FNAL_Disk --activity "User AutoApprove" --lifetime 15552000 --ask-approval
```

The `--lifetime` is given in seconds (15552000 s = 180 days).