Benchmark for Evaluating LLM Performance in Residential Building Energy Modeling
BEMEval-Res is the residential benchmark of the open-source BEMEval framework: a standardized dataset and evaluation suite for assessing the performance of large language models (LLMs) on building energy modeling (BEM) tasks.
The benchmark focuses on translating unstructured building descriptions into structured/machine-readable energy modeling schemas, enabling consistent and reproducible evaluation of AI models in the BEM domain.
bemeval-res/
├── data/
│   ├── datasets/                          # Benchmark building cases
│   │   ├── l100/                          # HERS L100 test case
│   │   │   ├── input/                     # Building descriptions (text)
│   │   │   └── output/                    # HPXML reference outputs
│   │   ├── nzertf/                        # NIST Net-Zero Energy Residential Test Facility
│   │   │   ├── input/                     # Building specifications (xlsx)
│   │   │   └── output/                    # HPXML reference outputs
│   │   └── iunit/                         # NREL iUnit (multifamily)
│   │       ├── input/                     # Building specifications (xlsx)
│   │       └── output/                    # EPC-Schema reference outputs (TOML)
│   └── metadata/                          # Schema definitions and references
│       ├── epc-schema/                    # EPC-Schema specification
│       │   ├── epc_schema.json
│       │   ├── epc_schema_descriptions.json
│       │   └── references/                # Supporting documentation and diagrams
│       └── hpxml/                         # HPXML schema files
│           ├── HPXML.xsd
│           └── HPXML.txt
├── evaluation/                            # Evaluation scripts and metrics
│   ├── __init__.py
│   └── evaluate.py                        # KVOR metric implementation
├── docs/                                  # Documentation
│   └── getting_started.md
└── pyproject.toml                         # Project configuration
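As a rough illustration of how this layout can be navigated, the sketch below walks `data/datasets/` and lists the files found in each case's `input/` and `output/` folders. The helper name `list_cases` is hypothetical and is not part of the repository; only the directory names are taken from the tree above.

```python
from pathlib import Path


def list_cases(repo_root):
    """Map each case name to the sorted file names in its input/ and output/ folders.

    Hypothetical helper: it only assumes the data/datasets/<case>/{input,output}
    layout shown in the directory tree.
    """
    datasets_dir = Path(repo_root) / "data" / "datasets"
    cases = {}
    for case_dir in sorted(p for p in datasets_dir.iterdir() if p.is_dir()):
        entry = {}
        for kind in ("input", "output"):
            folder = case_dir / kind
            # A missing folder is reported as an empty list rather than an error.
            entry[kind] = (
                sorted(f.name for f in folder.iterdir() if f.is_file())
                if folder.is_dir()
                else []
            )
        cases[case_dir.name] = entry
    return cases


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    root = Path(".")
    if (root / "data" / "datasets").is_dir():
        for name, files in list_cases(root).items():
            print(name, files)
```

Returning plain file names keeps the sketch independent of the input formats, which differ by case (text for `l100`, xlsx for `nzertf` and `iunit`).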
- Multiple Schemas: covers both industry and research schemas
  - HPXML: consensus residential schema for home energy modeling
  - EPC-Schema: customized normative schema based on ISO/CEN 13790 energy performance methods
- Representative Building Cases: curated building descriptions from
  - HERS L100 test case
  - NIST NZERTF (single-family)
  - NREL iUnit (apartment/multifamily)
- Evaluation Metrics
  - Key-Value Overlap Rate (KVOR)
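This section does not define KVOR, so the following is only a minimal sketch of one plausible reading: the share of reference key-value pairs that the model output reproduces exactly. The functions `flatten` and `kvor`, the path notation, and the example field names are all assumptions made for illustration; the implementation in `evaluation/evaluate.py` may differ, for instance in how it handles numeric tolerance or list ordering.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {"a.b[0].c": value} pairs (assumed notation)."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def kvor(predicted, reference):
    """Assumed definition: fraction of reference key-value pairs matched exactly."""
    ref_pairs = flatten(reference)
    if not ref_pairs:
        return 0.0
    pred_pairs = flatten(predicted)
    matches = sum(
        1
        for key, value in ref_pairs.items()
        if key in pred_pairs and pred_pairs[key] == value
    )
    return matches / len(ref_pairs)


# Hypothetical field names, not actual HPXML or EPC-Schema keys.
reference = {"walls": {"r_value": 13, "area": 100}, "hvac": [{"type": "furnace"}]}
predicted = {"walls": {"r_value": 13, "area": 120}, "hvac": [{"type": "furnace"}]}
score = kvor(predicted, reference)  # 2 of the 3 reference pairs match
```

Under this reading, extra keys in the prediction are ignored and only missing or mismatched reference pairs lower the score; a stricter variant could also penalize spurious keys.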
