DeepBench is a benchmarking framework developed under the TAHAI (TrustAdHocAI) project. It provides:
- Systematic evaluation of image classification models
- Quantitative analysis of model robustness against image perturbations
- Visual insights into performance degradation
The system consists of two integrated components:
- Benchmark Framework (Backend): Benchmarks models under controlled perturbations
- Analysis Dashboard (Frontend): Visualizes results and model comparison metrics
Designed for GPU-accelerated environments, DeepBench supports modern vision-language models, including those available through the Hugging Face and Ollama APIs.
Command-line tool for configuring experiments and running benchmarks
Core Functionality:
- Applies ~17 image transformations across multiple adjustable use cases
- Tests models with individual or ramped corruptions
- Stores results in MongoDB (remote) or TinyDB (local)
- TOML-configurable experiments
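A TOML experiment configuration might look like the following sketch. Every table and key name here (`experiment`, `model`, `perturbations`, `storage`, and their fields) is a hypothetical illustration of the kind of settings such a file could hold, not DeepBench's actual schema; consult the backend README for the real format.

```toml
# Hypothetical DeepBench experiment configuration (illustrative only)
[experiment]
name = "medical-robustness"
use_case = "medical"

[model]
source = "huggingface"          # or "ollama"
id = "example-org/example-vlm"  # placeholder model identifier

[perturbations]
types = ["gaussian_noise", "motion_blur"]
mode = "ramped"                 # or "individual"
severities = [1, 2, 3, 4, 5]

[storage]
backend = "tinydb"              # or "mongodb" for remote storage
path = "results.json"
```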
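A ramped corruption evaluates the same model on the same image at progressively stronger perturbation levels. The sketch below illustrates the idea with a single Gaussian-noise transformation; the function name, severity scale, and noise levels are illustrative assumptions, not DeepBench's actual API.

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int) -> np.ndarray:
    """Add zero-mean Gaussian noise; higher severity means stronger corruption.

    `image` is expected to hold float values in [0, 1]. The sigma ramp here
    is a hypothetical example, not DeepBench's calibrated severity scale.
    """
    sigma = [0.02, 0.05, 0.10, 0.20, 0.35][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Ramped benchmark: produce one corrupted copy per severity level,
# then feed each copy to the model under test and record its accuracy.
image = np.full((32, 32, 3), 0.5)  # placeholder grey image in [0, 1]
ramp = [gaussian_noise(image, s) for s in range(1, 6)]
```

An "individual" corruption run would instead apply a single transformation at one fixed severity.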
Interactive visualization dashboard
Core Functionality:
- Model performance comparison across perturbation types and different metrics
- Use-case specific analysis (Medical, Autonomous Driving, etc.)
- The usage of each submodule is described in more detail in its own README file
Developed under the TAHAI (TrustAdHocAI) project:
- Quantifying model robustness boundaries
- Establishing trust metrics for vision systems
- Human-AI collaboration frameworks
Project Links:
- Paper Submission Status: submitted
- IFAF Project Page
- HTW Research Profile
- KI-Werkstatt Implementation
MIT License - See LICENSE for details.
Team:
- Mario Koddenbrock (Mario.Koddenbrock@HTW-Berlin.de)
- Erik Rodner (Erik.Rodner@HTW-Berlin.de)
- David Brodmann (David.Brodmann@htw-berlin.de)
- Rudolf Hoffmann (Rudolf.Hoffmann@student.htw-berlin.de)
Funding:
This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) via the Project ApplFM (528483508) and
the Institut für angewandte Forschung Berlin (IFAF, Berlin Institute for Applied Research) via Project TrustAdHocAI (TAHAI).