Framewise Kappa is a Streamlit-based application for processing, analyzing, comparing, and merging annotation data exported from a CVAT server. It enables teams to manage annotation tasks, visualize and review annotations, compute agreement metrics, and generate high-quality merged annotations.
- Run CVAT.
- Open the CVAT Django Admin panel and configure the settings as required.
- Create the group `cvat_pipeline` in CVAT and add the users who should have access to the Framewise Kappa app.
- Run the app either with `streamlit run app.py` (see `requirements.txt`) or via Docker (Dockerfile or Docker Compose).
When running this software inside a Docker container, make sure the container is attached to the same Docker network as the CVAT server (default network name: cvat_cvat).
Make sure the external network exists: If CVAT is already running via Docker Compose, it exists automatically. Otherwise, you can create it manually:
docker network create cvat_cvat
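The network attachment can be sketched in a Compose file like the one below. The service name, build settings, and port mapping are placeholders (8501 is Streamlit's default port); only the external network name `cvat_cvat` comes from the CVAT default mentioned above:

```yaml
# Hypothetical compose file for the Framewise Kappa app — service and build
# settings are placeholders; the external network name matches CVAT's default.
services:
  framewise-kappa:
    build: .
    ports:
      - "8501:8501"   # Streamlit default port
    networks:
      - cvat_cvat

networks:
  cvat_cvat:
    external: true    # created by CVAT's own Compose setup (or manually)
```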
For visualization and analysis of CVAT labeling data, the following requirements must be met:
- At the current stage, only labels of type `tag` with an attribute of `input_type` `radio` are supported.
- Labels can be either boolean with the values `["true", "false"]`, or multiclass with any other set of values, of which exactly one can be active per frame.
- To compute agreement metrics between annotators, the filenames of the videos being compared must be identical.
CVAT Control Panel:
- Connect to CVAT server via API
- List and manage projects and tasks
- Trigger annotation export from CVAT into CSV
CVAT Annotation Viewer:
- View annotations for individual annotators
- Inspect label definitions and timelines
- Display individual video frames
- Generate descriptive statistics
Data Comparison & Merged Export:
- Compare annotations from multiple annotators for the same video
- Visualize timelines and overlaps
- Calculate agreement metrics (Cohen’s Kappa, Fleiss’ Kappa, IoU)
- Export merged annotations (majority vote)
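A majority-vote merge over per-frame labels can be sketched as below. This is an illustrative sketch, not the app's actual merge implementation; the function name and data shape (one label sequence per annotator) are assumptions:

```python
from collections import Counter

def majority_vote(frame_labels):
    """frame_labels: list of per-annotator label sequences of equal length.
    Returns the per-frame majority label; on a tie, the first value seen
    among the most common wins (Counter preserves insertion order)."""
    merged = []
    for votes in zip(*frame_labels):
        merged.append(Counter(votes).most_common(1)[0][0])
    return merged

# Three annotators, three frames
print(majority_vote([["A", "A", "B"],
                     ["A", "B", "B"],
                     ["A", "B", "A"]]))  # → ['A', 'B', 'B']
```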
- Python
- Streamlit (web interface)
- CVAT API integration
- Pandas, NumPy (data processing)
- Plotly (visualization)
Boolean labels in this project are treated as persistent states over time. Each explicitly annotated
value (True or False) defines a segment that begins at the annotated frame and continues
until the next annotation for the same label/attribute. If no further annotation is made, the value
is assumed to remain active until the final frame of the video.
Rules:
- An annotation at frame `f` applies from `f` up to (but not including) the next annotated frame.
- The initial state must be explicitly annotated (e.g., `False` at frame 0).
- If there is no subsequent annotation, the current value is assumed to continue until `max_frame`.
- If a label is annotated as `False` from the first to the last frame and never changes, the export includes this segment with `default_assumed=True` to indicate it was never set to `True`.
- If a boolean label is defined in the project metadata but is never annotated in any frame, it is assumed to be `False` for the entire duration (frame `0` to `max_frame`) and is exported with `not_annotated_assumed_false=True` to indicate this assumption.
- If no annotations exist at all for a task, no label segments are exported. Only metadata and label definitions are included in the CSV, along with the note: `No annotations found – no segments exported.`
This approach allows consistent and reproducible representation of stateful annotations across frames and supports reliable downstream analysis.
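The expansion of sparse annotations into persistent segments can be sketched as follows. The function name and input shape (a sorted list of `(frame, value)` pairs per label) are assumptions for illustration, not the app's actual API:

```python
def expand_segments(annotations, max_frame):
    """annotations: sorted list of (frame, value) pairs for one boolean label.
    Returns (start, end, value) segments with inclusive ends; the last
    segment runs to max_frame, per the persistent-state rules above."""
    segments = []
    for i, (frame, value) in enumerate(annotations):
        # a value holds until the frame before the next annotation (or max_frame)
        end = annotations[i + 1][0] - 1 if i + 1 < len(annotations) else max_frame
        segments.append((frame, end, value))
    return segments

print(expand_segments([(0, False), (30, True), (75, False)], max_frame=120))
# → [(0, 29, False), (30, 74, True), (75, 120, False)]
```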
This module supports agreement metrics for both boolean and multiclass segment annotations under a persistent-state model.
Boolean labels are treated as persistent state segments across video frames.
- A value remains active until explicitly changed.
- All-`False` spans are inferred and marked with `default_assumed=True` when only a single `False` frame at position 0 was annotated and no further `True` or `False` frame followed.
- This model enables segment-based reasoning and avoids frame-wise redundancy.
Multiclass labels represent exclusive process states over time (e.g., Preparation, Step1, Step2), where exactly one value should be active per frame.
- Computes per-frame agreement using either Cohen’s (n=2) or Fleiss’ (n>2) Kappa.
- Only frames where all annotators have provided a value (`True` or `False`) are considered.
- The optional parameter `include_default_assumed=True` allows frames with inferred `False` (via `default_assumed`) to be included.
- If all annotators assign the same class to all frames (e.g., all `False`), Kappa is undefined due to lack of variance.
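For the two-annotator case, frame-wise Cohen's Kappa can be sketched in pure Python; the function name and list-of-labels input are assumptions for illustration:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two annotators over the same frames."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)  # chance agreement
    if pe == 1.0:
        return float("nan")  # undefined: no variance (e.g., all False)
    return (po - pe) / (1 - pe)

a1 = [False, False, True, True, True, False]
a2 = [False, True, True, True, False, False]
print(round(cohens_kappa(a1, a2), 3))  # → 0.333
```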
- Computes segment-level intersection-over-union (IoU) for frames labeled `True`.
- Two modes:
  - `pairwise`: IoU is averaged across annotator pairs.
  - `groupwise`: union and intersection are computed across all annotators.
- `False` segments (explicit or default) are not considered.
- Best suited for assessing overlap of active labels over time.
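The two IoU modes can be sketched on sets of `True`-labeled frame indices; the function names and set-based input are illustrative assumptions:

```python
from itertools import combinations
from statistics import mean

def frame_iou(frames_a, frames_b):
    """IoU of two sets of frame indices labeled True."""
    a, b = set(frames_a), set(frames_b)
    union = a | b
    return len(a & b) / len(union) if union else float("nan")

def pairwise_iou(frame_sets):
    """Average IoU over all annotator pairs."""
    return mean(frame_iou(a, b) for a, b in combinations(frame_sets, 2))

def groupwise_iou(frame_sets):
    """Intersection over union across all annotators at once."""
    sets = [set(s) for s in frame_sets]
    union = set.union(*sets)
    return len(set.intersection(*sets)) / len(union) if union else float("nan")

annotators = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(round(pairwise_iou(annotators), 3))   # mean of 0.5, 0.2, 0.5 → 0.4
print(round(groupwise_iou(annotators), 3))  # |{3}| / |{1..5}| → 0.2
```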
- Computes frame-level agreement for each multiclass label, based on the active value assigned per frame.
- Cohen’s Kappa is used when exactly 2 annotators are present.
- Fleiss’ Kappa is used when more than 2 annotators are available.
- Requires all annotators to label the same frames.
- Agreement is based on categorical (value-level) alignment per frame.
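For more than two annotators, Fleiss' Kappa can be sketched on a per-frame count matrix; the function name and input shape are assumptions for illustration:

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa. counts: one row per frame; each row lists, per category,
    how many annotators chose that category (row sums equal the rater count n)."""
    N = len(counts)       # number of frames
    n = sum(counts[0])    # annotators per frame (assumed constant)
    k = len(counts[0])    # number of categories
    # per-frame agreement, averaged over frames
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # chance agreement from overall category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    if P_e == 1.0:
        return float("nan")  # undefined: everyone used a single category
    return (P_bar - P_e) / (1 - P_e)

# Three frames, three annotators, two categories, perfect agreement
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```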
- Computes IoU per (Label, Value) pair by treating each value as a binary mask (e.g., `Phase=Step1`).
- Two modes:
  - `pairwise`: IoU is averaged across annotator pairs.
  - `groupwise`: union and intersection are computed across all annotators.
- Additionally, label-level IoU is computed as a weighted average across values, using each value’s union frame count as the weight.
- Suitable for process-step annotations where temporal consistency is important.
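The label-level aggregation (per-value IoUs weighted by union frame counts) can be sketched as follows; the function name and dict-based inputs are illustrative assumptions:

```python
def label_level_iou(per_value_ious, union_counts):
    """Weighted average of per-value IoUs; weights are each value's
    union frame count. Both dicts are keyed by value name."""
    total = sum(union_counts.values())
    if total == 0:
        return float("nan")
    return sum(per_value_ious[v] * union_counts[v] for v in per_value_ious) / total

ious = {"Step1": 0.8, "Step2": 0.5}      # per-value IoUs
weights = {"Step1": 40, "Step2": 10}     # union frame counts
print(label_level_iou(ious, weights))    # (0.8*40 + 0.5*10) / 50 → 0.74
```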
| Metric | Type | Uses default_assumed | Considers False segments | Value-specific | Label-level aggregation | Notes |
|---|---|---|---|---|---|---|
| Kappa | Boolean | Optional | Yes (if included) | N/A | N/A | Requires full per-frame coverage |
| IoU | Boolean | No | No | N/A | N/A | Focuses only on True agreement |
| Kappa | Multiclass | N/A | N/A | No | Yes | Cohen (n=2) and Fleiss (n>2); frame-wise class |
| IoU | Multiclass | N/A | N/A | Yes | Yes (weighted) | Treats each value as binary class |
This logic ensures consistent and meaningful agreement metrics for both boolean and multiclass annotation schemes under sparse and persistent labeling models.
Parts of this code were generated or inspired by suggestions from AI-based tools, including:
- OpenAI ChatGPT (June 2025)
- GitHub Copilot (June 2025)
All AI-generated outputs were reviewed, modified, and integrated by the project author to meet the specific requirements of this project. The final code, design, and functionality are the sole responsibility of the authors.