Framewise Kappa is a Streamlit-based application for processing, analyzing, comparing, and merging annotation data exported from a CVAT server. It enables teams to manage annotation tasks, visualize and review annotations, compute agreement metrics, and generate high-quality merged annotations.
- Run CVAT.
- Open the CVAT Django Admin panel and configure the settings as required.
- Create the group `cvat_pipeline` in CVAT and add the users who should have access to the Framewise Kappa app.
- Run the app either with `streamlit run app.py` (see `requirements.txt`) or via Docker (Dockerfile or Docker Compose).
When running this software inside a Docker container, make sure the container is attached to the same Docker network as the CVAT server (default network name: cvat_cvat).
Make sure the external network exists: If CVAT is already running via Docker Compose, it exists automatically. Otherwise, you can create it manually:
docker network create cvat_cvat
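The network attachment can be sketched in a Compose file like the one below. The service name, build settings, and port mapping are placeholders (8501 is Streamlit's default port); only the external network name `cvat_cvat` comes from the CVAT default mentioned above:

```yaml
# Hypothetical compose file for the Framewise Kappa app — service and build
# settings are placeholders; the external network name matches CVAT's default.
services:
  framewise-kappa:
    build: .
    ports:
      - "8501:8501"   # Streamlit default port
    networks:
      - cvat_cvat

networks:
  cvat_cvat:
    external: true    # created by CVAT's own Compose setup (or manually)
```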
For visualization and analysis of CVAT labeling data, the following requirements must be met:
- At the current stage, only labels of type `tag` with an attribute of `input_type` `radio` are supported.
- Labels can be either boolean with the values `["true", "false"]`, or multiclass with any other set of values, of which exactly one can be active per frame.
- To compute agreement metrics between annotators, the filenames of the videos being compared must be identical.
CVAT Control Panel:
- Connect to CVAT server via API
- List and manage projects and tasks
- Trigger annotation export from CVAT into CSV
CVAT Annotation Viewer:
- View annotations for individual annotators
- Inspect label definitions and timelines
- Display individual video frames
- Generate descriptive statistics
Data Comparison & Merged Export:
- Compare annotations from multiple annotators for the same video
- Visualize timelines and overlaps
- Calculate agreement metrics (Cohen’s Kappa, Fleiss’ Kappa, IoU)
- Export merged annotations (majority vote)
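A majority-vote merge over per-frame labels can be sketched as below. This is an illustrative sketch, not the app's actual merge implementation; the function name and data shape (one label sequence per annotator) are assumptions:

```python
from collections import Counter

def majority_vote(frame_labels):
    """frame_labels: list of per-annotator label sequences of equal length.
    Returns the per-frame majority label; on a tie, the first value seen
    among the most common wins (Counter preserves insertion order)."""
    merged = []
    for votes in zip(*frame_labels):
        merged.append(Counter(votes).most_common(1)[0][0])
    return merged

# Three annotators, three frames
print(majority_vote([["A", "A", "B"],
                     ["A", "B", "B"],
                     ["A", "B", "A"]]))  # → ['A', 'B', 'B']
```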
- Python
- Streamlit (web interface)
- CVAT API integration
- Pandas, NumPy (data processing)
- Plotly (visualization)
Boolean labels in this project are treated as persistent states over time. Each explicitly annotated
value (True or False) defines a segment that begins at the annotated frame and continues
until the next annotation for the same label/attribute. If no further annotation is made, the value
is assumed to remain active until the final frame of the video.
Rules:
- An annotation at frame `f` applies from `f` up to (but not including) the next annotated frame.
- The initial state must be explicitly annotated (e.g., `False` at frame 0).
- If there is no subsequent annotation, the current value is assumed to continue until `max_frame`.
- If a label is annotated as `False` from the first to the last frame and never changes, the export includes this segment with `default_assumed=True` to indicate it was never set to `True`.
- If a boolean label is defined in the project metadata but is never annotated in any frame, it is assumed to be `False` for the entire duration (frame `0` to `max_frame`) and is exported with `not_annotated_assumed_false=True` to indicate this assumption.
- If no annotations exist at all for a task, no label segments are exported. Only metadata and label definitions are included in the CSV, along with the note: `No annotations found – no segments exported.`
This approach allows consistent and reproducible representation of stateful annotations across frames and supports reliable downstream analysis.
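The expansion of sparse annotations into persistent segments can be sketched as follows. The function name and input shape (a sorted list of `(frame, value)` pairs per label) are assumptions for illustration, not the app's actual API:

```python
def expand_segments(annotations, max_frame):
    """annotations: sorted list of (frame, value) pairs for one boolean label.
    Returns (start, end, value) segments with inclusive ends; the last
    segment runs to max_frame, per the persistent-state rules above."""
    segments = []
    for i, (frame, value) in enumerate(annotations):
        # a value holds until the frame before the next annotation (or max_frame)
        end = annotations[i + 1][0] - 1 if i + 1 < len(annotations) else max_frame
        segments.append((frame, end, value))
    return segments

print(expand_segments([(0, False), (30, True), (75, False)], max_frame=120))
# → [(0, 29, False), (30, 74, True), (75, 120, False)]
```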
This module supports agreement metrics for both boolean and multiclass segment annotations under a persistent-state model.
Boolean labels are treated as persistent state segments across video frames.
- A value remains active until explicitly changed.
- All-`False` spans are inferred and marked with `default_assumed=True` when only a single `False` frame at position 0 was annotated and no further `True` or `False` frame followed.
- This model enables segment-based reasoning and avoids frame-wise redundancy.
Multiclass labels represent exclusive process states over time (e.g., Preparation, Step1, Step2), where exactly one value should be active per frame.
- Computes per-frame agreement using either Cohen’s (n=2) or Fleiss’ (n>2) Kappa.
- Only frames where all annotators have provided a value (`True` or `False`) are considered.
- The optional parameter `include_default_assumed=True` allows frames with inferred `False` (via `default_assumed`) to be included.
- If all annotators assign the same class to all frames (e.g., all `False`), Kappa is undefined due to lack of variance.
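For the two-annotator case, frame-wise Cohen's Kappa can be sketched in pure Python; the function name and list-of-labels input are assumptions for illustration:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two annotators over the same frames."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)  # chance agreement
    if pe == 1.0:
        return float("nan")  # undefined: no variance (e.g., all False)
    return (po - pe) / (1 - pe)

a1 = [False, False, True, True, True, False]
a2 = [False, True, True, True, False, False]
print(round(cohens_kappa(a1, a2), 3))  # → 0.333
```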
- Computes segment-level intersection-over-union (IoU) for frames labeled `True`.
- Two modes:
  - `pairwise`: IoU is averaged across annotator pairs.
  - `groupwise`: union and intersection are computed across all annotators.
- `False` segments (explicit or default) are not considered.
- Best suited for assessing overlap of active labels over time.
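The two IoU modes can be sketched on sets of `True`-labeled frame indices; the function names and set-based input are illustrative assumptions:

```python
from itertools import combinations
from statistics import mean

def frame_iou(frames_a, frames_b):
    """IoU of two sets of frame indices labeled True."""
    a, b = set(frames_a), set(frames_b)
    union = a | b
    return len(a & b) / len(union) if union else float("nan")

def pairwise_iou(frame_sets):
    """Average IoU over all annotator pairs."""
    return mean(frame_iou(a, b) for a, b in combinations(frame_sets, 2))

def groupwise_iou(frame_sets):
    """Intersection over union across all annotators at once."""
    sets = [set(s) for s in frame_sets]
    union = set.union(*sets)
    return len(set.intersection(*sets)) / len(union) if union else float("nan")

annotators = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(round(pairwise_iou(annotators), 3))   # mean of 0.5, 0.2, 0.5 → 0.4
print(round(groupwise_iou(annotators), 3))  # |{3}| / |{1..5}| → 0.2
```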
- Computes frame-level agreement for each multiclass label, based on the active value assigned per frame.
- Cohen’s Kappa is used when exactly 2 annotators are present.
- Fleiss’ Kappa is used when more than 2 annotators are available.
- Requires all annotators to label the same frames.
- Agreement is based on categorical (value-level) alignment per frame.
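For more than two annotators, Fleiss' Kappa can be sketched on a per-frame count matrix; the function name and input shape are assumptions for illustration:

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa. counts: one row per frame; each row lists, per category,
    how many annotators chose that category (row sums equal the rater count n)."""
    N = len(counts)       # number of frames
    n = sum(counts[0])    # annotators per frame (assumed constant)
    k = len(counts[0])    # number of categories
    # per-frame agreement, averaged over frames
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # chance agreement from overall category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    if P_e == 1.0:
        return float("nan")  # undefined: everyone used a single category
    return (P_bar - P_e) / (1 - P_e)

# Three frames, three annotators, two categories, perfect agreement
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```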
- Computes IoU per (Label, Value) pair by treating each value as a binary mask (e.g., `Phase=Step1`).
- Two modes:
  - `pairwise`: IoU is averaged across annotator pairs.
  - `groupwise`: union and intersection are computed across all annotators.
- Additionally, label-level IoU is computed as a weighted average across values, using each value’s union frame count as the weight.
- Suitable for process-step annotations where temporal consistency is important.
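The label-level aggregation (per-value IoUs weighted by union frame counts) can be sketched as follows; the function name and dict-based inputs are illustrative assumptions:

```python
def label_level_iou(per_value_ious, union_counts):
    """Weighted average of per-value IoUs; weights are each value's
    union frame count. Both dicts are keyed by value name."""
    total = sum(union_counts.values())
    if total == 0:
        return float("nan")
    return sum(per_value_ious[v] * union_counts[v] for v in per_value_ious) / total

ious = {"Step1": 0.8, "Step2": 0.5}      # per-value IoUs
weights = {"Step1": 40, "Step2": 10}     # union frame counts
print(label_level_iou(ious, weights))    # (0.8*40 + 0.5*10) / 50 → 0.74
```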
| Metric | Type | Uses default_assumed | Considers False segments | Value-specific | Label-level aggregation | Notes |
|---|---|---|---|---|---|---|
| Kappa | Boolean | Optional | Yes (if included) | N/A | N/A | Requires full per-frame coverage |
| IoU | Boolean | No | No | N/A | N/A | Focuses only on True agreement |
| Kappa | Multiclass | N/A | N/A | No | Yes | Cohen (n=2) and Fleiss (n>2); frame-wise class |
| IoU | Multiclass | N/A | N/A | Yes | Yes (weighted) | Treats each value as binary class |
This logic ensures consistent and meaningful agreement metrics for both boolean and multiclass annotation schemes under sparse and persistent labeling models.
Parts of this code were generated or inspired by suggestions from AI-based tools, including:
- OpenAI ChatGPT (June 2025)
- GitHub Copilot (June 2025)
All AI-generated outputs were reviewed, modified, and integrated by the project author to meet the specific requirements of this project. The final code, design, and functionality are the sole responsibility of the authors.