UKEIAM/labeling_pipeline

Framewise Kappa - Annotation Management Tool

Overview:

Framewise Kappa is a Streamlit-based application designed to process, analyze, compare, and merge annotation data exported from a CVAT server. It enables teams to manage annotation tasks, visualize and review annotations, compute agreement metrics, and generate high-quality merged annotations.

Basic configuration:

  1. Run CVAT.

  2. Open the CVAT Django Admin panel and configure the settings as required.

  3. Create the group cvat_pipeline in CVAT and add the users who should have access to the Framewise Kappa app.

  4. Run the app either with

streamlit run app.py (install the dependencies listed in requirements.txt first),

or with Docker (via the Dockerfile or Docker Compose).

When running this software inside a Docker container, make sure the container is attached to the same Docker network as the CVAT server (default network name: cvat_cvat).

Make sure the external network exists: if CVAT is already running via Docker Compose, the network is created automatically. Otherwise, create it manually:

docker network create cvat_cvat

CVAT Requirements:

For visualization and analysis of CVAT labeling data, the following requirements must be met:

At the current stage, only labels of type "tag" with an attribute of input_type "radio" can be used.

Labels can be either boolean, with the values ["true", "false"], or multiclass, with any other set of values of which exactly one can be active per frame.

For agreement metrics between different annotators, the filenames of the videos must be identical if they are to be compared.

Features:

  • CVAT Control Panel:

    • Connect to CVAT server via API
    • List and manage projects and tasks
    • Trigger annotation export from CVAT into CSV
  • CVAT Annotation Viewer:

    • View annotations for individual annotators
    • Inspect label definitions and timelines
    • Display individual video frames
    • Generate descriptive statistics
  • Data Comparison & Merged Export:

    • Compare annotations from multiple annotators for the same video
    • Visualize timelines and overlaps
    • Calculate agreement metrics (Cohen’s Kappa, Fleiss’ Kappa, IoU)
    • Export merged annotations (majority vote)
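
The merged export's majority vote can be sketched per frame as follows. The helper name and the tie-breaking rule (first value seen wins) are assumptions for illustration, not the app's documented behavior:

```python
from collections import Counter

def majority_vote(values):
    """Merge one frame's per-annotator values into a single merged value.

    Ties fall back to the value that appears first in the input; the
    app's actual tie-breaking rule is not documented here.
    """
    return Counter(values).most_common(1)[0][0]

print(majority_vote([True, True, False]))  # → True
```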

Technology:

  • Python
  • Streamlit (web interface)
  • CVAT API integration
  • Pandas, NumPy (data processing)
  • Plotly (visualization)

Annotation Logic Summary

Boolean labels in this project are treated as persistent states over time. Each explicitly annotated value (True or False) defines a segment that begins at the annotated frame and continues until the next annotation for the same label/attribute. If no further annotation is made, the value is assumed to remain active until the final frame of the video.

Rules:

  • An annotation at frame f applies from f up to (but not including) the next annotated frame.
  • The initial state must be explicitly annotated (e.g., False at frame 0).
  • If there is no subsequent annotation, the current value is assumed to continue until max_frame.
  • If a label is annotated as False from the first to the last frame and never changes, the export includes this segment with default_assumed=True to indicate it was never set to True.
  • If a boolean label is defined in the project metadata but is never annotated in any frame, it is assumed to be False for the entire duration (frame 0 to max_frame) and is exported with not_annotated_assumed_false=True to indicate this assumption.
  • If no annotations exist at all for a task, no label segments are exported. Only metadata and label definitions are included in the CSV, along with a note: No annotations found – no segments exported.
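
The persistence rules above can be sketched as a small segment-expansion helper. Function and variable names here are hypothetical, not the app's actual code:

```python
def expand_segments(annotations, max_frame):
    """Expand sparse (frame, value) annotations into (start, end, value) segments.

    Each annotated value holds from its frame up to (but not including)
    the next annotated frame; the last value holds until max_frame.
    """
    annotations = sorted(annotations)
    segments = []
    for i, (frame, value) in enumerate(annotations):
        # Segment ends one frame before the next annotation, or at max_frame.
        end = annotations[i + 1][0] - 1 if i + 1 < len(annotations) else max_frame
        segments.append((frame, end, value))
    return segments

print(expand_segments([(0, False), (30, True), (80, False)], max_frame=120))
# → [(0, 29, False), (30, 79, True), (80, 120, False)]
```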

This approach allows consistent and reproducible representation of stateful annotations across frames and supports reliable downstream analysis.

Agreement Metrics – Boolean and Multiclass

This module supports agreement metrics for both boolean and multiclass segment annotations under a persistent-state model.

Annotation Model

Boolean labels are treated as persistent state segments across video frames.

  • A value remains active until explicitly changed.
  • A False span is inferred with default_assumed=True when the only annotation is a single False at frame 0 and no further True or False frame was ever annotated; the False state is then assumed for the entire video.
  • This model enables segment-based reasoning and avoids frame-wise redundancy.

Multiclass labels represent exclusive process states over time (e.g., Preparation, Step1, Step2), where exactly one value should be active per frame.


Boolean Agreement Metrics

1. Kappa (Cohen’s / Fleiss’)

  • Computes per-frame agreement using either Cohen’s (n=2) or Fleiss’ (n>2) Kappa.
  • Only frames where all annotators have provided a value (True or False) are considered.
  • Optional parameter include_default_assumed=True allows frames with inferred False (via default_assumed) to be included.
  • If all annotators assign the same class to all frames (e.g., all False), Kappa is undefined due to lack of variance.

2. Jaccard Index (IoU)

  • Computes segment-level intersection-over-union for frames labeled True.
  • Two modes:
    • pairwise: IoU is averaged across annotator pairs.
    • groupwise: Computes union and intersection across all annotators.
  • False segments (explicit or default) are not considered.
  • Best suited for assessing overlap of active labels over time.
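
The two IoU modes can be sketched over sets of True-labeled frame indices (one set per annotator). Names and the exact empty-set handling are assumptions:

```python
from itertools import combinations

def boolean_iou(true_frames, mode="pairwise"):
    """IoU over True frames. true_frames: one set of frame indices per annotator."""
    if mode == "groupwise":
        # Single score: frames everyone marked True vs. frames anyone marked True.
        union = set().union(*true_frames)
        inter = set.intersection(*true_frames)
        return len(inter) / len(union) if union else float("nan")
    # Pairwise: average IoU over all annotator pairs (pairs with no True frames skipped).
    scores = [len(a & b) / len(a | b) for a, b in combinations(true_frames, 2) if a | b]
    return sum(scores) / len(scores)

print(boolean_iou([set(range(1, 11)), set(range(5, 15))]))  # ≈ 0.4286 (6/14)
```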

Multiclass Agreement Metrics

1. Kappa (Cohen’s / Fleiss’)

  • Computes frame-level agreement for each multiclass label, based on the active value assigned per frame.
  • Cohen’s Kappa is used when exactly 2 annotators are present.
  • Fleiss’ Kappa is used when more than 2 annotators are available.
  • Requires all annotators to label the same frames.
  • Agreement is based on categorical (value-level) alignment per frame.

2. Jaccard Index (IoU)

  • Computes IoU per (Label, Value) by treating each value as a binary mask (e.g., Phase=Step1).
  • Two modes:
    • pairwise: Average IoU across annotator pairs.
    • groupwise: Union/intersection across all annotators.
  • Additionally, label-level IoU is computed as a weighted average across values, using each value’s Union Frame count as weight.
  • Suitable for process-step annotations where temporal consistency is important.
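
The per-value binary-mask treatment and the union-weighted label-level aggregation described above can be sketched as follows (hypothetical helper; the app's actual data structures will differ):

```python
from itertools import combinations

def multiclass_iou(frame_labels, mode="pairwise"):
    """Per-value IoU for one multiclass label.

    frame_labels: one dict {frame: value} per annotator.
    Returns (per_value_iou, label_iou); label_iou is the average across
    values weighted by each value's union frame count.
    """
    values = {v for ann in frame_labels for v in ann.values()}
    per_value, weights = {}, {}
    for v in sorted(values):
        # Treat each value as a binary mask, e.g. Phase == "Step1".
        masks = [{f for f, val in ann.items() if val == v} for ann in frame_labels]
        if mode == "groupwise":
            union = set().union(*masks)
            iou = len(set.intersection(*masks)) / len(union) if union else 0.0
        else:  # pairwise average
            ious = [len(a & b) / len(a | b) if a | b else 0.0
                    for a, b in combinations(masks, 2)]
            iou = sum(ious) / len(ious)
        per_value[v] = iou
        weights[v] = len(set().union(*masks))  # union frame count as weight
    total = sum(weights.values())
    label_iou = sum(per_value[v] * weights[v] for v in values) / total if total else 0.0
    return per_value, label_iou
```

For example, with annotators {0: "Prep", 1: "Prep", 2: "Step1", 3: "Step1"} and {0: "Prep", 1: "Step1", 2: "Step1", 3: "Step1"}, per-value IoU is 0.5 for "Prep" (weight 2) and 2/3 for "Step1" (weight 3), giving a label-level IoU of 0.6.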

Comparison Summary

| Metric | Type       | Uses default_assumed | Considers False segments | Value-specific | Label-level aggregation | Notes                                         |
|--------|------------|----------------------|--------------------------|----------------|-------------------------|-----------------------------------------------|
| Kappa  | Boolean    | Optional             | Yes (if included)        | N/A            | N/A                     | Requires full per-frame coverage              |
| IoU    | Boolean    | No                   | No                       | N/A            | N/A                     | Focuses only on True agreement                |
| Kappa  | Multiclass | N/A                  | N/A                      | No             | Yes                     | Cohen (n=2) and Fleiss (n>2); frame-wise class |
| IoU    | Multiclass | N/A                  | N/A                      | Yes            | Yes (weighted)          | Treats each value as a binary class           |

This logic ensures consistent and meaningful agreement metrics for both boolean and multiclass annotation schemes under sparse and persistent labeling models.

Code origin and authorship

Parts of this code were generated or inspired by suggestions from AI-based tools, including:

  • OpenAI ChatGPT (June 2025)
  • GitHub Copilot (June 2025)

All AI-generated outputs were reviewed, modified, and integrated by the project author to meet the specific requirements of this project. The final code, design, and functionality are the sole responsibility of the authors.
