
Releases: mosmar99/HoopVision

MVP

25 Nov 18:04
b02b9f2


MVP Pre-release

MVP Release Description

This release provides a dockerized application delivered as a Docker Compose stack consisting of five services.
Four of these (detector_service, team_assigner_service, court-service, and orchestrator_service) are custom-built components developed by us, each with its own Dockerfile and an image created during the build process. The fifth service, minio, is an external dependency pulled from the official minio/minio image on Docker Hub and included in the stack to provide object storage. The Compose stack orchestrates all five containers into a fully integrated application environment.
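The stack layout can be sketched roughly as follows. This is an illustrative sketch only, not the actual compose file: build contexts, the depends_on wiring, and the minio command are assumptions.

```yaml
# Illustrative sketch of the five-service stack; the real docker-compose.yml may differ.
services:
  detector_service:
    build: ./detector_service          # assumed build context
  team_assigner_service:
    build: ./team_assigner_service
  court_service:
    build: ./court_service
  orchestrator_service:
    build: ./orchestrator_service
    depends_on: [detector_service, team_assigner_service, court_service, minio]
  minio:
    image: minio/minio                 # official image from Docker Hub
    command: server /data
```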

(architecture diagram: bt_architecture)

For the MVP, the client first needs to install Docker and then run "docker-compose up --build" in the WORKDIR to build and run all the images in containers. The client specifies a local folder with videos. Then, after running python process_ui.py in the WORKDIR, they are prompted with the UI (see image below).

(UI screenshot)

They simply specify the video to be processed, which triggers the processing pipeline, signaled by a "Processing.." message. When processing is completed, the UI shows "Done.". Both the raw and the processed video are uploaded to an S3-compatible MinIO container, each in its own bucket. Unit tests in the tests folder ensure that bucket video upload, video deletion, and bucket deletion function correctly.
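The bucket operations covered by those unit tests can be sketched as below. The function names and the shape of the client are assumptions (the real code likely uses the `minio` Python SDK, whose client exposes `bucket_exists`, `make_bucket`, `fput_object`, `remove_object`, and `remove_bucket`); the actual tests may be structured differently.

```python
def upload_video(client, bucket: str, path: str, object_name: str) -> None:
    """Upload a video file to a bucket, creating the bucket if needed."""
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    client.fput_object(bucket, object_name, path)


def delete_video(client, bucket: str, object_name: str) -> None:
    """Remove a single video object from a bucket."""
    client.remove_object(bucket, object_name)


def delete_bucket(client, bucket: str) -> None:
    """Remove an (empty) bucket."""
    client.remove_bucket(bucket)
```

Keeping the client as a parameter makes these helpers trivial to unit-test against a stub, without a running MinIO instance.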

(screenshot)

The orchestrator manages the whole processing pipeline, sending and receiving API calls (through FastAPI) to and from the services.

A more comprehensive breakdown:

  1. The fine-tuned production model for player and ball detection is hosted in the cloud, specifically in a model registry within Weights & Biases. Among other reasons, this is a cheaper option than self-hosting; see #13 for a more detailed breakdown.
  2. Subsequently, the orchestrator sends the video path to the tracks detector service, which returns player and ball tracks. These contain bounding-box localizations for each player and ball object (identified by ByteTrack) across all frames.
  3. Additionally, the orchestrator sends the player tracks and the video path (note that a reference to the video is passed around to minimize communication overhead) to the team assigner service. FashionCLIP (Chia et al.) is prompted with each team's jersey colors, takes in a crop of the player bounding box, and returns a team group, 1 or 2. We then implemented majority voting (for further details see #12) over a set of frames (specifically, 50 frames) to set the final team assignment for each object id across all frames. The result is returned to the orchestrator. SAM2 was also considered, without noticeable improvements (see #19).
  4. Ball acquisition runs on the orchestrator service since it is rule-based and lightweight (i.e., it does not justify a separate service instance). It primarily uses two rules to check for ball possession: the first is based on IoU and the second on the closest distance between the ball and the players.
  5. Thereafter, the homography matrices are calculated in the court service and used to yield accurate frame-by-frame updates on a minimap. The service provides homographies to reproject player coordinates into a top-down view. This operation is currently only possible on video_1.mp4 due to a hardcoded reference. The court_service README details the API, which provides endpoints for creating reference images; these need to be integrated into the UI to enable creating reference images from any video. A detailed description of the operations and previous work is found in issue #11.
  6. Lastly, within the orchestrator service, all produced components are drawn as an overlay on the original input video.
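The majority-voting step in point 3 can be sketched as a small stand-alone function. The data layout (per-track lists of per-frame team predictions) and the function name are illustrative assumptions; only the 50-frame window comes from the description above.

```python
from collections import Counter

def assign_teams(frame_votes: dict, window: int = 50) -> dict:
    """Pick a final team (1 or 2) per track id by majority vote
    over the first `window` per-frame team predictions."""
    final = {}
    for track_id, votes in frame_votes.items():
        counts = Counter(votes[:window])
        final[track_id] = counts.most_common(1)[0][0]
    return final
```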
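The two ball-acquisition rules in point 4 can be sketched as follows. The box format (x1, y1, x2, y2), the thresholds, and the exact tie-breaking are assumptions; only the rule order (IoU first, then nearest distance) comes from the description.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_ball(ball_box, player_boxes, iou_thresh=0.1, max_dist=80.0):
    """Return the id of the player in possession, or None.
    Rule 1: highest IoU with the ball above a threshold.
    Rule 2: otherwise, nearest box centre within a distance cap."""
    best_id, best_iou = None, iou_thresh
    for pid, box in player_boxes.items():
        overlap = iou(ball_box, box)
        if overlap > best_iou:
            best_id, best_iou = pid, overlap
    if best_id is not None:
        return best_id
    bcx = (ball_box[0] + ball_box[2]) / 2
    bcy = (ball_box[1] + ball_box[3]) / 2
    best_dist = max_dist
    for pid, box in player_boxes.items():
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        dist = ((cx - bcx) ** 2 + (cy - bcy) ** 2) ** 0.5
        if dist < best_dist:
            best_id, best_dist = pid, dist
    return best_id
```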
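The reprojection in point 5 boils down to applying a 3×3 homography H to a player's image coordinates and dividing by the homogeneous component. A pure-Python sketch, with H as nested lists (the actual service presumably uses OpenCV and NumPy):

```python
def project_point(H, x, y):
    """Map an image point (x, y) to minimap coordinates
    using a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w
```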

The requirements file specifies PyTorch and NVIDIA GPU execution of the models. The final product is shown in the video below.

release_0_1_0.mp4