To set up the Python virtual environment, follow these commands (for Linux):
# Go to the root directory
python3 -m venv <env_name>
source <env_name>/bin/activate
pip install -r requirements.txt
Due to dependency issues after installing the packages from requirements.txt, install the following packages:
pip install decord
pip install numpy==1.26.4
pip install wheel
pip install flash-attn
pip install git+https://github.com/huggingface/transformers
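To confirm that the extra installs resolved cleanly, an optional sanity check like the following can help (a hypothetical helper script, not part of the repo):
# check_env.py -- optional sanity check for the extra dependencies
import decord
import flash_attn
import numpy
import transformers

assert numpy.__version__.startswith("1.26"), f"expected numpy 1.26.x, got {numpy.__version__}"
print("decord", decord.__version__)
print("flash-attn", flash_attn.__version__)
print("transformers", transformers.__version__)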
You can also create a conda virtual environment:
conda env create -f environment.yml
conda activate video_vl_env
Clone this repository:
git clone https://github.com/MayankD409/Video-Temporal-Consistency-Analysis.git
cd Video-Temporal-Consistency-Analysis
Download the videos and unzip them into the Video-Temporal-Consistency-Analysis directory.
After downloading the videos, your file structure should look like this:
.
├── data/
├── src/
└── videos/
    ├── human/
    ├── object/
    └── simulated/
Create a .env file in the root directory with the following format:
OPENAI_API_KEY="your_openai_api_key"
GEMINI_API_KEY="your_gemini_api_key"
REKA_API_KEY="your_reka_api_key"
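The evaluation scripts presumably pick these keys up from the environment; a minimal sketch of loading them with python-dotenv (an assumption about the loading mechanism; the variable names are taken from the .env format above):
# sketch: load API keys from .env (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
openai_key = os.getenv("OPENAI_API_KEY")
gemini_key = os.getenv("GEMINI_API_KEY")
reka_key = os.getenv("REKA_API_KEY")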
Create a pretrained folder to store the downloaded models:
mkdir pretrained
Below are the commands to set up and run the specific models:
Standard command:
python src/evaluate.py --model $model_name --reasoning_type ALL --demonstration_type ALL --total_frames $total_frames
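To sweep several frame budgets in one run, a small driver script works; this is a sketch using only the flags shown above (the model name and frame counts are placeholders):
# sweep_frames.py -- hypothetical driver script, not part of the repo
import subprocess

for frames in (8, 16, 32):  # placeholder frame budgets
    subprocess.run(
        ["python", "src/evaluate.py",
         "--model", "InternVL2-1B",  # placeholder model name
         "--reasoning_type", "ALL",
         "--demonstration_type", "ALL",
         "--total_frames", str(frames)],
        check=True,  # abort the sweep on the first failing run
    )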
First, download the pretrained model:
cd pretrained
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-1B --local-dir InternVL2-1B
# For InternVL2-1B
python src/evaluate.py --model InternVL2-1B --reasoning_type ALL --demonstration_type ALL --total_frames 8
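For context, --total_frames sets how many frames are drawn from each clip. A common recipe is uniform sampling with decord (installed earlier); this is only an illustration, and the repo's actual sampling code may differ:
# illustrative uniform frame sampling with decord (the video path is hypothetical)
import numpy as np
from decord import VideoReader

vr = VideoReader("videos/human/example.mp4")           # any clip under videos/
idx = np.linspace(0, len(vr) - 1, num=8).astype(int)   # 8 == --total_frames
frames = vr.get_batch(idx).asnumpy()                   # (8, H, W, 3) uint8 array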
# For gemini-1.5-flash
python src/evaluate.py --model gemini-1.5-flash --reasoning_type ALL
# For gemini-1.5-pro
python src/evaluate.py --model gemini-1.5-pro --reasoning_type ALL
Make sure you have cloned the Video-CCAM GitHub repo into the generate_lib folder. Otherwise, here is the command:
git clone git@github.com:QQ-MM/Video-CCAM.git
Download the pretrained model:
cd pretrained
# 4B
huggingface-cli download --resume-download --local-dir-use-symlinks False JaronTHU/Video-CCAM-4B-v1.1 --local-dir Video-CCAM-4B-v1.1
# Phi-3-mini
huggingface-cli download --resume-download --local-dir-use-symlinks False microsoft/Phi-3-mini-4k-instruct --local-dir Phi-3-mini-4k-instruct
# vision encoder
huggingface-cli download --resume-download --local-dir-use-symlinks False google/siglip-so400m-patch14-384 --local-dir siglip-so400m-patch14-384
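After these three downloads, the pretrained folder should contain (directory names taken from the --local-dir arguments above):
pretrained/
├── Video-CCAM-4B-v1.1/
├── Phi-3-mini-4k-instruct/
└── siglip-so400m-patch14-384/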
Run the evaluation script:
python src/evaluate.py --model Video-CCAM-4B-v1.1 --reasoning_type ALL --total_frames 8
The code for Qwen2-VL is included in the latest Hugging Face transformers, and we advise you to build from source with:
pip install git+https://github.com/huggingface/transformers
First, download the pretrained model:
cd pretrained
huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen2-VL-2B-Instruct --local-dir Qwen2-VL-2B-Instruct
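Since Qwen2-VL is supported natively by transformers, the local checkpoint can be loaded directly; a minimal sketch (the repo's evaluate.py presumably wraps something similar):
# sketch: load Qwen2-VL from the local download
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "pretrained/Qwen2-VL-2B-Instruct",
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)
processor = AutoProcessor.from_pretrained("pretrained/Qwen2-VL-2B-Instruct")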
# For Qwen2-VL-2B-Instruct
python src/evaluate.py --model Qwen2-VL-2B-Instruct --reasoning_type ALL --demonstration_type ALL --total_frames 8
Make sure you have cloned the VideoLLaMA2 GitHub repo into the generate_lib folder. Otherwise, here is the command:
git clone git@github.com:DAMO-NLP-SG/VideoLLaMA2.git
Download the pretrained model:
cd pretrained
# video LLaMA 2 7B
huggingface-cli download --resume-download --local-dir-use-symlinks False DAMO-NLP-SG/VideoLLaMA2-7B --local-dir VideoLLaMA2-7B
Run the evaluation script:
python src/evaluate.py --model VideoLLaMA2-7B --reasoning_type ALL --total_frames 16
# For gpt-4-turbo-preview
python src/evaluate.py --model gpt-4-turbo-preview --reasoning_type ALL --total_frames 8
# For gpt-4o
python src/evaluate.py --model gpt-4o --reasoning_type ALL --total_frames 8
# For gpt-4o-mini
python src/evaluate.py --model gpt-4o-mini --reasoning_type ALL --total_frames 8
Make sure you have added the Reka API key to .env.
# For reka-core-20240501
python src/evaluate.py --model reka-core-20240501 --reasoning_type ALL
# For reka-flash-20240226
python src/evaluate.py --model reka-flash-20240226 --reasoning_type ALL
# For reka-edge-20240208
python src/evaluate.py --model reka-edge-20240208 --reasoning_type ALL
Download the pretrained model:
cd pretrained
huggingface-cli download --resume-download --local-dir-use-symlinks False LanguageBind/Video-LLaVA-7B-hf --local-dir Video-LLaVA-7B-hf
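Video-LLaVA-7B-hf is the transformers-native variant of the checkpoint; a minimal loading sketch (an illustration of the transformers classes, not the repo's exact code):
# sketch: load the Hugging Face variant of Video-LLaVA
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model = VideoLlavaForConditionalGeneration.from_pretrained(
    "pretrained/Video-LLaVA-7B-hf",
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)
processor = VideoLlavaProcessor.from_pretrained("pretrained/Video-LLaVA-7B-hf")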
Run the evaluation script:
python src/evaluate.py --model Video-LLaVA-7B-hf --reasoning_type ALL
Make sure you have cloned the VILA GitHub repo into the generate_lib folder. Otherwise, here is the command:
git clone git@github.com:NVlabs/VILA.git
Download the pretrained model:
cd pretrained
# VILA 1.5 13B
huggingface-cli download --resume-download --local-dir-use-symlinks False Efficient-Large-Model/VILA1.5-13b --local-dir VILA1.5-13b
Run the evaluation script:
python src/evaluate.py --model VILA1.5-13b --reasoning_type ALL --total_frames 8