Multimodal-Large-Language-Model (MLLM)

Thank you for checking out the Multimodal-Large-Language-Model project. Please note that this project was created for research purposes.

For a more robust and well-developed solution, you may consider using open-webui/open-webui with ollama/ollama.

[Demo image]

Documentation

You can access the project documentation at [GitHub Pages].

Host requirements

  • Docker: [Installation Guide]
  • Docker Compose: [Installation Guide]
  • Compatible with Linux and Windows hosts
  • Ensure ports 8501 and 11434 are not already in use (see the check after this list)
  • You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. [Source]
  • The project can run on either CPU or GPU
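
On a Linux host, one quick way to confirm both ports are free (assumes the ss utility from iproute2 is available; no output means the ports are unused):

ss -tln | grep -E ':(8501|11434)'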

Running on GPU
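
GPU support assumes the host has working NVIDIA drivers and the NVIDIA Container Toolkit installed, so that Docker can pass the GPU through to the containers. One common way to verify the setup (the CUDA image tag is only an example):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi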

Tested Model(s)

Model Name | Size  | Link
llava:7b   | 4.7GB | Link
llava:34b  | 20GB  | Link

LLaVA is pulled and loaded by default; other models from Ollama can be added in ollama/ollama-build.sh (see the sketch below).
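
As a sketch of what that might look like, assuming ollama-build.sh pulls models with the ollama CLI (the second line is a hypothetical addition):

ollama pull llava:7b     # pulled by default
ollama pull llama3:8b    # example: any other model from the Ollama library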

Usage

Note

The project runs on GPU by default. To run on CPU, use docker-compose.cpu.yml instead.
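
For example, the CPU variant can be selected with Compose's -f flag:

docker compose -f docker-compose.cpu.yml up -d --build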

  1. Clone this repository and navigate to the project folder:
git clone https://github.com/NotYuSheng/Multimodal-Large-Language-Model.git
cd Multimodal-Large-Language-Model
  2. Build the Docker images:
docker compose build
  3. Run the containers:
docker compose up -d
  4. Access the Streamlit webpage from the host:
<host-ip>:8501

API calls to the Ollama server can be made at:

<host-ip>:11434
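
For instance, a minimal generation request (a sketch against Ollama's /api/generate endpoint; the model and prompt are examples):

curl http://<host-ip>:11434/api/generate -d '{
  "model": "llava:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

For multimodal prompts, LLaVA also accepts base64-encoded images through an "images" array in the same request body.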