
FastAPI + Docker serving an ML Model

A guide on deploying a Llama model with FastAPI and Dockerising it.

Dependencies are pinned with:

pip freeze > requirements.txt

My setup

I'm using a Windows machine with Docker Desktop (WSL2) and an NVIDIA RTX 3060 Ti running CUDA 11.6. A little outdated, I know.

Use https://pytorch.org/get-started/previous-versions/ to map your CUDA version to a PyTorch version.

Encountered errors

Error 1. Error when testing on Windows.

AssertionError: Torch not compiled with CUDA enabled

Fix: pip install torch==1.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html
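
To confirm the CUDA build took, a quick sanity check (standard torch calls, nothing project-specific):

import torch

print(torch.__version__)            # e.g. 1.13.1+cu116
print(torch.cuda.is_available())    # should now be True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3060 Ti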

Error 2. When running docker compose up --build, pip inside the Linux container fails because it can't download the +cu116 build. Just remove the suffix. Linux-only fix.

torch==1.13.1 # remove +cu116 from requirements.txt when building with docker compose
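
If you'd rather not hand-edit the file each time, pip environment markers can keep both variants in one requirements.txt. A sketch, not tested against this exact setup:

-f https://download.pytorch.org/whl/torch_stable.html
torch==1.13.1+cu116; sys_platform == "win32"
torch==1.13.1; sys_platform == "linux"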

Error 3.

Fix for the dtype error on PyTorch versions below 2.1.0. Since we are using 1.13.1, this error occurs.

Source: https://github.com/meta-llama/llama3/issues/110

Error message:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Learning\RDAI\model.py", line 45, in <module>
    output = pipe(messages, **generation_args)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\pipelines\text_generation.py", line 267, in __call__
    return super().__call__(Chat(text_inputs), **kwargs)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\pipelines\base.py", line 1302, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\pipelines\base.py", line 1309, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\pipelines\base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\pipelines\text_generation.py", line 370, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\generation\utils.py", line 2215, in generate
    result = self._sample(
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\generation\utils.py", line 3206, in _sample
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1190, in forward
    outputs = self.model(
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 915, in forward
    causal_mask = self._update_causal_mask(
    causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
  File "C:\Users\Admin\Desktop\Learning\RDAI\venv\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1090, in _prepare_4d_causal_attention_mask_with_cache_position
    causal_mask = torch.triu(causal_mask, diagonal=1)

Fix: edit modeling_llama.py around line 1089:
            if sequence_length != 1:
                # torch.triu has no bfloat16 CUDA kernel on this torch version,
                # so round-trip through float32 instead of calling it directly:
                # causal_mask = torch.triu(causal_mask, diagonal=1)
                causal_mask = causal_mask.to(torch.float32)
                causal_mask = torch.triu(causal_mask, diagonal=1)
                causal_mask = causal_mask.to('cuda', dtype=torch.bfloat16)
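
For context, a standalone repro of the underlying failure and the same float32 round-trip the patch applies (assumes a CUDA device; just a sketch, separate from the pipeline code):

import torch

mask = torch.full((4, 4), float('-inf'), device='cuda', dtype=torch.bfloat16)
# torch.triu(mask, diagonal=1)  # raises "triu_tril_cuda_template" not implemented for 'BFloat16' on 1.13.1

mask = mask.to(torch.float32)                 # upcast so triu has a kernel
mask = torch.triu(mask, diagonal=1)           # build the causal mask
mask = mask.to('cuda', dtype=torch.bfloat16)  # back to the model's dtype
print(mask.dtype)  # torch.bfloat16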

Error 4. When running docker compose up, the model dies at the same spot because the image was built without the fixes from errors 2 and 3.

File: C:\Users\Admin\Desktop\Learning\RDAI\venv\Lib\site-packages\transformers\models\llama\modeling_llama.py

Fix: run this either in the Dockerfile or by exec-ing into the container.

cp /app/venv/Lib/site-packages/transformers/models/llama/modeling_llama.py /usr/local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py
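
If you're unsure where transformers resolves to inside the container, print its install path before copying (plain stdlib behaviour, nothing assumed):

python -c "import transformers; print(transformers.__file__)"
# e.g. /usr/local/lib/python3.9/site-packages/transformers/__init__.py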

Error 5. Can't find the Rust compiler. Add the fix to the Dockerfile.

1176.6     warning: no files found matching '*.json' under directory 'src/python_interpreter'
1176.6     writing manifest file 'maturin.egg-info/SOURCES.txt'
1176.6     warning: build_py: byte-compiling is disabled, skipping.
1176.6
1176.6     running build_ext
1176.6     running build_rust
1176.6     error: can't find Rust compiler

Fix: https://stackoverflow.com/questions/75085152/cant-find-rust-compiler-to-install-transformers

# Install Rust compiler
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

Error 6. No NVIDIA GPU in Docker.

Fix: https://docs.docker.com/compose/how-tos/gpu-support/

File "/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
server-1  |     torch._C._cuda_init()
server-1  | RuntimeError: Found no NVIDIA driver on your system

Fix: https://stackoverflow.com/questions/57066162/how-to-get-docker-to-recognize-nvidia-drivers

Use: docker run --gpus all -it rdai-server

# Added this config to the docker compose file, then ran docker compose up
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all # or set a specific number, e.g. `count: 1`
              capabilities: [gpu]
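
To verify the reservation actually reached the container, the same CUDA check from error 1 can run inside it (server is the compose service name, as seen in the logs above):

docker compose exec server python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
# expect: True 1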

Running with uvicorn

uvicorn main:app --reload --host 0.0.0.0 --port 8000
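
The repo's main.py isn't reproduced here, but a minimal sketch of the shape it takes is below. The /ask POST route matches the /docs link further down; the Question model, field names, and model ID are illustrative assumptions:

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# hypothetical model ID; substitute whatever checkpoint the repo actually loads
pipe = pipeline('text-generation', model='meta-llama/Llama-2-7b-chat-hf',
                torch_dtype=torch.bfloat16, device=0)

class Question(BaseModel):  # hypothetical request schema
    text: str

@app.post('/ask')
def ask(question: Question):
    output = pipe(question.text, max_new_tokens=128)
    return {'answer': output[0]['generated_text']}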


Using FastAPI's /docs page to test the endpoint:
http://localhost:8000/docs#/default/ask_ask_post
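
Outside the /docs UI, the same endpoint can be hit from Python; the request body follows the hypothetical Question schema sketched above:

import requests

resp = requests.post('http://localhost:8000/ask', json={'text': 'What is FastAPI?'})
print(resp.json())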


Looks good when running locally.


Added a simple post-processing flag. It's tricky to retrain the model to embed this behaviour, so the flag will do for now.
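
The flag itself isn't shown in the README, so this is only a guess at its shape: a boolean that tidies the raw generation before it's returned.

# hypothetical post-processing toggle; the repo's actual flag may differ
def postprocess(raw: str, enabled: bool = True) -> str:
    if not enabled:
        return raw
    return raw.strip()  # e.g. drop echoed whitespace around the generation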


Docker compose up after building. Everything looks fine now.


Unrelated

GitHub error fix (wrong user when pushing from VS Code):

https://carldesouza.com/wrong-user-when-pushing-to-github-from-visual-studio-code/
