AI Deep Learning Web Site

Organization's web site deployed using GitHub Pages.

AI Deep Learning Hugging Face organization.

What has been done

  • Created an AWS account
  • Created an AWS organization
  • Granted access to Giri
  • Initial basic set up of SageMaker
  • Created GitHub organization
  • Configured organization's domain and initial web site - https://ai-deep-learning.us/
  • Researched AWS instance types for deep learning: P4 instances are the fastest and the most economical per unit of compute, but at the same time the most expensive per hour - $30/hour
  • Requested access to SageMaker Studio Labs

Next steps

Resources

Pavel's local

  • RAM: 16 Gb
  • GPU: NVIDIA GeForce GTX 980
    • 12 Gb RAM
    • 5 TFLOPS vs 65 TFLOPS on a T4 (Google Colab free tier) and 312/624 TFLOPS on an A100 (premium tier) - the T4 is 13x faster, the A100 62x faster
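The speedup figures above are just the ratios of the quoted peak TFLOPS numbers; a quick sanity check (numbers taken from this document, assuming the same precision applies to all three GPUs):

```python
# Peak TFLOPS as quoted above (312 is the A100 dense figure, 624 with sparsity)
gtx_980 = 5    # local GPU
t4 = 65        # Google Colab free tier
a100 = 312     # Colab premium tier

print(f"T4 is {t4 / gtx_980:.0f}x faster")     # 13x
print(f"A100 is {a100 / gtx_980:.0f}x faster")  # 62x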

Local setup

  • Install Anaconda - https://www.anaconda.com/download
  • Open Anaconda prompt
  • Create environment: conda create -n star-coder python=3.11
  • Activate environment: conda activate star-coder
  • Install the Hugging Face CLI: pip install --upgrade huggingface_hub (this step is in the online instructions; however, the CLI is also installed as part of the requirements installation below)
  • Clone StarCoder: git clone https://github.com/bigcode-project/starcoder.git
  • cd into the starcoder directory
  • Install requirements: pip install -r requirements.txt
  • Generate a read access token for Hugging Face: https://huggingface.co/settings/tokens
  • Log in to Hugging Face: huggingface-cli login
  • Install jupyter: pip install jupyter
  • Start jupyter notebooks: jupyter notebook

On my setup it gave strange insufficient-memory errors - unable to allocate 150 Mb and 600 Mb with several Gb available. This happened in both GPU and CPU modes, so I suspended my efforts to run it locally.

SageMaker

Set the notebook disk size to 150 Gb.

  • Create a notebook on ml.m5.12xlarge (requested a quota increase at https://us-east-1.console.aws.amazon.com/servicequotas/home/)
  • Open a JupyterLab
  • Go to the terminal and create a symbolic link from ~/.cache/huggingface to /home/ec2-user/SageMaker/.cache/huggingface. This is needed to have enough space for the cache (~80 Gb):
    • sh-4.2$ mkdir /home/ec2-user/SageMaker/.cache
    • sh-4.2$ mkdir /home/ec2-user/SageMaker/.cache/huggingface
    • sh-4.2$ chmod 777 /home/ec2-user/SageMaker/.cache/huggingface
    • sh-4.2$ ln -s /home/ec2-user/SageMaker/.cache/huggingface .cache/huggingface
  • Open a notebook. The steps below need to be performed every time you open a notebook (or a lab?)
  • Install the requirements from requirements.txt using pip: %pip install tqdm transformers datasets huggingface-hub accelerate
  • Install SageMaker: %pip install sagemaker --upgrade
  • Log in to Hugging Face: !huggingface-cli login --token <your token here>
  • Run the first snippet from starcoder with cpu as the device - the kernel dies with cuda. Use the df -h command in the terminal to validate that the cache is being stored on the /home/ec2-user/... disk. I got a message that the kernel had died once all downloads completed; it appears to be due to ml.t3.medium being too small for the task.
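The mkdir/chmod/ln -s terminal steps above can be sketched in Python as a small helper. The paths in the comment are the ones used in this document; the function name relocate_cache is just an illustration:

```python
import os

def relocate_cache(big_disk_cache, default_cache):
    """Create the cache directory on the large volume and symlink the
    default cache location to it, mirroring the mkdir/chmod/ln -s steps."""
    os.makedirs(big_disk_cache, exist_ok=True)
    os.chmod(big_disk_cache, 0o777)
    # Make sure the parent of the default location (e.g. ~/.cache) exists
    os.makedirs(os.path.dirname(default_cache), exist_ok=True)
    if not os.path.exists(default_cache):
        os.symlink(big_disk_cache, default_cache)

# On the SageMaker notebook instance this would be called as:
# relocate_cache("/home/ec2-user/SageMaker/.cache/huggingface",
#                os.path.expanduser("~/.cache/huggingface"))
```

Note that if ~/.cache/huggingface already exists as a real directory, it must be removed (or moved) first, otherwise the symlink cannot be created.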

Notes

StarCoder requires ~80 Gb of disk space. As such it cannot be run on the Google Colab free tier as is, since that provides "only" 78.2 Gb of disk space.
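A quick way to check any environment against this constraint before starting the download: compare free disk space with the ~80 Gb estimate above (the helper name and threshold are illustrative):

```python
import shutil

REQUIRED_GB = 80  # approximate StarCoder cache size, per the note above

def has_room(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb
```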

SageMaker Studio Lab

https://studiolab.sagemaker.aws - requires approval.

GPU compute: 4 hours per session and 8 hours per 24-hour period. Storage is limited to 15 Gb.

GCP

https://cloud.google.com/compute/all-pricing

$300 credit?

Ability to custom-build a VM - CPUs, RAM, GPUs.

https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52

https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh

Default limit of 24 vCPUs per region per project - a quota increase can be requested.

E2 highmem - 16 vCPUs, 128 Gb memory

Developer friendly - convenient observability

  • Download the Google Cloud SDK and install it in the local environment
  • Open SSH
