Organization's web site deployed using GitHub Pages.
AI Deep Learning Hugging Face organization.
- Created an AWS account
- Created an AWS organization
- Granted access to Giri
- Initial basic set up of SageMaker
- Created GitHub organization
- Configured organization's domain and initial web site - https://ai-deep-learning.us/
- Researched AWS instance types for deep learning: P4 instances are the fastest and the most economical per unit of work, but also the most expensive per hour (~$30/hour)
- Requested access to SageMaker Studio Lab
- Set up local environment for StarCoder - https://github.com/bigcode-project/starcoder/tree/main
- Figure out the best way to run StarCoder on AWS (EC2, ECS, spot instances?), implement, and document; a deployment sketch follows the link list below
- https://github.com/bigcode-project/starcoder/tree/main
- https://github.com/huggingface/text-generation-inference
- https://huggingface.co/chat
- https://aws.amazon.com/blogs/aws/managed-spot-training-save-up-to-90-on-your-amazon-sagemaker-training-jobs/
- https://github.com/aws/amazon-sagemaker-examples
- https://docs.aws.amazon.com/sagemaker/latest/dg/studio-ui.html
- https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/getting_started/xgboost_customer_churn_studio.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-end-to-end.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html
- https://aws.amazon.com/blogs/machine-learning/ensure-efficient-compute-resources-on-amazon-sagemaker/
- https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html
- https://aws.amazon.com/machine-learning/elastic-inference/
- https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-supports-early-stopping-of-training-jobs/
- https://aws.amazon.com/blogs/machine-learning/optimizing-costs-for-machine-learning-with-amazon-sagemaker/
- https://aws.amazon.com/free/machine-learning/
- https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-inference.html
- https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-training.html
- https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-setup.html
- https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html
- https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs.html
- https://aws.amazon.com/machine-learning/containers/
- https://aws.amazon.com/getting-started/hands-on/train-deep-learning-model-aws-ec2-containers/
- https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
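For the "best way to run StarCoder on AWS" task above, one candidate worth prototyping is a SageMaker real-time endpoint backed by the Hugging Face LLM (text-generation-inference) container. A minimal sketch, not a tested deployment - the instance type, GPU count, and container version are assumptions to validate against the current docs:

```python
# Sketch: deploy bigcode/starcoder behind a SageMaker endpoint using the
# Hugging Face LLM (text-generation-inference) container. Untested.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role

# Resolve the TGI container image for the current region
llm_image = get_huggingface_llm_image_uri("huggingface")

model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "bigcode/starcoder",
        "SM_NUM_GPUS": "4",  # assumption: shard across the 4 GPUs of ml.g5.12xlarge
        "HUGGING_FACE_HUB_TOKEN": "<your token here>",  # StarCoder is a gated model
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # assumption; must fit the ~80 GB model cache
    container_startup_health_check_timeout=600,  # allow time for the large download
)

print(predictor.predict({"inputs": "def print_hello_world():"}))
```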
- https://colab.research.google.com/notebooks/ - gives 12 GB of RAM and 100 GB of disk. Upgrades available - https://colab.research.google.com/signup/pricing. A T4 GPU is available in the free tier; A100 and V100 are available in the premium tiers. The A100 is 1.6x-3.4x faster than the V100 and ~5x-10x faster than the T4.
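A quick way to confirm which accelerator a Colab session actually got (torch comes preinstalled on Colab):

```python
# Check which GPU the Colab runtime was assigned
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1e9:.1f} GB")  # e.g. a T4 on the free tier
else:
    print("CPU-only runtime")
```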
- Anaconda in the cloud - https://www.anaconda.com/code-in-the-cloud - 100 MB of cloud storage, 500 seconds/day of high compute. Pricing - https://www.anaconda.com/pricing/individuals
- https://huggingface.co/docs/transformers/sagemaker
- https://github.com/huggingface/notebooks/tree/main/sagemaker
- https://huggingface.co/pricing - compute for AI
- https://aws.amazon.com/machine-learning/mlu/
Local environment:
- RAM: 16 GB
- GPU: NVIDIA GeForce GTX 980, 12 GB RAM
- ~5 TFLOPS vs. 65 TFLOPS on a T4 (Google Colab free tier) and 312/624 TFLOPS on an A100 (premium tier): the T4 is ~13x faster, the A100 is ~62x faster
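The speedup factors above are just the ratios of the quoted TFLOPS figures (the numbers mix precisions, so treat them as rough):

```python
# Rough speedups implied by the TFLOPS figures above
gtx980, t4, a100 = 5, 65, 312  # TFLOPS
print(f"T4:   {t4 / gtx980:.0f}x")    # ~13x
print(f"A100: {a100 / gtx980:.0f}x")  # ~62x
```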
- Install Anaconda - https://www.anaconda.com/download
- Open the Anaconda prompt
- Create an environment:
conda create -n star-coder python=3.11
- Activate the environment:
conda activate star-coder
- Install the Hugging Face CLI (this step is in the online instructions; however, the CLI also gets installed as part of the requirements installation):
pip install --upgrade huggingface_hub
- Clone StarCoder:
git clone https://github.com/bigcode-project/starcoder.git
- cd into the starcoder directory
- Install the requirements:
pip install -r requirements.txt
- Generate a read access token for Hugging Face: https://huggingface.co/settings/tokens
- Log in to Hugging Face:
huggingface-cli login
- Install Jupyter:
pip install jupyter
- Start Jupyter Notebook:
jupyter notebook
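For reference, the first snippet from the StarCoder README (the one the SageMaker notes below refer to) is roughly the following - reproduced from memory, so verify against the repo:

```python
# First inference snippet per the StarCoder README. StarCoder is a gated
# model, so huggingface-cli login must have been run first.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cpu"  # or "cuda" on a GPU machine

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```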
On my setup it was giving strange errors about insufficient memory - unable to allocate 150 MB and 600 MB with several GB available. It happened in both GPU and CPU modes, so I suspended my efforts to run it locally.
Set disk size to 150 GB
- Create a notebook on ml.m5.12xlarge; requested a quota increase at https://us-east-1.console.aws.amazon.com/servicequotas/home/
- Open JupyterLab
- Go to the terminal and create a symbolic link from ~/.cache/huggingface to /home/ec2-user/SageMaker/.cache/huggingface. This is needed to have enough space for the cache (~80 GB); a verification sketch follows the commands:
sh-4.2$ mkdir /home/ec2-user/SageMaker/.cache
sh-4.2$ mkdir /home/ec2-user/SageMaker/.cache/huggingface
sh-4.2$ chmod 777 /home/ec2-user/SageMaker/.cache/huggingface
sh-4.2$ ln -s /home/ec2-user/SageMaker/.cache/huggingface .cache/huggingface
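A quick check from a notebook cell that the cache really resolves to the large SageMaker volume (standard library only):

```python
# Verify that ~/.cache/huggingface points at the big /home/ec2-user/SageMaker volume
import os
import shutil

cache = os.path.expanduser("~/.cache/huggingface")
print(os.path.realpath(cache))  # expect /home/ec2-user/SageMaker/.cache/huggingface
total, used, free = shutil.disk_usage(cache)
print(f"free: {free / 1e9:.0f} GB")  # need ~80 GB for the StarCoder cache
```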
- Open a notebook. The steps below need to be performed every time you open a notebook (or a lab?):
- Install the requirements from requirements.txt using pip:
%pip install tqdm transformers datasets huggingface-hub accelerate
- Install sagemaker:
%pip install sagemaker --upgrade
- Log in to Hugging Face:
!huggingface-cli login --token <your token here>
- Open a notebook
- Run the first snippet from starcoder (see above) with cpu as the device - the kernel dies with cuda. Used the df -h command in the terminal to validate that the cache is being stored on the /home/ec2-user/... disk. I got a message that the kernel had died once all the downloads completed. It appears to be due to ml.t3.medium being too small for the task.
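One mitigation to try on a larger instance before abandoning in-notebook inference (an untested sketch; requires the accelerate package): load the weights in half precision with automatic placement, which roughly halves the memory needed compared to the default fp32 load.

```python
# Sketch: load fp16 weights with accelerate's automatic device placement
# to cut memory roughly in half. Untested here.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    torch_dtype=torch.float16,  # ~31 GB of weights instead of ~62 GB in fp32
    device_map="auto",          # spread across available GPUs and CPU RAM
)
```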
StarCoder requires ~80 GB of disk space. As such, it cannot be run on the Google Colab free tier as-is, since that tier provides "only" 78.2 GB of disk space.
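A back-of-the-envelope check on the ~80 GB figure, assuming StarCoder's published ~15.5B parameter count:

```python
# fp32 weights alone for a ~15.5B-parameter model
params = 15.5e9
print(f"{params * 4 / 1e9:.0f} GB")  # ~62 GB; tokenizer files and overhead push the cache toward 80 GB
```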
https://studiolab.sagemaker.aws - requires approval.
GPU compute - 4 hours per session and 8 hours in a 24-hour period. Storage is limited to 15 GB.
https://cloud.google.com/compute/all-pricing
$300 credit?
Ability to custom-build a VM - CPUs, RAM, GPUs.
https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh
Default limit of 24 vCPUs per region per project - can request a quota increase.
E2 highmem - 16 vCPUs, 128 GB of memory
Developer-friendly - convenient observability
- Download the Google Cloud SDK and install it in the local environment
- Open an SSH session and run:
pip install tqdm transformers datasets huggingface-hub accelerate
huggingface-cli login --token <your token here>
python
- https://github.com/bigcode-project/starcoder - run the code snippets line by line, with cpu as the device