This repo provides a short bash script to launch an interactive Jupyter notebook
that uses Spark to distribute work across the Big Data cluster. The Jupyter
notebook Demo.ipynb demonstrates how to use the PySpark API.
Log on to the bigdata cluster with port-forwarding from port 8889 on the bigdata gateway to some free local port (e.g. 9999):
ssh -L 9999:<remote_host>:8889 <remote_host>
where remote_host is the name of the Big Data gateway.
Then, clone this repo, cd to it, and run /launch-notebook.sh. On your local
machine, navigate to localhost:9999 and program away!