Python-Big-Data-Large-Scale-Computing

Utilizing PySpark for Machine Learning

The rate at which humans and machines generate data today, and the sheer volume of that data, are unprecedented. The ability to store, manage, and analyze large-scale data has a critical impact on business intelligence, scientific discovery, and social and environmental challenges.

The goal of this course is to equip students with the understanding, knowledge, and practical skills to develop big data and machine learning solutions with state-of-the-art tools, particularly those in the Spark ecosystem, with a focus on the programming models of MLlib, GraphX, and Spark SQL.
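A common pattern in distributed machine learning on Spark is to express one training step as a map over the records followed by a reduction. The sketch below illustrates that idea with batch gradient descent for least-squares linear regression: the map step computes each point's gradient contribution and the reduce step sums them. It is a minimal, hypothetical illustration, not the course's or MLlib's implementation. Plain Python lists and `functools.reduce` stand in for an RDD (in PySpark the same two steps would be `rdd.map(...)` and `.reduce(...)`), and the toy dataset, step size, and iteration count are made-up assumptions.

```python
from functools import reduce

# Hypothetical toy dataset of (label, features) pairs generated from y = 1 + 2x.
# The leading 1.0 in each feature vector acts as the intercept term.
# In PySpark this would typically be an RDD, e.g. sc.parallelize(data).
data = [
    (3.0, [1.0, 1.0]),
    (5.0, [1.0, 2.0]),
    (7.0, [1.0, 3.0]),
    (9.0, [1.0, 4.0]),
]


def gradient_contribution(w, point):
    """Map step: gradient of 0.5 * (w . x - y)^2 for a single point."""
    y, x = point
    error = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [error * xi for xi in x]


def add_vectors(a, b):
    """Reduce step: element-wise sum of two gradient vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def gradient_descent(data, num_iters=1000, alpha=0.1):
    """Batch gradient descent on the mean squared-error loss (illustrative values)."""
    n = len(data)
    w = [0.0] * len(data[0][1])
    for _ in range(num_iters):
        # "map": per-point gradients; "reduce": sum them into one vector.
        total = reduce(add_vectors, map(lambda p: gradient_contribution(w, p), data))
        # Average the summed gradient and take a step against it.
        w = [wi - alpha * gi / n for wi, gi in zip(w, total)]
    return w


weights = gradient_descent(data)
print(weights)  # approaches [1.0, 2.0] (intercept, slope)
```

Because each point's gradient depends only on that point and the current weights, the map step parallelizes across partitions, and only one small vector per partition needs to be combined in the reduce.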


Homework 1: PySpark and RDDs

Homework 2: Regression in Spark

Homework 3: Classification in Spark

Homework 4: Data Analysis with PCA in Spark

Homework 5: Hands-on with MLlib and Spark SQL
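To give a flavor of the RDD programming model from Homework 1, here is the classic word-count pipeline. In PySpark it would typically read `sc.parallelize(lines).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch below mimics those transformations in plain Python so it runs without a Spark installation; the helper names `flat_map` and `reduce_by_key` and the input sentences are hypothetical, chosen only for illustration.

```python
from itertools import chain

# Hypothetical input; in PySpark this would be an RDD of lines.
lines = ["spark makes big data simple", "big data big ideas"]


def flat_map(f, records):
    """Hypothetical stand-in for RDD.flatMap: apply f and flatten the results."""
    return list(chain.from_iterable(f(r) for r in records))


def reduce_by_key(f, pairs):
    """Hypothetical stand-in for RDD.reduceByKey: merge values sharing a key with f."""
    acc = {}
    for key, value in pairs:
        acc[key] = f(acc[key], value) if key in acc else value
    return acc


words = flat_map(lambda line: line.split(), lines)   # flatMap: lines -> words
pairs = [(word, 1) for word in words]                # map: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)    # reduceByKey: sum the 1s
print(counts)
```

On a cluster, `reduceByKey` also combines values within each partition before shuffling, which is why it is usually preferred over grouping all values for a key and summing them afterwards.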
