Scalable Machine Learning with PySpark

Ladle Patel

Life Cycle of Data Science Project

What is Spark?
Apache Spark is an open-source, general-purpose framework for distributed cluster computing. Its in-memory data processing engine
handles ETL, machine learning, and graph processing on large volumes of data, either at rest (batch processing) or in motion (stream
processing), and it offers rich high-level APIs for Scala, Python, Java, and R.
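
As a rough illustration of the Python API mentioned above, the following sketch builds a small Spark DataFrame and fits a logistic regression through a Spark ML pipeline. It is not taken from the slides: the column names, the sample rows, the application name, and the helper `fit_and_predict` are all made up for this example, and it assumes PySpark with a working Java runtime is installed and that a local master is acceptable.

```python
# Illustrative sketch only. COLUMNS, FEATURES, SAMPLE_ROWS and fit_and_predict
# are hypothetical names chosen for this example, not part of the slides.
COLUMNS = ["age", "income", "label"]
FEATURES = ["age", "income"]
SAMPLE_ROWS = [
    (25.0, 30000.0, 0.0),
    (32.0, 45000.0, 0.0),
    (47.0, 82000.0, 1.0),
    (51.0, 91000.0, 1.0),
]


def fit_and_predict(rows):
    """Fit a logistic regression on `rows` and return (label, prediction) pairs."""
    # Imported here so this file can still be loaded where PySpark is absent.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    # Assumption: run on the local machine, using all available cores.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("pyspark-ml-sketch")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, schema=COLUMNS)
        # Spark ML estimators expect the features packed into one vector column.
        assembler = VectorAssembler(inputCols=FEATURES, outputCol="features")
        lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=10)
        model = Pipeline(stages=[assembler, lr]).fit(df)
        scored = model.transform(df).select("label", "prediction").collect()
        return [(row["label"], row["prediction"]) for row in scored]
    finally:
        spark.stop()
```

Calling `fit_and_predict(SAMPLE_ROWS)` starts a local Spark session, trains the model, and returns one (label, prediction) pair per input row. The same code should run against a cluster if the master setting is changed, which is the main appeal of the DataFrame and pipeline APIs.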

Life Cycle of Big Data - Data Science Project
[Figure: life cycle diagram; surviving labels: Spark, dataframe]

Educational Materials and Tutorials
https://docs.databricks.com/spark/latest/training/index.html
https://spark.apache.org/
https://github.com/lp-dataninja
https://github.com/databricks/Spark-The-Definitive-Guide
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5722190290795989/875048944749694/8175309257345795/latest.html

Join our team: we are hiring
ladle.patel@genpact.com
ladlepatelr@gmail.com

Datasets for Building Scalable Machine Learning Models
https://www.kaggle.com/benhamner/competitions-with-largest-datasets
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
