PYSPARK CERTIFICATION TRAINING www.edureka.co/pyspark-certification-training
PySpark Dataframe Tutorial
Today’s Training Topics
❖ Need for Dataframes
❖ What are Dataframes
❖ Features of Dataframes
❖ Sources of Dataframes
❖ Demo
Why do we need Dataframes?
❖ Processing structured and semi-structured data
❖ Handling petabytes of data
❖ Wide range of data formats and sources
❖ Support for multiple languages (Python, Scala, Java and R)
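What are Dataframes?
❖ A 2D labelled data structure: a distributed collection of rows organized into named columns
❖ Similar to a table in a relational database, and queryable with SQL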
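Features of Dataframes
❖ Distributed: data is partitioned across the nodes of a cluster and processed in parallel
❖ Lazy evaluation: transformations are only recorded; they run when an action such as count() or show() is called
❖ Immutable: a transformation returns a new Dataframe instead of modifying the existing one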
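Creating a Dataframe (Sources)
Dataframes can be created from structured data files (such as CSV, JSON and Parquet), tables in Hive, external databases, or existing RDDs.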
Important Classes
• pyspark.sql.SQLContext (legacy entry point; superseded by pyspark.sql.SparkSession since Spark 2.0)
• pyspark.sql.DataFrame
• pyspark.sql.Column
• pyspark.sql.Row
• pyspark.sql.GroupedData
• pyspark.sql.DataFrameNaFunctions
• pyspark.sql.DataFrameStatFunctions
• pyspark.sql.functions
• pyspark.sql.types
• pyspark.sql.Window
CREATING DATAFRAMES DEMO
FIFA World Cup Use Case
Superheroes Use Case
PySpark Dataframes Tutorial | Introduction to PySpark Dataframes API | PySpark Training | Edureka
