Powering a Virtual Power Station with Big Data
Michael Bironneau
April 2016
[Chart: Installed Capacity (GW) vs Generation (GW)]
[Chart: Total Power (MW) over a day, 0:00 to 22:30]
Average upwards flex – 120%
Average downwards flex – 35%
Open Energi in the coming year:
• 25-40k messages processed per second
• Total size of data 500-800 TB
Perspective: here’s what “big data” means to Boeing [1]:
• ~64k messages per second from each aircraft
• Total size of data over 100 petabytes
[1]: http://bit.ly/18kQlMn
[Chart: Size of data (PB), Open Energi vs Boeing]
Our data is not huge at the moment…
…but after domestic demand-side response (or something else on that scale)
[Chart: Size of data (PB), Open Energi vs Boeing]
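The volume comparison on these slides can be checked with a little arithmetic. The sketch below treats the quoted ranges as exact, assumes decimal units (1 PB = 1,000 TB), and reads "two orders of magnitude" (editor's note #13) as a factor of 100; since Boeing's figure is "over 100 petabytes", the ratios are lower bounds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_day(rate_per_second: int) -> int:
    """Messages handled in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


# Projected throughput range quoted for Open Energi: 25-40k messages per second.
low, high = messages_per_day(25_000), messages_per_day(40_000)

# Sizes in petabytes, assuming decimal units (1 PB = 1,000 TB).
open_energi_pb = (500 / 1000, 800 / 1000)  # 500-800 TB
boeing_pb = 100.0  # "over 100 petabytes", so the ratios below are lower bounds

# How many times larger Boeing's data set is than Open Energi's.
ratio_range = (boeing_pb / open_energi_pb[1], boeing_pb / open_energi_pb[0])

# "Two orders of magnitude" of growth (editor's note #13) is a factor of 100.
grown_pb = tuple(size * 100 for size in open_energi_pb)

if __name__ == "__main__":
    print(f"Messages per day: {low:,} to {high:,}")
    print(f"Boeing vs Open Energi: at least {ratio_range[0]:.0f}x to {ratio_range[1]:.0f}x")
    print(f"After 100x growth: {grown_pb[0]:.0f} PB to {grown_pb[1]:.0f} PB")
```

Run as-is, this gives roughly 2.2 to 3.5 billion messages a day, a gap of at least 125x to 200x to Boeing today, and 50 to 80 PB after 100x growth, which is the same order of magnitude as Boeing's figure.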
Why Hortonworks Data Platform
• Can scale quickly to respond to market demands
• Interoperability with existing code
• Fantastic data integration
• Knowledgeable technical support
• Security and data governance
Batch | Our HDP setup
[Diagram: Asset Data, National Electricity Data, Market data and other “live” timeseries data are ingested with Flume and Hive Streaming into Hive, which serves other applications]
Real-time | (Work ongoing)
[Diagram: Asset Data processed in real time against ML models and HDFS, cache, Elasticsearch, …; steps shown: Enrich, Correlate Events, Update ML Models]
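The real-time pipeline is described as work in progress, and editor's note #15 mentions Storm. Purely as an illustration, the plain-Python sketch below shows what the Enrich and Correlate Events steps could look like: each reading is joined against a metadata cache, then events are grouped whenever consecutive ones fall within a time window. The `Reading` type, the `ASSET_SITES` cache contents and the gap-based grouping rule are hypothetical assumptions, not Open Energi's design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Reading:
    asset_id: str
    timestamp: float  # seconds since epoch
    power_kw: float


@dataclass(frozen=True)
class EnrichedReading:
    asset_id: str
    timestamp: float
    power_kw: float
    site: str


# Hypothetical metadata cache: asset_id -> site name.
ASSET_SITES: Dict[str, str] = {"pump-1": "site-A", "chiller-7": "site-B"}


def enrich(reading: Reading, sites: Dict[str, str]) -> EnrichedReading:
    """Attach site metadata from the cache; unknown assets are tagged 'unknown'."""
    site = sites.get(reading.asset_id, "unknown")
    return EnrichedReading(reading.asset_id, reading.timestamp, reading.power_kw, site)


def correlate(events: List[EnrichedReading], window_s: float) -> List[List[EnrichedReading]]:
    """Group events so that each one is within window_s seconds of the previous one."""
    clusters: List[List[EnrichedReading]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if clusters and event.timestamp - clusters[-1][-1].timestamp <= window_s:
            clusters[-1].append(event)  # close enough: same correlated episode
        else:
            clusters.append([event])  # gap too large: start a new episode
    return clusters
```

In a streaming engine, `enrich` would run per message and `correlate` over a bounded window of recent events rather than a full list.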
Apache Hive | Example

Index semi-structured data (Elasticsearch):

    CREATE EXTERNAL TABLE semi_structured_stuff (...)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'semi/structured',
                  'es.index.auto.create' = 'false');

Use Hive to integrate this with timeseries data and other metadata:

    SELECT something FROM semi_structured_stuff
    JOIN metadata m ON …
    LEFT JOIN timeseries t ON …

Farm out complex analytics to Python:

    ADD FILE insane_maths.py;
    SELECT TRANSFORM(something)
    USING 'python insane_maths.py'
    AS (result)
    FROM semi_structured_stuff;
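Hive's TRANSFORM clause streams each input row to the script's stdin as one line of tab-separated columns (NULL written as `\N` by default) and reads every line the script prints as an output row. The sketch below is a hypothetical skeleton for a script like `insane_maths.py`: the real analytics are not shown on the slide, so `compute()` is a placeholder that simply squares its input, and the script assumes a single numeric input column.

```python
#!/usr/bin/env python3
"""Hypothetical skeleton of a Hive TRANSFORM script (e.g. insane_maths.py)."""
import sys
from typing import Iterable, Iterator

HIVE_NULL = "\\N"  # Hive's default text representation of NULL


def compute(value: float) -> float:
    """Placeholder for the real analytics; here it just squares the input."""
    return value * value


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn Hive input rows (tab-separated lines) into output rows."""
    for line in lines:
        field = line.rstrip("\n").split("\t")[0]  # first (only) input column
        if field in (HIVE_NULL, ""):
            yield HIVE_NULL  # propagate NULLs back to Hive
        else:
            yield str(compute(float(field)))


def main() -> None:
    for row in transform(sys.stdin):
        print(row)


if __name__ == "__main__":
    main()
```

Keeping the maths in an importable function means the same code can be unit-tested outside Hive, which is what makes re-using existing Python cheap.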
Benefits
• Reduced storage cost compared to SAN + SQL Server
• Better utilisation of infrastructure thanks to YARN
• Pain-free integration of multiple data sources with external tables in Hive
• Scale up/down on demand
• Re-use existing Python code = low development overhead
[Diagram: “wheel of data” for Dynamic Demand. Segments: Predict & Forecast; Optimise & Explore; Verify; Alert. Capabilities shown: Simulations, Insights via web, Machine learning, Statistical Analysis, Event correlation, Expert system, Real-time aggregation, Real-time web feed]
Thanks for listening. Any questions?

Editor's Notes

  • #5: There is a powerful economic case for distributing demand more efficiently using DSR (demand-side response) technology, regardless of the future generation mix. The capital cost of building a new peaking power station can be up to £5 million per megawatt. The current cost of aggregating a megawatt via Dynamic Demand is around £200,000. It provides a no-build approach to capacity challenges that is cleaner, cheaper, more secure and faster than the alternatives.
  • #6: Open Energi is turning the energy system on its head: instead of supply adjusting to meet demand, demand adjusts to meet supply. By harnessing small amounts of flexible energy demand from energy-intensive equipment, we can create a virtual power station and displace fossil-fuelled peaking power stations. This is enabling a user-led transformation in how our energy system works, so that businesses and consumers are not only making it happen but also seeing the benefits. It is a vital part of our transition to a zero-carbon economy, because we cannot maximise our use of renewables unless our demand for energy becomes more responsive.
  • #7: Dynamic Demand can deliver approx. £85,000 per MW/yr; FCDM / Static FFR £22,000-£26,000 per MW/yr; STOR £10,000-£15,000 per MW/yr.
  • #8: We capture data at the finest-grained level and store it as COV (change of value), so a point is recorded only when a value changes. The challenge is then to aggregate multiple timeseries without downsampling, while also downsampling every series to multiple resolutions. Because all of these series are irregularly sampled, this prevents us from using timeseries databases.
  • #13: Confidence that our data platform can scale quickly if needed: the markets we operate in are unpredictable, and when the domestic market takes off our data could increase by two orders of magnitude. Fantastic data integration support: we can easily wrap our existing codebase, and we reduce our £/GB by 80% for archival data while retaining the ability to query it. Extensibility: new tools are being added to the ecosystem on a regular basis, and more developers trained in the Hadoop ecosystem means easier on-boarding. Knowledgeable support from Hortonworks. Security and governance built into the platform.
  • #15: This is ongoing work; in particular, we haven’t quite figured out the “asset data” -> Storm part.
  • #17: We are not limited by storage cost, so we can enrich data to reduce the cost of processing. Better utilisation of infrastructure compared with VMs dedicated to a single service: YARN lets us get the most out of everything. The ability to mix Python with SQL makes aggregation and downsampling easier and more maintainable. Interactive querying of multiple data sources with Spark in Jupyter. Easy ingestion using multiple Flume agents. We can still use Elasticsearch for small timeseries.
  • #18: Now let’s look at where HDP fits into our big “wheel of data”.
  • #20: We are not limited by storage cost, so we can enrich data to reduce the cost of processing. The ability to mix Python with SQL makes aggregation and downsampling easier and more maintainable. Interactive querying of multiple data sources with Spark in Jupyter. Easy ingestion using multiple Flume agents. We can still use Elasticsearch for small timeseries.
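The economic figures in editor's notes #5 and #7 reduce to simple ratios. The sketch below uses only the quoted numbers; because £5 million is an upper bound for a peaking plant, the cost ratio is an "up to" figure, and Dynamic Demand's revenue is treated as a single point estimate.

```python
# Figures quoted in editor's notes #5 and #7 (GBP).
PEAKING_PLANT_COST_PER_MW = 5_000_000  # "up to £5 million per megawatt"
DYNAMIC_DEMAND_COST_PER_MW = 200_000  # "around £200,000" to aggregate 1 MW

# Revenue per MW per year as (low, high); Dynamic Demand is a single estimate.
REVENUE_PER_MW_YEAR = {
    "Dynamic Demand": (85_000, 85_000),
    "FCDM / Static FFR": (22_000, 26_000),
    "STOR": (10_000, 15_000),
}


def cost_ratio() -> float:
    """MW of Dynamic Demand aggregated for the cost of 1 MW of peaking plant (upper bound)."""
    return PEAKING_PLANT_COST_PER_MW / DYNAMIC_DEMAND_COST_PER_MW


def revenue_multiple(service: str) -> tuple:
    """Dynamic Demand revenue per MW/yr divided by the given service's, as (low, high)."""
    dd = REVENUE_PER_MW_YEAR["Dynamic Demand"][0]
    low, high = REVENUE_PER_MW_YEAR[service]
    return (dd / high, dd / low)


if __name__ == "__main__":
    print(f"Cost ratio: up to {cost_ratio():.0f}x")
    for name in ("FCDM / Static FFR", "STOR"):
        lo, hi = revenue_multiple(name)
        print(f"Dynamic Demand vs {name}: {lo:.1f}x to {hi:.1f}x")
```

On these numbers, a megawatt of Dynamic Demand costs up to 25 times less than a megawatt of new peaking plant, and earns roughly 3.3x to 3.9x the FCDM / Static FFR rate and 5.7x to 8.5x the STOR rate.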
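Editor's note #8 describes the core data problem: irregular COV series that must be aggregated without downsampling and also downsampled to several resolutions. One way to do both, sketched below under stated assumptions (not necessarily Open Energi's implementation): treat each series as a step function, sum series exactly by sweeping over the union of change times, and downsample with a time-weighted mean per bucket. The assumptions are sorted input with distinct timestamps per series, and a value of 0.0 before a series' first point.

```python
import heapq
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# A COV series: (timestamp, value) pairs sorted by timestamp; each value holds
# until the next change. Assumption: a series contributes 0.0 before its first point.
Series = List[Tuple[float, float]]


def aggregate(series_list: List[Series]) -> Series:
    """Exact sum of several COV series, with one output point per change time."""
    current = [0.0] * len(series_list)  # latest value of each input series
    out: Series = []
    tagged = [[(t, i, v) for t, v in s] for i, s in enumerate(series_list)]
    for t, i, v in heapq.merge(*tagged):  # all change events in time order
        current[i] = v
        # Re-summing costs O(number of series) per event but avoids the rounding
        # drift a long-running incremental total would accumulate.
        total = sum(current)
        if out and out[-1][0] == t:
            out[-1] = (t, total)  # several series changed at the same instant
        else:
            out.append((t, total))
    return out


def downsample(series: Series, start: float, end: float, resolution: float) -> Series:
    """Time-weighted mean of a COV series over buckets [b, b + resolution)."""
    times = [t for t, _ in series]

    def value_at(t: float) -> float:
        i = bisect_right(times, t)  # index just past the last change at or before t
        return series[i - 1][1] if i else 0.0

    out: Series = []
    bucket = start
    while bucket < end:
        stop = min(bucket + resolution, end)
        # Split the bucket at every change strictly inside it, then integrate the steps.
        inner = times[bisect_right(times, bucket):bisect_left(times, stop)]
        cuts = [bucket, *inner, stop]
        area = sum(value_at(a) * (b - a) for a, b in zip(cuts, cuts[1:]))
        out.append((bucket, area / (stop - bucket)))
        bucket = stop
    return out
```

Because `aggregate` keeps every change point, the full-resolution aggregate can be stored once and `downsample` applied to it (or to any single series) at each resolution required.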