Warehouse-Scale
Computers
CS4342 Advanced Computer Architecture
Dilum Bandara
Dilum.Bandara@uom.lk
Slides adapted from “Computer Architecture: A Quantitative Approach” by John L.
Hennessy and David A. Patterson, 5th Edition, 2012, MK Publishers, and
“The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale
Machines” by Luiz André Barroso & Urs Hölzle
Outline
• Programming model & workloads
• Architectures
• Cloud computing
2
Warehouse-Scale Computers (WSC)
3
www.laserfocusworld.com/articles/print/volume-48/issue-12/features/optical-technologies-scale-the-datacenter.html
http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/
WSC (Cont.)
4
WSC Layout
5
Source: http://bnrg.cs.berkeley.edu/~randy/Courses/CS294.F07/
Main Components of a WSC
6
Warehouse-Scale Computer (WSC)
• Provides Internet services
  ◦ Search, social networking, online maps, video sharing, online shopping, email, cloud computing, etc.
• Differences from HPC clusters
  ◦ HPC clusters use higher-performance processors & networks
  ◦ HPC clusters emphasize thread-level parallelism; WSCs emphasize request/task-level parallelism
• Differences from datacenters
  ◦ Datacenters consolidate different machines & software into a single location
  ◦ Datacenters emphasize virtual machines & hardware heterogeneity to serve varied customers
7
Design Factors for WSC
• Cost-performance
  ◦ Small savings add up
• Energy efficiency
  ◦ Affects power distribution & cooling
  ◦ Work per joule
• Operational costs count
  ◦ Power consumption is a primary constraint when designing a system
• Dependability via redundancy
  ◦ Many low-cost components
8
Design Factors (Cont.)
• Network I/O
• Interactive & batch processing workloads
  ◦ Web search – interactive
  ◦ Web indexing – batch
• Finding ample computational parallelism is not a concern
  ◦ Most jobs are totally independent: “request-level parallelism”
• Scale – its opportunities & problems
  ◦ Can afford to build customized systems, as WSCs require volume purchases
  ◦ Frequent failures
9
Failure Example
• Consider a WSC with 50,000 nodes, where the MTTF of a node is 5 years. How many node failures are expected per day?
  MTTF in days = 5 × 365 = 1,825
  Failure rate = 1/1,825 per node per day
  No. of failures per day = 50,000/1,825 ≈ 27.4
• Consider a WSC with 50,000 nodes, each with 4 hard disks. Suppose the annual failure rate of a disk is 4%. What is the average time between disk failures?
  No. of disks = 50,000 × 4 = 200,000
  No. of failures per year = 200,000 × 0.04 = 8,000
  Time between failures = 365 × 24 / 8,000 = 1.095 hours/failure
10
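The two calculations above can be reproduced with a short Python sketch. It assumes failures are independent and occur at a constant rate, which the slide implies but does not state; the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24


def node_failures_per_day(nodes: int, mttf_years: float) -> float:
    """Expected node failures per day, assuming a constant failure rate."""
    mttf_days = mttf_years * 365
    return nodes / mttf_days


def hours_between_disk_failures(nodes: int, disks_per_node: int, afr: float) -> float:
    """Average hours between disk failures, given the annual failure rate (AFR)."""
    failures_per_year = nodes * disks_per_node * afr
    return HOURS_PER_YEAR / failures_per_year


if __name__ == "__main__":
    print(round(node_failures_per_day(50_000, 5), 1))             # about 27.4 per day
    print(round(hours_between_disk_failures(50_000, 4, 0.04), 3))  # about 1.095 hours
```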
Programming Models & Workloads
• Batch processing framework – MapReduce
• Map
  ◦ Applies a programmer-supplied function to each logical input record
  ◦ Runs on thousands of computers
  ◦ Provides a new set of (key, value) pairs as intermediate values
• Reduce
  ◦ Collapses values using another function
11
http://www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
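The map and reduce steps can be illustrated with a single-process word count in Python. This is a minimal sketch of the programming model only, not of a real MapReduce runtime: `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical names, and the in-memory grouping stands in for the distributed shuffle that a real framework performs across thousands of machines.

```python
from collections import defaultdict


def map_words(record):
    """Map: emit an intermediate (key, value) pair for each word in one input record."""
    for word in record.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce: collapse all values that share a key into a single result."""
    return sum(values)


def run_mapreduce(records, mapper, reducer):
    """Apply the mapper to every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

Because each call to the mapper depends only on its own record, the map phase can be spread across many nodes without coordination, which is what makes the model a good fit for a WSC.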
MapReduce Execution
12
Source: Dean & Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004
Programming Models & Workloads (Cont.)
13
www.datanami.com/datanami/2012-07-16/top_5_challenges_for_hadoop_mapreduce_in_the_enterprise.html
Programming Models & Workloads (Cont.)
• MapReduce runtime environment schedules map & reduce tasks to WSC nodes
• Availability
  ◦ Use replicas of data across different servers
  ◦ Use relaxed consistency: no need for all replicas to always agree
• Workload demands
  ◦ Often vary considerably
14
Computer Architecture of WSC
• Often uses a hierarchy of networks for interconnection
• Each 19” rack holds 48 1U servers connected to a rack switch
• Rack switches are uplinked to one or more switches higher in the hierarchy
  ◦ With n uplink ports, uplink bandwidth is 48/n times lower than the bandwidth within the rack (oversubscription)
  ◦ Goal is to maximize locality of communication relative to the rack
15
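The 48/n oversubscription ratio can be worked through in Python. This sketch assumes every switch port runs at the same speed; the 1 Gbps port speed and the choice of 4 uplink ports are hypothetical values, not figures from the slides.

```python
def oversubscription_ratio(server_ports: int, uplink_ports: int) -> float:
    """Ratio of in-rack bandwidth to uplink bandwidth, assuming equal port speeds."""
    if uplink_ports <= 0:
        raise ValueError("a rack switch needs at least one uplink port")
    return server_ports / uplink_ports


def off_rack_gbps_per_server(port_gbps: float, server_ports: int, uplink_ports: int) -> float:
    """Bandwidth each server gets when all servers send off-rack at the same time."""
    return port_gbps / oversubscription_ratio(server_ports, uplink_ports)


if __name__ == "__main__":
    # Hypothetical rack: 48 servers on 1 Gbps ports, 4 uplink ports.
    print(oversubscription_ratio(48, 4))
    print(off_rack_gbps_per_server(1.0, 48, 4))
```

With these assumed numbers the ratio is 12, so a server that gets 1 Gbps to a neighbour in the same rack gets roughly 0.083 Gbps to the rest of the WSC under full load, which is why software tries to keep communication within the rack.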
Hierarchy of Switches
16
Network Hierarchy
17
Source: www.laserfocusworld.com/articles/print/volume-48/issue-12/features/optical-technologies-scale-the-datacenter.html
Storage Hierarchy
18
Infrastructure & Costs
• Location
  ◦ Proximity to Internet backbones, electricity cost, property tax rates, low risk from earthquakes, floods, & hurricanes
• Power distribution
19
Power Usage
20
U.S. EPA Report 2007 – data centers used 1.5% of total U.S. power consumption;
their consumption has more than doubled since 2000 & costs $4.5 billion
How Many Nodes can a WSC Support?
• Each node
  ◦ “Nameplate power rating” gives maximum power consumption
  ◦ To get the actual consumption, measure power under actual workloads
• Oversubscribe cumulative node power by 40%, but monitor power closely
21
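One way to read the 40% figure is that a facility can host 40% more nodes than the sum of nameplate ratings would allow, because nodes rarely draw nameplate power at the same time. The sketch below follows that reading; the 1 MW IT power budget and the 500 W nameplate rating are hypothetical, and integer arithmetic is used so the result is exact.

```python
def nodes_supported(it_power_budget_w: int, nameplate_w: int, oversubscription_pct: int = 0) -> int:
    """Number of nodes that fit in the IT power budget.

    oversubscription_pct is the extra share of nodes deployed beyond what the
    nameplate ratings alone would permit (40 means 40% more nodes).
    """
    if nameplate_w <= 0:
        raise ValueError("nameplate rating must be positive")
    return (it_power_budget_w * (100 + oversubscription_pct)) // (nameplate_w * 100)


if __name__ == "__main__":
    # Hypothetical facility: 1 MW available to IT equipment, 500 W nameplate per node.
    print(nodes_supported(1_000_000, 500))       # sized by nameplate rating only
    print(nodes_supported(1_000_000, 500, 40))   # with 40% oversubscription
```

Oversubscribing only works if measured power stays under the budget, which is why the slide pairs it with close monitoring.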
Cooling
22
Typically operate around 18–22 °C
Cooling (Cont.)
23
Cooling system also uses water (evaporation & spills)
e.g. 70,000 to 200,000 gallons per day for an 8 MW facility
Efficiency
• Power Utilization Effectiveness (PUE) = Total facility power / IT equipment power
  ◦ Always ≥ 1
  ◦ Median PUE in a 2006 study was 1.69
24
Source: http://hightech.lbl.gov/benchmarking-guides/data-a1.html
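The PUE definition translates directly into code. In this sketch the 1,690 kW and 1,000 kW readings are hypothetical values chosen to reproduce the median of 1.69; the `dcie` helper uses the relation DCiE = 1/PUE given in the editor's notes.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power (always >= 1)."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT equipment power")
    return total_facility_kw / it_equipment_kw


def dcie(pue_value: float) -> float:
    """Data center infrastructure efficiency, the reciprocal of PUE."""
    return 1.0 / pue_value


if __name__ == "__main__":
    value = pue(1690, 1000)  # hypothetical meter readings in kW
    print(value)
    print(round(dcie(value), 3))
```

A PUE of 1.69 means that for every watt delivered to IT equipment, a further 0.69 W goes to cooling, power distribution losses, and other overheads.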
Performance
• Latency is an important metric because it is seen by users
• Bing study
  ◦ Users use search less as response time increases
• Service Level Objectives (SLOs) & Service Level Agreements (SLAs)
  ◦ Typically given at application level
    e.g., 99% of requests must complete in under 100 ms
  ◦ In clouds, typically given only for static resources
    CPU speed, no. of cores, & memory
25
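An application-level SLO such as "99% of requests below 100 ms" can be checked against measured latencies as follows. This is a minimal sketch; the function names and the sample latencies are made up for illustration.

```python
def fraction_below(latencies_ms, threshold_ms: float) -> float:
    """Share of requests whose latency is strictly below the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return within / len(latencies_ms)


def meets_slo(latencies_ms, threshold_ms: float = 100.0, target: float = 0.99) -> bool:
    """True if at least `target` of the requests finished below the threshold."""
    return fraction_below(latencies_ms, threshold_ms) >= target


if __name__ == "__main__":
    # Hypothetical samples: 99 fast requests and one slow outlier.
    samples = [20.0] * 99 + [250.0]
    print(meets_slo(samples))
```

Stating the objective as a percentile rather than an average keeps a handful of very slow requests from being hidden by many fast ones.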
Cost
• Capital expenditures (CAPEX)
  ◦ Cost to build a WSC
  ◦ Hardware cost dominates
• Operational expenditures (OPEX)
  ◦ Cost to operate a WSC
  ◦ Power for nodes & cooling dominates
26
Cloud Computing
27
[Figure: cloud computing landscape. Labels: Clients, Cloud Manager, Public Cloud, Private Cloud, Govt. Cloud Services, Other Cloud Services. Source: Green Cloud Computing by Dr. Rajkumar Buyya]
Cloud Computing (Cont.)
• WSCs offer economies of scale that can’t be achieved with a datacenter
  ◦ 5.7 times reduction in storage costs
  ◦ 7.1 times reduction in administrative costs
  ◦ 7.3 times reduction in networking costs
• This has given rise to cloud services such as Amazon Web Services
  ◦ “Utility Computing”
  ◦ Based on using open source virtual machine & operating system software
28
Amazon Web Services
• Virtual machines
  ◦ Xen
• Very low cost
  ◦ $0.10 per hour per instance
• Primarily relies on open source software
• No (initial) service guarantees
• No contract required
• Amazon S3
  ◦ Simple Storage Service
• Amazon EC2
  ◦ Elastic Compute Cloud
29
Amazon Web Services – Example
30
http://www.ryhug.com/free-art-available-on-amazon-amazon-web-services-that-is/
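The per-hour pricing quoted earlier ($0.10 per hour per instance, with no contract) makes cost a simple product of instances and hours. The sketch below works in cents to keep the arithmetic exact; the function name and the workload sizes are hypothetical, and real pricing varies by instance type and over time.

```python
def on_demand_cost_cents(instances: int, hours: int, cents_per_instance_hour: int = 10) -> int:
    """Pay-as-you-go cost in cents at a flat hourly rate per instance."""
    return instances * hours * cents_per_instance_hour


if __name__ == "__main__":
    # Hypothetical job needing 1,000 instance-hours of compute.
    one_machine = on_demand_cost_cents(instances=1, hours=1000)
    many_machines = on_demand_cost_cents(instances=1000, hours=1)
    print(one_machine / 100, many_machines / 100)  # same cost in dollars either way
```

Under flat hourly pricing, running 1,000 instances for one hour costs the same as running one instance for 1,000 hours, so a parallel job can finish far sooner at no extra cost.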

Introduction to Warehouse-Scale Computers


Editor's Notes

  • #16: 1U - A rack unit (abbreviated U or RU) is a unit of measure defined as 44.50 mm (1.75 in)
  • #23: computer room air conditioning (CRAC)
  • #25: DCiE = 1/PUE
  • #30: S3 - Simple Storage Service EC2 - Elastic Compute Cloud