Cluster Architecture
Web Search for a Planet and much more…
Abhijeet Desai
desaiabhijeet89@gmail.com

Google Query Serving Infrastructure
• Elapsed time: 0.25s, machines involved: 1000s+

PageRank
• PageRank™ is the core technology to measure the importance of a page
• Google's theory
    – If page A links to page B
        # Page B is important
        # The link text is irrelevant
    – If many important links point to page A
        # Links from page A are also important

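The idea that importance flows along links can be sketched as a simple power iteration. This is a minimal illustration only, not Google's production algorithm: the damping factor of 0.85, the iteration count, and the four-page toy web are assumptions that do not appear in the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: B is linked to by A, C and D.
toy_web = {"A": ["B"], "B": ["C"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page B collects links from three pages and ends up with the highest rank, while D, which nothing links to, keeps only the baseline share.
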
Key Design Principles
• Software reliability
• Use replication for better request throughput and availability
• Price/performance beats peak performance
• Using commodity PCs reduces the cost of computation

The Power Problem
• High density of machines (racks)
    – High power consumption: 400-700 W/ft²
        # Typical data center provides 70-150 W/ft²
        # Energy costs
    – Heating
        # Cooling system costs
• Reducing power
    – Reduce performance (cost/performance may not go down!)
    – Faster hardware depreciation (cost goes up!)

Parallelism
• Lookup of matching docs in a large index
    --> many lookups in a set of smaller indexes, followed by a merge step
• A query stream
    --> multiple streams (each handled by a cluster)
• Adding machines to a pool increases serving capacity

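The "many small lookups plus a merge step" pattern can be sketched as follows. The shard contents, the scores, and the function names are hypothetical; threads stand in for the separate machines that would each hold one index shard.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical index shards: term -> list of (score, doc_id), best score first.
SHARDS = [
    {"cluster": [(0.9, "doc1"), (0.4, "doc7")]},
    {"cluster": [(0.8, "doc12")], "google": [(0.7, "doc15")]},
    {"cluster": [(0.6, "doc23"), (0.2, "doc29")]},
]


def search_shard(shard, term):
    """Look up one term in a single small index."""
    return shard.get(term, [])


def search(term, top_k=3):
    """Query every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS))
    # Each partial list is sorted by descending score, so a k-way merge suffices.
    merged = heapq.merge(*partials, key=lambda hit: -hit[0])
    return [doc_id for _, doc_id in islice(merged, top_k)]


print(search("cluster"))
```

Adding another shard (another machine) reduces the work per lookup, while the merge step only has to combine short, already sorted lists.
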
Hardware Level Considerations
• Instruction-level parallelism does not help
• Multiple simple, in-order, short-pipeline cores
• Thread-level parallelism
• A memory system with a moderately sized L2 cache is enough
• Large shared-memory machines are not required to boost performance

GFS (Google File System) Design
• Master manages metadata
• Data transfers happen directly between clients/chunk servers
• Files broken into chunks (typically 64 MB)

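The split between metadata and data traffic can be sketched with a toy model. The class names, the chunk handle, the file path, and the replica list are all hypothetical; this is not the real GFS API, and for brevity a read is assumed to stay within one chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide


class Master:
    """Holds metadata only: which chunk lives on which chunk servers."""

    def __init__(self):
        # (filename, chunk_index) -> (chunk_handle, [chunk server names])
        self.chunk_locations = {}

    def lookup(self, filename, chunk_index):
        return self.chunk_locations[(filename, chunk_index)]


class ChunkServer:
    """Holds the actual chunk bytes, keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


def client_read(master, servers, filename, offset, length):
    """Ask the master where the chunk is, then read directly from a chunk server."""
    chunk_index = offset // CHUNK_SIZE           # which chunk holds this byte
    handle, replicas = master.lookup(filename, chunk_index)
    server = servers[replicas[0]]                # assumption: pick the first replica
    return server.read(handle, offset % CHUNK_SIZE, length)


master = Master()
servers = {"cs1": ChunkServer(), "cs2": ChunkServer()}
master.chunk_locations[("/logs/web.log", 1)] = ("handle-42", ["cs1", "cs2"])
servers["cs1"].chunks["handle-42"] = b"hello world"
servers["cs2"].chunks["handle-42"] = b"hello world"

data = client_read(master, servers, "/logs/web.log", CHUNK_SIZE + 6, 5)
print(data)
```

The master only answers the small location query; the file bytes never pass through it, which keeps it from becoming a bandwidth bottleneck.
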
GFS Usage @ Google
• 200+ clusters
• Many clusters of 1000s of machines
• Pools of 1000s of clients
• 4+ PB filesystems
• 40 GB/s read/write load
    – (in the presence of frequent HW failures)

The Machinery

Architectural view of the storage hierarchy

Clusters through the years
• “Google” circa 1997 (google.stanford.edu)
• Google (circa 1999)
• Google (new data center 2001)
• Google Data Center (circa 2000)
• 3 days later

Current Design
• In-house rack design
• PC-class motherboards
• Low-end storage and networking hardware
• Linux
• + in-house software

Container Datacenter

Multicore Computing

Comparison Between Custom-built & High-end Servers

Implications of the Computing Environment
Stuff Breaks
• If you have one server, it may stay up three years (1,000 days)
• If you have 10,000 servers, expect to lose ten a day
• “Ultra-reliable” hardware doesn’t really help
• At large scale, super-fancy reliable hardware still fails, albeit less often
    – software still needs to be fault-tolerant
    – commodity machines without fancy hardware give better performance/$
• Reliability has to come from the software
• Making it easier to write distributed programs

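The arithmetic behind "expect to lose ten a day" can be checked in a few lines. The sketch assumes that failures are independent and that each server fails on a given day with probability 1/1000, which is a simplification of the slide's "stays up 1,000 days" figure; the function names are illustrative.

```python
def expected_failures_per_day(num_servers, mtbf_days=1000):
    """Average number of servers lost per day across the fleet."""
    return num_servers / mtbf_days


def prob_any_failure_today(num_servers, mtbf_days=1000):
    """Chance that at least one server fails today (independent failures assumed)."""
    return 1.0 - (1.0 - 1.0 / mtbf_days) ** num_servers


print(expected_failures_per_day(1))        # one server: 0.001 failures per day
print(expected_failures_per_day(10_000))   # 10,000 servers: 10 failures per day
print(prob_any_failure_today(10_000))      # a failure somewhere is near certain
```

With 10,000 machines a failure on any given day is practically guaranteed, which is why the slide places reliability in the software rather than in the hardware.
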
Infrastructure for Search Systems
• Several key pieces of infrastructure:
    – GFS
    – MapReduce
    – BigTable

MapReduce
• A simple programming model that applies to many large-scale computing problems
• Hide messy details in MapReduce runtime library:
    – automatic parallelization
    – load balancing
    – network and disk transfer optimizations
    – handling of machine failures
    – robustness
    – improvements to core library benefit all users of library!

Typical problem solved by MapReduce
• Read a lot of data
• Map: extract something you care about from each record
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, or transform
• Write the results
• Outline stays the same, map and reduce change to fit the problem

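The outline above can be sketched in a single process with word counting as the example problem. This is not Google's MapReduce library: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, and a real runtime would distribute each phase across many machines.

```python
from collections import defaultdict


def map_fn(record):
    """Map: extract (word, 1) pairs from one input record."""
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: aggregate all values seen for one key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Fixed outline: read, map, shuffle and sort, reduce, return results."""
    groups = defaultdict(list)
    for record in records:                  # read a lot of data
        for key, value in map_fn(record):   # map phase
            groups[key].append(value)       # shuffle: group values by key
    # Sort by key, then reduce each group.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


records = ["the web", "the planet the"]
print(run_mapreduce(records, map_fn, reduce_fn))
```

Swapping in a different `map_fn` and `reduce_fn` changes the problem being solved while `run_mapreduce` stays untouched, which is the point of the last bullet.
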


Conclusions
• For a large-scale web service system like Google
    – Design algorithms that can be easily parallelized
    – Design the architecture using replication to achieve distributed computing/storage and fault tolerance
    – Be aware of the power problem, which significantly restricts the use of parallelism