Microsoft SQL Server Data Warehouses for SQL DBAs

SQL Saturday Philly, June 9, 2012

http://mssqldude.wordpress.com
http://www.sqlmag.com/blog/sql-server-bi-blog-17
mkromer@microsoft.com

http://joedantoni.wordpress.com
jdanton1@yahoo.com
Microsoft Data Warehousing Offerings (Tier 1 Offerings)

Enterprise
  • Scalable and reliable platform for data warehousing on any hardware
  • Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs)
  • Software only
  • Scale-up data warehousing
  • 10s of terabytes

Fast Track Data Warehouse
  • Reference architectures offering best price performance for data warehousing
  • Ideal for data marts or small to mid-sized DWs with scan-centric workloads
  • Reference architectures (software and hardware)
  • Scale-up data warehousing
  • 4–80 terabytes

HP Business DW Appliance
  • An affordable SMP solution for data warehousing on optimized hardware
  • Ideal for small data marts or DWs with scan-centric workloads
  • Integrated appliance (software and hardware)
  • Scale-up data warehousing
  • Up to 5 terabytes

Parallel Data Warehouse
  • Appliance for high-end data warehousing requiring highest scalability, performance or complexity
  • Offers flexibility in hardware and architecture
  • DW appliance (fully integrated software and hardware)
  • Scale-out data warehousing with massively parallel processing (MPP)
  • 10s–100s of terabytes
Some Data Warehouses today

  • Big SAN
  • Big SMP Server
  • Connected together

What’s wrong with this picture?
Answer: system out of balance

  • This server can consume 12 GB/sec of I/O, but the SAN can only deliver 2 GB/sec
      − Even when the SAN is dedicated to the SQL data warehouse, which it often isn’t
  • Queries are slow
      − Despite significant investment in both server and storage

Result: significant investment, not delivering performance
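
The imbalance can be put as a single ratio. The following is a minimal sketch, not a sizing tool: it uses only the two figures quoted on the slide (12 GB/sec consumable by the server, 2 GB/sec deliverable by the SAN), and the function name `io_balance` is made up for illustration.

```python
def io_balance(cpu_consume_gb_s: float, storage_deliver_gb_s: float) -> float:
    """Return the fraction of the CPUs' consumption rate that storage can feed."""
    if cpu_consume_gb_s <= 0:
        raise ValueError("cpu_consume_gb_s must be positive")
    return storage_deliver_gb_s / cpu_consume_gb_s


# Figures from the slide: the server can consume 12 GB/sec, the SAN delivers 2 GB/sec.
ratio = io_balance(12.0, 2.0)
print(f"Storage feeds {ratio:.0%} of what the CPUs can consume")
```

A ratio well below 1.0 means the CPUs spend most of a scan waiting on I/O, which is the situation the slide describes.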
Microsoft SQL Server Data Warehouses for SQL Server DBAs
The Alternative: A Balanced System

  • Design a server + storage configuration that can deliver all the I/O bandwidth that the
    CPUs can consume when executing a SQL relational DW workload
  • Avoid sharing storage devices among servers
  • Avoid overinvesting in disk drives
SQL Server Fast Track Data Warehouse

Solution to help customers and partners accelerate their data warehouse deployments

  • A method for designing a cost-effective, balanced system for data warehouse workloads
  • Reference hardware configurations developed in conjunction with hardware partners
    using this method
  • Best practices for data layout, loading and management

Software:
  • SQL Server 2008 R2 Enterprise
  • Windows Server 2008 R2

Configuration guidelines:
  • Physical table structures
  • Indexes
  • Compression
  • SQL Server settings
  • Windows Server settings
  • Loading

Hardware:
  • Tight specifications for servers,
    storage and networking
  • ‘Per core’ building block
Core Fast Track Metrics

System Benchmarking - MCR

  • 200 MB/s per core
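
As a rough worked example of how a per-core rate turns into a system-level I/O target, here is a small sketch. The 200 MB/s per core figure is the one on the slide; the core count (two sockets with six cores each) is a hypothetical server chosen only for illustration.

```python
MCR_MB_PER_SEC_PER_CORE = 200  # per-core rate quoted on the slide


def required_io_mb_per_sec(cores: int, mcr: int = MCR_MB_PER_SEC_PER_CORE) -> int:
    """I/O throughput the storage must sustain to keep every core busy."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * mcr


# Hypothetical server: 2 sockets x 6 cores (an assumption, not from the deck).
cores = 2 * 6
target = required_io_mb_per_sec(cores)
print(f"{cores} cores need about {target} MB/s")  # 12 cores need about 2400 MB/s
```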
Establishing Fast Track MCR

System Benchmarking - BCR

  • Actual Miles Per Gallon

Establishing Fast Track BCR

Fast Track Reference Configurations

  • 2-processor configurations (5–20 TB, 2–3.7 GB/s)
  • 4-processor configurations (20–40 TB, 3.5–7.5 GB/s)
  • 8-processor configurations (40–80 TB, 7.5–14 GB/s)
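
The capacity and throughput ranges above can be treated as a small lookup table. The sketch below is illustrative only: the numbers are the ones listed on the slide, while the `RefConfig` type and the `smallest_config_for` helper are hypothetical names, and a real sizing exercise would also weigh throughput and growth.

```python
from typing import NamedTuple, Optional


class RefConfig(NamedTuple):
    processors: int
    min_tb: int
    max_tb: int
    min_gb_s: float
    max_gb_s: float


# Ranges as listed on the slide.
REFERENCE_CONFIGS = [
    RefConfig(2, 5, 20, 2.0, 3.7),
    RefConfig(4, 20, 40, 3.5, 7.5),
    RefConfig(8, 40, 80, 7.5, 14.0),
]


def smallest_config_for(capacity_tb: float) -> Optional[RefConfig]:
    """Return the smallest listed configuration whose capacity range covers capacity_tb."""
    for cfg in REFERENCE_CONFIGS:
        if cfg.min_tb <= capacity_tb <= cfg.max_tb:
            return cfg
    return None


choice = smallest_config_for(30)
print(choice.processors if choice else "outside the listed ranges")  # 4
```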
  
Data Warehouse Workload Characteristics

-- Scan-centric aggregate: no WHERE clause, so every row of LINEITEM is read.
-- Same shape as TPC-H Query 1, without its ship-date filter.
SELECT    L_RETURNFLAG,
          L_LINESTATUS,
          SUM(L_QUANTITY) AS SUM_QTY,
          SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
          SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
          SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
          AVG(L_QUANTITY) AS AVG_QTY,
          AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
          AVG(L_DISCOUNT) AS AVG_DISC,
          COUNT(*) AS COUNT_ORDER
FROM      LINEITEM
GROUP BY  L_RETURNFLAG, L_LINESTATUS
ORDER BY  L_RETURNFLAG, L_LINESTATUS;
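
To see the shape of this workload at toy scale, the sketch below runs a cut-down version of the query against an in-memory SQLite table from Python. SQLite stands in for SQL Server only so the example is self-contained; the three sample rows are hypothetical, and only a subset of the aggregates is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LINEITEM ("
    " L_RETURNFLAG TEXT, L_LINESTATUS TEXT, L_QUANTITY INTEGER,"
    " L_EXTENDEDPRICE REAL, L_DISCOUNT REAL, L_TAX REAL)"
)
# Hypothetical sample rows; values chosen so the float arithmetic is exact.
conn.executemany(
    "INSERT INTO LINEITEM VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "F", 10, 100.0, 0.0, 0.0),
        ("A", "F", 20, 200.0, 0.5, 0.0),
        ("N", "O", 5, 50.0, 0.0, 0.0),
    ],
)
# No WHERE clause: the engine has to read every row to produce two summary rows.
result = conn.execute(
    """
    SELECT L_RETURNFLAG, L_LINESTATUS,
           SUM(L_QUANTITY) AS SUM_QTY,
           SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
           COUNT(*) AS COUNT_ORDER
    FROM LINEITEM
    GROUP BY L_RETURNFLAG, L_LINESTATUS
    ORDER BY L_RETURNFLAG, L_LINESTATUS
    """
).fetchall()
conn.close()

for row in result:
    print(row)
```

Every input row contributes to the output, which is why sequential scan throughput, rather than index seek latency, dominates this kind of query.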
Software configuration

  • SQL Server Startup
  • Temp DB
  • Temp DB & TLOG

DW Server Baseline Configs

Fast Track Data Striping

FT Storage Enclosure (RAID-1 disk pairs, e.g. Disk 1 & 2)

  Use            Volume        File
  Primary Data   ARY01D1v01    DB1-1.ndf
  Primary Data   ARY01D2v02    DB1-2.ndf
  Primary Data   ARY02D1v03    DB1-3.ndf
  Primary Data   ARY02D2v04    DB1-4.ndf
  Primary Data   ARY03D1v05    DB1-5.ndf
  Primary Data   ARY03D2v06    DB1-6.ndf
  Primary Data   ARY04D1v07    DB1-7.ndf
  Primary Data   ARY04D2v08    DB1-8.ndf
  Log            ARY05v09      DB1.ldf
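
The naming pattern in the striping diagram (four data arrays with two volumes each, one data file per volume, plus a fifth array for the log) is regular enough to generate. This is a sketch that reproduces the labels shown on the slide; the function name and its parameters are hypothetical.

```python
def striped_layout(db: str, data_arrays: int = 4, volumes_per_array: int = 2):
    """Return (volume, file) pairs following the naming pattern on the slide."""
    layout = []
    n = 0
    for array in range(1, data_arrays + 1):
        for vol in range(1, volumes_per_array + 1):
            n += 1  # volumes and data files are numbered consecutively
            layout.append((f"ARY{array:02d}D{vol}v{n:02d}", f"{db}-{n}.ndf"))
    # One further array holds the transaction log.
    layout.append((f"ARY{data_arrays + 1:02d}v{n + 1:02d}", f"{db}.ldf"))
    return layout


layout = striped_layout("DB1")
for volume, filename in layout:
    print(f"{volume}  {filename}")
```

Spreading one data file over each volume lets a table scan draw on every data array at once, while the log sits on its own array.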
User Databases

Transaction Log

  Database       Filegroup      LUN 1             LUN 2             LUN 3             ...   LUN 16
  Permanent_DB   Permanent FG   Permanent_1.ndf   Permanent_2.ndf   Permanent_3.ndf   ...   Permanent_16.ndf
  Stage          Stage FG       Stage_1.ndf       Stage_2.ndf       Stage_3.ndf       ...   Stage_16.ndf

  TempDB (Local Drive 1): TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB), ..., TempDB_16.ndf (25GB)
  Log LUN 1: Permanent DB Log, Stage DB Log

Control rack and data racks

  • Control Rack: Control Nodes (active/passive, SQL), Management Nodes, Landing Node, Backup Node
  • Data Rack: Compute Nodes (SQL), Storage Nodes, Spare Compute Node
  • Interconnects: Dual Infiniband, Dual Fiber Channel
  • Private Network

1 Data Rack

  • 17 Servers
  • 22 Procs
  • 132 Cores

Expand to 4 data racks and quadruple your performance and capacity!

Query Speed in Seconds

  Query   Orig. Time   PDW Time   PDW times faster
  Q1      4200         16         263x
  Q2      1200         6          200x
  Q3      120          2          60x
  Q4      120          2          60x
  Q5      120          2          60x
  Q6      1200         4          300x

PDW times faster than original query speeds
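
The "times faster" figures follow directly from the two timings per query. A short sketch that recomputes them from the chart values (the timings are from the slide; the rounding rule is an assumption that happens to reproduce the slide's labels):

```python
# (original seconds, PDW seconds) per query, as read from the chart.
timings = {
    "Q1": (4200, 16),
    "Q2": (1200, 6),
    "Q3": (120, 2),
    "Q4": (120, 2),
    "Q5": (120, 2),
    "Q6": (1200, 4),
}


def speedup(orig_s: float, pdw_s: float) -> int:
    """Ratio rounded half up, so 262.5 reports as 263 (Python's round() would give 262)."""
    return int(orig_s / pdw_s + 0.5)


speedups = {q: speedup(orig_s, pdw_s) for q, (orig_s, pdw_s) in timings.items()}
for q, factor in speedups.items():
    print(f"{q}: {factor}x")
```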
Parallel Data Warehouse Appliance
Hardware Architecture

  • Control Nodes (active/passive, SQL): entry point for client drivers
  • Management Nodes: data center monitoring
  • Landing Node: ETL load interface
  • Backup Node: corporate backup solution
  • Compute Nodes (10, each running SQL) with Storage Nodes, plus a Spare Compute Node
  • Interconnects: Dual Infiniband, Dual Fiber Channel
  • Networks: Corporate Network, Private Network

Parallel Data Warehouse benefits
Massively Parallel Processing

  • Query 1 is submitted to SQL Server on the Control Node
  • The query is executed on all 10 nodes
  • Results are sent back to the client
  • Multiple queries are simultaneously executed across all nodes
  • PDW supports querying while data is loading

Blazing fast performance by parallelizing queries on highly optimized shared nothing nodes
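
The flow on these slides (submit to the control node, execute on every compute node, return combined results) is a scatter-gather pattern. The sketch below imitates it in plain Python under loud assumptions: threads stand in for compute nodes, rows are hash-distributed on an integer key, and the data is synthetic. None of the names correspond to PDW internals.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 10  # the slide shows the query running on all 10 compute nodes


def distribute(rows, node_count=NODE_COUNT):
    """Hash-distribute rows on their key so each node owns a disjoint slice."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[hash(key) % node_count].append((key, amount))
    return nodes


def node_partial_sum(node_rows):
    """Work done locally on one node: aggregate only the rows it owns."""
    return sum(amount for _, amount in node_rows)


def run_query(nodes):
    """Control-node role: fan the query out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(node_partial_sum, nodes))
    return sum(partials)


# Synthetic data: (order id, amount) pairs, purely for illustration.
rows = [(order_id, order_id % 7) for order_id in range(1000)]
nodes = distribute(rows)
total = run_query(nodes)
print(total)
```

Because no node shares rows with another, each partial aggregate can be computed without coordination, and only the small partial results travel back to the coordinator.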
Software Architecture

Client tools
  • Query Tool, MS BI (AS, RS), DWSQL, other third-party tools, Internet Explorer
  • Data Access (OLEDB, ODBC, ADO.NET, JDBC)
  • IIS Admin Console

Control Node
  • MPP Engine Coordinator: SQL Parser, Core Engine Services, DMS Manager
  • Data Movement Service
  • SQL Server: DW Authentication, DW Configuration, DW Schema, TempDB

MPP Engine Coordinator
  • Provides single system image
  • SQL compilation
  • Global metadata and appliance configuration
  • Global query optimization and plan generation
  • Global query execution coordination
  • Global transaction coordination
  • Authentication and authorization
  • Supportability (hardware and software status)

Compute Nodes
  • Data Movement Service
  • SQL Server (User Data)

Backup Node
  • Data Movement Service

Landing Zone Node
  • Data Movement Service

Data Movement Service
  • Data movement across the appliance
  • Distributed query execution operators

Blazing-Fast Performance

“400 percent improvement in performance”
    First American Title Insurance Company

Now, up to 10x faster³ (ColumnStore)

¹Source: Microsoft customer evidence, Choice Hotels International
²Source: Microsoft customer evidence, KAS Bank
³Source: Microsoft customer testing; common data warehousing queries

Each column is stored separately, in two row groups:

  Row group 1
  OrderDateKey   ProductKey   StoreKey   RegionKey   Quantity   SalesAmount
  20101107       106          01         1           6          30.00
  20101107       103          04         2           1          17.00
  20101107       109          04         2           2          20.00
  20101107       103          03         2           1          17.00
  20101107       106          05         3           4          20.00
  20101108       106          02         1           5          25.00

  Row group 2
  OrderDateKey   ProductKey   StoreKey   RegionKey   Quantity   SalesAmount
  20101108       102          02         1           1          14.00
  20101108       106          03         2           5          25.00
  20101108       109          01         1           1          10.00
  20101109       106          04         2           4          20.00
  20101109       106          04         2           5          25.00
  20101109       103          01         1           1          17.00
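
A small sketch can show why storing each column separately helps scan-heavy queries. It uses six rows of the sample fact table from the slide; the dictionary-of-lists layout and the "qualifying rows" list are simplified illustrations of the idea, not SQL Server's actual storage or batch format.

```python
COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount")

# Six rows of the sample fact table shown on the slide, in row-store form.
rows = [
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00),
    (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00),
    (20101108, 106, "02", 1, 5, 25.00),
]

# Column store: one contiguous vector per column instead of one tuple per row.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}

# A query that needs only SalesAmount touches a single vector.
total_sales = sum(column_store["SalesAmount"])

# Batch-style filter: derive the list of qualifying row positions from one
# column vector, then use it to pick values out of another.
qualifying = [i for i, pk in enumerate(column_store["ProductKey"]) if pk == 106]
sales_106 = sum(column_store["SalesAmount"][i] for i in qualifying)

print(total_sales, qualifying, sales_106)
```

Only two of the six columns are read for the filtered total, whereas a row store would have to read every full row.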




[Figure: a batch object made up of column vectors and a list of qualifying rows]

In a standard scale-out server deployment, multiple report servers share a single
report server database. The report server database should be installed on a
remote SQL Server instance. The following diagram is an example of a standard
scale-out server deployment configuration with the report server database on a
remote SQL Server instance.

As another option, you might decide to host the report server database on a
SQL Server instance that is part of a failover cluster. The following diagram is
an example of a scale-out server deployment configuration where the report
server databases are on an instance that is part of a failover cluster.

In addition to the standard scale-out deployment, you might determine that your
reporting environment would benefit from a more advanced scale-out deployment
configuration. For example, you might decide to use the load-balanced report
servers for interactive report processing and add a separate report server
computer to process only scheduled reports. The following diagram is an example
of this advanced scale-out server deployment configuration.

Log                               Description

Report Server Execution Log       The report server execution log contains data about specific reports, including
                                  when a report was run, who ran it, where it was delivered, and which rendering
                                  format was used. The execution log is stored in the report server database.

Report Server Service Trace Log   The service trace log contains very detailed information that is useful if you
                                  are debugging an application or investigating an issue or event. The file is
                                  located at Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.

Report Server HTTP Log            The HTTP log file contains a record of all HTTP requests and responses handled by
                                  the Report Server Web service and Report Manager. HTTP logging is not enabled by
                                  default. You must modify the ReportingServicesService.exe configuration file to
                                  use this feature in your installation. The file is located at
                                  Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.

Under the properties of your data source, increasing the network packet size for SQL
Server minimizes the protocol overhead required to build many small packets. The
default value for SQL Server 2008 is 4096. With a data warehouse load, a packet size of
32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don’t
change the value in SQL Server using sp_configure; instead, override it in your data source.
This can be set whether you are using TCP/IP or Shared Memory.
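
One way to apply this per data source rather than server-wide is to set the packet size in the connection string. The helper below is a sketch under assumptions: it targets an ADO.NET-style connection string, where the keyword is spelled `Packet Size`; other providers may spell the property differently, and the server and database names are placeholders. The lower bound of 512 is SQL Server's minimum network packet size, and the upper bound is the 32767 value named in the text.

```python
def with_packet_size(connection_string: str, packet_size: int = 32767) -> str:
    """Return the connection string with a 'Packet Size' entry set (replacing any existing one)."""
    if not 512 <= packet_size <= 32767:
        raise ValueError("packet_size must be between 512 and 32767")
    parts = [
        part.strip()
        for part in connection_string.split(";")
        if part.strip() and not part.strip().lower().startswith("packet size")
    ]
    parts.append(f"Packet Size={packet_size}")
    return ";".join(parts)


# Placeholder server and database names, for illustration only.
conn_str = with_packet_size("Data Source=dwserver;Initial Catalog=DW")
print(conn_str)
```

Setting the value on the data source keeps the larger packets scoped to the warehouse load and leaves the server default untouched for other clients.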
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
                 it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
                                       MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related Content

PPTX
Intro to Exadata
PDF
Connecting Hadoop and Oracle
PPS
Oracle Database Overview
PDF
Architecture of exadata database machine – Part II
PDF
Oracle Database appliance - Value proposition Webcast
PPTX
Oracle Database Appliance
PDF
My First 100 days with an Exadata (PPT)
PDF
A Journey from Oracle to PostgreSQL
 
Intro to Exadata
Connecting Hadoop and Oracle
Oracle Database Overview
Architecture of exadata database machine – Part II
Oracle Database appliance - Value proposition Webcast
Oracle Database Appliance
My First 100 days with an Exadata (PPT)
A Journey from Oracle to PostgreSQL
 

What's hot (20)

PDF
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
PPTX
Hadoop databases for oracle DBAs
PPTX
Exadata 12c New Features RMOUG
PPTX
Debunking the Myths of HDFS Erasure Coding Performance
PPTX
Experience sql server on l inux and docker
PDF
Oracle GoldenGate for Oracle DBAs
PDF
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
PPTX
Introduction to Apache Accumulo
PPTX
Simplify Consolidation with Oracle Database 12c
PPT
Teradata vs-exadata
PDF
My First 100 days with a MySQL DBMS
PDF
Oracle database high availability solutions
PDF
PayPal Big Data and MySQL Cluster
PDF
My First 100 days with a MySQL DBMS (WP)
PDF
Rapid Cluster Computing with Apache Spark 2016
PDF
Running E-Business Suite Database on Oracle Database Appliance
PPTX
Oracle Goldengate training by Vipin Mishra
ODP
Exadata
PDF
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
PDF
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
 
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Hadoop databases for oracle DBAs
Exadata 12c New Features RMOUG
Debunking the Myths of HDFS Erasure Coding Performance
Experience sql server on l inux and docker
Oracle GoldenGate for Oracle DBAs
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
Introduction to Apache Accumulo
Simplify Consolidation with Oracle Database 12c
Teradata vs-exadata
My First 100 days with a MySQL DBMS
Oracle database high availability solutions
PayPal Big Data and MySQL Cluster
My First 100 days with a MySQL DBMS (WP)
Rapid Cluster Computing with Apache Spark 2016
Running E-Business Suite Database on Oracle Database Appliance
Oracle Goldengate training by Vipin Mishra
Exadata
Best Practices – Extreme Performance with Data Warehousing on Oracle Database
Overview of EnterpriseDB Postgres Plus Advanced Server 9.4 and Postgres Enter...
 
Ad

Viewers also liked (20)

PDF
Building Data Warehouse in SQL Server
PPTX
PSSUG Nov 2012: Big Data with SQL Server
PPTX
Big Data in the Cloud with Azure Marketplace Images
PPTX
Microsoft Cloud BI Update 2012 for SQL Saturday Philly
DOCX
MEC Data sheet
PPTX
What's new in SQL Server 2012 for philly code camp 2012.1
PPTX
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
PPTX
Microsoft Event Registration System Hosted on Windows Azure
PDF
Sql server 2012 tutorials reporting services
PPTX
Big Data with SQL Server
PPTX
Pentaho Big Data Analytics with Vertica and Hadoop
PDF
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
PDF
Adventures with Angular 2
PPTX
Anexinet Big Data Solutions
PPTX
Big Data in the Real World
PPTX
Pentaho Analytics on MongoDB
PPTX
Big Data Analytics Projects - Real World with Pentaho
PPTX
Sql server 2012 roadshow masd overview 003
PPT
SQL Server Transaction Management
PPTX
Azure vs. amazon
Building Data Warehouse in SQL Server
PSSUG Nov 2012: Big Data with SQL Server
Big Data in the Cloud with Azure Marketplace Images
Microsoft Cloud BI Update 2012 for SQL Saturday Philly
MEC Data sheet
What's new in SQL Server 2012 for philly code camp 2012.1
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Microsoft Event Registration System Hosted on Windows Azure
Sql server 2012 tutorials reporting services
Big Data with SQL Server
Pentaho Big Data Analytics with Vertica and Hadoop
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Adventures with Angular 2
Anexinet Big Data Solutions
Big Data in the Real World
Pentaho Analytics on MongoDB
Big Data Analytics Projects - Real World with Pentaho
Sql server 2012 roadshow masd overview 003
SQL Server Transaction Management
Azure vs. amazon
Ad

Microsoft SQL Server Data Warehouses for SQL Server DBAs

  • 1. Microsoft SQL Server Data Warehouses for SQL DBAs SQL Saturday Philly June 9, 2012
  • 3. Agenda
  • 4. Microsoft Data Warehousing Offerings (Tier 1 Offerings)
      − Enterprise: Scalable and reliable platform for data warehousing on any hardware. Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs). Software only. Scale-up data warehousing. 10s of terabytes.
      − Fast Track Data Warehouse: Reference architectures offering best price performance for data warehousing. Ideal for data marts or small to mid-sized DWs with scan-centric workloads. Reference architectures (software and hardware). Scale-up data warehousing. 4–80 terabytes.
      − HP Business DW Appliance: An affordable SMP solution for data warehousing on optimized hardware. Ideal for small data marts or DWs with scan-centric workloads. Integrated appliance (software and hardware). Scale-up data warehousing. Up to 5 terabytes.
      − Parallel Data Warehouse: Appliance for high-end data warehousing requiring highest scalability, performance or complexity. Offers flexibility in hardware and architecture. DW appliance (fully integrated software and hardware). Scale-out data warehousing with massively parallel processing (MPP). 10s–100s of terabytes.
  • 5. Some Data Warehouses today: Big SAN, Big SMP Server, connected together. What’s wrong with this picture?
  • 6. Answer: system out of balance
      − This server can consume 12 GB/sec of IO, but the SAN can only deliver 2 GB/sec
      − Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t
      − Queries are slow
      − Despite significant investment in both server and storage
      − Result: significant investment, not delivering performance
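The imbalance on this slide can be put into numbers with a minimal sketch. It assumes, as a simplification, that a scan runs at the rate of the slowest component in the I/O path; the function names are illustrative and not part of any Microsoft tool.

```python
def effective_scan_rate(server_gb_s: float, storage_gb_s: float) -> float:
    """A scan can go no faster than the slowest component in the I/O path."""
    return min(server_gb_s, storage_gb_s)


def fraction_of_server_fed(server_gb_s: float, storage_gb_s: float) -> float:
    """Share of the server's I/O appetite that the storage can actually satisfy."""
    return effective_scan_rate(server_gb_s, storage_gb_s) / server_gb_s


if __name__ == "__main__":
    server, san = 12.0, 2.0  # GB/sec figures taken from the slide
    rate = effective_scan_rate(server, san)
    share = fraction_of_server_fed(server, san)
    print(f"Effective scan rate: {rate} GB/sec ({share:.0%} of what the server could consume)")
```

With the slide's figures, the server is fed about one sixth of the I/O it could consume, which is why queries stay slow despite the hardware spend.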
  • 8. The Alternative: A Balanced System
      − Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload
      − Avoid sharing storage devices among servers
      − Avoid overinvesting in disk drives
  • 10. SQL Server Fast Track Data Warehouse: a solution to help customers and partners accelerate their data warehouse deployments
      − A method for designing a cost-effective, balanced system for Data Warehouse workloads
      − Reference hardware configurations developed in conjunction with hardware partners using this method
      − Best practices for data layout, loading and management
  • 11. Fast Track components
      − Software: SQL Server 2008 R2 Enterprise; Windows Server 2008 R2
      − Configuration guidelines: physical table structures; indexes; compression; SQL Server settings; Windows Server settings; loading
      − Hardware: tight specifications for servers, storage and networking; ‘per core’ building block
  • 12. Core Fast Track Metrics
  • 13. System Benchmarking - MCR: 200MB/s per core
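As a rough sizing sketch, the per-core figure on this slide can be multiplied out to estimate how much I/O bandwidth a server would need to keep its cores busy. The 200 MB/s value comes from the slide; the core counts below are hypothetical, and real systems should be measured rather than estimated this way.

```python
MCR_MB_PER_SEC_PER_CORE = 200  # per-core rate quoted on the slide


def required_io_mb_per_sec(cores: int, mcr: int = MCR_MB_PER_SEC_PER_CORE) -> int:
    """I/O bandwidth the storage must deliver so that every core stays fed."""
    return cores * mcr


if __name__ == "__main__":
    for cores in (8, 16, 32):  # hypothetical core counts, for illustration only
        need = required_io_mb_per_sec(cores)
        print(f"{cores} cores -> {need} MB/s ({need / 1000:.1f} GB/s)")
```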
  • 14. Establishing Fast Track MCR
  • 15. System Benchmarking - BCR: Actual Miles Per Gallon
  • 16. Establishing Fast Track BCR
  • 18. Fast Track Reference Configurations
      − 2 Processor Configurations (5–20 TB, 2–3.7 GB/s)
      − 4 Processor Configurations (20–40 TB, 3.5–7.5 GB/s)
      − 8 Processor Configurations (40–80 TB, 7.5–14 GB/s)
  • 19. Data Warehouse Workload Characteristics
      SELECT L_RETURNFLAG, L_LINESTATUS,
             SUM(L_QUANTITY) AS SUM_QTY,
             SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
             SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
             SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
             AVG(L_QUANTITY) AS AVG_QTY,
             AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
             AVG(L_DISCOUNT) AS AVG_DISC,
             COUNT(*) AS COUNT_ORDER
      FROM LINEITEM
      GROUP BY L_RETURNFLAG, L_LINESTATUS
      ORDER BY L_RETURNFLAG, L_LINESTATUS
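The query on this slide has no selective predicate, so it reads every row of LINEITEM and aggregates it into a handful of groups. To see that shape in action, here is a small sketch that runs the same statement against an in-memory SQLite table. SQLite stands in for SQL Server purely for illustration, and the three rows and the function name are made up.

```python
import sqlite3

# The aggregate query from the slide, unchanged apart from line breaks.
QUERY = """
SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
       AVG(L_QUANTITY) AS AVG_QTY,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       AVG(L_DISCOUNT) AS AVG_DISC,
       COUNT(*) AS COUNT_ORDER
FROM LINEITEM
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS
"""

# Made-up sample rows: (flag, status, quantity, extended price, discount, tax).
SAMPLE_ROWS = [
    ("A", "F", 10, 100.0, 0.0, 0.0),
    ("A", "F", 20, 200.0, 0.5, 0.0),
    ("N", "O", 5, 50.0, 0.0, 0.0),
]


def run_scan_aggregate():
    """Load the sample rows into an in-memory table and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LINEITEM ("
        "L_RETURNFLAG TEXT, L_LINESTATUS TEXT, L_QUANTITY INTEGER, "
        "L_EXTENDEDPRICE REAL, L_DISCOUNT REAL, L_TAX REAL)"
    )
    conn.executemany("INSERT INTO LINEITEM VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_scan_aggregate():
        print(row)
```

Every input row contributes to the output, which is why sequential scan throughput, not seek performance, dominates this kind of workload.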
  • 20. Software configuration: SQL Server Startup
  • 21. Software configuration: Temp DB
  • 22. Software configuration: Temp DB & TLOG
  • 23. DW Server Baseline Configs
  • 25. Fast Track Data Striping [Diagram: FT storage enclosure with RAID-1 disk pairs. Primary data volumes ARY01D1v01, ARY01D2v02, ARY02D1v03, ARY02D2v04, ARY03D1v05, ARY03D2v06, ARY04D1v07 and ARY04D2v08 hold data files DB1-1.ndf through DB1-8.ndf; log volume ARY05v09 holds DB1.ldf.]
  • 26. User Databases
  • 28. [Diagram: file layout across LUN 1 through LUN 16]
      − Permanent FG (Permanent_DB): Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf, ... Permanent_16.ndf
      − Stage FG (Stage database): Stage_1.ndf, Stage_2.ndf, Stage_3.ndf, ... Stage_16.ndf
      − Local Drive 1 (TempDB): TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB), ... TempDB_16.ndf (25GB)
      − Log LUN 1: Permanent DB Log, Stage DB Log
  • 30. Control rack and data racks [Diagram labels: Control Rack (Control Nodes, SQL Active/Passive; Management Nodes; Landing Node; Backup Node); Data Rack (Compute Nodes, Storage Nodes, Spare Compute Node); Dual Fiber Channel; Dual Infiniband; Private Network]
  • 31. 1 Data Rack: 17 Servers, 22 Procs, 132 Cores. Expand to 4 data racks and quadruple your performance and capacity!
  • 32. Query Speed in Seconds (PDW time vs. original time; PDW times faster than original query speeds)
      − Q1: original 4200, PDW 16 (263x)
      − Q2: original 1200, PDW 6 (200x)
      − Q3: original 120, PDW 2 (60x)
      − Q4: original 120, PDW 2 (60x)
      − Q5: original 120, PDW 2 (60x)
      − Q6: original 1200, PDW 4 (300x)
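The speedup factors on this chart can be cross-checked with a short sketch. Pairing the chart's bar values with queries as (original seconds, PDW seconds) is an assumption made from the quoted multipliers; the slide itself gives only the bars and the "x" labels.

```python
# (original seconds, PDW seconds) per query; the pairing is inferred, not stated on the slide.
TIMINGS = {
    "Q1": (4200, 16),
    "Q2": (1200, 6),
    "Q3": (120, 2),
    "Q4": (120, 2),
    "Q5": (120, 2),
    "Q6": (1200, 4),
}


def speedup(original_seconds: float, pdw_seconds: float) -> float:
    """How many times faster the PDW run is than the original run."""
    return original_seconds / pdw_seconds


if __name__ == "__main__":
    for query, (original, pdw) in TIMINGS.items():
        print(f"{query}: {original}s -> {pdw}s = {speedup(original, pdw):.1f}x")
```

Under this pairing each computed ratio agrees with the multiplier printed on the chart, with Q1 coming out at 262.5, which the slide rounds to 263x.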
  • 33. Parallel Data Warehouse Appliance Hardware Architecture [Diagram labels: Control Nodes (SQL Active/Passive); Management Nodes; Landing Node; Backup Node; Compute Nodes; Storage Nodes; Spare Compute Node; Dual Infiniband; Dual Fiber Channel; Private Network. On the Corporate Network: Client Drivers, Data Center Monitoring, ETL Load Interface, Corporate Backup Solution]
  • 34. Parallel Data Warehouse benefits: Massively Parallel Processing
      − Query 1 is submitted to SQL Server on Control Node
      − Query is executed on all 10 Nodes
      − Results are sent back to client
  • 35. Parallel Data Warehouse benefits: Massively Parallel Processing
      − Multiple queries are simultaneously executed across all nodes
      − PDW supports querying while data is loading
      − Blazing fast performance by parallelizing queries on highly optimized shared nothing nodes
  • 37. Software Architecture
      − MPP Engine Coordinator: provides single system image; SQL compilation; global metadata and appliance configuration; global query optimization and plan generation; global query execution coordination; global transaction coordination; authentication and authorization; supportability (hardware and software status)
      − Data Movement Service: data movement across the appliance; distributed query execution operators
      − Clients and access: Query Tool, MS BI (AS, RS), DWSQL, Internet Explorer, other third-party tools; Data Access (OLEDB, ODBC, ADO.NET, JDBC); IIS Admin Console
      − Nodes shown: Control Node, Compute Nodes, Landing Zone Node, Backup Node
      − Other diagram labels: User Data, SQL Server, Core Engine Services, SQL Parser, DMS Manager, DW Authentication, DW Configuration, DW Schema, TempDB
  • 39. Blazing-Fast Performance: “400 percent improvement in performance” (First American Title Insurance Company). Now, up to 10x Faster³ with ColumnStore. ¹Source: Microsoft customer evidence, Choice Hotels International ²Source: Microsoft customer evidence, KAS Bank ³Source: Microsoft customer testing; common data warehousing queries
  • 40. [Figure: sample sales data with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity and SalesAmount; the individual cell values did not survive extraction]
  • 41. Batch object • Column vectors • List of qualifying rows
  • 43. In a standard scale-out server deployment, multiple report servers share a single report server database. The report server database should be installed on a remote SQL Server instance. The following diagram is an example of a standard scale-out server deployment configuration with the report server database on a remote SQL Server instance.
  • 44. As another option, you might decide to host the report server database on a SQL Server instance that is part of a failover cluster. The following diagram is an example of a scale-out server deployment configuration where the report server databases are on an instance that is part of a failover cluster.
  • 45. In addition to the standard scale-out deployment, you might determine that your reporting environment would benefit from a more advanced scale-out deployment configuration. For example, you might decide to use the load-balanced report servers for interactive report processing and add a separate report server computer to process only scheduled reports. The following diagram is an example of this advanced scale-out server deployment configuration.
  • 46. Log and Description
      − Report Server Execution Log: The report server execution log contains data about specific reports, including when a report was run, who ran it, where it was delivered, and which rendering format was used. The execution log is stored in the report server database.
      − Report Server Service Trace Log: The service trace log contains very detailed information that is useful if you are debugging an application or investigating an issue or event. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
      − Report Server HTTP Log: The HTTP log file contains a record of all HTTP requests and responses handled by the Report Server Web service and Report Manager. HTTP logging is not enabled by default. You must modify the ReportingServicesService.exe configuration file to use this feature in your installation. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
  • 51. Under the properties of your data source, increasing the network packet size for SQL Server minimizes the protocol overhead required to build many small packets. The default value for SQL Server 2008 is 4096. With a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don’t change the value in SQL Server using sp_configure; instead, override it in your data source. This can be set whether you are using TCP/IP or Shared Memory.
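One way to apply this per data source is to set the packet size in the connection string itself. The sketch below is a hypothetical helper that adds or replaces a `Packet Size` entry. It assumes the provider accepts a `Packet Size` keyword, as SqlClient-style connection strings do; the server and database names are placeholders.

```python
def with_packet_size(connection_string: str, packet_size: int = 32767) -> str:
    """Return the connection string with its Packet Size entry set to packet_size."""
    kept = []
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # skip empty segments such as the one after a trailing ';'
        key = part.split("=", 1)[0].strip().lower()
        if key == "packet size":
            continue  # drop any existing setting so it is not duplicated
        kept.append(part.strip())
    kept.append(f"Packet Size={packet_size}")
    return ";".join(kept) + ";"


if __name__ == "__main__":
    # Placeholder server and database names, for illustration only.
    base = "Data Source=myServer;Initial Catalog=myDW;Integrated Security=SSPI;"
    print(with_packet_size(base))
```

Setting the value on the connection leaves the server-wide default untouched, which matches the slide's advice to avoid sp_configure.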
  • 58. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's Notes

  • #4: This slide shows what we are going to talk about today. We will start off discussing Microsoft’s vision for data warehousing solutions. Then we will discuss the different offerings. Next, we will discuss how you can get support and services to help you get started with your data warehouse and to help accelerate the completion of your solution. Finally, we will end with a discussion of the quick start services to enable you to begin your data warehouse solution quickly.
  • #5: SQL Server 2008 R2 comes in several editions. In this presentation, we will look at 4 different SKUs, each of which has different features that are important for data warehousing. We will drill down to get more information about each edition and the features that are important.
  • #14: Remind them
  • #15: To ensure the query is cached, do the following: ensure the results of the query will fit in memory, then run the query once. The second and subsequent executions should be served from memory. You can tell because the second execution should be much faster than the initial one. Review: TPC BENCHMARK™ H http://www.tpc.org/tpch/spec/tpch2.8.0.pdf; TPC-H Data Set http://www.tpc.org/tpch/spec/tpch_2_8_0.zip http://www.tpc.org/tpch/spec/reference2.8.0.zip
  • #16: Remind them “Your mileage may vary”
  • #21: -E is the primary way we help to ensure longer “runs” of contiguous, logically grouped pages. An extent is eight 8 KB pages, or 64 KB; with -E, 64 extents are allocated at a time: (64 × 64 KB)/1024 = 4 MB. SQL Server still fills each 4 MB allocation one extent (eight 8 KB pages) at a time. This means that pages can still be interleaved (extent fragmentation) down to the extent level. TF1117 is specific to TempDB, as autogrow should be off for all other databases. A customer may have a database with a specific use case that requires autogrow; this is OK, it just needs to be managed. It should not be a major part of the overall workload, because the file will become fragmented. Using autogrow for TempDB is about practicality: it can be hard to pre-allocate TempDB. If they can pre-allocate it, go for it. Review: Using the SQL Server Service Startup Options http://msdn.microsoft.com/en-us/library/ms190737.aspx; SAP with Microsoft SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalability http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SAP_SQL2005_Best%20Practices.doc
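The sizes in this note can be checked with a few lines of arithmetic. The sketch reads the note's 4 MB figure as 64 extents of 64 KB each, which is an assumption about the intended meaning; the constant and function names are illustrative.

```python
PAGE_KB = 8                  # size of one SQL Server data page
PAGES_PER_EXTENT = 8         # an extent is eight pages
EXTENTS_PER_ALLOCATION = 64  # assumed allocation unit under -E, per the note's 4 MB figure


def extent_kb() -> int:
    """Size of one extent in KB."""
    return PAGE_KB * PAGES_PER_EXTENT


def allocation_mb() -> float:
    """Size of one contiguous allocation in MB under the stated assumption."""
    return EXTENTS_PER_ALLOCATION * extent_kb() / 1024


if __name__ == "__main__":
    print(f"Extent size: {extent_kb()} KB")
    print(f"Allocation size: {allocation_mb()} MB")
```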
  • #22: Remember that additional space may be needed during initial migration of data if moving onto a Fast Track RA, or during the initial load of a new Fast Track RA. Review: Working with tempdb in SQL Server 2005 http://technet.microsoft.com/en-us/library/cc966545.aspx; Capacity Planning for tempdb http://msdn.microsoft.com/en-us/library/ms345368.aspx
  • #23: Remember that additional space may be needed during initial migration of data if moving onto a Fast Track RA, or during the initial load of a new Fast Track RA. Review: Working with tempdb in SQL Server 2005 http://technet.microsoft.com/en-us/library/cc966545.aspx; Capacity Planning for tempdb http://msdn.microsoft.com/en-us/library/ms345368.aspx
  • #24: Workloads often need large amounts of data pages to be in cache; in this case, add memory as needed. Hash joins and sorts can make use of additional memory to help prevent them from spilling to tempdb. Workloads with large numbers of queries and bulk loads performing hash joins and sorts will benefit from more memory. Review: Troubleshooting Performance Problems in SQL Server 2008 http://msdn.microsoft.com/en-us/library/dd672789.aspx; How to: Enable the Lock Pages in Memory Option http://msdn.microsoft.com/en-us/library/ms190730.aspx; Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads
  • #31: 4 racks in V1. Orderable at the rack level. Required software. 13k price per TB. Pricing and licensing training in resources.
  • #37: Data layout options: Dimension tables are typically replicated. PDW maintains data integrity across all nodes. Fact tables are typically distributed. The data model, table sizes, and workloads must all be considered when choosing between replicated and distributed tables. The following join types are used to achieve Distribution Compatibility:
      − Shared Nothing join - Achieves Distribution Compatibility by using compatible Distribution Keys in the SQL join criteria.
      − Ultra Shared Nothing join - Achieves Distribution Compatibility through a replicated table; no data movement between nodes is required.
      − Redistribution join - Requires data to be dynamically distributed between Compute Nodes to achieve Distribution Compatibility.
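The three join types in this note can be sketched as a simple classification. This is a deliberately simplified model built only from the definitions above, not PDW's actual optimizer logic; the `Table` class, the function name and the sample table names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    distribution_key: Optional[str] = None  # None means the table is replicated

    @property
    def replicated(self) -> bool:
        return self.distribution_key is None


def join_strategy(left: Table, left_col: str, right: Table, right_col: str) -> str:
    """Classify a two-table equi-join using the note's three categories."""
    if left.replicated or right.replicated:
        # A full copy of the replicated table is on every node: no data movement.
        return "ultra shared nothing"
    if left.distribution_key == left_col and right.distribution_key == right_col:
        # Both sides are distributed on the join columns: matching rows are co-located.
        return "shared nothing"
    # Otherwise rows must be moved between compute nodes before joining.
    return "redistribution"


if __name__ == "__main__":
    fact_sales = Table("FactSales", distribution_key="OrderKey")     # distributed fact table
    fact_returns = Table("FactReturns", distribution_key="OrderKey")
    dim_product = Table("DimProduct")                                # replicated dimension
    print(join_strategy(fact_sales, "ProductKey", dim_product, "ProductKey"))
    print(join_strategy(fact_sales, "OrderKey", fact_returns, "OrderKey"))
    print(join_strategy(fact_sales, "ProductKey", fact_returns, "ProductKey"))
```

The model shows why dimension tables are usually replicated: any join against them avoids data movement regardless of how the fact table is distributed.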