RTO/RPO and Backup Recovery Setup
Module 13
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Module 13
The architectural need
If your infrastructure becomes unavailable, you need to be able to get your
application running again within an appropriate amount of time and at an
appropriate level of cost.
Module Overview
• Disaster Planning
• Recovery Options
Disaster Planning
What Are We Planning?
"Everything fails, all the time."
-Werner Vogels
How do we prepare for these?
• Small-scale events
• Large-scale events
• Colossal events
Availability Concepts
• High availability: minimize downtime for your application.
• Backup: make sure your data is safe.
• Disaster recovery: get your applications and data back after a major
disaster.
RPO and RTO
Recovery Point Objective (RPO)
How often does data need to be backed up?
Example: The business can recover from losing (at most) the
last 12 hours of data.
Recovery Time Objective (RTO)
How long can the application be unavailable?
Example: The application can be unavailable for a
maximum of 1 hour.
[Figure: timeline (Time axis) with the Disaster event marked]
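The two objectives can be checked against a concrete plan with simple arithmetic. The following is a minimal sketch, assuming the worst-case data loss equals the full interval between backups; the function names and the sample backup schedule are hypothetical and not part of the course material.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, the disaster hits just before the next backup,
    so up to one full backup interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(estimated_recovery: timedelta, rto: timedelta) -> bool:
    """The application is unavailable for as long as recovery takes."""
    return estimated_recovery <= rto


# Targets taken from the examples on the slide.
RPO = timedelta(hours=12)
RTO = timedelta(hours=1)

# Hypothetical plans, for illustration only.
nightly_ok = meets_rpo(timedelta(hours=24), RPO)     # one backup per day
six_hourly_ok = meets_rpo(timedelta(hours=6), RPO)   # four backups per day
restore_ok = meets_rto(timedelta(minutes=45), RTO)   # 45-minute restore
```

With these numbers, a nightly backup does not satisfy a 12-hour RPO, while a backup every 6 hours does.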
Regions Can Go Down
[Figure: Region 1 and Region 2]
Essential AWS Services and Features for Disaster Recovery
• Regions
• Storage
• Compute
• Networking
• Database
• Deployment orchestration
Storage Should Be Duplicated
Amazon S3
• Cross-region replication
Amazon S3 Glacier
• Replicated to multiple Availability Zones and multiple
devices in each Availability Zone
Amazon EBS
• Create point-in-time volume snapshots
• Copy snapshots across regions and accounts
AWS Snowball
• Transfers large volumes (>10 TB) of data more
quickly than high-speed Internet.
AWS DataSync
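The claim that a physical transfer can beat a network link for volumes above 10 TB is easy to sanity-check. The sketch below estimates how long a network upload would take; it assumes decimal terabytes and a fully usable link, and the `device_turnaround_days` parameter is a hypothetical stand-in for whatever shipping and import time applies in practice.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to push data_tb terabytes through a link of link_mbps."""
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid size, bandwidth, or utilization")
    bits = data_tb * 1e12 * 8                        # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400                          # 86,400 seconds per day


def prefer_offline_transfer(data_tb: float, link_mbps: float,
                            device_turnaround_days: float,
                            utilization: float = 1.0) -> bool:
    """True when the network upload would take longer than the
    (hypothetical) end-to-end turnaround of a shipped device."""
    return transfer_days(data_tb, link_mbps, utilization) > device_turnaround_days


ten_tb_100mbps = transfer_days(10, 100)    # roughly 9.26 days
ten_tb_1gbps = transfer_days(10, 1000)     # roughly 0.93 days
```

On a 100 Mbps link, 10 TB takes more than nine days even at full utilization, which is where a shipped device starts to look attractive.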
Spinning Your Compute Back Up Should Be Easy
Obtain and boot new server instances or containers within minutes
• Custom AMIs
• Custom container images
Networking Disaster Recovery Options
Amazon Route 53
• Traffic distribution
• Failover
Elastic Load Balancing
• Load balancing
• Health checks and failover
Amazon VPC
• Extend your existing on-premises network topology to
the cloud.
AWS Direct Connect
• Fast and consistent replication/backups of
your large on-premises environment to the cloud
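As an illustration of Route 53 failover, the sketch below builds a change batch containing a primary and a secondary failover record. It only constructs the request data and makes no AWS call; the domain name, IP addresses, set identifiers, and health check ID are placeholders, not values from the course.

```python
def failover_record(name, ip_address, role, set_id, health_check_id=None, ttl=60):
    """Build one UPSERT change for a Route 53 failover routing record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,      # distinguishes records sharing a name
        "Failover": role,
        "TTL": ttl,                   # a short TTL speeds up client failover
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder values: example.com and documentation-range IP addresses.
change_batch = {
    "Comment": "Fail over from the primary site to the DR site",
    "Changes": [
        failover_record("app.example.com.", "192.0.2.10", "PRIMARY",
                        "primary-site", health_check_id="placeholder-check-id"),
        failover_record("app.example.com.", "198.51.100.20", "SECONDARY",
                        "dr-site"),
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone id>", ChangeBatch=change_batch)
```

The primary record carries the health check, so Route 53 answers with the secondary record only when that check fails.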
Databases Should Be Backed Up and Redundant
Amazon RDS
• Snapshot data and save it in a separate region.
• Combine Read Replicas with Multi-AZ to build a
resilient disaster recovery strategy.
• Retain automated backups
Amazon DynamoDB
• Back up full tables in seconds.
• Use point-in-time recovery to continuously back up tables for up to 35
days.
• Initiate backups with a single click in the console or a single API call.
• Build multi-region, multi-master tables with global tables
for fast local performance in globally distributed apps.
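The 35-day point-in-time recovery window for DynamoDB implies a simple check before attempting a restore. This is a minimal sketch, assuming the restorable range runs from the later of "35 days ago" and "when point-in-time recovery was enabled" up to the present; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)   # retention stated on the slide


def can_restore_to(target: datetime, now: datetime, pitr_enabled_at: datetime) -> bool:
    """True if target lies inside the point-in-time recovery window."""
    earliest = max(now - PITR_WINDOW, pitr_enabled_at)
    return earliest <= target <= now


# Hypothetical dates, for illustration only.
now = datetime(2019, 6, 30, tzinfo=timezone.utc)
enabled = datetime(2019, 1, 1, tzinfo=timezone.utc)

recent_ok = can_restore_to(now - timedelta(days=29), now, enabled)
too_old = can_restore_to(now - timedelta(days=60), now, enabled)
```

A restore target older than the window needs an on-demand backup instead, which is one reason to combine both mechanisms.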
Use Automation To Quickly Recover
AWS CloudFormation
• Use templates to quickly deploy collections of
resources as needed
AWS Elastic Beanstalk
• Quickly redeploy your entire stack in
only a few clicks
AWS OpsWorks
• Automatic host replacement
• Combine it with AWS CloudFormation in the
recovery phase
• Provision a new stack that supports the
defined RTO
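To make "use templates to deploy collections of resources" concrete, the sketch below generates a very small CloudFormation template as JSON. The single versioned S3 bucket and the logical ID `BackupBucket` are illustrative assumptions; a real recovery template would describe the networking, compute, and database resources the application needs.

```python
import json


def backup_bucket_template(bucket_logical_id: str = "BackupBucket") -> dict:
    """Return a minimal CloudFormation template with one versioned S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative DR template: one versioned backup bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }


template_body = json.dumps(backup_bucket_template(), indent=2)

# With boto3 installed and credentials configured, the stack could be created as:
#   boto3.client("cloudformation").create_stack(
#       StackName="dr-example", TemplateBody=template_body)
```

Keeping the template in version control means the same stack can be recreated in another region during recovery.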
Recovery Strategies
Backup and Restore Example
[Figure: two panels, Backup and Restore. Labels: Remote location, Amazon S3
bucket (/mybucket), Lifecycle policy, Amazon S3 Standard-IA, Amazon Glacier,
AWS DR Region, Amazon EC2]
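The lifecycle policy in the figure moves backups to cheaper storage classes as they age. The sketch below builds such a policy as plain data; the rule ID and the 30-day and 90-day thresholds are assumptions chosen for illustration, and the 30-day lower bound reflects the usual S3 minimum before a transition to Standard-IA.

```python
def tiering_lifecycle(prefix: str = "", ia_after_days: int = 30,
                      glacier_after_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration: Standard -> Standard-IA -> Glacier."""
    if ia_after_days < 30:
        raise ValueError("objects need at least 30 days before moving to Standard-IA")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after the Standard-IA one")
    return {
        "Rules": [
            {
                "ID": "tier-backups",            # hypothetical rule name
                "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = tiering_lifecycle()

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="mybucket", LifecycleConfiguration=lifecycle)
```

Longer thresholds keep recent backups quick to retrieve, at the price of a higher storage cost.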
Backing up On-Premises Data to AWS
[Figure: On-premises environment connected through AWS Storage Gateway (file
gateway, tape gateway, volume gateway) to Amazon S3 and Amazon Glacier]
AWS Storage Gateway
[Figure: On-premises infrastructure and a backup server connect to three
gateway types: file gateway (NFS v3 / v4.1), volume gateway (iSCSI), and tape
gateway (VTL over iSCSI). Storage labels: Amazon S3 (S3 Standard, S3-IA),
Amazon Glacier, EBS snapshots, tape gateway VTL]
Use Case: Off-Site Backup Solution
with Gateway-Stored Volumes
[Figure: In the on-premises data center, file servers (CIFS/NFS) connect over
iSCSI to the AWS Storage Gateway VM, which runs on a hypervisor host with
volume storage and an upload buffer on direct-attached or SAN disks. The
gateway connects over SSL to AWS Storage Gateway, which holds snapshots
(incremental backup). Caption: Create new volumes in Amazon EBS or on your
local gateway's storage]
Restore Backup To On-Premises
Data Center: Gateway-Stored
[Figure: In the on-premises data center, file servers (CIFS/NFS) connect over
iSCSI to the AWS Storage Gateway VM, which runs on a hypervisor host with
volume storage and an upload buffer on direct-attached or SAN disks. A
snapshot is held in AWS Storage Gateway. Caption: Provision a new local disk
and restore a snapshot to it]
Backup and Restore
Preparation Phase
• Take backups of current systems.
• Store backups in Amazon S3.
• Describe procedure to restore from backup on AWS.
• Know which AMI to use; build your own as needed.
• Know how to restore system from backups.
• Know how to switch to new system.
• Know how to configure the deployment.
Backup and Restore
In case of disaster:
• Retrieve backups from Amazon S3.
• Bring up required infrastructure.
  • Amazon EC2 instances with prepared AMIs, ELB, etc.
  • Use AWS CloudFormation to automate deployment of core networking.
• Restore system from backup.
• Switch over to the new system.
• Adjust DNS records to point to AWS.
Pilot Light Example
[Figure, normal operation: the Amazon Route 53 hosted zone directs the user or
system to the primary web server, app server, and database server. Data
mirroring/replication feeds a secondary database server in AWS. The AWS web
server and app server are not running]
[Figure, recovery: the Amazon Route 53 hosted zone directs the user or system
to AWS, where the web server and app server start in minutes alongside the
replicated secondary database server]
Pilot Light
Advantage
• Very cost-effective (uses fewer 24/7 resources)
Preparation Phase
• Set up Amazon EC2 instances to replicate or mirror data.
• Ensure that you have all supporting custom software packages available in AWS.
• Create and maintain Amazon Machine Images (AMIs) of key servers where fast
recovery is required.
• Regularly run these servers, test them, and apply any software updates and
configuration changes.
• Consider automating the provisioning of AWS resources.
Pilot Light
In case of disaster
• Automatically bring up resources around the replicated core data set.
• Scale the system as needed to handle current production traffic.
• Switch over to the new system.
• Adjust DNS records to point to AWS.
Objectives
• RTO: As long as it takes to detect the need for DR and automatically scale
up the replacement system
• RPO: Depends on replication type
Fully Working Low-Capacity Standby
[Figure, normal operation: the Amazon Route 53 hosted zone directs the user or
system to the primary web servers, app servers, and database server. Data
mirroring/replication feeds a secondary database server in AWS, where web and
app servers run at low capacity in Auto Scaling groups]
[Figure, second state: the same layout with additional web, app, and database
servers shown in the low-capacity environment]
Fully Working Low-Capacity Standby
Advantages
• Can take some production traffic at any time
• Cost savings (IT footprint smaller than full DR)
Preparation
• Similar to Pilot Light
• All necessary components running 24/7, but not scaled for production traffic
• Best practice: continuous testing
• “Trickle” a statistical subset of production traffic to DR site
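One way to trickle a subset of traffic to the DR site is weighted DNS routing, where each record receives a share of traffic equal to its weight divided by the sum of all weights. The sketch below only computes those shares; the 95/5 split and the record names are hypothetical.

```python
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Fraction of traffic each weighted record receives."""
    if any(w < 0 or w > 255 for w in weights.values()):
        raise ValueError("Route 53 weights must be integers from 0 to 255")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical split: send 5% of production traffic to the DR site.
shares = traffic_share({"primary": 95, "dr": 5})
```

A small, steady share of real traffic exercises the standby continuously, which is the continuous testing the slide recommends.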
Fully Working Low-Capacity Standby
In case of disaster
• Immediately fail over most critical production load
• Adjust DNS records to point to AWS
• (Auto) Scale the system further to handle all production load
Objectives
• RTO: For critical load, as long as it takes to fail over; for all other
load, as long as it takes to scale further
• RPO: Depends on replication type
Multi-Site Active-Active
[Figure: the Amazon Route 53 hosted zone distributes user or system traffic
across two sites. Each site runs web servers, app servers, and database
servers at full capacity, with data mirroring/replication between the primary
and secondary databases]
Multi-Site Active-Active
Advantages
• At any moment, can take all production load
Preparation
• Similar to low-capacity standby
• Scales in and out fully with production load
In case of disaster
• Immediately fail over all production load
Objectives
• RTO: As long as it takes to fail over
• RPO: Depends on replication type
Common Practices for Disaster Recovery on AWS
Backup and Restore (Cost: $)
• Lower priority use cases
• Solutions: Amazon S3, Storage Gateway
Pilot Light (Cost: $$)
• Meeting lower RTO and RPO requirements
• Core services
• Scale AWS resources in response to a DR event
Fully Working Low-Capacity Standby (Cost: $$$)
• Solutions that require RTO and RPO in minutes
• Business-critical services
Multi-Site Active-Active (Cost: $$$$)
• Auto-failover of your environment in AWS to a running duplicate
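The cost ladder suggests a simple selection rule: pick the cheapest strategy whose recovery time still meets the required RTO. The sketch below shows that rule. The cost tiers follow the slide, but every `achievable_rto` value is a hypothetical placeholder; real figures depend on the workload and have to be measured in recovery tests.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    cost_tier: int               # 1 = "$" up to 4 = "$$$$", as on the slide
    achievable_rto: timedelta    # HYPOTHETICAL value, not from the course


STRATEGIES: List[Strategy] = [
    Strategy("Backup and Restore", 1, timedelta(hours=24)),
    Strategy("Pilot Light", 2, timedelta(hours=4)),
    Strategy("Fully Working Low-Capacity Standby", 3, timedelta(minutes=30)),
    Strategy("Multi-Site Active-Active", 4, timedelta(minutes=1)),
]


def cheapest_strategy(required_rto: timedelta,
                      strategies: List[Strategy] = STRATEGIES) -> Optional[Strategy]:
    """Return the lowest-cost strategy that meets required_rto, or None."""
    candidates = [s for s in strategies if s.achievable_rto <= required_rto]
    return min(candidates, key=lambda s: s.cost_tier) if candidates else None


one_hour_choice = cheapest_strategy(timedelta(hours=1))
two_day_choice = cheapest_strategy(timedelta(hours=48))
```

The same filter could be extended with an RPO field per strategy, since both objectives constrain the choice.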
Best Practices For Being Prepared
• Start simple
• Check for software licensing issues
• Practice "Game Day" exercises
One more thing...
Your feedback is critical for us!
• Log in to https://aws.training
• Click My Transcript, then the Archived tab
• Find the completed training, Architecting on AWS, and then click
Evaluate.
© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in
part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections
or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at:
https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
Thank You

ArchitectingOnAWS_Module_13 goat bumrah i

  • 1. RTO/RPO and Backup Recovery Setup Mod u l e 1 3
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Module 13 If your infrastructure becomes unavailable, you need to be able to get your application running again within an appropriate amount of time and at an appropriate level of cost. • Disaster Planning • Recovery Options The architectural need Module Overview
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Disaster Planning
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What Are We Planning? Everything fails, all the time. -Werner Vogels Large-scale events Colossal events Small-scale events How do we prepare for these?
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Availability Concepts • Minimizing downtime for your application High availability • Make sure your data is safe Backup • Get your applications and data back after a major disaster Disaster recovery
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. RPO and RTO Recovery Point Objective (RPO) How often does data need to be backed up? Disaster Time Example: The business can recover from losing (at most) the last 12 hours of data.
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. RPO and RTO Recovery Point Objective (RPO) How often does data need to be backed up? Disaster Time Example: The business can recover from losing (at most) the last 12 hours of data. Recovery Time Objective (RTO) How long can the application be unavailable? The application can be unavailable for a maximum of 1 hour.
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Regions Can Go Down Region 1 Region 2
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Essential AWS Services and Features for Disaster Recovery Regions Storage Compute Networking Database Deployment orchestration
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Should Be Duplicated Amazon S3 Cross-region replication 10
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Should Be Duplicated Amazon S3 Cross-region replication Replicated to multiple Availability Zones and multiple devices in each Availability Zone Amazon S3 Glacier
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Should Be Duplicated Amazon S3 Amazon S3 Glacier Amazon EBS Cross-region replication • Create point-in-time volume snapshots • Copy snapshots across regions and accounts Replicated to multiple Availability Zones and multiple devices in each Availability Zone
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Should Be Duplicated Amazon S3 AWS Snowball Cross-region replication • Create point-in-time volume snapshots • Copy snapshots across regions and accounts Transfers large volumes (>10TB) of data more quickly than high-speed Internet. Replicated to multiple Availability Zones and multiple devices in each Availability Zone Amazon S3 Glacier Amazon EBS
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Storage Should Be Duplicated Amazon S3 AWS DataSync AWS Snowball Cross-region replication • Create point-in-time volume snapshots • Copy snapshots across regions and accounts Transfers large volumes (>10TB) of data more quickly than high-speed Internet. Replicated to multiple Availability Zones and multiple devices in each Availability Zone Amazon S3 Glacier Amazon EBS
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Spinning Your Compute Back Up Should Be Easy Custom AMIs Obtain and boot new server instances or containers within minutes Custom container images
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Networking Disaster Recovery Options Amazon Route 53 • Traffic distribution • Failover
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Networking Disaster Recovery Options Amazon Route 53 Elastic Load Balancing • Load balancing • Health checks and failover • Traffic distribution • Failover
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Networking Disaster Recovery Options Amazon Route 53 Elastic Load Balancing Amazon VPC • Load balancing • Health checks and failover Extend your existing on-premises network topology to the cloud. • Traffic distribution • Failover
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Networking Disaster Recovery Options Amazon Route 53 Elastic Load Balancing AWS Direct Connect Amazon VPC • Load balancing • Health checks and failover Extend your existing on-premises network topology to the cloud. Fast and consistent replication/backups of your large on-premises environment to the cloud • Traffic distribution • Failover
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Databases Should Be Backed Up and Redundant Amazon RDS • Snapshot data and save it in a separate region. • Combine Read Replicas with Multi-AZ to build a resilient disaster recovery strategy. • Automatic backups
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Databases Should Be Backed Up and Redundant Amazon DynamoDB Amazon RDS • Back up full tables in seconds. • Use point-in-time-recovery to continuously back up tables for up to 35 days. • Initiate backups with a single click in the console or a single API call. • Build multi-region, multi-master tables for fast local performance for globally distributed apps with Global tables. • Snapshot data and save it in a separate region. • Combine Read Replicas with Multi-AZ to build a resilient disaster recovery strategy. • Retain automated backups
  • 22. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use Automation To Quickly Recover AWS CloudFormation Use templates to quickly deploy collections of resources as needed
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use Automation To Quickly Recover AWS CloudFormation Use templates to quickly deploy collections of resources as needed AWS Elastic Beanstalk Quickly redeploy your entire stack in only a few clicks
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Use Automation To Quickly Recover AWS OpsWorks • Automatic host replacement • Combine it with AWS CloudFormation in the recovery phase • Provision a new stack that supports the defined RTO AWS CloudFormation Use templates to quickly deploy collections of resources as needed AWS Elastic Beanstalk Quickly redeploy your entire stack in only a few clicks
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Recovery Strategies
  • 26. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Backup and Restore Example Amazon S3 Amazon Glacier Remote location /mybucket Amazon S3 Standard IA Lifecycle policy Remote location AWS DR Region Amazon EC2 Backup Restore Amazon S3 Amazon Glacier /mybucket Amazon S3 Standard IA Lifecycle policy
  • 27. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Backing up On-Premises Data to AWS AWS Storage Gateway Amazon S3 Amazon Glacier File gateway Tape gateway Volume gateway On-premises
  • 28. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Storage Gateway On-premises infrastructure File gateway Amazon Glacier S3-IA N FS v3 / v4.1 Backup server Volume gateway iSCSI Tape gateway VTL - iSCSI Volume gateway S3 Amazon Glacier Tape gateway VTL EBS snapshots Amazon S3 S3 Standard
  • 29. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Direct attached or SAN disks Host Use Case: Off-Site Backup Solution with Gateway-Stored Volumes On-premises data center iSCSI Hypervisor SSL CIFS/ NFS File servers Volume storage Upload buffer AWS Storage Gateway VM Snapshots (incremental backup) Create new volumes in Amazon EBS or on your local gateway's storage Host AWS Storage Gateway
  • 30. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Restore Backup To On-Premises Data Center: Gateway-Stored On-premises data center Direct-attached or SAN disks iSCSI CIFS/ NFS File servers Volume storage Upload buffer AWS Storage Gateway VM Provision a new local disk and restore a snapshot to it Snapshot AWS Storage Gateway Hypervisor Host
  • 31. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Backup and Restore Preparation Phase • Take backups of current systems. • Store backups in Amazon S3. • Describe procedure to restore from backup on AWS. • Know which AMI to use; build your own as needed. • Know how to restore system from backups. • Know how to switch to new system. • Know how to configure the deployment.
  • 32. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Backup and Restore In case of disaster: • Retrieve backups from Amazon S3. • Bring up required infrastructure. • Amazon EC2 instances with prepared AMIs, ELB, etc. • Use AWS CloudFormation to automate deployment of core networking. • Restore system from backup. • Switch over to the new system. • Adjust DNS records to point to AWS.
  • 33. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pilot Light Example Web Server App Server Database Server Data mirroring/replication Not running User or system Amazon Route 53 hosted zone DB secondar y Database Server DB Web Server App Server
  • 34. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pilot Light Example Web Server App Server Data mirroring/replication Starts in minutes User or system Amazon Route 53 hosted zone DB secondar y Database Server DB Web Server App Server
  • 35. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pilot Light Advantage • Very cost-effective (uses fewer 24/7 resources) Preparation Phase • Set up Amazon EC2 instances to replicate or mirror data. • Ensure that you have all supporting custom software packages available in AWS. • Create and maintain Amazon Machine Images (AMI) of key servers where fast recovery is required. • Regularly run these servers, test them, and apply any software updates and configuration changes. • Consider automating the provisioning of AWS resources.
  • 36. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pilot Light In case of disaster • Automatically bring up resources around the replicated core data set. • Scale the system as needed to handle current production traffic. • Switch over to the new system. • Adjust DNS records to point to AWS. Objectives • RTO: As long as it takes to detect need for DR and automatically scale up replacement system • RPO: Depends on replication type
  • 37. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Fully Working Low-Capacity Standby Web Server App Server Low capacity User or system Amazon Route 53 hosted zone Auto Scaling Auto Scaling Database Server Database Server Data mirroring/replication DB secondar y Database Server DB Web Server App Server Web Server App Server
  • 38. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Fully Working Low-Capacity Standby Web server App server Low capacity User or system Amazon Route 53 hosted zone Web server App server Database Server Data mirroring/replication DB secondar y Database Server Database Server DB Web Server App Server Web Server App Server
  • 39. Fully Working Low-Capacity Standby Advantages • Can take some production traffic at any time • Cost savings (IT footprint smaller than full DR) Preparation • Similar to Pilot Light • All necessary components running 24/7, but not scaled for production traffic • Best practice: continuous testing • “Trickle” a statistical subset of production traffic to DR site
  • 40. Fully Working Low-Capacity Standby In case of disaster • Immediately fail over most critical production load • Adjust DNS records to point to AWS • (Auto) Scale the system further to handle all production load Objectives • RTO: For critical load: as long as it takes to fail over; for all other load, as long as it takes to scale further • RPO: Depends on replication type
  • 41. Multi-Site Active-Active [Diagram labels: User or system; Amazon Route 53 hosted zone; Web Server; App Server; Database Server; Full capacity; DB secondary; Data mirroring/replication]
  • 42. Multi-Site Active-Active Advantages • At any moment, can take all production load Preparation • Similar to low-capacity standby • Fully scaling in/out with production load In case of disaster • Immediately fail over all production load Objectives • RTO: As long as it takes to fail over • RPO: Depends on replication type
  • 43. Common Practices for Disaster Recovery on AWS • Lower priority use cases • Solutions: Amazon S3, Storage Gateway • Meeting lower RTO and RPO requirements • Core services • Scale AWS resources in response to a DR event • Solutions that require RTO and RPO in minutes • Business-critical services • Auto-failover of your environment in AWS to a running duplicate Cost: $ Cost: $$ Cost: $$$ Cost: $$$$
  • 44. Best Practices For Being Prepared • Check for software licensing issues • Practice “Game Day” exercises • Start simple
  • 45. One more thing... Your feedback is critical for us! • Log in to https://aws.training • Click on My Transcript, then on the Archived tab • Find the completed training, Architecting on AWS, and then click Evaluate.
  • 46. © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: [email protected]. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners. Thank You

Editor's Notes

  • #4: What kind of disaster are you planning for? A small-scale event where you simply need to restore from a backup? A larger-scale event where multiple resources are impacted? A colossal-scale event where multiple people and resources will be impacted? Disaster recovery (DR) is about preparing for and recovering from a disaster. Any event that has a negative impact on a company’s business continuity or finances could be termed a disaster. This includes hardware or software failure, a network outage, a power outage, physical damage to a building like fire or flooding, human error, or some other significant event. To minimize the impact of a disaster, companies invest time and resources to plan and prepare, to train employees, and to document and update processes. The amount of investment for DR planning for a particular system can vary dramatically depending on the cost of a potential outage. Companies that have traditional physical environments typically must duplicate their infrastructure to ensure the availability of spare capacity in the event of a disaster. The infrastructure needs to be procured, installed, and maintained so that it is ready to support the anticipated capacity requirements. During normal operations, the infrastructure typically is under-utilized or over-provisioned. With AWS, your company can scale up its infrastructure on an as-needed, pay-as-you-go basis. You get access to the same highly secure, reliable, and fast infrastructure that Amazon uses to run its own global network of websites. AWS also gives you the flexibility to quickly change and optimize resources during a DR event, which can result in significant cost savings.
  • #5: Production systems typically come with defined or implicit objectives in terms of uptime. A system is highly available when it can withstand the failure of an individual or multiple components (e.g., hard disks, servers, network links etc.).   High availability provides redundancy and fault tolerance. Its goal is to ensure this service is always available even in the event of a failure.    Backup is critical to protect data and to ensure business continuity. At the same time, it can be a challenge to implement well. The pace at which data is generated is growing exponentially. The density and durability of local disk is not benefiting from the same growth rate. The enterprise backup has become its own industry.   Data is generated on an arbitrarily large number of endpoints; laptops, desktops, servers, virtual machines, and now mobile devices, that is, the problem is distributed in nature. Current backup software is very centralized – the general model is to collect data from many devices and store it in single place. Sometimes a copy of that stored data is also sent to tape. The centralized approach has the potential to overwhelm the backup target during recovery from a disaster and result in broken recovery SLAs.   Enterprise backup scenarios used to look like this: If you wanted high performance data access, it had to live on disk. If you wanted cost-effective archival storage, it had to live on tape. If you wanted to archive off-site, you had to physically deliver your archival tapes to another location. Recovery from local disk was fine, unless you needed something from a tape, and it might have been a while if that tape wasn’t on site. The cloud has changed things. Backup software can write to the cloud without any changes to the backup software itself. (This will be discussed later.)   Disaster recovery (DR) is about preparing for and recovering from a disaster. 
A disaster is any event that has a negative impact on a company’s business continuity or finances—including hardware or software failure, a network outage, a power outage, physical damage to a building like fire or flooding, human error, or some other significant event.   To minimize the impact of a disaster, companies invest time and resources to plan and prepare, to train employees, and to document and update processes. The amount of investment for DR planning for a particular system can vary dramatically depending on the cost of a potential outage. Companies that have traditional physical environments typically must duplicate their infrastructure to ensure the availability of spare capacity in the event of a disaster. The infrastructure needs to be procured, installed, and maintained so that it is ready to support the anticipated capacity requirements. During normal operations, the infrastructure typically is under-utilized or over-provisioned.
  • #6: Recovery point objective (RPO) is the acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 PM (noon) and the RPO is one hour, the system should recover all data that was in the system before 11:00 AM. Data loss will span only one hour, between 11:00 AM and 12:00 PM (noon).
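The RPO arithmetic in this note can be sketched in a few lines of Python. This is a minimal illustration, not part of any AWS API: the function names `earliest_recovery_point` and `meets_rpo` and the calendar date are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def earliest_recovery_point(disaster_at: datetime, rpo: timedelta) -> datetime:
    """Return the oldest point in time that the recovered data may reflect."""
    return disaster_at - rpo


def meets_rpo(last_backup_at: datetime, disaster_at: datetime, rpo: timedelta) -> bool:
    """True if the most recent backup is no older than the RPO allows."""
    return disaster_at - last_backup_at <= rpo


disaster = datetime(2019, 1, 1, 12, 0)   # disaster at 12:00 PM (noon)
rpo = timedelta(hours=1)                 # at most one hour of data loss

print(earliest_recovery_point(disaster, rpo))                  # 2019-01-01 11:00:00
print(meets_rpo(datetime(2019, 1, 1, 11, 30), disaster, rpo))  # True
print(meets_rpo(datetime(2019, 1, 1, 10, 0), disaster, rpo))   # False
```

A backup taken at 11:30 AM satisfies the one-hour RPO; one taken at 10:00 AM would lose two hours of data and does not.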
  • #7: Recovery time objective (RTO) is the time it takes after a disruption to restore a business process to its service level, as defined by the operational level agreement (OLA). For example, if a disaster occurs at 12:00 PM (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 PM.   A company typically decides on an acceptable RPO and RTO based on the financial impact to the business when systems are unavailable. The company determines financial impact by considering many factors, such as the loss of business and damage to its reputation due to downtime and the lack of systems availability. IT organizations then plan solutions to provide cost-effective system recovery based on the RPO within the timeline and the service level established by the RTO.
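As a rough sketch of the RTO example, the following Python computes the restore deadline and checks whether a recovery plan fits inside the RTO. The function names, the step names, and the step durations are all hypothetical; a real plan would use measured figures from DR tests.

```python
from datetime import datetime, timedelta


def restore_deadline(disaster_at: datetime, rto: timedelta) -> datetime:
    """Latest time by which the business process must be back in service."""
    return disaster_at + rto


def plan_meets_rto(step_durations: dict, rto: timedelta) -> bool:
    """True if the summed duration of all recovery steps fits within the RTO."""
    return sum(step_durations.values(), timedelta()) <= rto


disaster = datetime(2019, 1, 1, 12, 0)   # disaster at 12:00 PM (noon)
rto = timedelta(hours=8)
print(restore_deadline(disaster, rto))   # 2019-01-01 20:00:00

# Hypothetical recovery plan; step names and durations are made up.
plan = {
    "detect and declare disaster": timedelta(minutes=30),
    "provision infrastructure": timedelta(hours=2),
    "restore data from backup": timedelta(hours=4),
    "validate and switch traffic": timedelta(hours=1),
}
print(plan_meets_rto(plan, rto))         # True (7.5 hours <= 8 hours)
```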
  • #8: AWS is available in multiple regions around the globe, so you can choose the most appropriate location for your DR site, in addition to the site where your system is fully deployed. It’s highly unlikely for a region to be unavailable. But if some very large-scale event impacts a region—for instance, a meteor strike—it is within the realm of possibility. AWS maintains a page that inventories current services offered by region (products and services by region). AWS maintains a strict region isolation policy so that any large-scale event in one region will not impact any other region. We encourage our customers to take a similar approach to their multi-region strategy. Each region should be able to be taken offline with no impact to any other region. If you have an AWS Direct Connect (DX) circuit to any AWS Region in the United States, it will provide you with access to all regions in the US, including AWS GovCloud (US), without that traffic going through the public internet. Also consider how applications are deployed. If you deploy to each region separately, you can isolate that region in case of disaster, and transfer all your traffic to another region. If you are deploying new applications and infrastructure rapidly, you may want to have an active-active region. Let’s say you deploy something that causes a region's applications to be unavailable or misbehaving. You can remove the region from the active record set in Route 53, identify the root cause, and roll back the change before re-enabling the region.
  • #9: Before discussing the various approaches to disaster recovery, it is important to review the AWS services and features that are the most relevant to it. This section provides a summary. When planning for DR, it is important to consider the use of services and features that support data migration and durable storage, because they enable you to restore backed-up, critical data to AWS when disaster strikes. For some of the scenarios that involve either a scaled-down or a fully scaled deployment of your system in AWS, compute resources will be required as well.   During a disaster, you need to either spin up new resources or failover to existing pre-configured resources. These resources not only include code and content, but other pieces such as DNS entries, network firewall rules, and virtual machines/instances.
  • #10: AWS offers many different ways of storing your data. Each service has different capabilities, so that you can match the right service with the right need for each of your systems. Amazon S3 provides a highly durable storage infrastructure designed for mission critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities within a region, designed to provide a durability of 99.999999999% (eleven 9s). AWS provides further protection for data retention and archiving through versioning in Amazon S3, AWS MFA, bucket policies, and AWS IAM. Cross-region replication is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions. These buckets are called the source bucket and the destination bucket, and they can be owned by different AWS accounts. To activate this feature, you add a replication configuration to your source bucket to direct Amazon S3 to replicate objects according to the configuration.
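A replication configuration of the kind described here might look like the sketch below. The helper name `build_replication_config`, the rule ID, the role ARN, and the bucket names are placeholders invented for illustration; the dictionary follows the shape accepted by the S3 `put_bucket_replication` call, and versioning must already be enabled on both buckets.

```python
import json


def build_replication_config(role_arn: str, destination_bucket_arn: str) -> dict:
    """Build a replication configuration that copies every new object."""
    return {
        "Role": role_arn,  # IAM role that Amazon S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-all-objects",   # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},                    # empty filter: applies to the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


config = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",
    destination_bucket_arn="arn:aws:s3:::example-dr-bucket",
)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-source-bucket", ReplicationConfiguration=config)
```

Building the configuration as plain data keeps it easy to review and store in version control before it is applied to the source bucket.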
  • #11: Amazon S3 Glacier provides extremely low-cost storage for data archiving and backup. Objects (or archives, as they are known in Amazon S3 Glacier) are optimized for infrequent access, for which retrieval times of several hours are adequate. Amazon S3 Glacier is designed for the same durability as Amazon S3. Although you need to maintain your own index of data you upload to Amazon S3 Glacier, an inventory of all archives in each of your vaults is maintained for disaster recovery or occasional reconciliation purposes. The vault inventory is updated approximately once a day. You can request a vault inventory as either a JSON or CSV file; it contains details about the archives within your vault, including the size, creation date, and archive description (if you provided one during upload). The inventory represents the state of the vault at the time of the most recent inventory update. Similar to Amazon S3, Amazon S3 Glacier allows for cross-region replication.
  • #12: Amazon EBS gives you the ability to create point-in-time snapshots of data volumes. You can use the snapshots as the starting point for new Amazon EBS volumes, and you can protect your data for long-term durability because snapshots are stored within Amazon S3. After a volume is created, you can attach it to a running Amazon EC2 instance. Amazon EBS volumes provide off-instance storage that persists independently from the life of an instance and is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. After you've created a snapshot and it has finished copying to Amazon S3 (when the snapshot status is completed), you can copy it from one AWS region to another, or within the same region. Amazon S3 server-side encryption (256-bit AES) protects a snapshot's data in-transit during a copy operation. The snapshot copy receives an ID that is different than the ID of the original snapshot.
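The cross-region snapshot copy described above could be scripted roughly as follows. The function name `copy_snapshot_to_region` and the snapshot IDs are illustrative assumptions. The function expects a boto3 EC2 client created in the destination region; a small stub stands in for it here so the sketch runs without AWS access.

```python
def copy_snapshot_to_region(ec2_client, source_region: str, snapshot_id: str) -> str:
    """Request a cross-region copy and return the ID of the new snapshot.

    `ec2_client` should be an EC2 client for the DESTINATION region,
    for example boto3.client("ec2", region_name="us-west-2").
    """
    response = ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


class StubEC2Client:
    """Stand-in for a real EC2 client; returns a made-up snapshot ID."""

    def copy_snapshot(self, SourceRegion, SourceSnapshotId, Description):
        return {"SnapshotId": "snap-0fedcba9876543210"}


source_id = "snap-0123456789abcdef0"   # hypothetical source snapshot
new_id = copy_snapshot_to_region(StubEC2Client(), "us-east-1", source_id)
print(new_id != source_id)             # True: the copy gets its own ID
```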
  • #13: AWS Snowball is a data transport solution that accelerates moving terabytes to petabytes of data into and out of AWS using storage devices designed to be secure for physical transport. Using Snowball helps to eliminate challenges that can be encountered with large-scale data transfers including high network costs, long transfer times, and security concerns. In the event that you need to quickly retrieve a large quantity of data stored in Amazon S3, Snowball devices can help retrieve the data much quicker than high-speed internet.
  • #14: Use AWS DataSync to efficiently and securely sync files from on-premises or in-cloud file systems to Amazon Elastic File System (Amazon EFS) at speeds up to 10x faster than open source tools. AWS DataSync securely and efficiently copies files over the internet or a DX connection. For more information, see: https://aws.amazon.com/datasync/
  • #15: In the context of DR, it’s critical to be able to rapidly create virtual machines that you control. By launching instances in separate Availability Zones, you can protect your applications from the failure of a single location.   You can arrange for automatic recovery of an EC2 instance when a system status check of the underlying hardware fails. The instance will be rebooted (on new hardware if necessary) but will retain its Instance Id, IP Address, Elastic IP Addresses, EBS Volume attachments, and other configuration details. In order for the recovery to be complete, you’ll need to make sure that the instance automatically starts up any services or applications as part of its initialization process.   Amazon Machine Images (AMIs) are preconfigured with operating systems, and some preconfigured AMIs might also include application stacks. You can also configure your own AMIs. In the context of DR, AWS strongly recommends that you configure and identify your own AMIs so that they can launch as part of your recovery procedure. Such AMIs should be preconfigured with your operating system of choice plus appropriate pieces of the application stack.
  • #16: When you are dealing with a disaster, it’s very likely that you will have to modify network settings as your system is failing over to another site. AWS offers several services and features that enable you to manage and modify network settings, such as Amazon Route 53, ELB, Amazon VPC, and DX.   Amazon Route 53 includes a number of global load balancing capabilities (which can be effective when you are dealing with DR scenarios such as DNS endpoint health checks) and the ability to failover between multiple endpoints and even static websites hosted in Amazon S3.
  • #17: ELB automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications by seamlessly providing the load-balancing capacity that is needed in response to incoming application traffic. Just as you can pre-allocate Elastic IP addresses, you can pre-allocate your load balancer so that its DNS name is already known, which can simplify the execution of your DR plan.
  • #18: In the context of DR, you can use Amazon VPC to extend your existing network topology to the cloud. This can be especially appropriate when recovering enterprise applications that are typically on the internal network.
  • #19: AWS Direct Connect (DX) makes it easy to set up a dedicated network connection from your premises to AWS. In many cases, this can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections. For information on using AWS Direct Connect for high resiliency for critical workloads, see https://aws.amazon.com/directconnect/resiliency-recommendation/
  • #20: For your database needs, consider using these AWS services: Amazon RDS, Amazon DynamoDB, and Amazon Redshift.   You can use Amazon RDS either in the preparation phase for DR to hold your critical data in a database that is already running, or in the recovery phase to run your production database. When you want to look at multiple regions, Amazon RDS gives you the ability to snapshot data from one region to another, and also to have a read replica running in another region. Using Amazon RDS, you can share a manual DB snapshot or DB cluster snapshot. You can share a manual snapshot with up to 20 other AWS accounts. You can also share an unencrypted manual snapshot as public, which makes the snapshot available to all AWS accounts. Take care when sharing a snapshot as public so that none of your private information is included in any of your public snapshots. Amazon RDS Read Replicas for MySQL and MariaDB now support Multi-AZ deployments. Combining Read Replicas with Multi-AZ enables you to build a resilient disaster recovery strategy and simplify your database engine upgrade process.  Amazon RDS Read Replicas enable you to create one or more read-only copies of your database instance within the same AWS Region or in a different AWS Region. Updates made to the source database are then asynchronously copied to your Read Replicas. In addition to providing scalability for read-heavy workloads, Read Replicas can be promoted to become a standalone database instance when needed. 
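Promoting a cross-region Read Replica during recovery can be sketched as below. The function name `promote_replica` and the instance identifier are hypothetical. The function expects a boto3 RDS client for the replica's region; the stub and the status value it returns are placeholders so the example runs offline.

```python
def promote_replica(rds_client, replica_identifier: str) -> str:
    """Promote a Read Replica to a standalone instance and return its status."""
    response = rds_client.promote_read_replica(
        DBInstanceIdentifier=replica_identifier,
    )
    return response["DBInstance"]["DBInstanceStatus"]


class StubRDSClient:
    """Stand-in for a real RDS client; the returned status is a placeholder."""

    def promote_read_replica(self, DBInstanceIdentifier):
        return {
            "DBInstance": {
                "DBInstanceIdentifier": DBInstanceIdentifier,
                "DBInstanceStatus": "modifying",
            }
        }


print(promote_replica(StubRDSClient(), "example-dr-replica"))  # modifying
```

Because replication to a Read Replica is asynchronous, any updates not yet copied at the time of promotion are lost, which is what determines the RPO for this approach.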
  • #21: You can use Amazon DynamoDB in the preparation phase to copy data to DynamoDB in another region or to Amazon S3. During the recovery phase of DR, you can scale up seamlessly in a matter of minutes with a single click or API call. Global Tables builds on the DynamoDB global footprint to provide you with a fully managed, multi-region, and multi-master database that provides fast, local, read and write performance for massively scaled, global applications. Global Tables replicates your Amazon DynamoDB tables automatically across your choice of AWS regions. Global Tables eliminates the difficult work of replicating data between regions and resolving update conflicts, enabling you to focus on your application’s business logic. In addition, Global Tables enables your applications to stay highly available even in the unlikely event of isolation or degradation of an entire region.  
  • #22: AWS CloudFormation allows you to model your entire infrastructure in a text file. This template becomes the single source of truth for your infrastructure. This helps you to standardize infrastructure components used across your organization, enabling configuration compliance and faster troubleshooting. AWS CloudFormation provisions your resources in a safe, repeatable manner, allowing you to build and rebuild your infrastructure and applications, without having to perform manual actions or write custom scripts. AWS CloudFormation takes care of determining the right operations to perform when managing your stack, and rolls back changes automatically if errors are detected.
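To illustrate modeling infrastructure in a text file, here is a minimal sketch that builds a CloudFormation template in Python and serializes it to JSON. The logical name `BackupBucket`, the description, and the stack name in the comment are invented for the example; a real DR template would describe far more resources.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Hypothetical DR backup bucket with versioning enabled",
    "Resources": {
        "BackupBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
    "Outputs": {
        # Ref on an S3 bucket resource resolves to the bucket name.
        "BackupBucketName": {"Value": {"Ref": "BackupBucket"}},
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)

# With boto3 installed and credentials configured, the stack could be created with:
#   boto3.client("cloudformation").create_stack(
#       StackName="example-dr-stack", TemplateBody=template_body)
```

The same template text can be deployed unchanged in a second region, which is what makes it useful as a single source of truth during recovery.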
  • #23: You can use the AWS Elastic Beanstalk to upload an updated source bundle and deploy it to your AWS Elastic Beanstalk environment, or redeploy a previously uploaded version. You can deploy a previously uploaded version of your application to any of its environments.
  • #24: AWS OpsWorks is an application management service that makes it easy to deploy and operate applications of all types and sizes. You can define your environment as a series of layers, and configure each layer as a tier of your application. AWS OpsWorks has automatic host replacement, so in the event of an instance failure it will be automatically replaced. You can use AWS OpsWorks in the preparation phase to template your environment, and you can combine it with AWS CloudFormation in the recovery phase. You can quickly provision a new stack from the stored configuration that supports the defined RTO.
  • #26: In most traditional environments, data is backed up to tape and sent offsite regularly. If you use this method, it can take a long time to restore your system in the event of a disruption or disaster. Amazon S3 is an ideal destination for backup data that might be needed quickly to perform a restore. Transferring data to and from Amazon S3 is typically done through the network and is therefore accessible from any location. There are many commercial and open-source backup solutions that integrate with Amazon S3. For example: You can use AWS Snowball to transfer very large data sets by shipping storage devices directly to AWS. For longer-term data storage where retrieval times of several hours are adequate, there is Amazon Glacier, which has the same durability model as Amazon S3. Amazon Glacier and Amazon S3 can be used in conjunction to produce a tiered backup solution.
  • #27: AWS Storage Gateway connects an on-premises software appliance with cloud-based storage to provide seamless and highly secure integration between your on-premises IT environment and the AWS storage infrastructure. The service allows you to securely store data in the AWS cloud for scalable and cost-effective storage. Storage Gateway supports industry-standard storage protocols that work with your existing applications while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier. With AWS Storage Gateway, you get an extension of AWS management services locally; the service is also integrated with Amazon CloudWatch, AWS CloudTrail, AWS KMS, AWS IAM, and other services. AWS Storage Gateway supports three storage interfaces: file, volume, and tape. Each gateway you have can provide one type of interface. The file gateway enables you to store and retrieve objects in Amazon S3 using the NFS and SMB file protocols. Objects written through file gateway can be directly accessed in S3. The volume gateway provides block storage to your applications using the iSCSI protocol. Data on the volumes is stored in Amazon S3. To access your iSCSI volumes in AWS, you can take EBS snapshots, which can be used to create EBS volumes. The tape gateway provides your backup application with an iSCSI virtual tape library (VTL) interface, consisting of a virtual media changer, virtual tape drives, and virtual tapes. Virtual tape data is stored in Amazon S3 or can be archived to Amazon Glacier. To back up your on-premises data to the AWS cloud, you can choose between two common approaches: Writing backup data directly to Amazon S3 by making API calls to the AWS service. Writing or retrieving backup data through secure HTTP PUT and GET requests directly across the Internet. Here, the endpoint itself makes a direct connection with Amazon S3 to write data and retrieve data. Gateway-Virtual Tape Library (VTL): You can have a limitless collection of virtual tapes. 
Each virtual tape can be stored in a virtual tape library backed by Amazon S3 or a virtual tape shelf backed by Amazon Glacier. Gateway-Cached Volumes: You can store your primary data in Amazon S3 and retain your frequently accessed data locally. Gateway-cached volumes provide substantial cost savings on primary storage, minimize the need to scale your storage on-premises, and retain low-latency access to your frequently accessed data. Gateway-Stored Volumes: If you need low-latency access to your entire data set, you can configure your on-premises data gateway to store your primary data locally and asynchronously back up point-in-time snapshots of this data to Amazon S3. AWS Storage Gateway Hardware Appliance: The AWS Storage Gateway Hardware Appliance is a third-party server with AWS Storage Gateway software preinstalled that you can install on-premises. It can be managed from the Hardware page on the AWS Management Console. https://docs.aws.amazon.com/storagegateway/latest/userguide/HardwareAppliance.html
  • #28: In addition to the NFS v3 and v4.1 protocols, the AWS Storage Gateway service added the Server Message Block (SMB) protocol to File Gateway, enabling file-based applications developed for Microsoft Windows to easily store and access objects in Amazon Simple Storage Service (S3). For more information, see: https://aws.amazon.com/about-aws/whats-new/2018/06/aws-storage-gateway-adds-smb-support-to-store-objects-in-amazon-s3/
  • #29: After you've installed the AWS Storage Gateway software appliance—the virtual machine (VM)—on a host in your data center and activated it, you can create gateway storage volumes and map them to on-premises direct-attached storage (DAS) or storage area network (SAN) disks. You can start with either new disks or disks already holding data. You can then mount these storage volumes to your on-premises application servers as iSCSI devices. As your on-premises applications write data to and read data from a gateway's storage volume, this data is stored and retrieved from the volume's assigned disk. To prepare data for upload to Amazon S3, your gateway also stores incoming data in a staging area, referred to as an upload buffer. You can use on-premises DAS or SAN disks for working storage. Your gateway uploads data from the upload buffer over an encrypted Secure Sockets Layer (SSL) connection to the AWS Storage Gateway service running in the AWS cloud. The service then stores the data encrypted in Amazon S3. You can take incremental backups, called snapshots, of your storage volumes. The gateway stores these snapshots in Amazon S3 as Amazon EBS snapshots. When you take a new snapshot, only the data that has changed since your last snapshot is stored. You can initiate snapshots on a scheduled or one-time basis. When you delete a snapshot, only the data not needed for any other snapshot is removed. You can restore an Amazon EBS snapshot to an on-premises gateway storage volume if you need to recover a backup of your data. You can also use the snapshot as a starting point for a new Amazon EBS volume, which you can then attach to an Amazon Elastic Compute Cloud (Amazon EC2) instance.
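The incremental snapshot behavior described in this note (store only what changed, delete only what no other snapshot needs) can be modeled with a toy Python class. This is a simplified illustration that identifies blocks by content hash; it is an assumption made for the sketch, not a description of how Storage Gateway or Amazon EBS is implemented, and the class name `SnapshotStore` is invented.

```python
import hashlib


class SnapshotStore:
    """Toy model of incremental snapshots over fixed-size blocks."""

    def __init__(self):
        self.blocks = {}      # block hash -> block data (each block stored once)
        self.snapshots = {}   # snapshot name -> {block index: block hash}

    def take_snapshot(self, name, volume):
        """Store only blocks not already held; return how many were uploaded."""
        uploaded = 0
        manifest = {}
        for index, data in volume.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = data
                uploaded += 1
            manifest[index] = digest
        self.snapshots[name] = manifest
        return uploaded

    def delete_snapshot(self, name):
        """Remove a snapshot and any blocks no other snapshot references."""
        manifest = self.snapshots.pop(name)
        still_needed = {d for m in self.snapshots.values() for d in m.values()}
        removed = 0
        for digest in set(manifest.values()):
            if digest not in still_needed:
                del self.blocks[digest]
                removed += 1
        return removed

    def restore(self, name):
        """Rebuild the full volume contents as of the named snapshot."""
        return {i: self.blocks[d] for i, d in self.snapshots[name].items()}


store = SnapshotStore()
volume = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
print(store.take_snapshot("snap-1", volume))   # 3: first snapshot stores every block
volume[1] = b"BBBB"                            # one block changes
print(store.take_snapshot("snap-2", volume))   # 1: only the changed block is stored
print(store.delete_snapshot("snap-1"))         # 1: only the superseded block is removed
print(store.restore("snap-2") == volume)       # True: snap-2 is still fully restorable
```

The last line shows why deleting an older snapshot is safe: every block the newer snapshot depends on is retained.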
  • #30: For gateway-stored volumes, your volume data is stored on-premises. In this case, snapshots provide durable, off-site backups in Amazon S3. For example, if a local disk allocated as a storage volume crashes, you can provision a new local disk and restore a snapshot to it during the volume creation process. (For more information on this approach, see Adding a Storage Volume at http://docs.aws.amazon.com/storagegateway/latest/userguide/ApplicationStorageVolumesStored-Adding.html). After you initiate a snapshot restore to a gateway-stored volume, snapshot data is downloaded in the background. This functionality means that after you create a volume from a snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your volume before your application can start accessing the volume and all of its data. If your application accesses a piece of data that has not yet been loaded, the gateway immediately downloads the requested data from Amazon S3. The gateway then continues loading the rest of the volume's data in the background.
  • #33: This pattern is relatively inexpensive to implement. In the preparation phase of DR, it is important to consider the use of services and features that support data migration and durable storage, because they enable you to restore backed-up, critical data to AWS when disaster strikes. For some of the scenarios that involve either a scaled-down or a fully scaled deployment of your system in AWS, compute resources will be required as well. When reacting to a disaster, it is essential to either quickly commission compute resources to run your system in AWS or to orchestrate the failover to already running resources in AWS. The essential infrastructure pieces include DNS, networking features, and various Amazon EC2 features. In the preparation phase, you need to have your regularly changing data replicated to the pilot light, the small core around which the full environment will be started in the recovery phase. Your less frequently updated data, such as operating systems and applications, can be periodically updated and stored as AMIs.
  • #37: Low capacity standby is like the next level of Pilot Light. The term warm standby is used to describe a DR scenario in which a scaled-down version of a fully functional environment is always running in the cloud. A warm standby solution extends the pilot light elements and preparation. It further decreases the recovery time because some services are always running. By identifying your business-critical systems, you can fully duplicate these systems on AWS and have them always on.   These servers can be running on a minimum-sized fleet of Amazon EC2 instances on the smallest sizes possible. This solution is not scaled to take a full production load, but it is fully functional. It can be used for non-production work, such as testing, quality assurance, and internal use.   In a disaster, the system is scaled up quickly to handle the production load. In AWS, this can be done by adding more instances to the load balancer and by resizing the small capacity servers to run on larger Amazon EC2 instance types. As stated in the preceding section, horizontal scaling is preferred over vertical scaling.   In the diagram above there are two systems running: the main system and a low-capacity system running on AWS. Use Amazon Route 53 to distribute requests between the main system and the cloud system.  
  • #38: If the primary environment is unavailable, Amazon Route 53 switches over to the secondary system, which is designed to automatically scale its capacity up in the event of a failover from the primary system.
  • #39: This pattern is more expensive because active systems are running.
  • #41: The next level of disaster recovery is to have a fully functional system running in AWS at the same time as the on-premises systems.   A multi-site solution runs in AWS as well as on your existing on-site infrastructure, in an active-active configuration. The data replication method that you employ will be determined by the recovery point that you choose.   You can use a DNS service that supports weighted routing, such as Amazon Route 53, to route production traffic to different sites that deliver the same application or service. A proportion of traffic will go to your infrastructure in AWS, and the remainder will go to your on-site infrastructure.   In an on-site disaster situation, you can adjust the DNS weighting and send all traffic to the AWS servers. The capacity of the AWS service can be rapidly increased to handle the full production load. You can use Amazon EC2 Auto Scaling to automate this process. You might need some application logic to detect the failure of the primary database services and cut over to the parallel database services running in AWS.   The cost of this scenario is determined by how much production traffic is handled by AWS during normal operation. In the recovery phase, you pay only for what you use for the duration that the DR environment is required at full scale. You can further reduce cost by purchasing Amazon EC2 Reserved Instances for your “always on” AWS servers.
  • #42: This pattern potentially has the least downtime of all. It does have more costs associated with it, because more systems are running.
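The DNS weighting adjustment described for the multi-site pattern can be sketched as follows. The code only builds the change batch, in the shape that the Route 53 ChangeResourceRecordSets API accepts for weighted records, and does not call AWS. The record name, site identifiers, IP addresses, and weights are all hypothetical, and the helper names are invented for this illustration.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (IPv4 address, routing weight between 0 and 255)


def weighted_change_batch(record_name: str, sites: Dict[str, Site],
                          ttl: int = 60) -> dict:
    """Build a change batch that upserts one weighted A record per site."""
    changes = []
    for set_id, (ip, weight) in sorted(sites.items()):
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {set_id} must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,  # distinguishes the weighted records
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        })
    return {"Comment": "DR weighting update", "Changes": changes}


def fail_over_to(sites: Dict[str, Site], survivor: str) -> Dict[str, Site]:
    """Send all traffic to one site by zeroing the weight of every other site."""
    if survivor not in sites:
        raise KeyError(survivor)
    return {sid: (ip, 255 if sid == survivor else 0)
            for sid, (ip, _) in sites.items()}


# Normal operation: most traffic on-site, a small share served from AWS.
normal = {"on-site": ("203.0.113.10", 90), "aws": ("198.51.100.20", 10)}
# On-site disaster: shift every request to the AWS site.
disaster = fail_over_to(normal, "aws")
batch = weighted_change_batch("app.example.com.", disaster)
```

The resulting dictionary could be passed as the `ChangeBatch` argument of boto3's `change_resource_record_sets` call, together with your hosted zone ID. A short TTL is used here on the assumption that you want resolvers to pick up the new weights quickly.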
  • #43: Applications can be placed on a spectrum of complexity. Business continuity ensures that critical business functions continue to operate or recover quickly despite serious disasters. The next slides outline four DR scenarios that highlight the use of AWS and compare AWS with traditional DR methods (sorted from highest to lowest RTO/RPO), as follows: Backup and Restore; Pilot Light; Fully Working Low-Capacity Standby; Multi-Site Active-Active. The figure above shows a spectrum for the four scenarios, arranged by how quickly a system can be available to users after a DR event. AWS enables you to cost-effectively operate each of these DR strategies. It’s important to note that these are just examples of possible approaches, and variations and combinations of these are possible. If your application is already running on AWS, then multiple regions can be employed and the same DR strategies will still apply.
  • #44: Start simple and work your way up. Backups in AWS are a first step. Incrementally improve RTO/RPO as a continuous effort. Check for any software licensing issues. Exercise your DR solution. Practice "Game Day" exercises. These exercises test what happens when critical systems, or even entire regions, go offline. What if an entire fleet were to crash? Ensure that backups, snapshots, AMIs, etc. are working. Monitor your monitoring system.
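One check that a Game Day exercise might automate is whether the most recent backup of each system is still within the RPO. The sketch below assumes you can already obtain the last successful backup time per system. The system names, timestamps, and function name are hypothetical, and the 12-hour RPO mirrors the example given earlier in the module.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_backups(last_backup_times: Dict[str, datetime], rpo: timedelta,
                  now: Optional[datetime] = None) -> List[str]:
    """Return the systems whose newest backup is older than the RPO allows."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, taken in last_backup_times.items()
                  if now - taken > rpo)


# Hypothetical inventory of the last successful backup per system.
check_time = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
backups = {
    "orders-db": check_time - timedelta(hours=2),
    "file-share": check_time - timedelta(hours=13),
}
violations = stale_backups(backups, rpo=timedelta(hours=12), now=check_time)
print(violations)
```

A check like this only confirms that backups are recent. Restoring from them during the exercise is still needed to confirm that they actually work.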