Fault Tolerance in Amazon Web Service (AWS)
Caner Kaya
Computers and System Engineering
Tallinn University of Technology
Tallinn, Estonia
canerkaya89@gmail.com
Abstract- Cloud computing enables information technology
solutions by using virtual machines to share resources and
provide them on demand; this complexity makes the area
increasingly attractive for research. With the rapid
development of these technologies, fault tolerance in cloud
computing has become one of the most important topics in
information technology. The requirement has come to the
forefront because such systems must be reliable and
available at all times. This case study reviews the
techniques that protect cloud computing and user systems
from process faults. As shown below, cloud computing is
prone to faults. The main goals of fault tolerance are to
prevent financial losses and to restore the system after a
failure. The case study reviews a scenario in which
recurring faults can be handled with checkpoints and
backups. Amazon AWS is presented as an example of fault
tolerance.
Keywords- Cloud Computing; Fault Tolerance;
Dependability; Availability; Redundancy; Human
Factor; Replication; Amazon Web Services.
I. INTRODUCTION
The rapid development of processing and storage technologies
and the success of the Internet have made computing
resources more affordable, more powerful and easier to access
than in the past. This technological progress has enabled a
new computing model, called cloud computing. In cloud
computing, resources such as CPU and storage are offered to
users as general services that are leased over the Internet on
demand. In a cloud computing environment, the traditional
service provider role is split in two: infrastructure providers
manage cloud platforms and rent out resources under a
usage-based pricing scheme, and service providers lease
resources from one or more infrastructure providers to serve
end users. In recent years, cloud computing has changed the
Information Technology (IT) industry considerably: major
companies such as Google, Amazon and Microsoft strive to
provide more powerful and reliable cloud platforms at lower
cost, and business enterprises want to restructure their
existing business models to benefit from this new paradigm.
As listed below, cloud computing offers many properties that
attract business owners.
• No up-front investment: Cloud computing uses a
pay-as-you-go pricing model. A service provider needs
no infrastructure investment to benefit from cloud
computing; it simply leases the resources it needs
from the cloud and pays only for its usage.
• Lower operating cost: In a cloud environment,
resources can be allocated and released repeatedly
according to demand. Capacity therefore no longer
has to be provisioned for the peak load; when demand
is low, resources can be released and operating costs
lowered.
• Highly scalable: Infrastructure providers pool large
amounts of resources from data centers and make
them easily accessible. A service can expand to large
scales to handle a rapid increase in demand (e.g., the
flash-crowd effect). This model is often referred to as
surge computing [1].
• Easy access: Clouds normally host web-based
services, so a variety of devices with an Internet
connection can easily access them. Smartphones and
PDAs are as suitable for access as desktop and
laptop computers.
• Reducing business risks and maintenance expenses:
Because the service infrastructure is outsourced to
the cloud, business risks such as hardware failure
shift to the infrastructure providers, which have more
expertise and equipment and are better placed to
manage these risks. Moreover, the business owner
can reduce expenses such as hardware maintenance
and personnel training costs.
On the other hand, while cloud computing provides many
opportunities for the IT industry, it also raises difficulties
unique to its structure that must be handled carefully. In this
study, we survey cloud computing research, covering its key
concepts, architectural properties, state-of-the-art practices
and research challenges.
II. TECHNOLOGY OVERVIEW
A. Definition of Cloud Computing
Today, cloud computing has become a widespread
phenomenon. It can be defined as a new practice of using
information technology as an Internet-based service. Views
on cloud computing differ, however, and individuals engaged
in different aspects of it may define it differently. The
following are definitions of cloud computing by major
companies, organizations and individuals.
The National Institute of Standards and Technology (NIST)
gives a more precise and technical definition: “Cloud
computing is a model for enabling ubiquitous, convenient, on-
demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and
released with minimal management effort or service provider
interaction.” [2]
Irving Wladawsky-Berger (IBM) gives the following definition:
"When virtualizing applications to be used by people who care
nothing about computers or technology - as is mostly the case
with Clouds - the key thing we want to virtualize or hide from
the user is complexity. Most people want to deal with an
application or a service, not software. ... The more intelligent
we want [computers and computer applications] to be - that
is, intuitive, exhibiting common sense and not making us have
to constantly take care of them - the more smart software it
will take. But with cloud computing, our expectation is that all
that software will be virtualized or hidden from us and taken
care of by systems and/or professionals that are somewhere
else - out there in The Cloud." [3]
B. Cloud Computing Service Models
Figure 1: Three main cloud service models
There are three main service models:
• SaaS – Software-as-a-Service
National Institute of Standards and Technology
(NIST) definition: “The capability provided to the
consumer is to use the provider’s applications
running on a cloud infrastructure. The applications
are accessible from various client devices through
either a thin client interface, such as a web browser
(e.g., web-based email), or a program interface. The
consumer does not manage or control the underlying
cloud infrastructure including network, servers,
operating systems, storage, or even individual
application capabilities, with the possible exception
of limited user specific application configuration
settings.” [5]
• IaaS – Infrastructure-as-a-Service:
IaaS is a model in which hardware, software, servers
and other elements of infrastructure are hosted by a
third-party provider on behalf of users. IaaS providers
also host user applications and take over tasks such
as system maintenance, backup and resiliency
planning. IaaS platforms provide highly scalable
resources that can be adjusted on demand, which
makes IaaS well suited to temporary, experimental or
rapidly changing workloads. Other properties of IaaS
environments include automation of administrative
tasks, dynamic scaling, desktop virtualization and
policy-based services. IaaS is priced by the hour,
week or month according to usage. Some providers
also charge customers based on the amount of
virtual machine space they use. This pay-as-you-go
pricing model removes the capital expense of
in-house hardware and software. On the other hand,
users must supervise their IaaS environments
carefully to avoid being charged for unauthorized
services.
Amazon Web Services (AWS), Windows Azure,
Google Compute Engine, Rackspace Open Cloud,
and IBM SmartCloud Enterprise are among the major
IaaS providers.
• PaaS – Platform-as-a-Service is a cloud computing
model that delivers applications over the Internet. In
a PaaS model, a cloud provider delivers to users the
hardware and software tools required for application
development. The hardware and software are hosted
on the infrastructure of the PaaS provider. PaaS
users therefore need no in-house hardware or
software and can still develop and run applications.
In addition, several variant services are derived from the
three main service models:
• DaaS – Desktop-as-a-Service has a multi-tenancy
architecture and is purchased on a subscription
basis. In the DaaS delivery model, the service
provider manages the back-end responsibilities of
data storage, backup, security and upgrades.
Typically, the customer’s personal data is copied to
and from the virtual desktop during logon/logoff, and
access to the desktop is independent of device,
location and network.
• XaaS – Everything-as-a-Service began with
software-as-a-service (SaaS) and has expanded over
time to include services such as
infrastructure-as-a-service, platform-as-a-service,
storage-as-a-service, desktop-as-a-service, disaster
recovery-as-a-service, and even nascent offerings
such as marketing-as-a-service and
healthcare-as-a-service.
• CaaS – Communication-as-a-Service allows the
consumer to use enterprise-level VoIP, VPNs, PBX
and unified communications without the expensive
investment of purchasing them.
• MaaS – Monitoring-as-a-Service is online state
monitoring that continuously tracks the state of
applications, networks, systems, instances or any
other element deployable within the cloud.
C. Definition of Virtualization
Intel defines virtualization as follows: “Virtualization abstracts compute
resources—typically as virtual machines (VMs)—with
associated storage and networking connectivity. The cloud
determines how those virtualized resources are allocated,
delivered, and presented. Virtualization is not necessary to
create a cloud environment, but it enables rapid scaling of
resources in a way that nonvirtualized environments find
hard to achieve.” [6]
D. Service Attributes
• On-demand self-service
Consumers can access cloud resources such as servers or
storage through websites or web service interfaces whenever
they need them. They can order, customize and pay for
resources automatically, without interacting with the cloud
provider’s personnel [7].
• Broad network access
Because cloud computing services are web-based, a consumer
can obtain resources over the Internet through standard
mechanisms from heterogeneous operating systems and from
thick or thin client platforms such as laptops and smartphones
[8]. Cloud computing is therefore device independent [9].
• Resource pooling
Cloud computing supports multi-tenant resource usage: a pool
of the provider’s computing resources is shared among a large
number of consumers. These virtual resources are allocated
and reallocated according to consumer demand [2].
Resource pooling in cloud computing is based on abstraction,
which means that the exact location of resources (e.g., VMs,
processing, memory, storage, or connectivity) is not specified.
However, the consumer may be able to specify the location at
a higher level of abstraction, such as country, state, or
datacenter [2].
• Rapid elasticity
Resources can be provisioned and released quickly and
elastically, so consumers can scale resources up or down,
automatically or manually, in various quantities and at any
time [7], much as more or less electricity is drawn from the
power grid as needed.
Figure 2: Automated elasticity in AWS [32]
• Measured service
A metering system bills consumers on a pay-per-use basis:
for example, storage is billed by the day and processing power
by the hour, and network I/O bandwidth and the number of
transactions are metered as well [10].
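As a minimal sketch of the pay-per-use billing described above, the following Python example adds up storage billed by the day, processing billed by the hour and metered network traffic. The unit prices and the names Rates and metered_bill are hypothetical and chosen only for illustration; they are not the tariff of any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; real provider tariffs differ."""
    storage_gb_day: float = 0.001  # price per GB stored per day
    compute_hour: float = 0.05     # price per instance-hour
    transfer_gb: float = 0.09      # price per GB of network I/O


def metered_bill(storage_gb: float, days: int, instance_hours: float,
                 transfer_gb: float, rates: Rates = Rates()) -> float:
    """Return the pay-per-use charge for one billing period."""
    storage = storage_gb * days * rates.storage_gb_day
    compute = instance_hours * rates.compute_hour
    network = transfer_gb * rates.transfer_gb
    return round(storage + compute + network, 2)
```

Under these assumed prices, a consumer pays only for what was metered; a period with no usage costs nothing, which is the property that distinguishes this model from up-front provisioning.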
E. Deployment Models
• Private cloud (“internal cloud”)
A private cloud is built for the exclusive use of a specific
organization or business, which gives it full authority over
data, maintenance, security, and quality of service. A
business may build and manage its own private cloud
in-house, or the operation may be handled entirely by a
third-party provider on the premises [12]. Enterprises can
thus share a pool of computing resources across
applications, departments, business units and projects.
Unlike the public cloud, this model requires considerable
up-front costs, continuous maintenance, hardware,
software, a datacenter, and internal expertise [8]. Private
clouds are therefore regarded as the secure style of IaaS
[9].
• Public cloud: Services and utilities are available to the
general public under a pay-per-use consumption model.
Public clouds are run by third-party providers, and their
physical infrastructure and resources are usually hosted
off the consumers’ premises. This kind of cloud reduces
consumer risk and cost by providing an elastic, even
temporary, extension to enterprise infrastructure [11].
Common examples of public clouds are Amazon AWS
(e.g., EC2 and S3), Microsoft Azure, and Rackspace
Cloud Suite.
• Community cloud
Several organizations share the cloud infrastructure to
serve a common function or purpose, or because they
share similar concerns such as security requirements,
policies, missions, and regulatory compliance needs [12].
The community cloud can be managed by one of the
constituent organizations or by a third party [13].
• Hybrid cloud
A hybrid cloud combines multiple public and private
clouds through standardized technology so that
applications can be distributed across them. Hybrid
clouds can move applications and data from one cloud to
another [7]. Many enterprises apply this model by using a
public cloud for general computing while keeping
customers’ data within a private cloud [8]. Popular and
widespread hybrid cloud offerings include Amazon Virtual
Private Cloud and Skytap Virtual Lab.
• Virtual Private Cloud:
A Virtual Private Cloud (VPC) is an alternative that
addresses the limitations of both public and private
clouds. A VPC is essentially a platform running on top of
public clouds. The difference is that a VPC leverages
virtual private network (VPN) technology, which allows
service providers to design their own topology and
security settings such as firewall rules. Because a VPC
virtualizes the underlying communication network as well
as servers and applications, it is a more holistic design.
Moreover, thanks to the virtualized network layer, a VPC
gives most companies a smooth transition from a
proprietary service infrastructure to a cloud-based
infrastructure.
III. HISTORY OF CLOUD COMPUTING
• 1950s – During this decade, the word ‘cloud’ still
refers to a visible mass of condensed water vapor
floating in the atmosphere. The mainframe and time
sharing are born, introducing the concept of shared,
centralized compute resources.
• 1969 – The first working prototype of ARPANET is
launched, linking four geographically dispersed
computers over what is now known as the Internet.
• Late 1970s – The term ‘client-server’ comes into use,
defining the computing model where clients access
data and applications from a central server over a
local area network.
• 1995 – Pictures of clouds start showing up in network
diagrams, denoting anything too complicated for
non-technical people to understand.
• 1999 – Salesforce.com launches, becoming the first
company to make enterprise applications available
from a website.
• 1999 – Google launches a fledgling search service
that returns impressive results.
• 2003 – Web 2.0 is born, characterized by rich
multimedia, user-generated content and dynamic
interfaces.
• 2006 – Amazon launches Amazon Web Services
(AWS), giving users a new way to store data offsite
and rent compute cycles as a service.
• 2007 – Netflix launches its streaming service and
binge-watching is born.
• 2008 – The concept of the private cloud emerges,
viewed by enterprises as a more secure version of
the so-called ‘public cloud’.
• 2008 – Dropbox launches as a personal cloud storage
service.
• 2009 – Browser-based cloud enterprise applications
like Google Apps are introduced, revolutionizing the
market for productivity applications.
• 2010 – Open-source cloud platforms such as
OpenStack are launched.
• 2011 – Hybrid cloud emerges, combining public and
private cloud environments to the delight of
trigger-shy IT departments.
• 2011 – Microsoft’s ‘to the cloud’ commercials
launch, attempting to explain how the cloud can
benefit mere mortals.
• 2011 – Apple launches iCloud, letting people
automatically back up all content on their phones.
• 2012 – Google launches Google Drive with free cloud
storage for digital packrats.
• 2012 – In Turkey, people start to learn about cloud
computing from a Turkish politician, Binali Yildirim,
then Minister of Transport, Maritime and
Communication of Turkey. He said: “Nowadays,
there emerged something called ‘cloud system’.
Nowadays, everybody drops something in it and
takes from there what is required. That is how I
understand it; it might be a different thing. There is
no systematic ‘thing’ anymore. You stack everything
in it, everybody takes what he/she needs, however
nothing gets mixed up. You find whatever you want.
That information technology… If you ponder too
much, then you’ll get crazy! You’ll use it and benefit
from it for your work.” [34]
IV. DEPENDABILITY TREE IN CLOUD COMPUTING
Figure 3: Dependability Tree
A. Attributes
To assure high availability of cloud services, data centers
should provide the following:
• Failure-isolated zones: These are called
Availability Zones in Amazon EC2. IaaS users know
where their application instances are placed. The
geographical locations of cloud datacenters are
known as zones, and a failure in one zone is isolated
from the others. Distributing a user’s application
instances across multiple zones can therefore
increase availability.
• Automatic scale-up: This function automatically
starts and stops instances depending on load. The
customer must, however, define how to scale as
demand changes. The function is useful because it
keeps applications highly available when running
server processes fail. It is also sensible in financial
terms.
• Configurable load balancer: Dynamically configuring
the load balancer to distribute requests to a different
zone helps achieve high availability [14].
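The effect of distributing instances across failure-isolated zones can be illustrated with a short calculation. The sketch below assumes that zones fail independently and that the service stays available while at least one zone is up; both are simplifying assumptions, and the function name combined_availability is hypothetical.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Availability of a service replicated across independent zones.

    Assumes independent zone failures and that one healthy zone is
    enough to serve requests: A = 1 - (1 - a) ** n.
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be in [0, 1]")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return 1.0 - (1.0 - zone_availability) ** zones
```

Under these assumptions, two zones that are each available 99% of the time give a combined availability of about 99.99%. Correlated failures, which the isolation between zones is meant to limit, would reduce this figure.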
B. Threats
• Provider-inner faults: The prevalent methods of
recovering services from failure are redundancy,
backup, and stopping and restarting services.
• Provider-user faults: Faulty nodes may result from
network congestion, hacker attacks, browser crashes,
request time-outs, or malicious behavior.
• User-across faults: Sharing critical resources among
users may cause chaos in a cloud computing system
because of unsafe access to the resources [15].
• Datacenter hardware failures: These affect the
processor, hard disk drive, integrated circuit socket
and memory [16].
• Datacenter software faults: These lead to application
failures.
• Crash faults: System components either stop
functioning or fail to return to a correct state (e.g., a
hard disk crash).
• Byzantine faults: These malevolent faults make
system components behave arbitrarily or maliciously
and produce incorrect and inconsistent output
values.
C. Means
Fault avoidance aims to prevent faults from occurring
in the operational system by limiting the introduction
of faults during system construction. It includes fault
prevention, fault removal, and fault forecasting [3].
Fault prevention keeps possible faults from creeping
into a system before it goes operational. Fault
removal attempts to find and remove the causes of
errors. Fault avoidance thus improves the quality of
both the components and the systems. Fault
forecasting evaluates, estimates or ranks the system
behavior during fault activation.
V. FAULT TAXONOMY IN CLOUD COMPUTING
Different types of faults occur in the cloud, and the cloud
is prone to most of them. Fault-resolving mechanisms
include various fault tolerance techniques at either the task
level or the workflow level [17].
Figure 4: Fault Tolerance Taxonomy
A. Proactive Fault Tolerance
Proactive fault tolerance foresees faults before they
emerge, replaces the components that might cause them
with properly working ones, and thus prevents faults and
errors before recovery is needed. Preemptive migration and
software rejuvenation, among others, follow this policy.
• Software rejuvenation: The system is scheduled for
periodic reboots, and after every reboot it starts
from a fresh state. Applications that run for a long
time without stopping accumulate a growing risk of
failure and degraded reliability and performance;
this effect is called software aging. Rejuvenation is
a low-cost, proactive fault management technique
that periodically stops the running software and
restarts it in order to clean and refresh its internal
state [18].
• Self-healing: The failure of an application instance
running on multiple virtual machines is handled
automatically.
• Preemptive migration: Preemptive migration is a
failure-avoidance technique based on a feedback-loop
control mechanism in which an application is
constantly monitored and preventive action is taken
to avoid application failure. However, not all failures
can be anticipated and covered by preemptive
migration, so a combination of proactive and
reactive fault tolerance provides a more
sophisticated mechanism [19].
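The feedback loop behind preemptive migration can be sketched as follows. This is an illustration only, not the mechanism of [19]: it assumes the monitor produces a normalized health indicator in [0, 1] (for example an error rate), and the threshold, window size and function name migration_decisions are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


def migration_decisions(samples: Iterable[float],
                        threshold: float = 0.8,
                        window: int = 3) -> List[bool]:
    """Return, per sample, whether a preventive migration is triggered.

    A migration is triggered once the moving average of the last
    `window` samples exceeds `threshold`; until the window is full,
    no action is taken.
    """
    recent: Deque[float] = deque(maxlen=window)
    decisions: List[bool] = []
    for value in samples:
        recent.append(value)
        window_full = len(recent) == window
        decisions.append(window_full and sum(recent) / window > threshold)
    return decisions
```

Averaging over a window is a design choice that trades reaction speed for fewer unnecessary migrations caused by a single noisy sample.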
B. Reactive Fault Tolerance
Reactive fault tolerance techniques reduce the impact of
failures on a system after the failures have actually
occurred. They make the system more robust. Techniques
based on this policy include checkpoint/restart and retry
[25].
• User-defined exception handling: Users can specify
exceptions that handle task-specific failures
depending on the context of the task, and can
define custom procedures to handle these errors.
• Task resubmission: Whenever a failed task is
detected during workflow execution, this technique
re-executes the same task. The task may be
resubmitted either to the same resource or to a
different one at run time, without interrupting the
workflow of the system [21].
• Retry: The failed task is tried repeatedly on the
same cloud resources [22].
• S-Guard: A fault tolerance technique for
distributed stream processing engines (SPEs),
which serve applications such as email, online
games, e-commerce, instant messaging, and
search. It is based on rollback recovery: it
periodically checkpoints the state of stream
processing nodes and restarts failed nodes from the
last checkpoint [23].
• Replication: The availability of replicated
resources is a key requirement for building fault
tolerant systems in the cloud. Replication simply
means that several copies of an application are
executed simultaneously with the same input set
on alternative sites [24]. For instance, a proxy
server and the cache in a web browser can be
considered forms of replication. The main goal of
replication is to guarantee that at least one replica
completes the task correctly if the others fail [33].
Several replication mechanisms, such as active,
passive and semi-active replication, have been
used in cloud computing.
• Job migration: Moves a job’s state from one
machine (node) to another when a node cannot
complete or process its tasks. Tasks can be
migrated by using the HAProxy tool and load
balancing [25, 26].
• Checkpoint/Restart (C/R) recovery: C/R is the
typical technique for tolerating failures on
unreliable systems. A snapshot of the running
application is saved periodically to stable storage
so that the application can be restarted from the
latest checkpoint image after a crash [27].
Researchers have examined many types of
checkpoint strategy, but only three are widely used
in cloud computing:
- Full checkpoint: A traditional mechanism that
periodically saves the total state of the
application or system to a storage platform.
Its drawbacks are the time needed to take a
snapshot of the whole system and the large
amount of storage needed to save the whole
running state [28].
- Incremental checkpoint: The first checkpoint
is full, while subsequent checkpoints save
only the pages that have been modified. This
procedure produces a large recovery overhead
because the system must recover starting from
the first checkpoint [29].
- Hybrid checkpoint: A combination of the full
and incremental strategies, intended to
balance the checkpoint overhead against the
fault recovery overhead.
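The difference between full and incremental checkpoints can be made concrete with a small sketch. It models application state as a dictionary of named pages and uses an in-memory object in place of stable storage; the class CheckpointStore and its methods are hypothetical, and deleted pages are not tracked in this simplified version.

```python
from typing import Dict, List

State = Dict[str, int]


class CheckpointStore:
    """In-memory stand-in for stable storage (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.full: State = {}
        self.increments: List[State] = []
        self._last_seen: State = {}

    def full_checkpoint(self, state: State) -> int:
        """Save the whole state; return the number of pages written."""
        self.full = dict(state)
        self.increments = []
        self._last_seen = dict(state)
        return len(state)

    def incremental_checkpoint(self, state: State) -> int:
        """Save only pages changed since the previous checkpoint."""
        delta = {k: v for k, v in state.items()
                 if self._last_seen.get(k) != v}
        self.increments.append(delta)
        self._last_seen = dict(state)
        return len(delta)

    def restore(self) -> State:
        """Rebuild the latest state: full image, then every increment in order."""
        state = dict(self.full)
        for delta in self.increments:
            state.update(delta)
        return state
```

Each incremental checkpoint writes little, but restore has to replay every increment since the last full image, which is the recovery overhead noted above. A hybrid policy could call full_checkpoint after every few increments to bound that replay.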
VI. FAILURE DETECTOR
A failure detector is an application or system used to detect
node failures or crashes. Failure detectors are classified as
reliable or unreliable according to the results they yield. The
correctness properties of failure detectors are:
• Completeness: Every process failure should be
detected by at least one other non-faulty process.
Completeness describes the ability of a failure
detector to suspect every failed process permanently.
• Accuracy: Failure predictions should be correct;
fewer false positives mean higher accuracy. 100%
accuracy is hardly feasible: real-life failure detectors
guarantee completeness but offer only practical or
probabilistic accuracy. There is therefore always a
trade-off between completeness and accuracy.
• Speed: The time to detect a failure should be as short
as possible.
• Scale: The load should be low and evenly distributed;
each process in a group should carry a low overall
network load.
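A common way to build such a detector is with heartbeats and a timeout; the sketch below assumes that design, which the text above does not prescribe. The class HeartbeatDetector is hypothetical, and the current time is passed in explicitly so that the behavior is deterministic.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Timeout-based failure detector (an illustrative assumption)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` was alive at time `now`."""
        self._last_heartbeat[node] = now

    def suspected(self, now: float) -> Set[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return {node for node, seen in self._last_heartbeat.items()
                if now - seen > self.timeout}
```

A crashed node stops sending heartbeats and stays suspected, which gives completeness. A slow but healthy node can be suspected by mistake, so a shorter timeout improves speed at the cost of accuracy.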
VII. FEATURES OF AMAZON WEB SERVICE OVERVIEW
Figure 5: AWS structure [31]
A. Definition of AWS Features
The following service definitions are taken from [31] and
other sources; the text is used here as a citation:
• Amazon Elastic Compute Cloud (Amazon EC2) is a
web service that provides resizable compute capacity
in the cloud. You can bundle the operating system,
application software and associated configuration
settings into an Amazon Machine Image (AMI). The
AMI can then be used to provision multiple
virtualized instances, and to decommission them,
using simple web service calls to scale capacity up
and down quickly as your capacity requirements
change. You can purchase On-Demand Instances,
which are paid for by the hour; Reserved Instances,
for which you pay a low, one-time payment and
receive a lower usage rate than with an On-Demand
Instance; or Spot Instances, for which you bid for
unused capacity to reduce cost further. Instances
can be launched in one or more geographical
regions. Each region has multiple Availability Zones.
Availability Zones are distinct locations that are
engineered to be insulated from failures in other
Availability Zones and provide inexpensive, low
latency network connectivity to other Availability
Zones in the same Region. [31]
• Amazon CloudWatch is a monitoring service for
AWS cloud resources and the applications you run on
AWS. It can be used to collect and track metrics,
collect and monitor log files, set alarms, and
automatically react to changes in your AWS
resources. It can monitor AWS resources such as
Amazon EC2 instances, Amazon DynamoDB tables,
and Amazon RDS DB instances, as well as custom
metrics generated by your applications and services
and any log files your applications generate. Amazon
CloudWatch gives system-wide visibility into
resource utilization, application performance, and
operational health. These insights can be used to
react to changes and keep the application running
smoothly. [31]
• Amazon Virtual Private Cloud (Amazon VPC)
provides, within your logically isolated network,
complete control over your virtual networking
environment, including selection of your own IP
address range, creation of subnets, and configuration
of route tables and network gateways. [31]
• Amazon Relational Database Service (Amazon RDS)
provides an easy way to set up, operate and scale a
relational database in the cloud. You can launch a
DB Instance and get access to a full-featured MySQL
database. You need not worry about common
database administration tasks such as backups and
patch management, because the service handles
them [31].
• Amazon Simple Queue Service (Amazon SQS) is a
reliable, highly scalable, hosted distributed queue
that computers and other components of the system
can use to store messages [31].
• Amazon Simple Notification Service (Amazon SNS)
provides a simple way to notify applications or
people from the cloud by creating Topics and using a
publish-subscribe protocol [31].
B. AWS-specific tactics for implementing best practices:
1. Fail over gracefully using Elastic IPs: An Elastic
IP can be re-mapped dynamically and quickly, so
that we can fail over to another server group and
our traffic is routed to the new servers. This is
useful when upgrading from old to new versions
or after a failure [31].
2. Use multiple Availability Zones: Availability
Zones are conceptually like logical datacenters.
Deploying across several of them keeps our data
highly available [31].
3. Use Amazon RDS Multi-AZ deployment
functionality to directly replicate database
updates across multiple Availability Zones [31].
4. Maintain an Amazon Machine Image so that you
can restore and clone environments easily in a
different Availability Zone; maintain multiple
database replicas across Availability Zones [31].
5. Use Amazon CloudWatch to gain more visibility
and take appropriate action in case of
performance degradation or hardware failure. Set
up an Auto Scaling group to maintain a fixed
fleet size so that unhealthy Amazon EC2
instances are replaced by new ones [31].
6. Use Amazon EBS and set up a time-based job
scheduler (cron) so that incremental snapshots
are automatically uploaded to Amazon S3 and
data persists independently of your instances [31].
7. Use Amazon RDS and set the retention period
for backups so that it can perform automated
backups [31].
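Tactic 5 relies on a simple reconciliation idea: unhealthy instances are removed and new ones are launched until the fleet is back at its fixed size. The sketch below simulates that idea in plain Python. It does not call the AWS Auto Scaling API; make_launcher, reconcile_fleet and the instance identifiers are hypothetical.

```python
from itertools import count
from typing import Callable, Dict, List


def make_launcher(prefix: str = "i-new") -> Callable[[], str]:
    """Return a hypothetical launch function that yields fresh instance ids."""
    ids = count(1)
    return lambda: f"{prefix}{next(ids)}"


def reconcile_fleet(fleet: Dict[str, bool], desired_size: int,
                    launch: Callable[[], str]) -> List[str]:
    """Replace unhealthy instances so the fleet returns to `desired_size`.

    `fleet` maps instance id to a health flag and is updated in place.
    Returns the ids of the instances that were launched.
    """
    # Terminate every instance that failed its health check.
    for instance_id in [i for i, healthy in fleet.items() if not healthy]:
        del fleet[instance_id]
    # Launch replacements until the fixed fleet size is reached again.
    launched: List[str] = []
    while len(fleet) < desired_size:
        new_id = launch()
        fleet[new_id] = True
        launched.append(new_id)
    return launched
```

Running the reconciliation repeatedly is safe: when every instance is healthy and the fleet is at its desired size, nothing is launched.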
C. Specific details of AWS
As stated in [29,30,31,35,36], AWS includes the following
services, features and environments (following italics are
direct citations):
• Virtual computing environments, known as instances.
• Preconfigured templates for your instances, known as
Amazon Machine Images (AMIs), that package the
bits you need for your server (including the operating
system and additional software).
• Network firewalls built into Amazon VPC, and web
application firewall capabilities in AWS WAF, let you
create private networks and control access to your
instances and applications.
• Encryption in transit with TLS across all services.
• Connectivity options that enable private, or
dedicated, connections from your office or
on-premises environment.
• Ability to deploy DDoS mitigation technologies as
part of your auto-scaling or content delivery strategy.
• AWS Identity and Access Management (IAM) lets you
define individual user accounts with permissions
across AWS resources.
• Data encryption capabilities available in AWS
storage and database services, such as EBS, S3,
Glacier, Oracle RDS, SQL Server RDS, and Redshift.
• Flexible key management options, including AWS
Key Management Service, allowing you to choose
whether to have AWS manage the encryption keys or
enable you to keep complete control over your keys.
 Dedicated, hardware-based cryptographic key
storage using AWS CloudHSM, allowing you to
satisfy compliance requirements.
• A security assessment service, Amazon Inspector,
that automatically assesses applications for
vulnerabilities or deviations from best practices,
including impacted networks, OS, and attached
storage.
• Deployment tools to manage the creation and
decommissioning of AWS resources according to
organization standards.
• Inventory and configuration management tools,
including AWS Config, that identify AWS resources
and then track and manage changes to those
resources over time.
• Template definition and management tools, including
AWS CloudFormation to create standard,
preconfigured environments.
• Deep visibility into API calls through AWS
CloudTrail, including who, what, when, and from
where calls were made.
• AWS complies with many standards and
certifications: SOC 1/ISAE 3402, SOC 2, SOC 3,
FISMA, DIACAP, FedRAMP, PCI DSS Level 1,
ISO 9001, ISO 27001, and ISO 27018.
• Log aggregation options, streamlining investigations
and compliance reporting.
• Alert notifications through Amazon CloudWatch
when specific events occur or thresholds are
exceeded.
• Several purchasing options are available to suit
customer requirements: On-Demand Instances,
Reserved Instances, Spot Instances, and Dedicated
Hosts.
• AWS Multi-Factor Authentication for privileged
accounts, including options for hardware-based
authenticators.
• AWS Directory Service allows you to integrate and
federate with corporate directories to reduce
administrative overhead and improve end-user
experience.
• With AWS Direct Connect, you can establish a
dedicated network connection between AWS and your
datacenter, office, or colocation environment. In
many cases, this can provide both lower costs and a
higher level of service than Internet-based
connections.
• Amazon S3 and Amazon Glacier automatically
replicate data across multiple data centers and are
designed to deliver 99.999999999% durability.
• Various configurations of CPU, memory, storage,
and networking capacity for your instances, known
as instance types.
• Secure login information for your instances using key
pairs (AWS stores the public key, and you store the
private key in a secure place).
• Storage volumes for temporary data that's deleted
when you stop or terminate your instance, known as
instance store volumes.
• Persistent storage volumes for your data using
Amazon Elastic Block Store (Amazon EBS), known as
Amazon EBS volumes.
• Each AZ has independent infrastructure (power,
cooling, network, and security), so the AZs are
isolated from one another. Therefore, a failure in one
AZ will not affect the others.
• Multiple physical locations for your resources, such
as instances and Amazon EBS volumes, known as
regions and Availability Zones.
Figure 6: Availability Zones in AWS [36]
• A firewall that enables you to specify the protocols,
ports, and source IP ranges that can reach your
instances using security groups.
• Static IP addresses for dynamic cloud computing,
known as Elastic IP addresses.
• Metadata, known as tags, that you can create and
assign to your Amazon EC2 resources.
• Virtual networks you can create that are logically
isolated from the rest of the AWS cloud, and that you
can optionally connect to your own network, known
as virtual private clouds (VPCs).
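To make the 99.999999999% durability figure quoted for Amazon S3 and Amazon Glacier in the list above more concrete, the following Python sketch converts it into an expected number of lost objects. It assumes, as a simplification not stated in the list itself, that the figure is an independent per-object survival probability over one year; the function names are hypothetical. Exact fractions are used so that floating-point rounding does not distort such a small loss rate.

```python
from fractions import Fraction

# Durability quoted in the list above: 99.999999999% ("eleven nines").
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year.

    Assumption: DURABILITY is an independent per-object, per-year
    survival probability.
    """
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(expected_annual_losses(10_000_000))   # 1/10000
print(years_per_expected_loss(10_000_000))  # 10000
```

Under these assumptions, a customer storing ten million objects would expect to lose on average one object every 10,000 years, which is why replication across multiple data centers is treated as the baseline for durable storage.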
CONCLUSION
In conclusion, cloud computing offers small and medium-sized
business enterprises properties such as cost-effective
infrastructure resources, managed infrastructure, availability,
and scalability. In technical terms, cloud services are
commercial offerings and they do not guarantee the continuity
of the customer's applications. What they do guarantee is the
availability of the infrastructure and components offered to the
customer. Consequently, to ensure the continuity of the
customer's system, fault tolerance (FT) should be deployed in
the cloud. Analyzing these FT methods and understanding
their limitations are our eventual goals, with a view to building
an FT method that manages all fault types in their different
aspects. In this study, we examined the properties of Amazon
Web Services and showed how they tolerate faults.
REFERENCES
[1] M. Armbrust, (2009), Above the Clouds: A Berkeley View of Cloud
Computing, UC Berkeley Technical Report [online]. Available from:
https://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf/
[2] P. Mell, T. Grance, (2011), The NIST Definition of Cloud
Computing [online]. Available from:
http://faculty.winthrop.edu/domanm/csci411/Handouts/NIST.pdf/
[3] J. Geelan, (2009), Twenty One Experts Define Cloud
Computing [online]. Available from:
http://virtualization.sys-con.com/node/612375/
[4] Diversity Limited, (2011), Revolution Not Evolution How Cloud
Computing Different from Traditional IT and Why it Matters [online].
Available from:
http://userpages.umbc.edu/~dgorin1/451/cloud/Revolution_Not_Evolution-Whitepaper.pdf/
[5] L. Youseff, M. Butrico, D. Da Silva, Toward a Unified Ontology of
Cloud Computing [online]. Available from:
https://storagemadeeasy.com/files/8f047da34a2d3a3528136ba8b59a465d.pdf/
[6] Intel, (2013), Virtualization and Cloud Computing [online]. Available
from:
http://www.intel.com/content/dam/www/public/us/en/documents/guides/cloud-computing-virtualization-building-private-iaas-guide.pdf/
[7] L. Krutz, R. Vines, (2010), Cloud Security: a Comprehensive Guide to
Secure Cloud Computing [online]. Available from:
https://drive.google.com/file/d/0B-W0l4MahMzLVVp0UVgyNnh5bnM/
[8] M. Williams, (2010), A Quick Start Guide to Cloud Computing
[online]. Available from:
https://23510310jarinfo.files.wordpress.com/2011/09/a-quick-start-guide-to-cloud-computing.pdf/
[9] C. Barnatt, (2010), A Brief Guide to Cloud Computing [online].
Available from:
http://www.explainingcomputers.com/cloud/BGT_Cloud_Computing_Extract.pdf/
[10] B. Furht, A. Escalante, (2011), Handbook of Cloud Computing
[online]. Available from:
https://studytm.files.wordpress.com/2014/03/hand-book-of-cloud-computing.pdf/
[11] Sun Microsystems, (2009), Introduction to Cloud Computing
Architecture [online]. Available from:
https://java.net/jira/secure/attachment/29265/CloudComputing.pdf/
[12] Dialogic Corporation, (2010), Introduction to Cloud Computing [online].
Available from:
http://www.dialogic.com/~/media/products/docs/whitepapers/12023-cloud-computing-wp.pdf/
[13] B. Sosinsky, (2011), Cloud Computing Bible [online]. Available from:
http://cs.ecust.edu.cn/~yhq/course_files/cloud/Cloud%20Computing%20Bible.pdf/
[14] F. Machida, E. Andrade, D. S. Kim, K. S. Trivedi, (2011), Candy:
Component-based Availability Modeling Framework for Cloud Service
Management Using SysML [online]. Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6076779/
[15] C. Gong, J. Liu, Q. Zhang, H. Chen, Z. Gong, (2010), The
Characteristics of Cloud Computing [online]. Available from:
http://www.postdm.post.ir/_ITCenter/Documents/TheCharacteristicsofCloudComputing_20140722_154207.pdf/
[16] A. Amal Ganesh, M. Sandhya, S. Shankar, (2014), Study on Fault
Tolerance methods in Cloud Computing [online]. Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6779432/
[17] Z. Amin, N. Sethi, H. Singh, (2015), Review on Fault Tolerance
Techniques in Cloud Computing [online]. Available from:
http://research.ijcaonline.org/volume116/number18/pxc3902768.pdf
[18] F. Machida, D. S. Kim, J. S. Park, K. S. Trivedi, (2008), Toward
Optimal Virtual Machine Placement and Rejuvenation Scheduling in a
Virtualized Data Center [online]. Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5355515/
[19] C. Engelmann, G. R. Vallee, T. Naughton, S. L. Scott, (2009),
Proactive Fault Tolerance Using Preemptive Migration [online].
Available from:
http://www.christian-engelmann.info/publications/engelmann09proactive.pdf/
[20] K. Plankensteiner, R. Prodan, T. Fahringer, (2009), A New Fault
Tolerance Heuristic for Scientific Workflows in Highly Distributed
Environments Based on Resubmission Impact [online]. Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5380852/
[21] A. Bala, I. Chana, (2012), Fault Tolerance- Challenges, Techniques and
Implementation in Cloud Computing [online]. Available from:
https://www.researchgate.net/publication/266525159_Fault_Tolerance-Challenges_Techniques_and_Implementation_in_Cloud_Computing/
[22] Y. Kwon, M. Balazinska, A. Greenberg, (2008), Fault Tolerant
Stream Processing using a Distributed, Replicated File System [online].
Available from: http://goo.gl/vzhK6l/
[23] S. M. Ghoreyshi, (2013), Energy-Efficient Resource Management of
Cloud Datacenters Under Fault Tolerance Constraints [online]. Available
from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6604493/
[24] S. Lin, M. Huang, K. Lai, K. Huang, (2008), Design and
Implementation of Job Migration Policies in P2P Grid Systems [online].
Available from:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4780655/
[25] P. K. Patra, H. Singh, G. Singh, (2013), Fault Tolerance Techniques
and Comparative Implementation in Cloud Computing [online]. Available
from:
https://www.researchgate.net/publication/258789870_Fault_Tolerance_Techniques_and_Comparative_Implementation_in_Cloud_Computing/
[26] Y. M. Essa, (2016), A Survey of Cloud Computing Fault Tolerance:
Techniques and Implementation [online]. Available from:
http://www.ijcaonline.org/research/volume138/number13/essa-2016-ijca-909055.pdf/
[27] S. Gokuldev, M. Valarmathi, (2013), Fault Tolerant System for
Computational and Service Grid [online]. Available from:
http://www.ijeit.com/vol%202/Issue%2010/IJEIT1412201304_47.pdf/
[28] R. Garg, A. K. Singh, (2011), Fault Tolerance in Grid Computing:
State of the Art and Open Issues [online]. Available from:
http://airccse.org/journal/ijcses/papers/0211cses07.pdf/
[29] Amazon, (2015), Amazon Web Services: Overview of Security
Processes [online]. Available from:
https://d0.awsstatic.com/whitepapers/aws-security-whitepaper.pdf/
[30] Amazon, (2015), Amazon Web Services: Overview of Amazon Web
Services [online]. Available from:
https://d0.awsstatic.com/whitepapers/aws-overview.pdf/
[31] Amazon, (2011), Architecting for the Cloud: Best Practices [online].
Available from:
https://media.amazonwebservices.com/AWS_Cloud_Best_Practices.pdf
[32] L. Youseff, M. Butrico, D. Da Silva, Toward a Unified Ontology of
Cloud Computing [online]. Available from:
https://storagemadeeasy.com/files/8f047da34a2d3a3528136ba8b59a465d.pdf/
[33] P. Latchoumy, P. S. A. Khader, (2011), Survey on Fault Tolerance in
Grid Computing [online]. Available from:
http://airccse.org/journal/ijcses/papers/1111ijcses07.pdf/
[34] Minister of Transport, Maritime and Communication of Turkey (Binali
Yildirim), (2012), speech about the cloud, 'the cloud' [online]. Available
from: https://www.youtube.com/watch?v=10UZwW6563E/
[35] Amazon documents website, http://docs.aws.amazon.com/
[36] Amazon AWS website, http://aws.amazon.com/

Fault Tolerance in AWS Distributed Cloud Computing

  • 1. Fault Tolerance in Amazon Web Service (AWS) Caner Kaya Computers and System Engineering Tallinn University of Technology Tallinn, Estonia [email protected] Abstract— The cloud computing enable information technologies solutions by using the visual machines to provide resource-sharing and using on demand basis; so within this complex, this area is becoming more attractive for researching. Upon the rapid development of these technologies, the fault tolerance of cloud computing has become one of the most important topic for information technologies. This requirement has become forefront since, this system needs reliability and must be ready all the time. This case-study, review the techniques that protect the cloud computing and user systems from process fault. One of the indications is as shown below, that, the cloud computing is prone to create faults. The main goals of the fault tolerance are to protect financial loses, to achieve the restoration of the system. The case study has review the scenario that the fault, repetitions could be solved by checkpoints and back-ups. The Amazon AWS is shown asan example for the fault-tolerance. Keywords- Cloud Computing; Fault Tolerance; Dependability ; Availability ; Redundancy; Human Factor: Replication ;Amazon Web Services. III... INTRODUCTION That processing and storage technologies have developed rapidly and Internet has been successful made computing resources more affordable, more powerful and easier to access compared to the past. Thanks to this technological improvement, a new computing model, which is called cloud computing, could be developed. In cloud computing, resources such as CPU and storage are presented to users as general services to be leased through internet or upon demand. 
In the environment of cloud computing, service provider has two conventional functions: infrastructure providers regulate cloud platforms and rent resources within the frame of a user-based price tariff and service providers lease resources from one or more infrastructure providers to provide service for end users. Over the recent years, the development of cloud computing altered Information Technology Industry (IT) considerably, thus major companies like Google, Amazon and Microsoft cloud platforms seek the ability to provide more powerful and reliable cloud platforms with lower costs, and business enterprises want to restructure their existing business models to benefit from such a new phenomenon. In fact, as it can be seen below, many fascinating properties presented by cloud computing attract business owners.  No up-front investment: A pricing model called pay- as-you-go is used by cloud computing. In order to obtain interest from cloud computing, no infrastructure investment is required from the service provider. The method is simply leasing needed resources form the cloud and paying only for the usage. Lower operating cost: In a cloud environment, it is possible to designate resources more than one time based on the demand. Thus, no provision capacities are required for the peak load anymore. Thus, in case of low demand, resources can be released and operating costs can be lowered.  Highly scalable: Large amount of resources is pulled from data centers provided by infrastructure providers – this gives straightforward and easy access to the resources. The service is expandable to large scales – it allows to handle rapid increase in service demands (flash-crowd effect etc.) This model is often referred as surge computing [1]  Easy access: Clouds normally host web-based services. Thus, more than one device with an Internet connection can easily access these services. 
Even smart phones and PDAs are suitable for providing access as well as desktop and laptop computers. Reducing business risks and maintenance expenses: As infrastructures for the service are outsourced to the clouds, infrastructure providers take over business risks like hardware failure and with more expertise and equipment, service providers are more likely to remove the risks. Moreover, expenses like hardware maintenance and personnel training costs can be reduced by the business owner. On the other hand, while cloud computing provides many possibilities for IT Industry, some difficulties unique to its structure, which should be carefully handled, might be encountered. In this study, we examine a cloud computing research with key concepts, properties of architecture, cutting edge practices and drawbacks of the research.
  • 2. IIIIII... TECHNOLOGY OVERVIEW AAA... Definition of Cloud Computing Today, cloud computing has become a wide-spread phenomenon. Cloud computing can be defined as a new practice to use information technology as a Internet-based service. Perhaps, there may be different views regarding cloud computing. In fact, every individual engaging in different aspects of cloud computing may define it differently. Following are different definitions of cloud computing by major companies, organizations and individuals. “National Institute of Standards and Technology” (NIST) team has a more precise and technical definition: “Cloud computing is a model for enabling ubiquitous, convenient, on- demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” [2] Irving Wladawsky Berger (IBM) definition : "When virtualizing applications to be used by people who care nothing about computers or technology - as is mostly the case with Clouds - the key thing we want to virtualize or hide from the user is complexity. Most people want to deal with an application or a service, not software. ... The more intelligent we want [computers and computer applications] to be - that is, intuitive, exhibiting common sense and not making us have to constantly take care of them - the more smart software it will take. But with cloud computing, our expectation is that all that software will be virtualized or hidden from us and taken care of by systems and/or professionals that are somewhere else - out there in The Cloud." [3] BBB... 
Cloud Computing Service Models Figure 1: three main cloud service models There are three main service models:  SaaS – Software-as-a-Service “National Institute of Standards and Technology” (NIST) definition: “The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure . The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user specific application configuration settings.” [5]  IaaS – Infrastructure-as-a-Service: IaaS is a model in which hardware, software, servers and other elements of infrastructure are hosted by a third-party provider in support of users. User applications are also hosted by IaaS providers and they take over tasks such as maintenance of the system, back up and resiliency planning. Highly scalable resources provided by IaaS platforms can be adapted according to demand. Thus, IaaS is appropriate for temporary works with sudden changes or experimental works. Management task automation, dynamic scaling, virtualization of desktop and services based on policy are among other properties of IaaS environment. Pricing of IaaS is hourly, weekly or monthly based on the usage. Some providers also charge customers based on the amount of virtual machine space they use. Thanks to this pay- as-you-go pricing model, in-house hardware and software capital and costs are removed. On the other hand, if IaaS environments of users aren’t supervised carefully, there may be charges for unauthorized services. Amazon Web Services (AWS), Windows Azure, Google Compute Engine, Rackspace Open Cloud, and IBM SmartCloud Enterprise are among major IaaS providers. 
 PaaS – Platform-as-a-Service is a cloud computing model delivering applications over the Internet. In a PaaS model, a new application hardware and software tools required for application development are delivered to users by a cloud provider. Hardware and software are hosted in the infrastructure of a PaaS provider. Therefore, PaaS users don’t need to have in house hardware and software and still can develop and run applications. Also, some variant services are available from three main service models:
  • 3.  DaaS – Desktop-as-a-Service has a multi-tenancy architecture and the service is purchased on a subscription basis. In the DaaS delivery model, the service provider manages the back-end responsibilities of data storage, backup, security and upgrades. Typically, the customer's personal data is copied to and from the virtual desktop during logon/logoff and access to the desktop is device, location and network independent.  XaaS – Everthing-as-a-Service was first developed as software-as-a-service (SaaS) then expanded in time and includes services such as infrastructure-as-a- service, platform-as-a-service, storage-as-a-service, desktop-as-a-service, disaster recovery-as-a-service, and even nascent operations like marketing-as-a- service and healthcare-as-a-service.  CaaS – Communication-as-a-Service, allows the consumer to utilize Enterprise level VoIP, VPNs, PBX and Unified Communications free from expensive investment of purchase.  MaaS – Monitoring-as-a-Service is online state monitoring, continuously tracking definite situation of applications, networks, systems, instances or any element deployable within the cloud. CCC... Definition of Virtualization Intel definition is; “Virtualization abstracts compute resources—typically as virtual machines (VMs)—with associated storage and networking connectivity. The cloud determines how those virtualized resources are allocated, delivered, and presented. Virtualization is not necessary to create a cloud environment, but it enables rapid scaling of resources in a way that nonvirtualized environments find hard to achieve. “[6] DDD... Service Attributes  On-demand self-services Consumers can access cloud resources such as server or storage by using websites or web services interface whenever they need them. Automatically, they can order, customize, pay without interaction with the cloud provider’s personnel. 
[7]  Broad network access Because cloud computing services are web-based technology, a consumer can gain resources over the Internet using standard methods such as heterogeneous OSs, or thick and thin platform as laptops, and smart phones [8]. Therefore cloud computing is device independent. [9]  Resource pooling Cloud computing supports multi-tenant resource usage. In other words, a pool of provider’s computing resources is shared among a large number of consumers. These virtual resources allocate and reallocate relying on consumer demand [2]. In fact, resource pooling in cloud computing based on abstraction concept, this mean that the exact location of resources is not stated (e.g. VMs, processing, memory, storage, or connectivity). However, it may be able to specify location at a higher level of abstraction as country name, state, or datacenter. [2]  Rapid elasticity Resources can be supplied or released quickly and elastically so that consumers capable to scale up/add or scale down resources in an automatic or manual method, as well as in various quantities and at different time [7]; much or less electricity required from the power grid. Figure 2 : Automated elasticity in AWS[32]  Measured services A metered system is used which makes consumers pay-per- use billing, for example the amount of storage billed by the day, the amount of processing power billing by the hour, as well as network I/O bandwidth and number of transactions. [10] EEE... Deployment models  Private cloud “internal cloud” For the unique use of a specific association or business that provides full authority over data, maintenance, security, and quality of services, a private cloud is built. A business might be built and managed its own private cloud in-house or the operation may be fully done by third party providers on the premises [12]. 
Thus, a pool of computing resources across applications, departments or business units can be shared by enterprises and projects Unlike the public cloud, this model needs considerable up-front costs, continuous maintenance, hardware,
  • 4. software, datacenter, and internal expertise also [8]. Therefore, private clouds are considered as the secure style of IaaS. [9]  Public Cloud: Services and utilities are available for the general public, and are used in pay-per-use consuming model. Public clouds are run by the third party providers and their physical infrastructure and resources are often hosted off-premises from consumers. Such cloud reduces consumer risk and cost throughout, providing elastic and even provisional extension to enterprise infrastructure [11]. Common examples of public clouds are Amazon AWS, such as EC2, S3, Microsoft Azure, and Rackspace Cloud Suite.  Community cloud A number of organizations share the cloud infrastructure to serve a common function or purpose alternatively share similar concerns such as security requirements, policies, missions, regulatory compliance needs, and so on [12]. Additionally, a constituent or a third party can manage the community cloud. [13]  Hybrid cloud Finally, combining multiple public and private cloud models by standardized technology to distribute applications across them is known as hybrid cloud. The fact that hybrid clouds have the ability to spread applications and data from one cloud to another [7]. This model can be applied by many enterprises that use public cloud for general computing while customers’ data is kept within a private cloud [8]. Most popular and wide-spread hybrid clouds are Amazon Virtual Private Cloud, Skylap Virtual La band they offer hybrid cloud services.  Virtual Private Cloud: Virtual Private Cloud (VPC) can be defined as an alternative solution for the restrictions of public and private cloud. A VPC is a platform running on top of public clouds, basically. The difference of VPS is that VPC leverages virtual private network (VPN) technology allowing customized topology designs and security settings as firewall rules for service providers. 
Because a VPC virtualizes the underlying communication network as well as servers and applications, it is a more holistic design. In addition, thanks to the virtualized network layer, a VPC offers most companies a smooth transition from a proprietary service infrastructure to a cloud-based infrastructure.
III. HISTORY OF CLOUD COMPUTING
• 1950s – During this decade, the word 'cloud' still refers to a visible mass of condensed water vapor floating in the atmosphere. The mainframe and time sharing are born, introducing the concept of shared, centralized compute resources.
• 1969 – The first working prototype of ARPANET is launched, linking four geographically dispersed computers over what is now known as the Internet.
• Late 1970s – The term 'client-server' comes into use, defining the computing model where clients access data and applications from a central server over a local area network.
• 1995 – Pictures of clouds start showing up in network diagrams, denoting anything too complicated for non-technical people to understand.
• 1999 – Salesforce.com launches, becoming the first company to make enterprise applications available from a website.
• 1999 – Google launches a fledgling search service that returns impressive results.
• 2003 – Web 2.0 is born, characterized by rich multimedia, user-generated content, and dynamic interfaces.
• 2006 – Amazon launches Amazon Web Services (AWS), giving users a new way to store data offsite and rent compute cycles as a service.
• 2007 – Netflix launches its streaming service and binge-watching is born.
• 2008 – The concept of the private cloud emerges, viewed by enterprises as a more secure version of the so-called 'public cloud'.
• 2008 – Dropbox launches as a personal cloud storage service.
• 2009 – Browser-based cloud enterprise applications like Google Apps are introduced, revolutionizing the market for productivity applications.
• 2010 – Open-source cloud platforms such as OpenStack launch.
• 2011 – Hybrid cloud emerges, combining public and private cloud environments to the delight of trigger-shy IT departments.
• 2011 – Microsoft's 'to the cloud' commercials launch, attempting to explain how the cloud can benefit mere mortals.
• 2011 – Apple launches iCloud, letting people automatically back up all the content on their phones.
• 2012 – Google launches Google Drive with free cloud storage for digital packrats.
• 2012 – In Turkey, people start to learn about cloud computing from Binali Yildirim, then Turkey's Minister of Transport, Maritime and Communication. He said: "Nowadays, there emerged something called 'cloud system'. Nowadays, everybody drops something in it and takes from there what is required. I understand it like that; it might be a different thing. There is no systematic 'thing' anymore. You stack everything in it, everybody takes what he/she needs, however nothing gets mixed up. You find whatever you want. That information technology… If you ponder too much, then you'll get crazy! You'll use it and benefit from it for your work." [34]
IV. DEPENDABILITY TREE IN CLOUD COMPUTING
Figure 3: Dependability Tree
A. Attributes
To assure high availability of cloud services, data centers should provide the following:
• Failure-isolated zones: These are called Availability Zones in Amazon EC2. IaaS users know where their application instances are placed. These geographical locations of cloud datacenters are known as zones, and a failure in one zone is isolated from the others. Distributing a user's application instances across multiple zones can therefore increase availability.
• Automatic scale-up: This function automatically starts and stops instances depending on load. The customer, however, must determine how to scale in response to changing demand. The function supports high availability of applications when running server processes fail, and it is also sensible in financial terms.
• Configurable load balancer: Dynamically configuring the load balancer to distribute requests to a different zone helps achieve high availability [14].
B. Threats
• Provider-inner faults: Prevalent methods for recovering services from these failures are redundancy, backup, or stopping and restarting services.
• Provider-user: Faulty nodes may result from network congestion, hacker attacks, browser crashes, request time-outs, or malicious behavior.
• User-across: Sharing critical resources among users may cause chaos in a cloud computing system because of unsafe access to the resources [15].
• Datacenter hardware failures: These affect the processor, hard disk drive, integrated circuit socket, and memory [16].
• Datacenter software faults: These lead to application failures.
• Crash faults: System components either stop functioning or fail to return to a correct state (e.g. a hard disk crash).
• Byzantine faults: These malevolent faults make system components behave arbitrarily or maliciously and produce incorrect and inconsistent output values.
C. Means
Fault avoidance aims to prevent faults from occurring in the operational system by limiting the introduction of faults during system construction. It includes fault prevention, fault removal, and fault forecasting [3]. Fault prevention removes possible faults that creep into a system before it goes operational. Fault removal attempts to find and remove the causes of errors. Fault avoidance thus contributes to improving the quality of both the components and the systems. Fault forecasting evaluates, estimates, or ranks the system behavior during fault activation.
V. FAULT TAXONOMY IN THE CLOUD COMPUTING
There are different types of faults, and the cloud is prone to most of them. Fault-resolving mechanisms include various fault tolerance techniques at either the task level or the workflow level [17].
Figure 4: Fault Tolerance Taxonomy
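The difference between the crash faults and Byzantine faults listed under Threats can be made concrete with a small replication sketch. The Python below is illustrative only and is not taken from the cited sources: the function name majority_vote and the convention that a crashed replica reports None are assumptions made for this example. It shows how running the same task on several replicas and accepting only the answer returned by a strict majority masks both a silent replica and a replica that returns a wrong value, provided fewer than half of the replicas are faulty. It is not a full Byzantine agreement protocol.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of all replicas.

    Assumption for this sketch: a crashed replica contributes None
    (it never answers), while a Byzantine replica contributes an
    arbitrary, possibly wrong, value.
    """
    answers = [value for value in outputs if value is not None]
    if not answers:
        return None  # every replica crashed
    value, count = Counter(answers).most_common(1)[0]
    # Accept only if more than half of ALL replicas agree, so that
    # crashed replicas still count against the quorum.
    return value if count > len(outputs) // 2 else None


if __name__ == "__main__":
    print(majority_vote([42, 42, 42]))    # all replicas healthy
    print(majority_vote([42, None, 42]))  # one crash fault is masked
    print(majority_vote([42, 7, 42]))     # one wrong (Byzantine) value is masked
    print(majority_vote([42, 7, None]))   # no majority, so the result is rejected
```

With three replicas the vote tolerates one faulty replica of either kind; once a crash and a wrong value occur together, no value reaches a majority and the caller has to fall back on another technique such as resubmitting the task.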
A. Proactive fault tolerance
Proactive fault tolerance foresees faults before they emerge and replaces the components that might cause them with properly working ones, preventing faults and errors before recovery is needed. Preemptive migration, software rejuvenation, and similar techniques follow this policy.
• Software rejuvenation: The system is scheduled for periodic reboots, and with every reboot it starts from a fresh state. Applications that run for a long period without stopping face a growing risk of failure and decreasing reliability and performance, a phenomenon called software aging. Rejuvenation is a low-cost, proactive technique for fault management: it periodically stops the running software and restarts it in order to clean and refresh its internal state [18].
• Self-healing: The failure of an application instance running on multiple virtual machines is handled automatically.
• Preemptive migration: This failure-avoidance technique is based on a feedback-loop control mechanism in which an application is constantly monitored and preventive action is taken to avoid application failure. However, not all failures can be anticipated and covered by preemptive migration, so a combination of proactive and reactive fault tolerance provides a more sophisticated mechanism [19].
B. Reactive fault tolerance
Reactive fault tolerance techniques reduce the impact of failures on a system after the failures have actually occurred. They provide robustness to a system. Techniques based on this policy include checkpoint/restart and retry [25].
• User-defined exception handling: Users can specify exceptions that handle task-specific failures depending on the context of the task, and can define custom procedures to handle these errors.
• Task resubmission: Whenever a failed task is detected, the same task is re-executed; the technique is used during the workflow execution phase. The task may be resubmitted either to the same resource or to a different one at run time, without interrupting the workflow of the system [21].
• Retry: The failed task is retried repeatedly on the same cloud resource [22].
• S-Guard: A fault tolerance technique for distributed stream processing engines (SPEs), which are used in applications such as email, online games, e-commerce, instant messaging, and search. It relies on rollback recovery: it periodically checkpoints the state of stream processing nodes and restarts failed nodes from the last checkpoint [23].
• Replication: The availability of replicated resources is a key requirement for building fault-tolerant systems in the cloud. Put simply, replication means that several copies of an application with the same input set are executed simultaneously at alternative sites [24]. For instance, a proxy server and the cache in a web browser can be considered forms of replication. The main goal of replication is to guarantee that at least one replica completes the task correctly in case the others fail [33]. Several replication mechanisms, such as active, passive, and semi-active replication, have been used in cloud computing.
• Job migration: Moves a job's state from one machine (node) to another when the node cannot completely execute or process its tasks. Tasks can be migrated using the HAProxy tool and load balancing [25, 26].
• Checkpoint/restart (C/R) recovery: C/R is the typical technique for tolerating failures on unreliable systems. A snapshot of the running application is saved periodically to stable storage so that the application can be restarted from the latest checkpoint image in case of a crash [27]. Researchers have examined many types of checkpoint strategy to date, but only three are widely used in cloud computing:
- Full checkpoint: A traditional mechanism that periodically saves the total state of the application or the system to a storage platform. Its drawbacks are the time consumed in taking a snapshot of the whole system and the large amount of storage consumed in saving the whole running state [28].
- Incremental checkpoint: The first checkpoint is full, while subsequent checkpoints save only the pages that have been modified. This procedure incurs a large recovery overhead because the system must recover starting from the first checkpoint [29].
- Hybrid checkpoint: A combination of the full and incremental strategies, intended to achieve a balance between the checkpoint overhead and the fault recovery overhead.
VI. FAILURE DETECTOR
A failure detector is an application or system used to detect node failures or crashes. Failure detectors can be classified as reliable or unreliable according to the results they yield. The correctness properties of failure detectors are:
• Completeness: Every process failure should be detected by at least one non-faulty process. Completeness describes the ability of a failure detector to suspect every failed process permanently.
• Accuracy: Failure reports should be correct; the fewer the false positives, the higher the accuracy. 100% accuracy is hardly feasible. Real-life failure detectors guarantee completeness, but their accuracy is imperfect, whether in practical or probabilistic terms. There is therefore always a trade-off between completeness and accuracy.
• Speed: The time taken to detect a failure should be as short as possible.
• Scale: The load should be low and equally distributed: each process in a group should carry a low overall network load.
VII. FEATURES OF AMAZON WEB SERVICES: OVERVIEW
Figure 5: AWS structure [31]
A. Definition of AWS features
The following service definitions are taken from [31] and other sources; the text is used here as a citation:
• Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. You can bundle the operating system, application software, and associated configuration settings into an Amazon Machine Image (AMI). The AMI can then be used to provision multiple virtualized instances, as well as to decommission them, using simple web service calls, so that capacity scales up and down quickly as your capacity requirements change.
You can purchase On-Demand Instances, which are paid for by the hour; Reserved Instances, for which you make a low one-time payment and receive a lower usage rate than with an On-Demand Instance; or Spot Instances, where you bid for unused capacity to reduce cost further. Instances can be launched in one or more geographical regions. Each region has multiple Availability Zones. Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and that provide inexpensive, low-latency network connectivity to other Availability Zones in the same region [31].
• Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. It can be used to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources. With Amazon CloudWatch you can monitor AWS resources such as Amazon EC2 instances, Amazon DynamoDB tables, and Amazon RDS DB instances, as well as custom metrics generated by your applications and services and any log files your applications generate. Amazon CloudWatch gives system-wide visibility into resource utilization, application performance, and operational health. You can use these insights to react and keep the application running smoothly [31].
• Amazon Virtual Private Cloud (Amazon VPC) provides a logically isolated network in which you have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways [31].
• Amazon Relational Database Service (Amazon RDS) provides an easy way to set up, operate, and scale a relational database in the cloud. You can launch a DB Instance and get access to a full-featured MySQL database.
You need not worry about common database administration tasks such as backups and patch management; the service includes these features [31].
• Amazon Simple Queue Service (Amazon SQS) is a reliable, highly scalable, hosted distributed queue that computers and other components of the system can use to store messages [31].
• Amazon Simple Notification Service (Amazon SNS) provides a simple way to notify applications or people from the cloud by creating Topics and using a publish-subscribe protocol [31].
B. AWS-specific tactics for implementing best practices:
1. Fail over carefully using Elastic IPs: An Elastic IP lets us dynamically and quickly remap and fail over to another group of servers so that our traffic is routed to the new servers. This is useful when upgrading from old to new versions or in case of failures [31].
2. Use multiple Availability Zones: Availability Zones are conceptually like logical datacenters. Deploying across them gives our data high availability [31].
3. Use Amazon RDS Multi-AZ deployment functionality to directly replicate database updates across multiple Availability Zones [31].
4. Maintain an Amazon Machine Image so that you can restore and clone environments very easily in a different Availability Zone; maintain multiple database slaves across Availability Zones [31].
5. Utilize Amazon CloudWatch to get more visibility and take appropriate action in case of performance degradation or hardware failure. Set up an Auto Scaling group to maintain a fixed fleet size so that it replaces unhealthy Amazon EC2 instances with new ones [31].
6. Use Amazon EBS and set up a time-based job scheduler (cron) so that incremental snapshots are automatically uploaded to Amazon S3 and the data is independent of your instances [31].
7. Use Amazon RDS and set the retention period for backups so that it can perform backups by itself [31].
C. Specific details of AWS
As stated in [29, 30, 31, 35, 36], AWS includes the following services, features, and environments (the items below are direct citations):
• Virtual computing environments, known as instances.
• Preconfigured templates for your instances, known as Amazon Machine Images (AMIs), that package the bits you need for your server (including the operating system and additional software).
• Network firewalls built into Amazon VPC, and web application firewall capabilities in AWS WAF, let you create private networks and control access to your instances and applications.
• Encryption in transit with TLS across all services.
• Connectivity options that enable private, or dedicated, connections from your office or on-premises environment.
• Ability to deploy DDoS mitigation technologies as part of your auto-scaling or content delivery strategy.
• AWS Identity and Access Management (IAM) lets you define individual user accounts with permissions across AWS resources.
• Data encryption capabilities available in AWS storage and database services, such as EBS, S3, Glacier, Oracle RDS, SQL Server RDS, and Redshift.
• Flexible key management options, including AWS Key Management Service, allowing you to choose whether to have AWS manage the encryption keys or enable you to keep complete control over your keys.
• Dedicated, hardware-based cryptographic key storage using AWS CloudHSM, allowing you to satisfy compliance requirements.
• A security assessment service, Amazon Inspector, that automatically assesses applications for vulnerabilities or deviations from best practices, including impacted networks, OS, and attached storage.
• Deployment tools to manage the creation and decommissioning of AWS resources according to organization standards.
• Inventory and configuration management tools, including AWS Config, that identify AWS resources and then track and manage changes to those resources over time.
• Template definition and management tools, including AWS CloudFormation, to create standard, preconfigured environments.
• Deep visibility into API calls through AWS CloudTrail, including who, what, when, and from where calls were made.
• AWS complies with many standards and programs: SOC 1/ISAE 3402, SOC 2, SOC 3, FISMA, DIACAP, FedRAMP, PCI DSS Level 1, ISO 9001, ISO 27001, and ISO 27018.
• Log aggregation options, streamlining investigations and compliance reporting.
• Alert notifications through Amazon CloudWatch when specific events occur or thresholds are exceeded.
• Several purchasing options according to customers' requirements: On-Demand Instances, Reserved Instances, Spot Instances, and Dedicated Hosts.
• AWS Multi-Factor Authentication for privileged accounts, including options for hardware-based authenticators.
• AWS Directory Service allows you to integrate and federate with corporate directories to reduce administrative overhead and improve end-user experience.
• With AWS Direct Connect, you can establish a dedicated network connection between AWS and your datacenter, office, or colocation environment. In many cases, this can provide both lower costs and a higher level of service than Internet-based connections.
• Amazon S3 and Amazon Glacier automatically replicate data across multiple data centers and are designed to deliver 99.999999999% durability.
• Various configurations of CPU, memory, storage, and networking capacity for your instances, known as instance types.
• Secure login information for your instances using key pairs (AWS stores the public key, and you store the private key in a secure place).
• Storage volumes for temporary data that's deleted when you stop or terminate your instance, known as instance store volumes.
• Persistent storage volumes for your data using Amazon Elastic Block Store (Amazon EBS), known as Amazon EBS volumes.
• Each AZ has independent infrastructure (power, cooling, network, and security), so the AZs are isolated from one another. A failure in one AZ therefore does not affect the others.
• Multiple physical locations for your resources, such as instances and Amazon EBS volumes, known as regions and Availability Zones.
Figure 6: Availability Zones in AWS [36]
• A firewall that enables you to specify the protocols, ports, and source IP ranges that can reach your instances using security groups.
• Static IP addresses for dynamic cloud computing, known as Elastic IP addresses.
• Metadata, known as tags, that you can create and assign to your Amazon EC2 resources.
• Virtual networks you can create that are logically isolated from the rest of the AWS cloud, and that you can optionally connect to your own network, known as virtual private clouds (VPCs).
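Two of the claims in the list above lend themselves to a short worked calculation: the isolation of Availability Zones and the 99.999999999% durability figure for Amazon S3 and Amazon Glacier. The Python sketch below is a back-of-the-envelope model, not an AWS formula. It assumes that zones fail independently and that each zone alone has the same availability; the 99% per-zone figure is a hypothetical input, not a published AWS number. It also assumes that the durability figure is read as the probability that a single object survives one year. The function names are invented for this example.

```python
from fractions import Fraction


def multi_zone_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The deployment is down only if every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones


def expected_annual_object_loss(objects: int, durability: Fraction) -> Fraction:
    """Expected number of objects lost per year at the given durability."""
    return objects * (1 - durability)


# 99.999999999% written as an exact fraction to avoid rounding error.
ELEVEN_NINES = Fraction(99_999_999_999, 100_000_000_000)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "zone(s):", multi_zone_availability(0.99, n))
    loss = expected_annual_object_loss(10_000_000, ELEVEN_NINES)
    print("expected objects lost per year:", loss)
    print("years per lost object:", 1 / loss)
```

Under these assumptions, two zones at 99% each already give about 99.99% availability, and a store of ten million objects loses on average one object every 10,000 years. Real zones are not perfectly independent, so the first figure should be read as an upper bound.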
CONCLUSION
In conclusion, cloud computing provides small and medium-sized business enterprises with properties such as cost-effective infrastructure resources, managed infrastructure, availability, and scalability. In technical terms, cloud services are commercial offerings, and they do not guarantee the continuity of the customer's applications. What they guarantee is the availability of the infrastructure and components offered to the customer. Consequently, to ensure the continuity of the customer's system, fault tolerance should be deployed in the cloud. Analyzing these FT methods and understanding their restrictions are our eventual targets, in order to build an FT method that manages all fault types in different aspects. In this study, we examined the properties of Amazon Web Services and indicated how they tolerate faults.
REFERENCES
[1] M. Armbrust, (2009), Above the clouds: a Berkeley view of cloud computing, UC Berkeley Technical Report [online], Available from: https://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf/.
[2] P. Mell, T. Grance, (2011), The NIST Definition of Cloud Computing [online], Available from: http://faculty.winthrop.edu/domanm/csci411/Handouts/NIST.pdf/.
[3] J. Geelan, (2009), Twenty One Experts Define Cloud Computing [online], Available from: http://virtualization.sys-con.com/node/612375/.
[4] Diversity Limited, (2011), Revolution Not Evolution How Cloud Computing Different from Traditional IT and Why it Matters [online], Available from: http://userpages.umbc.edu/~dgorin1/451/cloud/Revolution_Not_Evolution-Whitepaper.pdf/.
[5] L. Youseff, M. Butrico, D. Da Silva, Toward a Unified Ontology of Cloud Computing [online], Available from: https://storagemadeeasy.com/files/8f047da34a2d3a3528136ba8b59a465d.pdf/.
[6] Intel, (2013), Virtualization and Cloud Computing [online], Available from: http://www.intel.com/content/dam/www/public/us/en/documents/guides/cloud-computing-virtualization-building-private-iaas-guide.pdf/.
[7] L. Krutz, R. Vines, (2010), Cloud Security: a Comprehensive Guide to Secure Cloud Computing [online], Available from: https://drive.google.com/file/d/0B-W0l4MahMzLVVp0UVgyNnh5bnM/.
[8] M. Williams, (2010), A Quick Start Guide to Cloud Computing [online], Available from: https://23510310jarinfo.files.wordpress.com/2011/09/a-quick-start-guide-to-cloud-computing.pdf/.
[9] C. Barnatt, (2010), A Brief Guide to Cloud Computing [online], Available from: http://www.explainingcomputers.com/cloud/BGT_Cloud_Computing_Extract.pdf/.
[10] B. Furht, A. Escalante, (2011), Handbook of Cloud Computing [online], Available from: https://studytm.files.wordpress.com/2014/03/hand-book-of-cloud-computing.pdf/.
[11] Sun Microsystems, (2009), Introduction to Cloud Computing Architecture [online], Available from: https://java.net/jira/secure/attachment/29265/CloudComputing.pdf/.
[12] Dialogic Corporation, (2010), Introduction to Cloud Computing [online], Available from: http://www.dialogic.com/~/media/products/docs/whitepapers/12023-cloud-computing-wp.pdf/.
[13] B. Sosinsky, (2011), Cloud Computing Bible [online], Available from: http://cs.ecust.edu.cn/~yhq/course_files/cloud/Cloud%20Computing%20Bible.pdf/.
[14] F. Machida, E. Andrade, D. S. Kim, K. S. Trivedi, (2011), Candy: Component-based Availability Modeling Framework for Cloud Service Management Using SysML [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6076779/.
[15] C. Gong, J. Liu, Q. Zhang, H. Chen, Z. Gong, (2010), The Characteristics of Cloud Computing [online], Available from: http://www.postdm.post.ir/_ITCenter/Documents/TheCharacteristicsofCloudComputing_20140722_154207.pdf/.
[16] A. Amal Ganesh, M. Sandhya, S. Shankar, (2014), Study on Fault Tolerance methods in Cloud Computing [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6779432/.
[17] Z. Amin, N. Sethi, H. Singh, (2015), Review on Fault Tolerance Techniques in Cloud Computing [online], Available from: http://research.ijcaonline.org/volume116/number18/pxc3902768.pdf
[18] F. Machida, D. S. Kim, J. S. Park, K. S. Trivedi, (2008), Toward Optimal Virtual Machine Placement and Rejuvenation Scheduling in a Virtualized Data Center [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5355515/.
[19] C. Engelmann, G. R. Vallee, T. Naughton, S. L. Scott, (2009), Proactive Fault Tolerance Using Preemptive Migration [online], Available from: http://www.christian-engelmann.info/publications/engelmann09proactive.pdf/.
[20] K. Plankensteiner, R. Prodan, T. Fahringer, (2009), A New Fault Tolerance Heuristic for Scientific Workflows in Highly Distributed Environments Based on Resubmission Impact [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5380852/.
[21] A. Bala, I. Chana, (2012), Fault Tolerance- Challenges, Techniques and Implementation in Cloud Computing [online], Available from: https://www.researchgate.net/publication/266525159_Fault_Tolerance-Challenges_Techniques_and_Implementation_in_Cloud_Computing/.
[22] Y. Kwon, M. Balazinska, A. Greenberg, (2008), Fault Tolerant Stream Processing using a Distributed, Replicated File System [online], Available from: http://goo.gl/vzhK6l/.
[23] S. M. Ghoreyshi, (2013), Energy-Efficient Resource Management of Cloud Datacenters Under Fault Tolerance Constraints [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6604493/.
[24] S. Lin, M. Huang, K. Lai, K. Huang, (2008), Design and Implementation of Job Migration Policies in P2P Grid Systems [online], Available from: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4780655/.
[25] P. K. Patra, H. Singh, G. Singh, (2013), Fault Tolerance Techniques and Comparative Implementation in Cloud Computing [online], Available from: https://www.researchgate.net/publication/258789870_Fault_Tolerance_Techniques_and_Comparative_Implementation_in_Cloud_Computing/.
[26] Y. M. Essa, (2016), A Survey of Cloud Computing Fault Tolerance: Techniques and Implementation [online], Available from: http://www.ijcaonline.org/research/volume138/number13/essa-2016-ijca-909055.pdf/.
[27] S. Gokuldev, M. Valarmathi, (2013), Fault Tolerant System for Computational and Service Grid [online], Available from: http://www.ijeit.com/vol%202/Issue%2010/IJEIT1412201304_47.pdf/.
[28] R. Garg, A. K. Singh, (2011), Fault Tolerance in Grid Computing: State of the Art and Open Issues [online], Available from: http://airccse.org/journal/ijcses/papers/0211cses07.pdf/.
[29] Amazon, (2015), Amazon Web Services: Overview of Security Processes [online], Available from: https://d0.awsstatic.com/whitepapers/aws-security-whitepaper.pdf/.
[30] Amazon, (2015), Amazon Web Services: Overview of Amazon Web Services [online], Available from: https://d0.awsstatic.com/whitepapers/aws-overview.pdf/.
[31] Amazon, (2011), Architecting for the Cloud: Best Practices [online], Available from: https://media.amazonwebservices.com/AWS_Cloud_Best_Practices.pdf
[32] L. Youseff, M. Butrico, D. Da Silva, Toward a Unified Ontology of Cloud Computing [online], Available from: https://storagemadeeasy.com/files/8f047da34a2d3a3528136ba8b59a465d.pdf/.
[33] P. Latchoumy, P. S. A. Khader, (2011), Survey on Fault Tolerance in Grid Computing [online], Available from: http://airccse.org/journal/ijcses/papers/1111ijcses07.pdf/.
[34] Binali Yildirim, Minister of Transport, Maritime and Communication of Turkey, (2012), speech about 'the cloud' [online], Available from: https://www.youtube.com/watch?v=10UZwW6563E/
[35] Amazon documents website, http://docs.aws.amazon.com/.
[36] Amazon AWS website, http://aws.amazon.com/.