Received February 4, 2016, accepted March 10, 2016, date of publication April 27, 2016, date of current version May 9, 2016.
Digital Object Identifier 10.1109/ACCESS.2016.2558456
Towards a Virtual Domain Based
Authentication on MapReduce
IBRAHIM LAHMER AND NING ZHANG
School of Computer Science, The University of Manchester, Manchester M13 9PL, U.K.
Corresponding author: I. Lahmer (ibrahim.lahmer@manchester.ac.uk)
This research was sponsored by the Ministry of Higher Education and Scientific Research of Libya and partially supported by National Oil
Corporation Libya (NOC-Libya).
ABSTRACT This paper proposes a novel authentication solution for the MapReduce (MR) model, a new
distributed and parallel computing paradigm commonly deployed to process BigData by major IT players,
such as Facebook and Yahoo. It identifies a set of security, performance, and scalability requirements that are
derived from a comprehensive study of the job execution process in MR and of the security threats and attacks
in this environment. Based on the requirements, it critically analyzes the state-of-the-art authentication
solutions, finding that the authentication services currently proposed for the MR model are not adequate.
This paper then presents a novel layered authentication solution for the MR model and describes the core
components of this solution, which include the virtual domain based authentication framework (VDAF).
These novel ideas are significant because, first, the approach embeds the characteristics of MR-in-cloud
deployments into security solution designs, which will allow the MR model to be delivered as software
as a service in a public cloud environment along with our proposed authentication solution; second,
VDAF supports the authentication of every interaction by any MR component involved in a job execution
flow, so long as the interaction is for accessing resources of the job; third, this continuous authentication
service is provided in such a manner that the costs incurred in providing it are
as low as possible.
INDEX TERMS MapReduce, authentication for mapreduce, cloud computing security, security
requirements, security threats.
I. INTRODUCTION
MapReduce (the MR model) is a new parallel programming
paradigm. It is proposed to process large volumes of data.
Data processing is carried out in two phases: map and reduce.
The map phase takes a set of data and converts it into
another set of data called key/value pairs to produce the
intermediate results of the MR computation. The reduce
phase then takes these intermediate results as its input and
combines these data to produce an output and this output
is the final result of the MR computation. More details
on how MR works can be found in [1]–[3]. To carry out
the two-phase MR computation, a set of distributed nodes
(hereafter referred to as MR components) is used. Figure 1
shows a Generic MapReduce Computational (GMC) model
that we have constructed based on the most recent MR
application framework [1]–[3]. From the figure, it can be
seen that a distributed set of MR components interact with
each other and collaboratively execute a client’s job. The
entire process for this job execution, i.e. from when the
job is submitted to when the final computational result is
ready for collection, is referred to as a job execution flow
(or a job work-flow). The MR components can generally be
classified into two main categories: master nodes and slave
nodes. The Resource Manager and Name Node, shown in
Figure 1, are examples of master nodes, and the rest are slave
nodes. In this version of the MR model implementation, a
client submits his job to the Resource Manager. The Resource
Manager assigns the tasks of the job to a set of slave nodes
that contain containers to run the Map and Reduce Tasks.
However, in the classic MR model implementation [1], [2],
a client submits a job to the Job Tracker directly and the
Job Tracker then assigns Map and Reduce Tasks to a set
of slave nodes (indicated in Figure 1 by dash-dot
lines labeled ‘(3), (4), and (7)’). The two sets of MR
components run on two large clusters of nodes, which
are typically referred to as the Processing Framework (PF)
cluster and the Distributed File System (DFS) cluster [3]. The
GMC model, shown in Figure 1, is derived to capture the
interactions among different MR components in the newer
MR model implementation (although what has been captured
can also be applied to the classic MR model implementation).
More details about the MR components, and their
2169-3536 © 2016 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 4, 2016
I. Lahmer, N. Zhang: Toward a Virtual Domain Based Authentication on MR
FIGURE 1. Job execution work-flow in the GMC model.
functionalities, of both versions of the MR model
implementations (i.e. MR application frameworks) are
available in [2].
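As an illustration of the map and reduce phases described at the start of this section, the following minimal Python sketch counts words in a small set of records. It runs in a single process and is not taken from any MR framework: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the in-memory grouping step are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Convert one input record into intermediate key/value pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key (assumed framework step between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all intermediate values of one key into a final result."""
    return key, sum(values)


def run_job(records):
    """Run the two phases in sequence over an in-memory list of records."""
    intermediate = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


result = run_job(["map reduce map", "reduce map"])
print(result)  # {'map': 3, 'reduce': 2}
```

In a real deployment the map and reduce functions would run in containers on different slave nodes, which is why the intermediate key/value pairs become a resource that needs protection.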
The MR model, owing to its scalability, robustness and
simplicity of use as a parallel and distributed programming
framework, is becoming more and more widely used [4], [5].
Hadoop, an implementation of the MR model, has been
adopted by many companies including the major IT players
in the world such as Facebook, eBay, IBM and Yahoo. These
implementations are largely done in their respective private
clouds. However, recently there are efforts to implement the
MR model in public clouds [6], [7].
A major concern of using the MR model in a public cloud is
its inadequate security provision, such as authentication. The
MR model was initially intended for use in private networks,
so the issue of security was not a design consideration [8].
Since its introduction, most efforts have gone into
improving the performance of this model, making it more
efficient rather than more secure. Deploying
the MR model in an open environment, such as public
clouds, without adequate security provisioning would put
the clients’ jobs and their data at risk. This is because, in
such an environment, different jobs submitted by different
clients typically share the same set of physical nodes and
software resources. The clients have very little control over
(1) on which nodes their MR components (assigned to their
respective jobs) are executed, and (2) on which DFS nodes
the data associated to their jobs are stored. These could make
the jobs and the data more vulnerable to security threats and
attacks [1], [9]–[11].
Our work focuses on addressing identity related threats
and attacks in deploying the newer version of the MR model
in an open environment. To understand the security issues
in this context and to capture the requirements necessary
to address the issues, in this paper, we categorise the
MR components involved in a job execution flow into
two categories, MR Infrastructure (MR-Inf.) Components
and MR-Job Components. MR-Inf. Components are the
MR components that serve every job submitted by any
clients. These components are not job specific. Examples
of MR-Inf Components are Resource Manager and Name
Node. MR-Job Components are the MR components that
are invoked specially for a particular job submitted by a
client. This set of components is job specific and their
invocations and existence are purely for serving this particular
job. Examples of MR-Job Components are Job Tracker
(also called Application Master), Task Tracker (i.e. Node
Manager) and Map and Reduce Tasks (i.e. Containers).
As indicated in Figure 1, three observations can be made about
running the MR model in an open environment: (1) clients
typically access the MR application remotely via the Internet,
(2) each client’s input data are partitioned and stored
on a set of distributed and shared Data Nodes, and (3) the
MR components that execute the tasks
(MR-Job Components) spawned for a single client’s job
run on multiple nodes, and these nodes may
also host the tasks spawned for other jobs submitted by
other clients. An authentication solution designed to secure
the jobs and their data in such an environment should
consider three aspects, and these are: (a) the authentication
of a Client to the MR application (i.e. Client-to-MR
authentication), (b) the mutual authentication among MR
components (i.e. MR-Comp-to-MR-Comp authentication),
and (c) data authenticity (i.e. Data-Authenticity, which
covers both origin authentication and integrity protections).
Client-to-MR authentication is to guard the entry gate to
the MR application making sure only authorised users
(i.e. the clients of the MR application) could submit jobs
to the MR application. In other words, the authentication
solution should be able to verify that a client who seeks to
submit a job to the MR application is indeed who he claims
to be. MR-Comp-to-MR-Comp authentication is to make sure
that an MR component seeking to retrieve any resources
associated with a client’s job is what it claims to be. The
third aspect, Data-Authenticity, is to protect the authenticity
(i.e. origin and integrity) of data generated in both map and
reduce phases, making sure that any unauthorised access to,
and/or alteration of, the data can be detected.
The importance of addressing the above authentication
issues and the requirements that should be satisfied by an
authentication solution designed for the MR model have been
discussed in the literature [1], [9]. However, so far, little has
been done in terms of designing such a solution. As part of
our effort to design a secure and effective authentication
solution for the MR model, in this paper, we critically analyse
the state-of-the-art MR authentication methods. The purpose
of this critical analysis is to examine the suitability and
effectiveness of existing authentication methods (proposed
for the MR model), taking into consideration the
features and characteristics of an MR application in an
open environment such as a public cloud, so as to identify
areas for improvement. It should be mentioned that our
analysis of existing authentication methods proposed for
an MR application has been previously published in [24].
However, this paper extends this analysis by (i) specifying
design requirements for such an authentication solution, and
analysing existing authentication methods proposed for the
MR model against these requirements, (ii) further analysing
what is missing in these methods in light of the features
and characteristics of the MR model being deployed in an
open environment, (iii) providing a high level analysis
of the MR model and its components in executing a client’s
job, highlighting the functionalities of, and the interactions
among, the MR components, and (iv) proposing a novel approach
to MR authentication, a layered authentication solution for
the MR model that supports the newer version of the MR
implementation. This solution is proposed to address the
gaps we have identified in existing authentication
solutions designed for MR.
The remainder of this paper is structured
as follows. Section 2 specifies a set of authentication
requirements based on our observations on, and security
analysis of, the MR model being deployed in an open
environment. In Sections 3 and 4, we critically analyse the
existing work on MR authentication against the specified
requirements. The analysis covers the authentication methods
already adopted by the MR model (Section 3), and those
recently proposed in literature for the MR model (Section 4).
Section 5 gives a high-level analysis of the MR model,
highlighting the functionalities of, and the interactions among
the MR components in executing a client’s job, and this
analysis leads to our novel proposal, a layered authentication
solution to the MR model. Finally, Section 6 concludes the
paper with further discussion and an outline of our future work.
II. REQUIREMENTS FOR AN MR
AUTHENTICATION SERVICE
This section specifies a set of requirements for the
design of an authentication service for an MR application
implemented in an open environment. The specification of
the requirements takes into account the characteristics
of the implementation and the outcome of a threat analysis
carried out on the MR model. Related work has been reported
in [1], [3], and [9].
A. ENTITY IDENTIFICATION AND
CREDENTIAL REVOCATION
To authenticate clients, MR components and jobs submitted
to the MR application, each of these entities (or components)
should have a unique identifier. The names (acronyms) of the
identifiers along with entities they each represent are given in
the following:
(1) Client IDs (Client-ID): a unique identifier for each
client. This is usually a static ID, and it is typically
the username of a user who has registered with the
MR application and is running the client (i.e. MR-Client).
(2) MR-Inf. Component IDs (MR-Inf.Comp-ID): Each
MR-Inf. Component should have a unique identifier, and these
identifiers are static.
(3) MR-Job Component IDs (MR-JobComp-ID): Each set
of MR-Job Components serving a particular job should have
a unique identifier to distinguish this set of MR-Job Components
from the sets of MR-Job Components serving other jobs.
These IDs are dynamic.
(4) MR-Job Hosting Node IDs (MR-JobHostNode-ID):
Each MR-Job-HostNode should be uniquely identifiable, and
these IDs are also static identifiers.
(5) MR-Job IDs (MR-Job-ID): Each MR-Job should have
a unique identifier to distinguish different jobs submitted by
the same client or by different clients.
(6) Framework (cluster) ID: If there are two or more parties
providing hosting nodes, then the hosting nodes provided by
a single party may be treated as one cluster, and each cluster
should be identified by a unique identifier.
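The identifier types listed above can be summarised in a small lookup structure. The Python sketch below is only one possible encoding: the `Lifetime` enum, the `ID_LIFETIMES` table and the two helper functions are hypothetical names introduced here, and the ID string formats are assumptions. Where the list does not state whether an identifier is static or dynamic, the table records `None` rather than guessing.

```python
import uuid
from enum import Enum


class Lifetime(Enum):
    STATIC = "static"    # fixed for the lifetime of the entity
    DYNAMIC = "dynamic"  # re-issued when the entity is bound to a new job


# Lifetimes as stated in the list above; None means the text does not say.
ID_LIFETIMES = {
    "Client-ID": Lifetime.STATIC,
    "MR-Inf.Comp-ID": Lifetime.STATIC,
    "MR-JobComp-ID": Lifetime.DYNAMIC,
    "MR-JobHostNode-ID": Lifetime.STATIC,
    "MR-Job-ID": None,
    "Cluster-ID": None,
}


def new_job_id(client_id):
    """Hypothetical MR-Job-ID, unique across jobs of the same or different clients."""
    return f"{client_id}/job-{uuid.uuid4().hex}"


def new_job_component_id(job_id):
    """Hypothetical MR-JobComp-ID for the set of components serving one job."""
    return f"{job_id}/comps-{uuid.uuid4().hex[:8]}"


job_a = new_job_id("alice")
job_b = new_job_id("alice")
comps_a = new_job_component_id(job_a)
comps_b = new_job_component_id(job_b)
print(job_a != job_b, comps_a != comps_b)  # True True
```

Deriving the component-set identifier from the job identifier is a design choice of this sketch; it makes the job-specific nature of MR-Job Components visible in the ID itself.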
Authentication is carried out by demonstrating (by a
claimant), and verifying (by a verifier), the knowledge of
a secret uniquely associated to an identity. Therefore there
is a need for secure issuance, acquisition and revocation of
an identity secret (which is also part of the corresponding
credential). This leads to the following requirements:
1) ENTITY IDENTIFICATION (OR REGISTRATION)
There should be a secure method for a new client or a new
MR component to be identified by the MR application and to
establish a secret associated with the identity.
2) CREDENTIAL REVOCATION
There should be secure methods for revoking any
credential(s) issued to the identity of an entity involved in a
job execution at any point during the job execution or after the
job execution. This may take place when a job is completed,
or when a related MR-JobHostNode fails or is disconnected.
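The registration and revocation requirements above, together with the notion of authentication as demonstrating knowledge of an identity secret, can be sketched as a challenge-response exchange. The Python sketch below is an illustration under stated assumptions, not the solution proposed in this paper: the `CredentialRegistry` class, its method names and the choice of HMAC-SHA256 over a random challenge are all hypothetical.

```python
import hashlib
import hmac
import secrets


class CredentialRegistry:
    """Hypothetical verifier-side store of identity secrets."""

    def __init__(self):
        self._secrets = {}

    def register(self, entity_id):
        """Issue a fresh secret for a new client or MR component."""
        secret = secrets.token_bytes(32)
        self._secrets[entity_id] = secret
        return secret

    def revoke(self, entity_id):
        """Revoke the credential, e.g. when a job completes or a node fails."""
        self._secrets.pop(entity_id, None)

    def verify(self, entity_id, challenge, response):
        """Check that the claimant knows the secret bound to entity_id."""
        secret = self._secrets.get(entity_id)
        if secret is None:
            return False  # unknown or revoked identity
        expected = hmac.new(secret, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(secret, challenge):
    """Claimant side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


registry = CredentialRegistry()
secret = registry.register("task-tracker-7")
challenge = secrets.token_bytes(16)
before = registry.verify("task-tracker-7", challenge, respond(secret, challenge))
registry.revoke("task-tracker-7")
after = registry.verify("task-tracker-7", challenge, respond(secret, challenge))
print(before, after)  # True False
```

After revocation the same response is rejected, which is the behaviour the credential revocation requirement asks for.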
B. ENTITY AUTHENTICATIONS
Entity authentication is to make sure that a communicating
entity is the one that it claims to be. Multiple entities
(i.e. components) in the MR model are involved in a
job (MR-Job) execution. Some of these components are
static components, while others are dynamic ones. The static
components are identified by static identities that, once given,
remain the same during the lifetimes of the components. The
static components can be further classified into two groups:
one is MR Clients and the other is MR-Inf. Components.
MR-Inf. Components are shared by different MR-Jobs.
Resource Manager and Name Node, shown in Figure 1,
are MR-Inf. Components, so they are static components
and their identities are static too. The dynamic components
are identified by dynamic identities. A dynamic identity is
assigned to a dynamic component when the component is
assigned to an MR-Job. Once this MR-Job is completed,
the component may be assigned to another MR-Job,
in which case it will be assigned
a new identity. Job Tracker, Task Tracker and Map and
Reduce tasks (i.e. Containers) are dynamic components and
they are identified by dynamic identities. In an authentication
solution designed for the MR model, all the components
taking part, or being involved in a job execution, being static
or dynamic, should be securely identified and authenticated.
In detail, with reference to the MR model depicted in
Figure 1, the authentication task should satisfy the following
requirements:
1) MUTUAL AUTHENTICATION BETWEEN AN MR
CLIENT AND AN MR-INF. COMPONENT
This is to ensure that only an authorized client can connect
to the MR application. Hereafter this is referred to as
the Client-to-MR-App authentication and MR-App-to-Client
authentication. More specifically, this should cover the
mutual authentication between a Client and the Resource
Manager and between a Client and the Name Node.
2) MUTUAL AUTHENTICATION BETWEEN AN MR-JOB
COMPONENT AND AN MR-INF. COMPONENT
This is to ensure that an MR-Job component involved
in the execution of a client’s job is authenticated to the
MR application, so as to ensure that any access to a client’s
job input and output data can be granted in a secure manner.
The mutual authentication between the Job Tracker of a job
and the Name Node, and between a Reduce Task of the
job and the Name Node are examples of this authentication
requirement.
3) MUTUAL AUTHENTICATION BETWEEN ANY
PAIR OF MR-JOB COMPONENTS
This is to ensure that any access to a client’s job intermediate
data can be granted in a secure manner.
4) MUTUAL AUTHENTICATION BETWEEN AN MR-JOB
COMPONENT’S HOSTING NODE AND
AN MR-INF COMPONENT
This is to ensure that any new physical node assigned
to host the MR-Job Component(s) of a client’s job (e.g. a
new hosting node of a Task Tracker) is authenticated to
an MR-Inf. Component and vice versa. In this way, we
can ensure that any two MR-Job Components’ Hosting
Nodes can authenticate to each other. Hereafter this
requirement is referred to as MR-Job-HostNode-to-MR-App
and MR-App-to-MR-Job-HostNode authentication.
5) MUTUAL AUTHENTICATION BETWEEN DOMAINS
(I.E. CROSS-PROVIDER AUTHENTICATION)
This authentication is needed when a third party is involved
in an MR-Job, and it is to ensure that any new physical node
which belongs to a third party domain and is involved in hosting
MR-Job Components is authenticated to the MapReduce
domain where the job is submitted and mastered.
C. AUTHENTICITY OF DATA AND PROTOCOL MESSAGES
1) DATA AUTHENTICITY
This is to ensure the origin authentication and integrity
protection of data that are saved in, or produced by, the
MR application. In other words, the protection should be
applied to input data, intermediate data and output data of
any job processed by the MR application.
2) AUTHENTICITY OF PROTOCOL MESSAGES
The origin authentication and integrity protection should
also be applied to all the protocol messages facilitating the
tasks of authentication in an MR application. The protocol
messages are of two types: authentication requests and
authentication responses.
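One common way to obtain both origin authentication and integrity protection for a message is to attach a keyed message authentication code that covers the sender identity as well as the content. The Python sketch below assumes that the two communicating MR components already share a key; the function names `protect` and `check` and the message layout are hypothetical and are not part of any MR framework.

```python
import hashlib
import hmac
import json


def protect(key, sender_id, job_id, body):
    """Attach an HMAC-SHA256 tag covering the sender, the job and the body."""
    payload = {"sender": sender_id, "job": job_id, "body": body}
    encoded = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, encoded, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def check(key, message):
    """Return True only if the payload is unmodified and was tagged with key."""
    encoded = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, encoded, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


shared_key = b"k" * 32  # placeholder key, for illustration only
msg = protect(shared_key, "reduce-task-3", "job-42", "fetch intermediate data")
print(check(shared_key, msg))  # True

msg["payload"]["sender"] = "reduce-task-9"  # forged origin
print(check(shared_key, msg))  # False
```

Because the sender and job identifiers are inside the tagged payload, changing the claimed origin invalidates the tag in the same way as changing the body.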
D. CONFIDENTIALITY OF PROTOCOL MESSAGES
Confidentiality of Authentication Requests and Replies: This
is the protection of authentication requests and replies from any
unauthorized disclosure. To counter eavesdropping attacks,
the confidentiality of any such request or response sent
between MR components throughout an MR-Job execution
should be protected.
E. PERFORMANCE AND SCALABILITY REQUIREMENTS
1) MINIMIZING COMMUNICATION OVERHEAD
In accomplishing the task of authentication for an MR-Job,
the communication overhead introduced should be as low
as possible. This means that the number of authentication
messages, and the length of each message should be as low
as possible.
2) MINIMIZING COMPUTATIONAL OVERHEAD
The computational overhead incurred in accomplishing the
task of authentication for an MR-Job should be as small as
possible.
3) MAXIMIZING SCALABILITY
The MR application scales by simply adding new nodes
(members) to the shared clusters. Any authentication solution
designed for the MR application should scale similarly.
F. SUPPORT FOR UPDATING OF AUTHENTICATION
CREDENTIALS
There may be cases where an execution of an MR-Job
takes a long time, and, in such cases, for security reasons,
authentication secrets or credentials may need to be renewed
or updated. Therefore, any authentication solution designed
for the MR application should support the renewal or
updating of authentication secrets or credentials.
In the next section, we critically analyze authentication
methods proposed for the MR model based on the
requirements specified above. These authentication methods
include those adopted by the MR model so far and also those
published in the literature.
III. AUTHENTICATION METHODS EVER ADOPTED
BY THE MR MODEL
Two authentication methods have been adopted by the
MR model so far [1], [4], [13]. The first one [1], [4], adopted
in the early generation of the model, assumed the use of an
independent authentication service outside the MR model,
e.g. an authentication service that comes with the host operating
system (OS) running on a physical node. This is the
so-called OS-based authentication method. In other words, the
MR model then did not have its own authentication service.
Rather, it relied on the use of an authentication facility
provided by the OSes of the physical nodes on which an MR
application is deployed.
The second method, used in the most recently deployed
MR model was proposed by O. Malley et al. from the
Yahoo Hadoop team (hereafter referred to as O. Malley
method) [13]. This method is a symmetric key based
authentication method and it is largely built on the Kerberos
authentication protocol. At the time of writing this paper,
the Kerberos authentication protocol is still the default mode
of authentication for an MR application deployed in a private
cloud [1], [8]. Figure 2 summarizes the authentication process
using this method. As shown in the figure, a client or
MR component first authenticates itself to the authentication
server. Upon successful authentication, the MR component
will obtain a Ticket Granting Ticket (TGT), which is then
used to acquire a service ticket. The service ticket is then used
by the MR component to access resources located on other
MR components. This authentication process consists of
six steps (steps 1 to 6, as shown in the figure), and is identical
for all the MR components in the application. Assuming
that a client is to write his job into the MR application as
part of a job submission process, to authenticate himself
to the application, the client first makes an authentication
FIGURE 2. Kerberos protocol messages exchanges in the MR model.
request to the Authentication Service (AS). The AS generates
a response containing a TGT, which is encrypted using a
key derived from the client’s password and sent to the client.
Then the client uses this TGT to request a service ticket by
sending the TGT, along with an authenticator that demonstrates
knowledge of the secret in the TGT, to the Ticket Granting Service (TGS). Once
the client receives this service ticket, he uses it to access the
Name Node in DFS. The same steps are taken by any other
MR components, such as Task Trackers, to get admitted to
the cluster and to access other (remote) MR components for
retrieving data or other resources used by the client’s job.
Once a Task Tracker (or a Client) is authenticated, obtains
a service ticket and is admitted to the Master Node in the MR
application, both the Task Tracker and Master Node will use
the shared key in the service ticket to authenticate to each
other [1], [13].
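The three request/response rounds described above can be traced with a toy, single-process simulation in Python. This is a sketch of the message flow only and is neither Kerberos itself nor Hadoop's implementation: the `seal`/`unseal` helpers use a home-made SHA-256 keystream that must not be used for real security, timestamps and ticket lifetimes are omitted, and all names (`as_exchange`, `tgs_exchange`, `name_node_accept`, `client_login`, the user "alice" and her password) are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key, payload):
    """Toy authenticated encryption of a JSON-serialisable payload."""
    plain = json.dumps(payload).encode()
    nonce = secrets.token_bytes(16)
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, len(plain))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce, cipher, tag


def unseal(key, box):
    """Reverse seal(); raises ValueError on a wrong key or altered box."""
    nonce, cipher, tag = box
    expected = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(plain)


def key_from_password(password):
    """Long-term client key derived from the password (fixed salt: toy only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), b"toy-salt", 10_000)


# Long-term keys held by the hypothetical authentication server.
TGS_KEY = secrets.token_bytes(32)
NAME_NODE_KEY = secrets.token_bytes(32)
CLIENT_KEYS = {"alice": key_from_password("correct horse")}


def as_exchange(client_id):
    """Steps 1-2: return a session key sealed for the client, plus a TGT."""
    session = secrets.token_hex(32)
    tgt = seal(TGS_KEY, {"client": client_id, "session": session})
    return seal(CLIENT_KEYS[client_id], {"session": session}), tgt


def tgs_exchange(tgt, authenticator):
    """Steps 3-4: check TGT and authenticator, then issue a service ticket."""
    ticket = unseal(TGS_KEY, tgt)
    session = bytes.fromhex(ticket["session"])
    if unseal(session, authenticator)["client"] != ticket["client"]:
        raise ValueError("authenticator does not match TGT")
    svc_session = secrets.token_hex(32)
    svc_ticket = seal(NAME_NODE_KEY, {"client": ticket["client"], "session": svc_session})
    return seal(session, {"session": svc_session}), svc_ticket


def name_node_accept(svc_ticket, authenticator):
    """Steps 5-6: the Name Node checks the service ticket and authenticator."""
    ticket = unseal(NAME_NODE_KEY, svc_ticket)
    claimed = unseal(bytes.fromhex(ticket["session"]), authenticator)["client"]
    return claimed == ticket["client"], claimed


def client_login(client_id, password):
    """Run all three rounds on behalf of the client."""
    reply, tgt = as_exchange(client_id)
    session = bytes.fromhex(unseal(key_from_password(password), reply)["session"])
    reply, svc_ticket = tgs_exchange(tgt, seal(session, {"client": client_id}))
    svc_session = bytes.fromhex(unseal(session, reply)["session"])
    return name_node_accept(svc_ticket, seal(svc_session, {"client": client_id}))


print(client_login("alice", "correct horse"))  # (True, 'alice')
```

Calling `client_login("alice", "wrong")` raises `ValueError` at the first `unseal`, which mirrors the point that everything in this flow ultimately rests on knowledge of the client's password.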
This authentication method is a one-factor authentication
method. The one factor used by a client to authenticate
himself to the AS is the client’s password. Knowing the
password would allow any entity to acquire a service ticket
in the name of the client and to access any resources granted
to the client. In other words, for an attacker to impersonate
a legitimate component (e.g. a client or a Task Tracker), the
attacker needs to obtain a service ticket. To access the ticket,
the attacker needs to know the password of the (legitimate)
client to whom the ticket has been issued. If a client’s
password is compromised, then all the resources assigned to
the client will be at risk. In addition, the attacker could use
this compromised account to launch further attacks in the MR
application. In other words, the security level offered by this
one-factor authentication method is the same as that offered
by the password chosen by a client. If a client chooses a weak
password, then the risks imposed on the MR application will
increase accordingly.
With regard to communication overheads introduced by
this authentication method, we should work out how many
protocol messages are generated and used per job submission
(i.e. in each authentication instance), while assuming the
length of each such message is approximately the same.
For each authentication instance, three rounds (R) of
communications are required. Two of the three rounds are
between a client (or an MR component; MR-Req-Comp,
for short) and the AS, and the third round is between the
MR-Req-Comp and another MR component that manages
some resources (MR-Res-Comp). Each round consists of two
messages (Msg), one request (Req) and one response (Res).
Table 1 shows the number of communication rounds (along
with the number of protocol messages exchanged) versus the
numbers of MR components involved per job.
An MR deployment in a cloud is a shared
computational environment, and, in such an environment,
there are multiple possible usecases. For example, one
client may submit a single job at any given time (hereafter
referred to as the OneClient-OneJob usecase), one client may
submit multiple jobs simultaneously (OneClient-MultiJobs)
or multiple clients and each may submit one or more jobs
TABLE 1. Number of communication rounds for MR component/s
authentication using Kerberos.
FIGURE 3. A number of protocol (i.e. authentication) messages generated
in an authentication process/es under specified usecase scenarios.
simultaneously (MultiClients-MultiJobs). Table 2 shows the
number of communication rounds and the total number
of protocol messages generated for different numbers of
MR-Req-Comp each job may require in each of the three
usecases. The table uses the notation yC/zJ to indicate the
different usecases, i.e. 1C/1J for Case-1, meaning one client
(y = 1) and one job (z = 1); 1C/zJ for Case-2, meaning one client (y = 1)
and multiple jobs (z > 1); and yC/zJ for Case-3, meaning multiple
clients (y > 1) and multiple jobs (z > 1).
Figure 3 plots the results for three example cases, 1C-1J-16Comp,
1C-6J-16Comp and 7C-4J-16Comp, capturing
different numbers of clients, jobs and MR components in each
case. For Case-3, we assume that there are 7 clients and each
client submits 4 jobs. The number of components involved
in each job execution is 16 MR components. Detailed values
FIGURE 4. Number of protocol (i.e. authentication) messages versus the number of: (A) MR components, (B) Jobs, and (C) Clients.
TABLE 2. Number of communication rounds for authentication in three
different usecases of the MR application.
with regard to the number of clients, the number of jobs
per client, the number of MR components per job, and the
number of protocol messages required for authentication in
each of these cases are given in the figure. It can be seen
from the figure that, for Case-3, the number of protocol
messages generated for the authentication of these clients and
the associated MR components used for the execution of the
jobs submitted by the clients reaches more than 2700. If the
number of clients, and/or the number of jobs submitted per
client, goes up, this message number will increase sharply.
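The growth in message numbers can be reproduced with a short calculation. The Python sketch below uses the figures stated earlier (three rounds per authentication instance, two messages per round) and adds one counting assumption that is ours and is not confirmed by the tables: each client authenticates once, and every MR component of every job authenticates once. Under this assumption Case-3 yields 2730 messages, which is consistent with the "more than 2700" reading above; the function name `protocol_messages` is hypothetical.

```python
ROUNDS_PER_INSTANCE = 3  # rounds per authentication instance, as stated in the text
MESSAGES_PER_ROUND = 2   # one request and one response


def protocol_messages(clients, jobs_per_client, components_per_job):
    """Total protocol messages under the counting assumption described above."""
    # Assumption: one authentication instance per client, plus one per MR
    # component of every job submitted by every client.
    entities = clients + clients * jobs_per_client * components_per_job
    return entities * ROUNDS_PER_INSTANCE * MESSAGES_PER_ROUND


cases = {
    "1C-1J-16Comp": (1, 1, 16),
    "1C-6J-16Comp": (1, 6, 16),
    "7C-4J-16Comp": (7, 4, 16),
}
for label, (y, z, n) in cases.items():
    print(label, protocol_messages(y, z, n))
# 1C-1J-16Comp 102
# 1C-6J-16Comp 582
# 7C-4J-16Comp 2730
```

The total is proportional to the product of clients, jobs per client and components per job, so increasing any one of the three factors scales the message count by roughly the same ratio.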
To further examine the effects of different factors on the
scalability of the solution, we have calculated the number
of protocol messages required versus the number of MR
components used per job, the number of jobs submitted
per client and the number of clients submitting the jobs,
respectively. Figure 4(A) shows the number of protocol
messages generated versus the number of MR components
used per job. The figure plots the results for three further
cases by changing the number of clients (y) and the number
of jobs submitted per client (z) to {y = 1, z = 1}, {y = 2,
z = 2}, and {y = 3, z = 3}, respectively. As can be seen from
the figure, if there are only three clients each submitting
three jobs, then the total number of MR components required
to execute these jobs is about 70, but the number of protocol
messages required for authenticating the clients and the
MR components is more than 4000. This is a significant
increase in comparison with the number of clients and the
number of jobs submitted by the clients, and could pose a
significant risk of creating a performance bottleneck in the
cluster. Figure 4(B) shows the number of protocol messages
generated versus the number of submitted jobs. From
the results shown in the figure, it can be seen that, when
the number of clients (y) is fixed at 3, i.e. y = 3 and the
number of MR components used per job at n = 30, as the
number of jobs submitted per client increases from 1 to 4,
the total protocol messages generated will increase from
about 500 to over 2000. Figure 4(C) shows how the number of
protocol messages increases as the number of clients accessing
the MR application increases, where the number of jobs
submitted per client and the number of MR components per
job are fixed.
IV. AUTHENTICATION METHODS
PUBLISHED IN LITERATURE
In addition to the authentication methods described above,
there are also methods that have been proposed for the
MR model in the research domain. These methods can largely
be classified into two groups, symmetric key based and
asymmetric key based. The authentication methods proposed
by Somu et al. [14] and Rubika et al. [15] are symmetric key
based, and their focus is on verifying the identities of clients
requesting to access an MR application. On the other hand,
the methods proposed by Wei et al. [16] and Ruan et al. [18]
are asymmetric key based. They focus on verifying the
authenticity of an MR component. In addition, the method
proposed by Zhao et al. [19] is also asymmetric key based,
but this method provides both clients’ authentication and
MR components’ authentication. In this section, we give an
overview of these methods.
A. SOMU AND RUBIKA AUTHENTICATION METHODS
Somu et al. [14] proposed an authentication method
(hereafter referred to as the Somu method) for the Hadoop
MR model. This method is symmetric key based. It is
similar to the O. Malley method in that both methods use a
single authentication factor, relying on the use of a client’s
username and password, to authenticate the client to the
MR application. However, unlike the O. Malley method, the
Somu method uses two further ideas to strengthen the security
level of the authentication service. The two ideas are: (1) the
introduction of a one-time pad key (session valid only), and
(2) the use of the principle of the separation of duties. The
ciphertext of a client’s password, encrypted using the client’s
one-time pad key, is stored in the Registration Server (one of
the two servers used to implement the authentication service)
and the ciphertext of the client’s one-time pad key, encrypted
using the client’s password, is stored in the other server, a
Backend Server. The two ideas are used in such a manner
that no passwords or encrypted passwords are sent over the
channel and no cleartext passwords are stored in any of the
two servers, thus minimizing the exposure of clients’ long-term
credentials, i.e. the passwords.
FIGURE 5. Authentication steps of the Somu method.
Figure 5 depicts the authentication process using the
Somu method. As shown in the figure, two servers (the
Registration Server and the Backend Server) are involved in
an authentication process (in verifying a client’s ID). The
verification makes use of three ciphertexts, Ciphertext-1,
Ciphertext-3 and Ciphertext-4. Ciphertext-1 is the client’s
password encrypted using a one-time pad key belonging
to the client, and it is stored in the Registration Server.
Ciphertext-3 is the one-time pad key encrypted with
the user’s password and it is pre-stored in the Backend
Server. Ciphertext-4 is generated by the Registration Server
each time an authentication request is received, by
encrypting the one-time pad key using the user’s password.
The authentication proceeds as follows. First, the client sends an
authentication request to the Registration Server and this
request contains the client’s username. The Registration
Server forwards this request to the Backend Server.
The Backend Server uses the username to fetch and
return Ciphertext-3 (it is pre-stored) to the client through
the Registration Server. The client decrypts Ciphertext-3
using his password, and sends the pad key back to the
Registration Server. These steps are indicated by messages
1, 2, 3, 4 and 5, in Figure 5. The Registration Server
then uses the pad key to decrypt Ciphertext-1 to obtain the
password and then uses the password to encrypt the pad
key to generate Ciphertext-4. The Registration Server then
sends Ciphertext-4 to the Backend Server, as indicated by
messages 6, 7, and 8. Finally, as indicated by messages 9, 10,
11 and 12, the Backend Server compares Ciphertext-4 with
Ciphertext-3 and, if the two are equal, the Backend Server
sends a positive notification, which contains the client’s
Username, to the Registration Server. The Registration Server
compares the Username received from the Backend Server with
the one received from the user. If they match, then the login
process is successful.
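The message flow of Figure 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [14]: the cipher (`toy_cipher`, an XOR with a SHA-256 derived keystream), the class names, and the method names are all assumptions introduced here, chosen only because the comparison of Ciphertext-4 with Ciphertext-3 requires a deterministic symmetric cipher.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    blocks = []
    counter = 0
    while 32 * len(blocks) < length:
        blocks.append(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


class BackendServer:
    def __init__(self):
        self.ciphertext3 = {}  # username -> pad key encrypted under the password

    def fetch(self, username):
        return self.ciphertext3[username]

    def check(self, username, ciphertext4):
        # A positive notification carries the username; None signals a mismatch.
        same = hmac.compare_digest(self.ciphertext3[username], ciphertext4)
        return username if same else None


class RegistrationServer:
    def __init__(self, backend):
        self.backend = backend
        self.ciphertext1 = {}  # username -> password encrypted under the pad key

    def register(self, username, password: bytes):
        pad_key = secrets.token_bytes(32)
        self.ciphertext1[username] = toy_cipher(pad_key, password)
        self.backend.ciphertext3[username] = toy_cipher(password, pad_key)

    def start_login(self, username):
        # Forward the request and relay Ciphertext-3 back to the client.
        return self.backend.fetch(username)

    def finish_login(self, username, pad_key: bytes) -> bool:
        password = toy_cipher(pad_key, self.ciphertext1[username])
        ciphertext4 = toy_cipher(password, pad_key)
        return self.backend.check(username, ciphertext4) == username


def client_login(server, username, password: bytes) -> bool:
    ciphertext3 = server.start_login(username)
    pad_key = toy_cipher(password, ciphertext3)  # client recovers the pad key
    return server.finish_login(username, pad_key)
```

In this sketch a wrong password yields a wrong pad key, so the Ciphertext-4 rebuilt by the Registration Server no longer matches Ciphertext-3 and the login is rejected, while the password itself never crosses the channel.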
The Somu authentication method supports client
authentication with a stronger level of protection of
clients’ long-term credentials (passwords) than the methods
discussed earlier. This protection involves the use of a
symmetric one-time pad key and two authentication servers.
A client’s password is encrypted with the one-time pad key,
the one-time pad key is encrypted with the password, and
the two encrypted items are, respectively, stored on two
different servers. To impersonate a client, an attacker needs
to guess or obtain the client’s password. Getting hold of the
client’s password by stealing the ciphertext stored on either
of the two servers is computationally difficult. For example,
if the attacker can steal Ciphertext-1 (the encryption of the
password using the one-time pad key) from the Registration
Server, to access the password, the attacker will need to guess
the pad key or to use a dictionary attack to guess the password.
However, this is computationally difficult as the pad key used
is valid for one session only. Once the client logs off a session,
a new pad key will be generated and used to re-encrypt the
password [14]. The dictionary attack is also subject to the
difficulty brought by the use of the one-time pad key. If an
attacker could steal Ciphertext-3 (i.e. the encrypted pad key
using the password) from the Backend Server, then only a
dictionary attack could be used to guess the password, as
the encryption is not reversible here. Another advantage of
this authentication method is that, similar to the O. Malley
method, the Somu method does not require any transmission
of clients’ long-term credentials (e.g. password) over the
channel.
However, against our requirements detailed in Section II,
the Somu authentication method has two limitations. Firstly,
it only supports gate-level authentication. In other words,
it only supports the client’s authentication to the MR
application; it does not provide any mechanism to support
the authentication of one MR component to another (e.g.
the authentication of a Task Tracker to the Name Node).
Secondly, the authentication method is more costly in terms
of communication overheads than the methods discussed
earlier. As shown in Figure 5, a single client authentication
instance requires four communication rounds (two messages per
round). This is one round more than is required by the O. Malley method
(the O. Malley method requires 3 rounds of communications
for a client to authenticate itself to access one service).
Rubika et al. [15] have also proposed an authentication
method (hereafter referred to as the Rubika method) for
the MR application. This method uses three servers for
authentication, an Authentication Server, and two backend
servers, Backend Server 1, and Backend Server 2. Figure 6
shows the registration and authentication processes of this
method. To register, a client submits his username and
password to the Authentication Server (or a password is
created for the client). The server divides the password, a
set of ASCII letters, into three values, m1, m2, and m3,
and it also generates three random numbers, c1, c2 and c3.
Then the Authentication Server uses the two sets of values,
{m1, m2, m3} and {c1, c2, c3}, to generate a new set of
values called angles that are denoted as {θ1, θ2, θ3}. The
Username and the random numbers {c1, c2, c3} are stored
in Backend Server 1 and the Username and {θ1, θ2, θ3}
are stored in Backend Server 2. These two sets of values are
used to authenticate the client when the client makes an access
request to the MR application.
FIGURE 6. Registration and authentication processes of the Rubika
method.
As described above, the Rubika method uses three servers
for authentication, but only one of the three servers, the
Authentication Server, is exposed to the public (i.e. accessible
to users). The other two servers, the backend servers, are
used to store password-verifiers. In other words, with this
approach, nothing related to the clients’ passwords
is stored in the server accessible by the public.
In the Somu method, on the other hand, clients’ encrypted
passwords are stored in the registration server which is
accessible to the public. In addition, with the Rubika method,
to compromise a password by stealing the password verifier,
an attacker would have to compromise two servers, as
each password verifier is divided into two portions and
each portion is stored on a different server. These two
measures make the Rubika method more secure than the
Somu method. The authors have also claimed that, by using
the two-portion password verifiers and alienate passwords,
their method is robust against replay and password guessing
attacks. Additionally, although the Rubika method uses three
servers, rather than two as in the case of the Somu method,
the communication overhead incurred in the Rubika method
is lower than that of the Somu method. The Rubika method only
needs three rounds of requests and replies for one client
authentication instance. This is one round fewer than the Somu
method.
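The round counts quoted in this section can be turned into message counts with simple arithmetic. The sketch below assumes that every round consists of one request and one reply, which the text states explicitly only for the Somu method; the function name is hypothetical.

```python
# Rounds needed for one client authentication instance, as quoted in the text.
ROUNDS_PER_LOGIN = {"O. Malley": 3, "Somu": 4, "Rubika": 3}

# Assumption: each round is one request plus one reply.
MESSAGES_PER_ROUND = 2


def login_messages(method: str, logins: int = 1) -> int:
    """Messages exchanged for `logins` client authentications with `method`."""
    return ROUNDS_PER_LOGIN[method] * MESSAGES_PER_ROUND * logins
```

Under this assumption, the extra round of the Somu method adds two messages to every client login compared with the other two methods.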
B. WEI’s AUTHENTICATION METHOD
Both the Somu and Rubika authentication methods are designed
to support client authentication only. They do not consider
the authentication issues between different MR components.
Wei et al. addressed this gap by proposing a SecureMR
Framework [16]. The Framework (hereafter referred to as
the Wei method) is aimed at protecting the integrity of
MR data processing services, namely the messages sent by
Map and Reduce tasks, and the data processed or generated
by the tasks. For the latter, both intermediate data and
final computational results from an MR job execution are
protected. For example, a Reduce task (Reducer) verifies the
authenticity of intermediate data produced by a Map task
(Mapper), and a client should verify the authenticity of the
final result generated by a Reducer. The method also supports
consistency checks of intermediate data and final results from
an MR job execution. This is done by replicating some Map
and Reduce tasks and assigning them to different workers. At the
end of the computation, the master compares the results
produced by different sets of tasks. If the results are identical,
then the consistency of the results (whether intermediate
or final) is assured.
FIGURE 7. The Wei method: to ensure message or data authenticity.
The verification process is carried out collaboratively
between the Master and a worker (e.g. a Mapper). Two protocol
messages, Assign and Commit, are used to authenticate and
verify the authenticity of both the task and the data produced by
the task. For example, as shown in Figure 7, to assign a Map
task to a Mapper, the Master sends the Mapper an Assign
message containing the ID of this Mapper, MapperID, and
the location of the data, DataLocation. The Master signs the
message using its private key and then encrypts the message
with the Mapper’s public key. When the Mapper receives
the Assign message, the Mapper decrypts the message with
its private key and verifies the signature using the Master’s
public key. Upon positive verification, the Mapper executes
the task assigned. After the task execution is completed, the
Mapper hashes each partition of the computational result
(intermediate data) and signs the hashed values with its private
key, and then constructs and sends a Commit message to
the Master. The Commit message contains the signed data
partitions of the result. Upon the receipt of this message,
the Master verifies the Commit message using the Mapper’s
public key. If the Master receives more than one Commit
message from different Mappers but for the same map task
(replicated task), the Master will compare the signed values
contained in the different Commit messages to see if they are
consistent with each other [16].
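The Assign and Commit exchange can be illustrated with the following sketch. It is not the SecureMR implementation: it uses textbook RSA over fixed Mersenne primes purely so that the example runs with the standard library, it omits the public key encryption of the messages, and all function names, the message layout, and the data location string are assumptions made for illustration.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    """Toy textbook-RSA key pair from two known primes (not secure)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(private_key, data: bytes) -> int:
    n, d = private_key
    return pow(digest_int(data, n), d, n)


def verify(public_key, data: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)


# Fixed toy keys; a real deployment would generate and certify them.
MASTER_PUB, MASTER_PRV = make_keypair(2**127 - 1, 2**89 - 1)
MAPPER_PUB, MAPPER_PRV = make_keypair(2**107 - 1, 2**61 - 1)


def make_assign(mapper_id: str, data_location: str):
    """Master side: build and sign an Assign message."""
    body = f"ASSIGN|{mapper_id}|{data_location}".encode()
    return body, sign(MASTER_PRV, body)


def make_commit(partitions, private_key):
    """Mapper side: hash each result partition and sign the hash list."""
    hashes = [hashlib.sha256(part).hexdigest() for part in partitions]
    return hashes, sign(private_key, "|".join(hashes).encode())


def check_commit(hashes, signature, public_key) -> bool:
    """Master side: verify the signature over the partition hashes."""
    return verify(public_key, "|".join(hashes).encode(), signature)
```

For a replicated map task, the Master would call `check_commit` on each Commit message and then compare the hash lists; identical lists indicate consistent intermediate results.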
The above method is also used to ensure the authenticity
of any intermediate data assigned to a Reducer by a Master.
The Reducer verifies the authenticity of the intermediate data
which are produced by the Mapper using the Mapper’s public
key. However, the method used to verify the authenticity
of the final result produced by an MR job execution is
different from the one discussed above. In the latter case, a
secure verification component is installed into the MR client
application, and the Master and the client verify the authenticity
of the output data in an additional phase, called the Verify
phase [16].
In addition to achieving message and data authenticity,
the Wei method also protects the confidentiality of protocol
messages (i.e. Assign and Commit messages). This is done by
encrypting the entire protocol message with the recipient’s
public key (after signing the message with the sender’s private
key).
The major difference between the Wei method and
the Somu and Rubika methods is that the Wei method
ensures the authenticity of messages sent from one MR-Job
Component to another and from an MR-Job Component to
an MR-Inf. Component, as well as the data or results produced by
the MR-Job Components. These protections are provided
by using digital signatures, so the method also provides
the property of non-repudiation of origin, protecting against
false denial of having generated or transmitted a message.
However, as discussed in [12] and [17], a public key
cryptosystem is computationally more costly in comparison
with a symmetric key cryptosystem, especially when it is
applied to a large-scale computational environment such as a
Cloud environment where a large number (possibly hundreds
or thousands [5]) of jobs may need to be processed and a large
number of distributed components are involved. Furthermore,
the Wei method has an extra phase (verification phase) in
addition to the map and reduce phases. This extra phase is
used to verify the authenticity of the final result produced by
an MR job execution. The performance evaluation presented
in [16] has not considered the costs introduced
by this extra verification phase; it has only considered the
communication costs of these scenarios: Master-to-Mapper,
Master-to-Reducer, and Mapper-to-Reducer.
C. RUAN’s AUTHENTICATION METHOD
Ruan et al. have proposed a trust-based authentication solution
for the MR application, called a Trusted MapReduce (TMR)
Framework [18]. The TMR Framework uses the notion of
trust and a public key cryptosystem based authentication
method to facilitate the authentication between MR
components. The authentication process is carried out in
two phases. The first phase is for initial trust (attestation)
establishment, and is carried out when an MR component
(e.g. a worker) sends a connecting request to another MR
component (e.g. a master). The second phase is for periodic
trust updates between the worker and the master, and it is
carried out regularly during the lifetime of the job execution.
When a worker first registers with a master, it generates a pair
of public and private keys, and this pair of keys is called an
Attestation Identity Key (AIK) pair. The worker then sends
the public key to the master.
This TMR Framework is similar to the Wei method in that
it uses a public-key cryptosystem based authentication and it
is different in that it can provide continuous authentication
between different MR components. However, the TMR
Framework design has not considered the authentication
of a client to the MR application, nor the issue of secure
distribution of public keys. It assumes that the AIK public
key should either be certified by a trusted third party
(e.g. Privacy-CA) [18] before run-time, or sent in a secure
channel from one MR component (i.e. worker) to another.
In addition, with this method, the master has to keep
the public keys of all the workers to provide continuous
authentication between the master and each worker.
D. ZHAO’s AUTHENTICATION METHOD
Zhao et al. have proposed an authentication method to
support the authentication of a client to an MR application
and authentication between a pair of MR components [19].
A user logs into the master node (of the MR application) using
his username and password. The master node has a Database
that contains users’ login information along with their access
rights. The master node verifies the password submitted by
the user. If the verification is positive, the user will be allowed
to submit a job to the MR application and a user instance is
created for the user to indicate that the user has an active job.
The subsequent authentication between the MR components
associated with the user instance (i.e. job) is achieved by using
two types of certificates, proxy and slave certificates. The
proxy certificate is used to authenticate the Job Tracker (the
master node), linked to this user instance, to a Task Tracker
(the slave node, i.e. the worker), while the slave certificate
is used to authenticate the slave node to the master node.
The proxy certificate contains the public key of the master
node and CA-ID (Certificate Authority Identity). The slave
certificate contains the public key of the corresponding slave
node along with the CA-ID. When the master applies for a
proxy certificate for a user instance, a secure connection is set
up between the master node and a Certificate Authority (CA)
using the Secure Socket Layer (SSL) protocol. In this way,
both the CA and the master node can be authenticated to each
other by using this protocol. Then the master node generates
a pair of public and private keys (Mpub and Mprv) for the
user instance. The master node keeps the private key and
sends the public key to the CA through the secure channel
just established. The master node also generates a user session
which will be used for later communication with the allocated
slave nodes. The CA adds some information, such as the key
lifetime, to form the first part of the proxy certificate and signs it
with the CA’s private key. The same generation and certification
process is also applied to the corresponding slave certificate.
The proxy certificate is sent to all the slave nodes that are
involved in the user instance (job), and the slave certificates
are sent to the master node.
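The certificate issuance and checking described above can be sketched as follows. This is an illustration under stated assumptions, not the format used in [19]: the field names, the `CertificateAuthority` class, and the toy textbook-RSA signature over fixed Mersenne primes are all introduced here, and the SSL channel between the master node and the CA is not modelled.

```python
import hashlib
import time
from dataclasses import dataclass, replace


def make_keypair(p: int, q: int, e: int = 65537):
    """Toy textbook-RSA key pair from two known primes (not secure)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass(frozen=True)
class Certificate:
    subject: str        # e.g. the Job Tracker of one user instance
    public_key: tuple   # public key of the certified node
    ca_id: str          # identity of the issuing CA
    not_after: float    # end of the key lifetime (Unix time)
    signature: int = 0  # CA signature over the fields above

    def to_be_signed(self) -> bytes:
        fields = (self.subject, self.public_key, self.ca_id, self.not_after)
        return "|".join(str(f) for f in fields).encode()


class CertificateAuthority:
    def __init__(self, ca_id: str, p: int, q: int):
        self.ca_id = ca_id
        self.public_key, self._private_key = make_keypair(p, q)

    def issue(self, subject: str, public_key: tuple, lifetime_s: float):
        """Add the lifetime and CA-ID, then sign with the CA's private key."""
        unsigned = Certificate(subject, public_key, self.ca_id,
                               time.time() + lifetime_s)
        n, d = self._private_key
        sig = pow(digest_int(unsigned.to_be_signed(), n), d, n)
        return replace(unsigned, signature=sig)


def certificate_is_valid(cert, ca_public_key, now=None) -> bool:
    """Check the key lifetime and the CA signature."""
    now = time.time() if now is None else now
    n, e = ca_public_key
    signed_ok = pow(cert.signature, e, n) == digest_int(cert.to_be_signed(), n)
    return now <= cert.not_after and signed_ok
```

A slave node holding the CA's public key would run `certificate_is_valid` on the proxy certificate before trusting the master's public key, and the master would do the same for each slave certificate.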
This method provides mutual authentication between a
master and a set of slave nodes involved in a user’s job.
This certificate based mutual authentication can mitigate
a number of threats such as Man-In-The-Middle (MITM)
attack between the master and slave nodes. A handshaking
protocol is used to facilitate the mutual authentication. Also,
as a secure channel is used between the CA on one side and
the master or a slave node on the other, the messages sent in
the channels are confidentiality and integrity protected.
To evaluate the performance of this method, the authors
have implemented the authentication method assuming the
following use cases: (1) one master with one slave, (2) one
master with two slaves and (3) one master with three slaves,
with 20 jobs submitted. The results show that the
execution time taken by the master node to authenticate three
slave nodes is about double the execution time taken to
authenticate two slave nodes. This suggests that the execution
time may be excessively high if the number of nodes increases
to hundreds or even thousands. The high cost is mainly
due to the use of the asymmetric key based cryptosystem, the
use of a third party (CA), and the need to issue and distribute
proxy and slave certificates securely.
E. QUAN’s AUTHENTICATION METHOD
Q. Quan et al. have extended the work presented
in [13] and [16] (the Malley and Wei methods), focusing on
file authenticity protection and key exchange [20]. The
authors believed that the authentication methods proposed
in [13] and [16] mainly provide user identity and service
integrity verifications, while the most needed method to
secure the MR model is to provide a mechanism to protect
the data itself. Based on this belief, they proposed a method
to protect data confidentiality and integrity in the MR
application. This method makes a hybrid use of the public and
symmetric key cryptosystems, i.e. a pair of MR components
use a public key cryptosystem to securely exchange a shared
symmetric key and then use this symmetric key to encrypt the
data. The following steps summarize this method.
1. Shared key exchange:
- An MR component, A, generates a symmetric
key, encrypts it using the public key of another
MR component, B, and then sends the ciphertext to B.
- B decrypts it using its own private key.
- Now both A and B share the same secret key
which is used to encrypt and decrypt any data
(file) sent between the two components.
2. Data confidentiality and integrity protections:
- A’s file content (data) is hashed using a hash
function such as MD5.
- A signs the hashed value of the file content along
with other items (that form the file header), such
as file ID, file name, and time stamp, using A’s
private key, and then sends the lot to B.
- B verifies the signature using A’s public key.
- B calculates the hash value of the file content
(data) after decrypting it using the shared key.
It then compares the hash value with the hash
value sent within the encrypted (signed) header.
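The two steps above can be sketched as follows. The sketch makes several assumptions that are not taken from [20]: textbook RSA over fixed Mersenne primes stands in for the public key cryptosystem, an XOR keystream stands in for the symmetric cipher, SHA-256 is used in place of MD5, and the header layout and function names are hypothetical.

```python
import hashlib
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    """Toy textbook-RSA key pair from two known primes (not secure)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 derived keystream; encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


A_PUB, A_PRV = make_keypair(2**127 - 1, 2**89 - 1)
B_PUB, B_PRV = make_keypair(2**107 - 1, 2**61 - 1)


def wrap_key(sym_key: bytes, recipient_pub) -> int:
    """Step 1: A encrypts the shared key under B's public key."""
    n, e = recipient_pub
    return pow(int.from_bytes(sym_key, "big"), e, n)


def unwrap_key(wrapped: int, recipient_prv, length: int = 16) -> bytes:
    """Step 1: B recovers the shared key with its private key."""
    n, d = recipient_prv
    return pow(wrapped, d, n).to_bytes(length, "big")


def send_file(file_id, name, timestamp, content, sym_key, sender_prv):
    """Step 2: A signs the header (with the content hash) and encrypts the data."""
    content_hash = hashlib.sha256(content).hexdigest()
    header = f"{file_id}|{name}|{timestamp}|{content_hash}".encode()
    n, d = sender_prv
    signature = pow(digest_int(header, n), d, n)
    return header, signature, toy_cipher(sym_key, content)


def receive_file(header, signature, ciphertext, sym_key, sender_pub) -> bytes:
    """Step 2: B verifies the header signature, decrypts, and checks the hash."""
    n, e = sender_pub
    if pow(signature, e, n) != digest_int(header, n):
        raise ValueError("bad header signature")
    content = toy_cipher(sym_key, ciphertext)
    if hashlib.sha256(content).hexdigest() != header.decode().rsplit("|", 1)[1]:
        raise ValueError("content hash mismatch")
    return content
```

Only the 16-byte key and the short header pass through the public key operations; the file content, which may be large, is handled by the symmetric cipher and the hash function alone.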
The merit of this method is that it does not use an asymmetric
key cryptosystem for encrypting and decrypting the data itself
(as the data could be big); rather, it uses one to encrypt and
decrypt a symmetric key and the file header, which is small
in comparison to the data itself (file content).
This is because of the high computational cost of applying
an asymmetric key cryptosystem to big data [12], [17].
This method ensures the authenticity and confidentiality of
the client data as well as any data sent between any two
MR components. However, this method protects the data only
when the data are read by an already authenticated MR
component; it assumes that any MR component that reads, or
needs to access, a client’s data has
already been authenticated.
Tables 3 and 4, respectively, summarize the related works
against the requirements specified for an MR authentication
service and the properties specified in Section III based on the
analysis conducted on the MR model in [1], [3], and [9].
V. WHAT IS MISSING
The critical analysis of the existing authentication methods,
presented in Section IV, shows that some methods are
designed to support gate-level authentication (i.e. the
authentication of users or clients to the MR application), while
others only protect the integrity and origin authentication
of protocol messages and data sent among different MR
components. Though there are efforts on supporting mutual
authentication between different MR components, these
efforts are largely based on the use of public key credentials.
Public key (i.e. asymmetric key) based solutions require the
involvement of a third party (CA) for credential issuance
and distribution. The costs incurred in such solutions are
usually high. In addition, these methods have not considered
mutual authentication between an MR-Job Component and
an MR-Inf. Component (Name Node). Mutual authentication
between an MR-Job Component and an MR-Inf. Component
is necessary and important as the former needs to request
data (e.g. input files) or other resources from the latter during
a job execution.
The lack of an adequate authentication service specifically
designed for the MR model will make the model vulnerable
to security threats and attacks. The threats and attacks are
not just those in relation to identity theft, impersonation
or replay attacks. A successful compromise of a client’s
account with an MR application will give attackers a better
chance to launch other attacks, such as gaining unauthorized access
to data and/or interrupting other job executions. This is particularly
the case if the MR model is deployed in a shared environment.
To address this open issue, the next section presents
a high-level analysis of the MR model, highlighting the
functionality of its components and the interactions among
the components when executing a job submitted by a client.
This analysis will lead to our initial idea of using a layered
approach to authentication in the MR model being deployed
in a shared environment.
VI. HIGH LEVEL ANALYSIS AND IDEA
From the GMC model shown in Figure 1, we can see that,
when executing a job (i.e. in a job execution flow), multiple
MR components are involved. Each component executes a
well-defined function and the multiple components interact
with one another to collaboratively accomplish the job
execution [2], [3], [9]. The MR components, either of one
job or multiple jobs submitted by a single or multiple clients,
are hosted in two shared clusters, Processing Framework (PF)
and Distributed File System (DFS). Each interaction between
a pair of MR components is a client-server interaction
and must be authenticated. Examples of these client-server
interactions are the reading and writing requests made to
the Name Node. Each request is a procedure call. These
calls could be for (i) reading data (job resource) (typically
initiated by a Job Tracker or a Task Tracker), or (ii) writing
data submitted by a client or produced from a job execution
(e.g. input and output data of a job) (typically initiated,
respectively, by clients and Reduce Tasks). Other examples
of the client-server interactions are those initiated to and from
the Resource Manager. When a client submits a new job,
he needs to make a new job-submission request. Upon the
receipt of the job-submission request, the Resource Manager
needs to make a resources-allocation request to a Job Tracker.
All these requests are interactions and are made by using
procedure calls. In other words, these calls could be for
(i) submitting a new job (typically initiated by clients), or
(ii) allocating a Job Tracker to master the Map and Reduce
Tasks assigned to different Task Trackers related to a job
execution.
TABLE 3. Related works versus design requirements for authentication services for the MR model.
Depending on their functionalities, the interactions
involved in a job execution can be classified into three
groups: (i) those for submitting a job, (ii) those for allocating
resources for the execution of the job, and (iii) those for
reading or writing data related to the job execution.
The first group (Group-1) of interactions takes place when
a client submits a job to the MR model. To submit a job, the
client makes the job submission via the Resource Manager,
and writes the data for the job execution into the Name Node.
The Resource Manager is the master node in the PF cluster,
and the Name Node is the master node in the DFS cluster.
In other words, the first group of interactions is between a
client and the two master nodes, one in each cluster. It should
be emphasized that one client may submit multiple jobs, and
there will be multiple clients submitting jobs. Hereafter we
shall refer to the interactions taking place for the submission
and execution of a single job as one set, and one such set
of interactions consists of the interactions from all the three
groups, i.e. {the set of interactions for the execution of one
job} = {a subset of Group-1 interactions}+{a subset of
Group-2 interactions}+{a subset of Group-3 interactions},
where all the subsets are related to the submission and
execution of a particular job.
TABLE 4. Protection against some security threats for proposed methods.
The interactions in the second group (Group-2) are for
allocating job resources. Here, for each job, three nodes are
involved in this group of interactions, the Resource Manager,
the Name Node and a Job Tracker allocated for the job.
When a job is admitted, the Resource Manager will allocate
a Job Tracker for the job. This Job Tracker will be assigned
(i.e. allocated) multiple Task Trackers. The Map and Reduce
tasks of this job will be executed on these Task Trackers. The
Group-2 interactions also include the interactions carried out
by the Name Node to manage and maintain a set of Data
Nodes. The Data Nodes host data for all the jobs that are
submitted. As can be seen from our discussions here, the
functionalities of the Resource Manager and Name Node are
for managing the executions of all the jobs that are submitted
by the same client or by different clients. The Resource
Manager and the Name Node serve all the jobs submitted.
They are shared by different jobs and are identified by static
identities. These components are not invoked because of a job
submission; they are there to serve every job submitted by
any client. For this reason, the two components are called
MR Infrastructure components (MR-Inf. Components, for
short). As mentioned above, the Group-2 interactions are for
providing resources for the executions of jobs. As shown
in the GMC model (Figure 1), for a single job execution,
these interactions are those of Resource Manager to Job
Tracker (RM-JT), Job Tracker to Task Tracker (JT-TT), and
Name Node to Data Node (NN-DN). These interactions are
different from Group-1 interactions, as these are performed
for accomplishing cluster functions and for serving the
execution of a particular job. This is due to the following
observations: (i) both Job Trackers and Data Nodes are,
respectively, the slave nodes of PF and DFS clusters, (ii) any
interactions initiated by a cluster master node (i.e. RM or NN)
towards a cluster slave node (i.e. RM to JT, or NN to DN), or
from a slave node of the PF cluster to another slave node in
the same cluster, do not involve any access (read, write or
retrieve¹) to any data of a job. Also, the RM and the NN are
the only RM and NN that initiate Group-2 interactions, and
the JT is the only JT that initiates Group-2 interactions for a
particular job. In other words, there is no other RM or NN to
initiate such interactions in the cluster, and there is no other
JT to initiate such interactions for the same job [2].
¹Retrieve involves both read and write; read from remote server (DN) and
write locally to another server (TT).
The third group (Group-3) interactions are for executing
a job submitted by a client. For each job, four types
of components are involved in this group of interactions
(i.e. these components, assigned to a particular job, initiate
this group of interactions): the Job Tracker (JT), Task
Trackers (TTs), Map Tasks (MTs) and Reduce Tasks (RTs).
In executing the job, the JT retrieves the input splits of the
data (the data of the job submitted by a particular client) from
the DFS. The JT can then start managing the tasks (MTs
and RTs). TTs also retrieve the data (for the job execution)
from the DFS. TTs can then start executing the MTs and
RTs assigned to them. In executing the tasks, RTs read
the intermediate data, which is produced by MTs, from the
respective Task Trackers. RTs also write the output results
of their computations into the DFS. Group-3 interactions
can actually be seen as different subgroups (subsets) of
interactions, each of which is performed by a set of
components invoked for a particular job. In other words, the
set of components consists of a JT that is invoked for a particular
job, and the TTs, MTs and RTs associated with
the JT. This set of components is created or invoked when
a job is submitted, and its members are terminated or reassigned
when the job execution is completed. The existence of this
set of components is purely for serving this job. Therefore
these components are identified by dynamic identities, and
the identities are short-lived and so are the secrets issued to
them. For this reason, we refer to them as MR-Job Components.
We can further use an example to explain the Group-3
interactions. As shown in the GMC model (Figure 1),
during the execution of this particular job (i.e. in a job
execution flow), this set of Group-3 interactions are for
executing this job and can be identified as follows: Job
Tracker to Name Node (JT-NN), Task Tracker to Name
Node (TT-NN), Reduce Task to Task Tracker (RT-TT), and
Reduce Task to Name Node (RT-NN). These interactions
are for executing/processing the job, i.e. they perform
(belong to) job functions. Group-3 interactions may be
invoked concurrently by MR-Job Components serving
different jobs. Because of this we need to distinguish the
interactions based on Job IDs, i.e. which job a particular
interaction, or a set of interactions, actually serves.
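The classification of interactions into three groups, and the need to separate them by Job ID, can be expressed as a small lookup structure. The sketch below is illustrative only: the enum, the table, and the log format are assumptions introduced here, with the component pairs taken from the interactions named in this section.

```python
from collections import defaultdict
from enum import Enum


class Group(Enum):
    SUBMISSION = 1  # Group-1: submitting a job
    ALLOCATION = 2  # Group-2: allocating resources for the job
    EXECUTION = 3   # Group-3: reading or writing data of the job


# (initiator, target) pairs named in the text, mapped to their group.
INTERACTION_GROUP = {
    ("Client", "RM"): Group.SUBMISSION,
    ("Client", "NN"): Group.SUBMISSION,
    ("RM", "JT"): Group.ALLOCATION,
    ("JT", "TT"): Group.ALLOCATION,
    ("NN", "DN"): Group.ALLOCATION,
    ("JT", "NN"): Group.EXECUTION,
    ("TT", "NN"): Group.EXECUTION,
    ("RT", "TT"): Group.EXECUTION,
    ("RT", "NN"): Group.EXECUTION,
}


def interactions_by_job(log):
    """Group a log of (job_id, initiator, target) entries per job and group."""
    per_job = defaultdict(lambda: defaultdict(list))
    for job_id, initiator, target in log:
        group = INTERACTION_GROUP[(initiator, target)]
        per_job[job_id][group].append((initiator, target))
    return per_job
```

Indexing first by Job ID and then by group mirrors the definition of one set of interactions per job, made up of a subset from each of the three groups.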
The three groups of interactions, and the MR components
performing them, take part in executing a client’s job, but
neither the interactions nor the operations (or
functionalities) of the components are under the control of
the client. The components perform their functionalities and
interactions to execute the client’s job (or process the
client’s data) on behalf of the client. This raises an open
issue: how can a client trust such a shared computational
environment? The issue is particularly important if the data
processed by the job are privacy- or security-sensitive.
To achieve effective authentication in an MR environment,
the authentication solution should capture the characteristics
of this environment. The characteristics that should be
captured in the design of an authentication solution for
MR can be summarized as follows: (1) this is a shared
environment with one or more clusters; (2) each cluster
hosts a set of distributed MR components, and these
components can be classified into MR-Inf. Components
and MR-Job Components; (3) the MR-Job Components are
job-dependent, i.e. they are invoked for a particular job
submitted by a client; (4) the MR environment typically
executes multiple jobs, submitted by the same client and/or
by different clients, at any given time. Based on these
observations, we can single out the set of MR components
(MR-Inf. Components and MR-Job Components) that are
involved in executing a particular job and give this set of
components a name, an MR-Job Domain. In other words, each
job has a unique identity, and this identity is also used
to index the MR-Job Domain, i.e. the set of MR components
involved in serving that job.
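One way to picture an MR-Job Domain indexed by its job identity is the following Python sketch. It is a hedged illustration only: the class and function names, the identifier strings, and the choice of a dictionary as the index are assumptions, and the shared MR-Inf. Components are taken to be the Resource Manager and the Name Node.

```python
from dataclasses import dataclass, field

# Shared, long-lived MR-Inf. Components (names are illustrative).
INF_COMPONENTS = frozenset({"ResourceManager", "NameNode"})


@dataclass
class MRJobDomain:
    job_id: str
    client: str
    # JT, TTs, MTs and RTs invoked for this job only (hypothetical names).
    job_components: set = field(default_factory=set)

    def members(self):
        # A domain = shared infrastructure components + the job's own components.
        return INF_COMPONENTS | self.job_components


domains = {}  # job_id -> MRJobDomain


def open_domain(job_id, client, job_components):
    """Create the domain when the job is submitted; job IDs must be unique."""
    if job_id in domains:
        raise ValueError(f"job id {job_id!r} already in use")
    domains[job_id] = MRJobDomain(job_id, client, set(job_components))
    return domains[job_id]


def close_domain(job_id):
    """Discard the domain when the job execution is completed."""
    domains.pop(job_id, None)


# The same client may have several jobs, and hence several domains, at once.
open_domain("job-1", "alice", {"jt-1", "tt-1", "mt-1", "rt-1"})
open_domain("job-2", "alice", {"jt-2", "tt-2", "mt-2", "rt-2"})
```

The two domains share the infrastructure components but have disjoint sets of MR-Job Components, which is what makes the job identity a usable index.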
We here propose a domain-based authentication approach
for the newer MR implementation. The novel idea behind this
approach is that the MR components that serve a particular
job are singled out as an MR-Job Domain, and the components
in this MR-Job Domain are responsible for authenticating
themselves to each other. This is essentially an idea of
isolation: we place the MR-Inf. Components and MR-Job
Components that are involved in executing a given job into
one set and require the components in this set to
authenticate to each other, so that only they are allowed to
access the resources belonging to (or owned by) this
particular job, and any component outside the domain is
denied access to those resources. To implement this
idea, we propose a novel framework, named the
Virtual Domain based Authentication Framework (VDAF).
The framework is said to be virtual domain based because
(1) MR-Job Components are dynamic, (2) more than one
MR-Job Domain may co-exist at any given time in the
two clusters (PF and DFS), and (3) these MR-Job Domains
work on top of another group of entities, the master and
slave nodes of the two clusters. From this point on, this
latter group of entities is referred to as the shared cluster
infrastructure (Inf.) domain (i.e. the MR-Inf. Domain). The
MR-Inf. Domain should have its own authentication method,
referred to as MR-Inf. Authentication, which is likely
to be different from VDAF, the authentication method we
propose for an MR-Job Domain. The classification of MR
components into different groups, the arrangement of the
groups into different layers, and the use of different
authentication methods for different layers together amount
to a layered approach to MR authentication. The next section
describes the layered authentication model we propose for
MR, i.e. the MR Layered Authentication Model.
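The isolation idea can be sketched in Python as a membership check made before any access to a job's resources. This sketch covers only the "who belongs to which domain" decision and assumes the identity of the requesting component has already been verified; the tables, paths and component names are hypothetical.

```python
# job_id -> component identities in that MR-Job Domain (hypothetical names).
# Shared MR-Inf. Components ("nn", "rm") appear in every domain.
DOMAIN_MEMBERS = {
    "job-1": {"nn", "rm", "jt-1", "tt-1", "rt-1"},
    "job-2": {"nn", "rm", "jt-2", "tt-2", "rt-2"},
}

# resource -> job that owns it (hypothetical paths).
RESOURCE_OWNER = {
    "/dfs/job-1/input": "job-1",
    "/dfs/job-2/input": "job-2",
}


def may_access(component, acting_for_job, resource):
    """Allow access only to members of the domain that owns the resource.

    `acting_for_job` is the job the request claims to serve; a shared
    component must state it because it belongs to several domains.
    """
    owner = RESOURCE_OWNER.get(resource)
    if owner is None or owner != acting_for_job:
        return False
    return component in DOMAIN_MEMBERS.get(owner, set())
```

With these tables, `may_access("tt-1", "job-1", "/dfs/job-1/input")` is allowed, while a Task Tracker serving job-2 is refused the same resource regardless of which job it claims to act for.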
A. MR LAYERED AUTHENTICATION MODEL
Based on the analysis and discussions in the section
above, we propose to use the MR Layered Authentication
Model (MR-LAM) to realize the whole task of authentication
for MapReduce. As shown in Figure 8, MR-LAM
consists of three authentication layers. These are MR-Inf.
Domain Authentication Layer (Layer-1), MR-Job Domain
Authentication Layer (Layer-2), and MR Components
Authentication Layer (Layer-3).
FIGURE 8. MR layered authentication model (A layered approach to
authentication).
The first layer (Layer-1), the MR-Inf. Domain Authentication
Layer, serves the authentication of both the clusters’ server
nodes and the MR components (MR-Inf. Components and MR-Job
Components). It is responsible for authenticating any new
physical node joining a cluster of the MR application and
for the mutual authentication between any pair of physical
nodes in the cluster. In other words, an Authentication
Service (AS) for this layer should support two
authentication tasks. The first task is the initial
authentication of any new server node wanting to join the
MR-Inf. Domain. A node should be admitted as a member of the
MR-Inf. Domain only if it has been successfully authenticated.
For example, considering the case shown in Figure 1, if
many jobs are submitted and the Resource Manager in the
PF cluster is running out of resources on the slave nodes
and/or if the IT administrator, looking after the cluster, has
decided to bring a new slave node into the cluster, then
the new node should be authenticated by the AS before it
is allowed to become a member of this cluster. The second
task performed by the Layer-1 authentication service is to
support continuous mutual authentication among the server
nodes in the MR-Inf. Domain. These two authentication tasks
can be achieved using the Kerberos authentication solution.
Kerberos is the preferred authentication method for the
MR-Inf. Domain, because it is still the default
authentication service deployed for the cluster
infrastructure in private clouds, and many operating
systems (OSs) run by the clusters’ server nodes, such as
Microsoft Windows and Red Hat Enterprise Linux [21]–[23],
already support it. If Kerberos is used as the default
authentication service in a cluster, all the slave nodes
(i.e. all the members of the MR-Inf. Domain) in the cluster
should support it. They will use Kerberos to authenticate
themselves to the master node in the cluster and to establish
shared secret keys between the services hosted by the slave
and master nodes (i.e. MR components). In this case, the
MR-Inf. Domain authentication service is provided through
the use of Kerberos. However, the clients who have their jobs
admitted into the MR cluster are not themselves members of
the MR-Inf. Domain, yet they should be authenticated before
their jobs can be admitted. As a client will be part of an
MR-Job Domain at Layer-2, the client should be authenticated
by the authentication service provided at Layer-2. In other
words, the second layer of the authentication model, the
MR-Job Domain Authentication Layer, should provide a means
for client authentication.
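The two Layer-1 tasks, admission of a new node and checks between admitted nodes, can be sketched in Python as follows. This is a toy model under stated assumptions: the `authenticate` callable merely stands in for the AS (it is not Kerberos and uses no Kerberos API), the credential table is invented, and mutual authentication is reduced to a membership check.

```python
class InfDomain:
    """Membership of the MR-Inf. Domain; `authenticate` stands in for the AS."""

    def __init__(self, authenticate):
        # authenticate: callable(node_id, credential) -> bool (assumed interface)
        self._authenticate = authenticate
        self._members = set()

    def admit(self, node_id, credential):
        # Task 1: initial authentication before a node may join the domain.
        if not self._authenticate(node_id, credential):
            return False
        self._members.add(node_id)
        return True

    def may_interact(self, node_a, node_b):
        # Task 2 (heavily simplified): only admitted nodes may interact.
        return node_a in self._members and node_b in self._members


# Toy authenticator: a fixed table of expected credentials (NOT Kerberos).
_EXPECTED = {"master-1": "k-m1", "slave-1": "k-s1", "slave-2": "k-s2"}


def toy_authenticate(node_id, credential):
    expected = _EXPECTED.get(node_id)
    return expected is not None and expected == credential


domain = InfDomain(toy_authenticate)
domain.admit("master-1", "k-m1")
domain.admit("slave-1", "k-s1")
rejected = domain.admit("slave-2", "wrong-credential")
```

A new slave node that fails the check, as `slave-2` does here, never becomes a member and so cannot interact with the master node.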
For Layer-2, i.e. the MR-Job Domain Authentication
Layer, the authentication task is mutual authentication
among the MR components that are involved in serving a
particular job. For this layer, an authentication method
different from the one used in Layer-1 may be used. As
mentioned earlier, we propose to use the VDAF to support
this layer of authentication. With VDAF, the MR components
that serve a particular job are collectively referred to as
an MR-Job Domain, and the components in an MR-Job Domain
perform authentication among themselves. At this layer, each
client is registered with an AS. Upon successful
registration, the AS issues a long-term access credential to
the client, which the client can use to submit jobs to the MR
application. During the execution of a job, the client is
able to make use of the MR components (MR-Inf. Components and
MR-Job Components), or the resources provided through these
components, so long as these components and resources are
assigned to the job. Also, at Layer-2, the authentication
method and protocols used by each MR-Job Domain are expected
to be the same, but the secrets used in each domain are
different and should be protected against exposure to other
domains. As mentioned earlier, each MR-Job Domain has its
own MR-Job Components involved in the execution of a job
submitted by a particular client. The client generates and
manages the credentials (authentication secrets and other
data) used to authenticate the MR-Job Components in this
domain. In other words, this layer is responsible for
providing the identification and authentication service by
which the MR-Job Components assigned to each job can be
securely identified and authenticated at every interaction
among themselves throughout the execution cycle of the job.
The Resource Manager, which manages the resources of the
MR-Inf. Domain, is also involved in the authentication of
all the MR-Job Domains. For MR-Job Domain level (i.e.
Layer-2) authentication, the Resource Manager works as a
relay to deliver the access credentials of each MR-Job
Domain (see footnote 2) to the client and the MR-Job
Components in that domain (more details to follow in our
future work).
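The properties stated for Layer-2 secrets (one secret set per MR-Job Domain, generated on the client side, short-lived) can be illustrated with the Python sketch below. The paper leaves the credential design to future work, so everything concrete here is an assumption: the 32-byte random domain secret, the HMAC-SHA256 derivation of per-component keys, the dictionary layout and the lifetime handling.

```python
import hashlib
import hmac
import secrets
import time


def new_domain_secret():
    # Generated by the client, one per MR-Job Domain (assumed 32 random bytes).
    return secrets.token_bytes(32)


def component_key(domain_secret, job_id, component_id):
    # Assumed derivation: HMAC-SHA256 keyed with the domain secret over
    # "job_id|component_id", so keys differ per domain and per component.
    label = f"{job_id}|{component_id}".encode()
    return hmac.new(domain_secret, label, hashlib.sha256).digest()


def issue_credential(domain_secret, job_id, component_id, lifetime_s, now=None):
    # Short-lived credential for one MR-Job Component (hypothetical layout).
    now = time.time() if now is None else now
    return {
        "job_id": job_id,
        "component_id": component_id,
        "key": component_key(domain_secret, job_id, component_id),
        "expires_at": now + lifetime_s,
    }


def is_valid(credential, now=None):
    now = time.time() if now is None else now
    return now < credential["expires_at"]
```

Because each domain has its own secret, a key derived for a Task Tracker in one domain is of no use in another, and an expired credential stops working even if the component that held it is later reassigned.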
The third layer (Layer-3) is the MR Components
Authentication Layer. As mentioned earlier, some of the MR
components are shared by more than one job (these components
carry static, semi-permanent IDs), while others are used
exclusively by the tasks created for a particular submitted
job. The latter carry dynamic IDs: they are created when the
job is created and discarded when the execution of the job
is completed. Hence, at Layer-3, it is assumed that each of
the MR components (either an MR-Inf. Component or an MR-Job
Component) has an authentication module. Depending on the
hosting MR component, these modules are named the Job
Tracker Authentication (JT-AuthN) Module, Task Tracker
Authentication (TT-AuthN) Module, Map Task Authentication
(MT-AuthN) Module, Reduce Task Authentication (RT-AuthN)
Module, Name Node Authentication (NN-AuthN) Module, Data
Node Authentication (DN-AuthN) Module, and Resource Manager
Authentication (RM-AuthN) Relay Module. By embedding
these authentication modules into their respective MR
components, we can provide MR-Inf. Components and
MR-Job Components with authentication services that support
authentication among themselves and prevent
unauthorized access to data or resources assigned to
(or owned by) a particular job domain. The implementation
of the Layer-2 and Layer-3 authentication services will be
described in more detail in a future paper.
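The pairing of each MR component type with its authentication module can be written down as a small Python lookup. The module names are those given above; the `Component` class, the component type strings and the idea of attaching the module at construction time are assumptions made for illustration, since the module design itself is deferred to future work.

```python
# Component type -> authentication module name, as listed in the text.
MODULE_FOR = {
    "JobTracker": "JT-AuthN",
    "TaskTracker": "TT-AuthN",
    "MapTask": "MT-AuthN",
    "ReduceTask": "RT-AuthN",
    "NameNode": "NN-AuthN",
    "DataNode": "DN-AuthN",
    "ResourceManager": "RM-AuthN Relay",
}


class Component:
    """An MR component with its authentication module embedded (sketch)."""

    def __init__(self, kind, identity):
        if kind not in MODULE_FOR:
            raise ValueError(f"no authentication module defined for {kind!r}")
        self.kind = kind
        self.identity = identity             # static or dynamic ID (hypothetical)
        self.authn_module = MODULE_FOR[kind]  # embedded when the component is created


reduce_task = Component("ReduceTask", "rt-1")
resource_manager = Component("ResourceManager", "rm")
```

Only the Resource Manager's module is a relay, which matches its Layer-2 role of passing on domain credentials rather than owning them.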
VII. CONCLUSION AND FUTURE WORK
This paper has critically analyzed existing authentication
methods designed for the MR model. It has also presented
a high-level analysis of how an authentication service may
be provided for the MR model and given a high-level idea of
using a layered approach to the authentication in this context.
The analysis of existing authentication methods has indicated
that providing an inadequate authentication service to the MR
model or deploying an authentication service that fails to
capture the characteristics of the MR model would put clients’
jobs and the resources hosted in an MR application at a high
level of risk.
Footnote 2: The access credentials of an MR-Job Domain are the access
credentials of both the client and the MR-Job Components of the MR-Job Domain.
Providing an adequate authentication service for the MR
model is a challenging task. This is because the MR model is
usually deployed in a shared infrastructural environment,
and in such an environment it is difficult to distinguish
between a compromised and a trustworthy MR component. In
addition, the hosting nodes in this environment are
distributed, and possibly provided by multiple providers.
The VDAF facilitates authentication on a per-job basis
(i.e. Job Authentication (Job-AuthN)) and does so during
the entire execution cycle of the job. It covers a chain of
authentication tasks, namely, (i) from a user to the user’s
client running on the user’s machine, (ii) from the client to
the MR application (using the AS), and (iii) for messages
sent by the MR components, i.e. transactions sent from an
MR-Job Component to an MR-Inf. Component and from one
MR-Job Component to another. For all of these authentication
tasks, (i) to (iii), data authentication should be provided.
In other words, an authentication solution
for MR should not only guard the gate into the system, but
also guard every resource access during the execution of
a job, as the execution involves multiple MR components
distributed across multiple nodes of both the PF and DFS
clusters. A simple but effective authentication
solution for the MR model is needed. As part of our
work to design such a solution, this paper has also given a
high-level overview of how it may be designed.
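Guarding every resource access, rather than only the entry point, implies that each message between MR components carries its own authentication data. The Python sketch below shows one common way to do this, an HMAC-SHA256 tag bound to the job ID, sender, receiver and payload. The choice of HMAC, the field encoding and all names are assumptions, not the VDAF protocol, and the sketch offers no replay protection.

```python
import hashlib
import hmac


def make_tag(key, job_id, sender, receiver, payload):
    """Compute an authentication tag over one inter-component message."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    # Length-prefix each field so that field boundaries are unambiguous.
    for part in (job_id.encode(), sender.encode(), receiver.encode(), payload):
        mac.update(len(part).to_bytes(4, "big"))
        mac.update(part)
    return mac.digest()


def verify_tag(key, job_id, sender, receiver, payload, received_tag):
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(key, job_id, sender, receiver, payload)
    return hmac.compare_digest(expected, received_tag)


# Hypothetical RT-NN interaction: a Reduce Task writes output for job-1.
domain_key = bytes(range(32))  # placeholder key for illustration only
message_tag = make_tag(domain_key, "job-1", "rt-1", "nn", b"write part-0")
```

A receiver that holds the same key accepts the message only if the tag verifies for the stated job, so a message replayed under a different Job ID or altered in transit is rejected.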
Our proposed idea is to use a layered approach to tackle
the complex task of authentication in MR. This approach
takes into account the authentication requirements of all
the components of the MR model. The authentication service
should be provided at multiple levels, securely
and efficiently. To satisfy the authentication requirements
detailed in R2.1 to R2.3, we have analyzed and identified
all the possible interactions among the MR components.
The C-RM, C-NN, JT-NN, TT-NN, RT-TT and RT-NN interactions
have been considered in our design of VDAF.
We will also address requirements R2.4 and R2.5, taking
into account the RM-JT, NN-DN and JT-TT interactions,
in the design of an authentication solution for the MR-Inf.
Domain Authentication Layer.
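The split of interactions between the two designs can be summarized as a lookup, sketched here in Python. The two sets are taken from the paragraph above; the function name and the returned labels are illustrative assumptions.

```python
# Interactions considered in the design of VDAF (requirements R2.1 to R2.3).
VDAF_INTERACTIONS = {"C-RM", "C-NN", "JT-NN", "TT-NN", "RT-TT", "RT-NN"}

# Interactions left to the MR-Inf. Domain Authentication Layer (R2.4, R2.5).
INF_LAYER_INTERACTIONS = {"RM-JT", "NN-DN", "JT-TT"}


def responsible_design(interaction):
    """Return which authentication design covers a given interaction."""
    if interaction in VDAF_INTERACTIONS:
        return "VDAF"
    if interaction in INF_LAYER_INTERACTIONS:
        return "MR-Inf. Domain Authentication Layer"
    raise KeyError(f"unknown interaction: {interaction!r}")
```

For example, `responsible_design("RT-NN")` yields "VDAF", whereas `responsible_design("NN-DN")` yields the MR-Inf. Domain Authentication Layer.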
The way we design the VDAF also allows the MR
model to be delivered as Software as a Service (SaaS) in
a public cloud environment. This is due to two factors.
First, the MR-Inf. Domain is likely to be managed remotely
from a central location by an MR provider and not by a
client, and the client is not involved in issuing and managing
(i.e. is not in control of) the authentication credentials or
secrets of the members of the MR-Inf. Domain (i.e. the
MR-Inf. Components, which are actually the master nodes
of both the PF and DFS clusters) and of the MR-Job
Components’ Hosting Nodes (which are actually the slave nodes
of both clusters). In other words, a client does not have to
install or configure the master and slave nodes of the shared
clusters, nor issue or manage the authentication credentials
or secrets of the MR-Inf. Components and the MR-Job
Components’ Hosting Nodes. Rather, the client controls only
the authentication credentials or secrets of the MR-Job
Components that are involved in executing his job (i.e. his
own MR-Job Domain). Therefore, a client only needs to make a
new job-submission request, which includes uploading the
data of his MR-Job Domain. Second, an MR application
delivers its computing services in a ‘‘one-to-many’’ manner.
This means that one MR-Inf. Domain hosts multiple MR-Job
Domains, and each MR-Job Domain serves the execution of a
particular job submitted by a particular client. Each client
has his own MR-Job Domain secrets (i.e. isolated from those
of other MR-Job Domains) and controls the credentials or
secrets of his own MR-Job Domain.
The detailed design of the VDAF for MR-Job Domains will
be presented in our future work.
ACKNOWLEDGMENT
The authors would like to thank their colleagues from
the School of Computer Science at the University of
Manchester, Manchester, U.K., who provided facilities that
greatly assisted this paper. The authors would also like to
thank the reviewers for their valuable comments on this paper.
REFERENCES
[1] J. Dyer and N. Zhang, ‘‘Security issues relating to inadequate
authentication in MapReduce applications,’’ in Proc. Int. Conf. High
Perform. Comput. Simulation (HPCS), Jul. 2013, pp. 281–288.
[2] T. White, ‘‘How the MapReduce works,’’ in Hadoop: The Definitive Guide,
3rd ed. Tokyo, Japan: O’Reilly Inc., 2012.
[3] I. Lahmer and N. Zhang, ‘‘MapReduce: MR model abstraction for future
security study,’’ in Proc. 7th Int. Conf. Secur. Inf. Netw., 2014, pp. 392–398.
[4] C. Lam, ‘‘Introducing hadoop, and managing hadoop,’’ in Hadoop in
Action. Greenwich, U.K.: Manning Publications Co, 2010.
[5] P. Zikopoulos, C. Eaton, D. Deroos, T. Deutsch, and G. Lapis,
Understanding Big Data: Analytics for Enterprise Class Hadoop and
Streaming Data. New York, NY, USA: McGraw-Hill, 2012.
[6] J. Dean and S. Ghemawat, ‘‘MapReduce: Simplified data processing on
large clusters,’’ Commun. ACM, vol. 51, no. 1, pp. 107–113, 2008.
[7] J. Xiao and Z. Xiao, ‘‘High-integrity MapReduce computation in cloud
with speculative execution,’’ in Theoretical and Mathematical Foundations
of Computer Science. Heidelberg, Germany: Springer-Verlag, 2011,
pp. 397–404.
[8] B. Lakhe, ‘‘Introducing Hadoop and its security,’’ in Practical Hadoop
Security. New York, NY, USA: Apress, 2014.
[9] I. Lahmer and N. Zhang, ‘‘MapReduce: A security analysis and
authentication requirement specification,’’ in Proc. 2nd Int. Conf. Comput.
Inf. Syst. (ICCIS), World Congr. Comput. Appl. Inf. Syst., 2015, pp. 65–71.
[10] D. A. B. Fernandes, L. F. B. Soares, J. V. Gomes, M. M. Freire, and
P. R. M. Inácio, ‘‘Security issues in cloud environments: A survey,’’ Int.
J. Inf. Secur., vol. 13, no. 2, pp. 113–170, Apr. 2014.
[11] J. M. Kizza, ‘‘Cloud computing and related security issues,’’ in Guide
to Computer Network Security. London, U.K.: Springer-Verlag, 2013,
pp. 465–489.
[12] A. Kumar, S. Jakhar, and S. Makkar, ‘‘Comparative analysis between DES
and RSA algorithms,’’ Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 2,
no. 7, pp. 386–391, Jul. 2012.
[13] O. O’Malley, K. Zhang, S. Radia, R. Marti, and C. Harrell, ‘‘Hadoop
security design,’’ Yahoo, Inc., Sunnyvale, CA, USA, Tech. Rep., 2009.
[14] N. Somu, A. Gangaa, and V. S. S. Sriram, ‘‘Authentication service in
Hadoop using one time pad,’’ Indian J. Sci. Technol., vol. 7, pp. 56–62,
Apr. 2014.
[15] S. Rubika, G. S. Sadasivam, and K. A. Kumari, ‘‘A novel authentication
service for Hadoop in cloud environment,’’ in Proc. IEEE Int. Conf. Cloud
Comput. Emerg. Markets (CCEM), Oct. 2012, pp. 1–6.
[16] W. Wei, J. Du, T. Yu, and X. Gu, ‘‘SecureMR: A service integrity assurance
framework for MapReduce,’’ in Proc. ACSAC, Dec. 2009, pp. 73–82.
[17] B. Padmavathi and S. R. Kumari, ‘‘A survey on performance analysis of
DES, AES and RSA algorithm along with LSB substitution technique,’’
Int. J. Sci. Res., vol. 2, no. 4, pp. 170–174, 2013.
[18] A. Ruan and A. Martin, ‘‘TMR: Towards a trusted MapReduce
infrastructure,’’ in Proc. IEEE 8th World Congr. Services, Jun. 2012,
pp. 141–148.
[19] J. Zhao, J. Tao, and A. Streit, ‘‘Enabling collaborative MapReduce on
the cloud with a single-sign-on mechanism,’’ Computing, vol. 98, no. 1,
pp. 55–72, Jan. 2014.
[20] Q. Quan, W. Tian-Hong, Z. Rui, and X. Ming-Jun, ‘‘A model of cloud data
secure storage based on HDFS,’’ in Proc. 12th IEEE Int. Conf. Comput.
Inf. Sci. (ICIS), Jun. 2013, pp. 173–178.
[21] Microsoft Technical Team. (2016). Securing Server Clusters, Microsoft
Technet Library, accessed on Jan. 20, 2016. [Online]. Available:
https://technet.microsoft.com/en-us/library/cc785088%28v=ws.10%29.aspx
[22] Microsoft Technical Team. (2016). Applying Kerberos Authentication
in a Clustered Environment, Microsoft Technet Library, accessed on
Jan. 20, 2016. [Online]. Available:
https://technet.microsoft.com/en-us/library/cc738070%28v=ws.10%29.aspx
[23] Red Hat Technical Team. (2015). ‘Creating Domains: Kerberos
Authentication’ in Deployment Guide: Deployment, Configuration
and Administration of Red Hat Enterprise Linux 6, accessed
on Jan. 21, 2016. [Online]. Available:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/Configuring_Domains-Setting_up_Kerberos_Authentication.html
[24] I. Lahmer and N. Zhang, ‘‘MapReduce: A critical analysis of existing
authentication methods,’’ in Proc. 10th Int. Conf. Internet Technol. Secured
Trans. (ICITST), Dec. 2015, pp. 302–313.
IBRAHIM LAHMER received the B.Sc. (Hons.)
degree in computer engineering from the
University of Tripoli, Libya, in 2008, and the
M.Sc. (Hons.) degree in computer and network
security from Middlesex University, London,
U.K., in 2010. He became CCENT, CCNA R&S,
CCNA Security, and MCSA Certified while
working as a Network and Security Administrator
with National Oil Corporation for three years.
He is currently sponsored to conduct research
on computer networking and security with the School of Computer Science,
The University of Manchester. His research interests include authentication
in distributed systems. He received the British Computing Society Prize for
the best postgraduate computing project in London in 2011.
NING ZHANG received the B.Sc. degree in
electronics engineering from Dalian Maritime
University, Dalian, China, and the Ph.D. degree
in electronics engineering from the University of
Kent, Canterbury, U.K. She is currently a Senior
Lecturer with the School of Computer Science,
The University of Manchester, Manchester, U.K.
Her current research interests include security
in networked and distributed systems, applied
cryptography, data privacy, and trust and digital
rights management. She has authored papers and acted as a referee and
reviewer in these topic areas.
VOLUME 4, 2016 1675
www.redpel.com +917620593389
www.redpel.com +917620593389

More Related Content

PDF
Multi dimensional customization modelling based on metagraph for saas multi-t...
PDF
Parallel and Distributed System IEEE 2015 Projects
PDF
Parallel and Distributed System IEEE 2015 Projects
PDF
PDF
A Metamodel and Graphical Syntax for NS-2 Programing
PDF
Concept for a web map implementation with faster query response
PDF
11.concept for a web map implementation with faster query response
PDF
Burr Type III Software Reliability Growth Model
Multi dimensional customization modelling based on metagraph for saas multi-t...
Parallel and Distributed System IEEE 2015 Projects
Parallel and Distributed System IEEE 2015 Projects
A Metamodel and Graphical Syntax for NS-2 Programing
Concept for a web map implementation with faster query response
11.concept for a web map implementation with faster query response
Burr Type III Software Reliability Growth Model

What's hot (6)

PDF
Static Analysis of Computer programs
PDF
M035484088
PDF
AN OPEN JACKSON NETWORK MODEL FOR HETEROGENEOUS INFRASTRUCTURE AS A SERVICE O...
PDF
Building a new CTL model checker using Web Services
PDF
Intelligent Workload Management in Virtualized Cloud Environment
PDF
International Journal of Computational Engineering Research(IJCER)
Static Analysis of Computer programs
M035484088
AN OPEN JACKSON NETWORK MODEL FOR HETEROGENEOUS INFRASTRUCTURE AS A SERVICE O...
Building a new CTL model checker using Web Services
Intelligent Workload Management in Virtualized Cloud Environment
International Journal of Computational Engineering Research(IJCER)
Ad

Similar to Towards a virtual domain based authentication on mapreduce (20)

PPT
Tri hug 2010 wei
PDF
Global bigdata conf_01282013
PDF
Big data security and privacy issues in the
PDF
BIG DATA SECURITY AND PRIVACY ISSUES IN THE CLOUD
PDF
Enhancing highly-collaborative access control system using a new role-mappin...
PDF
Cloud Security and Data Integrity with Client Accountability Framework
PDF
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
PDF
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
PDF
Actor Critic Approach based Anomaly Detection for Edge Computing Environments
PDF
SecRBAC: Secure data in the Clouds
PDF
Conference Paper: Enabling Privacy Mechanisms in Apache Storm
PDF
D04501036040
PPTX
Взгляд на облака с точки зрения HPC
PDF
NetApp CTO Predictions 2018
PDF
Data Sharing with Sensitive Information Hiding in Data Storage using Cloud Co...
PPTX
Building an enhanced task based access control in cloud
PDF
THE CRYPTO CLUSTERING FOR ENHANCEMENT OF DATA PRIVACY
PDF
Design of access control framework for big data as a service platform
PDF
A provenance policy based access
PDF
International Journal of Computational Engineering Research(IJCER)
Tri hug 2010 wei
Global bigdata conf_01282013
Big data security and privacy issues in the
BIG DATA SECURITY AND PRIVACY ISSUES IN THE CLOUD
Enhancing highly-collaborative access control system using a new role-mappin...
Cloud Security and Data Integrity with Client Accountability Framework
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
Actor Critic Approach based Anomaly Detection for Edge Computing Environments
SecRBAC: Secure data in the Clouds
Conference Paper: Enabling Privacy Mechanisms in Apache Storm
D04501036040
Взгляд на облака с точки зрения HPC
NetApp CTO Predictions 2018
Data Sharing with Sensitive Information Hiding in Data Storage using Cloud Co...
Building an enhanced task based access control in cloud
THE CRYPTO CLUSTERING FOR ENHANCEMENT OF DATA PRIVACY
Design of access control framework for big data as a service platform
A provenance policy based access
International Journal of Computational Engineering Research(IJCER)
Ad

More from redpel dot com (20)

PDF
An efficient tree based self-organizing protocol for internet of things
PDF
Validation of pervasive cloud task migration with colored petri net
PDF
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...
PDF
Toward a real time framework in cloudlet-based architecture
PDF
Protection of big data privacy
PDF
Privacy preserving and delegated access control for cloud applications
PDF
Performance evaluation and estimation model using regression method for hadoo...
PDF
Frequency and similarity aware partitioning for cloud storage based on space ...
PDF
Multiagent multiobjective interaction game system for service provisoning veh...
PDF
Efficient multicast delivery for data redundancy minimization over wireless d...
PDF
Cloud assisted io t-based scada systems security- a review of the state of th...
PDF
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
PDF
Bayes based arp attack detection algorithm for cloud centers
PDF
Architecture harmonization between cloud radio access network and fog network
PDF
Analysis of classical encryption techniques in cloud computing
PDF
An anomalous behavior detection model in cloud computing
PDF
A tutorial on secure outsourcing of large scalecomputation for big data
PDF
A parallel patient treatment time prediction algorithm and its applications i...
PDF
A mobile offloading game against smart attacks
PDF
A distributed video management cloud platform using hadoop
An efficient tree based self-organizing protocol for internet of things
Validation of pervasive cloud task migration with colored petri net
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...
Toward a real time framework in cloudlet-based architecture
Protection of big data privacy
Privacy preserving and delegated access control for cloud applications
Performance evaluation and estimation model using regression method for hadoo...
Frequency and similarity aware partitioning for cloud storage based on space ...
Multiagent multiobjective interaction game system for service provisoning veh...
Efficient multicast delivery for data redundancy minimization over wireless d...
Cloud assisted io t-based scada systems security- a review of the state of th...
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
Bayes based arp attack detection algorithm for cloud centers
Architecture harmonization between cloud radio access network and fog network
Analysis of classical encryption techniques in cloud computing
An anomalous behavior detection model in cloud computing
A tutorial on secure outsourcing of large scalecomputation for big data
A parallel patient treatment time prediction algorithm and its applications i...
A mobile offloading game against smart attacks
A distributed video management cloud platform using hadoop

Recently uploaded (20)

PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Sports Quiz easy sports quiz sports quiz
PPTX
Lesson notes of climatology university.
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPTX
GDM (1) (1).pptx small presentation for students
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
RMMM.pdf make it easy to upload and study
PPTX
Pharma ospi slides which help in ospi learning
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
Complications of Minimal Access Surgery at WLH
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
TR - Agricultural Crops Production NC III.pdf
Insiders guide to clinical Medicine.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Sports Quiz easy sports quiz sports quiz
Lesson notes of climatology university.
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
GDM (1) (1).pptx small presentation for students
2.FourierTransform-ShortQuestionswithAnswers.pdf
RMMM.pdf make it easy to upload and study
Pharma ospi slides which help in ospi learning
O7-L3 Supply Chain Operations - ICLT Program
Renaissance Architecture: A Journey from Faith to Humanism
Complications of Minimal Access Surgery at WLH

Towards a virtual domain based authentication on mapreduce

  • 1. Received February 4, 2016, accepted March 10, 2016, date of publication April 27, 2016, date of current version May 9, 2016. Digital Object Identifier 10.1109/ACCESS.2016.2558456 Towards a Virtual Domain Based Authentication on MapReduce IBRAHIM LAHMER AND NING ZHANG School of Computer Science, The University of Manchester, Manchester M13 9PL, U.K. Corresponding author: I. Lahmer ([email protected]) This research was sponsored by the Ministry of Higher Education and Scientific Research of Libya and partially supported by National Oil Corporation Libya (NOC-Libya). ABSTRACT This paper has proposed a novel authentication solution for the MapReduce (MR) model, a new distributed and parallel computing paradigm commonly deployed to process BigData by major IT players, such as Facebook and Yahoo. It identifies a set of security, performance, and scalability requirements that are specified from a comprehensive study of a job execution process using MR and security threats and attacks in this environment. Based on the requirements, it critically analyzes the state-of-the-art authentication solutions, discovering that the authentication services currently proposed for the MR model is not adequate. This paper then presents a novel layered authentication solution for the MR model and describes the core components of this solution, which includes the virtual domain based authentication framework (VDAF). 
These novel ideas are significant, because, first, the approach embeds the characteristics of MR-in-cloud deployments into security solution designs, and this will allow the MR model to be delivered as a software as a service in a public cloud environment along with our proposed authentication solution; second, VDAF supports the authentication of every interaction by any MR component involved in a job execution flow, so long as the interaction is for accessing resources of the job; third, this continuous authentication service is provided in such a manner that the costs incurred in providing the authentication service should be as low as possible.

INDEX TERMS MapReduce, authentication for mapreduce, cloud computing security, security requirements, security threats.

I. INTRODUCTION

MapReduce (the MR model) is a new parallel programming paradigm. It is proposed to process large volumes of data. Data processing is carried out in two phases: map and reduce. The map phase takes a set of data and converts it into another set of data, called key/value pairs, to produce the intermediate results of the MR computation. The reduce phase then takes these intermediate results as its input and combines them to produce an output, which is the final result of the MR computation. More details on how MR works can be found in [1]–[3].

To carry out the two-phase MR computation, a set of distributed nodes (hereafter referred to as MR components) is used. Figure 1 shows a Generic MapReduce Computational (GMC) model that we have constructed based on the most recent MR application framework [1]–[3]. From the figure, it can be seen that a distributed set of MR components interact with each other and collaboratively execute a client's job. The entire process for this job execution, i.e. from when the job is submitted to when the final computational result is ready for collection, is referred to as a job execution flow (or a job work-flow).
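The two-phase computation described above can be illustrated with a small single-process sketch. It is not taken from the paper or from any MR framework: the word-count job and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions chosen for illustration, and a real deployment would distribute these steps across many MR components.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: turn one input record into intermediate key/value pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the intermediate values by key before they reach the reducers."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all intermediate values of one key into a final value."""
    return key, sum(values)


def run_job(documents: Iterable[str]) -> dict[str, int]:
    """Run map, then shuffle, then reduce, in a single process."""
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(["big data big jobs", "data nodes"]))
```

In this sketch the intermediate pairs stay in local memory; in the setting the paper studies they are written by Map Tasks and fetched by Reduce Tasks on other nodes, which is where component-to-component authentication becomes relevant.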
The MR components can generally be classified into two main categories: master nodes and slave nodes. The Resource Manager and Name Node, shown in Figure 1, are examples of master nodes, and the rest are slave nodes. In this version of the MR model implementation, a client submits his job to the Resource Manager. The Resource Manager assigns the tasks of the job to a set of slave nodes that contain containers to run the Map and Reduce Tasks. However, in the classic MR model implementation [1], [2], a client submits a job to the Job Tracker directly and the Job Tracker then assigns Map and Reduce Tasks to a set of slave nodes (indicated in Figure 1 by dash-dot lines labeled '(3), (4), and (7)'). The two sets of MR components, which respectively run on two large clusters of nodes, are typically referred to as the Processing Framework (PF) cluster and the Distributed File System (DFS) cluster [3]. The GMC model, shown in Figure 1, is derived to capture the interactions among different MR components in the newer MR model implementation (although what has been captured can also be applied to the classic MR model implementation). More details about the MR components, and their functionalities, in both versions of the MR model implementations (i.e. MR application frameworks) are available in [2].

FIGURE 1. Job execution work-flow in the GMC model.

2169-3536 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 4, 2016

The MR model, owing to its scalability, robustness and simplicity of use as a parallel and distributed programming framework, is becoming more and more widely used [4], [5]. Hadoop, an implementation of the MR model, has been adopted by many companies including major IT players such as Facebook, eBay, IBM and Yahoo. These implementations are largely done in their respective private clouds. However, recently there have been efforts to implement the MR model in public clouds [6], [7]. A major concern of using the MR model in a public cloud is its inadequate security provision, such as authentication. The MR model was initially intended for use in private networks, so the issue of security was not a design consideration [8]. Since its introduction, much effort has gone into making this model more efficient rather than more secure. Deploying the MR model in an open environment, such as public clouds, without adequate security provisioning would put the clients' jobs and their data at risk. This is because, in such an environment, different jobs submitted by different clients typically share the same set of physical nodes and software resources. The clients have very little control over (1) on which nodes their MR components (assigned to their respective jobs) are executed, and (2) on which DFS nodes the data associated with their jobs are stored. These could make the jobs and the data more vulnerable to security threats and attacks [1], [9]–[11].
Our work focuses on addressing identity related threats and attacks in deploying the newer version of the MR model in an open environment. To understand the security issues in this context and to capture the requirements necessary to address the issues, in this paper, we categorise the MR components involved in a job execution flow into two categories, MR Infrastructure (MR-Inf.) Components and MR-Job Components. MR-Inf. Components are the MR components that serve every job submitted by any client. These components are not job specific. Examples of MR-Inf. Components are the Resource Manager and the Name Node. MR-Job Components are the MR components that are invoked specially for a particular job submitted by a client. This set of components is job specific and their invocations and existence are purely for serving this particular job. Examples of MR-Job Components are the Job Tracker (also called Application Master), Task Tracker (i.e. Node Manager) and Map and Reduce Tasks (i.e. Containers).

As indicated in Figure 1, when running the MR model in an open environment, three observations can be made: (1) clients typically access the MR application remotely via the Internet, (2) each client's input data are partitioned and stored on a set of distributed and shared Data Nodes, (3) the MR components that are involved in executing the tasks (MR-Job Components) spawned for a single client's job are executed by multiple nodes, and these nodes may also host the tasks spawned for other jobs submitted by other clients. An authentication solution designed to secure the jobs and their data in such an environment should consider three aspects, and these are: (a) the authentication of a Client to the MR application (i.e. Client-to-MR authentication), (b) the mutual authentication among MR components (i.e. MR-Comp-to-MR-Comp authentication), and (c) data authenticity (i.e.
Data-Authenticity, which covers both origin authentication and integrity protection). Client-to-MR authentication is to guard the entry gate to the MR application, making sure only authorised users (i.e. the clients of the MR application) can submit jobs to the MR application. In other words, the authentication solution should be able to verify that a client who seeks to submit a job to the MR application is indeed who he claims to be. MR-Comp-to-MR-Comp authentication is to make sure that an MR component seeking to retrieve any resources associated with a client's job is what it claims to be. The third aspect, Data-Authenticity, is to protect the authenticity (i.e. origin and integrity) of data generated in both map and reduce phases, making sure that any unauthorised access to, and/or alterations made to, the data can be detected.

The importance of addressing the above authentication issues and the requirements that should be satisfied by an authentication solution designed for the MR model have been discussed in literature [1], [9]. However, so far, little has been done in terms of designing such a solution. As part of our effort on designing a secure and effective authentication solution for the MR model, in this paper, we critically analyse the state-of-the-art MR authentication methods. The purpose of this critical analysis is to examine the suitability and effectiveness of existing authentication methods (proposed for the MR model), taking into consideration the features and characteristics of an MR application in an open environment such as a public cloud, so as to identify areas for improvement. It should be mentioned that our analysis of existing authentication methods proposed for an MR application has been previously published in [24].
However, this paper extends this analysis by (i) specifying design requirements for such an authentication solution, and analysing existing authentication methods proposed for the MR model against these requirements, (ii) further analysing what is missing in these methods in light of the features and characteristics of the MR model being deployed in an open environment, (iii) providing a high level analysis of the MR model and its components in executing a client's job, highlighting the functionalities of, and the interactions among, the MR components, and (iv) proposing a novel approach to MR authentication, a layered authentication solution to the MR model that supports the newer version of the MR implementation. This solution is proposed to fill the gaps we have identified in existing authentication solutions designed for MR.

In detail, the remaining part of this paper is structured as follows. Section II specifies a set of authentication requirements based on our observations on, and security analysis of, the MR model being deployed in an open environment. In Sections III and IV, we critically analyse the existing work on MR authentication against the specified requirements. The analysis covers the authentication methods already adopted by the MR model (Section III), and those recently proposed in literature for the MR model (Section IV). Section V gives a high-level analysis of the MR model, highlighting the functionalities of, and the interactions among, the MR components in executing a client's job, and this analysis leads to our novel proposal, a layered authentication solution to the MR model. Finally, Section VI concludes the paper with further discussions and an outline of our future work.

II. REQUIREMENTS FOR AN MR AUTHENTICATION SERVICE

This section specifies a set of requirements for the design of an authentication service for an MR application implemented in an open environment.
The specification of the requirements takes into account the characteristics of the implementation and the outcome of a threat analysis carried out on the MR model. Related work has been reported in [1], [3], and [9].

A. ENTITY IDENTIFICATION AND CREDENTIAL REVOCATION

To authenticate clients, MR components and jobs submitted to the MR application, each of these entities (or components) should have a unique identifier. The names (acronyms) of the identifiers along with the entities they each represent are given in the following:
(1) Client IDs (Client-ID): a unique identifier for each client. This is usually a static ID, and it is typically the username of a user who has registered with the MR application and is running the client (i.e. MR-Client).
(2) MR-Inf. Component IDs (MR-Inf.Comp-ID): Each MR-Inf. Component should have a unique identifier and these identifiers are static ones.
(3) MR-Job Component IDs (MR-JobComp-ID): Each set of MR-Job Components serving a particular job should have a unique identifier to distinguish this set of MR-Job Components from the sets of MR-Job Components serving other jobs. These IDs are dynamic ones.
(4) MR-Job Hosting Node IDs (MR-JobHostNode-ID): Each MR-Job-HostNode should be uniquely identifiable, and these IDs are also static identifiers.
(5) MR-Job IDs (MR-Job-ID): Each MR-Job should have a unique identifier to distinguish different jobs submitted by the same client or by different clients.
(6) Framework (cluster) ID: If there are two or more parties providing hosting nodes, then the hosting nodes provided by a single party may be treated as one cluster, and each cluster should be identified by a unique identifier.

Authentication is carried out by demonstrating (by a claimant), and verifying (by a verifier), the knowledge of a secret uniquely associated with an identity. Therefore there is a need for secure issuance, acquisition and revocation of an identity secret (which is also part of the corresponding credential). This leads to the following requirements:

1) ENTITY IDENTIFICATION (OR REGISTRATION)
There should be a secure method for a new client or a new MR component to be identified by the MR application and to establish a secret associated with the identity.
2) CREDENTIAL REVOCATION
There should be secure methods for revoking any credential(s) issued to the identity of an entity involved in a job execution at any point during the job execution or after the job execution. This may take place when a job is completed, or when a related MR-JobHostNode fails or is disconnected.

B. ENTITY AUTHENTICATIONS

Entity authentication is to make sure that a communicating entity is the one that it claims to be. Multiple entities (i.e. components) in the MR model are involved in a job (MR-Job) execution. Some of these components are static components, while others are dynamic ones. The static components are identified by static identities that, once given, remain the same during the lifetimes of the components. The static components can be further classified into two groups: one is MR Clients and the other is MR-Inf. Components. MR-Inf. Components are shared by different MR-Jobs. The Resource Manager and Name Node, shown in Figure 1, are MR-Inf. Components, so they are static components and their identities are static too. The dynamic components are identified by dynamic identities. A dynamic identity is assigned to a dynamic component when the component is assigned to an MR-Job. Once this MR-Job is completed, the component may be assigned to another MR-Job, in which case it will be assigned a new identity. The Job Tracker, Task Tracker and Map and Reduce Tasks (i.e. Containers) are dynamic components and they are identified by dynamic identities.

In an authentication solution designed for the MR model, all the components taking part, or being involved, in a job execution, whether static or dynamic, should be securely identified and authenticated. In detail, with reference to the MR model depicted in Figure 1, the authentication task should satisfy the following requirements:

1) MUTUAL AUTHENTICATION BETWEEN AN MR CLIENT AND AN MR-INF. COMPONENT
This is to ensure that only an authorized client can connect to the MR application. Hereafter this is referred to as the Client-to-MR-App authentication and MR-App-to-Client authentication. More specifically, this should cover the mutual authentication between a Client and the Resource Manager and between a Client and the Name Node.

2) MUTUAL AUTHENTICATION BETWEEN AN MR-JOB COMPONENT AND AN MR-INF. COMPONENT
This is to ensure that an MR-Job Component involved in the execution of a client's job is authenticated to the MR application, so as to ensure that any access to a client's job input and output data can be granted in a secure manner. The mutual authentication between the Job Tracker of a job and the Name Node, and between a Reduce Task of the job and the Name Node, are examples of this authentication requirement.

3) MUTUAL AUTHENTICATION BETWEEN ANY PAIR OF MR-JOB COMPONENTS
This is to ensure that any access to a client's job intermediate data can be granted in a secure manner.

4) MUTUAL AUTHENTICATION BETWEEN AN MR-JOB COMPONENT'S HOSTING NODE AND AN MR-INF. COMPONENT
This is to ensure that any new physical node assigned to hosting the MR-Job Component(s) of a client's job (e.g. a new hosting node of a Task Tracker) is authenticated to an MR-Inf. Component and vice versa. In this way, we can ensure that any two MR-Job Components' Hosting Nodes can authenticate to each other. Hereafter this requirement is referred to as MR-Job-HostNode-to-MR-App and MR-App-to-MR-Job-HostNode authentication.

5) MUTUAL AUTHENTICATION BETWEEN DOMAINS (I.E. CROSS-PROVIDER AUTHENTICATION)
This authentication is needed when a third party is involved in an MR-Job, and it is to ensure that any new physical node which belongs to a third party domain and is involved in hosting
MR-Job Components is authenticated to the MapReduce domain where the job is submitted and mastered.

C. AUTHENTICITY OF DATA AND PROTOCOL MESSAGES

1) DATA AUTHENTICITY
This is to ensure the origin authentication and integrity protection of data that are saved in, or produced by, the MR application. In other words, the protection should be applied to input data, intermediate data and output data of any job processed by the MR application.

2) AUTHENTICITY OF PROTOCOL MESSAGES
The origin authentication and integrity protection should also be applied to all the protocol messages facilitating the tasks of authentication in an MR application. The protocol messages are of two types: authentication requests and authentication responses.

D. CONFIDENTIALITY OF PROTOCOL MESSAGES

Confidentiality of Authentication Requests and Replies: This is the protection of authentication requests and replies from any unauthorized disclosure. To counter eavesdropping attacks, the confidentiality of any such request or response sent between MR components throughout an MR-Job execution should be protected.

E. PERFORMANCE AND SCALABILITY REQUIREMENTS

1) MINIMIZING COMMUNICATION OVERHEAD
In accomplishing the task of authentication for an MR-Job, the communication overhead introduced should be as low as possible. This means that the number of authentication messages, and the length of each message, should be as low as possible.

2) MINIMIZING COMPUTATIONAL OVERHEAD
The computational overhead incurred in accomplishing the task of authentication for an MR-Job should be as small as possible.

3) MAXIMIZING SCALABILITY
The MR application scales by simply adding new nodes (members) to the shared clusters. Any authentication solution designed for the MR application should scale similarly.

F. SUPPORT FOR UPDATING OF AUTHENTICATION CREDENTIALS

There may be cases where an execution of an MR-Job takes a long time, and, in such cases, for security reasons, authentication secrets or credentials may need to be renewed or updated. Therefore, any authentication solution designed for the MR application should support the renewal or updating of authentication secrets or credentials.

In the next section, we critically analyze authentication methods proposed for the MR model based on the requirements specified above. These authentication methods include those adopted so far by the MR model and also those published in literature.

III. AUTHENTICATION METHODS EVER ADOPTED BY THE MR MODEL

Two authentication methods have been adopted by the MR model so far [1], [4], [13]. The first one [1], [4], adopted in the early generation of the model, assumed the use of an independent authentication service outside the MR model, e.g. an authentication service that comes with the host operating system (OS) running on a physical node. This is the so-called OS-based authentication method. In other words, the MR model then did not have its own authentication service. Rather it relied on the use of an authentication facility provided by the OSes of the physical nodes on which an MR application is deployed. The second method, used in the most recently deployed MR model, was proposed by O. Malley et al. from the Yahoo Hadoop team (hereafter referred to as the O. Malley method) [13]. This method is symmetric key based and it is largely built on the Kerberos authentication protocol. At the time of writing this paper, the Kerberos authentication protocol is still the default mode of authentication for an MR application deployed in a private cloud [1], [8]. Figure 2 summarizes the authentication process using this method. As shown in the figure, a client or MR component first authenticates itself to the authentication server.
Upon successful authentication, the MR component will obtain a Ticket Granting Ticket (TGT), which is then used to acquire a service ticket. The service ticket is then used by the MR component to access resources located on other MR components. This authentication process consists of six steps (steps 1 to 6, as shown in the figure), and is identical for all the MR components in the application.

FIGURE 2. Kerberos protocol message exchanges in the MR model.

Assuming that a client is to write his job into the MR application as part of a job submission process, to authenticate himself to the application, the client first makes an authentication
request to the Authentication Service (AS). The AS generates a response containing a TGT, which is encrypted using a key derived from the client's password and sent to the client. Then the client uses this TGT to request a service ticket by sending the TGT, along with an authenticator that demonstrates knowledge of the secret in the TGT, to the Ticket Granting Service (TGS). Once the client receives this service ticket, he uses it to access the Name Node in DFS. The same steps are taken by any other MR components, such as Task Trackers, to get admitted to the cluster and to access other (remote) MR components for retrieving data or other resources used by the client's job. Once a Task Tracker (or a Client) is authenticated, obtains a service ticket and is admitted to the Master Node in the MR application, both the Task Tracker and the Master Node will use the shared key in the service ticket to authenticate to each other [1], [13].

This authentication method is a one-factor authentication method. The one factor used by a client to authenticate himself to the AS is the client's password. Knowing the password would allow any entity to acquire a service ticket in the name of the client and to access any resources granted to the client. In other words, for an attacker to impersonate a legitimate component (e.g. a client or a Task Tracker), the attacker needs to obtain a service ticket. To access the ticket, the attacker needs to know the password of the (legitimate) client to whom the ticket has been issued. If a client's password is compromised, then all the resources assigned to the client will be at risk. In addition, the attacker could use this compromised account to launch further attacks in the MR application. In other words, the security level offered by this one-factor authentication method is the same as that offered by the password chosen by a client.
If a client chooses a weak password, then the risks imposed on the MR application will increase accordingly.

With regard to the communication overheads introduced by this authentication method, we should work out how many protocol messages are generated and used per job submission (i.e. in each authentication instance), while assuming the length of each such message is approximately the same. For each authentication instance, three rounds (R) of communications are required. Two of the three rounds are between a client (or an MR component, or MR-Req-Comp, for short) and the AS, and the third round is between an MR-Req-Comp and another MR component that manages some resources (MR-Res-Comp). Each round consists of two messages (Msg), one request (Req) and one response (Res). Table 1 shows the number of communication rounds (along with the number of protocol messages exchanged) versus the number of MR components involved per job.

TABLE 1. Number of communication rounds for MR component/s authentication using Kerberos.

FIGURE 3. Number of protocol (i.e. authentication) messages generated in the authentication process/es under specified usecase scenarios.

An MR deployment in a cloud is a shared computational environment, and, in such an environment, there are multiple possible usecases. For example, one client may submit a single job at any given time (hereafter referred to as the OneClient-OneJob usecase), one client may submit multiple jobs simultaneously (OneClient-MultiJobs), or multiple clients may each submit one or more jobs simultaneously (MultiClients-MultiJobs). Table 2 shows the number of communication rounds and the total number of protocol messages generated for different numbers of MR-Req-Comp each job may require in each of the three usecases. The table uses the notation, yC/zJ, to indicate the different usecases, i.e.
1C/1J for Case-1, meaning one client y=1, and one job z=1; 1C/zJ for Case-2, one client y=1, and multiple jobs z>1; and yC/zJ for Case-3, where multiple clients y>1, and multiple jobs z>1. Figure 3 plots the results for three example cases: 1C-1J-16Comp, 1C-6J-16Comp and 7C-4J-16Comp, capturing different numbers of clients, jobs and MR components in each case. For Case-3, we assume that there are 7 clients and each client submits 4 jobs. The number of components involved in each job execution is 16 MR components. Detailed values
with regard to the number of clients, the number of jobs per client, the number of MR components per job, and the number of protocol messages required for authentication in each of these cases are given in the figure. It can be seen from the figure that, for Case-3, the number of protocol messages generated for the authentication of these clients and the associated MR components used for the execution of the jobs submitted by the clients reaches more than 2700. If the number of clients, and/or the number of jobs submitted per client, goes up, this message number will increase sharply.

FIGURE 4. Number of protocol (i.e. authentication) messages versus the number of: (A) MR components, (B) Jobs, and (C) Clients.

TABLE 2. Number of communication rounds for authentication in three different usecases of the MR application.

To further examine the effects of different factors on the scalability of the solution, we have calculated the number of protocol messages required versus the number of MR components used per job, the number of jobs submitted per client and the number of clients submitting the jobs, respectively. Figure 4(A) shows the number of protocol messages generated versus the number of MR components used per job. The figure plots the results for three further cases by changing the number of clients (y) and the number of jobs submitted per client (z) to {y = 1, z = 1}, {y = 2, z = 2}, and {y = 3, z = 3}, respectively. As can be seen from the figure, if there are only three clients each submitting three jobs, then the total number of MR components required to execute these jobs is about 70, but the number of protocol messages required for authenticating the clients and the
MR components is more than 4000. This is a significant increase in comparison with the number of clients and the number of jobs submitted by the clients, and could pose a significant risk of creating a performance bottleneck in the cluster. Figure 4(B) shows the number of protocol messages generated versus the number of submitted jobs. From the results shown in the figure, it can be seen that, when the number of clients (y) is fixed at y = 3 and the number of MR components used per job at n = 30, as the number of jobs submitted per client increases from 1 to 4, the total number of protocol messages generated will increase from about 500 to over 2000. Figure 4(C) shows how the number of protocol messages increases as the number of clients accessing the MR application increases, where the number of jobs submitted per client and the number of MR components per job are fixed.

IV. AUTHENTICATION METHODS PUBLISHED IN LITERATURE

In addition to the authentication methods described above, there are also methods that have been proposed for the MR model in the research domain. These methods can largely be classified into two groups, symmetric key based and asymmetric key based. The authentication methods proposed by Somu et al. [14] and Rubika et al. [15] are symmetric key based, and their focus is on verifying the identities of clients requesting to access an MR application. On the other hand, the methods proposed by Wei et al. [16] and Ruan et al. [18] are asymmetric key based. They focus on verifying the authenticity of an MR component. In addition, the method proposed by Zhao et al. [19] is also asymmetric key based, but this method provides both clients' authentication and MR components' authentication. In this section, we give an overview of these methods.

A. SOMU AND RUBIKA AUTHENTICATION METHODS

Somu et al.
[14] proposed an authentication method (hereafter referred to as the Somu method) for the Hadoop MR model. This method is symmetric key based. It is similar to the O. Malley method in that both methods use a single authentication factor, relying on the use of a client's username and password, to authenticate the client to the MR application. However, unlike the O. Malley method, the Somu method uses two further ideas to strengthen the security level of the authentication service. The two ideas are: (1) the introduction of a one-time pad key (valid for one session only), and (2) the use of the principle of the separation of duties. The ciphertext of a client's password, encrypted using the client's one-time pad key, is stored in the Registration Server (one of the two servers used to implement the authentication service) and the ciphertext of the client's one-time pad key, encrypted using the client's password, is stored in the other server, a Backend Server. The two ideas are used in such a manner that no passwords or encrypted passwords are sent over the channel and no cleartext passwords are stored in either of the two servers, thus minimizing the exposure of clients' long-term credentials, i.e. the passwords.

FIGURE 5. Authentication steps of the Somu method.

Figure 5 depicts the authentication process using the Somu method. As shown in the figure, two servers (the Registration Server and the Backend Server) are involved in an authentication process (in verifying a client's ID). The verification makes use of three ciphertexts, Ciphertext-1, Ciphertext-3 and Ciphertext-4. Ciphertext-1 is the client's password encrypted using a one-time pad key belonging to the client, and it is stored in the Registration Server. Ciphertext-3 is the one-time pad key encrypted with the client's password and it is pre-stored in the Backend Server. Ciphertext-4 is generated by the Registration Server each time an authentication request is received.
It is generated by encrypting the one-time pad key using the client's password. Figure 5 shows the steps of the Somu authentication method. First, the client sends an authentication request to the Registration Server and this request contains the client's username. The Registration Server forwards this request to the Backend Server. The Backend Server uses the username to fetch the pre-stored Ciphertext-3 and returns it to the client through the Registration Server. The client decrypts Ciphertext-3 using his password, and sends the pad key back to the
Registration Server. These steps are indicated by messages 1, 2, 3, 4 and 5 in Figure 5. The Registration Server then uses the pad key to decrypt Ciphertext-1 to obtain the password, and uses the password to encrypt the pad key to generate Ciphertext-4. The Registration Server then sends Ciphertext-4 to the Backend Server, as indicated by messages 6, 7 and 8. Finally, as indicated by messages 9, 10, 11 and 12, the Backend Server compares Ciphertext-4 with Ciphertext-3 and, if the two are equal, sends a positive notification containing the client’s username to the Registration Server. The Registration Server compares the username received from the Backend Server with the one received from the user. If they match, the login process is successful.
The Somu authentication method supports client authentication with a stronger level of protection of clients’ long-term credentials (passwords) than the methods discussed earlier. This protection involves the use of a symmetric one-time pad key and two authentication servers. A client’s password is encrypted with the one-time pad key, the one-time pad key is encrypted with the password, and the two encrypted items are stored on two different servers. To impersonate a client, an attacker needs to guess or obtain the client’s password. Getting hold of the client’s password by stealing the ciphertext stored on either of the two servers is computationally difficult. For example, if the attacker steals Ciphertext-1 (the encryption of the password using the one-time pad key) from the Registration Server, then, to access the password, the attacker will need to guess the pad key or use a dictionary attack to guess the password. However, this is computationally difficult as the pad key used is valid for one session only.
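The message flow of the Somu method described above can be illustrated with a minimal sketch. It is not the implementation from [14]: the cipher is a toy XOR keystream built from SHA-256 (an assumption, chosen only because it is deterministic and self-inverse), the two servers are modelled as plain dictionaries, and the function names `toy_cipher`, `register` and `authenticate` are hypothetical. The per-session renewal of the pad key is not modelled.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure, illustration only).

    Applying it twice with the same key restores the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def register(username, password, registration_server, backend_server):
    """Store Ciphertext-1 and Ciphertext-3 on two separate servers."""
    pad_key = secrets.token_bytes(16)
    registration_server[username] = toy_cipher(pad_key, password)  # Ciphertext-1
    backend_server[username] = toy_cipher(password, pad_key)       # Ciphertext-3


def authenticate(username, typed_password, registration_server, backend_server):
    """Follow the message flow of Figure 5 in a single process."""
    # Messages 1-5: the Backend Server returns Ciphertext-3 and the client
    # recovers the pad key with the password it typed.
    ciphertext_3 = backend_server[username]
    pad_key = toy_cipher(typed_password, ciphertext_3)
    # Messages 6-8: the Registration Server decrypts Ciphertext-1 with the
    # pad key, then re-encrypts the pad key to build Ciphertext-4.
    password = toy_cipher(pad_key, registration_server[username])
    ciphertext_4 = toy_cipher(password, pad_key)
    # Messages 9-12: the Backend Server compares Ciphertext-4 with Ciphertext-3.
    return hmac.compare_digest(ciphertext_4, ciphertext_3)
```

In this sketch neither dictionary ever holds the cleartext password, and a wrong password yields a wrong pad key, so the final comparison fails.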
Once the client logs off a session, a new pad key will be generated and used to re-encrypt the password [14]. The dictionary attack is also made more difficult by the use of the one-time pad key. If an attacker could steal Ciphertext-3 (i.e. the pad key encrypted using the password) from the Backend Server, then only a dictionary attack could be used to guess the password, as the encryption is not reversible here. Another advantage of this authentication method is that, similar to the O. Malley method, the Somu method does not require any transmission of clients’ long-term credentials (e.g. passwords) over the channel.
However, against our requirements detailed in Section II, the Somu authentication method has two limitations. Firstly, it only supports gate-level authentication. In other words, it only supports the client’s authentication to the MR application; it does not provide any mechanism to support the authentication of one MR component to another (e.g. the authentication of a Task Tracker to the Name Node). Secondly, the authentication method is more costly in terms of communication overheads than the methods discussed earlier. As shown in Figure 5, a single client authentication instance requires 4 communication rounds (2 messages per round). This is 1 round more than what is required by the O. Malley method (which requires 3 rounds of communication for a client to authenticate itself to access one service).
Rubika et al. [15] have also proposed an authentication method (hereafter referred to as the Rubika method) for the MR application. This method uses three servers for authentication: an Authentication Server and two backend servers, Backend Server 1 and Backend Server 2. Figure 6 shows the registration and authentication processes of this method. To register, a client submits his username and password to the Authentication Server (or a password is created for the client).
The server divides the password, a set of ASCII letters, into three values, m1, m2 and m3, and it also generates three random numbers, c1, c2 and c3. The Authentication Server then uses the two sets of values, {m1, m2, m3} and {c1, c2, c3}, to generate a new set of values, called angles, denoted as {θ1, θ2, θ3}. The username and the random numbers {c1, c2, c3} are stored in Backend Server 1, and the username and {θ1, θ2, θ3} are stored in Backend Server 2. These two sets of values are used to authenticate the client when the client makes an access request to the MR application.
FIGURE 6. Registration and authentication processes of the Rubika method.
As described above, the Rubika method uses three servers for authentication, but only one of the three servers, the Authentication Server, is exposed to the public (i.e. accessible to users). The other two servers, the backend servers, are used to store password verifiers. In other words, with this approach, nothing related to the clients’ passwords is stored in the server accessible by the public. In the Somu method, on the other hand, clients’ encrypted passwords are stored in the Registration Server, which is accessible to the public. In addition, with the Rubika method, to compromise a password by stealing the password verifier, an attacker would have to compromise two servers, as each password verifier is divided into two portions and each portion is stored on a different server. These two measures make the Rubika method more secure than the Somu method. The authors have also claimed that, by using the two-portion password verifiers and alienate passwords, their method is robust against replay and password guessing attacks. Additionally, although the Rubika method uses three servers, rather than two as in the Somu method, the communication overhead incurred in the Rubika method is lower than that of the Somu method. The Rubika method only needs three rounds of requests and replies for one client authentication instance. This is one round fewer than the Somu method.
B. WEI’s AUTHENTICATION METHOD
Both the Somu and Rubika authentication methods are designed to support client authentication only. They do not consider the authentication issues between different MR components. Wei et al. addressed this gap by proposing a SecureMR Framework [16]. The Framework (hereafter referred to as the Wei method) is aimed at protecting the integrity of MR data processing services, namely the messages sent by Map and Reduce tasks, and the data processed or generated by the tasks.
For the latter, both intermediate data and final computational results from an MR job execution are protected. For example, a Reduce task (Reducer) verifies the authenticity of intermediate data produced by a Map task (Mapper), and a client should verify the authenticity of the final result generated by a Reducer. The method also supports consistency checks of intermediate data and final results from an MR job execution. This is done by replicating some Map and Reduce tasks and assigning them to different workers. At the end of the computation, the master compares the results produced by the different sets of tasks. If the results are identical, then the consistency of the results (whether intermediate or final) is assured.
FIGURE 7. The Wei method: to ensure message or data authenticity.
The verification process is carried out collaboratively between the Master and a worker (i.e. a Mapper). Two protocol messages, Assign and Commit, are used to verify the authenticity of both the task and the data produced by the task. For example, as shown in Figure 7, to assign a Map task to a Mapper, the Master sends the Mapper an Assign message containing the ID of this Mapper, MapperID, and the location of the data, DataLocation. The Master signs the message using his private key and then encrypts the message with the Mapper’s public key. When the Mapper receives the Assign message, it decrypts the message using its private key and verifies the signature using the Master’s public key. Upon positive verification, the Mapper executes the task assigned. After the task execution is completed, the Mapper hashes each partition of the computational result (intermediate data), signs the hashed values using his private key, and then constructs and sends a Commit message to the Master. The Commit message contains the signed data partitions of the result. Upon the receipt of this message, the Master verifies the Commit message using the Mapper’s public key.
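The hash-then-authenticate structure of the Commit message described above can be sketched as follows. This is only an illustration under stated assumptions: the Wei method uses public key digital signatures, whereas this sketch substitutes an HMAC over a shared key (`mapper_key`) because the Python standard library has no public key signature primitive, so it does not provide non-repudiation. The use of SHA-256, the message layout, and the function names `build_commit` and `verify_commit` are also assumptions, and the encryption of the message is omitted.

```python
import hashlib
import hmac


def build_commit(mapper_id: str, partitions: list[bytes], mapper_key: bytes) -> dict:
    """Mapper side: hash each result partition, then authenticate the digests."""
    digests = [hashlib.sha256(p).hexdigest() for p in partitions]
    payload = "|".join([mapper_id, *digests]).encode()
    # HMAC tag standing in for the Mapper's digital signature (assumption).
    tag = hmac.new(mapper_key, payload, hashlib.sha256).hexdigest()
    return {"mapper_id": mapper_id, "digests": digests, "tag": tag}


def verify_commit(commit: dict, mapper_key: bytes) -> bool:
    """Master side: recompute the tag over the received fields and compare."""
    payload = "|".join([commit["mapper_id"], *commit["digests"]]).encode()
    expected = hmac.new(mapper_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, commit["tag"])
```

Because only the partition digests are authenticated, the cost of the tag does not grow with the size of the intermediate data itself.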
If the Master receives more than one Commit message from different Mappers for the same (replicated) Map task, the Master will compare the signed values contained in the different Commit messages to see if they are consistent with each other [16].
The above method is also used to ensure the authenticity of any intermediate data assigned to a Reducer by a Master. The Reducer verifies the authenticity of the intermediate data produced by the Mapper using the Mapper’s public key. However, the method used to verify the authenticity of the final result produced by an MR job execution is different from the one discussed above. In this case, a secure verification component is installed into the MR client application, and the Master and client verify the authenticity of the output data in an additional phase, called the Verify phase [16].
In addition to achieving message and data authenticity, the Wei method also protects the confidentiality of protocol messages (i.e. Assign and Commit messages). This is done by encrypting the entire protocol message with the recipient’s public key (after signing the message with the sender’s private key).
The major difference between the Wei method and the Somu and Rubika methods is that the Wei method ensures the authenticity of messages sent from one MR-Job Component to another and from an MR-Job Component to an MR-Inf. Component, as well as the authenticity of the data or results produced by the MR-Job Components. These protections are provided by using digital signatures, so the method also provides non-repudiation of origin, protecting against false denial of having generated or transmitted a message. However, as discussed in [12] and [17], a public key cryptosystem is computationally more costly than a symmetric key cryptosystem, especially when it is applied to a large-scale computational environment such as a
Cloud environment where a large number (possibly hundreds or thousands [5]) of jobs may need to be processed and a large number of distributed components are involved. Furthermore, the Wei method has an extra phase (the verification phase) in addition to the map and reduce phases. This extra phase is used to verify the authenticity of the final result produced by an MR job execution. The performance evaluation presented in [16] has not considered the costs introduced by this extra verification phase; it has only considered the communication costs of these scenarios: Master-to-Mapper, Master-to-Reducer, and Mapper-to-Reducer.
C. RUAN’s AUTHENTICATION METHOD
Ruan et al. have proposed a trust-based authentication solution for the MR application, called the Trusted MapReduce (TMR) Framework [18]. The TMR Framework uses the notion of trust and a public key cryptosystem based authentication method to facilitate the authentication between MR components. The authentication process is carried out in two phases. The first phase is for initial trust (attestation) establishment, and is carried out when an MR component (e.g. a worker) sends a connection request to another MR component (e.g. a master). The second phase is for periodic trust updates between the worker and the master, and it is carried out regularly during the lifetime of the job execution. When a worker first registers with a master, it generates a pair of public and private keys, called an Attestation Identity Key (AIK) pair. The worker then sends the public key to the master. The TMR Framework is similar to the Wei method in that it uses public key cryptosystem based authentication, and it is different in that it can provide continuous authentication between different MR components.
However, the TMR Framework design has not considered the authentication of a client to the MR application, nor the issue of secure distribution of public keys. It assumes that the AIK public key should either be certified by a trusted third party (e.g. Privacy-CA) [18] before run-time, or sent over a secure channel from one MR component (i.e. worker) to another. In addition, with this method, the master has to keep the public keys of all the workers to provide continuous authentication between the master and each worker.
D. ZHAO’s AUTHENTICATION METHOD
J. Zhao et al. have proposed an authentication method to support the authentication of a client to an MR application and authentication between a pair of MR components [19]. A user logs into the master node (of the MR application) using his username and password. The master node has a database that contains users’ login information along with their access rights. The master node verifies the password submitted by the user. If the verification is positive, the user will be allowed to submit a job to the MR application, and a user instance is created for the user to indicate that the user has an active job. The subsequent authentication between the MR components associated with the user instance (i.e. job) is achieved by using two types of certificates, proxy and slave certificates. The proxy certificate is used to authenticate the Job Tracker (the master node) linked to this user instance to a Task Tracker (the slave node, i.e. the worker), while the slave certificate is used to authenticate the slave node to the master node. The proxy certificate contains the public key of the master node and the CA-ID (Certificate Authority Identity). The slave certificate contains the public key of the corresponding slave node along with the CA-ID.
When the master applies for a proxy certificate for a user instance, a secure connection is set up between the master node and a Certificate Authority (CA) using the Secure Socket Layer (SSL) protocol. In this way, the CA and the master node can be authenticated to each other by using this protocol. Then the master node generates a pair of public and private keys (Mpub and Mprv) for the user instance. The master node keeps the private key and sends the public key to the CA through the secure channel just established. The master node also generates a user session which will be used for later communication with the allocated slave nodes. The CA adds some information, such as the key lifetime, to form the first part of the proxy certificate and signs it with the CA’s private key. The same generation and certification process is also applied to the corresponding slave certificate. The proxy certificate is sent to all the slave nodes that are involved in the user instance (job), and the slave certificates are sent to the master node.
This method provides mutual authentication between a master and a set of slave nodes involved in a user’s job. This certificate based mutual authentication can mitigate a number of threats, such as Man-In-The-Middle (MITM) attacks between the master and slave nodes. A handshaking protocol is used to facilitate the mutual authentication. Also, as a secure channel is used between the CA on one side and the master or a slave node on the other, the messages sent in the channels are confidentiality and integrity protected. To evaluate the performance of this method, the authors have implemented the authentication method assuming the following use cases: (1) one master with one slave, (2) one master with two slaves, and (3) one master with three slaves, and 20 jobs were submitted.
The results show that the execution time taken by the master node to authenticate three slave nodes is about double the execution time taken to authenticate two slave nodes. This means that the execution time may be excessively high if the number of nodes increases to hundreds or even thousands. The high cost is mainly due to the use of the asymmetric key based cryptosystem, the use of a third party (CA), and the need to issue and distribute proxy and slave certificates securely.
E. QUAN’s AUTHENTICATION METHOD
Q. Quan et al. have extended the work presented in [13] and [16] (the O. Malley and Wei methods), focusing on file authenticity protection and key exchange [20]. The authors believed that the authentication methods proposed in [13] and [16] mainly provide user identity and service integrity verifications, while the most needed method to
secure the MR model is to provide a mechanism to protect the data itself. Based on this belief, they proposed a method to protect data confidentiality and integrity in the MR application. This method makes a hybrid use of the public and symmetric key cryptosystems, i.e. a pair of MR components use a public key cryptosystem to securely exchange a shared symmetric key and then use this symmetric key to encrypt the data. The following steps summarize this method.
1. Shared key exchange:
- An MR component, A, generates a symmetric key, encrypts it using the public key of another MR component, B, and then sends the ciphertext to B.
- B decrypts it using its own private key.
- Now both A and B share the same secret key, which is used to encrypt and decrypt any data (file) sent between the two components.
2. Data confidentiality and integrity protections:
- A’s file content (data) is hashed using a hash function such as MD5.
- A signs the hashed value of the file content along with other items (that form the file header), such as file ID, file name, and time stamp, using A’s private key, and then sends the lot to B.
- B verifies the signature using A’s public key.
- B calculates the hash value of the file content (data) after decrypting it using the shared key. It then compares the hash value with the hash value sent within the encrypted (signed) header.
The merit of this method is that it does not use the asymmetric key cryptosystem for encrypting and decrypting the data itself (as the data could be big); rather, it uses it to encrypt and decrypt a symmetric key and the file header, which is small in comparison to the size of the data itself (file content). This is because of the high computational cost of using the asymmetric key cryptosystem for big data [12], [17].
This method ensures the authenticity and confidentiality of the client data as well as any data sent between any two MR components. However, this method does not provide the authentication of data if the data are not already read by an authenticated MR component; it assumes that any MR component that reads, or needs to access, a client’s data has already been authenticated.
Tables 3 and 4, respectively, summarize the related works against the requirements specified for an MR authentication service and the properties specified in Section 3 based on the analysis conducted on the MR model in [1], [3], and [9].
V. WHAT IS MISSING
The critical analysis of the existing authentication methods, presented in Section IV, shows that some methods are designed to support gate-level authentication (i.e. the authentication of users or clients to the MR application), while others only protect the integrity and origin authentication of protocol messages and data sent among different MR components. Though there are efforts on supporting mutual authentication between different MR components, these efforts are largely based on the use of public key credentials. Public key (i.e. asymmetric key) based solutions require the involvement of a third party (CA) for credential issuance and distribution. The costs incurred in such solutions are usually high. In addition, these methods have not considered mutual authentication between an MR-Job Component and an MR-Inf. Component (Name Node). Mutual authentication between an MR-Job Component and an MR-Inf. Component is necessary and important, as the former needs to request data (e.g. input files) or other resources from the latter during a job execution.
The lack of an adequate authentication service specifically designed for the MR model will make the model vulnerable to security threats and attacks. The threats and attacks are not just those in relation to identity theft, impersonation or replay attacks.
A successful compromise of a client’s account with an MR application will give attackers a better chance to launch other attacks, gaining unauthorized access to data and/or interrupting other job executions. This is particularly the case if the MR model is deployed in a shared environment.
To address this open issue, the next section presents a high-level analysis of the MR model, highlighting the functionality of its components and the interactions among the components when executing a job submitted by a client. This analysis will lead to our initial idea of using a layered approach to authentication in an MR model deployed in a shared environment.
VI. HIGH LEVEL ANALYSIS AND IDEA
From the GMC model shown in Figure 1, we can see that, when executing a job (i.e. in a job execution flow), multiple MR components are involved. Each component executes a well-defined function and the multiple components interact with one another to collaboratively accomplish the job execution [2], [3], [9]. The MR components, either of one job or of multiple jobs submitted by a single client or multiple clients, are hosted in two shared clusters, the Processing Framework (PF) and the Distributed File System (DFS). Each interaction between a pair of MR components is a client-server interaction and must be authenticated.
Examples of these client-server interactions are the reading and writing requests made to the Name Node. Each request is a procedure call. These calls could be for (i) reading data (job resource) (typically initiated by a Job Tracker or a Task Tracker), or (ii) writing data submitted by a client or produced from a job execution (e.g. input and output data of a job) (typically initiated, respectively, by clients and Reduce Tasks).
Other examples of the client-server interactions are those initiated to and from the Resource Manager. When a client submits a new job, he needs to make a new job-submission request.
Upon the receipt of the job-submission request, the Resource Manager needs to make a resource-allocation request to a Job Tracker. All these requests are interactions and are made by using procedure calls. In other words, these calls could be for (i) submitting a new job (typically initiated by clients), or
(ii) allocating a Job Tracker to master the Map and Reduce Tasks assigned to different Task Trackers related to a job execution.
TABLE 3. Related works versus design requirements for authentication services for the MR model.
Depending on their functionalities, the interactions involved in a job execution can be classified into three groups: (i) those for submitting a job, (ii) those for allocating resources for the execution of the job, and (iii) those for reading or writing data related to the job execution.
The first group (Group-1) of interactions takes place when a client submits a job to the MR model. To submit a job, the client makes the job submission via the Resource Manager, and writes the data for the job execution into the Name Node. The Resource Manager is the master node in the PF cluster, and the Name Node is the master node in the DFS cluster. In other words, the first group of interactions is between a client and the two master nodes, one in each cluster. It should be emphasized that one client may submit multiple jobs, and there will be multiple clients submitting jobs. Hereafter we shall refer to the interactions taking place for the submission and execution of a single job as one set, and one such set of interactions consists of interactions from all three groups, i.e. {the set of interactions for the execution of one job} = {a subset of Group-1 interactions} + {a subset of Group-2 interactions} + {a subset of Group-3 interactions}, where all the subsets are related to the submission and execution of a particular job.
TABLE 4. Protection against some security threats for proposed methods.
The interactions in the second group (Group-2) are for allocating job resources. Here, for each job, three nodes are involved in this group of interactions: the Resource Manager, the Name Node and a Job Tracker allocated for the job. When a job is admitted, the Resource Manager will allocate a Job Tracker for the job. This Job Tracker will be assigned (i.e. allocated) multiple Task Trackers. The Map and Reduce tasks of this job will be executed on these Task Trackers. The Group-2 interactions also include the interactions carried out by the Name Node to manage and maintain a set of Data Nodes. The Data Nodes host data for all the jobs that are submitted.
As can be seen from our discussions here, the functionalities of the Resource Manager and Name Node are for managing the executions of all the jobs that are submitted by the same client or by different clients. The Resource Manager and the Name Node serve all the jobs submitted. They are shared by different jobs and are identified by static identities. These components are not invoked because of a job submission; they are there to serve every job submitted by any client. It is for this reason that the two components are called MR Infrastructure components (MR-Inf. Components, for short).
As mentioned above, the Group-2 interactions are for providing resources for the executions of jobs. As shown in the GMC model (Figure 1), for a single job execution, these interactions are those of Resource Manager to Job Tracker (RM-JT), Job Tracker to Task Tracker (JT-TT), and Name Node to Data Node (NN-DN). These interactions are different from Group-1 interactions, as these are performed for accomplishing cluster functions and for serving the execution of a particular job.
This is due to the following observations: (i) both Job Trackers and Data Nodes are, respectively, the slave nodes of the PF and DFS clusters, (ii) any interactions initiated by a cluster master node (i.e. RM or NN) towards a cluster slave node (i.e. RM to JT, or NN to DN), or from a slave node of the PF cluster to another slave node in the same cluster, do not involve any access (read, write or retrieve) to any data of a job. Here, retrieve involves both read and write: reading from a remote server (DN) and writing locally to another server (TT). Also, the RM and the NN are the only RM and NN that initiate Group-2 interactions, and the JT is the only JT that initiates Group-2 interactions for a particular job. In other words, there is no other RM or NN to initiate such interactions in the cluster, and there is no other JT to initiate such interactions for the same job [2].
The third group (Group-3) of interactions is for executing a job submitted by a client. For each job, four types of components are involved in this group of interactions (i.e. these components, assigned to a particular job, initiate this group of interactions): the Job Tracker (JT), Task Trackers (TTs), Map Tasks (MTs) and Reduce Tasks (RTs). In executing the job, the JT retrieves the input splits of the data (the data of the job submitted by a particular client) from the DFS. The JT can then start managing the tasks (MTs and RTs). TTs also retrieve the data (for the job execution) from the DFS. TTs can then start executing the MTs and RTs assigned to them. In executing the tasks, RTs read the intermediate data, which is produced by MTs, from the respective Task Trackers. RTs also write the output results of their computations into the DFS.
Group-3 interactions can actually be seen as different subgroups (subsets) of interactions, each of which is performed by a set of components invoked for a particular job.
In other words, the set of components consists of a JT that is invoked for a particular job, and the TTs, MTs and RTs, all of which are associated with the JT. This set of components is created or invoked when a job is submitted, and the components are terminated or reassigned when the job execution is completed. The existence of this set of components is purely for serving this job. Therefore these components are identified by dynamic identities, and the identities are short-lived and so are the secrets issued to them. For this reason, we refer to them as MR-Job Components.
We can further use an example to explain the Group-3 interactions. As shown in the GMC model (Figure 1), during the execution of a particular job (i.e. in a job execution flow), the set of Group-3 interactions for executing this job can be identified as follows: Job Tracker to Name Node (JT-NN), Task Tracker to Name Node (TT-NN), Reduce Task to Task Tracker (RT-TT), and Reduce Task to Name Node (RT-NN). These interactions are for executing/processing the job, i.e. they perform (belong to) job functions. Group-3 interactions may be invoked concurrently by MR-Job Components serving different jobs. Because of this, we need to distinguish the interactions based on Job IDs, i.e. which job a particular interaction, or a set of interactions, actually serves.
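The three-way classification of interactions, and the need to separate Group-3 interactions by Job ID, can be sketched as a simple lookup. The component pairs are those named in the text; the tuple layout `(job_id, initiator, target)`, the label `"Client"`, and the function names `classify` and `group3_by_job` are hypothetical choices made only for this illustration.

```python
from collections import defaultdict

# Interaction pairs named in the text, mapped to their group number.
GROUPS = {
    ("Client", "RM"): 1, ("Client", "NN"): 1,           # job submission
    ("RM", "JT"): 2, ("JT", "TT"): 2, ("NN", "DN"): 2,  # resource allocation
    ("JT", "NN"): 3, ("TT", "NN"): 3,                   # job execution
    ("RT", "TT"): 3, ("RT", "NN"): 3,
}


def classify(initiator: str, target: str) -> int:
    """Return the group (1, 2 or 3) of an initiator-to-target interaction."""
    return GROUPS[(initiator, target)]


def group3_by_job(interactions):
    """Collect Group-3 interactions per Job ID.

    `interactions` is an iterable of (job_id, initiator, target) tuples.
    """
    by_job = defaultdict(list)
    for job_id, initiator, target in interactions:
        if classify(initiator, target) == 3:
            by_job[job_id].append((initiator, target))
    return dict(by_job)
```

Keying the result by Job ID reflects the observation that Group-3 interactions of different jobs may run concurrently and must not be confused with one another.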
The three groups of interactions, and the MR components performing them, take part in executing a client’s job, but neither the interactions nor the operations (or functionalities) of the components are in the control of the client. The components perform their functionalities and interactions to execute the client’s job (or process the client’s data) on behalf of the client. Therefore, there is an open issue here, i.e. how could a client trust such a shared computational environment? This issue is particularly important if the data processed by the job are privacy or security sensitive.
To achieve effective authentication in an MR environment, the authentication solution should capture the characteristics of this environment. The characteristics that should be captured in the design of an authentication solution for MR can be summarized as follows: (1) this is a shared environment with one or more clusters; (2) each cluster hosts a set of distributed MR components, and these components can be classified into MR-Inf. Components and MR-Job Components; (3) the MR-Job Components are job-dependent, i.e. they are invoked for a particular job submitted by a client; (4) the MR environment typically executes multiple jobs, submitted by the same client and/or by different clients, at any one given time.
Based on these observations, we can single out the set of MR components (MR-Inf. Components and MR-Job Components) that are involved in executing a particular job and give this set of components a name, an MR-Job Domain. In other words, each job will have a unique identity and this identity is also used to index an MR-Job Domain that refers to the set of MR components involved in serving a particular job.
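The notion of an MR-Job Domain indexed by a job identity can be sketched as a small registry. This is an illustrative data structure only, not part of the proposed framework’s specification: the class name `JobDomainRegistry`, its methods, and the component identifiers used with it are all hypothetical.

```python
class JobDomainRegistry:
    """Maps each job ID to the set of MR components serving that job."""

    def __init__(self, inf_components):
        # MR-Inf. Components (e.g. RM, NN) are static and shared by all jobs.
        self._inf_components = frozenset(inf_components)
        self._domains = {}

    def create_domain(self, job_id, job_components):
        """Form the MR-Job Domain when a job is submitted."""
        self._domains[job_id] = self._inf_components | set(job_components)

    def is_member(self, job_id, component):
        """Check whether a component belongs to the domain of the given job."""
        return component in self._domains.get(job_id, frozenset())

    def close_domain(self, job_id):
        """Drop the domain when the job completes (MR-Job Components are short-lived)."""
        self._domains.pop(job_id, None)
```

A membership test of this kind is one way to express that a component invoked for one job is not part of another job’s domain, even though both domains share the same MR-Inf. Components.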
We here propose a domain-based authentication approach for the newer MR implementation. The novel idea behind this approach is that the MR components that serve a particular job are singled out as an MR-Job Domain, and the components in this MR-Job Domain are responsible for authenticating themselves to each other. This is essentially an idea of isolation: we isolate the MR-Inf. Components and MR-Job Components that are involved in executing a given job into one set, and require the components in this set to authenticate to each other, so that only the components in this set are allowed to access the resources belonging to (or owned by) this particular job, and any component outside this domain is not allowed to access these resources. To implement this idea, we propose a novel framework, named the Virtual Domain based Authentication Framework (VDAF). This framework is said to be virtual domain based because (1) MR-Job Components are dynamic, (2) more than one MR-Job Domain may co-exist at any one given time in the two clusters (PF and DFS), and (3) these MR-Job Domains work on top of another group of entities, the master and slave nodes of the two clusters. From this point on, this latter group of entities, i.e. the master and slave nodes of the two clusters, is referred to as the shared cluster infrastructure (Inf.) domain (i.e. the MR-Inf. Domain). The MR-Inf. Domain should have its own authentication method, referred to as MR-Inf. Authentication. The MR-Inf. Authentication method is likely to be different from VDAF, the authentication method we propose for an MR-Job Domain. The classification of MR components into different groups, the structuring of the groups into different layers, and the use of different authentication methods for different layers indicate a layered approach to MR authentication. The next section describes a layered authentication model we propose for MR, i.e. the MR Layered Authentication Model.
A.
MR LAYERED AUTHENTICATION MODEL
Based on the analysis and discussions in the section above, we propose to use the MR Layered Authentication Model (MR-LAM) to realize the whole task of authentication for MapReduce. As shown in Figure 8, MR-LAM consists of three authentication layers. These are the MR-Inf. Domain Authentication Layer (Layer-1), the MR-Job Domain Authentication Layer (Layer-2), and the MR Components Authentication Layer (Layer-3).
FIGURE 8. MR layered authentication model (a layered approach to authentication).
The first layer, the MR-Inf. Domain Authentication Layer, serves the authentication of both the clusters’ server nodes and the MR components (MR-Inf Components and MR-Job Components). It is responsible for the authentication of any new physical node joining a cluster of the MR application and for the mutual authentication between any pair of physical nodes in the cluster. In other words, an Authentication Service (AS) for this layer should support two authentication tasks. The first task is the initial authentication of any new server node wanting to join the MR-Inf. Domain. A node should only be admitted as a member of the MR-Inf. Domain if the node has been successfully authenticated. For example, considering the case shown in Figure 1, if many jobs are submitted and the Resource Manager in the PF cluster is running out of resources on the slave nodes and/or if the IT administrator, looking after the cluster, has
decided to bring a new slave node into the cluster, then the new node should be authenticated by the AS before it is allowed to become a member of this cluster. The second task performed by the Layer-1 authentication service is to support continuous mutual authentication among the server nodes in the MR-Inf. Domain. These two authentication tasks can be achieved using the Kerberos authentication solution. Kerberos is a preferred authentication method for the MR-Inf. Domain, because it is still the default authentication service deployed for the cluster infrastructure in private clouds, and many operating systems (OSs) of the clusters’ server nodes, such as Microsoft Windows and Red Hat Enterprise Linux [21]–[23], already support this authentication method. Basically, if Kerberos is used as the default mode of the authentication service in a cluster, all the slave nodes (i.e. all the members of the MR-Inf. Domain) in the cluster should support this mode of authentication service. They will use Kerberos to authenticate themselves to the master node in the cluster and to establish shared secret keys between the services hosted by the slave and master nodes (i.e. MR components). In this case, the MR-Inf. Domain authentication service is provided through the use of Kerberos. However, the clients who have their jobs admitted into the MR cluster are not members of the MR-Inf. Domain themselves, but they should be authenticated before their jobs can be admitted. As a client will be part of an MR-Job Domain at Layer-2, a client should be authenticated by the authentication service provided at Layer-2. In other words, the second layer of the authentication model, the MR-Job Domain Authentication Layer, should be able to provide a means for clients’ authentication.
For Layer-2, i.e.
the MR-Job Domain Authentication Layer, the authentication task is mutual authentication among the MR components that are involved in serving a particular job. For this layer, an authentication method different from the one used in Layer-1 may be used. As mentioned earlier, we propose to use the VDAF to support this layer of authentication. With VDAF, the MR components that serve a particular job are collectively referred to as an MR-Job Domain. The components in an MR-Job Domain perform authentication among themselves. At this layer, each client is registered with an AS. Upon successful registration, the AS issues a long-term access credential to the client so that the client can use this credential to submit jobs to the MR application. During the execution of a job, the client is able to make use of the MR components (MR-Inf Components and MR-Job Components), or the resources provided through these components, so long as these components and resources are assigned to the job. Also, at Layer-2, the authentication method and protocols used by each MR-Job Domain are expected to be the same, but the secrets used in each such domain are different, and they should be protected against exposure to other domains. As mentioned earlier, each MR-Job Domain has its own MR-Job Components involved in the execution of a job submitted by a particular client. The client generates and manages the credentials (authentication secrets and other data) used to authenticate the MR-Job Components in this domain. In other words, this layer is responsible for providing the identification and authentication service by which the MR-Job Components assigned to each job can be securely identified and authenticated, and it does so at every interaction among them throughout the execution cycle of the job. The Resource Manager, which manages the resources of the MR-Inf. Domain, is also involved in the authentication of all the MR-Job Domains.
For the MR-Job Domain layer (i.e. Layer-2) authentication, the Resource Manager works as a relay to deliver the access credentials of each MR-Job Domain² to the client and the MR-Job Components in that Domain (more details to follow in our future work).
² The access credentials of an MR-Job Domain are the access credentials of both the client and the MR-Job Components of the MR-Job Domain.
The third layer (Layer-3) is the MR Components Authentication Layer. As mentioned earlier, some of the MR components are shared by more than one job (these components carry static and semi-permanent IDs), while others are used exclusively by the tasks created for a particular submitted job. The latter components carry dynamic IDs: they are created when the job is created and discarded when the execution of the job is completed. Hence, at Layer-3, it is assumed that each of the MR components (either an MR-Inf Component or an MR-Job Component) has an authentication module. These authentication modules, depending on the hosting MR components, can be respectively named as Job Tracker Authentication (JT-AuthN) Modules, Task Tracker Authentication (TT-AuthN) Modules, Map Task Authentication (MT-AuthN) Modules, Reduce Task Authentication (RT-AuthN) Modules, Name Node Authentication (NN-AuthN) Module, Data Node Authentication (DN-AuthN) Modules, and Resource Manager Authentication (RM-AuthN) Relay Module. By embedding these authentication modules into their respective MR components, we can provide MR-Inf. Components and MR-Job Components with authentication services that support authentication among themselves and prevent unauthorized access to data or resources assigned to (or owned by) a particular job domain. The implementation of the Layer-2 and Layer-3 authentication services will be described in more detail in a future paper.
VII. CONCLUSION AND FUTURE WORK
This paper has critically analyzed existing authentication methods designed for the MR model. It has also presented a high-level analysis of how an authentication service may be provided for the MR model and given a high-level idea of using a layered approach to authentication in this context. The analysis of existing authentication methods has indicated that providing an inadequate authentication service to the MR model, or deploying an authentication service that fails to capture the characteristics of the MR model, would put clients’ jobs and the resources hosted in an MR application at a high level of risk.
Providing an adequate authentication service for the MR model is a challenging task. This is because the MR model is usually deployed in a shared infrastructural environment, and in such an environment it is difficult to distinguish between a compromised and a trustworthy MR component. In addition, the hosting nodes in this environment are distributed, and possibly provided by multiple providers. The VDAF facilitates authentication on a per-job basis (i.e. Job Authentication (Job-AuthN)) and does so during the entire execution cycle of the job. It covers a chain of authentication tasks, namely, (i) from a user to the user’s client running on the user’s machine, (ii) from the client to the MR application (using the AS), and (iii) for messages sent by the MR components, i.e. transactions sent from an MR-Job Component to an MR-Inf Component and from one MR-Job Component to another MR-Job Component. Across all of these authentication tasks, (i) to (iii), data authentication should be provided. In other words, an authentication solution for MR should not only guard the gate into the system, but also guard every resource access during the execution of a job, as the execution involves multiple MR components distributed across multiple nodes of both the PF and DFS clusters.
A simple but effective authentication solution for the MR model is needed. As part of our work to design such a solution, this paper has also given a high-level overview of how this solution may be designed. Our proposed idea is to use a layered approach to tackle the complex task of authentication in MR. This approach takes into account the authentication requirements of all the components of the MR model. The authentication service should be provided at multiple levels, and it should be provided securely and efficiently.
To satisfy the authentication requirements detailed in R2.1 to R2.3, we have analyzed and identified all the possible interactions among the MR components. The interactions C-RM, C-NN, JT-NN, TT-NN, RT-TT and RT-NN have been considered in our design of VDAF. We will also address requirements R2.4 and R2.5, taking into account the interactions RM-JT, NN-DN and JT-TT, in the design of an authentication solution for the MR-Inf. Domain Authentication Layer.
The way we design the VDAF can also allow the MR model to be delivered as Software as a Service (SaaS) in a public cloud environment. This is due to two factors. First, the MR-Inf. Domain is likely to be managed remotely from a central location by an MR provider and not by a client, and the client is not involved in issuing and managing (i.e. is not in control of) the authentication credentials or secrets of the members of the MR-Inf. Domain (i.e. the MR-Inf components, which are actually the master nodes of both the PF and DFS clusters) and of the MR-Job Components’ Hosting Nodes (which are actually the slave nodes of both clusters). In other words, a client does not have to install or configure the master and slave nodes of the shared clusters, and is not involved in issuing and managing the authentication credentials or secrets of the MR-Inf components and the MR-Job Components’ Hosting Nodes. Rather, the client controls the authentication credentials or secrets of the MR-Job Components that are involved in executing his job (i.e. his own MR-Job Domain). Therefore, a client only needs to make a new job-submission request, which includes uploading the data of his MR-Job Domain. Second, an MR application delivers its computing services in a “one-to-many” manner. This means that one MR-Inf. Domain hosts multiple MR-Job Domains, and each MR-Job Domain serves the execution of a particular job submitted by a particular client. Each client has his own MR-Job Domain secrets (i.e.
isolated from other MR-Job Domains), and the client controls the credentials or secrets of his own MR-Job Domain. The detailed design of the VDAF for MR-Job Domains will be presented in our future work.
ACKNOWLEDGMENT
The authors would like to thank their colleagues from the School of Computer Science at the University of Manchester, Manchester, U.K., who provided facilities that greatly assisted this paper. The authors would also like to thank the reviewers for their valuable comments on this paper.
REFERENCES
[1] J. Dyer and N. Zhang, “Security issues relating to inadequate authentication in MapReduce applications,” in Proc. Int. Conf. High Perform. Comput. Simulation (HPCS), Jul. 2013, pp. 281–288.
[2] T. White, “How the MapReduce works,” in Hadoop: The Definitive Guide, 3rd ed. Tokyo, Japan: O’Reilly Inc., 2012.
[3] I. Lahmer and N. Zhang, “MapReduce: MR model abstraction for future security study,” in Proc. 7th Int. Conf. Secur. Inf. Netw., 2014, pp. 392–398.
[4] C. Lam, “Introducing hadoop, and managing hadoop,” in Hadoop in Action. Greenwich, U.K.: Manning Publications Co, 2010.
[5] P. Zikopoulos, C. Eaton, D. Deroos, T. Deutsch, and G. Lapis, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York, NY, USA: McGraw-Hill, 2012.
[6] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, 2008.
[7] J. Xiao and Z. Xiao, “High-integrity MapReduce computation in cloud with speculative execution,” in Theoretical and Mathematical Foundations of Computer Science. Heidelberg, Germany: Springer-Verlag, 2011, pp. 397–404.
[8] B. Lakhe, “Introducing Hadoop and its security,” in Practical Hadoop Security. New York, NY, USA: Apress, 2014.
[9] I. Lahmer and N. Zhang, “MapReduce: A security analysis and authentication requirement specification,” in Proc. 2nd Int. Conf. Comput. Inf. Syst. (ICCIS), World Congr. Comput. Appl. Inf. Syst., 2015, pp. 65–71.
[10] D. A. B. Fernandes, L. F. B. Soares, J. V. Gomes, M. M. Freire, and P. R. M. Inácio, “Security issues in cloud environments: A survey,” Int. J. Inf. Secur., vol. 13, no. 2, pp. 113–170, Apr. 2014.
[11] J. M. Kizza, “Cloud computing and related security issues,” in Guide to Computer Network Security. London, U.K.: Springer-Verlag, 2013, pp. 465–489.
[12] A. Kumar, S. Jakhar, and S. Makkar, “Comparative analysis between DES and RSA algorithms,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 2, no. 7, pp. 386–391, Jul. 2012.
[13] O. O’Malley, K. Zhang, S. Radia, R. Marti, and C. Harrell, “Hadoop security design,” Yahoo, Inc., Sunnyvale, CA, USA, Tech. Rep., 2009.
[14] N. Somu, A. Gangaa, and V. S. S. Sriram, “Authentication service in Hadoop using one time pad,” Indian J. Sci. Technol., vol. 7, pp. 56–62, Apr. 2014.
[15] S. Rubika, G. S. Sadasivam, and K. A. Kumari, “A novel authentication service for Hadoop in cloud environment,” in Proc. IEEE Int. Conf. Cloud Comput. Emerg. Markets (CCEM), Oct. 2012, pp. 1–6.
[16] W. Wei, J. Du, T. Yu, and X. Gu, “SecureMR: A service integrity assurance framework for MapReduce,” in Proc. ACSAC, Dec. 2009, pp. 73–82.
[17] B. Padmavathi and S. R. Kumari, “A survey on performance analysis of DES, AES and RSA algorithm along with LSB substitution technique,” Int. J. Sci. Res., vol. 2, no. 4, pp. 170–174, 2013.
[18] A. Ruan and A. Martin, “TMR: Towards a trusted MapReduce infrastructure,” in Proc. IEEE 8th World Congr. Services, Jun. 2012, pp. 141–148.
[19] J. Zhao, J. Tao, and A. Streit, “Enabling collaborative MapReduce on the cloud with a single-sign-on mechanism,” Computing, vol. 98, no. 1, pp. 55–72, Jan. 2014.
[20] Q. Quan, W. Tian-Hong, Z. Rui, and X. Ming-Jun, “A model of cloud data secure storage based on HDFS,” in Proc. 12th IEEE Int. Conf. Comput. Inf. Sci. (ICIS), Jun. 2013, pp. 173–178.
[21] Microsoft Technical Team. (2016). Securing Server Clusters, Microsoft Technet Library, accessed on Jan. 20, 2016. [Online]. Available: https://technet.microsoft.com/en-us/library/cc785088%28v=ws.10%29.aspx
[22] Microsoft Technical Team. (2016). Applying Kerberos Authentication in a Clustered Environment, Microsoft Technet Library, accessed on Jan. 20, 2016. [Online]. Available: https://technet.microsoft.com/en-us/library/cc738070%28v=ws.10%29.aspx
[23] Red Hat Technical Team. (2015). “Creating Domains: Kerberos Authentication” in Deployment Guide: Deployment, Configuration and Administration of Red Hat Enterprise Linux 6, accessed on Jan. 21, 2016. [Online].
Available: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/Configuring_Domains-Setting_up_Kerberos_Authentication.html
[24] I. Lahmer and N. Zhang, “MapReduce: A critical analysis of existing authentication methods,” in Proc. 10th Int. Conf. Internet Technol. Secured Trans. (ICITST), Dec. 2015, pp. 302–313.
IBRAHIM LAHMER received the B.Sc. (Hons.) degree in computer engineering from the University of Tripoli, Libya, in 2008, and the M.Sc. (Hons.) degree in computer and network security from Middlesex University, London, U.K., in 2010. He holds CCENT, CCNA R&S, CCNA Security, and MCSA certifications, having worked as a Network and Security Administrator with National Oil Corporation for three years. He is currently sponsored to conduct research on computer networking and security with the School of Computer Science, The University of Manchester. His research interests include authentication in distributed systems. He received the British Computing Society Prize for the best postgraduate computing project in London in 2011.
NING ZHANG received the B.Sc. degree in electronics engineering from Dalian Maritime University, Dalian, China, and the Ph.D. degree in electronics engineering from the University of Kent, Canterbury, U.K. She is currently a Senior Lecturer with the School of Computer Science, The University of Manchester, Manchester, U.K. Her current research interests include security in networked and distributed systems, applied cryptography, data privacy, and trust and digital rights management. She has authored papers and acted as a referee and reviewer in these topic areas.