Datapath: Stochastic, Temporal Simulation of Data
Flow Reliability
Edward Verenich, Gennady Staskevich
Clarkson University
Abstract—We present a simulation of data flow reliability that
models directed information flow through a network where the
source of information may change with time. We also attempt to
quantify resource availability values or uptimes in the context of
time in order to model data transfer reliability more realistically.
We encode the problem as a variant of a Dynamic Bayesian
Network and develop a custom Markov Chain Monte Carlo
[5] sampling algorithm that computes the posterior marginal
probabilities of data being present at a given node at a given
time. We parameterize the probabilistic model using the binary
event conditional probability distribution and introduce several
additional parameters to the random variable in order to improve
the accuracy of our model. Finally, we present a lightweight,
browser-based model editor for constructing and assessing
probabilistic data flow models.
I. INTRODUCTION
Assessing network reliability is an important aspect of
network and cloud infrastructure [7] engineering. Most techniques
for doing so involve closed-form solutions that, we believe,
either make simplifying assumptions or become difficult to
work with as models grow very large. Accounting for uncertainty
in network communication involves probabilistic models, but
exact inference techniques do not scale well as model connectivity
increases, and growth in the dimensionality of parameters makes
exact computations intractable. Account for time, and the
problem becomes even harder. Take, for example, a network
that simulates a vehicular mobile cloud [1], where the source
of the directional data flow changes in time as the vehicle
moves and must now send data through a new access point:
can we model this hand-off and its effect on overall data flow
reliability for every time unit in the model? In this paper we
present a lightweight simulation that attempts to provide an
easy-to-use framework for making such reliability assessments
by encoding the problem as a Dynamic Bayesian Network
and running a custom Markov Chain Monte Carlo sampling
algorithm to compute the posterior marginal probabilities of
data being present at node x at time y.
This paper is structured as follows. Section II presents
the probabilistic model, including the parameterization
method, a binary event conditional distribution model,
and additional binary event parameters. Section III describes
our custom sampling algorithm, a derivative of the
family of Markov Chain Monte Carlo algorithms. Section
IV describes our reference simulation editor, which can be
used to construct DBN-based simulations, and provides several
model use-cases. Finally, Section V concludes the paper and
describes two possible extensions to our model that we believe
are novel, not only in applying MCMC to reliability
modeling, but in the general field of probabilistic causal
modeling.
II. PROBABILISTIC MODEL
In this section we describe our choice of a probabilistic
model that we felt was appropriate to model the dynamic
nature of mobile networks. We first provide some background
information on Graphical Models and Dynamic Bayesian
Networks. We then describe how we parameterize the model
and explain additional parameters that we introduce to our
binary event implementation.
A. Background
A proper introduction of Graphical Models, Markov Net-
works and Dynamic Bayesian Networks is beyond the scope
of this paper. Graphical models are useful for depicting in-
dependence and dependence relationships between probability
distributions, which is convenient computationally. They are
also used to model how variables interact, where ultimately a
given class of a Graphical Model corresponds to a factorization
property of the joint distribution. We attempt to model the flow
of data using a directed graphical model, which corresponds
to a Bayesian Network. By including the temporal dimension,
we end up with a Dynamic Bayesian Network, which we
parameterize using the Conditional Probability Distribution.
The CPD is time invariant, but other parameters drive the
temporal behavior of the model as we explain next.
B. Model Parameters
We chose to parameterize our network using the binary
event model. In this model, a random variable x represents an
event that can happen according to a conditional probability
distribution.
Fig. 1. Event z is caused by events x and y according to a CPD.
Figure 1 shows a directed graph where event z is caused
by events x and y independently; that is, each causing
event can trigger event z on its own. Consider the diagram as
showing data flowing to node z from node y with probability
0.8 and from node x to z with probability 0.9. This is encoded
in the CPD of Z as follows:
TABLE I
THE CPD FOR EVENT Z.
x y prob
0 0 .0
0 1 .8
1 0 .9
1 1 .98
Consider the set of causes Q = {x, y}; the rows of
this table represent the entries of the CPD, where each entry
CPD[i] ∈ P(Q). This gives four possible configurations
for event z: (1) neither link (x, z) nor (y, z) is
transmitting, (2) only (y, z) is transmitting, (3) only (x, z)
is transmitting, and (4) both are transmitting. So we see that,
given our binary event model, each event or node in the model
will have a CPD of size 2^|Q|. In the CPD in Table I
only the last entry needed to be computed; for this we
use the Noisy-OR model generalized by Díez [2], shown in
Equation 1.
p(y ← c(x_0, x_1, ..., x_{n−1})) = 1 − ∏_{i=0}^{n−1} (1 − p(y ← c(x_i)))   (1)

The symbol ← represents the causal relationship, or the
directional flow of data, and the set c(x_0, x_1, ..., x_{n−1}) is the
set Q.
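Equation 1 can be sketched in a few lines of JavaScript (the language of our simulator); `noisyOr` here is a hypothetical helper name for illustration, not part of the simulator's code:

```javascript
// Noisy-OR: probability that at least one independent cause triggers the event.
// probs[i] is p(y <- c(x_i)) for each active cause x_i.
function noisyOr(probs) {
  return 1 - probs.reduce((acc, p) => acc * (1 - p), 1);
}

// Last row of Table I: both x (0.9) and y (0.8) are transmitting.
console.log(noisyOr([0.9, 0.8])); // ~0.98, matching the table
```

With a single active cause the expression reduces to that cause's own probability, and with no active causes it is zero, matching the first three rows of Table I.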
An event, which represents a physical resource node in our
network simulation, has additional parameters that control its
behavior in our sampling simulation. Here we define these
parameters:
Definition Let Reliability represent the probability of an
event x having the value true given that it is caused by another
event or is explicitly triggered in the simulation.
Explicitly triggering a probabilistic event means that the
modeler specifies that a specific event will happen at a time t
with a probability p. This leads us to the next parameter that
specifies how long this event will persist.
Definition Let Persistence ∈ Z+
be the number of time
units that the event persists at probability p once it is triggered
at that probability.
Whenever an event is triggered, the modeler may specify
how long the event will last. For example, the modeler may
easily specify that once an asset breaks down, it stays broken
until another event causes it to become functional. The next
parameter also relates to the temporal aspect of the model.
Definition Let Continuation represent the probability that
an event x has the value of true at time t given that it had a
probability p of being true at time t − 1.
This parameter allows the modeler to conveniently model
reliability estimates in time given a prior reliability measurement
for a fixed time interval t. For example, consider a
resource σ having a Reliability probability of 0.99 in an interval
t. The probability of failure of resource σ by time interval x is then:

Failure_σ(interval_x) = 1 − ∏_{i=1}^{x} Reliability_σ = 1 − (Reliability_σ)^x   (2)

If we set x = 9, the probability of failure by that time
interval becomes 0.086. Although this is a simple computation
for a single parameter, our Continuation parameter allows us to
compute this value for every random variable and every time
interval in the model at no additional cost.
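The projection of Equation 2 can be checked with a one-line JavaScript helper (`failureBy` is a hypothetical name used only for this sketch):

```javascript
// Probability of at least one failure over x intervals, given a constant
// per-interval reliability (Equation 2).
function failureBy(reliability, x) {
  return 1 - Math.pow(reliability, x);
}

console.log(failureBy(0.99, 9).toFixed(3)); // "0.086", matching the text
```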
We also need to account for the fact that it is not practical or
realistic to model every possible cause of an event, especially
at the model fidelity we aim for, so we introduce a parameter
that accounts for unspecified causes, which we call the Leak.
Definition Let Leak be the probability p that an event is caused
at every time interval t by one or more unspecified causes
outside the set of causes Q.
Finally, each event has a schedule, which allows the event
to be triggered explicitly by the modeler.
Definition Let the Schedule represent a map of values (t →
p) that trigger an event x at time t with probability p.
The schedule gives us the ability to simulate the change in
the source of data flow in time, as we will see in Section IV.
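A schedule might be represented as a plain map from time step to trigger probability; the shape below is our illustrative assumption, not the simulator's actual format. For instance, a hand-off like the one in Section IV could schedule two gateways so one source goes quiet as the other activates:

```javascript
// Hypothetical Schedule objects: time step -> trigger probability.
// Node 3 acts as the data source until t = 5, then Node 7 takes over.
const schedules = {
  node3: { 0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.0 },
  node7: { 5: 1.0, 6: 1.0, 7: 1.0 }
};

// Look up the explicit trigger for time t; null means the event is
// not explicitly scheduled and falls back to its CPD or leak.
function triggerProb(schedule, t) {
  return t in schedule ? schedule[t] : null;
}
```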
III. SAMPLING ALGORITHM
In this section we outline our sampling algorithms that we
use to compute the marginal posterior probabilities of our
event variables that in turn represent the probability that our
network resources received the data from the source.
A. Generating a DBN
Before we can run our sampling algorithm, we need to
generate a DBN suitable for our sampler. This requires several
steps. First, we begin with the set of events and topologically sort
them (recall that our model is a Directed Acyclic Graph).
Second, we generate the CPD for each node (each corresponding
to an event) in the DBN using the technique described
in Section II. Third, we Noisy-OR our leak value with
all values of the CPD; note that a leak value of
zero has no effect on the CPD. Finally, we multiply each
parameter of the CPD by the reliability value. Before
reliability is applied, each value in the CPD reflects
the probability that the signal made it to the target
node given that the source node sent it; the target node,
however, may not be available to receive and re-transmit it,
which is what we model using the reliability parameter.
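The four steps can be sketched in JavaScript; `buildCpd` is a hypothetical illustration of the procedure, not the simulator's actual code, and assumes the `noisyOr` form of Equation 1:

```javascript
// Noisy-OR over a set of independent cause probabilities (Equation 1).
function noisyOr(probs) {
  return 1 - probs.reduce((acc, p) => acc * (1 - p), 1);
}

// Build a CPD over the power set of causes, then fold in leak and reliability.
// causeProbs[i] is the link reliability p(target | cause_i alone); entry k of
// the result corresponds to the bitmask of active causes.
function buildCpd(causeProbs, leak, reliability) {
  const n = causeProbs.length;
  const cpd = [];
  for (let mask = 0; mask < (1 << n); mask++) {
    const active = causeProbs.filter((_, i) => mask & (1 << i));
    // Noisy-OR the active causes together with the leak (a zero leak is a
    // no-op), then scale by the node's own reliability.
    cpd.push(noisyOr([...active, leak]) * reliability);
  }
  return cpd;
}

console.log(buildCpd([0.9, 0.8], 0.0, 1.0)); // ~[0, 0.9, 0.8, 0.98], Table I
```

With a nonzero leak, even the all-causes-off entry becomes nonzero, capturing unspecified causes as described above.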
B. MCMC Algorithm
Here we present our Datapath MCMC algorithm and describe
some of the computations that we omitted from the
pseudocode in Algorithm 1 for clarity.
Algorithm 1 Datapath MCMC
1: procedure DATAPATHMCMC
2: dbn ← topologically sorted nodes
3: duration ← positive number of time slices
4: samples ← number of samples to simulate
5: sample ← boolean array of length dbn.length
6: counts ← integer array for counts
7: for time t in duration do
8: for sample s in samples do
9: for node n in dbn do
10: prob ← 0.0
11: if n is root node then
12: prob ← leak or schedule[t]
13: else
14: if n is scheduled at t then
15: prob ← schedule[t]
16: else
17: prob ← CPD(state index)
18: if prob ≥ Random then
19: sample[n] ← true
20: counts[n] ← increment by one
21: for node n in dbn do
22: n ← set marginal for time t
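The loop in Algorithm 1 can be sketched as a compact, runnable JavaScript function. The node shape (parents, cpd, schedule, leak) is our hypothetical encoding of the structures described in Section II, not the simulator's actual data model, and persistence/continuation handling is omitted just as in the pseudocode:

```javascript
// Minimal sketch of Algorithm 1. dbn is a topologically sorted array of
// nodes: { parents: [indices], cpd: [per-bitmask probabilities],
//          schedule: { t: p }, leak: number }.
function datapathMCMC(dbn, duration, samples) {
  const marginals = dbn.map(() => new Array(duration).fill(0));
  for (let t = 0; t < duration; t++) {
    const counts = new Array(dbn.length).fill(0);
    for (let s = 0; s < samples; s++) {
      const sample = new Array(dbn.length).fill(false);
      dbn.forEach((node, n) => {
        let prob;
        if (t in node.schedule) {
          prob = node.schedule[t];              // explicit trigger
        } else if (node.parents.length === 0) {
          prob = node.leak;                     // root node
        } else {
          // Index the CPD by the bitmask of active (already sampled) parents.
          const idx = node.parents.reduce(
            (acc, p, i) => (sample[p] ? acc | (1 << i) : acc), 0);
          prob = node.cpd[idx];
        }
        if (Math.random() < prob) { sample[n] = true; counts[n]++; }
      });
    }
    dbn.forEach((_, n) => { marginals[n][t] = counts[n] / samples; });
  }
  return marginals;
}
```

Because nodes are visited in topological order, every parent is sampled before its children within each pass, which is what makes the single forward sweep per sample sufficient.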
One particular calculation not specified in the
algorithm, but which happens at line 17, is the computation of an index
of active parents that is used to obtain the probability value
from the CPD during sampling. This is done as follows: (1)
we sample our variables in topological order, so all children
have their parent states set before they are sampled; (2)
we store each variable's CPD as a power set of its causes,
so by reading the already-set boolean parent states
as bits we generate an integer index of active
parents that corresponds to the CPD value we need
to Monte Carlo against.
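Step (2) can be illustrated with a hypothetical helper that packs boolean parent states into a CPD index:

```javascript
// Turn already-sampled parent states into an integer index into the CPD,
// which is stored over the power set of causes. parentStates[i] is the
// boolean sample of parent i for the current time slice.
function cpdIndex(parentStates) {
  return parentStates.reduce((idx, on, i) => (on ? idx | (1 << i) : idx), 0);
}

console.log(cpdIndex([true, false, true])); // 5: parents 0 and 2 are active
```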
Another aspect omitted from the pseudocode is the use
of the persistence and continuation parameters to drive the
temporal simulation; again, this was done mostly for clarity,
and we invite interested readers to reference our source
code, which will be made available. To the best of our knowledge,
most general-purpose tools for Bayesian modeling handle
temporal simulation by unrolling: taking a snapshot of the model for every
time slice and then sampling each one. This quickly leads to
models that require significant resources to execute, which is
not the case with our approach.
DatapathMCMC is a simple but effective algorithm for predicting
the directional reliability of data flow. It does not, however,
have good convergence properties for performing Bayesian
inference in models that must compute the joint posterior
distribution when many observations are made downstream.
This is by design, to address our specific scenario; it
is by no means a general-purpose method for Bayesian
inference. For the general case, we would probably select
Gibbs or Metropolis-Hastings [6], or some other variation
of importance or rejection sampling. In our case, however,
rejecting samples is not necessary because our observations, in the
form of explicit scheduling, are mostly made upstream, at
root nodes, so most observations are sampled first. Supporting
downstream observations could be the next natural step and could lead to some
interesting applications, such as diagnosing the most likely path
that data took in an anycast transmission, or identifying failure
points.
IV. SIMULATOR
In this section we describe our simulator implementation.
We chose to develop the simulator as a browser-based
JavaScript application for ease of use and for the
convenience of existing visualization libraries, namely D3.js, which
we use for network graph creation. We also considered
the ease with which users can experiment with extending this
framework, given the lack of any real configuration, including
through online editors such as JSFiddle.
A. Network Editor
The graphical network editor is written using the D3.js vec-
tor graphics library. It allows the user to construct a directed
graph where nodes represent network elements that map to
events in our probabilistic model. Directed edges represent
data links and have reliability values that are specified at their
target nodes. These values map to conditional probabilities of
their target random variable in the form P(target|source),
and are used to compute full conditional probability distribu-
tions for each node.
B. Running the Simulation
When a new node is added, its Reliability value defaults to
1.0, as does any added link. This allows the modeler to
quickly sketch out a network topology before experimenting
with reliability values and time. Once all the desired values
are specified, the user sets the duration of the simulation
in abstract time units and the number of samples to
generate. As a rule, the more samples we generate the more
accurate our estimate is, but as mentioned in Section III, our
algorithm converges very quickly in our intended application.
C. Serial Reliability
Our first simulation runs a serial data path
consisting of three nodes. There are several ways to build
simulations using our model; in this one we model
node failure rates per time unit, where a time unit can be
any interval of time for which we have prior measured failure
information. For example, we may know that a particular resource
has an uptime of 99.99% in a given week (independent of any
other resource), that the second resource in the serial flow has an
uptime of 99.98% for the same interval, also independent of
other resources, and so on.
Fig. 2. Serial path consisting of three nodes.
Figure 2 shows the setup of our first simulation. In this
model the vertices represent hardware nodes, virtual routers,
or servers, while the edges represent connection means, which
could be physical or virtual. To validate our simulation algorithm
we kept edge probabilities at 1.0, meaning we account
only for node failure rates. We model this simulation
using our leak parameter, essentially stating that we are
modeling recorded average failure priors for any given interval
t (a week). Our leak parameters are set as follows: let N be the
set of nodes in the serial datapath, with leak probabilities
(n3 = 0.01, n4 = 0.02, n5 = 0.01); we would like to project
their reliability over the next 15 weeks. Using the closed form
of Equation 2, we calculate node 3's expected failure
at time 5 to be 0.049. Running our simulation, we have the
following:
Fig. 3. Node 3 probability at time step 5.
Figure 3 shows that we approximate the probability
of failing to deliver data fairly closely. Now consider the
next node in the series. If we were to compute its failure
to deliver data as an origin or root node, in other words
without any previously accumulated failure adding to the
failure probability, we could use the same formula to compute its
expected failure at time 5: about 0.096. But we need
to accumulate all the previous expectations: we must
account for node 4 failing to deliver data both because it failed
independently and because node 3 may have failed
before data traveled to node 4, and this must be done for all
time intervals.
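Assuming independent per-interval failures, these figures can be reproduced with a small JavaScript sketch; `serialFailure` is a hypothetical helper reflecting our reading of the setup, not the simulator's code:

```javascript
// Failure of the last node in a series to deliver by interval t,
// accumulating the failures of all upstream nodes (independence assumed).
// leaks[i] is the per-interval failure probability of node i.
function serialFailure(leaks, t) {
  // Survival of the whole chain is the product of each node's per-interval
  // reliability raised to t; failure is the complement.
  const survival = leaks.reduce((acc, leak) => acc * Math.pow(1 - leak, t), 1);
  return 1 - survival;
}

console.log(serialFailure([0.01], 5).toFixed(3));       // "0.049": Node 3 alone
console.log(serialFailure([0.02], 5).toFixed(3));       // "0.096": Node 4 alone
console.log(serialFailure([0.01, 0.02], 5).toFixed(3)); // "0.140": Node 4 after 3
```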
Fig. 4. Node 4 without any causal contribution from Node 3.
Figure 4 shows our approximation for Node 4 at time 5.
This is close to the exact answer computed using Equation
2. Now we activate Node 3, effectively accumulating Node 3
and Node 4 failure probability at Node 4. Figure 5 shows
Fig. 5. Node 4 with a causal contribution from Node 3.
a significantly higher probability of failing to deliver data
at Node 4 at time 5. This is an interesting result that we
would like to investigate further. We interpret it as
the probability of data delivery failure at time t and node i,
given the prior average failure measurements for interval t for
all nodes up to and including node i in the series.
D. Mobile Hand-off
Our next model involves the scheduling mechanism that
allows us to simulate the switching of data sources during
the simulation. This may be analogous to a mobile customer
connecting to different towers while moving in space or hand-
offs in vehicular clouds. Figure 6 shows a model that simulates
Fig. 6. Node 3 and Node 7 simulate data sources at different times.
two sources of data at different times, essentially simulating
a hand-off. Consider Node 8 to be the final destination and
Nodes 3 and 7 to be gateway nodes for a mobile client.
We simulate the effect of movement in space and time that
involves connecting to a different gateway, which results in
a different path to the final destination. In this model we
planned to simulate data flow reliability using our reliability
and schedule parameters, as opposed to the leak parameter.
The idea is to specify the reliability value at each node
while each incoming edge represents a conditional reliability
of that connection given that the source node sent the data.
Unfortunately, we were not able to complete the simulation in
time due to an unexpected bug in the simulator, which we are
still investigating. We expect this to be resolved in the near
term.
V. CONCLUSION
We developed a stochastic, temporal simulation of network
data flow that is easy to use and delivers unique capabilities
such as temporal probability profiles for all model variables,
in contrast to explicit queries on individual variables. We
introduced the capability of simulating the connection hand-offs
associated with mobile and vehicular clouds. We also developed a
lightweight simulation editor that can be extended to support
even more sophisticated simulation scenarios.
A. Current Bugs and Incomplete Features
As mentioned in the previous section, our simulations involving
the scheduling parameter need to be debugged, which
we expect to be fairly straightforward. The user interface
needs to be cleaned up and input validation implemented.
The persistence parameter needs to be integrated into
the simulation; it is currently emulated using the scheduling
parameter at every time interval. Saving the model locally
works, as it was used for debugging purposes, but loading it
back needs to be finished. We also need to investigate our
serial reliability results further and experiment with random
number generation beyond the standard JavaScript random
method.
B. Possible Extensions
We envision two possible extensions that would result
in novel simulation capabilities. First, when modeling
multiple incoming connections into a node, we use the
Noisy-OR model to calculate each link's contribution to data
reliability, assuming independence. There may be measurements,
usable as prior information, that capture
the synergistic effect of these links being active at the same
time, resulting in a higher probability than that calculated
under the independence assumption. Currently, there is no
way to specify such a value in the simulation, and if there were,
we would need a different distribution estimation
algorithm, perhaps something similar to the Recursive Noisy-OR
[3] proposed by Lemmer and Gossink. Their generalization
of Noisy-OR allows learned parameters to be specified
explicitly; the rule then propagates these parameters to the
parameters that contain the learned values as subsets.
For example, given a set of incoming links Q = {x, y, z}, we may
have direct measurements θ showing that when x and y are both active, their
reliability contribution to the node differs from the one
computed under the independence assumption; with RNOR
we can incorporate this joint measurement with Q, where θ is
propagated to the set {x, y, z} unless it is specified explicitly.
The second extension involves introducing the concept of
counterfactual reasoning. This involves adding causal influences
that act on their effect when they are absent, or do not
happen. For example, the statement if it does not rain, my car
will probably be dry is one instance of such an interaction. The
technique for modeling these interactions in binary event models
is referred to as CPD inversion. It was originally suggested
by J. Pearl [4], but the exact details are beyond the scope of
this article.
REFERENCES
[1] R. Yu, Y. Zhang, S. Gjessing, W. Xia, K. Yang, Toward Cloud-based
Vehicular Networks with Efficient Resource Management. 2013.
[2] F. J. Díez, Parameter adjustment in Bayes networks. The generalized noisy
OR-gate. In Proc. 9th Annu. Conf. Uncertainty in Artificial Intelligence, San
Mateo, CA, 1993, pp. 99-105.
[3] J.F. Lemmer, D.E. Gossink, Recursive Noisy OR - A Rule for Estimating
Complex Probabilistic Interactions. IEEE Transactions on Systems, Man,
And Cybernetics - Part B:Cybernetics, 2004. 34(6).
[4] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Mateo, CA. 1988. Morgan Kaufmann.
[5] J. S. Liu, Monte Carlo Strategies in Scientific Computing. New York, NY.
2004. Springer Science Media LLC.
[6] D. Barber, Bayesian Reasoning and Machine Learning. U.K. 2012.
Cambridge University Press.
[7] N. L. S. Fonseca, R. Boutaba, Cloud Services, Networking, and Manag-
ment. Danvers, MA. 2015. Wiley.

More Related Content

PDF
Clustering Algorithms for Data Stream
PDF
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
PDF
Approaches to online quantile estimation
PDF
Load balancing in public cloud combining the concepts of data mining and netw...
PDF
A Novel Design Architecture of Secure Communication System with Reduced-Order...
DOCX
NEW ALGORITHMS FOR SECURE OUTSOURCING OF LARGE-SCALE SYSTEMS OF LINEAR EQUAT...
PDF
A study of localized algorithm for self organized wireless sensor network and...
PDF
Erca energy efficient routing and reclustering
Clustering Algorithms for Data Stream
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
Approaches to online quantile estimation
Load balancing in public cloud combining the concepts of data mining and netw...
A Novel Design Architecture of Secure Communication System with Reduced-Order...
NEW ALGORITHMS FOR SECURE OUTSOURCING OF LARGE-SCALE SYSTEMS OF LINEAR EQUAT...
A study of localized algorithm for self organized wireless sensor network and...
Erca energy efficient routing and reclustering

What's hot (16)

PDF
Compressive Data Gathering using NACS in Wireless Sensor Network
PPT
Part 2: Unsupervised Learning Machine Learning Techniques
PDF
50120130406039
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
E035425030
DOC
ASCE_ChingHuei_Rev00..
PDF
IMPROVING SCHEDULING OF DATA TRANSMISSION IN TDMA SYSTEMS
PDF
Energy Efficient Power Failure Diagonisis For Wireless Network Using Random G...
PDF
Analysis of single server fixed batch service queueing system under multiple ...
PDF
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
PDF
Testing and Improving Local Adaptive Importance Sampling in LFJ Local-JT in M...
PPTX
Dimension Reduction: What? Why? and How?
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
Performance of the Maximum Stable Connected Dominating Sets in the Presence o...
PDF
Density Based Subspace Clustering Over Dynamic Data
PDF
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
Compressive Data Gathering using NACS in Wireless Sensor Network
Part 2: Unsupervised Learning Machine Learning Techniques
50120130406039
International Journal of Engineering Research and Development (IJERD)
E035425030
ASCE_ChingHuei_Rev00..
IMPROVING SCHEDULING OF DATA TRANSMISSION IN TDMA SYSTEMS
Energy Efficient Power Failure Diagonisis For Wireless Network Using Random G...
Analysis of single server fixed batch service queueing system under multiple ...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
Testing and Improving Local Adaptive Importance Sampling in LFJ Local-JT in M...
Dimension Reduction: What? Why? and How?
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Performance of the Maximum Stable Connected Dominating Sets in the Presence o...
Density Based Subspace Clustering Over Dynamic Data
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
Ad

Viewers also liked (17)

PPTX
7 d justin tan sekoteng
PDF
Long Snapper University Brochure (edited final)
PPT
koneke
DOCX
Flat Plan of Contents Page
PDF
Forget ad blocking, user experience is a big deal - Digiday WTF Ad Blocking N...
DOC
diaa hrb CV1-2016
PDF
Boletín informativo UNIORE
PPTX
Tichnorpaige
PPT
7 c presentation1 (fix.) thalia
DOC
Proyecto lectura escritura
PPTX
Grafos Eulerianos y Hamiltanianos
PPTX
Challenges and Opportunities in Industrial Biotech Regulation
PPTX
Activid Ad Glosario[3[2
PPTX
Story boards
PDF
Speed Dating on Advertising
PPT
Introdução à Estratégia do Oceano Azul
7 d justin tan sekoteng
Long Snapper University Brochure (edited final)
koneke
Flat Plan of Contents Page
Forget ad blocking, user experience is a big deal - Digiday WTF Ad Blocking N...
diaa hrb CV1-2016
Boletín informativo UNIORE
Tichnorpaige
7 c presentation1 (fix.) thalia
Proyecto lectura escritura
Grafos Eulerianos y Hamiltanianos
Challenges and Opportunities in Industrial Biotech Regulation
Activid Ad Glosario[3[2
Story boards
Speed Dating on Advertising
Introdução à Estratégia do Oceano Azul
Ad

Similar to Datapath (20)

PPT
ProbabilisticModeling20080411
PDF
PDF
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
PPTX
“AI techniques in cyber-security applications”. Flammini lnu susec19
PDF
A Sample-Driven Channel Model for Developing and Testing Practical WSN Applic...
PDF
Bayesian network based software reliability prediction
PDF
Internet of Things Data Science
PDF
LOAD BALANCING MANAGEMENT USING FUZZY LOGIC TO IMPROVE THE REPORT TRANSFER SU...
PDF
A2546035115
PDF
IMPROVEMENT OF FALSE REPORT DETECTION PERFORMANCE BASED ON INVALID DATA DETEC...
KEY
PDF
Ali Mousavi -- Event modeling
PDF
A Short Course in Data Stream Mining
PPTX
Stochastic modelling and its applications
PDF
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
PDF
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
PDF
Approximation of regression-based fault minimization for network traffic
PDF
Can someone provide a solution for this assignmentPurpose of This.pdf
PPT
Mitigating routing misbehavior in mobile ad hoc networks
PDF
Modeling and Analysis of Two Node Network Model with Multiple States in Mobi...
ProbabilisticModeling20080411
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
“AI techniques in cyber-security applications”. Flammini lnu susec19
A Sample-Driven Channel Model for Developing and Testing Practical WSN Applic...
Bayesian network based software reliability prediction
Internet of Things Data Science
LOAD BALANCING MANAGEMENT USING FUZZY LOGIC TO IMPROVE THE REPORT TRANSFER SU...
A2546035115
IMPROVEMENT OF FALSE REPORT DETECTION PERFORMANCE BASED ON INVALID DATA DETEC...
Ali Mousavi -- Event modeling
A Short Course in Data Stream Mining
Stochastic modelling and its applications
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
Approximation of regression-based fault minimization for network traffic
Can someone provide a solution for this assignmentPurpose of This.pdf
Mitigating routing misbehavior in mobile ad hoc networks
Modeling and Analysis of Two Node Network Model with Multiple States in Mobi...

Datapath

  • 1. Datapath: Stochastic, Temporal Simulation of Data Flow Reliability Edward Verenich, Gennady Staskevich Clarkson University Abstract—We present a simulation of data flow reliability that models directed information flow through a network where the source of information may change with time. We also attempt to quantify resource availability values or uptimes in the context of time in order to model data transfer reliability more realistically. We encode the problem as a variant of a Dynamic Bayesian Network and develop a custom Markov Chain Monte Carlo [5] sampling algorithm that computes the posterior marginal probabilities of data being present at a given node at a given time. We parameterize the probabilistic model using the binary event conditional probability distribution and introduce several additional parameters to the random variable in order to improve the accuracy of our model. Finally, we present a light weight, browser based model editor in order to construct and assess probabilistic data flow models. I. INTRODUCTION Assessing network reliability is an important aspect of network and cloud infrastructure [7] engineering. Most tech- niques for doing so involve closed form solutions that we believe make certain assumptions to simplify the problem or become difficult to work with when models become very large. Accounting for uncertainty in network communication involves the use of probabilistic models, where using exact inference techniques does not scale well as model connectivity increases and increase in dimensionality of parameters makes exact computations intractable. Account for time, and the problem becomes even harder. Take for example a network that simulates a vehicular mobile cloud [1] where the source of the directional data flow changes in time as the vehicle moves and must now send data through a new access point, can we model this hand-off and its effect on overall data flow reliability for every time unit in the model? 
In this paper we present a light weight simulation that attempts to provide an easy to use framework to make such reliability assessments by encoding the problem as a Dynamic Bayesian Network and running a custom Markov Chain Monte Carlo sampling algorithm to compute the posterior marginal probabilities of data being present at node x at time y. This paper is structured as follows, in Section II we present the probabilistic model, this includes the parameterization method, which is a binary event conditional distribution model and additional binary event parameters. Section III describes our custom sampling algorithm that is a derivative from the family of Markov Chain Monte Carlo algorithms. In Section IV we describe our reference simulation editor that can be used to construct DBN based simulations and provide several model use-cases. Finally, Section V concludes our paper and describes two possible extensions to our model that we believe are novel and unique not only in applying MCMC to reliability modeling, but to the general field of probabilistic causal modeling. II. PROBABILISTIC MODEL In this section we describe our choice of a probabilistic model that we felt was appropriate to model the dynamic nature of mobile networks. We first provide some background information on Graphical Models and Dynamic Bayesian Networks. We then describe how we parameterize the model and explain additional parameters that we introduce to our binary event implementation. A. Background A proper introduction of Graphical Models, Markov Net- works and Dynamic Bayesian Networks is beyond the scope of this paper. Graphical models are useful for depicting in- dependence and dependence relationships between probability distributions, which is convenient computationally. They are also used to model how variables interact, where ultimately a given class of a Graphical Model corresponds to a factorization property of the joint distribution. 
We model the flow of data using a directed graphical model, which corresponds to a Bayesian Network. By including the temporal dimension, we end up with a Dynamic Bayesian Network, which we parameterize using Conditional Probability Distributions (CPDs). The CPD is time invariant; other parameters drive the temporal behavior of the model, as we explain next.

B. Model Parameters

We parameterize our network using the binary event model. In this model, a random variable x represents an event that can happen according to a conditional probability distribution.

Fig. 1. Event z is caused by events x and y according to a CPD.
Figure 1 shows a directed graph where event z is caused by events x and y independently, meaning that each causing event can trigger event z on its own. Consider the diagram as showing data flowing to node z from node y with probability 0.8 and from node x with probability 0.9. This is encoded in the CPD of z as follows:

TABLE I
THE CPD FOR EVENT Z.

  x  y  prob
  0  0  .0
  0  1  .8
  1  0  .9
  1  1  .98

Consider the set of causes Q = {x, y}. The rows of this table are the entries of the CPD, where each entry CPD[i] corresponds to an element of the power set P(Q). This gives four possible configurations for event z: (1) neither link (x, z) nor (y, z) is transmitting, (2) only (y, z) is transmitting, (3) only (x, z) is transmitting, and (4) both are transmitting. Given our binary event model, each event or node in the model has a CPD of size 2^|Q|. In the CPD of Table I, only the last entry needed to be computed; for this we use the Noisy-OR model generalized by Diez [2], shown in Equation 1.

p(y ← c(x_0, x_1, ..., x_{n−1})) = 1 − ∏_{i=0}^{n−1} (1 − p(y ← c(x_i)))    (1)

The symbol ← represents the causal relationship, or the directional flow of data, and the set c(x_0, x_1, ..., x_{n−1}) is the set Q.

An event, which represents a physical resource node in our network simulation, has additional parameters that control its behavior in our sampling simulation. We define these parameters here:

Definition. Let Reliability be the probability of an event x having the value true given that it is caused by another event or is explicitly triggered in the simulation.

Explicitly triggering a probabilistic event means that the modeler specifies that a specific event will happen at a time t with a probability p. This leads us to the next parameter, which specifies how long such an event will persist.

Definition. Let Persistence ∈ Z+ be the number of time units that the event persists at probability p once it is triggered at that probability.
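The CPD construction of Equation 1 can be sketched as follows. This is an illustration with our own function names, not the simulator's code; linkProbs[i] is assumed to be p(y ← c(x_i)) for parent i.

```javascript
// Noisy-OR of the active causes (Equation 1): 1 minus the product of
// (1 - p_i) over the causes that are active.
function noisyOr(activeProbs) {
  return 1 - activeProbs.reduce((acc, p) => acc * (1 - p), 1);
}

// Build the full 2^|Q| CPD of a node from its per-link probabilities.
function buildCpd(linkProbs) {
  const n = linkProbs.length;
  const cpd = new Array(1 << n); // one entry per subset of the cause set Q
  for (let idx = 0; idx < (1 << n); idx++) {
    // Bit i of idx marks whether cause x_i is active in this entry.
    const active = linkProbs.filter((_, i) => (idx >> i) & 1);
    cpd[idx] = noisyOr(active);
  }
  return cpd;
}
```

For the links of Figure 1 (0.9 from x, 0.8 from y), buildCpd([0.9, 0.8]) yields [0, 0.9, 0.8, 0.98], matching Table I.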
Whenever an event is triggered, the model may specify how long it will last. For example, the modeler may specify that once an asset breaks down, it stays broken until another event causes it to become functional again. The next parameter also relates to the temporal aspect of the model.

Definition. Let Continuation be the probability that an event x has the value true at time t given that it had a probability p of being true at time t − 1.

This parameter allows the modeler to conveniently extend reliability estimates in time given a prior reliability measurement for a fixed time interval t. For example, consider a resource σ with a Reliability of 0.99 per interval t. The probability of failure of resource σ by time interval x is then:

Failure_σ(interval_x) = 1 − ∏_{i=1}^{x} Reliability_σ = 1 − (Reliability_σ)^x    (2)

If we set x = 9, the probability of failure becomes 0.086. Although this is a simple computation for a single parameter, our Continuation parameter allows us to compute this value for every random variable and every time interval in the model at no additional cost.

We also need to account for the fact that it is neither practical nor realistic to model every possible cause of an event, especially at the model fidelity we aim for, so we introduce a parameter that accounts for unspecified causes, which we call the Leak.

Definition. Let Leak be the probability that an event is caused with a probability p at every time interval t by one or more unspecified causes outside the set of causes Q.

Finally, each event has a schedule, which allows the event to be triggered explicitly by the modeler.

Definition. Let Schedule be a map of values (t → p) that trigger an event x at time t with probability p.

The schedule gives us the ability to simulate a change in the source of data flow over time, as we will see in Section IV.
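The closed form of Equation 2 is simple enough to sketch directly (an illustration with our own function name, not part of the simulator):

```javascript
// Equation 2: cumulative probability of failure after x intervals, given
// a fixed per-interval reliability.
function cumulativeFailure(reliability, x) {
  return 1 - Math.pow(reliability, x);
}

// A resource with per-interval reliability 0.99, after 9 intervals:
// cumulativeFailure(0.99, 9) ≈ 0.086, matching the example above.
```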
III. SAMPLING ALGORITHM

In this section we outline the sampling algorithm that we use to compute the marginal posterior probabilities of our event variables, which in turn represent the probability that each network resource received the data from the source.

A. Generating a DBN

Before we can run our sampling algorithm, we need to generate a DBN suitable for our sampler. This requires several steps. First, we take the set of events and topologically sort them; recall that our model is a Directed Acyclic Graph. Second, we generate the CPD for each node (corresponding to an event) in the DBN using the technique described in Section II. Third, we Noisy-OR the node's leak value with all values of its CPD; note that a leak of zero has no effect on the CPD. Finally, we multiply each entry of the CPD by the node's reliability value. Before reliability is applied, each value in the CPD reflects the probability that the signal made it to the target node given that the source node sent it; the target node, however, may not be available to receive and re-transmit it, which is what we model with the reliability parameter.
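The leak and reliability post-processing steps can be sketched as follows (our own helper name, assuming the flat CPD array layout from Section II):

```javascript
// Combine the node's leak with every CPD entry via Noisy-OR, then scale
// each entry by the node's reliability.
function finalizeCpd(cpd, leak, reliability) {
  // Noisy-OR with the leak: 1 - (1 - p)(1 - leak); a leak of 0 is a no-op.
  // Multiplying by reliability models the target node being unavailable
  // to receive and re-transmit the signal.
  return cpd.map(p => (1 - (1 - p) * (1 - leak)) * reliability);
}
```

With a leak of 0 and reliability of 1.0, the CPD is returned unchanged.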
B. MCMC Algorithm

Here we present our Datapath MCMC algorithm and describe some of the computations that were omitted from the pseudocode in Algorithm 1 for clarity.

Algorithm 1 Datapath MCMC
 1: procedure DATAPATHMCMC
 2:   dbn ← topologically sorted nodes
 3:   duration ← positive number of time slices
 4:   samples ← number of samples to simulate
 5:   sample ← boolean array of length dbn.length
 6:   counts ← integer array for counts
 7:   for time t in duration do
 8:     for sample s in samples do
 9:       for node n in dbn do
10:         prob ← 0.0
11:         if n is a root node then
12:           prob ← leak or schedule[t]
13:         else
14:           if n is scheduled at t then
15:             prob ← schedule[t]
16:           else
17:             prob ← CPD(state index)
18:         if prob ≥ Random then
19:           sample[n] ← true
20:           counts[n] ← counts[n] + 1
21:     for node n in dbn do
22:       n ← set marginal for time t

One calculation that is not spelled out in the algorithm, but happens at line 17, is the computation of an index of active parents that is used to look up the probability value in the CPD during sampling. This works as follows: (1) we sample our variables in topological order, so all children have their parent states set before they are sampled; (2) we store each variable's CPD as a power set of its causes, so by reading the boolean values of the already-set parent states as bits, we obtain the integer index of active parents that identifies the CPD entry we need to Monte Carlo against.

Another aspect omitted from the pseudocode is the use of the persistence and continuation parameters that drive the temporal simulation; again, this was done mostly for clarity, and interested readers are invited to reference our source code, which will be made available.

To the best of our knowledge, the majority of general-purpose tools used for Bayesian modeling handle temporal simulations by unrolling: taking a snapshot of the model for every time slice and then sampling each one.
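The active-parent index computed at line 17 can be sketched as follows (names are ours; sample is assumed to hold the already-sampled boolean parent states):

```javascript
// Read the boolean states of a node's parents as bits to produce an
// integer index into its power-set CPD.
function parentStateIndex(parentIds, sample) {
  let idx = 0;
  parentIds.forEach((pid, i) => {
    if (sample[pid]) idx |= 1 << i; // parent i active -> set bit i
  });
  return idx;
}
```

With parents [x, y] and only x sampled true, the index is 1 (binary 01), selecting the "x active, y inactive" CPD entry.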
Unrolling quickly leads to models that require significant resources to execute, which is not the case with our approach.

DatapathMCMC is a simple but effective algorithm for predicting the directional reliability of data flow. It does not, however, have good convergence properties for performing Bayesian inference in models that need to compute the joint posterior distribution when many observations are made downstream. This is by design, to address our specific scenario; it is by no means a general-purpose method for Bayesian inference. For the general case, we would select Gibbs or Metropolis-Hastings [6], or some other variation of importance or rejection sampling. In our case, however, rejecting samples is not necessary because our observations, in the form of explicit scheduling, are mostly made upstream, at root nodes, and are therefore sampled first. Supporting downstream observations could be the next natural step and could lead to some interesting applications, such as diagnosing the most likely path that data took in an anycast transmission, or identifying failure points.

IV. SIMULATOR

In this section we describe our simulator implementation. We chose to develop the simulator as a browser-based JavaScript application for ease of use and for the convenience of an existing visualization library, D3.js, which we utilize for network graph creation. We also considered the ease with which users can experiment with and extend the framework, given the lack of any real configuration, even by using one of the many online editors such as JSFiddle.

A. Network Editor

The graphical network editor is written using the D3.js vector graphics library. It allows the user to construct a directed graph where nodes represent network elements that map to events in our probabilistic model. Directed edges represent data links and have reliability values that are specified at their target nodes.
These values map to conditional probabilities of the target random variable in the form P(target|source), and are used to compute the full conditional probability distribution for each node.

B. Running the Simulation

When a new node is added, its Reliability value defaults to 1.0, as does that of any link that is added. This allows the modeler to quickly sketch out a network topology before experimenting with reliability values and time. Once all the desired values are specified, the user sets the duration of the simulation in abstract time units and the number of samples to generate. As a rule, the more samples we generate, the more accurate our estimate becomes, though as mentioned in Section III our algorithm converges very quickly in our intended application.

C. Serial Reliability

Our first simulation consists of running a serial data path of three nodes. There are several ways to build simulations using our model; in this one we model node failure rates per time unit, where a time unit can be any interval of time for which we have prior measured failure information. For example, we may know that a particular resource has an uptime of 99.99% in a given week (independent of any other resource), that the second resource in the serial flow has an uptime of 99.98% for the same interval, also independent of other resources, and so on.
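Under the independence assumption above, the probability that such a serial path fails to deliver data within t intervals has a simple closed form, sketched here with our own helper name (a sanity check, not the simulator's sampling code):

```javascript
// Delivery over a serial path succeeds only if every node stays up for
// all t intervals; failure is the complement (independence assumed).
function serialFailure(uptimes, t) {
  const allUp = uptimes.reduce((acc, u) => acc * Math.pow(u, t), 1);
  return 1 - allUp;
}

// The two resources above (weekly uptimes 99.99% and 99.98%), one week:
// serialFailure([0.9999, 0.9998], 1) ≈ 0.0003
```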
Fig. 2. Serial path consisting of three nodes.

Figure 2 shows the setup of our first simulation. In this model the vertices represent hardware nodes, or virtual routers or servers, while the edges represent connections, which could be physical or virtual. To validate our simulation algorithm we kept edge probabilities at 1.0, meaning we are only accounting for node failure rates. We model this simulation using our Leak parameter, essentially stating that we are modeling recorded average failure priors for a given interval t (a week). Our leak parameters are set as follows: let N be the set of nodes in the serial data path, with leak probabilities (n3 = 0.01, n4 = 0.02, n5 = 0.01); we would like to project their reliability over the next 15 weeks. Using the closed form of Equation 2, the expected failure of node 3 at time 5 is 0.049. Running our simulation, we have the following:

Fig. 3. Node 3 probability at time step 5.

Figure 3 shows that we approximate the probability of failing to deliver data fairly closely. Now we consider the next node in the series. If we were to compute its failure to deliver data as an origin or root node, in other words without any previously accumulated failure causing additional failure possibility, the same formula gives an expected failure at time 5 of about 0.096. But we need to accumulate all the previous expectations: we must account both for node 4 failing to deliver data independently and for the possibility that node 3 failed before data could travel to node 4, and this must be done for all time intervals.

Fig. 4. Node 4 without any causal contribution from Node 3.

Figure 4 shows our approximation for Node 4 at time 5. This is close to the exact answer computed using Equation 2. Now we activate Node 3, effectively accumulating Node 3 and Node 4 failure probability at Node 4.

Fig. 5. Node 4 with a causal contribution from Node 3.

Figure 5 shows
a significantly higher probability of failing to deliver data at Node 4 at time 5. This is an interesting result that we would like to investigate further. We interpret this result as the probability of data delivery failure at node i at time t, given the prior average failure measurements for interval t for all nodes up to and including node i in the series.

D. Mobile Hand-off

Our next model involves the scheduling mechanism, which allows us to simulate the switching of data sources during the simulation. This is analogous to a mobile customer connecting to different towers while moving through space, or to hand-offs in vehicular clouds.

Fig. 6. Node 3 and Node 7 simulate data sources at different times.

Figure 6 shows a model that simulates two sources of data at different times, essentially simulating a hand-off. Consider Node 8 to be the final destination and Nodes 3 and 7 to be gateway nodes for a mobile client. We simulate the effect of movement in space and time that involves connecting to a different gateway, which results in a different path to the final destination. In this model we planned to simulate data flow reliability using our Reliability and Schedule parameters, as opposed to the Leak parameter. The idea is to specify the reliability value at each node, while each incoming edge represents a conditional reliability of that connection given that the source node sent the data. Unfortunately, we were not able to complete this simulation in time due to an unexpected bug in the simulator, which we are still investigating; we expect to resolve it in the near term.

V. CONCLUSION

We developed a stochastic, temporal simulation of network data flow that is easy to use and delivers unique capabilities
such as temporal probability profiles for all model variables, in contrast to explicit queries on individual variables. We introduced the capability of simulating connection hand-offs associated with mobile and vehicular clouds. We also developed a lightweight simulation editor that can be extended to support even more sophisticated simulation scenarios.

A. Current Bugs and Incomplete Features

As mentioned in the previous section, our simulations involving the scheduling parameter need to be debugged, which we expect to be fairly straightforward. The user interface needs to be cleaned up, and input validation needs to be implemented. The Persistence parameter needs to be integrated into the simulation; it is currently emulated using the scheduling parameter for every time interval. Saving the model locally works, as it was used for debugging purposes, but loading a model back needs to be finished. We also need to further investigate our serial reliability results, as well as experiment with random number generation beyond the standard JavaScript random method.

B. Possible Extensions

We envision two possible extensions that would result in novel simulation capabilities. First, when modeling multiple incoming connections into a node, we utilize the Noisy-OR model to calculate each link's contribution to data reliability assuming independence. There may, however, be measurements available as prior information that capture a synergistic effect of these links being active at the same time, resulting in a higher probability than that calculated under the independence assumption. Currently there is no way to specify such a value in the simulation, and if there were, we would need to utilize a different distribution estimation algorithm, perhaps something similar to the Recursive Noisy-OR [3] proposed by Lemmer and Gossink.
Their generalization of Noisy-OR allows learned parameters to be specified explicitly; the rule then propagates these parameters to the parameters that contain the learned values as subsets. For example, given a set of incoming links Q = {x, y, z}, suppose we have a direct measurement θ showing that when x and y are both active, their reliability contribution to the node differs from the one computed under the independence assumption. With RNOR we can incorporate this joint measurement into Q, where θ will be propagated to the set {x, y, z} unless that set is specified explicitly.

The second extension involves introducing the concept of counterfactual reasoning. This involves adding causal influences that act on their effect when they are absent, or do not happen. For example, the statement "if it does not rain, my car will probably be dry" is one instance of such an interaction. The technique for modeling these interactions in binary event models is referred to as CPD inversion. It was originally suggested by J. Pearl [4], but the exact details are beyond the scope of this article.

REFERENCES

[1] R. Yu, Y. Zhang, S. Gjessing, W. Xia, K. Yang, Toward Cloud-based Vehicular Networks with Efficient Resource Management. 2013.
[2] F.J. Diez, Parameter adjustment in Bayes networks. The generalized noisy OR gate. In Proc. 9th Annu. Conf. Uncertainty in Artificial Intelligence, San Mateo, CA, 1993, pp. 99-105.
[3] J.F. Lemmer, D.E. Gossink, Recursive Noisy OR - A Rule for Estimating Complex Probabilistic Interactions. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 2004, 34(6).
[4] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[5] J.S. Liu, Monte Carlo Strategies in Scientific Computing. New York, NY: Springer, 2004.
[6] D. Barber, Bayesian Reasoning and Machine Learning. Cambridge, U.K.: Cambridge University Press, 2012.
[7] N.L.S. Fonseca, R. Boutaba, Cloud Services, Networking, and Management. Danvers, MA: Wiley, 2015.