Datapath: Stochastic, Temporal Simulation of Data
Flow Reliability
Edward Verenich, Gennady Staskevich
Clarkson University
Abstract—We present a simulation of data flow reliability that
models directed information flow through a network where the
source of information may change with time. We also attempt to
quantify resource availability values or uptimes in the context of
time in order to model data transfer reliability more realistically.
We encode the problem as a variant of a Dynamic Bayesian
Network and develop a custom Markov Chain Monte Carlo
[5] sampling algorithm that computes the posterior marginal
probabilities of data being present at a given node at a given
time. We parameterize the probabilistic model using the binary
event conditional probability distribution and introduce several
additional parameters to the random variable in order to improve
the accuracy of our model. Finally, we present a lightweight,
browser-based model editor for constructing and assessing
probabilistic data flow models.
I. INTRODUCTION
Assessing network reliability is an important aspect of
network and cloud infrastructure [7] engineering. Most techniques
for doing so involve closed-form solutions that, we believe,
either make simplifying assumptions or become difficult to
work with as models grow very large. Accounting for uncertainty
in network communication involves probabilistic models, but
exact inference techniques do not scale well as model connectivity
increases, and growth in the dimensionality of parameters makes
exact computations intractable. Account for time, and the
problem becomes even harder. Take, for example, a network
that simulates a vehicular mobile cloud [1], where the source
of the directional data flow changes in time as the vehicle
moves and must now send data through a new access point:
can we model this hand-off and its effect on overall data flow
reliability for every time unit in the model? In this paper we
present a lightweight simulation that attempts to provide an
easy-to-use framework for making such reliability assessments
by encoding the problem as a Dynamic Bayesian Network
and running a custom Markov Chain Monte Carlo sampling
algorithm to compute the posterior marginal probabilities of
data being present at node x at time y.
This paper is structured as follows. Section II presents
the probabilistic model, including the parameterization
method, a binary event conditional distribution model,
and additional binary event parameters. Section III describes
our custom sampling algorithm, a derivative of the
family of Markov Chain Monte Carlo algorithms. Section
IV describes our reference simulation editor, which can be
used to construct DBN-based simulations, and provides several
model use-cases. Finally, Section V concludes the paper and
describes two possible extensions to our model that we believe
are novel, not only in applying MCMC to reliability
modeling, but in the general field of probabilistic causal
modeling.
II. PROBABILISTIC MODEL
In this section we describe our choice of a probabilistic
model that we felt was appropriate to model the dynamic
nature of mobile networks. We first provide some background
information on Graphical Models and Dynamic Bayesian
Networks. We then describe how we parameterize the model
and explain additional parameters that we introduce to our
binary event implementation.
A. Background
A proper introduction of Graphical Models, Markov Net-
works and Dynamic Bayesian Networks is beyond the scope
of this paper. Graphical models are useful for depicting in-
dependence and dependence relationships between probability
distributions, which is convenient computationally. They are
also used to model how variables interact, where ultimately a
given class of a Graphical Model corresponds to a factorization
property of the joint distribution. We attempt to model the flow
of data using a directed graphical model, which corresponds
to a Bayesian Network. By including the temporal dimension,
we end up with a Dynamic Bayesian Network, which we
parameterize using the Conditional Probability Distribution.
The CPD is time invariant, but other parameters drive the
temporal behavior of the model as we explain next.
B. Model Parameters
We chose to parameterize our network using the binary
event model. In this model, a random variable x represents an
event that can happen according to a conditional probability
distribution.
Fig. 1. Event z is caused by events x and y according to a CPD.
Figure 1 shows a directed graph where event z is caused
by events x and y independently; that is, each causing
event can trigger event z on its own. Consider the diagram as
showing data flowing to node z from node y with probability
0.8 and from node x to z with probability 0.9. This is encoded
in the CPD of Z as follows:
TABLE I
THE CPD FOR EVENT Z.
x y prob
0 0 .0
0 1 .8
1 0 .9
1 1 .98
Consider the set of causes Q = {x, y}; the rows of
this table represent the entries of the CPD, where each entry
CPD[i] ∈ P(Q). This gives four possible configurations
for event z: (1) neither link (x, z) nor (y, z) is
transmitting, (2) only (y, z) is transmitting, (3) only (x, z)
is transmitting, and (4) both are transmitting. So we see that,
given our binary event model, each event or node in the model
will have a CPD of size 2^|Q|. In the CPD in Table I
only the last entry needed to be computed; for this we
use the Noisy-OR model generalized by Díez [2], shown in
Equation 1.
p(y ← c(x_0, x_1, ..., x_{n−1})) = 1 − ∏_{i=0}^{n−1} (1 − p(y ← c(x_i)))   (1)

The symbol ← represents the causal relationship, or the
directional flow of data, and the set c(x_0, x_1, ..., x_{n−1}) is the
set Q.
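Equation 1 can be sketched in a few lines of JavaScript (the language of our simulator); `noisyOr` here is a hypothetical helper name for illustration, not part of the simulator's code:

```javascript
// Noisy-OR: probability that at least one independent cause triggers the event.
// probs[i] is p(y <- c(x_i)) for each active cause x_i.
function noisyOr(probs) {
  return 1 - probs.reduce((acc, p) => acc * (1 - p), 1);
}

// Last row of Table I: both x (0.9) and y (0.8) are transmitting.
console.log(noisyOr([0.9, 0.8])); // ~0.98, matching the table
```

With a single active cause the expression reduces to that cause's own probability, and with no active causes it is zero, matching the first three rows of Table I.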
An event, which represents a physical resource node in our
network simulation, has additional parameters that control its
behavior in our sampling simulation. Here we define these
parameters:
Definition Let Reliability represent the probability of an
event x having the value true given that it is caused by another
event or is explicitly triggered in the simulation.
Explicitly triggering a probabilistic event means that the
modeler specifies that a specific event will happen at a time t
with a probability p. This leads us to the next parameter that
specifies how long this event will persist.
Definition Let Persistence ∈ Z+
be the number of time
units that the event persists at probability p once it is triggered
at that probability.
Whenever an event is triggered, the modeler may specify
how long the event will last. For example, the modeler may
easily specify that once an asset breaks down, it stays broken
until another event causes it to become functional. The next
parameter also relates to the temporal aspect of the model.
Definition Let Continuation represent the probability that
an event x has the value of true at time t given that it had a
probability p of being true at time t − 1.
This parameter allows the modeler to conveniently model
reliability estimates in time given a prior reliability measurement
for a fixed time interval t. For example, consider a
resource σ having a Reliability probability of 0.99 in an interval
t. The probability of failure of resource σ by time interval x is then:

Failure_σ(interval_x) = 1 − ∏_{i=1}^{x} Reliability_σ = 1 − (Reliability_σ)^x   (2)

If we set x = 9, the probability of failure by that time
interval becomes 0.086. Although this is a simple computation
for a single parameter, our Continuation parameter allows us to
compute this value for every random variable and every time
interval in the model at no additional cost.
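The projection of Equation 2 can be checked with a one-line JavaScript helper (`failureBy` is a hypothetical name used only for this sketch):

```javascript
// Probability of at least one failure over x intervals, given a constant
// per-interval reliability (Equation 2).
function failureBy(reliability, x) {
  return 1 - Math.pow(reliability, x);
}

console.log(failureBy(0.99, 9).toFixed(3)); // "0.086", matching the text
```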
We also need to account for the fact that it is not practical or
realistic to model every possible cause of an event, especially
at the model fidelity we aim for, so we introduce a parameter
that accounts for unspecified causes, which we call the Leak.
Definition Let Leak be the probability p that an event is caused
at every time interval t by one or more unspecified causes
outside the set of causes Q.
Finally, each event has a schedule, which allows the event
to be triggered explicitly by the modeler.
Definition Let the Schedule represent a map of values (t →
p) that trigger an event x at time t with probability p.
The schedule gives us the ability to simulate the change in
the source of data flow in time, as we will see in Section IV.
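A schedule might be represented as a plain map from time step to trigger probability; the shape below is our illustrative assumption, not the simulator's actual format. For instance, a hand-off like the one in Section IV could schedule two gateways so one source goes quiet as the other activates:

```javascript
// Hypothetical Schedule objects: time step -> trigger probability.
// Node 3 acts as the data source until t = 5, then Node 7 takes over.
const schedules = {
  node3: { 0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.0 },
  node7: { 5: 1.0, 6: 1.0, 7: 1.0 }
};

// Look up the explicit trigger for time t; null means the event is
// not explicitly scheduled and falls back to its CPD or leak.
function triggerProb(schedule, t) {
  return t in schedule ? schedule[t] : null;
}
```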
III. SAMPLING ALGORITHM
In this section we outline our sampling algorithms that we
use to compute the marginal posterior probabilities of our
event variables that in turn represent the probability that our
network resources received the data from the source.
A. Generating a DBN
Before we can run our sampling algorithm, we need to
generate a DBN suitable for our sampler. This requires several
steps. First, we begin with the set of events and topologically sort
them (recall that our model is a Directed Acyclic Graph).
Second, we generate the CPD for each node (each corresponding
to an event) in the DBN using the technique described
in Section II. Third, we Noisy-OR our leak value with
all values of the CPD; note that a leak value of
zero has no effect on the CPD. Finally, we multiply each
parameter of the CPD by the reliability value. Before
reliability is applied, each value in the CPD reflects
the probability that the signal made it to the target
node given that the source node sent it; the target node,
however, may not be available to receive and re-transmit it,
which is what we model using the reliability parameter.
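The four steps can be sketched in JavaScript; `buildCpd` is a hypothetical illustration of the procedure, not the simulator's actual code, and assumes the `noisyOr` form of Equation 1:

```javascript
// Noisy-OR over a set of independent cause probabilities (Equation 1).
function noisyOr(probs) {
  return 1 - probs.reduce((acc, p) => acc * (1 - p), 1);
}

// Build a CPD over the power set of causes, then fold in leak and reliability.
// causeProbs[i] is the link reliability p(target | cause_i alone); entry k of
// the result corresponds to the bitmask of active causes.
function buildCpd(causeProbs, leak, reliability) {
  const n = causeProbs.length;
  const cpd = [];
  for (let mask = 0; mask < (1 << n); mask++) {
    const active = causeProbs.filter((_, i) => mask & (1 << i));
    // Noisy-OR the active causes together with the leak (a zero leak is a
    // no-op), then scale by the node's own reliability.
    cpd.push(noisyOr([...active, leak]) * reliability);
  }
  return cpd;
}

console.log(buildCpd([0.9, 0.8], 0.0, 1.0)); // ~[0, 0.9, 0.8, 0.98], Table I
```

With a nonzero leak, even the all-causes-off entry becomes nonzero, capturing unspecified causes as described above.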
B. MCMC Algorithm
Here we present our Datapath MCMC algorithm and describe
some of the computations that we omitted from the
pseudocode in Algorithm 1 for clarity.
Algorithm 1 Datapath MCMC
1: procedure DATAPATHMCMC
2: dbn ← topologically sorted nodes
3: duration ← positive number of time slices
4: samples ← number of samples to simulate
5: sample ← boolean array of length dbn.length
6: counts ← integer array for counts
7: for time t in duration do
8: for sample s in samples do
9: for node n in dbn do
10: prob ← 0.0
11: if n is root node then
12: prob ← leak or schedule[t]
13: else
14: if n is scheduled at t then
15: prob ← schedule[t]
16: else
17: prob ← CPD(state index)
18: if prob ≥ Random then
19: sample[n] ← true
20: counts[n] ← increment by one
21: for node n in dbn do
22: n ← set marginal for time t
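The loop in Algorithm 1 can be sketched as a compact, runnable JavaScript function. The node shape (parents, cpd, schedule, leak) is our hypothetical encoding of the structures described in Section II, not the simulator's actual data model, and persistence/continuation handling is omitted just as in the pseudocode:

```javascript
// Minimal sketch of Algorithm 1. dbn is a topologically sorted array of
// nodes: { parents: [indices], cpd: [per-bitmask probabilities],
//          schedule: { t: p }, leak: number }.
function datapathMCMC(dbn, duration, samples) {
  const marginals = dbn.map(() => new Array(duration).fill(0));
  for (let t = 0; t < duration; t++) {
    const counts = new Array(dbn.length).fill(0);
    for (let s = 0; s < samples; s++) {
      const sample = new Array(dbn.length).fill(false);
      dbn.forEach((node, n) => {
        let prob;
        if (t in node.schedule) {
          prob = node.schedule[t];              // explicit trigger
        } else if (node.parents.length === 0) {
          prob = node.leak;                     // root node
        } else {
          // Index the CPD by the bitmask of active (already sampled) parents.
          const idx = node.parents.reduce(
            (acc, p, i) => (sample[p] ? acc | (1 << i) : acc), 0);
          prob = node.cpd[idx];
        }
        if (Math.random() < prob) { sample[n] = true; counts[n]++; }
      });
    }
    dbn.forEach((_, n) => { marginals[n][t] = counts[n] / samples; });
  }
  return marginals;
}
```

Because nodes are visited in topological order, every parent is sampled before its children within each pass, which is what makes the single forward sweep per sample sufficient.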
One particular calculation not specified in the
algorithm, but which happens at line 17, is the computation of an index
of active parents that is used to obtain the probability value
from the CPD during sampling. This is done as follows: (1)
we sample our variables in topological order, so all children
have their parent states set before they are sampled; (2)
we store each variable's CPD as a power set of its causes,
so by reading the already-set boolean parent states
as bits we generate an integer index of active
parents that corresponds to the CPD value we need
to Monte Carlo against.
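Step (2) can be illustrated with a hypothetical helper that packs boolean parent states into a CPD index:

```javascript
// Turn already-sampled parent states into an integer index into the CPD,
// which is stored over the power set of causes. parentStates[i] is the
// boolean sample of parent i for the current time slice.
function cpdIndex(parentStates) {
  return parentStates.reduce((idx, on, i) => (on ? idx | (1 << i) : idx), 0);
}

console.log(cpdIndex([true, false, true])); // 5: parents 0 and 2 are active
```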
Another aspect omitted from the pseudocode is the use
of the persistence and continuation parameters to drive the
temporal simulation; again, this was done mostly for clarity,
and we invite interested readers to reference our source
code, which will be made available. To the best of our knowledge,
most general-purpose tools for Bayesian modeling handle
temporal simulation by unrolling: taking a snapshot of the model for every
time slice and then sampling each one. This quickly leads to
models that require significant resources to execute, which is
not the case with our approach.
DatapathMCMC is a simple but effective algorithm for predicting
the directional reliability of data flow. It does not, however,
have good convergence properties for performing Bayesian
inference in models that must compute the joint posterior
distribution when many observations are made downstream.
This is by design, to address our specific scenario; it
is by no means a general-purpose method for Bayesian
inference. For the general case, we would probably select
Gibbs or Metropolis-Hastings [6], or some other variation
of importance or rejection sampling. In our case, however,
rejecting samples is not necessary because our observations, in the
form of explicit scheduling, are mostly made upstream, at
root nodes, so most observations are sampled first. Supporting
downstream observations could be the next natural step and could lead to some
interesting applications, such as diagnosing the most likely path
that data took in an anycast transmission, or identifying failure
points.
IV. SIMULATOR
In this section we describe our simulator implementation.
We chose to develop the simulator as a browser-based
JavaScript application for ease of use and for the
convenience of existing visualization libraries, namely D3.js, which
we use for network graph creation. We also considered
the ease with which users can experiment with extending this
framework, given the lack of any real configuration, including
through online editors such as JSFiddle.
A. Network Editor
The graphical network editor is written using the D3.js vec-
tor graphics library. It allows the user to construct a directed
graph where nodes represent network elements that map to
events in our probabilistic model. Directed edges represent
data links and have reliability values that are specified at their
target nodes. These values map to conditional probabilities of
their target random variable in the form P(target|source),
and are used to compute full conditional probability distribu-
tions for each node.
B. Running the Simulation
When a new node is added, its Reliability value defaults to
1.0, as does any added link. This allows the modeler to
quickly sketch out a network topology before experimenting
with reliability values and time. Once all the desired values
are specified, the user sets the duration of the simulation
in abstract time units and the number of samples to
generate. As a rule, the more samples we generate the more
accurate our estimate is, but as mentioned in Section III, our
algorithm converges very quickly in our intended application.
C. Serial Reliability
Our first simulation runs a serial data path
consisting of three nodes. There are several ways to build
simulations using our model; in this one we model
node failure rates per time unit, where a time unit can be
any interval of time for which we have prior measured failure
information. For example, we may know that a particular resource
has an uptime of 99.99% in a given week (independent of any
other resource), that the second resource in the serial flow has an
uptime of 99.98% for the same interval, also independent of
other resources, and so on.
Fig. 2. Serial path consisting of three nodes.
Figure 2 shows the setup of our first simulation. In this
model the vertices represent hardware nodes, virtual routers,
or servers, while the edges represent connection means, which
could be physical or virtual. To validate our simulation algorithm
we kept edge probabilities at 1.0, meaning we account
only for node failure rates. We model this simulation
using our leak parameter, essentially stating that we are
modeling recorded average failure priors for any given interval
t (a week). Our leak parameters are set as follows: let N be the
set of nodes in the serial datapath, with leak probabilities
(n3 = 0.01, n4 = 0.02, n5 = 0.01); we would like to project
their reliability over the next 15 weeks. Using the closed form
of Equation 2, we calculate node 3's expected failure
at time 5 to be 0.049. Running our simulation, we have the
following:
Fig. 3. Node 3 probability at time step 5.
Figure 3 shows that we approximate the probability
of failing to deliver data fairly closely. Now consider the
next node in the series. If we were to compute its failure
to deliver data as an origin or root node, in other words
without any previously accumulated failure adding to the
failure probability, we could use the same formula to compute its
expected failure at time 5: about 0.096. But we need
to accumulate all the previous expectations: we must
account for node 4 failing to deliver data both because it failed
independently and because node 3 may have failed
before data traveled to node 4, and this must be done for all
time intervals.
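Assuming independent per-interval failures, these figures can be reproduced with a small JavaScript sketch; `serialFailure` is a hypothetical helper reflecting our reading of the setup, not the simulator's code:

```javascript
// Failure of the last node in a series to deliver by interval t,
// accumulating the failures of all upstream nodes (independence assumed).
// leaks[i] is the per-interval failure probability of node i.
function serialFailure(leaks, t) {
  // Survival of the whole chain is the product of each node's per-interval
  // reliability raised to t; failure is the complement.
  const survival = leaks.reduce((acc, leak) => acc * Math.pow(1 - leak, t), 1);
  return 1 - survival;
}

console.log(serialFailure([0.01], 5).toFixed(3));       // "0.049": Node 3 alone
console.log(serialFailure([0.02], 5).toFixed(3));       // "0.096": Node 4 alone
console.log(serialFailure([0.01, 0.02], 5).toFixed(3)); // "0.140": Node 4 after 3
```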
Fig. 4. Node 4 without any causal contribution from Node 3.
Figure 4 shows our approximation for Node 4 at time 5.
This is close to the exact answer computed using Equation
2. Now we activate Node 3, effectively accumulating Node 3
and Node 4 failure probability at Node 4. Figure 5 shows
Fig. 5. Node 4 with a causal contribution from Node 3.
a significantly higher probability of failing to deliver data
at Node 4 at time 5. This is an interesting result that we
would like to investigate further. We interpret it as
the probability of data delivery failure at time t and node i,
given the prior average failure measurements for interval t for
all nodes up to and including node i in the series.
D. Mobile Hand-off
Our next model involves the scheduling mechanism that
allows us to simulate the switching of data sources during
the simulation. This may be analogous to a mobile customer
connecting to different towers while moving in space or hand-
offs in vehicular clouds. Figure 6 shows a model that simulates
Fig. 6. Node 3 and Node 7 simulate data sources at different times.
two sources of data at different times, essentially simulating
a hand-off. Consider Node 8 to be the final destination and
Nodes 3 and 7 to be gateway nodes for a mobile client.
We simulate the effect of movement in space and time that
involves connecting to a different gateway, which results in
a different path to the final destination. In this model we
planned to simulate data flow reliability using our reliability
and schedule parameters, as opposed to the leak parameter.
The idea is to specify the reliability value at each node
while each incoming edge represents a conditional reliability
of that connection given that the source node sent the data.
Unfortunately, we were not able to complete the simulation in
time due to an unexpected bug in the simulator, which we are
still investigating. We expect this to be resolved in the near
term.
V. CONCLUSION
We developed a stochastic, temporal simulation of network
data flow that is easy to use and delivers unique capabilities
such as temporal probability profiles for all model variables,
in contrast to explicit queries on individual variables. We
introduced the capability of simulating the connection hand-offs
associated with mobile and vehicular clouds. We also developed a
lightweight simulation editor that can be extended to support
even more sophisticated simulation scenarios.
A. Current Bugs and Incomplete Features
As mentioned in the previous section, our simulations involving
the scheduling parameter need to be debugged, which
we expect to be fairly straightforward. The user interface
needs to be cleaned up and input validation implemented.
The persistence parameter needs to be integrated into
the simulation; it is currently emulated using the scheduling
parameter at every time interval. Saving the model locally
works, as it was used for debugging purposes, but loading it
back needs to be finished. We also need to investigate our
serial reliability results further and experiment with random
number generation beyond the standard JavaScript random
method.
B. Possible Extensions
We envision two possible extensions that would result
in novel simulation capabilities. First, when modeling
multiple incoming connections into a node, we use the
Noisy-OR model to calculate each link's contribution to data
reliability, assuming independence. There may be measurements,
usable as prior information, that capture
the synergistic effect of these links being active at the same
time, resulting in a higher probability than that calculated
under the independence assumption. Currently, there is no
way to specify such a value in the simulation, and if there were,
we would need a different distribution estimation
algorithm, perhaps something similar to the Recursive Noisy-OR
[3] proposed by Lemmer and Gossink. Their generalization
of Noisy-OR allows learned parameters to be specified
explicitly; the rule then propagates these parameters to the
parameters that contain the learned values as subsets.
For example, given a set of incoming links Q = {x, y, z}, we may
have direct measurements θ showing that when x and y are both active, their
reliability contribution to the node differs from the one
computed under the independence assumption; with RNOR
we can incorporate this joint measurement with Q, where θ is
propagated to the set {x, y, z} unless it is specified explicitly.
The second extension involves introducing the concept of
counterfactual reasoning. This involves adding causal influences
that act on their effect when they are absent, or do not
happen. For example, the statement if it does not rain, my car
will probably be dry is one instance of such an interaction. The
technique for modeling these interactions in binary event models
is referred to as CPD inversion. It was originally suggested
by J. Pearl [4], but the exact details are beyond the scope of
this article.
REFERENCES
[1] R. Yu, Y. Zhang, S. Gjessing, W. Xia, K. Yang, Toward Cloud-based
Vehicular Networks with Efficient Resource Management. 2013.
[2] F. J. Díez, Parameter adjustment in Bayes networks. The generalized noisy
OR-gate. In Proc. 9th Annu. Conf. Uncertainty in Artificial Intelligence, San
Mateo, CA, 1993, pp. 99-105.
[3] J.F. Lemmer, D.E. Gossink, Recursive Noisy OR - A Rule for Estimating
Complex Probabilistic Interactions. IEEE Transactions on Systems, Man,
And Cybernetics - Part B:Cybernetics, 2004. 34(6).
[4] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. San Mateo, CA. 1988. Morgan Kaufmann.
[5] J. S. Liu, Monte Carlo Strategies in Scientific Computing. New York, NY.
2004. Springer Science Media LLC.
[6] D. Barber, Bayesian Reasoning and Machine Learning. U.K. 2012.
Cambridge University Press.
[7] N. L. S. Fonseca, R. Boutaba, Cloud Services, Networking, and Manag-
ment. Danvers, MA. 2015. Wiley.

More Related Content

PDF
Clustering Algorithms for Data Stream
PDF
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
PDF
Approaches to online quantile estimation
PDF
Load balancing in public cloud combining the concepts of data mining and netw...
PDF
A Novel Design Architecture of Secure Communication System with Reduced-Order...
DOCX
NEW ALGORITHMS FOR SECURE OUTSOURCING OF LARGE-SCALE SYSTEMS OF LINEAR EQUAT...
PDF
A study of localized algorithm for self organized wireless sensor network and...
PDF
Erca energy efficient routing and reclustering
Clustering Algorithms for Data Stream
REDUCING FREQUENCY OF GROUP REKEYING OPERATION
Approaches to online quantile estimation
Load balancing in public cloud combining the concepts of data mining and netw...
A Novel Design Architecture of Secure Communication System with Reduced-Order...
NEW ALGORITHMS FOR SECURE OUTSOURCING OF LARGE-SCALE SYSTEMS OF LINEAR EQUAT...
A study of localized algorithm for self organized wireless sensor network and...
Erca energy efficient routing and reclustering

What's hot (16)

PDF
Compressive Data Gathering using NACS in Wireless Sensor Network
PPT
Part 2: Unsupervised Learning Machine Learning Techniques
PDF
50120130406039
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
E035425030
DOC
ASCE_ChingHuei_Rev00..
PDF
IMPROVING SCHEDULING OF DATA TRANSMISSION IN TDMA SYSTEMS
PDF
Energy Efficient Power Failure Diagonisis For Wireless Network Using Random G...
PDF
Analysis of single server fixed batch service queueing system under multiple ...
PDF
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
PDF
Testing and Improving Local Adaptive Importance Sampling in LFJ Local-JT in M...
PPTX
Dimension Reduction: What? Why? and How?
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
Performance of the Maximum Stable Connected Dominating Sets in the Presence o...
PDF
Density Based Subspace Clustering Over Dynamic Data
PDF
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
Compressive Data Gathering using NACS in Wireless Sensor Network
Part 2: Unsupervised Learning Machine Learning Techniques
50120130406039
International Journal of Engineering Research and Development (IJERD)
E035425030
ASCE_ChingHuei_Rev00..
IMPROVING SCHEDULING OF DATA TRANSMISSION IN TDMA SYSTEMS
Energy Efficient Power Failure Diagonisis For Wireless Network Using Random G...
Analysis of single server fixed batch service queueing system under multiple ...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
Testing and Improving Local Adaptive Importance Sampling in LFJ Local-JT in M...
Dimension Reduction: What? Why? and How?
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Performance of the Maximum Stable Connected Dominating Sets in the Presence o...
Density Based Subspace Clustering Over Dynamic Data
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
Ad

Viewers also liked (17)

PPTX
7 d justin tan sekoteng
PDF
Long Snapper University Brochure (edited final)
PPT
koneke
DOCX
Flat Plan of Contents Page
PDF
Forget ad blocking, user experience is a big deal - Digiday WTF Ad Blocking N...
DOC
diaa hrb CV1-2016
PDF
Boletín informativo UNIORE
PPTX
Tichnorpaige
PPT
7 c presentation1 (fix.) thalia
DOC
Proyecto lectura escritura
PPTX
Grafos Eulerianos y Hamiltanianos
PPTX
Challenges and Opportunities in Industrial Biotech Regulation
PPTX
Activid Ad Glosario[3[2
PPTX
Story boards
PDF
Speed Dating on Advertising
PPT
Introdução à Estratégia do Oceano Azul
7 d justin tan sekoteng
Long Snapper University Brochure (edited final)
koneke
Flat Plan of Contents Page
Forget ad blocking, user experience is a big deal - Digiday WTF Ad Blocking N...
diaa hrb CV1-2016
Boletín informativo UNIORE
Tichnorpaige
7 c presentation1 (fix.) thalia
Proyecto lectura escritura
Grafos Eulerianos y Hamiltanianos
Challenges and Opportunities in Industrial Biotech Regulation
Activid Ad Glosario[3[2
Story boards
Speed Dating on Advertising
Introdução à Estratégia do Oceano Azul
Ad

Similar to Datapath (20)

PPT
ProbabilisticModeling20080411
PDF
PDF
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
PPTX
“AI techniques in cyber-security applications”. Flammini lnu susec19
PDF
A Sample-Driven Channel Model for Developing and Testing Practical WSN Applic...
PDF
Bayesian network based software reliability prediction
PDF
Internet of Things Data Science
PDF
LOAD BALANCING MANAGEMENT USING FUZZY LOGIC TO IMPROVE THE REPORT TRANSFER SU...
PDF
A2546035115
PDF
IMPROVEMENT OF FALSE REPORT DETECTION PERFORMANCE BASED ON INVALID DATA DETEC...
KEY
PDF
Ali Mousavi -- Event modeling
PDF
A Short Course in Data Stream Mining
PPTX
Stochastic modelling and its applications
PDF
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
PDF
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
PDF
Approximation of regression-based fault minimization for network traffic
PDF
Can someone provide a solution for this assignmentPurpose of This.pdf
PPT
Mitigating routing misbehavior in mobile ad hoc networks
PDF
Modeling and Analysis of Two Node Network Model with Multiple States in Mobi...
ProbabilisticModeling20080411
Markov Chain Monitoring - Application to demand prediction in bike sharing sy...
“AI techniques in cyber-security applications”. Flammini lnu susec19
A Sample-Driven Channel Model for Developing and Testing Practical WSN Applic...
Bayesian network based software reliability prediction
Internet of Things Data Science
LOAD BALANCING MANAGEMENT USING FUZZY LOGIC TO IMPROVE THE REPORT TRANSFER SU...
A2546035115
IMPROVEMENT OF FALSE REPORT DETECTION PERFORMANCE BASED ON INVALID DATA DETEC...
Ali Mousavi -- Event modeling
A Short Course in Data Stream Mining
Stochastic modelling and its applications
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
STATE SPACE GENERATION FRAMEWORK BASED ON BINARY DECISION DIAGRAM FOR DISTRIB...
Approximation of regression-based fault minimization for network traffic
Can someone provide a solution for this assignmentPurpose of This.pdf
Mitigating routing misbehavior in mobile ad hoc networks
Modeling and Analysis of Two Node Network Model with Multiple States in Mobi...

Datapath

  • 1. Datapath: Stochastic, Temporal Simulation of Data Flow Reliability Edward Verenich, Gennady Staskevich Clarkson University Abstract—We present a simulation of data flow reliability that models directed information flow through a network where the source of information may change with time. We also attempt to quantify resource availability values or uptimes in the context of time in order to model data transfer reliability more realistically. We encode the problem as a variant of a Dynamic Bayesian Network and develop a custom Markov Chain Monte Carlo [5] sampling algorithm that computes the posterior marginal probabilities of data being present at a given node at a given time. We parameterize the probabilistic model using the binary event conditional probability distribution and introduce several additional parameters to the random variable in order to improve the accuracy of our model. Finally, we present a light weight, browser based model editor in order to construct and assess probabilistic data flow models. I. INTRODUCTION Assessing network reliability is an important aspect of network and cloud infrastructure [7] engineering. Most tech- niques for doing so involve closed form solutions that we believe make certain assumptions to simplify the problem or become difficult to work with when models become very large. Accounting for uncertainty in network communication involves the use of probabilistic models, where using exact inference techniques does not scale well as model connectivity increases and increase in dimensionality of parameters makes exact computations intractable. Account for time, and the problem becomes even harder. Take for example a network that simulates a vehicular mobile cloud [1] where the source of the directional data flow changes in time as the vehicle moves and must now send data through a new access point, can we model this hand-off and its effect on overall data flow reliability for every time unit in the model? 
In this paper we present a light weight simulation that attempts to provide an easy to use framework to make such reliability assessments by encoding the problem as a Dynamic Bayesian Network and running a custom Markov Chain Monte Carlo sampling algorithm to compute the posterior marginal probabilities of data being present at node x at time y. This paper is structured as follows, in Section II we present the probabilistic model, this includes the parameterization method, which is a binary event conditional distribution model and additional binary event parameters. Section III describes our custom sampling algorithm that is a derivative from the family of Markov Chain Monte Carlo algorithms. In Section IV we describe our reference simulation editor that can be used to construct DBN based simulations and provide several model use-cases. Finally, Section V concludes our paper and describes two possible extensions to our model that we believe are novel and unique not only in applying MCMC to reliability modeling, but to the general field of probabilistic causal modeling. II. PROBABILISTIC MODEL In this section we describe our choice of a probabilistic model that we felt was appropriate to model the dynamic nature of mobile networks. We first provide some background information on Graphical Models and Dynamic Bayesian Networks. We then describe how we parameterize the model and explain additional parameters that we introduce to our binary event implementation. A. Background A proper introduction of Graphical Models, Markov Net- works and Dynamic Bayesian Networks is beyond the scope of this paper. Graphical models are useful for depicting in- dependence and dependence relationships between probability distributions, which is convenient computationally. They are also used to model how variables interact, where ultimately a given class of a Graphical Model corresponds to a factorization property of the joint distribution. 
We model the flow of data using a directed graphical model, which corresponds to a Bayesian Network. By including the temporal dimension, we end up with a Dynamic Bayesian Network, which we parameterize using Conditional Probability Distributions (CPDs). The CPD is time invariant; other parameters drive the temporal behavior of the model, as we explain next.

B. Model Parameters

We parameterize our network using the binary event model. In this model, a random variable x represents an event that can happen according to a conditional probability distribution.

Fig. 1. Event z is caused by events x and y according to a CPD.
Figure 1 shows a directed graph where event z is caused by events x and y independently, meaning that each causing event can trigger event z on its own. Consider the diagram as showing data flowing to node z from node y with probability 0.8 and from node x with probability 0.9. This is encoded in the CPD of z as follows:

TABLE I
THE CPD FOR EVENT Z.

  x  y  prob
  0  0  .0
  0  1  .8
  1  0  .9
  1  1  .98

Consider the set of causes Q = {x, y}. The rows of this table are the entries of the CPD, where each entry CPD[i] corresponds to an element of the power set P(Q). This gives four possible configurations for event z: (1) neither link (x, z) nor (y, z) is transmitting, (2) only (y, z) is transmitting, (3) only (x, z) is transmitting, and (4) both are transmitting. Given our binary event model, each event or node in the model has a CPD of size 2^|Q|. In the CPD of Table I, only the last entry needed to be computed; for this we use the Noisy-OR model generalized by Diez [2], shown in Equation 1.

p(y ← c(x_0, x_1, ..., x_{n−1})) = 1 − ∏_{i=0}^{n−1} (1 − p(y ← c(x_i)))    (1)

The symbol ← represents the causal relationship, or the directional flow of data, and the set c(x_0, x_1, ..., x_{n−1}) is the set Q.

An event, which represents a physical resource node in our network simulation, has additional parameters that control its behavior in our sampling simulation. We define these parameters here:

Definition. Let Reliability be the probability of an event x having the value true given that it is caused by another event or is explicitly triggered in the simulation.

Explicitly triggering a probabilistic event means that the modeler specifies that a specific event will happen at a time t with a probability p. This leads us to the next parameter, which specifies how long such an event will persist.

Definition. Let Persistence ∈ Z+ be the number of time units that the event persists at probability p once it is triggered at that probability.
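The CPD construction of Equation 1 can be sketched as follows. This is an illustration with our own function names, not the simulator's code; linkProbs[i] is assumed to be p(y ← c(x_i)) for parent i.

```javascript
// Noisy-OR of the active causes (Equation 1): 1 minus the product of
// (1 - p_i) over the causes that are active.
function noisyOr(activeProbs) {
  return 1 - activeProbs.reduce((acc, p) => acc * (1 - p), 1);
}

// Build the full 2^|Q| CPD of a node from its per-link probabilities.
function buildCpd(linkProbs) {
  const n = linkProbs.length;
  const cpd = new Array(1 << n); // one entry per subset of the cause set Q
  for (let idx = 0; idx < (1 << n); idx++) {
    // Bit i of idx marks whether cause x_i is active in this entry.
    const active = linkProbs.filter((_, i) => (idx >> i) & 1);
    cpd[idx] = noisyOr(active);
  }
  return cpd;
}
```

For the links of Figure 1 (0.9 from x, 0.8 from y), buildCpd([0.9, 0.8]) yields [0, 0.9, 0.8, 0.98], matching Table I.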
Whenever an event is triggered, the model may specify how long it will last. For example, the modeler may specify that once an asset breaks down, it stays broken until another event causes it to become functional again. The next parameter also relates to the temporal aspect of the model.

Definition. Let Continuation be the probability that an event x has the value true at time t given that it had a probability p of being true at time t − 1.

This parameter allows the modeler to conveniently extend reliability estimates in time given a prior reliability measurement for a fixed time interval t. For example, consider a resource σ with a Reliability of 0.99 per interval t. The probability of failure of resource σ by time interval x is then:

Failure_σ(interval_x) = 1 − ∏_{i=1}^{x} Reliability_σ = 1 − (Reliability_σ)^x    (2)

If we set x = 9, the probability of failure becomes 0.086. Although this is a simple computation for a single parameter, our Continuation parameter allows us to compute this value for every random variable and every time interval in the model at no additional cost.

We also need to account for the fact that it is neither practical nor realistic to model every possible cause of an event, especially at the model fidelity we aim for, so we introduce a parameter that accounts for unspecified causes, which we call the Leak.

Definition. Let Leak be the probability that an event is caused with a probability p at every time interval t by one or more unspecified causes outside the set of causes Q.

Finally, each event has a schedule, which allows the event to be triggered explicitly by the modeler.

Definition. Let Schedule be a map of values (t → p) that trigger an event x at time t with probability p.

The schedule gives us the ability to simulate a change in the source of data flow over time, as we will see in Section IV.
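The closed form of Equation 2 is simple enough to sketch directly (an illustration with our own function name, not part of the simulator):

```javascript
// Equation 2: cumulative probability of failure after x intervals, given
// a fixed per-interval reliability.
function cumulativeFailure(reliability, x) {
  return 1 - Math.pow(reliability, x);
}

// A resource with per-interval reliability 0.99, after 9 intervals:
// cumulativeFailure(0.99, 9) ≈ 0.086, matching the example above.
```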
III. SAMPLING ALGORITHM

In this section we outline the sampling algorithm that we use to compute the marginal posterior probabilities of our event variables, which in turn represent the probability that each network resource received the data from the source.

A. Generating a DBN

Before we can run our sampling algorithm, we need to generate a DBN suitable for our sampler. This requires several steps. First, we take the set of events and topologically sort them; recall that our model is a Directed Acyclic Graph. Second, we generate the CPD for each node (corresponding to an event) in the DBN using the technique described in Section II. Third, we Noisy-OR the node's leak value with all values of its CPD; note that a leak of zero has no effect on the CPD. Finally, we multiply each entry of the CPD by the node's reliability value. Before reliability is applied, each value in the CPD reflects the probability that the signal made it to the target node given that the source node sent it; the target node, however, may not be available to receive and re-transmit it, which is what we model with the reliability parameter.
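The leak and reliability post-processing steps can be sketched as follows (our own helper name, assuming the flat CPD array layout from Section II):

```javascript
// Combine the node's leak with every CPD entry via Noisy-OR, then scale
// each entry by the node's reliability.
function finalizeCpd(cpd, leak, reliability) {
  // Noisy-OR with the leak: 1 - (1 - p)(1 - leak); a leak of 0 is a no-op.
  // Multiplying by reliability models the target node being unavailable
  // to receive and re-transmit the signal.
  return cpd.map(p => (1 - (1 - p) * (1 - leak)) * reliability);
}
```

With a leak of 0 and reliability of 1.0, the CPD is returned unchanged.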
B. MCMC Algorithm

Here we present our Datapath MCMC algorithm and describe some of the computations that were omitted from the pseudocode in Algorithm 1 for clarity.

Algorithm 1 Datapath MCMC
 1: procedure DATAPATHMCMC
 2:   dbn ← topologically sorted nodes
 3:   duration ← positive number of time slices
 4:   samples ← number of samples to simulate
 5:   sample ← boolean array of length dbn.length
 6:   counts ← integer array for counts
 7:   for time t in duration do
 8:     for sample s in samples do
 9:       for node n in dbn do
10:         prob ← 0.0
11:         if n is a root node then
12:           prob ← leak or schedule[t]
13:         else
14:           if n is scheduled at t then
15:             prob ← schedule[t]
16:           else
17:             prob ← CPD(state index)
18:         if prob ≥ Random then
19:           sample[n] ← true
20:           counts[n] ← counts[n] + 1
21:     for node n in dbn do
22:       n ← set marginal for time t

One calculation that is not spelled out in the algorithm, but happens at line 17, is the computation of an index of active parents that is used to look up the probability value in the CPD during sampling. This works as follows: (1) we sample our variables in topological order, so all children have their parent states set before they are sampled; (2) we store each variable's CPD as a power set of its causes, so by reading the boolean values of the already-set parent states as bits, we obtain the integer index of active parents that identifies the CPD entry we need to Monte Carlo against.

Another aspect omitted from the pseudocode is the use of the persistence and continuation parameters that drive the temporal simulation; again, this was done mostly for clarity, and interested readers are invited to reference our source code, which will be made available.

To the best of our knowledge, the majority of general-purpose tools used for Bayesian modeling handle temporal simulations by unrolling: taking a snapshot of the model for every time slice and then sampling each one.
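The active-parent index computed at line 17 can be sketched as follows (names are ours; sample is assumed to hold the already-sampled boolean parent states):

```javascript
// Read the boolean states of a node's parents as bits to produce an
// integer index into its power-set CPD.
function parentStateIndex(parentIds, sample) {
  let idx = 0;
  parentIds.forEach((pid, i) => {
    if (sample[pid]) idx |= 1 << i; // parent i active -> set bit i
  });
  return idx;
}
```

With parents [x, y] and only x sampled true, the index is 1 (binary 01), selecting the "x active, y inactive" CPD entry.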
Unrolling quickly leads to models that require significant resources to execute, which is not the case with our approach.

DatapathMCMC is a simple but effective algorithm for predicting the directional reliability of data flow. It does not, however, have good convergence properties for performing Bayesian inference in models that need to compute the joint posterior distribution when many observations are made downstream. This is by design, to address our specific scenario; it is by no means a general-purpose method for Bayesian inference. For the general case, we would select Gibbs or Metropolis-Hastings [6], or some other variation of importance or rejection sampling. In our case, however, rejecting samples is not necessary because our observations, in the form of explicit scheduling, are mostly made upstream, at root nodes, and are therefore sampled first. Supporting downstream observations could be the next natural step and could lead to some interesting applications, such as diagnosing the most likely path that data took in an anycast transmission, or identifying failure points.

IV. SIMULATOR

In this section we describe our simulator implementation. We chose to develop the simulator as a browser-based JavaScript application for ease of use and for the convenience of an existing visualization library, D3.js, which we utilize for network graph creation. We also considered the ease with which users can experiment with and extend the framework, given the lack of any real configuration, even by using one of the many online editors such as JSFiddle.

A. Network Editor

The graphical network editor is written using the D3.js vector graphics library. It allows the user to construct a directed graph where nodes represent network elements that map to events in our probabilistic model. Directed edges represent data links and have reliability values that are specified at their target nodes.
These values map to conditional probabilities of the target random variable in the form P(target|source), and are used to compute the full conditional probability distribution for each node.

B. Running the Simulation

When a new node is added, its Reliability value defaults to 1.0, as does that of any link that is added. This allows the modeler to quickly sketch out a network topology before experimenting with reliability values and time. Once all the desired values are specified, the user sets the duration of the simulation in abstract time units and the number of samples to generate. As a rule, the more samples we generate, the more accurate our estimate becomes, though as mentioned in Section III our algorithm converges very quickly in our intended application.

C. Serial Reliability

Our first simulation consists of running a serial data path of three nodes. There are several ways to build simulations using our model; in this one we model node failure rates per time unit, where a time unit can be any interval of time for which we have prior measured failure information. For example, we may know that a particular resource has an uptime of 99.99% in a given week (independent of any other resource), that the second resource in the serial flow has an uptime of 99.98% for the same interval, also independent of other resources, and so on.
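Under the independence assumption above, the probability that such a serial path fails to deliver data within t intervals has a simple closed form, sketched here with our own helper name (a sanity check, not the simulator's sampling code):

```javascript
// Delivery over a serial path succeeds only if every node stays up for
// all t intervals; failure is the complement (independence assumed).
function serialFailure(uptimes, t) {
  const allUp = uptimes.reduce((acc, u) => acc * Math.pow(u, t), 1);
  return 1 - allUp;
}

// The two resources above (weekly uptimes 99.99% and 99.98%), one week:
// serialFailure([0.9999, 0.9998], 1) ≈ 0.0003
```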
Fig. 2. Serial path consisting of three nodes.

Figure 2 shows the setup of our first simulation. In this model the vertices represent hardware nodes, or virtual routers or servers, while the edges represent connections, which could be physical or virtual. To validate our simulation algorithm we kept edge probabilities at 1.0, meaning we are only accounting for node failure rates. We model this simulation using our Leak parameter, essentially stating that we are modeling recorded average failure priors for a given interval t (a week). Our leak parameters are set as follows: let N be the set of nodes in the serial data path, with leak probabilities (n3 = 0.01, n4 = 0.02, n5 = 0.01); we would like to project their reliability over the next 15 weeks. Using the closed form of Equation 2, the expected failure of node 3 at time 5 is 0.049. Running our simulation, we have the following:

Fig. 3. Node 3 probability at time step 5.

Figure 3 shows that we approximate the probability of failing to deliver data fairly closely. Now we consider the next node in the series. If we were to compute its failure to deliver data as an origin or root node, in other words without any previously accumulated failure causing additional failure possibility, the same formula gives an expected failure at time 5 of about 0.096. But we need to accumulate all the previous expectations: we must account both for node 4 failing to deliver data independently and for the possibility that node 3 failed before data could travel to node 4, and this must be done for all time intervals.

Fig. 4. Node 4 without any causal contribution from Node 3.

Figure 4 shows our approximation for Node 4 at time 5. This is close to the exact answer computed using Equation 2. Now we activate Node 3, effectively accumulating Node 3 and Node 4 failure probability at Node 4.

Fig. 5. Node 4 with a causal contribution from Node 3.

Figure 5 shows
a significantly higher probability of failing to deliver data at Node 4 at time 5. This is an interesting result that we would like to investigate further. We interpret this result as the probability of data delivery failure at node i at time t, given the prior average failure measurements for interval t for all nodes up to and including node i in the series.

D. Mobile Hand-off

Our next model involves the scheduling mechanism, which allows us to simulate the switching of data sources during the simulation. This is analogous to a mobile customer connecting to different towers while moving through space, or to hand-offs in vehicular clouds.

Fig. 6. Node 3 and Node 7 simulate data sources at different times.

Figure 6 shows a model that simulates two sources of data at different times, essentially simulating a hand-off. Consider Node 8 to be the final destination and Nodes 3 and 7 to be gateway nodes for a mobile client. We simulate the effect of movement in space and time that involves connecting to a different gateway, which results in a different path to the final destination. In this model we planned to simulate data flow reliability using our Reliability and Schedule parameters, as opposed to the Leak parameter. The idea is to specify the reliability value at each node, while each incoming edge represents a conditional reliability of that connection given that the source node sent the data. Unfortunately, we were not able to complete this simulation in time due to an unexpected bug in the simulator, which we are still investigating; we expect to resolve it in the near term.

V. CONCLUSION

We developed a stochastic, temporal simulation of network data flow that is easy to use and delivers unique capabilities
such as temporal probability profiles for all model variables, in contrast to explicit queries on individual variables. We introduced the capability of simulating connection hand-offs associated with mobile and vehicular clouds. We also developed a lightweight simulation editor that can be extended to support even more sophisticated simulation scenarios.

A. Current Bugs and Incomplete Features

As mentioned in the previous section, our simulations involving the scheduling parameter need to be debugged, which we expect to be fairly straightforward. The user interface needs to be cleaned up, and input validation needs to be implemented. The Persistence parameter needs to be integrated into the simulation; it is currently emulated using the scheduling parameter for every time interval. Saving the model locally works, as it was used for debugging purposes, but loading a model back needs to be finished. We also need to further investigate our serial reliability results, as well as experiment with random number generation beyond the standard JavaScript random method.

B. Possible Extensions

We envision two possible extensions that would result in novel simulation capabilities. First, when modeling multiple incoming connections into a node, we utilize the Noisy-OR model to calculate each link's contribution to data reliability assuming independence. There may, however, be measurements available as prior information that capture a synergistic effect of these links being active at the same time, resulting in a higher probability than that calculated under the independence assumption. Currently there is no way to specify such a value in the simulation, and if there were, we would need to utilize a different distribution estimation algorithm, perhaps something similar to the Recursive Noisy-OR [3] proposed by Lemmer and Gossink.
Their generalization of Noisy-OR allows learned parameters to be specified explicitly; the rule then propagates these parameters to the parameters that contain the learned values as subsets. For example, given a set of incoming links Q = {x, y, z}, suppose we have a direct measurement θ showing that when x and y are both active, their reliability contribution to the node differs from the one computed under the independence assumption. With RNOR we can incorporate this joint measurement into Q, where θ will be propagated to the set {x, y, z} unless that set is specified explicitly.

The second extension involves introducing the concept of counterfactual reasoning. This involves adding causal influences that act on their effect when they are absent, or do not happen. For example, the statement "if it does not rain, my car will probably be dry" is one instance of such an interaction. The technique for modeling these interactions in binary event models is referred to as CPD inversion. It was originally suggested by J. Pearl [4], but the exact details are beyond the scope of this article.

REFERENCES

[1] R. Yu, Y. Zhang, S. Gjessing, W. Xia, K. Yang, Toward Cloud-based Vehicular Networks with Efficient Resource Management. 2013.
[2] F.J. Diez, Parameter adjustment in Bayes networks. The generalized noisy OR gate. In Proc. 9th Annu. Conf. Uncertainty in Artificial Intelligence, San Mateo, CA, 1993, pp. 99-105.
[3] J.F. Lemmer, D.E. Gossink, Recursive Noisy OR - A Rule for Estimating Complex Probabilistic Interactions. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 2004, 34(6).
[4] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[5] J.S. Liu, Monte Carlo Strategies in Scientific Computing. New York, NY: Springer, 2004.
[6] D. Barber, Bayesian Reasoning and Machine Learning. Cambridge, U.K.: Cambridge University Press, 2012.
[7] N.L.S. Fonseca, R. Boutaba, Cloud Services, Networking, and Management. Danvers, MA: Wiley, 2015.