当前位置:首页 >> >> Dependability Analysis of Distributed Computer Systems with Imperfect Coverage

Dependability Analysis of Distributed Computer Systems with Imperfect Coverage

Dependability Analysis of Distributed Computer Systems with Imperfect Coverage
Xinyu Zang, Hairong Sun and Kishor S. Trivedi fxzang, hairong, kst@ee.duke.edug Center for Advanced Computing and Communications Department of Electrical and Computer Engineering Duke University Durham, NC 27708
Abstract

In this paper, a new algorithm based on Binary Decision Diagram (BDD) for dependability analysis of distributed computer systems (DCS) with imperfect coverage is proposed. Minimum le spanning trees (MFST) are generated and stored via BDD maniplution. By using the multistate concepts, the algorithms for BDD generation and evaluation that can deal with imperfect coverage are given. Due to the nature of BDD, the sum of disjoint products (SDP) can be implicitly represented by BDD, which avoids huge storage and high computation complexity for large system. Several examples are given to show the e ciency of this algorithm. Index Terms: Distributed Computer System (DCS), Minimum File Spanning Tree (MFST), Imperfect Coverage, Multistate Component, Reliability Evaluation, Binary Decision Diagram (BDD)

This research was supported in part by the National Science Foundation under Grant No. EEC9418765, and by the Department of Defense as an enhancement project to the Center for Advanced Computing and Communications in Duke University.

1

1 Introduction
With the rapid development of computer networking, the distributed computer systems (DCS) become a more and more attract and e cient way to share system resources, achieve fault-tolerance and obtain high extensibility and dependability. Normally, in a DCS, a successful excution of a distributed program usually requires one or more of resources that reside on hosts of DCS, e.g. processing elements, memory modules, data les and so on. These hosts are interconnected via communication links which enable a running distributed program to access the resources on remote hosts. In 10], Kumar et al. model a DCS to an undirected graph G(v, e) in which the vertices represent the hosts and the edges represent the communication links. Fig. 1 shows a model of a DCS. Both vertices and edges are in either operational or failed states and their behaviors are independent. The system resources, except for processing elements, required by a program are simpli ed to les. A le spanning tree (FST) was de ned as a spanning tree that connects the root node (the host that runs the program under consideration) to some other nodes such that its vertices hold all the required les for executing that program. If all FSTs that are supersets of some other FSTs are removed from the FTS set, the remian part of the set make up the set of minimal le spanning tree (MFST), i.e. a FST is a MFST if there exists no other FST that is subset of this FST. For instance, consider PRG 1 in Fig. 1 can be working if it can run on node n1 or n4 , and can access les fF1 , F2 , F3 g, the MFSTs of program P1 in Fig. 1 are: 1. fn1 , e1 , n2 g 2. fn1 , e4 , n3 , e3 , n2 g 3. fn4 , e5 , n3 g 2

FA: F 3 , F 4 PRG: P4 n2 e1 FA: F 1 , F 2 PRG: P1 , P2 n1 e4 n3 FA: F 2 , F 4 PRG: P2 , P3 e3 e5 e2 n4 FA: F 1 , F 3 PRG: P1 , P3

Figure 1: A four-node DCS 4. fn4 , e2 , n2 , e3 , n3 g Based on these, two reliability measures, distributed program reliability (DPR) and distributed system reliability (DSR), were de ned as:

DPR = Pr at least one MFST of a given program is operational]
= Pr(
nmfst j =1

MFSTj )

where nmfst is the number of MFST that run the given program.

DPR = Pr at least one MFST for all programs is operational]
= Pr(
nmfst i=1

MFSTi )

where nmfst is the number of MFST that run all programs. The algorithm to obtain the MFSTs is also given in 10]. After obtaining the MFSTs, disjoint method is applied to obtain the probability expression, i.e. sum of disjoint products (SDP). 3

To compute the DPR of DCS e ciently, Kumar et al. 9] presented an algorithm that can obtain the probability expression directly during generation of MFST, i.e., in one step. In 8], Kumar and Agrawal generalized this algorithm so that it can deal with the case in which one program can run on multiple nodes. Ke and Wang 7] gave an improved algorithm that addressed the imperfect nodes. All these algorithms, however, can not be used directly to obtain the DSR of DCS. The another issue that above algorithms did not consider is imperfect coverage. Generally, the hosts in DCS can be located at di erent geographic sites. There exists a possibility that some failures of hosts and communication links may not be detected promptly and the redundant units (the other hosts and links) can not be utilized so that the distributed program can not be executed successfully. Dugan and Trivedi addressed this issue in 6]. Doyle et al. introduced the coverage factor to combinatorial models in 5]. To address the imperfect coverage in DCS, Lopez-Benitez used Petri net in 11]. As the other Markov based model, this method become impractical when the number of hosts and links in DCS is large. In this paper, we use the multistate method developed in 17] to deal with the imperfect coverage. Binary Decision Diagram (BDD) was, at rst, used in VLSI design and veri cation as an e cient method to manipulate the Boolean expression 2, 1]. Bryant 2] and other researches showed that, in most cases, BDDs use less memory to represent large Boolean expressions than representing them explicitly. Because BDDs are based on Shannon's decomposition, reliability evaluation is very easily obtained from BDD format. Some researchers have already used BDD to do reliability analysis for fault tree 14, 3, 15, 16, 4]. In this paper, a new algorithm based on BDD is proposed for dependability analysis of DCS with imperfect coverage. The BDD for a program can be generated by searching 4

MFSTs of this given program. The BDD for DSR can be obtain by logic operation between the BDD for each program. Then using multistate concepts, we will convert these BDDs to multistate BDDs which can deal with imperfect coverage. The nal BDD for a program and whole system can implicitly represent the SDPs, avoiding the huge storage for large number of SDPs, and easily be evaluated to obtain the unreliability. Ordering strategy in BDD will be discussed in this paper as well, because the size of BDD heavily depends on this order. Several ordering strategy will be provided and compared. Sensitivity analysis will be made for DCS with imperfect coverage. The paper is organized as follows. Section 2 presents some preliminary concepts of BDD. Section 3 reviews the coverage model and multistate model. Section 4 gives the description of the algorithm. Some examples are provided in Section 5 to show the e ciency of the algorithm. The last section gives the conclusion and future work.

2 Binary Decision Diagrams (BDD)
In 2], Bryant gave the basic de nitions for BDD (also known as function graph). A subset of general BDD, reduced ordered binary decision diagrams (ROBDD) were introduced as e cient means to manipulate the Boolean expressions. 14, 3, 15] applied BDD to reliability evaluation for fault trees. Here we will review some basic concepts of BDD.

2.1 Shannon's decomposition and ite format
Theorem 1 Shannon's Decomposition: let f be a Boolean expression on X , and x be a
variable of X , then,

f = x fx=1 + x fx=0
where f evaluated in x = v is denoted by fx=v .

(1)

5

Shannon's decomposition is the basis of using BDD. In order to express Shannon's decomposition concisely, the If-Then-Else (ite) format is de ned as:

f = ite(x; F1 ; F2 ) = x F1 + x F2
where F1 = fx=1 and F2 = fx=0 .

(2)

2.2 BDD
A BDD is a directed acyclic graph (DAG) that is based on Shannon's decomposition. The graph has two sink nodes, labeled 0 and 1, representing the two corresponding constant 0 and 1. Each non-sink node is labeled with a Boolean variable x and has two outgoing edges that represent the two corresponding expressions in the Shannon's decomposition. These two edges are called 0-edge (or else-edge) and 1-edge (or then-edge). The node linked by 1-edge represents the Boolean expression when x = 1, i.e., fx=1 in Eqn. (1), while the node linked by 0-edge represents the Boolean expression when x = 0, i.e., fx=0 in Eqn. (1). Thus, each non-sink node in BDD encodes an ite format. Obviously, one of the key feature of BDD is the disjoint, nature of the two subexpressions. An ordered binary decision diagram (OBDD) is a BDD with the constraint that the variables are ordered and every source to sink path in the OBDD visits the variables in ascending order. A reduced ordered binary decision diagram (ROBDD) is an OBDD where each node represents a distinct Boolean expression. In practice, ROBDDs are widely used. Actually, to generate a ROBDD, the ordering of the variables has to be selected rst and this order of variables is not changed during the generation1 . In this paper, we denote xi < xj as variable xj is behind variable xi in the order of variables. Fig. 2 shows the ROBDD for several Boolean expressions.
1

We do not consider the dynamic reordering of BDD in this paper

6

a
0 1 0

a
1

G1

H2

b
0 1 0

b
1

G2

H1

c
0 1

c
1 0

0

1

0

1

(a) g = a c + b

c

(b) h = a b + c

Figure 2: BDD representation of Boolean expressions

2.3 Manipulation of BDDs
BDDs represent Boolean expressions graphically. The manipulation of BDDs using logical operations is very easy. For instance, consider a logic operation AND on two Boolean expressions g and h. We rst generate two BDDs for g and h respectively using the same ordering of variables. We assume that the rst variable is common to g and h. Let this common variable be denoted by x,. Using Eqn. (2), the ite formats of the expressions are

g = ite(x; gx=1 ; gx=0) = ite(x; G1 ; G2) h = ite(x; hx=1 ; hx=0 ) = ite(x; H1 ; H2)
The g + h represented by ite format will be:

ite(x; G1 ; G2) + ite(x; H1 ; H2) = g + h
7

a
0 1

G1+H1

G2+H2

a
0 1

b
0 1 0

b
1

REDUCE
0

b
1

0+H1

G2+H1

H1+G2

1+G2

c
1 0

c
0 1 0

c
1

0 1

1

0

1

0

1

Figure 3: OR operation of two BDDs

8

= ite(x; (g + h)x=1 ; (g + h)x=0 ) = ite(x; (gx=1 + hx=1 ); (gx=0 + hx=0 )) = ite(x; (G1 + H1 ); (G2 + H2 )) (3) The recursive method can be used for G1 + H1 and G2 + H2 till one of them becomes a constant expression, 0 or 1, or they do not have the rst variable in common. Next we assume h does not have variable x, but has variable y, and x < y in order of variables. The ite formats of the expressions are

g = ite(x; gx=1 ; gx=0) = ite(x; G1 ; G2) h = ite(y; hy=1 ; hy=0 ) = ite(y; H1 ; H2 )
The g + h represented by ite format will be:

ite(x; G1 ; G2 ) + ite(y; H1 ; H2 ) = g + h
= ite(x; (g + h)x=1 ; (g + h)x=0 ) = ite(x; (gx=1 + h); (gx=0 + h)) = ite(x; (G1 + h); (G2 + h)) (4) In the above example, we used OR operation, but any other logical operation can be used and the only di erence is to use the di erent truth tables when one of the operands becomes constant expression. Actually, in practice, the BDD is generated by using logical operation on variables rather than using Shannon's decomposition directly. Fig. 3 shows the OR operation of two Boolean expressions in Fig. 2. Note that two reductions are made during operation: 9

Because the results of 0 + H 1 and G2 + H 1 are same, the node b at left becomes irrelevant and can be reduced. Because the 0-edge of the node b at right is linked to same subtree as the 0-edge of the node a does (the node b at left is reduced), these two subtrees can be reduced to one.

3 Coverage model and multistate model
In this section, we will review some basic concepts of coverage model and multistate model, and present how to use multistate model to deal with imperfect coverage.

3.1 Coverage model
Generally, the reliability of systems will be improved via adding redundant components. However, it must be sure that the failure of a component can be detected promptly, so that the redundant components can be utilized. Otherwise, the system may still be out of function even there are many redundant components that can replace the failed component 6]. A parameter called coverage was introduced to re ect the ability of the system to automatically recover from the occurrence of a fault during normal system operation. The coverage is de ned as: coverage = Pr system recovers j fault occurs] In a DCS, faults may occur at hosts and communication links. Some of them may not be detected promptly since the hosts and communication links may be distributed in a wide area. These undetectable faults will cause the distributed program to fail. Hence, the imperfect coverage need be considered in a practical system. 10

(1-c) λ

2

undetectable failure

operation

1 cλ

3

detectable failure

Figure 4: Markov chain for imperfect coverage Because faults of components (hosts and communication links) should be divided into detectable faults and undetectable faults, only one failure mode of components is not enough. In this paper, each component will have two failure modes corresponding to detectable faults and undetectable faults. Fig. 4 shows a Markov chain representing the behaivor of a component.

3.2 Multistate model
In many combinatorial reliability models, components are assumed to be in one of two states: function or failure. However, in some application, more than two states need to be considered. Actually, the components in DCS with imperfect coverage may exist in one of three states: 1. operational state; 2. undetectable failed state that will cause system down; 3. detectable failed state that can not cause system down. In 17], we developed a BDD-based algorithm to solve the multistate problems. Each state of the multistate components is represented by a Boolean variable. Because of the 11

A2
0 1

A1
0 1

0

A3

1

0

1

0

1

Figure 5: BDD format of the equivalence A1 = A2 A3 dependence among these Boolean variables, a Boolean algebra with restrictions on variables is used to address this dependence. In this paper, we will use the same concepts to deal with imperfect coverage in DCS, but will use the simpli ed method since there are only three states in coverage model. Let A be a component (host or link), the three states, operational state, undetectable failed state and detectable failed state can be represented by three Boolean variables A1 , A2 and A3 respectivly. Ai = 1, i = 1; 2; 3, means that component A is in the corresponding state. According to the restriction rule of Boolean algebra with restrictions on variables,

A1 = A2 + A3 = A2 A3
This equivalence can be represented by BDD format in Fig. 5

(5)

4 BDD algorithm for DCS
4.1 Generation of BDD
We will use two step to generate a BDD for DCS with imperfect coverage: 1. Generate ordinary BDDs by seaching MFST for each program, then use BDD operation AND to combine these BDDs to a BDD for whole system. 12

bdd_gen(p_id) { mfst_bdd = 0 for(node_i in the set of nodes at which p_id resides) { mfst_bdd = mfst_bdd OR bdd_gen_aux(p_id, node_i) disable all edges connected to node_i } enable back all edge_i that be disabled in above loop } bdd_gen_aux(p_id, node) { T_bdd = 0 set node in this_mfst for (edge_i in the set of edges starting from all nodes in this_mfst) { disable edge_i next_node = the other end of edge_i if (next_node is already in this_mfst) continue; if (next_node and this_mfst contain all the required files) submfst_bdd = edge_i_bdd else submfst_bdd = bdd_gen_aux(p_id, next_node) AND edge_i_bdd T_bdd = T_bdd OR submfst_bdd } T_bdd = T_bdd AND node_bdd clear node in this_mfst enable back all edge_i that be disabled in above loop return T_bdd }

Figure 6: Algorithm for generating a BDD for a given progrm

13

2. Covert this BDD for whole DCS to a multistate BDD that can re ect the imperfect coverage. If only DPR is needed, this covertion may apply only to the BDD for a given program. Fig. 6 shows the algorithm for generating a BDD for a given progrm. This algorithm is adopted from 8] and |citeKW97. The function of bdd gen aux(p id, node) is actually to seach the set of MFST for a given program from a given node via recursive method. We use Boolean variable Ei to represent edge ei , and Ni to node ni. For the program P1 running on n1 in Fig. 1, the BDD obtained from bdd gen aux(p id, node) represents the Boolean expression: N1 (E1 N2 + E4 N3(E5 N4 + E3 + N2 )) (6) Fig. 7 shows the BDD generated for distributed program P1 in Fig. 1. The BDD generated above is a ordinary BDD that does not contain coverage. To incorporate imperfect coverage, the following two steps will be applied to the ordinary BDD: 1. Replace all nodes except sink nodes in ordinary BDD by their corresponding equivelent nodes shown in Fig. 5, i.e. for node that represent component A, use A2 A3 to replace the A1 . 2. Use BDD operation AND to merge the variables that represent the undetectable fault of components. Note that only the components that are contained in MFSTs will be considered. After these two steps, the nal BDD is able to deal with the imperfect coverage. Fig. 8 shows the nal BDD that is coverted from Fig. 7. Observing Fig. 8, we see that the nodes representing state 3 of the components have only one parent, i.e. the variable for state 2 whose 0-edge always connects to sink node 0. Based on this fact, we can remove the nodes whose 1-edge connects to the variable for state 3 of components. Actually, in our implementation of the algorithm, step 1 of conversion is just to replace the nodes of orignal BDD by nodes representing state 3 of the components.

14

N1
0 1

N2
0 1

N2
1

0

E1
1

0

E4

1

N3
0

1

0

N3

10

N3
1

N4
0 1

0

N4

1

N4
0 1

0

E5

10

E5

1 0

E5
1

0

E3

1 0

E3

1

0

E2 1

0

1

Figure 7: BDD for distributed program P1 in Fig. 1

15

N1,2
0 1

N1,3
0 1

N2,2
0 1 0

N2,2
1

0

N2,3
0 1 0

N2,3
1

E1,2
0 1 0

E1,2
0
1 0

E1,2
1

0

E1,3
0 1

E4,2
0 1 0

E4,2
1 0

E4,2
1 0

E4,2
1

0

0
0

0

E4,3
1

N3,2
0 10

N3,2
1 0

N3,2
10

N3,2
1

0

0

0

N3,3
0 10

N3,3
1 0

N3,3
1

0

0

N4,2
0 1 0

N4,2
1 0

N4,2
1 0

N4,2
1

0

0

0

N4,3
0 1 0

N4,3
1 0

N4,3
1

0

E5,2
0 1 0

E5,2
1 0

E5,2
1 0

E5,2
0

E5,2
0
1

0
0 0

0

0
0

E5,3 1 E5,3 1 E3,2
1 0

E5,3
1

0

E3,2
0
1 0 1 0

E3,2
0
1 1

0

E3,3

E3,3

0

E2,2
1

0

E2,2
1

0

E2,3 1

0

1

Figure 8: Final BDD for distributed program P1 in Fig. 1 16

4.2 Unreliability evaluation
Because the Boolean variables that represent the two fault states of the same component are no longer independent with each other, the algorithm for ordinary BDD can not be used to evaluate the BDD with dependence. In 17], we have developed an evaluation algorithm to deal with this dependence. This algorithm can be simpli ed here because there are only two Boolean variables for each component. Observing the BDD in Fig. 8, we see that the 0-edge always links two variables that belong to di erent components. But for the 1-edge, there are two cases which need to be evaluated di erently: the 1-edge linking the variables of di erent components; the 1-edge linking the variables of the same component.

Lemma 1 Let BDD G be
G = ite(X; G1 ; G2) = X G1 + X G2
and then

G1 = ite(Y ; H1 ; H2) = Y H1 + Y H2
(

X; Y are di erent states of the same component P (G) = P (G1) + P (X )(1 ? P (H1)) P (G1) + P (X )(P (G2 ) ? P (G1)) otherwise
(7)

Proof: First consider the case in which the 1-edge linking the variables of di erent comP (G) = P (ite(X; G1; G2 )) = P (X G 1 + X G 2 ) = P (X )P (G1 ) + P (X )P (G2 ) = P (G1 ) + P (X )(P (G2 ) ? P (G1 ))
17

ponents, it is the same as the ordinary BDD. Normal evaluation method for BDD can be used as:

(8)

Next consider the case in which the 1-edge linking the variables of the same component, we assume that X and Y represent the di erent states of component A, i.e. X = A2 , Y = A3 ,

P (G) = P (ite(X; G1; G2)) = P (X G 1 + X G 2 ) = P (X (barY H1 + Y H2 )) + P (X )P (G2 ) = P (X Y H1 + XY H2 ) + P (X )P (G2 )

(9)

According to the restriction relation among the variables associated with the same component, XY = A2 A3 = A3 = Y

Y = A3 = A 1 + A2 = A2 A3 + A 2 = XY + X
we can obtain

P (G1) = P (Y H1 + Y H2) = P ((X Y + X ) Y H1 + X Y H2 ) = P (X Y H1 + XY H2 ) + P (XH1 ) = P (XY H1 + X Y H2 ) + P (X )P (H1 )

(10)

Becaue H1 does not have variables that represent the states of the same component as X does, P (XH1 ) = P (X )P (H1 ). Therefore,

P (XY H1 + X Y H2) = P (G1) ? P (X )P (H1 )

(11)

At this case, because undetectable faults always cause system failure, P (G2 ) = 1. This result can also be observed from Fig. 8, all 0-edges of Boolean variables that represent the 18

undetectable faults of components are connected to sink node 0. Substituting Exp. (11) to Exp. (9), we have

P (G) = P (G1) ? P (X )P (H1 ) + P (X ) = P (G1 ) + P (X )(1 ? P (H1 )) 2

(12)

Using recursive method, we can calculate P (G1 ), P (G2 ) and P (H1 ) until reaching the sink node, i.e. Gi = 1 or Gi = 0:

Gi = 0, implies the system or subsystem represented by Gi is always down, hence P (Gi) = 1. Gi = 1, implies the system or subsystem represented by Gi is always up, hence P (Gi) = 0.
In our implementation, all nodes whose 1-edge connects to variable representing state 3 of components are removed. The Lemma 1 can be changed to

Lemma 2 Let BDD G be
G = ite(X; G1 ; G2) = X G1 + X G2
then

P (G) = (1 ? Cx)P (X ) + (1 ? P (X ))P (G1 ) + CxP (X )P (G2 ) X is variable for state 3 P (G1) + (1 ? Cx)P (X )(P (G2 ) ? P (G1)) otherwise
where Cx is the coverage of the component.

(

(13)

Fig. 9 give the algorithm for evaluation of unreliability.

4.3 Sensitivity analysis
The purpose of the sensitivity analysis(also known as importance measure) is to obtain the information concerning a component's contribution to the system reliability, which 19

Prob(G) { if (G == 1) return 0 else if (G == 1) return 0 else if (computed-table has entry {G, P_G}) return P_G else { /* G = ite(x, G1, G2) */ P_G1 = Prob(G1), P_G2 = Prob(G2) if (x is variables for state 3 of component) P_G = (1 - Cx) * P(x) + (1 - P(x)) * P_G1 + Cx * P(x) * P_G2 else P_G = P_G1 + (1 - Cx) * P(x) * (P_G2 - P_G1)) } insert_computed_table({G, P_G}) return P_G }

Figure 9: Algorithm for unreliability evaluating of DCS can be immensely used in system design, failure diagnosis and system failure probability minimization. The de nition of sensitivity analysis of event in fault tree was given in 13]. In this paper, we use the same de nition for sensitivity analysis and consider the imperfect coverage as wel, and obtain three types of measures for DCS.

4.3.1 Birnbaum's importance measure
The Birnbaum's measure of importance provides the probability that the system is in a critical state with respect to component k and that the failure of component k will then cause the system to fail. It is de ned as the partial derivative of system unreliability with respect to the failure probability of component k, i.e.
B d Ik (t) = @Fsys(t) = P r(1k ; x) = 1] ? P r(0k ; x) = 1] @F (t) k

(14)

20

where r(x) is the system structure function as a function of state vector (x1 ; x2 ; xi = 1 means the component xi has failed, and d (1k ; x) = (x1 ; x2 ; d (0k ; x) = (x1 ; x2 ;

; xn),

; xk?1; 1; xk+1 ; ; xn ) ; xk?1; 0; xk+1 ; ; xn )

r(x) = 1 represents the system has failed. Fsys(t) and Fk (t) are the failure function of the system and component k, this failure function contains both detectable and undetectable
faults of component. Because there are two types of faults that will cause di erent system behaviore, P r(1k ; x) = 1] should include the probability of system failure caused by both faults. There are two ways to calculate the Birnbaum's importance measure. One is directly using the de nition formula Eqn. (14): evaluating the BDD by setting component failure and function, then the di erence of these two evaluations is the Birnbaum's importance measure. This method need to traverse the BDD twice. The alternative method is using the algorithm shown in Fig. 10, and only need to traverse the BDD once.

4.3.2 Criticality importance measure
The criticality importance measure gives the probability of a component k being responsible for system failure before time t. It is de ned as

t) t) CR d B Ik (t) = @Fsys(t) FFk ((t) = Ik (t) FFk ((t) @Fk (t) sys sys

(15)

In order to calculate the criticality importance measure, it only need to follow the Eqn. B (15): use the algorithm shown in Fig. 10 to obtain the Ik (t) rst, then multiply FFk (t()t) . sys 21

CrProb(F, xk, flag) { /* F = ite(x, F1, F2) */ if (F == 1) return 0 else if (F == 0) { if (flag == 0) return 0 else return 1 } else if ((x is at behind of xk) and (flag == 0)) return 0 else if (cpomupted-table has entry {(flag, xk, F),R}) return R else if (x is the varibale for component xk) { P_F1 = CrProb(F1, xk, 1), P_F2 = CrProb(F2, xk, 1) if (x is the variables for state $3$ of component) P_F = (1 - C_xk) + C_xK * P_F2 - P_F1 else P_F = P_F2 - P_F1 } else { P_F1 = CrProb(F1, xk, flag), P_F2 = CrProb(F2, xk, flag) if (x is the variables for state 3 of same component) P_F = (1 - Cx) * P(x) + (1 - P(x)) * P_F1 + Cx * P(x) * P_F2 else P_F = P_F1 + (1 - Cx) * P(x) * (P_F2 - P_F1) } insert_computed_table({(flag, xk, F), P_F}) return P_F }

Figure 10: Algorithm for Birnbaum's importance measure

22

4.3.3 Structural importance measure
The structural importance measure allow us to consider the relative importance of various components when only the structure of the system is known. It is de ned as:
ST d Ik = 2n1?1

X
x

r(1k ; x) ? r(0k ; x)]

(16)

where r(x) is the system structure function. Eqn. (16) gives the de nition of structural importance but it is hard to use this formula to compute the structural importance measure because there are 2n?1 state vectors have to be evaluated. An alternative method is to assign a failure probability of 0.5 for all components xi (i = 1; 2; : : : ; n and i 6= k), and then use the algorithm shown in Fig. 10. The proof of this method was given in 13].

4.3.4 An example
The Table 1 shows the importance measures for the reliability graph in Fig. 1

4.4 Ordering strategies
The order of variables is very important for BDD generation. The size of BDD (the number of nodes in BDD) heavily depends on the order. But the problem of computing an ordering that minimizes the size of BDD is itself a co NP-complete problem. The previous study showed that a set of heuristics may be used to select an adequate ordering 2, 12], however all these heuristics are proposed for digital circuits that consist of logic gates and there is no similarity between DCS and logic circuits. New ordering strategies are needed for DCS. Observing the Eqn. 6, we see that this expression can be represented by a fault tree, so that some ordering heuristics developed for fault tree may be adopted for DCS. Insipired 23

Comp.

Failure Prob. 0.015

n1 n2 n3 n4 e1 e2 e3 e4 e5

0.015

0.015

0.015

0.010

0.015

0.020

0.010

0.015

Coverage 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99

Birnbaum's Impt. 1.22612129e-01 7.63365391e-02 3.90352529e-02 1.33968564e-01 8.90025201e-02 5.28013095e-02 1.30003197e-01 8.45710274e-02 4.79772126e-02 1.22401424e-01 7.60884221e-02 3.87541783e-02 1.10933942e-01 6.33451243e-02 2.49204652e-02 9.89085127e-02 4.98381101e-02 1.01819035e-02 9.91641656e-02 5.00930063e-02 1.04567944e-02 9.88082058e-02 4.97659544e-02 1.01138966e-02 1.10977738e-01 6.33433474e-02 2.48986107e-02

Criticality Impt. 1.31928966e-01 1.50313836e-01 2.30713924e-01 1.44148334e-01 1.75254346e-01 3.12076813e-01 1.39881654e-01 1.66528319e-01 2.83564474e-01 1.31702251e-01 1.49825271e-01 2.29052661e-01 7.95755983e-02 8.31550112e-02 9.81932635e-02 1.06424201e-01 9.81359335e-02 6.01791131e-02 1.42265706e-01 1.31517131e-01 8.24051046e-02 7.08775147e-02 6.53292348e-02 3.98514435e-02 1.19410521e-01 1.24729018e-01 1.47160726e-01

Structural Impt. 1.75511641e-01 2.01533201e-01 2.24437575e-01 1.98356172e-01 2.28343796e-01 2.54760184e-01 1.98356172e-01 2.28343796e-01 2.54760184e-01 1.75511641e-01 2.01533201e-01 2.24437575e-01 1.46971797e-01 1.68024363e-01 1.86534505e-01 3.31218672e-02 2.75454344e-02 2.13494383e-02 4.96382734e-02 4.73052489e-02 4.40150165e-02 3.31218672e-02 2.75454344e-02 2.13494383e-02 1.46971797e-01 1.68024363e-01 1.86534505e-01

Table 1: Sensitivity analysis of DPR of P1 in Fig. 1

24

by this relationship between fault tree and DCS, we design the ordering strategie for DCS. The Eqn. 6 actually gave trace of seaching the set of MFST. To avoid travsing the graph twice, we order the components (hosts or links) during which we use the algorithm shown in Fig. 6 to obtain the MFST, i.e. we put the components in the ordering list if we meet a new variable, or need the variable representing this component in BDD manipulation and this variables has not been in the list yet. There are two strategies for putting the varibales in ordering list. Strategy 1: always put at the head of the list so that the list serves as a stack. We push the varibale in the stack if we need this variable in BDD manipulation. Strategy 2: always put at the end of the list so that the list serves as a queue. We add the variable at the end of the queue if this is a new variable. For the example in Fig. 1, two orders are generated as: Strategy 1: e2 < n1 < e4 < n3 < e3 < n4 < e5 < n2 < e1 Strategy 2: n1 < n2 < e1 < e4 < n3 < n4 < e5 < e3 < e2

5 Examples
5.1 E ect of coverage
Example: Fig. 11 shows the system topology of a DCS. The program P1 need les fF1 , F2 , F3 g and P2 need les fF4 , F5 , F6 g. The failure rate of hosts is 0:01, and the failure
rate of communication links is 0:015. Fig. 12 shows the results of reliability of DCS with di erent coverage. 25

FA: F 2 , F 5 FA: F 1 , F 6 e1 FA: F 1 , F 5 PRG: P1 n1 e6 n3 FA: F 3 , F 4 PRG: P2 e7 n5 FA: F 3 , F 6 n2 e2 PRG: P2 n4 e3 e4 e5 e8 n6 FA: F 1 , F 4 PRG: P1

Figure 11: System topology of a DCS

5.2 Experimental results
The execution time of our algorithm is depend on system topology, distribution of programs and les and ordering of variables. We use the same benchmarks given in 7]. Please refer to appdix or 7] for detail of these benchmarks. Table 2 shows the experimental results for these benchmarks. Observing these results, we can obtain some feature of BDD algorithm: In most cases, the size of BDD is not very large if an appropriate order is used. The size of BDD heavily depends on the ordering of the variables.

6 Conclusion
We presented a BDD-based algorithm for dependability analysis of distributed computer system with imperfect coverage. Several related issues, such as generation of BDD for 26

1 0.9 0.8 0.7 Reliability 0.6 0.5 0.4 0.3 0.2 0.1 0 0 20 40 Time 60 80 100 c = 1.00 c = 0.95 c = 0.90 c = 0.85

Figure 12: System topology of a DCS
Network G8:4 G10:4 G8:6 G8:7 G10:7 G8:8 G10:8 G10:9 BDD (O1) # of Nodes Time(s) 56 0.06 72 0.06 124 0.07 269 0.10 309 0.12 725 0.23 891 0.36 2503 2.10 BDD (O2) # of Nodes Time(s) 65 0.06 73 0.06 345 0.10 1433 0.25 1529 0.38 3952 0.74 6571 1.83 28473 16.62

Table 2: Experimental results 27

systems or a given program, evaluation of unreliability, sensitivity analysis and ordering strategies, were discussed. Due to the nature of BDD, this algorithm is e cient in both computation and storage, which makes it possible for us to study some practical and large distributed systems.

Appdix
Gi:j is the benchmark network used in the expreiments, j i, where i is the number of nodes in the network, and j means G with n1 to nj being completely connected. Fig. 13 shows the benchmark network G8:6 . The program is located at n1 , and les F1 , F3 , F5 are
required to excuted it. Table 3 gives the le distribution.
Node Name n1 n2 n3 n4 n5 n6 n7 n8 n9 n10 Files F1 , F2 , F3 F2 , F3 , F4 F3 , F4 , F5 F4 , F5 , F6 F5 , F6 , F7 F6 , F7 , F8 F1 , F7 , F8 F1 , F2 , F8 F3 , F7 , F8 F1 , F4 , F7

Table 3: File distribution

28

n1 n8

n2 n3

n7 n6 n5

n4

Figure 13: Benchmark network G8:6

References
1] K. Brace, R. Rudell, and R. Bryant. E cient implementation of a bdd package. Proc. 27th ACM/IEEE Design Automation Conference, pages 40{45, 1990. 2] R. Bryant. Graph based algorithms for boolean function manipulation. IEEE Transactions on Computer, 35(8):677{691, 1987. 3] O. Coudert and J.C. Madre. Metaprime: An interactive fault-tree analyzer. IEEE Transactions on Reliability, 43(1):121{127, 1994. 4] S.A. Doyle and J.B. Dugan. Dependability assessment using binary decision diagrams. Proc. 25th International Symposium on Fault-Tolerant Computing, pages 249{258, 1995. 5] S.A. Doyle, J.B. Dugan, and M. Boyd. Combinatorial-models and coverage: a binary decision diagram (bdd) approach. Proc. 1995 Annual Reliability and Maintainability Symposium, pages 82{89, 1995. 29

6] J.B. Dugan and K.S. Trivedi. Coverage modeling for dependability analysis of faulttolerant systems. IEEE Transactions on Computers, 38(6):775{787, 1989. 7] W. Ke and S. Wang. Reliability evaluation for distributed computing networks with imperfect nodes. IEEE Transactions on Reliability, 46(3):342{349, 1997. 8] A. Kumar and D.P. Agrawal. A generalized algorithm for evaluating distributedprogram reliability. IEEE Transactions on Reliability, 42(3):416{426, 1993. 9] A. Kumar, A. Rai, and D.P. Agrawal. On computer communication network reliability under program execution constraints. IEEE Journal on Selected Areas in Communications, 6(8):1393{1399, 1988. 10] V.K.P. Kumar, S. Hariri, and C.S. Raghavendra. Distributed program reliability analysis. IEEE Transactions on Software Engineering, SE-12(1):42{50, 1986. 11] N. Lopez-Benitez. Dependability modeling and analysis of distributed programs. IEEE Transactions on Software Engineering, 20(5):345{352, 1994. 12] M.Bouissou. An ordering heuristic for building binary decision diagrams from faulttrees. Proc. 1996 Annual Reliability and Maintainability Symposium, pages 208{214, 1996. 13] K.B. Misra. Reliability Analysis and Prediction: A Methodology Oriented Treatment. Elsevier, Amsterdam, The Netherlands, 1992. 14] A. Rauzy. New algorithms for fault tree analysis. Reliability Engineering and System Safety, 40:203{211, 1993.

30

15] R.M. Sinnamon and J.D. Andrews. Improved accuracy in quantitative fault tree analysis. Quality and Reliability Engineering International, 13:285{292, 1997. 16] R.M. Sinnamon and J.D. Andrews. Improved e ciency in qualitative fault tree analysis. Quality and Reliability Engineering International, 13:293{298, 1997. 17] X. Zang, H. Sun, and K.S. Trivedi. A bdd-based algorithm for analysis of multistate systems with multistate components. Technical Report, 1998.

31


更多相关文档:

Distributed computing with imperfect randomness_免....pdf

Dependability Analysis o... 暂无评价 31页 免费... Distributed Computing with Imperfect Randomness Sha...computer science including probabilistic algorithms, ...

...a high-quality software tool for fault tree analysis.pdf

Thus, a system-level dependability analysis tool ...incorporate the important notion of imperfect ...distributed, we cannot provide an exact solution ...

...simulation Environment for Computer Control Systems.pdf

system timing and dependability, and hence, impact...The computer system is typically composed of a ...analysis of distributed real-time control systems....

...Analysis of Distributed Physical Systems with Ap....pdf

Dependability Analysis o... 暂无评价 31页 免费 Timed Automata with Data.... Proc. AAAI-98 Qualitative Analysis of Distributed Physical Systems with ...

Analysis of RAKE reception with MRC and imperfect weight ....pdf

imperfect weight estimation for binary coherent ort...Analysis of MRC with Gaussian distributed estimation...system over a fading channel with L branches, ...

RENEW A tool for fast and efficient implementation of check....pdf

W. Kent Fuchs School of Electrical and Computer ...distributed systems can be divided into a small ...dependability analysis of user-de?ned fault-...

...in Partitionable Asynchronous Distributed Systems.pdf

Functional Analysis of Concurrent Systems with EMPA... Partitionable Asynchronous Distributed Systems, A....with dependability requirements has also increased. ...

ICSE 2002 Workshop on Architecting Dependable Systems.pdf

computer systems since everyday life improve its ... what are the architectural distributed systems, ...performance and dependability analysis of the ...

Dependability Evaluation of Fault Tolerant Distributed ....pdf

Tolerant Distributed Industrial Control Systems 1_...In this paper we study the dependability of a ... Dependability modelling and analysis of complex nd...

Dependability Benchmarking & Prediction A Grand Challenge ....pdf

Dependability: Trustworthiness of a computer system such that reliance can ...merely a theoretical design measured against a (possibly imperfect) ...

...and Imperfect Clocks on Timestamp Ordering in Distributed ....pdf

Performance Models for Perfect and Imperfect Clocks...G. Spirakis Department of Computer Science and ...a model of a distributed database system which ...

DEPENDABILITY EVALUATION USING COMPOSED SAN-BASED REWARD ....pdf

evaluation of fault-tolerant parallel and distributed computer systems. ...S. Trivedi, Coverage modeling for dependability analysis of faulttolerant ...

Defect and Fault Seeding In Dependability Benchmarking.pdf

Defect and Fault Seeding In Dependability Bench...actual usage patterns is generally highly imperfect...This gives you a good comparative analysis of ...

Determining the Dependability of Service-Oriented ....pdf

Malcolm Munro Department of Computer Science, ...distributed applications and systems due to such ...2. Dependability Analysis We intend our method ...

PARALLEL AND DISTRIBUTED COMPUTING AND SYSTEMS.pdf

PARALLEL AND DISTRIBUTED COMPUTING AND SYSTEMS_专业... dependability and performability evaluations in ... A big problem with the analysis of PNs and ...

On Dependability in Distributed Databases.pdf

On Dependability in Distributed Databases_专业资料。...dependability of computer systems, and a large ... can be made from careful analysis of query and...

...and fault-tolerance issues in distributed VoD systems_图文....pdf

This work presents a tool which is suited for ...perform dependability analysis of distributed VoD ...Reliable Computer Systems Design and Evaluation. ...

Software Reliability Modeling with Imperfect Repair Actions_....pdf

a,0 Figure 3: Non-homogeneous Markov chain with imperfect repair section. ...formance and Reliability Analysis of Computer Systems: An Example-Based ...

2 Does imperfect debugging affect software reliability growth....pdf

2 Does imperfect debugging affect software ...An analysis of real project data is presented. ...In the development of a large software system ...

Verification of Behavior in Multi-Agent Systems.pdf

system rather than a true distributed agent system...Since agent actuators and sensors may be imperfect... an analysis of the stability and control of ...

更多相关标签:
网站地图

文档资料共享网 nexoncn.com copyright ©right 2010-2020。
文档资料共享网内容来自网络,如有侵犯请联系客服。email:zhit325@126.com