Reliability analysis of phased mission systems when components can be swapped upon failure

This paper proposes a new strategy to improve the reliability of phased mission systems (PMS), namely by swapping of components. In the proposed strategy, when a component fails, it can be swapped by another one in the system which is still functioning. We consider both the options to swap components at any time and for swaps to be possible only at phase transitions. This paper also discusses the strategy of swapping components according to structure importance. The structure importance is used to measure the importance level of the components in contributing to system reliability. Then when a component with high importance fails, it is swapped by another component with lower importance from the system which has not yet failed. The survival signature methodology is implemented to assess the reliability of PMS when there is a possibility of components swapping. In addition, we consider the cost effectiveness of component swapping through two models (time independent and time dependent) of penalty costs for PMS.


Introduction
A phased mission system (PMS) is defined as a system which performs a series of tasks in consecutive and nonoverlapping periods (phases). In order for a PMS to accomplish its mission successfully, each phase has to be completed without any failure [1]. Therefore, the reliability of a PMS is the probability that the system functions in all phases. A distinct feature of a PMS is that the system configuration varies between phases and the functioning of components in different phases is dependent. This feature makes the reliability analysis of PMS more complex than the reliability analysis of a single phase system.
Over the past few decades, there has been extensive research to analyse the reliability of a PMS. Some researchers focus on modelling the dependence among system components using state-based approaches, which are based on Markov models or Petri nets [2][3][4][5]. Other approaches are based on combinatorial methods, such as binary decision diagram or multi-valued decision diagram based models [6][7][8][9]. Recently, a combinatorial analytical approach providing a new survival signature methodology for reliability analysis of PMS has been introduced [10]. The use of the survival signature is attractive as it separates the system structure from the component lifetime distributions, which simplifies the PMS reliability computation and related statistical analysis [11][12][13].
It is often difficult for a PMS to work with high reliability. Generally, two main approaches are used to improve the reliability of a PMS. The first is to increase the component reliability (reliability allocation), and the second is to use redundant components in parallel (redundancy allocation), e.g. [14][15][16][17]. Unfortunately, both approaches increase the cost of the PMS and do not always yield competitive results. Recently, a new attractive strategy called 'component swapping' has been introduced to improve the reliability of a single phased mission system [18]. Component swapping is defined as the possibility to swap a component upon failure by another one in the system which has not yet failed [18]. The strategy of component swapping is attractive since it does not increase the weight and volume of the system. In this paper, we propose the approach of swapping components upon failure to prevent the PMS from failing. In addition, we discuss the strategy of swapping components according to structure importance, as an example of a method to decide which components may be considered for swapping. Ideally, components could be swapped at any time during the mission. However, this may not be possible in some PMS, so we also consider the case that swaps between components are only possible at transitions of phases. We use the survival signature methodology for PMS, as introduced in [10], to analyse the effect of swapping components on the PMS reliability in both scenarios. Some costs are likely related to enabling each swapping scenario, so we also establish models for evaluating the expected costs for the PMS in both scenarios.
This paper is organized as follows: Section 2 presents a brief background on PMS. Section 3 considers the effect of swapping components upon failure on the PMS reliability. Section 4 presents the effect of swapping components according to structure importance on the PMS reliability. Section 5 presents two cost models to analyse the expected costs of the failure of the PMS if the components can be swapped. The material in Sections 3-5 is all novel and forms the main contribution of this paper to the literature. We present illustrative examples in each section. We end the paper with some concluding remarks in Section 6.
Copyright © by ASME

Phased mission systems
A PMS performs a sequence of functions or tasks during consecutive phases to accomplish a specific mission. Generally, in a PMS, each phase corresponds to one configuration and the configuration changes from phase to phase. The states of the same component in different phases are mutually dependent. The PMS might have the same components in each phase or the components might vary from phase to phase. In this paper we consider only PMS with the same components used in each phase, as this simplifies the survival signature approach, as shown in [10], while still making it possible to show fully how component swapping can be implemented in PMS and how the resulting system reliability improvement can be quantified. The presented approach can quite straightforwardly be generalized to PMS with different components in each phase using the survival signature method introduced in [10]. The key contribution of this paper is the introduction of component swapping in PMS, which has not been considered before. It needs to be emphasized that, in this paper, both the system and its components are assumed to be non-repairable during the mission, so if a component fails to function at the end of a certain phase, then it cannot work again in subsequent phases.
Consider a PMS with $n$ components and $N \geq 2$ phases. The state of component $j \in \{1, 2, \ldots, n\}$ in phase $i \in \{1, 2, \ldots, N\}$ can be represented by a binary variable $X_{i,j}$, such that $X_{i,j} = 1$ if component $j$ functions for all of phase $i$ and $X_{i,j} = 0$ if component $j$ fails before the end of phase $i$. The state of the system in phase $i$ can then be described by a binary function
$$\phi_i = \phi_i(X_i) = \phi_i(X_{i,1}, \ldots, X_{i,n}), \qquad (1)$$
where $\phi_i = 1$ represents that the system functions successfully for the entire phase $i$ and $\phi_i = 0$ represents failure to do so. The vector $X_i = (X_{i,1}, \ldots, X_{i,n})$ represents the states of all components at the end of phase $i$. Similarly, the structure function of the PMS is also a binary variable which is completely determined by the states of all the components during the mission,
$$\phi_s = \phi_s(X), \qquad (2)$$
where $X = (X_1, \ldots, X_N) = (X_{1,1}, \ldots, X_{1,n}, \ldots, X_{N,1}, \ldots, X_{N,n})$ is the state vector of the components during the entire phased mission. Because a PMS is functioning if and only if all its phases are completed without failure, the structure function of the PMS can be written as
$$\phi_s(X) = \prod_{i=1}^{N} \phi_i(X_i). \qquad (3)$$
So, $\phi_s = 1$ indicates successful functioning of the system to complete the mission, while $\phi_s = 0$ indicates failure of the system to complete the mission successfully.
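The product form of the PMS structure function can be illustrated with a minimal sketch. The three-component, two-phase system used here is hypothetical (it is not the system of Fig. 1), and the function names are chosen for illustration only.

```python
from typing import Callable, Sequence

PhaseFn = Callable[[Sequence[int]], int]


def pms_structure(phase_fns: Sequence[PhaseFn], X: Sequence[Sequence[int]]) -> int:
    """PMS structure function as the product of the phase structure functions."""
    result = 1
    for phi_i, x_i in zip(phase_fns, X):
        result *= phi_i(x_i)
    return result


# Hypothetical configurations (assumption, for illustration only).
def phi_phase1(x: Sequence[int]) -> int:
    # Phase 1: all three components in parallel.
    return max(x)


def phi_phase2(x: Sequence[int]) -> int:
    # Phase 2: component 1 in series with the parallel pair (2, 3).
    return x[0] * max(x[1], x[2])


PHASES = [phi_phase1, phi_phase2]

# Component 2 fails in phase 2: the mission still succeeds.
ok = pms_structure(PHASES, [(1, 1, 1), (1, 0, 1)])
# Component 1 fails in phase 1: phase 1 is completed, but phase 2 is not.
lost = pms_structure(PHASES, [(0, 1, 1), (0, 1, 1)])
```

Each inner tuple is the end-of-phase state vector of one phase, so a component that is 0 in one phase must stay 0 in all later phases.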

Swapping components upon failure
In this section, we consider the strategy of swapping components upon failure to increase PMS reliability under two scenarios. First, we assume that if a component fails at any time during the mission, it can be swapped by another one which is still functioning. Secondly, we assume that component swapping is only possible at transitions of phases. A swap between components is logically restricted to components of the same type and it can only be done if the system cannot function with the existing components in place.
While this paper presents the mathematical theory for component swapping in PMS, it is important to briefly consider the type of systems that may well enable application of this theory. A first category is critical systems which cannot easily be maintained, for example systems for use in space. One could consider computer systems as components of such systems being set up such that a computer intended for use of a less critical task could take over a critical task upon failure of the computer intended for that task. As a very different system, one could think about staff in an organisation, where it may be possible to provide additional training for some staff, enabling them to take over duties for other staff if necessary, for example if the latter fall ill during crucial periods.
As an example close to home, at a university department it may be needed that other staff take over exam marking duties, under tight deadlines, if the main examiner is ill. A further example would be sound or light systems for concerts, where spare parts or equipment may not be available but the position of a failing component may be more crucial than of another component, which could possibly be quickly swapped into the location of the failed component, although this may only be possible at an interval in the performance.

PMS with a single type of components
Consider a system with $n$ components of the same type that performs an $N \geq 2$ phase mission. Phase $i \in \{1, 2, \ldots, N\}$ runs from time $\tau_{i-1}$ to time $\tau_i$, with $\tau_0 = 0$ and $\tau_{i-1} < \tau_i$ for all $i \in \{1, 2, \ldots, N\}$. The survival signature $\Phi_S(l_1, l_2, \ldots, l_N)$ denotes the probability that the PMS functions by the end of the mission given that precisely $l_i$, $i \in \{1, 2, \ldots, N\}$, of its components functioned in phase $i$. It is assumed that the random failure times of components in the same phase are independent and identically distributed [10]. If $N(t) \leq N$ is the phase that the system is in at time $t$, the survival signature of the first $N(t)$ phases is equal to
$$\Phi_S(l_1, l_2, \ldots, l_{N(t)}) = \left[ \prod_{i=1}^{N(t)} \binom{m_i}{l_i} \right]^{-1} \sum_{X \in S} \phi_s(X), \qquad (4)$$
where $S$ denotes the set of all possible state vectors for which $l_i$ components function in phase $i$, and $m_i$ is the number of components that function at the beginning of phase $i$. Because both the system and its components are non-repairable during the mission, the number of components that function at the beginning of phase $i$ is equal to the number of components that function at the end of phase $i-1$. So $m_i = l_{i-1}$ for $i \geq 2$, while $m_1 = n$ [10].
The reliability of the PMS at time $t$ is given by Equation (5), which involves the CDF of the component failure times in phase $i$, conditioned on the system working at the beginning of phase $i$. This conditional CDF is given by Equation (6), and with it Equation (5) can be rewritten as Equation (7).
We assume that there are fixed swapping rules, which prescribe upon failure of a component precisely which other component takes over its role in the system, if possible and if the other component is still functioning, in order to prevent the system from failure. We further assume that such a swap of components takes negligible time and does not affect the functioning of the component that changes its role in the PMS, nor its remaining time until failure.
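The survival signature of a small PMS with a single component type can be computed by brute-force enumeration, as in the following sketch. It rests on assumptions not taken from the paper: the two-phase configuration is hypothetical, `r[i]` stands for the probability that a component functioning at the start of phase i survives that phase, and the reliability is evaluated at the end of the mission using the standard survival signature form (a sum over counts of the survival signature times binomial probabilities).

```python
from itertools import combinations, product
from math import comb, exp, prod


def monotone_states(n, counts, alive=None):
    """Yield X = (X_1, ..., X_N) with counts[i] functioning components in
    phase i, where a failed component never functions again."""
    if alive is None:
        alive = tuple(range(n))
    if not counts:
        yield ()
        return
    for keep in combinations(alive, counts[0]):
        x = tuple(1 if j in keep else 0 for j in range(n))
        for rest in monotone_states(n, counts[1:], keep):
            yield (x,) + rest


def survival_signature(phase_fns, n, counts):
    """Fraction of admissible state vectors for which every phase functions."""
    m = (n,) + tuple(counts[:-1])  # m_1 = n, m_i = l_{i-1}
    norm = prod(comb(mi, li) for mi, li in zip(m, counts))
    if norm == 0:
        return 0.0
    good = sum(
        all(phi(x) for phi, x in zip(phase_fns, X))
        for X in monotone_states(n, tuple(counts))
    )
    return good / norm


def mission_reliability(phase_fns, n, r):
    """End-of-mission reliability for i.i.d. components within each phase."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(phase_fns)):
        m = (n,) + counts[:-1]
        if any(li > mi for li, mi in zip(counts, m)):
            continue  # more survivors than components alive: impossible
        weight = prod(
            comb(mi, li) * ri**li * (1 - ri) ** (mi - li)
            for mi, li, ri in zip(m, counts, r)
        )
        total += survival_signature(phase_fns, n, counts) * weight
    return total


# Hypothetical system: phase 1 is a 3-component parallel system,
# phase 2 is component 1 in series with the parallel pair (2, 3).
PHASES = [lambda x: max(x), lambda x: x[0] * max(x[1], x[2])]
r = (exp(-0.02), exp(-0.001))  # assumed per-phase survival probabilities
R = mission_reliability(PHASES, 3, r)
```

For this small system the result can be checked by hand: the mission succeeds exactly when component 1 survives both phases and at least one of components 2 and 3 does too. Enumeration grows quickly with the number of components, so this is only suitable for small illustrative systems.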
The assumption of negligible time needed for a component swap is mainly for mathematical convenience, but is reasonable if a swap of components only takes a very small amount of time compared to the period over which the system functions. We can take the effect of the defined swaps, whether they are applicable at any time during the mission or only at transitions of phases, into account through the PMS structure function, and hence in the computation of the system reliability through the PMS survival signature. Let $\phi_s^{(W)}(X)$ and $\phi_s^{(E)}(X)$ be the structure functions of the PMS considering the defined swaps at any time during the mission or only at transitions of phases, respectively. Then $\phi_s^{(E)}$ will typically be equal to 1 for some $X$ for which $\phi_s$ was equal to 0, and $\phi_s^{(W)}$ will typically be equal to 1 for some $X$ for which $\phi_s^{(E)}$ was equal to 0, so $\phi_s^{(W)} \geq \phi_s^{(E)}$. The corresponding survival signatures, given in Equations (8) and (9), follow from Equation (4) with $\phi_s$ replaced by $\phi_s^{(W)}$ and $\phi_s^{(E)}$, respectively. The reliability of the PMS can be calculated straightforwardly in both scenarios by replacing the survival signature of the original PMS in Equation (7) by the survival signatures in Equations (8) and (9). It is important to notice that the swap in both scenarios is entirely reflected in the PMS survival signature, and the conditional failure time distribution of the components remains the same as for the original system.

PMS with multiple types of components
Consider a system with $N \geq 2$ phases and with $K$ types of components in each phase. Components are said to be of the same type if their failure times are exchangeable [11][12][13]. In practice, this can perhaps be best understood as follows. Suppose there are, say, 6 components of the same type in a system. If you receive the information that any number of these 6 components has failed by a certain time, then you have no idea which specific components have actually failed: all subsets of the same size are equally likely to be the failed components.
Let phase $i$ run from time $\tau_{i-1}$ to time $\tau_i$, with $\tau_0 = 0$ and $\tau_{i-1} < \tau_i$ for all $i \in \{1, 2, \ldots, N\}$. Let $\Phi_S(l_{1,1}, \ldots, l_{1,K}, \ldots, l_{N,1}, \ldots, l_{N,K})$ denote the probability that the PMS functions given that precisely $l_{i,k}$ components of type $k$ function at the end of phase $i$, for all $i \in \{1, 2, \ldots, N\}$ and $k \in \{1, 2, \ldots, K\}$. Because the failure times of the components of the same type are assumed to be exchangeable, the survival signature of the first $N(t)$ phases is
$$\Phi_S(l_{1,1}, \ldots, l_{N(t),K}) = \left[ \prod_{i=1}^{N(t)} \prod_{k=1}^{K} \binom{m_{i,k}}{l_{i,k}} \right]^{-1} \sum_{X \in S} \phi_s(X), \qquad (10)$$
where $N(t) \leq N$ is the phase that the system is in at time $t$ and $S$ denotes the set of all possible state vectors for which $l_{i,k}$ components of type $k$ function at the end of phase $i$. The number of components of type $k$ that function at the beginning of phase $i$ is $m_{i,k}$. As pointed out in Section 3.1, because both the system and its components are non-repairable during the mission, the number of components of type $k$ that function at the beginning of phase $i$ is equal to the number of components of type $k$ that function at the end of phase $i-1$. So $m_{i,k} = l_{i-1,k}$ for $i \geq 2$, while $m_{1,k} = n_k$ is the number of components of type $k$ in the system.
The reliability of the PMS is given by Equation (11), which involves the CDF of the failure times of the components of each type in phase $i$, conditioned on the system working at the beginning of phase $i$. This conditional CDF is given by Equation (12), and with it Equation (11) can be simplified to Equation (13).
As in Section 3.1, if it is assumed that there are fixed swapping rules and that a swap of components takes negligible time, then we can study the effect of the defined swaps, whether they are applicable at any time during the mission or only at transitions of phases, through the PMS structure function, and hence take it into account in the computation of the system reliability through the PMS survival signature. This approach is illustrated and explained in more detail in the next two examples.
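One simple way to encode a swap in a phase structure function is sketched below. This is an assumed simplification rather than the paper's construction: a swap between positions `a` and `b` is modelled by letting the phase succeed if it functions either with the components in place or with the roles of the two components exchanged, and the phase configuration is hypothetical.

```python
from itertools import product


def with_swap(phi, a, b):
    """Wrap a phase structure function so that the components in positions
    a and b may exchange roles when the system would otherwise fail."""

    def phi_swap(x):
        y = list(x)
        y[a], y[b] = y[b], y[a]  # state vector after exchanging the two roles
        return max(phi(x), phi(tuple(y)))

    return phi_swap


# Hypothetical phase: component 1 in series with the parallel pair (2, 3).
def phi(x):
    return x[0] * max(x[1], x[2])


phi_sw = with_swap(phi, 0, 1)

# Component 1 has failed, components 2 and 3 still function:
# without a swap the phase fails, with the swap component 2 takes over.
before = phi((0, 1, 1))
after = phi_sw((0, 1, 1))

# The swap can only help: phi_sw(x) >= phi(x) for every state vector.
never_worse = all(phi_sw(x) >= phi(x) for x in product((0, 1), repeat=3))
```

Because the wrapper returns an ordinary structure function, it can be passed unchanged to any routine that computes a survival signature from phase structure functions. Restricting swaps to phase transitions would need the role assignment to be fixed at the start of each phase, which this sketch does not model.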

Example 1
Consider the PMS in Fig. 1 that consists of three components performing a three-phase mission. All the components are of the same type and work independently from one another in each phase. The lifetime distribution of the components in each phase is Exponential, with failure rates in phases 1, 2 and 3 of $2 \times 10^{-3}$/hour, $1 \times 10^{-4}$/hour and $2 \times 10^{-4}$/hour, respectively. Each of the three phases lasts 10 hours.
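As a quick numerical check on the scale of these failure rates, the sketch below computes the probability that a single component survives each phase. It assumes that the conditional lifetime in each phase is Exponential with the stated rate, restarting at the beginning of the phase; the variable names are illustrative.

```python
from math import exp

rates = (2e-3, 1e-4, 2e-4)      # failure rates per hour in phases 1, 2, 3
durations = (10.0, 10.0, 10.0)  # phase durations in hours

# Probability that a component functioning at the start of a phase
# is still functioning at the end of that phase.
phase_survival = [exp(-lam * d) for lam, d in zip(rates, durations)]
# approximately [0.980199, 0.999000, 0.998002]

# Probability that a single component survives the whole mission.
mission_survival = exp(-sum(lam * d for lam, d in zip(rates, durations)))
# approximately 0.977262
```

Most of the failure risk for a single component is therefore concentrated in phase 1, which has a failure rate ten to twenty times higher than the later phases.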
The method in Section 3.1 is used to examine the reliability of this PMS if components 1 and 2 can be swapped upon failure at any time during the mission or only at transitions of phases.
In both scenarios, the opportunity of the swap is taken into account through the structure functions. For example, the state vector (0, 1, 0) represents the situation when components 1 and 3 fail during phase 1, but component 2 is still functioning; in this case, $\phi_1(0, 1, 0) = \phi$ [...]
[Table 1: survival signatures for the first phase, the first two phases, and the PMS]
The first group of results are the survival signatures of phase 1, the second group of results contains the survival signatures of the first two phases and the last group of results represents the survival signatures of the whole PMS. Entries for which the survival signatures are 0 are omitted. The reliability $R$ of the original PMS is shown in Table 2 and presented by the solid line in Fig. 2. As shown in Table 2 and Fig. 2, there are reliability jumps at $t = 10$ and $t = 20$. The reason is that if component 1 has failed in phase 1, the PMS may still function in phase 1, but it will fail immediately when it steps into phase 2. Similarly, if component 2 has failed in phase 2, the system can still function in phase 2 when component 3 is functioning, but it will fail immediately when it steps into phase 3.
The reliabilities of the PMS when the swap is applicable at any time during the mission or only at transitions of phases, $R^{(W)}$ and $R^{(E)}$, respectively, are also given in Table 2 and presented in Fig. 2. In Fig. 2, R i , R

Example 2
Consider the PMS in Fig. 3 that consists of five components performing a three-phase mission. The components follow Weibull and Exponential distributions and can be divided into two types according to the distribution of the lifetime. Table 3 summarizes the distribution information of the components in each phase. For the Weibull distribution, $F(t) = 1 - e^{-(t/\beta)^{\alpha}}$, where $\alpha$ and $\beta$ are the shape parameter and scale parameter, respectively. For the Exponential distribution, $F(t) = 1 - e^{-\lambda t}$, where $\lambda$ is the failure rate. Assume that phases 1, 2 and 3 last for 10, 270 and 20 hours, respectively. The method in Section 3.2 is used to examine the reliability of this PMS if components 1 and 2 are swappable, and components 3 and 4 are swappable, with swaps either possible upon failure at any time during the mission or only at transitions of phases. For example, in phase 1, if the swap is applicable at any time during the mission, then we can swap component 1 to 2 when component 2 fails but component 1 still functions, and we can swap component 3 to 4 when components 1, 4 and 5 fail but component 3 still functions. The reliability of the PMS is shown in Table 4 and Fig. 4, using the same notation as in Table 2 and Fig. 2.
Table 3. The distribution information of the components in Fig. 3
The results show that there is a reliability jump at the transition of phases 2 and 3 in the original PMS. The reason is that if components 1 and 3, or components 2, 4 and 5, have all failed in phase 2, then the PMS still functions in phase 2, but it will fail immediately when it steps into phase 3. The operation of component swapping [...]

Swapping components according to structure importance
When considering swapping opportunities for components, an important practical question is how to determine which components can actually be swapped. This can only meaningfully be assessed in practice for real-world systems, so it is beyond the scope of this paper. However, in order to show how such decisions could be linked to system properties, we illustrate the idea that defining possible swaps could be based on components' importance measures. In particular, we consider the strategy of swapping components according to structure importance to improve the reliability of PMS. In this strategy, the structure importance is used to measure the importance level of the components in contributing to system reliability. Then, when a component with high importance fails, it is swapped by another component with lower importance from the system which has not yet failed. If components are swapped according to the structure importance criterion, the swap takes place regardless of whether or not the system can continue to function with the existing components in place; it depends only on the level of importance of the component that has failed. In contrast, as we have seen in the previous section, if components are swapped upon failure, then the swap takes place only if the PMS cannot continue to function with the existing components in place.
Since it is assumed that only components of the same type are swappable, the structural importance, which measures the relative importance of components with respect to their positions, is sufficient to prioritize the components in each phase [19]. The structural importance of component $j \in \{1, 2, \ldots, n\}$ for the configuration in phase $i \in \{1, 2, \ldots, N\}$, denoted by $I_j^{(i)}$, is defined as
$$I_j^{(i)} = \frac{1}{2^{n-1}} \sum_{x^j} \left[ \phi_i(1_j, x^j) - \phi_i(0_j, x^j) \right], \qquad (14)$$
where $\phi_i(\cdot)$ is the structure function of the system in phase $i$; $x^j$ represents the component state vector with $x_j$ removed; $(1_j, x^j)$ and $(0_j, x^j)$ represent the component state vector when component $j$ in phase $i$ is in state 1 or 0, respectively; and $2^{n-1}$ is the total number of different state vectors with $n-1$ elements.
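The structural importance of a component can be evaluated by direct enumeration of the other components' states, as in the sketch below. The phase configuration is hypothetical (it is not one of the configurations of Fig. 3) and the function name is illustrative.

```python
from itertools import product


def structure_importance(phi, n, j):
    """Fraction of the 2**(n-1) states of the other components in which
    component j (0-based index) is critical for the phase structure phi."""
    critical = 0
    for rest in product((0, 1), repeat=n - 1):
        up = rest[:j] + (1,) + rest[j:]    # component j functioning
        down = rest[:j] + (0,) + rest[j:]  # component j failed
        critical += phi(up) - phi(down)
    return critical / 2 ** (n - 1)


# Hypothetical phase: component 1 in series with the parallel pair (2, 3).
def phi(x):
    return x[0] * max(x[1], x[2])


importances = [structure_importance(phi, 3, j) for j in range(3)]
# Component 1 is the most important; components 2 and 3 are equally important.
```

Under the swapping rule described in this section, such a ranking would make the series component the one to be replaced upon failure by one of the two parallel components.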
After the components are prioritized by structural importance, the swapping rules are defined upon this prioritization, so it is assumed that when a component with high importance fails, it is swapped by another component of the same type with lower importance which has not yet failed. It is assumed further that the swap between components takes negligible time and does not affect the functioning state of the component that changes its role in the PMS, nor its remaining time until failure.
After defining the swapping rules according to the structural importance, we can calculate the reliability of a PMS in the same way as in Section 3. We take the effect of the defined swaps, whether they are applicable at any time during the mission or only at transitions of phases, into account through the PMS structure function, and hence in the computation of the system reliability through the PMS survival signature. This approach is illustrated and explained in more detail in the following example.

Example 3
Consider again the system in Fig. 3, keeping the same phase durations and the same conditional lifetime distributions of the components in each phase as in Example 2. Structural importance analysis is conducted to measure the importance of the components in contributing to system reliability in each phase; the results are shown in Table 5, from which the order of the structure importances in each phase follows.
[Table 5: structure importance for the configuration in each phase]
The corresponding reliability results are given in Table 6. Comparing these results with the results in Example 2, in which the components are swapped upon failure, we find that the reliability in phases 1 and 2, if components 1 and 2 are swappable and components 3 and 4 are swappable according to their structural importances, is exactly the same as if these components are swapped upon failure. In phase 3, however, the results are different. The reason is that some swaps happen when the components are swapped upon failure but do not happen when the components are swapped according to their structural importance, and vice versa. For example, suppose the swaps are applicable at any time during the mission, and components 1 and 4 function while components 2, 3 and 5 have failed in phase 3. When the components are swapped upon failure, the PMS continues to function, since there is no need for component swapping in this case. When the components are swapped according to their structural importances, however, the system will have failed, since in this case component 4 has taken over the role of component 3, because component 4 is classified as less important than component 3.
The results also show that the reliability jump at the transition of phases 2 and 3 in the original PMS is reduced when the components are swapped according to their structural importance. However, the reduction gained if the components are swapped upon failure is larger than if they are swapped according to their structure importance. The reason is that, when the components are swapped according to their structure importances and components 2, 4 and 5 have all failed during phase 2, components 2 and 4 cannot be swapped by components 1 and 3, respectively, when needed, whereas this is possible when the components are swapped upon failure.

Cost penalty for failure of PMS with component swapping
In this section, we establish models for evaluating the expected penalty costs of system failure and component swaps. This is important for decisions on whether or not to facilitate component swaps. We consider two basic cost scenarios for system down-time, namely a single fixed penalty cost, independent of the length of the system down-time, and a penalty cost proportional to down-time.

Time independent penalty costs
Suppose that we have a PMS which needs to perform a sequence of missions in a certain period of time $[\tau_0, \tau_N]$. The system must function during all the phases. If the system fails at any time during phase $i$ before $\tau_N$, then a fixed penalty cost must be paid. Let this cost be the sum $\sum_{j=i}^{N} p_j$ (Equation (15)), where $p_j$, $j = i, \ldots, N$, is a specific cost resulting from phase $j$ not being completed. We assume that $p_j$ is independent of the failure time during phase $j$. Let $\tau_i^{-}$ represent the last moment in phase $i$ and let $C_S$ denote the expected cost of failure of the PMS, which is given by Equation (16).
As described in Sections 3 and 4, the reliability of a PMS can be improved by swapping components either at any time during the mission or only at the transitions of phases. An upfront cost may need to be paid to enable each swapping scenario. Let $b$ denote the cost to enable a regime of specified swaps at any time during the mission and $e$ denote the cost to enable a regime of specified swaps only at the transitions of phases. Let $C_S^W$ and $C_S^E$ denote the expected costs of the PMS in these scenarios, respectively. These expected costs are given by Equations (17) and (18), respectively.
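A minimal sketch of the time independent cost calculation follows. It assumes that a failure attributed to phase i incurs the penalties of phase i and of all later phases, and that the expected cost is the upfront cost plus the probability-weighted penalties. The failure probabilities in the usage lines are hypothetical inputs chosen for illustration, not results from the paper.

```python
def phase_failure_costs(p):
    """Penalty for losing the mission in each phase: the sum of p_j over
    the failed phase and all later phases."""
    costs, running = [], 0.0
    for p_j in reversed(p):
        running += p_j
        costs.append(running)
    return costs[::-1]


def expected_cost(p, fail_probs, upfront=0.0):
    """Upfront cost plus the expected penalty, where fail_probs[i] is the
    probability that the mission is lost in phase i (an assumed input)."""
    penalties = phase_failure_costs(p)
    return upfront + sum(c * q for c, q in zip(penalties, fail_probs))


p = (1000.0, 800.0, 500.0)               # per-phase penalty costs
hypothetical_probs = (0.01, 0.02, 0.005)  # illustrative values only

no_swap = expected_cost(p, hypothetical_probs)
with_upfront = expected_cost(p, hypothetical_probs, upfront=3.0)
```

In an application, the failure probabilities would come from the reliability function of the relevant scenario, so that enabling swaps is worthwhile whenever the reduction in expected penalty exceeds the upfront cost.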

Time dependent penalty costs
In practical engineering, the cost penalty for failure of a PMS may be time dependent. Let the penalty cost per unit of time in phase $i$ be $u_i$, $i \in \{1, 2, \ldots, N\}$, so that the penalty incurred depends on the time at which the system fails. If the system fails during phase $i$, the expected penalty costs that need to be paid are given by Equation (19), where $\tau_{i-1}^{+}$ represents the first moment in phase $i$, $\tau_i^{-}$ represents the last moment in phase $i$, and $f(t)$ is the probability density function of the failure time of the PMS. If the system fails at $\tau_i$, then the expected penalty costs are given by Equation (20). The expected cost $C_S$ for a PMS is then given by Equation (21). As with time independent penalty costs, if $b$ denotes the cost to enable a regime of specified swaps at any time during the mission and $e$ denotes the cost to enable a regime of specified swaps only at the transitions of phases, then the expected costs $C_S^W$ and $C_S^E$ in both scenarios are given by Equations (22) and (23), respectively.

Example 4
Consider again the PMS with the single type of components in Fig. 1. We keep the same scenario for the duration of all three phases and for the conditional lifetime distribution of the components in each phase as in Example 1. We also consider the same scenario for the swapping opportunity as in Example 1, namely components 1 and 2 can be swapped upon failure. Assume that the penalty costs allocated to phases 1, 2 and 3 are $p_1 = 1 \times 10^3$, $p_2 = 8 \times 10^2$ and $p_3 = 5 \times 10^2$, respectively. These penalty costs are independent of the failure time during or before the phases. If the cost to enable the swap at any time during the mission is $b = 50$, and the cost to enable the swap only at the transitions of phases is $e = 3$, then the expected cost for the original PMS and the expected costs in both swap scenarios are $C_S = 39.28$, $C_S^W = 52.1$ and $C_S^E = 7.72$, respectively. Thus, the best option is to enable the swap only at the transitions of phases. If the penalty costs are time dependent, with $u_1 = 1 \times 10^3$, $u_2 = 8 \times 10^2$ and $u_3 = 5 \times 10^2$, then $C_S = 378.71$, $C_S^W = 68.07$ and $C_S^E = 35.49$. Therefore, while both swapping scenarios would reduce the expected costs, the maximum reduction is obtained when the swap is applicable only at the transitions of phases.
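The comparison for the time independent case can be retraced with a few lines of arithmetic on the reported values. The split into an upfront part and a failure-penalty part assumes that each expected cost is the upfront cost plus the expected penalty; the dictionary keys are illustrative labels.

```python
# Expected costs reported in Example 4 (time independent penalties).
reported = {
    "no swap": 39.28,
    "swap at any time": 52.1,
    "swap at transitions": 7.72,
}
# Upfront costs to enable each option (b = 50, e = 3).
upfront = {
    "no swap": 0.0,
    "swap at any time": 50.0,
    "swap at transitions": 3.0,
}

# Option with the lowest total expected cost.
best = min(reported, key=reported.get)

# Expected failure penalty alone, assuming total = upfront + penalty.
penalty_part = {k: round(reported[k] - upfront[k], 2) for k in reported}
```

Seen this way, swapping at any time gives the smallest expected failure penalty, but its upfront cost of 50 outweighs that gain, which is why swapping only at transitions has the lowest total expected cost.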

Example 5
Consider again the same PMS with multiple types of components in Fig. 3. We keep the same scenario for the duration of all three phases and for the conditional lifetime distributions of the components in each phase as in Examples 2 and 3. Also, we consider the same scenario for the swapping cases as in Examples 2 and 3, namely that components 1 and 2 are swappable, and components 3 and 4 are swappable. If the fixed penalty costs are independent of the failure time during the phases, set at $p_1 = 1 \times 10^3$, $p_2 = 8 \times 10^2$ and $p_3 = 5 \times 10^2$, then the expected cost for the original PMS is $C_S = 15.87$. If $b = 50$ and $e = 3$, then the expected costs for both swap scenarios, when the components are swapped upon failure as in Example 2, are $C_S^W = 51.32$ and $C_S^E = 15.15$. If the components are swapped according to the structure importance, as in Example 3, then the expected costs for both swap scenarios are $C_S^W = 52.28$ and $C_S^E = 16.77$. Therefore, for swapping upon failure, it is best to enable the swap only at the transitions of phases. When the swap is according to the structure importance, however, it is better not to enable either swap scenario. This is because the improvement in reliability gained when the components are swapped upon failure is larger than when they are swapped according to the structure importance.
If the penalty costs are dependent on the failure time during the phases, set at $u_1 = 1 \times 10^3$, $u_2 = 8 \times 10^2$ and $u_3 = 5 \times 10^2$, then the expected cost for the original PMS is $C_S = 615.38$. The expected costs for both swap scenarios when the components are swapped upon failure are $C_S^W = 90.63$ and $C_S^E = 526.99$, and when the components are swapped according to the structure importance the expected costs are $C_S^W = 105.41$ and $C_S^E = 473.02$. For both swapping strategies, the best option is to enable the swap at any time during the mission, and under that option the expected cost is lower when the components are swapped upon failure (90.63) than when they are swapped according to the structure importance (105.41).

Concluding remarks
A phased mission system (PMS) is one that performs several different tasks or functions in sequence. In order to accomplish the mission successfully, the system has to complete every phase without failure. Therefore, it is often difficult for a PMS to work with high reliability. In this paper, we applied the strategy of swapping components upon failure, introduced in [18], to improve the reliability of PMS. In addition, we discussed the strategy of swapping components according to structure importance, as an example of the use of component reliability characteristics for determining possible swaps. In practice, specific system design will mostly determine which components may be swappable. This will be an important topic for future research, which necessarily must be considered in direct relation to a real-world PMS. The survival signature methodology for PMS, introduced in [10], is used to analyse the effect of component swapping on the reliability of the PMS, comparing the scenario when the swap between components is applicable at any time during the mission with the scenario when it is applicable only at transitions of phases. Component swapping is an attractive way to increase the reliability of the PMS, since neither the reliability nor the number of components needs to be increased.
In this paper, we derived two models (time independent and time dependent) of penalty costs for PMS, in order to compare the expected costs for the PMS when there is a possibility to swap components with the option not to enable swaps. This shows that although an upfront cost might need to be paid to enable each swapping scenario, the operation of component swapping either at any time during the mission or only at transitions of phases might contribute significantly to reducing the expected costs of the PMS. These indicators are useful in security assessment and risk management under the constraint of costs.
A major topic for future research is up-scaling this approach to large real-world systems. In principle there are no difficulties as far as the effect of swapping opportunities is concerned. The challenge is mostly in the computation of the survival signatures for the system without swaps enabled and with swaps enabled. If only a few swaps are possible, these survival signatures will largely be identical, which should give a route to efficient computation. In general, the topic of computing the survival signature for large systems is crucial, and continues to receive substantial attention. An advantage is that the survival signature of coherent systems is monotonically increasing as a function of the numbers of functioning components, which is particularly useful for deriving approximations or bounds for the survival signature. Such bounds may already be sufficient to answer practical questions which require the system reliability function as input.