Robustness of Nonparametric Predictive Inference for Future Order Statistics

This paper considers robustness of Nonparametric Predictive Inference (NPI), in particular for inference involving future order statistics. The concept of robust inference is usually aimed at development of inference methods which are not too sensitive to data contamination or to deviations from model assumptions. In this paper we use it in a slightly narrower sense: for our aims, robustness indicates insensitivity to small changes in the data, as our predictive probabilities for order statistics and statistical inferences involving future observations depend upon the given observations. We introduce some concepts for assessing the robustness of statistical procedures into the NPI framework, namely the sensitivity curve and the breakdown point; these classical concepts require some adaptation for application in NPI. Most of our nonparametric inferences have reasonably good robustness with regard to small changes in the data.


Introduction
As every statistical inference has underlying assumptions about models and specific methods used, one important field in statistics is the study of robustness of inferences. Statistical inferences are based on the data observations as well as the underlying assumptions, e.g. about randomness, independence and distributional models [22]. Since the middle of the twentieth century, much theoretical effort has been dedicated to develop statistical procedures that are resistant with regard to outliers and robust with regard to small deviations from an assumed parametric model [6]. Huber [21] provides the basic theory of robust statistics. Hampel et al. [17] discussed some properties of robust estimators, test statistics and linear models. In these developments, the primary focus has been on estimating location, scale, and regression parameters [23]. It is well known that some classical procedures are not robust to slight contamination of the strict model assumptions [6]. From this perspective robustness against small deviations from the assumed model and existence of outliers or contamination, have all been identified as principal issues [23]. In classical robust statistics, there are several tools used to describe robustness, e.g. the influence function, the sensitivity curve and the breakdown point.
This paper introduces robustness of NPI. This involves adopting some of the concepts of classical robust statistics within the NPI setting, namely sensitivity curve and breakdown point. These concepts fit well with the NPI setting as they depend on the actual data at hand rather than on a hypothetical underlying assumption. Data may be subject to errors occurring during the measurement and repeating process [11]. The concept of robust inference is usually aimed at development of inference methods which are not too sensitive to data errors or to deviations from the model assumptions. In this paper, we use it in a slightly narrower sense, as for our aims robustness indicates insensitivity to a small change in the data or to outliers. This paper is organized as follows. Section 2 provides a brief introduction to NPI, including key results on NPI for future order statistics as used in this paper. Section 3 provides a brief overview of some concepts used in robust statistics, namely influence function, sensitivity curve and breakdown point. In Sect. 4 we introduce the sensitivity curve and breakdown point in the NPI framework. Section 5 presents the use of these tools for NPI for events involving the r-th future observation. In Sect. 6 we use these tools to explore the robustness of the inferences involving the median and the mean of the m future observations. In Sect. 7, we briefly present NPI robustness of further inferences, namely pairwise comparisons and reproducibility of statistical tests. The paper ends with some concluding remarks in Sect. 8.

Nonparametric Predictive Inference
Nonparametric Predictive Inference (NPI) [5,7] is a statistical framework which uses few modelling assumptions, with inferences explicitly in terms of future observations. For real-valued random quantities attention has thus far been mostly restricted to a single future observation, although multiple future observations have been considered for some NPI methods, e.g. in statistical process control [2,3].
Assume that we have real-valued ordered data $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$, with $n \ge 1$. For ease of notation, define $x_{(0)} = -\infty$ and $x_{(n+1)} = \infty$, or define these equal to other known lower and upper bounds of the range of possible values for these random quantities. The n observations create a partition of the real line into $n + 1$ intervals $I_j = (x_{(j-1)}, x_{(j)})$ for $j = 1, \ldots, n + 1$. We assume throughout this paper that ties do not occur. If we wished to allow ties, also between past and future observations, we could use closed intervals $[x_{(j-1)}, x_{(j)}]$ instead of the open intervals $I_j$; the difference is minimal, and to keep the presentation simple we have opted not to do this here. We are interested in $m \ge 1$ future observations, $X_{n+i}$ for $i = 1, \ldots, m$. We link the data and future observations via Hill's assumption $A_{(n)}$ [19], or, more precisely, via $A_{(n+m-1)}$ (which implies $A_{(n+k)}$ for all $k = 0, 1, \ldots, m - 2$; we will refer to this generically as 'the $A_{(n)}$ assumptions'), which can be considered as a post-data version of a finite exchangeability assumption for $n + m$ random quantities. The $A_{(n)}$ assumptions imply that all possible orderings of the n data observations and the m future observations are equally likely, where the n data observations are not distinguished among each other and neither are the m future observations. Let the random quantity $S_j^i$ be defined as the number of the m future observations in $I_j = (x_{j-1}, x_j)$ given a specific ordering, denoted by $O_i$, of the m future observations among the n data observations, for $i = 1, \ldots, \binom{n+m}{n}$, so that $S_j^i = \#\{X_{n+l} \in I_j,\ l = 1, \ldots, m \mid O_i\}$. Then the $A_{(n)}$ assumptions lead to [10]

$$P\Big(\bigcap_{j=1}^{n+1} \{S_j^i = s_j^i\}\Big) = P(O_i) = \binom{n+m}{n}^{-1} \tag{1}$$

where the $s_j^i$ are non-negative integers with $\sum_{j=1}^{n+1} s_j^i = m$. Equation (1) implies that all $\binom{n+m}{n}$ orderings $O_i$ of the m future observations among the n data observations are equally likely.
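The counting argument behind Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name `orderings` and the representation of an ordering by the counts $(s_1, \ldots, s_{n+1})$ of future observations per interval are choices made here, not notation from the paper.

```python
from itertools import combinations
from math import comb


def orderings(n, m):
    """Yield every ordering O_i as a tuple (s_1, ..., s_{n+1}).

    s_j is the number of the m future observations falling in interval I_j.
    An ordering is fixed by choosing which n of the n + m ranked positions
    are occupied by data observations.
    """
    for data_positions in combinations(range(n + m), n):
        counts = []
        previous = -1
        for position in data_positions:
            counts.append(position - previous - 1)  # future values between two data values
            previous = position
        counts.append(n + m - 1 - previous)  # future values above the largest data value
        yield tuple(counts)


if __name__ == "__main__":
    n, m = 8, 3
    all_orderings = list(orderings(n, m))
    # Under the A_(n) assumptions each ordering has probability 1 / C(n + m, n).
    print(len(all_orderings), comb(n + m, n))
```

Enumerating orderings in this way is only practical for small n and m, but it allows any event concerning the future observations to be checked by direct counting.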
Another convenient way to interpret the A (n) assumptions with n data observations and m future observations is to think that n randomly chosen observations out of all n + m real-valued observations are revealed, following which you wish to make inferences about the m unrevealed observations. The A (n) assumptions then imply that one has no information about whether specific values of neighbouring revealed observations make it less or more likely that a future observation falls in between them. For any event involving the m future observations, Eq. (1) implies that we can count the number of such orderings for which this event holds. Generally in NPI, a lower probability for the event of interest is derived by counting all orderings for which this event has to hold, while the corresponding upper probability is derived by counting all orderings for which this event can hold [5,7]. In NPI, the A (n) assumptions justify the use of resulting inferences directly as predictive probabilities. Using only precise probabilities, such inferences cannot be used for many events of interest, but in NPI we use the fact, in line with De Finetti's Fundamental Theorem of Probability [13], that corresponding optimal bounds can be derived for all events of interest [5]. These bounds are lower and upper probabilities in the theory of imprecise probability [4]. NPI provides frequentist inferences which are exactly calibrated in the sense of [24], and it has strong consistency properties in theory of interval probability [5]. NPI is always in line with inferences based on empirical distributions, which is an attractive property when aiming at objectivity [7]. In NPI the n observations are explicitly used through the A (n) assumptions, yet as there is no use of conditioning as in the Bayesian framework, we do not use an explicit notation to indicate this use of the data. 
The m future observations must be assumed to result from the same sampling method as the n data observations in order to have full exchangeability. NPI is entirely based on the $A_{(n)}$ assumptions, which however should be considered with care as they imply, e.g., that the specific ordering in which the data appeared is irrelevant, so accepting $A_{(n)}$ implies an exchangeability judgement for the n observations. It is attractive that the appropriateness of this approach can be decided upon after the n observations have become available. Let $X_{(r)}$, for $r = 1, \ldots, m$, be the r-th ordered future observation, so $X_{(r)} = X_{n+i}$ for one $i = 1, \ldots, m$ and $X_{(1)} < X_{(2)} < \cdots < X_{(m)}$. The following probabilities are derived by counting the relevant orderings and use of Eq. (1). For $j = 1, \ldots, n + 1$ and $r = 1, \ldots, m$,

$$P(X_{(r)} \in I_j) = \binom{j+r-2}{j-1} \binom{n-j+1+m-r}{n-j+1} \binom{n+m}{n}^{-1} \tag{2}$$

For this event NPI provides a precise probability, as each of the $\binom{n+m}{n}$ equally likely orderings of n past and m future observations has the r-th ordered future observation in precisely one interval $I_j$. As Eq. (2) only specifies the probabilities for the events that $X_{(r)}$ belongs to intervals $I_j$, it can be considered to provide a partial specification of a probability distribution for $X_{(r)}$; no assumptions are made about the distribution of the probability masses within such intervals $I_j$.
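As a minimal sketch, Eq. (2) can be evaluated exactly with rational arithmetic. The counting idea is that $X_{(r)}$ lies in $I_j$ when $j - 1$ data observations and $r - 1$ future observations precede it; the function name below is illustrative only.

```python
from fractions import Fraction
from math import comb


def p_order_stat_in_interval(r, j, n, m):
    """P(X_(r) in I_j) for r = 1..m and j = 1..n+1, via the counting formula of Eq. (2)."""
    if not (1 <= r <= m and 1 <= j <= n + 1):
        raise ValueError("need 1 <= r <= m and 1 <= j <= n + 1")
    # Orderings of the j-1 data values and r-1 future values below X_(r).
    below = comb(j + r - 2, j - 1)
    # Orderings of the n-j+1 data values and m-r future values above X_(r).
    above = comb(n - j + 1 + m - r, n - j + 1)
    return Fraction(below * above, comb(n + m, n))


if __name__ == "__main__":
    n, m, r = 8, 3, 2
    probabilities = [p_order_stat_in_interval(r, j, n, m) for j in range(1, n + 2)]
    print([str(p) for p in probabilities])
    print(sum(probabilities))  # the n + 1 interval probabilities sum to one
```

For $m = 1$ the function returns $1/(n+1)$ for every interval, which is the familiar $A_{(n)}$ assignment for a single future observation.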
Analysis of the probability in Eq. (2) leads to some interesting results, including the logical symmetry $P(X_{(r)} \in I_j) = P(X_{(m+1-r)} \in I_{n+2-j})$. For all r, the probability for $X_{(r)} \in I_j$ is unimodal in j, with the maximum probability assigned to interval $I_{j^*}$ with $\left(\frac{r-1}{m-1}\right)(n + 1) \le j^* \le \left(\frac{r-1}{m-1}\right)(n + 1) + 1$. A further interesting property occurs for the special case where the number of future observations is equal to the number of data observations, so $m = n$. In this case, $P(X_{(r)} < x_r) = P(X_{(r)} > x_r) = 0.5$ holds for all $r = 1, \ldots, m$. This fact can be proven by considering all $\binom{2n}{n}$ equally likely orderings, where clearly in precisely half of these orderings the r-th future observation occurs before the r-th data observation due to the overall exchangeability assumption.
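The symmetry property and the special case $m = n$ can be checked numerically. The sketch below assumes the counting formula of Eq. (2); the helper names are hypothetical and chosen for this illustration.

```python
from fractions import Fraction
from math import comb


def p_interval(r, j, n, m):
    """P(X_(r) in I_j) via the counting formula of Eq. (2)."""
    return Fraction(
        comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1),
        comb(n + m, n),
    )


def is_symmetric(n, m):
    """Check P(X_(r) in I_j) == P(X_(m+1-r) in I_(n+2-j)) for all r and j."""
    return all(
        p_interval(r, j, n, m) == p_interval(m + 1 - r, n + 2 - j, n, m)
        for r in range(1, m + 1)
        for j in range(1, n + 2)
    )


def prob_below_rth_data_point(r, n):
    """P(X_(r) < x_r) when m = n: total probability of the intervals I_1, ..., I_r."""
    return sum((p_interval(r, j, n, n) for j in range(1, r + 1)), Fraction(0))


if __name__ == "__main__":
    print(is_symmetric(8, 5))
    print([str(prob_below_rth_data_point(r, 6)) for r in range(1, 7)])
```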
For an event $X_{(r)} \in I_j$, the $A_{(\cdot)}$ assumptions provide precise probabilities. More generally, interest may be in an event $X_{(r)} \in Z$, with Z any subset of the real values, for example an interval not equal to one of the intervals $I_j$ created by the data. Generally, NPI provides bounds for the probability for such an event, where the maximum lower bound and minimum upper bound are lower and upper probabilities, respectively [4]. The NPI lower and upper probabilities are

$$\underline{P}(X_{(r)} \in Z) = \sum_{j=1}^{n+1} \mathbf{1}\{I_j \subseteq Z\}\, P(X_{(r)} \in I_j) \tag{3}$$

$$\overline{P}(X_{(r)} \in Z) = \sum_{j=1}^{n+1} \mathbf{1}\{I_j \cap Z \neq \emptyset\}\, P(X_{(r)} \in I_j) \tag{4}$$

The lower probability (3) is obtained by summing up only the probability masses that must be in Z. The upper probability (4) is obtained by summing up all probability masses that can be in Z. The NPI lower and upper probabilities for the event that $X_{(r)} > z$, where z is not equal to one of the data observations, so $z \in I_k$ for some k, are

$$\underline{P}(X_{(r)} > z) = \sum_{j=k+1}^{n+1} P(X_{(r)} \in I_j) \tag{5}$$

$$\overline{P}(X_{(r)} > z) = \sum_{j=k}^{n+1} P(X_{(r)} \in I_j) \tag{6}$$

We denote the median of the m future observations by $M_m$. For m odd, so $M_m = X_{(\frac{m+1}{2})}$, the NPI probability for the event $M_m \in I_j = (x_{j-1}, x_j)$ can be derived straightforwardly from Eq. (2). NPI for the median of m future observations is more complicated if m is even, in which case $M_m = (X_{(\frac{m}{2})} + X_{(\frac{m}{2}+1)})/2$. In this case NPI does not provide precise probabilities for the event $M_m \in I_j$ but lower and upper probabilities, which are presented in the PhD thesis of [1].
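The lower and upper probabilities for the event $X_{(r)} > z$ follow from summing interval probabilities: intervals entirely above z contribute to the lower probability, and the interval containing z is added for the upper probability. The sketch below assumes the counting formula of Eq. (2) and uses illustrative function names; the data set is the one used later in Example 2.

```python
from bisect import bisect_left
from fractions import Fraction
from math import comb


def p_interval(r, j, n, m):
    """P(X_(r) in I_j) via the counting formula of Eq. (2)."""
    return Fraction(
        comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1),
        comb(n + m, n),
    )


def lower_upper_exceed(data, z, r, m):
    """NPI lower and upper probabilities for the event X_(r) > z."""
    xs = sorted(data)
    n = len(xs)
    if z in xs:
        raise ValueError("z must not coincide with a data observation")
    k = bisect_left(xs, z) + 1  # z lies in I_k = (x_{k-1}, x_k)
    # Lower probability: only intervals I_{k+1}, ..., I_{n+1} lie entirely above z.
    lower = sum((p_interval(r, j, n, m) for j in range(k + 1, n + 2)), Fraction(0))
    # Upper probability: the mass of I_k may also lie above z.
    upper = lower + p_interval(r, k, n, m)
    return lower, upper


if __name__ == "__main__":
    data = [-9, -7, 0, 2, 5, 7, 10, 16]
    lower, upper = lower_upper_exceed(data, z=1, r=2, m=3)  # median of m = 3
    print(lower, upper)
```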
We denote the mean of the m future observations by $\mu_m$, and the mean corresponding to a specific ordering $O_i$ of the future observations among the n data observations by $\mu_m^i$. When we consider $\mu_m$ and $\mu_m^i$, we must avoid possible probability mass in $-\infty$ or $\infty$, because it affects the mean of the m future observations. We assume finite bounds $L < R$ for the data observations and future observations, such that $L < x_1 < \cdots < x_n < R$, and we define $x_0 = L$ and $x_{n+1} = R$ for the $A_{(n)}$ assumptions. The maximum lower bound and the minimum upper bound for the mean $\mu_m^i$ of the m future observations, for given ordering $O_i$, are

$$\underline{\mu}_m^i = \frac{1}{m} \sum_{j=1}^{n+1} s_j^i\, x_{j-1} \tag{7}$$

$$\overline{\mu}_m^i = \frac{1}{m} \sum_{j=1}^{n+1} s_j^i\, x_j \tag{8}$$
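The bounds for the mean under a given ordering can be sketched as follows, assuming that each future observation in $I_j$ is pushed to the left end point $x_{j-1}$ for the lower bound and to the right end point $x_j$ for the upper bound. The function name and the argument layout are illustrative assumptions; the bounds $L = -17$ and $R = 18$ are those used later in Example 2.

```python
from fractions import Fraction


def mean_bounds(counts, data, left, right):
    """Lower and upper bound for the mean of the future observations.

    counts = (s_1, ..., s_{n+1}) gives the number of future observations in
    each interval I_j = (x_{j-1}, x_j), with x_0 = left and x_{n+1} = right.
    """
    xs = [left] + sorted(data) + [right]
    if len(counts) != len(xs) - 1:
        raise ValueError("counts must have one entry per interval")
    m = sum(counts)
    # Lower bound: every future value sits at the left end point of its interval.
    lower = sum(s * Fraction(xs[j - 1]) for j, s in enumerate(counts, start=1)) / m
    # Upper bound: every future value sits at the right end point of its interval.
    upper = sum(s * Fraction(xs[j]) for j, s in enumerate(counts, start=1)) / m
    return lower, upper


if __name__ == "__main__":
    data = [-9, -7, 0, 2, 5, 7, 10, 16]
    # One future observation in each of I_2, I_5 and I_9 (m = 3).
    print(mean_bounds((0, 1, 0, 0, 1, 0, 0, 0, 1), data, left=-17, right=18))
```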

Classical Concepts for Evaluating Robustness
In the literature on robustness, many measures of robustness of an estimator have been introduced [16,17]. In this section, we review some concepts from the classical theory of robust statistics, namely the influence function (IF), sensitivity curve (SC), empirical influence function (EIF) and breakdown point (BP). First, we consider the influence function (IF), an approach that is due to [16]. Let the CDF F denote the true underlying distribution function, and let the CDF $G_\xi$ be a contaminating distribution which puts all its mass in $\xi$. For an estimator T based on data from a population with CDF F, the influence function of T at the basic distribution F is

$$IF(\xi; T, F) = \lim_{\varepsilon \to 0} \frac{T\big((1 - \varepsilon)F + \varepsilon G_\xi\big) - T(F)}{\varepsilon} \tag{13}$$

Here $(1 - \varepsilon)F + \varepsilon G_\xi$ with $0 < \varepsilon < 1$ is a mixture distribution of F and $G_\xi$. This definition of the IF depends on the assumed distribution as it assesses the effect of an infinitesimal perturbation in a distribution on the value of the estimator. There are several finite sample versions of (13), the most important being the sensitivity curve [28] and the empirical influence function [17]. Let $T_n(X) = T_n(x_1, \ldots, x_n)$ denote a statistic of the sample $X = (x_1, \ldots, x_n)$ and let $T_{n+1}(X, \xi)$ denote the corresponding statistic of the sample $x_1, \ldots, x_n, \xi$. The simplest idea is the empirical influence function [17].
This $EIF_i$ is defined by replacing the i-th value in the sample X by an arbitrary value $\xi$ and looking at the output of the estimator [17]. Alternatively, one can define it by adding an observation, i.e. when the original sample consists of n observations one can add an arbitrary value $\xi$ [17, p. 93]. The second tool is the sensitivity curve [28]. Again there are two versions, one with addition and one with replacement [17]. In the case of adding an observation, the sensitivity curve (SC) is defined as [22]

$$SC_n(\xi, T_n, X) = (n + 1)\big[T_{n+1}(X, \xi) - T_n(X)\big]$$

$SC_n(\xi, T_n, X)$ measures the sensitivity of $T_n$ to the addition of one observation with value $\xi$ [22]. The sensitivity curve thus measures the sensitivity of an estimator to a change in the sample. In the case of replacing an observation $x_i$ by $\xi$, let $T_n(X, \xi, i)$ denote a statistic of the sample $(x_1, \ldots, x_{i-1}, \xi, x_{i+1}, \ldots, x_n)$; then the SC is defined as [22] $SC_i(\xi, T_n, X) = n\big[T_n(X, \xi, i) - T_n(X)\big]$. This version of the SC measures the sensitivity of $T_n$ to replacing the i-th value in the sample by an arbitrary value.
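The classical sensitivity curve with one added observation can be sketched with the standard library. The scaling factor $n + 1$ used below is an assumption taken from a common textbook form of the definition, and the symbol `xi` for the added value as well as the function name are illustrative. The sketch contrasts the mean, whose sensitivity grows without bound in the added value, with the median, whose sensitivity stays bounded.

```python
import statistics


def sensitivity_curve_add(stat, sample, xi):
    """Sensitivity of `stat` to adding one observation with value xi.

    Uses the scaled difference (n + 1) * (T_{n+1}(X, xi) - T_n(X)), which is
    assumed here as the finite-sample form of the sensitivity curve.
    """
    n = len(sample)
    return (n + 1) * (stat(list(sample) + [xi]) - stat(sample))


if __name__ == "__main__":
    data = [-9, -7, 0, 2, 5, 7, 10, 16]
    for xi in (20, 1000, 10**6):
        print(
            xi,
            sensitivity_curve_add(statistics.mean, data, xi),
            sensitivity_curve_add(statistics.median, data, xi),
        )
```

For the mean the scaled difference equals the added value minus the sample mean, so it is linear in the added value; for the median it no longer changes once the added value exceeds the central observations.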
The concepts defined above are local measures, as they in principle examine the effect on an estimator of substituting a single contaminant for one of the n observations, or of adding a data point to the sample. In contrast, the breakdown point is a global measure, as it gives the highest fraction of outliers one may have in the data before the estimator goes to infinity [23]. Let $X = (x_1, \ldots, x_n)$ be a fixed sample of size n. We can contaminate this sample in many ways [22]. We consider the following two: $\varepsilon_a$-replacement and $\varepsilon_b$-contamination. These will also be considered in the NPI setting in Sect. 4. First, $\varepsilon_a$-replacement: we replace an arbitrary subset of size l of the sample by arbitrary values $y_1, \ldots, y_l$, with $1 \le l \le n$ [22]. Let $X'$ denote the contaminated sample, for example $X' = (x_1, \ldots, x_{n-l}, y_1, \ldots, y_l)$ if the last l values are replaced. The fraction of contaminated values in the contaminated sample $X'$ is $\varepsilon_a = l/n$. Secondly, $\varepsilon_b$-contamination: we add l arbitrary additional values $Y = (y_1, \ldots, y_l)$ to the sample X [22]. Let $X''$ denote the sample contaminated by adding these l values. Thus, the fraction of contaminated values in the contaminated sample $X'' = X \cup Y$ is $\varepsilon_b = l/(l+n)$. Let $T = (T_n)$ be an estimator and $T(X)$ be its value at the sample X. The maximum bias which might be caused by general $\varepsilon$, which is either $\varepsilon_a$ or $\varepsilon_b$, is [22]

$$b(\varepsilon; X, T) = \sup \left| T(X^*) - T(X) \right|$$

where the supremum is taken over the set of all $\varepsilon$-contaminated samples $X^*$, which is either $X'$ or $X''$. The definition of the breakdown point is

$$\varepsilon^*(X, T) = \inf\{\varepsilon \mid b(\varepsilon; X, T) = \infty\}$$

The breakdown point $\varepsilon^*(X, T)$ of an estimator T at sample X is the smallest value of $\varepsilon$ for which the estimator can take values arbitrarily far from $T(X)$.

Robustness Concepts in NPI
A simple way to study NPI robustness is to contaminate the given data and then explore the effect on our predictive inference. This approach is straightforward, gives an intuitive analysis, and is in line with the classical nonparametric robustness concepts, as they typically assess the influence on statistical inference of an arbitrary data value either added to the data or substituted for an original observation. We do not consider the IF for NPI, as the IF depends on the assumed distribution and in the NPI approach we do not assume any underlying distribution. In our study of the robustness of NPI, we focus on the sensitivity curve (SC) and breakdown point (BP) as they rely on the actual data at hand rather than on a hypothetical underlying population. We could also adapt the EIF, but we prefer to focus only on the SC as a local measure for our predictive inferences. Let $x = \{x_1, \ldots, x_n\}$ be a given sample of real-valued observations and let $I(x)$ be a predictive inference for future observations, based on the sample x. Such a sample x can be contaminated in many ways, as discussed in Sect. 3, and we consider two of them: substituting a contaminant for one of the n observations, or adding an additional observation to the past data. We denote these contaminated data by $x(j, \delta)$ and $(x, y)$, respectively. These two ways of contaminating the sample will be studied separately in the NPI framework. We first focus on the effect of adding $\delta$ to one of the observations in the past data, as it is convenient and logical to do this in the NPI method. Let $I(x(j, \delta))$ denote the inference of interest based on the contaminated data $x(j, \delta)$, where the data are contaminated by replacing $x_j$ by $x_j + \delta$ in x. The NPI sensitivity curve (NPI-SC) for a predictive inference $I(x)$, in the case of replacing one observation $x_j$ by $x_j + \delta$, is defined by

$$SC_I(x(j, \delta)) = I(x(j, \delta)) - I(x) \tag{16}$$

It can also be of interest to consider $n\,SC_I(x(j, \delta))$, corresponding to the classical definition of the sensitivity curve as given in Sect. 3.
For our purposes Eq. (16), without the factor n, is more straightforward. It depends on n, so when n is large we expect $SC_I(x(j, \delta))$ to become smaller. However, if one wants to compare sensitivity for different values of n, then one may multiply the SC by the sample size n to make the evaluation less sensitive to n. Let $I(x, y)$ denote the inference of interest based on the contaminated data, where the data are contaminated by adding y to x. The NPI-SC, in the case of adding an additional observation y to the data, is

$$SC_I(x, y) = I(x, y) - I(x)$$

This NPI-SC $SC_I(x, y)$ assesses the sensitivity of an inference to the position of an additional observation, so it illustrates the impact of adding an additional observation y to the sample on the inferences involving future observations.
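The two versions of the NPI-SC can be sketched generically for any inference that returns a pair of lower and upper probabilities, assuming the NPI-SC is the plain difference between the inference on the contaminated data and the inference on the original data, as in Eq. (16). To keep the sketch short, the illustrated inference is the simplest case of a single future observation ($m = 1$), for which every interval has probability $1/(n+1)$. All function names are hypothetical.

```python
from fractions import Fraction


def next_obs_exceeds(data, z):
    """NPI lower and upper probability that one future observation exceeds z (m = 1)."""
    n = len(data)
    above = sum(1 for x in data if x > z)
    # Intervals entirely above z give the lower probability; the interval
    # containing z is added for the upper probability.
    return Fraction(above, n + 1), Fraction(above + 1, n + 1)


def npi_sc_replace(inference, data, j, delta):
    """Change in the inference when the j-th value (1-based) is replaced by x_j + delta."""
    contaminated = list(data)
    contaminated[j - 1] += delta
    return tuple(new - old for new, old in zip(inference(contaminated), inference(data)))


def npi_sc_add(inference, data, y):
    """Change in the inference when one extra observation y is added to the data."""
    return tuple(new - old for new, old in zip(inference(list(data) + [y]), inference(data)))


if __name__ == "__main__":
    data = [-9, -7, 0, 2, 5, 7, 10, 16]
    inference = lambda sample: next_obs_exceeds(sample, 1)
    print(npi_sc_replace(inference, data, j=2, delta=9))   # -7 moves above z = 1
    print(npi_sc_replace(inference, data, j=2, delta=1))   # -7 stays below z = 1
    print(npi_sc_add(inference, data, y=20))
```

Passing the inference as a callable keeps the contamination step separate from the inference itself, so the same two functions can be reused for events involving order statistics or the median.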
A finite sample breakdown point (BP) was first proposed by [20], as "tolerance of extreme values" in the situation of location parameter problems, and it was generalized for a variety of cases by [15]. As far as we know, it has not been applied to situations of predictive inferences where the range of the inferences for the future observations is bounded, but it can easily be extended to such situations. We will modify the concept of BP to fit with the NPI approach. The maximum value of predictive inferences in terms of lower and upper probabilities is 1. We introduce a new definition of BP, which we call the c-breakdown point, and denote by $\varepsilon^*_c(I, x(j_1, \ldots, j_l, \delta))$. To introduce the c-breakdown point concept, we first need to introduce some notation related to the way of contamination of the data x, as discussed in Sect. 3. First, 'replacement': we replace a subset of size l of the data x by $x_{j_1} + \delta, \ldots, x_{j_l} + \delta$, where $1 \le l \le n$. We denote these contaminated data by $x(j_1, \ldots, j_l, \delta)$. Let $I(x(j_1, \ldots, j_l, \delta))$ denote the inference of interest based on the contaminated data. Note that $\delta$ can vary for each value, i.e. $\delta_{j_i}$ for $i = 1, \ldots, l$, and we denote these contaminated data by $x(j_1, \ldots, j_l, \delta_{j_1}, \ldots, \delta_{j_l})$ for different $\delta$. The fraction of contaminant values in the contaminated sample $x(j_1, \ldots, j_l, \delta)$ is $\varepsilon_a = l/n$. Secondly, 'additional': we add l arbitrary additional observations $y_1, \ldots, y_l$ to the past data x. We denote these contaminated data by $(x, y_1, \ldots, y_l)$. The inference is denoted by $I(x, y_1, \ldots, y_l)$. The fraction of contaminant values in the contaminated sample $(x, y_1, \ldots, y_l)$ is $\varepsilon_b = l/(l+n)$. The maximum bias which might be caused by $\varepsilon_a$-replacement is

$$b(\varepsilon_a; x, I) = \sup \left| I(x(j_1, \ldots, j_l, \delta)) - I(x) \right|$$

where the supremum is taken over the set of all $\varepsilon_a$-replacement samples $x(j_1, \ldots, j_l, \delta)$ with $\{j_1, \ldots, j_l\} \subseteq \{1, \ldots, n\}$, for fixed $\delta$ and given data x.
Alternatively, one can define the maximum bias by adding l contaminated values to the sample x, so the maximum bias which might be caused by $\varepsilon_b$-contamination is

$$b(\varepsilon_b; x, I) = \sup \left| I(x, y_1, \ldots, y_l) - I(x) \right|$$

where the supremum is taken over the set of all $\varepsilon_b$-contaminated samples $(x, y_1, \ldots, y_l)$, with $y_1, \ldots, y_l \in \mathbb{R}$, for given data x. The c-breakdown point, where $c \in [0, 1]$, for the case of $\varepsilon_a$-replacement, is defined as

$$\varepsilon^*_c(I, x(j_1, \ldots, j_l, \delta)) = \inf\{\varepsilon_a \mid b(\varepsilon_a; x, I) > c\}$$

Alternatively, the c-breakdown point for the case of adding l observations to the original sample ($\varepsilon_b$-contamination) is

$$\varepsilon^*_c(I, (x, y_1, \ldots, y_l)) = \inf\{\varepsilon_b \mid b(\varepsilon_b; x, I) > c\}$$

The c-breakdown point is the smallest fraction of contamination in the past data that could cause a predictive inference to take a value more than c away from the value of the initial predictive inference. For $c = 0$, this definition includes the case in which any change in the inference caused by l contaminated observations is considered as breakdown of the inference of interest. The value c determines how much we allow the inference to change before its breakdown.

Robustness of NPI for the rth Future Order Statistic
To illustrate the use of the robustness concepts for NPI, namely NPI-SC and NPI-BP as defined in Sect. 4, we first consider the probabilities for events involving the r-th ordered future observation. We illustrate both ways that the sample can be contaminated.

NPI-SC for Data Replacement
To begin with, we explore how a contamination in the data affects the NPI probability for the event that $X_{(r)} \in I_k$ in Eq. (2). The probability (2) is only affected by a replacement contamination if the indices, $k = 1, \ldots, n + 1$, differ. The effect of replacing an observation $x_j$ by $x_j + \delta = x_l$, with $\delta \in \mathbb{R}$, on the probability for the event $X_{(r)} \in I_k$ is The NPI lower and upper probabilities for the event $X_{(r)} > z$ are, in some cases, affected slightly by changing $x_j$ to $x_j + \delta$. Let $z \in I_k = (x_{k-1}, x_k)$, then the effect of replacing an observation $x_j$ by $x_j + \delta = x_l$, with $\delta \in \mathbb{R}$, on the NPI lower and upper probabilities for the event $X_{(r)} > z$, is This NPI-SC depends on the value of r and on the interval in which z falls, and will be illustrated in Example 1 in Sect. 5.4.

NPI-SC for Additional Data
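Suppose we are interested in assessing the effect of an additional observation y on the probability for the event that the r-th ordered future observation falls in interval $I_j$, by considering $P(X_{(r)} \in I_j \mid y \in I_{j^*})$, where we let $j^*$ be such that $y \in I_{j^*}$. If the method is robust to the new observation then $P(X_{(r)} \in I_j \mid y \in I_{j^*})$ should be close to $P(X_{(r)} \in I_j)$ for all r, j, $j^*$. The natural question is when the influence is larger: if $j^* < j$, $j^* = j$ or $j^* > j$? Thus, $P(X_{(r)} \in I_j \mid y \in I_{j^*})$ needs to be studied with respect to the positions of $j^*$ and j. It can be derived using Eq. (2), applied to the $n + 1$ observations consisting of the data and y, which create intervals $\tilde{I}_1, \ldots, \tilde{I}_{n+2}$. For $j^* < j$, n is replaced in Eq. (2) by $n + 1$ and j by $j + 1$,

$$P(X_{(r)} \in I_j \mid y \in I_{j^*}) = \binom{j+r-1}{j} \binom{n-j+1+m-r}{n-j+1} \binom{n+m+1}{n+1}^{-1}$$

Similarly, for $j^* > j$, n is replaced in Eq. (2) by $n + 1$ but j is unchanged,

$$P(X_{(r)} \in I_j \mid y \in I_{j^*}) = \binom{j+r-2}{j-1} \binom{n-j+2+m-r}{n-j+2} \binom{n+m+1}{n+1}^{-1}$$

For $j^* = j$, the interval $I_j$ is split by y into $\tilde{I}_j$ and $\tilde{I}_{j+1}$, so

$$P(X_{(r)} \in I_j \mid y \in I_{j^*}) = P(X_{(r)} \in \tilde{I}_j \mid y \in I_{j^*}) + P(X_{(r)} \in \tilde{I}_{j+1} \mid y \in I_{j^*})$$

It is quite easy to prove [1] that $P(X_{(r)} \in I_j \mid y \in I_{j^*}) \ge P(X_{(r)} \in I_j)$ for $j^* < j$ if and only if $j \le \frac{(r-1)(n+1)}{m}$, and for $j^* > j$ if and only if $j \ge \frac{r(n+1)}{m} + 1$. The SC for the event that $X_{(r)} \in I_j$, when we add an additional observation $y \in I_{j^*}$, is the difference $P(X_{(r)} \in I_j \mid y \in I_{j^*}) - P(X_{(r)} \in I_j)$, in line with the definition of the NPI-SC for additional data in Sect. 4. The NPI-SC measures how a single contaminant, whether added or substituted, affects an inference of interest, which is in line with the SC in classical robustness.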
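The effect of one additional observation on $P(X_{(r)} \in I_j)$ can be checked numerically. The sketch below assumes the counting formula of Eq. (2), applied to $n + 1$ observations with the index shifted according to the position of the added value. The threshold $j \ge r(n+1)/m + 1$ for $j^* > j$ is the one stated in the text; the corresponding threshold $j \le (r-1)(n+1)/m$ for $j^* < j$ is derived here from the ratio of the two probabilities and should be read as an assumption. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def p_interval(r, j, n, m):
    """P(X_(r) in I_j) via the counting formula of Eq. (2)."""
    return Fraction(
        comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1),
        comb(n + m, n),
    )


def p_after_adding(r, j, j_star, n, m):
    """P(X_(r) in I_j | y in I_{j*}) for the original interval I_j."""
    if j_star < j:
        # The added value lies below I_j, so I_j becomes interval j + 1 of n + 1 data.
        return p_interval(r, j + 1, n + 1, m)
    if j_star > j:
        # The added value lies above I_j, so its index is unchanged.
        return p_interval(r, j, n + 1, m)
    # The added value splits I_j into two new intervals.
    return p_interval(r, j, n + 1, m) + p_interval(r, j + 1, n + 1, m)


def sc_add(r, j, j_star, n, m):
    """NPI-SC for X_(r) in I_j when one observation is added in I_{j*}."""
    return p_after_adding(r, j, j_star, n, m) - p_interval(r, j, n, m)


if __name__ == "__main__":
    n, m, r = 8, 5, 2
    for j_star in (1, 5, 9):
        print(j_star, [float(sc_add(r, j, j_star, n, m)) for j in range(1, n + 2)])
```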

NPI-BP for Data Replacement and Adding
We illustrate the NPI-BP for the lower and upper probabilities for the event that $X_{(r)} > z$, with $z \in I_k = (x_{k-1}, x_k)$. If we keep $x_1, \ldots, x_{k-1}$ fixed and let $x_k, \ldots, x_n$ go to infinity, then the NPI lower and upper probabilities for the event that $X_{(r)} > z$ will not change at all. However, when we only keep $x_1, \ldots, x_{k-2}$ fixed and let $x_{k-1}, \ldots, x_n$ go to infinity then $[\underline{P}, \overline{P}](X_{(r)} > z)$ will increase. For $c = 0$ the minimum fraction of contaminated values in the contaminated sample that can cause $b(\varepsilon_a; x, [\underline{P}, \overline{P}](X_{(r)} > z)) > 0$ is

$$\varepsilon^*_0\big([\underline{P}, \overline{P}](X_{(r)} > z),\, x(j_1, \ldots, j_l, \delta)\big) = \frac{n-k+2}{n} \tag{27}$$

An effect on such an inference occurs only when the contaminated values lead to a change in the number of observations that are greater than z. The value of the c-breakdown point decreases as the value of k increases, where $I_k$ is the interval that z falls in. Similarly, the c-breakdown point for the probability for the event that $X_{(r)} \in I_k$ is $\frac{n-k+2}{n}$. In the case of adding observations to the data, the c-breakdown point for the probability for the event that $X_{(r)} \in I_j$, for $c = 0$, is

$$\varepsilon^*_0\big(P(X_{(r)} \in I_j),\, (x, y_1, \ldots, y_l)\big) = \inf\{\varepsilon_b \mid b(\varepsilon_b; x, P(X_{(r)} \in I_j)) > 0\}, \qquad \varepsilon^*_0\big(P(X_{(r)} \in I_j),\, (x, y_1)\big) = \frac{1}{n+1} \tag{28}$$

Thus, adding a single data observation will change the probability for the event that $X_{(r)} \in I_j$. The size of the change varies depending on which order statistic is considered and in which interval it is, which will be illustrated in Example 1 in Sect. 5.4. Similarly, in the case of additional observations to the sample, the c-breakdown point for the event that $X_{(r)} > z$, for $c = 0$, is $\frac{1}{n+1}$. We have only considered the NPI-BP for $c = 0$ here. In Example 1, we will also illustrate the NPI-BP for $c > 0$.

Example
We illustrate the NPI-SC and NPI-BP presented in this section by the following example.
To illustrate the NPI-BP, we consider the data set x and the case with $m = 5$ and interest in the event $X_{(r)} \ge 1$. Table 2 presents the NPI-SC for the NPI lower and upper probabilities for $X_{(r)} \ge 1$ for the values $r = 1, \ldots, 5$, in the case where we keep $x_1, \ldots, x_{8-l}$ and add $\delta = 100$ to $x_{9-l}, \ldots, x_8$ for $l = 1, \ldots, 8$. The results clearly show that, as the value of r increases, the effect of replacing l observations by contaminated values on the NPI lower and upper probabilities for $X_{(r)} \ge 1$ is decreasing. If we choose $c = 0.15$, then the maximum NPI-BP for the event $X_{(r)} \ge 1$ occurs for $r = 5$, whereas the minimum NPI-BP occurs for $r = 2$. The higher the breakdown point of an inference, the more robust it is. $\varepsilon^*_0(P(X_{(1)} \ge 1), x(2, \ldots, 8, 100)) = \varepsilon^*_0(P(X_{(3)} \ge 1), x(2, \ldots, 8, 100)) = \frac{7}{8}$, whereas the NPI-BP for the lower and upper probabilities for $X_{(2)} \ge 1$ and the lower probability for $X_{(3)} \ge 1$ is $\frac{6}{8}$, and for the lower probability for $X_{(4)} \ge 1$ it is 1, whereas the upper probability for $X_{(4)} \ge 1$ does not break down. For $r = 5$ the inferences do not break down.

Figures 2 and 3 illustrate the NPI-SC for the event $X_{(r)} \in I_j$, for $r = 1, 2, 3$, $j = 1, \ldots, 9$ and $j^* < j$, $j^* = j$ and $j^* > j$. These figures illustrate that $SC_{P(X_{(r)} \in I_j)}(x, y)$ is symmetric, i.e. $SC_{P(X_{(r)} \in I_j)}(x, y) = SC_{P(X_{(m+1-r)} \in I_{n+2-j})}(x, y)$. For all r, the NPI-SC for $X_{(r)} \in I_j$ is unimodal in j.

Robustness of the Median and Mean of the Future Observations
In the classical robustness literature there has been considerable emphasis on robust estimation of a location parameter, typically comparing the robustness of the mean and the median. In this section, we illustrate the use of the robustness concepts for NPI, namely NPI-SC and NPI-BP, by considering events involving the median and the mean of the m future observations.

Median of the m Future Observations
We first examine how contamination in the data affects NPI for an event involving the median of the m future observations, for odd-valued m. We consider the NPI-SC for the lower and upper probabilities for the event $M_m < z$. We wish to examine the effect on $[\underline{P}, \overline{P}](M_m < z)$ of adding a contaminant $\delta$ to one of the observations $x_j$ with $j = 1, \ldots, n$. Let $z \in I_k = (x_{k-1}, x_k)$; if we add $\delta$ to $x_j$ this becomes $x_l = x_j + \delta$, where $\delta \in \mathbb{R}$. The NPI-SC for event $M_m < z$ is The NPI-SC for the lower and upper probabilities for the event $M_m < z$ is a step function, with the step occurring when the contamination value changes the number of intervals to the right of z.
Next we consider the NPI-SC for the lower and upper probabilities for the event that $M_m \in (z_1, z_2)$. Let $z_1 \in I_k$ and $z_2 \in I_d$ where $k \le d$. If we add $\delta$ to one of the data observations, i.e. $x_j$ is replaced by $x_l$, then there are three possible situations. The effect of adding $\delta$ to $x_j$ is to change the value of the NPI lower and upper probabilities for the event $M_m \in (z_1, z_2)$ by an amount NPI-SC as specified for each case below. First, if So, when the data are contaminated and that contamination does not affect the number of intervals in $(z_1, z_2)$ then there is no effect on this inference at all, which is an attractive property. This does not hold if m is even, which leads to a more complicated analysis due to the definition of $M_m$ as the average of two observations. For a study of the robustness of $M_m$ for even-valued m we refer to the PhD thesis of [1].
The c-breakdown points for the NPI lower and upper probabilities for the events $M_m > z$ and $M_m \in (z_1, z_2)$, where $z, z_2 \in I_k$ and m is odd, are similar to those presented in Sect. 5, if we replace $X_{(r)}$ by $M_m$ in Eq. (27). The NPI lower and upper probabilities for such an event depend only on the number of observations that are greater than z or within $(z_1, z_2)$, so in the sample of n observations, only $n - k + 2$ or more outliers can cause these probabilities to change.

Mean of the m Future Observations
We consider the NPI-SC for the mean of the m future observations. It is well known that the mean of the population in classical statistics is more sensitive than the median to a single contamination in the data [22]. We investigate the robustness of inferences involving the mean of the m future observations. The lower and upper bounds for the mean of the m future observations given the ordering $O_i$, as given in Eqs. (7) and (8), depend on the value of $s_j^i$. The NPI-SC for the lower and upper bounds of $\mu_m^i$, if $x_j$ becomes $x_j + \delta = x_l$, for $\delta > 0$ and $l > j$ or for $\delta < 0$ and $l < j$, are As a special case, if $l = j$, i.e. $x_j + \delta$ did not shift from its rank among the observations, so $x_{j-1} < x_j + \delta < x_{j+1}$, then the NPI-SC for a bound for $\mu_m^i$ will exceed any bound for $\delta$ large or small enough. The NPI-SC for $\mu_m \ge z$, if $x_j$ becomes $x_j + \delta = x_l$ and $\delta \in \mathbb{R}$, is The NPI-SC of the lower and upper probabilities for the event $\mu_m \in (z_1, z_2)$ are and These NPI-SC will be illustrated in Example 2 in Sect. 6.3.
The c-breakdown points of the lower and upper bounds of $\mu_m^i$ are $\frac{1}{n}$ for $s_{l+1}^i \neq 0$ and $s_l^i \neq 0$, respectively. This is because if we hold $x_1, \ldots, x_{n-1}$ fixed and let $x_n$ go to infinity then $\mu_m^i$ also goes to infinity if $s_{l+1}^i \neq 0$ or $s_l^i \neq 0$, corresponding to $\underline{\mu}_m^i$ and $\overline{\mu}_m^i$. However, when we consider inference involving the mean, we will not let $x_n$ go to infinity, as we have assumed bounds for the data observations, so the c-breakdown point for contaminated data $x(\delta, j_1, \ldots, j_l)$ may not be equal to $\frac{1}{n}$. This will be illustrated in Example 2 in Sect. 6.3.

Comparison of Robustness of the Median and the Mean of the Future Observations
A main topic in the classical theory of robustness is comparison of the robustness of the mean and the median. The mean is typically very sensitive to small changes in the data, whereas the median is more robust. In our case the inferences that involve the median of the m future observations depend on the event of interest; for example, the lower and upper probabilities for the event $M_m > z$ might be slightly affected if the contaminant changes the number of observations that are less than z, and its effect is a step function, as will be illustrated in Example 2. The 0-breakdown point for $M_m > z$, where $z \in (x_{k-1}, x_k)$, is $\frac{n-k+2}{n}$, so the value of the NPI-BP for the median decreases as the value of k increases. If we replace $x_j$ by $x_l$, then the inferences of events involving the mean of the m future observations might be affected by a small change in the data, if $s_l^i$, the number of future observations in $I_l$ given the ordering $O_i$, is not equal to zero. Example 2 illustrates the NPI-SC and NPI-BP for inferences involving the mean and the median of the m future observations.

Example 2. To illustrate the NPI-SC for different inferences involving the median and mean of the $m = 3$ future observations, we consider the data set $x = \{-9, -7, 0, 2, 5, 7, 10, 16\}$, so $n = 8$, and the contaminated sample $x(2, \delta)$, where we add $\delta$ to $x_2 = -7$ and $\delta \in \mathbb{R}$. When we consider the mean of the 3 future observations, we set $x_0 = L = -17$ and $x_9 = R = 18$ as bounds for the observations. Figure 4 shows the NPI-SC for the NPI lower and upper probabilities for the events $\mu_3 \ge 1$, $\mu_3 \in (1, 9)$, $M_3 \ge 1$ and $M_3 \in (1, 9)$ given x and the contaminated sample $x(2, \delta)$. Note that the NPI lower probability for such an event of interest in these figures is denoted by LP, and the NPI upper probability by UP.
The NPI-SC for $\mu_3 \geq 1$ increases as the value of $-7 + \delta$ increases, and the maximum NPI-SC for the lower and upper probabilities for $\mu_3 \geq 1$ are 0.1576 and 0.1333, respectively. These occur at $-7 + \delta = 16$, which is the largest contaminated value, as $\delta$ cannot go to 25 because we set R = 18 as upper bound for the observations. The inferences involving the median of the m = 3 future observations depend on the ranks of the observations, which are only affected if the number of observations that are greater than 1, or in (1, 9), changes, so the NPI-SC is a step function. The NPI-SC for the NPI lower and upper probabilities for $M_3 \geq 1$ are 0.1454 and 0.1273, respectively, which occur at $\delta > 8$, so they are less than the NPI-SC for $\mu_3 \geq 1$. The NPI-SC for the event $\mu_3 \in (1, 9)$ increases up to $\delta = 12.3$, and for $\delta > 12.3$ it decreases to be close to zero. The maximum NPI-SC for the lower and upper probabilities for $\mu_3 \in (1, 9)$ are 0.0667 and 0.0909, respectively, and it occurred at $\delta = 10.8$. The maximum NPI-SC for the NPI lower and upper probabilities for $M_3 \in (1, 9)$ are 0.1454 and 0.1273, respectively, so it is greater than the NPI-SC for $\mu_3 \in (1, 9)$. Table 5 shows that for $\delta < 7$ and $\delta > 19$, the inferences involving the mean are more sensitive than the inferences involving the median. In contrast, for $8 < \delta \leq 15.3$ the inferences involving the mean are more robust.
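The step-function behaviour of the NPI-SC for the median event can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the standard NPI result for a future order statistic reviewed in Sect. 2 (not reproduced in this section), namely $P(X_{(r)} \in I_j) = \binom{j+r-2}{j-1}\binom{n-j+1+m-r}{n-j+1} / \binom{n+m}{n}$, and it takes the NPI-SC to be the change in the lower or upper probability after contamination. The function names `p_interval`, `npi_geq` and `sensitivity` are hypothetical.

```python
from math import comb, inf

def p_interval(j, r, n, m):
    """Assumed NPI probability that the r-th of m future observations
    falls in I_j = (x_{j-1}, x_j), j = 1, ..., n+1."""
    return comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1) / comb(n + m, n)

def npi_geq(data, z, r, m):
    """NPI lower and upper probabilities for the event X_(r) >= z."""
    xs = sorted(data)
    n = len(xs)
    b = [-inf] + xs + [inf]  # b[j-1], b[j] are the end points of I_j
    probs = [p_interval(j, r, n, m) for j in range(1, n + 2)]
    # Lower: only intervals lying entirely at or above z count.
    lower = sum(p for j, p in enumerate(probs, start=1) if b[j - 1] >= z)
    # Upper: every interval that reaches above z counts.
    upper = sum(p for j, p in enumerate(probs, start=1) if b[j] > z)
    return lower, upper

def sensitivity(data, j, delta, z, r, m):
    """Change in (lower, upper) probability when delta is added to x_j."""
    contaminated = list(data)
    contaminated[j - 1] += delta
    lo0, up0 = npi_geq(data, z, r, m)
    lo1, up1 = npi_geq(contaminated, z, r, m)
    return lo1 - lo0, up1 - up0

data = [-9, -7, 0, 2, 5, 7, 10, 16]
# The median of m = 3 future observations is the order statistic with r = 2.
sc_small = sensitivity(data, 2, 5, z=1, r=2, m=3)   # -7 + 5 stays below 1
sc_large = sensitivity(data, 2, 10, z=1, r=2, m=3)  # -7 + 10 moves above 1
```

Under these assumptions the change is zero while the contaminated value stays below 1, and equals 24/165 ≈ 0.1455 (lower) and 21/165 ≈ 0.1273 (upper) once it exceeds 1, in line with the values reported above for $M_3 \geq 1$.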
To illustrate the c-breakdown point, we consider NPI-SC as a function of the number of contaminants present in the data, starting by replacing $x_8$ by $x_8 + \delta_8$, then $x_8$ and $x_7$ by $x_8 + \delta_8$ and $x_7 + \delta_7$, and so on, until all observations have been contaminated by $\{\delta_1, \ldots, \delta_8\} = \{18.5, 17.5, 11, 9.5, 7, 5.5, 3, 1\}$. Figure 5 shows the NPI-SC for the lower and upper probabilities for $\mu_3 \geq 1$ and $M_3 \geq 1$, as functions of the number of observations that have been contaminated by adding different values of $\delta$ to them. The results clearly show that when we contaminate up to 5 observations, which are 2, 5, 7, 10, 16 in the data, to become 11.5, 12, 12.5, 13, 17, the inference involving the median, $X_{(2)} \geq 1$, is not affected at all, whereas the inference involving the mean of the future observations is affected. If we choose c = 0.15, then the c-breakdown points for the lower and upper probabilities for $M_3 \geq 1$ and for the upper probability for $\mu_3 \geq 1$ are all equal to 0.875, so breakdown occurs when we change 7 observations out of 8, whereas the c-breakdown point for the NPI lower probability for $\mu_3 \geq 1$ is 0.625, so breakdown occurs if 5 out of 8 observations are contaminated.
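The c-breakdown point for the median event in this example can be reproduced with a short sketch. It makes two assumptions that are not stated in this section: the Sect. 2 formula for $P(X_{(r)} \in I_j)$ used in `p_interval`, and a reading of the c-breakdown point as the smallest fraction $l/n$ of contaminated observations for which the absolute NPI-SC exceeds c. All function names are hypothetical, and the mean event is not covered.

```python
from math import comb, inf

def p_interval(j, r, n, m):
    """Assumed NPI probability that X_(r) falls in I_j = (x_{j-1}, x_j)."""
    return comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1) / comb(n + m, n)

def npi_geq(data, z, r, m):
    """NPI lower and upper probabilities for the event X_(r) >= z."""
    xs = sorted(data)
    n = len(xs)
    b = [-inf] + xs + [inf]
    probs = [p_interval(j, r, n, m) for j in range(1, n + 2)]
    lower = sum(p for j, p in enumerate(probs, start=1) if b[j - 1] >= z)
    upper = sum(p for j, p in enumerate(probs, start=1) if b[j] > z)
    return lower, upper

def c_breakdown_points(data, deltas, c, z, r, m):
    """Contaminate x_n, then x_n and x_{n-1}, and so on; return the first
    fraction l/n at which |SC| > c, for the lower and the upper probability."""
    n = len(data)
    lo0, up0 = npi_geq(data, z, r, m)
    contaminated = list(data)
    bp_lower = bp_upper = None
    for l in range(1, n + 1):
        idx = n - l  # position of the l-th largest original observation
        contaminated[idx] = data[idx] + deltas[idx]
        lo, up = npi_geq(contaminated, z, r, m)
        if bp_lower is None and abs(lo - lo0) > c:
            bp_lower = l / n
        if bp_upper is None and abs(up - up0) > c:
            bp_upper = l / n
    return bp_lower, bp_upper

data = [-9, -7, 0, 2, 5, 7, 10, 16]
deltas = [18.5, 17.5, 11, 9.5, 7, 5.5, 3, 1]  # delta_1, ..., delta_8
bp = c_breakdown_points(data, deltas, c=0.15, z=1, r=2, m=3)
```

With c = 0.15 this sketch returns 0.875 for both the lower and the upper probability for $M_3 \geq 1$, and the first five contaminations leave the probabilities unchanged, which agrees with the discussion above.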

Robustness of Other Inferences
In this section we consider the use of the presented tools for robustness, namely NPI-SC and NPI-BP, for pairwise comparisons and for reproducibility of tests, as presented by [8,9].

Pairwise Comparisons
We investigate the robustness of one of the applications of NPI for future order statistics for statistical inference problems, as presented by [9]. Suppose that we have two independent groups of real-valued observations, X and Y, and their ordered observed values are $x_1 < x_2 < \cdots < x_{n_x}$ and $y_1 < y_2 < \cdots < y_{n_y}$. For ease of notation, let $x_0 = y_0 = -\infty$ and $x_{n_x+1} = y_{n_y+1} = \infty$. Let $I^x_{j_x} = (x_{j_x-1}, x_{j_x})$ and $I^y_{j_y} = (y_{j_y-1}, y_{j_y})$.
We focus attention on $m \geq 1$ future observations from each group, $X_{n_x+i}$ and $Y_{n_y+i}$ for $i = 1, \ldots, m$. We wish to compare the r-th future order statistics from these two groups by considering the event $X_{(r)} < Y_{(r)}$, for which the NPI lower and upper probabilities, based on the $A_{(n_x)}$ and $A_{(n_y)}$ assumptions per group, are derived by

$$\underline{P}(X_{(r)} < Y_{(r)}) = \sum_{j_x=1}^{n_x+1} \sum_{j_y=1}^{n_y+1} 1\{x_{j_x} < y_{j_y-1}\} \, P(X_{(r)} \in I^x_{j_x}) \, P(Y_{(r)} \in I^y_{j_y})$$

$$\overline{P}(X_{(r)} < Y_{(r)}) = \sum_{j_x=1}^{n_x+1} \sum_{j_y=1}^{n_y+1} 1\{x_{j_x-1} < y_{j_y}\} \, P(X_{(r)} \in I^x_{j_x}) \, P(Y_{(r)} \in I^y_{j_y})$$

The NPI-SC of the lower and upper probabilities for the event that $X_{(r)} < Y_{(r)}$, if we replace $y_j$ by $y_j + \delta$, which we denote by $\tilde{y}_l$, are obtained from these expressions.

Table 5 $SC_I(x(2, \delta))$ for m = 3

The NPI-BP for such NPI pairwise comparisons, for c = 0, is n if $x_n < y_j$ and $x_n + \delta > y_j$. The NPI pairwise comparisons for such an event are not sensitive to a small change in the data, as they are only affected if the change to an observation has changed the order of the X and Y observations. In Example 3 we illustrate the NPI-SC and NPI-BP for such NPI pairwise comparisons.
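The claim that these pairwise comparisons only react to changes in the combined ordering of the X and Y observations can be illustrated with a small sketch. It assumes the Sect. 2 formula for $P(X_{(r)} \in I_j)$ (not reproduced in this section), sums the product probabilities over pairs of intervals with $x_{j_x} < y_{j_y-1}$ for the lower probability and $x_{j_x-1} < y_{j_y}$ for the upper probability, and uses hypothetical function names and toy data rather than the data of Example 3.

```python
from math import comb, inf

def p_interval(j, r, n, m):
    """Assumed NPI probability that the r-th of m future observations
    falls in the j-th interval between n ordered data points."""
    return comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1) / comb(n + m, n)

def npi_pairwise(xs, ys, r, m):
    """NPI lower and upper probabilities for the event X_(r) < Y_(r)."""
    xs, ys = sorted(xs), sorted(ys)
    n_x, n_y = len(xs), len(ys)
    xb = [-inf] + xs + [inf]
    yb = [-inf] + ys + [inf]
    lower = upper = 0.0
    for jx in range(1, n_x + 2):
        px = p_interval(jx, r, n_x, m)
        for jy in range(1, n_y + 2):
            w = px * p_interval(jy, r, n_y, m)
            if xb[jx] < yb[jy - 1]:   # whole X interval left of Y interval
                lower += w
            if xb[jx - 1] < yb[jy]:   # X interval can lie left of Y interval
                upper += w
    return lower, upper

# Toy data (hypothetical): moving x_1 without passing a y value changes nothing.
base = npi_pairwise([1.0, 3.0], [2.0, 4.0], r=1, m=2)
same_order = npi_pairwise([1.5, 3.0], [2.0, 4.0], r=1, m=2)
new_order = npi_pairwise([2.5, 3.0], [2.0, 4.0], r=1, m=2)
```

In this toy case `same_order` equals `base`, whereas `new_order` has a smaller lower probability because the contaminated x value has passed $y_1 = 2$.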

Example 3
To illustrate the NPI-SC and NPI-BP for pairwise comparisons, we consider the data set of a study of the effect of an ozone environment on rats' growth [12, p.170]. One group of 22 rats was kept in an ozone-containing environment and the second group of 23 similar rats was kept in an ozone-free environment. Both groups were kept for 7 days and their weight gains are given in Table 6. We use this data set to illustrate the effect of replacing $x_2 = -14.7$ by $-14.7 + \delta$, for $\delta$ from −50 to 100, on the pairwise comparisons based on the events $X_{(r)} < Y_{(r)}$, $r = 1, \ldots, m$, and m = 5. Figure 6 illustrates what happens to the NPI lower and upper probabilities for the event $X_{(r)} < Y_{(r)}$ if observation $x_2 = -14.7$ in the X sample is replaced by $-14.7 + \delta$. Increasing the value −14.7 to $-14.7 + \delta$ leads to decreasing $SC_{P(X_{(r)} < Y_{(r)})}(x(2, \delta))$ for $\delta$ such that the rank of this observation among the Y group changes. However, if the contaminated value $-14.7 + \delta$ does not change its rank among the Y observations, then $SC_{\underline{P}(X_{(r)} < Y_{(r)})}(x(2, \delta)) = 0$ and $SC_{\overline{P}(X_{(r)} < Y_{(r)})}(x(2, \delta)) = 0$. For $\delta \leq -30$ the NPI-SC for $X_{(1)} < Y_{(1)}$ is large, whereas the NPI-SC for the other inferences, for $r = 2, \ldots, 5$, are close to zero. For $-1.5 \leq \delta \leq 27$, $SC_{\underline{P}(X_{(r)} < Y_{(r)})}(x(2, \delta)) = 0$ and $SC_{\overline{P}(X_{(r)} < Y_{(r)})}(x(2, \delta)) = 0$ for all r, as the value $-14.7 + \delta$ does not change its rank among the Y observations. For $\delta > 27$, the effect of the contaminated value $-14.7 + \delta$ increases as the value of r increases. The inferences involving r = 4 and 5 have large NPI-SC when the value $x_2 + \delta$ exceeds all the Y observations.

Table 7 The absolute value of $SC_{P(X_{(r)} < Y_{(r)})}(x(j_1, \ldots, j_l, 100))$ for m = 3 and n = 22

To illustrate the c-breakdown point of these NPI pairwise comparisons, we consider the NPI-SC for $X_{(r)} < Y_{(r)}$ for m = 3 and r = 1, 2, 3, for the case of adding the value 100 to l observations in group X. This is shown in Fig. 7 and Table 7.
Figure 7 illustrates that the absolute value of the NPI-SC increases as the value of l, the number of contaminations in the X sample, increases. If we choose c = 0.05, then the NPI-BP for r = 1 is 10/22, for r = 2 it is 6/22 and for r = 3 it is 5/22, so as the value of r increases the NPI-BP decreases. Thus the probability for the event $X_{(r)} < Y_{(r)}$ based on the given data is more robust if we consider r = 1, as it has the highest 0.05-breakdown point.

NPI for Test Reproducibility
Reproducibility of statistical hypothesis tests is an issue of major importance in applied statistics: if the test were repeated, would the same conclusion, rejection or non-rejection of the null hypothesis, be reached? NPI provides a natural framework for such inferences, as its explicitly predictive nature fits well with the core problem formulation of a repeat of the test in the future. For inference on reproducibility of statistical tests, NPI provides lower and upper reproducibility probabilities (RP). In this section, the robustness of the NPI method for reproducibility of statistical tests is presented for two basic tests using order statistics, namely a one sample quantile test and a two sample precedence test. For these inferences, NPI for future order statistics [9] is used, as briefly reviewed in Sect. 2. We assume that the first, actual experiment led to ordered real-valued observations $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$. As we consider an imaginary repeat of this experiment, we use NPI for $m = n$ future ordered observations [8].
To study the robustness of NPI reproducibility of classical statistical tests, we only consider one way of contaminating the data, namely replacing one of the observations by a contaminated value. We do not consider contamination by adding a further value to the data, as this would make a substantial change to the test statistic and could require a different threshold value, which would complicate the study. Most of the literature on robustness [18,25] considers the robustness of the test result, so that if a test is robust then small variations in the data should not be able to reverse the test decision. In our study, we are interested in exploring the robustness of the NPI reproducibility probability of the test conclusion, not the robustness of the original test result. Thus, we will not consider the case where adding $\delta$ to one of the observations could change the original test decision from rejecting to not rejecting the null hypothesis, or the other way around.

Example 4 Suppose that the original test has sample size n = 15 and we are interested in testing the null hypothesis that the third quartile, so the 75% quantile, of the underlying distribution is equal to a specified value $\kappa^0_{0.75}$ against the alternative hypothesis that this third quartile is greater than $\kappa^0_{0.75}$, tested at significance level $\alpha = 0.05$. Using the Binomial distribution for the classical quantile test, this leads to the rule that $H_0$ is rejected if $x_{(8)} > \kappa^0_{0.75}$ and not rejected if $x_{(8)} < \kappa^0_{0.75}$. Let k denote the number of observations that are less than $\kappa^0_p$ based on the contaminated sample $x(j, \delta)$. Table 8 presents, in the first column, the NPI-SC for the NPI-RP for the event that the future test would also reject $H_0$, that is $X_{(8)} > \kappa^0_{0.75}$. Table 8 presents, in the second column, the NPI-SC for the test reproducibility if the original test did not reveal a significant effect, which is the event that the future test would also lead to not rejecting $H_0$, that is $X_{(8)} < \kappa^0_{0.75}$. The NPI-SC for such an inference decreases as the value of k increases.
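The rejection rule based on $x_{(8)}$ can be checked directly from the Binomial distribution. The sketch below is an illustration under the assumption that, under $H_0$, the number of observations below the hypothesised quantile is Binomial(n, p), and that $H_0$ is rejected in favour of a larger quantile when this count is at most t, with t the largest integer whose Binomial tail probability does not exceed $\alpha$. The function names are hypothetical.

```python
from math import comb

def binom_cdf(t, n, p):
    """P(B <= t) for B ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))

def rejection_order_statistic(n, p, alpha):
    """Return s such that H0 is rejected iff x_(s) exceeds the hypothesised
    quantile, or None if no rejection region exists at level alpha."""
    t = -1
    # Find the largest t with P(B <= t) <= alpha.
    while t + 1 <= n and binom_cdf(t + 1, n, p) <= alpha:
        t += 1
    # At most t observations below the quantile means x_(t+1) exceeds it.
    return t + 1 if t >= 0 else None

s = rejection_order_statistic(n=15, p=0.75, alpha=0.05)
```

For n = 15, p = 0.75 and $\alpha = 0.05$ the tail probabilities are about 0.017 for t = 7 and about 0.057 for t = 8, so the sketch returns s = 8, consistent with the rule in Example 4.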

Precedence Test
As a second example of NPI for reproducibility of a statistical test based on order statistics, we consider a basic nonparametric precedence test. Such a test, first proposed by [26], is typically used for comparison of two groups of lifetime data, where one wishes to reach a conclusion before all units on test have failed.
We consider the classical scenario with two independent samples. Let $X_{(1)} < X_{(2)} < \cdots < X_{(n_x)}$ be random quantities representing the ordered real-valued observations in a sample of size $n_x$, drawn randomly from a continuously distributed population, which we refer to as the X population, with a probability distribution depending on location parameter $\lambda_x$. Similarly, let $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n_y)}$ be random quantities representing the ordered real-valued observations in a sample of size $n_y$, drawn randomly from another continuously distributed population, the Y population, with a probability distribution which is identical to that of the X population except for its location parameter $\lambda_y$. We consider the hypothesis test for the locations of these two populations, $H_0: \lambda_x = \lambda_y$ versus $H_1: \lambda_x < \lambda_y$, which is to be interpreted such that, under $H_1$, observations from the Y population tend to be larger than observations from the X population.
The precedence test considered in this section, for this specific hypothesis test scenario, is as follows. Given $n_x$ and $n_y$, one specifies the value of r, such that the test is ended at, or before, the r-th observation of the Y population. For a specific level of significance $\alpha$, one determines the value k (which therefore is a function of $\alpha$ and of r) such that $H_0$ is rejected if and only if $X_{(k)} < Y_{(r)}$. The critical value for k is the smallest integer which satisfies $P(X_{(k)} < Y_{(r)} \mid H_0) \leq \alpha$. Note that the test is typically ended at the time $T = \min(X_{(k)}, Y_{(r)})$, with the conclusion that $H_0$ is rejected in favour of the one-sided alternative hypothesis $H_1$, specified above, if $T = X_{(k)}$, and $H_0$ is not rejected if $T = Y_{(r)}$. It is of interest to emphasize this censoring: continuing with the original test would make no difference at all to the test conclusion, but further observations would make a difference for the NPI reproducibility results, as discussed by [8].
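The critical value k can be computed by enumeration. The sketch below is a hedged illustration: it assumes that, under $H_0$, all $\binom{n_x+n_y}{n_x}$ orderings of the combined sample are equally likely, so that the number of Y observations preceding $X_{(k)}$ has the usual combinatorial null distribution, and it is not taken from the paper. The function names are hypothetical.

```python
from math import comb

def prob_reject_under_h0(k, r, n_x, n_y):
    """P(X_(k) < Y_(r) | H0): fewer than r Y observations precede X_(k).

    Before X_(k) there are k-1 X's and j Y's; after it the remaining
    n_x-k X's and n_y-j Y's. All orderings are equally likely under H0."""
    favourable = sum(
        comb(j + k - 1, j) * comb(n_y - j + n_x - k, n_y - j) for j in range(r)
    )
    return favourable / comb(n_x + n_y, n_x)

def critical_k(r, n_x, n_y, alpha):
    """Smallest k with P(X_(k) < Y_(r) | H0) <= alpha, or None if none exists."""
    for k in range(1, n_x + 1):
        if prob_reject_under_h0(k, r, n_x, n_y) <= alpha:
            return k
    return None

k_05 = critical_k(r=6, n_x=10, n_y=10, alpha=0.05)
k_10 = critical_k(r=6, n_x=10, n_y=10, alpha=0.10)
```

For $n_x = n_y = 10$ and r = 6 this gives k = 10 at $\alpha = 0.05$ and k = 9 at $\alpha = 0.1$, the critical values used in Example 5.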
The NPI approach for reproducibility of this two-sample precedence test considers again the same test scenario applied to future order statistics, and derives the NPI lower and upper probabilities for the event that the same overall test conclusion will be derived, given the data from the original test. This involves the NPI approach for inference on the r-th future order statistic $Y_{(r)}$ out of $n_y$ future observations based on the data from the Y population, and similarly for the k-th future order statistic $X_{(k)}$ out of the $n_x$ future observations based on the data from the X population, where the values of r and k are the same as used for the original test (as we assume also the same significance level for the future test). Note, however, that there is a complication: for full specification of the NPI probabilities for these future order statistics, we require the full data from the original test to be available. But, as mentioned, the data resulting from the original precedence test typically have right-censored observations for at least one, but most likely both populations, and these are all just known to exceed the time T at which the original test had ended. There are two perspectives on the study of reproducibility of such precedence tests. First, one can study the test outcome assuming that, actually, complete data were available, so all $n_x$ and $n_y$ observations of the X and Y populations, respectively, in the original test are assumed to be available. Secondly, one can consider inference for the realistic scenario with the actual data from the original test, so including right-censored observations at time T [8].
The starting point for NPI-RP for the precedence test is to apply NPI for $n_x$ future observations, based on the $n_x$ original test observations from the X population, which are assumed to be fully available, and similarly for $n_y$ future observations based on the $n_y$ observations from the Y population. Using the results presented in Sect. 2, the following NPI lower and upper reproducibility probabilities are derived. First, if $H_0$ is rejected in the original test, so $x_{(k)} < y_{(r)}$, then $\underline{RP} = \underline{P}(X_{(k)} < Y_{(r)})$ and $\overline{RP} = \overline{P}(X_{(k)} < Y_{(r)})$. If $H_0$ is not rejected in the original test, so $x_{(k)} > y_{(r)}$, then $\underline{RP} = \underline{P}(X_{(k)} > Y_{(r)})$ and $\overline{RP} = \overline{P}(X_{(k)} > Y_{(r)})$.

We consider NPI-SC for the NPI-RP of the precedence test. As the NPI-RP inferences for the precedence test depend monotonically on the combined ordering of the original test data, a local change to the combined ordering of the data of the two populations in the original test changes both the NPI lower and upper probabilities for the event of interest. First we consider the RP for the case that $H_0$ is rejected in the original test, so $x_{(k)} < y_{(r)}$. The effects of adding $\delta$ to one of the observations in group Y, say $y_j$, which becomes $y_j + \delta = \tilde{y}_l$, on $\underline{RP}$ and $\overline{RP}$ are given by the corresponding NPI-SC.

Fig. 8 NPI-SC for the NPI-RP for $X_{(10)} > Y_{(6)}$ and $X_{(9)} < Y_{(6)}$

If $H_0$ is not rejected in the original test, so $x_{(k)} > y_{(r)}$, then $\underline{RP} = \underline{P}(X_{(k)} > Y_{(r)})$ and $\overline{RP} = \overline{P}(X_{(k)} > Y_{(r)})$. The effects of adding $\delta$ to $y_j$ in group Y, so $y_j$ becomes $\tilde{y}_l$, on $\underline{RP}$ and $\overline{RP}$ are given by the corresponding NPI-SC. The NPI-SC expressions $SC_{\underline{P}(X_{(k)} < Y_{(r)})}(y(j, \delta))$, $SC_{\overline{P}(X_{(k)} < Y_{(r)})}(y(j, \delta))$ and $SC_{P(X_{(k)} > Y_{(r)})}(y(j, \delta))$ are stated for the case $y_j < x_d < x_{d+1}$ and $x_d < x_{d+1} < \tilde{y}_l$.

Example 5 To illustrate the NPI-SC for the NPI-RP for the precedence test, we consider a data set presented by [27] consisting of six groups of times (in minutes) to breakdown of an insulating fluid subjected to different levels of voltage. These times are presented in Table 9. These data were also used by [8]. Both samples are of size 10, and we assume that the precedence testing scenario discussed in this section is followed, so we assume that the population distributions may only differ in location parameters, with $H_0: \lambda_x = \lambda_y$ tested versus $H_1: \lambda_x < \lambda_y$. We assume that r = 6, so the test is set up to end at the observation of the sixth failure time for the Y population. We discuss both significance levels $\alpha = 0.05$ and $\alpha = 0.1$. The missing values in Table 9 are only known to exceed 3.83. For significance level $\alpha = 0.05$, the critical value is k = 10, while for $\alpha = 0.1$ this is k = 9. Therefore, the provided data will lead, in this precedence test, to rejection of $H_0$ at 10% level of significance but not to rejection of $H_0$ at 5% level of significance. For both scenarios, the NPI lower and upper reproducibility probabilities are obtained by using only the actual outcome, without any assumption on the ordering of the right-censored observations. For $\alpha = 0.05$ they are $\underline{RP} = \underline{P}(X_{(10)} > Y_{(6)}) = 0.3871$ and $\overline{RP} = \overline{P}(X_{(10)} > Y_{(6)}) = 0.8669$, while for $\alpha = 0.1$ they are $\underline{RP} = \underline{P}(X_{(9)} < Y_{(6)}) = 0.3029$ and $\overline{RP} = \overline{P}(X_{(9)} < Y_{(6)}) = 0.7079$. Let us now assume that we add an increasing value of $\delta$ to $x_2 = 0.64$, and examine its effect on the NPI lower and upper reproducibility probabilities.
The left plot of Fig. 8 presents the NPI-SC for the NPI-RP for the event that $X_{(10)} > Y_{(6)}$, as a function of $\delta$. The results clearly illustrate that the NPI-SC for the NPI-RP for the precedence test is a step function, so the NPI-RP is only affected if $x_2 + \delta$ changes its rank among the Y observations. If $x_2 + \delta > 3.83 = y_6$, then $x_2 + \delta$ is treated as a right-censored observation in the X group, and the lower and upper reproducibility probabilities are obtained by taking the minimum and the maximum of the NPI lower and upper probabilities, respectively, for reproducibility over all possible orderings of the right-censored observations. The maximum NPI-SC for $X_{(10)} > Y_{(6)}$ is achieved when $x_2 + \delta$ becomes very large and exceeds $y_6$.
The right plot of Fig. 8 presents the NPI-SC for the lower and upper reproducibility probabilities for the event $X_{(9)} < Y_{(6)}$, as a function of $\delta$. Increasing the value of $\delta$ such that it affects the rank of $x_2 + \delta$ among the Y observations leads to a decrease of the value of the NPI-SC. We consider only small values of $\delta$, because if $x_2 + \delta$ exceeds $y_6$, that will change the original test conclusion and also the reproducibility probability.

Concluding Remarks
This paper is a first step towards robustness theory for the NPI setting, and we looked at some examples involving inferences on future order statistics. We found that some of the concepts from classical statistics cannot immediately be applied, because we do not use estimators but predictive inferences, which take values in [0, 1]. So, inspired by the classical concepts, we have defined new concepts which are related to NPI. We then explored their use for some inferences presented in the earlier sections of this paper. We investigated robustness of the mean and the median for the m future observations. The NPI-SC of the inference involving the median of the m future observations is a step function, whereas for the mean it is a continuously changing function, but the size of the effect is close to that for the median, or less in some cases. For future research it will be of interest to consider other robustness concepts for NPI, and also, of course, robustness of other NPI methods. Further details, examples and discussion of the tests presented in this paper are given in the PhD thesis of [1].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.