Department of Statistics, Virginia Tech

It was claimed in the notes that everything revolves around this inequality:

\[ \alpha \geq \sum_{i=0}^{r+m-1} {n \choose i} (1-q)^i q^{n-i}. \]

Once a particular \(r\) and \(m\) are chosen (and notice that only the sum of those matter), we may

- find \(n\) for fixed \(q\): simply take the smallest value for which this inequality holds;
- find \(q\) for fixed \(n\): find the value for which equality holds.

The first step towards using this result is to code up the right-hand-side, e.g., in R.

```
bintol <- function(n, r, m, q)
{
i <- 0:(r+m-1)
sum(choose(n, i) * (1-q)^i * q^(n-i))
}
```

- Its that easy!

Now we can test it, and gain intuition at the same time.

Suppose we are interested in finding \(n\) for a choice of \(r=0\), \(m=1\), and \(q=0.85\). Lets set those values, then evaluate that function for \(n\) ranging from 1 to 30.

```
r <- 0
m <- 1
q <- 0.85
n <- 1:30
p <- rep(NA, length(n))
for(i in 1:length(n)) p[i] <- bintol(n[i], r, m, q)
```

Now that we have the values we can plot them, and add a horizontal line at an \(\alpha\) of our choosing. Say the usual \(\alpha = 0.05\).

```
plot(n, p, type="b", main="finding n")
abline(h=0.05, col="gray")
legend("topright", c("alpha=0.05"), lty=1, col="gray")
```

- By visual inspection, it looks like this plot implies \(n=19\).

Now we don’t need to make a plot like this this each time we want to find \(n\); we can ask R to do the “searching” for us.

The thinking behind the code below is that we’ll start \(n\) small, and increase it until we “cross the \(\alpha\) threshold”.

- Note that the code uses `level = 1-alpha’.

```
solve.bintol.n <- function(level, r, m, q)
{
## convert from level to alpha
alpha <- 1-level
n <- 1
while(1) {
p <- bintol(n, r, m, q)
if(p < alpha) break ## we crossed the threshold
else n <- n+1
}
return(n)
}
```

Lets try it:

`solve.bintol.n(0.95, r, m, q)`

`## [1] 19`

- Boom!

The beauty of this code is that we can use it for any \(\alpha\), \(r\), \(m\) and \(q\) combination,

- whereas the tables in the back of textbooks are often limited to \(r+m \in \{1,2\}\) and a few discrete choices of \(\alpha\) and \(q\).

Lets check the approximation from the notes:

\[ n \approx \frac{1}{4} x_{1-\alpha} \frac{1+q}{1-q} + \frac{1}{2} (r+m-1), \] where \(x_{1-\alpha}\) is the upper \(\alpha\) quantile of a \(\chi^2_{2(r+m)}\) distribution.

In R we can code that up as follows:

```
approx.bintol.n <- function(level, r, m, q)
{
x1ma <- qchisq(level, 2*(r+m))
(1/4)*x1ma*(1+q)/(1-q) + (1/2)*(r+m-1)
}
```

Lets investigate and see if that approximation is any good.

- First, duplicating our calculation above

`approx.bintol.n(0.95, r, m, q)`

`## [1] 18.47368`

- In the ballpark, but off by one if we were to round.

We could compare the truth to the estimate for a range of values of \(q\), say:

```
q <- seq(0.5,0.9,length=100)
n <- na <- rep(NA, length(q))
for(i in 1:length(q)) {
n[i] <- solve.bintol.n(0.95, r, m, q[i])
na[i] <- approx.bintol.n(0.95, r, m, q[i])
}
```

One easy way to compare is graphically.

```
plot(q, n, main=paste("r=", r, ", m=", m, sep=""))
points(q, round(na), pch=20)
legend("topleft", c("true", "approx"), pch=c(21, 20))
```

- It looks like (for these \(r\) and \(m\) values, and range of \(q\) values) we are never off by more than one.
- Based on this analysis, I judge this to be a pretty good approximation.

What about for other values of \(r\) and \(m\)?

```
for(i in 1:length(q)) {
n[i] <- solve.bintol.n(0.95, 10, 15, q[i])
na[i] <- approx.bintol.n(0.95, 10, 15, q[i])
}
plot(q, n, main=paste("r=", 10, ", m=", 15, sep=""))
points(q, round(na), pch=20)
legend("topleft", c("true", "approx"), pch=c(21, 20))
```

- These seem to be in even better agreement.

Now consider fixed \(n\) and a search for \(q\). We’ll take a page out of our CI-finding play-book from the Binomial CI and solve for \(q\) by finding the root (or zero-crossing) of

\[ \alpha - \sum_{i=0}^{r+m-1} {n \choose i} (1-q)^i q^{n-i}. \]

For example, when \(n=19\), \(r=0\), \(m=1\) and \(\alpha=0.05\) we can plot this function as a function of \(q \in [0,1]\)

```
n <- 19
q <- seq(0,1,length=1000)
pq <- rep(NA, length=length(q))
for(i in 1:length(q))
pq[i] <- 0.05 - bintol(n, r, m, q[i])
plot(q, pq, type="l", main="finding q")
abline(h=0, col="gray")
```

- The zero-crossing looks like it is at \(q=0.85\), which we know to be the correct answer from above.

But rather than finding \(q\) visually in this manner, we can utilize the `uniroot`

function in R, as follows.

```
solve.bintol.q <- function(level, n, r, m)
{
alpha <- 1-level
f <- function(q, alpha, n, r, m) {
alpha - bintol(n, r, m, q)
}
out <- uniroot(f, c(0,1), alpha=alpha, n=n, r=r, m=m)
return(out$root)
}
```

Lets try it out.

`solve.bintol.q(0.95, n, r, m)`

`## [1] 0.8541304`

- Boom!

As before, the beauty of this code is that we can use it for any \(\alpha\), \(r\), \(m\) and \(n\) combination,

- whereas the tables in the back of most textbooks are limited to \(r+m \in \{1,2\}\) and a few discrete choices of \(\alpha\) and \(n\).

Now lets check the approximation, which arises from the inversion of the approximation for \(n\) given \(q\) above.

\[ q = \frac{4n - 2(r+m+1) - x_{1-\alpha}}{4n - 2(r+m+1) + x_{1-\alpha}} \]

In R we could code that up as

```
approx.bintol.q <- function(level, n, r, m)
{
x1ma <- qchisq(level, 2*(r+m))
temp <- 4*n - 2*(r+m-1)
(temp - x1ma)/(temp + x1ma)
}
```

Lets try it.

`approx.bintol.q(0.95, n, r, m)`

`## [1] 0.8538515`

- Pretty close.

As above, lets compare the truth to the approximation over a range of \(n\) values.

```
n <- 1:300
qa <- q <- rep(NA, length=length(n))
for(i in 1:length(q)) {
q[i] <- solve.bintol.q(0.95, n[i], r, m)
qa[i] <- approx.bintol.q(0.95, n[i], r, m)
}
plot(n, q, main="finding q")
points(n, qa, pch=20)
legend("bottomright", c("truth", "approx"), pch=21:20)
```

- All good except for the very small \(n\).

Probably the most widely used two-sided tolerance limits are those where \(r=1\) and \(m=1\), i.e., \(X^{(1)}\) and \(X^{(n)}\). Electric seat adjusters are available on a popular luxury car. The manufacturer wants to know what range of vertical adjustment is necessary to be 90% certain that at least 80% of the population of potential buyers will be able to adjust their seats to the desired height. What must \(n\) be so that \(X^{(n)}\) and \(X^{(1)}\) furnish our upper and lower limits?

This corresponds to \(r=m=1\), and \(q=0.8\). Using our R functions we can calculate \(n\) as

`solve.bintol.n(0.9, 1, 1, 0.8)`

`## [1] 18`

`approx.bintol.n(0.9, 1, 1, 0.8)`

`## [1] 18.00374`

A sample of size 18 is drawn from the population of potential buyers, and the amount of adjustment from a base position is measured. The largest value of the sample is \(X^{(18)} = 7.57\)in, and the smallest value is \(X^{(1)} = 1.21\)in. Therefore there is a 90% probability that at least 80% of the population requires a vertical seat adjustment between 1.21 and 7.51 inches.

Along with each lot of steel bars, the manufacturer guarantees that at least 90% of the bars will have a breaking point above a number specified for each lot. Because of variable manufacturing conditions the guaranteed breaking point is established separately for each lot by breaking a random sample of bars from each lot and setting the guaranteed breaking point equal to the minimum breaking point in the sample.

How large should the sample be so that the manufacturer can be 95% sure that the guarantee statement is correct?

This corresponds to \(r=1\) and \(m=0\), and \(q=0.9\). Using our R functions we can calculate \(n\) as

`solve.bintol.n(0.95, 1 , 0, 0.9)`

`## [1] 29`

In each lot of a sample of size 29 is selected at random, and the smallest breaking point of these bars in the sample is stated as the guaranteed breaking point, at which at least 90% of the bars in the lot will still be intact, with probability 0.95.

A large population of drums containing radioactive waste is being stored for safe keeping. Each drum has marked on it the amount of radioactive waste contained in the drum. Periodic audits are made where randomly selected drums are scanned externally to estimate the amount of radioactive waste contained in the drum, and the estimate is compared with the label to obtain the discrepancy \(X\).

Over a period of three months 122 drums have been examined in this way, and the results were a random sample \(X_1, \dots, X_{122}\), where each \(X_i\) is the discrepancy between the amount marked on the drum and the amount estimated by the scan.

The bounds \(X^{(2)}\) and \(X^{(121)}\) are selected (\(r=m=2\)). If we then calculate

`solve.bintol.q(0.95, 122, 2, 2)`

`## [1] 0.9376753`

we can conclude that we are 95% confident that 93.8% of the drums have discrepancies between the second smallest and second largest observed discrepancies in the 122 observed drums.

In a chemical process a catalyst is being studied for its effect on the output of the process reaction. A random sample of 40 batches is created \(X_1, \dots, X_{40}\), where each \(X_i\) represents the reaction yield. All samples are generated at exactly the same temperature and pressure.

The bounds \(X_{(3)}\) and \(X_{(35)}\) are selected, and the confidence level 95% is chosen.

What is the percentage of the population \(q\), such that the probability that the random interval from \(X^{(3)}\) to \(X^{(35)}\) contains a proportion \(q\) or more of the population, is \(0.95\)?

Here we have \(r=3\) and \(m=n+1-35=6\), so we can calculate

`solve.bintol.q(0.95, 40, 3, 6)`

`## [1] 0.6680006`

so we have a probability of 0.95 that the random interval from \(X_{(3)}\) to \(X_{(35)}\) inclusive, contains 66.7% or more of the population.

Suppose we collected the following values for the reaction yield (in percentage).

```
yield <- c(73, 70, 85, 89, 66, 92, 77, 88, 77, 75, 69, 64, 77, 83, 77, 77, 72, 64, 87,
76, 62, 80, 87, 81, 60, 95, 92, 82, 65, 72, 78, 61, 68, 74, 72, 94, 82, 86, 89, 84)
```

Then we can calculate the 3rd and 35th observed order statistic as follows.

```
yield.sorted <- sort(yield)
c(yield.sorted[3], yield.sorted[35])
```

`## [1] 62 89`

Therefore, the probability is 0.95 that the random interval from 62 to 89, inclusive, contains 66.8% or more of the population.