A 95% confidence interval for a population mean is determined to be 100 to 120

Interval Estimation

Ronald N. Forthofer, ... Mike Hernandez, in Biostatistics (Second Edition), 2007

7.3.1 Confidence Interval for the Mean

In the preceding material, we saw how to construct a confidence interval for the population median. That confidence interval gave information to the dairy about the amount of vitamin D being added to the milk. As an alternative to the median, a confidence interval for the mean could have been used. To find a confidence interval for the mean, assuming that the data follow a specific distribution, we must know the sampling distribution of its estimator. We must also specify how confident we wish to be that the interval contains the population parameter. The sample mean is the estimator of the population mean, and the sampling distribution of the sample mean is easily found.

Since we are assuming the data follow a normal distribution, the sample mean — the average of the sample values — also follows a normal distribution. However, this assumption is not crucial. Even if the data are not normally distributed, the central limit theorem states that the sample mean, under appropriate conditions, will approximately follow a normal distribution.

To specify the normal distribution completely, we also have to provide the mean and variance of the sample mean. First we develop the confidence interval for the mean assuming population variance is known and extend it to the situation where population variance is unknown and it is estimated from the sample.

Known Variance: In Chapter 5, we saw that the mean of the sample mean was μ, the population mean, and its variance was $\sigma^2/n$. The standard deviation of the sample mean is thus $\sigma/\sqrt{n}$, and it is called the standard error of the sample mean. The use of the word error is confusing, since no mistake has been made. However, it is the traditional term used in this context. The term standard error is used instead of standard deviation when we are discussing the variation in a sample statistic. The term standard deviation is usually reserved for discussion of the variation in the sample data themselves. Thus, the standard deviation measures the unit-to-unit variation, while the standard error measures the sample-to-sample variation.
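The distinction between the standard deviation and the standard error can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 94, standard deviation 11, samples of size 60) are borrowed from the simulation discussed later in this section, and the seed and the number of repeated samples are arbitrary assumptions.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]


# Illustrative population values (assumed for this sketch).
mu, sigma, n = 94, 11, 60
means = simulate_sample_means(mu, sigma, n, n_samples=2000)

theoretical_se = sigma / math.sqrt(n)   # sigma / sqrt(n)
empirical_se = statistics.stdev(means)  # sample-to-sample variation of the mean

print(f"theoretical standard error:   {theoretical_se:.3f}")
print(f"empirical SD of sample means: {empirical_se:.3f}")
```

The two printed values should be close to each other, while the unit-to-unit standard deviation stays at 11.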

We now address the issue of how confident we wish to be that the interval contains the population mean (μ). From the material on the normal distribution in Chapter 5, we know that

Pr {−1.96 < Z < 1.96} = 0.95

where Z is the standard normal variable. In terms of the sample mean, this is

$$\Pr\left\{-1.96 < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < 1.96\right\} = 0.95.$$

But we want an interval for μ, not for Z. Therefore, we must perform some algebraic manipulations to convert this to an interval for μ. First we multiply all three terms inside the braces by $\sigma/\sqrt{n}$. This yields

$$\Pr\left\{-1.96\left(\frac{\sigma}{\sqrt{n}}\right) < \bar{x}-\mu < 1.96\left(\frac{\sigma}{\sqrt{n}}\right)\right\} = 0.95.$$

We next subtract $\bar{x}$ from all the expressions inside the braces, and this gives

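$$\Pr\left\{-1.96\left(\frac{\sigma}{\sqrt{n}}\right) - \bar{x} < -\mu < 1.96\left(\frac{\sigma}{\sqrt{n}}\right) - \bar{x}\right\} = 0.95.$$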

This interval is about −μ; to convert it to an interval about μ, we must multiply each term in the braces by −1. Before doing this, we must be aware of the effect of multiplying an inequality by a negative number. For example, we know that 3 is less than 4. However, −3 is greater than −4, so multiplying both sides of an inequality by −1 changes the direction of the inequality. Therefore, we have

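$$\Pr\left\{1.96\left(\frac{\sigma}{\sqrt{n}}\right) + \bar{x} > \mu > -1.96\left(\frac{\sigma}{\sqrt{n}}\right) + \bar{x}\right\} = 0.95.$$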

We reorder the terms to have the smallest of the three quantities to the left — that is,

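$$\Pr\left\{\bar{x} - 1.96\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + 1.96\left(\frac{\sigma}{\sqrt{n}}\right)\right\} = 0.95$$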

or, more generally,

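$$\Pr\left\{\bar{x} - z_{1-\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{1-\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\right\} = 1-\alpha.$$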

The (1 − α) × 100 percent confidence interval limits for the population mean can be expressed as

$$\bar{x} \pm z_{1-\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right).$$

The result of these manipulations is an interval for μ in terms of σ, n, 1.96 (or some other z value), and $\bar{x}$. The sample mean, $\bar{x}$, is the only one of these quantities that varies from sample to sample. However, once we draw a sample, the interval is fixed, as the sample mean's value, $\bar{x}$, is known. Since the interval will either contain or not contain μ, we no longer talk about the probability of the interval containing μ.

Although we do not talk about the probability of an interval containing μ, we do know that in repeated sampling, intervals of the preceding form will contain the parameter, μ, 95 percent of the time. Thus, instead of discussing the probability of an interval containing μ, we say that we are 95 percent confident that the interval from $\bar{x} - 1.96(\sigma/\sqrt{n})$ to $\bar{x} + 1.96(\sigma/\sqrt{n})$ will contain μ. Intervals of this type are therefore called confidence intervals. This reason for the use of the word confidence is the same as that discussed in the preceding distribution-free material. The limits of the confidence interval usually have the form of the sample estimate plus or minus some distribution percentile (in this case, a percentile of the normal distribution) times the standard error of the sample estimate.

Example 7.1

The 95 percent confidence interval for the mean systolic blood pressure for 200 patients can be found based on the dig200 data set introduced in Chapter 3. We assume that the standard deviation for this patient population is 20 mmHg. As the sample mean, $\bar{x}$, based on a sample of 199 observations (one missing value), was found to be 125.8 mmHg, the 95 percent confidence interval for the population mean ranges from $125.8 - 1.96(20/\sqrt{199})$ to $125.8 + 1.96(20/\sqrt{199})$, that is, from 123.0 to 128.6 mmHg.
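The known-variance interval can be computed in a few lines. The following is a minimal sketch using Python's standard library; the function name `z_confidence_interval` is a hypothetical helper introduced here for illustration, and the inputs are the values quoted in Example 7.1.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}, about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)     # z times the standard error
    return sample_mean - half_width, sample_mean + half_width


# Values from Example 7.1: mean 125.8 mmHg, sigma 20 mmHg, n = 199.
lower, upper = z_confidence_interval(125.8, 20, 199)
print(f"{lower:.1f} to {upper:.1f} mmHg")  # prints "123.0 to 128.6 mmHg"
```

The exact normal percentile (about 1.95996) is used rather than the rounded 1.96, which does not change the result at one decimal place.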

Table 7.4 illustrates the concept of confidence intervals. It shows the results of drawing 50 samples of size 60 from a normal distribution with a mean of 94 and a standard deviation of 11. These values are close to the mean and standard deviation of the systolic blood pressure variable for 5-year-old boys in the United States as reported by the NHLBI Task Force on Blood Pressure Control in Children (1987).

Table 7.4. Simulation of 95% confidence intervals for 50 samples of n = 60 from the normal distribution with μ = 94 and σ = 11 (standard error = 1.42).

Sample Mean Std 95% CI Sample Mean Std 95% CI
1 94.75 10.25 (91.96, 97.54) 26 94.61 11.49 (91.82, 97.39)
2 94.85 10.86 (92.06, 97.63) 27 92.79 9.36 (90.00, 95.58)
3 94.71 10.09 (91.92, 97.50) 28 96.00 12.19 (93.22, 98.79)
4 94.03 12.27 (91.24, 96.82) 29 95.99 11.36 (93.20, 98.78)
5 93.77 10.05 (90.98, 96.56) 30 93.98 11.74 (91.19, 96.76)
6 92.54 9.32 (89.76, 95.33) 31 95.36 13.08 (92.57, 98.15)
7 93.40 12.07 (90.62, 96.19) 32 91.10 8.69 (88.31, 93.89)*
8 93.97 11.02 (91.18, 96.75) 33 93.85 12.94 (91.06, 96.63)
9 96.33 9.26 (93.54, 99.12) 34 96.01 9.63 (93.22, 98.79)
10 93.56 12.01 (90.78, 96.35) 35 95.20 8.94 (92.41, 97.99)
11 94.94 10.81 (92.15, 97.73) 36 95.64 9.41 (92.85, 98.43)
12 94.66 12.08 (91.88, 97.45) 37 94.74 10.31 (91.95, 97.53)
13 94.21 11.02 (91.42, 97.00) 38 93.52 10.30 (90.73, 96.31)
14 94.55 9.98 (91.76, 97.34) 39 92.92 10.27 (90.13, 95.71)
15 93.57 11.50 (90.79, 96.36) 40 95.08 10.07 (92.30, 97.87)
16 95.99 12.01 (93.20, 98.78) 41 93.88 10.53 (91.09, 96.66)
17 93.86 12.53 (91.08, 96.65) 42 95.38 9.98 (92.59, 98.17)
18 92.02 13.58 (89.23, 94.81) 43 94.38 11.65 (91.59, 97.17)
19 95.16 12.03 (92.38, 97.95) 44 91.55 10.63 (88.76, 94.33)
20 94.99 12.00 (92.20, 97.78) 45 95.41 12.79 (92.62, 98.20)
21 94.65 11.18 (91.86, 97.43) 46 92.40 10.57 (89.62, 95.19)
22 92.86 12.52 (90.07, 95.64) 47 96.00 11.45 (93.21, 98.78)
23 93.99 11.76 (91.20, 96.78) 48 95.39 10.56 (92.60, 98.18)
24 91.44 10.75 (88.65, 94.22) 49 97.69 10.89 (94.90, 100.47)*
25 96.07 11.89 (93.28, 98.86) 50 95.01 10.61 (92.22, 97.79)

*Does not contain 94

In this demonstration, 4 percent (2 out of 50, marked in the table) of the intervals did not contain the population mean, and 96 percent did. If we draw many more samples, the proportion of the intervals containing the mean will approach 95 percent. This is the basis for the statement that we are 95 percent confident that the confidence interval, based on our single sample, will contain the population mean.
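The repeated-sampling idea behind Table 7.4 can be reproduced approximately with a short simulation. This is a sketch, not the procedure used to build the table: the seed, the helper name `coverage`, and the numbers of repetitions are assumptions, so the individual samples will differ from those tabulated.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage(mu, sigma, n, n_samples, confidence=0.95, seed=1):
    """Fraction of known-sigma confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / n_samples


# Same population and sample size as Table 7.4, with more and more samples.
for n_samples in (50, 500, 5000):
    print(n_samples, coverage(94, 11, 60, n_samples))
```

With only 50 samples the observed proportion can easily be 92 or 98 percent; as the number of samples grows it settles near 95 percent.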

If we use a different value for the standard normal variable, the level of confidence changes accordingly. For example, if we had started with a value of 1.645, $z_{0.95}$, instead of 1.96, $z_{0.975}$, the confidence level would be 90 percent instead of 95 percent. The $z_{0.95}$ value is used with the 90 percent level because we want 5 percent of the values to be in each tail. The lower and upper limits for the 90 percent confidence interval for the population mean for the data in the first sample of 60 observations are 92.41 [= 94.75 − 1.645(1.42)] and 97.09 [= 94.75 + 1.645(1.42)], respectively. This interval is narrower than the corresponding 95 percent confidence interval of 91.96 to 97.54. This makes sense, since, if we wish to be more confident that the interval contains the population mean, the interval will have to be wider. The 99 percent confidence interval uses $z_{0.995}$, which is 2.576, and the corresponding interval is 91.09 [= 94.75 − 2.576(1.42)] to 98.41 [= 94.75 + 2.576(1.42)].
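The effect of the confidence level on the interval width can be tabulated directly. The sketch below assumes the first sample of Table 7.4 (mean 94.75, σ = 11, n = 60); `z_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, std_error, confidence):
    """Two-sided interval: sample mean plus or minus z_{1-alpha/2} standard errors."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return sample_mean - z * std_error, sample_mean + z * std_error


std_error = 11 / math.sqrt(60)  # about 1.42
intervals = {
    level: z_interval(94.75, std_error, level) for level in (0.90, 0.95, 0.99)
}
for level, (lo, hi) in intervals.items():
    print(f"{level:.0%}: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

The widths increase with the confidence level, matching the 90 and 99 percent limits quoted in the text.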

The fifty samples shown in Table 7.4 had sample means, based on 60 observations, ranging from a low of 91.1 to a high of 97.7. This is the amount of variation in sample means expected if the data came from the same normal population with a mean of 94 and a standard deviation of 11. The Second National Task Force on Blood Pressure Control in Children (1987) had study means ranging from 85.6 (based on 181 values) to 103.5 mmHg (based on 61 values), far outside the range just shown. These extreme values suggest that these data do not come from the same population, and this then calls into question the Task Force's combination of the data from these diverse studies.

The size of the confidence interval is also affected by the sample size that appears in the $\sigma/\sqrt{n}$ term. Since n is in the denominator, increasing n decreases the size of the confidence interval. For example, if we doubled the sample size from 60 to 120 in the preceding example, the standard error of the mean changes from 1.42 ($= 11/\sqrt{60}$) to 1.004 ($= 11/\sqrt{120}$). Doubling the sample size reduces the confidence interval to about 71 percent ($= 1/\sqrt{2}$) of its former width. Thus, we know more about the location of the population mean, since the confidence interval is shorter as the sample size increases.
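The square-root relationship between sample size and interval width is easy to verify numerically. This is a minimal sketch; `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


se_60 = standard_error(11, 60)    # about 1.42
se_120 = standard_error(11, 120)  # about 1.004
ratio = se_120 / se_60            # 1 / sqrt(2), about 0.707

print(f"n=60: {se_60:.3f}, n=120: {se_120:.3f}, ratio: {ratio:.3f}")
```

Because the width is proportional to the standard error, halving the width would require four times as many observations, not twice as many.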

The size of the confidence interval is also a function of the value of σ, but to change σ means that we are considering a different population. However, if we are willing to consider homogeneous subgroups of the population, the value of the standard deviation for a subgroup should be less than that for the entire population. For example, instead of considering the blood pressure of 5-year-old boys, we consider the blood pressure of 5-year-old boys grouped according to height intervals. The standard deviation of systolic blood pressure in the different height subgroups should be much less than the overall standard deviation.

Another factor affecting the size of the confidence interval is whether it is a one-sided or a two-sided interval. If we are only concerned about higher blood pressure values, we could use an upper one-sided confidence interval. The lower limit would be zero, or −∞ for a variable that had positive and negative values, and the upper limit is

$$\bar{x} + z_{1-\alpha}\left(\frac{\sigma}{\sqrt{n}}\right).$$

This is similar to the two-sided upper limit except for the use of $z_{1-\alpha}$ instead of $z_{1-\alpha/2}$.
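A short sketch makes the difference between the one-sided and two-sided upper limits concrete. It assumes the first sample of Table 7.4 (mean 94.75, σ = 11, n = 60); both function names are hypothetical helpers.

```python
import math
from statistics import NormalDist


def upper_one_sided_limit(sample_mean, sigma, n, confidence=0.95):
    """Upper limit of a one-sided interval, using z_{1-alpha}."""
    z = NormalDist().inv_cdf(confidence)
    return sample_mean + z * sigma / math.sqrt(n)


def upper_two_sided_limit(sample_mean, sigma, n, confidence=0.95):
    """Upper limit of a two-sided interval, using z_{1-alpha/2}."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return sample_mean + z * sigma / math.sqrt(n)


one_sided = upper_one_sided_limit(94.75, 11, 60)
two_sided = upper_two_sided_limit(94.75, 11, 60)
print(f"one-sided upper limit: {one_sided:.2f}")
print(f"two-sided upper limit: {two_sided:.2f}")
```

At the same confidence level the one-sided upper limit is closer to the sample mean, because all of α is placed in a single tail.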

Unknown Variance: When the population variance, $\sigma^2$, is unknown, it is reasonable to substitute its sample estimator, $s^2$, in the confidence interval calculation. There is a problem in doing this, though. Although $(\bar{x}-\mu)/(\sigma/\sqrt{n})$ follows the standard normal distribution, $(\bar{x}-\mu)/(s/\sqrt{n})$ does not. In the first expression, there is only one random variable, $\bar{x}$, whereas the second expression involves the ratio of two random variables, $\bar{x}$ and s. We need to know the probability distribution for this ratio of random variables.

Fortunately, Gosset, whom we encountered in Chapter 5, already discovered the distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$. The distribution is called Student's t (crediting Student, the pseudonym used by Gosset) or, more simply, the t distribution. For large values of n, sample values of s are very close to σ, and, hence, the t distribution looks very much like the standard normal. However, for small values of n, the sample values of s vary considerably, and the t and standard normal distributions have different appearances. Thus, the t distribution has one parameter, the number of independent observations used in the calculation of s. In Chapter 3, we saw that this value was n − 1, and we called this value the degrees of freedom. Hence, the parameter of the t distribution is the degrees of freedom associated with the calculation of the standard error. The degrees of freedom are shown as a subscript, that is, as $t_{df}$. For example, a t with 5 degrees of freedom is written as $t_5$.

Figure 7.1 shows the distributions of $t_1$ and $t_5$ compared with the standard normal distribution over the range of −3.8 to 3.8. As we can see from these plots, the t distribution with one degree of freedom, the lowest curve, is considerably flatter; that is, there is more variability than for the standard normal distribution, the top curve in the figure. This is to be expected, since the sample mean divided by the sample standard deviation is more variable than the sample mean alone. As the degrees of freedom increase, the t distributions become closer and closer to the standard normal in appearance. The tendency for the t to approach the standard normal distribution as the number of degrees of freedom increases can also be seen in Table 7.5, which shows selected percentiles for several t distributions and the standard normal distribution. A more complete t table is found in Appendix Table B5.

Figure 7.1. Distributions of $t_1$ and $t_5$ compared with the z distribution.

Table 7.5. Selected percentiles for several t distributions and the standard normal distribution.

Distribution        Percentiles
                    0.80    0.90    0.95    0.99
t_1                 1.376   3.078   6.314   31.821
t_5                 0.920   1.476   2.015   3.365
t_10                0.879   1.372   1.813   2.764
t_30                0.854   1.310   1.697   2.457
t_60                0.848   1.296   1.671   2.390
t_120               0.845   1.289   1.658   2.358
Standard normal     0.842   1.282   1.645   2.326

Now that we know the distribution of $(\bar{x}-\mu)/(s/\sqrt{n})$, we can form confidence intervals for the mean even when the population variance is unknown. The form for the confidence interval is similar to that preceding for the mean with known variance except that s replaces σ and the t distribution is used instead of the standard normal distribution. Therefore, the lower and upper limits for the (1 − α) × 100 percent confidence interval for the mean when the variance is unknown are $\bar{x} - t_{n-1,1-\alpha/2}(s/\sqrt{n})$ and $\bar{x} + t_{n-1,1-\alpha/2}(s/\sqrt{n})$, respectively.

Let us calculate the 90 percent confidence interval for the population mean of the systolic blood pressure for 5-year-old boys based on the first sample data in Table 7.4 (row 1). A 90 percent [= (1 − α) × 100 percent] confidence interval means that α is 0.10. Based on a sample of 60 observations, the sample mean was 94.75 and the sample standard deviation was 10.25 mmHg. Thus, we need the 95th (= 1 − α/2) percentile of a t distribution with 59 degrees of freedom. However, neither Table 7.5 nor Table B5 shows the percentiles for a t distribution with 59 degrees of freedom. Based on the small changes in the t distribution for larger degrees of freedom, there should be little error if we use the 95th percentile for a $t_{60}$ distribution. Therefore, the lower and upper limits are approximately

$$94.75 - 1.671\left(\frac{10.25}{\sqrt{60}}\right) \quad \text{and} \quad 94.75 + 1.671\left(\frac{10.25}{\sqrt{60}}\right)$$

or 92.54 and 96.96 mmHg, respectively.

If we use a computer package (see Program Note 7.1 on the website) to find the 95th percentile value for a $t_{59}$ distribution, we find its value is 1.6711. Hence, there is little error introduced in this example by using the percentiles from a $t_{60}$ instead of a $t_{59}$ distribution.
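The percentile lookup can also be done without a table. The sketch below is not the program referred to in Program Note 7.1; it is a standard-library approximation that integrates the t density with Simpson's rule and inverts the result by bisection. The function names, the step count, and the bracketing strategy are all assumptions made for this illustration; a statistics package would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using symmetry about zero."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_quantile(p, df):
    """Percentile of the t distribution (0 < p < 1), found by bisection."""
    if p < 0.5:
        return -t_quantile(1 - p, df)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_confidence_interval(sample_mean, s, n, confidence=0.90):
    """Two-sided interval for the mean when the variance is estimated by s^2."""
    t_crit = t_quantile(1 - (1 - confidence) / 2, n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


t59 = t_quantile(0.95, 59)
t60 = t_quantile(0.95, 60)
lower, upper = t_confidence_interval(94.75, 10.25, 60, confidence=0.90)
print(f"t59 = {t59:.4f}, t60 = {t60:.4f}")
print(f"90% CI: {lower:.2f} to {upper:.2f} mmHg")
```

The two percentiles agree to about three decimal places, which is why substituting the tabulated 60-degrees-of-freedom value changes the limits only negligibly.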

URL: https://www.sciencedirect.com/science/article/pii/B9780123694928500121

Estimation

Gary Smith, in Essential Statistics, Regression, and Econometrics, 2012

6.48 The text reports a 95 percent confidence interval for the looseness coefficient for high-stakes Internet poker players to be 25.53 ± 1.18. Explain why you either agree or disagree with these interpretations of that number:

a. 95 percent of the players in this population have looseness coefficients in this interval.

b. If a poker player is randomly selected from this population, there is a 0.95 probability that his or her looseness coefficient is in this interval.

c. 95 percent of the confidence intervals estimated in this way include the average looseness coefficient of the players in the population.

URL: https://www.sciencedirect.com/science/article/pii/B9780123822215000064

PARAMETER ESTIMATION

Sheldon M. Ross, in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009

SOLUTION

We use Program 7.3.1 to obtain the solution (see Figure 7.3).

FIGURE 7.3. (a) Two-sided and (b) lower 95 percent confidence intervals for EXAMPLE 7.3f.

Our derivations of the 100(1 − α) percent confidence intervals for the population mean μ have assumed that the population distribution is normal. However, even when this is not the case, if the sample size is reasonably large then the intervals obtained will still be approximate 100(1 − α) percent confidence intervals for μ. This is true because, by the central limit theorem, $\sqrt{n}(\bar{X}-\mu)/\sigma$ will have approximately a normal distribution, and $\sqrt{n}(\bar{X}-\mu)/S$ will have approximately a t-distribution.

URL: https://www.sciencedirect.com/science/article/pii/B9780123704832000126

Estimation

Sheldon M. Ross, in Introductory Statistics (Third Edition), 2010

Example 8.12

Find $t_{8,0.05}$.

Solution

The value of $t_{8,0.05}$ can be obtained from Table D.2. The following is taken from that table.

Values of $t_{n,\alpha}$

n      α = 0.10   α = 0.05   α = 0.025
6      1.440      1.943      2.447
7      1.415      1.895      2.365
→ 8    1.397      1.860      2.306
9      1.383      1.833      2.262

Reading down the α = 0.05 column for the row n = 8 shows that $t_{8,0.05} = 1.860$.

By the symmetry of the t distribution about zero, it follows (see Fig. 8.10) that

FIGURE 8.10. $P\{|T_n| \le t_{n,\alpha/2}\} = P\{-t_{n,\alpha/2} \le T_n \le t_{n,\alpha/2}\} = 1-\alpha$.

$$P\{T_n > t_{n,\alpha}\} = \alpha$$

Hence, upon using the result that $\sqrt{n}(\bar{X}-\mu)/S$ has a t distribution with n − 1 degrees of freedom, we see that

$$P\left\{\frac{\sqrt{n}\,|\bar{X}-\mu|}{S} \le t_{n-1,\alpha/2}\right\} = 1-\alpha$$

In exactly the same manner as we did when σ was known, we can show that the preceding equation is equivalent to

$$P\left\{\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right\} = 1-\alpha$$

Therefore, we showed the following.

A 100(1 − α) percent confidence interval estimator for the population mean μ is given by the interval

$$\bar{X} \pm t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$

Program 8-3 will compute the desired confidence interval estimate for a given data set.

Example 8.13

The Environmental Protection Agency (EPA) is concerned about the amounts of PCB, a toxic chemical, in the milk of nursing mothers. In a sample of 20 women, the amounts (in parts per million) of PCB were as follows:

16,0,0,2,3,6,8,2,5,0,12,10,5,7,2,3,8,17,9,1

Use these data to obtain a

(a) 95 percent confidence interval

(b) 99 percent confidence interval

of the average amount of PCB in the milk of nursing mothers.

Solution

A simple calculation yields that the sample mean and sample standard deviation are

$$\bar{X} = 5.8 \qquad S = 5.085$$

Since 100(1 − α) equals 95 when α = 0.05 and equals 99 when α = 0.01, we need the values of $t_{19,0.025}$ and $t_{19,0.005}$. From Table D.2 we see that

$$t_{19,0.025} = 2.093 \qquad t_{19,0.005} = 2.861$$

Hence, the 95 percent confidence interval estimate of μ is

$$5.8 \pm 2.093\,\frac{5.085}{\sqrt{20}} = 5.8 \pm 2.38$$

and the 99 percent confidence interval estimate of μ is

$$5.8 \pm 2.861\,\frac{5.085}{\sqrt{20}} = 5.8 \pm 3.25$$

That is, we can be 95 percent confident that the average amount of PCB in the milk of nursing mothers is between 3.42 and 8.18 parts per million; and we can be 99 percent confident that it is between 2.55 and 9.05 parts per million.
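The arithmetic of this example can be checked with a short script. This sketch is not Program 8-3; it uses Python's `statistics` module for the sample mean and standard deviation and takes the two critical values directly from the text rather than computing them. The names `T_CRIT` and `t_interval` are hypothetical.

```python
import math
import statistics

# PCB amounts (parts per million) from Example 8.13.
pcb = [16, 0, 0, 2, 3, 6, 8, 2, 5, 0, 12, 10, 5, 7, 2, 3, 8, 17, 9, 1]

n = len(pcb)
xbar = statistics.mean(pcb)   # sample mean
s = statistics.stdev(pcb)     # sample standard deviation (n - 1 divisor)

# Critical values t_{19,0.025} and t_{19,0.005}, as quoted from Table D.2.
T_CRIT = {0.95: 2.093, 0.99: 2.861}


def t_interval(xbar, s, n, t_crit):
    """Interval xbar plus or minus t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


for level, t_crit in T_CRIT.items():
    lo, hi = t_interval(xbar, s, n, t_crit)
    print(f"{level:.0%}: {lo:.2f} to {hi:.2f} ppm")
```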

This could also have been solved by running Program 8-3. [Program 8-3 output: images not reproduced.]

URL: https://www.sciencedirect.com/science/article/pii/B9780123743886000089

Parameter Estimation

Sheldon M. Ross, in Introduction to Probability and Statistics for Engineers and Scientists (Fifth Edition), 2014

Remark

The interpretation of “a 100(1 − α) percent confidence interval” can be confusing. It should be noted that we are not asserting that the probability that $\mu \in (\bar{x} - 1.96\sigma/\sqrt{n},\ \bar{x} + 1.96\sigma/\sqrt{n})$ is .95, for there are no random variables involved in this assertion. What we are asserting is that the technique utilized to obtain this interval is such that 95 percent of the time that it is employed it will result in an interval in which μ lies. In other words, before the data are observed we can assert that with probability .95 the interval that will be obtained will contain μ, whereas after the data are obtained we can only assert that the resultant interval indeed contains μ “with confidence .95.”

Example 7.3d

From past experience it is known that the weights of salmon grown at a commercial hatchery are normal with a mean that varies from season to season but with a standard deviation that remains fixed at 0.3 pounds. If we want to be 95 percent certain that our estimate of the present season’s mean weight of a salmon is correct to within ±0.1 pounds, how large a sample is needed?

Solution

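A 95 percent confidence interval estimate for the unknown mean μ, based on a sample of size n, is

$$\mu \in \left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

Because the estimate $\bar{x}$ is within $1.96(\sigma/\sqrt{n}) = .588/\sqrt{n}$ of any point in the interval, it follows that we can be 95 percent certain that $\bar{x}$ is within 0.1 of μ provided that

$$\frac{.588}{\sqrt{n}} \le 0.1$$

That is, provided that

$$\sqrt{n} \ge 5.88$$

or

$$n \ge 34.57$$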

That is, a sample size of 35 or larger will suffice.■
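The same sample-size calculation can be written as a small function. This is a minimal sketch; `required_sample_size` is a hypothetical name, and it uses the exact normal percentile rather than the rounded value 1.96, which does not change the answer here.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Salmon example: sigma = 0.3 pounds, desired margin of 0.1 pounds.
n_needed = required_sample_size(0.3, 0.1)
print(n_needed)  # prints 35
```

Because n enters through its square root, halving the margin to 0.05 pounds roughly quadruples the required sample size.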

URL: https://www.sciencedirect.com/science/article/pii/B9780123948113500071

Hypothesis Testing

Gary Smith, in Essential Statistics, Regression, and Econometrics (Second Edition), 2015

Confidence Intervals

Confidence intervals can be used for a statistical test when the alternative hypothesis is two sided. Specifically, if the two-sided P value is less than 0.05, then a 95 percent confidence interval does not include the value of the population mean specified by the null hypothesis. (If the two-sided P value is less than 0.01, a 99 percent confidence interval does not include the null hypothesis.)

Consider our poker example. A test of the null hypothesis μ = 35 has a P value less than 0.05 if the sample mean is more than (approximately) two standard errors from 35. A 95 percent confidence interval for μ includes all values that are within two standard errors of the sample mean. Thus, if the sample mean is more than two standard errors from 35, the two-sided P value is less than 0.05 and a 95 percent confidence interval does not include 35. Therefore, a hypothesis test can be conducted by seeing whether a confidence interval includes the null hypothesis.

The nice thing about a confidence interval is that it can give us a sense of the practical importance of the difference between the sample mean and the null hypothesis. If we just report that the P value is $6.46 \times 10^{-37}$ or that we “found a statistically significant difference at the 5 percent level,” readers do not know the actual value of the estimator.

For our poker example, a 95 percent confidence interval was calculated in Chapter 6:

$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 25.53 \pm 1.97\left(\frac{8.53}{\sqrt{203}}\right) = 25.53 \pm 1.18$$

We see that 35 is not inside this interval and that, as a practical matter, 25.53 ± 1.18 is far from 35.

The fact that confidence intervals can be used for hypothesis tests illustrates why not rejecting a null hypothesis is a relatively weak conclusion. When the data do not reject a null hypothesis, we have not proven that the null hypothesis is true. Every value inside a 95 percent confidence interval is consistent with the data. Therefore, we say “the data do not reject the null hypothesis,” instead of “the data prove that the null hypothesis is true.”

Two economists studied the effects of inflation on election outcomes [2]. They estimated that the inflation issue increased the Republican vote in one election by 7 percentage points, plus or minus 10 percentage points. Because 0 is inside this interval, they concluded that “in fact, and contrary to widely held views, inflation has no impact on voting behavior.” That is not at all what their data show. The fact that they cannot rule out 0 does not prove that 0 is the correct value. Their 95 percent confidence interval does include 0, but it also includes everything from −3 percent to +17 percent. Their best estimate is 7 percent, plus or minus 10 percent, and 7 percent is more than enough to swing most elections one way or another.

Here is another poker example. Our second research question is whether experienced poker players tend to play differently after big losses than they do after big wins. To answer this research question, we can compare each player's looseness coefficient in the 12 hands following big losses with his or her looseness coefficient in the 12 hands following big wins. This difference measures whether this person plays less cautiously after a big loss than after a big win. The natural null hypothesis is that the population mean is 0, H0: μ = 0. For our sample of 203 players, the mean is 2.0996 and the standard deviation is 5.5000. Using Eqn (7.3), the t value is 5.439:

$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}} = \frac{2.0996 - 0}{5.5000/\sqrt{203}} = 5.439$$

With 203 − 1 = 202 degrees of freedom, the two-sided P value is 0.0000002.

Using Eqn (6.6), a 95 percent confidence interval for the population mean is:

$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 2.0996 \pm 1.9716\left(\frac{5.5000}{\sqrt{203}}\right) = 2.10 \pm 0.76$$

Poker players tend to play looser (less cautiously) after large losses, evidently attempting to recoup their losses.
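The link between the two-sided test and the confidence interval can be seen by computing both from the same summary statistics. The sketch below uses the figures quoted above (n = 203, mean 2.0996, standard deviation 5.5000) and takes the critical value 1.9716 from the text; the variable names are illustrative.

```python
import math

n, xbar, s = 203, 2.0996, 5.5000
mu_0 = 0.0       # value of the population mean under the null hypothesis
t_star = 1.9716  # two-sided 5 percent critical value, as quoted in the text

std_error = s / math.sqrt(n)
t_value = (xbar - mu_0) / std_error

lower = xbar - t_star * std_error
upper = xbar + t_star * std_error

# The two decisions always agree: |t| > t* exactly when mu_0 is outside the interval.
reject_by_t = abs(t_value) > t_star
reject_by_ci = not (lower <= mu_0 <= upper)

print(f"t = {t_value:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(reject_by_t, reject_by_ci)
```

The interval adds what the test alone does not show: the plausible size of the effect, here roughly 1.3 to 2.9.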

URL: https://www.sciencedirect.com/science/article/pii/B9780128034590000078

REGRESSION

Sheldon M. Ross, in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009

SOLUTION

To solve this, we first run Program 9.10, which gives the results shown in Figures 9.18, 9.19, and 9.20.

FIGURE 9.18.

FIGURE 9.19.

FIGURE 9.20.

Hence, a point estimate of the expected hardness of sheets containing .15 percent copper at an annealing temperature of 1,150 is 69.862. In addition, since $t_{.025,7} = 2.365$, a 95 percent confidence interval for this value is

69.862±4.083

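When it is only a single experiment that is going to be performed at the input levels $x_1, \ldots, x_k$, we are usually more concerned with predicting the actual response than its mean value. That is, we are interested in utilizing our data set $Y_1, \ldots, Y_n$ to predict

$$Y(\mathbf{x}) = \sum_{i=0}^{k} \beta_i x_i + e, \quad \text{where } x_0 = 1$$

A point prediction is given by $\sum_{i=0}^{k} B_i x_i$, where $B_i$ is the least squares estimator of $\beta_i$ based on the set of prior responses $Y_1, \ldots, Y_n$, $i = 1, \ldots, k$.

To determine a prediction interval for $Y(\mathbf{x})$, note first that since $B_0, \ldots, B_k$ are based on prior responses, it follows that they are independent of $Y(\mathbf{x})$. Hence, it follows that $Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i$ is normal with mean 0 and variance given by

$$\begin{aligned}
\operatorname{Var}\left[Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i\right] &= \operatorname{Var}[Y(\mathbf{x})] + \operatorname{Var}\left(\sum_{i=0}^{k} B_i x_i\right) && \text{by independence} \\
&= \sigma^2 + \sigma^2\,\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x} && \text{from Equation 9.10.10}
\end{aligned}$$

and so

$$\frac{Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i}{\sigma\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim N(0, 1)$$

which yields, upon replacing σ by its estimator, that

$$\frac{Y(\mathbf{x}) - \sum_{i=0}^{k} B_i x_i}{\sqrt{\dfrac{SS_R}{n-k-1}}\,\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}} \sim t_{n-k-1}$$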

We thus have:

URL: https://www.sciencedirect.com/science/article/pii/B978012370483200014X

Random Sequences and Series

Scott L. Miller, Donald Childers, in Probability and Random Processes, 2004

7.5 Confidence Intervals

Consider once again the problem of estimating the mean of a distribution from n IID observations. When the sample mean $\hat{\mu}$ is formed, what have we actually learned? Loosely speaking, we might say that our best guess of the true mean is $\hat{\mu}$. However, in most cases, we know that the event $\{\hat{\mu} = \mu_X\}$ occurs with zero probability (since if $\hat{\mu}$ is a continuous random variable, the probability of it taking on any point value is zero). Alternatively, it could be said that (hopefully) the true mean is “close” to the sample mean. While this is a vague statement, with the help of the central limit theorem, we can make the statement mathematically precise.

If a sufficient number of samples is taken, the sample mean can be well approximated by a Gaussian random variable with a mean of $E[\hat{\mu}] = \mu_X$ and a variance of $\operatorname{Var}(\hat{\mu}) = \sigma_X^2/n$. Using the Gaussian distribution, the probability of the sample mean being within some amount ε of the true mean can be easily calculated:

(7.38) $$\Pr(|\hat{\mu} - \mu_X| < \varepsilon) = \Pr(\mu_X - \varepsilon < \hat{\mu} < \mu_X + \varepsilon) = 1 - 2Q\left(\frac{\varepsilon\sqrt{n}}{\sigma_X}\right).$$

Stated another way, let $\varepsilon_\alpha$ be the value of ε such that the right-hand side of the preceding equation is 1 − α; that is,

(7.39) $$\varepsilon_\alpha = \frac{\sigma_X}{\sqrt{n}}\,Q^{-1}\left(\frac{\alpha}{2}\right),$$

where Q−1 is the inverse of Q.

Then, given n samples that lead to a sample mean $\hat{\mu}$, the true mean will fall in the interval $(\hat{\mu} - \varepsilon_\alpha,\ \hat{\mu} + \varepsilon_\alpha)$ with probability 1 − α. The interval $(\hat{\mu} - \varepsilon_\alpha,\ \hat{\mu} + \varepsilon_\alpha)$ is referred to as the confidence interval, while the probability 1 − α is the confidence level or, alternatively, α is the level of significance. The confidence level and level of significance are usually expressed as percentages. The corresponding values of the quantity $c_\alpha = Q^{-1}(\alpha/2)$ are provided in Table 7.1 for several typical values of α. Other values not included in the table can be found from tables of the Q-function (such as those provided in Appendix E).

Table 7-1. Constants Used to Calculate Confidence Intervals

Confidence level (1 − α) × 100% | Level of significance α × 100% | $c_\alpha = Q^{-1}(\alpha/2)$
90% 10% 1.64
95% 5% 1.96
99% 1% 2.58
99.9% 0.1% 3.29
99.99% 0.01% 3.89

EXAMPLE 7.9: Suppose the IID random variables each have a variance of $\sigma_X^2 = 4$. A sample of n = 100 values is taken and the sample mean is found to be $\hat{\mu} = 10.2$. Determine the 95 percent confidence interval for the true mean $\mu_X$. In this case, $\sigma_X/\sqrt{n} = 0.2$ and the appropriate value of $c_\alpha$ is $c_{0.05} = 1.96$ from Table 7.1. The 95 percent confidence interval is then

$$\left(\hat{\mu} - \frac{\sigma_X}{\sqrt{n}}\,c_{0.05},\ \hat{\mu} + \frac{\sigma_X}{\sqrt{n}}\,c_{0.05}\right) = (9.808,\ 10.592).$$

EXAMPLE 7.10: Looking again at Example 7.9, suppose we want to be 99 percent confident that the true mean falls within ±0.5 of the sample mean. How many samples need to be taken in forming the sample mean? To ensure this level of confidence, it is required that

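$$\frac{\sigma_X}{\sqrt{n}}\,c_{0.01} = 0.5$$

and hence,

$$n = \left(\frac{c_{0.01}\,\sigma_X}{0.5}\right)^2 = \left(\frac{2.58 \times 2}{0.5}\right)^2 = 106.5.$$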

Since n must be an integer, it is concluded that at least 107 samples must be taken.
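Examples 7.9 and 7.10 can be reproduced numerically. The sketch below assumes that $Q^{-1}(\alpha/2)$ is the standard normal percentile at 1 − α/2, which follows from Q being the Gaussian tail probability; `c_alpha` is a hypothetical helper name, and exact percentiles are used instead of the rounded table entries.

```python
import math
from statistics import NormalDist


def c_alpha(alpha):
    """c_alpha = Q^{-1}(alpha / 2), the upper alpha/2 point of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Example 7.9: variance 4 (sigma_X = 2), n = 100, sample mean 10.2.
sigma_x, n, mu_hat = 2.0, 100, 10.2
eps = c_alpha(0.05) * sigma_x / math.sqrt(n)
interval = (mu_hat - eps, mu_hat + eps)
print(f"95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")

# Example 7.10: 99 percent confidence of being within 0.5 of the true mean.
n_required = math.ceil((c_alpha(0.01) * sigma_x / 0.5) ** 2)
print(f"samples required: {n_required}")
```

Using the exact percentile 2.5758 rather than 2.58 gives a slightly smaller unrounded value than 106.5, but the required integer sample size is the same.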

In summary, to achieve a level of significance specified by α, we note that by virtue of the central limit theorem, the statistic

(7.40) $$\hat{Z}_n = \frac{\hat{\mu} - \mu_X}{\sigma_X/\sqrt{n}}$$

approximately follows a standard normal distribution. We can then easily specify a symmetric interval about zero in which a standard normal random variable will fall with probability 1 – α. As long as n is sufficiently large, the original distribution of the IID random variables does not matter.

Note that in order to form the confidence interval as specified, the standard deviation of the Xi must be known. While in some cases this may be a reasonable assumption, in many applications the standard deviation is also unknown. The most obvious thing to do in that case would be to replace the true standard deviation in Equation 7.40 with the sample standard deviation. That is, we form a statistic

$$\hat{T}_n = \frac{\hat{\mu} - \mu_X}{\hat{s}/\sqrt{n}} \tag{7.41}$$

and then seek a symmetric interval about zero, $(-t_\alpha,\ t_\alpha)$, such that the probability that $\hat{T}_n$ falls in that interval is 1 − α. For very large n, the sample standard deviation will converge to the true standard deviation and thus $\hat{T}_n$ will approach $\hat{Z}_n$. Hence, in the limit as n → ∞, $\hat{T}_n$ can be treated as having a standard normal distribution and the confidence interval is found in the same manner we've described. That is, as n → ∞, $t_\alpha \to c_\alpha$. For values of n that are not very large, the actual distribution of the statistic $\hat{T}_n$ must be calculated in order to form the appropriate confidence interval.

Naturally, the distribution of $\hat{T}_n$ will depend on the distribution of the $X_i$. One case where this distribution has been calculated for finite n is when the $X_i$ are Gaussian random variables. In this case, the statistic $\hat{T}_n$ follows the so-called Student's t-distribution with n − 1 degrees of freedom:

$$f_{\hat{T}_n}(t) = \frac{\Gamma\big((n+1)/2\big)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \tag{7.42}$$

where Γ is the gamma function (See Chapter 3 (3.22) and Appendix E (E.39)).

From this PDF one can easily find the appropriate confidence interval for a given level of significance, α, and sample size, n. Tables of the appropriate confidence interval, tα, can be found in any text on statistics. It is common to use the t-distribution to form confidence intervals even if the samples are not Gaussian distributed. Hence, the t-distribution is very commonly used for statistical calculations.
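As an illustration of how $t_\alpha$ can be obtained from the density rather than from a printed table, the sketch below integrates the Student's t PDF numerically and inverts the result by bisection, using only the Python standard library. All function names are our own, the degrees of freedom are taken as n − 1 as stated in the text, and the sample values are made up purely for demonstration.

```python
import math
import statistics


def t_pdf(t, dof):
    """Student's t density with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def t_cdf(t, dof, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(alpha, dof):
    """Find t_alpha such that P(-t_alpha < T < t_alpha) = 1 - alpha."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, dof) < target:  # grow the bracket until it contains t_alpha
        hi *= 2.0
    for _ in range(60):             # bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def t_confidence_interval(sample, confidence=0.95):
    """Interval for the mean when the standard deviation is estimated."""
    n = len(sample)
    mean = statistics.mean(sample)
    s_hat = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_critical(1.0 - confidence, n - 1) * s_hat / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical data, for illustration only.
sample = [9.8, 10.4, 10.1, 10.6, 9.9, 10.3]
print(t_confidence_interval(sample, confidence=0.95))
```

For small n the critical value is noticeably larger than the corresponding $c_\alpha$, and it approaches $c_\alpha$ as n grows, consistent with the limiting argument above.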

Many other statistics associated with related parameter estimation problems are encountered and have been carefully expounded in the statistics literature. Indeed, many freshman- and sophomore-level statistics courses simply list all the cases and the corresponding statistical distributions without explaining the underlying probability theory. Left to memorize seemingly endless distributions and statistical tests, many students have been frightened away from the study of statistics before they ever have a chance to appreciate it. Rather than take that approach, we believe that with the probability theory developed to this point, the motivated student can now easily understand the motivation and justification for the variety of statistical tests that appear in the literature.

EXAMPLE 7.11: Suppose we wish to estimate the failure probability of some system. We might design a simulator for our system and count the number of times the system fails during a long sequence of operations of the system. Examples might include bit errors in a communications system, defective products in an assembly line, or the like. The failure probability can then be estimated as discussed at the end of Section 7.3. Suppose the true failure probability is p (which of course is unknown to us). We simulate operation of the system n times and count the number of errors observed, Ne. The estimate of the true failure probability is then just the relative frequency,

$$\hat{p} = \frac{N_e}{n}.$$

If errors occur independently, then the number of errors we observe in n trials is a binomial random variable with parameters n and p. That is,

$$P_{N_e}(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n.$$

From this we infer that the mean and variance of the estimated failure probability are $E[\hat{p}] = p$ and $\mathrm{var}(\hat{p}) = p(1-p)/n$. From this we can develop confidence intervals for our failure probability estimates. The MATLAB code that follows creates estimates as just described and plots the results, along with error bars indicating the confidence intervals associated with each estimate. The plot resulting from running this code is shown in Figure 7.5.
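The MATLAB listing itself is not reproduced here. As a hedged stand-in, the Python sketch below simulates independent failures and attaches an approximate confidence interval to each estimate. It assumes the normal approximation to the binomial, with the unknown p in the variance replaced by the estimate; the function names, the seed, and the true probabilities in the demo loop are all our own choices, and results are printed rather than plotted.

```python
import random
from math import sqrt
from statistics import NormalDist


def estimate_failure_probability(num_errors, n, confidence=0.95):
    """Return (p_hat, low, high) using the normal approximation."""
    p_hat = num_errors / n  # relative frequency N_e / n
    c = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # var(p_hat) = p(1 - p)/n, with the unknown p replaced by p_hat (assumption).
    half_width = c * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def simulate_errors(p, n, rng):
    """Count failures in n independent trials, each failing with probability p."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)            # fixed seed so the run is repeatable
for p_true in (0.05, 0.1, 0.2):   # hypothetical true failure probabilities
    errors = simulate_errors(p_true, 1000, rng)
    p_hat, low, high = estimate_failure_probability(errors, 1000)
    print(f"true p = {p_true:.2f}  estimate = {p_hat:.3f}  95% CI = ({low:.3f}, {high:.3f})")
```

The approximation is poor when the number of observed errors is very small, so intervals for rare failures should be read with caution.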

Figure 7.5. Estimates of failure probabilities along with confidence intervals. The solid line is the true probability while the circles represent the estimates.

URL: https://www.sciencedirect.com/science/article/pii/B9780121726515500075

The Art of Regression Analysis

Gary Smith, in Essential Statistics, Regression, and Econometrics (Second Edition), 2015

Significant Is Not Necessarily Substantial

It is easy to confuse statistical significance with practical importance. The estimated linear relationship between two variables is statistically significant at the 5 percent level if the estimated slope is more than (approximately) two standard deviations from 0. Equivalently, a 95 percent confidence interval does not include 0.

This does not mean that the estimated slope is necessarily large enough to be of practical importance. Suppose that we use 100 observations to estimate the relationship between consumption C and household wealth W, each in billions of dollars:

$$\hat{c} = \underset{(2.43)}{15.20} + \underset{(0.000010)}{0.000022}\,w$$

The standard errors are in parentheses. The t value for testing the null hypothesis that the slope is 0 is 2.2:

$$t = \frac{b - 0}{\text{standard error of } b} = \frac{0.000022}{0.000010} = 2.2$$

which gives a two-sided P value less than 0.05.

With 100 − 2 = 98 degrees of freedom, t∗ = 1.9846 and a 95 percent confidence interval excludes 0:

b±t∗SE[b]=0.000022±1.9846(0.000010)=0.000022±0.000020

There is a statistically significant relationship between wealth and spending.

Is this relationship substantial? The estimated slope is 0.000022, which means that a $1 billion increase in wealth is predicted to increase spending by $0.000022 billion, which is $22,000. Is this of any practical importance? Probably not. Practical importance is admittedly subjective. The point is that it is not the same as statistical significance.

It can also turn out that the estimated slope is large but not statistically significant, as in this example:

$$\hat{c} = \underset{(2.43)}{15.20} + \underset{(0.12)}{0.22}\,w$$

Now the estimated slope is 0.22, which means that a $1 billion increase in wealth is predicted to increase spending by $0.22 billion, which is a substantial amount.

However, the t value is less than 2:

$$t = \frac{b - 0}{\text{standard error of } b} = \frac{0.22}{0.12} = 1.8$$

and a 95 percent confidence interval includes 0:

b±t∗SE[b]=0.22±1.9846 (0.12)=0.22±0.24

In this second example, the estimated effect of an increase in wealth on spending is large but not statistically significant at the 5 percent level. That is the point. Statistical significance and practical importance are separate questions that need to be answered separately.
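The two cases can be compared side by side with a short Python sketch. The function name `slope_inference` is hypothetical, and the critical value t∗ = 1.9846 for 98 degrees of freedom is taken from the text rather than computed.

```python
def slope_inference(b, se_b, t_star):
    """Return the t value, the confidence interval, and whether it excludes 0."""
    t_value = (b - 0.0) / se_b
    low, high = b - t_star * se_b, b + t_star * se_b
    significant = not (low <= 0.0 <= high)  # the interval excludes zero
    return t_value, (low, high), significant


T_STAR = 1.9846  # quoted in the text for 100 - 2 = 98 degrees of freedom

# (estimated slope, standard error) for the two regressions discussed above.
for b, se_b in ((0.000022, 0.000010), (0.22, 0.12)):
    t_value, (low, high), significant = slope_inference(b, se_b, T_STAR)
    print(f"b = {b:g}: t = {t_value:.2f}, 95% CI = ({low:.6g}, {high:.6g}), "
          f"significant at 5%: {significant}")
```

The first slope is flagged as significant and the second is not, yet the second is four orders of magnitude larger, which is exactly the distinction between statistical significance and practical importance.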

URL: https://www.sciencedirect.com/science/article/pii/B9780128034590000091

HYPOTHESIS TESTING

Sheldon M. Ross, in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009

REMARKS

(a)

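There is a direct analogy between confidence interval estimation and hypothesis testing. For instance, for a normal population having mean μ and known variance σ², we have shown in Section 7.3 that a 100(1 − α) percent confidence interval for μ is given by

$$\mu \in \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

where $\bar{x}$ is the observed sample mean. More formally, the preceding confidence interval statement is equivalent to

$$P\left\{\mu \in \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\right\} = 1 - \alpha$$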

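Hence, if μ = μ0, then the probability that μ0 will fall in the interval

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

is 1 − α, implying that a significance level α test of H0: μ = μ0 versus H1: μ ≠ μ0 is to reject H0 when

$$\mu_0 \notin \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$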

Similarly, since a 100(1 − α) percent one-sided confidence interval for μ is given by

$$\mu \in \left(\bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right)$$

it follows that an α-level significance test of H0: μ ≤ μ0 versus H1: μ > μ0 is to reject H0 when $\mu_0 \notin \left(\bar{X} - z_\alpha \sigma/\sqrt{n},\ \infty\right)$, that is, when $\mu_0 < \bar{X} - z_\alpha \sigma/\sqrt{n}$.

(b)

A Remark on Robustness: A test that performs well even when the underlying assumptions on which it is based are violated is said to be robust. For instance, the tests of Sections 8.3.1 and 8.3.1.1 were derived under the assumption that the underlying population distribution is normal with known variance σ². However, in deriving these tests, this assumption was used only to conclude that $\bar{X}$ also has a normal distribution. But, by the central limit theorem, it follows that for a reasonably large sample size, $\bar{X}$ will approximately have a normal distribution no matter what the underlying distribution. Thus we can conclude that these tests will be relatively robust for any population distribution with variance σ².

Table 8.1 summarizes the tests of this subsection.

TABLE 8.1. X1, …, Xn Is a Sample from a N(μ, σ²) Population, σ² Is Known, X̄ = (X1 + ⋯ + Xn)/n

H0         H1         Test Statistic TS    Level α Test                 p-Value if TS = t
μ = μ0     μ ≠ μ0     √n (X̄ − μ0)/σ        Reject if |TS| > z_{α/2}     2P{Z ≥ |t|}
μ ≤ μ0     μ > μ0     √n (X̄ − μ0)/σ        Reject if TS > z_α           P{Z ≥ t}
μ ≥ μ0     μ < μ0     √n (X̄ − μ0)/σ        Reject if TS < −z_α          P{Z ≤ t}

Z is a standard normal random variable.
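The three tests summarized in Table 8.1 can be sketched as a single Python function. The name `z_test` and the labels for the alternatives are our own. The function rejects when the p-value falls below α, which is equivalent to the critical-value rules in the table. The demonstration reuses the numbers of Example 7.9 purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    """Return (test statistic, p-value, reject H0?) for a known-variance test."""
    z = NormalDist()
    ts = sqrt(n) * (sample_mean - mu0) / sigma  # TS = sqrt(n)(Xbar - mu0)/sigma
    if alternative == "two-sided":              # H1: mu != mu0
        p_value = 2.0 * (1.0 - z.cdf(abs(ts)))
    elif alternative == "greater":              # H1: mu > mu0
        p_value = 1.0 - z.cdf(ts)
    elif alternative == "less":                 # H1: mu < mu0
        p_value = z.cdf(ts)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return ts, p_value, p_value < alpha


# Sample mean 10.2, sigma = 2, n = 100, as in Example 7.9.
for mu0 in (10.0, 9.8):
    ts, p_value, reject = z_test(10.2, mu0, sigma=2.0, n=100)
    print(f"H0: mu = {mu0}: TS = {ts:.2f}, p-value = {p_value:.4f}, reject = {reject}")
```

The hypothesized value 10.0 lies inside the 95 percent interval (9.808, 10.592) and is not rejected, while 9.8 lies outside it and is rejected, which illustrates the duality described in remark (a).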

URL: https://www.sciencedirect.com/science/article/pii/B9780123704832000138

What is the 95% confidence interval for the population mean?

A 95% confidence interval is a range of values above and below the point estimate within which the true value in the population is likely to lie with 95% confidence. The other 5% is the possibility that the true value is not within the confidence interval.

What is a 95% confidence interval of the population mean?

A range of possible values for the population mean that is centered about the sample mean. What does a 95% confidence interval indicate? That you are 95% confident that the population mean falls within the confidence interval.

What does a confidence interval of 100 mean?

A 100% confidence level means there is no doubt at all that if you repeated the survey you would get the same results.

What is the critical value when the confidence level is 95%?

For a two-tailed 95% confidence interval, the alpha value is 0.05, which leaves 0.025 in each tail, and the corresponding critical value is 1.96. This means that to calculate the upper and lower bounds of the confidence interval, we can take the sample mean ± 1.96 standard errors of the mean.