Does the population have to be normally distributed to test this hypothesis? Why?

No. You can still use a t-test here, for precisely the reason you give: even if the underlying populations are not normally distributed, the sample mean (or, more precisely, the difference between the sample means) is asymptotically normal. (The main condition is that the underlying populations have finite variance. This is usually satisfied in the real world, and it certainly holds for uniform distributions.)

Let's illustrate with a simulation (R code): we consider two populations, one $U[0,10]$ and the other $U[0.5,10.5]$, and a total sample size of 1000, half from each population. Here is a sample and a t-test:

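nn <- 1000  # total sample size, split evenly between the two populations

# Samplers for the two uniform populations U[0, 10] and U[0.5, 10.5]
draw_1 <- function(n) runif(n, 0, 10)
draw_2 <- function(n) runif(n, 0.5, 10.5)

set.seed(1)  # for reproducibility
sample_1 <- draw_1(nn/2)
sample_2 <- draw_2(nn/2)

t.test(sample_1, sample_2)  # Welch two-sample t-test (R's default, var.equal = FALSE)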

which yields

        Welch Two Sample t-test

data:  sample_1 and sample_2
t = -3.1827, df = 996.74, p-value = 0.001504
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9387957 -0.2226748
sample estimates:
mean of x mean of y 
 4.956549  5.537284
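As a cross-check on this output, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed by hand from the two sample means and variances. The following is a minimal sketch. `welch_t` is a hypothetical helper written for illustration and is not part of base R. The exact reported numbers are not restated here.

```r
# Recompute the Welch two-sample t statistic by hand and compare with t.test().
set.seed(1)
sample_1 <- runif(500, 0, 10)     # same draws as draw_1(nn/2) above
sample_2 <- runif(500, 0.5, 10.5) # same draws as draw_2(nn/2) above

# Hypothetical helper: Welch t statistic, df and two-sided p-value
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  list(t = t_stat, df = df, p.value = p_value)
}

manual <- welch_t(sample_1, sample_2)
builtin <- t.test(sample_1, sample_2)
all.equal(manual$t, unname(builtin$statistic))  # TRUE
```

The point of the exercise is that nothing in the statistic itself uses normality of the populations. Normality only enters through the claim that this ratio is approximately \(t\)-distributed.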

Now, to see that the difference in means is normal enough, we simulate drawing samples and calculating the difference in means many times:

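# Repeat the experiment 10,000 times, keeping the difference in sample means
means <- replicate(1e4, {
    sample_1 <- draw_1(nn/2)
    sample_2 <- draw_2(nn/2)
    mean(sample_2) - mean(sample_1)})

hist(means)  # histogram of the simulated differences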


Of course, this difference is not really normal (for one, it's bounded between -9.5 and 10.5, whereas the normal distribution is unbounded), but it's normal "enough" for the t test to work.
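A more direct way to check that the t-test "works" here is to simulate under the null hypothesis and count false rejections. If the test is valid, about 5% of p-values should fall below 0.05. The following is a rough sketch under assumed settings: both groups come from U[0, 10], with 500 draws per group and 2000 repetitions. These numbers are illustrative choices, not from the original answer.

```r
# Estimate the Type I error rate of the Welch t-test for uniform data.
# Both groups come from U[0, 10], so H0 (equal means) is true and every
# rejection is a false positive.
set.seed(2)
n_per_group <- 500
n_sims <- 2000

p_values <- replicate(n_sims, {
  x <- runif(n_per_group, 0, 10)
  y <- runif(n_per_group, 0, 10)
  t.test(x, y)$p.value
})

# Share of simulations rejecting at alpha = 0.05; should be close to 0.05
type_one_rate <- mean(p_values < 0.05)
type_one_rate
```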

    Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. We perform tests of a population mean using a normal distribution or a Student's \(t\)-distribution. (Remember, use a Student's \(t\)-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size \(n\) is large).

    If you are testing a single population mean, the distribution for the test is for means:

    \[\bar{X} \sim N\left(\mu_{x}, \frac{\sigma_{x}}{\sqrt{n}}\right)\]

    or

    \[t_{df}\]

    The population parameter is \(\mu\). The estimated value (point estimate) for \(\mu\) is \(\bar{x}\), the sample mean.
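Both versions of the single-mean test can be sketched in a few lines of R. The sample `x`, the null value `mu0 = 10`, and the "known" population standard deviation `sigma = 0.4` are hypothetical values for illustration only. The \(z\)-test uses \(\sigma\) directly. The \(t\)-test replaces it with the sample standard deviation \(s\) and uses \(n - 1\) degrees of freedom.

```r
# Hypothetical sample and null hypothesis H0: mu = 10 (illustration only)
x <- c(9.8, 10.4, 10.1, 9.6, 10.7, 10.3, 9.9, 10.5, 10.2, 10.6)
mu0 <- 10
n <- length(x)

# z-test: assumes the population standard deviation is known (hypothetical value)
sigma <- 0.4
z <- (mean(x) - mu0) / (sigma / sqrt(n))
p_z <- 2 * pnorm(-abs(z))             # two-sided p-value

# t-test: sigma unknown, estimated by the sample standard deviation s
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_t <- 2 * pt(-abs(t_stat), df = n - 1)

# R's built-in one-sample t-test gives the same statistic and p-value
builtin <- t.test(x, mu = mu0)
c(z = z, p_z = p_z, t = t_stat, p_t = p_t)
```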

    If you are testing a single population proportion, the distribution for the test is for proportions or percentages:

    \[P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)\]

    The population parameter is \(p\). The estimated value (point estimate) for \(p\) is \(p'\), where \(p' = \frac{x}{n}\), \(x\) is the number of successes, and \(n\) is the sample size.
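For a single proportion, the test statistic standardizes \(p'\) with the hypothesized \(p\) and \(q = 1 - p\). The following is a minimal sketch with hypothetical data: 58 successes in 100 trials, testing \(H_0: p = 0.5\). As a cross-check, R's `prop.test()` with `correct = FALSE` reports a chi-squared statistic equal to \(z^2\).

```r
# Hypothetical data: x successes in n trials, testing H0: p = p0
x <- 58
n <- 100
p0 <- 0.5
q0 <- 1 - p0

p_hat <- x / n                  # point estimate p'
se <- sqrt(p0 * q0 / n)         # standard deviation of P' under H0
z <- (p_hat - p0) / se
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

# prop.test() without continuity correction reports X-squared = z^2
chk <- prop.test(x, n, p = p0, correct = FALSE)
all.equal(unname(chk$statistic), z^2)  # TRUE
```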

    Assumptions

    When you perform a hypothesis test of a single population mean \(\mu\) using a Student's \(t\)-distribution (often called a \(t\)-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a \(t\)-test will work even if the population is not approximately normally distributed.)
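The claim that a \(t\)-test works for a non-normal population once the sample is large enough can be checked by simulation. The following sketch is an illustration under assumed settings. It draws from a skewed population (exponential with mean 1), tests the true mean, and estimates the actual rejection rate at \(\alpha = 0.05\) for a small sample (\(n = 5\)) and a larger one (\(n = 200\)). `rejection_rate` is a hypothetical helper.

```r
# Actual Type I error rate of a one-sample t-test when the population is
# skewed (exponential with rate 1, so its true mean is 1).
set.seed(3)
true_mean <- 1
alpha <- 0.05

# Hypothetical helper: share of simulated tests that (wrongly) reject H0
rejection_rate <- function(n, n_sims = 4000) {
  p_values <- replicate(n_sims, t.test(rexp(n, rate = 1), mu = true_mean)$p.value)
  mean(p_values < alpha)
}

small_n <- rejection_rate(5)
large_n <- rejection_rate(200)
c(n5 = small_n, n200 = large_n)
```

With the larger sample, the rejection rate should sit near the nominal 5%. How far the \(n = 5\) rate drifts depends on how skewed the population is.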

    When you perform a hypothesis test of a single population mean \(\mu\) using a normal distribution (often called a \(z\)-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation, which in reality is rarely known.

    When you perform a hypothesis test of a single population proportion \(p\), you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are: there is a fixed number \(n\) of independent trials, the outcome of each trial is either a success or a failure, and each trial has the same probability of success \(p\). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities \(np\) and \(nq\) must both be greater than five (\(np > 5\) and \(nq > 5\)). Then the sampling distribution of the sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\). Remember that \(q = 1 - p\).
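The \(np > 5\) and \(nq > 5\) rule of thumb is easy to encode. The sketch below also compares one exact binomial probability with its normal approximation. `normal_approx_ok` is a hypothetical helper, and the values \(n = 100\), \(p = 0.5\) are arbitrary examples.

```r
# Hypothetical helper: rule of thumb for the normal approximation
normal_approx_ok <- function(n, p) {
  q <- 1 - p
  n * p > 5 && n * q > 5
}

normal_approx_ok(100, 0.5)   # TRUE: np = nq = 50
normal_approx_ok(40, 0.1)    # FALSE: np = 4

# Exact binomial probability vs. the normal approximation for P'
n <- 100
p <- 0.5
exact  <- pbinom(55, n, p)                                  # P(X <= 55)
approx <- pnorm(0.55, mean = p, sd = sqrt(p * (1 - p) / n)) # P(P' <= 0.55)
c(exact = exact, approx = approx)
```

The approximation here uses no continuity correction, so a small gap between the two values is expected.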

    Summary

    In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

    When testing for a single population mean:

    1. A Student's \(t\)-test should be used if the population standard deviation is unknown, the data come from a simple random sample, and either the population is approximately normally distributed or the sample size is large.
    2. The normal test will work if the population standard deviation is known, the data come from a simple random sample, and either the population is approximately normally distributed or the sample size is large.

    When testing a single population proportion, use a normal test for a single population proportion if the data come from a simple random sample, fulfill the requirements for a binomial distribution, and the mean number of successes and the mean number of failures satisfy the conditions \(np > 5\) and \(nq > 5\), where \(n\) is the sample size, \(p\) is the probability of a success, and \(q\) is the probability of a failure.

    Formula Review

    If no significance level \(\alpha\) is specified in advance, use \(\alpha = 0.05\).

    Types of Hypothesis Tests

    • Single population mean, known population variance (or standard deviation): Normal test.
    • Single population mean, unknown population variance (or standard deviation): Student's \(t\)-test.
    • Single population proportion: Normal test.
    • For a single population mean, we may use a normal distribution with the following mean and standard deviation. Means: \(\mu = \mu_{\bar{x}}\) and \(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}\).
    • For a single population proportion, we may use a normal distribution with the following mean and standard deviation. Proportions: \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\).
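The formulas in this review translate directly into small R functions. The following is a minimal sketch. The function names `se_mean`, `se_prop` and `reject_h0` are hypothetical, and the inputs are arbitrary example values. `reject_h0` applies the default \(\alpha = 0.05\) mentioned above.

```r
# Hypothetical helpers for the standard deviations listed above
se_mean <- function(sigma, n) sigma / sqrt(n)     # sigma_xbar = sigma_x / sqrt(n)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)   # sqrt(pq / n), with q = 1 - p

# Decision rule with the default significance level alpha = 0.05
reject_h0 <- function(p_value, alpha = 0.05) p_value < alpha

se_mean(sigma = 15, n = 36)   # 2.5
se_prop(p = 0.4, n = 150)     # 0.04
reject_h0(0.031)              # TRUE
reject_h0(0.12)               # FALSE
```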

    Glossary

    Binomial Distribution: a discrete random variable (RV) that arises from Bernoulli trials. There are a fixed number, \(n\), of independent trials. “Independent” means that the result of any trial (for example, trial 1) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV \(X\) is defined as the number of successes in \(n\) trials. The notation is \(X \sim B(n, p)\). The mean is \(\mu = np\) and the standard deviation is \(\sigma = \sqrt{npq}\). The probability of exactly \(x\) successes in \(n\) trials is \(P(X = x) = \binom{n}{x} p^{x}q^{n-x}\).

    Normal Distribution: a continuous random variable (RV) with pdf \(f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(x-\mu)^{2}}{2\sigma^{2}}}\), where \(\mu\) is the mean of the distribution and \(\sigma\) is the standard deviation; notation: \(X \sim N(\mu, \sigma)\). If \(\mu = 0\) and \(\sigma = 1\), the RV is called the standard normal distribution.

    Standard Deviation: a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: \(s\) for sample standard deviation and \(\sigma\) for population standard deviation.

    Student's t-Distribution: investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student. The major characteristics of the random variable (RV) are:
    • It is continuous and assumes any real values.
    • The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
    • It approaches the standard normal distribution as \(n\) gets larger.
    • There is a "family" of \(t\)-distributions: every representative of the family is completely defined by the number of degrees of freedom, which is one less than the number of data items.
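The binomial and normal formulas in this glossary can be checked numerically against R's built-in density functions `dbinom()` and `dnorm()`. This sketch uses arbitrary example parameters. It also confirms that the binomial mean and standard deviation computed from the pmf match \(np\) and \(\sqrt{npq}\).

```r
# Binomial pmf: choose(n, x) p^x q^(n - x) vs. dbinom()
n <- 10; p <- 0.3; q <- 1 - p; x <- 4
binom_formula <- choose(n, x) * p^x * q^(n - x)
all.equal(binom_formula, dbinom(x, n, p))   # TRUE

# Normal pdf from the glossary formula vs. dnorm()
mu <- 2; sigma <- 1.5; v <- 3.1
normal_formula <- 1 / (sigma * sqrt(2 * pi)) * exp(-(v - mu)^2 / (2 * sigma^2))
all.equal(normal_formula, dnorm(v, mean = mu, sd = sigma))   # TRUE

# Binomial mean np and standard deviation sqrt(npq), computed from the pmf
k <- 0:n
binom_mean <- sum(k * dbinom(k, n, p))
binom_sd <- sqrt(sum((k - binom_mean)^2 * dbinom(k, n, p)))
c(mean = binom_mean, np = n * p, sd = binom_sd, sqrt_npq = sqrt(n * p * q))
```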

    Does the population have to be normally distributed to test a hypothesis?

    When you perform a hypothesis test of a single population mean using a Student's t-distribution, the data should come from a simple random sample and the population should be approximately normally distributed, unless the sample size is large.

    Does the population need to be normally distributed?

    No, because the Central Limit Theorem states that, regardless of the shape of the underlying population (provided it has a finite variance), the sampling distribution of \(\bar{x}\) becomes approximately normal as the sample size, \(n\), increases.
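The Central Limit Theorem can be watched in action with a skewed population. The following sketch assumes an exponential population with mean 1 and standard deviation 1, which is an illustrative choice. For several sample sizes, it reports the standard deviation of the simulated sample means next to \(\sigma/\sqrt{n}\) and a simple moment-based skewness, which should shrink toward 0 as \(n\) grows. `sample_means` and `skewness` are hypothetical helpers.

```r
set.seed(4)

# Hypothetical helpers
sample_means <- function(n, reps = 10000) replicate(reps, mean(rexp(n, rate = 1)))
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3   # simple moment estimate

ns <- c(2, 30, 200)
results <- t(sapply(ns, function(n) {
  m <- sample_means(n)
  c(n = n, sd_xbar = sd(m), sigma_over_sqrt_n = 1 / sqrt(n), skewness = skewness(m))
}))
round(results, 3)
```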

    What is normal distribution in hypothesis testing?

    A normality test is a hypothesis test that formally checks whether the population the sample represents is normally distributed. The null hypothesis states that the population is normally distributed, against the alternative hypothesis that it is not normally distributed.
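One common normality test is the Shapiro-Wilk test, available in R as `shapiro.test()`. It is used here as one example; this passage does not name a specific test. The data below are simulated: one normal sample and one strongly skewed sample. A small p-value is evidence against the null hypothesis of normality.

```r
# Shapiro-Wilk normality test on simulated data.
# H0: the population is normally distributed; Ha: it is not.
set.seed(5)
normal_data <- rnorm(200, mean = 50, sd = 10)
skewed_data <- rexp(200, rate = 0.1)

p_normal <- shapiro.test(normal_data)$p.value  # usually large: no evidence against normality
p_skewed <- shapiro.test(skewed_data)$p.value  # usually tiny: normality rejected
c(normal = p_normal, skewed = p_skewed)
```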

    How a hypothesis test using a t

    Like the standard normal distribution (or z-distribution), the t-distribution has a mean of zero. A test based on the normal distribution assumes that the population standard deviation is known; a test based on the t-distribution does not make this assumption. The t-distribution is defined by its degrees of freedom.
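One practical consequence of the degrees of freedom is visible in critical values. The sketch below compares the two-sided 5% critical value of the \(t\)-distribution, `qt(0.975, df)`, with that of the standard normal, `qnorm(0.975)`, for a few example degrees of freedom. The \(t\) values are always larger, reflecting the heavier tails, and they approach the normal value as the degrees of freedom grow.

```r
# Two-sided 5% critical values: t-distribution vs. standard normal
df_values <- c(2, 5, 10, 30, 100, 1000)
crit_t <- qt(0.975, df = df_values)
crit_z <- qnorm(0.975)
data.frame(df = df_values, t_crit = round(crit_t, 3), z_crit = round(crit_z, 3))
```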