Sampling error is a statistical error resulting from estimating a parameter (e.g., the mean of a variable of interest) from a sample rather than the whole population. Different samples produce different estimates of the unknown parameter; the difference between an estimate and the true value is the sampling error. This terminology applies to parameter estimation in the usual (frequentist) paradigm, which assumes the existence of a true, underlying parameter value.
In general, the true parameter value, and thus the magnitude of the sampling error, is unknown. However, if the observed data represent a random sample from the population, the sampling error associated with a particular estimate (e.g., a mean, a proportion, or a difference between means) can be predicted from the relevant theoretical sampling distribution, using the observed sample standard deviation and the sample size.
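As a minimal sketch of that prediction, the standard error of the mean (sample standard deviation divided by the square root of the sample size) estimates the typical magnitude of the sampling error from a single sample. The measurements below are made-up illustrative values, not real data:

```python
import math

# Hypothetical sample of n measurements (illustrative values, not real data)
sample = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7, 5.0, 4.9]
n = len(sample)

mean = sum(sample) / n
# Sample standard deviation (n - 1 in the denominator)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: the predicted typical size of the
# sampling error for this estimate of the population mean
se = sd / math.sqrt(n)
print(round(mean, 3), round(se, 3))  # 5.0 0.058
```

Even without knowing the true population mean, the standard error tells us roughly how far from it this sample mean is likely to fall.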
Holliday, E. (2014). Sampling Error. In: Michalos, A.C. (ed.) Encyclopedia of Quality of Life and Well-Being Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0753-5_2554

Sampling error is the difference between a sample statistic and the population parameter it estimates. It is a crucial consideration in inferential statistics, where you use a sample to estimate the properties of an entire population. For example, suppose you gather a random sample of adult women in the United States, measure their heights, and obtain an average of 5' 4" (1.63 m). The sample mean (x̄) estimates the population mean (μ). However, it's virtually guaranteed that the sample mean doesn't equal the population parameter exactly. That difference is sampling error.

The preceding example illustrates sampling error when a sample estimates the mean, but the same principles apply to other types of estimates, such as proportions, effect sizes, correlations, and regression coefficients. In this post, learn what constitutes an acceptable sampling error, the factors that contribute to it, how to minimize it, and the statistical tools that evaluate it.

Related post: Parameters vs. Statistics

It's Unavoidable, but Knowledge Helps Minimize It

There are tremendous benefits to working with samples. For one thing, it's usually impossible to measure an entire population because populations tend to be extremely large. Consequently, samples are the only way for most research even to proceed. Samples allow you to obtain a practical dataset at reasonable cost in a realistic timeframe.

Unfortunately, sampling error is an inherent consideration when using samples. Even when researchers conduct their study perfectly, they can't avoid some degree of sampling error. Why not?
Randomness alone guarantees that your sample cannot be 100% representative of the population. Chance inevitably causes some error because the probability of drawing a sample that exactly matches the population value is practically zero. Additionally, a sample can never provide a perfect depiction of the population with all its nuances because it is not the entire population; samples are typically a tiny percentage of the whole.

The only way to prevent sampling error is to measure the entire population. Barring that approach, researchers can take steps to understand and minimize it. Given the inevitability of some sampling error in most studies, the question becomes: how close are the sample estimates likely to be to the correct population values? The best studies tend to have low amounts of sampling error, while subpar studies have more. Let's start by breaking down the properties of acceptable sampling error. Then we'll move on to managing its sources.

Related post: Sample Statistics are Always Wrong (to Some Extent)!

Properties of Acceptable Sampling Error

In inferential statistics, the goal is to obtain a random sample from a population and use it to estimate the attributes of that population. Sample statistics are estimates of the relationships and effects in the population. Sampling error always occurs, so we have to live with it. But what do statisticians consider acceptable? In a nutshell, sampling error should be unbiased and small. Let's explore these characteristics using sampling distributions.

A key concept of inferential statistics is that the sample a researcher draws is only one of an infinite number of samples they could have drawn. Imagine we repeat a study many times: we collect many random samples from the same population and calculate each sample's estimate. We then graph the distribution of those estimates. Statisticians refer to this as a sampling distribution.
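The repeated-study thought experiment is easy to simulate. The sketch below assumes a hypothetical population of heights that is normal with mean 64 inches and standard deviation 3 inches (made-up parameters), draws many random samples, and collects each sample's mean:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: assume heights are normal, mean 64 in, SD 3 in
POP_MEAN, POP_SD = 64.0, 3.0

def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# "Repeat the study" 5,000 times to approximate the sampling distribution
estimates = [sample_mean(25) for _ in range(5000)]

# Each estimate differs from the true mean; that difference is sampling error
print(round(statistics.fmean(estimates), 2))  # close to 64 (centered correctly)
print(round(statistics.stdev(estimates), 2))  # close to 3 / sqrt(25) = 0.6
```

The spread of `estimates` is a picture of the sampling distribution: its center tells us about bias, and its width tells us about precision, which the next sections take up in turn.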
The concepts of bias and precision for sampling error relate to the tendency of multiple samples to center on the proper value and to cluster tightly around it. Learn more about Sampling Distributions.

Related posts: Inferential vs. Descriptive Statistics and Populations, Parameters, and Samples in Inferential Statistics

Unbiased Sampling Error

Unbiased sampling error tends to be right on target. While the sample estimates won't be exactly right, they should not be systematically too high or too low. The average, or expected value, of multiple attempts should equal the population value. Statisticians refer to this property of being correct on average as unbiased.

In the graph below, the population value is the target that the distribution should center on to be unbiased. The curve on the right centers on a value that is too high. The methodology behind that study tends to overestimate the population parameter, which is a positive bias; it is not correct on average. Statisticians refer to this problem as sampling bias. The left-hand curve, however, centers on the correct value. That study's procedures yield sample statistics that are correct on average, so it's unbiased: the expected value is the real population value.

Please note that larger sample sizes do not reduce bias. When a methodology produces biased results, a larger sample size simply produces a greater number of biased values. Learn about Sampling Bias.

Sampling Error and Precision

Recognizing that sample statistics are rarely exactly correct, you want to minimize the difference between the estimate and the population parameter. Large differences are bad! Precision in statistics assesses how close you can expect your estimate to be to the correct population value. When your study has low sampling error, it produces precise estimates that you can confidently expect to be close to the population value. That's a better position than having a high amount of error, which produces imprecise estimates.
In that scenario, you know your estimate is likely to be wrong by a significant amount! Sampling distributions represent sampling error and precision through the width of the curves. Tighter distributions represent lower error and more precise estimates because the estimates cluster more closely around the population value. Conversely, broad distributions indicate lower precision because estimates tend to fall further from the correct value.

In the graph, both curves center on the correct population value, indicating that both are unbiased. That's good. However, the red curve is broader than the blue curve because it has more sampling error. Its estimates tend to fall further from the population value than the blue curve's. That's not good. We want our estimates to be close to the actual population value. Relatively precise estimates cluster more tightly around the parameter value, which you can see in the blue curve. Unlike with biased results, increasing the sample size reduces the amount of sampling error and increases precision. We'll come back to that!

Sources of Sampling Error

As you saw above, you can understand sampling error through the bias and precision of sample statistics. Some sources of sampling error tend to produce bias, while others affect precision.

Sources of Bias

Biases in sampling error frequently occur when the sample or the measurements do not accurately represent the population. These problems cause the sample statistics to be systematically higher or lower than the correct population values. The leading causes of bias relate to the study's procedures. There are no statistical measures that assess bias: a sample's properties cannot tell you whether the sample itself is biased. Instead, you must look at the study's methods and procedures to determine whether they are likely to introduce bias. Below are some of the top causes of sampling error bias:
Factors that Affect Precision

Random sampling error refers to chance differences between a random sample and the population. It excludes the biases discussed above. This type of error affects the estimate's precision. Two key factors affect random sampling error: population variability and sample size.
Of these two factors, researchers usually have less control over the variability because it is an inherent property of the population. However, they can collect larger samples. Consequently, increasing the sample size is the critical method for reducing random sampling error. Unlike bias, random sampling error can be evaluated with statistical measures and incorporated into various inferential procedures. To see how random sampling error works mathematically, read my post about the Standard Error, which I describe as the gateway from descriptive to inferential statistics. Or read about the Law of Large Numbers, a more conceptual approach to how larger samples lead to more precise estimates.

Statistical Methods that Evaluate Random Sampling Error

Inferential statistics are procedures that use sample data to draw conclusions about populations. To do so, they must incorporate sampling error into their calculations. For example, imagine you're studying the effectiveness of a new medication and find that it improves the health outcome in the treatment group by 10% relative to the control group. Does that effect exist in the population, or is the sample difference due to random sampling error? Inferential procedures can help make that determination. I'll summarize several broad types, but please click the links to learn more about them.
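As one concrete sketch of how such a procedure folds random sampling error into a conclusion, here is a hand-rolled 95% confidence interval for the medication example above, using made-up counts (120 of 200 treated patients improve vs. 100 of 200 controls, a 10-percentage-point difference):

```python
import math

# Hypothetical trial results (illustrative numbers, not real data):
# 120 of 200 treated patients improved vs. 100 of 200 controls.
x_t, n_t = 120, 200
x_c, n_c = 100, 200

p_t, p_c = x_t / n_t, x_c / n_c
diff = p_t - p_c                      # observed effect: 0.10 (10 points)

# Standard error of the difference in proportions: the expected
# magnitude of random sampling error for this estimate
se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)

# 95% confidence interval: the range of population differences
# consistent with this sample, given random sampling error
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(round(lo, 3), round(hi, 3))  # 0.003 0.197
```

Because the whole interval sits above zero, random sampling error alone is an unlikely explanation for the observed 10-point difference, which is the kind of determination these procedures exist to make.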
Please note that these procedures evaluate only random sampling error. They cannot detect bias and, in fact, assume there is none. Consequently, the presence of bias invalidates their results. Sampling error is unavoidable when you're working with samples. However, you can minimize it and incorporate it into your results.

What happens to the sampling error as the sample size decreases?

Sampling error is affected by a number of factors, including the sample size, the sample design, the sampling fraction, and the variability within the population. In general, larger sample sizes decrease the sampling error; however, the decrease is not directly proportional to the sample size.
What happens to the error when the sample size increases?

Sampling error can be reduced by increasing the sample size. As the sample size increases, the sample gets closer to the actual population, which decreases the potential for deviations from the actual population values.
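The "not directly proportional" point is visible in the standard-error formula itself: the error shrinks with the square root of the sample size. A quick sketch, assuming a hypothetical population standard deviation of 3:

```python
import math

POP_SD = 3.0  # assumed population standard deviation (hypothetical)

# Standard error of the mean for several sample sizes. Quadrupling n
# only halves the error, so shrinking error gets expensive quickly.
errors = {n: POP_SD / math.sqrt(n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(n, round(se, 2))  # 25 0.6 / 100 0.3 / 400 0.15
```

Going from n = 25 to n = 400 is sixteen times the data-collection effort but only a fourfold reduction in random sampling error.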