What type of statistics is used to organize and describe the characteristics of a collection of data?

While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.

When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.

Inferential statistics have two main uses:

  • making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
  • testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

Descriptive versus inferential statistics

Descriptive statistics allow you to describe a data set, while inferential statistics allow you to make inferences based on a data set.

Descriptive statistics

Using descriptive statistics, you can report characteristics of your data:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability concerns how spread out the values are.

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.
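As a minimal sketch of these three characteristics, the snippet below summarizes a small data set using only Python's standard library. The scores and the `describe` helper are hypothetical and exist only for illustration.

```python
from collections import Counter
import statistics


def describe(values):
    """Summarize distribution, central tendency, and variability of a data set."""
    return {
        # Distribution: how often each value occurs.
        "frequencies": dict(Counter(values)),
        # Central tendency: typical values.
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability: how spread out the values are.
        "range": max(values) - min(values),
        # pstdev treats the data as the whole group being described.
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical SAT scores, invented for this example.
scores = [1000, 1100, 1100, 1200, 1300]
summary = describe(scores)
print(summary)
```

Because these numbers only describe the values that were actually collected, no estimation is involved at this stage.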

Example: Descriptive statistics

You collect data on the SAT scores of all 11th graders in a school for three years.

You can use descriptive statistics to get a quick overview of the school’s scores in those years. You can then directly compare the mean SAT score with the mean scores of other schools.

Inferential statistics

Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population that you’re interested in.

While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population.

With inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, then you can’t make valid statistical inferences.

Example: Inferential statistics

You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics.

You can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on your sample data.

Sampling error in inferential statistics

Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).

Sampling error arises any time you use a sample, even if your sample is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, using probability sampling methods reduces this uncertainty.
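Sampling error can be illustrated with a short simulation. The sketch below assumes a made-up population (the integers 1 to 1000) and a hypothetical helper, `sampling_error`, that draws one simple random sample and returns the difference between the sample mean (a statistic) and the population mean (a parameter).

```python
import random
import statistics


def sampling_error(population, sample_size, seed=0):
    """Return (sample mean - population mean) for one simple random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    statistic = statistics.mean(sample)      # describes the sample
    parameter = statistics.mean(population)  # describes the population
    return statistic - parameter


# Hypothetical population: the integers 1..1000 stand in for individual scores.
population = list(range(1, 1001))

for n in (10, 100, 1000):
    print(n, sampling_error(population, n))
```

The error is exactly zero only when the sample covers the whole population. For smaller samples it is usually nonzero, and it tends to shrink as the sample size grows.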

Estimating population parameters from sample statistics

The characteristics of samples and populations are described by numbers called statistics and parameters:

  • A statistic is a measure that describes the sample (e.g., sample mean).
  • A parameter is a measure that describes the whole population (e.g., population mean).

Sampling error is the difference between a parameter and a corresponding statistic. Since in most cases you don’t know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.

There are two important types of estimates you can make about the population: point estimates and interval estimates.

  • A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Confidence intervals

A confidence interval uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account.

While a point estimate gives you a single value for the parameter you are interested in, a confidence interval tells you the uncertainty of the point estimate. They are best used in combination with each other.

Each confidence interval is associated with a confidence level. A confidence level tells you the percentage of intervals that would contain the population parameter if you repeated the study many times.

A 95% confidence level means that if you repeat your study with a new sample in exactly the same way 100 times and calculate a confidence interval each time, you can expect about 95 of those intervals to contain the population parameter.

You cannot say for sure that any single interval contains the actual population parameter. That’s because you can’t know the true value of the population parameter without collecting data from the full population.

However, with random sampling and a suitable sample size, you can reasonably expect your confidence interval to contain the parameter a certain percentage of the time.
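The idea that an interval contains the parameter a certain percentage of the time can be checked with a small simulation. The sketch below makes several assumptions that are not in the text: a normally distributed population with a known mean of 100 and standard deviation of 15, samples of 50, and a normal-approximation (z) interval. The `coverage` function is a hypothetical helper.

```python
import random
import statistics


def coverage(true_mean=100.0, true_sd=15.0, n=50, repeats=1000, level=0.95, seed=42):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    # z critical value for the chosen confidence level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / n ** 0.5
        if mean - margin <= true_mean <= mean + margin:
            hits += 1
    return hits / repeats


rate = coverage()
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

In a simulation the true mean is known, so the share of intervals that capture it can be counted directly. In a real study only one interval is available and the parameter stays unknown.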

Example: Point estimate and confidence interval

You want to know the average number of paid vacation days that employees at an international company receive. After collecting survey responses from a random sample, you calculate a point estimate and a confidence interval.

Your point estimate of the population mean paid vacation days is the sample mean of 19 paid vacation days.

With random sampling, a 95% confidence interval of [16, 22] means you can be reasonably confident that the average number of vacation days is between 16 and 22.
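A point estimate and an interval estimate of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the survey responses are invented, `mean_confidence_interval` is a hypothetical helper, and the interval uses a normal approximation. For a sample this small, a t-based interval would be somewhat wider.

```python
import statistics


def mean_confidence_interval(sample, level=0.95):
    """Point estimate and normal-approximation confidence interval for a mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    # z critical value for the chosen confidence level (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Hypothetical survey responses: paid vacation days per employee.
vacation_days = [12, 15, 17, 18, 19, 19, 20, 21, 23, 26]
estimate, (low, high) = mean_confidence_interval(vacation_days)
print(f"point estimate: {estimate}, 95% CI: [{low:.1f}, {high:.1f}]")
```

The interval is centered on the point estimate, and its width reflects both the spread of the responses and the sample size.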

Hypothesis testing

Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.

Hypotheses, or predictions, are tested using statistical tests. Statistical tests also estimate sampling errors so that valid inferences can be made.

Statistical tests can be parametric or non-parametric. Parametric tests are considered more statistically powerful because they are more likely to detect an effect if one exists.

Parametric tests make assumptions that include the following:

  • the population that the sample comes from follows a normal distribution of scores
  • the sample size is large enough to represent the population
  • the variances, a measure of spread, of each group being compared are similar

When your data violates any of these assumptions, non-parametric tests are more suitable. Non-parametric tests are called “distribution-free tests” because they don’t assume anything about the distribution of the population data.

Statistical tests come in three forms: tests of comparison, correlation or regression.

Comparison tests

Comparison tests assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for ordinal data.

Comparison test                    | Parametric? | What’s being compared? | Samples
t-test                             | Yes         | Means                  | 2 samples
ANOVA                              | Yes         | Means                  | 3+ samples
Mood’s median                      | No          | Medians                | 2+ samples
Wilcoxon signed-rank               | No          | Distributions          | 2 samples
Wilcoxon rank-sum (Mann-Whitney U) | No          | Sums of rankings       | 2 samples
Kruskal-Wallis H                   | No          | Mean rankings          | 3+ samples
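As one illustration of a comparison test, the sketch below computes the test statistic for a two-sample t-test in its pooled-variance form, which assumes the two groups have similar variances. The data and the `pooled_t_statistic` helper are hypothetical. Only the statistic and its degrees of freedom are computed; turning them into a p-value requires the t distribution, which Python's standard library does not provide.

```python
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming similar variances; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


# Hypothetical scores for two independent groups.
t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```

A t value far from zero, relative to the degrees of freedom, suggests the difference between the group means is larger than sampling error alone would typically produce.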

Correlation tests

Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for interval and ratio variables when the data doesn’t follow a normal distribution.
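The difference between the two coefficients can be seen in a small example. The sketch below implements both by hand, under the usual definition of Spearman's coefficient as Pearson's r computed on ranks. The function names and data are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data with a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_r(x, y))
```

Because y always increases with x, the rank-based coefficient reaches 1, while Pearson's r stays slightly below 1 because the relationship is not a straight line.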

Of these tests, the chi square test of independence is the only one that can be used with nominal variables.

Correlation test                | Parametric? | Variables
Pearson’s r                     | Yes         | Interval/ratio variables
Spearman’s r                    | No          | Ordinal/interval/ratio variables
Chi square test of independence | No          | Nominal/ordinal variables
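For nominal variables, the chi square statistic compares observed counts with the counts expected if the two variables were independent. The sketch below is a minimal hand-rolled version. The contingency table and the `chi_square_independence` helper are hypothetical, and no p-value is computed.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: rows are two groups, columns are two answer categories.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(chi2, df)
```

A larger statistic means the observed counts depart further from what independence would predict.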

Regression tests

Regression tests assess whether changes in predictor variables are associated with changes in an outcome variable. On their own, they do not demonstrate that the predictors cause those changes. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.

Most of the commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations.

Data transformations use mathematical operations, like taking the square root of each value, to bring your data closer to a normal distribution.
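The effect of a square-root transformation can be sketched by comparing skewness before and after. The data and the `skewness` helper below are hypothetical. Skewness is computed here as the third central moment divided by the variance raised to the power 1.5, with values near zero indicating a more symmetric distribution.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data (one value far above the rest).
raw = [1, 4, 9, 16, 100]
transformed = [math.sqrt(v) for v in raw]

print(skewness(raw), skewness(transformed))
```

For this data the transformed values are less skewed than the raw ones, though a transformation does not guarantee a normal distribution.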

Regression test            | Predictor                     | Outcome
Simple linear regression   | 1 interval/ratio variable     | 1 interval/ratio variable
Multiple linear regression | 2+ interval/ratio variable(s) | 1 interval/ratio variable
Logistic regression        | 1+ any variable(s)            | 1 binary variable
Nominal regression         | 1+ any variable(s)            | 1 nominal variable
Ordinal regression         | 1+ any variable(s)            | 1 ordinal variable
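The simplest case in the table, simple linear regression, can be sketched with ordinary least squares. The `simple_linear_regression` helper and the data are hypothetical. The example estimates only the slope and intercept and does not test them for significance.

```python
def simple_linear_regression(x, y):
    """Least-squares slope and intercept for one predictor and one outcome."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

With real data the points would scatter around the fitted line, and a regression test would ask whether the estimated slope differs from zero by more than sampling error would explain.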

Frequently asked questions about inferential statistics

What type of statistics is used to organize and describe the characteristics of a collection of data?

Descriptive statistics summarize and organize characteristics of a data set.

Which statistical methods are used to describe the characteristics of a sample?

Descriptive statistics describe the characteristics of a sample. Inferential statistics use the characteristics in a sample to infer what the unknown parameters are in a given population. In this way, as shown in Figure 1.2, a sample is selected from a population to learn more about the characteristics in the population of interest.

Which type of statistical methods is used to organize and describe the characteristics of a collection of data?

Descriptive statistics are used to organize and describe the characteristics of a collection of data. Inferential statistics, in contrast, use sample data to draw conclusions about a larger population.

Which type of statistics simplifies and organizes data?

Descriptive statistics are statistical procedures used to summarize, organize, and simplify data.