Which of the following are reasons why it is important for frequency claims to use random sampling?


Non-Probability Sampling

Alison Galloway, in Encyclopedia of Social Measurement, 2005

Convenience Sampling

Definition

Convenience sampling involves using respondents who are “convenient” to the researcher. There is no pattern whatsoever in acquiring these respondents—they may be recruited merely by asking people who are present in the street, in a public building, or in a workplace, for example. The concept is often confused with “random sampling” because of the notion that people are being stopped “at random” (in other words, haphazardly). However, whereas the correct definition of random sampling (using random numbers to pick potential respondents or participants from a sampling frame) generally results in a statistically balanced selection of the population, a convenience sample has an extremely high degree of bias.

Application

Typically, somebody undertaking a convenience sample will simply ask friends, relatives, colleagues in the workplace, or people in the street to take part in their research. One of the best ways of considering the pitfalls of this form of sampling is to look at this last approach—stopping people in the street. On a typical weekday morning in the shopping area of an average town, sampling the people on the street at that time is likely to result in an overrepresentation of the views of, for example, the unemployed and the retired elderly population. There will be a corresponding underrepresentation of those working in traditional “9-to-5” jobs. This can, of course, be counterbalanced to some extent by careful selection of different times and days of the week to ensure a slightly more balanced sample.

Despite the enormous disadvantage of convenience sampling, namely the inability to draw statistically valid generalizations from the findings obtained, it does still have some uses. For example, it can be helpful in obtaining a range of attitudes and opinions and in identifying tentative hypotheses that can be tested more rigorously in further research. Nevertheless, it is perhaps the weakest of all of the non-probability sampling strategies, and it is usually possible to obtain a more effective sample without a dramatic increase in effort by adopting one of the other non-probability methods. The following examples of convenience sampling from published research represent the wide range of applications in the social sciences and in business research:

A convenience sample of 1117 undergraduate students at American universities was used to explore associations between perceptions of unethical consumer behavior and demographic factors. Instructors on two campuses were contacted to obtain permission to administer the surveys during scheduled classes.

Questionnaires were distributed using convenience methods in a study of the motives and behaviors of backpackers in Australia. The 475 surveys were delivered in cafes and hostels in areas popular with backpackers.

Differences in bargaining behavior of 100 American and 100 Chinese respondents were explored using the Fishbein behavioral intention model.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985003820

Answers to Chapter Exercises, Part I

ROBERT H. RIFFENBURGH, in Statistics in Medicine (Second Edition), 2006

CHAPTER 10

10.1

There is no unique answer for this exercise.

10.2

(a) Randomized controlled trial. Drug versus placebo is the independent variable. Nausea scores are the dependent variables. (b) Cohort study. EIB is the independent variable. eNO differences are the dependent variables.

10.3

No. Among the reasons why not are the following: Rather than being a prospective study, the sample is a “convenience sample,” that is, drawn from patients presenting already tattooed and who have decided on removal; there is no control group for comparison; and the investigator is not masked from the decision about “response” to the treatment.

10.4

(a) Yes. Nothing can be done; age and sex were not recorded. (b) Yes. Age and sex can be tested for bias.

10.5

Age: Continuous measurements, different averages for EIB versus non-EIB groups, assume normal, small sample, two groups, standard deviations estimated (i.e., s, not σ). Choose two-sample t test. Sex: Counts, small sample, two groups. Choose χ² test or Fisher's exact test of a contingency table.

10.6

There is no unique answer for this exercise.


URL: https://www.sciencedirect.com/science/article/pii/B9780120887705500502

Data and Statistics

Rudolf J. Freund, ... Donna L. Mohr, in Statistical Methods (Third Edition), 2010

1.9 Data Collection

Usually, our goal is to use the findings in our sample to make statements about the population from which the sample was drawn; that is, we want to make statistical inferences. But to do this, we have to be careful about the way the data were collected. If the process in some way, perhaps quite subtly, favored getting data that indicated a certain result, then we will have introduced a bias into the process. Bias produces a systematic slanting of the results. Unlike sampling error, its size will not diminish even for very large samples. Worse, its nature cannot be guessed from information contained within the sample itself.

To avoid bias, we need to collect data using random sampling, or some more advanced probability sampling technique. All the statistical inferences discussed in this text assume the data came from random sampling, where “blind chance” dominates the selection of the units. A simple random sample is one where each possible sample of the specified size has an equal chance of occurring.

The process of drawing a simple random sample is conceptually simple, but difficult to implement in practice. Essentially, it is like drawing for prizes in a lottery: the population consists of all the lottery tickets and the sample of winners is drawn from a well-shaken drum containing all the tickets. The most straightforward method for drawing a random sample is to create a numbered list, called a sampling frame, of all the sampling units in the population. A random number generator from a computer program, or a table of random numbers, is used to select units from the list.

Example 1.5

Medicare has selected a particular medical provider for audit. The Medicare carrier begins by defining the target population—say all claims from Provider X to Medicare for office visits with dates of service between 1/1/2007 and 12/31/2007. The carrier then combs its electronic records for a list of all claims fitting this description, finding 521. This set of 521 claims, when sorted by beneficiary ID number and date of service, becomes the sampling frame. The sampling units are the individual claims. Units in the list are numbered from 1 to 521. The carrier decides that it has sufficient time and money to carry out an exploratory audit of 30 claims. To select the claims, the carrier uses a computer program to generate 30 integers with values between 1 and 521. Since it would be a waste to audit the same claim twice, these integers will be selected without replacement. The 30 claims in the sampling frame that correspond to these integers are the ones for which the carrier will request medical records and carry out a review.
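As a minimal sketch, the carrier's selection step could be carried out in a few lines of Python (the frame and sample sizes follow the example; the seed is purely illustrative):

```python
import random

N_CLAIMS = 521   # claims in the sampling frame, numbered 1..521
N_AUDIT = 30     # size of the exploratory audit

random.seed(2007)  # illustrative seed; any source of randomness works

# Draw 30 distinct claim numbers between 1 and 521 (without replacement);
# the claims in the frame carrying these numbers are the ones audited.
selected = sorted(random.sample(range(1, N_CLAIMS + 1), N_AUDIT))
print(selected)
```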

This procedure can be used for relatively small finite populations but may be impractical for large finite populations, and is obviously impossible for infinite populations. Nevertheless, some blind, unbiased sampling mechanism is important, particularly for observational studies. Human populations are notoriously difficult to sample. Aside from the difficulty of constructing reasonably complete sampling frames for a target population such as “all American men between the ages of 50 and 59,” people will frequently simply refuse to participate in a survey, poll, or experiment. This nonresponse problem often results in a sample that is drastically different from the target population in ways that cannot be readily assessed.

Convenience samples are another dangerous source of data. These samples consist of whatever data the researcher was most easily able to obtain, usually without any random sampling. Often these samples allow people to self-select into the data set, as in polls in the media where viewers call in or click a choice on-line to give their opinion. These samples are often wildly biased, as the most extreme opinions will be over-represented in the data. You should never attempt to generalize convenience sample results to the population.

True random samples are difficult to obtain. Designed experiments partially circumvent these difficulties by introducing randomization in a different way. The participants are still, in effect, a convenience sample, usually recruited with some effort at obtaining a representative group of individuals. This nonrandom sample is then randomly divided into subgroups, one of which is often a placebo, control, or standard-treatment group. The other subgroups are given alternative treatments. Participants are not allowed to select which treatment they will be given; rather, that is randomly determined. Suppose, for example, that we wanted to know whether adding nuts to a diet low in saturated fat would lead to a greater drop in cholesterol than would the diet alone. We could advertise for volunteers with high total cholesterol levels. We would then randomly divide them into two groups. One group would go on the low saturated-fat diet, the second group would go on the same diet but with the addition of nuts. At the end of three months, we would compare their changes in cholesterol levels. The assumption here is that even though the participants were not recruited randomly, the randomization makes it fair to generalize our results regarding the effect of the addition of the nuts.
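A minimal sketch of that randomization step, assuming a hypothetical list of volunteer IDs:

```python
import random

volunteers = [f"V{i:03d}" for i in range(1, 41)]  # hypothetical volunteer IDs

random.shuffle(volunteers)            # chance, not the participants, decides
half = len(volunteers) // 2
diet_only = volunteers[:half]         # low saturated-fat diet alone
diet_plus_nuts = volunteers[half:]    # same diet plus nuts
```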

For more information on selecting random samples, or for advanced sampling, see a text on sampling (for example, Scheaffer et al. (2006) or Cochran (1977)). Designed experiments are covered in great detail in texts on experimental design (for example, Maxwell and Delaney (2000)). The overriding factor in all types of random sampling is that the actual selection of sample elements not be subject to personal or other bias.

In many cases experimental conditions are such that nonrestricted randomization is impossible; hence the sample is not a random sample. For example, much of the data available for economic research consists of measurements of economic variables over time. For such data the normal sequencing of the data cannot be altered and we cannot really claim to have a random sample of observations. In such situations, however, it is possible to define an appropriate model that contains a random element. Models that incorporate such random elements are introduced in Chapters 6 and 7.


URL: https://www.sciencedirect.com/science/article/pii/B9780123749703000019

Study Designs

Ronald N. Forthofer, ... Mike Hernandez, in Biostatistics (Second Edition), 2007

6.3.1 Sampling Frame

Before performing any sampling, it is important to define clearly the population of interest. Similarly, when we are given a set of data, we need to know what group the sample represents — that is, to know from what population the data were collected. The definition of population is often implicit and assumed to be known, but we should ask what the population was before using the data or accepting the information. When we read an election poll, we should know whether the population was all adults or all registered voters to interpret the results appropriately. In practice, the population is defined by specifying the sampling frame, the list of units from which the sample was selected. Ideally, the sampling frame includes all units of the defined population. But as we shall see, it is often difficult to obtain a complete sampling frame, and we need to rely on a variety of alternative approaches.

The failure to include all units contained in the defined population in the sampling frame leads to selecting a biased sample. A biased sample is not representative of the population. The average of a variable obtained from a biased sample is likely to be consistently different from the corresponding value in the population. Selection bias is the consistent divergence of a sample value (statistic) from the corresponding population value (parameter) due to an improper selection process. Even with a complete sampling frame, selection bias can occur if proper selection rules were not followed. Two basic sources of selection bias are the use of an incomplete sampling frame and the use of improper selection procedures. The following example illustrates the importance of the sampling frame.

Example 6.5

The Report of the Second Task Force on Blood Pressure Control in Children (1987) provides an example of the possibility of selection bias in data. This Task Force used existing data from several studies, only one of which could be considered representative of the U.S. noninstitutionalized population. In this convenience sample, over 70 percent of the data came from Texas, Louisiana, and South Carolina, with little data from the Northeast or the West. Data from England were also used for newborns and children up to three years of age. The representativeness of these data for use in the creation of blood pressure standards for U.S. children is questionable. Unlike the Literary Digest survey in which the errors in the sampling were shown to lead to a wrong conclusion, it is not clear that the blood pressure standards are wrong. All we can point to is the use of convenience sampling, and with it, the likely introduction of selection bias by the Second Task Force.

Example 6.6

Telephone surveys may provide another example of the sampling frame failing to include all the members of the target population. If the target population is all the resident households in a geographical area, a survey conducted using the telephone will miss a portion of the resident households. Even though more than 90 percent of the households in the U.S. have telephones, the percentage varies with race and socioeconomic status. The telephone directory was used frequently in the past as the sampling frame, but it excluded households without telephones as well as households with unlisted numbers. A technique called random digit dialing (RDD) has been developed to deal with the unlisted number problem in an efficient manner (Waksberg 1978). As the name implies, telephone numbers are basically selected at random from the prefixes — the first 3 digits — thought to contain residential numbers, instead of being selected from a telephone directory. But the concern about the possible selection bias due to missing households without telephones and people who do not have a stable place of residence remains.
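A highly simplified sketch of the RDD idea (real designs, such as Waksberg's two-stage method, are more elaborate; the prefixes below are hypothetical):

```python
import random

# Hypothetical prefixes thought to contain residential numbers
prefixes = ["713-555", "713-556", "281-742"]

def rdd_number():
    # Pick a prefix, then append a random 4-digit suffix, so households
    # with unlisted numbers can still be reached, unlike with a directory.
    return f"{random.choice(prefixes)}-{random.randint(0, 9999):04d}"

dialing_sample = [rdd_number() for _ in range(5)]
```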

In order to avoid or minimize selection bias, every sample needs to be selected based on a carefully drawn sample design. The design defines the population the sample is supposed to represent, identifies the sampling frame from which the sample is to be selected, and specifies the procedural rules for selecting units. The sample data are then evaluated based on the sample design and the way the design was actually executed.


URL: https://www.sciencedirect.com/science/article/pii/B978012369492850011X

Data, Data, Everywhere

Choh Man Teng, in Philosophy of Statistics, 2011

6.1 Random Sampling

Given a set of data, a model is constructed and used to predict future or unseen events. For a model to be predictive, we need the future to be just like the past, and we need the data used to construct the model to be a representative sample of the parent population. This notion is usually captured as a random sampling assumption: the data observations are independent and identically distributed (i.i.d.) with respect to the parent population.

Data collection is often opportunistic. Data are collected whenever and wherever they can be collected. They may consist of Psychology 101 students, or birds that were spotted on particular days, or households that have access to the internet (so that they may participate in online polls). These convenience samples are hardly i.i.d. samples. Not taking into account the provenance of the data and instead treating the data as an i.i.d. sample will distort the inferences drawn from the data.

The problem goes deeper than that. Even in the best of circumstances (other than artificial formulations), i.i.d. samples simply cannot be had. Consider the case of Psychology 101 students. The parent population is usually not the restrictive set of students in Psychology 101 classes, or all college students, or even the far more expansive “people here and now”. The inferences drawn from the data on the behavior of Psychology 101 students are in many cases intended to implicate the behavior of the all-inclusive set of human beings, geographically and temporally dispersed.

Even if we can sample members independently and with equal probability from all corners of the world, future members of the human population cannot be made available for i.i.d. sampling at the present time at any cost. Yet predictions are often concerned with future events (although we can also aim to predict retrospective or current events that are not included in the sample data). Future events are then members of the parent population that are inaccessible for sampling, random or otherwise.

Thus the i.i.d. sampling requirement is at best only an idealized requirement. Not only do we not have good reasons for considering a data sample to have been drawn i.i.d. over the entire parent population, but we usually have strong evidence against such a claim.


URL: https://www.sciencedirect.com/science/article/pii/B9780444518620500344

Basic Concepts

William L. Thompson, ... Charles Gowan, in Monitoring Vertebrate Populations, 1998

1.4.1 NONRANDOM SAMPLING

Nonrandom sampling is the subjective choice of sampling units based on prior information, experience, convenience, or related criteria. General categories of nonrandom sampling include, but are not restricted to, purposive, haphazard, and convenience sampling (Cochran, 1977; Levy and Lemeshow, 1991).

A purposive sample contains sampling units chosen because they appear to be typical of the whole sampling frame (Levy and Lemeshow, 1991). For instance, a researcher may subjectively choose plots he or she considers representative habitat for the target species. Or, in assessing potential impacts of a management action, a comparison unit may be chosen based on its apparent similarity to a managed area. Representativeness is not something that can be accurately assessed subjectively. Moreover, animals could easily be focusing on different attributes than those used by the investigator to select a representative sample.

Choosing sampling units based on haphazard contact or unconscious planning is called a haphazard sample (Cochran, 1977). Repeatedly dropping a coin on a map overlay with delineated sampling units, and selecting the unit that the coin rests on after each drop, would be a haphazard sample. Another example could be choosing plots based on where a person, while traveling through a study area, encountered some predetermined number of predefined habitat features (e.g., sampling for terrestrial salamanders at the first 10 fallen logs that are encountered). In sampling terminology, haphazard is not synonymous with random.

A convenience sample contains sampling units chosen because they are easily accessible, such as those on or adjacent to a road or trail (Cochran, 1977). Examples include bird surveys conducted on roads, searches for animal sign on game trails, and fish surveys restricted to stretches of stream near bridges or road crossings. Roads and trails are usually placed where they are for a reason, and therefore adjacent habitats may be quite different from surrounding areas. Roads often follow watercourses through valleys or through level areas in general. Trails also may be placed for ease of travel or simply because of the scenic value of surrounding habitats. In addition, the rate of change in habitat composition and structure along roads and trails may be quite different from that of surrounding areas.

The problem with nonrandom sampling techniques is an inferential one. In a statistical sense, parameter estimates based on counts from nonrandomly chosen plots cannot be expanded to a larger area (i.e., unsampled plots). In other words, a misleading parameter estimate will result from nonrandom sampling because the selected plots are not truly representative of the unchosen plots (this is called selection bias). Consider a bird survey along a riparian area, which contains a variety of shrub and tree cover, surrounded by grassland. Would it be sensible to apply bird counts obtained within this habitat type to the surrounding grassland? The answer is obviously no. The two habitats contain very different species assemblages. Now consider a count of deer feeding at dusk in a roadside meadow surrounded by mature forest. Is it reasonable to calculate a density of deer within the meadow and then expand this to the surrounding forest? Again, the answer is obvious. Although these are extreme examples, the idea is still the same in less obvious cases. Attempting to generalize over a heterogeneous environment without the proper use of inferential statistics can lead to very misleading results.

All nonrandom samples share the common trait that one cannot assign a probability or chance of selection to each plot contained within the area of interest. If, as in a convenience sample, some plots have no chance of selection, then they are not part of the sampling frame. In this case, the sampling frame is only composed of sampled plots; hence, inferences are limited to animals within these plots. Further, to assess the representativeness of an estimate obtained from a nonrandom sample requires comparing it either with the true parameter of interest or with an unbiased estimate, which would require some type of random sample. The true parameter value is unknown, and one may just as well have obtained a random sample in the first place. Therefore, this book will concentrate on random sampling procedures because they, on average, yield unbiased results, as well as allow us to assign a known level of uncertainty to our parameter estimates.

We do not wish to imply that every aspect of a research study is, or should be, based on some type of random sampling. For example, choice of the area and time period of study could be dictated by funding agencies, political or public mandates, accessibility, and so forth. One does not randomly choose a 5-year interval from all possible intervals during some time period. Nonetheless, concepts of proper inference still apply, i.e., we can only make valid inferences to our parameter of interest within the selected study area during the selected time interval.


URL: https://www.sciencedirect.com/science/article/pii/B9780126889604500010

Sampling

R.S. Brown, in International Encyclopedia of Education (Third Edition), 2010

Nonprobability Sampling

There are a number of nonprobability sampling techniques that have been identified in the sampling literature. While these approaches may yield subjects from whom data may be collected, these designs do not benefit from the main advantage of probability approaches, namely, that probability designs allow for the development of statistical theory to examine the properties of sample estimators (Kalton, 1983). Nevertheless, these approaches are used widely in social science research; they include convenience sampling, purposive sampling, and quota sampling. In convenience sampling, a sample is selected based on ready availability – such as students in a given classroom, or passers-by on a street corner or in front of a busy market. Other convenience samples may include respondents to an advertisement in a magazine or callers to a dial-in number for a reality television program. Purposive sampling differs from convenience sampling in that certain characteristics of the sample are sought out a priori; that is, a sample that possesses certain characteristics, often intended to be seen as representative of some larger population, is sought. One example of a purposive sample may involve a researcher selecting communities from across a state to ensure geographic diversity.

Another nonprobability sampling approach is quota sampling. In quota sampling, the researcher attempts to collect data on a specified number of respondents in each of a number of groups of potential respondents. For example, a researcher may seek responses from 20 third-grade teachers, 20 fourth-grade teachers, 20 eighth-grade teachers, and 20 tenth-grade teachers. One advantage of quota sampling is that it can reduce data-collection time and associated costs. It is somewhat similar to stratified sampling in that it seeks to obtain responses from more homogeneous subsets of the total population. However, the major and pivotal distinction is that with stratified random sampling, the sample units within the strata are randomly selected, whereas in quota sampling, the sample units within the quota groups are not. Further, while some may argue that quota sampling reduces nonresponse, the reality is that quota sampling merely replaces nonrespondents with other respondents in the quota group, thereby underrepresenting the responses of hard-to-find or unwilling sample units.

The fundamental problem with nonprobability sampling designs is that they are potentially biased in their sample estimators, and the magnitude of this bias is unknown. What is known is that the concern with regard to bias in sample estimators increases with sample size, since probability-sampling designs become more precise with larger samples. Thus, whereas bias in sample estimators may be of less concern when deploying nonprobability sampling approaches for small-scale studies, more care should be taken to avoid nonprobability designs in larger research efforts. These sampling approaches may be utilized widely, but as Pedhazur and Schmelkin state, “… the incontrovertible fact is that, in nonprobability sampling, it is not possible to estimate sampling errors. Therefore, validity of inferences to a population cannot be ascertained” (p. 321).


URL: https://www.sciencedirect.com/science/article/pii/B9780080448947002943

Sample Design

W. Penn Handwerker, in Encyclopedia of Social Measurement, 2005

Selection Criteria Provide the Ingredients for Sample Designs

Cases can be selected on the basis of one or more of six criteria:

1. Availability

2. Fulfilling a size quota

3. Random (or known probability) selection

4. Case characteristics

5. Presence in specific enumeration units

6. Presence along transects or at specific map coordinates

All samples that utilize random (or known probability) selection are called probability samples. If one does not employ random selection, one produces one of four different forms of nonprobability samples.

Nonprobability Samples

If you select a predetermined number or proportion of cases with specific case characteristics, or from specific enumeration units, transects, or sets of map coordinates, you produce a quota sample. If you select cases on the basis of case characteristics to acquire specific forms of information, you produce a purposive (judgment) sample. If you select cases simply because they will participate in your study, you produce an availability (convenience) sample. If cases become available because one case puts you in contact with another, or other cases, you produce a snowball sample.

Probability Samples

Probability samples are distinguished from nonprobability samples because the former exhibit known sampling distributions that warrant parameter estimation with classical statistical tests (e.g., chi-squared, t test, and F ratio). By convention, we identify parameters with Greek letters, such as β (beta), α (alpha), ɛ (epsilon), ρ (rho), and σ (sigma). Samples, in contrast, yield statistics. By convention, we identify statistics with Latin letters and words (e.g., b, median, percentage, and mean). Each statistic constitutes a point estimate of a parameter, which is one's single best guess about the value of the parameter.

Statistics constitute point estimates of parameters because samples of populations cannot perfectly replicate the properties of the populations from which they derive. Every sample yields different findings, and every statistic contains three sources of error (construct, measurement, and sampling). Construct error derives from trying to measure a construct that imperfectly fits the culture or cultures found in the population studied. Measurement error derives from imperfections in the means by which a value is assigned to an observation from a set of possible outcomes. To the extent to which significant construct and measurement errors can be ruled out, the difference between a specific statistic and the population parameter constitutes sampling error in that specific sample. Measurements of the same variable made on a large number of samples of the same size drawn from the same population exhibit a characteristic sampling distribution of errors around the parameter. Some statistics underestimate the parameter, whereas others overestimate the parameter.

Sampling errors may reflect chance or bias. Sampling errors that derive from chance exhibit characteristic distributions. Many such sampling distributions (the family of t distributions and the normal distribution) are symmetrical and are summarized by a mean of 0 and a standard deviation of 1. The average amount of error in a sampling distribution is called the standard error rather than standard deviation to distinguish sampling distributions from the frequency distributions of the variables studied in social science research.

Although some statistics underestimate the parameter and others overestimate it, when cases are selected independently and have the same probability of inclusion in any one sample, sampling errors come solely from chance. When this condition applies, the sampling distribution of all possible statistics reveals that most statistics come very close to the parameter, and the average amount of sampling error is 0. With statistics that exhibit a normal sampling distribution, for example, 68% of all sample statistics fall within ±1.00 standard errors of the parameter, and 95% of all sample statistics fall within ±1.96 standard errors of the parameter.

Small samples contain large amounts of sampling error because randomly selected extreme values exert great effects. Large samples contain small amounts of sampling error and thus estimate parameters very precisely. Sample precision is measured by the size of confidence intervals. Accurate samples yield confidence intervals that contain the parameter a given proportion (usually 95%) of the time. Statistical test findings apply to samples of all sizes because they incorporate into their results the degree of sampling error contained in samples of different sizes. Confidence intervals for small samples are wider than confidence intervals for large samples, but statistics from both large and small samples estimate parameters equally accurately.
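These coverage and precision claims are easy to check by simulation; here is a small sketch (the population parameters and sample sizes are arbitrary illustrations):

```python
import random
import statistics

random.seed(1)
MU, SIGMA = 100.0, 15.0  # illustrative population parameters

def coverage(n, trials=2000):
    """Fraction of nominal 95% intervals (mean +/- 1.96 SE) containing MU."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # estimated standard error
        hits += (mean - 1.96 * se) <= MU <= (mean + 1.96 * se)
    return hits / trials

# Coverage stays near 95% for small and large n alike; only the interval
# width (precision) shrinks as n grows.
for n in (10, 40, 160):
    print(n, round(coverage(n), 3))
```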

This generalization holds only for statistics from samples that are reasonably unbiased. Unbiased samples are those in which all members of the population have an equal chance of selection. The only way to reliably obtain a reasonably unbiased sample is to employ the random selection criterion.

Simple Random Samples

Simple random samples (SRSs) constitute the reference standard against which all other samples are judged. The procedure for selecting a random sample requires two steps. First, make a list of all members of the population. Second, randomly select a specific number of cases from the total list. Random selection may rely on tables of pseudo-random numbers or the algorithms that generate uniform pseudo-random number distributions in statistical analysis software such as SYSTAT. One may sample with or without replacing cases selected for the sample back into the population. Sampling without replacement produces unequal probabilities of case selection, but these are inconsequential except with very small populations. More important, even SRSs overestimate the true standard error by the factor N/(N − n). Application of the finite population multiplier, (N − n)/N, will produce correct standard errors. The importance of this correction increases as the ratio of sample size (n) to population size (N) increases.
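A short sketch of that correction, with the finite population multiplier applied to the variance of the sample mean, as is conventional (values are illustrative):

```python
import statistics

def srs_standard_error(sample, N):
    """Standard error of the mean for an SRS of size n drawn without
    replacement from a finite population of size N; the finite population
    multiplier (N - n)/N is applied to the estimated variance of the mean."""
    n = len(sample)
    var_mean = statistics.variance(sample) / n
    return (var_mean * (N - n) / N) ** 0.5

# Example: with N = 500 and n = 250, the multiplier halves the variance of
# the mean, shrinking the uncorrected standard error by a factor of ~0.71.
```

The correction is negligible when n is a tiny fraction of N, which is why it is routinely ignored for samples from very large populations.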

Random Systematic Samples

Random systematic samples (RSSs) constitute a variation on SRSs in which random selection of a starting point is substituted for random selection of all cases. For example, to select an RSS of 20% of a population, randomly select a number between 1 and 5, make your first case the one with the randomly selected number, and select every fifth case thereafter. To select an RSS of 5% of a population, randomly select a number between 1 and 20, make your first case the one with the randomly selected number, and select every 20th case thereafter.

Periodicity in a list of population members introduces significant bias into RSSs. In the absence of periodicity, and with a known population size, to determine a sampling interval (k), divide the size of the population (N) by a desired sample size (n). RSSs produce unbiased samples when k is an integer. The bias introduced when k is not an integer is inconsequential with large populations. However, if you know the size of the population, the following procedure always yields unbiased estimates:

1. Randomly select a number (j) between 1 and N.

2. Express the ratio j/k as an integer and a remainder (m).

3. When m equals 0, select the case numbered k as your first sample element; when m does not equal 0, select the case numbered m as your first sample element.
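A minimal sketch of this procedure for the common case in which the sampling interval k is an integer:

```python
import random

def random_systematic_sample(N, k):
    """Random systematic sample of cases numbered 1..N with integer
    interval k, using the unbiased starting-point rule above."""
    j = random.randint(1, N)    # step 1: random number between 1 and N
    m = j % k                   # step 2: remainder of j divided by k
    start = k if m == 0 else m  # step 3: first sample element
    return list(range(start, N + 1, k))

# e.g., an approximately 20% sample of 103 cases:
# random_systematic_sample(103, 5)
```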

Stratified, Cluster, Transect, and Case-Control Samples

All other probability samples incorporate SRSs or RSSs into the selection process. Stratified samples, for example, consist of a series of simple random or random systematic samples of population sectors identified by case characteristics (e.g., age, class, gender, and ethnicity) or combinations of characteristics (e.g., old and young women, and old and young men). Disproportionally stratified samples employ a quota criterion to oversample population sectors that might otherwise be insufficiently represented in the final sample. Cluster samples consist of samples in which cases are selected from SRSs or RSSs of enumeration units that contain sets of cases, such as households, hospitals, city blocks, buildings, files, file drawers, or census enumeration districts. Probability proportional to size samples are cluster samples in which the number of cases selected from specific enumeration units matches a quota proportional to the size of the unit relative to the entire population. Transect samples consist of samples in which cases or enumeration units are selected from SRSs or RSSs of units that lie along randomly drawn transects or randomly selected map coordinates. Case-control samples consist of a set of purposefully (judgmentally) identified cases, a small set of which may be selected randomly, plus a set of randomly selected controls. This sampling procedure originated in epidemiology, in which cases are characterized by specific health conditions not experienced by controls. However, the procedure is readily generalizable by defining cases and controls by reference to a binary variable that distinguishes cases with a specific experience from controls without that experience.
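As a sketch, a stratified design of the kind described can be expressed as an SRS within each stratum (the strata and quotas here are hypothetical):

```python
import random

def stratified_sample(frame_by_stratum, quota):
    """Draw a simple random sample of size `quota` within each stratum.
    `frame_by_stratum` maps a stratum label (e.g., 'young women') to the
    list of population members in that sector."""
    return {
        stratum: random.sample(members, min(quota, len(members)))
        for stratum, members in frame_by_stratum.items()
    }

# A disproportionally stratified design would simply assign a larger
# quota to a sector that might otherwise be insufficiently represented.
```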


URL: https://www.sciencedirect.com/science/article/pii/B0123693985000761

Supply chain forecasting: Theory, practice, their gap and the future

Aris A. Syntetos, ... Konstantinos Nikolopoulos, in European Journal of Operational Research, 2016

Appendix B survey results and discussion

We have run the questionnaire through a convenience sample of 43 academics; this was intentional, as we had to obtain expertise in specific fields of interest, predominantly that of Supply Chain Forecasting (SCF), and as such sampling had to be well targeted to probe such expertise. (At this point we should mention that we hoped to include some practitioners amongst our respondents, but unfortunately none were forthcoming.) Sampling has been executed in two phases: initially at the EURO-INFORMS Joint International Meeting in Rome, 01–04 July 2013, via face-to-face invitations, and since then with an online version of the exact same questionnaire disseminated via the online survey platform SurveyMonkey. The latter phase involved three main waves of data collection: two sets of direct invitations via e-mail, and, more recently in October 2014, an open call via the Oracle – the monthly newsletter of the International Institute of Forecasters (www.forecasters.org).

In the face-to-face invitations, the response rate was close to 100%, while in the final phase and the open call it cannot be properly estimated: when invitations were provided via direct e-mails, an average of about 40% of experts responded to our call to share their views and expertise on the topic. However, the great majority of them only partially completed the questionnaire. Nevertheless, as far as the first question is concerned, which is the most important one for the purposes of this review paper, 98% of the participants in this survey provided their views. The response rates are provided in Table B1.

Table B1. Response rate per question (out of the 43 questionnaires).

Question   Response rate (%)
Q1.1       98
Q1.2       91
Q1.3       81
Q2.1       65
Q2.2       53
Q2.3       51
Q3.1       44
Q3.2       37
Q3.3       28
Q4.1       51
Q4.2       42
Q4.5       33
Q5.1       47
Q5.2       33
Q5.3       23
Q6         49
Q7         49

Given that all the responses are in a narrative format, coding was necessary in order to present the main findings of this survey and expose the extent to which these reflect our own team's understanding and perceptions of the most important topics and trends in the field of SCF.

Hereafter, we will focus our analysis on the first question, “Name the 3 most important research areas in Supply Chain Forecasting (SCF) during the last 20 years (with #1 being the most important, everywhere hereafter)”, which received 42, 39 and 35 responses for the first, second and third most important area, respectively. In total, 116 qualitative answers were received, which were coded as indicated in Table B2.

Table B2. Response CODEings.

CODE (alphabetical)   TERMS included in coding
ANALYTICS             Analytics, Big Data
BULLWHIP              Bullwhip
COLLABORATION         Collaboration, Information Sharing
HIERARCHIES           Hierarchical Forecasting, Aggregation
IT/IS                 ICT, IS, Systems, Knowledge Management, DBMS
JUDGMENT              Judgment, Integration of Judgment, Changes in Statistical Forecasts, Group Forecasting
METHODS               Methods, Forecasting Methods, Demand Forecasting, Intermittent Demand, GLM, Exponential smoothing, ARIMA, Causal Models, Nowcasting
OTHER                 Value Added, Spot Markets, Sales, Marketing, Pricing, Reliability of Information, Process improvement
PLANNING              MRP, Planning, Newspaper Model, Stock control, ECR, Lean, Inventory, Monitoring, Control, Spare Parts, Management, Improvement, Lifecycle Management
SELECTION             Selection of Methods, Automatic Prediction, Evaluation, Accuracy
SUPPLY CHAIN          Length of Supply Chain, Global Supply chains
SUSTAINABILITY        Sustainability, Green Supply Chains
UNCERTAINTY           Error, Uncertainty, Special Events, Interruptions, Demand Versus Sales

In Table B3 we summarize the responses per coding and per preference (1st, 2nd or 3rd) from the experts in our sample.

Table B3. Responses rate per CODE (Grey-shaded: the two most popular - including ties).

CODE             As most important (1st)   Second (2nd)   Third (3rd)
ANALYTICS        1                         1              1
BULLWHIP         2                         3              1
COLLABORATION    8                         8              4
HIERARCHIES      -                         1              1
IT/IS            3                         1              1
JUDGMENT         4                         3              4
METHODS          6                         7              7
OTHER            1                         1              5
PLANNING         6                         8              3
SELECTION        3                         1              2
SUPPLY CHAIN     2                         2              -
SUSTAINABILITY   1                         -              2
UNCERTAINTY      5                         3              4
Total            (42)                      (39)           (35)

Furthermore, our CODEs have been classified both in terms of their relevance to single-echelon or across-supply-chain topics and according to the extent to which they are ‘unambiguously’ relevant to supply chain forecasting or not; in both cases, collective academic judgment (across the members of our team) needed to be exercised. This resulted in Table B4.

Table B4. Categorisation of CODEings.

CODE (alphabetical)   Single-Echelon topic or across-Supply-Chain   Relevance of topic to SCF
ANALYTICS             Supply Chain                                  High
BULLWHIP              Supply Chain                                  High
COLLABORATION         Supply Chain                                  High
HIERARCHIES           Both                                          High
IT/IS                 Both                                          Medium
JUDGMENT              Both                                          High
METHODS               Single-Echelon                                High
OTHER                 Both                                          Low
PLANNING              Both                                          Medium
SELECTION             Single-Echelon                                High
SUPPLY CHAIN          Supply Chain                                  Low
SUSTAINABILITY        Supply Chain                                  Low
UNCERTAINTY           Both                                          High

We then decided to focus on the codes (and related keywords) that relate either to across-supply-chain topics or to both single-echelon and across-supply-chain topics, as well as being highly relevant to FORECASTING, resulting in Table B5.

Table B5. Responses focusing on SCF per CODE.

CODE            As most important (1st)   Second (2nd)   Third (3rd)   Total ranking
ANALYTICS       1                         1              1              6
BULLWHIP        2                         3              1             13
COLLABORATION   8                         8              4             44
HIERARCHIES     -                         1              1              3
JUDGMENT        4                         3              4             22
UNCERTAINTY     5                         3              4             25
Revised Total   (20)                      (19)           (15)
(Total ranking: 3 points for 'Most Important', 2 points for 'Second' and 1 point for 'Third'.)
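The 'Total ranking' column follows the weighting stated in the table note; a quick sketch that reproduces it:

```python
# Frequencies (1st, 2nd, 3rd) per CODE, taken from Table B5
votes = {
    "ANALYTICS": (1, 1, 1), "BULLWHIP": (2, 3, 1), "COLLABORATION": (8, 8, 4),
    "HIERARCHIES": (0, 1, 1), "JUDGMENT": (4, 3, 4), "UNCERTAINTY": (5, 3, 4),
}

# 3 points for 'Most Important', 2 for 'Second', 1 for 'Third'
totals = {code: 3 * a + 2 * b + c for code, (a, b, c) in votes.items()}
# e.g., totals["COLLABORATION"] == 3*8 + 2*8 + 4 == 44
```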

Furthermore, we wanted to see how our codes and the respective keywords relate to our methodological three-dimensional paradigm of length, depth and time. Given our definitions in Section 1, we believe that 'BULLWHIP' and 'COLLABORATION' clearly fall within the length dimension and 'HIERARCHIES' within depth, while 'JUDGMENT' has strong connections to time. More discussion could be had about where 'ANALYTICS' fits in, but our consensus was that descriptive, predictive and prescriptive analytics primarily attempt to identify optimal horizons, frequencies, points in time, historical data to be used, etc., and as such this mostly relates to the dimension of time rather than anything else. This classification is also consistent with the classification of 'JUDGMENT', as analytics and judgment are supposed to complement each other in decision making. Finally, the most difficult one to categorize was 'UNCERTAINTY', as this includes both keywords like 'interruptions' and 'demand vs. sales' that link more to the dimension of length, and keywords like 'errors' and 'special events' that relate more to the dimension of time. Therefore, we decided to split this category equally between the two methodological dimensions of length and time. All this analysis resulted in our final Table B6 and the associated Fig. B1.

Table B6. Linking the experts' survey CODEs with the methodological dimensions of length, depth and time

Dimension   CODE            CODE score   Dimension score
length      BULLWHIP        13           69.5
            COLLABORATION   44
            UNCERTAINTY     12.5
time        UNCERTAINTY     12.5         40.5
            ANALYTICS        6
            JUDGMENT        22
depth       HIERARCHIES      3            3


Fig. B1. Graphical representation of Experts' perceived importance of SCF topics.

This graph has informed our review, combined with what we consider to be the conceptual underpinnings of the major developments in recent years, and is aligned with our three-dimensional methodological framework of length, depth and time. It clearly illustrates that the experts see the themes of Collaboration, Uncertainty and Judgment, and correspondingly the dimensions of length and time, as the most important ones; as such, these should receive more attention in the years to come from researchers in the field.

Table C1. General results on the propagation of ARIMA demand processes.

Downstream demand process   Forecasting method   Upstream demand process   Relationship                                         Reference
ARIMA(p, d, q_dns)          MMSE                 ARIMA(p, d, q_ups)        q_ups = max{p + d, q_dns − L}                        Gilbert (2005)
                            SMA(n)               ARIMA(p, d, q_ups)        q_ups = q_dns + n                                    Ali and Boylan (2012)
                            SES                  ARIMA(p, d, q_ups)        q_ups ≈ t − 1, where t is the current time period

Table C2. Results on the propagation of an AR(1) demand process.

Downstream demand process   Forecasting method   Upstream demand process   Reference
AR(1)                       MMSE                 ARMA(1,1)                 Alwan et al. (2003)^a
                            SMA(n)               ARMA(1,n)
                            SES                  ARMA(1,?)

^a Alwan et al. (2003) assume that an infinite demand history is considered to derive these results.


URL: https://www.sciencedirect.com/science/article/pii/S0377221715010231

Racial identification and developmental outcomes among Black–White multiracial youth: A review from a life course perspective

Annamaria Csizmadia, ... Teresa M. Cooney, in Advances in Life Course Research, 2012

5 Conclusions

Using the life course principles of historical time, linked lives, and human agency, we reviewed findings relating to Black–White multiracial youth's racial identification and psychosocial adjustment, and the potential roles that multiple levels of developmental ecology play in these links. We discussed how the LCP can be used to inform conceptualizations of multiracial youth's racial identification. We also demonstrated the importance of considering the intricate interplay between historical time and social connections in investigating relations between racial identification and psychosocial adjustment among multiracial youth.

Our review revealed that due to weakening structural constraints on (multi)racial identification, multiracial youth (a) have several racial identity options (e.g., biracial, monoracial Black, White, situational, and non-racial identities) available to them, and (b) avail themselves of these options depending on the social constraints that they experience at multiple contextual levels (region, neighborhood, and family). Furthermore, we found that within any given social context, perceived physical appearance further enables or constrains multiracial youth's racial identification. Our review also suggested that racial identity choices have psychosocial consequences and that there is not an ideal racial identity type for multiracial youth. Due to limitations in methodology and scope of these studies, the mechanisms that underlie relations between multiracial youth's racial identification and psychosocial adjustment are not yet well understood. To advance the growing body of research on multiracial youth's development, we now expound on limitations of the reviewed literature and make specific recommendations for future inquiry.

Studies in this review differed in a number of sampling and definitional considerations, including the geographical locations from which the samples were drawn. Some of the studies drew on data from nationally representative samples (e.g., Brown, Hitlin, & Elder, 2006; Campbell, 2007; Harris & Sim, 2002; Qian, 2004; Roth, 2005), whereas others—particularly those examining the psychosocial implications of racial identification—utilized convenience samples (e.g., Coleman & Carter, 2007; Khanna, 2010; Lusk et al., 2010; Sanchez, 2010). We cannot generalize findings of the latter studies to the multiracial population at large, which necessitates more research that utilizes probability samples.

Furthermore, some studies focused on multiracial youth in specific geographic regions (Binning et al., 2009; Brunsma, 2006; Herman, 2004; Khanna, 2010; Phillips, 2004; Rockquemore & Brunsma, 2002; Townsend et al., 2009), whereas others derived their non-representative samples from several states via the Internet (e.g., Lusk et al., 2010; Sanchez, 2010; Sanchez et al., 2009). Multiracial people are unevenly distributed across the United States, with the most residing in the West (40%) followed by the South, Northeast, and Midwest (Jones & Smith, 2001). As mentioned, according to the LCP, historical events and conditions may have different consequences and meaning to the lived experiences of multiracial youth in different regions (Elder et al., 2003). Drawing on this notion, we assert that geographic variation exists in the effects of historical time on multiracial youth's racial identification. Moreover, historical time exerts its influence through social relations. This suggests that grounded in the LCP, future research should examine the effects of multiple levels of developmental contexts such as geographic region, neighborhood, and school on relations between multiracial youth's racial identification and psychosocial adjustment. Analytic strategies such as multilevel modeling enable researchers to examine inter-individual differences in links between multiracial youth's racial identification and psychosocial adjustment at multiple levels of context (e.g., youth nested within neighborhoods that are nested within specific geographic regions). Guided by the LCP, the effect of neighborhood racial composition on links between racial identification and adjustment can be investigated within and across geographic regions. Such analyses can illuminate the dynamic relationship between structural and contextual constraints and human agency that contribute to developmental heterogeneity among multiracial youth.

In addition to providing insight into multiracial youth's racial self-categorization, reviewed studies shed some light on multiracial youth's internal racial self-understanding. In contrast to studies based on nationally representative samples, which measured only external identification via racial self-categorization, studies of convenience samples assessed racial self-understanding via items drawn from the Survey of Biracial Experience. Scholars involved in designing large-scale studies of nationally representative samples should consider including measures that assess both external racial identification and internal racial identity. Such a multi-dimensional conceptualization and assessment of racial identity will be grounded in the life course idea of linked lives and will potentially yield rich data on variation in Black–White multiracial youth's racial identification and identity as a function of social context.

In addition, how race is measured has implications for sampling, data analyses, and interpretation of study findings. Using Add Health data, Brown, Hitlin, and Elder (2007) demonstrated, in their analyses of respondents who chose the “other” category, that how race/ethnicity is operationalized affects self-identification and others’ identification of them, and has implications for the distribution of resources. Researchers must carefully consider the implications of the race categories that they include in their race measures (e.g., Campbell & Eggerling-Boeck, 2006). In multiracial research, race measures guide sampling and inform hypothesis testing. Self- and other-reported (e.g., parent) race determines who will be included in and excluded from study samples. This also is an important concern in research that investigates racial group differences. For example, if marking more than one race or choosing “multiracial” is not an option, a Black–White multiracial youth may identify as “Black,” “White,” “Other,” or refuse to mark any race category. This person may choose a different option depending on whether he/she is asked in school or at home. Our review also suggests that it is possible that the same multiracial person self-identifies as multiracial (if given the choice), but is identified as Black by teachers or parents. Depending on the categories included in a race question, the data collection site, and the race reporter, the same individual may or may not be included in the same subgroup, or in the study sample altogether. In sum, race measurement can introduce sampling bias that may compromise generalizability of results in studies of exclusively multiracial and racially diverse samples. Researchers must attend to the operationalizations utilized in previous work and incorporate innovations that are epistemologically grounded and psychometrically sound.

Finally, our review revealed the scarcity of longitudinal studies on Black–White multiracial youth's racial identification and the virtual absence of research on the developmental consequences of identification over time. The few longitudinal studies utilized secondary data that provided limited measures of racial identification (e.g., Brown et al., 2006; Doyle & Kao, 2007). These investigations tell us about external identification over time, but not about stability and change in multiracial youth's internal identity. Framed by the LCP, researchers should examine predictors of patterns of stability and change in external identification and internal self-understanding among multiracial youth. Longitudinal work is also needed to understand how changes in social context and/or structural constraints specific to a historical time affect racial identity development. For example, it would be interesting to assess the extent to which the election of a multiracial president is associated with multiracial youth's racial identity.

Lastly, longitudinal research is needed to begin understanding the antecedents and consequences of multiracial youth's racial identification. Because research on multiracials has focused almost exclusively on cross-sectional samples of adolescents and young adults (see Brunsma, 2005, for an exception), little is known about how multiracial children develop an understanding of race in general and their racial identity in particular. Children begin to develop racial awareness as early as the late preschool years. During the elementary school years, multiracial children are often faced with the “What are you?” question. As a result, they are likely to sense their “racial otherness” early on (Kerwin & Ponterotto, 1995).

Drawing on the LC concept of linked lives, scholars need to attend to family processes that facilitate or hamper multiracial children's development of racial identity over time. Research on racial/ethnic socialization, though largely limited to monoracial families of color, has revealed its developmental consequences for racial identity and various domains of psychosocial and academic adjustment (e.g., Neblett et al., 2008). Researchers should begin to build a body of research that investigates racial socialization processes and their implications for racial identity and psychosocial adjustment for multiracial youth. The LCP can be used to guide this important line of inquiry. First, we suggest that scholars embark on creative qualitative and ethnographic studies to explore the basic contours of how the life course plays out for multiracial youth—in context. Specifically, qualitative investigations will facilitate development of a typology of racial socialization strategies that parents of multiracial youth use. Quantitative studies then can examine how diverse racial socialization strategies affect multiracial children's racial identification and subsequent psychosocial adjustment, and how this varies by context and over time. Next scholars should investigate the moderating/mediating influences of individual characteristics, such as appearance, on links between racial identification, racial socialization, and psychosocial adjustment over the life course. Research that investigates multiracial youth's psychosocial development embedded within the life course can inform interventions that support the adjustment of this growing subgroup of the U.S. population.


URL: https://www.sciencedirect.com/science/article/pii/S1040260811000542