What is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data?


1 STATISTICS Branch of mathematics that deals with the systematic method of collecting, classifying, presenting, analyzing and interpreting quantitative data

2 DIVISIONS OF STATISTICS
- DESCRIPTIVE: summarizes and describes the group characteristics of data.
- INFERENTIAL: draws conclusions or judgments about a population based on a representative sample.

3 POPULATION: the totality of the observations with which one is concerned.
SAMPLE: a subset of objects or observations taken from a population.

4 Variable: a characteristic or piece of information of interest that is observable or measurable for every individual or object under consideration.

5 Types of Variables
- Qualitative or Categorical Variable
- Quantitative or Numerical Variable
Types of Quantitative Variables:
- Discrete Quantitative
- Continuous Quantitative

6 Levels of Measurement of Variables
- Nominal Level (Classificatory Scale): the lowest level of measurement; it simply labels, names, or categorizes without any implicit or explicit ordering of the levels.
- Ordinal Level (Ranking Scale): labels or classes with an implied ordering.
- Interval Level: the unit of measurement is arbitrary and there is no "true zero" point; differences between values are meaningful (e.g., temperature in °C).
- Ratio Level: has all the properties of the interval level and, in addition, a "true zero" point.

7 STEPS IN STATISTICAL INQUIRY
1. Collection of Data
2. Processing of Data
3. Presentation of Data
4. Analysis of Data
5. Interpretation of Data

8 TYPES OF DATA
- INTERNAL DATA: the company's own data
- EXTERNAL DATA: data from outside sources

9 METHODS OF DATA COLLECTION
- Interview or direct method
- Questionnaire or indirect method
- Registration method (e.g., NSO)
- Observation
- Experimentation

10 Sampling Techniques
Choosing the right and appropriate sampling method is one of the most important parts of research work, and it needs preparation and planning.
Random Sampling: a recommended process for preventing a biased or erroneous inference. Under the concept of randomness, each member of the population has an equal chance of being included in the sample.
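A minimal Python sketch of simple random sampling, assuming a hypothetical population of 100 student ID numbers. It uses the standard library's `random.Random(seed).sample`, which draws without replacement so that every member has the same chance of being chosen; the seed only makes a draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has the same chance of selection."""
    rng = random.Random(seed)   # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 to 100.
student_ids = list(range(1, 101))
sample = simple_random_sample(student_ids, 10, seed=7)
print(sample)
```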

11 Stratified Random Sampling: the population is divided into categories (strata or subgroups), and members are selected at random from each stratum in proportion to its size.
Systematic Random Sampling: every kth element of the population is selected (k = population size ÷ desired sample size), starting from a randomly chosen element among the first k, until the desired sample size is reached.
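The two techniques above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the strata and their sizes are hypothetical, the stratified version uses proportional allocation with `round` (so allocations may not always sum exactly to n), and the systematic version uses k = N // n with a random start among the first k elements.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sampling: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # proportional allocation (rounded)
        sample[name] = rng.sample(members, size)  # random selection within the stratum
    return sample

def systematic_sample(population, n, seed=None):
    """Take every k-th element after a random start, with k = N // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)      # random start among the first k elements
    return population[start::k][:n]

# Hypothetical strata: 50 freshmen, 30 sophomores, 20 juniors (N = 100).
strata = {
    "freshmen": [f"F{i}" for i in range(50)],
    "sophomores": [f"S{i}" for i in range(30)],
    "juniors": [f"J{i}" for i in range(20)],
}
by_stratum = stratified_sample(strata, 10, seed=1)
print({name: len(chosen) for name, chosen in by_stratum.items()})  # sizes 5, 3, 2

every_kth = systematic_sample(list(range(1, 101)), 10, seed=1)
print(every_kth)
```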

12 Cluster Sampling: an advantageous procedure when the population is spread over a wide geographical area. A CLUSTER is an intact group that shares common characteristics.
Multistage Sampling: a more complex sampling technique that includes the following steps:
a) Divide the population into strata.
b) Divide each stratum into clusters.
c) Draw a sample from each cluster using simple random sampling.
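A hedged Python sketch of cluster sampling and of the multistage steps a) to c). The regions, districts, and households are hypothetical, and the function names `cluster_sample` and `multistage_sample` are illustrative only.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Cluster sampling: choose whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # sample cluster names
    return {name: clusters[name] for name in chosen}

def multistage_sample(strata, per_cluster, seed=None):
    """Multistage sampling: strata -> clusters -> simple random sample within each cluster."""
    rng = random.Random(seed)
    sample = []
    for clusters in strata.values():                        # a) population divided into strata
        for members in clusters.values():                   # b) each stratum divided into clusters
            sample.extend(rng.sample(members, per_cluster)) # c) SRS from each cluster
    return sample

# Hypothetical population: two regions (strata), each split into districts (clusters) of households.
regions = {
    "North": {"D1": ["h1", "h2", "h3", "h4"], "D2": ["h5", "h6", "h7", "h8"]},
    "South": {"D3": ["h9", "h10", "h11", "h12"], "D4": ["h13", "h14", "h15", "h16"]},
}
all_districts = {**regions["North"], **regions["South"]}
whole_clusters = cluster_sample(all_districts, 2, seed=3)
households = multistage_sample(regions, per_cluster=2, seed=3)
print(whole_clusters)
print(households)
```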

13 PROCESSING OF DATA
- EDITING: detecting errors
- CODING: assigning numerals and other symbols so that the data can be grouped
- CLASSIFYING: sorting and grouping

14 PRESENTATION OF DATA
- Textual
- Tabular
- Graphic: bar graph, line graph, pie chart, scatter plot, pictograph

15 OTHER TERMS
- VARIABLE: a quantity that changes
- DISCRETE VARIABLE: takes only separate, countable values, with no values in between
- CONTINUOUS VARIABLE: can take any value in between two given values
- CONSTANT: a quantity that does not change

16 Frequency Distribution Table: the organization of raw data in table form.
Consider the midyear scores of 45 students in Statistics:
29 27 28 27 34 29 27 27 28
25 23 35 25 29 33 23 27 33
27 22 40 27 21 29 22 25 29
25 21 20 21 23 25 30 20 28
30 29 28 30 27 27 27 19 30
How can these data be organized?

17 Steps in Constructing a Frequency Distribution Table
1. Find the range r, the difference between the highest score and the lowest score.
2. Decide on the number of classes. A class is a grouping or category; the ideal number of classes is between 5 and 15.
3. Determine the class interval i. The class interval, or simply interval, is the size of each class.

18
4. Determine the classes, starting with the lowest class.
5. Determine the class frequency (f) of each class by counting the tally. The tally column is optional.

19 The following numerical values are relevant in dealing with frequency distributions:
1. Class mark: the middle value of a class, (lower limit + upper limit)/2.
2. Class boundaries: often described as the true limits of a class.

20 For classes with whole-number limits, the lower boundary of a class is 0.5 less than its lower limit, and the upper boundary is 0.5 more than its upper limit.

21 3. Cumulative frequency: found by adding the frequencies, starting from the lowest class.
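The construction steps and the derived columns (class boundaries, class marks, cumulative frequency) can be sketched in Python on the 45 midyear scores from slide 16, using 5 classes. Two assumptions are labeled: the class interval is taken as i = ceil((r + 1) / k), one common rule of thumb that guarantees k classes cover every whole-number score, and the first class starts at the lowest score. Other textbooks round r / k instead.

```python
import math

def frequency_distribution(data, num_classes):
    """Grouped frequency table for whole-number data, lowest class first.

    Each row: (lower limit, upper limit, lower boundary, upper boundary,
               class mark, frequency, cumulative frequency)
    """
    low, high = min(data), max(data)
    r = high - low                                   # step 1: range
    i = math.ceil((r + 1) / num_classes)             # step 3: class interval (assumed rule)
    rows, cumulative = [], 0
    for k in range(num_classes):                     # step 4: classes from the lowest up
        lower = low + k * i
        upper = lower + i - 1
        f = sum(lower <= x <= upper for x in data)   # step 5: class frequency (tally count)
        cumulative += f                              # running total from the lowest class
        rows.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2, f, cumulative))
    return rows

scores = [29, 27, 28, 27, 34, 29, 27, 27, 28,
          25, 23, 35, 25, 29, 33, 23, 27, 33,
          27, 22, 40, 27, 21, 29, 22, 25, 29,
          25, 21, 20, 21, 23, 25, 30, 20, 28,
          30, 29, 28, 30, 27, 27, 27, 19, 30]
for row in frequency_distribution(scores, 5):
    print(row)
```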

22 Grouped Frequency Distribution
Class Limits | Class Boundaries | Class Mark (X) | Tally | Frequency (f) | Cumulative Frequency
24 – 30 | 23.5 – 30.5 | 27 | III | 3 | 3
31 – 37 | 30.5 – 37.5 | 34 | I | 1 | 4
38 – 44 | 37.5 – 44.5 | 41 | IIIII | 5 | 9
45 – 51 | 44.5 – 51.5 | 48 | IIIII IIII | 9 | 18
52 – 58 | 51.5 – 58.5 | 55 | IIIII I | 6 | 24
59 – 65 | 58.5 – 65.5 | 62 | I | 1 | 25
Total | | | | 25 |

23 Example 1: These data represent the record high temperatures (°F) for each of the 50 states:
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114
Construct a grouped frequency distribution for the data using 7 classes.

24 Example 2: Statistics test scores of 50 students:
88 62 63 88 65
85 83 76 72 63
60 46 85 71 67
75 78 87 70 42
63 90 63 60 73
55 62 83 79
78 40 51 56 80
90 47 48 54 77
86 55 76 52 76
40 52 72 43 60
Construct the GFD (grouped frequency distribution) for the statistics test scores with 11 classes.

25 Using the data:
1. Construct a frequency distribution with 11 classes.
2. Construct a histogram, a frequency polygon, and an ogive from the data.

26 Graphical Presentation of Data
A histogram is a bar-graph-like representation of a frequency distribution in which the rectangular bars have no space between them. The height of each bar corresponds to the frequency of the class, and each bar extends from the lower to the upper class boundary, so that it is centered on the class mark.


28
- A well-balanced histogram should have a height of 60%, 67%, or 75% of its width.
- A frequency polygon is a line graph in which the frequency of each class is plotted against the corresponding class mark.


30 An ogive (pronounced "oh-jive") is a line graph in which the cumulative frequency of each class is plotted against the corresponding upper class boundary.
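A small Python sketch that computes the points to plot for the frequency polygon (class mark, f) and for the ogive (upper class boundary, cumulative frequency), using the class limits and frequencies from the slide 22 table. It assumes a "less than" ogive that starts at the first lower boundary with a cumulative frequency of 0; the drawing itself (for example with a plotting library) is left out.

```python
def polygon_and_ogive(classes):
    """classes: (lower limit, upper limit, frequency) tuples, lowest class first,
    with whole-number limits so each boundary lies 0.5 beyond its limit."""
    polygon = [((lo + hi) / 2, f) for lo, hi, f in classes]   # (class mark, f)
    ogive = [(classes[0][0] - 0.5, 0)]                          # start: lowest boundary, cf = 0
    cumulative = 0
    for _, hi, f in classes:
        cumulative += f
        ogive.append((hi + 0.5, cumulative))                    # (upper boundary, cf)
    return polygon, ogive

# Class limits and frequencies from the grouped frequency distribution on slide 22.
table = [(24, 30, 3), (31, 37, 1), (38, 44, 5), (45, 51, 9), (52, 58, 6), (59, 65, 1)]
polygon, ogive = polygon_and_ogive(table)
print(polygon)
print(ogive)
```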

31 Cumulative Frequency Graph (ogive)

32 Exercises #1
I. Classify the following according to the scale of measurement. Write N if your answer is nominal, O if ordinal, I if interval, or R if ratio.
______ 1. Newborns arranged according to gender
______ 2. Banking hours of the different types of banks in Metro Manila
______ 3. Peso-dollar exchange rate
______ 4. Temperature range of patients afflicted with pneumonia
______ 5. Lead content in toys manufactured in the Philippines

33 II. Classify the following as descriptive statistics or inferential statistics. Write D if your answer is descriptive and I if your answer is inferential.
______ 1. The time it takes a shipment of perishable goods to reach its destination.
______ 2. The number of times the peso-dollar rate fluctuates during the week.
______ 3. Based on the medical record of the patient, the patient has a high sugar level considered to be critical.
______ 4. The farm produce in Baguio.
______ 5. For the past month, there has been an increased number of cases of cholera in hog farms in Bulacan.

34 III. Statistics test scores of 50 students:
88 62 63 88 65
85 83 76 72 63
60 46 85 71 67
75 78 87 70 42
63 90 63 60 73
55 62 83 79
78 40 51 56 80
90 47 48 54 77
86 55 76 52 76
40 52 72 43 60
Construct the GFD for the statistics test scores with 11 classes.

35 Example 1: The following data represent the weekly savings of employees in a manufacturing company:
49 849167 38 57822947 38 54524367 65 50185848 39 16653571 73 78563559 71 9265761 42 44245263 85 46343939 61 29463428 25

36 Using the data:
1. Construct a frequency distribution with 11 classes.
2. Construct a histogram and a frequency polygon from the data.
3. Construct a frequency distribution using 9 class intervals.
4. Construct a histogram and a frequency polygon from the data.

37 MEASURES OF CENTRAL TENDENCY
- A measure of central tendency is a statistic that serves as a representative of the data under investigation.
- It tends to lie near the center of the data set.
- The three measures of central tendency are the mean, the median, and the mode.

38 MEAN
The mean is the most important, most useful, and most widely used measure of central tendency. It is the sum of all the given values or items in a distribution divided by the number of values or items summed. The mean has both uses and limitations.

39 The mean is used:
- for interval and ratio measurements;
- when further statistical computations are wanted;
- when there are no extreme values in the distribution, that is, when the distribution is approximately normal, since the mean is easily affected by extremely low or extremely high scores;

40
- when greater reliability of the measure of central tendency is wanted, since its computation includes all the given values.

41 The Limitations of the Mean
The mean is the most widely used average because it is the most familiar; however, it is often misused.
- It should not be used when the values or items do not cluster substantially. For example, the mean of the scores 10 and 100 is 55, which represents neither score well because the two are far apart. When the given values do not tend to cluster around a central value, the mean is a poor measure of central location.
- It is easily affected by extremely large or small values; a single small value can pull the mean down considerably.

42
- The mean alone cannot be used to compare distributions, since two or more distributions may have the same mean but entirely different characteristics. Distribution A, with values 80, 85, and 90, and distribution B, with values 86, 85, and 84, both have a mean of 85.

43 However, we cannot conclude that both distributions possess the same characteristics: their patterns of dispersion, or variation, are markedly different (A spans 10 points, B only 2) despite having the same mean.

44 The formulas for computing the mean are:
Ungrouped data: $\bar{x} = \frac{\sum x_i}{n}$
where $\bar{x}$ is the mean, $x_i$ stands for the values or items, and $n$ is the number of values (respondents).
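A minimal Python sketch of the ungrouped-data formula, applied to distributions A and B from slide 42. The range (highest minus lowest value, as on slide 17) is printed alongside to show that equal means can hide very different spreads.

```python
def mean(values):
    """Ungrouped mean: the sum of all values divided by the number of values."""
    return sum(values) / len(values)

a = [80, 85, 90]   # distribution A
b = [86, 85, 84]   # distribution B
print(mean(a), mean(b))                   # 85.0 85.0
print(max(a) - min(a), max(b) - min(b))   # 10 2
```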

45 Grouped data (the midpoint formula): $\bar{x} = \frac{\sum f_i X_i}{n}$
where $\bar{x}$ is the mean, $f_i X_i$ is the product of the class mark and the frequency of each class, and $n$ is the number of respondents.

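46 The mean of grouped data can also be computed using the CODED FORMULA: $\bar{x} = AM + \left(\frac{\sum f_i d_i}{n}\right) i$
where AM is the assumed mean, $d_i = (X_i - AM)/i$ is the unit deviation of each class mark from the assumed mean, $i$ is the class size, and $n$ is the number of cases.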

47 Example: Compute the mean using the two formulas.
Class Interval | f
90 – 94 | 2
85 – 89 | 6
80 – 84 | 3
75 – 79 | 8
70 – 74 | 5
65 – 69 | 2
60 – 64 | 10
55 – 59 | 3
50 – 54 | 4
45 – 49 | 3
40 – 44 | 4

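48 Solution for the Mean (Using the Midpoint Formula)
Class Interval | Class Mark (X) | Frequency (f) | fX
90 – 94 | 92 | 2 | 184
85 – 89 | 87 | 6 | 522
80 – 84 | 82 | 3 | 246
75 – 79 | 77 | 8 | 616
70 – 74 | 72 | 5 | 360
65 – 69 | 67 | 2 | 134
60 – 64 | 62 | 10 | 620
55 – 59 | 57 | 3 | 171
50 – 54 | 52 | 4 | 208
45 – 49 | 47 | 3 | 141
40 – 44 | 42 | 4 | 168
Total | | 50 | 3370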

49 Using the midpoint formula: $\bar{x} = \frac{\sum f_i X_i}{n} = \frac{3370}{50} = 67.4$

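50 Solution for the Mean (Using the Unit Deviation Formula)
Class Interval | Class Mark (X) | Frequency (f) | d | fd
90 – 94 | 92 | 2 | 5 | 10
85 – 89 | 87 | 6 | 4 | 24
80 – 84 | 82 | 3 | 3 | 9
75 – 79 | 77 | 8 | 2 | 16
70 – 74 | 72 | 5 | 1 | 5
65 – 69 | 67 (AM) | 2 | 0 | 0
60 – 64 | 62 | 10 | -1 | -10
55 – 59 | 57 | 3 | -2 | -6
50 – 54 | 52 | 4 | -3 | -12
45 – 49 | 47 | 3 | -4 | -12
40 – 44 | 42 | 4 | -5 | -20
Total | | 50 | | 4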

51 Using the unit deviation method: $\bar{x} = AM + \left(\frac{\sum f_i d_i}{n}\right) i = 67 + \left(\frac{4}{50}\right)(5) = 67 + 0.4 = 67.4$
where the assumed mean (AM) may be any of the class marks, preferably one located at the center of the distribution or one with the highest frequency.
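Both grouped-data formulas can be checked in Python against the worked example on slide 47. This sketch assumes whole-number class limits and equal class widths (the coded formula relies on a single class size i); the function names are illustrative.

```python
def grouped_mean_midpoint(classes):
    """Midpoint formula: sum(f * X) / n, where X is the class mark."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_mean_coded(classes, assumed_mean):
    """Coded formula: AM + (sum(f * d) / n) * i, with unit deviation d = (X - AM) / i."""
    n = sum(f for _, _, f in classes)
    i = classes[0][1] - classes[0][0] + 1   # class size, e.g. 90 - 94 -> 5
    fd = sum(f * ((lo + hi) / 2 - assumed_mean) / i for lo, hi, f in classes)
    return assumed_mean + fd / n * i

# Class intervals and frequencies from the example on slide 47.
classes = [(90, 94, 2), (85, 89, 6), (80, 84, 3), (75, 79, 8), (70, 74, 5), (65, 69, 2),
           (60, 64, 10), (55, 59, 3), (50, 54, 4), (45, 49, 3), (40, 44, 4)]
print(round(grouped_mean_midpoint(classes), 2))    # 67.4
print(round(grouped_mean_coded(classes, 67), 2))   # 67.4
```

Any class mark can serve as AM; the result is the same, which is a quick way to check hand computations.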

52 MEDIAN
The median is the middle value in a set of quantities. It separates an ordered set of data into two equal parts: half of the quantities lie above the median and the other half below it. To find the median of ungrouped data, follow these steps:
1. Arrange the quantities in either ascending or descending order.

53 2. Number the quantities consecutively from 1 to n.
3. If n is odd, the median is the ((n + 1)/2)th quantity. If n is even, the median is the mean of the (n/2)th and (n/2 + 1)th quantities.

54 The median is used:
- for ordinal or ranked measurements;
- when there are extreme cases, that is, when the distribution is markedly skewed;
- when we want to know whether cases fall within the upper half or the lower half of the distribution;
- for an open-ended distribution, that is, when the lowest or highest class interval (or both) is open, such as "50 and below" or "100 and above";

55 Limitations of the Median
- It is easily affected by the number of items in a distribution.
- It cannot be determined unless the given values are arranged according to magnitude.
- If a distribution contains many values, arranging them according to magnitude becomes a laborious task.
- Its value is not as accurate as the mean, because it is only an ordinal statistic.

56 Formula for Finding the Median
To get the median of ungrouped data, arrange the data from the highest value to the lowest value or vice versa; the median is the middle value of the distribution.
- If there is an odd number of observations, the middle value is the median. Ex. 6, 7, 8, 9, 10, 12, 16 (median = 9)
- If the number of observations is even, the median is the average of the two middle scores. Ex. 8, 7, 6, 5, 4, 3 (median = (6 + 5)/2 = 5.5)
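A minimal Python sketch of the ungrouped median, following the steps above and checked against the two examples on this slide.

```python
def median(values):
    """Sort, then take the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1)/2)th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th values

print(median([6, 7, 8, 9, 10, 12, 16]))   # 9
print(median([8, 7, 6, 5, 4, 3]))         # 5.5
```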

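57 Grouped data: $\tilde{x} = u + \left(\frac{\frac{n}{2} - cf}{f_m}\right) i$
where u is the exact lower limit (lower boundary) of the median class, n is the number of cases, cf is the cumulative frequency immediately below the median class, $f_m$ is the frequency of the median class, and i is the class interval.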

58 Solution for the Median
Class Interval | Frequency (f) | Cumulative Frequency (cf)
90 – 94 | 2 | 50
85 – 89 | 6 | 48
80 – 84 | 3 | 42
75 – 79 | 8 | 39
70 – 74 | 5 | 31
65 – 69 (median class; u = 64.5) | 2 | 26
60 – 64 | 10 | 24 (= cf)
55 – 59 | 3 | 14
50 – 54 | 4 | 11
45 – 49 | 3 | 7
40 – 44 | 4 | 4

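59 Solving for the median: n/2 = 50/2 = 25, so the median class is 65 – 69.
$\tilde{x} = u + \left(\frac{n/2 - cf}{f_m}\right) i = 64.5 + \left(\frac{25 - 24}{2}\right)(5) = 64.5 + 2.5 = 67$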
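A Python sketch of the grouped-data median formula, assuming whole-number class limits (so the lower boundary u is the lower limit minus 0.5) and equal class widths. The classes are those of the worked solution, listed from the lowest class up so that the cumulative frequency can be accumulated directly.

```python
def grouped_median(classes):
    """classes: (lower limit, upper limit, f), lowest class first."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_below = 0                                  # cumulative frequency below the current class
    for lo, hi, f in classes:
        if cf_below + f >= half:                  # first class whose cf reaches n/2
            u = lo - 0.5                          # lower boundary of the median class
            i = hi - lo + 1                       # class interval
            return u + (half - cf_below) / f * i
        cf_below += f
    raise ValueError("empty distribution")

classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10), (65, 69, 2),
           (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6), (90, 94, 2)]
print(grouped_median(classes))   # 67.0
```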

60 Examples: Solve for the median of ungrouped data.
Find the median of each set of measures:
1. 23, 15, 9, 30, 27, 10, 18, 14, 13
2. 12.6, 15.0, 19.8, 17.9, 11.7, 18.6, 14.1, 13.4

61 MODE
The mode is the value that occurs most frequently. A set of data is a unimodal distribution if it has only one mode. For instance, the set 11, 15, 13, 15, 14, 13, 15 is unimodal: the mode is 15, which occurs 3 times. A set is a bimodal distribution if it has two modes. For example, the sets

62 88, 89, 82, 82, 82, 89, 88, 89 and 63, 55, 57, 60, 60, 66, 56, 58, 57 are bimodal. Their modes are 82 and 89, and 57 and 60, respectively. A set of data with three modes is trimodal. However, the distribution 40, 44, 37, 37, 44, 40 has no mode, since every value occurs equally often.
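A Python sketch that returns every mode of a data set, using `collections.Counter`. It assumes the convention illustrated by the last example: when every value occurs equally often, the set has no mode and the function returns an empty list.

```python
from collections import Counter

def modes(values):
    """Every value with the highest frequency; [] when all values occur equally often."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                 # no value stands out: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([11, 15, 13, 15, 14, 13, 15]))            # [15]
print(modes([88, 89, 82, 82, 82, 89, 88, 89]))        # [82, 89]
print(modes([63, 55, 57, 60, 60, 66, 56, 58, 57]))    # [57, 60]
print(modes([40, 44, 37, 37, 44, 40]))                # []
```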

63 The mode is used:
- for nominal or categorical data;
- when the most popular or most typical case or value in the distribution is wanted;
- when a rough or quick estimate of a central value is wanted.

64 The Limitations of the Mode
- It is seldom used, since it does not always exist.
- It is very unstable, because its value changes depending on the approach used to find it.
- Its value is just a rough estimate of the center of concentration of a distribution.

65 Formula for the Mode of Grouped Data: The mode of grouped data is the class mark, or midpoint, of the class with the highest frequency (the modal class).

66 Solution for the Mode
Class Interval | Frequency (f)
90 – 94 | 2
85 – 89 | 6
80 – 84 | 3
75 – 79 | 8
70 – 74 | 5
65 – 69 | 2
60 – 64 (modal class) | 10
55 – 59 | 3
50 – 54 | 4
45 – 49 | 3
40 – 44 | 4

67 Solving for Mode:
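Following the rule on slide 65 (the class mark of the class with the highest frequency), a minimal Python sketch for the table above. Some textbooks instead interpolate within the modal class; that variant is not shown here.

```python
def crude_mode(classes):
    """Class mark of the modal class; classes are (lower limit, upper limit, f).
    On a tie, the first class listed wins (max returns the first maximum)."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

classes = [(90, 94, 2), (85, 89, 6), (80, 84, 3), (75, 79, 8), (70, 74, 5), (65, 69, 2),
           (60, 64, 10), (55, 59, 3), (50, 54, 4), (45, 49, 3), (40, 44, 4)]
print(crude_mode(classes))   # 62.0
```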

68 Example: Compute the mean, median, and mode given the age brackets of the workers in a certain factory.
Age | No. of Workers (f)
42 – 44 | 15
39 – 41 | 18
36 – 38 | 23
33 – 35 | 20
30 – 32 | 24
27 – 29 | 16
24 – 26 | 25
21 – 23 | 12
18 – 20 | 10
15 – 17 | 13

69 Skewness in Relation to Central Tendency
The measures of central tendency are helpful in describing the characteristics of a given distribution. When the mean, median, and mode are all equal, they are all represented by a single point in the distribution, and the distribution is normal or symmetrical.

70
- If the mean, median, and mode are not the same, the curve or distribution is skewed, or asymmetrical.
- There are two types of skewed distribution.
* Positively Skewed: the curve has a long right tail. Most scores accumulate at the lower (left) end, while a few extremely high values stretch the tail to the right.

71 Therefore, the mean is pulled into the tail of the distribution, and its value is higher than the median. The mean is easily affected by extreme cases, which in a positively skewed distribution are found to the right. Moreover, the mean also lies to the right of the mode, since skewness in this case is approximated by the distance of the mean from the mode.

72 * Negatively Skewed: the curve has a long left tail. Most scores accumulate at the upper (right) end, while a few extremely low values stretch the tail to the left. Therefore, the mean is pulled into the tail of the curve, which is found at the left, so the value of the mean is lower than the median, because the extreme cases are found at the left of the distribution.
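A small Python illustration of how the three measures are ordered in skewed data, using the standard `statistics` module. The data sets are hypothetical: one has a few extreme high values, and the other is its mirror image (16 - x), with a few extreme low values.

```python
from statistics import mean, median, mode

# Hypothetical data: most values are low; a few extreme high values form a right tail.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image: most values are high; a few extreme low values form a left tail.
left_tailed = [16 - x for x in right_tailed]

for label, data in (("positive", right_tailed), ("negative", left_tailed)):
    print(label, mean(data), median(data), mode(data))
# positive 4.6 3.0 2    -> mean > median > mode
# negative 11.4 13.0 14 -> mean < median < mode
```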

73 Quantiles: values that divide a distribution into a given number of equal parts. There are three types of quantiles:
- Quartiles divide the distribution into four equal parts.
- Deciles divide the distribution into ten equal parts.
- Percentiles divide the distribution into one hundred equal parts.

74 Percentiles (for ungrouped data) are position measures used in educational and health-related fields to indicate the position of an individual in a group.

75 Percentile formula:

76 Example 1. A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

77 Solution:
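The percentile-rank formula itself did not survive in this transcript. The sketch below assumes a common textbook form, percentile rank = ((number of values below X) + 0.5) / (total number of values) × 100, and applies it to Example 1; the function name is hypothetical.

```python
def percentile_rank(values, x):
    """Assumed formula: 100 * ((number of values below x) + 0.5) / (total number of values)."""
    below = sum(v < x for v in values)
    return 100 * (below + 0.5) / len(values)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))   # 65.0, i.e., about the 65th percentile
```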

78 Procedure Table

79 STEP 3A: If c is not a whole number, round it up to the next whole number. Starting at the lowest value, count over to the value that corresponds to the rounded-up number.
STEP 3B: If c is a whole number, use the value halfway between the cth and (c + 1)st values when counting up from the lowest value.

80 EXAMPLE 2: Using the scores in Example 1:
a. Find the value corresponding to the 25th percentile.
b. Find the value corresponding to the 60th percentile.

81 SOLUTION:
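A Python sketch of STEPS 3A and 3B (slide 79) applied to Example 2. Steps 1 and 2 are not preserved in the transcript, so the sketch assumes the usual ones: arrange the data from lowest to highest and compute c = n × p / 100.

```python
import math

def percentile_value(values, p):
    """Data value at the p-th percentile (0 < p < 100) following STEPS 3A and 3B."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                  # assumed step: arrange from lowest to highest
    c = len(ordered) * p / 100                # assumed step: c = n * p / 100
    if c != int(c):                           # STEP 3A: c is not whole -> round up
        return ordered[math.ceil(c) - 1]      # positions count from 1, list indices from 0
    c = int(c)                                # STEP 3B: halfway between cth and (c + 1)st values
    return (ordered[c - 1] + ordered[c]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_value(scores, 25))   # 5
print(percentile_value(scores, 60))   # 11.0
```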


83 Examples:
1. Find the 20th percentile (P20) of the following scores: 25, 22, 20, 16, 17, 12, 8, 6, 5
2. Find the 60th percentile of the following scores: 99, 95, 80, 75, 70, 60, 40

84 Quartiles and Deciles (for ungrouped data)
Finding the data values corresponding to Q1, Q2, and Q3:
STEP 1: Arrange the data in order from lowest to highest.
STEP 2: Find the median of the data values. This is the value of Q2.
STEP 3: Find the median of the data values that fall below Q2. This is the value of Q1.
STEP 4: Find the median of the data values that fall above Q2. This is the value of Q3.

85 Example: Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.

86 SOLUTION:
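A Python sketch of the four steps on slide 84, applied to the example data set. It assumes that, for an odd number of values, the middle value itself belongs to neither half.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 as the medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(values)                  # STEP 1
    n = len(ordered)
    q2 = median_of_sorted(ordered)            # STEP 2
    lower = ordered[: n // 2]                 # values below Q2 (middle value excluded if n is odd)
    upper = ordered[(n + 1) // 2 :]           # values above Q2
    return median_of_sorted(lower), q2, median_of_sorted(upper)   # STEPS 3 and 4

print(quartiles([15, 13, 6, 5, 12, 50, 22, 18]))   # (9.0, 14.0, 20.0)
```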

87 Computations of the Quantiles for Grouped Data
The computation for grouped data is similar to that of the median. The formula is
$P_p = u + \left(\frac{pn - cf}{f_p}\right) i$

88 where:
$P_p$ – the desired quantile
u – exact lower limit (lower boundary) of the class interval containing $P_p$
n – number of cases
p – proportion corresponding to the desired quantile (e.g., 0.25 for Q1, 0.70 for D7, 0.36 for P36)
cf – cumulative frequency immediately below the class interval containing $P_p$
$f_p$ – frequency of the class interval containing $P_p$
i – class interval
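A Python sketch of the grouped-quantile formula $P_p = u + \left(\frac{pn - cf}{f_p}\right) i$, using the symbols defined above and assuming whole-number class limits and equal class widths. It is checked against the median (p = 0.5) of the worked example on slide 47 and can be applied the same way to the faculty-rating data that follows.

```python
def grouped_quantile(classes, p):
    """P_p = u + (p*n - cf) / f_p * i.
    classes: (lower limit, upper limit, f), lowest class first.
    p: proportion, e.g. 0.25 for Q1, 0.70 for D7, 0.36 for P36."""
    n = sum(f for _, _, f in classes)
    target = p * n                            # position of the desired quantile
    cf = 0                                    # cumulative frequency below the current class
    for lo, hi, f in classes:
        if f > 0 and cf + f >= target:        # class containing P_p
            u = lo - 0.5                      # exact lower limit (lower boundary)
            i = hi - lo + 1                   # class interval
            return u + (target - cf) / f * i
        cf += f
    raise ValueError("p must be between 0 and 1")

classes = [(40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3), (60, 64, 10), (65, 69, 2),
           (70, 74, 5), (75, 79, 8), (80, 84, 3), (85, 89, 6), (90, 94, 2)]
print(grouped_quantile(classes, 0.5))    # 67.0 (the median)
print(grouped_quantile(classes, 0.25))   # 57.0 (Q1)
```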

89 The efficiency ratings of 200 faculty members of a certain college were taken and are shown below.
CI | f
73 – 75 | 2
76 – 78 | 6
79 – 81 | 11
82 – 84 | 18
85 – 87 | 20
88 – 90 | 39
91 – 93 | 55
94 – 96 | 39
97 – 99 | 10

90
1. Compute the mean, median, and mode.
2. Determine the value of the following:
a. lower boundary of the 2nd quartile class
b. upper limit of the 3rd quartile class

91
c. class mark of the 78th percentile class
d. frequency of the 8th decile class
e. cumulative frequency before the 5th decile class
3. Determine the value of the following:
a. Q1
b. P36
c. D5
d. D7
e. D4
f. P55
g. P79
h. Q4

What branch of mathematics deals with the collection, analysis, presentation, and interpretation of data and the drawing of conclusions from it?

Statistics is a branch of applied mathematics that involves the collection, description, and analysis of quantitative data and the drawing of conclusions from it.

What is the branch of mathematics dealing with the collection, analysis, interpretation, and presentation of numerical or quantitative data?

Statistics: the branch of mathematics that deals with the collection, organization, analysis, and interpretation of numerical data.
Dot plot: a statistical chart consisting of data points plotted on a fairly simple scale.

What is the branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data?

Statistics is a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.