What is meant by Statistics?
- Statistics is the branch of mathematics that deals with collecting, organizing, analyzing, interpreting and presenting data.
- Statistics is helpful in real-life situations because data are easier to understand when they are represented by a single number that stands for all the values. Such a number is called a measure of central tendency.
Ungrouped Data
- Ungrouped data is data in its original or raw form.
- The observations are not classified into groups.
- For example, the ages of everyone present in a kindergarten classroom, including the teacher, are as follows:
3, 3, 4, 3, 5, 4, 3, 3, 4, 3, 3, 3, 3, 4, 3, 27.
The data show that there is exactly one adult in the class: the teacher.
- Ungrouped data is easy to work with when the data set is small.
Grouped Data
- In grouped data, observations are organized in groups.
- For example, a class of students got different marks in a school exam. The data is tabulated as follows:
Mark interval | 0-20 | 21-40 | 41-60 | 61-80 | 81-100 |
No. of Students | 13 | 9 | 36 | 32 | 10 |
This shows how many students scored within each mark range.
- Grouped data is easier to work with when a large amount of data is present.
Frequency
Frequency is the number of times a particular observation occurs in data.
Class Interval
- Data can be grouped into class intervals such that all observations in that range belong to that class.
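- Class width = upper class limit – lower class limit (for continuous classes such as 10 – 20; for classes with gaps, such as 1 – 5, 6 – 10, use the class boundaries 0.5 – 5.5, 5.5 – 10.5, which gives a width of 5)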
Mean
- It is the average of “n” numbers, which is calculated by dividing the sum of all the numbers by n.
- The mean $\bar{x}$ of n values x1, x2, x3, ..., xn is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
Median
- If we arrange the observations in ascending or descending order, the middle observation is the median.
- If n is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation.
- If n is even, the median is the average of the two middle observations, that is, of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations.
Mode
- The observation that occurs most frequently in the data is called the mode.
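The three definitions above can be checked on the kindergarten ages from the Ungrouped Data example. This is a minimal Python sketch; the function names mean, median and mode are illustrative, not taken from any library, and the mode function assumes that when several values tie, returning the first one encountered is acceptable.

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(values) / len(values)

def median(values):
    """Middle observation after sorting; average of the two middle ones if n is even."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # the ((n + 1) / 2)-th observation sits at 0-based index n // 2
        return s[n // 2]
    # average of the (n/2)-th and (n/2 + 1)-th observations
    return (s[n // 2 - 1] + s[n // 2]) / 2

def mode(values):
    """Most frequent observation (the first one encountered if there is a tie)."""
    return Counter(values).most_common(1)[0][0]

ages = [3, 3, 4, 3, 5, 4, 3, 3, 4, 3, 3, 3, 3, 4, 3, 27]
print(mean(ages), median(ages), mode(ages))  # 4.875 3.0 3
```

The single adult age of 27 pulls the mean up to 4.875, while the median and the mode both stay at 3.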
Mean of Grouped Data (Without Class Interval)
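If the data is organized without class intervals (each observation xi occurs with frequency fi), the mean is
$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i x_i}{\sum f_i}$$
where x1, x2, x3, ..., xn are the observations and
f1, f2, f3, ..., fn are the respective frequencies of the given observations.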
Example
Grouped Population Mean
x | f | fx |
20 | 40 | 800 |
40 | 60 | 2400 |
60 | 30 | 1800 |
80 | 50 | 4000 |
100 | 20 | 2000 |
Total | ∑f = 200 | ∑fx = 11000 |
Here, x1, x2, x3, x4, x5 are 20, 40, 60, 80, 100 respectively and f1, f2, f3, f4, f5 are 40, 60, 30, 50, 20 respectively.
So, mean = ∑fx/∑f = 11000/200 = 55.
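The same calculation as a short Python sketch; the helper name mean_without_intervals is hypothetical, and xs and fs hold the observations and frequencies from the table above.

```python
def mean_without_intervals(xs, fs):
    """Mean of discrete grouped data: sum(f_i * x_i) / sum(f_i)."""
    if len(xs) != len(fs):
        raise ValueError("each observation needs a frequency")
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    return sum_fx / sum(fs)

xs = [20, 40, 60, 80, 100]   # observations
fs = [40, 60, 30, 50, 20]    # their frequencies
print(mean_without_intervals(xs, fs))  # 55.0
```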
Mean of Grouped Data (With Class-Interval)
When the data is grouped into class intervals, the mean can be calculated by any of three methods.
1. Direct Method
- In this method, each class is represented by its midpoint, called the class mark.
- Class mark = (upper class limit + lower class limit)/2
Steps for Direct Method of finding mean
Step 1: Classify the data into intervals and find the corresponding frequency of each class.
Step 2: Find the class mark by taking the midpoint of the upper and lower class limits.
Step 3: Tabulate the product of class mark and its corresponding frequency for each class. Calculate their sum (∑fixi).
Step 4: Divide the above sum by the sum of frequencies (∑fi) to get the mean.
That is,
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
Example
A teacher records the mathematics test results of a class of 55 students. Find the mean mark.
Marks of Students | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | 50 – 60 |
Frequency | 27 | 10 | 7 | 5 | 4 | 2 |
To find the mean, we take the midpoint (class mark) of each class interval as x and multiply it by the frequency f to get fx.
Marks of students | Frequency (f) | Midpoint (x) | fx |
0 – 10 | 27 | 5 | 135 |
10 – 20 | 10 | 15 | 150 |
20 – 30 | 7 | 25 | 175 |
30 – 40 | 5 | 35 | 175 |
40 – 50 | 4 | 45 | 180 |
50 – 60 | 2 | 55 | 110 |
Total | ∑f = 55 | | ∑fx = 925 |
Mean = ∑fx/∑f = 925/55 ≈ 16.82
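A minimal Python sketch of the direct method, assuming the class intervals are given as (lower limit, upper limit) pairs; the names class_mark and direct_mean are illustrative.

```python
def class_mark(lower, upper):
    """Midpoint of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def direct_mean(intervals, freqs):
    """Direct method: mean = sum(f_i * x_i) / sum(f_i), with x_i the class marks."""
    marks = [class_mark(lo, hi) for lo, hi in intervals]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [27, 10, 7, 5, 4, 2]
print(round(direct_mean(intervals, freqs), 2))  # 16.82
```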
2. Deviation or Assumed Mean Method
- When the class marks and frequencies are large numbers, this method makes the calculations easier.
- In this method, we choose one of the class marks x as the assumed mean and denote it by a.
- Then we find the deviation d = x − a of each class mark from the assumed mean.
- The remaining steps follow the direct method, applied to the deviations; the assumed mean is added back at the end.
Steps for Assumed mean method of finding mean
Step 1: Classify the data into intervals and find the corresponding frequency of each class.
Step 2: Find the class mark by taking the midpoint of the upper and lower class limits.
Step 3: Take one of the xi's (usually one near the middle) as the assumed mean and denote it by a.
Step 4: Find the deviation of each $x_i$ from $a$:
$$d_i = x_i - a$$
Step 5: Find the mean of the deviations:
$$\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$$
Step 6: Calculate the mean as
$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$$
or, equivalently, $\bar{x} = a + \bar{d}$.
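The relation between the mean of deviations and the mean
$$d_i = x_i - a$$
Multiplying by $f_i$ and summing over all classes,
$$\sum f_i d_i = \sum f_i x_i - a\sum f_i$$
Dividing throughout by $\sum f_i = n$, where n is the total number of observations,
$$\bar{d} = \bar{x} - a$$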
Example
The table below shows the household expenditure of a company's workers. What is their mean expenditure?
Expense(Rs.) | 100 - 150 | 150 - 200 | 200 - 250 | 250 - 300 | 300 - 350 | 350 - 400 |
Frequency | 24 | 40 | 33 | 28 | 30 | 22 |
Solution
Since the values of x are large, we use the assumed mean method.
Here we take 275 as the assumed mean.
Expenses (Rs.) | Frequency (f) | Mid value (x) | d = x – 275 | fd |
100 – 150 | 24 | 125 | -150 | -3600 |
150 – 200 | 40 | 175 | -100 | -4000 |
200 – 250 | 33 | 225 | -50 | -1650 |
250 – 300 | 28 | 275 | 0 | 0 |
300 – 350 | 30 | 325 | 50 | 1500 |
350 – 400 | 22 | 375 | 100 | 2200 |
Total | ∑f = 177 | | | ∑fd = -5550 |
Mean = a + ∑fd/∑f = 275 + (-5550/177) ≈ 275 - 31.36 = 243.64
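A minimal Python sketch of the assumed mean method for this table. The class marks are taken as already computed, and assumed_mean_method is a hypothetical helper name.

```python
def assumed_mean_method(marks, freqs, a):
    """Mean = a + sum(f_i * d_i) / sum(f_i), where d_i = x_i - a."""
    deviations = [x - a for x in marks]
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    return a + sum_fd / sum(freqs)

marks = [125, 175, 225, 275, 325, 375]   # class marks of 100-150, ..., 350-400
freqs = [24, 40, 33, 28, 30, 22]
print(round(assumed_mean_method(marks, freqs, a=275), 2))  # 243.64
```

Calling it with a different assumed mean, for example a=225, returns the same value, which is also what the direct method gives (43125/177): the choice of a only changes the size of the intermediate numbers.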
3. Step Deviation Method
In this method, we divide each deviation d by a common factor h (usually the class width) to make the calculations easier.
Steps for Step-Deviation method of finding mean
Step 1: Classify the data into intervals and find the corresponding frequency of each class.
Step 2: Find the class mark by taking the midpoint of the upper and lower class limits.
Step 3: Take one of the xi's (usually one near the middle) as the assumed mean and denote it by a.
Step 4: Find the deviation of each $x_i$ from $a$:
$$d_i = x_i - a$$
Step 5: Divide each deviation $d_i$ by the class width $h$ to get $u_i$:
$$u_i = \frac{x_i - a}{h}$$
Step 6: Find the mean of the $u_i$'s:
$$\bar{u} = \frac{\sum f_i u_i}{\sum f_i}$$
Step 7: Calculate the mean as
$$\bar{x} = a + h \times \frac{\sum f_i u_i}{\sum f_i} = a + h\bar{u}$$
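Relation between the mean of step deviations (u) and the mean
$$u_i = \frac{x_i - a}{h}$$
$$\bar{u} = \frac{\sum f_i \frac{x_i - a}{h}}{\sum f_i} = \frac{1}{h} \times \frac{\sum f_i x_i - a\sum f_i}{\sum f_i} = \frac{1}{h} \times (\bar{x} - a)$$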
Example
The wages of the workers are given in the table. Find the mean by step deviation method.
Wages | 10 - 20 | 20 - 30 | 30 - 40 | 40 - 50 | 50 - 60 |
No. of workers | 8 | 9 | 12 | 11 | 6 |
Solution
Wages | No. of workers (f) | Mid-point (x) | d = x - a, with assumed mean a = 35 | u = (x – a)/h, with h = 10 | fu |
10 – 20 | 8 | 15 | -20 | -2 | -16 |
20 – 30 | 9 | 25 | -10 | -1 | -9 |
30 – 40 | 12 | 35 | 0 | 0 | 0 |
40 – 50 | 11 | 45 | 10 | 1 | 11 |
50 – 60 | 6 | 55 | 20 | 2 | 12 |
Total | ∑f = 46 | | | | ∑fu = -2 |
Mean = a + h × ∑fu/∑f = 35 + 10 × (-2/46) ≈ 35 - 0.43 = 34.57
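A minimal Python sketch of the step deviation method for the wages table; step_deviation_mean is an illustrative name, and the class marks are passed in directly.

```python
def step_deviation_mean(marks, freqs, a, h):
    """Mean = a + h * u_bar, where u_i = (x_i - a) / h and u_bar = sum(f_i * u_i) / sum(f_i)."""
    us = [(x - a) / h for x in marks]
    u_bar = sum(f * u for f, u in zip(freqs, us)) / sum(freqs)
    return a + h * u_bar

marks = [15, 25, 35, 45, 55]   # class marks of 10-20, ..., 50-60
freqs = [8, 9, 12, 11, 6]
print(round(step_deviation_mean(marks, freqs, a=35, h=10), 2))  # 34.57
```

Here every deviation is a multiple of h = 10, so the ui are small integers (-2 to 2); dividing by h only simplifies the arithmetic and does not change the result, which matches the direct-method value 1590/46.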
Important relations between methods of finding mean
- All three methods of finding mean yield the same result.
- Step deviation method is easier to apply if all the deviations have a common factor.
- Assumed mean method and step deviation method are simplified versions of the direct method.
Mode of Grouped Data
- In ungrouped data, the mode is the most frequently occurring observation. In grouped data, we can only identify the class interval with the maximum frequency, called the modal class.
- The mode within the modal class is calculated by
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$
where l = lower class limit of the modal class
h = size of the class interval
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
Example
The table of the marks of the students of a class is given. Find the modal class and the mode.
Marks | 0 – 20 | 20 – 40 | 40 – 60 | 60 – 80 | 80 – 100 |
No. of students | 4 | 8 | 6 | 7 | 5 |
Solution
The class interval with the highest frequency, 8, is 20 – 40, so this is the modal class.
Modal class = 20 - 40
Lower limit of modal class (l) = 20
Class interval size (h) = 20
Frequency of the modal class(f1) = 8
Frequency of the preceding class(f0) = 4
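Frequency of the succeeding class (f2) = 6
Mode = l + ((f1 − f0)/(2f1 − f0 − f2)) × h = 20 + ((8 − 4)/(16 − 4 − 6)) × 20 = 20 + (4/6) × 20 ≈ 20 + 13.33 = 33.33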
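A minimal Python sketch of the modal-class formula, assuming equal class widths h and lists of lower class limits and frequencies in class order; grouped_mode is an illustrative name. When the modal class is the first or last class, the sketch treats the missing neighbour's frequency as 0, which is an assumption of this sketch.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    # index of the modal class (max returns the first one if several tie)
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[k]
    # assumption: a missing neighbouring class counts as frequency 0
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * h

lower_limits = [0, 20, 40, 60, 80]
freqs = [4, 8, 6, 7, 5]
print(round(grouped_mode(lower_limits, freqs, h=20), 2))  # 33.33
```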
Median of Grouped Data
- To find the median of grouped data, first find the cumulative frequency of each class and the value n/2.
- The cumulative frequency of a class is obtained by adding its frequency to the frequencies of all the classes preceding it.
- The median class is the class whose cumulative frequency is greater than (and nearest to) n/2.
- Then substitute the values in the formula
$$\text{Median} = l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$$
where l = lower limit of the median class
n = number of observations
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class
h = size of the class
Example
Find the median of the given table.
Class Interval | Frequency (f) | Cumulative Frequency (cf) | Calculation |
1 – 5 | 4 | 4 | 4 |
6 – 10 | 3 | 7 | 4 + 3 = 7 |
11 – 15 | 6 | 13 | 7 + 6 = 13 |
16 – 20 | 5 | 18 | 13 + 5 = 18 |
21 – 25 | 2 | 20 | 18 + 2 = 20 |
Total | n = 20 | | |
Solution
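Let's find n/2.
n = 20, so n/2 = 20/2 = 10
The median class is 11 – 15, because its cumulative frequency, 13, is the first one greater than 10.
Since the classes are not continuous (there are gaps between 5 and 6, 10 and 11, and so on), we use the class boundaries: the median class 11 – 15 becomes 10.5 – 15.5.
So l = 10.5, cf = 7, f = 6, h = 5, and
Median = 10.5 + ((10 − 7)/6) × 5 = 10.5 + 2.5 = 13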
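A minimal Python sketch of the median formula. It assumes the classes are supplied as continuous class boundaries; since 1 – 5, 6 – 10, ... have gaps, they are converted here to 0.5 – 5.5, 5.5 – 10.5, ..., and grouped_median is an illustrative name. The sketch picks the first class whose cumulative frequency exceeds n/2 as the median class.

```python
def grouped_median(boundaries, freqs):
    """Median = l + (n/2 - cf) / f * h, using continuous class boundaries."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cf + f > half:  # first class whose cumulative frequency exceeds n/2
            return lower + (half - cf) / f * (upper - lower)
        cf += f
    raise ValueError("frequencies must contain at least one positive value")

# 1-5, 6-10, ... converted to the boundaries 0.5-5.5, 5.5-10.5, ...
boundaries = [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5)]
freqs = [4, 3, 6, 5, 2]
print(grouped_median(boundaries, freqs))  # 13.0
```

With these boundaries the median class is 10.5 – 15.5 and the result is 10.5 + (3/6) × 5 = 13; using the class limit 11 as l instead of the boundary 10.5 would give a value 0.5 higher.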
Remark: The best-suited measure of central tendency in different cases and the Empirical relationship between them
i) The mean takes into account all the observations and lies between the extremes. It enables us to compare distributions.
ii) In problems where individual observations are not important, and we wish to find out a ‘typical’ observation where half the observations are below and half the observations are above, the median is more appropriate. The median is not affected by extreme values.
iii) In situations which require establishing the most frequent value or most popular item, the mode is the best choice.
Mean, mode and median are connected by the empirical relationship
3 Median = Mode + 2 Mean
This relationship holds approximately for moderately skewed distributions.
Measures of Central Tendency for Grouped Data
i) Mean is the average of a set of observations.
ii) Median is the middle value of a set of observations.
iii) A mode is the most common observation.
Graphical Representation of Cumulative Frequency Distribution
A graph makes data easier to understand. To draw the graph of a cumulative frequency distribution, we first find the cumulative frequencies from the given table and then plot the corresponding points.
The cumulative frequency distribution can be of two types:
1. Cumulative Frequency distribution of less than type
- Cumulative frequency of the less than type indicates the number of observations which are less than or equal to a particular observation.
- The graph of a cumulative frequency distribution of the less than type is called an ‘ogive of the less than type’.
- To draw the less than ogive, we take the upper limits of the class intervals on the x-axis and mark the corresponding less than cumulative frequencies on the y-axis. Then we join the points by a smooth curve.
2. Cumulative Frequency distribution of more than type / more than Ogive
- Cumulative frequency of the more than type indicates the number of observations which are greater than or equal to a particular observation.
- The graph of a cumulative frequency distribution of the more than type is called an ‘ogive of the more than type’.
- To draw the more than ogive, we take the lower limits of the class intervals on the x-axis and mark the corresponding more than cumulative frequencies on the y-axis. Then we join the points.
Relation between the less than and more than type curves
The point of intersection of the ogives of more than and less than types gives the median of the grouped frequency distribution.
Visualising formula for median graphically
Step 1: Identify the median class.
Step 2: Mark cumulative frequencies on the y-axis and observations on the x-axis corresponding to the median class.
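Step 3: Join the points (l, cf) and (l + h, cf + f), that is, the lower and upper boundaries of the median class with their cumulative frequencies, by a straight line.
Step 4: Identify the point on this line whose cumulative frequency (y-coordinate) is n/2.
Step 5: Drop a perpendicular from this point to the x-axis. The foot of the perpendicular gives the median.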
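Step 3 treats the cumulative frequency as growing linearly across the median class. Under that assumption, the line through (l, cf) and (l + h, cf + f), with l, cf, f and h as in the median formula, leads directly to the median formula:

```latex
\begin{aligned}
y - cf &= \frac{f}{h}\,(x - l) && \text{line through } (l,\ cf) \text{ and } (l + h,\ cf + f)\\
\frac{n}{2} - cf &= \frac{f}{h}\,(x - l) && \text{set } y = \tfrac{n}{2}\\
x &= l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h && \text{solve for } x
\end{aligned}
```

So the x-coordinate found graphically in Step 5 is the same value the median formula gives.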
Example
Draw the cumulative frequency distribution curve for the following table.
Marks of students | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | 50 – 60 |
No. of students | 7 | 10 | 14 | 20 | 6 | 3 |
Solution
To draw the less than and more than ogives, we need to find the less than and more than cumulative frequencies.
Marks | No. of students | Less than | Less than cumulative frequency | More than | More than cumulative frequency |
0 – 10 | 7 | Less than 10 | 7 | More than 0 | 60 |
10 – 20 | 10 | Less than 20 | 17 | More than 10 | 53 |
20 – 30 | 14 | Less than 30 | 31 | More than 20 | 43 |
30 – 40 | 20 | Less than 40 | 51 | More than 30 | 29 |
40 – 50 | 6 | Less than 50 | 57 | More than 40 | 9 |
50 – 60 | 3 | Less than 60 | 60 | More than 50 | 3 |
More than 60 | 0 |
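Now we plot the points (upper limit, less than cumulative frequency) and (lower limit, more than cumulative frequency) and join each set of points to get the two ogives.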
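The ogive points in the table above, and the point where the two ogives cross, can be computed with a minimal Python sketch. It assumes the points are joined by straight line segments rather than a free-hand smooth curve, and that the less than ogive starts at (0, 0); the names ogive_points and ogive_intersection are illustrative.

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """Points of the less than and more than ogives at the class boundaries."""
    n = sum(freqs)
    less = [0] + list(accumulate(freqs))  # observations below each boundary
    more = [n - c for c in less]          # observations at or above each boundary
    return list(zip(boundaries, less)), list(zip(boundaries, more))

def ogive_intersection(boundaries, freqs):
    """x-coordinate where the two ogives cross (straight segments assumed)."""
    less_pts, more_pts = ogive_points(boundaries, freqs)
    for i in range(len(boundaries) - 1):
        (x0, l0), (x1, l1) = less_pts[i], less_pts[i + 1]
        g0 = l0 - more_pts[i][1]      # gap between the curves at x0
        g1 = l1 - more_pts[i + 1][1]  # gap between the curves at x1
        if g0 <= 0 <= g1 and g0 != g1:
            return x0 + (-g0) / (g1 - g0) * (x1 - x0)
    return None

boundaries = [0, 10, 20, 30, 40, 50, 60]
freqs = [7, 10, 14, 20, 6, 3]
less_pts, more_pts = ogive_points(boundaries, freqs)
print(less_pts)   # [(0, 0), (10, 7), (20, 17), (30, 31), (40, 51), (50, 57), (60, 60)]
print(more_pts)   # [(0, 60), (10, 53), (20, 43), (30, 29), (40, 9), (50, 3), (60, 0)]
print(round(ogive_intersection(boundaries, freqs), 2))  # 29.29
```

The ogives cross at about 29.29, the same value the median formula gives for this data: n/2 = 30, the median class is 20 – 30, so the median is 20 + ((30 − 17)/14) × 10 ≈ 29.29.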
Remark
- The class intervals must be continuous to draw an ogive.
- The x-coordinate of the point of intersection of the less than and more than ogives is the median of the given data.