Introduction
Central tendency is a vital concept in statistics: it summarizes a dataset with a single value that represents the entire distribution. In this blog, we’ll delve into the fundamentals of central tendency, exploring the mean, median, and mode, along with their merits, demerits, and real-world applications.
Example 1: Suppose one of your relatives asks about your exam result. If you list the marks for every subject separately, he or she will get bored and will not be able to judge your performance.
What will you do? You will answer "90%".
This is the central value of the marks you scored in the different subjects.
Example 2:
Suppose you are a class teacher and five students have taken a test. Will you be able to compare the performance of these students by comparing the marks scored in each subject separately?
What will you do? You will compare their percentages.
The percentage is the central value of the marks scored by each of the five students in the test.
Measures of central tendency
Mean:
Definition: The mean is the average of a set of numbers. It is calculated by summing all the values in the dataset and dividing the sum by the number of values.
Mean: For Ungrouped Data
If the variable x assumes n values x_1, x_2, …, x_n, then the mean is given by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example:
Suppose the marks scored by five students in a test are 66, 72, 85, 52, and 75.
$$\bar{x} = \frac{66 + 72 + 85 + 52 + 75}{5} = \frac{350}{5} = 70$$
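The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is illustrative and not part of the original text.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


marks = [66, 72, 85, 52, 75]  # marks from the example above
print(arithmetic_mean(marks))  # 350 / 5 -> 70.0
```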
Mean: For Grouped Data
$$\bar{x} = \frac{\sum f x}{N}$$
where
x = the mid-point of each class,
f = the frequency of each class,
N = the sum of the frequencies, i.e. the total frequency in the sample.
For example:
Marks | No. of Students (f) | Mid-point (x) | fx |
0-10 | 6 | 5 | 30 |
10-20 | 8 | 15 | 120 |
20-30 | 17 | 25 | 425 |
30-40 | 11 | 35 | 385 |
40-50 | 8 | 45 | 360 |
Total | N = 50 | | 1,320 |
$$\bar{x} = \frac{\sum f x}{N} = \frac{1320}{50} = 26.4$$
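As a rough sketch of the grouped-data calculation, the snippet below takes each class's mid-point as its representative value, weights it by the class frequency, and divides by the total frequency. The function name `grouped_mean` and the way classes are stored as (lower, upper) pairs are assumptions made for illustration.

```python
def grouped_mean(class_limits, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class mid-point."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)


# Classes and frequencies from the marks table above.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [6, 8, 17, 11, 8]
print(grouped_mean(classes, freqs))  # 1320 / 50 -> 26.4
```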
Merits:
1. Easy to understand and calculate.
2. It is rigidly defined.
3. Utilizes all the data points in the dataset.
4. Sensitive to small changes in values.
5. Widely used in inferential statistics.
6. Applicable to both discrete and continuous data.
7. It provides a good basis for comparison.
Demerits:
1. It cannot be obtained by inspection, nor located through a frequency graph.
2. It cannot be used in the study of qualitative phenomena that are not capable of numerical measurement, e.g. intelligence, beauty, honesty.
3. No single item can be ignored without a loss of accuracy.
4. It is strongly affected by extreme values.
5. It cannot be calculated for open-end classes.
6. It may lead to fallacious conclusions if the details of the data from which it is computed are not given.
Some more examples:
1) Consider the following dataset representing the salaries of employees in a small company: {30000, 35000, 40000, 45000, 50000}
Mean = (30,000 + 35,000 + 40,000 + 45,000 + 50,000)/5 = 200,000/5 = 40,000
Now, let’s introduce an extreme value, such as the CEO’s salary, which is significantly higher: {30000, 35000, 40000, 45000, 500000}
Recalculated mean = (30,000 + 35,000 + 40,000 + 45,000 + 500,000)/5 = 650,000/5 = 130,000
A single extreme value more than triples the mean, even though four of the five salaries are unchanged.
2) The per capita income of India is Rs. 1,72,000, while the BPL (below poverty line) income limit in India is Rs. 27,000.
Judging by the mean alone, India should not have any BPL population, yet more than 145.71 million people, or 10.2% of the total population, fall below that limit.
Mean interpretation of an SPSS data table:
Interpretation of Mean:
Variable name: 1) Number of Extracurricular Activities participated in by the student
The mean of 2.68 indicates that, on average, each student participates in approximately 2.68 extracurricular activities.
This mean suggests a moderate level of participation in extracurricular activities among the students surveyed. While not exceptionally high, it indicates that students are somewhat involved in activities beyond their regular academic curriculum.
Variable name: 2) Income Level of Students' Parents (in thousands per month)
The mean income of 33,800 rupees per month suggests that, on average, the parents of these 100 students earn approximately 33,800 rupees per month.
This mean income level gives insight into the socioeconomic context of the student population, which can inform decisions aimed at promoting equity, access, and inclusivity in education.
- Both variables are measured at the scale (purely numeric) level. Therefore the mean (arithmetic mean, or average) is used in the statistical analysis.
Median
Definition: The median is the middle value in a sorted list of numbers. If there is an even number of values, it is the average of the two middle values.
The median is the middle-most item that divides the group into two equal parts, one part comprising all values greater than that item and the other all values less than it.
Suppose the median salary of a company is Rs. 40,000. This means that one-half of all employees earn more than Rs. 40,000 and one-half earn less than Rs. 40,000.
Ungrouped or Raw Data
Arrange the given values in ascending order. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the mean of the two middle values.
When n is odd, Median = Md = ((n+1)/2)^th value
When n is even, Median = average of the (n/2)^th and (n/2 + 1)^th values
Some examples:
1) The numbers of rooms in 7 hotels in Delhi are 713, 300, 618, 595, 311, 401, and 292. Find the median.
Here n = 7. First arrange the values in ascending order: 292, 300, 311, 401, 595, 618, 713
Median = ((n+1)/2)^th value = ((7+1)/2)^th value = 4^th value = 401
2) The numbers of floods that occurred in India over an 8-year period are as follows. Find the median. 684, 764, 656, 702, 856, 1133, 1132, 1303
Here n = 8. First arrange the values in ascending order: 656, 684, 702, 764, 856, 1132, 1133, 1303
Median = average of the (n/2)^th and (n/2 + 1)^th values = average of the (8/2)^th and (8/2 + 1)^th values
= average of the 4^th and 5^th values = (764 + 856)/2 = 810
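The odd and even cases above can be sketched in Python as follows. The function name `median_of` is illustrative; note that Python lists are 0-indexed, so the ((n+1)/2)^th value sits at index n // 2.

```python
def median_of(values):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n+1)/2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # the (n/2)-th and (n/2 + 1)-th values


rooms = [713, 300, 618, 595, 311, 401, 292]
floods = [684, 764, 656, 702, 856, 1133, 1132, 1303]  # values as in the sorted list above
print(median_of(rooms))   # 401
print(median_of(floods))  # 810.0
```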
Grouped Data
In a grouped distribution, values are associated with frequencies. Grouping can be in the form of a discrete frequency distribution or a continuous frequency distribution. Whatever the type of distribution, cumulative frequencies have to be calculated to know the total number of items.
Cumulative frequency (cf)
The cumulative frequency of each class is the sum of the frequency of that class and the frequencies of the previous classes, i.e. the frequencies are added successively, so that the last cumulative frequency gives the total number of items.
Discrete Series
Step 1: Find the cumulative frequencies.
Step 2: Find (n+1)/2.
Step 3: In the cumulative frequencies, locate the value equal to or just greater than (n+1)/2.
Step 4: The corresponding value of x is the median.
The following data pertain to the number of customers per day in a beauty salon in Delhi. Find the median number of customers.
Median = size of the ((n+1)/2)^th item
Here the number of observations is even. Therefore median = average of the (n/2)^th item and the (n/2 + 1)^th item
= (30th item + 31st item)/2 = (6 + 6)/2
= 6
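The discrete-series steps can be sketched in Python by using the cumulative frequencies to look up the k-th item. The function names and the sample data below are hypothetical: the data are invented purely to exercise the code (60 observations, so the 30th and 31st items are averaged) and are not the salon figures from this post.

```python
from bisect import bisect_left
from itertools import accumulate


def kth_item(values, cum_freqs, k):
    """Value of the k-th observation (1-indexed) in a discrete frequency table."""
    # The k-th item lies in the first row whose cumulative frequency reaches k.
    return values[bisect_left(cum_freqs, k)]


def discrete_median(values, freqs):
    """Median of a discrete series; `values` must be in ascending order."""
    cum_freqs = list(accumulate(freqs))  # step 1: cumulative frequencies
    n = cum_freqs[-1]
    if n % 2 == 1:
        return kth_item(values, cum_freqs, (n + 1) // 2)
    lower = kth_item(values, cum_freqs, n // 2)
    upper = kth_item(values, cum_freqs, n // 2 + 1)
    return (lower + upper) / 2


# Hypothetical illustration only, not data from this post.
customers = [4, 5, 6, 7, 8]
days = [10, 15, 20, 10, 5]
print(discrete_median(customers, days))  # 30th and 31st items are both 6 -> 6.0
```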
Median (Grouped Data)
Continuous Series
The steps given below are followed for the calculation of the median in a continuous series.
Step 1: Find the cumulative frequencies.
Step 2: Find n/2.
Step 3: In the cumulative frequencies, locate the first value greater than n/2. The corresponding class interval is called the median class. Then apply the formula
$$\text{Median} = l + \frac{\frac{n}{2} - m}{f} \times c$$
where l = lower limit of the median class,
f = frequency of the median class,
m = cumulative frequency preceding the median class,
c = width of the class,
n = total frequency.
For the frequency distribution given in the table below, calculate the median.
Here n/2 = 164/2 = 82.
The cumulative frequency just greater than 82 is 105. Therefore, the median class is 100-120, and its lower limit is 100.
Here l = 100, n = 164, f = 45, c = 20, m = 60.
$$\text{Median} = l + \frac{\frac{n}{2} - m}{f} \times c = 100 + \frac{\frac{164}{2} - 60}{45} \times 20 = 109.78$$
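The continuous-series formula can be sketched in Python as shown below. The function names are illustrative. The first call plugs in the values l, n, m, f and c quoted in the example above. The second call reuses the marks table from the grouped-mean example earlier in this post, which is my own choice of test data rather than something the original works through.

```python
from itertools import accumulate


def median_from_parts(l, n, m, f, c):
    """Median = l + ((n/2 - m) / f) * c, using the symbols defined above."""
    return l + (n / 2 - m) / f * c


def grouped_median(lower_limits, width, freqs):
    """Locate the median class from cumulative frequencies, then apply the formula."""
    n = sum(freqs)
    cum_freqs = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency is greater than n/2.
    idx = next(i for i, cf in enumerate(cum_freqs) if cf > n / 2)
    m = cum_freqs[idx - 1] if idx > 0 else 0  # cumulative frequency before the median class
    return median_from_parts(lower_limits[idx], n, m, freqs[idx], width)


# Values quoted in the worked example above.
print(round(median_from_parts(l=100, n=164, m=60, f=45, c=20), 2))  # 109.78

# Marks table from the grouped-mean example: median class is 20-30.
print(round(grouped_median([0, 10, 20, 30, 40], 10, [6, 8, 17, 11, 8]), 2))  # 26.47
```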
Example: SPSS data used to analyse the median for two ordinal variables
Interpretation of median:
Variable name: 1) Grade of the Student
The median represents the middle value of the dataset when all grades are arranged in ascending order. In this case, with 100 students and a median of 2.00, about half of the students have grades at or below 2.00 and about half have grades at or above 2.00.
Variable name: 2) Your college learning environment improved your performance
This variable is a subjective measure, likely recorded on a scale (e.g., strongly agree, agree, neutral, disagree, strongly disagree); let’s assume it is coded from 1 to 5.
The median indicates that a significant portion of the student population perceives a positive impact of the college learning environment on their performance.
Both of the variables are ordinal in nature. Therefore the median is used for statistical analysis.
Merits:
1. Not influenced by extreme values or outliers.
2. Suitable for skewed distributions.
3. Provides a better representation of the central value when the dataset is not symmetric.
4. Applicable to ordinal and interval data.
5. Robust measure of central tendency.
Demerits:
1. Tedious to compute for large datasets.
2. Ignores most of the information in the dataset.
3. Limited in its ability to reflect the variability of data.
4. Requires data to be ordered.
5. May not be unique if the dataset contains repeated values.
Some Examples
1) Consider the following dataset representing the salaries of employees in a small company: {30000, 35000, 40000, 45000, 50000}
Median = 40,000
Now, let’s introduce an extreme value, such as the CEO’s salary, which is significantly higher: {30000, 35000, 40000, 45000, 500000}
New median = 40,000
The median is unchanged because the middle value of the sorted list is still 40,000; only the largest value moved.
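To see the contrast between the two measures side by side, here is a short sketch using Python's standard `statistics` module on the salary data from this post. The variable names are illustrative.

```python
from statistics import mean, median

salaries = [30000, 35000, 40000, 45000, 50000]
with_ceo = [30000, 35000, 40000, 45000, 500000]  # one extreme value added

# The mean moves with the outlier; the median does not.
print(f"mean={mean(salaries):,.0f} median={median(salaries):,.0f}")  # mean=40,000 median=40,000
print(f"mean={mean(with_ceo):,.0f} median={median(with_ceo):,.0f}")  # mean=130,000 median=40,000
```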
Mode
Definition: The mode is the value that appears most frequently in a dataset.
The mode is the value in a distribution that occurs most frequently. It is an actual value, with the highest concentration of items in and around it, and it shows the centre of concentration of the frequencies around a given value. It is therefore preferred when the purpose is to find the point of highest concentration.
Ungrouped or Raw Data
For ungrouped data or a series of individual observations, the mode is often found by mere inspection.
Find the mode for the following data: 2, 7, 10, 15, 10, 17, 8, 10, 2
∴ Mode = 10
In some cases, the mode may be absent or there may be more than one mode.
(1) 12, 10, 15, 24, 30 (no mode)
(2) 7, 10, 15, 12, 7, 14, 24, 10, 7, 20, 10
Here, the modal values are 7 and 10, as both occur 3 times each.
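Finding the mode by inspection amounts to counting occurrences. The sketch below uses `collections.Counter` and returns a list so that it can express all three situations described above: one mode, no mode, and several modes. The function name `modes` and the convention of returning an empty list when no value repeats are choices made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, count in counts.items() if count == highest)


print(modes([2, 7, 10, 15, 10, 17, 8, 10, 2]))           # [10]
print(modes([12, 10, 15, 24, 30]))                       # []
print(modes([7, 10, 15, 12, 7, 14, 24, 10, 7, 20, 10]))  # [7, 10]
```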
Continuous distribution
Locate the highest frequency; the class corresponding to that frequency is called the modal class.
Then apply the formula
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$
where l = lower limit of the modal class,
f_0 = the frequency of the class preceding the modal class,
f_1 = the frequency of the modal class,
f_2 = the frequency of the class succeeding the modal class,
and c = class interval (the width of the class).
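A sketch of the modal-class formula in Python follows. The function name is illustrative, and two choices here are assumptions rather than something stated in the text: a missing neighbour (when the modal class is the first or last class) is treated as frequency 0, and ties for the highest frequency are resolved by taking the first such class. The data reuse the marks table from the grouped-mean example earlier in this post.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * c for equal-width classes."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no preceding class -> 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumption: no succeeding class -> 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes match the modal frequency")
    return lower_limits[i] + (f1 - f0) / denominator * width


# Marks table: modal class is 20-30 (f1 = 17, f0 = 8, f2 = 11).
print(round(grouped_mode([0, 10, 20, 30, 40], 10, [6, 8, 17, 11, 8]), 2))  # 26.0
```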
Merits:
1. Simple to understand and calculate.
2. Applicable to nominal, ordinal, and interval data.
3. Not affected by extreme values.
4. Suitable for non-parametric data analysis.
5. Useful for identifying the most common observation in a dataset.
Demerits:
1. May not exist if no value is repeated.
2. Not unique if multiple values have the same highest frequency.
3. Ignores the magnitude of values.
4. Not sensitive to small variations in the dataset.
5. Not applicable to continuous data without discretization.
SPSS analysis for calculating mode:
Interpretation of Mode:
Variable name: 1) Stream of the Students
If the mode for the variable “stream of the students” is 1, it means that the most frequently occurring stream among the students is stream 1. Stream 1 is the most popular choice among the students surveyed.
Variable name:2)
- Mode is used for nominal and ordinal variables. Therefore, for the above variables, the mode is used for statistical analysis.
Summarization
Real-world Applications with Examples
A. Finance
Example: Average Returns
1. Mean: Calculating the mean return on investment provides a measure of the average performance.
2. Median: In a scenario where extreme values can skew the mean, the median return might offer a more representative value.
B. Healthcare
Example: Patient Data Analysis
1. Mean and Median: Analyzing patient data, such as hospital stay durations, can involve both the mean and median to understand the central tendencies and potential outliers.
2. Mode: Identifying the mode in healthcare data could reveal the most frequently occurring conditions.
C. Retail
Example: Sales Data
1. Mode: In retail sales, identifying the mode of popular products can guide inventory management.
2. Mean and Median: Calculating the mean and median of sales amounts can help in understanding the central tendency of sales data.
A. Outliers
Example: Income Distribution
1. In a dataset representing income, a few individuals with exceptionally high incomes can significantly impact the mean. The median would be less affected and might better represent the typical income.
B. Skewness
Example: Exam Scores
1. In a dataset of exam scores where a few students performed exceptionally well, the mean might be higher than the median, indicating a positively skewed distribution.
In real-world applications, the choice of a measure of central tendency depends on the specific characteristics of the data and the goals of the analysis. Understanding these considerations enhances the accuracy and relevance of statistical analysis in various fields.