SPM Wiki


Chapter 17: Measures of Dispersion for Grouped Data

Master data analysis for grouped data including histograms, ogives, and dispersion measures.


Overview

Welcome to Chapter 17 of Form 5 Mathematics! This chapter builds on your statistics knowledge from Form 4 by extending measures of dispersion to grouped data. You'll learn to construct histograms and frequency polygons, compare and interpret data dispersion, and calculate various statistical measures for categorized data. These skills are essential for real-world data analysis and interpretation.

What You'll Learn:

  • Construct histograms, frequency polygons, and ogives for grouped data
  • Compare and interpret dispersion of grouped data
  • Calculate measures of dispersion for grouped data
  • Determine and interpret the effect of data changes on dispersion

Learning Objectives

After completing this chapter, you will be able to:

  • Construct histograms and frequency polygons
  • Construct and interpret ogives
  • Compare and interpret dispersion of grouped data
  • Determine and interpret range, interquartile range, variance, and standard deviation for grouped data
  • Determine and interpret the effect of changes in data on dispersion

Statistical Diagrams Overview

Data Organization and Class Intervals


Class Interval Properties

Key Properties:

  • Class Width: Difference between the upper and lower class boundaries
  • Class Midpoint: Average of upper and lower limits
  • Class Boundary: Exact limits between adjacent classes
  • Frequency: Number of observations in each class

Formulas:

  • Midpoint: $m = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$
  • Class Width: $w = \text{Upper Limit} - \text{Lower Limit} + 1$ (for whole-number data)
  • Class Boundary: $\text{Upper Boundary} = \text{Upper Limit} + 0.5$ and $\text{Lower Boundary} = \text{Lower Limit} - 0.5$ (for whole-number data)

Example: Data Organization

Raw Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42, 45, 48, 52, 55, 58

Grouped Data (Class Width = 10):

Class Interval | Frequency (f) | Midpoint (x) | fx | fx²
10-19 | 3 | 14.5 | 43.5 | 630.75
20-29 | 3 | 24.5 | 73.5 | 1800.75
30-39 | 3 | 34.5 | 103.5 | 3570.75
40-49 | 3 | 44.5 | 133.5 | 5940.75
50-59 | 3 | 54.5 | 163.5 | 8910.75

Totals: ∑f = 15, ∑fx = 517.5, ∑fx² = 20853.75
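As a quick check on the table, the short Python sketch below groups the raw data into classes of width 10 and rebuilds the f, x, fx and fx² columns. The helper name `group_data` and its parameters are illustrative only; it assumes whole-number data, so each class runs from a lower limit up to lower limit + width - 1.

```python
# Hypothetical helper: group whole-number data into equal-width classes
# (e.g. 10-19, 20-29, ...) and build the f, x, fx and fx^2 columns.
def group_data(values, start, width, n_classes):
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1              # upper class limit
        f = sum(1 for v in values if lower <= v <= upper)
        x = (lower + upper) / 2                # class midpoint
        rows.append((f"{lower}-{upper}", f, x, f * x, f * x * x))
    return rows


raw = [12, 15, 18, 22, 25, 28, 32, 35, 38, 42, 45, 48, 52, 55, 58]
table = group_data(raw, start=10, width=10, n_classes=5)
for row in table:
    print(row)

sum_f = sum(row[1] for row in table)
sum_fx = sum(row[3] for row in table)
sum_fx2 = sum(row[4] for row in table)
print(sum_f, sum_fx, sum_fx2)   # 15 517.5 20853.75
```

Counting directly from the raw values is the safest way to fill in the frequency column; every later total depends on it.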

Advanced Graph Construction

Detailed Histogram Construction

Important Considerations:

  • Bars must be adjacent (no gaps)
  • Y-axis should start at zero
  • Proper scaling for both axes
  • Clear labeling of class intervals

Frequency Polygon Construction

Key Features:

  • Connects midpoints of classes
  • Shows distribution shape
  • Can be superimposed on histogram
  • Useful for comparing multiple datasets

Ogive Construction

Cumulative Frequency Calculation:

  • Start with first class frequency
  • Add each subsequent frequency to previous total
  • On the ogive, the cumulative frequency is zero at the lower boundary of the first class
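A minimal Python sketch of these cumulative frequency steps and the resulting ogive points. It assumes whole-number data (so each boundary sits 0.5 below a lower limit) and uses the class frequencies from Example 3 later in this chapter.

```python
from itertools import accumulate

# Class frequencies for 40-49, 50-59, 60-69, 70-79, 80-89
freqs = [2, 5, 8, 4, 1]
width = 10
first_lower_boundary = 39.5        # 40 - 0.5, assuming whole-number data

# Running totals: first frequency, then add each subsequent frequency
cumulative = list(accumulate(freqs))

# Upper class boundaries: 49.5, 59.5, ...
upper_boundaries = [first_lower_boundary + width * (i + 1) for i in range(len(freqs))]

# The ogive starts at cumulative frequency 0 on the first lower boundary
ogive_points = [(first_lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

print(cumulative)     # [2, 7, 15, 19, 20]
print(ogive_points)   # [(39.5, 0), (49.5, 2), (59.5, 7), (69.5, 15), (79.5, 19), (89.5, 20)]
```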

Box Plot Construction

Five-number summary:

  • Minimum: Smallest value
  • $Q_1$: First quartile (25th percentile)
  • Median: Second quartile (50th percentile)
  • $Q_3$: Third quartile (75th percentile)
  • Maximum: Largest value

Measures of Dispersion for Grouped Data

Range and Interquartile Range

Range for Grouped Data:

$$\text{Range} = \text{Upper boundary of highest class} - \text{Lower boundary of lowest class}$$

Interquartile Range (IQR):

$$\text{IQR} = Q_3 - Q_1$$

Quartile Formulas:

$$Q_1 = L + \left(\frac{\frac{N}{4} - F}{f}\right) \times w$$

$$Q_3 = L + \left(\frac{\frac{3N}{4} - F}{f}\right) \times w$$

Where:

  • L = lower boundary of quartile class
  • N = total frequency
  • F = cumulative frequency before quartile class
  • f = frequency of quartile class
  • w = class width
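Both quartile formulas can be sketched as one Python function that interpolates inside the class containing the required position. The name `grouped_quantile` is hypothetical; it takes the fraction of the data (0.25 for $Q_1$, 0.75 for $Q_3$) and assumes equal class widths. Here it is applied to the Example 3 frequencies (40-49: 2, 50-59: 5, 60-69: 8, 70-79: 4, 80-89: 1).

```python
def grouped_quantile(fraction, lower_boundaries, freqs, width):
    """Interpolate inside the class containing the (fraction * N)-th value:
    L + ((fraction * N - F) / f) * w, as in the quartile formulas."""
    n = sum(freqs)
    target = fraction * n
    cum_before = 0                        # F: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


# Classes 40-49, ..., 80-89 have lower boundaries 39.5, ..., 79.5
lower = [39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [2, 5, 8, 4, 1]
q1 = grouped_quantile(0.25, lower, freqs, 10)
median = grouped_quantile(0.50, lower, freqs, 10)
q3 = grouped_quantile(0.75, lower, freqs, 10)
print(q1, median, q3, q3 - q1)   # 55.5 63.25 69.5 14.0
```

Passing k/100 or j/10 as the fraction gives the percentile $P_k$ and decile $D_j$ formulas from the Percentiles and Deciles section, since they differ only in the target position.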

Variance and Standard Deviation

Population Variance:

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \mu^2$$

Sample Variance:

$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1}$$

Standard Deviation:

$$\sigma = \sqrt{\sigma^2} \quad \text{or} \quad s = \sqrt{s^2}$$
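A short Python sketch of both variance formulas, using the shortcut ∑fx² - (∑fx)²/∑f for the sum of squared deviations. The function name is illustrative, and the data are the Example 4 classes (10-19: 3, 20-29: 7, 30-39: 5, 40-49: 2) with class midpoints standing in for the values.

```python
import math


def grouped_variance(midpoints, freqs, sample=False):
    """Variance of grouped data, using class midpoints as the x values."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    # Sum of f(x - mean)^2 via the shortcut: sum fx^2 - (sum fx)^2 / sum f
    sxx = sum_fx2 - sum_fx ** 2 / n
    return sxx / (n - 1) if sample else sxx / n


midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [3, 7, 5, 2]

pop_var = grouped_variance(midpoints, freqs)                 # sigma^2
samp_var = grouped_variance(midpoints, freqs, sample=True)   # s^2
print(round(pop_var, 2), round(math.sqrt(pop_var), 2))       # 81.66 9.04
print(round(samp_var, 2), round(math.sqrt(samp_var), 2))     # 86.76 9.31
```

Working with the unrounded sums avoids the small drift that appears when the mean is rounded before it is squared.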

Coefficient of Variation

Coefficient of Variation (CV):

$$\text{CV} = \frac{\sigma}{\mu} \times 100\%$$

Uses:

  • Compare dispersion between datasets with different units
  • Measure relative variability
  • Useful for quality control
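To illustrate comparing datasets measured in different units, this Python sketch groups the exam scores (class width 10, starting at 40) and the daily production figures (class width 50, starting at 100) listed under Data Organization Examples, then computes each CV from the population standard deviation. The helper names are hypothetical, and the class layout assumes whole-number data.

```python
import math


def grouped_mean_sd(values, start, width):
    """Group whole-number values into classes start..start+width-1, ...,
    then return the grouped mean and population SD from class midpoints."""
    counts = {}
    for v in values:
        k = (v - start) // width                 # index of the class holding v
        counts[k] = counts.get(k, 0) + 1
    n = len(values)
    mids = {k: start + k * width + (width - 1) / 2 for k in counts}
    mean = sum(counts[k] * mids[k] for k in counts) / n
    var = sum(counts[k] * (mids[k] - mean) ** 2 for k in counts) / n
    return mean, math.sqrt(var)


def cv_percent(mean, sd):
    return sd / mean * 100


scores = [45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100]
units = [125, 142, 158, 163, 171, 175, 182, 188, 195, 203, 215, 228, 245, 268, 295]

m1, s1 = grouped_mean_sd(scores, start=40, width=10)
m2, s2 = grouped_mean_sd(units, start=100, width=50)
print(round(m1, 2), round(s1, 2), round(cv_percent(m1, s1), 1))   # 76.5 16.81 22.0
print(round(m2, 2), round(s2, 2), round(cv_percent(m2, s2), 1))   # 194.5 43.97 22.6
```

The two standard deviations differ by a factor of more than 2 (about 16.8 marks versus about 44.0 units/day), yet the CVs are close, so the relative spread of the two datasets is similar.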

Mean Absolute Deviation

Mean Absolute Deviation (MAD):

$$\text{MAD} = \frac{\sum f|x - \mu|}{\sum f}$$

Advantages over variance:

  • Easier to interpret
  • Less affected by extreme values
  • Same units as original data
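A minimal Python sketch of the MAD formula, assuming the class midpoints stand in for the values in each class. The function name is illustrative; the data are the original classes of Example 5 (10-19: 2, 20-29: 4, 30-39: 3, 40-49: 1), whose mean is 27.5.

```python
def grouped_mad(midpoints, freqs):
    """Mean absolute deviation of grouped data about the grouped mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [2, 4, 3, 1]
print(grouped_mad(midpoints, freqs))   # 7.6
```

For the same data the standard deviation is 9.0. The MAD of a dataset can never exceed its standard deviation, because squaring gives extra weight to the larger deviations.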

Statistical Analysis Techniques

Comparison of Datasets

Comparison Methods:

  1. Visual Comparison: Overlay histograms or frequency polygons
  2. Statistical Comparison: Compare means, medians, and dispersion measures
  3. Shape Analysis: Compare skewness and kurtosis
  4. Outlier Detection: Identify extreme values in each dataset

Data Transformation Effects

Effect of Adding Constant:

  • Mean increases by constant
  • Median increases by constant
  • Mode increases by constant
  • Dispersion measures unchanged

Effect of Multiplying by Constant:

  • Mean multiplied by constant
  • Median multiplied by constant
  • Mode multiplied by constant
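  • Range, IQR and standard deviation multiplied by the constant (by its absolute value if the constant is negative); variance multiplied by the square of the constant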

Effect of Adding/Removing Data:

  • Adding extreme values increases dispersion
  • Removing outliers decreases dispersion
  • The IQR depends only on the middle 50% of the data, so changes at the extremes affect it far less than they affect the range
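The transformation rules can be checked numerically. This Python sketch (function name illustrative) applies "add 5" and "multiply by 2" to every class midpoint of a small grouped dataset (10-19: 2, 20-29: 4, 30-39: 3, 40-49: 1) and compares the mean, standard deviation and variance.

```python
import math


def mean_var(midpoints, freqs):
    """Grouped mean and population variance from class midpoints."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    return mean, var


midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [2, 4, 3, 1]

m0, v0 = mean_var(midpoints, freqs)
m_add, v_add = mean_var([x + 5 for x in midpoints], freqs)   # every value + 5
m_mul, v_mul = mean_var([x * 2 for x in midpoints], freqs)   # every value x 2

print(m0, math.sqrt(v0))                 # 27.5 9.0
print(m_add, math.sqrt(v_add))           # 32.5 9.0   (SD unchanged)
print(m_mul, math.sqrt(v_mul), v_mul)    # 55.0 18.0 324.0   (SD x 2, variance x 4)
```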

Real-world Analysis Examples

Example 1: Income Distribution Analysis

Analysis:

  • Most people in medium income range
  • Right-skewed distribution
  • High variance due to extreme incomes
  • IQR shows middle 50% concentration

Example 2: Test Score Analysis

Analysis:

  • Bell-shaped distribution (approximately normal)
  • Low variance (consistent performance)
  • Mean around 75-80
  • Few outliers in both directions

Advanced Statistical Concepts

Skewness and Kurtosis

Skewness Formula for Grouped Data:

$$\text{Skewness} = \frac{\sum f(x - \mu)^3}{\sigma^3 \sum f}$$

Kurtosis Formula for Grouped Data:

$$\text{Kurtosis} = \frac{\sum f(x - \mu)^4}{\sigma^4 \sum f} - 3$$

Interpretation:

  • Skewness > 0: Right-skewed (tail to right)
  • Skewness < 0: Left-skewed (tail to left)
  • Skewness = 0: Symmetric
  • Kurtosis > 0: Heavier tails than a normal distribution (more outliers)
  • Kurtosis < 0: Lighter tails than a normal distribution (fewer outliers)
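A small Python sketch of both formulas, using the population standard deviation σ as written above and class midpoints in place of raw values. The function name is illustrative, and the data (10-19: 2, 20-29: 4, 30-39: 3, 40-49: 1) are borrowed from Example 5.

```python
import math


def grouped_shape(midpoints, freqs):
    """Skewness and kurtosis (minus 3) of grouped data, as in the formulas above."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n)
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs))
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(midpoints, freqs))
    skewness = m3 / (sd ** 3 * n)
    kurtosis = m4 / (sd ** 4 * n) - 3
    return skewness, kurtosis


skew, kurt = grouped_shape([14.5, 24.5, 34.5, 44.5], [2, 4, 3, 1])
print(round(skew, 3), round(kurt, 3))   # 0.198 -0.742
```

A skewness of about 0.2 indicates a slight right skew, and the negative kurtosis indicates lighter tails than a normal distribution.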

Percentiles and Deciles

General Percentile Formula:

$$P_k = L + \left(\frac{\frac{kN}{100} - F}{f}\right) \times w$$

Deciles ($D_1$ to $D_9$):

$$D_j = L + \left(\frac{\frac{jN}{10} - F}{f}\right) \times w$$

Where:

  • k = percentile (1-100)
  • j = decile (1-9)
  • Other variables same as quartile formula

Statistical Inference for Grouped Data

Confidence Interval for Mean:

$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$

Where:

  • $\bar{x}$ = sample mean
  • z = z-score for confidence level
  • s = sample standard deviation
  • n = sample size
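A Python sketch of this interval, assuming the class midpoints stand in for the data, s is the sample standard deviation (divisor ∑f - 1), and z comes from the standard normal distribution via `statistics.NormalDist` (about 1.96 for 95%). The data are the Example 4 classes and the function name is illustrative. For a sample this small a t value is usually used in place of z; the sketch follows the z formula above.

```python
import math
from statistics import NormalDist


def grouped_mean_ci(midpoints, freqs, confidence=0.95):
    """z-based confidence interval for the mean, using class midpoints."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Sample standard deviation s (divisor n - 1)
    s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


low, high = grouped_mean_ci([14.5, 24.5, 34.5, 44.5], [3, 7, 5, 2])
print(round(low, 2), round(high, 2))
```

For these data the 95% interval runs from about 23.60 to 32.46, centred on the grouped mean of about 28.03.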

Data Organization Examples

Example: Organizing Exam Scores

Raw Scores: 45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100

Grouped Data (Class Width = 10):

Class Interval | Frequency (f) | Midpoint (x) | fx | fx²
40-49 | 1 | 44.5 | 44.5 | 1980.25
50-59 | 2 | 54.5 | 109.0 | 5940.5
60-69 | 2 | 64.5 | 129.0 | 8320.5
70-79 | 3 | 74.5 | 223.5 | 16650.75
80-89 | 3 | 84.5 | 253.5 | 21420.75
90-99 | 3 | 94.5 | 283.5 | 26790.75
100-109 | 1 | 104.5 | 104.5 | 10920.25

Calculations:

  • ∑f = 15
  • ∑fx = 1147.5
  • ∑fx² = 92023.75
  • Mean = 1147.5 / 15 = 76.5

Example: Manufacturing Data

Production Data (units/day): 125, 142, 158, 163, 171, 175, 182, 188, 195, 203, 215, 228, 245, 268, 295

Grouped Data (Class Width = 50):

Class Interval | Frequency (f) | Midpoint (x) | fx | fx²
100-149 | 2 | 124.5 | 249.0 | 31000.5
150-199 | 7 | 174.5 | 1221.5 | 213151.75
200-249 | 4 | 224.5 | 898.0 | 201601.0
250-299 | 2 | 274.5 | 549.0 | 150700.5

Calculations:

  • ∑f = 15
  • ∑fx = 2917.5
  • ∑fx² = 596453.75
  • Mean = 2917.5 / 15 = 194.5

Key Concepts

Grouped Data

Grouped data is data that has been organized into class intervals or categories rather than individual values.

Example: Individual scores: 45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100 Grouped: 40-49: 1, 50-59: 2, 60-69: 2, 70-79: 3, 80-89: 3, 90-100: 4

Histogram

A histogram is a graphical representation of grouped data using adjacent rectangular bars.

Key Features:

  • X-axis: Class intervals or class boundaries
  • Y-axis: Frequency
  • Bars: Adjacent rectangles with heights representing frequencies
  • Width: Represents class width

Example: For data 40-49, 50-59, 60-69 with frequencies 1, 2, 2:

  • Bar for 40-49: height 1
  • Bar for 50-59: height 2
  • Bar for 60-69: height 2

Frequency Polygon

A frequency polygon is a line graph connecting midpoints of class intervals at their respective frequencies.

Construction:

  1. Find midpoint of each class interval
  2. Plot (midpoint, frequency) points
  3. Connect points with straight lines
  4. Close the polygon by joining each end to zero frequency at the midpoint of the empty class just before the first class and just after the last class
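The construction steps can be sketched in Python as a function that returns the polygon's vertices, including the two closing points at zero frequency. The function name is illustrative and it assumes equal class widths; the frequencies are those of Example 2 later in this chapter (10-19: 3, 20-29: 7, 30-39: 5, 40-49: 2).

```python
def frequency_polygon_points(first_midpoint, width, freqs):
    """(midpoint, frequency) vertices, closed at zero one class width beyond each end."""
    mids = [first_midpoint + i * width for i in range(len(freqs))]
    opening = [(first_midpoint - width, 0)]   # midpoint of the empty class before
    closing = [(mids[-1] + width, 0)]         # midpoint of the empty class after
    return opening + list(zip(mids, freqs)) + closing


points = frequency_polygon_points(14.5, 10, [3, 7, 5, 2])
print(points)
# [(4.5, 0), (14.5, 3), (24.5, 7), (34.5, 5), (44.5, 2), (54.5, 0)]
```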

Example: For class 50-59 with frequency 2:

  • Midpoint = (50 + 59)/2 = 54.5
  • Plot point (54.5, 2)

Ogive

An ogive (cumulative frequency graph) shows cumulative frequency against upper class boundaries.

Construction:

  1. Find cumulative frequency for each class
  2. Plot (upper class boundary, cumulative frequency) points
  3. Connect points with smooth curve
  4. Always starts from zero at first lower boundary

Uses:

  • Estimate median, quartiles, percentiles
  • Determine number of values below certain thresholds

Important Formulas and Methods

Mean for Grouped Data

$$\mu = \frac{\sum fx}{\sum f}$$

Where:

  • f = frequency of each class
  • x = midpoint of each class
  • ∑ = summation

Example: For class 40-49 with frequency 3, midpoint 44.5: fx = 44.5 × 3 = 133.5

Mode from Histogram

Mode is estimated from the tallest bar in the histogram.

Class with highest frequency = Modal class

For more precise mode calculation within modal class:

$$\text{Mode} = L + \left(\frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}}\right) \times w$$

Where:

  • L = lower boundary of modal class
  • f_m = frequency of modal class
  • f_{m-1} = frequency of class before modal class
  • f_{m+1} = frequency of class after modal class
  • w = class width
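A Python sketch of the mode formula applied to the frequencies in Example 1 later in this chapter (60-69: 5, 70-79: 8, 80-89: 12, 90-99: 5), where the modal class is 80-89 with lower boundary 79.5. The function name is hypothetical, and treating a missing neighboring class as frequency 0 is an assumption of this sketch.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L + (f_m - f_(m-1)) / (2 f_m - f_(m-1) - f_(m+1)) * w."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])    # modal class (first if tied)
    f_m = freqs[m]
    f_before = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no class before
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0   # assumption: 0 if no class after
    return lower_boundaries[m] + (f_m - f_before) / (2 * f_m - f_before - f_after) * width


mode = grouped_mode([59.5, 69.5, 79.5, 89.5], [5, 8, 12, 5], 10)
print(round(mode, 2))   # 83.14
```

The estimate lies inside the modal class, pulled toward the 70-79 side because that neighbor (8) is larger than the 90-99 neighbor (5).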

Median and Quartiles from Ogive

  • Median ($Q_2$): Value at 50% cumulative frequency
  • First Quartile ($Q_1$): Value at 25% cumulative frequency
  • Third Quartile ($Q_3$): Value at 75% cumulative frequency

Method:

  1. Draw ogive
  2. Find cumulative frequency total (N)
  3. For median: N/2
  4. For $Q_1$: N/4
  5. For $Q_3$: 3N/4
  6. Read corresponding values from ogive

Variance and Standard Deviation for Grouped Data

Variance:

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f} \quad \text{or} \quad \sigma^2 = \frac{\sum fx^2}{\sum f} - \mu^2$$

Standard Deviation:

$$\sigma = \sqrt{\text{Variance}}$$

Where:

  • f = frequency
  • x = midpoint of class
  • μ = mean

Step-by-Step Solved Examples

Example 1: Histogram Construction

Problem: Construct a histogram for grouped exam scores: 60-69: 5 students, 70-79: 8 students, 80-89: 12 students, 90-99: 5 students

Solution: Step 1: Identify class boundaries and frequencies

  • 60-69: frequency 5
  • 70-79: frequency 8
  • 80-89: frequency 12
  • 90-99: frequency 5

Step 2: Calculate class width. Each class has width 10 (69-60+1=10, etc.)

Step 3: Construct histogram

  • X-axis: Class boundaries 59.5, 69.5, 79.5, 89.5, 99.5, so that the bars for 60-69, 70-79, 80-89 and 90-99 touch
  • Y-axis: Frequency (0-15)
  • Bars: Height equal to frequency for each class

Answer: Histogram with bars of heights 5, 8, 12, 5 respectively

Example 2: Frequency Polygon

Problem: Construct a frequency polygon for the data: Class intervals: 10-19, 20-29, 30-39, 40-49 Frequencies: 3, 7, 5, 2

Solution: Step 1: Calculate midpoints

  • 10-19: (10+19)/2 = 14.5
  • 20-29: (20+29)/2 = 24.5
  • 30-39: (30+39)/2 = 34.5
  • 40-49: (40+49)/2 = 44.5

Step 2: Plot points

  • (14.5, 3)
  • (24.5, 7)
  • (34.5, 5)
  • (44.5, 2)

Step 3: Connect points and close polygon

  • Connect with straight lines
  • Extend to zero frequencies at 4.5 and 54.5 (midpoints of the empty classes 0-9 and 50-59)

Answer: Frequency polygon through the points (4.5,0), (14.5,3), (24.5,7), (34.5,5), (44.5,2), (54.5,0)

Example 3: Ogive and Percentiles

Problem: Construct ogive and find median, Q1Q_1, Q3Q_3 for: 40-49: 2, 50-59: 5, 60-69: 8, 70-79: 4, 80-89: 1

Solution: Step 1: Calculate cumulative frequencies

  • 40-49: 2
  • 50-59: 2 + 5 = 7
  • 60-69: 7 + 8 = 15
  • 70-79: 15 + 4 = 19
  • 80-89: 19 + 1 = 20

Step 2: Find upper class boundaries

  • 40-49: upper boundary = 49.5
  • 50-59: upper boundary = 59.5
  • 60-69: upper boundary = 69.5
  • 70-79: upper boundary = 79.5
  • 80-89: upper boundary = 89.5

Step 3: Plot ogive points

  • (39.5, 0), (49.5, 2), (59.5, 7), (69.5, 15), (79.5, 19), (89.5, 20)

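Step 4: Find the quartiles (Total N = 20), interpolating linearly between ogive points

  • Median ($Q_2$): N/2 = 10 → between (59.5, 7) and (69.5, 15), so Median = 59.5 + (10 - 7)/8 × 10 = 63.25
  • $Q_1$: N/4 = 5 → between (49.5, 2) and (59.5, 7), so $Q_1$ = 49.5 + (5 - 2)/5 × 10 = 55.5
  • $Q_3$: 3N/4 = 15 → exactly at (69.5, 15), so $Q_3$ = 69.5

Answer: Median = 63.25, $Q_1$ = 55.5, $Q_3$ = 69.5 (so IQR = 14)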

Example 4: Mean and Variance Calculation

Problem: Calculate mean and variance for grouped data: 10-19: 3, 20-29: 7, 30-39: 5, 40-49: 2

Solution: Step 1: Calculate midpoints and fx

  • 10-19: midpoint = 14.5, fx = 14.5 × 3 = 43.5
  • 20-29: midpoint = 24.5, fx = 24.5 × 7 = 171.5
  • 30-39: midpoint = 34.5, fx = 34.5 × 5 = 172.5
  • 40-49: midpoint = 44.5, fx = 44.5 × 2 = 89.0

Step 2: Calculate mean

  • ∑fx = 43.5 + 171.5 + 172.5 + 89.0 = 476.5
  • ∑f = 3 + 7 + 5 + 2 = 17
  • μ = 476.5 / 17 ≈ 28.03

Step 3: Calculate variance using σ² = ∑fx²/∑f - μ²

  • fx²: (14.5)²×3 = 630.75, (24.5)²×7 = 4201.75, (34.5)²×5 = 5951.25, (44.5)²×2 = 3960.5
  • ∑fx² = 630.75 + 4201.75 + 5951.25 + 3960.5 = 14744.25
  • σ² = 14744.25/17 - (476.5/17)² ≈ 867.31 - 785.65 ≈ 81.66 (keep the mean unrounded here; squaring the rounded value 28.03 gives the less accurate 81.63)

Answer: Mean ≈ 28.03, Variance ≈ 81.66, Standard Deviation ≈ 9.04

Example 5: Effect of Data Changes on Dispersion

Problem: Original grouped data: 10-19: 2, 20-29: 4, 30-39: 3, 40-49: 1 If the highest value changes from 49 to 100, how does dispersion change?

Solution: Original Data:

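  • Range: 49.5 - 9.5 = 40 (upper boundary of highest class minus lower boundary of lowest class)
  • $Q_1$ (N/4 = 2.5): 19.5 + (2.5 - 2)/4 × 10 = 20.75; $Q_3$ (3N/4 = 7.5): 29.5 + (7.5 - 6)/3 × 10 = 34.5; IQR = 13.75
  • Mean: 275/10 = 27.5, SD: √(8372.5/10 - 27.5²) = √81 = 9.0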

Modified Data (10-19: 2, 20-29: 4, 30-39: 3, 40-100: 1):

  • Range: 100.5 - 9.5 = 91 (increased significantly)
  • $Q_1$ and $Q_3$ unchanged (the quartile classes and the frequencies before them are the same)
  • IQR: still 13.75 (unchanged)
  • Mean (midpoint of 40-100 is 70): ∑fx/∑f = (14.5×2 + 24.5×4 + 34.5×3 + 70×1)/10 = (29 + 98 + 103.5 + 70)/10 = 300.5/10 = 30.05
  • SD: √[∑fx²/10 - (30.05)²] = √[(420.5 + 2401 + 3570.75 + 4900)/10 - 903.00] = √[1129.23 - 903.00] = √226.22 ≈ 15.04

Analysis:

  • Range increased from 40 to 91
  • IQR unchanged (13.75) because the middle 50% of the data is unchanged
  • Mean increased from 27.5 to 30.05
  • Standard deviation increased significantly from 9.0 to 15.04

Answer: Range and standard deviation increased, IQR unchanged, indicating extreme values affect overall dispersion more than middle spread.
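The figures in this example can be reproduced with the Python sketch below, which computes the boundary-based range, the mean and the population standard deviation before and after the change. The helper name is illustrative; class boundaries assume whole-number data, so the modified top class 40-100 runs from 39.5 to 100.5 with midpoint 70.

```python
import math


def grouped_summary(lower_boundaries, upper_boundaries, freqs):
    """Boundary-based range, mean and population SD for grouped data."""
    mids = [(lo + hi) / 2 for lo, hi in zip(lower_boundaries, upper_boundaries)]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n)
    data_range = upper_boundaries[-1] - lower_boundaries[0]
    return data_range, mean, sd


freqs = [2, 4, 3, 1]
lower = [9.5, 19.5, 29.5, 39.5]
original_upper = [19.5, 29.5, 39.5, 49.5]
modified_upper = [19.5, 29.5, 39.5, 100.5]   # top class becomes 40-100

r0, m0, s0 = grouped_summary(lower, original_upper, freqs)
r1, m1, s1 = grouped_summary(lower, modified_upper, freqs)
print(r0, round(m0, 2), round(s0, 2))   # 40.0 27.5 9.0
print(r1, round(m1, 2), round(s1, 2))   # 91.0 30.05 15.04
```

Only the top class changes, so the quartile classes, and therefore the IQR, stay the same while the range and standard deviation both grow.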

Real-world Applications

1. Market Research

  • Consumer Surveys: Analyzing age groups, income brackets
  • Product Categories: Sales by price ranges, age demographics
  • Customer Satisfaction: Rating distributions (e.g. 1-5 or 1-10 rating scales)

2. Quality Control

  • Manufacturing: Product measurements in ranges
  • Testing: Score distributions for quality assurance
  • Defect Analysis: Frequency of defect categories

3. Education

  • Test Scores: Grade distributions by percentage ranges
  • Student Performance: Achievement level categories
  • Standardized Testing: Percentile rankings and distributions

4. Demographics

  • Age Groups: Population by age brackets
  • Income Distribution: Household income categories
  • Geographic Data: Regional population statistics

Important Terms

Term | Definition | Example
Grouped Data | Data organized into class intervals | 10-19, 20-29, 30-39
Histogram | Bar graph for grouped data | Adjacent rectangles for frequencies
Frequency Polygon | Line graph connecting midpoints | Connected (midpoint, frequency) points
Ogive | Cumulative frequency graph | Cumulative frequency vs. upper boundary
Class Interval | Range of values in a group | 50-59 minutes
Midpoint | Average of the lower and upper class limits | (50+59)/2 = 54.5 for 50-59
Cumulative Frequency | Running total of frequencies | 2, 7, 15, 19, 20
Modal Class | Class with highest frequency | Class 80-89 with frequency 12
Class Boundary | Exact limits between classes | 59.5 between 50-59 and 60-69

Summary Points

  • Histogram: Adjacent bars representing class frequencies
  • Frequency Polygon: Line graph connecting midpoints of classes
  • Ogive: Cumulative frequency graph for median/quartile estimation
  • Mean Formula: μ = ∑fx/∑f (f = frequency, x = midpoint)
  • Median from Ogive: Value at N/2 cumulative frequency
  • Variance for Grouped: σ² = ∑fx²/∑f - μ²
  • IQR: $Q_3 - Q_1$ (measures middle 50% spread)
  • Effect of Changes: Extreme values affect range and SD more than IQR

Practice Tips for SPM Students

1. Graph Construction Skills

  • Practice drawing accurate histograms with proper scaling
  • Learn to calculate and plot frequency polygons
  • Master ogive construction and interpretation

2. Statistical Calculations

  • Practice mean and variance calculations for grouped data
  • Learn to estimate mode from histograms
  • Understand how to find percentiles from ogives

3. Interpretation Skills

  • Learn to interpret dispersion measures in context
  • Understand the relationship between different measures
  • Practice explaining data characteristics using statistics

4. Common Mistakes to Avoid

  • Incorrect midpoint calculations
  • Wrong cumulative frequency sums
  • Confusing class boundaries with class intervals
  • Misinterpretation of ogive values

SPM Exam Tips

Paper 1 (Multiple Choice)

  • Look for key grouped data terminology
  • Remember histogram and ogive characteristics
  • Practice quick statistical calculations
  • Use elimination method for difficult questions

Paper 2 (Structured)

  • Show all calculation steps clearly
  • Demonstrate graph construction methods
  • Explain statistical interpretations in context
  • Use proper statistical terminology throughout

Did You Know? The terms "histogram" and "standard deviation", both central to this chapter, were introduced by the statistician Karl Pearson in the 1890s. Today, grouped data analysis is used everywhere from election polling to market research, helping us understand patterns in large datasets that would otherwise be overwhelming to analyze!

Next Chapter: In Chapter 8, you'll explore mathematical modeling, learning to apply linear, quadratic, and exponential functions to model real-world situations and solve practical problems.