Chapter 17: Measures of Dispersion for Grouped Data
Master data analysis for grouped data including histograms, ogives, and dispersion measures.
Overview
Welcome to Chapter 17 of Form 5 Mathematics! This chapter builds on your statistics knowledge from Form 4 by extending measures of dispersion to grouped data. You'll learn to construct histograms and frequency polygons, compare and interpret data dispersion, and calculate various statistical measures for categorized data. These skills are essential for real-world data analysis and interpretation.
What You'll Learn:
- Construct histograms, frequency polygons, and ogives for grouped data
- Compare and interpret dispersion of grouped data
- Calculate measures of dispersion for grouped data
- Determine and interpret the effect of data changes on dispersion
Learning Objectives
After completing this chapter, you will be able to:
- Construct histograms and frequency polygons
- Construct and interpret ogives
- Compare and interpret dispersion of grouped data
- Determine and interpret range, interquartile range, variance, and standard deviation for grouped data
- Determine and interpret the effect of changes in data on dispersion
Statistical Diagrams Overview
Data Organization and Class Intervals
Types of Classifications
Class Interval Properties
Key Properties:
- Class Width: Difference between the upper and lower class boundaries
- Class Midpoint: Average of upper and lower limits
- Class Boundary: Exact limits between adjacent classes
- Frequency: Number of observations in each class
Formulas:
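- Midpoint: $x = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
- Class Width: $w = \text{upper limit} - \text{lower limit} + 1$ (for discrete data)
- Class Boundary: $\text{lower boundary} = \text{lower limit} - 0.5$, $\text{upper boundary} = \text{upper limit} + 0.5$ (for data recorded as whole numbers)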
Example: Data Organization
Raw Data: 12, 15, 18, 22, 25, 28, 32, 35, 38, 42, 45, 48, 52, 55, 58
Grouped Data (Class Width = 10):
| Class Interval | Frequency (f) | Midpoint (x) | fx | fx² |
|---|---|---|---|---|
| 10-19 | 3 | 14.5 | 43.5 | 630.75 |
| 20-29 | 3 | 24.5 | 73.5 | 1800.75 |
| 30-39 | 3 | 34.5 | 103.5 | 3570.75 |
| 40-49 | 3 | 44.5 | 133.5 | 5940.75 |
| 50-59 | 3 | 54.5 | 163.5 | 8910.75 |
Total: ∑f = 15, ∑fx = 517.5, ∑fx² = 20853.75
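The grouping step above can be reproduced with a short Python sketch. It assumes whole-number data and equal class widths, with each class running from a lower limit to lower + width - 1; the function name `group_data` is illustrative only, not from any library.

```python
def group_data(values, start, width, n_classes):
    """Tabulate (lower, upper, f, x, fx, fx^2) for whole-number data.

    Classes run start..start+width-1, then the next block of `width`, and so on.
    """
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1                      # e.g. 10-19 when width is 10
        f = sum(1 for v in values if lower <= v <= upper)
        x = (lower + upper) / 2                        # class midpoint
        rows.append((lower, upper, f, x, f * x, f * x * x))
    return rows


raw = [12, 15, 18, 22, 25, 28, 32, 35, 38, 42, 45, 48, 52, 55, 58]
table = group_data(raw, start=10, width=10, n_classes=5)
for lower, upper, f, x, fx, fx2 in table:
    print(f"{lower}-{upper}: f={f}, x={x}, fx={fx}, fx^2={fx2}")

sum_f = sum(row[2] for row in table)
sum_fx = sum(row[4] for row in table)
sum_fx2 = sum(row[5] for row in table)
print(sum_f, sum_fx, sum_fx2)  # 15 517.5 20853.75
```

Counting each value directly from the raw list is a useful check: every class here contains exactly three values.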
Advanced Graph Construction
Detailed Histogram Construction
Important Considerations:
- Bars must be adjacent (no gaps)
- Y-axis should start at zero
- Proper scaling for both axes
- Clear labeling of class intervals
Frequency Polygon Construction
Key Features:
- Connects midpoints of classes
- Shows distribution shape
- Can be superimposed on histogram
- Useful for comparing multiple datasets
Ogive Construction
Cumulative Frequency Calculation:
- Start with first class frequency
- Add each subsequent frequency to previous total
- Always begins at zero for first lower boundary
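A minimal sketch of the running-total rule, using the class frequencies from Example 4 later in this chapter (10-19: 3, 20-29: 7, 30-39: 5, 40-49: 2). It assumes whole-number class limits, so each upper boundary is the upper limit plus 0.5.

```python
from itertools import accumulate

classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [3, 7, 5, 2]

# Running totals: start with the first frequency, then add each next one.
cum_freqs = list(accumulate(freqs))

# The ogive starts at zero on the lower boundary of the first class,
# then plots (upper boundary, cumulative frequency) for every class.
points = [(classes[0][0] - 0.5, 0)]
points += [(upper + 0.5, cf) for (_, upper), cf in zip(classes, cum_freqs)]

print(cum_freqs)  # [3, 10, 15, 17]
print(points)     # [(9.5, 0), (19.5, 3), (29.5, 10), (39.5, 15), (49.5, 17)]
```

The last cumulative frequency always equals the total frequency N, which is a quick way to catch an addition slip.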
Box Plot Construction
Five-number summary:
- Minimum: Smallest value
- $Q_1$: First quartile (25th percentile)
- Median: Second quartile (50th percentile)
- $Q_3$: Third quartile (75th percentile)
- Maximum: Largest value
Measures of Dispersion for Grouped Data
Range and Interquartile Range
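Range for Grouped Data: $\text{Range} = \text{midpoint of highest class} - \text{midpoint of lowest class}$ (some textbooks use the upper boundary of the highest class minus the lower boundary of the lowest class instead)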
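Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
Quartile Formulas: $Q_1 = L + \left(\dfrac{\frac{1}{4}N - F}{f}\right)w$, $\quad Q_3 = L + \left(\dfrac{\frac{3}{4}N - F}{f}\right)w$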
Where:
- L = lower boundary of quartile class
- N = total frequency
- F = cumulative frequency before quartile class
- f = frequency of quartile class
- w = class width
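The quartile formula can be sketched in Python as follows, using the frequencies from Example 3 later in this chapter. The function name `grouped_quantile` is hypothetical, and the code assumes whole-number class limits, so L is the lower limit minus 0.5 and w is upper - lower + 1.

```python
def grouped_quantile(classes, freqs, fraction):
    """Estimate a quantile with L + ((fraction*N - F) / f) * w.

    Assumes whole-number class limits, so the lower boundary L is the
    lower limit minus 0.5 and the class width w is upper - lower + 1.
    """
    n = sum(freqs)
    target = fraction * n                  # e.g. N/4 for Q1
    cum_before = 0                         # F: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cum_before + f >= target:
            L = lower - 0.5
            w = upper - lower + 1
            return L + (target - cum_before) / f * w
        cum_before += f
    raise ValueError("fraction must be between 0 and 1")


classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [2, 5, 8, 4, 1]
q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
print(q1, q3, q3 - q1)
```

The loop finds the quartile class as the first class whose cumulative frequency reaches the target position, which is exactly how F and f are chosen by hand.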
Variance and Standard Deviation
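Population Variance: $\sigma^2 = \dfrac{\sum f(x-\mu)^2}{\sum f} = \dfrac{\sum fx^2}{\sum f} - \mu^2$
Sample Variance: $s^2 = \dfrac{\sum f(x-\bar{x})^2}{\sum f - 1}$
Standard Deviation: $\sigma = \sqrt{\sigma^2}$ (or $s = \sqrt{s^2}$ for a sample)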
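The only difference between the two variances is the divisor. This sketch, using the original frequencies from Example 5 later in this chapter, computes both; the function name `grouped_variance` is illustrative.

```python
import math

def grouped_variance(freqs, midpoints, sample=False):
    """Variance of grouped data using class midpoints.

    Population: sum f(x - mean)^2 / sum f
    Sample:     sum f(x - mean)^2 / (sum f - 1)
    """
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    squared_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return squared_dev / (n - 1 if sample else n)


freqs = [2, 4, 3, 1]                     # Example 5, original data
mids = [14.5, 24.5, 34.5, 44.5]
pop_var = grouped_variance(freqs, mids)
samp_var = grouped_variance(freqs, mids, sample=True)
print(pop_var, math.sqrt(pop_var))       # 81.0 9.0
print(samp_var, round(math.sqrt(samp_var), 2))  # 90.0 9.49
```

With only 10 observations the sample version is noticeably larger; the gap shrinks as ∑f grows.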
Coefficient of Variation
Coefficient of Variation (CV): $\text{CV} = \dfrac{\sigma}{\mu} \times 100\%$
Uses:
- Compare dispersion between datasets with different units
- Measure relative variability
- Useful for quality control
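As a sketch of "relative variability", the code below compares the data of Examples 3 and 4 from later in this chapter. It assumes the population standard deviation is used; the helper name `coefficient_of_variation` is illustrative.

```python
import math

def coefficient_of_variation(freqs, midpoints):
    """CV = (standard deviation / mean) * 100, using the population SD."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
    return math.sqrt(var) / mean * 100


cv_ex3 = coefficient_of_variation([2, 5, 8, 4, 1], [44.5, 54.5, 64.5, 74.5, 84.5])
cv_ex4 = coefficient_of_variation([3, 7, 5, 2], [14.5, 24.5, 34.5, 44.5])
print(f"Example 3: CV = {cv_ex3:.1f}%")
print(f"Example 4: CV = {cv_ex4:.1f}%")
```

Both standard deviations are close to 9 or 10, but Example 4's mean is much smaller, so its data are relatively more spread out (CV of about 32% against about 16%).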
Mean Absolute Deviation
Mean Absolute Deviation (MAD): $\text{MAD} = \dfrac{\sum f\,|x - \mu|}{\sum f}$
Advantages over variance:
- Easier to interpret
- Less affected by extreme values
- Same units as original data
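A minimal sketch comparing MAD with the standard deviation, assuming the original class frequencies from Example 5 later in this chapter. Because MAD does not square the distances, it comes out smaller here.

```python
import math

freqs = [2, 4, 3, 1]                     # Example 5, original data
mids = [14.5, 24.5, 34.5, 44.5]
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# MAD: frequency-weighted average distance from the mean.
mad = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n
# Standard deviation for comparison: squares the distances first.
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
print(mean, mad, sd)  # 27.5 7.6 9.0
```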
Statistical Analysis Techniques
Comparison of Datasets
Comparison Methods:
- Visual Comparison: Overlay histograms or frequency polygons
- Statistical Comparison: Compare means, medians, and dispersion measures
- Shape Analysis: Compare skewness and kurtosis
- Outlier Detection: Identify extreme values in each dataset
Data Transformation Effects
Effect of Adding Constant:
- Mean increases by constant
- Median increases by constant
- Mode increases by constant
- Dispersion measures unchanged
Effect of Multiplying by Constant:
- Mean multiplied by constant
- Median multiplied by constant
- Mode multiplied by constant
- Range, IQR and standard deviation multiplied by the absolute value of the constant; variance multiplied by the square of the constant
Effect of Adding/Removing Data:
- Adding extreme values increases dispersion
- Removing outliers decreases dispersion
- Changes in extreme values affect the range much more than the IQR
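The add and multiply rules can be checked numerically. This sketch shifts and scales the class midpoints of Example 5's original data (an assumed dataset for illustration); the helper name `mean_and_sd` is illustrative.

```python
import math

def mean_and_sd(freqs, mids):
    """Mean and population standard deviation of grouped data."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, math.sqrt(var)


freqs = [2, 4, 3, 1]
mids = [14.5, 24.5, 34.5, 44.5]
print(mean_and_sd(freqs, mids))                   # (27.5, 9.0)
print(mean_and_sd(freqs, [x + 5 for x in mids]))  # (32.5, 9.0): mean shifts, SD unchanged
print(mean_and_sd(freqs, [x * 2 for x in mids]))  # (55.0, 18.0): both doubled
```

Doubling the SD from 9 to 18 means the variance goes from 81 to 324, four times larger, as the square-of-the-constant rule predicts.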
Real-world Analysis Examples
Example 1: Income Distribution Analysis
Analysis:
- Most people in medium income range
- Right-skewed distribution
- High variance due to extreme incomes
- IQR shows middle 50% concentration
Example 2: Test Score Analysis
Analysis:
- Bell-shaped distribution (approximately normal)
- Low variance (consistent performance)
- Mean around 75-80
- Few outliers in both directions
Advanced Statistical Concepts
Skewness and Kurtosis
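Skewness Formula for Grouped Data: $\text{Skewness} = \dfrac{\sum f(x-\mu)^3}{\sigma^3 \sum f}$
Kurtosis Formula for Grouped Data: $\text{Kurtosis} = \dfrac{\sum f(x-\mu)^4}{\sigma^4 \sum f} - 3$ (excess kurtosis, so a normal distribution gives 0)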
Interpretation:
- Skewness > 0: Right-skewed (tail to right)
- Skewness < 0: Left-skewed (tail to left)
- Skewness = 0: Symmetric
- Kurtosis > 0: Heavy tails (more outliers than a normal distribution)
- Kurtosis < 0: Light tails (fewer outliers)
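A sketch of the moment-based skewness and excess kurtosis formulas, assuming population moments and Example 5's original data. This is one common definition; other textbooks use Pearson's median-based skewness instead.

```python
def grouped_skew_kurtosis(freqs, mids):
    """Moment coefficient of skewness and excess kurtosis for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    def moment(k):
        # k-th central moment: sum f(x - mean)^k / sum f
        return sum(f * (x - mean) ** k for f, x in zip(freqs, mids)) / n

    var = moment(2)
    skewness = moment(3) / var ** 1.5      # divide by sigma^3
    excess_kurtosis = moment(4) / var ** 2 - 3
    return skewness, excess_kurtosis


skew, kurt = grouped_skew_kurtosis([2, 4, 3, 1], [14.5, 24.5, 34.5, 44.5])
print(round(skew, 3), round(kurt, 3))      # 0.198 -0.742
```

The small positive skewness and negative excess kurtosis suggest a slight right tail and lighter tails than a normal distribution.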
Percentiles and Deciles
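General Percentile Formula: $P_k = L + \left(\dfrac{\frac{k}{100}N - F}{f}\right)w$
Deciles ($D_1$ to $D_9$): $D_j = L + \left(\dfrac{\frac{j}{10}N - F}{f}\right)w$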
Where:
- k = percentile (1-100)
- j = decile (1-9)
- Other variables same as quartile formula
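A minimal sketch of the percentile and decile formulas, applied to the exam-score classes from "Example: Organizing Exam Scores" below. The function names are hypothetical, and whole-number class limits are assumed. Since $D_j = P_{10j}$, the decile helper simply calls the percentile one.

```python
def grouped_percentile(classes, freqs, k):
    """P_k = L + ((k/100 * N - F) / f) * w, assuming whole-number class limits."""
    n = sum(freqs)
    target = k / 100 * n
    cum_before = 0                          # F
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cum_before + f >= target:
            return (lower - 0.5) + (target - cum_before) / f * (upper - lower + 1)
        cum_before += f
    raise ValueError("k must be between 0 and 100")


def grouped_decile(classes, freqs, j):
    """D_j is the same position as P_(10j)."""
    return grouped_percentile(classes, freqs, 10 * j)


classes = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
freqs = [1, 2, 2, 3, 3, 3, 1]
print(round(grouped_decile(classes, freqs, 5), 2))       # 77.83 (the median)
print(round(grouped_percentile(classes, freqs, 90), 2))  # 97.83
```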
Statistical Inference for Grouped Data
Confidence Interval for Mean: $\bar{x} \pm z\dfrac{s}{\sqrt{n}}$
Where:
- $\bar{x}$ = sample mean
- z = z-score for confidence level
- s = sample standard deviation
- n = sample size
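A sketch of the interval using the exam-score frequencies from "Example: Organizing Exam Scores" below. It assumes z = 1.96 (95% confidence) and the sample standard deviation with divisor n - 1. With a sample as small as 15, a t critical value is usually preferred over z, so treat this as an illustration of the formula only.

```python
import math

def mean_confidence_interval(freqs, mids, z=1.96):
    """xbar +/- z * s / sqrt(n) for grouped data, with s the sample SD (divisor n - 1)."""
    n = sum(freqs)
    xbar = sum(f * x for f, x in zip(freqs, mids)) / n
    s = math.sqrt(sum(f * (x - xbar) ** 2 for f, x in zip(freqs, mids)) / (n - 1))
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


freqs = [1, 2, 2, 3, 3, 3, 1]
mids = [44.5, 54.5, 64.5, 74.5, 84.5, 94.5, 104.5]
low, high = mean_confidence_interval(freqs, mids)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```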
Data Organization Examples
Example: Organizing Exam Scores
Raw Scores: 45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100
Grouped Data (Class Width = 10):
| Class Interval | Frequency (f) | Midpoint (x) | fx | fx² |
|---|---|---|---|---|
| 40-49 | 1 | 44.5 | 44.5 | 1980.25 |
| 50-59 | 2 | 54.5 | 109.0 | 5940.5 |
| 60-69 | 2 | 64.5 | 129.0 | 8320.5 |
| 70-79 | 3 | 74.5 | 223.5 | 16650.75 |
| 80-89 | 3 | 84.5 | 253.5 | 21420.75 |
| 90-99 | 3 | 94.5 | 283.5 | 26790.75 |
| 100-109 | 1 | 104.5 | 104.5 | 10920.25 |
Calculations:
- ∑f = 15
- ∑fx = 1147.5
- ∑fx² = 92023.75
- Mean = 1147.5 / 15 = 76.5
Example: Manufacturing Data
Production Data (units/day): 125, 142, 158, 163, 171, 175, 182, 188, 195, 203, 215, 228, 245, 268, 295
Grouped Data (Class Width = 50):
| Class Interval | Frequency (f) | Midpoint (x) | fx | fx² |
|---|---|---|---|---|
| 100-149 | 2 | 124.5 | 249.0 | 31000.5 |
| 150-199 | 7 | 174.5 | 1221.5 | 213151.75 |
| 200-249 | 4 | 224.5 | 898.0 | 201601.0 |
| 250-299 | 2 | 274.5 | 549.0 | 150700.5 |
Calculations:
- ∑f = 15
- ∑fx = 2917.5
- ∑fx² = 596453.75
- Mean = 2917.5 / 15 = 194.5
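Both examples can be checked end to end by grouping the raw data and then applying $\sigma^2 = \sum fx^2/\sum f - \mu^2$. This is a sketch that assumes whole-number data and equal class widths; `grouped_summary` is an illustrative name.

```python
import math

def grouped_summary(values, start, width, n_classes):
    """Group whole-number data into equal classes, then return
    (sum f, sum fx, sum fx^2, mean, variance, standard deviation)."""
    sum_f = sum_fx = sum_fx2 = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1
        f = sum(1 for v in values if lower <= v <= upper)
        x = (lower + upper) / 2                  # class midpoint
        sum_f += f
        sum_fx += f * x
        sum_fx2 += f * x * x
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2       # sum fx^2 / sum f - mean^2
    return sum_f, sum_fx, sum_fx2, mean, variance, math.sqrt(variance)


exam = [45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100]
production = [125, 142, 158, 163, 171, 175, 182, 188, 195, 203, 215, 228, 245, 268, 295]
datasets = [
    ("exam scores", exam, 40, 10, 7),
    ("production", production, 100, 50, 4),
]
for name, data, start, width, n_classes in datasets:
    s_f, s_fx, s_fx2, mean, var, sd = grouped_summary(data, start, width, n_classes)
    print(f"{name}: sum f={s_f}, sum fx={s_fx}, sum fx^2={s_fx2}, "
          f"mean={mean:.2f}, variance={var:.2f}, sd={sd:.2f}")
```

Recounting frequencies from the raw list, rather than copying them, is what catches miscounted classes such as 150-199.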
Key Concepts
Grouped Data
Grouped data is data that has been organized into class intervals or categories rather than individual values.
Example: Individual scores: 45, 52, 58, 62, 65, 71, 75, 78, 82, 85, 89, 92, 95, 98, 100 Grouped: 40-49: 1, 50-59: 2, 60-69: 2, 70-79: 3, 80-89: 3, 90-100: 4
Histogram
A histogram is a graphical representation of grouped data using adjacent rectangular bars.
Key Features:
- X-axis: Class intervals or class boundaries
- Y-axis: Frequency
- Bars: Adjacent rectangles with heights representing frequencies
- Width: Represents class width
Example: For data 40-49, 50-59, 60-69 with frequencies 1, 2, 2:
- Bar for 40-49: height 1
- Bar for 50-59: height 2
- Bar for 60-69: height 2
Frequency Polygon
A frequency polygon is a line graph connecting midpoints of class intervals at their respective frequencies.
Construction:
- Find midpoint of each class interval
- Plot (midpoint, frequency) points
- Connect points with straight lines
- Close the polygon by extending it to zero frequency at the midpoints of the empty classes immediately before the first class and after the last class
Example: For class 50-59 with frequency 2:
- Midpoint = (50 + 59)/2 = 54.5
- Plot point (54.5, 2)
Ogive
An ogive (cumulative frequency graph) shows cumulative frequency against upper class boundaries.
Construction:
- Find cumulative frequency for each class
- Plot (upper class boundary, cumulative frequency) points
- Connect points with smooth curve
- Always starts from zero at first lower boundary
Uses:
- Estimate median, quartiles, percentiles
- Determine number of values below certain thresholds
Important Formulas and Methods
Mean for Grouped Data
$\mu = \dfrac{\sum fx}{\sum f}$
Where:
- f = frequency of each class
- x = midpoint of each class
- ∑ = summation
Example: For class 40-49 with frequency 3, midpoint 44.5: fx = 44.5 × 3 = 133.5
Mode from Histogram
Mode is estimated from the tallest bar in the histogram.
Class with highest frequency = Modal class
For a more precise mode within the modal class: $\text{Mode} = L + \left(\dfrac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\right)w$
Where:
- L = lower boundary of modal class
- f_m = frequency of modal class
- f_{m-1} = frequency of class before modal class
- f_{m+1} = frequency of class after modal class
- w = class width
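A minimal sketch of the modal-class formula, applied to the data of Example 1 below. It assumes whole-number class limits, treats a missing neighbouring class as frequency 0, and takes the first class if two share the highest frequency; `grouped_mode` is an illustrative name.

```python
def grouped_mode(classes, freqs):
    """Mode = L + (d1 / (d1 + d2)) * w, where d1 = f_m - f_(m-1), d2 = f_m - f_(m+1).

    Assumes whole-number class limits and a modal class strictly taller than
    at least one neighbour (otherwise d1 + d2 would be zero).
    """
    m = freqs.index(max(freqs))            # modal class (first one if tied)
    lower, upper = classes[m]
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_before
    d2 = freqs[m] - f_after
    return (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)


classes = [(60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [5, 8, 12, 5]
print(round(grouped_mode(classes, freqs), 2))  # 83.14
```

The estimate leans towards the 70-79 side of the modal class because that neighbour (8) is taller than the 90-99 neighbour (5).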
Median and Quartiles from Ogive
- Median ($Q_2$): Value at 50% cumulative frequency
- First Quartile ($Q_1$): Value at 25% cumulative frequency
- Third Quartile ($Q_3$): Value at 75% cumulative frequency
Method:
- Draw ogive
- Find cumulative frequency total (N)
- For median: N/2
- For $Q_1$: N/4
- For $Q_3$: 3N/4
- Read corresponding values from ogive
Variance and Standard Deviation for Grouped Data
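Variance: $\sigma^2 = \dfrac{\sum f(x-\mu)^2}{\sum f} = \dfrac{\sum fx^2}{\sum f} - \mu^2$
Standard Deviation: $\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \mu^2}$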
Where:
- f = frequency
- x = midpoint of class
- μ = mean
Step-by-Step Solved Examples
Example 1: Histogram Construction
Problem: Construct a histogram for grouped exam scores: 60-69: 5 students, 70-79: 8 students, 80-89: 12 students, 90-99: 5 students
Solution: Step 1: Identify class boundaries and frequencies
- 60-69: frequency 5
- 70-79: frequency 8
- 80-89: frequency 12
- 90-99: frequency 5
Step 2: Calculate class width. Each class has width 10 (69 - 60 + 1 = 10, and likewise for the other classes).
Step 3: Construct histogram
- X-axis: Class boundaries 59.5, 69.5, 79.5, 89.5, 99.5 (so adjacent bars touch), labelled with the class intervals 60-69, 70-79, 80-89, 90-99
- Y-axis: Frequency (0-15)
- Bars: Height equal to frequency for each class
Answer: Histogram with bars of heights 5, 8, 12, 5 respectively
Example 2: Frequency Polygon
Problem: Construct a frequency polygon for the data: Class intervals: 10-19, 20-29, 30-39, 40-49 Frequencies: 3, 7, 5, 2
Solution: Step 1: Calculate midpoints
- 10-19: (10+19)/2 = 14.5
- 20-29: (20+29)/2 = 24.5
- 30-39: (30+39)/2 = 34.5
- 40-49: (40+49)/2 = 44.5
Step 2: Plot points
- (14.5, 3)
- (24.5, 7)
- (34.5, 5)
- (44.5, 2)
Step 3: Connect points and close polygon
- Connect with straight lines
- Extend to zero frequency at 4.5 and 54.5 (the midpoints of the empty classes 0-9 and 50-59)
Answer: Frequency polygon through (4.5, 0), (14.5, 3), (24.5, 7), (34.5, 5), (44.5, 2), (54.5, 0)
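The plotting points, including the two closing zero-frequency points, can be generated as follows. This sketch assumes equal class widths, so the closing points sit one class width beyond the first and last midpoints.

```python
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [3, 7, 5, 2]
width = classes[0][1] - classes[0][0] + 1     # 10; assumes equal class widths
mids = [(lower + upper) / 2 for lower, upper in classes]

# Plot (midpoint, frequency) for each class, then close the polygon with
# zero-frequency points one class width beyond each end.
points = [(mids[0] - width, 0)] + list(zip(mids, freqs)) + [(mids[-1] + width, 0)]
print(points)
# [(4.5, 0), (14.5, 3), (24.5, 7), (34.5, 5), (44.5, 2), (54.5, 0)]
```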
Example 3: Ogive and Percentiles
Problem: Construct an ogive and find the median, $Q_1$ and $Q_3$ for: 40-49: 2, 50-59: 5, 60-69: 8, 70-79: 4, 80-89: 1
Solution: Step 1: Calculate cumulative frequencies
- 40-49: 2
- 50-59: 2 + 5 = 7
- 60-69: 7 + 8 = 15
- 70-79: 15 + 4 = 19
- 80-89: 19 + 1 = 20
Step 2: Find upper class boundaries
- 40-49: upper boundary = 49.5
- 50-59: upper boundary = 59.5
- 60-69: upper boundary = 69.5
- 70-79: upper boundary = 79.5
- 80-89: upper boundary = 89.5
Step 3: Plot ogive points
- (49.5, 2), (59.5, 7), (69.5, 15), (79.5, 19), (89.5, 20)
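Step 4: Find the median and quartiles (total N = 20)
- Median ($Q_2$): N/2 = 10 → between (59.5, 7) and (69.5, 15), so median = 59.5 + ((10 - 7)/8) × 10 = 63.25
- $Q_1$: N/4 = 5 → between (49.5, 2) and (59.5, 7), so $Q_1$ = 49.5 + ((5 - 2)/5) × 10 = 55.5
- $Q_3$: 3N/4 = 15 → exactly at (69.5, 15), so $Q_3$ = 69.5
Answer: Median ≈ 63.25, $Q_1$ = 55.5, $Q_3$ = 69.5 (IQR = 14)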
Example 4: Mean and Variance Calculation
Problem: Calculate mean and variance for grouped data: 10-19: 3, 20-29: 7, 30-39: 5, 40-49: 2
Solution: Step 1: Calculate midpoints and fx
- 10-19: midpoint = 14.5, fx = 14.5 × 3 = 43.5
- 20-29: midpoint = 24.5, fx = 24.5 × 7 = 171.5
- 30-39: midpoint = 34.5, fx = 34.5 × 5 = 172.5
- 40-49: midpoint = 44.5, fx = 44.5 × 2 = 89.0
Step 2: Calculate mean
- ∑fx = 43.5 + 171.5 + 172.5 + 89.0 = 476.5
- ∑f = 3 + 7 + 5 + 2 = 17
- μ = 476.5 / 17 ≈ 28.03
Step 3: Calculate variance using σ² = ∑fx²/∑f - μ²
- fx²: (14.5)² × 3 = 630.75, (24.5)² × 7 = 4201.75, (34.5)² × 5 = 5951.25, (44.5)² × 2 = 3960.5
- ∑fx² = 630.75 + 4201.75 + 5951.25 + 3960.5 = 14744.25
- σ² = 14744.25/17 - (476.5/17)² ≈ 867.309 - 785.648 ≈ 81.66 (rounding μ to 28.03 before squaring gives 81.63)
Answer: Mean ≈ 28.03, Variance ≈ 81.66, Standard Deviation ≈ 9.04
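This sketch repeats Example 4 with the shortcut formula and shows why the mean should be kept unrounded until the end: squaring a rounded μ shifts the variance in the second decimal place.

```python
import math

freqs = [3, 7, 5, 2]
mids = [14.5, 24.5, 34.5, 44.5]
n = sum(freqs)                                          # 17
sum_fx = sum(f * x for f, x in zip(freqs, mids))        # 476.5
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))   # 14744.25
mean = sum_fx / n

variance = sum_fx2 / n - mean ** 2                      # unrounded mean
variance_rounded_mean = sum_fx2 / n - round(mean, 2) ** 2
print(round(mean, 2), round(variance, 2), round(math.sqrt(variance), 2))  # 28.03 81.66 9.04
print(round(variance_rounded_mean, 2))                  # 81.63
```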
Example 5: Effect of Data Changes on Dispersion
Problem: Original grouped data: 10-19: 2, 20-29: 4, 30-39: 3, 40-49: 1 If the highest value changes from 49 to 100, how does dispersion change?
Solution: Original Data:
- Range: 49 - 10 = 39 (taking 10 and 49 as the lowest and highest values)
- $Q_1$ (25th percentile) = 19.5 + ((2.5 - 2)/4) × 10 = 20.75, $Q_3$ (75th percentile) = 29.5 + ((7.5 - 6)/3) × 10 = 34.5, IQR = 13.75
- Mean: (29 + 98 + 103.5 + 44.5)/10 = 275/10 = 27.5, SD: √[8372.5/10 - 27.5²] = √(837.25 - 756.25) = √81 = 9
Modified Data (10-19: 2, 20-29: 4, 30-39: 3, 40-100: 1):
- Range: 100 - 10 = 90 (increased significantly)
- $Q_1$ and $Q_3$ unchanged (they lie in the 20-29 and 30-39 classes, whose frequencies are unchanged)
- IQR: still 13.75 (unchanged)
- Mean: ∑fx/∑f = (14.5×2 + 24.5×4 + 34.5×3 + 70×1)/10 = (29 + 98 + 103.5 + 70)/10 = 300.5/10 = 30.05 (70 is the midpoint of 40-100)
- SD: √[∑fx²/10 - (30.05)²] = √[(420.5 + 2401 + 3570.75 + 4900)/10 - 903.0025] = √[1129.225 - 903.0025] = √226.2225 ≈ 15.04
Analysis:
- Range increased from 39 to 90
- IQR unchanged (13.75) because the middle 50% of the data is unchanged
- Mean increased from 27.5 to 30.05
- Standard deviation increased significantly, from 9 to 15.04
Answer: Range and standard deviation increased, IQR unchanged, indicating extreme values affect overall dispersion more than middle spread.
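The comparison can be reproduced with a sketch that works directly from class boundaries, which makes the widened final class (39.5 to 100.5, midpoint 70) easy to express. The function name `dispersion` is illustrative, and population standard deviation is assumed.

```python
import math

def dispersion(boundaries, freqs):
    """Mean, SD and IQR for grouped data given (lower, upper) class boundaries."""
    n = sum(freqs)
    mids = [(a + b) / 2 for a, b in boundaries]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)

    def quartile(fraction):
        # L + ((fraction*N - F) / f) * w, with w taken from the boundaries
        target, cum_before = fraction * n, 0
        for (a, b), f in zip(boundaries, freqs):
            if f > 0 and cum_before + f >= target:
                return a + (target - cum_before) / f * (b - a)
            cum_before += f
        raise ValueError("fraction must be between 0 and 1")

    return mean, sd, quartile(0.75) - quartile(0.25)


freqs = [2, 4, 3, 1]
original = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
modified = original[:3] + [(39.5, 100.5)]     # last class widened to 40-100
for name, bounds in [("original", original), ("modified", modified)]:
    mean, sd, iqr = dispersion(bounds, freqs)
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}, IQR={iqr:.2f}")
```

Only the last class changes, and neither quartile falls in it, so the IQR stays at 13.75 while the SD grows.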
Real-world Applications
1. Market Research
- Consumer Surveys: Analyzing age groups, income brackets
- Product Categories: Sales by price ranges, age demographics
- Customer Satisfaction: Rating distributions grouped into score bands (for example, 1-5 and 6-10 on a 10-point scale)
2. Quality Control
- Manufacturing: Product measurements in ranges
- Testing: Score distributions for quality assurance
- Defect Analysis: Frequency of defect categories
3. Education
- Test Scores: Grade distributions by percentage ranges
- Student Performance: Achievement level categories
- Standardized Testing: Percentile rankings and distributions
4. Demographics
- Age Groups: Population by age brackets
- Income Distribution: Household income categories
- Geographic Data: Regional population statistics
Important Terms
| Term | Definition | Example |
|---|---|---|
| Grouped Data | Data organized into class intervals | 10-19, 20-29, 30-39 |
| Histogram | Bar graph for grouped data | Adjacent rectangles for frequencies |
| Frequency Polygon | Line graph connecting midpoints | Connected (midpoint, frequency) points |
| Ogive | Cumulative frequency graph | Shows cumulative vs. upper boundary |
| Class Interval | Range of values in a group | 50-59 minutes |
| Midpoint | Average of class boundaries | (50+59)/2 = 54.5 for 50-59 |
| Cumulative Frequency | Running total of frequencies | 2, 7, 15, 19, 20 |
| Modal Class | Class with highest frequency | Class 80-89 with frequency 12 |
| Class Boundary | Exact limits between classes | 59.5 between 50-59 and 60-69 |
Summary Points
- Histogram: Adjacent bars representing class frequencies
- Frequency Polygon: Line graph connecting midpoints of classes
- Ogive: Cumulative frequency graph for median/quartile estimation
- Mean Formula: μ = ∑fx/∑f (f = frequency, x = midpoint)
- Median from Ogive: Value at N/2 cumulative frequency
- Variance for Grouped Data: σ² = ∑fx²/∑f - μ²
- IQR: $Q_3 - Q_1$ (measures the spread of the middle 50%)
- Effect of Changes: Extreme values affect range and SD more than IQR
Practice Tips for SPM Students
1. Graph Construction Skills
- Practice drawing accurate histograms with proper scaling
- Learn to calculate and plot frequency polygons
- Master ogive construction and interpretation
2. Statistical Calculations
- Practice mean and variance calculations for grouped data
- Learn to estimate mode from histograms
- Understand how to find percentiles from ogives
3. Interpretation Skills
- Learn to interpret dispersion measures in context
- Understand the relationship between different measures
- Practice explaining data characteristics using statistics
4. Common Mistakes to Avoid
- Incorrect midpoint calculations
- Wrong cumulative frequency sums
- Confusing class boundaries with class intervals
- Misinterpretation of ogive values
SPM Exam Tips
Paper 1 (Multiple Choice)
- Look for key grouped data terminology
- Remember histogram and ogive characteristics
- Practice quick statistical calculations
- Use elimination method for difficult questions
Paper 2 (Structured)
- Show all calculation steps clearly
- Demonstrate graph construction methods
- Explain statistical interpretations in context
- Use proper statistical terminology throughout
Did You Know? Karl Pearson, working in the 1890s, introduced the terms "histogram" and "standard deviation" that are central to grouped data analysis. Today, grouped data analysis is used everywhere from election polling to market research, helping us understand patterns in large datasets that would otherwise be overwhelming to analyze!
Next Chapter: You'll explore mathematical modeling, learning to apply linear, quadratic, and exponential functions to model real-world situations and solve practical problems.