SPM Wiki

Chapter 8: Measures of Dispersion for Ungrouped Data

Master data spread analysis using range, quartiles, variance, and standard deviation for ungrouped data.

Overview

Welcome to Chapter 8 of Form 4 Mathematics! This chapter introduces you to measures of dispersion, which help us analyze how data is spread out. While measures of central tendency (mean, median, mode) tell us about the center of the data, measures of dispersion tell us about the spread or variability of the data.

You'll learn to compare and interpret data dispersion using stem-and-leaf plots, dot plots, and various statistical measures including range, interquartile range, variance, and standard deviation.

What You'll Learn:

  • Understand the concept of data dispersion.
  • Compare and interpret dispersion of multiple data sets.
  • Construct and interpret stem-and-leaf plots and dot plots.
  • Calculate and interpret various measures of dispersion (Range, IQR, Variance, Standard Deviation).
  • Construct and interpret box plots.
  • Analyze the effect of data changes on dispersion.

Learning Objectives

After completing this chapter, you will be able to:

  • Explain the meaning of dispersion.
  • Compare and interpret dispersion of two or more sets of data based on stem-and-leaf plots and dot plots.
  • Determine and explain the range, interquartile range, variance, and standard deviation.
  • Construct and interpret box plots.
  • Determine and interpret the effect of changes in data on dispersion.

Key Concepts

1. Dispersion

Dispersion (or spread) measures how scattered the data points are in a dataset.

  • Small Dispersion: Data points are clustered closely around the mean or median. The data is consistent.
  • Large Dispersion: Data points are spread far apart from the mean or median. The data is inconsistent or variable.

2. Data Representation

Stem-and-Leaf Plots

A stem-and-leaf plot splits each data value into a "stem" (the first digit or digits) and a "leaf" (usually the last digit).

Example: Data: 23, 25, 28, 31, 32, 35

Stem | Leaf
 2   | 3 5 8
 3   | 1 2 5

Key: 2 | 3 means 23

Construction Steps:

  1. Identify the stem (tens digit) and leaf (units digit).
  2. List stems in a vertical column.
  3. Write leaves in the row corresponding to their stem.
  4. Sort the leaves in ascending order.
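The construction steps above can be sketched in Python. This is a minimal sketch that assumes non-negative whole numbers, with the tens digit(s) as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper name, not a library function.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return a text stem-and-leaf plot (assumes non-negative integers)."""
    groups = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        groups[stem].append(leaf)

    lines = ["Stem | Leaf"]
    # List every stem from smallest to largest, including stems with no leaves
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(groups[stem]))  # sort the leaves
        lines.append(f"{stem:>4} | {leaves}".rstrip())
    return "\n".join(lines)


print(stem_and_leaf([23, 25, 28, 31, 32, 35]))
```

Keeping empty stems in the plot preserves any gaps in the data, which is part of what makes the spread visible.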

Dot Plots

A dot plot uses dots placed above a number line to represent the frequency of data values.

Example: Data: 1, 2, 2, 3, 3, 3, 4, 4, 5

      •
    • • •
  • • • • •
--------------
  1 2 3 4 5
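A dot plot like this one can also be printed as text. The sketch below assumes single-digit whole-number data so that each value fits in one column; `dot_plot` is a hypothetical name.

```python
from collections import Counter


def dot_plot(data):
    """Return a text dot plot with one column per whole-number value."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)  # include values that never occur
    rows = []
    # Build rows from the highest frequency down to 1 (the row nearest the axis)
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("•" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    rows.append("-" * (2 * len(values) - 1))  # the number line
    rows.append(" ".join(str(v) for v in values))
    return "\n".join(rows)


print(dot_plot([1, 2, 2, 3, 3, 3, 4, 4, 5]))
```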

3. Measures of Dispersion

We use four main measures to quantify dispersion:

A. Range

The simplest measure of spread.

$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$

Note: Very sensitive to outliers (extreme values).
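A two-line Python sketch of the range formula, applied to the two sets from Example 4 to show how a single extreme value inflates it. The name `data_range` is our own choice, picked to avoid shadowing Python's built-in `range`.

```python
def data_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)


print(data_range([10, 20, 30, 40, 50]))   # 40
print(data_range([10, 20, 30, 40, 100]))  # 90: one extreme value more than doubles the range
```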

B. Interquartile Range (IQR)

Measures the spread of the middle 50% of the data. It is less affected by outliers than the range.

  • First Quartile ($Q_1$): The median of the lower half of the data (25th percentile).
  • Median ($Q_2$): The middle value of the data (50th percentile).
  • Third Quartile ($Q_3$): The median of the upper half of the data (75th percentile).

$$\text{IQR} = Q_3 - Q_1$$

How to find Quartiles (Method of Splitting):

  1. Arrange data in ascending order.
  2. Find the Median ($Q_2$).
  3. Split the data into a lower half and an upper half.
    • If $N$ is odd, exclude the median from both halves.
    • If $N$ is even, split the data down the middle.
  4. $Q_1$ is the median of the lower half.
  5. $Q_3$ is the median of the upper half.
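The method of splitting translates directly into Python. This is a sketch with hypothetical helper names (`median`, `quartiles`) and it assumes at least two data values.

```python
def median(sorted_values):
    """Median of a list that is already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) by the method of splitting."""
    values = sorted(data)                    # step 1: ascending order
    n = len(values)
    q2 = median(values)                      # step 2: the median
    half = n // 2
    lower = values[:half]                    # step 3: lower half
    upper = values[half + n % 2:]            # for odd n this skips the median at index n // 2
    return median(lower), q2, median(upper)  # steps 4 and 5


q1, q2, q3 = quartiles([12, 15, 18, 20, 22, 25, 28, 30, 35, 40])
print(q1, q2, q3, "IQR =", q3 - q1)  # 18 23.5 30 IQR = 12
```

Python's own `statistics.quantiles` interpolates between values (its `method` parameter is `'exclusive'` by default, or `'inclusive'`), so its quartiles can differ from the splitting method used in this chapter.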

C. Variance ($\sigma^2$)

The average of the squared differences from the mean. It measures how far, on average, the data values lie from the mean; squaring makes every difference positive, so values above and below the mean do not cancel each other out.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2}{N} - \mu^2$$

Where:

  • $x$ = data value
  • $\mu$ = mean
  • $N$ = number of data points
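Both forms of the variance formula can be checked against each other in a short Python sketch (the function names are hypothetical). It divides by $N$, matching the formula above; Python's standard library offers the same population variance as `statistics.pvariance`, whereas `statistics.variance` divides by $N - 1$ (the sample variance).

```python
def variance(data):
    """Definition: mean of the squared deviations from the mean (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def variance_shortcut(data):
    """Equivalent form: (sum of x^2) / N minus the square of the mean."""
    n = len(data)
    mu = sum(data) / n
    return sum(x ** 2 for x in data) / n - mu ** 2


data = [25, 30, 35, 40, 45, 50, 55]
print(variance(data), variance_shortcut(data))  # 100.0 100.0
```

With decimal data the two results can differ in the last few digits because of floating-point rounding.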

D. Standard Deviation ($\sigma$)

The square root of the variance. It is the most common measure of dispersion because it is in the same units as the original data.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
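A minimal Python sketch of the standard deviation formula (the helper name `std_dev` is hypothetical). The standard library's `statistics.pstdev` computes the same population standard deviation; `statistics.stdev` divides by $N - 1$ instead and gives a slightly larger value, so it will not match the formula in this chapter.

```python
import math
import statistics


def std_dev(data):
    """Population standard deviation: square root of the variance (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    return math.sqrt(sum((x - mu) ** 2 for x in data) / n)


data = [25, 30, 35, 40, 45, 50, 55]
print(std_dev(data))            # 10.0
print(statistics.pstdev(data))  # 10.0 (population SD, same formula)
print(statistics.stdev(data))   # larger: the sample SD divides by N - 1
```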

4. Box Plots

A box plot (or box-and-whisker plot) visually displays the five-number summary of a set of data.

The Five-Number Summary:

  1. Minimum
  2. First Quartile ($Q_1$)
  3. Median ($Q_2$)
  4. Third Quartile ($Q_3$)
  5. Maximum

Structure:

  • A box spans from $Q_1$ to $Q_3$ (representing the IQR).
  • A line inside the box marks the Median ($Q_2$).
  • Whiskers extend from the box to the Minimum and Maximum values.

Outliers: Values that are unusually small or large.

  • Lower Boundary = $Q_1 - 1.5 \times \text{IQR}$
  • Upper Boundary = $Q_3 + 1.5 \times \text{IQR}$

Any data point outside these boundaries is considered an outlier.
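The boundary check can be written as two small Python functions. This is a sketch with hypothetical names; it takes $Q_1$ and $Q_3$ as inputs (found with the method of splitting) and treats a value as an outlier only if it lies strictly outside the boundaries. The second data set is made up for illustration.

```python
def outlier_boundaries(q1, q3):
    """Return (lower, upper) = (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Values that lie strictly below the lower or above the upper boundary."""
    lower, upper = outlier_boundaries(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Example 3: Q1 = 18, Q3 = 30
print(outlier_boundaries(18, 30))                                  # (0.0, 48.0)
print(outliers([12, 15, 18, 20, 22, 25, 28, 30, 35, 40], 18, 30))  # []

# Made-up data: 2, 4, 5, 6, 7, 8, 30 has Q1 = 4 and Q3 = 8 by the method of splitting
print(outliers([2, 4, 5, 6, 7, 8, 30], 4, 8))                      # [30]
```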

Step-by-Step Solved Examples

Example 1: Basic Measures

Problem: Find the range, variance, and standard deviation for: 25, 30, 35, 40, 45, 50, 55.

Solution:

  1. Range: $55 - 25 = 30$
  2. Mean ($\mu$): $\frac{25+30+35+40+45+50+55}{7} = \frac{280}{7} = 40$
  3. Variance ($\sigma^2$):
     $$\sigma^2 = \frac{(25-40)^2 + (30-40)^2 + \dots + (55-40)^2}{7}$$
     $$\sigma^2 = \frac{225 + 100 + 25 + 0 + 25 + 100 + 225}{7} = \frac{700}{7} = 100$$
  4. Standard Deviation ($\sigma$): $\sigma = \sqrt{100} = 10$
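The working above can be reproduced in Python, printing the same intermediate columns ($x$, $x - \mu$, $(x - \mu)^2$) that you would show by hand. This is an illustrative script for checking your arithmetic, not an official answer format.

```python
import math

data = [25, 30, 35, 40, 45, 50, 55]
n = len(data)

data_range = max(data) - min(data)
mean = sum(data) / n
print(f"Range = {max(data)} - {min(data)} = {data_range}")
print(f"Mean = {sum(data)}/{n} = {mean}")

# Table of deviations from the mean and their squares
print("  x   x - mean   (x - mean)^2")
sum_sq = 0.0
for x in data:
    d = x - mean
    sum_sq += d ** 2
    print(f"{x:>3}   {d:>8}   {d ** 2:>12}")

variance = sum_sq / n
sd = math.sqrt(variance)
print(f"Variance = {sum_sq}/{n} = {variance}")
print(f"Standard deviation = sqrt({variance}) = {sd}")
```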

Example 2: Comparing Dispersion

Problem: Compare the dispersion of the results of Class A and Class B.

  • Class A: 70, 72, 75, 78, 80
  • Class B: 50, 60, 75, 90, 100

Analysis:

  • Range A: $80 - 70 = 10$
  • Range B: $100 - 50 = 50$
  • Conclusion: Class B has a larger dispersion (spread) than Class A. Class A's results are more consistent.
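The same comparison can be extended to the standard deviation in Python. Both classes happen to have the same mean (75), so the difference lies entirely in the spread. This sketch uses the standard library's population SD, `statistics.pstdev`, to match this chapter's formula.

```python
import statistics

classes = {
    "Class A": [70, 72, 75, 78, 80],
    "Class B": [50, 60, 75, 90, 100],
}

for name, scores in classes.items():
    spread = max(scores) - min(scores)
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD (divide by N), as in this chapter
    print(f"{name}: mean = {mean}, range = {spread}, SD = {sd:.2f}")
```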

Example 3: Box Plot Construction

Problem: Construct a box plot for the data: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40

Solution:

  1. Arrange Data: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40 ($N = 10$)
  2. Find Median ($Q_2$):
    • $N$ is even (10), so the median is the average of the 5th and 6th values.
    • 5th value = 22, 6th value = 25.
    • $Q_2 = \frac{22 + 25}{2} = 23.5$
  3. Find Quartiles:
    • Split the data into lower and upper halves.
    • Lower Half: 12, 15, 18, 20, 22 (5 values).
    • Median of Lower Half ($Q_1$) = 3rd value = 18.
    • Upper Half: 25, 28, 30, 35, 40 (5 values).
    • Median of Upper Half ($Q_3$) = 3rd value of upper half = 30.
  4. Calculate IQR:
    • $\text{IQR} = Q_3 - Q_1 = 30 - 18 = 12$.
  5. Check for Outliers:
    • Lower Boundary: $18 - 1.5(12) = 18 - 18 = 0$. (Min is 12, so there are no lower outliers.)
    • Upper Boundary: $30 + 1.5(12) = 30 + 18 = 48$. (Max is 40, so there are no upper outliers.)

Five-Number Summary:

  • Min: 12
  • $Q_1$: 18
  • Median: 23.5
  • $Q_3$: 30
  • Max: 40

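Visual Representation (drawn to scale, 2 characters = 1 unit):

      |-----------[==========|============]-------------------|
      12          18        23.5         30                  40
      Min         Q1       Median        Q3                 Max

The box spans $Q_1$ to $Q_3$ (the IQR), the | inside it marks the median, and the whiskers reach the minimum and maximum.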
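A box plot should be drawn to scale, so the gaps between the five numbers show the spread. This Python sketch maps the five-number summary onto character columns; `text_box_plot` is a hypothetical helper and the `width` parameter is our own choice (with `width=57` and this data, each unit takes two characters).

```python
def text_box_plot(summary, width=57):
    """Draw a one-line box plot to scale from (min, Q1, median, Q3, max)."""
    lo, hi = summary[0], summary[-1]

    def col(value):
        # Map a data value linearly onto a column between 0 and width - 1
        return round((value - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    line = [" "] * width
    for c in range(mn, mx + 1):
        line[c] = "-"              # whiskers (the box overwrites its own part)
    for c in range(q1 + 1, q3):
        line[c] = "="              # inside of the box
    line[q1], line[q3] = "[", "]"  # box edges at Q1 and Q3
    line[med] = "|"                # median line
    line[mn] = line[mx] = "|"      # whisker ends at min and max
    return "".join(line)


print(text_box_plot((12, 18, 23.5, 30, 40)))
```

Note how the right whisker (30 to 40) comes out longer than the left one (12 to 18), which an evenly spaced sketch would hide.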

Example 4: Effect of Extreme Values

Problem:

  • Set A: 10, 20, 30, 40, 50 (Mean = 30, Range = 40)
  • Set B: 10, 20, 30, 40, 100 (Mean = 40, Range = 90)

Observation: Changing just one value to an extreme ($50 \to 100$) significantly increases the Range, Variance, and Standard Deviation. However, the IQR may remain unchanged or change less drastically, depending on the position of the change, which makes the IQR a more robust measure for skewed data or data with outliers.
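The observation can be checked numerically. This Python sketch computes all four measures for both sets, using the standard library's population variance and SD (`statistics.pvariance`, `statistics.pstdev`) and a small hypothetical helper, `iqr_split`, for the method of splitting. For these five-value sets the IQR does change (from 30 to 55), because the upper half contains only two values, but by a smaller factor than the range or the variance.

```python
import statistics


def iqr_split(data):
    """IQR by the method of splitting: exclude the median from both halves when N is odd."""
    v = sorted(data)
    half = len(v) // 2
    lower, upper = v[:half], v[half + len(v) % 2:]
    return statistics.median(upper) - statistics.median(lower)


def measures(data):
    return {
        "Range": max(data) - min(data),
        "Variance": statistics.pvariance(data),  # divides by N
        "SD": statistics.pstdev(data),
        "IQR": iqr_split(data),
    }


set_a = measures([10, 20, 30, 40, 50])
set_b = measures([10, 20, 30, 40, 100])
for name in set_a:
    ratio = set_b[name] / set_a[name]
    print(f"{name:<8} A = {set_a[name]:>7.2f}   B = {set_b[name]:>7.2f}   B/A = {ratio:.2f}")
```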

Real-world Applications

1. Quality Control in Manufacturing

Factories use standard deviation to ensure products are consistent.

  • Low SD: Products are nearly identical in size/weight (consistent, high quality).
  • High SD: Products vary in size/weight (inconsistent, lower quality).

2. Finance and Investment

Investors use variance to measure risk.

  • High Dispersion: High volatility (riskier stock).
  • Low Dispersion: Stable price (lower-risk investment).

3. Education

Teachers use dispersion to understand class performance.

  • A small range of scores means students are at a similar level.
  • A large range implies a gap between high and low achievers.

Summary Points

Measure  | Formula/Definition         | Key Characteristic
-------- | -------------------------- | -------------------------------------------------
Range    | Max - Min                  | Easiest to calculate; sensitive to outliers.
IQR      | $Q_3 - Q_1$                | Measures the middle 50%; robust against outliers.
Variance | $\frac{\sum(x-\mu)^2}{N}$  | Uses all data points; units are squared.
Std Dev  | $\sqrt{\text{Variance}}$   | Most common; same units as the data.

SPM Exam Tips

  1. Show Your Steps: In Paper 2, always show the substitution into the variance/standard deviation formula.
  2. Check Units: Remember that Variance has squared units (e.g., $\text{cm}^2$), while Standard Deviation has the same units as the data (e.g., cm).
  3. Outliers: If asked to check for outliers, calculate the boundaries $Q_1 - 1.5(\text{IQR})$ and $Q_3 + 1.5(\text{IQR})$.
  4. Calculator: Learn to use your scientific calculator's "Statistics Mode" (SD mode) to verify your answers quickly.