Measures of Spread

Ray Block, Jr.
PS 585 (Research Methods)
Fall 2003

Today’s Blueprint
Last Class
n    Univariate Data Analysis (Part 1)
n    Statistical Models
n    Measures of Central Tendency
Today’s Class
n    Univariate Data Analysis (Part 2)
n    Statistical Models (A Recap)
n    Measures of Variability

Statistical Models (A Recap)

Recap
What Models Are:
n    Symbolic representations of social phenomena
n    Statistical models use mathematical/statistical symbols

What Purpose Do [Statistical] Models Serve?
n    Discuss significant relationships among concepts
n    Enable researchers to form testable propositions about relationships between variables
n    Summarize data

The Goal of Statistical Modeling:
n    To build a model that best represents the real-world phenomena of interest
n    The degree to which a statistical model represents the data collected is known as the fit of the model to the data

How Do You Build Statistical Models?
n    Observe some facts about the world
n    Speculate about the process(es) that produced those facts
n    Collect data that represent the process(es)
n    Reduce the process(es) to a statistical model using the data you collected

Statistical Models Fall Into 2 Categories
n    Models measuring Central Tendencies
n    Models measuring Variability

Recap
Central Tendencies (The 4 “Ms”)
1) The Midpoint/Midrange
n    Description: Picking the middle slice of bread
n    Level of measurement:  Ordinal, Interval, Ratio
n    Shape of Distribution: N/A
n    Research Objective: Crude measure of central tendency
n    Note: Seldom used in social science

2) The Mode (Mo)
n    Description: Maximum Frequency
n    Level of Measurement: Nominal
n    Shape of Distribution: Most appropriate for bimodal or multimodal
n    Research Objective: Fast, simple, but rough measure of central tendency

3) The Median (Mdn)
n    Description: Middlemost Value
n    Level of Measurement: Ordinal, Interval, or Ratio
n    Shape of Distribution: Most appropriate for highly skewed
n    Research Objective: Precise measure of central tendency
n    Note: Sometimes used to split distributions into categories (e.g., high vs. low)

4) The [Arithmetic] Mean (X-Bar)
n    Description: Center of Gravity
n    Level of Measurement: Interval or Ratio
n    Shape of Distribution: Most appropriate for unimodal symmetrical
n    Research Objective: Precise measure of central tendency
n    Note: Most commonly used measure of central tendency. Used for hypothesis tests and other statistical operations

Recap
Finding the mode, median, and mean:
n    Arrange scores from highest to lowest
n    The mode is the most frequent score
n    The median is the middlemost value in the ordered list of scores
n    If there is an odd number of scores, then the median is in the exact middle of the list
n    If there is an even number of scores, then the median is halfway between the two middlemost scores
n    Determine the sum of the scores
n    Calculate the mean by dividing the sum by the number of scores
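The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mode_median_mean` and the list of scores are hypothetical and do not come from the lecture. Because a distribution can have more than one mode, the sketch returns every score that ties for the highest frequency.

```python
from collections import Counter


def mode_median_mean(scores):
    """Return (modes, median, mean) for a non-empty list of numeric scores."""
    # Arrange scores from highest to lowest, as in the first step.
    ordered = sorted(scores, reverse=True)
    n = len(ordered)

    # The mode is the most frequent score; keep all ties.
    counts = Counter(ordered)
    top_frequency = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top_frequency)

    # The median is the middlemost value of the ordered list.
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]                           # odd N: exact middle
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2  # even N: halfway point

    # The mean is the sum of the scores divided by the number of scores.
    mean = sum(ordered) / n
    return modes, median, mean


# Hypothetical scores, chosen only for illustration.
print(mode_median_mean([9, 7, 7, 5, 4, 3, 7]))
```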

Measures of Variability
AKA: Measures of “Spread,” “Width,” or “Dispersion”

Measures of Variability
n    In data analysis, the purpose of calculating measures of dispersion is to discover the extent to which scores differ, cluster, or scatter around a measure of central tendency

Some Measures of Spread:
n    The Range
n    The Mean Deviation
n    The Variance
n    The Standard Deviation
n    Standard Error

Measures of Variability
n    The Range is the difference between the highest and the lowest score: R = H – L
n    Where:
n    R = Range
n    H = Highest score in a distribution
n    L = Lowest score in a distribution
n    Advantages:
n    Quick and easy to calculate
n    Disadvantage:
n    Crude measure of variability
n    Why? Because it depends only on the lowest and highest values in the distribution
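A small sketch of why the range is a crude measure: the two hypothetical lists below (invented for illustration, not taken from the lecture) have the same highest and lowest scores, so R is identical even though the scores in between are spread very differently.

```python
def score_range(scores):
    """Range R = H - L: highest score minus lowest score."""
    return max(scores) - min(scores)


# Hypothetical distributions with the same extremes.
clustered = [10, 5, 5, 5, 5, 5, 0]   # most scores sit at the center
scattered = [10, 9, 7, 5, 3, 1, 0]   # scores are spread across the scale

# Both calls give the same range, because only H and L enter the formula.
print(score_range(clustered), score_range(scattered))
```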

Measures of Variability
n    Deviation = The distance between any given raw score and the mean of its distribution (Xi – X-Bar)
n    Mean Deviation = The average absolute distance between the raw scores and the mean: MD = Σ|Xi - X-Bar| / N

Where:
n    MD = Mean Deviation
n    Σ|Xi - X-Bar| = Sum of absolute deviations (disregarding plus or minus signs)
n    N = Total number of scores

Step-by-Step Illustration:
n    Take the following list of numbers (arranged from highest to lowest):








n    Step 1: Find the mean of the distribution


n    Step 2: Subtract the mean from each raw score
n    Take the absolute values (ignore the signs)
n    Add up these absolute deviations


n    Step 3: To get MD, divide Σ|Xi - X-Bar| by N to adjust for the number of cases involved
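The three steps can be traced in Python as a minimal sketch. The scores used here are hypothetical stand-ins chosen so the arithmetic is easy to follow; they are not the numbers from the lecture, and the function name `mean_deviation` is likewise an assumption.

```python
def mean_deviation(scores):
    """Mean deviation: the sum of absolute deviations from the mean, divided by N."""
    n = len(scores)

    # Step 1: find the mean of the distribution.
    mean = sum(scores) / n

    # Step 2: subtract the mean from each raw score, ignore the signs,
    # and add up the absolute deviations.
    sum_abs_deviations = sum(abs(x - mean) for x in scores)

    # Step 3: divide by N to adjust for the number of cases.
    return sum_abs_deviations / n


# Hypothetical scores, arranged from highest to lowest.
scores = [9, 8, 6, 4, 2, 1]
print(mean_deviation(scores))
```

For these scores the mean is 5, the absolute deviations are 4, 3, 1, 1, 3, and 4, and their sum is 16, so MD is 16 divided by 6.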



n    Note: Mean deviations are no longer widely used in the social sciences. However, calculating MD is not a complete waste of time
n    …Here’s why…
n    Recall that we took the absolute values to avoid getting minus signs: Σ|Xi - X-Bar|
n    We use absolute values so that the positive and negative values in Σ(Xi - X-Bar) do not cancel each other out. Without them, the deviations from the mean always sum to zero












n    We can also get around this sign-canceling issue by squaring each deviation before summing: Σ(Xi - X-Bar)²
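The sign-canceling problem and its two fixes can be checked numerically. The sketch below uses a hypothetical list of scores (not taken from the lecture) and compares the signed, absolute, and squared sums of deviations.

```python
# Hypothetical scores, chosen only for illustration.
scores = [9, 8, 6, 4, 2, 1]
mean = sum(scores) / len(scores)          # 5.0

# Signed deviations from the mean: 4, 3, 1, -1, -3, -4
deviations = [x - mean for x in scores]

signed_sum = sum(deviations)                     # positives and negatives cancel
absolute_sum = sum(abs(d) for d in deviations)   # fix 1: ignore the signs
squared_sum = sum(d ** 2 for d in deviations)    # fix 2: square each deviation

print(signed_sum, absolute_sum, squared_sum)
```

The signed sum is zero, while the absolute and squared sums are both positive, so either one can serve as the basis for a measure of spread.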











n    Therefore, the variance = The mean of the squared deviations: s² = Σ(Xi - X-Bar)² / N

n    Where:
n    s² = Variance
n    Σ(Xi - X-Bar)² = Sum of squared deviations from the mean
n    N = Total number of observations

n    Variance = The average squared difference between the mean and the observations made
n    Caveat:
n    Squaring the deviations alters the units of measurement
n    We need to bring the units back to their original non-squared values
n    The simplest way to do this is to take the square root of the variance
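As a rough sketch, the variance can be computed directly from its definition as the mean of the squared deviations. The function name `variance` and the scores are hypothetical; the divisor is N, matching the definition used in these slides.

```python
def variance(scores):
    """Variance: the sum of squared deviations from the mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    sum_squared_deviations = sum((x - mean) ** 2 for x in scores)
    return sum_squared_deviations / n


# Hypothetical scores; if they were measured in years, the result
# would be in "squared years", which is the caveat noted above.
scores = [9, 8, 6, 4, 2, 1]
print(variance(scores))
```

For these scores the squared deviations sum to 52, so the variance is 52 divided by 6.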





n    Standard deviation = The square root of the variance: s = √[Σ(Xi - X-Bar)² / N]
n    Squared units are hard to interpret (it makes little sense to talk in terms of things squared)
n    The standard deviation restates the variance in the original units of measurement
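A minimal sketch of the standard deviation, cross-checked against Python's standard library. The function name `standard_deviation` and the scores are hypothetical. The check uses `statistics.pstdev`, which divides by N as these slides do; `statistics.stdev` divides by N - 1 instead and would give a slightly larger value.

```python
import math
import statistics


def standard_deviation(scores):
    """Standard deviation: the square root of the variance (divisor N)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n
    return math.sqrt(variance)


# Hypothetical scores, chosen only for illustration.
scores = [9, 8, 6, 4, 2, 1]
sd = standard_deviation(scores)

# The hand-rolled result should agree with the library's population formula.
print(sd, statistics.pstdev(scores))
```

Unlike the variance, the result is expressed in the same units as the raw scores.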
