Comparing Two Proportions

STAT 218 - Week 5, Lecture 3

Introduction

  • So far, we have been interested in the analysis of a single variable.
    • In practice, most scientific research involves comparing two or more samples from different populations.
  • If the observed variable is categorical, we can compare two samples by comparing their proportions.

Notation

  • To differentiate the two samples from each other, we will use subscripts (for example, \(\hat{p_1}\) and \(\hat{p_2}\), \(n_1\) and \(n_2\)).

Figure 1. Naturally Occurring Populations
  • The two populations that we are interested in can be either
    • naturally occurring populations (Figure 1) OR
    • conceptual populations defined by certain experimental conditions.

Let’s Refresh Our Memory

Sampling Distribution of \(\hat{p}\)

We will build today’s content on this prior knowledge.

The sampling distribution of \(\hat{p}\), based on a sample of size \(n\) from a population with true proportion \(\pi\), is nearly normal when

  • there are at least 10 successes and 10 failures in the sample. We call this the success-failure condition.

  • The standard error is

\[ SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

\(\pi\) = Population proportion

\(\hat{p}\) = Sample proportion
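
As a quick refresher, the standard error formula above can be sketched in Python. The function name `se_proportion` and the sample counts are hypothetical, chosen only for illustration.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample (not from the lecture): 40 successes in 100 observations.
successes, n = 40, 100
p_hat = successes / n
print(round(se_proportion(p_hat, n), 4))
```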

Confidence Intervals for a Proportion


A confidence interval provides a range of plausible values for the parameter \(\pi\), and when \(\hat{p}\) can be modeled using a normal distribution, the confidence interval for \(\pi\) takes the form

\[ \hat{p} \pm \text{multiplier} \times SE_{\hat{p}} \]
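
A minimal sketch of this interval in Python, assuming the multiplier is the standard normal critical value \(z^*\) (about 1.96 for 95% confidence). The function name `ci_proportion` and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def ci_proportion(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-based confidence interval for a single proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Assumption: the multiplier is the standard normal critical value z*.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

# Hypothetical sample (not from the lecture): 40 successes in 100 observations.
lower, upper = ci_proportion(40, 100)
print(round(lower, 3), round(upper, 3))
```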

Comparing Two Proportions

Introduction

Sampling Distribution of \(\hat{p_1}\) - \(\hat{p_2}\)

We can extend what we have learned.

The difference in sample proportions \(\hat{p_1} - \hat{p_2}\) can be modeled using a normal distribution when

  • The data are independent within and between the two groups.
    • Generally this is satisfied if the data come from two independent random samples or if the data come from a randomized experiment.
  • The success-failure condition holds for both groups, where we check successes and failures in each group separately.
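
The success-failure check for two groups can be sketched as follows. Independence cannot be verified from the counts alone; it depends on how the data were collected. The function names and counts here are hypothetical, and the threshold of 10 follows the condition stated earlier.

```python
def success_failure_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """True when one group has at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

def both_groups_ok(x1: int, n1: int, x2: int, n2: int) -> bool:
    """Check the success-failure condition in each group separately."""
    return success_failure_ok(x1, n1) and success_failure_ok(x2, n2)

# Hypothetical counts (not from the lecture).
print(both_groups_ok(30, 50, 40, 80))  # both groups pass the check
print(both_groups_ok(30, 50, 4, 80))   # group 2 has only 4 successes
```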

When these conditions are met, the standard error of \(\hat{p_1} - \hat{p_2}\) is

\[ SE = \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}} \]

where \(\hat{p_1}\) and \(\hat{p_2}\) represent the sample proportions, and \(n_1\) and \(n_2\) represent the sample sizes.
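
The formula adds the two per-group variances before taking the square root. A minimal Python sketch, with a hypothetical function name and made-up inputs:

```python
import math

def se_diff_proportions(p1_hat: float, n1: int, p2_hat: float, n2: int) -> float:
    """Standard error of p1_hat - p2_hat for two independent samples."""
    var1 = p1_hat * (1 - p1_hat) / n1  # variance term for group 1
    var2 = p2_hat * (1 - p2_hat) / n2  # variance term for group 2
    return math.sqrt(var1 + var2)

# Hypothetical inputs (not from the lecture).
print(round(se_diff_proportions(0.6, 50, 0.5, 80), 4))
```

Note that the variances are added even though the proportions are subtracted: the uncertainty in each sample contributes to the uncertainty in the difference.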

An Example - I

Scientists predict that global warming may have big effects on the polar regions within the next 100 years. One of the possible effects is that the northern ice cap may completely melt.

Would this bother you a great deal, some, a little, or not at all if it actually happened?

  1. A great deal
  2. Some
  3. A little
  4. Not at all

An Example - II

The General Social Survey (GSS) asks the same question. Below are the distributions of responses from the 2010 GSS and from a group of introductory statistics students at Duke University:

  • Parameter of interest: Difference between the proportions of all Duke students and all Americans who would be bothered a great deal by the northern ice cap completely melting.

  • Point estimate: Difference between the proportions of sampled Duke students and sampled Americans who would be bothered a great deal by the northern ice cap completely melting.

An Example - CI

Construct a 95% confidence interval for the difference between the proportions of Duke students and Americans who would be bothered a great deal by the melting of the northern ice cap (\(\pi_{Duke} - \pi_{US}\)).

  • Check Conditions/Assumptions
  • Calculate the standard error of the difference in sample proportions
  • Calculate the 95% confidence interval
    • 95% CI = (-0.108, 0.086)
  • I’ll leave the calculations as an exercise.
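
For checking your work on the exercise, the three steps can be sketched in Python. The interval takes the form \((\hat{p_1} - \hat{p_2}) \pm z^* \times SE\), assuming the multiplier is the standard normal critical value \(z^*\). The function name and the counts below are hypothetical; they are not the Duke or GSS data, so substitute the counts from the table to work the exercise.

```python
import math
from statistics import NormalDist

def ci_diff_proportions(x1: int, n1: int, x2: int, n2: int,
                        level: float = 0.95) -> tuple[float, float]:
    """Normal-based confidence interval for pi_1 - pi_2."""
    # Step 1: success-failure condition, checked in each group separately.
    for x, n in ((x1, n1), (x2, n2)):
        if x < 10 or n - x < 10:
            raise ValueError("success-failure condition not met")
    # Step 2: standard error of the difference in sample proportions.
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Step 3: point estimate plus or minus z* times the standard error.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: 30 of 50 in group 1, 40 of 80 in group 2.
lower, upper = ci_diff_proportions(30, 50, 40, 80)
print(round(lower, 3), round(upper, 3))
```

If the resulting interval contains 0, the data are consistent with no difference between the two population proportions at that confidence level.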