STAT 218 - Week 4, Lecture 3
On Monday, we explored a method for estimating a confidence interval for a long-run proportion or population proportion.
Yesterday, we saw that there are also other ways to approximate a confidence interval.
The theory-based approach to generating confidence intervals allows us to easily use any confidence level we would like, not just 95%.
# CI Methods for a Single Mean
In previous chapters, we were focusing on inferences about a population proportion (categorical variable).
Now, we will focus on data consisting of a single quantitative variable.
We will make inferences about a population mean by creating confidence intervals.
The 2 SD approach should be reasonable as long as the sampling distribution of the sample mean is reasonably symmetric.
The general form of a confidence interval is
\[ \text{statistic} \pm \text{multiplier} \times (\text{SD of statistic}) \]
Here, our statistic will be the sample mean and the multiplier will be 2.
\[ \bar{x} \pm 2 \times \frac{s}{\sqrt{n}} \]
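As a rough illustration of the 2 SD interval, here is a minimal Python sketch. The sample values are made up for demonstration, and the function name `two_sd_interval` is hypothetical rather than anything from the course materials.

```python
import math
import statistics


def two_sd_interval(sample):
    """Return (lower, upper) for the interval x-bar +/- 2 * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    margin = 2 * s / math.sqrt(n)     # multiplier of 2 times the SD of the statistic
    return x_bar - margin, x_bar + margin


# Hypothetical measurements, used only to show the call.
example_sample = [31.2, 35.0, 29.8, 33.4, 32.1, 34.7, 30.9, 33.3]
lower, upper = two_sd_interval(example_sample)
print(f"2 SD interval: ({lower:.2f}, {upper:.2f})")
```

The interval is always centered at the sample mean; only its half-width depends on \(s\) and \(n\).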
To distinguish a quantitative variable from a categorical variable, we use different symbols to show population parameters and sample statistics.
Parameters
\(\mu\) = population mean
\(\sigma\) = population standard deviation
Statistics
\(\bar{x}\) = sample mean
\(s\) = sample standard deviation
The \(t\)-distribution is another bell-shaped, symmetric distribution. It is useful when the population standard deviation \(\sigma\) is unknown and has to be estimated by \(s\).
The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom.
Broadly speaking, we use the \(t\)-distribution with \(df = n - 1\).
Comparing the \(t\)-distribution with the standard normal distribution: both are symmetric and bell-shaped, but the \(t\)-distribution has a larger standard deviation.
The \(t\)-distribution has a single parameter: degrees of freedom.
The normal distribution has two parameters: \(\mu\) and \(\sigma\) (the standard normal fixes \(\mu = 0\) and \(\sigma = 1\)).
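To make the "larger standard deviation" claim concrete, the sketch below uses a standard result that is not stated in the lecture: for \(df > 2\), the standard deviation of a \(t\)-distribution is \(\sqrt{df/(df-2)}\). The function name `t_standard_deviation` is only illustrative.

```python
import math


def t_standard_deviation(df):
    """SD of a t-distribution with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("the standard deviation is only finite for df > 2")
    return math.sqrt(df / (df - 2))


# Compare against the standard normal, whose SD is exactly 1.
for df in (3, 13, 30, 100):
    print(f"df = {df:>3}: SD of t = {t_standard_deviation(df):.3f} (standard normal SD = 1)")
```

The values are always above 1 and shrink toward 1 as the degrees of freedom grow, which is why the \(t\)-distribution looks more and more like the standard normal for large samples.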
Let’s have a look at the wing areas of 14 male Monarch butterflies at Oceano Dunes State Park in California.
Suppose we consider these 14 observations as a random sample from a population.
We should be aware of the fact that these estimates are subject to sampling error.
Warning
Our goal is to estimate \(\mu\).
Tip
If \(Z\) is a standard normal random variable, then the probability that \(Z\) is between \(-2\) and \(2\) is about 0.95 (or 95%, if we remember the 68/95/99.7 rule).
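The 68/95/99.7 rule can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def prob_within(k):
    """P(-k < Z < k) for a standard normal random variable Z."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"P(-{k} < Z < {k}) = {prob_within(k):.4f}")
```

For \(k = 2\) the probability is about 0.9545, slightly more than 95%, which is why a multiplier of 2 gives an approximate 95% interval.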
To understand how to calculate confidence intervals, we need to have
The standard error is the estimated standard deviation of the sampling distribution of a statistic.
The standard error of the mean is calculated as follows:
\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \]
Returning to the 14 male Monarch butterflies, which we treat as a random sample from a population: with \(df = n - 1 = 13\), the multiplier for a 95% confidence interval is given as
\[ \text{multiplier} = 2.160 \]
A 95% confidence interval (CI) for \(\mu\) can be calculated as follows:
\[ \begin{aligned} 95\% \ \text{CI} &= \bar{x} \pm \text{multiplier} \times SE_{\bar{x}} \\ &= 32.8143 \pm 2.160 \times \frac{2.4757}{\sqrt{14}} \\ &= 32.81 \pm 1.43 \end{aligned} \]
\[ 31.39 \ cm^2 < \mu < 34.24 \ cm^2 \]
A 90% confidence interval (CI) for \(\mu\) can be calculated as follows (multiplier: 1.771):
\[ \begin{aligned} 90\% \ \text{CI} &= \bar{x} \pm \text{multiplier} \times SE_{\bar{x}} \\ &= 32.8143 \pm 1.771 \times \frac{2.4757}{\sqrt{14}} \\ &= 32.81 \pm 1.17 \end{aligned} \]
\[ 31.64 \ cm^2 < \mu < 33.98 \ cm^2 \]
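The two interval calculations can be reproduced with a few lines of Python. This sketch takes the summary values (32.8143, 2.4757, \(n = 14\)) and the multipliers (2.160 and 1.771) directly from the lecture; the function name `t_interval` is hypothetical.

```python
import math


def t_interval(x_bar, s, n, multiplier):
    """Return (lower, upper) for x-bar +/- multiplier * s / sqrt(n)."""
    se = s / math.sqrt(n)        # standard error of the mean
    margin = multiplier * se     # half-width of the interval
    return x_bar - margin, x_bar + margin


# Summary statistics for the 14 butterflies, as given in the lecture.
X_BAR, S, N = 32.8143, 2.4757, 14

ci_95 = t_interval(X_BAR, S, N, multiplier=2.160)  # t multiplier for 95%, df = 13
ci_90 = t_interval(X_BAR, S, N, multiplier=1.771)  # t multiplier for 90%, df = 13

print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"90% CI: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
```

Both intervals share the same center and standard error; only the multiplier changes, so the 90% interval is narrower than the 95% interval.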
What is the difference between 90% CI and 95% CI?
To help you visualize this, imagine we have a population, and from that population, we randomly select a group of 20 observational units.
95% CI = (-44.47, 20.13)
If we repeat this process 100 times, creating 100 different samples of 20 observational units each, we would end up with 100 different samples drawn from the population.
If we calculate confidence intervals for each of these 100 samples, we will find that around 95% of these intervals capture the true population mean.
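The repeated-sampling idea can be simulated. The sketch below assumes a normal population with a made-up mean and standard deviation (the lecture does not specify the population) and uses 2.093 as the 95% \(t\) multiplier for \(df = 19\). All names are illustrative.

```python
import math
import random
import statistics

SAMPLE_SIZE = 20            # observational units per sample, as in the lecture
N_SAMPLES = 100             # number of repeated samples, as in the lecture
T_MULTIPLIER_DF19 = 2.093   # 95% t multiplier for df = 20 - 1 = 19


def count_covering_intervals(mu, sigma, seed=218):
    """Count how many of the 95% intervals contain the true mean mu."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    covered = 0
    for _ in range(N_SAMPLES):
        sample = [rng.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
        x_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
        margin = T_MULTIPLIER_DF19 * se
        if x_bar - margin <= mu <= x_bar + margin:
            covered += 1
    return covered


# Hypothetical population parameters, chosen only for the demonstration.
covered = count_covering_intervals(mu=0.0, sigma=50.0)
print(f"{covered} of {N_SAMPLES} intervals captured the true mean")
```

The count will vary with the seed and will not be exactly 95 every time. The 95% describes the long-run behavior of the method, not any single batch of 100 intervals.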
We are 95% confident that the true population mean is in this confidence interval.
Recall that
\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \]
We can use this formula to determine our sample size as follows:
\[ \text{Desired SE} = \frac{\text{Guessed SD}}{\sqrt{n}} \]
Suppose the researcher is now planning a new study of Monarch butterflies at Oceano Dunes State Park in California and has decided that the SE should be no more than \(0.4 \ cm^2\).
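\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \]
\[ \text{Desired SE} = \frac{\text{Guessed SD}}{\sqrt{n}} \]
\[ \frac{2.48}{\sqrt{n}} \le 0.4 \quad \Rightarrow \quad \sqrt{n} \ge \frac{2.48}{0.4} = 6.2 \quad \Rightarrow \quad n \ge 38.44 \]
So the researcher should sample at least 39 butterflies.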
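The sample-size calculation can be written as a small helper. This is a sketch that solves \(\text{Guessed SD} / \sqrt{n} \le \text{Desired SE}\) for \(n\) and rounds up; the function name `required_sample_size` is hypothetical.

```python
import math


def required_sample_size(guessed_sd, desired_se):
    """Smallest whole n with guessed_sd / sqrt(n) <= desired_se."""
    return math.ceil((guessed_sd / desired_se) ** 2)


# Values from the butterfly example: guessed SD 2.48, desired SE 0.4.
n_needed = required_sample_size(guessed_sd=2.48, desired_se=0.4)
print(f"At least {n_needed} butterflies are needed")
```

Rounding up matters here: 38 butterflies would leave the standard error slightly above 0.4.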