Sampling in Statistics

STAT 218

Today’s Outline

Late Submission Policy
Investigation 1 Highlights
Lab 0 Instructions
Types of Sampling in Statistics

Late Submission Policy

I do not want class deadlines to cause you extreme stress or anxiety.

I offer 3 “grace days”: days on which you can turn in an assignment late without a penalty.
These can be used ONLY on weekly assignments, lab assignments and investigation assignments (a single group member must be willing to use one of their grace days for the entire group), but NOT exams or the midterm group project.
These “grace days” can be used all at once on a single assignment or used on separate assignments throughout the quarter. Simply add a comment on the assignment in Canvas BEFORE THE DEADLINE.
Once your “grace days” are used up, a 10% grade reduction will be applied for each day that the assignment is overdue.
Late submissions will not be accepted after one week from the original due date.
Resubmitting assignments is not allowed.

Installing R for STAT 218

Install R and RStudio

Install R: Yes, you do need to download and install R even if you have downloaded it before. There is a newer version, and it is free.
- Link to install R.
Install RStudio: Yes, you do need to download and install RStudio even if you have downloaded it before. There is a newer version. Download the free Desktop version.
- Link to install RStudio Desktop.
After installation, try the following test, and contact me if you need help.

Is my RStudio working?

If it is not working…

In very rare situations, it is not working.
OR you want to bring your iPad instead of your laptop.
In these cases, please use the online version of RStudio (again, it is free, but you have to sign up with your email address).
- The link is here

A Quick Snapshot

Investigation 1 Glossary and Highlights

Glossary

The terms that we saw in Investigation 1 were:

Data
Data Frame
Observational Unit (case, study subject)
Variable
- Categorical (sometimes binary if it has 2 sub-groups)
- Quantitative
Anecdotal Evidence
Observational Study & Experimental Study
- Empirical evidence is collected in these studies

Types of Evidence

Anecdotal Evidence

An anecdote is a concise story or illustration of a captivating event, as illustrated in our cases.

Important

Be cautious when handling data collected in a haphazard manner.
While such evidence may be authentic and verifiable, it often represents exceptional cases rather than forming a reliable basis for general conclusions.

Observational Studies

In an observational study, the researcher systematically collects data from subjects as an observer, without manipulating conditions.

Important

The Presence of Confounding Variables: Observational studies may lead to misinterpretations due to the presence of confounding variables.
The context in which data are collected is crucial in statistics. It alerts us to potential effects of other factors.
Data analysis without reference to context is considered meaningless.

Experimental Studies

Light and exam performance. A study was designed to test the effect of light levels on the exam performance of students. The treatments included fluorescent overhead lighting, yellow overhead lighting, and no overhead lighting (only desk lamps). The researchers randomly assigned students to each light level and found a discernible difference in exam performance based on the varying light levels.

Experimental Studies

The design of this experiment allows us to investigate the relationship between two variables:

light level and exam performance.
In this scenario, researchers applied the conditions (specifically, different light levels) to the subjects, which were Homo sapiens.
By randomly assigning treatments to the subjects, we can address the issue of confounding that complicates observational studies, thereby expanding the scope of conclusions we can draw from the research.
Randomized Experiments as the Pinnacle in Scientific Inquiry: Randomized experiments are often regarded as the pinnacle in scientific investigation due to their ability to overcome confounding.
- However, it’s crucial to acknowledge that they are not without their own set of challenges.
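
Random assignment like the one described in the light-level study can be sketched in a few lines. This is only an illustrative Python sketch, not the researchers' actual procedure: the function name `randomly_assign`, the 30 student labels, and the seed are all hypothetical.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the researcher, decides the order
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        treatment = treatments[position % len(treatments)]
        groups[treatment].append(subject)
    return groups

# Hypothetical subjects; the three treatments come from the light-level example.
students = [f"student_{i:02d}" for i in range(1, 31)]
treatments = ["fluorescent", "yellow", "desk lamps only"]
groups = randomly_assign(students, treatments, seed=218)

for treatment, members in groups.items():
    print(treatment, len(members))
```

Because chance alone decides who receives which light level, characteristics such as prior grades or sleep habits tend to balance out across the groups instead of lining up with the treatment.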

Experimental Studies

Randomized experiments are generally built on four principles.

Controlling
Randomization
Replication
Blocking

Reducing Bias in Experimental Studies

We can reduce bias in experimental studies by employing:

Treatment/Control Group
- Placebo Group
- Blinding / Double Blinding

Placebo

Placebos are commonly administered to human subjects in experiments, often in the form of an inert substance like a sugar pill.
The well-documented placebo response illustrates that individuals frequently exhibit positive reactions to any treatment, even when it lacks active ingredients.
In many cases, a placebo leads to a subtle yet genuine improvement in patients, a phenomenon known as the placebo effect.
- However, when implementing a placebo control, it is crucial for subjects to remain unaware of their group assignment—whether they are receiving the active treatment or the inert placebo.

Blinding/Double-Blinding

If researchers keep patients unaware of their treatment, the study is termed blind.
When both researchers and patients remain unaware of which individuals are in which treatment group, it is referred to as double-blind.

Sampling in Statistics

Population and Sample

population: consists of all subjects/participants/observational units of interest (e.g., all squirrels at Cal Poly).
sample: a subset of a population with size n.
Generally, we would like to estimate something or make inferences about something that we want to know by selecting a sample from the population of interest.
e.g., a sample of eighteen (n = 18) squirrels living at Cal Poly.

Simple Random Sample

If a sample is determined through simple random sampling, it means that

Every item, subject, specimen, or observational unit in the population has an equal chance of being selected for the sample.
The members of the sample are chosen independently of each other.

How to Choose A Random Sample

To be able to gain benefit from employing randomness, we generally use tools to eliminate bias.

Here are the steps for choosing a random sample of n observational units from a population of interest.

Determine the sampling frame: Give a unique ID number for each member (e.g., from 01 to 50).
Start reading numbers from a random digits table, or generate them with a computer.
Ignore any number that is not present in your population (e.g., 72)
Ignore any repeated occurrence of the same number.
Continue until the intended sample size is reached.
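
The steps above can be sketched with a computer standing in for the random digits table. This is a minimal Python sketch under stated assumptions: the function name `simple_random_sample` is hypothetical, IDs run from 01 up to the population size, and two-digit numbers (00 to 99) are read one at a time.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Pick n distinct IDs from 1..population_size by reading two-digit numbers."""
    if not 1 <= population_size <= 99:
        raise ValueError("two-digit IDs only cover populations of 1 to 99")
    if n > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        candidate = rng.randint(0, 99)  # read the next two-digit number (00-99)
        if candidate < 1 or candidate > population_size:
            continue  # ignore numbers not present in the population (e.g., 72)
        if candidate in chosen:
            continue  # ignore repeated occurrences of the same number
        chosen.append(candidate)
    return chosen

sample_ids = simple_random_sample(population_size=50, n=5, seed=218)
print([f"{i:02d}" for i in sample_ids])
```

In practice, a single call such as `random.sample(range(1, 51), 5)` does the same job; the loop is written out only to mirror the manual steps.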

Employing Randomness

Let’s play with our applet

Practical Concerns

In many cases, choosing and implementing a simple random sample is either difficult or impossible.
- What should we do, then?
- How can we label ALL the squirrels at Cal Poly?

Nonsimple Random Sampling Methods

Of course, there are other random sampling options that are not simple. Two of them are:
- Random cluster sampling: Remember that we gave a unique ID number for each member (e.g., from 01 to 50)
  - In random cluster sampling, unique ID numbers are assigned to clusters or groups of observational units (e.g., from 01 to 50), some of the clusters are selected at random, and then all the observational units in the selected clusters are recruited for the study.
- Stratified random sampling: Our population of interest may consist of various strata (e.g., age groups, biological sex, geographical region, grade levels of students).
  - After determining the strata, a simple random sample is drawn from each stratum, and these samples together form a representative sample of the population of interest.
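
The difference between the two methods may be easier to see side by side. The Python sketch below is illustrative only: the function names, the grade-level strata, and the classroom clusters are all hypothetical data made up for the example.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters, then keep ALL units in each chosen one."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen_ids}

# Hypothetical strata: students grouped by grade level, 20 students each.
strata = {
    grade: [f"{grade}_{i:02d}" for i in range(1, 21)]
    for grade in ["first_year", "sophomore", "junior", "senior"]
}
# Hypothetical clusters: 10 classrooms of 8 students each.
clusters = {
    f"room_{c:02d}": [f"room_{c:02d}_student_{i}" for i in range(1, 9)]
    for c in range(1, 11)
}

by_stratum = stratified_sample(strata, n_per_stratum=3, seed=218)
by_cluster = cluster_sample(clusters, n_clusters=2, seed=218)
print({name: len(units) for name, units in by_stratum.items()})
print({room: len(units) for room, units in by_cluster.items()})
```

Stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.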

Stratified vs Cluster Sampling

Stratified Random Sampling

Dan Kernler, CC BY-SA 4.0, via Wikimedia Commons

Random Cluster Sampling

Dan Kernler, CC BY-SA 4.0, via Wikimedia Commons

Sampling Error vs Nonsampling Errors

Sampling Error: The discrepancy between a sample result and the corresponding value in the population of interest.
- Quantifying this error is crucial; doing so is part of what makes statistics one of the backbones of scientific thinking.
- There will always be some sampling error, BUT
  - if we use nonrandom sampling techniques, the sampling error becomes unpredictable (sampling bias).
Nonsampling Error: An error caused by factors other than sampling.
- wording of the questions in a questionnaire
- nonresponse bias: e.g., bias that occurs because participants do not respond to some of the questions or do not return a written survey.
- missing data: e.g., experimental living organisms may die during the experiment, or human subjects may fail to participate in some or all of the sessions of a treatment group.
- …