STAT 218 - Week 3, Lecture 4
In this lecture, I will introduce more concepts in data visualization and the anatomy of the ggplot() function.
Please DO NOT CODE with me, just watch the demo to understand the basics.
After that, you will have a chance to try out some of this code with your group members.
I will use one dataset in my slides, but you are going to use a different dataset in your lab assignment.
Data visualizations
- are graphical representations of data
- use different colors, shapes, and the coordinate system to summarize data
- can tell a story or can be useful for exploring data
(A quick note: I used some of Dr Dogucu’s materials in this class because I love them!)
OR
- We could tell R something like: put smoke on the x-axis and the count of smoke on the y-axis.
These ideas are all correct, but some are not necessary in R.
R will do some of these steps by default.
We need to learn the variables before proceeding.
- case: id number
- bwt: birth weight, in ounces
- gestation: length of gestation, in days
- parity: binary indicator for a first pregnancy (0 = first pregnancy)
- age: mother’s age in years
- height: mother’s height in inches
- weight: mother’s weight in pounds
- smoke: binary indicator for whether the mother smokes
Pick data
Map data onto aesthetics
Add the geometric layer
Let’s use the smoke variable in the babies dataset, which is a categorical variable indicating whether the mother smokes or not.
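The three steps above can be sketched with smoke as follows. This is only an illustration: the small babies data frame is made up so the code runs on its own, and the 0/1 coding of smoke is an assumption.

```r
library(ggplot2)

# Made-up stand-in for the lecture's babies data (illustrative values only).
# Assumed coding: 0 = mother does not smoke, 1 = mother smokes.
babies <- data.frame(
  smoke = factor(c(0, 1, 0, 0, 1, 0))
)

# Step 1: pick the data. On its own this draws an empty canvas.
p1 <- ggplot(data = babies)

# Step 2: map smoke onto the x-axis aesthetic. The axis appears, but no bars yet.
p2 <- ggplot(data = babies, aes(x = smoke))

# Step 3: add the geometric layer. geom_bar() counts the rows in each
# category by default, so the count is never typed in by hand.
p3 <- ggplot(data = babies, aes(x = smoke)) +
  geom_bar()

# Typing p3 (or print(p3)) in the console draws the plot.
```

Storing each step in its own object makes it easy to display p1, p2, and p3 one after another and see what each step adds.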
Let’s use the bwt variable, which is a numeric variable indicating birth weight in ounces.
Choose your own color
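For a numeric variable such as bwt, one common choice is a histogram. The sketch below assumes that is the plot intended here; the data values, the binwidth, and the color names are all made up for illustration.

```r
library(ggplot2)

# Made-up birth weights in ounces (illustrative values, not the lecture data).
babies <- data.frame(
  bwt = c(95, 102, 110, 118, 120, 123, 128, 135, 141, 150)
)

# Same three steps: data, aesthetic mapping, geometric layer.
# binwidth = 10 is an arbitrary choice for this sketch.
p_hist <- ggplot(babies, aes(x = bwt)) +
  geom_histogram(binwidth = 10)

# Choosing your own color: fill paints the bars, color paints their outlines.
# These are fixed settings rather than mappings, so they go outside aes().
p_color <- ggplot(babies, aes(x = bwt)) +
  geom_histogram(binwidth = 10, fill = "steelblue", color = "white")

# Typing p_hist or p_color in the console draws the plot.
```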
We map bwt to the x-axis. We are visualizing a single numerical and a single categorical variable by using geom_boxplot().
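A minimal sketch of such a boxplot is below. Putting smoke on the y-axis is an assumption of this example, and the data values are made up.

```r
library(ggplot2)

# Made-up stand-in for the babies data (illustrative values only).
babies <- data.frame(
  bwt   = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke = factor(c(0, 0, 1, 1, 0, 0, 1, 1))
)

# bwt (numeric) goes on the x-axis. Mapping smoke (categorical) to the y-axis
# is an assumption here; it gives one horizontal box per smoking group.
p_box <- ggplot(babies, aes(x = bwt, y = smoke)) +
  geom_boxplot()

# Typing p_box in the console draws the plot.
```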
We colored the continuous variables by smoke.
We used different shapes for the continuous variables by smoke.
Now, we apply both different shapes and different colors.
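As a sketch of using both color and shape, the example below plots two continuous variables as points. The choice of gestation and bwt as the pair is an assumption, and the data values are made up.

```r
library(ggplot2)

# Made-up stand-in for the babies data (illustrative values only).
babies <- data.frame(
  gestation = c(284, 282, 279, 244, 245, 289, 299, 351),
  bwt       = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke     = factor(c(0, 0, 1, 1, 0, 0, 1, 1))
)

# color and shape are mapped to smoke inside aes(), so each smoking group
# gets its own color and its own point shape.
# shape needs a categorical variable, which is why smoke is stored as a factor.
p_points <- ggplot(babies, aes(x = gestation, y = bwt,
                               color = smoke, shape = smoke)) +
  geom_point()

# Typing p_points in the console draws the plot.
```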
Let’s use the labs() function to make the plot more readable.
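Here is a small sketch of labs() on a scatterplot. The label text, the choice of variables, and the data values are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for the babies data (illustrative values only).
babies <- data.frame(
  gestation = c(284, 282, 279, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138),
  smoke     = factor(c(0, 0, 1, 1, 0, 1))
)

# labs() is added as one more layer. Each argument names the part of the
# plot it relabels: title, x, y, and the color legend.
p_labs <- ggplot(babies, aes(x = gestation, y = bwt, color = smoke)) +
  geom_point() +
  labs(
    title = "Birth weight by length of gestation",
    x     = "Gestation (days)",
    y     = "Birth weight (ounces)",
    color = "Mother smokes"
  )

# Typing p_labs in the console draws the plot.
```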
We added another layer called theme_bw(). This function controls the background, the size of the text, etc.
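A sketch of adding theme_bw() is below. The base_size value is an arbitrary choice to show how the text size can be changed, and the data values are made up.

```r
library(ggplot2)

# Made-up stand-in for the babies data (illustrative values only).
babies <- data.frame(
  gestation = c(284, 282, 279, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138)
)

# theme_bw() swaps the default grey background for a white one with a border.
# base_size sets the base font size in points; 14 is just an example value.
p_theme <- ggplot(babies, aes(x = gestation, y = bwt)) +
  geom_point() +
  theme_bw(base_size = 14)

# Typing p_theme in the console draws the plot.
```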
Now, we elaborate on this code a little more and omit the NA values.
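One possible way to omit NA values is to filter the rows before plotting. The sketch below uses base R for this, which is an assumption; the slides may use a different approach. The data values are made up.

```r
library(ggplot2)

# Made-up stand-in for the babies data, with two missing smoke values.
babies <- data.frame(
  gestation = c(284, 282, 279, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138),
  smoke     = factor(c(0, 0, 1, NA, 0, NA))
)

# Keep only the rows where smoke is not missing.
# Without this step, the missing values show up as their own NA group
# in the color legend.
babies_complete <- babies[!is.na(babies$smoke), ]

p_no_na <- ggplot(babies_complete, aes(x = gestation, y = bwt, color = smoke)) +
  geom_point() +
  theme_bw()

# Typing p_no_na in the console draws the plot.
```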
Lab Assignment 2 is in Canvas (Module 3 - Lab 2- Submissions)
Just go back and follow this slide show to find the necessary code. Please copy and paste it instead of coding from scratch.
Call me over if you have any questions.