1) Continue to explore and practice graphing with ggplot
2) Continue to explore and practice website setup and styling with GitHub Pages
3) Integrate 1) and 2) to publish your data visualization on a website
gapminder dataset with ggplot (50 mins)

Today, we’ll use ggplot to visually explore global trends in public health and economics compiled by the Gapminder project. This project was pioneered by Hans Rosling, who is famous for describing the prosperity of nations over time through famines, wars, and other historic events with a beautiful data visualization in his 2006 TED Talk, The best stats you’ve ever seen.
Please create a new GitHub repo in your personal account named gapminder, clone the repo to your computer, and work on your data exploration in this new repo.
Open an .Rmd template file (File -> New File -> R Markdown…). Delete the boilerplate text under the setup chunk (you can keep that chunk) and make four level 2 headers: Data, Life expectancy, Fertility, and Infant mortality.
Under the Data header, add a short description of the dataset we’re using today (you can copy that from our description above).
Today, we will work with a subset of the gapminder dataset provided in the R package dslabs.
Let’s start by installing the dslabs package so we can access the data. After installing the package, we need to load it with the library() function. We also need to load the tidyverse package because it contains ggplot.
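# install.packages("dslabs")  # run once if the package is not installed yet
library(dslabs)     # provides the gapminder dataset
library(tidyverse)  # provides ggplot and the data-wrangling functions used below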
Let’s start by exploring the data. You might, e.g., want to use functions like View(), dim(), colnames(), and ?. You will see that the dataset includes the following variables: country, year, infant_mortality, life_expectancy, fertility, population, gdp, continent, and region.
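As a minimal sketch of this first look at the data, the calls below use the functions named above plus glimpse() and range(), which are additions not mentioned in the lab text. View() and ? are commented out because they only make sense in an interactive RStudio session.

```r
library(dslabs)     # provides the gapminder data frame
library(tidyverse)  # provides glimpse()

dim(gapminder)         # number of rows and number of columns
colnames(gapminder)    # names of the variables
glimpse(gapminder)     # type and first few values of each variable
range(gapminder$year)  # first and last year covered by the dataset

# Interactive only:
# View(gapminder)      # opens the data in a spreadsheet-style viewer
# ?gapminder           # opens the help page describing each variable
```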
The dataset includes data from 1960-2016. Since we’re just getting started with ggplot, we’ll only work with the 2011 data today (the most recent year for which the national GDPs are included in this dataset). Later in the course, we’ll return to the full dataset.
To subset the data, copy and run the following code. We’ll discuss data subsetting in class next week, so don’t worry about the notation for now.
gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)
This creates the gap2011 dataframe that you’ll be working with for the rest of the day. Explore its dimensions and variables.
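One possible way to do this exploration is sketched below. The per-column count of missing values is an extra step that the lab does not ask for; it is included here only because missing values affect how many points ggplot can draw.

```r
library(dslabs)
library(tidyverse)

# Same subsetting step as in the lab text
gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

dim(gap2011)       # one row per country for the single year 2011
colnames(gap2011)  # same variables as the full dataset

# Count the missing values in each column
gap2011 %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
```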
In breakout rooms, go and explore patterns in the data with ggplot.
First, under your Life expectancy header, add some text and code chunks to plot patterns in the life_expectancy variable. Remember to use gap2011 as your data.
Some ideas to explore:
You can look back to our lecture 6 notes or the RStudio ggplot cheatsheet for inspiration.
Here’s an example plot:
ggplot(data = gap2011) +
  geom_point(mapping = aes(x = gdp, y = life_expectancy))
## Warning: Removed 17 rows containing missing values (geom_point).
# Can we add more information to this plot?
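One possible answer to that question, offered as a sketch rather than the intended solution: map continent to color and population to point size, and put GDP on a log scale so that the many low-GDP countries are not squeezed into the left edge. The choice of aesthetics, the alpha value, and the axis labels are assumptions made for illustration.

```r
library(dslabs)
library(tidyverse)

gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

p_life <- ggplot(data = gap2011) +
  geom_point(
    mapping = aes(x = gdp, y = life_expectancy,
                  color = continent, size = population),
    alpha = 0.6  # semi-transparent points so overlapping countries stay visible
  ) +
  scale_x_log10() +  # spread out the countries with small GDP
  labs(x = "GDP (log scale)", y = "Life expectancy (years)",
       color = "Continent", size = "Population")

p_life  # countries with missing gdp are dropped with a warning, as before
```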
After you’ve done some exploration of life expectancy, move on to add some plots and text under your Fertility header.
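As a starting point for the Fertility section, here is a minimal sketch that compares the distribution of fertility (average number of children per woman) across continents with a boxplot. The choice of geom and grouping variable is only a suggestion; other geoms may show the pattern better.

```r
library(dslabs)
library(tidyverse)

gap2011 <- gapminder %>%
  as_tibble() %>%
  filter(year == 2011)

p_fert <- ggplot(data = gap2011) +
  geom_boxplot(mapping = aes(x = continent, y = fertility)) +
  labs(x = "Continent", y = "Fertility (children per woman)")

p_fert
```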
For this exercise, we will create three types of breakout rooms: interactive, quiet, and solitary.
You will be able to choose your own rooms, but let’s limit the group size to 4, so pick a different room if one already has this many participants.
Share your findings, challenges, and questions with the class.
For this exercise, you will build a GitHub Pages website as described in Lecture 5 and display your gapminder data visualizations on it. Each of you will build your own website, so there is no need to invite a collaborator. Just make sure your repo is public so that the site can be built.
You can split your RMarkdown file into separate files, so each section (i.e. data, life expectancy, fertility, infant mortality) becomes a separate page and gets its own tab. You can, e.g., split your content into files named about.Rmd, life_expectancy.Rmd, fertility.Rmd, and infant_mortality.Rmd, and then add those as tabs in a _site.yml file, as described in the lecture notes.
You can then consider adding a table of contents and changing the styling (theme) of your website.
Remember that it may take a little while for your website to update after you have pushed your changes to GitHub, but you can always check the current build (after running rmarkdown::render_site()) in your Viewer pane in RStudio.
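To make the pieces concrete, the sketch below builds a tiny two-page site in a temporary folder and renders it locally. Everything specific in it is an assumption for illustration: the temporary folder, the placeholder page contents, the docs output folder, the cosmo theme, and the navbar entries. In your own repo you would write _site.yml and the .Rmd files by hand and run rmarkdown::render_site() from the project folder. Rendering requires pandoc, which ships with RStudio.

```r
library(rmarkdown)

# Hypothetical throwaway folder, so the sketch does not touch your real repo
site_dir <- file.path(tempdir(), "gapminder_site")
dir.create(site_dir, showWarnings = FALSE)

# _site.yml: site name, output folder, navigation tabs, and page styling
writeLines(c(
  'name: "gapminder"',
  'output_dir: "docs"',
  'navbar:',
  '  title: "Gapminder"',
  '  left:',
  '    - text: "Home"',
  '      href: index.html',
  '    - text: "Life expectancy"',
  '      href: life_expectancy.html',
  'output:',
  '  html_document:',
  '    theme: cosmo',  # assumed theme; any html_document theme works here
  '    toc: true'      # adds a table of contents to each page
), file.path(site_dir, "_site.yml"))

# Placeholder pages; render_site() expects an index.Rmd (or index.md) home page
writeLines(c("---", 'title: "Home"', "---", "", "About the data."),
           file.path(site_dir, "index.Rmd"))
writeLines(c("---", 'title: "Life expectancy"', "---", "",
             "## Plots", "", "Plots go here."),
           file.path(site_dir, "life_expectancy.Rmd"))

# Render every page of the site, then list the HTML files that were built
render_site(input = site_dir, quiet = TRUE)
list.files(file.path(site_dir, "docs"), pattern = "\\.html$")
```

Each additional page, such as fertility.Rmd or infant_mortality.Rmd, would get its own text/href pair under left: in the same way.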
END LAB 3