Introduction

One of the best features of R is its graphics capability. For flexibility and breadth, R graphics are unsurpassed (in my opinion). There are MANY functions in R for visualizing data, and plenty of examples online; a nice site that has tons of graphics (and syntax!) is: http://www.r-graph-gallery.com/.

Why Visualization matters…

Pictures are cool, no doubt. But, visualization of information is an ESSENTIAL part of your toolkit as a researcher, analyst, and/or rad R user. Visualizing information helps you tell the story and make the point to those who are “visual” learners.

For example, in a paper entitled “Solitary Confinement and the Well-Being of People in Prison”, Wright et al. examined the effects of solitary confinement on mental well-being at three time points over a 12-month period (called “baseline”, “6-Month”, and “12-Month”). A key feature of the study was tracking the movement of individuals through custody levels.

Take a look at the table below which shows a cross-tabulation for custody levels for individuals. That is, the cells represent the count of individuals who were at a particular custody level at baseline (the row) and their custody level at 6 months (the column):

Custody Level	Minimum (6M)	Medium (6M)	Close (6M)	Maximum (6M)	Attrite (6M)
Medium (BL)	6	81	3	3	17
Close (BL)	0	9	60	13	12
Maximum (BL)	0	0	14	99	9

What does this table tell you? In other words, what can you infer from the table?

It is a bit hard to follow. Now, what if we added the two other tables: 6-month to 12-month and baseline to 12-month. That would be a lot of information to digest and tough to get a sense of what was occurring.

A visualization would help. How could we better represent this information?

Take a look at this visualization.

This is a “Sankey” plot. It shows flows between entities (nodes), which are drawn as rectangles or text, with arrows or arcs representing the flows between them. It was created using the networkD3 package.
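
For readers who want to try this, here is a rough sketch of how the baseline-to-6-month table could be turned into a Sankey diagram with sankeyNetwork() from networkD3. This is not the code behind the figure above. The object names (tab, nodes, links) are made up for illustration, and it assumes, as described above, that rows are baseline custody levels and columns are 6-month custody levels.

```r
# Counts from the baseline (rows) to 6-month (columns) table.
tab <- matrix(
  c( 6, 81,  3,  3, 17,
     0,  9, 60, 13, 12,
     0,  0, 14, 99,  9 ),
  nrow=3, byrow=TRUE,
  dimnames=list(
    baseline=c( "Medium", "Close", "Maximum" ),
    month6=c( "Minimum", "Medium", "Close", "Maximum", "Attrite" )
  )
)

# One node per baseline level, followed by one node per 6-month level.
nodes <- data.frame(
  name=c( paste( rownames( tab ), "(BL)" ), paste( colnames( tab ), "(6M)" ) )
)

# One link per table cell. networkD3 expects zero-based node indices,
# so baseline nodes are 0 to 2 and 6-month nodes are 3 to 7.
# as.vector() reads the matrix column by column, hence the rep() pattern.
links <- data.frame(
  source=rep( 0:2, times=5 ),
  target=rep( 3:7, each=3 ),
  value=as.vector( tab )
)
links <- links[ links$value > 0, ] # drop empty flows.

# Build the diagram only if networkD3 is installed.
if( requireNamespace( "networkD3", quietly=TRUE ) ){
  sankey <- networkD3::sankeyNetwork(
    Links=links, Nodes=nodes,
    Source="source", Target="target", Value="value",
    NodeID="name", fontSize=12, nodeWidth=30
  )
}
```

Typing sankey at the console (or in an R Markdown chunk) displays the interactive diagram.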

The Point

Visualization is a great medium for conveying information. There are a LOT of tools in R for visualizing information. So, to get a sense of how it all works, let’s start with a basic function: plot().

The `plot()` Function

A very useful graphics function is the plot() function. In its simplest form, it draws a two-dimensional plot from two arguments giving the x and y coordinates.

Let’s create a simple plot:

x <- rnorm( 100, 0, 1 ) #create a vector with 100 elements drawn from a normal distribution.
y <- seq( 1, 10, length.out=length( x ) ) # create a vector 1:10 with same length.
plot( x, y ) #plot it.
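
Because rnorm() draws random values, the plot will look a little different every time the code is run. A minimal sketch of making it repeatable with set.seed() follows; the seed value 123 is an arbitrary choice for this illustration, not something from the workshop.

```r
set.seed( 123 ) # arbitrary seed so the same draws come back on every run.
x <- rnorm( 100, 0, 1 )
y <- seq( 1, 10, length.out=length( x ) )
plot( x, y )

# Re-setting the seed reproduces exactly the same draws.
set.seed( 123 )
x.again <- rnorm( 100, 0, 1 )
identical( x, x.again ) # TRUE
```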

This is a pretty simple plot. As mentioned, one of the nice features of R is the flexibility to create your plot. Take a look at the different arguments that we pass to plot() to modify it by examining the help: ?plot.

Overall, there are many different parameters we can modify in plot(). Let’s check out a few:

Type of plot

We can change the “type” of plot:

plot( x, y, type="l" ) #plot a line.
plot( x, y, type="p" ) #plot points.
plot( x, y, type="b" ) #plot both!
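
To compare the types side by side, one option is to draw them in a grid of panels. The sketch below also tries three further values that type= accepts ("o", "h", and "s"). The seed, the smaller sample of 20 points, and the name plot.types are choices made only for this illustration.

```r
set.seed( 1 ) # arbitrary seed, only so the example is repeatable.
x <- rnorm( 20, 0, 1 )
y <- seq( 1, 10, length.out=length( x ) )

# points, lines, both, overplotted, vertical lines, stair steps.
plot.types <- c( "p", "l", "b", "o", "h", "s" )

old.par <- par( mfrow=c( 2, 3 ) ) # 2 x 3 grid of panels; keep the old settings.
for( i in seq_along( plot.types ) ){
  plot( x, y, type=plot.types[i], main=paste0( 'type="', plot.types[i], '"' ) )
}
par( old.par ) # restore the previous layout.
```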

Often, when plotting multiple objects, we want to first set up the plot regions before adding anything. This is a plot of type “none”: plot( x,y, type="n" ).

plot( x, y,
   type="n",
   main="our sample plot", # plot a title.
   xlab="this is the x axis", # change the x label.
   ylab="this is the y axis" # change the y label.
  )

Characters

We can also change the “characters” of the plot:

plot( x, y, pch=1 ) #plot an open circle.
plot( x, y, pch=2 ) #plot a triangle.
plot( x, y, pch=3 ) #plot a +.
plot( x, y, pch=4 ) #plot an x.

The argument pch determines the shape of the plotted points. The numeric values 0 to 25 represent different default shapes. We can also use a single character (a number, letter, or symbol, e.g. pch="A") as the plotting shape.

Note that shapes 0 to 14 are open or line symbols, 15 to 20 are solid, and 21 to 25 are filled shapes whose border color is set by col= and whose fill color is set by the bg= argument.
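
A quick way to see all of the default shapes is to plot them on a grid and label each one with its pch value. This is only a sketch; the grid layout and the names pch.values, grid.x, and grid.y are made up for the illustration.

```r
# Lay the 26 default symbols out on a 7-column grid.
pch.values <- 0:25
grid.x <- pch.values %% 7        # column position, 0 to 6.
grid.y <- 3 - pch.values %/% 7   # row position, counting down from the top.

plot( grid.x, grid.y,
  pch=pch.values, cex=2,
  col="black", bg="lightblue", # bg= only affects shapes 21 to 25.
  xlim=c( -0.5, 6.5 ), ylim=c( -0.5, 3.5 ),
  axes=FALSE, xlab="", ylab="",
  main="Default plotting symbols (pch 0 to 25)"
)

# Label each symbol with its pch value, just below the symbol.
text( grid.x, grid.y, labels=pch.values, pos=1, offset=1 )
```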

Points, lines, text, etc.

Additionally, the points(), lines(), segments(), and text() functions are useful for adding information to plots.

Here is an example I use in my data analysis course to illustrate how the shape of a normal distribution changes with its standard deviation.

First, let’s set up our values:

y    <- seq( -15, 30, length.out=1000 ) # sequence from -15 to 30.
hx.1 <- dnorm( y, 0, 1 ) # densities for the plots.
hx.2 <- dnorm( y, 0, 2 )
hx.3 <- dnorm( y, 0, 3 )

Next, let’s set up the plot, but we don’t want to add anything yet (so we use type="n"):

plot( y, hx.1,
  xlab="", ylab="", # blank out the labels for x and y.
  type="n", #do not plot anything.
  main="Normal Distributions" # a title.
)

Now, illustrate the shape of the distributions using the lines() function (you can copy and paste one at a time to see them get added):

plot( y, hx.1,
  xlab="", ylab="", # blank out the labels for x and y.
  type="n", #do not plot anything.
  main="Normal Distributions" # a title.
)

lines( y, hx.1, col="blue", type="l", lwd=2 )
lines( y, hx.2, col="red", type="l", lwd=2 )
lines( y, hx.3, col="darkgreen", type="l", lwd=2 )

Now, add a line to show the central tendency by using the segments() function:

plot( y, hx.1,
  xlab="", ylab="", # blank out the labels for x and y.
  type="n", #do not plot anything.
  main="Normal Distributions" # a title.
)

lines( y, hx.1, col="blue", type="l", lwd=2 )
lines( y, hx.2, col="red", type="l", lwd=2 )
lines( y, hx.3, col="darkgreen", type="l", lwd=2 )

segments( 0, 0, 0, 0.5, col="black", lwd=2 )

Finally, add some text to show the values (note that we will use the text() function):

plot( y, hx.1,
  xlab="", ylab="", # blank out the labels for x and y.
  type="n", #do not plot anything.
  main="Normal Distributions" # a title.
)

lines( y, hx.1, col="blue", type="l", lwd=2 )
lines( y, hx.2, col="red", type="l", lwd=2 )
lines( y, hx.3, col="darkgreen", type="l", lwd=2 )

segments( 0, 0, 0, 0.5, col="black", lwd=2 )

text( 11, 0.35, "Mean = 0, SD = 1", col="blue", cex=1.5 )
text( 12, 0.15, "Mean = 0, SD = 2", col="red", cex=1.5 )
text( 13, 0.06, "Mean = 0, SD = 3", col="darkgreen", cex=1.5 )
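
The three lines() calls and the three text() calls differ only in the standard deviation, the color, and the label position, so the same figure can also be sketched with a loop. The names sds, cols, label.x, and label.y are made up for this illustration.

```r
y <- seq( -15, 30, length.out=1000 ) # sequence from -15 to 30.

sds     <- c( 1, 2, 3 )                     # standard deviations to draw.
cols    <- c( "blue", "red", "darkgreen" )  # one color per curve.
label.x <- c( 11, 12, 13 )                  # label positions from the example above.
label.y <- c( 0.35, 0.15, 0.06 )

# Empty plot sized to the tallest curve (SD = 1).
plot( y, dnorm( y, 0, 1 ), xlab="", ylab="", type="n", main="Normal Distributions" )

# Add one curve and one label per standard deviation.
for( i in seq_along( sds ) ){
  lines( y, dnorm( y, 0, sds[i] ), col=cols[i], lwd=2 )
  text( label.x[i], label.y[i], paste0( "Mean = 0, SD = ", sds[i] ), col=cols[i], cex=1.5 )
}

# Vertical line at the shared mean.
segments( 0, 0, 0, 0.5, col="black", lwd=2 )
```

Adding a fourth distribution then only requires extending the four vectors.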

The Layering Approach

As we have seen, we can start with a basic plot and add information. Creating graphics in this way is referred to as layering because we are stacking additional layers of elements on top of each other.

Consider the following plot:

As you can see, there are a number of elements that have been used to create this plot:

a basic plot layout using plot()
a set of points using points()
several segment lines using segments()
a title and axis labels using the main=, xlab=, and ylab= arguments in the function plot()
and some words using text().

Let’s go through and build this plot, layer by layer.

First, what are these data?

The data are yearly rates of family deaths recorded by a professor at Penn State. That is, the rate at which family deaths were reported to him prior to an exam, from 1960 to 1995.

Here is what the data look like in a table:

Year	Death Rate
1960	0.18
1965	0.20
1970	0.24
1975	0.30
1980	0.47
1985	0.61
1990	0.70
1995	0.90

Now, let’s move the data into objects to work with in R:

  x <- c( 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995 )
  y <- c( 0.18, 0.20, 0.24, 0.30, 0.47, 0.61, 0.70, 0.90 )

Next, let’s set up the plot and define the limits of the axes using xlim= and ylim=, define the title using main=, and set the axis labels using xlab= and ylab=:

# Set up the plot.    
plot(x, y,
     xlim=c( min( x ) - 5, max( x ) + 5 ), # set x axis limits.
     ylim=c( min( y ) - 0.5, max( y ) + 0.5 ), # same for y axis.
     main="Plot of Average Family Deaths by Year", # title.
     xlab="Year", # x axis label.
     ylab="Average Family Deaths", # same for y axis.
     type="n" # don't plot anything inside.
     )

Now, we can add the points to the plot using the points() function. We can customize the points using the pch=, col=, and bg= arguments.

plot(x, y,
     xlim=c( min( x ) - 5, max( x ) + 5 ), # set x axis limits.
     ylim=c( min( y ) - 0.5, max( y ) + 0.5 ), # same for y axis.
     main="Plot of Average Family Deaths by Year", # title.
     xlab="Year", # x axis label.
     ylab="Average Family Deaths", # same for y axis.
     type="n" # don't plot anything inside.
)

points( x, y, pch = 21, col = "red", bg = "lightblue" )

What additional information could we add to this plot that would aid in our understanding of the relationship between year and average family deaths? What about understanding how regression works?

Well, we could add the least-squares regression line to the plot using the abline() function and the lm() function:

plot(x, y,
     xlim=c( min( x ) - 5, max( x ) + 5 ), # set x axis limits.
     ylim=c( min( y ) - 0.5, max( y ) + 0.5 ), # same for y axis.
     main="Plot of Average Family Deaths by Year", # title.
     xlab="Year", # x axis label.
     ylab="Average Family Deaths", # same for y axis.
     type="n" # don't plot anything inside.
)

points( x, y, pch = 21, col = "red", bg = "lightblue" )

abline( lm( y ~ x ), lty=2 )
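
When abline() is handed a fitted lm object, it pulls the intercept and slope out of the model. A small sketch that makes this explicit is below; the object name fit is just for illustration.

```r
x <- c( 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995 )
y <- c( 0.18, 0.20, 0.24, 0.30, 0.47, 0.61, 0.70, 0.90 )

fit <- lm( y ~ x ) # least-squares fit of death rate on year.
coef( fit )        # the intercept and the slope.

plot( x, y, pch=21, col="red", bg="lightblue",
  xlab="Year", ylab="Average Family Deaths"
)

# Passing the intercept (a=) and slope (b=) directly
# draws the same line as abline( fit ).
abline( a=coef( fit )[1], b=coef( fit )[2], lty=2 )
```

For these data the slope works out to about 0.021, that is, the rate rises by roughly 0.02 per year.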

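Additionally, we can illustrate how OLS estimation works. Recall that OLS finds the line that minimizes the sum of squared residuals, and that this line always passes through the point of means (the mean of x, the mean of y). We can mark the means and show the deviations of one observation (1985) from them, which are the building blocks of the OLS slope, using the points(), abline(), segments(), and text() functions: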

plot(x, y,
     xlim=c( min( x ) - 5, max( x ) + 5 ), # set x axis limits.
     ylim=c( min( y ) - 0.5, max( y ) + 0.5 ), # same for y axis.
     main="Plot of Average Family Deaths by Year", # title.
     xlab="Year", # x axis label.
     ylab="Average Family Deaths", # same for y axis.
     type="n" # don't plot anything inside.
)

points( x, y, pch = 21, col = "red", bg = "lightblue" )
abline( lm( y ~ x ), lty=2 )

# mark the point of means ( mean of x, mean of y ) with a large +.
points( mean( x ), mean( y ), col="black", pch=3, cex=3 )

# plot the mean of y horizontally.
abline( h=mean( y ), lty=3 )

#plot the mean of x vertically.
abline( v=mean( x ), lty=3 )

# add segments and text showing the deviations.
segments( 1985, mean( y ), 1985, 0.61, lwd=3, col="red" )
text( 1987.2, 0.53, "y-ybar" )
segments( mean( x ), 0.61, 1985, 0.61, lwd=3, col="red" )
text( 1981, 0.65, "x-xbar" )
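
The plot above marks deviations from the means for a single observation. To look at the residuals themselves, here is a rough sketch that draws every residual as a vertical segment and compares the sum of squared residuals (SSR) of the OLS line against one alternative line. The alternative slope (20 percent steeper) and the names fit, ssr.ols, other.slope, other.fitted, and ssr.other are made up for this illustration.

```r
x <- c( 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995 )
y <- c( 0.18, 0.20, 0.24, 0.30, 0.47, 0.61, 0.70, 0.90 )

fit <- lm( y ~ x )

plot( x, y, pch=21, col="red", bg="lightblue",
  xlab="Year", ylab="Average Family Deaths",
  main="Residuals from the least-squares line"
)
abline( fit, lty=2 )

# One vertical segment per observation, from the point to the fitted line.
segments( x, y, x, fitted( fit ), col="red", lwd=2 )

# Sum of squared residuals for the OLS line.
ssr.ols <- sum( residuals( fit )^2 )

# An alternative line through the point of means with a steeper slope.
other.slope  <- coef( fit )[2] * 1.2
other.fitted <- mean( y ) + other.slope * ( x - mean( x ) )
ssr.other    <- sum( ( y - other.fitted )^2 )

ssr.ols < ssr.other # the OLS line has the smaller SSR.
```

Changing the 1.2 to any other value besides 1 still leaves the alternative line with a larger SSR.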

This is just the beginning!!!

There is MUCH more you can do with just the plot() function. See help( par ) for a list of all the arguments and options for plotting.
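
As a small taste of par(), here is a sketch that changes a few graphical parameters before drawing the family deaths data and then restores the previous settings. The particular values chosen are arbitrary, and old.par is just an illustrative name.

```r
x <- c( 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995 )
y <- c( 0.18, 0.20, 0.24, 0.30, 0.47, 0.61, 0.70, 0.90 )

# par() returns the previous values of whatever it changes, so we can restore them.
old.par <- par(
  mar=c( 5, 5, 3, 1 ), # margins in lines of text: bottom, left, top, right.
  las=1,               # draw axis tick labels horizontally.
  bty="l",             # draw only the left and bottom sides of the box.
  cex.axis=0.9         # slightly smaller axis tick labels.
)

plot( x, y, pch=21, col="red", bg="lightblue",
  main="Plot of Average Family Deaths by Year",
  xlab="Year", ylab="Average Family Deaths"
)

par( old.par ) # put the settings back the way they were.
```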

There are also many other options for plots in R, including entire packages created for plotting. One in particular is ggplot2. Check out this page showing crime in Phoenix.
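
For comparison, here is a sketch of the family deaths plot from the layering section rebuilt with ggplot2, where each layer is added with +. It assumes ggplot2 is installed; the data frame deaths and its column names year and rate are made up for this illustration.

```r
deaths <- data.frame(
  year=c( 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995 ),
  rate=c( 0.18, 0.20, 0.24, 0.30, 0.47, 0.61, 0.70, 0.90 )
)

# Build the plot only if ggplot2 is installed.
if( requireNamespace( "ggplot2", quietly=TRUE ) ){
  library( ggplot2 )

  p <- ggplot( deaths, aes( x=year, y=rate ) ) +
    # the points, styled like pch=21 with a red border and light blue fill.
    geom_point( shape=21, color="red", fill="lightblue", size=3 ) +
    # the least-squares line, without a confidence band.
    geom_smooth( method="lm", formula=y ~ x, se=FALSE,
                 linetype="dashed", color="black" ) +
    # the title and axis labels.
    labs( title="Plot of Average Family Deaths by Year",
          x="Year", y="Average Family Deaths" )

  print( p )
}
```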

Questions?


Data Visualization: R Graphics

Please report any needed corrections to the Issues page. Thanks!

Last updated 13 August, 2024
