Let’s be honest, network analysis is awesome. Where does that awesomeness come from? One sure reason is visualization! There is nothing like a beautiful network visualization that conveys lots of information and is aesthetically pleasing. Right?
Not convinced? Examine the plot below. Without me giving you any information about this figure (which is from James Moody and Peter Mucha’s paper, Portrait of Political Party Polarization), what does it tell you?
What about this one showing political blogs (from David Lazer and colleagues’ paper, Computational Social Science)? What does the visualization “say” or “tell you”?
In this lab, you will be introduced to the basics of visualizing
networks using the gplot()
function in the sna
package. We will also look at some approaches to building plots to help
guide you in honing your SNA toolkit.
Why are you learning this? I want you to be able to render really cool networks, obviously. But I also want you to learn how to use this visualization tool to convey information to audiences that might not be able to digest a table or some other data description medium. Let’s get to work, y’all!
One of the great features of working with network data is the ability to see the data through visualization. Visualizing the structure of a network is helpful for discerning patterns that might be of interest.
Douglas Luke’s (2015: 47) A User’s Guide to Network Analysis provides several guidelines, or aesthetic principles, for what makes a graphical layout of a network easy to understand and interpret. These are:
Minimize edge crossings
Maximize the symmetry of the layout of nodes
Minimize the variability of the edge lengths
Maximize the angle between edges when they cross or join nodes
Minimize the total space used for the network display
Think about each of these suggestions. Why do they aid in visualizing the network? How do they assist in avoiding conveying information that is not really there? Ponder these questions for a bit…
Now that you have a sense of what a good visualization should try to
do, let’s look at the example Luke uses as an illustration by working
with the gplot()
function in the sna
package.
First, we need to install the sna
package using
install.packages( "sna" )
and load the library using
library( sna )
.
Remember, if you have already installed a package then you do not
need to use the install.packages() function again. But, if you
have not installed the package in a while, you should use
update.packages() to incorporate any changes that have been
made to the package.
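One way to avoid reinstalling packages you already have is to check for them first. The sketch below is a minimal example of that idea. The object names needed, is_installed, and to_install are illustrative and are not part of the lab.

```r
# Packages this lab relies on.
needed <- c( "sna", "devtools" )

# requireNamespace() returns TRUE if a package can be loaded,
# without attaching it to the search path.
is_installed <- vapply( needed, requireNamespace, logical( 1 ), quietly = TRUE )

# Keep only the packages that are not yet installed.
to_install <- needed[ !is_installed ]

# Install whatever is missing (nothing happens if everything is already there).
if ( length( to_install ) > 0 ) {
  install.packages( to_install )
}
```

You would still call library() afterwards to load each package for the session.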
Next, let’s get the UserNetR package from Douglas Luke’s GitHub
page. Since this package is not on the CRAN package repository, we
need to install it directly from GitHub. We do this in four steps:
First, install the devtools package using
install.packages( "devtools" )
And then load the library for the package with
library( devtools )
Now, install the UserNetR
package
install_github( "DougLuke/UserNetR" )
Finally, load the library
library( UserNetR )
Altogether, that looks something like this:
# install the packages
install.packages( "sna" )
install.packages( "devtools" )
# call the libraries
library( sna )
library( devtools )
# install from Github
install_github( "DougLuke/UserNetR" )
# call that library
library( UserNetR )
Now, let’s take a look at the Moreno network (see
help( Moreno, package = UserNetR )). These data are
contained in a sociogram constructed by Jacob Moreno
and published in the New York Times in 1933 (for a more extensive
discussion, see Moreno, J. L. 1934. Who shall survive? A new approach
to the problem of human interrelations. Nervous and mental disease
monograph series, no. 58. Washington, DC: Nervous and Mental Disease
Publishing Co.).
# Note that we ask the summary() function to not print out the adjacency matrix.
summary( Moreno, print.adj = FALSE )
## Network attributes:
## vertices = 33
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 46
## missing edges = 0
## non-missing edges = 46
## density = 0.08712121
##
## Vertex attributes:
##
## gender:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.515 2.000 2.000
## vertex.names:
## character valued attribute
## 33 valid vertex names
##
## No edge attributes
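# Now, let's compare two different plot layouts.
# Here is a circle.
gplot( Moreno, mode = "circle" )
# Here is the Fruchterman-Reingold (spring) layout, which is the default for gplot().
gplot( Moreno, mode = "fruchtermanreingold" )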
Think back to the aesthetic elements we discussed above. How do these two plots differ in how well they convey the same information? Which one is better? Why is it better?
To think about these questions, let’s plot the two layouts together.
To do this we will use the par()
function. This allows us
to partition the plotting region.
# First, we define the plot layout window.
op <- par(
mar = rep( 2, 4 ), # set the margins
mfrow = c( 1, 2 ) # set the dimensions
)
# plot the circle layout (add the main= argument for a title)
gplot( Moreno, mode = "circle", main = "Plotted as a circle" )
# plot the Fruchterman Reingold layout
gplot( Moreno, mode = "fruchtermanreingold", main = "Plotted using a spring algorithm" )
Again, think about our questions: How do these two plots differ in how well they convey the same information? Which one is better? Why is it better?
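Note that par() returns the settings that were in place before the change, which is why the code above stores them in op. A minimal sketch of putting them back once the side-by-side plots are drawn is shown here. Two placeholder base R plots stand in for the network plots so the example runs on its own.

```r
# Save the current settings while switching to a 1 x 2 plotting region.
op <- par( mar = rep( 2, 4 ), mfrow = c( 1, 2 ) )

# Two placeholder plots stand in for the two gplot() calls.
plot( 1:10, main = "Left panel" )
plot( 10:1, main = "Right panel" )

# Restore the saved settings so later plots use a single panel again.
par( op )
```

Without the final par( op ), every later plot in the session would keep being drawn into half of the window.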
If we add information about gender, we can see a bit more that is revealed by the spring algorithm.
To do so, we use the vertex.col= argument with the vertex attribute gender.
First, we need to access the vertex attribute. We can do this in one of two ways:
use the get.vertex.attribute() function from the network package (which is loaded along with library( sna )) and specify the attribute we want, like
get.vertex.attribute( Moreno, "gender" )
or use the shorthand for this, network object %v% "attribute", like
Moreno %v% "gender"
op <- par( mar = rep( 2, 4 ), mfrow = c( 1, 2 ) )
gplot( Moreno,
mode = "circle",
main = "Plotted as a circle",
vertex.col = get.vertex.attribute( Moreno, "gender" ) # use the vertex attribute.
)
gplot( Moreno,
mode = "fruchtermanreingold",
main = "Plotted using a spring algorithm",
vertex.col = Moreno %v% "gender" # note the difference here compared to above.
)
# let's add a legend to the plot
legend(
"bottomleft",
legend = c( "Male","Female" ),
col = c( "red","black" ),
title = "legend",
pt.cex = 0.75,
bty = "n",
pch = 19
)
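The numeric values stored in the attribute are treated as color numbers: in R's default palette, 1 is black and 2 is a shade of red. If you would rather name the colors yourself, one option is to index a vector of color names by the attribute. Here is a small sketch using a made-up four-node network. The object toy and its gender values are assumptions for illustration and are not the Moreno data.

```r
library( network )

# A made-up undirected network with four nodes and no edges.
toy <- network.initialize( 4, directed = FALSE )

# Assign a hypothetical two-category attribute coded 1 and 2.
toy %v% "gender" <- c( 1, 2, 2, 1 )

# Use the codes to pick from a vector of color names:
# code 1 gets the first color and code 2 gets the second.
node_colors <- c( "black", "red" )[ toy %v% "gender" ]
node_colors
```

The result can be passed to vertex.col= in gplot(), and the same vector of color names can be reused in legend() so that the legend and the plot stay in agreement.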
What is the primary story that the plot tells? Does a particular layout help us see that better?
Ok, that was a lot. Let’s work our way back through the mechanics of building a plot to better get a sense of what we are doing.
The gplot() function
Let’s take a look at some of the visualization capabilities of
gplot(). Let’s start by looking at the function’s help
page: ?gplot.
To see the various functionality of the function, let’s work with the example of an undirected network from Lab 3 - Introduction to Networks in R.
# define the path where the data are
url <- "https://raw.githubusercontent.com/jacobtnyoung/sna-textbook/main/data/data-undirected-example.csv"
# define the object
mat.u <- as.matrix(
read.csv(
url,
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# now, create the network
net.u <- network(
mat.u,
directed = FALSE
)
# take a look at the network
summary( net.u )
## Network attributes:
## vertices = 5
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 5
## missing edges = 0
## non-missing edges = 5
## density = 0.5
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Tom 1 0 1 0 0
## Bob 0 1 0 1 1
## Leaf 0 0 1 0 1
## Jim 0 0 1 1 0
#NOTE: we could have wrapped all of this into two statements
net.u <- as.network(
as.matrix(
read.csv(
"https://raw.githubusercontent.com/jacobtnyoung/sna-textbook/main/data/data-undirected-example.csv",
as.is = TRUE,
header = TRUE,
row.names = 1
)
),
directed = FALSE
)
summary( net.u )
## Network attributes:
## vertices = 5
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 5
## missing edges = 0
## non-missing edges = 5
## density = 0.5
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Tom 1 0 1 0 0
## Bob 0 1 0 1 1
## Leaf 0 0 1 0 1
## Jim 0 0 1 1 0
Now that we have the object built, we can plot it with gplot( net.u ).
Hold on, this plot shows arrows. But this network is undirected. What gives?
That is because gplot() assumes a directed network. We
can see this in the help page, ?gplot, where it shows that
the gmode= argument, which sets the type of network, defaults to
"digraph" (a directed graph). To fix this we can either:
manually turn off the display of arrows using the usearrows= argument,
gplot( net.u, usearrows = FALSE )
or indicate that the object to be plotted is an undirected graph
using the gmode= argument, gplot( net.u, gmode = "graph" )
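If you are not sure how a network object was created, you can ask the object itself and pick the gmode= value to match. Here is a minimal sketch, assuming a small made-up network called toy (the names toy, toy.mat, and plot.mode are illustrative only).

```r
library( network )
library( sna )

# A made-up undirected network: a path through three nodes.
toy.mat <- matrix(
  c( 0, 1, 0,
     1, 0, 1,
     0, 1, 0 ),
  nrow = 3, byrow = TRUE
)
toy <- network( toy.mat, directed = FALSE )

# is.directed() reports how the object was defined.
plot.mode <- if ( is.directed( toy ) ) "digraph" else "graph"
plot.mode

# Pass the matching mode so arrows are drawn only for directed networks.
gplot( toy, gmode = plot.mode )
```

This keeps the plot consistent with the object, rather than relying on remembering to set gmode= by hand.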
The gplot() function has a number of arguments that can
be used to better display the information contained in the
network.
For example, we can add labels to the vertices using the
network.vertex.names() function:
gplot( net.u, gmode = "graph", label = network.vertex.names( net.u ) )
Alternatively, we could type in a vector of names for the labels:
gplot( net.u, gmode = "graph", label = c( "Jen", "Tom", "Bob", "Leaf", "Jim" ) )
Or we could store the names in an object:
names <- c( "Jen", "Tom", "Bob", "Leaf", "Jim" )
and then add them to the plot using
gplot( net.u, gmode = "graph", label = names )
A great feature of R is that we can tune the graphing parameters. Here are several examples:
Labels:
Add boxes around the labels,
boxed.labels = TRUE
Change label size using label.cex
, such as
label.cex = 1.5
Color the labels using label.col=
, such as:
label.col = "blue"
Colors:
different colored names, combine label.col=
with the
c()
function. Such as:
label.col = c( "red", "blue", "green", "orange", "grey" )
different colored nodes, vertex.col=
argument. Such
as:
vertex.col=c("red","blue","green","orange","grey")
different colored edges, using edge.col=
, such as:
edge.col=c("red","blue","green","orange","grey")
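These arguments can be combined in a single call. The sketch below rebuilds the five-person example directly from the adjacency matrix printed earlier so that it runs on its own. The object names people, mat, and net are assumptions for illustration.

```r
library( network )
library( sna )

# Rebuild the five-person example from its adjacency matrix.
people <- c( "Jen", "Tom", "Bob", "Leaf", "Jim" )
mat <- matrix(
  c( 0, 1, 0, 0, 0,
     1, 0, 1, 0, 0,
     0, 1, 0, 1, 1,
     0, 0, 1, 0, 1,
     0, 0, 1, 1, 0 ),
  nrow = 5, byrow = TRUE,
  dimnames = list( people, people )
)
net <- network( mat, directed = FALSE )

# Combine several label and color arguments in one plot.
gplot(
  net,
  gmode        = "graph",
  label        = people,
  boxed.labels = TRUE,
  label.cex    = 1.5,
  label.col    = "blue",
  vertex.col   = c( "red", "blue", "green", "orange", "grey" ),
  edge.col     = "grey"
)
```

Try changing one argument at a time to see what each one controls.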
There is a LOT of functionality in the gplot() function.
See the arguments in the help file: ?gplot. I would
encourage you to take some time to look through it and play around with
the various features.
When a layout is generated, the coordinates can be saved by assigning
the output of gplot() to an object, and then reused later through the
coord= argument.
# Set the seed for the random number generator
# so we can always get the same plot layout.
set.seed( 605 )
# Define an object that will be the coordinates we want to use.
coords <- gplot(
net.u,
gmode = "graph",
label = network.vertex.names( net.u )
)
# Print the saved coordinates.
coords
## x y
## [1,] -5.441389 -1.849797
## [2,] -4.682331 -3.198937
## [3,] -3.171835 -3.157688
## [4,] -2.055885 -2.041756
## [5,] -1.648225 -3.476604
# Saved layouts can be used via the coord= argument:
gplot(
net.u,
gmode = "graph",
label = network.vertex.names( net.u ),
coord = coords
)
Cool, but why do this? The placement of the nodes shifts each time
we call the gplot() function, just due to the operation of
the algorithm. Controlling where nodes are plotted in the 2-dimensional
space is useful if we want to show different aspects of the plot. Note
that we can have different layouts of the nodes. If we like a particular
one, we can save the coordinates.
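To make this concrete, here is a minimal sketch that saves one layout and reuses it for two versions of the same plot, so that only the colors change between panels. The network toy and the object toy.coords are made up for illustration.

```r
library( network )
library( sna )

# A made-up undirected network with four nodes.
toy.mat <- matrix(
  c( 0, 1, 1, 0,
     1, 0, 1, 0,
     1, 1, 0, 1,
     0, 0, 1, 0 ),
  nrow = 4, byrow = TRUE
)
toy <- network( toy.mat, directed = FALSE )

# gplot() returns the x and y position of every node; save them.
set.seed( 605 )
toy.coords <- gplot( toy, gmode = "graph" )

# Reuse the saved positions so the nodes sit in the same place in both panels.
op <- par( mfrow = c( 1, 2 ), mar = rep( 2, 4 ) )
gplot( toy, gmode = "graph", coord = toy.coords, vertex.col = "red",  main = "First version" )
gplot( toy, gmode = "graph", coord = toy.coords, vertex.col = "blue", main = "Second version" )
par( op )
```

Because the positions are fixed, any difference a reader sees between the two panels comes from the attribute being displayed and not from the layout algorithm.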
But, suppose the default settings are insufficient and we
want to make a few small changes. The interactive=
argument
allows for tweaking.
# First, set up the coordinates you want.
coords <- gplot(
net.u,
gmode = "graph",
label = network.vertex.names( net.u ),
coord = coords,
interactive=TRUE
)
When this renders on your system, move a few of the nodes around. Then, after you close the window it will save the coordinates.
# Then, use these in the plot.
gplot(
net.u,
coord = coords,
gmode = "graph",
label=network.vertex.names( net.u )
)
As we have seen, we can start with a basic plot and add information. Creating graphics in this way is referred to as layering because we are stacking additional layers of elements on top of each other.
Take a look at this series of plots:
The plot uses several layers of information:
Node size (vertex.cex)
Node color (vertex.col)
Edge color (edge.col)
As we create a plot, we want to think about what information we should convey and how best to convey that information (e.g., colors? shapes? size? all of the above?).
The Prison Inmate Networks Study (PINS) examines the social networks of prison inmates in a state correctional institution. The study was unique in that it was the first in nearly a century to collect sociometric data in a prison. The researchers collected data on several types of networks.
Let’s plot the power and influence network, which was
created by asking individuals who they believed was “powerful and
influential” on the unit. We will continue working with the
gplot() function.
We are going to do this in a few steps:
First, load the adjacency matrix,
data-PINS-power-w1-adj.csv
, and create an object of class
network
.
Second, load the file with Age and Race attributes,
data-PINS-w1-age-race-attributes.csv
, and assign each
attribute to the network object.
# define the adjacency matrix
PI.mat <- as.matrix(
read.csv(
"https://raw.githubusercontent.com/jacobtnyoung/sna-textbook/main/data/data-PINS-power-w1-adj.csv",
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# create an object of class network
PI.net <- network( PI.mat, directed = TRUE )
# define the attributes object
PI.attrs <- read.csv(
"https://raw.githubusercontent.com/jacobtnyoung/sna-textbook/main/data/data-PINS-w1-age-race-attributes.csv",
as.is = TRUE,
header = TRUE
)
# assign the attributes to the network
PI.net %v% "Age" <- PI.attrs[,1]
PI.net %v% "Race" <- PI.attrs[,2]
Note that we used a shorthand notation: %v%
. This is an
assignment operation that tells R to assign something to the network.
Specifically, %v%
indicates the assignment to a vertex,
hence the v.
The operator also lets us pull a specific attribute. We can
look at the various vertex data by using the shorthand
network %v% "attribute". For example:
PI.net %v% "Age"
shows the age variable.
PI.net %v% "Race"
shows the race variable.
# look at the values for age
PI.net %v% "Age"
# look at the values for race
PI.net %v% "Race"
Note that we can also reference edges (i.e. %e%) to work with
edge attributes. For example, we could pull in the information
network and assign it as an edge attribute of the power influence network:
# define the adjacency matrix
INFO.mat <- as.matrix(
read.csv(
"https://raw.githubusercontent.com/jacobtnyoung/sna-textbook/main/data/data-PINS-info-w1-adj.csv",
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# assign the matrix as an edge attribute
PI.net %e% "info" <- INFO.mat
Think about what we did on the last line. For the power influence
edges, we assigned INFO.mat
as an attribute. This
represents whether a power/influence tie was
also an information network tie.
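To see what this kind of assignment does, it can help to try it on a tiny example. The sketch below uses two made-up matrices (power.mat and info.mat are hypothetical and are not the PINS data). Only the cells of the second matrix that line up with an existing edge are used.

```r
library( network )

# Made-up directed network with two ties: 1 -> 2 and 2 -> 3.
power.mat <- matrix(
  c( 0, 1, 0,
     0, 0, 1,
     0, 0, 0 ),
  nrow = 3, byrow = TRUE
)
power.net <- network( power.mat, directed = TRUE )

# A second made-up relation on the same nodes: only 1 -> 2 is present.
info.mat <- matrix(
  c( 0, 1, 0,
     0, 0, 0,
     0, 0, 0 ),
  nrow = 3, byrow = TRUE
)

# Store the second relation as an attribute of the existing edges.
power.net %e% "info" <- info.mat

# One value per edge in power.net: 1 if that tie also appears in info.mat.
info.values <- power.net %e% "info"
info.values
```

There are two edges in power.net, so two values come back, and exactly one of them is 1.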
Now, we can use that information in our plot. For example, suppose we size the nodes by passing the raw Age values to the vertex.cex= argument:
YIKES!!! What is wrong?
The problem is that we need to rescale the vertex attribute so that
the nodes are not too big. Let’s build a function to do that and then
execute the gplot()
function:
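# rescale a numeric vector so its values run from low to high
rescale <- function( nchar, low, high ){
# find the smallest and largest values
min_d <- min( nchar )
max_d <- max( nchar )
# shift and stretch the values into the range low to high
rscl <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
rscl
}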
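To check what the function does, here is a small worked example. The ages vector is made up for illustration. The function definition is repeated so the example runs on its own.

```r
# Same function as above, repeated so this example is self-contained.
rescale <- function( nchar, low, high ){
  min_d <- min( nchar )
  max_d <- max( nchar )
  rscl <- ( ( high - low ) * ( nchar - min_d ) ) / ( max_d - min_d ) + low
  rscl
}

# Hypothetical ages: the youngest should map to 0.5 and the oldest to 1.5.
ages <- c( 20, 30, 40 )
sizes <- rescale( ages, 0.5, 1.5 )
sizes
# [1] 0.5 1.0 1.5
```

The smallest value becomes low, the largest becomes high, and everything else falls proportionally in between. Note that the function divides by the difference between the largest and smallest values, so it will not work if every value is the same.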
Now, use the function we created to rescale the vertex attribute:
# now execute the plot
gplot( PI.net,
arrowhead.cex=0.5,
vertex.cex = rescale( PI.net %v% "Age", 0.5, 1.5 ) )
Note that the plot above has a lot of “whitespace” due to the
margins. We can adjust this using the par()
function.
# tweak the margins to cut some whitespace
par( mar = c( 0.1,0.1,0.1,0.1 ) )
# now execute the plot
gplot( PI.net,
arrowhead.cex=0.5,
vertex.cex = rescale( PI.net %v% "Age", 0.5, 1.5 ) )
That looks better. Let’s drop the isolates to help with the visualization.
par( mar = c( 0.1,0.1,0.1,0.1 ) )
gplot( PI.net,
displayisolates = FALSE,
arrowhead.cex=0.5,
vertex.cex = rescale( PI.net %v% "Age", 0.5, 1.5 ) )
How about we color the edges based on whether there was an information network tie?
par( mar = c( 0.1,0.1,0.1,0.1 ) )
gplot( PI.net,
edge.col = PI.net %e% "info" + 1,
displayisolates = FALSE,
arrowhead.cex=0.5,
vertex.cex = rescale( PI.net %v% "Age", 0.5, 1.5 ) )
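The edge attribute is coded 0 and 1, so adding 1 turns it into color numbers 1 and 2, the first two colors in R's palette. If you prefer to choose the colors by name, ifelse() is one option. Here is a minimal sketch with a made-up vector of edge values (info.tie is hypothetical, standing in for PI.net %e% "info").

```r
# Hypothetical 0/1 edge values: 1 means the tie is also an information tie.
info.tie <- c( 0, 1, 1, 0 )

# Adding 1 gives color numbers from the palette.
info.tie + 1
# [1] 1 2 2 1

# Alternatively, map the values to named colors.
edge.colors <- ifelse( info.tie == 1, "firebrick", "grey60" )
edge.colors
```

A vector like edge.colors can be passed to the edge.col= argument in place of the numeric version.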
As we build layers, we can get a fairly useful graphic that tells us a lot of information:
par( mar = c( 5,0.1,3,0.1 ) )
gplot( PI.net,
main="PINS Power/Influence Network", # add a title
vertex.col = PI.net %v% "Race", # color the nodes by the Race variable (a vertex attribute, so use %v%)
edge.col = PI.net %e% "info" + 2, # color the edges by the information network attribute
displayisolates = FALSE, # don't display the isolated cases
arrowhead.cex=0.5, # augment the size of the arrowheads
vertex.cex = rescale( PI.net %v% "Age", 0.5, 1.5 ), # size the nodes by the Age variable
sub="Nodes colored by Race, \n edges colored by Info net \n nodes sized by Age" # add a subtitle
)
gplot() is not the only option for network
visualization. In fact, there are MANY functions written for plotting
networks. An alternative to gplot() is ggraph().
ggraph() comes from the ggraph package, an extension of the ggplot2
package, which is designed for layer-based development of
visualizations. That is, rather than editing arguments within a
function, the ggplot2 grammar adds layers to a plot by
calling those different layers.
The main piece of code that creates the plot is:
ggraph(graph, layout = 'kk') +
geom_edge_fan(aes(alpha = after_stat(index)), show.legend = FALSE) +
geom_node_point(aes(size = Popularity)) +
facet_edges(~year) +
theme_graph(foreground = 'steelblue', fg_text_colour = 'white')
As you can see, there are calls to different layers that build the
visualization. I would encourage you, if you have the time, to tinker
with ggraph() as well. We will stick to gplot() for this course
due to the class of objects we will be using. ggraph() is built
around objects from a different network package, igraph, so we will
avoid going down the road of switching between packages.
There is nothing like a beautiful network visualization that conveys
lots of information and is aesthetically pleasing. Now you know how to
get there! As you have seen, there is a ton of flexibility. I would
encourage you to spend some time tinkering with the various arguments
for the gplot() function to get a feel for what it can do.