Now that you know about representing network data as matrices, let’s start getting our hands dirty (figuratively of course) by building some networks in R! This lab will work through some basic examples of matrices and how they are created, imported, and manipulated in R.
A note on the labs. Each lab in this course will contain R code which is in the “code chunks” you see below. There is also regular text. The R code chunks can be copied and pasted directly into R. As you work through this lab, follow along in R by coping and pasting the R code and seeing it work on your end.
First, clear the workspace. To do so, we use the following statement:
Let’s start by working with an example of an undirected, binary network. We will create an object that is the adjacency matrix.
One way to create an adjacency matrix is to use the
matrix()
function with the concatenate()
or
c()
function.
We can look at what these functions do by asking for help using the
help("
function name here ")
or
?("
function name here ")
functions.
The help window describes the arguments that the function takes and also provides examples.
Now, let’s create the data object:
This command reads as follows:
Combine the following numbers
From these combined numbers, create a matrix by reading across the list
Create an object called data. This object will be a matrix with 5 rows.
We can see the object by just typing the object name:
mat
. Note that if the number of elements does not
correctly match the dimensions of the matrix, R gives you an error.
For example:
junk1 <- matrix(c(1,2,3,4,5,6,7),nrow=2,byrow=TRUE)
# Because there are 7 elements here,
# the 8th element needed for a matrix
# is replaced with the first value in the vector
# print out the object by just typing the name of the object
junk1
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 1
After we have created our object mat
, we can examine the
dimensions with the dim function: dim( mat )
.
We can also attach names to the rows and columns of the matrix by
using the rownames()
and colnames()
functions.
# attach row names
rownames( mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
# attach column names
colnames( mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
# print out the object
mat
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Tom 1 0 1 0 0
## Bob 0 1 0 1 1
## Leaf 0 0 1 0 1
## Jim 0 0 1 1 0
We can refer to specific elements, rows, or columns by using the
[
and ]
symbols. This reads as:
“object[row,column]”.
For example, lets look at the relation Jen sends to Tom.
Recall from lecture that this is element 1,2 in the matrix (i.e. row
one, column two). In R code that is: mat[1,2]
.
This command reads as follows: for the object mat
,
return the value at row 1 column 2. The row number is the first
dimension and the column is the second dimension. Remember: “rows by
columns”
We can also call the values for an entire row or column. A single value is called a scalar.
## Jen Tom Bob Leaf Jim
## 0 1 0 0 0
## Jen Tom Bob Leaf Jim
## 0 1 0 0 0
Since we have defined names for the rows and columns, we can use those as well.
## Jen Tom Bob Leaf Jim
## 0 1 0 0 0
## Jen Tom Bob Leaf Jim
## 0 1 0 0 0
Note: the following does not work because it needs a
character, defined by the ""
symbols around the name.
# returns an error because there is no OBJECT called Jen
mat[,Jen]
# compare the difference with the prior line
mat[,"Jen"]
We can also call a series of values:
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Tom 1 0 1 0 0
## Bob 0 1 0 1 1
## Jen Tom Bob
## Jen 0 1 0
## Tom 1 0 1
## Bob 0 1 0
## Leaf 0 0 1
## Jim 0 0 1
We can also call a group of values that are non-contiguous using the
c()
function:
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Bob 0 1 0 1 1
## Jen Bob
## Jen 0 0
## Tom 1 1
## Bob 0 0
## Leaf 0 1
## Jim 0 1
We can also call a group of values that that do not contain specified
values by putting a -
(i.e. a minus sign) in front of the c
function:
## Jen Tom Bob Leaf Jim
## Tom 1 0 1 0 0
## Leaf 0 0 1 0 1
## Jim 0 0 1 1 0
## Tom Leaf Jim
## Jen 1 0 0
## Tom 0 0 0
## Bob 1 1 1
## Leaf 0 0 1
## Jim 0 1 0
Got it? If yes, then GREAT! If no, hang in there: you got this! If you are a bit hesitant with working with indexing, the best way to get better is to practice. Feel free to work back through the section above to get better at this basic skill we will use a LOT in this course.
network
packageNow that we have our object mat
is created, let’s
manipulate it into a network and a graph. To do this, we can use the
network
package. network
is a package
containing tools to create and modify network objects created by Carter Butts. See the network
page for an overview of package functionality.
First, we need to install the package using:
install.packages( "network" )
. Note: if you have
already installed the package, no do not need to
reinstall it.
If it is already installed, we should check to make sure we have the
most recent version: update.packages( "network" )
Whenever we start R, we need to load the package because it is not
automatically loaded. To do this, use the library()
function. library( "network" )
To get a list of the contents of the package, as for help with
respect to the package itself use the help()
function, but
tell R we want help on the particular package:
help( package="network" )
Now that the package is loaded, lets create a new object from our
matrix that is a network. In R lingo, we will use the
network()
function to create an object that is of
class network
. To use some of the functions, it
has to be a specific class.
Just like you can’t perform calculations on a object that is of class
character
(e.g. a list of names), the functions in this
page are designed to work with a network
object.
## Jen Tom Bob Leaf Jim
## Jen 0 1 0 0 0
## Tom 1 0 1 0 0
## Bob 0 1 0 1 1
## Leaf 0 0 1 0 1
## Jim 0 0 1 1 0
## [1] "matrix" "array"
# now, coerce the object into
# an object called net.u that
# is of class network
net.u <- as.network( mat )
When we enter the object in the command line, summary info about the
object is produced: net.u
. This is because the object is of
class network
. We can use the class function to confirm
this: class( net.u )
.
Let’s look at the object again: net.u
. What does the
summary output of the object tell us?
Note that the network is treated as directed. By
default, the function as.network()
sets the argument
directed =
to TRUE
. We can see this by looking
at the structure of the function in the help page:
?as.network
. What do we need to change in the
as.network()
function?
We need to change the input for the directed=
argument
because our network is undirected. In other words,
directed = FALSE
. This tells the function that the matrix
we are entering is an undirected network. This is logical: is the object
a directed network? False. Therefore, it is an undirected network.
# create a new object called net.u.correct
net.u.correct <- as.network(
mat,
directed = FALSE )
# compare the difference since telling the function that the network is directed
net.u
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 10
## missing edges= 0
## non-missing edges= 10
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
## Network attributes:
## vertices = 5
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 5
## missing edges= 0
## non-missing edges= 5
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
The summary()
function is a generic function that
summarizes objects. We can use it on an object of class
network
to provide more information:
summary( net.u.correct )
. More information about what can
be done with the summary()
function for an object of class
network is shown on the ?as.network
page.
We could also enter the data as an edgelist using the
matrix.type =
argument. By default, the function
as.network()
sets the argument matrix.type =
to adjacency
. For an edgelist, we would need to change the
input for the matrix.type =
argument to
edgelist
.
# for example, lets make an edgelist called edge
# it will be a matrix of 5 rows and we are reading off by row
edge <- matrix(
c( "Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim","Leaf","Jim" ),
nrow = 5,
byrow = TRUE )
# take a look
edge
## [,1] [,2]
## [1,] "Jen" "Tom"
## [2,] "Tom" "Bob"
## [3,] "Bob" "Leaf"
## [4,] "Bob" "Jim"
## [5,] "Leaf" "Jim"
# create an object called edge.net.u
# but change the default to edgelist for the matrix.type argument.
edge.net.u <- as.network(
edge,
directed = FALSE,
matrix.type = "edgelist" )
# now take a look
summary( edge.net.u )
## Network attributes:
## vertices = 5
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 5
## missing edges = 0
## non-missing edges = 5
## density = 0.5
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## Bob Jen Jim Leaf Tom
## Bob 0 0 1 1 1
## Jen 0 0 0 0 1
## Jim 1 0 0 1 0
## Leaf 1 0 1 0 0
## Tom 1 1 0 0 0
# The as.network() function will often recognize the matrix type being entered
# create the object again, but do not toggle the matrix.type argument
edge.net.u <- as.network(
edge,
directed = FALSE )
# is it different
# no, because the function is programmed to
# read the dimensions of the input object
summary( edge.net.u )
## Network attributes:
## vertices = 5
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 5
## missing edges = 0
## non-missing edges = 5
## density = 0.5
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## Bob Jen Jim Leaf Tom
## Bob 0 0 1 1 1
## Jen 0 0 0 0 1
## Jim 1 0 0 1 0
## Leaf 1 0 1 0 0
## Tom 1 1 0 0 0
Now, let’s work with the example of a directed, binary network. We will create an object that is the adjacency matrix.
# these were written in by row
# so we want it to read by row, byrow=TRUE
mat.d <- matrix(
c( 0,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1,1,0 ),
nrow = 5,
byrow = TRUE )
# take a look at the matrix
mat.d
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 0 0 0
## [2,] 0 0 1 0 0
## [3,] 0 0 0 1 1
## [4,] 0 0 1 0 1
## [5,] 0 0 1 1 0
Now, let’s coerce it to be an object of class network.
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 8
## missing edges = 0
## non-missing edges = 8
## density = 0.4
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## 1 2 3 4 5
## 1 0 1 0 0 0
## 2 0 0 1 0 0
## 3 0 0 0 1 1
## 4 0 0 1 0 1
## 5 0 0 1 1 0
Just as before, we could also enter the data as an edgelist. Since we
have directed relations, we have more edges. This is because
reciprocated ties count twice. So, we have to tell the
matrix()
function that the matrix has 8 rows, instead of
5.
# create the edgelist
edge.d <- matrix(
c( "Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim",
"Leaf","Bob","Leaf","Jim","Jim","Bob","Jim","Leaf" ),
nrow = 8,
byrow = TRUE )
# create the network object
edge.d.net <- as.network(
edge.d,directed = TRUE,
matrix.type = "edgelist" )
# take a look
summary( edge.d.net )
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 8
## missing edges = 0
## non-missing edges = 8
## density = 0.4
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## Bob Jen Jim Leaf Tom
## Bob 0 0 1 1 0
## Jen 0 0 0 0 1
## Jim 1 0 0 1 0
## Leaf 1 0 1 0 0
## Tom 1 0 0 0 0
# I have added the argument print.adj=FALSE
# what is different?
summary( edge.d.net,
print.adj = FALSE )
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 8
## missing edges = 0
## non-missing edges = 8
## density = 0.4
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## No edge attributes
If we had a large network, these routines (i.e. using the
matrix()
function) would be tedious. Most of the time, we
have a file that is an adjacency matrix or an edgelist that we can
import. The read.csv()
function can be used to read in .csv
files that are arranged in this way. Let’s take a look at the help for
this function: ?read.csv
.
We will use the data-undirected-example.csv file
from the Social
Network Analysis Textbook website. To access the file, we can place
the url in the read.csv()
function.
Here is the url: https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-undirected-example.csv.
# define the path
url <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-undirected-example.csv"
# define the data using the url object
mat.u <- read.csv( url )
# look at the object
mat.u
# note that the read.csv function creates an object of class data.frame.
class( mat.u )
We need to adjust the arguments to read in the file how we want it. Specifically, we want to do the following:
Set the as.is =
argument equal to TRUE
so that it reads the data as it is.
Set the header =
argument to TRUE
to
indicate that there is a header, or column labels.
Set the row.names =
argument equal to 1 to indicate
that the name of the rows are in the first column.
# look at the arguments
mat.u2 <- read.csv(
url,
as.is = TRUE,
header = TRUE,
row.names = 1
)
mat.u2
# compare them
mat.u
mat.u2
Now, make the object into one of class
network:
# we have to first coerce the object to a matrix
mat.u2 <- as.matrix(
mat.u2 )
# recall that since this network is undirected
# we set the directed= argument to FALSE
net.u <- as.network(
mat.u2,
directed = FALSE
)
net.u
# we could combine the as.matrix and as.network functions
net.u <- as.network(
as.matrix(
mat.u2
),
directed = FALSE
)
net.u
We could also import the file if it is saved locally (i.e. we are not going to the web to get it). Typically we do not do this because it is a bad practice. That is, creating a version of a file locally. But, sometimes you might be offline or you have files that cannot be put online.
Let’s do this for the directed network. I have saved the file to my
desktop. First, look at what directory we are in using:
getwd()
function. This function gets the wording directory
R is currently looking at.
Then, set the directory where the file is using the
setwd()
function. You can get the location of the file by
right-clicking and in Windows using Properties or on Mac using
Get Info. Note that you have to configure this path to your
machine.
Then, use read.csv
as above:
setwd(" PUT THE CORRECT PATH HERE")
mat.d <- read.csv(
"data-undirected-example.csv",
as.is=TRUE,
header=TRUE,
row.names=1
)
# Note: we don't need to tell it that
# the network is directed since
# this is the default,
# but a good habit to get into.
net.d <- as.network(
as.matrix( data.d ),
directed=TRUE
)
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 10
## missing edges= 0
## non-missing edges= 10
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
Why are you learning this? As you can probably guess, we are going to
be working with matrices and objects of class network
a LOT
in this course!