Working with Networks in R

In the Network Data Structures chapter of the Social Network Analysis for Crime Analysts textbook, we reviewed how networks can be represented as sociomatrices. These matrices form the basic unit of data analysis in SNA. In this tutorial, we will review how we can build networks in R. Why? Understanding how to work with networks in R is essential for crime analysts because networks provide a powerful lens for analyzing relationships and interactions that drive criminal behavior. Whether you’re mapping connections between individuals in a gang, examining the flow of illicit goods, or identifying key actors in a criminal enterprise, networks reveal critical structures and patterns that traditional data analysis might overlook.

As a reminder, in the tutorials for this textbook, R code appears like this and can be copied directly into R or RStudio. Let’s start getting our hands dirty (figuratively of course) by building some networks in R!

Working with matrices

First, clear the workspace. To do so, we use the following statement:

rm( list = ls() )

Let’s start by working with an example of an undirected, binary network. We will create an object that is the adjacency matrix.

One way to create an adjacency matrix is to use the matrix() function with the concatenate() or c() function.
We can look at what these functions do by asking for help using the help(" function name here ") or ?(" function name here ") functions.

The help window describes the arguments that the function takes and also provides examples.

# help for the matrix() function
?matrix

# help for the c() function
?c

Now, let’s create the data object:

mat <- matrix(
  c( 0,1,0,0,0,1,0,1,0,0,0,1,0,1,1,0,0,1,0,1,0,0,1,1,0 ),
  nrow=5,
  byrow=TRUE
  )

This command reads as follows:

Combine the following numbers
From these combined numbers, create a matrix by reading across the list
Create an object called mat. This object will be a matrix with 5 rows.

We can see the object by just typing the object name: mat. Note that if the number of elements does not correctly match the dimensions of the matrix, R gives you an error.

For example:

junk_1 <- matrix( c( 1,2,3,4,5,6,7 ), nrow=2, byrow=TRUE )  

# Because there are 7 elements here, 
# the 8th element needed for a matrix 
# is replaced with the first value in the vector

# print out the object by just typing the name of the object
junk_1

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    1

After we have created our object mat or junk_1, we can examine the dimensions with the dim function: dim( mat ) or dim( junk_1 ).

We can also attach names to the rows and columns of the matrix by using the rownames() and colnames() functions.

# attach row names
rownames( mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

# attach column names
colnames( mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

# print out the object
mat

     Jen Tom Bob Leaf Jim
Jen    0   1   0    0   0
Tom    1   0   1    0   0
Bob    0   1   0    1   1
Leaf   0   0   1    0   1
Jim    0   0   1    1   0

We can refer to specific elements, rows, or columns by using the [ and ] symbols. This reads as: “object[row,column]”. Remember from the Getting Started with R tutorial that I said we would use indexing a lot? Well, this is where that comes true!

For example, let’s look at the relation Jen sends to Tom.

Recall from the Network Data Structures chapter that this is element 1,2 in the matrix (i.e. row one, column two). In R code that is: mat[1,2].

This command reads as follows: for the object mat, return the value at row 1 column 2. The row number is the first dimension and the column is the second dimension. Remember: “rows by columns”.

We can also call the values for an entire row or column:

# this reads: return the first row of data
mat[1,]

 Jen  Tom  Bob Leaf  Jim 
   0    1    0    0    0

# this reads: return the first column of data
mat[,1]

 Jen  Tom  Bob Leaf  Jim 
   0    1    0    0    0

Since we have defined names for the rows and columns, we can use those as well.

# reference the ROW pertaining to Jen
mat["Jen",]

 Jen  Tom  Bob Leaf  Jim 
   0    1    0    0    0

# reference the COLUMN pertaining to Jen
mat[,"Jen"]

 Jen  Tom  Bob Leaf  Jim 
   0    1    0    0    0

Note: the following does not work because it needs a character, defined by the "" symbols around the name.

# returns an error because there is no object called Jen 
mat[,Jen]

# compare the difference with the prior line
mat[,"Jen"]

We can also call a series of values:

# return the first three ROWS of the object data
mat[1:3,]

    Jen Tom Bob Leaf Jim
Jen   0   1   0    0   0
Tom   1   0   1    0   0
Bob   0   1   0    1   1

# return the first three COLUMNS of the object data
mat[,1:3]

     Jen Tom Bob
Jen    0   1   0
Tom    1   0   1
Bob    0   1   0
Leaf   0   0   1
Jim    0   0   1

We can also call a group of values that are non-contiguous using the c() function:

# return the first and second ROWS of the object data
mat[c(1,3),]

    Jen Tom Bob Leaf Jim
Jen   0   1   0    0   0
Bob   0   1   0    1   1

# return the first and second COLUMNS of the object data
mat[,c(1,3)]

     Jen Bob
Jen    0   0
Tom    1   1
Bob    0   0
Leaf   0   1
Jim    0   1

We can also call a group of values that do not contain specified values by putting a - (i.e. a minus sign) in front of the c function:

# return the object data without ROWS 1 and 3
mat[-c(1,3),]

     Jen Tom Bob Leaf Jim
Tom    1   0   1    0   0
Leaf   0   0   1    0   1
Jim    0   0   1    1   0

# return the object data without COLUMNS 1 and 3
mat[,-c(1,3)]

     Tom Leaf Jim
Jen    1    0   0
Tom    0    0   0
Bob    1    1   1
Leaf   0    0   1
Jim    0    1   0

Got it? If yes, then GREAT! If no, hang in there: you got this! If you are a bit hesitant with working with indexing, the best way to get better is to practice. Feel free to work back through the section above to get better at this basic skill that we will use a LOT in subsequent tutorials.

Exploring the `network` Package

Now that we have created the mat object, let’s manipulate it into a network and a graph. To do this, we can use the network package. network is a package containing tools to create and modify network objects created by Carter Butts. See the network package page for an overview of its functionality.

First, we need to install the package using: install.packages( "network" ). Note: if you have already installed the package, no do not need to reinstall it.

If it is already installed, we should check to make sure we have the most recent version: update.packages( "network" )

Whenever we start R, we need to load the package because it is not automatically loaded. To do this, use the library() function. library( "network" )

To get a list of the contents of the package, as for help with respect to the package itself use the help() function, but tell R we want help on the particular package: help( package = "network" ).

Working with Unidirected, Binary Networks

Now that the package is loaded, let’s create a new object from our matrix that is a network. In R lingo, we will use the network() function to create an object that is of class network. To use some of the functions, it has to be a specific class.

Just like you can’t perform calculations on an object that is of class character (e.g. a list of names), the functions in this page are designed to work with a network object.

# look at our object
mat

     Jen Tom Bob Leaf Jim
Jen    0   1   0    0   0
Tom    1   0   1    0   0
Bob    0   1   0    1   1
Leaf   0   0   1    0   1
Jim    0   0   1    1   0

# what class is the object
class( mat )

[1] "matrix" "array"

# now, coerce the object into 
# an object called net.u that
# is of class network
net_u <- as.network( mat )

net_u

 Network attributes:
  vertices = 5 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 10 
    missing edges= 0 
    non-missing edges= 10 

 Vertex attribute names: 
    vertex.names 

No edge attributes

When we enter the object in the command line, summary info about the object is produced: net_u. This is because the object is of class network. We can use the class function to confirm this: class( net_u ).

Let’s look at the object again: net_u. What does the summary output of the object tell us?

Note that the network is treated as directed. By default, the function as.network() sets the argument directed = to TRUE. We can see this by looking at the structure of the function in the help page: ?as.network. What do we need to change in the as.network() function?

We need to change the input for the directed= argument because our network is undirected. In other words, directed = FALSE. This tells the function that the matrix we are entering is an undirected network. This is logical: is the object a directed network? False. Therefore, it is an undirected network.

# create a new object called net_u_correct
net_u_correct <- as.network( 
  mat, 
  directed = FALSE )

# compare the difference since telling the function that the network is directed
net_u

 Network attributes:
  vertices = 5 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 10 
    missing edges= 0 
    non-missing edges= 10 

 Vertex attribute names: 
    vertex.names 

No edge attributes

# how is this one different?
net_u_correct

 Network attributes:
  vertices = 5 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 5 
    missing edges= 0 
    non-missing edges= 5 

 Vertex attribute names: 
    vertex.names 

No edge attributes

The summary() function is a generic function that summarizes objects. We can use it on an object of class network to provide more information: summary( net_u_correct ). More information about what can be done with the summary() function for an object of class network is shown on the ?as.network page.

We could also enter the data as an edgelist using the matrix.type = argument. By default, the function as.network() sets the argument matrix.type = to adjacency. For an edgelist, we would need to change the input for the matrix.type = argument to edgelist.

# for example, let's make an edgelist called edge
# it will be a matrix of 5 rows and we are reading off by row
edge <- matrix(
  c( "Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim","Leaf","Jim" ),
  nrow = 5, 
  byrow = TRUE )

# take a look
edge

     [,1]   [,2]  
[1,] "Jen"  "Tom" 
[2,] "Tom"  "Bob" 
[3,] "Bob"  "Leaf"
[4,] "Bob"  "Jim" 
[5,] "Leaf" "Jim"

We can see that there are two columns and five rows. Recall that in an edgelist, each row corresponds to an edge in the network. So, the first row of edge, or edge[1,], is the tie between Jen and Tom. From this edgelist, we can build a network:

# create an object called edge_net_u
# but change the default to edgelist for the matrix.type argument.

edge_net_u <- as.network(
  edge, 
  directed = FALSE, 
  matrix.type = "edgelist" ) 


# now take a look

summary( edge_net_u )

Network attributes:
  vertices = 5
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 5 
   missing edges = 0 
   non-missing edges = 5 
 density = 0.5 

Vertex attributes:
  vertex.names:
   character valued attribute
   5 valid vertex names

No edge attributes

Network adjacency matrix:
     Bob Jen Jim Leaf Tom
Bob    0   0   1    1   1
Jen    0   0   0    0   1
Jim    1   0   0    1   0
Leaf   1   0   1    0   0
Tom    1   1   0    0   0

# The as.network() function will often recognize the matrix type being entered
# create the object again, but do not toggle the matrix.type argument

edge_net_u <- as.network( 
  edge,
  directed = FALSE )


# is it different
# no, because the function is programmed to 
# read the dimensions of the input object

summary( edge_net_u )

Network attributes:
  vertices = 5
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 5 
   missing edges = 0 
   non-missing edges = 5 
 density = 0.5 

Vertex attributes:
  vertex.names:
   character valued attribute
   5 valid vertex names

No edge attributes

Network adjacency matrix:
     Bob Jen Jim Leaf Tom
Bob    0   0   1    1   1
Jen    0   0   0    0   1
Jim    1   0   0    1   0
Leaf   1   0   1    0   0
Tom    1   1   0    0   0

Working with Directed, Binary Networks

Now, let’s work with the example of a directed, binary network. We will create an object that is the adjacency matrix.

# these were written in by row
# so we want it to read by row, byrow=TRUE

mat_d <- matrix(
  c( 0,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1,1,0 ),
  nrow = 5,
  byrow = TRUE )


# take a look at the matrix

mat_d

     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    0    0    0
[2,]    0    0    1    0    0
[3,]    0    0    0    1    1
[4,]    0    0    1    0    1
[5,]    0    0    1    1    0

Now, let’s coerce it to be an object of class network.

net_d <- as.network( 
  mat_d,
  directed = TRUE )


# take a look

summary( net_d )

Network attributes:
  vertices = 5
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 8 
   missing edges = 0 
   non-missing edges = 8 
 density = 0.4 

Vertex attributes:
  vertex.names:
   character valued attribute
   5 valid vertex names

No edge attributes

Network adjacency matrix:
  1 2 3 4 5
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 1
4 0 0 1 0 1
5 0 0 1 1 0

Just as before, we could also enter the data as an edgelist. Since we have directed relations, we have more edges. This is because reciprocated ties count twice. So, we have to tell the matrix() function that the matrix has 8 rows, instead of 5.

# create the edgelist

edge_d <- matrix(
  c( "Jen","Tom","Tom","Bob","Bob","Leaf","Bob","Jim",
     "Leaf","Bob","Leaf","Jim","Jim","Bob","Jim","Leaf" ),
  nrow = 8,
  byrow = TRUE )


# create the network object

edge_d_net <- as.network(
  edge_d,
  directed = TRUE,
  matrix.type = "edgelist" )


# take a look

summary( edge_d_net )

Network attributes:
  vertices = 5
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 8 
   missing edges = 0 
   non-missing edges = 8 
 density = 0.4 

Vertex attributes:
  vertex.names:
   character valued attribute
   5 valid vertex names

No edge attributes

Network adjacency matrix:
     Bob Jen Jim Leaf Tom
Bob    0   0   1    1   0
Jen    0   0   0    0   1
Jim    1   0   0    1   0
Leaf   1   0   1    0   0
Tom    1   0   0    0   0

# I have added the argument print.adj=FALSE
# what is different?

summary( edge_d_net,
         print.adj = FALSE )

Network attributes:
  vertices = 5
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 8 
   missing edges = 0 
   non-missing edges = 8 
 density = 0.4 

Vertex attributes:
  vertex.names:
   character valued attribute
   5 valid vertex names

No edge attributes

Importing Network Data

If we had a large network, these routines (i.e. using the matrix() function) would be tedious and most likely result in a few errors. Most of the time, we have a file that is an adjacency matrix or an edgelist that we can import. The read.csv() function can be used to read in .csv files that are arranged in this way. Let’s take a look at the help for this function: ?read.csv.

We will use a file called data-undirected-example.csv. To access the file, we can place the url in the read.csv() function.

Here is the url: https://github.com/jacobtnyoung/snaca-r/raw/main/data/data-undirected-example.csv.

# define the path

url <- "https://github.com/jacobtnyoung/snaca-r/raw/main/data/data-undirected-example.csv"


# define the data using the url object

mat_u <- read.csv( url )


# look at the object

mat_u


# note that the read.csv function creates an object of class data.frame.

class( mat_u )

We need to adjust the arguments to read in the file how we want it. Specifically, we want to do the following:

Set the as.is = argument equal to TRUE so that it reads the data as it is.
Set the header = argument to TRUE to indicate that there is a header, or column labels.
Set the row.names = argument equal to 1 to indicate that the name of the rows are in the first column.

# look at the arguments

mat_u2 <- read.csv( 
  url,
  as.is = TRUE,
  header = TRUE,
  row.names = 1 
  )

mat_u2


# compare them

mat_u

mat_u2

Now, make the object into one of class network:

# we have to first coerce the object to a matrix

mat_u2 <- as.matrix( mat_u2 )


# recall that since this network is undirected
# we set the directed= argument to FALSE

net_u <- as.network( 
  mat_u2,
  directed = FALSE
   )

net_u


# we could combine the as.matrix and as.network functions

net_u <- as.network( 
  as.matrix( 
    mat_u2 ), 
  directed = FALSE 
  )

net_u

We could also import the file if it is saved locally (i.e. we are not going to the web to get it). Typically we do not do this because it is a bad practice. That is, creating a version of a file locally. But, sometimes you might be offline or you have files that cannot be put online.

Let’s do this for the directed network. I have saved the file to my desktop. First, look at what directory we are in using the getwd() function. This function gets the current working directory.

Then, set the directory where the file is located using the setwd() function. You can get the location of the file by right-clicking and in Windows using Properties or on Mac using Get Info. Note that you have to configure this path to your machine.

Then, use read.csv as above:

setwd( "PUT THE CORRECT PATH HERE" )

mat_d <- read.csv(
  "data-directed-example.csv",
  as.is=TRUE,
  header=TRUE,
  row.names=1
  )


# Note: we don't need to tell it that 
# the network is directed since 
# this is the default, 
# but a good habit to get into.

net_d <- as.network(
  as.matrix( data_d ),
  directed=TRUE
  )

# now, print the object

net_d

 Network attributes:
  vertices = 5 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 8 
    missing edges= 0 
    non-missing edges= 8 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Calling Network Data in SNACpack

We have seen how you can build a network object manually and how you can import one. As we will see throughout the tutorials, the SNACpack package contains several prebuilt data sets for you to use. To access these, we need to make sure we have the SNACpack package installed. We covered this in the Getting Started with R tutorial, but we will review it here.

Recall that this package is not on CRAN, rather it is located on Github, which is an online repository for storing code and various projects. To install the SNACpack package, we need to first install a package called devtools because it has a function install_github() that we will use to install the SNACpack package. Remember, if you have already installed devtools and SNACpack you do not need to reinstall them.

To install SNACpack, let’s use the following code chunk:

# first install the devtools package

install.packages( "devtools" )


# now, load the devtools package

library( devtools )

Ok, no we are ready to use the install_github() function. For this function to work, you need to know the location of the package you want to install. The location of the SNACpack package is: jacobtnyoung/SNACpack. If you want, you can access the repository using this link: https://github.com/jacobtnyoung/SNACpack. Now that we have the location, we can install it:

# install the package

install_github( "jacobtnyoung/SNACpack" )


# now, load the library

library( SNACpack )

Done! Take a look at the package using help( package = "SNACpack" ) to get a sense of what it contains (don’t worry, we will be going through this in more detail throughout the tutorials).

The networks in SNACpack are stored as objects of class network. R recongizes this, and loads the network package for you. So, to examine the network object, you just have to type it out. Let’s look at a network of cocaine dealers:

# examine the network

cocaine_dealing_net

 Network attributes:
  vertices = 28 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 40 
    missing edges= 0 
    non-missing edges= 40 

 Vertex attribute names: 
    vertex.names 

No edge attributes

What do we see? Well, there are 28 vertices (i.e. nodes), and 40 edges (ties). We also see that the network is directed. To learn more about the network, we can call the help file for the object by using ?cocaine_dealing_net.

As mentioned, we will use data in SNACpack throughout the book, so you will get more practice working with these objects.

Test Your Knowledge Exercises

Create a 5x5 adjacency matrix using the matrix() function.
- What happens if the number of elements doesn’t match the dimensions of the matrix?
- Assign meaningful row and column names to the adjacency matrix using the rownames() and colnames() functions.
- How can you reference a specific row or column by name?
Use the [ and ] symbols to extract specific elements, rows, or columns from a matrix.
Coerce an adjacency matrix into a network object using the as.network() function.
- What argument must you modify to create an undirected network?
Create an edgelist for a directed network and convert it to a network object. How does this process differ from working with adjacency matrices?
Summarize a network object using the summary() function.
- What insights can you gain from this output?
Explain the difference between directed and undirected networks in the context of the as.network() function.
- What specific arguments are used to define these properties?

Tutorial Summary

This tutorial introduced the basics of working with network data in R, focusing on creating, manipulating, and importing network structures. It began with constructing adjacency matrices using the matrix() function, followed by assigning meaningful row and column names for clarity and exploring matrix indexing to extract specific elements, rows, or columns. The tutorial then transitioned to coercing matrices into network objects using the as.network() function from the network package, with special attention to specifying whether a network is directed or undirected. The use of edgelists as an alternative to adjacency matrices was demonstrated, highlighting the importance of specifying the correct matrix type in the as.network() function. The tutorial also covered importing network data from external sources, including CSV files from URLs or local directories, and converting them into network objects. The emphasis was on building a solid foundation for working with network data structures, which are essential for analyzing relationships and interactions in crime analysis.