As discussed in the Bipartite Graphs & Two-Mode Networks lecture, bipartite graphs are useful for operationalizing contexts where nodes come from two separate classes. In contrast to one-mode networks, or unipartite graphs, where edges can be incident within a particular node/vertex set, in two-mode or bipartite graphs there are two partitions of nodes (called modes), and edges only occur between these partitions (i.e. not within).
This lab examines various properties of bipartite graphs
(e.g. density, degree centrality, dyadic clustering) using equations
from the lecture. We will also be review the tnet
package.
The tnet
package is a collection of functions for working
with two-mode and weighted networks written by Tore Opsahl.
Why are you learning this? Bipartite graphs are a common network structure used in many network analysis projects. Being able to work with these objects and summarize their structure is an important tool for your skill set as a social network analyst!
Let’s build the example bipartite graph from the Bipartite Graphs &
Two-Mode Networks lecture:
# clear the workspace
rm( list = ls() )
# create the example network
bipartite.example <- rbind(
c( 1,1,0,0,0 ),
c( 1,0,0,0,0 ),
c( 1,1,0,0,0 ),
c( 0,1,1,1,1 ),
c( 0,0,1,0,0 ),
c( 0,0,1,0,1 ) )
# assign names to the rows
rownames( bipartite.example ) <- c( "A","B","C","D","E","F" )
# assign names to the columns
colnames( bipartite.example ) <- c( "1","2","3","4","5" )
# print out the object
bipartite.example
## 1 2 3 4 5
## A 1 1 0 0 0
## B 1 0 0 0 0
## C 1 1 0 0 0
## D 0 1 1 1 1
## E 0 0 1 0 0
## F 0 0 1 0 1
As we can see, the matrix has 6 rows and 5 columns. So, the order of the matrix is 6 x 5.
network
packageWe can create an object of class network
by using the
as.network()
function in the network
package.
First, take a look at the help for the as.network()
function, paying particular attention to the bipartite=
argument.
# call the network package
library( network )
# pull up the help for the as.network() function
?as.network
In looking through the help file for the as.network()
function, we see that the bipartite=
argument says that
this argument allows the count of the number of actors in the bipartite
network. A bipartite adjacency matrix has order NxM, where
N represents the number of rows (e.g. actors) and M
represents the number of columns (e.g. events). In the
bipartite=
argument, we can specify the count of actors as
N.
For example:
# identify the number of actors in the example
N <- dim( bipartite.example )[1]
# create a network object
bipartite.example.net <- as.network(
bipartite.example, # here is our matrix
bipartite = N # define the number of actors
)
# there 11 vertices, 6 are bipartite (in the first mode), and 12 edges
bipartite.example.net
## Network attributes:
## vertices = 11
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 6
## total edges= 12
## missing edges= 0
## non-missing edges= 12
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
gplot()
in the sna
packageNow that our object is created, we can take a look at a plot of the
network using the gplot()
function. Before we do so, let’s
take a look at a few changes we need to make. First, note that the
gplot()
function reads the labels by starting with the
names in the first mode, then the names in the second mode. We can see
this by printing the vertex labels the function assigns with the
network.vertex.names()
function.
## [1] "A" "B" "C" "D" "E" "F" "1" "2" "3" "4" "5"
We can see that it first labels the actor nodes (i.e. A, B, C, D, E, and F) and then names the event nodes (i.e. 1, 2, 3, 4, 5).
Second, we need to tell the gplot()
function that the
network has two modes, not one mode. We do this using the
gmode=
argument, which automatically changes the colors and
the shapes of the nodes when we specify that the graph is
twomode
. Third, we need to set the usearrows=
argument to FALSE
so that the arrowheads are turned off.
(Note that there are directed two-mode
networks in which you would use arrowheads, but we will skip that for
now). Let’s check it out:
# load the sna library to get the gplot() function
library( sna )
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
bipartite.example.net, # our network to plot
label = network.vertex.names( bipartite.example.net ), # the labels we want
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
vertex.cex=2, # size the nodes
label.cex=1.2, # size the labels
main="Bipartite Graph of Example Graph" # add a title
)
As we saw in the Basics of Network
Visualization in R lab, there are a lot of options that help us
convey important information about a network. When we are working with a
two mode network object, we need to make sure that whatever information
we pass to the nodes matches the order of the nodes in the network. For
example, we saw above that the network first labels the actor nodes and
then labels the event nodes. If we wanted to choose a different set of
colors, for example, using the vertex.col=
argument, then
we would want a set of colors for the actors and a set of colors for the
events. Then, we would want these combined into a single object. Let’s
do that here to demonstrate:
# identify the number of actors in the example
N <- dim( bipartite.example )[1]
# identify the number of events in the example
M <- dim( bipartite.example )[2]
# set the actor colors
actor.col <- rep( "#1fdeb1", N )
# set the event colors
event.col <- rep( "#bab4de", M )
# now combine them into a single vector of colors
node.col <- c( actor.col, event.col )
# take a look
node.col
## [1] "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#bab4de"
## [8] "#bab4de" "#bab4de" "#bab4de" "#bab4de"
Now that we have a set of colors that match the order of the nodes,
we can pass it into the gplot()
function using the
vertex.col=
argument:
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
bipartite.example.net, # our network to plot
label = network.vertex.names( bipartite.example.net ), # the labels we want
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
vertex.cex = 3, # size the nodes
label.cex = 1.2, # size the labels
label.pos = 5, # position the labels on the nodes
main="Bipartite Graph of Example Graph", # add a title
# here is the addition to what we had above:
vertex.col = node.col # add the colors
)
This setup, where we define the actor properties and event properties and combine them into a vector will be used for any attribute we want to attach to the nodes. We will work through more examples below illustrating this point.
As reviewed in the Bipartite Graphs & Two-Mode Networks lecture, there are multiple structural properties of bipartite graphs that we can examine to help us describe the network.
The density of a bipartite graph is the number of observed edges in the graph, L, divided by the number of nodes in the first mode, N, multiplied by the number of nodes in the second mode, M. That is:
\[\frac{L}{N \times M}\]
In other words, the density of the graph is the number of edges we
observed divided by the maximum number of possible edges in the graph.
We can calculate this using the sum()
and
dim()
functions.
# identify the number of edges in the graph
L <- sum( bipartite.example )
# identify the number of actors in the example
N <- dim( bipartite.example )[1]
# identify the number of events in the example
M <- dim( bipartite.example )[2]
# calculate the density
density.bipartite.example <- L / ( N * M )
# check it out
density.bipartite.example
## [1] 0.4
What is the interpretation of the density?
For a bipartite graph there are two degree distributions:
The distribution of ties in the first mode
The distribution of ties in the second mode
We can calculate the degree centrality scores for each node in each
corresponding vertex set by taking the row sum for N
nodes in the first mode and taking the column sum for
M nodes in the second mode. We can do so using the
rowSums()
and colSums()
functions,
recspectively.
## A B C D E F
## 2 1 2 4 1 2
## 1 2 3 4 5
## 3 3 3 1 2
How should we interpret the centrality scores for each node set? Well, it is a bit difficult when just looking at it here. So, we can calculate a summary statistic, such as the mean, to evaluate the distribution of centrality scores for each node set.
As before, we could examine the central tendency by examining the mean degree for each node/vertex set. We take the sum of the edges, \(L\) and:
for the first node set we divide by \(\frac{L}{N}\), the number of nodes in that set.
for the second node set we divide by \(\frac{L}{M}\), the number of nodes in that set.
# mean degree for actors
mean.actor.deg <- L / N
# mean degree for events
mean.event.deg <- L / M
# an alternative is to just use the mean() function with the degree data
mean( actor.deg )
## [1] 2
## [1] 2.4
How should we interpret the mean centrality score for each node set? The mean for the actor node set indicates that, on average, each node has 2 ties. The mean for the event node set indicates that, on average, each event has 2.4 ties.
Degree centrality scores for each node/vertex set not only reflects each node’s connectivity to nodes in the other set, but also depend on the size of that set. As a result, larger networks will have a higher maximum possible degree centrality value. Solution?
Standardize!!!
As we saw for unipartite graphs, we can adjust the raw degree centrality scores by taking into account the size of the graph. In a bipartite graph, we can standardize, or normalize, by dividing the raw centrality scores by the number of nodes in the opposite vertex set. That is, for the centrality scores in the first mode we divide by M and for the centrality scores in the second mode we divide by N.
## A B C D E F
## 0.4 0.2 0.4 0.8 0.2 0.4
## 1 2 3 4 5
## 0.5000000 0.5000000 0.5000000 0.1666667 0.3333333
In networks with lots of nodes, this information might be useful for
visualizing differences in nodes degree centrality. Let’s create a plot
with the standardized scores where larger degree centrality influences
the size of the nodes. As we did above, we will want to create a single
object that has these sizes. We can do this using the c()
function.
# define the standardized scores for actors
actor.size <- actor.deg / M
# define the standardized scores for events
event.size <- event.deg / N
# combine these to use in the plot
v.size <- c( actor.size, event.size )
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
bipartite.example.net, # our network to plot
label = network.vertex.names( bipartite.example.net ), # the labels we want
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
label.cex=1.2, # size the labels
main="Bipartite Graph of Example Graph", # add a title
vertex.col = node.col, # add the colors
# here is the addition to what we had above:
vertex.cex = v.size + 0.5 # set the size (add 0.5 so it is not too small)
)
The density tells us about the overall level of ties between the node/vertex sets in the graph and degree centrality tells us about how many edges are incident on a node in each node/vertex set. What about the overlap in ties? In other words, do nodes in N tend to share nodes in M? This is the notion of clustering in a graph.
In a bipartite graph, there are two interesting structures:
3-paths (sometimes called \(L_3\)) are a path of distance 3.
Cycles (sometimes called \(C_4\)) are a closed 3-path.
Cycles in a graph create multiple ties between vertices in both modes. The ratio of cycles to 3-paths in a graph is proportional to the level of dyadic clustering (sometimes called reinforcement). A value of 1 indicates that every 3-path is closed (i.e., is embedded in a cycle). Networks with values at or close to 1 will have considerable redundancy in ties.
We can calculate the dyadic clustering coefficient for the
example network above by using the reinforcement_tm()
function in the tnet()
package. If you have not already
installed the tnet()
package, do so using the
install.packages("tnet")
command.
# note that tnet conflicts with some of the functionality of the sna package
library( tnet )
# take a look at the reinforcement_tm() function
?reinforcement_tm
# first, we need to coerce our network to be of class "tnet" using as.tnet() so that the reinforcement() function will work
?as.tnet
bipartite.example.tnet <- as.tnet( # coerce it to a tnet
as.matrix( # coerce it to a matrix
bipartite.example.net # the network object to coerce
)
)
# now, calculate the dyadic clustering coefficient
reinforcement_tm( bipartite.example.tnet )
## [1] 0.3076923
What is the interpretation of the dyadic clustering coefficient in this example?
Now, let’s work with a real example. As discussed in the Bipartite Graphs & Two-Mode Networks lecture, Young & Ready (2015) examined how police officers develop cognitive frames about the usefulness of body-worn cameras. They argued that police officers views of body-worn cameras influence whether they use their cameras in incidents and that these views partly result from sharing incidents with other officers where they exchanged views about the legitimacy of body-worn cameras.
The adjacency matrix is available in the SNA Textbook data folder. Let’s import the the matrix, create a network, assign an attribute, and plot it. Then, we will work through the structural properties.
# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-officer-events-adj.csv"
# read in the .csv file
camMat <- as.matrix(
read.csv(
loc,
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# identify the number of police officers
N <- dim( camMat )[1]
# identify the number of incidents
M <- dim( camMat )[2]
Now, we want to coerce this to an object of class
network
. To do so, we need to reload the
network
library. This is because when we loaded the
tnet
library above, it hid some of the functionality of the
network
package do to conflicts in the coding of
functions.
# load the network package
library( network )
# create the network object
OfficerEventsNet <- as.network(
camMat,
bipartite = N
)
# take a look
OfficerEventsNet
## Network attributes:
## vertices = 234
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 81
## total edges= 346
## missing edges= 0
## non-missing edges= 346
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
Now, let’s create a plot of the network. To aid in visualization, we
need to create a variable indicating whether the officer was in the
treatment (i.e. received a body-cam) or control group. If you look at
the network.vertex.names( OfficerEventsNet )
object, you
will see that the officer ids start with “C” or “T”. The “C” prefix
indicates a control officer and a “T” prefix indicates
a treatment officer. The first 44 ids are for control
officers and the subsequent 37 ids are for treatment officers. The
remaining 153 correspond to events. We can use this info to create a
variable that indicates treatment status.
We will use this information to create an attribute. To do so, we
will use the rep()
function which repeats a sequence of
numbers of characters.
# create a variable using the information in the ids
status <- c(
rep( "Control", 44 ), # repeat "Control" 44 times because there are 44 control officers
rep( "Treatment", 37 ), # repeat "Treatment" 37 times because there are 37 control officers
rep( "Incident", 153 ) # repeat "Incident" 153 times because there are 153 incident
)
# create colors for the plot
vcol <- status
vcol[ status == "Control" ] <- "green" # make controls green
vcol[ status == "Treatment" ] <- "red" # make treatment group red
vcol[ status == "Incident" ] <- "white" # make events white
# create the shapes
vsides <- rep( 0, length( status ) )
vsides[ status == "Control" ] <- 3 # make controls triangles
vsides[ status == "Treatment" ] <- 4 # make treatment group squares
vsides[ status == "Incident" ] <- 50 # make events circles
# create the node sizes
nsize <- c(
rep( 2, N ), # sizes for officers
rep( 1.2, M ) # size for the events
)
Now, we want to plot this using the gplot()
function. To
do so, we need to reload the sna
library. This is because
when we loaded the tnet
library above, it hid some of the
functionality of the sna
package do to conflicts in the
coding of functions.
Now, let’s take a look at several properties of the graph: density, degree centrality, and dyadic clustering.
Remember, the density is the total number of edges divided by
N x M
.
## [1] 0.02791899
The density for the this network is 0.03. What is the interpretation of the density for this network?
Now, let’s take a look at the degree distributions.
# raw scores for officers
officer.deg <- rowSums( camMat )
# raw scores for incidents
incident.deg <- colSums( camMat )
# mean degree for officers
mean.officer.deg <- L / N
mean.officer.deg
## [1] 4.271605
## [1] 2.261438
What is the interpretation of the mean degree for each node set?
Lastly, we can calculate the dyadic clustering coefficient
for this network by using the reinforcement_tm()
function
in the tnet()
package.
# load tnet
library( tnet )
# coerce the network object to work with tnet
officer.event.tnet <- as.tnet( as.matrix( OfficerEventsNet ) )
# get the reinforcement score
reinforcement_tm( officer.event.tnet )
## [1] 0.219666
What is the interpretation of the dyadic clustering coefficient? What does the value of 0.22 tell us about the structure of the network?
Bipartite graphs are a common network structure used in many network
analysis projects. In this lab, we examined various properties of
bipartite graphs (e.g. density, degree centrality, dyadic clustering)
using equations from the lecture. We also reviewed the tnet
package. The tnet
package is a collection of functions for
working with two-mode and weighted networks written by Tore Opsahl.