As discussed in the Bipartite Graphs & Two-Mode Networks lecture, bipartite graphs are useful for operationalizing contexts where nodes come from two separate classes. In contrast to one-mode networks, or unipartite graphs, where edges can be incident within a particular node/vertex set, in two-mode or bipartite graphs there are two partitions of nodes (called modes), and edges only occur between these partitions (i.e. not within).

This lab examines various properties of bipartite graphs (e.g. density, degree centrality, dyadic clustering) using equations from the lecture. We will also review the tnet package, a collection of functions for working with two-mode and weighted networks written by Tore Opsahl.

Why are you learning this? Bipartite graphs are a common network structure used in many network analysis projects. Being able to work with these objects and summarize their structure is an important tool for your skill set as a social network analyst!

Bipartite Graphs/Two-Mode Networks

Let’s build the example bipartite graph from the Bipartite Graphs & Two-Mode Networks lecture:

# clear the workspace
rm( list = ls() )

# create the example network
bipartite.example <- rbind(
  c( 1,1,0,0,0 ),
  c( 1,0,0,0,0 ),
  c( 1,1,0,0,0 ),
  c( 0,1,1,1,1 ),
  c( 0,0,1,0,0 ), 
  c( 0,0,1,0,1 ) )

# assign names to the rows
rownames( bipartite.example ) <- c( "A","B","C","D","E","F" )

# assign names to the columns
colnames( bipartite.example ) <- c( "1","2","3","4","5" )

# print out the object
bipartite.example

##   1 2 3 4 5
## A 1 1 0 0 0
## B 1 0 0 0 0
## C 1 1 0 0 0
## D 0 1 1 1 1
## E 0 0 1 0 0
## F 0 0 1 0 1

As we can see, the matrix has 6 rows and 5 columns. So, the order of the matrix is 6 x 5.
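To make the "edges only occur between the partitions" idea concrete, here is a minimal base R sketch that expands the 6 x 5 matrix into a full 11 x 11 one-mode style adjacency matrix. The object names B and A are hypothetical and used only for this illustration; the matrix values are copied from the example above.

```r
# rebuild the example matrix so this chunk runs on its own
B <- rbind(
  c( 1,1,0,0,0 ),
  c( 1,0,0,0,0 ),
  c( 1,1,0,0,0 ),
  c( 0,1,1,1,1 ),
  c( 0,0,1,0,0 ),
  c( 0,0,1,0,1 ) )
rownames( B ) <- c( "A","B","C","D","E","F" )
colnames( B ) <- c( "1","2","3","4","5" )

N <- nrow( B ) # number of actors
M <- ncol( B ) # number of events

# stack the blocks: actor-by-actor (all zeros), actor-by-event (B),
# event-by-actor (transpose of B), and event-by-event (all zeros)
A <- rbind(
  cbind( matrix( 0, N, N ), B ),
  cbind( t( B ), matrix( 0, M, M ) ) )
dimnames( A ) <- list(
  c( rownames( B ), colnames( B ) ),
  c( rownames( B ), colnames( B ) ) )

# the within-mode blocks contain no edges
sum( A[ 1:N, 1:N ] )                          # actor-actor block
sum( A[ ( N+1 ):( N+M ), ( N+1 ):( N+M ) ] )  # event-event block

# the full matrix is symmetric because the ties are undirected
all( A == t( A ) )
```

Both within-mode sums are 0, which is what distinguishes a bipartite graph from a unipartite graph with the same 11 nodes.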

Using the `network` package

We can create an object of class network by using the as.network() function in the network package. First, take a look at the help for the as.network() function, paying particular attention to the bipartite= argument.

# call the network package
library( network )

# pull up the help for the as.network() function
?as.network

The help file for the as.network() function explains that the bipartite= argument takes the count of actors (i.e. the number of nodes in the first mode) in the bipartite network. A bipartite adjacency matrix has order N x M, where N represents the number of rows (e.g. actors) and M represents the number of columns (e.g. events). In the bipartite= argument, we specify the count of actors, N.

For example:

# identify the number of actors in the example
N <- dim( bipartite.example )[1]

# create a network object
bipartite.example.net <- as.network(
  bipartite.example, # here is our matrix
  bipartite = N      # define the number of actors
  )

# there are 11 vertices, 6 of which are in the first mode (bipartite = 6), and 12 edges
bipartite.example.net

##  Network attributes:
##   vertices = 11 
##   directed = FALSE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = 6 
##   total edges= 12 
##     missing edges= 0 
##     non-missing edges= 12 
## 
##  Vertex attribute names: 
##     vertex.names 
## 
## No edge attributes

Using `gplot()` in the `sna` package

Now that our object is created, we can take a look at a plot of the network using the gplot() function. Before we do so, let’s take a look at a few changes we need to make. First, note that the gplot() function reads the labels by starting with the names in the first mode, then the names in the second mode. We can see this by printing the vertex labels the function assigns with the network.vertex.names() function.

# look how it reads the labels
network.vertex.names( bipartite.example.net )

##  [1] "A" "B" "C" "D" "E" "F" "1" "2" "3" "4" "5"

We can see that it first labels the actor nodes (i.e. A, B, C, D, E, and F) and then names the event nodes (i.e. 1, 2, 3, 4, 5).

Second, we need to tell the gplot() function that the network has two modes, not one mode. We do this using the gmode= argument, which automatically changes the colors and the shapes of the nodes when we specify that the graph is "twomode". Third, we need to set the usearrows= argument to FALSE so that the arrowheads are turned off. (Note that there are directed two-mode networks in which you would use arrowheads, but we will skip that for now.) Let’s check it out:

# load the sna library to get the gplot() function
library( sna )

# set the seed to reproduce the plot layout
set.seed( 605 )

# execute the plot
gplot(
  bipartite.example.net,                                  # our network to plot
  label = network.vertex.names( bipartite.example.net ),  # the labels we want
  gmode = "twomode",                                      # indicate it is two modes
  usearrows = FALSE,                                      # turn off the arrowheads
  vertex.cex=2,                                           # size the nodes     
  label.cex=1.2,                                          # size the labels
  main="Bipartite Graph of Example Graph"                 # add a title
)

As we saw in the Basics of Network Visualization in R lab, there are a lot of options that help us convey important information about a network. When we are working with a two-mode network object, we need to make sure that whatever information we pass to the nodes matches the order of the nodes in the network. For example, we saw above that the network first labels the actor nodes and then labels the event nodes. If we wanted to choose a different set of colors using the vertex.col= argument, we would need one set of colors for the actors and one set for the events, combined into a single object in that order. Let’s do that here to demonstrate:

# identify the number of actors in the example
N <- dim( bipartite.example )[1]

# identify the number of events in the example
M <- dim( bipartite.example )[2]

# set the actor colors
actor.col <- rep( "#1fdeb1", N )

# set the event colors
event.col <- rep( "#bab4de", M )

# now combine them into a single vector of colors
node.col <- c( actor.col, event.col )

# take a look
node.col

##  [1] "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#1fdeb1" "#bab4de"
##  [8] "#bab4de" "#bab4de" "#bab4de" "#bab4de"

Now that we have a set of colors that match the order of the nodes, we can pass it into the gplot() function using the vertex.col= argument:

# set the seed to reproduce the plot layout
set.seed( 605 )

# execute the plot
gplot(
  bipartite.example.net,                                  # our network to plot
  label = network.vertex.names( bipartite.example.net ),  # the labels we want
  gmode = "twomode",                                      # indicate it is two modes
  usearrows = FALSE,                                      # turn off the arrowheads
  vertex.cex = 3,                                         # size the nodes     
  label.cex = 1.2,                                        # size the labels
  label.pos = 5,                                          # position the labels on the nodes
  main="Bipartite Graph of Example Graph",                # add a title
  
  # here is the addition to what we had above:
  vertex.col = node.col                                   # add the colors
)

This setup, where we define the actor properties and event properties and combine them into a single vector, will be used for any attribute we want to attach to the nodes. We will work through more examples below illustrating this point.

Structural Properties of Bipartite Graphs/Two-Mode Networks

As reviewed in the Bipartite Graphs & Two-Mode Networks lecture, there are multiple structural properties of bipartite graphs that we can examine to help us describe the network.

Density

The density of a bipartite graph is the number of observed edges in the graph, L, divided by the product of the number of nodes in the first mode, N, and the number of nodes in the second mode, M. That is:

\[\frac{L}{N \times M}\]

In other words, the density of the graph is the number of edges we observed divided by the maximum number of possible edges in the graph. We can calculate this using the sum() and dim() functions.

# identify the number of edges in the graph
L <- sum( bipartite.example )

# identify the number of actors in the example
N <- dim( bipartite.example )[1]

# identify the number of events in the example
M <- dim( bipartite.example )[2]

# calculate the density
density.bipartite.example <- L / ( N * M )

# check it out
density.bipartite.example

## [1] 0.4
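To see why the denominator is N x M rather than the usual one-mode count of possible ties, the short sketch below compares the two. It uses the counts from the example (6 actors, 5 events, 12 edges); the object names max.bipartite, max.unipartite, dens.bipartite, and dens.unipartite are hypothetical and only serve this illustration.

```r
# counts from the example graph
N <- 6   # actors
M <- 5   # events
L <- 12  # observed edges

# in a bipartite graph, a tie can only link an actor to an event
max.bipartite <- N * M

# if we (wrongly) treated all 11 nodes as one mode, any pair could be tied
n <- N + M
max.unipartite <- n * ( n - 1 ) / 2

# compare the possible ties and the resulting densities
max.bipartite   # 30
max.unipartite  # 55

dens.bipartite  <- L / max.bipartite
dens.unipartite <- L / max.unipartite

dens.bipartite   # 0.4
dens.unipartite  # 0.2181818
```

Treating the graph as unipartite counts 25 actor-actor and event-event pairs that can never be tied, so it understates how dense the network is.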

What is the interpretation of the density?

Degree Centrality

For a bipartite graph there are two degree distributions:

The distribution of ties in the first mode
The distribution of ties in the second mode

We can calculate the degree centrality scores for each node in each corresponding vertex set by taking the row sum for N nodes in the first mode and taking the column sum for M nodes in the second mode. We can do so using the rowSums() and colSums() functions, respectively.

# raw scores for actors
actor.deg <- rowSums( bipartite.example )
actor.deg

## A B C D E F 
## 2 1 2 4 1 2

# raw scores for events
event.deg <- colSums( bipartite.example )
event.deg

## 1 2 3 4 5 
## 3 3 3 1 2

How should we interpret the centrality scores for each node set? Well, it is a bit difficult when just looking at it here. So, we can calculate a summary statistic, such as the mean, to evaluate the distribution of centrality scores for each node set.

Mean Degree Centrality

As before, we can examine the central tendency by examining the mean degree for each node/vertex set. We take the sum of the edges, \(L\), and:

for the first node set, we divide by \(N\), the number of nodes in that set, giving \(\frac{L}{N}\).
for the second node set, we divide by \(M\), the number of nodes in that set, giving \(\frac{L}{M}\).

# mean degree for actors
mean.actor.deg <- L / N

# mean degree for events
mean.event.deg <- L / M

# an alternative is to just use the mean() function with the degree data
mean( actor.deg )

## [1] 2

mean( event.deg )

## [1] 2.4

How should we interpret the mean centrality score for each node set? The mean for the actor node set indicates that, on average, each actor has 2 ties (i.e. is tied to 2 events). The mean for the event node set indicates that, on average, each event has 2.4 ties (i.e. is tied to 2.4 actors).
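The two ways of getting the mean agree because every edge has exactly one end in each mode, so the actor degrees and the event degrees each add up to L. A minimal check, rebuilding the example matrix so the chunk runs on its own:

```r
# rebuild the example matrix
bipartite.example <- rbind(
  c( 1,1,0,0,0 ),
  c( 1,0,0,0,0 ),
  c( 1,1,0,0,0 ),
  c( 0,1,1,1,1 ),
  c( 0,0,1,0,0 ),
  c( 0,0,1,0,1 ) )

# degrees for each mode and the total number of edges
actor.deg <- rowSums( bipartite.example )
event.deg <- colSums( bipartite.example )
L <- sum( bipartite.example )

# each edge contributes one tie to an actor and one tie to an event,
# so all three totals are the same (12)
c( edges = L, actor.total = sum( actor.deg ), event.total = sum( event.deg ) )

# so the mean of the degree scores equals L / N and L / M
all.equal( mean( actor.deg ), L / nrow( bipartite.example ) )
all.equal( mean( event.deg ), L / ncol( bipartite.example ) )
```

This also explains why the two means differ: the same 12 ties are spread over 6 actors in one case and over 5 events in the other.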

Standardized Degree Centrality

Degree centrality scores for each node/vertex set not only reflect each node’s connectivity to nodes in the other set, but also depend on the size of that set. As a result, larger networks will have a higher maximum possible degree centrality value. Solution?

Standardize!!!

As we saw for unipartite graphs, we can adjust the raw degree centrality scores by taking into account the size of the graph. In a bipartite graph, we can standardize, or normalize, by dividing the raw centrality scores by the number of nodes in the opposite vertex set. That is, for the centrality scores in the first mode we divide by M and for the centrality scores in the second mode we divide by N.

# standardized score for actors
actor.deg / M

##   A   B   C   D   E   F 
## 0.4 0.2 0.4 0.8 0.2 0.4

# standardized score for events
event.deg / N

##         1         2         3         4         5 
## 0.5000000 0.5000000 0.5000000 0.1666667 0.3333333
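One way to read the standardized scores is as the share of possible ties each node actually has. Under that reading, the average standardized score in either mode works out to the density of the graph, since both reduce to L / ( N x M ). The sketch below checks this for the example; the names std.actor and std.event are hypothetical.

```r
# rebuild the example matrix so this chunk runs on its own
bipartite.example <- rbind(
  c( 1,1,0,0,0 ),
  c( 1,0,0,0,0 ),
  c( 1,1,0,0,0 ),
  c( 0,1,1,1,1 ),
  c( 0,0,1,0,0 ),
  c( 0,0,1,0,1 ) )

N <- nrow( bipartite.example )
M <- ncol( bipartite.example )
L <- sum( bipartite.example )

# standardized degree: raw degree over the size of the opposite mode
std.actor <- rowSums( bipartite.example ) / M
std.event <- colSums( bipartite.example ) / N

# the density of the graph
dens <- L / ( N * M )

# the mean standardized degree in each mode equals the density
all.equal( mean( std.actor ), dens )
all.equal( mean( std.event ), dens )
```

Both comparisons return TRUE, so a node with a standardized score above 0.4 in this example has more ties than the typical node in its mode.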

In networks with lots of nodes, this information might be useful for visualizing differences in nodes’ degree centrality. Let’s create a plot with the standardized scores where larger degree centrality influences the size of the nodes. As we did above, we will want to create a single object that has these sizes. We can do this using the c() function.

# define the standardized scores for actors
actor.size <- actor.deg / M

# define the standardized scores for events
event.size <- event.deg / N

# combine these to use in the plot
v.size <- c( actor.size, event.size )

# set the seed to reproduce the plot layout
set.seed( 605 )

# execute the plot
gplot(
  bipartite.example.net,                                  # our network to plot
  label = network.vertex.names( bipartite.example.net ),  # the labels we want
  gmode = "twomode",                                      # indicate it is two modes
  usearrows = FALSE,                                      # turn off the arrowheads
  label.cex=1.2,                                          # size the labels
  main="Bipartite Graph of Example Graph",                # add a title
  vertex.col = node.col,                                  # add the colors
  
  # here is the addition to what we had above:
  vertex.cex = v.size + 0.5                               # set the size (add 0.5 so it is not too small) 
)

Dyadic Clustering

The density tells us about the overall level of ties between the node/vertex sets in the graph, and degree centrality tells us about how many edges are incident on a node in each node/vertex set. What about the overlap in ties? In other words, do nodes in the first mode tend to share nodes in the second mode? This is the notion of clustering in a graph.

In a bipartite graph, there are two interesting structures:

3-paths (sometimes called \(L_3\)) are paths of length 3 (i.e. three edges connecting four distinct nodes).
Cycles (sometimes called \(C_4\)) are closed 3-paths (i.e. four edges connecting four distinct nodes).

Cycles in a graph create multiple ties between vertices in both modes. The ratio of cycles to 3-paths in a graph is proportional to the level of dyadic clustering (sometimes called reinforcement). A value of 1 indicates that every 3-path is closed (i.e., is embedded in a cycle). Networks with values at or close to 1 will have considerable redundancy in ties.

We can calculate the dyadic clustering coefficient for the example network above by using the reinforcement_tm() function in the tnet package. If you have not already installed the tnet package, do so using the install.packages("tnet") command.

# note that tnet conflicts with some of the functionality of the sna package
library( tnet )

# take a look at the reinforcement_tm() function
?reinforcement_tm

# first, we need to coerce our network to be of class "tnet" using as.tnet() so that the reinforcement_tm() function will work
?as.tnet

bipartite.example.tnet <- as.tnet(   # coerce it to a tnet
  as.matrix(                         # coerce it to a matrix
    bipartite.example.net            # the network object to coerce
    ) 
  )

# now, calculate the dyadic clustering coefficient
reinforcement_tm( bipartite.example.tnet )

## [1] 0.3076923
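The value above can be reproduced by hand in base R, which helps show what is being counted. This is a sketch under a stated assumption, not the internals of tnet: it takes the coefficient to be four times the number of 4-cycles divided by the number of 3-paths, since each 4-cycle contains four 3-paths. The names B, shared, C4, and L3 are hypothetical.

```r
# rebuild the example matrix so this chunk runs on its own
B <- rbind(
  c( 1,1,0,0,0 ),
  c( 1,0,0,0,0 ),
  c( 1,1,0,0,0 ),
  c( 0,1,1,1,1 ),
  c( 0,0,1,0,0 ),
  c( 0,0,1,0,1 ) )

# count the 4-cycles: a pair of actors sharing s events
# closes choose( s, 2 ) cycles
shared <- B %*% t( B )
C4 <- sum( choose( shared[ upper.tri( shared ) ], 2 ) )

# count the 3-paths: each edge between actor i and event k is the
# middle edge of ( degree of i - 1 ) * ( degree of k - 1 ) paths
actor.deg <- rowSums( B )
event.deg <- colSums( B )
L3 <- sum( B * outer( actor.deg - 1, event.deg - 1 ) )

# assumed definition: share of 3-paths that are closed
C4           # 2 cycles (A-C via events 1 and 2; D-F via events 3 and 5)
L3           # 26 three-paths
4 * C4 / L3  # 0.3076923
```

Under this assumption, 8 of the 26 3-paths in the example are closed, which matches the value printed by reinforcement_tm() above.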

What is the interpretation of the dyadic clustering coefficient in this example?

Structural Properties of Young & Ready’s (2015) Police Officer Network

Now, let’s work with a real example. As discussed in the Bipartite Graphs & Two-Mode Networks lecture, Young & Ready (2015) examined how police officers develop cognitive frames about the usefulness of body-worn cameras. They argued that police officers’ views of body-worn cameras influence whether they use their cameras in incidents and that these views partly result from sharing incidents with other officers where they exchanged views about the legitimacy of body-worn cameras.

The adjacency matrix is available in the SNA Textbook data folder. Let’s import the matrix, create a network, assign an attribute, and plot it. Then, we will work through the structural properties.

# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-officer-events-adj.csv"

# read in the .csv file
camMat <- as.matrix(
  read.csv( 
    loc,
    as.is = TRUE,
    header = TRUE,
    row.names = 1 
    )
  )

# identify the number of police officers
N <- dim( camMat )[1]

# identify the number of incidents
M <- dim( camMat )[2]

Now, we want to coerce this to an object of class network. To do so, we need to reload the network library. This is because when we loaded the tnet library above, it hid some of the functionality of the network package due to conflicts in the coding of functions.

# load the network package
library( network )

# create the network object
OfficerEventsNet <- as.network( 
  camMat,
  bipartite = N
)

# take a look
OfficerEventsNet

##  Network attributes:
##   vertices = 234 
##   directed = FALSE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = 81 
##   total edges= 346 
##     missing edges= 0 
##     non-missing edges= 346 
## 
##  Vertex attribute names: 
##     vertex.names 
## 
## No edge attributes

Plotting the Network

Now, let’s create a plot of the network. To aid in visualization, we need to create a variable indicating whether the officer was in the treatment (i.e. received a body-cam) or control group. If you look at the output of network.vertex.names( OfficerEventsNet ), you will see that the officer ids start with “C” or “T”. The “C” prefix indicates a control officer and the “T” prefix indicates a treatment officer. The first 44 ids are for control officers and the subsequent 37 ids are for treatment officers. The remaining 153 correspond to events.

We will use this information to create an attribute that indicates treatment status. To do so, we will use the rep() function, which repeats a sequence of numbers or characters.

# create a variable using the information in the ids
status <- c( 
  rep( "Control", 44 ),   # repeat "Control" 44 times because there are 44 control officers
  rep( "Treatment", 37 ), # repeat "Treatment" 37 times because there are 37 treatment officers
  rep( "Incident", 153 )  # repeat "Incident" 153 times because there are 153 incidents
  )

# create colors for the plot
vcol <- status
vcol[ status == "Control" ]   <- "green" # make controls green
vcol[ status == "Treatment" ] <- "red"   # make treatment group red
vcol[ status == "Incident" ]  <- "white" # make events white

# create the shapes
vsides <- rep( 0, length( status ) )
vsides[ status == "Control" ]   <- 3  # make controls triangles
vsides[ status == "Treatment" ] <- 4  # make treatment group squares
vsides[ status == "Incident" ]  <- 50 # make events circles

# create the node sizes
nsize <- c(
  rep( 2,   N ), # sizes for officers
  rep( 1.2, M )  # size for the events
  )
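The rep() approach above depends on knowing the counts and on the ids being sorted by group. As an alternative sketch, the status can be derived from the first character of each officer id. The ids and the event count below are made up for illustration (the real ids come from the network object), and the names officer.ids, n.events, prefix, and officer.status are hypothetical.

```r
# hypothetical officer ids, deliberately not sorted by group
officer.ids <- c( "C101", "T201", "C102", "T202", "T203" )

# hypothetical number of events
n.events <- 4

# pull the first character of each id
prefix <- substr( officer.ids, 1, 1 )

# translate the prefix into a status label
officer.status <- ifelse( prefix == "C", "Control", "Treatment" )

# officers come first, then events, to match the node order
status <- c( officer.status, rep( "Incident", n.events ) )

# check the counts in each group
table( status )
```

Because the label is read from each id, this version still assigns the right status if the officers are not grouped in the order assumed by rep().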

Now, we want to plot this using the gplot() function. To do so, we need to reload the sna library. This is because when we loaded the tnet library above, it hid some of the functionality of the sna package due to conflicts in the coding of functions.

# load the sna package to get the gplot() function
library( sna )

# set the seed for the plot
set.seed( 605 )

# plot it
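gplot(
  OfficerEventsNet,                     # our network to plot
  gmode = "twomode",                    # indicate it is two modes
  usearrows = FALSE,                    # turn off the arrowheads
  displayisolates = FALSE,              # do not plot the isolates
  vertex.col = vcol,                    # add the colors
  vertex.sides = vsides,                # add the shapes
  edge.col = "grey60",                  # color the edges
  edge.lwd = 1.2,                       # set the edge width
  vertex.cex = nsize,                   # size the nodes
  main = "Plot of Officers and Events"  # add a title
)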

Structural Properties of the Network

Now, let’s take a look at several properties of the graph: density, degree centrality, and dyadic clustering.

Density

Remember, the density is the total number of edges divided by N x M.

# identify the number of edges in the graph
L <- sum( camMat )

L / ( N * M )

## [1] 0.02791899

The density for this network is 0.03. What is the interpretation of the density for this network?

Degree

Now, let’s take a look at the degree distributions.

# raw scores for officers
officer.deg <- rowSums( camMat )

# raw scores for incidents
incident.deg <- colSums( camMat )

# mean degree for officers
mean.officer.deg <- L / N
mean.officer.deg

## [1] 4.271605

# mean degree for incidents
mean.incident.deg <- L / M
mean.incident.deg

## [1] 2.261438

What is the interpretation of the mean degree for each node set?

Dyadic Clustering

Lastly, we can calculate the dyadic clustering coefficient for this network by using the reinforcement_tm() function in the tnet package.

# load tnet 
library( tnet )

# coerce the network object to work with tnet
officer.event.tnet <- as.tnet( as.matrix( OfficerEventsNet ) )

# get the reinforcement score
reinforcement_tm( officer.event.tnet )

## [1] 0.219666

What is the interpretation of the dyadic clustering coefficient? What does the value of 0.22 tell us about the structure of the network?

Wrapping up…

Bipartite graphs are a common network structure used in many network analysis projects. In this lab, we examined various properties of bipartite graphs (e.g. density, degree centrality, dyadic clustering) using equations from the lecture. We also reviewed the tnet package. The tnet package is a collection of functions for working with two-mode and weighted networks written by Tore Opsahl.

Questions?

Please report any needed corrections to the Issues page. Thanks!

Lab 8 - Bipartite Graphs & Two-Mode Networks in R

CRJ 605 Statistical Analysis of Networks
