How do we know whether a node is important in a network? As was
discussed in the lecture on Degree Centrality, one of the
most popular concepts in network analysis is centrality. That
is, important nodes are those who are central. Also, we can compare
networks by examining how they differ (or are similar) based on the
distribution of centrality scores. In this lab, we will examine how to
calculate degree centrality and centralization scores in R using the
degree()
and centralization()
functions in the
sna
package.
Why are you learning this? Centrality scores are a common metric used in many network analysis projects. Being able to calculate these scores is an important tool for your skill set as a social network analyst!
In an undirected binary graph, actor degree centrality
measures the extent to which a node connects to all other nodes in a
network. In other words, the number of edges incident with a node. This
is symbolized as: \(d(n_i)\). For an
undirected binary graph, the degree \(d(n_i)\) is the row or column sum. If we
have an object of class(matrix)
in the workspace, we can
use the colSums()
and/or rowSums()
functions
to return this information.
First, let’s set up our graph from the degree centrality lecture:
# First, clear the workspace
rm( list = ls() )
# Then, build an object
u.mat <- rbind(
c( 0,1,0,0,0 ),
c( 1,0,1,0,0 ),
c( 0,1,0,1,1 ),
c( 0,0,1,0,1 ),
c( 0,0,1,1,0 ) )
# Assign the names to the object
rownames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
# Now, plot the graph (remember to load the sna package)
# The quitely= argument tells R not to print out the info on the package
library( sna, quietly=TRUE )
# Let's set up the coordinates to force the nodes
# to be in the same position throughout the lab
set.seed( 605 )
coords <- gplot( u.mat )
# Plot the network
gplot(
u.mat,
gmode="graph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( u.mat ),
label.col="blue",
label.cex=1.2,
coord = coords
)
Since the graph is undirected, we can print the degree centrality for
each node as a vector using the colSums()
or
rowSums()
functions:
## Jen Tom Bob Leaf Jim
## 1 2 3 2 2
## Jen Tom Bob Leaf Jim
## 1 2 3 2 2
Then, we can use that information in the plot by passing the degree
object to the vertex.cex=
argument. This will make nodes
with higher degree larger.
gplot(
u.mat,
gmode="graph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( u.mat ),
label.col="blue",
label.cex=1.2,
vertex.cex = deg.u.mat, #HERE: we added the object to size the plot
coord = coords
)
Another approach is to shade the nodes. Rather than just the size, we
might want to have nodes with larger degree to be darker (or lighter) to
better visualize differences in degree. To do this, we could use the
RColorBrewer
package to shade the nodes.
# install.packages( "RColorBrewer" )
library( RColorBrewer, quietly=TRUE )
# use display.brewer.all() to see the pallettes.
# Let's use the Blues pallette.
col.deg <- brewer.pal( length( unique( deg.u.mat ) ), "Blues")[deg.u.mat]
# In this plot, what do darker shades mean?
gplot(
u.mat,
gmode="graph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( u.mat ),
label.col="blue",
label.cex=1.2,
vertex.cex = deg.u.mat,
vertex.col = col.deg,
coord = coords
)
Actor degree centrality not only reflects each node’s connectivity to other nodes but also depends on the size of the network, g. As a result, larger networks will have a higher maximum possible degree centrality values. This makes comparison across networks problematic. The solution is to take into account the number of nodes and the maximum possible nodes to which i could be connected, g-1.
Let’s calculate the standardized centrality scores for our undirected graph:
# unstandardized or raw centrality
deg.u.mat <- colSums( u.mat )
# to calculate g-1, we need to know the number of nodes in the graph
# this is the first dimension of the matrix
g <- dim( u.mat )[1]
# now, divide by g-1
s.deg.u.mat <- deg.u.mat / ( g-1 )
deg.u.mat
## Jen Tom Bob Leaf Jim
## 1 2 3 2 2
## Jen Tom Bob Leaf Jim
## 0.25 0.50 0.75 0.50 0.50
We can also examine the average degree of the graph using
\[\frac{\sum_{i=1}^g d(n_i)}{g}\] or
\[\frac{2L}{g}\]
where L is the number of edges in the graph:
## [1] 2
## [1] 2
We can also calculate how centralized the graph itself is. Group degree centralization measures the extent to which the actors in a social network differ from one another in their individual degree centralities. Following Wasserman & Faust (1994), an index of group degree centralization can be calculated as:
\[C_D = \frac{\sum\limits_{i=1}^g [C_D(n^*) - C_D(n_i)]}{[(g-1)(g-2)]}\]
for undirected graphs where \(C_D(n^*)\) is the maximum degree in the
graph. We can write out the components of the equation using the
max()
function:
# In separate pieces
deviations <- max( deg.u.mat ) - deg.u.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-2 )
group.deg.cent <- numerator/denominator
group.deg.cent
## [1] 0.4166667
# Or, as a single equation.
group.deg.cent <-( sum( ( ( max( deg.u.mat ) - deg.u.mat ) ) ) ) / ( ( g -1 )*( g - 2 ) )
group.deg.cent
## [1] 0.4166667
In a directed binary graph, actor degree centrality can be broken down into indegree and outdegree centrality. Indegree, \(C_I(n_i)\), measures the number of ties that i receives. For the sociomatrix \(Xij\), the indegree for i is the column sum. Outdegree, \(C_O(n_i)\), measures the number of ties that i sends. For the sociomatrix \(Xij\), the outdegree for i is the row sum.
As before, if we have an object of class(matrix)
in the
workspace, we can use the rowSums()
and
colSums()
functions. However, the colSums()
function will return the indegree centrality for i and
the rowSums()
function will return the outdegree
centrality for i.
First, let’s set up our directed graph from the degree centrality lecture:
# First, clear the workspace
rm( list = ls() )
# Then, build the object
d.mat <- rbind(
c( 0,1,0,0,0 ),
c( 0,0,1,0,0 ),
c( 0,0,0,1,1 ),
c( 0,0,1,0,1 ),
c( 0,0,1,1,0 ) )
rownames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
# Let's set up the coordinates to force the nodes
# to be in the same position throughout the lab
set.seed( 605 )
# remove the old object named coords
rm( coords )
# set the new coordinates
coords <- gplot( d.mat )
# Now, plot the graph (remember to load the sna package)
gplot(
d.mat,
gmode="digraph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( d.mat ),
label.col="red",
label.cex=1.2,
coord = coords
)
# Let's look at the different centrality scores
# by assigning them to different objects
ideg.d.mat <- colSums( d.mat )
odeg.d.mat <- rowSums( d.mat )
# print them out to examine them
ideg.d.mat
## Jen Tom Bob Leaf Jim
## 0 1 3 2 2
## Jen Tom Bob Leaf Jim
## 1 1 2 2 2
Now, let’s work this information in the plot. We will want to
partition the plotting window using the par()
function to
show two plots and we want to change the margins using the
mar=
argument. Use ?par
and/or
?mar
to view the help on how these work.
par(
mfrow=c( 1, 2 ),
mar=c( 0.1, 0.5, 4, 0.1)
)
gplot(
d.mat,
gmode="digraph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( d.mat ),
label.col="red",
label.cex=0.8,
label.pos=1,
vertex.cex = ideg.d.mat+0.2,
main="Nodes sized by Indegree",
coord = coords
)
gplot(
d.mat,
gmode="digraph",
arrowhead.cex=0.5,
edge.col="grey40",
label=rownames( d.mat ),
label.col="red",
label.cex=0.8,
label.pos=1,
vertex.cex = odeg.d.mat,
main="Nodes sized by Outdegree",
coord = coords
)
Note the difference. Which nodes are more central in terms of indegree? What about outdegree?
Again, let’s use the RColorBrewer
package to help with
shading.
# create the objects
col.ideg <- brewer.pal( length( unique( ideg.d.mat ) ), "Greens")[ideg.d.mat]
col.odeg <- brewer.pal( length( unique( odeg.d.mat ) ), "Oranges")[odeg.d.mat]
par(
mfrow=c( 1, 2 ),
mar=c( 0.1, 0.5, 4, 0.1)
)
gplot(
d.mat,
gmode = "digraph",
arrowhead.cex = 0.5,
edge.col = "grey40",
label = rownames( d.mat ),
label.col = "red",
label.cex = 0.8,
label.pos = 1,
vertex.cex = ideg.d.mat+0.2,
vertex.col = col.ideg,
main = "Nodes sized &\n shaded by Indegree",
coord = coords
)
gplot(
d.mat,
gmode = "digraph",
arrowhead.cex = 0.5,
edge.col = "grey40",
label = rownames( d.mat ),
label.col = "red",
label.cex = 0.8,
label.pos = 1,
vertex.cex = odeg.d.mat,
vertex.col = col.odeg,
main = "Nodes sized &\n shaded by Outdegree",
coord = coords
)
Let’s calculate the standardized centrality scores for our directed graph:
# unstandardized or raw centrality
ideg.d.mat <- colSums( d.mat )
odeg.d.mat <- rowSums( d.mat )
# to calculate g-1, we need to know the number of nodes in the graph
# this is the first dimension of the matrix
g <- dim( d.mat )[1]
# now, divide by g-1
s.i.deg.u.mat <- ideg.d.mat / ( g-1 )
s.o.deg.u.mat <- odeg.d.mat / ( g-1 )
We can also examine the average degree of the graph using \(\frac{\sum_{i=1}^g C_I(n_i)}{g} = \frac{\sum_{i=1}^g C_O(n_i)}{g}\) or \(\frac{L}{g}\), where L is the number of edges in the graph:
mean.i.deg <- sum( ideg.d.mat ) / dim( d.mat )[1]
mean.o.deg <- sum( odeg.d.mat ) / dim( d.mat )[1]
mean.i.deg
## [1] 1.6
## [1] 1.6
## [1] 1.6
## [1] 1.6
Again, following Wasserman & Faust (1994), an index of group indegree/outdegree centralization can be calculated as:
\[ C_D = \frac{\sum\limits_{i=1}^g [C_D(n^*) - C_D(n_i)]}{[(g-1)^2]} \]
for undirected graphs where \(C_D(n^*)\) is the maximum
indegree/outdegree in the graph. We can write out the components of the
equation using the max()
function:
# In separate pieces
deviations <- max( ideg.d.mat ) - ideg.d.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-1 )
group.i.deg.cent <- numerator/denominator
group.i.deg.cent
## [1] 0.4375
deviations <- max( odeg.d.mat ) - odeg.d.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-1 )
group.o.deg.cent <- numerator/denominator
group.o.deg.cent
## [1] 0.125
# Or, as a single equation
group.ideg.cent <-( sum( ( ( max( ideg.d.mat ) - ideg.d.mat ) ) ) ) / ( ( g -1 )*( g - 1) )
group.odeg.cent <-( sum( ( ( max( odeg.d.mat ) - odeg.d.mat ) ) ) ) / ( ( g -1 )*( g - 1 ) )
group.ideg.cent
## [1] 0.4375
## [1] 0.125
What do the centralization scores tell us, conceptually?
sna
packageDid that feel tedious? If no, go back and do it again :)
As you probably have guessed, there are functions in the
sna
package that calculate degree centrality and graph
centralization! In the sna
package, these are the
degree()
and centralization()
functions,
respectively. Let’s take a look at how these work.
# load the library
library( sna )
# Build the objects to work with
rm( list = ls() )
u.mat <- rbind(
c( 0,1,0,0,0 ),
c( 1,0,1,0,0 ),
c( 0,1,0,1,1 ),
c( 0,0,1,0,1 ),
c( 0,0,1,1,0 )
)
rownames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
d.mat <- rbind(
c( 0,1,0,0,0 ),
c( 0,0,1,0,0 ),
c( 0,0,0,1,1 ),
c( 0,0,1,0,1 ),
c( 0,0,1,1,0 )
)
rownames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
# First, let's look at degree
?degree
# degree for undirected graph
deg <- degree( u.mat, gmode="graph" )
# indegree for directed graph
ideg <- degree( d.mat, gmode="digraph", cmode="indegree" )
# outdegree for directed graph
odeg <- degree( d.mat, gmode="digraph", cmode="outdegree" )
# returns the combined centrality for each node
deg.d <- degree( d.mat, gmode="digraph" )
# Now, let's look at centralization
?centralization
# degree centralization for undirected graph
cent.u <- centralization( u.mat, degree, mode="graph" )
# indegree centralization for directed graph.
i.cent.d <- centralization( d.mat, degree, mode="digraph", cmode="indegree" )
# outdegree centralization for directed graph.
o.cent.d <- centralization( d.mat, degree, mode="digraph", cmode="outdegree" )
Now, wasn’t that easier?
The Prison Inmate Networks Study (PINS) examines the social networks of prison inmates in a state correctional institution. The study was unique in that it was the first in nearly a century to collection sociometric data in a prison. The researchers collected data on several types of networks. There are two we want to look at here:
The get along with network was created by asking individuals whom they “get along with” on the unit. We can think of this as “friends” in a prison setting. (People don’t really have “friends” in prison, but there are people they “get along with”)
The power and influence network was created by asking individuals whom they believed was “powerful and influential” on the unit.
Let’s examine the degree centrality scores for both of these networks. These data are available in the SNA Textbook data folder.
For the get along with network, individuals could have asymmetric nominations. That is, i could nominate j and j didn’t necessarily nominate i. But, we are going to symmetrize the network by only taking ties for which both i and j indicated that the get along with the other person. This will give us an undirected network.
# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-PINS-getalong-w1-adj.csv"
# read in the .csv file
gaMat <- as.matrix(
read.csv(
loc,
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# use the symmetrize() function to create an undirected matrix
gaMatU <- symmetrize( gaMat, rule = "strong" )
# create the network object
gaNetU <- as.network( gaMatU, directed = FALSE )
Now we have created an undirected network where ties represent “get
along with” nominations from both individuals. Let’s calculate the
degree centrality scores, the centralization score, and then use the
degree centrality scores to size our nodes in a plot using the
vertex.cex()
argument in the gplot()
function.
Now lets build the objects and the plot:
# get the degrees.
gaNetDeg <- degree( gaNetU, gmode="graph" )
# now the centralization score.
gaNetDegCent <- centralization( gaNetU, degree, mode="graph" )
# Now, take a look at the plot.
gplot(
gaNetU,
gmode = "graph",
edge.col="grey40",
vertex.col="#3250a8",
vertex.cex = gaNetDeg,
coord = coords,
main = "PINS Get\n Along With Network (Undirected)",
sub = "node sized by degree centrality"
)
Woops! Let’s try that again AFTER rescaling the degree. We can use
the rescale()
function to do this.
# define the rescale function
rescale <- function( nchar, low, high ){
min_d <- min( nchar )
max_d <- max( nchar )
rscl <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
rscl
}
Now we can plot it after adding in the rescale()
function to gplot()
:
# Now, take a look at the plot.
gplot(
gaNetU,
gmode = "graph",
edge.col="grey40",
vertex.col="#3250a8",
vertex.cex = rescale( gaNetDeg, 0.2, 4 ),
coord = coords,
main = "PINS Get\n Along With Network (Undirected)",
sub = "node sized by degree centrality"
)
Almost there! Let’s drop the isolates to help with the size:
# Now, take a look at the plot.
gplot(
gaNetU,
gmode = "graph",
edge.col="grey40",
vertex.col="#3250a8",
vertex.cex = rescale( gaNetDeg, 0.2, 4 ),
displayisolates = FALSE,
coord = coords,
main = "PINS Get\n Along With Network (Undirected)",
sub = "node sized by degree centrality"
)
A few questions:
For the power and influence network, individuals could have asymmetric nominations. That is, i could nominate j and j didn’t necessarily nominate i. We will keep this asymmetry so that we can treat the network as directed.
# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-PINS-power-w1-adj.csv"
# read in the .csv file
piMat <- as.matrix(
read.csv(
loc,
as.is = TRUE,
header = TRUE,
row.names = 1
)
)
# create the network object
piNetD <- as.network( piMat, directed = TRUE )
Now we have created an undirected network where ties represent “get
along with” nominations from both individuals. Let’s calculate the
degree centrality scores, the centralization score, and then use the
degree centrality scores to size our nodes in a plot using the
vertex.cex()
argument in the gplot()
function.
# get the degrees.
piNetiDeg <- degree( piNetD, gmode="digraph", cmode = "indegree" )
piNetoDeg <- degree( piNetD, gmode="digraph", cmode = "outdegree" )
# now the centralization scores.
piNetiDegCent <- centralization( piNetD, degree, mode="digraph", cmode = "indegree" )
piNetoDegCent <- centralization( piNetD, degree, mode="digraph", cmode = "outdegree" )
par( mfrow=c( 1, 2 ) )
gplot(
piNetD,
gmode = "digraph",
edge.col="grey40",
vertex.col="#693859",
vertex.cex = rescale( piNetiDeg, 0.2, 4 ),
displayisolates = FALSE,
coord = coords2,
main = "PINS Power/Influnece\n Network (Directed)",
sub = "node sized by indegree centrality"
)
gplot(
piNetD,
gmode = "digraph",
edge.col="grey40",
vertex.col="#2b868c",
vertex.cex = rescale( piNetoDeg, 0.2, 4 ),
displayisolates = FALSE,
coord = coords2,
main = "PINS Power/Influnece\n Network (Directed)",
sub = "node sized by outdegree centrality"
)
A few questions:
Centrality scores are a common metric used in many network analysis
projects. In this lab, we examined how to calculate degree
centrality and centralization scores in R using the
degree()
and centralization()
functions in the
sna
package.