How do we know whether a node is important in a network? As discussed in the lecture on Degree Centrality, one of the most popular concepts in network analysis is centrality. That is, important nodes are those that are central. We can also compare networks by examining how similar or different their distributions of centrality scores are. In this lab, we will examine how to calculate degree centrality and centralization scores in R, first by hand and then using the degree() and centralization() functions in the sna package.

Why are you learning this? Centrality scores are a common metric used in many network analysis projects. Being able to calculate these scores is an important tool for your skill set as a social network analyst!



Degree Centrality (Undirected Binary Graphs)

In an undirected binary graph, actor degree centrality measures the extent to which a node connects to all other nodes in a network. In other words, it is the number of edges incident with a node. This is symbolized as \(d(n_i)\). For an undirected binary graph, the degree \(d(n_i)\) is the row or column sum of the adjacency matrix. If we have an object of class matrix in the workspace, we can use the colSums() and/or rowSums() functions to return this information.
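
As a quick check of the row/column sum claim, here is a minimal sketch using a hypothetical three-node path (the object name toy.mat is illustrative and not part of the lab). Because the adjacency matrix of an undirected graph is symmetric, the row sums and column sums agree:

```r
# A hypothetical 3-node path: A - B - C (toy.mat is an illustrative name)
toy.mat <- rbind(
  c( 0,1,0 ),
  c( 1,0,1 ),
  c( 0,1,0 ) )
rownames( toy.mat ) <- colnames( toy.mat ) <- c( "A","B","C" )

# An undirected binary graph has a symmetric adjacency matrix
isSymmetric( toy.mat )

# so the row sums and the column sums give the same degrees
rowSums( toy.mat )
colSums( toy.mat )
identical( rowSums( toy.mat ), colSums( toy.mat ) )
```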

First, let’s set up our graph from the degree centrality lecture:

# First, clear the workspace
rm( list = ls() )

# Then, build an object
u.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 1,0,1,0,0 ),
  c( 0,1,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )

# Assign the names to the object
rownames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

# Now, plot the graph (remember to load the sna package)
# The quietly= argument tells R not to print out the info on the package
library( sna, quietly=TRUE ) 

# Let's set up the coordinates to force the nodes
# to be in the same position throughout the lab
set.seed( 605 )
coords <- gplot( u.mat )

# Plot the network
gplot( 
  u.mat, 
  gmode="graph", 
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( u.mat ),
  label.col="blue",
  label.cex=1.2,
  coord = coords
  )

Since the graph is undirected, we can print the degree centrality for each node as a vector using the colSums() or rowSums() functions:

colSums( u.mat )
##  Jen  Tom  Bob Leaf  Jim 
##    1    2    3    2    2
rowSums( u.mat )
##  Jen  Tom  Bob Leaf  Jim 
##    1    2    3    2    2
# We could also assign these to an object
deg.u.mat <- colSums( u.mat )


Then, we can use that information in the plot by passing the degree object to the vertex.cex= argument. This will make nodes with higher degree larger.

gplot(
  u.mat,
  gmode="graph", 
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( u.mat ),
  label.col="blue",
  label.cex=1.2,
  vertex.cex = deg.u.mat, #HERE: we added the object to size the plot
  coord = coords
  )


Another approach is to shade the nodes. Rather than changing just the size, we might want nodes with larger degree to be darker (or lighter) to better visualize differences in degree. To do this, we could use the RColorBrewer package to shade the nodes.


# install.packages( "RColorBrewer" )
library( RColorBrewer, quietly=TRUE )

# use display.brewer.all() to see the palettes.

# Let's use the Blues palette.
col.deg  <- brewer.pal( length( unique( deg.u.mat ) ), "Blues")[deg.u.mat]

# In this plot, what do darker shades mean?
gplot(
  u.mat, 
  gmode="graph", 
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( u.mat ),
  label.col="blue",
  label.cex=1.2,
  vertex.cex = deg.u.mat,
  vertex.col = col.deg,
  coord = coords
  )
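
Indexing the palette directly by the degree values works here because the degrees happen to be 1, 2, and 3. A more general sketch, which is not part of the original lab, maps each node to the rank of its degree among the unique degree values, so it also works when degrees include 0 or skip values. It uses colorRampPalette() from base R; the names deg.example, deg.rank, shades, and col.example are illustrative.

```r
# Illustrative degree vector with a zero and a gap (not from the lab data)
deg.example <- c( Ann=0, Ben=2, Cat=5, Dan=2 )

# Rank of each node's degree among the sorted unique degree values
deg.rank <- match( deg.example, sort( unique( deg.example ) ) )

# Build one color per unique degree value, running from light to dark
shades <- colorRampPalette( c( "lightblue", "darkblue" ) )( length( unique( deg.example ) ) )

# One color per node; nodes with equal degree share a color
col.example <- shades[ deg.rank ]
col.example
```

The resulting vector could be passed to vertex.col= in the same way as col.deg.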

Standardized degree centrality, mean degree, and centralization

Actor degree centrality not only reflects each node’s connectivity to other nodes but also depends on the size of the network, g. As a result, larger networks will have a higher maximum possible degree centrality value. This makes comparison across networks problematic. The solution is to standardize by dividing each node’s degree by the maximum number of nodes to which i could be connected, g-1.

Let’s calculate the standardized centrality scores for our undirected graph:

# unstandardized or raw centrality
deg.u.mat <- colSums( u.mat )

# to calculate g-1, we need to know the number of nodes in the graph 
# this is the first dimension of the matrix
g <- dim( u.mat )[1]

# now, divide by g-1
s.deg.u.mat <- deg.u.mat / ( g-1 )

deg.u.mat
##  Jen  Tom  Bob Leaf  Jim 
##    1    2    3    2    2
s.deg.u.mat
##  Jen  Tom  Bob Leaf  Jim 
## 0.25 0.50 0.75 0.50 0.50
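
To see why standardizing helps with comparison, consider a minimal sketch with two hypothetical star graphs of different sizes (the helper make.star() is illustrative, not an sna function). The raw degree of the center grows with g, but the standardized degree is the same in both networks:

```r
# Hypothetical helper: adjacency matrix of an undirected star with g nodes,
# where node 1 is tied to every other node
make.star <- function( g ){
  mat <- matrix( 0, nrow = g, ncol = g )
  mat[ 1, -1 ] <- 1
  mat[ -1, 1 ] <- 1
  mat
}

star5  <- make.star( 5 )
star10 <- make.star( 10 )

# raw degree of the center depends on network size
raw5  <- rowSums( star5 )[1]
raw10 <- rowSums( star10 )[1]
c( raw5, raw10 )

# standardized degree (divide by g-1) is comparable across sizes
std5  <- raw5  / ( nrow( star5 )  - 1 )
std10 <- raw10 / ( nrow( star10 ) - 1 )
c( std5, std10 )
```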

We can also examine the average degree of the graph using

\[\frac{\sum_{i=1}^g d(n_i)}{g}\] or

\[\frac{2L}{g}\]

where L is the number of edges in the graph:

mean.deg <- sum( deg.u.mat ) / dim( u.mat )[1] 

mean.deg
## [1] 2
# Note that we can also use the mean() function to return this information:
mean( deg.u.mat )
## [1] 2
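
The two expressions for mean degree can be checked against each other. The sketch below rebuilds the lab's undirected matrix so it runs on its own; the names L, mean.from.degrees, and mean.from.edges are illustrative.

```r
# Rebuild the undirected matrix from the lab so the block runs on its own
u.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 1,0,1,0,0 ),
  c( 0,1,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )

g <- nrow( u.mat )

# each undirected edge appears twice in the matrix, so divide by 2
L <- sum( u.mat ) / 2

# both expressions for mean degree should agree
mean.from.degrees <- sum( rowSums( u.mat ) ) / g
mean.from.edges   <- ( 2 * L ) / g
c( mean.from.degrees, mean.from.edges )
```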


We can also calculate how centralized the graph itself is. Group degree centralization measures the extent to which the actors in a social network differ from one another in their individual degree centralities. Following Wasserman & Faust (1994), an index of group degree centralization can be calculated as:

\[C_D = \frac{\sum\limits_{i=1}^g [C_D(n^*) - C_D(n_i)]}{[(g-1)(g-2)]}\]

for undirected graphs where \(C_D(n^*)\) is the maximum degree in the graph. We can write out the components of the equation using the max() function:

# In separate pieces
deviations <- max( deg.u.mat ) - deg.u.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-2 )
group.deg.cent <- numerator/denominator

group.deg.cent
## [1] 0.4166667
# Or, as a single equation.
group.deg.cent <-( sum( ( ( max( deg.u.mat ) - deg.u.mat ) ) ) ) / ( ( g -1 )*( g - 2 ) )

group.deg.cent
## [1] 0.4166667
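
To build intuition for the range of this index, the sketch below wraps the undirected formula in a hypothetical helper (group.degree.centralization() is an illustrative name, not an sna function) and applies it to two extreme cases: a star, where one node holds all the ties, and a ring, where every node has the same degree.

```r
# Hypothetical helper implementing the undirected formula above
group.degree.centralization <- function( mat ){
  g   <- nrow( mat )
  deg <- rowSums( mat )
  sum( max( deg ) - deg ) / ( ( g - 1 ) * ( g - 2 ) )
}

# A 5-node star: node 1 is tied to all others
star <- matrix( 0, 5, 5 )
star[ 1, -1 ] <- 1
star[ -1, 1 ] <- 1

# A 5-node ring: every node has exactly two ties
ring <- matrix( 0, 5, 5 )
for( i in 1:5 ){
  j <- ifelse( i == 5, 1, i + 1 )
  ring[ i, j ] <- 1
  ring[ j, i ] <- 1
}

cent.star <- group.degree.centralization( star )  # most centralized case
cent.ring <- group.degree.centralization( ring )  # all degrees equal
c( cent.star, cent.ring )
```

The lab's graph, at about 0.42, sits between these two extremes.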


Degree Centrality (Directed Binary Graphs)

In a directed binary graph, actor degree centrality can be broken down into indegree and outdegree centrality. Indegree, \(C_I(n_i)\), measures the number of ties that i receives. For the sociomatrix \(X_{ij}\), the indegree for i is the column sum. Outdegree, \(C_O(n_i)\), measures the number of ties that i sends. For the sociomatrix \(X_{ij}\), the outdegree for i is the row sum.

As before, if we have an object of class(matrix) in the workspace, we can use the rowSums() and colSums() functions. However, the colSums() function will return the indegree centrality for i and the rowSums() function will return the outdegree centrality for i.

First, let’s set up our directed graph from the degree centrality lecture:

# First, clear the workspace
rm( list = ls() )

# Then, build the object
d.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 0,0,1,0,0 ),
  c( 0,0,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )
rownames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

# Let's set up the coordinates to force the nodes
# to be in the same position throughout the lab
set.seed( 605 )

# the old object named coords was already removed
# when we cleared the workspace above

# set the new coordinates
coords <- gplot( d.mat )

# Now, plot the graph (remember to load the sna package)
gplot(
  d.mat, 
  gmode="digraph",
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( d.mat ),
  label.col="red",
  label.cex=1.2,
  coord = coords
  )

# Let's look at the different centrality scores 
# by assigning them to different objects
ideg.d.mat <- colSums( d.mat )
odeg.d.mat <- rowSums( d.mat )

# print them out to examine them
ideg.d.mat
##  Jen  Tom  Bob Leaf  Jim 
##    0    1    3    2    2
odeg.d.mat
##  Jen  Tom  Bob Leaf  Jim 
##    1    1    2    2    2
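
It can help to view both scores side by side. The sketch below rebuilds the lab's directed matrix so it runs on its own, then collects indegree, outdegree, and their sum in a data frame (deg.table and top.indegree are illustrative names).

```r
# Rebuild the directed matrix from the lab so the block runs on its own
d.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 0,0,1,0,0 ),
  c( 0,0,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )
rownames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )
colnames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

# one row per node: ties received, ties sent
deg.table <- data.frame(
  indegree  = colSums( d.mat ),
  outdegree = rowSums( d.mat ),
  row.names = rownames( d.mat )
)

# total ties sent plus received
deg.table$total <- deg.table$indegree + deg.table$outdegree
deg.table

# node(s) with the highest indegree
top.indegree <- rownames( deg.table )[ deg.table$indegree == max( deg.table$indegree ) ]
top.indegree
```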


Now, let’s work this information into the plot. We will want to partition the plotting window using the par() function to show two plots side by side, and we want to change the margins using the mar= argument. Use ?par to view the help on how the mfrow= and mar= arguments work.

par( 
  mfrow=c( 1, 2 ), 
  mar=c( 0.1, 0.5, 4, 0.1) 
  )

gplot(
  d.mat, 
  gmode="digraph", 
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( d.mat ),
  label.col="red",
  label.cex=0.8,
  label.pos=1,
  vertex.cex = ideg.d.mat+0.2,
  main="Nodes sized by Indegree",
  coord = coords
  )

gplot(
  d.mat, 
  gmode="digraph", 
  arrowhead.cex=0.5, 
  edge.col="grey40", 
  label=rownames( d.mat ),
  label.col="red",
  label.cex=0.8,
  label.pos=1,
  vertex.cex = odeg.d.mat,
  main="Nodes sized by Outdegree",
  coord = coords
  )


Note the difference. Which nodes are more central in terms of indegree? What about outdegree?


Again, let’s use the RColorBrewer package to help with shading.


# create the objects
# add 1 to the index so that a degree of 0 maps to the first (lightest) shade
col.ideg  <- brewer.pal( max( ideg.d.mat ) + 1, "Greens")[ideg.d.mat + 1]
col.odeg  <- brewer.pal( max( odeg.d.mat ) + 1, "Oranges")[odeg.d.mat + 1]

par( 
  mfrow=c( 1, 2 ), 
  mar=c( 0.1, 0.5, 4, 0.1) 
  )

gplot(
  d.mat, 
  gmode = "digraph", 
  arrowhead.cex = 0.5, 
  edge.col = "grey40", 
  label = rownames( d.mat ),
  label.col = "red",
  label.cex = 0.8,
  label.pos = 1,
  vertex.cex = ideg.d.mat+0.2,
  vertex.col = col.ideg,
  main = "Nodes sized &\n shaded by Indegree",
  coord = coords
  )

gplot(
  d.mat, 
  gmode = "digraph", 
  arrowhead.cex = 0.5, 
  edge.col = "grey40", 
  label = rownames( d.mat ),
  label.col = "red",
  label.cex = 0.8,
  label.pos = 1,
  vertex.cex = odeg.d.mat,
  vertex.col = col.odeg,
  main = "Nodes sized &\n shaded by Outdegree",
  coord = coords
  )


Standardized degree centrality, mean degree, and centralization

Let’s calculate the standardized centrality scores for our directed graph:

# unstandardized or raw centrality
ideg.d.mat <- colSums( d.mat )
odeg.d.mat <- rowSums( d.mat )

# to calculate g-1, we need to know the number of nodes in the graph
# this is the first dimension of the matrix
g <- dim( d.mat )[1]

# now, divide by g-1
s.i.deg.d.mat <- ideg.d.mat / ( g-1 )
s.o.deg.d.mat <- odeg.d.mat / ( g-1 )


We can also examine the average degree of the graph using \(\frac{\sum_{i=1}^g C_I(n_i)}{g} = \frac{\sum_{i=1}^g C_O(n_i)}{g}\) or \(\frac{L}{g}\), where L is the number of edges in the graph:

mean.i.deg <- sum( ideg.d.mat ) / dim( d.mat )[1] 
mean.o.deg <- sum( odeg.d.mat ) / dim( d.mat )[1] 
mean.i.deg
## [1] 1.6
mean.o.deg
## [1] 1.6
# we could also use the mean() function
mean( ideg.d.mat )
## [1] 1.6
mean( odeg.d.mat )
## [1] 1.6
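
The mean indegree and mean outdegree match because every directed tie adds one to some node's outdegree and one to some node's indegree, so both totals equal L. A minimal sketch of that check, rebuilding the lab's directed matrix so it runs on its own:

```r
# Rebuild the directed matrix from the lab so the block runs on its own
d.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 0,0,1,0,0 ),
  c( 0,0,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )

g <- nrow( d.mat )

# each directed tie appears once in the matrix, so no division by 2
L <- sum( d.mat )

# both totals equal the number of ties
sum( colSums( d.mat ) ) == L
sum( rowSums( d.mat ) ) == L

# mean indegree and mean outdegree are both L / g
mean.dir.deg <- L / g
mean.dir.deg
```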


Again, following Wasserman & Faust (1994), an index of group indegree/outdegree centralization can be calculated as:

\[ C_D = \frac{\sum\limits_{i=1}^g [C_D(n^*) - C_D(n_i)]}{[(g-1)^2]} \]

for directed graphs, where \(C_D(n^*)\) is the maximum indegree/outdegree in the graph. We can write out the components of the equation using the max() function:

# In separate pieces
deviations <- max( ideg.d.mat ) - ideg.d.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-1 )
group.i.deg.cent <- numerator/denominator
group.i.deg.cent
## [1] 0.4375
deviations <- max( odeg.d.mat ) - odeg.d.mat
sum.deviations <- sum( deviations )
numerator <- sum.deviations
denominator <- ( g-1 )*( g-1 )
group.o.deg.cent <- numerator/denominator
group.o.deg.cent
## [1] 0.125
# Or, as a single equation
group.ideg.cent <-( sum( ( ( max( ideg.d.mat ) - ideg.d.mat ) ) ) ) / ( ( g -1 )*( g - 1) )
group.odeg.cent <-( sum( ( ( max( odeg.d.mat ) - odeg.d.mat ) ) ) ) / ( ( g -1 )*( g - 1 ) )
group.ideg.cent
## [1] 0.4375
group.odeg.cent
## [1] 0.125
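
As a way to see what the extremes look like, the sketch below wraps the directed formula in a hypothetical helper (group.dir.centralization() is an illustrative name, not an sna function) and applies it to an "out-star" in which one node sends a tie to everyone else and nobody else sends any ties.

```r
# Hypothetical helper implementing the directed formula above
group.dir.centralization <- function( mat, type = c( "indegree", "outdegree" ) ){
  type <- match.arg( type )
  g    <- nrow( mat )
  # column sums are ties received; row sums are ties sent
  deg  <- if( type == "indegree" ) colSums( mat ) else rowSums( mat )
  sum( max( deg ) - deg ) / ( ( g - 1 )^2 )
}

# A 5-node out-star: node 1 sends a tie to every other node
out.star <- matrix( 0, 5, 5 )
out.star[ 1, -1 ] <- 1

# sending is concentrated in one node
o.cent.star <- group.dir.centralization( out.star, "outdegree" )

# receiving is spread almost evenly
i.cent.star <- group.dir.centralization( out.star, "indegree" )

c( o.cent.star, i.cent.star )
```

The same network can therefore be highly centralized on one dimension and barely centralized on the other.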

What do the centralization scores tell us, conceptually?


Degree Centrality using the sna package

Did that feel tedious? If no, go back and do it again :)

As you probably have guessed, there are functions in the sna package that calculate degree centrality and graph centralization! In the sna package, these are the degree() and centralization() functions, respectively. Let’s take a look at how these work.

# load the library
library( sna )

# Build the objects to work with
rm( list = ls() )

u.mat <- rbind( 
  c( 0,1,0,0,0 ),
  c( 1,0,1,0,0 ), 
  c( 0,1,0,1,1 ), 
  c( 0,0,1,0,1 ), 
  c( 0,0,1,1,0 )
  )

rownames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

colnames( u.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

d.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 0,0,1,0,0 ), 
  c( 0,0,0,1,1 ), 
  c( 0,0,1,0,1 ), 
  c( 0,0,1,1,0 )
  )

rownames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )

colnames( d.mat ) <- c( "Jen","Tom","Bob","Leaf","Jim" )


# First, let's look at degree
?degree

# degree for undirected graph
deg <- degree( u.mat, gmode="graph" )

# indegree for directed graph
ideg <- degree( d.mat, gmode="digraph", cmode="indegree" )

# outdegree for directed graph
odeg <- degree( d.mat, gmode="digraph", cmode="outdegree" )

# returns the combined centrality for each node
deg.d <- degree( d.mat, gmode="digraph" )


# Now, let's look at centralization
?centralization

# degree centralization for undirected graph
cent.u <- centralization( u.mat, degree, mode="graph" )

# indegree centralization for directed graph.
i.cent.d <- centralization( d.mat, degree, mode="digraph", cmode="indegree" ) 

# outdegree centralization for directed graph.
o.cent.d <- centralization( d.mat, degree, mode="digraph", cmode="outdegree" )
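
# A hedged check that the sna results match the hand calculations for indegree.
# This sketch rebuilds d.mat so it runs on its own and only calls sna if the
# package is installed; the object names ending in .hand and .sna are illustrative.

```r
# Rebuild the directed matrix from the lab so the block runs on its own
d.mat <- rbind(
  c( 0,1,0,0,0 ),
  c( 0,0,1,0,0 ),
  c( 0,0,0,1,1 ),
  c( 0,0,1,0,1 ),
  c( 0,0,1,1,0 ) )

# hand calculations: indegree and indegree centralization
ideg.hand   <- colSums( d.mat )
g           <- nrow( d.mat )
i.cent.hand <- sum( max( ideg.hand ) - ideg.hand ) / ( g - 1 )^2

# compare with sna only if the package is installed
if( requireNamespace( "sna", quietly = TRUE ) ){
  ideg.sna   <- sna::degree( d.mat, gmode = "digraph", cmode = "indegree" )
  i.cent.sna <- sna::centralization( d.mat, sna::degree, mode = "digraph", cmode = "indegree" )
  print( all.equal( ideg.hand, ideg.sna ) )
  print( all.equal( i.cent.hand, i.cent.sna ) )
}
```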


Now, wasn’t that easier?


Degree Centrality in PINS Get Along With and Power/Influence Networks

The Prison Inmate Networks Study (PINS) examines the social networks of prison inmates in a state correctional institution. The study was unique in that it was the first in nearly a century to collect sociometric data in a prison. The researchers collected data on several types of networks. There are two we want to look at here:

  • The get along with network was created by asking individuals whom they “get along with” on the unit. We can think of this as “friends” in a prison setting. (People don’t really have “friends” in prison, but there are people they “get along with”)

  • The power and influence network was created by asking individuals whom they believed was “powerful and influential” on the unit.

Let’s examine the degree centrality scores for both of these networks. These data are available in the SNA Textbook data folder.


Get Along With Network (Undirected Network)

For the get along with network, individuals could have asymmetric nominations. That is, i could nominate j and j didn’t necessarily nominate i. But, we are going to symmetrize the network by only taking ties for which both i and j indicated that they get along with the other person. This will give us an undirected network.

# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-PINS-getalong-w1-adj.csv"

# read in the .csv file
gaMat <- as.matrix(
  read.csv( 
    loc,
    as.is = TRUE,
    header = TRUE,
    row.names = 1 
    )
  )

# use the symmetrize() function to create an undirected matrix
gaMatU <- symmetrize( gaMat, rule = "strong" )

# create the network object
gaNetU <- as.network( gaMatU, directed = FALSE )
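
# To make the rule = "strong" choice concrete, here is a base R sketch of the
# logic behind strong and weak symmetrizing on a hypothetical 3-node matrix.
# It mirrors the idea rather than calling symmetrize() itself, and the names
# nom, strong, and weak are illustrative.

```r
# Hypothetical directed nominations among three people:
# 1 -> 2, 1 -> 3, and 2 -> 1
nom <- rbind(
  c( 0,1,1 ),
  c( 1,0,0 ),
  c( 0,0,0 ) )

# strong rule: tie kept only when i -> j AND j -> i
strong <- nom * t( nom )

# weak rule: tie kept when i -> j OR j -> i
weak <- ( ( nom + t( nom ) ) > 0 ) * 1

strong
weak
```

# Under the strong rule only the mutual 1-2 tie survives; under the weak rule
# the one-sided 1-3 nomination is kept as well.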


Now we have created an undirected network where ties represent “get along with” nominations from both individuals. Let’s calculate the degree centrality scores, the centralization score, and then use the degree centrality scores to size our nodes in a plot using the vertex.cex= argument in the gplot() function.


# Set the coordinates
set.seed( 605 )
coords <- gplot( gaNetU )

Now let’s build the objects and the plot:

# get the degrees.
gaNetDeg <- degree( gaNetU, gmode="graph" )

# now the centralization score.
gaNetDegCent <- centralization( gaNetU, degree, mode="graph" )

# Now, take a look at the plot.
gplot( 
  gaNetU, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col="#3250a8",
  vertex.cex = gaNetDeg,
  coord = coords,
  main = "PINS Get\n Along With Network (Undirected)",
  sub = "node sized by degree centrality"
  )

Whoops! Let’s try that again AFTER rescaling the degree so that the node sizes fall in a sensible range. We can define a rescale() function to do this.

# define the rescale function
# x is the numeric vector to rescale; low and high set the new range
rescale <- function( x, low, high ){
  min_d <- min( x )
  max_d <- max( x )
  rscl  <- ( ( high - low )*( x - min_d ) ) / ( max_d - min_d ) + low
  rscl
}
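
As a quick check of what this kind of function returns, the sketch below defines the same min-max rescaling under a different name (rescale.safe(), a hypothetical variant) so the block is self-contained. It adds one assumption that is not in the lab: if every value is the same, it returns the midpoint of the range rather than dividing by zero.

```r
# Hypothetical variant of the rescaling function with a guard for constant input
rescale.safe <- function( x, low, high ){
  min_d <- min( x )
  max_d <- max( x )
  # assumption: when all values are equal, use the midpoint of the range
  if( max_d == min_d ) return( rep( ( low + high ) / 2, length( x ) ) )
  ( ( high - low ) * ( x - min_d ) ) / ( max_d - min_d ) + low
}

# the smallest value maps to low and the largest to high
varied   <- rescale.safe( c( 0, 5, 10 ), 0.2, 4 )

# constant input falls back to the midpoint
constant <- rescale.safe( c( 3, 3, 3 ), 0.2, 4 )

varied
constant
```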

Now we can plot it after adding in the rescale() function to gplot():

# Now, take a look at the plot.
gplot( 
  gaNetU, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col="#3250a8",
  vertex.cex = rescale( gaNetDeg, 0.2, 4 ),
  coord = coords,
  main = "PINS Get\n Along With Network (Undirected)",
  sub = "node sized by degree centrality"
  )


Almost there! Let’s drop the isolates to help with the size:

# Now, take a look at the plot.
gplot( 
  gaNetU, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col="#3250a8",
  vertex.cex = rescale( gaNetDeg, 0.2, 4 ),
  displayisolates = FALSE,
  coord = coords,
  main = "PINS Get\n Along With Network (Undirected)",
  sub = "node sized by degree centrality"
  )


A few questions:

  • What do we see in the plot?
  • What does the degree centralization score of 0.04 indicate?


Power and Influence Network (Directed Network)

For the power and influence network, individuals could have asymmetric nominations. That is, i could nominate j and j didn’t necessarily nominate i. We will keep this asymmetry so that we can treat the network as directed.

# set the location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-PINS-power-w1-adj.csv"

# read in the .csv file
piMat <- as.matrix(
  read.csv( 
    loc,
    as.is = TRUE,
    header = TRUE,
    row.names = 1 
    )
  )

# create the network object
piNetD <- as.network( piMat, directed = TRUE )


Now we have created a directed network where ties represent “powerful and influential” nominations. Let’s calculate the indegree and outdegree centrality scores, the centralization scores, and then use the degree centrality scores to size our nodes in a plot using the vertex.cex= argument in the gplot() function.


# Set the coordinates
set.seed( 605 )
coords2 <- gplot( piNetD )


# get the degrees.
piNetiDeg <- degree( piNetD, gmode="digraph", cmode = "indegree" )
piNetoDeg <- degree( piNetD, gmode="digraph", cmode = "outdegree" )

# now the centralization scores.
piNetiDegCent <- centralization( piNetD, degree, mode="digraph", cmode = "indegree" )
piNetoDegCent <- centralization( piNetD, degree, mode="digraph", cmode = "outdegree" )
par( mfrow=c( 1, 2 ) )

gplot( 
  piNetD, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#693859",
  vertex.cex = rescale( piNetiDeg, 0.2, 4 ),
  displayisolates = FALSE,
  coord = coords2,
  main = "PINS Power/Influence\n Network (Directed)",
  sub = "node sized by indegree centrality"
  )

gplot( 
  piNetD, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#2b868c",
  vertex.cex = rescale( piNetoDeg, 0.2, 4 ),
  displayisolates = FALSE,
  coord = coords2,
  main = "PINS Power/Influence\n Network (Directed)",
  sub = "node sized by outdegree centrality"
  )


A few questions:

  • What do we see in the plot?
  • What does the indegree centralization score of 0.07 indicate?
  • What does the outdegree centralization score of 0.04 indicate?



Wrapping up…

Centrality scores are a common metric used in many network analysis projects. In this lab, we examined how to calculate degree centrality and centralization scores in R using the degree() and centralization() functions in the sna package.


Questions?


Please report any needed corrections to the Issues page. Thanks!

