
Network data in R

For this tutorial we will use the igraph package in R. If you have not installed it yet, install it before starting the tutorial, then load it. We will also use data from the igraphdata package, so install and load that as well.

library(igraph)
library(igraphdata)
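
If you are not sure whether both packages are installed, a small check like the one below may help. This is only a sketch: `missing_pkgs()` is a hypothetical helper name, not part of igraph, and the `install.packages()` call is left commented out so that nothing is installed until you choose to.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(c("igraph", "igraphdata"))
print(to_install)  # an empty result means both packages are available

# Uncomment to install anything that is missing:
# if (length(to_install) > 0) install.packages(to_install)
```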

We will be working with data about hospital encounters from the igraphdata package. First we load the data and plot it to see what it looks like.

data(rfid)
plot(rfid)

Graph objects and attributes

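Printing the graph object shows its structure and attributes. In the header line, U means the graph is undirected, and 75 and 32424 are the numbers of nodes and edges. In the attribute list, the code after each name gives its level (g = graph, v = vertex, e = edge) and its type (c = character, n = numeric).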

rfid
IGRAPH efde728 U--- 75 32424 -- RFID hospital encounter network
+ attr: name (g/c), Citation (g/c), Status (v/c), Time (e/n)
+ edges from efde728:
 [1] 15--31 15--22 15--16 15--16 16--22 16--22 16--22 16--22 16--22 11--16
[11] 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22
[21] 11--22 15--16 11--22 11--16 15--16 11--16 15--16 11--16 11--16 14--22
[31] 14--22 14--22  3--37  3--37 15--22 15--22 15--22  3--37  3--37  5--37
[41]  3--37  3-- 6  3--37  5-- 7  5-- 7  5--37  1--20  3-- 5  3--37  1--17
[51]  3--37  8--17 17--37 31--37  3--37  5--17  8--17  8--37  5--31  8--17
[61]  5--31  6--37 23--31  5--31  8--17  5--23 23--37 10--13  5--31  1-- 6
[71]  8--17  5--37 23--37  8--23 17--23  8--17 23--37  8--23 17--37 17--23
+ ... omitted several edges
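
The same information can be pulled out with accessor functions instead of reading the printed header. The sketch below uses a small made-up graph (`g` and its edges are assumptions for illustration, not the rfid data) in which one pair of nodes is connected several times, as in the edge list above.

```r
library(igraph)

# Toy undirected graph (made-up edges); the pair 1--2 appears three times,
# mimicking repeated encounters between the same two people.
g <- make_graph(c(1, 2,  1, 2,  1, 2,  2, 3), directed = FALSE)

vcount(g)        # number of vertices: 3
ecount(g)        # number of edges, counting repeats: 4
is_directed(g)   # FALSE, the "U" in the header line
any_multiple(g)  # TRUE, because 1--2 is repeated
```

Running the same four functions on rfid should reproduce the numbers in its header line.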

We can take a closer look at the edge and node attributes as follows.

V(rfid)$Status
E(rfid)$Time

You can also add node or edge attributes to the graph object. For example, we can add a new attribute stating whether each node is a patient or has a different role. We assign the new attribute to the nodes in the same way we would add a new column to a data frame.

Role <- V(rfid)$Status
Role <- ifelse(Role == "PAT", "Patient", "Other")
V(rfid)$Role <- Role
#examine your work
rfid
IGRAPH efde728 U--- 75 32424 -- RFID hospital encounter network
+ attr: name (g/c), Citation (g/c), Status (v/c), Role (v/c), Time
| (e/n)
+ edges from efde728:
 [1] 15--31 15--22 15--16 15--16 16--22 16--22 16--22 16--22 16--22 11--16
[11] 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22
[21] 11--22 15--16 11--22 11--16 15--16 11--16 15--16 11--16 11--16 14--22
[31] 14--22 14--22  3--37  3--37 15--22 15--22 15--22  3--37  3--37  5--37
[41]  3--37  3-- 6  3--37  5-- 7  5-- 7  5--37  1--20  3-- 5  3--37  1--17
[51]  3--37  8--17 17--37 31--37  3--37  5--17  8--17  8--37  5--31  8--17
[61]  5--31  6--37 23--31  5--31  8--17  5--23 23--37 10--13  5--31  1-- 6
+ ... omitted several edges
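
Because vertex attributes behave much like columns, it can be handy to view them as an actual data frame. The sketch below assumes a toy graph `g` with made-up Status values (using the same codes as the rfid data); `node_table` is a name chosen here for illustration.

```r
library(igraph)

# Toy graph with a made-up Status attribute.
g <- make_ring(4)
V(g)$Status <- c("PAT", "NUR", "PAT", "MED")
V(g)$Role <- ifelse(V(g)$Status == "PAT", "Patient", "Other")

vertex_attr_names(g)  # lists the vertex attributes: Status and Role

# View the vertex attributes as a data frame, one row per node.
# The igraph:: prefix avoids clashes with same-named functions elsewhere.
node_table <- igraph::as_data_frame(g, what = "vertices")
table(node_table$Role)  # counts of Patient vs Other
```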

Network visualization

As seen above, when the igraph package is loaded, you can simply run the plot() function on an igraph object and R will know what to do. However, the default aesthetics are not always that great. We can start by adjusting the size and color of the nodes.

plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color="darkseagreen")

R will automatically choose a layout, but you can also manually set a layout. More details can be found on the igraph page in the R graph gallery.
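
One way to set a layout manually is to compute the coordinates once and pass them to each plot. The sketch below assumes the Fruchterman-Reingold layout (`layout_with_fr()`) and a toy ring graph standing in for rfid; the seed value is arbitrary.

```r
library(igraph)

g <- make_ring(10)  # toy graph standing in for rfid

# Layout algorithms with a random start give different coordinates on each
# run, so fix the seed and store the coordinates once.
set.seed(42)
coords <- layout_with_fr(g)  # matrix with one row (x, y) per vertex

# Reuse the same coordinates so the two plots are directly comparable.
plot(g, layout = coords, vertex.color = "darkseagreen")
plot(g, layout = coords, vertex.color = "orchid", vertex.label = NA)
```

Storing the coordinates keeps nodes in the same positions when you change colors or sizes between plots.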

We can also assign vertex colors based on node attributes. Let’s color the nodes using the “Role” attribute we made previously. Here we will first create a new attribute that defines the role colors.

V(rfid)$Rolecolor <- ifelse(V(rfid)$Role == "Patient", "orchid","darkseagreen")
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Rolecolor)

We can also alter the arrow size, label font and many other plotting parameters to make a more interpretable graph. Check out more details on the R graph gallery.

Try it

  1. Adjust the color of the edges and shapes of the nodes in your rfid graph.
  2. Create a new color label based on the “Status” attribute. Hint: Remember what we learned about nested ifelse() statements.
  3. Apply this new status color to your graph and turn off the node labels. What does this graph tell you about the encounters between different individuals in the hospital?
Solution
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Rolecolor, edge.color="orange", vertex.shape="sphere")



V(rfid)$Statuscolor <- ifelse(V(rfid)$Status == "PAT", "burlywood1",
                       ifelse(V(rfid)$Status == "NUR", "orangered1",
                       ifelse(V(rfid)$Status == "ADM", "chartreuse1",
                       ifelse(V(rfid)$Status == "MED", "skyblue2", "grey"))))

plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Statuscolor, edge.color="lightgrey", vertex.label=NA)
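
Nested ifelse() statements get hard to read as the number of categories grows. One possible alternative, sketched here with base R only, is a named lookup vector. The `status` vector and the names `status_palette` and `status_color` are made up for illustration; with the real data you would index the palette with V(rfid)$Status.

```r
# Status codes as used in the rfid data; the "XXX" entry is made up to show
# what happens to a code that is not in the lookup.
status <- c("PAT", "NUR", "ADM", "MED", "XXX")

# Named vector: names are status codes, values are colors.
status_palette <- c(PAT = "burlywood1", NUR = "orangered1",
                    ADM = "chartreuse1", MED = "skyblue2")

status_color <- unname(status_palette[status])
status_color[is.na(status_color)] <- "grey"  # fallback, like the last ifelse() branch
status_color
```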

Network statistics

In addition to visualizing networks, we can also calculate network statistics. For example, we can calculate various forms of centrality. We can also assign these centrality measures to nodes as attributes and then identify which nodes have the highest centrality metrics.

degree(rfid)
 [1] 1480  288  373  430 1711 1335 4286   93  957  249 2045 1296 1934  236 2849
[16] 1582 2109  444  689 1279 1227 1798 2236 1501  763 1105 3130  161 4077 2075
[31]  163  133  848  228 1333  202 3695   63  367  149  155  849  445  244 1113
[46]   46   90  802  624  153  603  322  488  116  172  164  481   12   18  281
[61]  446 1366 1075 1242  404   67   21  367  389   84  289  306  460  174   61

V(rfid)$degree <- degree(rfid)
V(rfid)$betweenness <- betweenness(rfid)

#which node has the highest degree?
V(rfid)[degree(rfid)==max(degree(rfid))]
+ 1/75 vertex, from efde728:
[1] 7
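
The degree values above are far larger than the number of nodes because every repeated encounter is stored as its own edge and counted separately. The sketch below illustrates this on a small made-up multigraph (the edges are assumptions, not rfid data) and also shows one way to rank nodes by degree.

```r
library(igraph)

# Toy multigraph (made-up edges): 1--2 occurs three times, 2--3 once.
g <- make_graph(c(1, 2,  1, 2,  1, 2,  2, 3), directed = FALSE)

degree(g)            # 3 4 1: every repeated contact is counted
degree(simplify(g))  # 1 2 1: number of distinct contacts

# Rank nodes from highest to lowest degree.
order(degree(g), decreasing = TRUE)  # 2 1 3
```

Whether to count encounters or distinct contacts depends on the question you are asking of the data.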

With these new statistics we can resize the nodes in our hospital network to see which types of actors were more central. Degree values in this network span a very wide range (from 12 to 4286), in part because repeated encounters between the same pair are stored as separate edges, so it is difficult to pick node sizes that display well. Even so, the plot shows right away which individuals tended to be more central in the network.

plot(rfid, vertex.size=V(rfid)$degree/100, vertex.label.cex=.5, vertex.color=V(rfid)$Statuscolor, edge.color="lightgrey", vertex.label=NA)
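
Dividing by 100 leaves the smallest nodes nearly invisible. One possible workaround is to rescale degree to a fixed range of sizes, optionally after a log transform. This is a sketch in base R: `rescale_size()` and its default limits of 3 and 15 are assumptions chosen for illustration, not igraph functions or recommended values.

```r
# Hypothetical helper: linearly rescale x to the interval [lo, hi].
rescale_size <- function(x, lo = 3, hi = 15) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))  # all values equal
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

deg <- c(12, 63, 1480, 4286)  # a few of the degree values printed earlier
rescale_size(deg)             # smallest maps to 3, largest to 15
rescale_size(log1p(deg))      # log scale compresses the very large values
```

In the plot call you could then try vertex.size=rescale_size(V(rfid)$degree) in place of V(rfid)$degree/100.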

Community detection

With R we can also examine the structure of the network as a whole. For this section we will work with data about friendships among UK faculty from the igraphdata package. First we load the data and plot it to see what it looks like.

data(UKfaculty)
plot(UKfaculty)

UKfaculty
IGRAPH 6f42903 D-W- 81 817 -- 
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)
+ edges from 6f42903:
 [1] 57->52 76->42 12->69 43->34 28->47 58->51  7->29 40->71  5->37 48->55
[11]  6->58 21-> 8 28->69 43->21 67->58 65->42  5->67 52->75 37->64  4->36
[21] 12->49 19->46 37-> 9 74->36 62-> 1 15-> 2 72->49 46->62  2->29 40->12
[31] 22->29 71->69  4-> 3 37->69  5-> 6 77->13 23->49 52->35 20->14 62->70
[41] 34->35 76->72  7->42 37->42 51->80 38->45 62->64 36->53 62->77 17->61
[51]  7->68 46->29 44->53 18->58 12->16 72->42 52->32 58->21 38->17 15->51
[61] 22-> 7 22->69  5->13 29-> 2 77->12 37->35 18->46 10->71 22->47 20->19
+ ... omitted several edges
summary(UKfaculty)
IGRAPH 6f42903 D-W- 81 817 -- 
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)

Let’s make a quick plot to look at the network.

plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color="pink", edge.arrow.size=.3, edge.width=E(UKfaculty)$weight/2, edge.color=ifelse(E(UKfaculty)$weight>9, "orange","lightgrey"))

Unlike the first network, this one is directed. The direction of each tie is indicated by an arrow in the graph. The edges also have a weight attribute: in the plot above, edge width is proportional to weight, and edges with a weight above 9 are drawn in orange.

There are numerous community detection algorithms in igraph. Here we use the spinglass.community() function (called cluster_spinglass() in current igraph releases). Without knowing more about the network, what might the detected communities below be indicative of within a faculty friendship network?

#set seed and run community detection algorithm
set.seed(1000)
spinglasslayer <- spinglass.community(UKfaculty)

#take a look at what the clustering looks like
spinglasslayer
IGRAPH clustering spinglass, groups: 6, mod: 0.1
+ groups:
  $`1`
  [1] 14 20 26 51 56 80
  
  $`2`
   [1]  5  6  7 10 12 13 16 22 23 27 28 30 33 40 42 47 49 63 65 66 67 68 69 71
  [25] 72 76 77
  
  $`3`
  [1] 24 32 48 52 55 64
  
  + ... omitted several groups/vertices

#plot the detected communities

plot(UKfaculty, vertex.label=NA, vertex.color=membership(spinglasslayer), mark.groups=communities(spinglasslayer),edge.arrow.size=.5, main="Community detection on UK friendship network")
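
To interpret detected communities, it often helps to look at their sizes and to cross-tabulate them against a known node attribute. The sketch below does this on a small made-up graph with a made-up Group attribute. It uses cluster_walktrap() rather than spinglass only to keep the toy example simple; the same inspection functions should apply to the spinglasslayer object.

```r
library(igraph)

# Toy graph: two groups of four nodes, fully connected within each group,
# plus one bridging tie (made-up data).
g <- make_full_graph(4) %du% make_full_graph(4)
g <- add_edges(g, c(4, 5))
V(g)$Group <- rep(c(1, 2), each = 4)

comm <- cluster_walktrap(g)  # stand-in algorithm for this illustration

sizes(comm)       # number of nodes in each detected community
modularity(comm)  # modularity score of the detected partition

# Cross-tabulate detected communities against the known Group attribute.
table(Detected = as.integer(membership(comm)), Group = V(g)$Group)
```

For the faculty network, the analogous table would use V(UKfaculty)$Group in place of the toy attribute.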

Try it

Try running another community detection algorithm on the UKfaculty dataset. How do the community groups detected with this algorithm differ from the first one? Check out some different community detection algorithms.

Solution
set.seed(1000)
walktraplayer <- walktrap.community(UKfaculty)



plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color=membership(walktraplayer),edge.arrow.size=.5, main="Walktrap community detection on UK friendship network")


plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color=membership(spinglasslayer),edge.arrow.size=.5, main="Spinglass community detection on UK friendship network")
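
Besides comparing the two plots by eye, you can quantify how similar two partitions are with igraph's compare() function. The sketch below uses made-up membership vectors and normalized mutual information (method = "nmi"), which is one of several available measures; a value of 1 means the two partitions group the nodes identically.

```r
library(igraph)

# Three made-up membership vectors for the same six nodes.
part_a <- c(1, 1, 1, 2, 2, 2)
part_b <- c(2, 2, 2, 1, 1, 1)  # same grouping as part_a, labels swapped
part_c <- c(1, 1, 2, 2, 3, 3)  # a different grouping

compare(part_a, part_b, method = "nmi")  # 1: identical up to relabeling
compare(part_a, part_c, method = "nmi")  # below 1: the partitions differ
```

With the objects from this tutorial, the call would be compare(spinglasslayer, walktraplayer, method = "nmi").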

Here is an article with details on how to pick a community detection algorithm and the igraph page on community detection algorithms.

Dyad level measures

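Within our network, we might also want to know how many of the ties are reciprocal. The reciprocity() function returns the proportion of edges that are reciprocated, and dyad_census() counts the mutual, asymmetric, and null dyads.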

reciprocity(UKfaculty)
[1] 0.5875153
dyad_census(UKfaculty)
$mut
[1] 240

$asym
[1] 337

$null
[1] 2663
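
The two outputs are consistent with each other, which can be checked with a little arithmetic. The sketch below assumes the default reciprocity definition (the share of edges that are reciprocated) and a graph without loops or repeated edges; the variable names are chosen here for illustration.

```r
# Dyad census counts reported above for UKfaculty.
mut  <- 240   # dyads with ties in both directions
asym <- 337   # dyads with a tie in one direction only
null <- 2663  # dyads with no tie

# Each mutual dyad contributes two edges, each asymmetric dyad one.
n_edges <- 2 * mut + asym
n_edges                    # 817, matching the edge count in the graph header

# Share of edges that are reciprocated.
2 * mut / n_edges          # 0.5875153, matching reciprocity(UKfaculty)

# All possible pairs among 81 nodes.
mut + asym + null == choose(81, 2)  # TRUE (3240 dyads)
```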

Loading network data into R

When bringing in your own network data, it may be in a sociomatrix or edgelist format. If it is an edgelist, you can convert it into an igraph object using the graph.data.frame() function. There are other functions as well for importing edgelist and sociomatrix data. Here is a tutorial from a short course on Importing network data into R and another section from the online companion to Network Science in Archaeology about working with Network Data in R. These sources should help you work out how to import your own network data files into R.
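
As a starting point, the sketch below builds a graph from each format using small made-up data. It uses graph_from_data_frame(), the current name for graph.data.frame(), and graph_from_adjacency_matrix() for the sociomatrix; the actor names, ties, and weights are all invented for illustration.

```r
library(igraph)

# Made-up edgelist: one row per tie, first two columns are the endpoints,
# any further columns become edge attributes.
edges <- data.frame(from   = c("A", "A", "B"),
                    to     = c("B", "C", "C"),
                    weight = c(2, 1, 3))
g_edges <- graph_from_data_frame(edges, directed = FALSE)

# Made-up sociomatrix: rows and columns are the same actors, 1 = tie present.
actors <- c("A", "B", "C")
socio <- matrix(c(0, 1, 1,
                  1, 0, 0,
                  1, 0, 0),
                nrow = 3, byrow = TRUE, dimnames = list(actors, actors))
g_matrix <- graph_from_adjacency_matrix(socio, mode = "undirected")

ecount(g_edges)   # 3
ecount(g_matrix)  # 2
```

With your own files, you would typically read the edgelist or matrix first (for example with read.csv()) and then pass it to one of these functions.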