For this tutorial we will use the igraph package in R.
If you have not installed it yet, install it before starting the
tutorial, then load it. We will also be using data from the igraphdata
package, so install and load that as well.
library(igraph)
library(igraphdata)
We will be working with data about hospital
encounters from the igraphdata package. First we load
the data and plot it to see what it looks like.
data(rfid)
plot(rfid)
We also might want to know more about the graph structure and attributes.
rfid
IGRAPH efde728 U--- 75 32424 -- RFID hospital encounter network
+ attr: name (g/c), Citation (g/c), Status (v/c), Time (e/n)
+ edges from efde728:
[1] 15--31 15--22 15--16 15--16 16--22 16--22 16--22 16--22 16--22 11--16
[11] 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22
[21] 11--22 15--16 11--22 11--16 15--16 11--16 15--16 11--16 11--16 14--22
[31] 14--22 14--22 3--37 3--37 15--22 15--22 15--22 3--37 3--37 5--37
[41] 3--37 3-- 6 3--37 5-- 7 5-- 7 5--37 1--20 3-- 5 3--37 1--17
[51] 3--37 8--17 17--37 31--37 3--37 5--17 8--17 8--37 5--31 8--17
[61] 5--31 6--37 23--31 5--31 8--17 5--23 23--37 10--13 5--31 1-- 6
[71] 8--17 5--37 23--37 8--23 17--23 8--17 23--37 8--23 17--37 17--23
+ ... omitted several edges
We can take a closer look at the edge and node attributes as follows.
V(rfid)$Status
E(rfid)$Time
You can also add node or edge attributes to the graph object. For example, we can add a new attribute stating whether each node is a patient or has a different role. We assign the new attribute to the nodes in the same way we would add a new column to a data frame.
Role <- V(rfid)$Status
Role <- ifelse(Role == "PAT", "Patient", "Other")
V(rfid)$Role <- Role
#examine your work
rfid
IGRAPH efde728 U--- 75 32424 -- RFID hospital encounter network
+ attr: name (g/c), Citation (g/c), Status (v/c), Role (v/c), Time
| (e/n)
+ edges from efde728:
[1] 15--31 15--22 15--16 15--16 16--22 16--22 16--22 16--22 16--22 11--16
[11] 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22 11--22
[21] 11--22 15--16 11--22 11--16 15--16 11--16 15--16 11--16 11--16 14--22
[31] 14--22 14--22 3--37 3--37 15--22 15--22 15--22 3--37 3--37 5--37
[41] 3--37 3-- 6 3--37 5-- 7 5-- 7 5--37 1--20 3-- 5 3--37 1--17
[51] 3--37 8--17 17--37 31--37 3--37 5--17 8--17 8--37 5--31 8--17
[61] 5--31 6--37 23--31 5--31 8--17 5--23 23--37 10--13 5--31 1-- 6
+ ... omitted several edges
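The same attribute workflow can be tried on a small toy graph, where the result is easy to check by eye. This is only a sketch: the four-node graph and its Status values are made up for illustration and are not part of the hospital data.

```r
library(igraph)

# Toy undirected graph with four nodes; the statuses are invented for illustration
g <- make_graph(c(1, 2, 2, 3, 3, 4), directed = FALSE)
V(g)$Status <- c("PAT", "NUR", "PAT", "MED")

# Derive a two-level Role attribute from Status, like adding a data frame column
V(g)$Role <- ifelse(V(g)$Status == "PAT", "Patient", "Other")

# List the vertex attribute names and tabulate the new attribute
vertex_attr_names(g)
table(V(g)$Role)
```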
As seen above, when the igraph package is loaded, you can simply run
the plot() function on an igraph object and R will know
what to do. However, the default aesthetics are not always that great.
We can start by adjusting the size and color of the nodes.
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color="darkseagreen")
R will automatically choose a layout, but you can also manually set a layout. More details can be found on the igraph page in the R graph gallery.
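As a rough illustration of setting a layout manually, the sketch below computes two layouts for a toy ring graph and passes them to plot() through the layout argument. The ring graph and the seed value are assumptions chosen for the example; the same calls should work on the rfid graph.

```r
library(igraph)

g <- make_ring(10)  # toy 10-node ring, not the hospital data

# A layout is a two-column matrix of x/y coordinates, one row per node
set.seed(42)                       # force-directed layouts start from random positions
lay_fr <- layout_with_fr(g)        # Fruchterman-Reingold force-directed layout
lay_circle <- layout_in_circle(g)  # nodes evenly spaced on a circle

# Passing the stored matrix keeps node positions identical across plots
plot(g, layout = lay_fr, vertex.size = 12)
plot(g, layout = lay_circle, vertex.size = 12)
```

Storing the layout in a variable is useful when you want to compare several versions of the same plot with the nodes in fixed positions.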
We can also assign vertex colors based on node attributes. Let’s color the nodes using the “Role” attribute we made previously. Here we will first create a new attribute that defines the role colors.
V(rfid)$Rolecolor <- ifelse(V(rfid)$Role == "Patient", "orchid","darkseagreen")
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Rolecolor)
We can also alter the edge color, vertex shape, label font, and many other plotting parameters to make a more interpretable graph. Check out more details on the R graph gallery.
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Rolecolor, edge.color="orange", vertex.shape="sphere")
To color nodes by more than two categories, we can use nested ifelse() statements.
V(rfid)$Statuscolor <- ifelse(V(rfid)$Status == "PAT", "burlywood1", ifelse(V(rfid)$Status == "NUR", "orangered1", ifelse(V(rfid)$Status == "ADM", "chartreuse1", ifelse(V(rfid)$Status == "MED", "skyblue2", "grey" ))))
plot(rfid, vertex.size=12, vertex.label.cex=.5, vertex.color=V(rfid)$Statuscolor, edge.color="lightgrey", vertex.label=NA)
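Nested ifelse() calls become hard to read as the number of categories grows. One possible alternative, sketched here with a made-up status vector, is a named lookup vector indexed by the status codes. The names status_colors, status and node_colors are illustrative only.

```r
# Lookup table: names are Status codes, values are the colors used above
status_colors <- c(PAT = "burlywood1", NUR = "orangered1",
                   ADM = "chartreuse1", MED = "skyblue2")

# Hypothetical status vector; "XYZ" stands in for any code missing from the table
status <- c("PAT", "NUR", "MED", "XYZ", "ADM")

# Index by name, then fall back to grey for codes that are not in the table
node_colors <- unname(status_colors[status])
node_colors[is.na(node_colors)] <- "grey"
node_colors
```

With the hospital network, the same idea would use V(rfid)$Status in place of the status vector and store the result in V(rfid)$Statuscolor.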
In addition to visualizing networks, we can also calculate network statistics. For example, we can calculate various forms of centrality. We can also assign these centrality measures to nodes as attributes and then identify which nodes have the highest centrality metrics.
degree(rfid)
[1] 1480 288 373 430 1711 1335 4286 93 957 249 2045 1296 1934 236 2849
[16] 1582 2109 444 689 1279 1227 1798 2236 1501 763 1105 3130 161 4077 2075
[31] 163 133 848 228 1333 202 3695 63 367 149 155 849 445 244 1113
[46] 46 90 802 624 153 603 322 488 116 172 164 481 12 18 281
[61] 446 1366 1075 1242 404 67 21 367 389 84 289 306 460 174 61
V(rfid)$degree <- degree(rfid)
V(rfid)$betweenness <- betweenness(rfid)
#which node has the highest degree?
V(rfid)[degree(rfid)==max(degree(rfid))]
+ 1/75 vertex, from efde728:
[1] 7
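To see how these measures behave, it can help to compute them on a graph where the answer is obvious. The sketch below uses a toy star graph (an assumption for illustration, not the hospital data), in which the hub should come out on top for both degree and betweenness.

```r
library(igraph)

# Toy star graph: node 1 is the hub, nodes 2 to 5 are leaves
g <- make_star(5, mode = "undirected")

V(g)$degree <- degree(g)
V(g)$betweenness <- betweenness(g)

# which.max() returns the index of the first maximum
which.max(V(g)$degree)

# Table of nodes ranked by degree, highest first
centrality <- data.frame(node = as.integer(V(g)),
                         degree = V(g)$degree,
                         betweenness = V(g)$betweenness)
centrality[order(-centrality$degree), ]
```

Note that which.max() reports only the first node when several nodes tie for the maximum, whereas the comparison against max() used above returns all of them.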
With these new stats we can resize the nodes in our hospital network to see which types of actors were more central in the network. In this network, the degree values span a very wide range (from 12 to 4286), so it is difficult to visualize them well when node size is directly proportional to degree. However, the plot does tell us something right off the bat about which individuals tended to be more central in the network.
plot(rfid, vertex.size=V(rfid)$degree/100, vertex.label.cex=.5, vertex.color=V(rfid)$Statuscolor, edge.color="lightgrey", vertex.label=NA)
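One way to handle such a wide range is to rescale the values to a fixed span of node sizes before plotting. The helper below is a hypothetical sketch (rescale() is not an igraph function, and the size limits of 4 and 20 are arbitrary choices); it is shown on a handful of the degree values printed earlier.

```r
# Linearly rescale x so that its minimum maps to `lo` and its maximum to `hi`
rescale <- function(x, lo = 4, hi = 20) {
  lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
}

# A few degree values taken from the output above
deg <- c(1480, 288, 4286, 12, 93)

round(rescale(deg), 1)       # linear scaling
round(rescale(log(deg)), 1)  # log scaling spreads out the low-degree nodes
```

In the plot call, vertex.size=rescale(V(rfid)$degree) would then replace the division by 100.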
With R we can also examine the structure of the network as a whole.
For this part of the tutorial we will be working with data about
friendships among UK faculty from the igraphdata package. First we
load the data and plot it to see what it looks like.
data(UKfaculty)
plot(UKfaculty)
UKfaculty
IGRAPH 6f42903 D-W- 81 817 --
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)
+ edges from 6f42903:
[1] 57->52 76->42 12->69 43->34 28->47 58->51 7->29 40->71 5->37 48->55
[11] 6->58 21-> 8 28->69 43->21 67->58 65->42 5->67 52->75 37->64 4->36
[21] 12->49 19->46 37-> 9 74->36 62-> 1 15-> 2 72->49 46->62 2->29 40->12
[31] 22->29 71->69 4-> 3 37->69 5-> 6 77->13 23->49 52->35 20->14 62->70
[41] 34->35 76->72 7->42 37->42 51->80 38->45 62->64 36->53 62->77 17->61
[51] 7->68 46->29 44->53 18->58 12->16 72->42 52->32 58->21 38->17 15->51
[61] 22-> 7 22->69 5->13 29-> 2 77->12 37->35 18->46 10->71 22->47 20->19
+ ... omitted several edges
summary(UKfaculty)
IGRAPH 6f42903 D-W- 81 817 --
+ attr: Type (g/c), Date (g/c), Citation (g/c), Author (g/c), Group
| (v/n), weight (e/n)
Let’s make a quick plot to look at the network.
plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color="pink", edge.arrow.size=.3, edge.width=E(UKfaculty)$weight/2, edge.color=ifelse(E(UKfaculty)$weight>9, "orange","lightgrey"))
One thing that is different in this network compared to the first network is that it is directed. The direction of ties is indicated by arrows in the graph. These edges also have a weight attribute, which we can map onto the width of the edges.
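Direction and weights also change how node-level statistics are computed. As a small sketch on a made-up three-node graph (the edges and weights are invented for illustration), the code below checks whether a graph is directed and weighted, and separates incoming from outgoing ties.

```r
library(igraph)

# Toy directed graph: 1 -> 2, 2 -> 1, 2 -> 3, with made-up weights
g <- make_graph(c(1, 2, 2, 1, 2, 3), directed = TRUE)
E(g)$weight <- c(2, 5, 12)

is_directed(g)  # TRUE: edges have a direction
is_weighted(g)  # TRUE: an edge attribute named "weight" exists

# In a directed graph, incoming and outgoing ties are counted separately
in_deg <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

# strength() sums edge weights instead of counting edges
out_strength <- strength(g, mode = "out")

in_deg
out_deg
out_strength
```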
There are numerous community detection algorithms in igraph. Here we use
the spinglass.community() function (named cluster_spinglass() in
current igraph releases). Without knowing more about the
network, what might the detected communities below be indicative of
within a faculty friendship network?
#set seed and run community detection algorithm
set.seed(1000)
spinglasslayer <- spinglass.community(UKfaculty)
#take a look at what the clustering looks like
spinglasslayer
IGRAPH clustering spinglass, groups: 6, mod: 0.1
+ groups:
$`1`
[1] 14 20 26 51 56 80
$`2`
[1] 5 6 7 10 12 13 16 22 23 27 28 30 33 40 42 47 49 63 65 66 67 68 69 71
[25] 72 76 77
$`3`
[1] 24 32 48 52 55 64
+ ... omitted several groups/vertices
#plot the detected communities
plot(UKfaculty, vertex.label=NA, vertex.color=membership(spinglasslayer), mark.groups=communities(spinglasslayer),edge.arrow.size=.5, main="Community detection on UK friendship network")
Try running another community detection algorithm on the UKfaculty dataset. How do the community groups detected with this algorithm differ from the first one? Check out some different community detection algorithms.
set.seed(1000)
walktraplayer <- walktrap.community(UKfaculty)
plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color=membership(walktraplayer),edge.arrow.size=.5, main="Walktrap community detection on UK friendship network")
plot(UKfaculty, vertex.label=V(UKfaculty)$Group, vertex.color=membership(spinglasslayer),edge.arrow.size=.5, main="Spinglass community detection on UK friendship network")
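Beyond comparing the two plots by eye, the partitions can be compared numerically. The sketch below is an illustration under different assumptions from the tutorial: it uses the Zachary karate club graph built into igraph rather than UKfaculty, and it swaps spinglass for the fast greedy algorithm so that both results are deterministic.

```r
library(igraph)

# Zachary's karate club: a small undirected network that ships with igraph
g <- make_graph("Zachary")

wt <- cluster_walktrap(g)
fg <- cluster_fast_greedy(g)

# Cross-tabulate memberships: rows are walktrap groups, columns fast greedy groups
table(walktrap = as.integer(membership(wt)),
      fast_greedy = as.integer(membership(fg)))

# Normalized mutual information: 1 means the two partitions are identical
nmi <- compare(wt, fg, method = "nmi")
nmi

# Modularity of each partition
modularity(wt)
modularity(fg)
```

The same calls should apply to walktraplayer and spinglasslayer, since compare() accepts community objects as well as plain membership vectors.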
Here is an article with details on how to pick a community detection algorithm and the igraph page on community detection algorithms.
Within our network, we might also be interested in knowing how many of the ties are reciprocal. igraph provides the reciprocity() and dyad_census() functions for this.
reciprocity(UKfaculty)
[1] 0.5875153
dyad_census(UKfaculty)
$mut
[1] 240
$asym
[1] 337
$null
[1] 2663
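The two outputs are related: with its default settings, reciprocity() reports the share of directed edges that are reciprocated, which equals 2 * mut / (2 * mut + asym). For the values above this gives 480 / 817, or about 0.5875. The sketch below checks the relationship on a made-up three-node graph.

```r
library(igraph)

# Toy directed graph: 1 <-> 2 is mutual, 2 -> 3 is one-way, 1 and 3 are unconnected
g <- make_graph(c(1, 2, 2, 1, 2, 3), directed = TRUE)

dc <- dyad_census(g)
dc$mut   # mutual dyads
dc$asym  # asymmetric (one-way) dyads
dc$null  # dyads with no tie

# Default reciprocity: share of directed edges that are reciprocated
r_builtin <- reciprocity(g)
r_manual <- 2 * dc$mut / (2 * dc$mut + dc$asym)
c(r_builtin, r_manual)
```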
When bringing in your own network data, it may be in a sociomatrix or
edgelist format. If it is an edgelist, you can convert it into an
igraph object using the graph.data.frame() function (named
graph_from_data_frame() in current igraph releases). There are other
functions as well for importing edgelist and sociomatrix data. Here is
a tutorial from a short course on Importing network data into R and
another section from the online companion to Network Science in
Archaeology about working with Network Data in R. These sources should
help with figuring out how to import your network data files into R.
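As a minimal sketch of both import routes, the code below builds one graph from an edgelist and one from a sociomatrix. All of the data here (node names, roles, weights) are hypothetical; in practice the data frames would typically come from read.csv() or a similar function.

```r
library(igraph)

# Hypothetical edgelist: the first two columns are the endpoints,
# any remaining columns become edge attributes
edges <- data.frame(from = c("A", "A", "B"),
                    to = c("B", "C", "C"),
                    weight = c(1, 3, 2))

# Optional node table: the first column must match the names in the edgelist.
# Node "D" has no ties but is still added to the graph.
nodes <- data.frame(name = c("A", "B", "C", "D"),
                    role = c("staff", "staff", "patient", "patient"))

g_el <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# Hypothetical sociomatrix: entry [i, j] is 1 when there is a tie from i to j
m <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
g_mat <- graph_from_adjacency_matrix(m, mode = "directed")

g_el
g_mat
```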