This lesson will teach you how to work with and analyze cultural domain data using R. It draws in part on datasets and formulas from the AnthroTools package. You can get the data for this lesson here and here.
# Load packages
# Uncomment and run the next three lines once if AnthroTools is not yet installed
# install.packages('devtools')  # devtools lets us install packages from GitHub
# library('devtools')
# install_github('alastair-JL/AnthroTools')
library(AnthroTools)
library(tidyverse)
library(ggplot2)
library(tidyr)
Next, let’s load the data we plan to use and assign it a name. Here we will load the FruitList data from the AnthroTools package, which has been made available to you as a .csv file. Take a look at the head() and str() of this spreadsheet. How do you think the data have been organized? What do the rows and columns represent?
FruitList <- read.csv("https://maddiebrown.github.io/ethnoecology/FruitList.csv")
head(FruitList)
X Subj Order CODE
1 1 1 1 pear
2 2 1 2 orange
3 3 1 3 apple
4 4 2 1 apple
5 5 2 2 apple
6 6 2 3 strawberry
str(FruitList)
'data.frame': 75 obs. of 4 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ Subj : int 1 1 1 2 2 2 3 3 4 4 ...
$ Order: int 1 2 3 1 2 3 1 2 1 2 ...
$ CODE : Factor w/ 8 levels "apple","banana",..: 6 4 1 1 1 8 2 8 1 7 ...
We can see that we have 75 rows with 4 variables. The ‘X’ variable is a numeric index autocreated by Excel. The ‘Subj’ variable is a numeric identifier for each interview respondent. The ‘Order’ variable represents the order in which each fruit was named by each respondent. The ‘CODE’ variable contains each fruit named in the freelists.
How many respondents and unique fruits (‘items’) are in the FruitList data?
Click for solution
# How many respondents are there?
length(unique(FruitList$Subj))
# how many unique fruits are named and what are they?
length(unique(FruitList$CODE))
unique(FruitList$CODE)
From these few operations we now know a bit more about the dataset. There are 20 individuals who collectively named a total of 8 unique fruits.
Next, let’s calculate the average number of fruits named per person and explore the distribution of list lengths across responses.
# How many responses per subject
FruitList %>% group_by(Subj) %>% count()
# A tibble: 20 × 2
# Groups: Subj [20]
Subj n
<int> <int>
1 1 3
2 2 3
3 3 2
4 4 5
5 5 3
6 6 3
7 7 4
8 8 4
9 9 3
10 10 3
11 11 5
12 12 5
13 13 4
14 14 5
15 15 3
16 16 5
17 17 2
18 18 5
19 19 3
20 20 5
# What is the average number of responses per subject?
FruitList %>% group_by(Subj) %>% count() %>% ungroup() %>% summarise(mean = mean(n))
# A tibble: 1 × 1
mean
<dbl>
1 3.75
# We can also do this in two parts by creating a new dataframe of the summarized
# data and then calculating the mean value in the 'n' column
BySubject <- FruitList %>% group_by(Subj) %>% count()
mean(BySubject$n)
# This method also allows you to generate additional summary information
summary(BySubject$n)
# We can also make a table showing how many respondents had lists of length 2, 3,
# 4, and 5
table(BySubject$n)
Next, let’s analyze how frequently each fruit was mentioned. For this analysis we are going to create a new dataframe object called ‘ByFruit’ that groups the data according to each unique fruit and counts the number of rows.
ByFruit <- FruitList %>% group_by(CODE) %>% count() %>% arrange(desc(n))
ByFruit
Plot the number of times each type of fruit is mentioned by interviewees. Make a bar plot with flipped coordinates so that each fruit name is on the y axis and the number of mentions is on the x axis.
Click for solution
ggplot(ByFruit, aes(x = reorder(CODE, n), y = n)) + geom_bar(stat = "identity") +
coord_flip() + ggtitle("Frequency of fruit mentions") + labs(x = "Fruit", y = "n")
Another way to visualize these data is through a scree plot. In these types of plots data are arranged such that each point represents a single item, the y-value for which is its frequency of mention in the dataset. These types of plots can be used to quickly identify trends and cut-off points in the data.
ggplot(ByFruit, aes(x = reorder(CODE, desc(n)), y = n, group = 1)) + geom_point() +
geom_line()
In freelists, we often expect the more highly ranked items to also show up more frequently. The code below plots the frequency of fruit mention against its average rank to explore this pattern.
# First, we make a new 'ByFruit' object that includes the fruit frequency, top
# rank, and average rank.
ByFruit <- FruitList %>% group_by(CODE) %>% summarise(Frequency = n(), topRank = min(Order),
avgRank = mean(Order))
ByFruit # Look at our new object
# A tibble: 8 × 4
CODE Frequency topRank avgRank
<fct> <int> <int> <dbl>
1 apple 25 1 2.52
2 banana 8 1 2.25
3 lemon 5 2 3.2
4 orange 7 1 3
5 peach 2 1 1.5
6 pear 10 1 2.8
7 plum 9 1 2
8 strawberry 9 1 2.44
# Plot these variables
plot(ByFruit$Frequency, ByFruit$avgRank)
With more observations there will often be a trend where the top-ranking items show up most frequently. This is a toy dataset with only a few samples, which may explain why the trend doesn’t seem supported here, but try this out with your own freelist data.
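One way to put a number on this pattern is a rank correlation between frequency and average rank. The sketch below is an illustration rather than part of the original analysis: it retypes the ByFruit values printed above into a hypothetical data frame called by_fruit so that it runs on its own, and it uses base R's cor() with method = "spearman".

```r
# Values copied from the ByFruit table printed above; by_fruit is a
# hypothetical name used only for this sketch
by_fruit <- data.frame(
  CODE = c("apple", "banana", "lemon", "orange", "peach", "pear", "plum", "strawberry"),
  Frequency = c(25, 8, 5, 7, 2, 10, 9, 9),
  avgRank = c(2.52, 2.25, 3.2, 3, 1.5, 2.8, 2, 2.44)
)

# Spearman's rho compares the two rank orderings. If frequently mentioned
# fruits tend to be listed early (low avgRank), rho will be negative.
rho <- cor(by_fruit$Frequency, by_fruit$avgRank, method = "spearman")
rho
```

With only eight items the coefficient is unstable, so it is best read as a descriptive summary rather than a test of the pattern.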
One of the main insights that can be learned from freelist data is the relative cultural salience of items in particular domains. For example, in the domain of household chores, we could determine whether vacuuming, washing dishes, shoveling the driveway, or feeding the snake are considered more salient or central to the idea of household chores compared to other tasks. We might find variations based on individual attributes or the cultural context in which the question is asked. Salience often mirrors frequency, but the calculation is a bit more complicated as it considers both an item’s frequency of mention and the order in which it is usually listed.
Luckily, AnthroTools has a built-in salience calculation function that can do the math for us. The code below calculates the salience of each fruit listed in the context of each individual interviewee.
FruitListSalience <- CalculateSalience(FruitList, Order = "Order", Subj = "Subj",
CODE = "CODE")
# Note: I have included the arguments for Order, Subj and CODE for illustrative
# purposes. Because these column names match the arguments, you do not actually
# need to include them in this case. However, when you are working with your own
# datasets you may have different column names, so it is helpful to keep the
# underlying structure of functions in mind when you deploy them.
FruitListSalience
X Subj Order CODE Salience
1 1 1 1 pear 1.0000000
2 2 1 2 orange 0.6666667
3 3 1 3 apple 0.3333333
4 4 2 1 apple 1.0000000
5 5 2 2 apple 0.6666667
6 6 2 3 strawberry 0.3333333
7 7 3 1 banana 1.0000000
8 8 3 2 strawberry 0.5000000
9 9 4 1 apple 1.0000000
10 10 4 2 plum 0.8000000
11 11 4 3 banana 0.6000000
12 12 4 4 orange 0.4000000
13 13 4 5 apple 0.2000000
14 14 5 1 strawberry 1.0000000
15 15 5 2 apple 0.6666667
16 16 5 3 apple 0.3333333
17 17 6 1 apple 1.0000000
18 18 6 2 lemon 0.6666667
19 19 6 3 apple 0.3333333
20 20 7 1 banana 1.0000000
21 21 7 2 plum 0.7500000
22 22 7 3 lemon 0.5000000
23 23 7 4 lemon 0.2500000
24 24 8 1 strawberry 1.0000000
25 25 8 2 apple 0.7500000
26 26 8 3 pear 0.5000000
27 27 8 4 apple 0.2500000
28 28 9 1 apple 1.0000000
29 29 9 2 apple 0.6666667
30 30 9 3 plum 0.3333333
31 31 10 1 apple 1.0000000
32 32 10 2 pear 0.6666667
33 33 10 3 apple 0.3333333
34 34 11 1 apple 1.0000000
35 35 11 2 strawberry 0.8000000
36 36 11 3 banana 0.6000000
37 37 11 4 apple 0.4000000
38 38 11 5 lemon 0.2000000
39 39 12 1 apple 1.0000000
40 40 12 2 orange 0.8000000
41 41 12 3 pear 0.6000000
42 42 12 4 apple 0.4000000
43 43 12 5 orange 0.2000000
44 44 13 1 plum 1.0000000
45 45 13 2 peach 0.7500000
46 46 13 3 strawberry 0.5000000
47 47 13 4 banana 0.2500000
48 48 14 1 pear 1.0000000
49 49 14 2 apple 0.8000000
50 50 14 3 banana 0.6000000
51 51 14 4 pear 0.4000000
52 52 14 5 strawberry 0.2000000
53 53 15 1 plum 1.0000000
54 54 15 2 strawberry 0.6666667
55 55 15 3 apple 0.3333333
56 56 16 1 banana 1.0000000
57 57 16 2 plum 0.8000000
58 58 16 3 plum 0.6000000
59 59 16 4 pear 0.4000000
60 60 16 5 pear 0.2000000
61 61 17 1 orange 1.0000000
62 62 17 2 apple 0.5000000
63 63 18 1 peach 1.0000000
64 64 18 2 banana 0.8000000
65 65 18 3 orange 0.6000000
66 66 18 4 pear 0.4000000
67 67 18 5 apple 0.2000000
68 68 19 1 pear 1.0000000
69 69 19 2 lemon 0.6666667
70 70 19 3 strawberry 0.3333333
71 71 20 1 plum 1.0000000
72 72 20 2 apple 0.8000000
73 73 20 3 plum 0.6000000
74 74 20 4 orange 0.4000000
75 75 20 5 apple 0.2000000
The above code calculates the salience for each item by respondent. If you inspect the results, you’ll see that the first item in each list has a salience of 1, with each subsequent item decreasing in relative salience. This is useful for understanding how an individual thinks about the domain of fruits, but what if we are interested in knowing how salient apples are across all responses? We can calculate the salience of particular items as well with the SalienceByCode() function.
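Judging from the output above, the per-respondent salience of an item appears to be (list length - order + 1) / list length: a three-item list gives 1, 0.667, 0.333 and a four-item list gives 1, 0.75, 0.5, 0.25. The base R sketch below reproduces that pattern on a small made-up freelist. The data frame toy and the helper variable list_length are hypothetical names, and the formula is inferred from the printed values rather than taken from the AnthroTools source.

```r
# A made-up freelist: respondent A names 3 fruits, respondent B names 4
toy <- data.frame(
  Subj  = c("A", "A", "A", "B", "B", "B", "B"),
  Order = c(1, 2, 3, 1, 2, 3, 4),
  CODE  = c("pear", "orange", "apple", "apple", "plum", "banana", "lemon")
)

# Length of each respondent's list, repeated on every row of that respondent
list_length <- ave(toy$Order, toy$Subj, FUN = length)

# Assumed formula: first item scores 1, last item scores 1 / list length
toy$Salience <- (list_length - toy$Order + 1) / list_length
toy
```

Because the score is scaled by list length, the third item in a short list counts for less than the third item in a long list.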
Try running the SalienceByCode() function on the new salience dataframe we made above.
Click for solution
SalienceByFruit <- SalienceByCode(FruitListSalience, dealWithDoubles = "MAX")
CODE MeanSalience SumSalience SmithsS
1 pear 0.6958333 5.566667 0.2783333
2 orange 0.6444444 3.866667 0.1933333
3 apple 0.7588889 11.383333 0.5691667
4 strawberry 0.5925926 5.333333 0.2666667
5 banana 0.7312500 5.850000 0.2925000
6 plum 0.8119048 5.683333 0.2841667
7 lemon 0.5083333 2.033333 0.1016667
8 peach 0.8750000 1.750000 0.0875000
# The dealwithdoubles argument tells R what to do if a respondent lists the same
# item twice. There are a few different options available for this, the right one
# to pick will depend on your data and research question.
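From this analysis we can see that there are differences between the mean salience and Smith’s S. In the output above, MeanSalience averages only over the respondents who mentioned a fruit, whereas SmithsS divides the same summed salience by all 20 respondents, so a rarely mentioned fruit such as peach scores high on the former (0.875) and low on the latter (0.0875). Smith’s S considers the length of lists in its calculation of salience (more info here).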
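To make the relationship between the three output columns concrete, here is a minimal base R sketch on made-up data. It assumes that dealWithDoubles = "MAX" keeps the highest salience when a respondent repeats an item, which is consistent with the apple row printed above, and that Smith's S is the summed salience divided by the total number of respondents. The names toy, best and salience_by_item are hypothetical.

```r
# Made-up per-respondent salience scores; respondent 1 lists apple twice
toy <- data.frame(
  Subj = c(1, 1, 1, 2, 2, 3),
  CODE = c("apple", "pear", "apple", "pear", "apple", "plum"),
  Salience = c(1, 2/3, 1/3, 1, 1/2, 1)
)

# Assumed "MAX" behaviour: keep one score per respondent and item, the highest
best <- aggregate(Salience ~ Subj + CODE, data = toy, FUN = max)

n_respondents <- length(unique(toy$Subj))               # everyone interviewed
sum_salience  <- tapply(best$Salience, best$CODE, sum)  # total per item
n_mentioning  <- tapply(best$Salience, best$CODE, length)

salience_by_item <- data.frame(
  CODE = names(sum_salience),
  MeanSalience = as.numeric(sum_salience / n_mentioning),  # mentioners only
  SumSalience = as.numeric(sum_salience),
  SmithsS = as.numeric(sum_salience / n_respondents)       # all respondents
)
salience_by_item
```

In this toy example plum has the highest mean salience because its single mention came first, but the lowest Smith's S because only one of three respondents named it.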
Now let’s plot the Smith’s S results in decreasing order and add a vertical line at the 0.1 mark. This value is generally considered a benchmark level for assessing item salience in freelists.
ggplot(SalienceByFruit, aes(x = reorder(CODE, SmithsS), y = SmithsS)) + geom_bar(stat = "identity") +
coord_flip() + ggtitle("Fruit Salience") + labs(x = "Fruit", y = "Smith's S") +
geom_hline(yintercept = 0.1)
From this plot, it looks like most of the fruits could be considered salient in this dataset. However, although pears were mentioned slightly more frequently than bananas and plums, their salience is lower when their overall order within lists and other factors are taken into account.
It is also possible to compare item salience across different groups of respondents. The following example comes from the AnthroTools package which includes a sample grouping of the FruitList data. First, let’s load the new dataset. Because it is included in the AnthroTools package, we can load it directly with the data() function.
data("WorldList")
WorldList
Subj Order CODE GROUPING
1 1 1 plum MAINLAND
2 1 2 strawberry MAINLAND
3 1 3 pear MAINLAND
4 1 4 peach MAINLAND
5 2 1 apple MAINLAND
6 2 2 orange MAINLAND
7 2 3 pear MAINLAND
8 2 4 pear MAINLAND
9 3 1 strawberry MAINLAND
10 3 2 apple MAINLAND
11 3 3 apple MAINLAND
12 3 4 lemon MAINLAND
13 3 5 orange MAINLAND
14 4 1 plum MAINLAND
15 4 2 peach MAINLAND
16 5 1 lemon MAINLAND
17 5 2 lemon MAINLAND
18 5 3 strawberry MAINLAND
19 5 4 plum MAINLAND
20 6 1 plum MAINLAND
21 6 2 plum MAINLAND
22 6 3 apple MAINLAND
23 7 1 peach MAINLAND
24 7 2 apple MAINLAND
25 8 1 banana MAINLAND
26 8 2 orange MAINLAND
27 8 3 apple MAINLAND
28 9 1 apple MAINLAND
29 9 2 orange MAINLAND
30 10 1 banana MAINLAND
31 10 2 apple MAINLAND
32 10 3 apple MAINLAND
33 10 4 apple MAINLAND
34 10 5 apple MAINLAND
35 11 1 apple MAINLAND
36 11 2 pear MAINLAND
37 11 3 strawberry MAINLAND
38 11 4 apple MAINLAND
39 12 1 orange MAINLAND
40 12 2 plum MAINLAND
41 12 3 orange MAINLAND
42 12 4 apple MAINLAND
43 13 1 apple MAINLAND
44 13 2 pear MAINLAND
45 13 3 apple MAINLAND
46 14 1 plum MAINLAND
47 14 2 peach MAINLAND
48 14 3 apple MAINLAND
49 14 4 apple MAINLAND
50 15 1 apple MAINLAND
51 15 2 peach MAINLAND
52 15 3 apple MAINLAND
53 16 1 apple MAINLAND
54 16 2 banana MAINLAND
55 16 3 apple MAINLAND
56 17 1 orange MAINLAND
57 17 2 pear MAINLAND
58 1 1 banana ISLAND
59 1 2 apple ISLAND
60 1 3 pear ISLAND
61 2 1 lemon ISLAND
62 2 2 plum ISLAND
63 2 3 strawberry ISLAND
64 3 1 lemon ISLAND
65 3 2 apple ISLAND
66 3 3 strawberry ISLAND
67 4 1 plum ISLAND
68 4 2 banana ISLAND
69 5 1 peach ISLAND
70 5 2 plum ISLAND
71 5 3 banana ISLAND
72 5 4 lemon ISLAND
73 5 5 apple ISLAND
74 6 1 strawberry ISLAND
75 6 2 plum ISLAND
76 6 3 banana ISLAND
77 6 4 apple ISLAND
78 6 5 pear ISLAND
79 7 1 apple ISLAND
80 7 2 plum ISLAND
81 7 3 pear ISLAND
82 8 1 plum ISLAND
83 8 2 peach ISLAND
84 8 3 banana ISLAND
85 8 4 peach ISLAND
86 9 1 pear ISLAND
87 9 2 banana ISLAND
88 9 3 strawberry ISLAND
89 10 1 peach ISLAND
90 10 2 apple ISLAND
91 10 3 apple ISLAND
92 10 4 orange ISLAND
93 11 1 banana ISLAND
94 11 2 apple ISLAND
95 11 3 apple ISLAND
96 11 4 apple ISLAND
97 12 1 plum ISLAND
98 12 2 apple ISLAND
99 12 3 peach ISLAND
100 13 1 pear ISLAND
101 13 2 apple ISLAND
102 13 3 banana ISLAND
103 13 4 plum ISLAND
104 1 1 peach MOON
105 1 2 plum MOON
106 2 1 apple MOON
107 2 2 strawberry MOON
108 2 3 plum MOON
109 2 4 pear MOON
110 3 1 banana MOON
111 3 2 plum MOON
112 3 3 lemon MOON
113 3 4 pear MOON
114 4 1 apple MOON
115 4 2 apple MOON
116 4 3 strawberry MOON
117 5 1 apple MOON
118 5 2 apple MOON
119 5 3 banana MOON
120 5 4 banana MOON
121 5 5 orange MOON
122 6 1 lemon MOON
123 6 2 orange MOON
124 6 3 apple MOON
125 6 4 pear MOON
126 6 5 peach MOON
127 7 1 apple MOON
128 7 2 pear MOON
129 8 1 apple MOON
130 8 2 lemon MOON
131 9 1 apple MOON
132 9 2 banana MOON
133 9 3 lemon MOON
134 9 4 apple MOON
135 9 5 lemon MOON
136 10 1 peach MOON
137 10 2 apple MOON
138 10 3 strawberry MOON
139 11 1 peach MOON
140 11 2 apple MOON
141 12 1 apple MOON
142 12 2 apple MOON
143 13 1 apple MOON
144 13 2 apple MOON
145 13 3 apple MOON
146 13 4 apple MOON
147 13 5 peach MOON
148 14 1 lemon MOON
149 14 2 apple MOON
Try calculating the salience for each fruit, adding in an argument to differentiate responses by GROUPING. The argument is conveniently called GROUPING. First calculate the salience of each item by response, then calculate the salience of each item. Refer to the code we ran earlier in this lesson as a template to write your solution.
Click for solution
FL1 <- CalculateSalience(WorldList, Order = "Order", Subj = "Subj", CODE = "CODE",
GROUPING = "GROUPING")
# Note, this function will not run without adding in the GROUPING argument.
FL2 <- SalienceByCode(FL1, GROUPING = "GROUPING", dealWithDoubles = "MAX")
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
GROUPING CODE MeanSalience SumSalience SmithsS
1 MAINLAND plum 0.8333333 5.000000 0.29411765
2 ISLAND plum 0.7729167 6.183333 0.47564103
3 MOON plum 0.5833333 1.750000 0.12500000
4 MAINLAND strawberry 0.6875000 2.750000 0.16176471
5 ISLAND strawberry 0.5000000 2.000000 0.15384615
6 MOON strawberry 0.4722222 1.416667 0.10119048
7 MAINLAND pear 0.5833333 2.916667 0.17156863
8 ISLAND pear 0.5733333 2.866667 0.22051282
9 MOON pear 0.3500000 1.400000 0.10000000
10 MAINLAND peach 0.6333333 3.166667 0.18627451
11 ISLAND peach 0.7708333 3.083333 0.23717949
12 MOON peach 0.6800000 3.400000 0.24285714
13 MAINLAND apple 0.7320513 9.516667 0.55980392
14 ISLAND apple 0.6500000 5.850000 0.45000000
15 MOON apple 0.8555556 10.266667 0.73333333
16 MAINLAND orange 0.6861111 4.116667 0.24215686
17 ISLAND orange 0.2500000 0.250000 0.01923077
18 MOON orange 0.5000000 1.000000 0.07142857
19 MAINLAND lemon 0.7000000 1.400000 0.08235294
20 ISLAND lemon 0.8000000 2.400000 0.18461538
21 MOON lemon 0.7200000 3.600000 0.25714286
22 MAINLAND banana 0.8888889 2.666667 0.15686275
23 ISLAND banana 0.6708333 5.366667 0.41282051
24 MOON banana 0.8000000 2.400000 0.17142857
In addition to evaluating salience, we might be interested in the percentage of respondents from each group who named a particular item. The code below creates a new object where the data are grouped according to the GROUPING variable.
frequencybygroup <- WorldList %>% group_by(GROUPING) %>% mutate(GroupN = length(unique(Subj))) %>%
ungroup %>% group_by(GROUPING, CODE, GroupN) %>% summarise(totalResponses = n(),
nRespondents = length(unique(Subj)), percentRespondents = round(length(unique(Subj))/first(GroupN) *
100, 2)) %>% arrange(GROUPING, desc(percentRespondents))
We can then plot the results of this grouping with a facet wrap graph.
ggplot(frequencybygroup, aes(x = reorder(CODE, desc(percentRespondents)), y = percentRespondents)) +
geom_bar(stat = "identity") + coord_flip() + ggtitle("Frequency of fruit mentions by site") +
labs(x = "Fruit", y = "% of respondents") + facet_wrap(vars(GROUPING))
It looks like in all cases, apples were mentioned most frequently, but the different sample populations differ in their rates of mentioning oranges and bananas. One thing you will notice in this graph is that although the fruits are listed in descending order for the “ISLAND” sample, the order is not meaningful for the other sites. This is because in the faceted plot, the y axis is the same for each site, making it easier to compare across samples.
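The percentage-of-respondents calculation used above can also be sketched in base R, which may help if the chained dplyr version is hard to follow. The example below uses a small made-up data frame (toy) with the same column names as WorldList. The other object names are hypothetical.

```r
# Made-up data with the same columns as WorldList
toy <- data.frame(
  GROUPING = c("ISLAND", "ISLAND", "ISLAND", "ISLAND", "MOON", "MOON", "MOON"),
  Subj = c(1, 1, 2, 2, 1, 1, 2),
  CODE = c("apple", "apple", "apple", "plum", "plum", "pear", "plum")
)

# Keep one row per group/respondent/item so repeated mentions count once
pairs <- unique(toy[, c("GROUPING", "Subj", "CODE")])

# Number of distinct respondents in each group
group_n <- sapply(split(toy$Subj, toy$GROUPING), function(s) length(unique(s)))

# Count respondents per group and item, then drop combinations never mentioned
mentions <- as.data.frame(table(GROUPING = pairs$GROUPING, CODE = pairs$CODE),
                          responseName = "nRespondents", stringsAsFactors = FALSE)
mentions <- mentions[mentions$nRespondents > 0, ]

# Percentage of the group's respondents who named the item
mentions$percentRespondents <-
  round(100 * mentions$nRespondents / as.numeric(group_n[mentions$GROUPING]), 2)
mentions
```

Note that Subj values repeat across groups here, as they do in WorldList, which is why respondents are counted within each GROUPING rather than across the whole table.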
We can also compare the lengths of lists across each group. This shows us that the average list lengths were not very different across the groups, but were slightly higher in the Island population.
WorldList %>% group_by(GROUPING, Subj) %>% summarise(n = n()) %>% ungroup %>% group_by(GROUPING) %>%
summarize(nResponse = n(), avgLength = mean(n), maxLength = max(n), minLength = min(n),
medianLength = median(n))
# A tibble: 3 × 6
GROUPING nResponse avgLength maxLength minLength medianLength
<chr> <int> <dbl> <int> <int> <dbl>
1 ISLAND 13 3.54 5 2 3
2 MAINLAND 17 3.35 5 2 3
3 MOON 14 3.29 5 2 3
Beyond investigating how many fruits are listed and how frequently within the fruit domain, we are also interested in learning more about the structure of this domain. We can think of this as a type of mental map: how do people think about each item within the fruit domain relative to the other items?
One way to investigate this question is through examining co-occurrence of items within the same freelist. Essentially, how many times is each pair of fruits mentioned by the same respondent? We can create a co-occurrence matrix with tidyverse tools or the AnthroTools package.
Using FreeListTable(), make a table indicating whether or not each respondent mentioned a particular fruit. Hint: This is a presence/absence table.
Click for solution
FruitListTable <- FreeListTable(FruitList, CODE = "CODE", Salience = "Salience",
Subj = "Subj", tableType = "PRESENCE")
You can do the same operation with tidyverse.
# add new count column
FruitList$present <- rep(1)
# Spread into wide datatable. Note: spread function requires unique identifiers
# for rows, so here we remove any duplicate rows
FruitListWide <- FruitList %>% select(Subj, CODE, present) %>% unique %>% spread(CODE,
present)
# convert NAs to 0
FruitListWide[is.na(FruitListWide)] <- 0
FruitListWide
Subj apple banana lemon orange peach pear plum strawberry
1 1 1 0 0 1 0 1 0 0
2 2 1 0 0 0 0 0 0 1
3 3 0 1 0 0 0 0 0 1
4 4 1 1 0 1 0 0 1 0
5 5 1 0 0 0 0 0 0 1
6 6 1 0 1 0 0 0 0 0
7 7 0 1 1 0 0 0 1 0
8 8 1 0 0 0 0 1 0 1
9 9 1 0 0 0 0 0 1 0
10 10 1 0 0 0 0 1 0 0
11 11 1 1 1 0 0 0 0 1
12 12 1 0 0 1 0 1 0 0
13 13 0 1 0 0 1 0 1 1
14 14 1 1 0 0 0 1 0 1
15 15 1 0 0 0 0 0 1 1
16 16 0 1 0 0 0 1 1 0
17 17 1 0 0 1 0 0 0 0
18 18 1 1 0 1 1 1 0 0
19 19 0 0 1 0 0 1 0 1
20 20 1 0 0 1 0 0 1 0
In this case, we have made a presence/absence matrix: each cell records whether or not a respondent mentioned a fruit. You could also create a matrix that is weighted based on the number of co-occurrences. A co-occurrence matrix built from this table can then be used for clustering, MDS, network analysis and other analyses.
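For comparison, the same kind of respondent-by-item table can be sketched in base R with table(), which also makes the difference between a weighted table and a presence/absence table easy to see. The data frame toy below is made up for illustration, and counts and presence are hypothetical names.

```r
# Made-up freelist: respondent 1 mentions apple twice
toy <- data.frame(
  Subj = c(1, 1, 1, 2, 2, 3),
  CODE = c("apple", "pear", "apple", "apple", "plum", "pear")
)

# Weighted version: how many times each respondent mentioned each item
counts <- unclass(table(toy$Subj, toy$CODE))

# Presence/absence version: 1 if mentioned at least once, otherwise 0
presence <- (counts > 0) * 1
presence
```

Here counts keeps the value 2 for respondent 1 and apple, while presence collapses it to 1.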
With the matrix we made above, we can analyze how similar or dissimilar each fruit is to the others using multidimensional scaling (MDS). The data manipulations can be a little tricky, but I’ve included the code below to get you started.
# Here we use the FruitListWide dataframe we made above.
## Convert the dataframe into a presence/absence-only matrix by removing the first column.
## First we assign the Subj column to the rownames
rownames(FruitListWide) <- FruitListWide$Subj
# Remove the Subj column
FruitListWide<-FruitListWide[,-1]
# Now convert to a matrix and transpose it so that rows are fruits; the
# fruit-by-fruit co-occurrence matrix is computed with crossprod() further down
FruitListWide <- as.matrix(FruitListWide)
FruitsBySubj <- t(FruitListWide)
# Look at data structure
FruitsBySubj
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
apple 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 0 1
banana 0 0 1 1 0 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0
lemon 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0
orange 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1
peach 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
pear 1 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0
plum 0 0 0 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1
strawberry 0 1 1 0 1 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0
FruitsByFruits <- crossprod(FruitListWide)
## Run classical MDS on the (Euclidean) distances between fruits
FruitDistanceMatrix <- cmdscale(dist(FruitsBySubj))
FruitDistanceMatrixDF<- data.frame(FruitDistanceMatrix)
## Using binary mds method
FruitsBySubj %>% dist(method="binary") %>% cmdscale(eig=T, k=2) -> test2
test3 <- data.frame(test2$points) %>% #mds coordinates
bind_cols(Fruits = rownames(FruitsBySubj)) %>% #bind sample names
bind_cols(count=rowSums(FruitsBySubj)) #bind count by each fruit
# Look at results
test3
X1 X2 Fruits count
apple -0.20005674 0.2972494 apple 15
banana 0.16504543 -0.1902441 banana 8
lemon 0.44547450 0.1667202 lemon 4
orange -0.48782894 0.0130155 orange 6
peach -0.06579848 -0.4850107 peach 2
pear -0.20494625 0.2566384 pear 8
plum 0.05280546 -0.2509947 plum 7
strawberry 0.29530502 0.1926259 strawberry 9
## Plot results
ggplot(test3,aes(x = X1,y = X2, label = Fruits)) +geom_text() + geom_point(aes(size=count),alpha=0.4,color="blue")+labs(size="Count",x="Dimension 1", y="Dimension 2") + ggtitle("MDS of FruitList")
This MDS shows the fruits by responses, showing us how the fruits are mentioned by different types of respondents. We might also be interested in an MDS of the fruit-by-fruit co-occurrence, which can be calculated through the code below.
## Using binary mds method
FruitsByFruits %>% dist(method="binary") %>% cmdscale(eig=T, k=2) -> test4
test5 <- data.frame(test4$points) %>% #mds coordinates
bind_cols(Fruits = rownames(FruitsByFruits))#bind sample names
# Look at results
#test5
## Plot results
ggplot(test5,aes(x = X1,y = X2, label = Fruits)) +geom_text(fontface="bold",size=2.5,alpha=0.6,position=position_jitter(width=0.05, height=0.005)) + geom_point(size=NA)+labs(x="Dimension 1", y="Dimension 2") + ggtitle("MDS of FruitbyFruit")
This plot shows us which fruits are more central to the domain “fruits” across all the respondents and which are more peripheral. The fruits in the center of the plot co-occur more frequently since they are in more of the freelists in the sample. Borgatti has a useful presentation explaining more.
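The main steps in this section (presence/absence matrix, co-occurrence counts, binary distances, MDS coordinates) can be sketched on a matrix small enough to check by hand. The toy matrix and object names below are made up for illustration. The functions crossprod(), dist() and cmdscale() are the same base R functions used above.

```r
# Made-up presence/absence matrix: 4 respondents (rows) by 3 fruits (columns)
presence <- matrix(c(1, 1, 0,
                     1, 0, 1,
                     1, 1, 0,
                     0, 1, 1),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("resp", 1:4), c("apple", "banana", "lemon")))

# Fruit-by-fruit co-occurrence: each cell counts the respondents who mentioned
# both fruits; the diagonal is the number who mentioned each fruit
cooc <- crossprod(presence)

# Binary distance between two fruits: among respondents who mentioned at least
# one of the pair, the share who mentioned only one of them
d <- dist(t(presence), method = "binary")

# Classical MDS places the fruits in 2 dimensions so that the distances
# between points approximate d
coords <- cmdscale(d, k = 2)
coords
```

Apple and banana are each mentioned by three respondents and share two of them, so they end up closer to each other than either is to lemon.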
In addition to freelist data, pile sorts are a common tool used in cultural domain analysis. I’ve made a sample dataset of pile sorts of 40 trees from 9 individuals. You can load the data in the same way we loaded the FruitList data. We are doing a bit of data wrangling to get the data into the right format for MDS. Hopefully this helps you understand a bit about how data can be shaped and reshaped no matter what form it comes to you in!
treepilesort <- read.csv("https://maddiebrown.github.io/ethnoecology/pilesort_sampledata.csv")
rownames(treepilesort) <- treepilesort$Tree
# Remove tree column
treepilesort <- treepilesort[, -1]
# Flip so rows are individual responses
PersonbyTree <- t(treepilesort)
PersonbyTree <- as.data.frame(PersonbyTree) # make the output into a dataframe
Convert the data to long format. With gather() you might need a third argument, maple:coconut. You might also consider using pivot_longer().
Click for solution
# Store each respondent's ID (the row names) in a new Person column
PersonbyTree$Person <- rownames(PersonbyTree)
# Convert to long format
PersonbyTreeLong <- PersonbyTree %>% gather(Species, Pile, maple:coconut)
# Another way to do this with the newer tidyverse function pivot_longer():
# PersonbyTree %>% pivot_longer(!Person, names_to = 'Species', values_to = 'Pile')
Now that our data are in a long format we can add in a new unique identifier for each person_pile combination. This step is required because the piles have numeric names that repeat across respondents. Next we can make a presence/absence column that will be useful for creating a new matrix based on whether or not a tree occurs in a particular pile.
# Add new column which is person ID and pile ID combined together
PersonbyTreeLong <- PersonbyTreeLong %>% mutate(PersonPile = paste(Person, Pile,
sep = "_"))
PersonbyTreeLong$present <- rep(1)
Now we can convert the data back into a wide format, where each column is a different tree species and the values represent presence or absence of the tree.
# Spread into wide datatable. Note: spread function requires unique identifiers
# for rows, so here we remove any duplicate rows
TreeWide <- PersonbyTreeLong %>% select(PersonPile, Species, present) %>% unique %>%
spread(Species, present)
# Convert NAs to 0
TreeWide[is.na(TreeWide)] <- 0
head(TreeWide)
PersonPile alder apple aspen banana beech birch black walnut cedar chestnut
1 A_1 1 0 1 0 1 1 0 1 0
2 A_2 0 1 0 1 0 0 0 0 0
3 A_3 0 0 0 0 0 0 1 0 1
4 B_1 1 1 1 0 1 1 1 0 1
5 B_2 0 0 0 1 0 0 0 0 0
6 B_3 0 0 0 0 0 0 0 1 0
coconut dawn redwood douglas fir gingko horse chestnut japanese maple
1 0 1 1 1 0 1
2 1 0 0 0 0 0
3 0 0 0 0 1 0
4 0 0 0 1 1 1
5 1 0 0 0 0 0
6 0 1 1 0 0 0
laurel oak live oak lodgepole pine longleaf pine mango maple oak orange palm
1 1 1 1 1 0 1 1 0 1
2 0 0 0 0 1 0 0 1 0
3 0 0 0 0 0 0 0 0 0
4 1 1 0 0 1 1 1 1 0
5 0 0 0 0 0 0 0 0 1
6 0 0 1 1 0 0 0 0 0
pear pecan pine ponderosa pine red oak redwood sequioa silver maple spruce
1 0 0 1 1 1 1 1 1 1
2 1 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0 0
4 1 1 0 0 1 0 0 1 0
5 0 0 0 0 0 0 0 0 0
6 0 0 1 1 0 1 1 0 1
sugar maple sycamore walnut white oak white pine
1 1 1 0 1 1
2 0 0 0 0 0
3 0 0 1 0 0
4 1 1 1 1 0
5 0 0 0 0 0
6 0 0 0 0 1
This dataframe looks great. The multidimensional scaling functions in R take a matrix, however, so we will convert the dataframe into two different matrices following the same procedure we used for the fruit data.
## Convert the dataframe into presence/absence only by removing the first column.
## First we assign the PersonPile column to the rownames
rownames(TreeWide) <- TreeWide$PersonPile
# Remove the PersonPile column
TreeWide <- TreeWide[, -1]
# Now convert to a matrix and transpose it so that rows are trees
TreeWide <- as.matrix(TreeWide)
TreeByPile <- t(TreeWide)
# Look at data structure
TreeByPile
A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3 C_4 D_1 D_2 D_3 D_4 D_5 D_6
alder 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
apple 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0
aspen 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
banana 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0
beech 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
birch 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
black walnut 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
cedar 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
chestnut 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
coconut 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0
dawn redwood 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
douglas fir 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
gingko 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
horse chestnut 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
japanese maple 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0
laurel oak 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
live oak 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
lodgepole pine 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
longleaf pine 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
mango 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0 0
maple 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0
oak 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
orange 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0
palm 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0
pear 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0
pecan 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
pine 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
ponderosa pine 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
red oak 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
redwood 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
sequioa 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
silver maple 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0
spruce 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
sugar maple 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0
sycamore 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0
walnut 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
white oak 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0
white pine 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
D_7 D_8 E_1 E_10 E_2 E_3 E_4 E_5 E_6 E_7 E_8 E_9 F_1 F_10 F_11
alder 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
apple 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
aspen 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
banana 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
beech 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
birch 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
black walnut 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
cedar 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
chestnut 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
coconut 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
dawn redwood 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
douglas fir 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
gingko 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
horse chestnut 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
japanese maple 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
laurel oak 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
live oak 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
lodgepole pine 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
longleaf pine 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
mango 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
maple 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
oak 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
orange 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
palm 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
pear 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
pecan 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
pine 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
ponderosa pine 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
red oak 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
redwood 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
sequioa 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
silver maple 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
spruce 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
sugar maple 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
sycamore 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
walnut 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
white oak 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
white pine 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
F_12 F_13 F_14 F_15 F_16 F_17 F_18 F_19 F_2 F_20 F_3 F_4 F_5 F_6
alder 0 0 0 0 0 0 1 0 0 0 0 0 0 0
apple 0 0 0 0 0 0 0 0 0 0 1 0 0 0
aspen 0 0 0 0 0 0 0 1 0 0 0 0 0 0
banana 0 0 0 0 0 0 0 0 0 0 0 0 0 0
beech 0 0 0 0 0 0 0 0 0 1 0 0 0 0
birch 0 0 0 0 0 0 0 0 0 0 0 0 0 1
black walnut 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cedar 0 1 0 0 0 0 0 0 0 0 0 0 0 0
chestnut 0 0 0 0 0 0 0 0 0 0 0 0 0 0
coconut 0 0 0 0 0 0 0 0 0 0 0 0 1 0
dawn redwood 0 0 0 0 0 0 0 0 0 0 0 0 0 0
douglas fir 0 0 1 0 0 0 0 0 0 0 0 0 0 0
gingko 0 0 0 1 0 0 0 0 0 0 0 0 0 0
horse chestnut 0 0 0 0 0 0 0 0 0 0 0 0 0 0
japanese maple 0 0 0 0 0 0 0 0 0 0 0 0 0 0
laurel oak 0 0 0 0 0 0 0 0 1 0 0 0 0 0
live oak 0 0 0 0 0 0 0 0 1 0 0 0 0 0
lodgepole pine 1 0 0 0 0 0 0 0 0 0 0 0 0 0
longleaf pine 1 0 0 0 0 0 0 0 0 0 0 0 0 0
mango 0 0 0 0 0 0 0 0 0 0 0 0 0 0
maple 0 0 0 0 0 0 0 0 0 0 0 0 0 0
oak 0 0 0 0 0 0 0 0 1 0 0 0 0 0
orange 0 0 0 0 0 0 0 0 0 0 0 1 0 0
palm 0 0 0 0 0 0 0 0 0 0 0 0 1 0
pear 0 0 0 0 0 0 0 0 0 0 1 0 0 0
pecan 0 0 0 0 0 1 0 0 0 0 0 0 0 0
pine 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ponderosa pine 1 0 0 0 0 0 0 0 0 0 0 0 0 0
red oak 0 0 0 0 0 0 0 0 1 0 0 0 0 0
redwood 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sequioa 0 0 0 0 0 0 0 0 0 0 0 0 0 0
silver maple 0 0 0 0 0 0 0 0 0 0 0 0 0 0
spruce 0 1 0 0 0 0 0 0 0 0 0 0 0 0
sugar maple 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sycamore 0 0 0 0 1 0 0 0 0 0 0 0 0 0
walnut 0 0 0 0 0 0 0 0 0 0 0 0 0 0
white oak 0 0 0 0 0 0 0 0 1 0 0 0 0 0
white pine 1 0 0 0 0 0 0 0 0 0 0 0 0 0
F_7 F_8 F_9 G_1 G_2 H_1 H_10 H_2 H_3 H_4 H_5 H_6 H_7 H_8 H_9 I_1
alder 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
apple 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
aspen 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
banana 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
beech 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
birch 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
black walnut 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
cedar 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
chestnut 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0
coconut 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0
dawn redwood 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
douglas fir 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
gingko 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
horse chestnut 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0
japanese maple 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1
laurel oak 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
live oak 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
lodgepole pine 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
longleaf pine 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
mango 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0
maple 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1
oak 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
orange 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
palm 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0
pear 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
pecan 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
pine 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
ponderosa pine 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
red oak 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
redwood 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
sequioa 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
silver maple 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1
spruce 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
sugar maple 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1
sycamore 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
walnut 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
white oak 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
white pine 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
I_10 I_11 I_12 I_13 I_2 I_3 I_4 I_5 I_6 I_7 I_8 I_9
alder 0 0 0 0 0 0 0 0 1 0 0 0
apple 0 0 0 0 0 1 0 0 0 0 0 0
aspen 0 0 0 0 0 0 0 1 0 0 0 0
banana 0 0 0 0 0 1 0 0 0 0 0 0
beech 0 0 0 0 0 0 0 1 0 0 0 0
birch 0 0 0 0 0 0 0 1 0 0 0 0
black walnut 0 0 0 0 0 0 0 0 0 1 0 0
cedar 1 0 0 0 0 0 0 0 0 0 0 0
chestnut 0 0 0 0 0 0 0 0 0 1 0 0
coconut 0 0 0 0 0 1 0 0 0 0 0 0
dawn redwood 0 0 0 0 0 0 0 0 0 0 1 0
douglas fir 0 0 1 0 0 0 0 0 0 0 0 0
gingko 0 0 0 1 0 0 0 0 0 0 0 0
horse chestnut 0 0 0 0 0 0 0 0 1 0 0 0
japanese maple 0 0 0 0 0 0 0 0 0 0 0 0
laurel oak 0 0 0 0 1 0 0 0 0 0 0 0
live oak 0 0 0 0 1 0 0 0 0 0 0 0
lodgepole pine 1 0 0 0 0 0 0 0 0 0 0 0
longleaf pine 1 0 0 0 0 0 0 0 0 0 0 0
mango 0 0 0 0 0 1 0 0 0 0 0 0
maple 0 0 0 0 0 0 0 0 0 0 0 0
oak 0 0 0 0 1 0 0 0 0 0 0 0
orange 0 0 0 0 0 1 0 0 0 0 0 0
palm 0 0 0 0 0 0 1 0 0 0 0 0
pear 0 0 0 0 0 1 0 0 0 0 0 0
pecan 0 0 0 0 0 0 0 0 0 1 0 0
pine 0 0 0 0 0 0 0 0 0 0 0 1
ponderosa pine 1 0 0 0 0 0 0 0 0 0 0 0
red oak 0 0 0 0 1 0 0 0 0 0 0 0
redwood 0 0 0 0 0 0 0 0 0 0 1 0
sequioa 0 0 0 0 0 0 0 0 0 0 1 0
silver maple 0 0 0 0 0 0 0 0 0 0 0 0
spruce 0 1 0 0 0 0 0 0 0 0 0 0
sugar maple 0 0 0 0 0 0 0 0 0 0 0 0
sycamore 0 0 0 0 0 0 0 0 1 0 0 0
walnut 0 0 0 0 0 0 0 0 0 1 0 0
white oak 0 0 0 0 1 0 0 0 0 0 0 0
white pine 1 0 0 0 0 0 0 0 0 0 0 0
Now that our data are in the proper format, we can examine them with MDS.
# First, look at trees by pile
## Classical MDS on the default (Euclidean) distance matrix
TreeByPileMatrix <- cmdscale(dist(TreeByPile))
TreeByPileMatrixDF <- data.frame(TreeByPileMatrix)
## MDS using the binary distance method
test2 <- TreeByPile %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2)
test3 <- data.frame(test2$points) %>% # MDS coordinates
  bind_cols(Trees = rownames(TreeByPile)) %>% # bind tree names
  bind_cols(count = rowSums(TreeByPile)) # bind number of piles containing each tree
# Look at results
test3
X1 X2 Trees count
alder -0.013320666 0.30415761 alder 9
apple -0.310765217 -0.32878474 apple 9
aspen -0.015764792 0.31227594 aspen 9
banana -0.249543392 -0.44506073 banana 9
beech -0.015764792 0.31227594 beech 9
birch -0.182995445 0.26334014 birch 9
black walnut -0.220274963 -0.22711366 black walnut 9
cedar 0.470627266 -0.07922739 cedar 9
chestnut -0.219721484 -0.22750602 chestnut 9
coconut -0.252133376 -0.45172301 coconut 9
dawn redwood 0.388356463 -0.06777733 dawn redwood 9
douglas fir 0.427206297 -0.07215080 douglas fir 9
gingko -0.178681232 0.14655505 gingko 9
horse chestnut -0.038272499 -0.08504713 horse chestnut 9
japanese maple -0.038470051 0.31837162 japanese maple 9
laurel oak -0.259555150 0.28057415 laurel oak 9
live oak -0.259555150 0.28057415 live oak 9
lodgepole pine 0.506667722 -0.05071981 lodgepole pine 9
longleaf pine 0.506667722 -0.05071981 longleaf pine 9
mango -0.314808362 -0.37597848 mango 9
maple -0.201419337 0.26797441 maple 9
oak -0.259555150 0.28057415 oak 9
orange -0.308994465 -0.32601178 orange 9
palm -0.154384024 -0.26830110 palm 9
pear -0.310765217 -0.32878474 pear 9
pecan -0.219019827 -0.22519819 pecan 9
pine 0.489378588 -0.04728660 pine 9
ponderosa pine 0.506667722 -0.05071981 ponderosa pine 9
red oak -0.259555150 0.28057415 red oak 9
redwood 0.388356463 -0.06777733 redwood 9
sequioa 0.388356463 -0.06777733 sequioa 9
silver maple -0.038470051 0.31837162 silver maple 9
spruce 0.433471931 -0.07372209 spruce 9
sugar maple -0.201419337 0.26797441 sugar maple 9
sycamore -0.009385121 0.28105386 sycamore 9
walnut -0.220274963 -0.22711366 walnut 9
white oak -0.259555150 0.28057415 white oak 9
white pine 0.506667722 -0.05071981 white pine 9
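The `method = "binary"` option of `dist()` treats each row as a presence/absence vector. The distance between two trees is the number of piles that contain exactly one of them, divided by the number of piles that contain at least one of them. The sketch below checks this by hand on a made-up matrix; `toy` and the pile names `P1` to `P4` are hypothetical and not part of the lesson data.

```r
# Toy tree-by-pile incidence matrix (hypothetical data, not from the lesson)
toy <- rbind(
  oak    = c(P1 = 1, P2 = 0, P3 = 1, P4 = 0),
  maple  = c(P1 = 1, P2 = 0, P3 = 0, P4 = 1),
  banana = c(P1 = 0, P2 = 1, P3 = 0, P4 = 0)
)

# Binary distance between every pair of rows
toy_dist <- dist(toy, method = "binary")
toy_dist

# The same value computed by hand for oak vs. maple
only_one     <- sum(xor(toy["oak", ] == 1, toy["maple", ] == 1))
at_least_one <- sum(toy["oak", ] == 1 | toy["maple", ] == 1)
by_hand      <- only_one / at_least_one
by_hand
```

Trees that always appear in the same piles have a binary distance of 0 and therefore receive identical MDS coordinates, as several of the oak entries do in the output above. This is one reason the plots in this lesson jitter the labels.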
## Plot results
ggplot(test3, aes(x = X1, y = X2, label = Trees)) +
  geom_text(size = 3, alpha = 0.5, position = position_jitter(width = 0.05, height = 0.05)) +
  geom_point(size = NA) +
  labs(x = "Dimension 1", y = "Dimension 2") +
  ggtitle("MDS of Trees by Pile")
This plot shows an MDS of the trees based on the piles they were placed in. It suggests that respondents usually made 4-5 different types of piles. This can help with interpretation, but often we are most interested in how each tree species relates to the other tree species. For this, we work with a TreebyTree matrix.
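A tree-by-tree matrix records, for every pair of trees, how many piles contain both. The sketch below shows how `crossprod()` builds such a matrix from a pile-by-tree table of 0s and 1s. The toy data are made up, and the sketch assumes that trees are the columns of the input, as the name of the result in the lesson suggests.

```r
# Toy pile-by-tree incidence matrix (hypothetical data):
# rows are piles, columns are trees, 1 = the tree is in that pile
piles <- rbind(
  pile1 = c(oak = 1, maple = 1, banana = 0),
  pile2 = c(oak = 0, maple = 0, banana = 1),
  pile3 = c(oak = 1, maple = 1, banana = 0),
  pile4 = c(oak = 1, maple = 0, banana = 0)
)

# crossprod(x) is t(x) %*% x: each off-diagonal cell counts the piles
# in which the row tree and the column tree appear together
cooc <- crossprod(piles)
cooc
```

Here oak and maple share two piles, while banana shares none with either. The diagonal simply counts how many piles contain each tree.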
# Make a tree-by-tree co-occurrence matrix
TreebyTree <- crossprod(TreeWide)
head(TreebyTree)
## MDS using the binary distance method
test4 <- TreebyTree %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2)
test5 <- data.frame(test4$points) %>% # MDS coordinates
  bind_cols(Trees = rownames(TreebyTree)) # bind tree names
# Look at results
# test5
Then we can plot the results.
## Plot results
ggplot(test5, aes(x = X1, y = X2, label = Trees)) + geom_text(fontface = "bold",
size = 2, alpha = 0.4, position = position_jitter(width = 0.05, height = 0.02)) +
geom_point(size = NA) + labs(x = "Dimension 1", y = "Dimension 2") + ggtitle("MDS of Trees")
This plot can be a bit difficult to read at times depending on how the labels are plotted. You can rerun the code until a more legible version of the graph is generated. You can also adjust the label transparency, size, and jitter to improve graph legibility. So far, it looks like the people in this sample consider coconuts and bananas to be very different from all the other trees, while pines exhibit their own cluster but are still more similar to other trees than to the coconut-banana cluster.
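Because `position_jitter()` draws new random offsets on every run, rerunning the plot code changes the label layout. One way to keep a legible layout once you find one is the `seed` argument of `position_jitter()`. The following is a minimal sketch with made-up coordinates; `toy_mds` and the seed value 42 are hypothetical choices, not part of the lesson.

```r
library(ggplot2)

# Hypothetical MDS-style coordinates with two overlapping labels
toy_mds <- data.frame(
  X1 = c(0.5, 0.5, -0.3),
  X2 = c(-0.05, -0.05, 0.3),
  Trees = c("white pine", "longleaf pine", "oak")
)

# A fixed seed makes the random jitter identical on every run
p <- ggplot(toy_mds, aes(x = X1, y = X2, label = Trees)) +
  geom_text(size = 3, alpha = 0.5,
            position = position_jitter(width = 0.05, height = 0.05, seed = 42)) +
  labs(x = "Dimension 1", y = "Dimension 2")
p
```

Trying a few different seed values and keeping the one that reads best gives the same effect as rerunning the code, but the chosen layout can be reproduced later.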
In addition to multidimensional scaling, we can also examine the pilesort data through hierarchical clustering. This method allows us to see how clusters nest within one another at different levels of similarity. The code below creates a cluster dendrogram of the tree data.
# Then we can run a cluster analysis on the pile sort data.
# Code adapted from: https://www.r-bloggers.com/clustering-music-genres-with-r/
# Check it out for examples of how to take this analysis further.
# First turn the data into a matrix and remove the diagonal
TreebyTreeMat <- as.matrix(TreebyTree)
diag(TreebyTreeMat) <- 0
# Make a distance matrix of the co-occurrence data
TreeDistMat <- dist(TreebyTreeMat)
# Perform hierarchical clustering on the data
TreeHC <- hclust(TreeDistMat, method = "ward.D")
plot(TreeHC, xlab = "Tree species", sub = "Ward's method")
Depending on the height at which the graph is cut, different numbers of tree clusters are created. This visualization can help us better understand how similar or dissimilar groups of trees are to one another, depending on the level of similarity used as a benchmark. For example, if we cut the dendrogram at a height of 100, there will be two clusters, which roughly correspond to conifers and non-conifers. However, if we cut the dendrogram at a height of 20, we will have 5 clusters, which could be named something like conifers, oaks, non-oak non-fruit/nut bearing, fruit trees, and nut trees. The level at which you choose to define clusters depends on both the results of your analysis and the questions of your study.
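Cutting a dendrogram can also be done in code with `cutree()`, either by asking for a number of clusters (`k`) or by giving a height (`h`). The sketch below uses a small made-up dissimilarity matrix rather than the lesson data, so the heights used here (10 and 3) are illustrative only and do not correspond to the heights of 100 and 20 discussed above.

```r
# Toy dissimilarities between five trees (hypothetical values)
trees <- c("pine", "spruce", "oak", "maple", "apple")
toy_d <- matrix(10, nrow = 5, ncol = 5, dimnames = list(trees, trees))
diag(toy_d) <- 0
toy_d["pine", "spruce"] <- toy_d["spruce", "pine"] <- 1
toy_d["oak", "maple"]   <- toy_d["maple", "oak"]   <- 2
toy_d["oak", "apple"]   <- toy_d["apple", "oak"]   <- 4
toy_d["maple", "apple"] <- toy_d["apple", "maple"] <- 5

# Ward clustering, using the same method string as the lesson
toy_hc <- hclust(as.dist(toy_d), method = "ward.D")

# Ask for a fixed number of clusters ...
groups_k <- cutree(toy_hc, k = 2)
groups_k

# ... or cut at a chosen height on the dendrogram's y-axis
groups_high <- cutree(toy_hc, h = 10)
groups_low  <- cutree(toy_hc, h = 3)
table(groups_low)

# Draw the dendrogram and outline the two-cluster solution
plot(toy_hc, xlab = "Tree species", sub = "Ward's method")
rect.hclust(toy_hc, k = 2, border = "red")
```

In this toy example the high cut separates the conifers from the other trees, while the lower cut also splits apple from oak and maple. Applying `cutree()` to `TreeHC` in the same way would return a cluster label for each tree in the lesson data.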