Getting started

This lesson shows you how to work with and analyze cultural domain data in R. It draws in part on datasets and formulas from the AnthroTools package. You can get the data for this lesson here and here.

Load packages and data

# Load packages
# install.packages('devtools') # devtools lets us install packages from GitHub
# library('devtools')
# install_github('alastair-JL/AnthroTools')
library(AnthroTools)
library(tidyverse)
library(ggplot2)
library(tidyr)

Next, let’s load the data we plan to use and assign it a name. Here we will load the FruitList data from the AnthroTools package, which has been made available to you as a .csv file. Take a look at the head() and str() of this spreadsheet. How do you think the data have been organized? What do the rows and columns represent?

FruitList <- read.csv("https://maddiebrown.github.io/ethnoecology/FruitList.csv")
head(FruitList)
  X Subj Order       CODE
1 1    1     1       pear
2 2    1     2     orange
3 3    1     3      apple
4 4    2     1      apple
5 5    2     2      apple
6 6    2     3 strawberry
str(FruitList)
'data.frame':   75 obs. of  4 variables:
 $ X    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Subj : int  1 1 1 2 2 2 3 3 4 4 ...
 $ Order: int  1 2 3 1 2 3 1 2 1 2 ...
 $ CODE : Factor w/ 8 levels "apple","banana",..: 6 4 1 1 1 8 2 8 1 7 ...

We can see that we have 75 rows with 4 variables. The ‘X’ variable is a numeric index auto-created by Excel. The ‘Subj’ variable is a numeric identifier for each interview respondent. The ‘Order’ variable represents the order in which each fruit was named by each respondent. The ‘CODE’ variable contains each fruit named in the freelists.

Try it

How many respondents and unique fruits (‘items’) are in the FruitList data?

Click for solution

# How many respondents are there?
length(unique(FruitList$Subj))
# how many unique fruits are named and what are they?
length(unique(FruitList$CODE))
unique(FruitList$CODE)

From these few operations we now know a bit more about the dataset. There are 20 individuals who collectively named a total of 8 unique fruits.

Next, let’s calculate the average number of fruits named per person and explore the distribution of list lengths across responses.

# How many responses per subject
FruitList %>% group_by(Subj) %>% count()
# A tibble: 20 × 2
# Groups:   Subj [20]
    Subj     n
   <int> <int>
 1     1     3
 2     2     3
 3     3     2
 4     4     5
 5     5     3
 6     6     3
 7     7     4
 8     8     4
 9     9     3
10    10     3
11    11     5
12    12     5
13    13     4
14    14     5
15    15     3
16    16     5
17    17     2
18    18     5
19    19     3
20    20     5
# What is the average number of responses per subject?
FruitList %>% group_by(Subj) %>% count() %>% ungroup() %>% summarise(mean = mean(n))
# A tibble: 1 × 1
   mean
  <dbl>
1  3.75
# We can also do this in two parts by creating a new dataframe of the summarized
# data and then calculating the mean value in the 'n' column
BySubject <- FruitList %>% group_by(Subj) %>% count()
mean(BySubject$n)
# This method also allows you to generate additional summary information
summary(BySubject$n)
# We can also make a table showing how many respondents had lists of length 2, 3,
# 4, and 5
table(BySubject$n)

Item frequency analysis

Next, let’s analyze how frequently each fruit was mentioned. For this analysis we are going to create a new dataframe object called ‘ByFruit’ that groups the data according to each unique fruit and counts the number of rows.

ByFruit <- FruitList %>% group_by(CODE) %>% count() %>% arrange(desc(n))
ByFruit
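Note that count() tallies mentions (rows). A fruit that one respondent lists twice is therefore counted twice, as with respondent 2's two apples. If you instead want the number of respondents who named each fruit, remove repeated respondent/fruit pairs first. Below is a minimal base-R sketch of the difference. It uses a hypothetical toy data frame (`toy`) copied from the first six rows shown by head() above.

```r
# Toy data: the first six rows of FruitList shown by head() above
toy <- data.frame(
  Subj = c(1, 1, 1, 2, 2, 2),
  CODE = c("pear", "orange", "apple", "apple", "apple", "strawberry")
)

# Mentions: every row counts, so a repeated item is counted more than once
mentions <- table(toy$CODE)

# Respondents: drop repeated Subj/CODE pairs first, so each person counts once per item
respondents <- table(unique(toy[, c("Subj", "CODE")])$CODE)

mentions
respondents
```

In the full dataset, apple's 25 mentions come from 15 respondents. That is the count reported for apple in the MDS output later in this lesson.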

Try it

Plot the number of times each type of fruit is mentioned by interviewees. Make a bar plot with flipped coordinates so that each fruit name is on the y axis and the number of mentions is on the x axis.

Click for solution

ggplot(ByFruit, aes(x = reorder(CODE, n), y = n)) + geom_bar(stat = "identity") + 
    coord_flip() + ggtitle("Frequency of fruit mentions") + labs(x = "Fruit", y = "n")

Scree Plots

Another way to visualize these data is a scree plot. In a scree plot each point represents a single item, its y-value is that item's frequency of mention, and items are sorted from most to least frequent. These plots can be used to quickly identify trends and possible cut-off points in the data, such as a sharp drop in frequency after the first few items.

ggplot(ByFruit, aes(x = reorder(CODE, desc(n)), y = n, group = 1)) + geom_point() + 
    geom_line()
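One rough way to pick a cut-off from a scree plot is to look for the largest drop between consecutive items. The sketch below applies that heuristic to the fruit frequencies, typed in from the Frequency column of the ByFruit table in the next section. The heuristic is an illustrative choice, not an AnthroTools function, and the names `freq`, `drops` and `cutoff` are hypothetical.

```r
# Item frequencies (mentions) from the ByFruit table in this lesson
freq <- c(apple = 25, pear = 10, plum = 9, strawberry = 9,
          banana = 8, orange = 7, lemon = 5, peach = 2)

# Sort from most to least frequent, as on the scree plot
freq <- sort(freq, decreasing = TRUE)

# Size of the drop from each item to the next one
drops <- -diff(freq)

# Heuristic cut-off: keep the items that come before the largest drop
cutoff <- which.max(drops)
names(freq)[seq_len(cutoff)]
```

Here the largest drop is from apple (25) to pear (10), so this rule would single out apple alone. With real data, inspect the whole curve rather than relying on one rule.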

Frequency vs. rank

In freelists, we often expect the more highly ranked items to also show up more frequently. The code below plots the frequency of fruit mention against its average rank to explore this pattern.

# First, we make a new 'ByFruit' object that includes the fruit frequency, top
# rank, and average rank.
ByFruit <- FruitList %>% group_by(CODE) %>% summarise(Frequency = n(), topRank = min(Order), 
    avgRank = mean(Order))
ByFruit  # Look at our new object
# A tibble: 8 × 4
  CODE       Frequency topRank avgRank
  <fct>          <int>   <int>   <dbl>
1 apple             25       1    2.52
2 banana             8       1    2.25
3 lemon              5       2    3.2 
4 orange             7       1    3   
5 peach              2       1    1.5 
6 pear              10       1    2.8 
7 plum               9       1    2   
8 strawberry         9       1    2.44
# Plot frequency against average rank
plot(ByFruit$Frequency, ByFruit$avgRank, xlab = "Frequency", ylab = "Average rank")

With more observations, there is often a trend in which the items ranked earliest are also mentioned most frequently. This is a small toy dataset, which may explain why the trend does not appear here, but try this out with your own freelist data.
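To put a number on this relationship, you could compute a rank correlation between frequency and average rank. The sketch below uses Spearman's rho from base R. The values are typed in from the ByFruit output above, where avgRank is rounded, and the data frame name `fruitStats` is hypothetical.

```r
# Frequency and average rank per fruit, copied from the ByFruit output above
fruitStats <- data.frame(
  CODE      = c("apple", "banana", "lemon", "orange", "peach", "pear", "plum", "strawberry"),
  Frequency = c(25, 8, 5, 7, 2, 10, 9, 9),
  avgRank   = c(2.52, 2.25, 3.2, 3, 1.5, 2.8, 2, 2.44)
)

# Spearman's rho works on ranks, so the single very frequent item (apple)
# does not dominate the result the way it could with Pearson's r
rho <- cor(fruitStats$Frequency, fruitStats$avgRank, method = "spearman")
rho
```

If frequently named fruits were also named earlier, rho would be negative: high frequency would go with a low average rank. For these values rho comes out close to zero (about 0.07), in line with the weak pattern in the plot.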

Salience calculations

One of the main insights that can be learned from freelist data is the relative cultural salience of items in particular domains. For example, in the domain of household chores, we could determine whether vacuuming, washing dishes, shoveling the driveway, or feeding the snake are considered more salient or central to the idea of household chores compared to other tasks. We might find variations based on individual attributes or the cultural context in which the question is asked. Salience often mirrors frequency, but the calculation is a bit more complicated because it considers both an item’s frequency of mention and the order in which it is usually listed.

Luckily, AnthroTools has a built-in salience function, CalculateSalience(), that can do the math for us. The code below calculates the salience of each fruit within each interviewee's list.

FruitListSalience <- CalculateSalience(FruitList, Order = "Order", Subj = "Subj", 
    CODE = "CODE")
# Note: I have included the arguments for Order, Subj and CODE for illustrative
# purposes. Because these column names match the arguments, you do not actually
# need to include them in this case. However, when you are working with your own
# datasets you may have different column names, so it is helpful to keep the
# underlying structure of functions in mind when you deploy them.
FruitListSalience
    X Subj Order       CODE  Salience
1   1    1     1       pear 1.0000000
2   2    1     2     orange 0.6666667
3   3    1     3      apple 0.3333333
4   4    2     1      apple 1.0000000
5   5    2     2      apple 0.6666667
6   6    2     3 strawberry 0.3333333
7   7    3     1     banana 1.0000000
8   8    3     2 strawberry 0.5000000
9   9    4     1      apple 1.0000000
10 10    4     2       plum 0.8000000
11 11    4     3     banana 0.6000000
12 12    4     4     orange 0.4000000
13 13    4     5      apple 0.2000000
14 14    5     1 strawberry 1.0000000
15 15    5     2      apple 0.6666667
16 16    5     3      apple 0.3333333
17 17    6     1      apple 1.0000000
18 18    6     2      lemon 0.6666667
19 19    6     3      apple 0.3333333
20 20    7     1     banana 1.0000000
21 21    7     2       plum 0.7500000
22 22    7     3      lemon 0.5000000
23 23    7     4      lemon 0.2500000
24 24    8     1 strawberry 1.0000000
25 25    8     2      apple 0.7500000
26 26    8     3       pear 0.5000000
27 27    8     4      apple 0.2500000
28 28    9     1      apple 1.0000000
29 29    9     2      apple 0.6666667
30 30    9     3       plum 0.3333333
31 31   10     1      apple 1.0000000
32 32   10     2       pear 0.6666667
33 33   10     3      apple 0.3333333
34 34   11     1      apple 1.0000000
35 35   11     2 strawberry 0.8000000
36 36   11     3     banana 0.6000000
37 37   11     4      apple 0.4000000
38 38   11     5      lemon 0.2000000
39 39   12     1      apple 1.0000000
40 40   12     2     orange 0.8000000
41 41   12     3       pear 0.6000000
42 42   12     4      apple 0.4000000
43 43   12     5     orange 0.2000000
44 44   13     1       plum 1.0000000
45 45   13     2      peach 0.7500000
46 46   13     3 strawberry 0.5000000
47 47   13     4     banana 0.2500000
48 48   14     1       pear 1.0000000
49 49   14     2      apple 0.8000000
50 50   14     3     banana 0.6000000
51 51   14     4       pear 0.4000000
52 52   14     5 strawberry 0.2000000
53 53   15     1       plum 1.0000000
54 54   15     2 strawberry 0.6666667
55 55   15     3      apple 0.3333333
56 56   16     1     banana 1.0000000
57 57   16     2       plum 0.8000000
58 58   16     3       plum 0.6000000
59 59   16     4       pear 0.4000000
60 60   16     5       pear 0.2000000
61 61   17     1     orange 1.0000000
62 62   17     2      apple 0.5000000
63 63   18     1      peach 1.0000000
64 64   18     2     banana 0.8000000
65 65   18     3     orange 0.6000000
66 66   18     4       pear 0.4000000
67 67   18     5      apple 0.2000000
68 68   19     1       pear 1.0000000
69 69   19     2      lemon 0.6666667
70 70   19     3 strawberry 0.3333333
71 71   20     1       plum 1.0000000
72 72   20     2      apple 0.8000000
73 73   20     3       plum 0.6000000
74 74   20     4     orange 0.4000000
75 75   20     5      apple 0.2000000

The above code calculates the salience of each item by respondent. If you inspect the results, you’ll see that the first item in each list has a salience of 1, and each subsequent item has a lower salience. This is useful for understanding how an individual thinks about the domain of fruits, but what if we want to know how salient apples are across all responses? We can calculate the overall salience of each item with the SalienceByCode() function.
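The per-respondent values above are consistent with the formula (L - r + 1) / L. Here L is the length of a respondent's list and r is the item's position in it: a 5-item list gives 1, 0.8, 0.6, 0.4, 0.2. The minimal base-R sketch below reproduces the first eight rows of FruitListSalience (respondents 1-3) without AnthroTools. It assumes that formula, and the data frame `toy` is a hypothetical copy of those rows.

```r
# Toy data: the first three respondents' freelists from the output above
toy <- data.frame(
  Subj  = c(1, 1, 1, 2, 2, 2, 3, 3),
  Order = c(1, 2, 3, 1, 2, 3, 1, 2),
  CODE  = c("pear", "orange", "apple", "apple", "apple", "strawberry",
            "banana", "strawberry")
)

# L: length of each respondent's list, repeated on every row of that list
listLength <- ave(toy$Order, toy$Subj, FUN = length)

# Salience of an item at position r in a list of length L is (L - r + 1) / L
toy$Salience <- (listLength - toy$Order + 1) / listLength
toy
```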

Try it

Try running the SalienceByCode() function on the new salience dataframe we made above.

Click for solution

SalienceByFruit <- SalienceByCode(FruitListSalience, dealWithDoubles = "MAX")
        CODE MeanSalience SumSalience   SmithsS
1       pear    0.6958333    5.566667 0.2783333
2     orange    0.6444444    3.866667 0.1933333
3      apple    0.7588889   11.383333 0.5691667
4 strawberry    0.5925926    5.333333 0.2666667
5     banana    0.7312500    5.850000 0.2925000
6       plum    0.8119048    5.683333 0.2841667
7      lemon    0.5083333    2.033333 0.1016667
8      peach    0.8750000    1.750000 0.0875000
# The dealWithDoubles argument tells R what to do if a respondent lists the same
# item twice. There are a few different options available for this; the right one
# to pick will depend on your data and research question.

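From this analysis we can see that mean salience and Smith’s S rank the fruits somewhat differently. Mean salience averages an item’s salience only over the respondents who mentioned it. Smith’s S instead sums the item’s salience and divides by the total number of respondents (20 here), so it rewards items that are named both early and by many people (more info here). For example, peach has the highest mean salience (0.875) but the lowest Smith’s S (0.0875), because only 2 of the 20 respondents named it.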
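To make the difference concrete, here is a base-R sketch of the by-item summary. It rests on three assumptions, each consistent with the numbers in the table above (for example, apple: 11.383333 / 20 = 0.5691667 and 11.383333 / 15 = 0.7588889):

- dealWithDoubles = "MAX" keeps each respondent's highest salience for a repeated item.
- MeanSalience averages over the respondents who named the item.
- SmithsS divides SumSalience by the total number of respondents.

The toy data are again the first three respondents, and all object names are hypothetical.

```r
# Toy data: the first three respondents' freelists from FruitListSalience above
toy <- data.frame(
  Subj  = c(1, 1, 1, 2, 2, 2, 3, 3),
  Order = c(1, 2, 3, 1, 2, 3, 1, 2),
  CODE  = c("pear", "orange", "apple", "apple", "apple", "strawberry",
            "banana", "strawberry")
)

# Per-respondent salience: (L - r + 1) / L
listLength   <- ave(toy$Order, toy$Subj, FUN = length)
toy$Salience <- (listLength - toy$Order + 1) / listLength

# Repeated items: keep each respondent's highest salience (like dealWithDoubles = "MAX")
perSubj <- aggregate(Salience ~ Subj + CODE, data = toy, FUN = max)

# Summaries by item
nSubj   <- length(unique(toy$Subj))                      # all respondents, here 3
sumSal  <- tapply(perSubj$Salience, perSubj$CODE, sum)   # summed over those who named it
meanSal <- tapply(perSubj$Salience, perSubj$CODE, mean)  # averaged over those who named it

bySalience <- data.frame(
  CODE         = names(sumSal),
  MeanSalience = as.numeric(meanSal),
  SumSalience  = as.numeric(sumSal),
  SmithsS      = as.numeric(sumSal) / nSubj
)
bySalience
```

In this toy example, strawberry has a lower mean salience than orange (5/12 vs 2/3) but a higher Smith's S (5/18 vs 2/9), because two of the three respondents named it.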

Now let’s plot the Smith’s S results in decreasing order and add a line at 0.1, which geom_hline() draws as a vertical line once the coordinates are flipped. This value is generally considered a benchmark for assessing item salience in freelists.

ggplot(SalienceByFruit, aes(x = reorder(CODE, SmithsS), y = SmithsS)) + geom_bar(stat = "identity") + 
    coord_flip() + ggtitle("Fruit Salience") + labs(x = "Fruit", y = "Smith's S") + 
    geom_hline(yintercept = 0.1)

From this plot, it looks like most of the fruits could be considered salient in this dataset. However, although pears were mentioned slightly more frequently than bananas and plums, their salience is lower when their overall order within lists and other factors are taken into account.
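If you want the list of fruits that clear the benchmark rather than reading it off the plot, you can filter the Smith's S values directly. The sketch below uses the values typed in from the SalienceByFruit output above. The vector names are hypothetical; on the real object you would filter SalienceByFruit$SmithsS in the same way.

```r
# Smith's S by fruit, copied from the SalienceByFruit output above
smithsS <- c(pear = 0.2783333, orange = 0.1933333, apple = 0.5691667,
             strawberry = 0.2666667, banana = 0.2925000, plum = 0.2841667,
             lemon = 0.1016667, peach = 0.0875000)

# Fruits at or above the 0.1 benchmark, most salient first
salient <- sort(smithsS[smithsS >= 0.1], decreasing = TRUE)
names(salient)
```

Seven of the eight fruits pass. Lemon (about 0.10) sits just above the line and peach (about 0.09) just below it.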

Comparing across groups

It is also possible to compare item salience across different groups of respondents. The following example comes from the AnthroTools package, which includes a sample grouping of the FruitList data. First, let’s load the new dataset. Because it is included in the AnthroTools package, we can load it directly with the data() function.

data("WorldList")
WorldList
    Subj Order       CODE GROUPING
1      1     1       plum MAINLAND
2      1     2 strawberry MAINLAND
3      1     3       pear MAINLAND
4      1     4      peach MAINLAND
5      2     1      apple MAINLAND
6      2     2     orange MAINLAND
7      2     3       pear MAINLAND
8      2     4       pear MAINLAND
9      3     1 strawberry MAINLAND
10     3     2      apple MAINLAND
11     3     3      apple MAINLAND
12     3     4      lemon MAINLAND
13     3     5     orange MAINLAND
14     4     1       plum MAINLAND
15     4     2      peach MAINLAND
16     5     1      lemon MAINLAND
17     5     2      lemon MAINLAND
18     5     3 strawberry MAINLAND
19     5     4       plum MAINLAND
20     6     1       plum MAINLAND
21     6     2       plum MAINLAND
22     6     3      apple MAINLAND
23     7     1      peach MAINLAND
24     7     2      apple MAINLAND
25     8     1     banana MAINLAND
26     8     2     orange MAINLAND
27     8     3      apple MAINLAND
28     9     1      apple MAINLAND
29     9     2     orange MAINLAND
30    10     1     banana MAINLAND
31    10     2      apple MAINLAND
32    10     3      apple MAINLAND
33    10     4      apple MAINLAND
34    10     5      apple MAINLAND
35    11     1      apple MAINLAND
36    11     2       pear MAINLAND
37    11     3 strawberry MAINLAND
38    11     4      apple MAINLAND
39    12     1     orange MAINLAND
40    12     2       plum MAINLAND
41    12     3     orange MAINLAND
42    12     4      apple MAINLAND
43    13     1      apple MAINLAND
44    13     2       pear MAINLAND
45    13     3      apple MAINLAND
46    14     1       plum MAINLAND
47    14     2      peach MAINLAND
48    14     3      apple MAINLAND
49    14     4      apple MAINLAND
50    15     1      apple MAINLAND
51    15     2      peach MAINLAND
52    15     3      apple MAINLAND
53    16     1      apple MAINLAND
54    16     2     banana MAINLAND
55    16     3      apple MAINLAND
56    17     1     orange MAINLAND
57    17     2       pear MAINLAND
58     1     1     banana   ISLAND
59     1     2      apple   ISLAND
60     1     3       pear   ISLAND
61     2     1      lemon   ISLAND
62     2     2       plum   ISLAND
63     2     3 strawberry   ISLAND
64     3     1      lemon   ISLAND
65     3     2      apple   ISLAND
66     3     3 strawberry   ISLAND
67     4     1       plum   ISLAND
68     4     2     banana   ISLAND
69     5     1      peach   ISLAND
70     5     2       plum   ISLAND
71     5     3     banana   ISLAND
72     5     4      lemon   ISLAND
73     5     5      apple   ISLAND
74     6     1 strawberry   ISLAND
75     6     2       plum   ISLAND
76     6     3     banana   ISLAND
77     6     4      apple   ISLAND
78     6     5       pear   ISLAND
79     7     1      apple   ISLAND
80     7     2       plum   ISLAND
81     7     3       pear   ISLAND
82     8     1       plum   ISLAND
83     8     2      peach   ISLAND
84     8     3     banana   ISLAND
85     8     4      peach   ISLAND
86     9     1       pear   ISLAND
87     9     2     banana   ISLAND
88     9     3 strawberry   ISLAND
89    10     1      peach   ISLAND
90    10     2      apple   ISLAND
91    10     3      apple   ISLAND
92    10     4     orange   ISLAND
93    11     1     banana   ISLAND
94    11     2      apple   ISLAND
95    11     3      apple   ISLAND
96    11     4      apple   ISLAND
97    12     1       plum   ISLAND
98    12     2      apple   ISLAND
99    12     3      peach   ISLAND
100   13     1       pear   ISLAND
101   13     2      apple   ISLAND
102   13     3     banana   ISLAND
103   13     4       plum   ISLAND
104    1     1      peach     MOON
105    1     2       plum     MOON
106    2     1      apple     MOON
107    2     2 strawberry     MOON
108    2     3       plum     MOON
109    2     4       pear     MOON
110    3     1     banana     MOON
111    3     2       plum     MOON
112    3     3      lemon     MOON
113    3     4       pear     MOON
114    4     1      apple     MOON
115    4     2      apple     MOON
116    4     3 strawberry     MOON
117    5     1      apple     MOON
118    5     2      apple     MOON
119    5     3     banana     MOON
120    5     4     banana     MOON
121    5     5     orange     MOON
122    6     1      lemon     MOON
123    6     2     orange     MOON
124    6     3      apple     MOON
125    6     4       pear     MOON
126    6     5      peach     MOON
127    7     1      apple     MOON
128    7     2       pear     MOON
129    8     1      apple     MOON
130    8     2      lemon     MOON
131    9     1      apple     MOON
132    9     2     banana     MOON
133    9     3      lemon     MOON
134    9     4      apple     MOON
135    9     5      lemon     MOON
136   10     1      peach     MOON
137   10     2      apple     MOON
138   10     3 strawberry     MOON
139   11     1      peach     MOON
140   11     2      apple     MOON
141   12     1      apple     MOON
142   12     2      apple     MOON
143   13     1      apple     MOON
144   13     2      apple     MOON
145   13     3      apple     MOON
146   13     4      apple     MOON
147   13     5      peach     MOON
148   14     1      lemon     MOON
149   14     2      apple     MOON

Try it

Try calculating the salience for each fruit, adding an argument that differentiates responses by group. The argument is conveniently called GROUPING. First calculate the salience of each item within each response, then calculate the overall salience of each item by group. Refer to the code we ran earlier in this lesson as a template for your solution.

Click for solution

FL1 <- CalculateSalience(WorldList, Order = "Order", Subj = "Subj", CODE = "CODE", 
    GROUPING = "GROUPING")
# Note: this function will not run without the GROUPING argument. Subj numbers
# restart at 1 in each group, so GROUPING is needed to tell respondents apart.
FL2 <- SalienceByCode(FL1, GROUPING = "GROUPING", dealWithDoubles = "MAX")
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
[1] "plum"
[1] "strawberry"
[1] "pear"
[1] "peach"
[1] "apple"
[1] "orange"
[1] "lemon"
[1] "banana"
   GROUPING       CODE MeanSalience SumSalience    SmithsS
1  MAINLAND       plum    0.8333333    5.000000 0.29411765
2    ISLAND       plum    0.7729167    6.183333 0.47564103
3      MOON       plum    0.5833333    1.750000 0.12500000
4  MAINLAND strawberry    0.6875000    2.750000 0.16176471
5    ISLAND strawberry    0.5000000    2.000000 0.15384615
6      MOON strawberry    0.4722222    1.416667 0.10119048
7  MAINLAND       pear    0.5833333    2.916667 0.17156863
8    ISLAND       pear    0.5733333    2.866667 0.22051282
9      MOON       pear    0.3500000    1.400000 0.10000000
10 MAINLAND      peach    0.6333333    3.166667 0.18627451
11   ISLAND      peach    0.7708333    3.083333 0.23717949
12     MOON      peach    0.6800000    3.400000 0.24285714
13 MAINLAND      apple    0.7320513    9.516667 0.55980392
14   ISLAND      apple    0.6500000    5.850000 0.45000000
15     MOON      apple    0.8555556   10.266667 0.73333333
16 MAINLAND     orange    0.6861111    4.116667 0.24215686
17   ISLAND     orange    0.2500000    0.250000 0.01923077
18     MOON     orange    0.5000000    1.000000 0.07142857
19 MAINLAND      lemon    0.7000000    1.400000 0.08235294
20   ISLAND      lemon    0.8000000    2.400000 0.18461538
21     MOON      lemon    0.7200000    3.600000 0.25714286
22 MAINLAND     banana    0.8888889    2.666667 0.15686275
23   ISLAND     banana    0.6708333    5.366667 0.41282051
24     MOON     banana    0.8000000    2.400000 0.17142857
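Within each group, Smith's S appears to be computed with that group's own number of respondents as the denominator. The WorldList listing above numbers subjects 1-17 on MAINLAND, 1-13 on ISLAND and 1-14 on MOON. This is what makes Smith's S comparable across groups of different sizes, whereas raw SumSalience is not. A quick arithmetic check using apple's SumSalience values from the table, with hypothetical vector names:

```r
# SumSalience for apple in each group, copied from the output above
appleSum <- c(MAINLAND = 9.516667, ISLAND = 5.850000, MOON = 10.266667)

# Respondents per group, read from the Subj numbers in the WorldList listing
groupN <- c(MAINLAND = 17, ISLAND = 13, MOON = 14)

# Smith's S within each group = summed salience / number of respondents in that group
appleSmithsS <- appleSum / groupN
round(appleSmithsS, 4)
```

The results match the SmithsS column for apple (0.5598, 0.4500, 0.7333).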

Frequency by group

In addition to evaluating salience, we might be interested in the percentage of respondents from each group who named a particular item. The code below creates a new object where the data are grouped according to the GROUPING variable.

frequencybygroup <- WorldList %>% group_by(GROUPING) %>% mutate(GroupN = length(unique(Subj))) %>% 
    ungroup %>% group_by(GROUPING, CODE, GroupN) %>% summarise(totalResponses = n(), 
    nRespondents = length(unique(Subj)), percentRespondents = round(length(unique(Subj))/first(GroupN) * 
        100, 2)) %>% arrange(GROUPING, desc(percentRespondents))
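The same percent-of-respondents logic can be sketched in base R. Two details matter: respondents must be counted within each group, because Subj numbers restart in every group, and repeated mentions should count once per person. The toy data frame below is a hypothetical copy of the first two MAINLAND lists and the first ISLAND list from the WorldList output above.

```r
# Toy data: MAINLAND subjects 1-2 and ISLAND subject 1 from WorldList above
toy <- data.frame(
  GROUPING = c(rep("MAINLAND", 8), rep("ISLAND", 3)),
  Subj     = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1),
  CODE     = c("plum", "strawberry", "pear", "peach",
               "apple", "orange", "pear", "pear",
               "banana", "apple", "pear")
)

# Respondents per group (Subj numbers restart in each group, so count within GROUPING)
groupN <- tapply(toy$Subj, toy$GROUPING, function(s) length(unique(s)))

# One row per group/respondent/fruit, so repeated mentions count once
pairs <- unique(toy[, c("GROUPING", "Subj", "CODE")])
byFruit <- aggregate(Subj ~ GROUPING + CODE, data = pairs, FUN = length)
names(byFruit)[names(byFruit) == "Subj"] <- "nRespondents"

# Percentage of each group's respondents who named the fruit
byFruit$percentRespondents <- round(
  100 * byFruit$nRespondents / as.numeric(groupN[as.character(byFruit$GROUPING)]), 2
)
byFruit
```

Respondent 2's two mentions of pear count once, so pear reaches 100% on MAINLAND (2 of 2 respondents) rather than exceeding it.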

We can then plot the results of this grouping with a facet wrap graph.

ggplot(frequencybygroup, aes(x = reorder(CODE, desc(percentRespondents)), y = percentRespondents)) + 
    geom_bar(stat = "identity") + coord_flip() + ggtitle("Frequency of fruit mentions by site") + 
    labs(x = "Fruit", y = "Percent of respondents") + facet_wrap(vars(GROUPING))

It looks like apples were named by the largest share of respondents at every site, but the sample populations differ in how often they mention oranges and bananas. Notice also that although the fruits appear in descending order for the “ISLAND” sample, the order is not meaningful for the other sites. This is because reorder() sorts the fruits by their average value across all groups, and facet_wrap() uses the same fruit axis for every site. The bars therefore cannot be in descending order within every panel at once, but the shared axis makes it easier to compare the same fruit across samples.

Comparing freelist lengths across groups

We can also compare the lengths of lists across each group. This shows us that the average list lengths were not very different across the groups, but were slightly higher in the Island population.

WorldList %>% group_by(GROUPING, Subj) %>% summarise(n = n()) %>% ungroup %>% group_by(GROUPING) %>% 
    summarize(nResponse = n(), avgLength = mean(n), maxLength = max(n), minLength = min(n), 
        medianLength = median(n))
# A tibble: 3 × 6
  GROUPING nResponse avgLength maxLength minLength medianLength
  <chr>        <int>     <dbl>     <int>     <int>        <dbl>
1 ISLAND          13      3.54         5         2            3
2 MAINLAND        17      3.35         5         2            3
3 MOON            14      3.29         5         2            3
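The group means above differ by only about a quarter of an item. If you want a formal check, one option (not part of the original workflow) is a Kruskal-Wallis rank-sum test on list lengths. The per-respondent lengths below were counted by hand from the WorldList listing above; they reproduce the means, minima, maxima and medians in the table. The name `listLengths` is hypothetical.

```r
# List lengths per respondent, counted from the WorldList listing above
listLengths <- list(
  MAINLAND = c(4, 4, 5, 2, 4, 3, 2, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2),
  ISLAND   = c(3, 3, 3, 2, 5, 5, 3, 4, 3, 4, 4, 3, 4),
  MOON     = c(2, 4, 4, 3, 5, 5, 2, 2, 5, 3, 2, 2, 5, 2)
)

# Average list length per group (compare with avgLength above)
sapply(listLengths, mean)

# Kruskal-Wallis rank-sum test: do list lengths differ across the three groups?
kt <- kruskal.test(listLengths)
kt
```

For these data the p-value is large (well above 0.05), consistent with the description above that list lengths are not very different across groups. With groups this small the test has little power, so treat the result as descriptive.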

Advanced Freelist Analysis

Co-occurrence tables

Beyond investigating how many fruits are listed and how frequently, we are also interested in learning more about the structure of the fruit domain. We can think of this as a kind of mental map: how do people think about each item in the fruit domain relative to the other items?

One way to investigate this question is to examine the co-occurrence of items within the same freelist. Essentially, how many times is each pair of fruits mentioned by the same respondent? We can create a co-occurrence matrix with tidyverse tools or the AnthroTools package.

Try it

Using FreeListTable() make a table indicating whether or not each respondent mentioned a particular fruit. Hint: This is a presence/absence table.

Click for solution

FruitListTable <- FreeListTable(FruitListSalience, CODE = "CODE", Salience = "Salience", 
    Subj = "Subj", tableType = "PRESENCE")

You can do the same operation with tidyverse.

# Add a presence column (1 = the respondent mentioned this fruit)
FruitList$present <- 1
# Spread into wide datatable. Note: spread function requires unique identifiers
# for rows, so here we remove any duplicate rows
FruitListWide <- FruitList %>% select(Subj, CODE, present) %>% unique %>% spread(CODE, 
    present)
# convert NAs to 0
FruitListWide[is.na(FruitListWide)] <- 0
FruitListWide
   Subj apple banana lemon orange peach pear plum strawberry
1     1     1      0     0      1     0    1    0          0
2     2     1      0     0      0     0    0    0          1
3     3     0      1     0      0     0    0    0          1
4     4     1      1     0      1     0    0    1          0
5     5     1      0     0      0     0    0    0          1
6     6     1      0     1      0     0    0    0          0
7     7     0      1     1      0     0    0    1          0
8     8     1      0     0      0     0    1    0          1
9     9     1      0     0      0     0    0    1          0
10   10     1      0     0      0     0    1    0          0
11   11     1      1     1      0     0    0    0          1
12   12     1      0     0      1     0    1    0          0
13   13     0      1     0      0     1    0    1          1
14   14     1      1     0      0     0    1    0          1
15   15     1      0     0      0     0    0    1          1
16   16     0      1     0      0     0    1    1          0
17   17     1      0     0      1     0    0    0          0
18   18     1      1     0      1     1    1    0          0
19   19     0      0     1      0     0    1    0          1
20   20     1      0     0      1     0    0    1          0

In this case, we have made a presence/absence matrix that records whether each respondent mentioned each fruit. From it we can build an item-by-item co-occurrence matrix. That matrix can record only whether two fruits ever appear in the same list, or be weighted by the number of respondents who named both. A co-occurrence matrix can then be used for clustering, MDS, network analysis and other analyses.
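A weighted co-occurrence matrix can be computed from a respondent-by-item presence/absence matrix with crossprod(), which is also what the MDS code below does with FruitListWide. Here is a small sketch with a hypothetical toy matrix (`presence`). It holds respondents 1-3 from the FruitListWide output above, restricted to the five fruits they named.

```r
# Toy presence/absence matrix: respondents 1-3 from FruitListWide (rows = respondents)
presence <- matrix(
  c(1, 0, 1, 1, 0,   # respondent 1: apple, orange, pear
    1, 0, 0, 0, 1,   # respondent 2: apple, strawberry
    0, 1, 0, 0, 1),  # respondent 3: banana, strawberry
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"),
                  c("apple", "banana", "orange", "pear", "strawberry"))
)

# crossprod(x) is t(x) %*% x: cell [i, j] counts respondents who named both fruits i and j,
# and the diagonal counts how many respondents named each fruit
coOccurrence <- crossprod(presence)
coOccurrence
```

For a plain yes/no co-occurrence matrix, you could convert the counts with `(coOccurrence > 0) * 1`.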

Multidimensional scaling

With the matrix we made above, we can analyze how similar or dissimilar the fruits are to one another using multidimensional scaling (MDS). The data manipulations can be a little tricky, but I’ve included the code below to get you started.

# Here we use the FruitListWide dataframe we made above.
# Store the Subj IDs as row names, then drop the Subj column so that only
# presence/absence values remain
rownames(FruitListWide) <- FruitListWide$Subj
FruitListWide <- FruitListWide[, -1]
# Convert to a matrix and transpose it so rows are fruits and columns are respondents
FruitListWide <- as.matrix(FruitListWide)
FruitsBySubj <- t(FruitListWide)
# Look at data structure
FruitsBySubj
           1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
apple      1 1 0 1 1 1 0 1 1  1  1  1  0  1  1  0  1  1  0  1
banana     0 0 1 1 0 0 1 0 0  0  1  0  1  1  0  1  0  1  0  0
lemon      0 0 0 0 0 1 1 0 0  0  1  0  0  0  0  0  0  0  1  0
orange     1 0 0 1 0 0 0 0 0  0  0  1  0  0  0  0  1  1  0  1
peach      0 0 0 0 0 0 0 0 0  0  0  0  1  0  0  0  0  1  0  0
pear       1 0 0 0 0 0 0 1 0  1  0  1  0  1  0  1  0  1  1  0
plum       0 0 0 1 0 0 1 0 1  0  0  0  1  0  1  1  0  0  0  1
strawberry 0 1 1 0 1 0 0 1 0  0  1  0  1  1  1  0  0  0  1  0
# Fruit-by-fruit co-occurrence matrix: cell [i, j] counts respondents who named both fruits
FruitsByFruits <- crossprod(FruitListWide)
# Classical MDS on Euclidean distances between fruits (not used in the plots below)
FruitDistanceMatrix <- cmdscale(dist(FruitsBySubj))
FruitDistanceMatrixDF <- data.frame(FruitDistanceMatrix)
# Classical MDS on binary (Jaccard-type) distances between fruits
test2 <- FruitsBySubj %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2)
test3 <- data.frame(test2$points) %>%             # MDS coordinates
  bind_cols(Fruits = rownames(FruitsBySubj)) %>%  # fruit names
  bind_cols(count = rowSums(FruitsBySubj))        # number of respondents naming each fruit
# Look at results
test3
                    X1         X2     Fruits count
apple      -0.20005674  0.2972494      apple    15
banana      0.16504543 -0.1902441     banana     8
lemon       0.44547450  0.1667202      lemon     4
orange     -0.48782894  0.0130155     orange     6
peach      -0.06579848 -0.4850107      peach     2
pear       -0.20494625  0.2566384       pear     8
plum        0.05280546 -0.2509947       plum     7
strawberry  0.29530502  0.1926259 strawberry     9
## Plot results
ggplot(test3, aes(x = X1, y = X2, label = Fruits)) + geom_text() +
  geom_point(aes(size = count), alpha = 0.4, color = "blue") +
  labs(size = "Count", x = "Dimension 1", y = "Dimension 2") + ggtitle("MDS of FruitList")
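With dist(method = "binary"), base R treats each fruit's row as the set of respondents who named it. Among respondents who named at least one of two fruits, it returns the proportion who named only one of them, which is one minus the Jaccard similarity. Below is a quick check using the apple and pear rows copied from the FruitsBySubj output above. The vector names `apple` and `pear` are hypothetical.

```r
# Rows for apple and pear copied from the FruitsBySubj output above (20 respondents)
apple <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
pear  <- c(1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0)

# Base R's binary distance between the two rows
d <- dist(rbind(apple, pear), method = "binary")

# The same value by hand: 1 - (named both) / (named at least one)
both   <- sum(apple == 1 & pear == 1)
either <- sum(apple == 1 | pear == 1)
manual <- 1 - both / either

c(dist = as.numeric(d), manual = manual)
```

Apple and pear share 6 of the 17 respondents who named either fruit, giving a distance of 11/17 (about 0.65). The MDS then places fruits so that these distances are approximated in two dimensions.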

This MDS is based on the fruit-by-respondent matrix, so fruits that plot close together were named by similar sets of respondents. We might also be interested in an MDS of the fruit-by-fruit co-occurrence matrix, which can be calculated with the code below.

## Classical MDS on binary distances between rows of the co-occurrence matrix
# (note: method = "binary" treats any non-zero co-occurrence count as 1)
test4 <- FruitsByFruits %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2)
test5 <- data.frame(test4$points) %>%          # MDS coordinates
  bind_cols(Fruits = rownames(FruitsByFruits))  # fruit names
# Look at results
test5
## Plot results
ggplot(test5, aes(x = X1, y = X2, label = Fruits)) +
  geom_text(fontface = "bold", size = 2.5, alpha = 0.6,
            position = position_jitter(width = 0.05, height = 0.005)) +
  labs(x = "Dimension 1", y = "Dimension 2") + ggtitle("MDS of FruitbyFruit")

This plot shows us which fruits are more central to the domain “fruits” across all the respondents and which are more peripheral. The fruits in the center of the plot co-occur more frequently since they are in more of the freelists in the sample. Borgatti has a useful presentation explaining more.

Pile Sorts

Data wrangling

In addition to freelists, pile sorts are a common tool in cultural domain analysis. I’ve made a sample dataset of pile sorts of 40 trees from 9 individuals. You can load the data the same way we loaded the FruitList data. We need to do a bit of data wrangling to get the data into the right format for MDS. Hopefully this helps you see how data can be shaped and reshaped, no matter what form they come to you in!

treepilesort <- read.csv("https://maddiebrown.github.io/ethnoecology/pilesort_sampledata.csv")
rownames(treepilesort) <- treepilesort$Tree
# Remove tree column
treepilesort <- treepilesort[, -1]
# Flip so rows are individual responses
PersonbyTree <- t(treepilesort)
PersonbyTree <- as.data.frame(PersonbyTree)  # make the output into a dataframe

Try it

  1. Currently the row names correspond to each respondent code (letters from A to I). Append these row names to the dataframe as a new column named “Person”.
  2. Convert the dataframe from wide to long format, where each row records which respondent placed which tree into which pile. Hint: if using gather(), you might need a third argument, maple:coconut. You might also consider using pivot_longer().

Click for solution

# Store the respondent codes (row names) in a new Person column
PersonbyTree$Person <- rownames(PersonbyTree)
# Convert to long format
PersonbyTreeLong <- PersonbyTree %>% gather(Species, Pile, maple:coconut)
# Equivalent approach with the newer tidyr function pivot_longer():
# PersonbyTreeLong <- PersonbyTree %>%
#   pivot_longer(!Person, names_to = "Species", values_to = "Pile")

Now that our data are in a long format we can add in a new unique identifier for each person_pile combination. This step is required because the piles have numeric names that repeat across respondents. Next we can make a presence/absence column that will be useful for creating a new matrix based on whether or not a tree occurs in a particular pile.

# Add new column which is person ID and pile ID combined together
PersonbyTreeLong <- PersonbyTreeLong %>% mutate(PersonPile = paste(Person, Pile, 
    sep = "_"))
PersonbyTreeLong$present <- rep(1)

Now we can convert the data back into a wide format, where each column is a different tree species and the values represent presence or absence of the tree.

# Spread into wide datatable. Note: spread function requires unique identifiers
# for rows, so here we remove any duplicate rows
TreeWide <- PersonbyTreeLong %>% select(PersonPile, Species, present) %>% unique %>% 
    spread(Species, present)
# Convert NAs to 0
TreeWide[is.na(TreeWide)] <- 0
head(TreeWide)
  PersonPile alder apple aspen banana beech birch black walnut cedar chestnut
1        A_1     1     0     1      0     1     1            0     1        0
2        A_2     0     1     0      1     0     0            0     0        0
3        A_3     0     0     0      0     0     0            1     0        1
4        B_1     1     1     1      0     1     1            1     0        1
5        B_2     0     0     0      1     0     0            0     0        0
6        B_3     0     0     0      0     0     0            0     1        0
  coconut dawn redwood douglas fir gingko horse chestnut japanese maple
1       0            1           1      1              0              1
2       1            0           0      0              0              0
3       0            0           0      0              1              0
4       0            0           0      1              1              1
5       1            0           0      0              0              0
6       0            1           1      0              0              0
  laurel oak live oak lodgepole pine longleaf pine mango maple oak orange palm
1          1        1              1             1     0     1   1      0    1
2          0        0              0             0     1     0   0      1    0
3          0        0              0             0     0     0   0      0    0
4          1        1              0             0     1     1   1      1    0
5          0        0              0             0     0     0   0      0    1
6          0        0              1             1     0     0   0      0    0
  pear pecan pine ponderosa pine red oak redwood sequioa silver maple spruce
1    0     0    1              1       1       1       1            1      1
2    1     0    0              0       0       0       0            0      0
3    0     1    0              0       0       0       0            0      0
4    1     1    0              0       1       0       0            1      0
5    0     0    0              0       0       0       0            0      0
6    0     0    1              1       0       1       1            0      1
  sugar maple sycamore walnut white oak white pine
1           1        1      0         1          1
2           0        0      0         0          0
3           0        0      1         0          0
4           1        1      1         1          0
5           0        0      0         0          0
6           0        0      0         0          1
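To illustrate the whole long-to-wide step on a small scale, here is a sketch with invented data (`toy_long` is hypothetical). It uses pivot_wider(), the newer tidyr counterpart of spread(), and assumes tidyr 1.1.0 or later so that `values_fill = 0` fills absent species directly instead of leaving NAs. Pile 1 of respondent A and pile 1 of respondent B stay separate once they are renamed A_1 and B_1.

```r
library(dplyr)
library(tidyr)

# Hypothetical long pile-sort data: pile numbers restart at 1 for each respondent
toy_long <- data.frame(
  Person  = c("A", "A", "A", "B", "B", "B"),
  Species = c("maple", "pine", "coconut", "maple", "pine", "coconut"),
  Pile    = c(1, 2, 3, 1, 1, 2)
)

toy_presence <- toy_long %>%
  # "A_1" and "B_1" are different piles even though both are numbered 1
  mutate(PersonPile = paste(Person, Pile, sep = "_"), present = 1) %>%
  select(PersonPile, Species, present) %>%
  # one row per person_pile, one column per species; absent species get 0
  pivot_wider(names_from = Species, values_from = present, values_fill = 0)
print(toy_presence)
```

Respondent B put maple and pine together, so row B_1 has a 1 in both of those columns.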

This data frame looks good. The multidimensional scaling workflow below expects matrices, however, so we will convert the data frame into two different matrices (a tree-by-pile matrix and a tree-by-tree matrix), following the same procedure we used for the fruit data.

## Convert the data frame to presence/absence values only by removing the first column.
## First, store the PersonPile column as the rownames
rownames(TreeWide) <- TreeWide$PersonPile
# Remove the PersonPile column
TreeWide <- TreeWide[, -1]
# Convert to a pile-by-tree matrix, then transpose it into a tree-by-pile matrix
TreeWide <- as.matrix(TreeWide)
TreeByPile <- t(TreeWide)
# Look at data structure
TreeByPile
               A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3 C_4 D_1 D_2 D_3 D_4 D_5 D_6
alder            1   0   0   1   0   0   0   0   1   0   0   0   0   0   1   0
apple            0   1   0   1   0   0   1   0   0   0   0   0   1   0   0   0
aspen            1   0   0   1   0   0   0   0   1   0   0   0   0   0   1   0
banana           0   1   0   0   1   0   1   0   0   0   0   0   0   1   0   0
beech            1   0   0   1   0   0   0   0   1   0   0   0   0   0   1   0
birch            1   0   0   1   0   0   0   0   1   0   0   0   0   0   1   0
black walnut     0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1
cedar            1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
chestnut         0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1
coconut          0   1   0   0   1   0   1   0   0   0   0   0   0   1   0   0
dawn redwood     1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
douglas fir      1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
gingko           1   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0
horse chestnut   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1
japanese maple   1   0   0   1   0   0   0   0   1   0   1   0   0   0   0   0
laurel oak       1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0
live oak         1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0
lodgepole pine   1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
longleaf pine    1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
mango            0   1   0   1   0   0   1   0   0   0   0   0   0   1   0   0
maple            1   0   0   1   0   0   0   0   1   0   1   0   0   0   0   0
oak              1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0
orange           0   1   0   1   0   0   1   0   0   0   0   0   1   0   0   0
palm             1   0   0   0   1   0   1   0   0   0   0   0   0   1   0   0
pear             0   1   0   1   0   0   1   0   0   0   0   0   1   0   0   0
pecan            0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1
pine             1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
ponderosa pine   1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
red oak          1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0
redwood          1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
sequioa          1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
silver maple     1   0   0   1   0   0   0   0   1   0   1   0   0   0   0   0
spruce           1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1
sugar maple      1   0   0   1   0   0   0   0   1   0   1   0   0   0   0   0
sycamore         1   0   0   1   0   0   0   0   1   0   0   0   0   0   1   0
walnut           0   0   1   1   0   0   0   1   0   0   0   0   0   0   0   1
white oak        1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0
white pine       1   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0
               D_7 D_8 E_1 E_10 E_2 E_3 E_4 E_5 E_6 E_7 E_8 E_9 F_1 F_10 F_11
alder            0   0   0    0   0   0   0   0   0   0   0   1   0    0    0
apple            0   0   0    1   0   0   0   0   0   0   0   0   0    0    0
aspen            0   0   0    0   0   0   0   0   0   0   0   1   0    0    0
banana           0   0   1    0   0   0   0   0   0   0   0   0   0    0    0
beech            0   0   0    0   0   0   0   0   0   0   0   1   0    0    0
birch            0   0   0    0   0   0   0   0   0   0   0   1   0    0    0
black walnut     0   0   0    0   1   0   0   0   0   0   0   0   0    0    0
cedar            0   0   0    0   0   0   0   1   0   0   0   0   0    0    0
chestnut         0   0   0    0   1   0   0   0   0   0   0   0   0    0    0
coconut          0   0   1    0   0   0   0   0   0   0   0   0   0    0    0
dawn redwood     0   0   0    0   0   0   0   0   1   0   0   0   0    0    1
douglas fir      0   0   0    0   0   0   0   1   0   0   0   0   0    0    0
gingko           0   1   0    0   0   0   1   0   0   0   0   0   0    0    0
horse chestnut   0   0   0    0   1   0   0   0   0   0   0   0   0    0    0
japanese maple   0   0   0    0   0   0   0   0   0   1   0   0   1    0    0
laurel oak       0   0   0    0   0   0   0   0   0   0   1   0   0    0    0
live oak         0   0   0    0   0   0   0   0   0   0   1   0   0    0    0
lodgepole pine   1   0   0    0   0   0   0   1   0   0   0   0   0    0    0
longleaf pine    1   0   0    0   0   0   0   1   0   0   0   0   0    0    0
mango            0   0   1    0   0   0   0   0   0   0   0   0   0    1    0
maple            0   0   0    0   0   0   0   0   0   1   0   0   1    0    0
oak              0   0   0    0   0   0   0   0   0   0   1   0   0    0    0
orange           0   0   0    1   0   0   0   0   0   0   0   0   0    0    0
palm             0   0   1    0   0   0   0   0   0   0   0   0   0    0    0
pear             0   0   0    1   0   0   0   0   0   0   0   0   0    0    0
pecan            0   0   0    0   1   0   0   0   0   0   0   0   0    0    0
pine             1   0   0    0   0   0   0   1   0   0   0   0   0    0    0
ponderosa pine   1   0   0    0   0   0   0   1   0   0   0   0   0    0    0
red oak          0   0   0    0   0   0   0   0   0   0   1   0   0    0    0
redwood          0   0   0    0   0   0   0   0   1   0   0   0   0    0    1
sequioa          0   0   0    0   0   0   0   0   1   0   0   0   0    0    1
silver maple     0   0   0    0   0   0   0   0   0   1   0   0   1    0    0
spruce           0   0   0    0   0   0   0   1   0   0   0   0   0    0    0
sugar maple      0   0   0    0   0   0   0   0   0   1   0   0   1    0    0
sycamore         0   0   0    0   0   1   0   0   0   0   0   0   0    0    0
walnut           0   0   0    0   1   0   0   0   0   0   0   0   0    0    0
white oak        0   0   0    0   0   0   0   0   0   0   1   0   0    0    0
white pine       1   0   0    0   0   0   0   1   0   0   0   0   0    0    0
               F_12 F_13 F_14 F_15 F_16 F_17 F_18 F_19 F_2 F_20 F_3 F_4 F_5 F_6
alder             0    0    0    0    0    0    1    0   0    0   0   0   0   0
apple             0    0    0    0    0    0    0    0   0    0   1   0   0   0
aspen             0    0    0    0    0    0    0    1   0    0   0   0   0   0
banana            0    0    0    0    0    0    0    0   0    0   0   0   0   0
beech             0    0    0    0    0    0    0    0   0    1   0   0   0   0
birch             0    0    0    0    0    0    0    0   0    0   0   0   0   1
black walnut      0    0    0    0    0    0    0    0   0    0   0   0   0   0
cedar             0    1    0    0    0    0    0    0   0    0   0   0   0   0
chestnut          0    0    0    0    0    0    0    0   0    0   0   0   0   0
coconut           0    0    0    0    0    0    0    0   0    0   0   0   1   0
dawn redwood      0    0    0    0    0    0    0    0   0    0   0   0   0   0
douglas fir       0    0    1    0    0    0    0    0   0    0   0   0   0   0
gingko            0    0    0    1    0    0    0    0   0    0   0   0   0   0
horse chestnut    0    0    0    0    0    0    0    0   0    0   0   0   0   0
japanese maple    0    0    0    0    0    0    0    0   0    0   0   0   0   0
laurel oak        0    0    0    0    0    0    0    0   1    0   0   0   0   0
live oak          0    0    0    0    0    0    0    0   1    0   0   0   0   0
lodgepole pine    1    0    0    0    0    0    0    0   0    0   0   0   0   0
longleaf pine     1    0    0    0    0    0    0    0   0    0   0   0   0   0
mango             0    0    0    0    0    0    0    0   0    0   0   0   0   0
maple             0    0    0    0    0    0    0    0   0    0   0   0   0   0
oak               0    0    0    0    0    0    0    0   1    0   0   0   0   0
orange            0    0    0    0    0    0    0    0   0    0   0   1   0   0
palm              0    0    0    0    0    0    0    0   0    0   0   0   1   0
pear              0    0    0    0    0    0    0    0   0    0   1   0   0   0
pecan             0    0    0    0    0    1    0    0   0    0   0   0   0   0
pine              1    0    0    0    0    0    0    0   0    0   0   0   0   0
ponderosa pine    1    0    0    0    0    0    0    0   0    0   0   0   0   0
red oak           0    0    0    0    0    0    0    0   1    0   0   0   0   0
redwood           0    0    0    0    0    0    0    0   0    0   0   0   0   0
sequioa           0    0    0    0    0    0    0    0   0    0   0   0   0   0
silver maple      0    0    0    0    0    0    0    0   0    0   0   0   0   0
spruce            0    1    0    0    0    0    0    0   0    0   0   0   0   0
sugar maple       0    0    0    0    0    0    0    0   0    0   0   0   0   0
sycamore          0    0    0    0    1    0    0    0   0    0   0   0   0   0
walnut            0    0    0    0    0    0    0    0   0    0   0   0   0   0
white oak         0    0    0    0    0    0    0    0   1    0   0   0   0   0
white pine        1    0    0    0    0    0    0    0   0    0   0   0   0   0
               F_7 F_8 F_9 G_1 G_2 H_1 H_10 H_2 H_3 H_4 H_5 H_6 H_7 H_8 H_9 I_1
alder            0   0   0   0   1   0    0   0   0   0   1   0   0   0   0   0
apple            0   0   0   1   0   0    1   0   0   0   0   0   0   0   0   0
aspen            0   0   0   0   1   0    0   0   0   0   1   0   0   0   0   0
banana           0   0   1   1   0   0    0   0   0   0   0   1   0   0   0   0
beech            0   0   0   0   1   0    0   0   0   0   1   0   0   0   0   0
birch            0   0   0   1   0   0    0   0   0   0   1   0   0   0   0   0
black walnut     1   0   0   1   0   0    0   0   0   0   0   0   1   0   0   0
cedar            0   0   0   0   1   0    0   0   1   0   0   0   0   0   0   0
chestnut         0   1   0   1   0   0    0   0   0   0   0   0   1   0   0   0
coconut          0   0   0   1   0   0    0   0   0   0   0   1   0   0   0   0
dawn redwood     0   0   0   0   1   1    0   0   0   0   0   0   0   0   0   0
douglas fir      0   0   0   0   1   0    0   0   1   0   0   0   0   0   0   0
gingko           0   0   0   1   0   0    0   0   0   1   0   0   0   0   0   0
horse chestnut   0   1   0   0   1   0    0   0   0   0   1   0   0   0   0   0
japanese maple   0   0   0   0   1   0    0   0   0   0   0   0   0   1   0   1
laurel oak       0   0   0   1   0   0    0   0   0   0   0   0   0   0   1   0
live oak         0   0   0   1   0   0    0   0   0   0   0   0   0   0   1   0
lodgepole pine   0   0   0   0   1   0    0   1   0   0   0   0   0   0   0   0
longleaf pine    0   0   0   0   1   0    0   1   0   0   0   0   0   0   0   0
mango            0   0   0   1   0   0    0   0   0   0   0   1   0   0   0   0
maple            0   0   0   1   0   0    0   0   0   0   0   0   0   1   0   1
oak              0   0   0   1   0   0    0   0   0   0   0   0   0   0   1   0
orange           0   0   0   1   0   0    1   0   0   0   0   0   0   0   0   0
palm             0   0   0   1   0   0    0   0   0   0   0   1   0   0   0   0
pear             0   0   0   1   0   0    1   0   0   0   0   0   0   0   0   0
pecan            0   0   0   1   0   0    0   0   0   0   0   0   1   0   0   0
pine             0   0   0   0   1   0    0   1   0   0   0   0   0   0   0   0
ponderosa pine   0   0   0   0   1   0    0   1   0   0   0   0   0   0   0   0
red oak          0   0   0   1   0   0    0   0   0   0   0   0   0   0   1   0
redwood          0   0   0   0   1   1    0   0   0   0   0   0   0   0   0   0
sequioa          0   0   0   0   1   1    0   0   0   0   0   0   0   0   0   0
silver maple     0   0   0   0   1   0    0   0   0   0   0   0   0   1   0   1
spruce           0   0   0   0   1   0    0   0   1   0   0   0   0   0   0   0
sugar maple      0   0   0   1   0   0    0   0   0   0   0   0   0   1   0   1
sycamore         0   0   0   0   1   0    0   0   0   0   1   0   0   0   0   0
walnut           1   0   0   1   0   0    0   0   0   0   0   0   1   0   0   0
white oak        0   0   0   1   0   0    0   0   0   0   0   0   0   0   1   0
white pine       0   0   0   0   1   0    0   1   0   0   0   0   0   0   0   0
               I_10 I_11 I_12 I_13 I_2 I_3 I_4 I_5 I_6 I_7 I_8 I_9
alder             0    0    0    0   0   0   0   0   1   0   0   0
apple             0    0    0    0   0   1   0   0   0   0   0   0
aspen             0    0    0    0   0   0   0   1   0   0   0   0
banana            0    0    0    0   0   1   0   0   0   0   0   0
beech             0    0    0    0   0   0   0   1   0   0   0   0
birch             0    0    0    0   0   0   0   1   0   0   0   0
black walnut      0    0    0    0   0   0   0   0   0   1   0   0
cedar             1    0    0    0   0   0   0   0   0   0   0   0
chestnut          0    0    0    0   0   0   0   0   0   1   0   0
coconut           0    0    0    0   0   1   0   0   0   0   0   0
dawn redwood      0    0    0    0   0   0   0   0   0   0   1   0
douglas fir       0    0    1    0   0   0   0   0   0   0   0   0
gingko            0    0    0    1   0   0   0   0   0   0   0   0
horse chestnut    0    0    0    0   0   0   0   0   1   0   0   0
japanese maple    0    0    0    0   0   0   0   0   0   0   0   0
laurel oak        0    0    0    0   1   0   0   0   0   0   0   0
live oak          0    0    0    0   1   0   0   0   0   0   0   0
lodgepole pine    1    0    0    0   0   0   0   0   0   0   0   0
longleaf pine     1    0    0    0   0   0   0   0   0   0   0   0
mango             0    0    0    0   0   1   0   0   0   0   0   0
maple             0    0    0    0   0   0   0   0   0   0   0   0
oak               0    0    0    0   1   0   0   0   0   0   0   0
orange            0    0    0    0   0   1   0   0   0   0   0   0
palm              0    0    0    0   0   0   1   0   0   0   0   0
pear              0    0    0    0   0   1   0   0   0   0   0   0
pecan             0    0    0    0   0   0   0   0   0   1   0   0
pine              0    0    0    0   0   0   0   0   0   0   0   1
ponderosa pine    1    0    0    0   0   0   0   0   0   0   0   0
red oak           0    0    0    0   1   0   0   0   0   0   0   0
redwood           0    0    0    0   0   0   0   0   0   0   1   0
sequioa           0    0    0    0   0   0   0   0   0   0   1   0
silver maple      0    0    0    0   0   0   0   0   0   0   0   0
spruce            0    1    0    0   0   0   0   0   0   0   0   0
sugar maple       0    0    0    0   0   0   0   0   0   0   0   0
sycamore          0    0    0    0   0   0   0   0   1   0   0   0
walnut            0    0    0    0   0   0   0   0   0   1   0   0
white oak         0    0    0    0   1   0   0   0   0   0   0   0
white pine        1    0    0    0   0   0   0   0   0   0   0   0

Multidimensional scaling

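Now that our data are in the proper format, we can examine them with multidimensional scaling (MDS), which places each tree in a low-dimensional space so that trees with similar pile assignments appear close together.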
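As a small sketch of the two steps used below, the hypothetical tree-by-pile matrix `toy` (invented values) is passed through dist(method = "binary") and then classical MDS with cmdscale(), both from base R's stats package. The binary distance between two trees is the share of piles, among those containing at least one of them, that contain only one of them. Maple and oak, for instance, share pile A_1 but are split across B_1 and B_2, giving 2/3.

```r
# Hypothetical tree-by-pile presence/absence matrix (rows = trees, cols = piles)
toy <- rbind(
  maple   = c(1, 1, 0, 0),
  oak     = c(1, 0, 1, 0),
  coconut = c(0, 0, 0, 1)
)
colnames(toy) <- c("A_1", "B_1", "B_2", "C_1")

# Binary distance: among piles where at least one of the two trees appears,
# the proportion holding only one of them
d <- dist(toy, method = "binary")
print(round(as.matrix(d), 3))

# Classical MDS: two-dimensional coordinates whose Euclidean distances
# approximate the distances in d
coords <- cmdscale(d, k = 2, eig = TRUE)
print(coords$points)
```

Because these three distances satisfy the triangle inequality, two MDS dimensions reproduce them exactly; with the full tree data, a two-dimensional map is only an approximation.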

# FIRST, look at the tree-by-pile matrix
## Classical MDS on Euclidean distances (for comparison; not used below)
TreeByPileMatrix <- cmdscale(dist(TreeByPile))
TreeByPileMatrixDF <- data.frame(TreeByPileMatrix)
## Classical MDS on binary distances, keeping two dimensions
TreeByPile %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2) -> test2
test3 <- data.frame(test2$points) %>% # MDS coordinates
  bind_cols(Trees = rownames(TreeByPile)) %>% # bind tree names
  bind_cols(count = rowSums(TreeByPile)) # bind number of piles each tree appears in
# Look at results
test3
                         X1          X2          Trees count
alder          -0.013320666  0.30415761          alder     9
apple          -0.310765217 -0.32878474          apple     9
aspen          -0.015764792  0.31227594          aspen     9
banana         -0.249543392 -0.44506073         banana     9
beech          -0.015764792  0.31227594          beech     9
birch          -0.182995445  0.26334014          birch     9
black walnut   -0.220274963 -0.22711366   black walnut     9
cedar           0.470627266 -0.07922739          cedar     9
chestnut       -0.219721484 -0.22750602       chestnut     9
coconut        -0.252133376 -0.45172301        coconut     9
dawn redwood    0.388356463 -0.06777733   dawn redwood     9
douglas fir     0.427206297 -0.07215080    douglas fir     9
gingko         -0.178681232  0.14655505         gingko     9
horse chestnut -0.038272499 -0.08504713 horse chestnut     9
japanese maple -0.038470051  0.31837162 japanese maple     9
laurel oak     -0.259555150  0.28057415     laurel oak     9
live oak       -0.259555150  0.28057415       live oak     9
lodgepole pine  0.506667722 -0.05071981 lodgepole pine     9
longleaf pine   0.506667722 -0.05071981  longleaf pine     9
mango          -0.314808362 -0.37597848          mango     9
maple          -0.201419337  0.26797441          maple     9
oak            -0.259555150  0.28057415            oak     9
orange         -0.308994465 -0.32601178         orange     9
palm           -0.154384024 -0.26830110           palm     9
pear           -0.310765217 -0.32878474           pear     9
pecan          -0.219019827 -0.22519819          pecan     9
pine            0.489378588 -0.04728660           pine     9
ponderosa pine  0.506667722 -0.05071981 ponderosa pine     9
red oak        -0.259555150  0.28057415        red oak     9
redwood         0.388356463 -0.06777733        redwood     9
sequioa         0.388356463 -0.06777733        sequioa     9
silver maple   -0.038470051  0.31837162   silver maple     9
spruce          0.433471931 -0.07372209         spruce     9
sugar maple    -0.201419337  0.26797441    sugar maple     9
sycamore       -0.009385121  0.28105386       sycamore     9
walnut         -0.220274963 -0.22711366         walnut     9
white oak      -0.259555150  0.28057415      white oak     9
white pine      0.506667722 -0.05071981     white pine     9

## Plot results
ggplot(test3, aes(x = X1, y = X2, label = Trees)) + 
    geom_text(size = 3, alpha = 0.5, position = position_jitter(width = 0.05, height = 0.05)) + 
    labs(x = "Dimension 1", y = "Dimension 2") + ggtitle("MDS of Trees by Pile")

This plot shows an MDS of the trees based on the piles they were placed in: trees that were often sorted into the same piles appear close together. It suggests that respondents usually made 4-5 different types of piles. This can help with interpretation, but often we are most interested in the relationship between each tree species and the other species. For this, we work with a tree-by-tree matrix, in which each cell counts how many times two species were placed in the same pile.
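A minimal sketch of how crossprod() turns a pile-by-tree presence/absence matrix into tree-by-tree co-occurrence counts, using an invented four-pile matrix (`piles` is hypothetical) in the same orientation as TreeWide. crossprod(X) computes t(X) %*% X, so each off-diagonal cell counts the piles containing both trees and each diagonal cell counts the piles a tree appears in.

```r
# Hypothetical pile-by-tree presence/absence matrix (rows = piles, cols = trees),
# the same orientation as TreeWide
piles <- rbind(
  A_1 = c(maple = 1, oak = 1, coconut = 0),
  A_2 = c(maple = 0, oak = 0, coconut = 1),
  B_1 = c(maple = 1, oak = 0, coconut = 0),
  B_2 = c(maple = 0, oak = 1, coconut = 1)
)

# crossprod(X) is t(X) %*% X: cell [i, j] counts the piles containing both
# tree i and tree j; the diagonal counts the piles each tree appears in
co <- crossprod(piles)
print(co)
```

In the lesson's data, every diagonal value of TreebyTree therefore equals that tree's count in test3 (9 for every tree).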

# Make a tree-by-tree co-occurrence matrix: cell [i, j] counts the piles
# containing both tree i and tree j
TreebyTree <- crossprod(TreeWide)
head(TreebyTree) 
## Using the binary MDS method (dist() treats any nonzero count as "present")
TreebyTree %>% dist(method = "binary") %>% cmdscale(eig = TRUE, k = 2) -> test4
test5 <- data.frame(test4$points) %>% # MDS coordinates
  bind_cols(Trees = rownames(TreebyTree)) # bind tree names
# Look at results
# test5

Then we can plot the results.

## Plot results
ggplot(test5, aes(x = X1, y = X2, label = Trees)) + geom_text(fontface = "bold", 
    size = 2, alpha = 0.4, position = position_jitter(width = 0.05, height = 0.02)) + 
    labs(x = "Dimension 1", y = "Dimension 2") + ggtitle("MDS of Trees")

This plot can be difficult to read, depending on where the labels land. Because position_jitter() shifts each label by a random amount, the layout changes every time the code runs, so you can rerun it until a more legible version of the graph appears. You can also adjust the label transparency (alpha), size, and jitter width and height to improve legibility. So far, it looks like the people in this sample consider coconuts and bananas to be very different from all the other trees, while pines form their own cluster but are still more similar to the other trees than to the coconut-banana cluster.
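If you would rather not rerun the plot until the labels fall well, one option, sketched here with invented coordinates standing in for test5, is to fix the jitter with the `seed` argument of position_jitter() (available in ggplot2 3.0.0 and later). The same seed gives the same layout every time, so you can try a few seeds and keep the most legible one. The helper `make_plot()` is hypothetical.

```r
library(ggplot2)

# Hypothetical MDS coordinates standing in for test5
coords <- data.frame(
  X1 = c(0.50, 0.49, -0.25, -0.26),
  X2 = c(-0.05, -0.04, -0.45, -0.44),
  Trees = c("white pine", "pine", "banana", "coconut")
)

# Hypothetical helper: the same plot as above, with a reproducible jitter
make_plot <- function(seed) {
  ggplot(coords, aes(x = X1, y = X2, label = Trees)) +
    geom_text(size = 3, alpha = 0.5,
              # a fixed seed makes the random jitter identical on every run
              position = position_jitter(width = 0.05, height = 0.02, seed = seed)) +
    labs(x = "Dimension 1", y = "Dimension 2")
}

p <- make_plot(seed = 42)
print(p)
```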

Cluster analysis

In addition to multidimensional scaling, we can also examine the pile sort data through hierarchical clustering. This method groups trees into nested clusters, showing how clusters relate to one another at different levels of similarity. The code below creates a cluster dendrogram of the tree data.

# Run a cluster analysis on the pile sort data. Code adapted from:
# https://www.r-bloggers.com/clustering-music-genres-with-r/
# Check it out for examples of how to take this analysis further.
# First, turn the data into a matrix and remove the diagonal
TreebyTreeMat <- as.matrix(TreebyTree)
diag(TreebyTreeMat) <- 0
# Make a distance matrix from the co-occurrence data
TreeDistMat <- dist(TreebyTreeMat)
# Perform hierarchical clustering on the data. Note: per ?hclust, "ward.D2"
# implements Ward's (1963) criterion, whereas "ward.D" (used here) does not
TreeHC <- hclust(TreeDistMat, method = "ward.D")

plot(TreeHC, xlab = "Tree species", sub = "Ward's method")

Depending on the height at which the dendrogram is cut, different numbers of tree clusters are created. This visualization can help us understand how similar or dissimilar groups of trees are to one another, depending on the level of similarity used as a benchmark. For example, if we cut the dendrogram at a height of 100, there will be two clusters, which roughly correspond to conifers and non-conifers. However, if we cut the dendrogram at a height of 20, we will have 5 clusters, which could be named something like conifers, oaks, other trees bearing neither fruit nor nuts, fruit trees, and nut trees. The level at which you choose to define clusters depends on both the data analysis results and the questions of your study.
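Rather than reading cut heights off the plot by eye, you can extract cluster memberships with cutree() from base R's stats package, giving either a number of clusters (`k`) or a cut height (`h`). The sketch below uses an invented six-tree co-occurrence matrix (`co` is hypothetical) with two obvious groups, clustered the same way as above; on the lesson's data, a call such as cutree(TreeHC, h = 20) would return each tree's cluster at that height.

```r
# Hypothetical co-occurrence counts for six trees: two groups of three that were
# usually piled together (8 times) and rarely mixed (once)
trees <- c("oak", "maple", "birch", "pine", "spruce", "fir")
co <- matrix(1, nrow = 6, ncol = 6, dimnames = list(trees, trees))
co[1:3, 1:3] <- 8
co[4:6, 4:6] <- 8
diag(co) <- 0  # remove the diagonal, as in the lesson

# Same steps as above: Euclidean distances, then clustering with "ward.D"
hc <- hclust(dist(co), method = "ward.D")

# Memberships for a fixed number of clusters...
groups_k <- cutree(hc, k = 2)
print(groups_k)

# ...or for a cut at a chosen height on the dendrogram's y-axis
groups_h <- cutree(hc, h = 12)
print(groups_h)
```

Both calls return a named integer vector giving each tree's cluster, which you can join back onto the MDS coordinates to color the labels by cluster.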