Course overview

This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. This is accomplished through three main themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics / data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, scaling from question design through ethical best practices. In the second theme, students will be introduced to basic statistical theories and contemporary trends in quantitative social science. Finally, the third theme creates space for students to apply new skills in data analysis, management and visualization using R. Collectively, these themes enable students to build theoretical and methodological foundations for conducting independent quantitative anthropological research.

Required texts

  • Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
  • Bernard, H. Russell. 2006. Research Methods in Anthropology, Fourth Edition. Rowman Altamira.
  • Drennan, Robert D. Statistics for Archaeologists: A Common Sense Approach, Second Edition. Springer.
  • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.

Required software

Additional materials

Below are webpages with additional course materials and handouts. You are on your honor to not look up problem set answers online. All solutions must be presented in your own words.

Semester schedule

Week 1: Course overview and intro to R

Schedule

  1. Syllabus overview
    • Problem set leader assignments
  2. Getting R installed
  3. R tutorial: Getting Started with R
  4. Collaborative work time

Week 2: Understanding Quantitative Data

Schedule

  1. Problem Set 1 Review
  2. Theory lecture: Understanding quantitative data
  3. R tutorial: Data wrangling with base R
  4. Collaborative work time

Week 3: Distributions and Descriptive Statistics

Schedule

  1. Problem Set 2 Review
  2. Theory lecture: Normal distribution, central tendency, variability
  3. R tutorial: Descriptive statistics and advanced dataframe manipulation
  4. Collaborative work time

Week 4: Data visualization and EDA

Schedule

  1. Problem Set 3 Review
  2. Theory lecture: Data visualization and EDA
  3. R tutorial: Data visualization
  4. Collaborative work time

Week 5: Confidence and Correlation

  • Reading
    • Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
    • Bernard 2006, part of ch. 19, pg. 584-587
    • Drennan ch. 9 (confidence and population means) [pg. 107-130]
  • Extra practice: R4DS: Section 5.1-5.4 Data transformation
  • R Tutorial: Tidyverse
  • Lecture slides: Confidence intervals, standard error, statistical significance, correlation
    • On ELMS
  • Problem set 5
  • DUE: Problem set 4
  • DUE: Final project prospectus by midnight

Schedule

  1. Problem Set 4 Review
  2. Theory lecture: Confidence intervals, standard error, statistical significance, correlation
  3. R tutorial: Tidyverse
  4. Collaborative work time

Week 6: Regression

  • Reading
    • Urdan 2010: ch. 13 (Regression) [pg. 145-160]
    • Drennan ch. 15 (Relating a measurement variable to another measurement variable) [pg. 199-220]
  • Extra practice: Phillips 15 Regression
  • R Tutorial: Regression in R
  • Lecture slides: Regression
    • On ELMS
  • Problem set 6
  • DUE: Problem set 5

Schedule

  1. Problem Set 5 Review
  2. Theory lecture: Regression
  3. R tutorial: Regression in R
  4. Collaborative work time

Week 7: Research design and sampling

  • Reading
    • Bernard 2006: ch. 6-8, 10 (Sampling; Sampling Theory; Nonprobability Sampling and Choosing Informants; Structured Interviewing I: Questionnaires) [pg. 146-209, 251-298]
    • Drennan ch. 7, 11 (Sampling; Categories and population proportions) [pg. 79-96, 139-143]
    • Recommended: Fowler, Floyd. 2014. Survey Research Methods, Fifth Edition. SAGE Publications, Inc., excerpts from ch. 1, 5, 6, 7.
  • R Tutorial: Final project overview and dashboard tutorial
  • Lecture slides: Sampling, research design and data collection
    • On ELMS
  • Problem set 7
  • DUE: Problem set 6

Schedule

  1. Problem Set 6 Review
  2. Theory lecture: Sampling, research design and data collection
  3. R Tutorial: Flexdashboards
  4. Collaborative work time

SPRING BREAK

Week 8: Comparing two sample means

  • Reading
    • Urdan 2010: ch. 9 (t tests) [pg. 93-104]
    • Phillips 13.1-13.3 (Hypothesis tests – t-test)
    • Drennan ch. 12 (comparing two sample means) [pg. 147-163]
  • Extra practice: R4DS: Section 5.5-5.7
  • R Tutorial: Advanced tidyverse and t-tests
  • Lecture slides: Hypothesis tests and t-tests
    • On ELMS
  • Problem set 8
  • DUE: Problem set 7

Schedule

  1. Problem Set 7 Review
  2. Theory lecture: Hypothesis tests and t-tests
  3. R tutorial: Advanced tidyverse and t-tests
  4. Collaborative work time

Week 9: Comparing >2 sample means

  • Reading
    • Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
    • Drennan ch. 13 (Comparing means of more than two samples) [pg. 165-179]
  • Extra practice: R4DS Section 13 Relational data
  • R Tutorial: Relational data and ANOVA
  • Lecture slides: ANOVA
    • On ELMS
  • Problem set 9
  • DUE: Problem set 8

Schedule

  1. Problem Set 8 Review
  2. Theory lecture: ANOVA
  3. R tutorial: Relational data and ANOVA
  4. Collaborative work time

Week 10: Categorical data

  • Reading
    • Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
    • Drennan ch. 14 (comparing proportions of different samples) [pg. 181-196]
  • Extra practice: R4DS 15: Factors
  • R Tutorial: Interactive graphics
  • Lecture slides: Working with categorical data and chi-square tests
    • On ELMS
  • Problem set 10
  • DUE: Problem set 9

Schedule

  1. Problem Set 9 Review
  2. Theory lecture: Working with categorical data and chi-square tests
  3. R tutorial: Factors and chi-square tests
  4. Collaborative work time

Week 11: Special Topics: Text mining with R

  • Reading
    • Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
    • Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2) 254-284.
    • Recommended: Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 4-6
  • Extra practice: R4DS ch. 14 Strings
  • R Tutorial: Text analysis
  • Lecture slides: Text analysis
    • On ELMS
  • Problem set 11
  • DUE: Problem set 10

Schedule

  1. Problem Set 10 Review
  2. Theory lecture: Text analysis
  3. R tutorial: Text analysis
  4. Collaborative work time

Week 12: Special Topics: Cultural domain analysis and MDS

  • Reading
    • Bernard 2006: ch. 11 (Structured Interviewing II: Cultural Domain Analysis) [pg. 299-317]
    • Bernard 2006, part of ch. 21: pg. 677-689
    • Drennan ch. 23 (Multidimensional scaling) [pg. 285-289]
    • Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds). The Ethnographers Toolkit, Vol 3. Walnut Creek, CA: Altamira Press.
    • Recommended: Weller, Susan. 2014. Structured Interviewing and Questionnaire Construction.
  • R Tutorial: Cultural domain analysis and MDS
  • Lecture slides: Cultural domain analysis and MDS
    • On ELMS
  • Problem set 12
  • DUE: Problem set 11

Schedule

  1. Problem Set 11 Review
  2. Theory lecture: Cultural domain analysis and MDS
  3. R tutorial: Cultural domain analysis and MDS
  4. Collaborative work time

Week 13: Special Topics: Networks

  • Reading
    • Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
    • Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
    • Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology. 59(1):74-97.
    • Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
    • Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.
  • R Tutorial: Network analysis
  • Lecture slides: Network analysis in Anthropology
    • On ELMS
  • Problem set 13
  • DUE: Problem set 12

Schedule

  1. Problem Set 12 Review
  2. Theory lecture: Network analysis in Anthropology
  3. R tutorial: Network analysis with R
  4. Collaborative work time

Week 14: Unpacking quantitative approaches

  • Reading
    • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
    • Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.
    • Recommended: Merry, Sally. 2011. Measuring the World: Indicators, Human Rights and Global Governance. Current Anthropology 52:S83-S95.
  • Lecture slides
    • On ELMS
  • DUE: Problem set 13

Schedule

  1. Problem Set 13 Review
  2. Theory discussion: Unpacking quantitative approaches
  3. Collaborative work time

Week 15: PCA, cluster and factor analysis

  • Reading
    • Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]
    • Drennan ch. 24 (Principal components analysis) [pg. 299-303]
    • Bernard 2006, part of ch. 21: pg. 689-692
    • Drennan ch. 25 (cluster analysis) [pg. 309-318]
  • Lecture slides: PCA, cluster and factor analysis
    • On ELMS

Schedule

  1. Problem Set 14 Review
  2. Theory lecture: PCA, cluster and factor analysis
  3. Wrapping up

Problem sets

Problem Set 1: Intro to R

  1. Phillips 4.5, Q1-5
  2. Phillips 5.4, Q1-9
  3. Drennan Ch. 1: Using the datasets from the chapter, complete the following problems:
    1. Recreate the Kiskiminetas River Valley histogram in the chapter. Note the number of bins and how the bin members are decided. Is the cutoff at the bottom or top of the range? How can you adjust this?
    2. Make two histograms of the scraper length data with different bin sizes. Do you notice anything different in the data distribution when you change the number of breaks? You can load the .RData file by selecting “open with” RStudio.
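If you are unsure how bins are controlled, the sketch below may help you get started. It uses made-up measurements (the `lengths_mm` vector is hypothetical, not the Drennan data) and the `breaks` and `right` arguments of base R's hist() to show how the same values land in different bins depending on which edge of each bin is closed.

```r
# Hypothetical measurements for illustration only (not the Drennan data)
lengths_mm <- c(10, 12, 15, 20, 20, 23, 25, 30, 30, 34)

# Explicit break points make the binning reproducible
edges <- seq(10, 35, by = 5)

# right = TRUE (the default): a value of 20 is counted in the 15-20 bin
h_right <- hist(lengths_mm, breaks = edges, right = TRUE, plot = FALSE)

# right = FALSE: a value of 20 is counted in the 20-25 bin
h_left <- hist(lengths_mm, breaks = edges, right = FALSE, plot = FALSE)

h_right$counts  # 3 2 2 2 1
h_left$counts   # 2 1 3 1 3

# Two histograms of the same data with different bin widths
hist(lengths_mm, breaks = seq(10, 35, by = 5),
     main = "5 mm bins", xlab = "Length (mm)")
hist(lengths_mm, breaks = seq(10, 40, by = 10),
     main = "10 mm bins", xlab = "Length (mm)")
```

Setting plot = FALSE returns the bin counts without drawing, which is a quick way to check where a value on a bin boundary was placed.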

Jump to Week 1

Problem Set 2: Data wrangling with base R and thinking through quantitative data

  1. Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
  2. Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
  3. Phillips 9.9, Q1-13
  4. Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.

Jump to Week 2

Problem Set 3: Descriptive statistics and normal distributions

  • Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail when we cover tidyverse. You can complete this problem set with base R.
  • Drennan Ch. 2 Q1-2 and Ch. 3 Q1-3. Hint: Look at the help file for mean() if you’re stuck. For Ch. 3 questions, instead of stem-and-leaf plots, make histograms.
  • Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.

Jump to Week 3

Problem Set 4: Data visualization

Jump to Week 4

Problem Set 5: Correlation, confidence intervals, and significance

  • Urdan ch. 7 exercises, ch. 12 exercises (in 4th edition, correlation chapter questions)

Jump to Week 5

Problem Set 6: Regression

Jump to Week 6

Problem Set 7: Data wrangling with tidyverse

  • R4DS: Chapter 5 Data transformation - Section 5.2.4, 5.3.1, 5.4.1 exercises. Hint: don’t forget to load the following libraries: library(tidyverse) and library(nycflights13)

Jump to Week 7

Problem Set 8: t-tests in R

Jump to Week 8

Problem Set 9: ANOVA

Jump to Week 9

Problem Set 11: Text mining in R

With a text of your choice, complete the following analyses and produce a data report. Be sure to document and justify any decisions that you make during the data cleaning and subsetting process. If you are struggling to find a text from your own work, you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class. Write up your results in the form of a brief data analysis report.

  1. Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?

  2. If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.

  3. Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.

  4. Either analyze the sentiments or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you created the topic model.

  5. Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?
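The following sketch might help you get started with the tokenizing and counting steps. It assumes the tidytext and dplyr packages are installed and uses a tiny invented corpus (the `docs` data frame is hypothetical); your own text and cleaning decisions will differ.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus for illustration only
docs <- data.frame(
  doc  = c("a", "b"),
  text = c("The permafrost is thawing and the river is rising.",
           "Thawing permafrost damages the road and the river bank."),
  stringsAsFactors = FALSE
)

# One word per row; unnest_tokens() lowercases and strips punctuation
words <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")   # drop common stop words

# Word frequencies, most frequent first
word_counts <- words %>%
  count(word, sort = TRUE)

# Bigrams: two consecutive words per row
# (stop words are NOT removed here; that is a decision to document)
bigram_counts <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  count(bigram, sort = TRUE)

head(word_counts, 50)
head(bigram_counts, 50)
```

Adding `doc` inside count(), as in count(doc, word, sort = TRUE), gives per-document frequencies that you can compare across subgroups.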

Jump to Week 11

Problem Set 12: Cultural domain analysis and MDS

Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?).

  1. Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from a wide to long dataframe. Add a new column that contains the rank of each issue as named by each respondent.

The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)

  2. What are the top 10 most frequently mentioned issues in response to the question: “8) What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”. Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?

  3. Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the overall goal of the researchers might have been, but we can create our own classification systems for these responses. Briefly explain how you decided to group the responses and then calculate the salience for these issue groups. Produce a bar chart detailing the most salient issues.

If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.

  4. This sample contains individuals from two different villages. Recalculate the issue salience metrics grouping responses by each community. Produce a faceted figure detailing the most salient issues for each community.

  5. Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?
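As a rough sketch of the reshaping and salience steps, the code below works on an invented three-respondent version of `surv` (the column names follow the starter code above; the values are made up). It assumes the dplyr and tidyr packages, and it assumes salience means Smith's S, one common index in which an item at rank r in a list of length L scores (L - r + 1) / L and scores are averaged over all respondents. It also assumes each respondent filled the answer columns in order with no gaps.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy survey: column names as in the starter code, values invented
surv <- data.frame(
  ID = 1:3,
  Village = c("A", "A", "B"),
  X8.1..PF.Issue = c("erosion", "housing", "erosion"),
  X8.2..PF.Issue = c("housing", "erosion", NA),
  X8.3..PF.Issue = c("travel", NA, NA),
  stringsAsFactors = FALSE
)

# Wide to long: one row per respondent per named issue
pf_long <- surv %>%
  select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue) %>%
  pivot_longer(cols = starts_with("X8."),
               names_to = "rank", values_to = "issue") %>%
  # Pull the rank (1, 2, or 3) out of names such as "X8.1..PF.Issue"
  mutate(rank = as.integer(sub("^X8\\.([0-9]).*$", "\\1", rank))) %>%
  filter(!is.na(issue)) %>%
  mutate(issue = tolower(issue))

# Number of respondents who named at least one issue
n_resp <- n_distinct(pf_long$ID)

# Smith's S (an assumed choice of salience index)
salience <- pf_long %>%
  group_by(ID) %>%
  mutate(list_len = n(),
         item_sal = (list_len - rank + 1) / list_len) %>%
  ungroup() %>%
  group_by(issue) %>%
  summarise(mentions = n(), smith_s = sum(item_sal) / n_resp) %>%
  arrange(desc(smith_s))

salience
```

To compare communities, the same calculation can be repeated with Village added to the grouping and with the respondent count taken within each village.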

Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.

Jump to Week 12

Problem Set 13: Network analysis in R

In this problem set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.

  1. Create a summary table of the two networks. At minimum, include the following information:

    • Are they directed/undirected?
    • The number of nodes and edges
    • Mean node betweenness, indegree, and outdegree
    What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
  2. Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?

  3. Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?

  4. Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.
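The sketch below might help you get oriented with the relevant igraph functions. It assumes the igraph package is installed and builds a small invented network (the `edges` data frame is hypothetical, not the MovieGalaxies data): two triangles of characters joined by a single tie. The fast greedy algorithm is just one of several community detection options.

```r
library(igraph)

# Hypothetical edge list for illustration only: two triangles joined by C-D
edges <- data.frame(
  from = c("A", "B", "A", "C", "D", "E", "D"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Summary metrics for one network
net_summary <- data.frame(
  directed         = is_directed(g),
  nodes            = vcount(g),
  edges            = ecount(g),
  mean_betweenness = mean(betweenness(g)),
  mean_indegree    = mean(degree(g, mode = "in")),
  mean_outdegree   = mean(degree(g, mode = "out")),
  density          = edge_density(g)
)
net_summary

# Community detection (fast greedy is one choice among several)
comm <- cluster_fast_greedy(g)
membership(comm)
modularity(comm)

# Plot with nodes sized by betweenness centrality
plot(g,
     vertex.size = 10 + 3 * betweenness(g),
     vertex.color = membership(comm),
     main = "Toy network, nodes sized by betweenness")
```

Building one such summary row per film and stacking the rows with rbind() gives a comparison table.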

Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.

Jump to Week 13

Additional course resources

Data sets

Other useful tools