Course overview

This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. This is accomplished through three main themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics / data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, scaling from question design through ethical best practices. In the second theme, students will be introduced to basic statistical theories and contemporary trends in quantitative social science. Finally, the third theme creates space for students to apply new skills in data analysis, management and visualization using R. Collectively, these themes enable students to build theoretical and methodological foundations for conducting independent quantitative anthropological research.

Looking for the 2022 version of this course?

Required texts

  • Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
  • Bernard, H. Russell. 2006. Research Methods in Anthropology, Fourth Edition. Rowman Altamira.
  • Bueno de Mesquita, Ethan and Anthony Fowler. 2021. Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis. Princeton University Press.
  • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.

Required software

Additional materials

Below are webpages with additional course materials and handouts. You are on your honor to not look up problem set answers online. All solutions must be presented in your own words.

Semester schedule

Week 1 (Jan 26): Course overview and Intro to R

Schedule

  1. Syllabus overview
    • Coding teams
  2. Intersectional thinking
  3. R tutorial: Getting Started with R
  4. Collaborative work time

Week 2 (Feb 2): Understanding Quantitative Data

  • Reading
    • Urdan 2010: ch. 1 (Introduction to Social Science Research) [pg. 1-11]
    • Thinking Clearly: ch. 2 (Correlation: What is it and what is it good for?) [pg. 13-36]
    • Bernard 2006: part of ch. 19 (Univariate Analysis) [pg. 584-587]
    • Recommended: Thinking Clearly: ch. 3 (Causation: What is it and what is it good for?) [pg. 37-52]
  • Extra practice: Phillips 7-9 Indexing Vectors, Matrices and Dataframes, Importing, saving and managing data
  • R Tutorial: Data wrangling with base R
  • R Tutorial: RMarkdown
  • Lecture: Working with quantitative data
  • Lecture: Writing code for reproducible research
  • Problem set 2
  • DUE: Problem set 1

Schedule

  1. Problem Set 1 Review
  2. Theory lecture: Understanding quantitative data
  3. Theory lecture: writing code for reproducible research
  4. R tutorial: Data wrangling with base R; RMarkdown
  5. Collaborative work time

Week 3 (Feb 9): Descriptive Statistics, Confidence and Correlation

  • Reading
    • Urdan 2010: ch. 2-4 (Measures of Central Tendency, Measures of Variability, The Normal Distribution) [pg. 13-36]
    • Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
    • Thinking Clearly: ch. 6 (Samples, Uncertainty, and Statistical Inference) [pg. 94-111]
    • Recommended: Thinking Clearly: ch. 4 (Correlation requires variation) [pg. 55-72]
  • Extra practice: Phillips 10 Advanced Dataframe Manipulation
  • R Tutorial: Descriptive statistics and advanced dataframe manipulation
  • Lecture: Normal distribution, central tendency, variability
  • Lecture: Confidence intervals, standard error, statistical significance, correlation
  • Problem set 3
  • DUE: Problem set 2

Schedule

  1. Problem Set 2 Review
  2. Theory lecture: Normal distribution, central tendency, variability
  3. Theory lecture: Confidence intervals, standard error, statistical significance, correlation
  4. R tutorial: Descriptive statistics and advanced dataframe manipulation
  5. Final project discussion
  6. Collaborative work time

Week 4 (Feb 16): Data visualization and EDA

Schedule

  1. Problem Set 3 Review
  2. Theory lecture: Data visualization and EDA
  3. Theory lecture: Multi-week project planning
  4. R tutorial: Data visualization
  5. R tutorial: Tidyverse
  6. Collaborative work time

Week 5 (Feb 23): Text analysis with R

  • Reading
    • Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
    • Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2) 254-284.
  • R Tutorial: Text analysis
  • Lecture: Text analysis
  • Problem set 5
  • DUE: Problem set 4
  • DUE: Final project prospectus due by midnight

Schedule

  1. Problem Set 4 Review
  2. Theory lecture: Text analysis
  3. R tutorial: Text analysis
  4. Collaborative work time

Week 6 (Mar 2): Advanced text analysis and data from the web

Schedule

  1. Problem Set 5 Review
  2. Theory lecture: Web scraping and finding data
  3. R tutorial: Web scraping
  4. Collaborative work time

Week 7 (Mar 9): Regression

  • Reading
    • Urdan 2010: ch. 13 (Regression) [pg. 145-160]
    • Thinking Clearly: ch. 5 (Regression for Describing and Forecasting) [pg. 74-93]
    • Recommended: Thinking Clearly: ch. 10 (Controlling for Confounders) [pg. 193-217]
  • Extra practice: Phillips 15 Regression
  • R Tutorial: Regression in R
  • Lecture: Regression
  • Problem set 7
  • DUE: Problem set 6

Schedule

  1. Problem Set 6 Review
  2. Theory lecture: Regression
  3. R tutorial: Regression in R
  4. Collaborative work time

Week 8 (Mar 16): Research design and sampling

  • Reading
    • Bernard 2006: ch. 6-8, 10 (Sampling; Sampling Theory; Nonprobability Sampling and Choosing Informants; Structured Interviewing I: Questionnaires) [pg. 146-209, 251-298]
  • R Tutorial: Final project overview and dashboard tutorial
  • Lecture: Sampling, research design and data collection
  • Assignment: Final project mock-up
  • DUE: Problem set 7

Schedule

  1. Problem Set 7 Review
  2. Theory lecture: Sampling, research design and data collection
  3. R Tutorial: Flexdashboards
  4. Collaborative work time

SPRING BREAK

Week 9 (Mar 30): Comparing sample means (t-tests and ANOVA)

  • Reading
    • Urdan 2010: ch. 9 (t tests) [pg. 93-104]
    • Phillips 13.1-13.3 [Hypothesis tests – t-test]
    • Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
  • Extra practice: R4DS: Section 5.5-5.7
  • Extra practice: R4DS Section 13 Relational data
  • R Tutorial: t-tests, ANOVA and joining dataframes
  • Lecture: Hypothesis tests, t-tests, ANOVA
  • Problem set 8
  • DUE: Final project mock-up due by midnight

Schedule

  1. Final project discussion
  2. Theory lecture: Hypothesis tests, t-tests, ANOVA
  3. R tutorial: t-tests, ANOVA and joining dataframes
  4. Collaborative work time

Week 10 (Apr 6): Categorical data and communicating stats

  • Reading
    • Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
    • Thinking Clearly: ch. 7 (Over-comparing, Under-reporting) [pg. 113-136]
    • Thinking Clearly: ch. 9 (Why correlation doesn’t imply causation) [pg. 159-191]
    • Recommended: Thinking Clearly: ch. 8 (Reversion to the Mean) [pg. 138-155]
  • Extra practice: Phillips 13.6 chi-sq questions, Q2, Q5, Q6; R4DS 15: Factors
  • R Tutorial: Interactive graphics
  • Lecture: Working with categorical data and chi-square tests
  • Lecture: Communicating statistical results
  • Problem set 9
  • DUE: Problem set 8

Schedule

  1. Problem Set 8 Review
  2. Theory lecture: Working with categorical data and chi-square tests
  3. Theory lecture: Communicating statistical results
  4. R tutorial: Interactive graphics
  5. Collaborative work time

Week 11 (Apr 13): Cultural domain analysis and MDS

  • Reading
    • Bernard 2006: ch. 11 (Structured Interviewing II: Cultural Domain Analysis) [pg. 299-317]
    • Bernard 2006: part of ch. 21 [pg. 677-689]
    • Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds). The Ethnographer’s Toolkit, Vol 3. Walnut Creek, CA: Altamira Press.
    • Recommended: Weller, Susan. 2014. Structured Interviewing and Questionnaire Construction.
  • R Tutorial: Cultural domain analysis and MDS
  • Lecture: Cultural domain analysis and MDS
  • Problem set 10
  • DUE: Problem set 9

Schedule

  1. Problem Set 9 Review
  2. Theory lecture: Cultural domain analysis
  3. R tutorial: Cultural domain analysis and MDS
  4. Collaborative work time

Week 12 (Apr 20): Network Analysis

  • Reading
    • Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
    • Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
    • Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology. 59(1):74-97.
    • Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
    • Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.
  • R Tutorial: Network analysis
  • Lecture: Network analysis in Anthropology
  • Problem set 11
  • DUE: Problem set 10

Schedule

  1. Problem Set 10 Review
  2. Theory lecture: Network analysis in Anthropology
  3. R tutorial: Network analysis with R
  4. Collaborative work time

Week 13 (Apr 27): MDS, principal component, cluster, and factor analysis

  • Reading
    • Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]
    • Bernard 2006: part of ch. 21 [pg. 689-692]
  • Lecture: PCA, cluster and factor analysis
  • Guest lecture
  • Problem set 12 [Due in 2 weeks]

Schedule

  1. Problem Set 11 Review
  2. Theory lecture: MDS, PCA, cluster and factor analysis
  3. Guest lecture
  4. Collaborative work time

Week 14 (May 4): Unpacking quantitative approaches

  • Reading
    • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
    • Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.
    • Recommended: Merry, Sally. 2011. Measuring the World: Indicators, Human Rights and Global Governance. Current Anthropology 52:S83-S95.

Schedule

  1. Theory discussion: Unpacking quantitative approaches
  2. Collaborative work time

Week 15 (May 11): Final project check in and quantitative analysis of qualitative data

  • Reading
    • Thinking Clearly: ch. 15 (Turn Statistics into Substance) [pg. 305-334]
    • Thinking Clearly: ch. 17 (On the Limits of Quantification) [pg. 357-368]
    • Recommended: Thinking Clearly: ch. 16 (Measure Your Mission) [pg. 336-355]
  • R Tutorial: Qualitative data analysis with R
  • DUE: Problem Set 12

Schedule

  1. Problem Set 12 Review
  2. Theory lecture: Working quantitatively with qualitative data
  3. R Tutorial: Qualitative analysis in R
  4. Final projects discussion
  5. Wrapping up

Problem sets

Problem Set 1: Intro to R

Note: For this week only, you may submit the problem set as a Word document. In subsequent weeks you must submit a clean, knitted .html file documenting your results. Code printouts alone will not be accepted.

  1. Phillips 4.5, Q1-5
  2. Phillips 5.4, Q1-9
  3. Practice loading data and making histograms
    • Read in the beaver temperature data with the following lines of code: library(datasets); data(beavers); beaver1. The data are in the datasets package, which ships with base R and is normally already loaded (if it is not, load it with library(datasets)).
    • Examine and describe the structure of the dataset. What data type are the variables and how many observations are there?
    • Make a histogram of the temperature observations. What does this tell you about the general trends of beaver temperature?
    • Change the number of bins in the histogram. Make one histogram with very few bins and another with a ton of bins. Do these different visualizations change your understanding of beaver temperatures? If so, in which ways?
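
The steps in item 3 can be sketched roughly as follows. This is a minimal illustration rather than a model answer: the bin counts (3 and 50) and the plot titles are arbitrary choices, and hist() treats `breaks` as a suggestion that it may round to "pretty" values.

```r
# Load the built-in beaver body-temperature data (datasets ships with base R)
library(datasets)
data(beavers)        # makes the data frames beaver1 and beaver2 available

# Inspect the structure: variable types and number of observations
str(beaver1)
n_obs <- nrow(beaver1)

# Default histogram of the temperature column
hist(beaver1$temp,
     main = "Beaver 1 body temperature",
     xlab = "Temperature (degrees C)")

# Very few bins versus many bins (illustrative values only)
hist(beaver1$temp, breaks = 3,
     main = "Few bins", xlab = "Temperature (degrees C)")
hist(beaver1$temp, breaks = 50,
     main = "Many bins", xlab = "Temperature (degrees C)")
```

Comparing the three plots side by side is one way to see how much of the apparent shape of a distribution depends on the binning choice.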

Jump to Week 1

Problem Set 2: Data wrangling with base R and thinking through quantitative data

  1. Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
  2. Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
  3. Phillips 9.9, Q1-13. For any problems that ask you to load or download files, simply work through them. No need to document your work here. The link to the data download isn’t working for one of the files mentioned in Phillips. Simply read through the questions to understand how you can load in different file types in R. More explanation on how to load different file types
  4. Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.

Jump to Week 2

Problem Set 3: Descriptive statistics, correlation, and distributions

  • Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail when we cover the tidyverse. You can complete this problem set with either base R or dplyr.
  • Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.
  • Urdan ch. 7 work problems, ch 12 work problems (in 4th edition, correlation chapter questions)

Jump to Week 3

Problem Set 4: Data visualization

Final project prospectus

Your research prospectus should be a 250-500 word description of your research project. In your description, please include the following: 1) your proposed question/hypothesis; 2) what methods you plan to use, including your sampling strategy; 3) why this study is significant (both intellectually and in the ‘real world’). In addition to the above, you may also include the following, as appropriate for your project: 1) a list of potential questions or data sources; 2) a list of groups/agencies etc. to be sampled; 3) a schedule of data collection; 4) citations of papers motivating your study. The more specific you are in your research prospectus, the more feedback you will be able to receive at this stage of your project.

Jump to Week 4

Problem Set 5: Text mining with R

With a text of your choice complete the following analyses and produce a data report. Be sure to document and justify any decisions that you make during the data cleaning and subsetting process. If you are struggling to find a text from your own work you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class. Write up your results in the form of a brief data analysis report.

  1. Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?

  2. If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.

  3. Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.

  4. Either analyze the sentiments or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you created the topic model.

  5. Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?
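
As a starting point for items 1 and 2, the sketch below counts words and bigrams with the tidytext and dplyr packages (both assumed to be installed). The two-document `docs` table is a made-up toy example, and the column names `doc` and `text` are placeholders for whatever your own text uses.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy text; replace with your own document(s)
docs <- tibble(
  doc  = c("a", "b"),
  text = c("The river ice is thinning and the river trails are unsafe.",
           "Thawing ground is damaging houses and the ice cellars.")
)

# One word per row (lowercased, punctuation stripped), then drop stop words
words <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Top 50 words overall
top_words <- words %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 50)

# Top 50 bigrams (stop words are NOT removed here; decide and document
# whether you want to filter them)
top_bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  count(bigram, sort = TRUE) %>%
  slice_head(n = 50)

# Word frequencies by document, for comparing subgroups
words_by_doc <- words %>%
  count(doc, word, sort = TRUE)
```

Any custom stop words or extra cleaning steps you add on top of this should be justified in your report, since they change the resulting frequency tables.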

Jump to Week 5

Problem Set 6: Working with messy data

This problem set is an opportunity to practice working with real world datasets. You can use a dataset of choice (e.g. from social media, webscraping, from a database, from whitepapers) as an example. With your dataset, complete at least 3 of the following exercises. Document the process of transforming and cleaning the data.

  1. Make a new column based on subsetting or grouping the original data. Use string searches to help with this.

  2. Pivot all or part of the dataframe into either wide or long format.

  3. Using cleaned and wrangled data either a) make a table of the new categories or b) analyze word frequencies.

  4. With an unstructured text field, convert text to lowercase, remove stopwords and analyze the sentiments. Make a custom stopword list to augment an existing stopword list.

  5. Create a clearly labeled, multi-color plot based on your dataset.
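
Exercises 1 to 3 might look something like the sketch below, which assumes the dplyr and tidyr packages are installed. The `raw` data frame, its column names, and the "oak" search term are all invented for illustration; your own dataset will need different patterns and column selections.

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data: inconsistent free-text labels, counts in wide format
raw <- tibble(
  site   = c("A", "B", "C"),
  label  = c("Red Oak (street)", "white oak", "Sugar MAPLE"),
  n_2020 = c(4, 7, 2),
  n_2021 = c(5, 6, 3)
)

# Exercise 1: new grouping column built from a string search
cleaned <- raw %>%
  mutate(
    label = tolower(label),
    tree_group = if_else(grepl("oak", label), "oak", "other")
  )

# Exercise 2: pivot the yearly count columns into long format
long <- cleaned %>%
  pivot_longer(
    cols = starts_with("n_"),
    names_to = "year",
    names_prefix = "n_",
    values_to = "count"
  )

# Exercise 3a: table of the new categories
group_table <- long %>%
  group_by(tree_group, year) %>%
  summarise(total = sum(count), .groups = "drop")
```

Keeping the raw object untouched and writing each cleaning step to a new object makes it easier to document the transformation process the problem set asks for.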

Jump to Week 6

Final project mockup

Details on ELMS.

Jump to Week 8

Problem Set 9: Team data analysis project: DC Trees

For this problem set you will be obtaining and analyzing data from Open Data DC about Urban Forestry Street Trees. You will work in your coding team to produce a polished, clear report analyzing these data. At minimum, include the following sections:

  1. Descriptive Statistics
    • Total number of trees, total number of species
    • Most frequently occurring species
    • Which wards have the most and the least species? What other data about these areas would help you make sense of this distribution? Do some searching online to learn more about these regions.
  2. Edible Species Map
    • Make an interactive map of 5 edible species, with clickable and informative labels.
    • Make a table of the distribution of these edible tree species across wards. How does this compare to the previous calculation of the total number of trees across wards? How might these results be interpreted?
  3. Tree Pests
    • Which 5 tree families have the most total pest observations?
    • How do their total numbers of pest observations compare to the total observations? How might these ratios be interpreted?
  4. Predicting tree heights
    • Make a linear model predicting MAX_CROWN_HEIGHT for all trees and those in the genus Pinus and another genus of choice. Which variables are most useful for predicting tree height? What other variables might improve the model, if any?
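
For section 4, a linear model can be fit along the lines sketched below. The data here are simulated so that the example runs on its own: only MAX_CROWN_HEIGHT is named in the assignment, while DBH and GENUS_NAME are hypothetical column names, so check the actual column names in the Open Data DC file before adapting this.

```r
set.seed(1)

# Simulated stand-in for the street tree data (NOT the real dataset).
# DBH and GENUS_NAME are assumed column names for illustration.
n <- 200
trees <- data.frame(
  DBH = runif(n, min = 2, max = 40),
  GENUS_NAME = sample(c("Pinus", "Quercus", "Acer"), n, replace = TRUE)
)
trees$MAX_CROWN_HEIGHT <- 5 + 0.8 * trees$DBH + rnorm(n, sd = 3)

# Model for all trees
fit_all <- lm(MAX_CROWN_HEIGHT ~ DBH, data = trees)

# Same model restricted to one genus
fit_pinus <- lm(MAX_CROWN_HEIGHT ~ DBH,
                data = subset(trees, GENUS_NAME == "Pinus"))

summary(fit_all)     # coefficients, standard errors, R-squared
coef(fit_pinus)      # compare slopes across the subsets
```

Fitting the same formula to each subset keeps the coefficients directly comparable; adding further predictors is then a matter of extending the right-hand side of the formula.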

Jump to Week 10

Problem Set 10: Cultural Domain Analysis

Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?).

  1. Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from a wide to a long dataframe. Add a new column that contains the rank of each issue as named by each respondent.

The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)

  2. What are the top 10 most frequently mentioned issues in response to the question: “8) What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”. Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?

  3. Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the overall goal of the researchers might have been, but we can create our own classification systems for these responses. Briefly explain how you decided to group the responses and then calculate the salience for these issue groups. Produce a barchart detailing the most salient issues.

If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.

  4. This sample contains individuals from two different villages. Recalculate the issue salience metrics grouping responses by each community. Produce a faceted figure detailing the most salient issues for each community.

  5. Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?

Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.
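
The reshaping and salience steps above can be sketched as follows, assuming dplyr and tidyr are installed. The three-row `surv` table is invented so the example runs on its own (the column names follow the starter code above), and Smith's salience index is used as one common choice of salience measure, not necessarily the one intended for the assignment. The rank is read from the column name, which assumes respondents filled the answer slots in order.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for the survey data
surv <- tibble(
  ID = 1:3,
  Village = c("V1", "V1", "V2"),
  X8.1..PF.Issue = c("Erosion", "housing", "erosion"),
  X8.2..PF.Issue = c("Housing", "erosion", NA),
  X8.3..PF.Issue = c("travel", NA, NA)
)

# Wide to long, with a rank column taken from the column name
pf_long <- surv %>%
  select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue) %>%
  pivot_longer(cols = starts_with("X8."),
               names_to = "question", values_to = "issue") %>%
  mutate(rank  = as.integer(substr(question, 4, 4)),  # "X8.1..PF.Issue" -> 1
         issue = tolower(issue)) %>%
  filter(!is.na(issue))

# Smith's S: per-respondent weight (L - rank + 1) / L, where L is the length
# of that respondent's list, summed per issue and divided by the number of respondents
n_resp <- n_distinct(pf_long$ID)

salience <- pf_long %>%
  group_by(ID) %>%
  mutate(list_len = n(),
         s = (list_len - rank + 1) / list_len) %>%
  ungroup() %>%
  group_by(issue) %>%
  summarise(mentions = n(),
            smith_s = sum(s) / n_resp,
            .groups = "drop") %>%
  arrange(desc(smith_s))
```

To compare communities, the same calculation can be repeated with Village added to the grouping and the divisor replaced by the number of respondents in each village.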

Jump to Week 11

Problem Set 11: Network analysis in R

In this problem set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.

  1. Create a summary table of the two networks. At minimum, include the following information:

    • Are they directed/undirected?
    • The number of nodes and edges
    • Mean node betweenness, indegree, and outdegree. What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
  2. Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?

  3. Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?

  4. Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.
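
The metrics in items 1 to 4 can be computed with the igraph package (assumed to be installed), roughly as sketched below. The edge list is a made-up toy network with placeholder node names, not the MovieGalaxies data, and walktrap is just one of several community detection algorithms you could choose.

```r
library(igraph)

# Hypothetical toy edge list; replace with the downloaded film networks
edges <- data.frame(
  from   = c("A", "A", "B", "A", "D", "E", "D"),
  to     = c("B", "C", "C", "D", "E", "F", "F"),
  weight = c(5, 3, 1, 4, 2, 1, 2)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Item 1: summary table (betweenness computed without edge weights)
net_summary <- data.frame(
  directed         = is_directed(g),
  nodes            = vcount(g),
  edges            = ecount(g),
  mean_betweenness = mean(betweenness(g, weights = NA)),
  mean_degree      = mean(degree(g))
)

# Item 3: network density
dens <- edge_density(g)

# Item 4: community detection (walktrap chosen for illustration)
comm <- cluster_walktrap(g)
membership(comm)

# Item 2: plot with nodes sized by degree
plot(g,
     vertex.size = 10 + 5 * degree(g),
     main = "Toy network, nodes sized by degree")
```

In an undirected graph, degree(g, mode = "in") and degree(g, mode = "out") both return the plain degree, which is worth keeping in mind when interpreting indegree and outdegree.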

Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.

Jump to Week 12

Problem Set 12: Advanced visualizations and analysis

This problem set asks you to step out on your own to analyze and present data using the tools you have learned throughout the semester. You can use one dataset for all the questions or different datasets for each one. Treat these prompts as mini data reports. This means you should describe the analysis used, the motivation for using this particular analysis, where the data come from, how they are measured, and finally, how you interpret the results. This problem set should be submitted as a link to an RPubs page. If there are other complex data analyses or visualizations that are more directly in line with your final project that you would like to make instead of those outlined below, reach out to me in advance.

  1. Working with a dataset of your choice, create a heatmap. Document all code used to create the heatmap and label the figure as though it is going to be submitted for publication. Write a figure caption describing the data used to create the figure and your interpretation of the results. Why is a heatmap the best way to represent these data?

  2. Analyze a dataset of your choice using regression models. Create both a figure and table and interpret your results.

  3. Create and interpret an interactive graphic of choice.
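
For item 1, a basic heatmap can be produced with base R alone, as in the sketch below. The built-in mtcars data and the four columns selected are stand-ins chosen only so the example runs; your own figure will need a dataset and labels suited to your question.

```r
# Built-in example data; swap in your own numeric matrix
m <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Centre and scale each column so variables on different units are comparable
m_scaled <- scale(m)

# Rows and columns are reordered by hierarchical clustering by default;
# scale = "none" because the matrix was already standardised above
heatmap(m_scaled,
        scale   = "none",
        margins = c(6, 8),
        main    = "Standardised car characteristics (mtcars)",
        xlab    = "Variable",
        ylab    = "Car model")
```

Whether to standardise by column, by row, or not at all changes what the colours mean, so the choice should be stated in the figure caption.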

Jump to Week 13

Additional course resources

Data sets

Other useful tools