Course overview

This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. This is accomplished through three main themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics / data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, scaling from question design through ethical best practices. In the second theme, students will be introduced to basic statistical theories and contemporary trends in quantitative social science. Finally, the third theme creates space for students to apply new skills in data analysis, management and visualization using R. Collectively, these themes enable students to build theoretical and methodological foundations for conducting independent quantitative anthropological research.

Looking for the 2023 version of this course?

Course materials

Required books

  • Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
  • Bernard, H. Russell. 2011. Research Methods in Anthropology, Fifth Edition. AltaMira Press.
  • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
  • Llaudet, Elena and Kosuke Imai. 2023. Data Analysis for Social Science: A friendly and practical introduction. Princeton University Press.

Required software

Additional materials

Below are webpages with answer guides and handouts from the books used in this course. You are on your honor to not look up Problem Set answers online. All solutions must be presented in your own words.

Semester schedule

Week 1 (Jan 25): Course Overview and Intro to R

Before class

Reading

  • Phillips ch. 2: Installing R and RStudio
  • Data Analysis for Social Science: Introduction [pg. 1-26]
  • Bernard 2011: Chapter 2 (The Foundations of Social Research) [pg. 23-53]

Going further

  • Listen: Spreadsheet disasters by BBC’s More or Less
  • Read: Thinking Clearly with Data: Chapter 1 (Thinking Clearly in a Data-Driven Age) [pg. 1-9]

In class

Lectures

  • Syllabus overview
  • Introduction to quantitative research in Anthropology

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

Week 2 (Feb 1): Understanding Quantitative Data

Before class

Reading

  • Urdan 2010: ch. 1 (Introduction to Social Science Research) [pg. 1-11]
  • Data Analysis for Social Science: ch. 2, Section 2.5 (Do small classes improve student performance?), 2.6 (Summary) [pg. 39-46].

Going further

  • Listen: How to better understand and explain numbers by BBC’s More or Less
  • Read: Thinking Clearly: ch. 2 (Correlation: What is it and what is it good for?) [pg. 13-36]
  • Read: Thinking Clearly: ch. 3 (Causation: What is it and what is it good for?) [pg. 37-52]
  • Read: Data Analysis for Social Science: If you are interested in analyzing experimental data, read the first sections of Chapter 2.

DUE: Problem Set 1

In class

Code Review

  • Problem Set 1 Review

Lectures

  • Working with quantitative data
  • Writing code for reproducible research

Activities and Tutorials

Collaborative work time

After class

Week 3 (Feb 8): Descriptive Statistics, Confidence and Correlation

Before class

Reading

  • Urdan 2010: ch. 2-4 (Measures of Central Tendency, Measures of Variability, The Normal Distribution) [pg. 13-36]
  • Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
  • Data Analysis for Social Science: Chapter 3 (Inferring Population Characteristics via Survey Research) [pg. 51-89]

Going further

  • Thinking Clearly: ch. 6 (Samples, Uncertainty, and Statistical Inference) [pg. 94-111]
  • Thinking Clearly: ch. 4 (Correlation requires variation) [pg. 55-72]

DUE: Problem Set 2

In class

Code Review

  • Problem Set 2 Review

Lectures

  • Normal distribution, central tendency, variability
  • Confidence intervals, standard error, statistical significance, correlation

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Week 4 (Feb 15): Data visualization and EDA

Before class

Reading

Going further

DUE: Problem Set 3

In class

Code Review

  • Problem Set 3 Review

Lectures

  • Data visualization and EDA
  • Multi-week project planning

Activities and Tutorials

Collaborative work time

After class

Extra Practice

  • Extra practice: R4DS: Section 5.1-5.4 Data transformation
  • Extra practice: QSS: Introduction. Code files here offer extra practice with basic tidyverse functions.

Assignment

Week 5 (Feb 22): Text analysis with R

Before class

Reading

  • Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
  • Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2):254-284.

Going further

DUE: Problem Set 4

DUE: Final project prospectus by midnight

In class

Code Review

  • Problem Set 4 Review

Lectures

  • Text analysis

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

Week 6 (Feb 29): Advanced text analysis and data from the web

Before class

Reading

  • Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 4-6
  • R4DS ch. 14 Strings

Going further

DUE: Problem Set 5

In class

Code Review

  • Problem Set 5 Review

Lectures

  • Web scraping and finding data

Activities and Tutorials

Collaborative work time

After class

Week 7 (Mar 7): Regression and Interactive Graphics

Before class

Reading

  • Urdan 2010: Chapter 13 (Regression) [pg. 145-160]
  • Data Analysis for Social Science: Chapter 4 (Predicting outcomes using linear regression) [pg. 98-123, can skip section 4.4.2]

Going further

In class

Code Review

  • Problem Set 6 Review

Lectures

  • Regression

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

Week 8 (Mar 14): Research design and sampling

Before class

Reading

  • Bernard 2011: ch. 5-7, 9 (Sampling I: The Basics; Sampling II: Theory; Sampling III: Nonprobability Sampling and Choosing Informants; Interviewing II: Questionnaires) [pg. 113-155, 187-222]

Going further

DUE: Problem Set 7

In class

Code Review

  • Problem Set 7 Review

Lectures

  • Sampling, research design and data collection

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

SPRING BREAK

Week 9 (Mar 28): Final project update and qualitative data

Before class

Reading

  • Bernard 2011: Chapter 20 (Univariate Analysis) excerpt on coding and data cleaning [pg. 458-464]

Going further

DUE: Final project mock-up by midnight

In class

No class session this week. Instead, find a time to meet with your peer review partner to discuss your final project progress. Details in ELMS.

Activities and Tutorials

After class

Extra Practice

Assignment

Week 10 (Apr 4): Comparing sample means (t-tests and ANOVA) and working with categorical data (chi-square)

Before class

Reading

  • Urdan 2010: ch. 9 (t tests) [pg. 93-104]
  • Phillips 13.1-13.3 [Hypothesis tests – t-test]
  • Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
  • Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
  • Working with categorical data and chi-square tests

Going further

DUE: Problem Set 8

In class

Code Review

  • Problem Set 8 Review

Lectures

  • t-tests, ANOVA and joining dataframes

Activities and Tutorials

Collaborative work time

After class

Extra Practice

  • Bernard 2011: Chapter 21 (Bivariate Analysis: Testing Relations) [pg. 492-499, sections on t-test and ANOVA]
  • Thinking Clearly: ch. 9 (Why correlation doesn’t imply causation) [pg. 159-191]
  • Recommended: Thinking Clearly: ch. 8 (Reversion to the Mean) [pg. 138-155]

Assignment

Week 11 (April 11): Cultural domain analysis and MDS

Before class

Reading

  • Bernard 2011: ch. 10 (Interviewing III: Cultural Domains) [pg. 223-237]
  • Bernard 2011: ch. 16: Analyzing Cultural Domains [pg. 346-385]
  • Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds). The Ethnographer’s Toolkit, Vol 3. Walnut Creek, CA: AltaMira Press.

Going further

DUE: Problem Set 9

In class

Code Review

  • Problem Set 9 Review

Lectures

  • Cultural domain analysis

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

Week 12 (April 18): Network Analysis

Before class

Reading

  • Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
  • Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
  • Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology 59(1):74-97.
  • Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
  • Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.

Going further

DUE: Problem Set 10

In class

Code Review

  • Problem Set 10 Review

Lectures

  • Network analysis in Anthropology

Activities and Tutorials

Collaborative work time

After class

Extra Practice

Assignment

Week 13 (April 25): Advanced network analysis and spatial data

Before class

Reading

  • Spatial Statistics for Data Science: Theory and Practice with R. Chapters 1-6 [Spatial Data Section]
  • Spatial Data Science: With applications in R. Chapter 7 [Sections on sf package and spatial joins]

Going further

  • Spatial Statistics for Data Science: Theory and Practice with R. Chapters 7-11 [Areal Data Section]

DUE: Problem Set 11

In class

Code Review

  • Problem Set 11 Review

Lectures

Activities and Tutorials

Collaborative work time

After class

Extra Practice

  • Spatial Data Science: With applications in R. Chapters 1-3, 5.

Assignment

Week 14 (May 2): Unpacking quantitative approaches

Before class

Reading

  • Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
  • Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.

Going further

In class

Activities and Tutorials

  • Discussion: Unpacking quantitative approaches

Collaborative work time

After class

Extra Practice

Assignment

Week 15 (May 9): Principal component, cluster, and factor analysis

Before class

Reading

  • Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]

Going further

  • Thinking Clearly: ch. 15 (Turn Statistics into Substance) [pg. 305-334]
  • Thinking Clearly: ch. 17 (On the Limits of Quantification) [pg. 357-368]
  • Recommended: Thinking Clearly: ch. 16 (Measure Your Mission) [pg. 336-355]

DUE: Problem Set 12

In class

Code Review

  • Problem Set 12 Review

Lectures

  • PCA, cluster and factor analysis

Activities and Tutorials

  • Final projects discussion

Collaborative work time

After class

Extra Practice

Assignment

Problem Sets

Problem Set 1: Intro to R

Note: For this week only, you may submit the Problem Set as a Word document. In subsequent weeks you must submit a clean, knitted .html file documenting your results. Code printouts alone will not be accepted.

  1. Phillips 4.5, Q1-5
  2. Phillips 5.4, Q1-9
  3. Practice loading data and making histograms
    • Read in the beaver temperature data with this sequence of code: library(datasets); data(beavers); beaver1. The data are in the datasets package, which ships with R and is normally already loaded.
    • Examine and describe the structure of the dataset. What data type are the variables and how many observations are there?
    • Make a histogram of the temperature observations. What does this tell you about the general trends of beaver temperature?
    • Change the number of bins in the histogram. Make one histogram with very few bins and another with a ton of bins. Do these different visualizations change your understanding of beaver temperatures? If so, in which ways?
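A minimal sketch of the mechanics behind the loading and histogram steps above, using only base R. The titles, axis labels, and the particular `breaks` values are illustrative choices, not required settings; interpreting the plots is still up to you.

```r
# Load the built-in beaver body temperature data (datasets ships with base R).
library(datasets)
data(beavers)   # makes beaver1 and beaver2 available

# Inspect structure: variable types and number of observations.
str(beaver1)

# Default histogram of body temperature.
hist(beaver1$temp,
     main = "Beaver 1 body temperature",
     xlab = "Temperature (degrees C)")

# A single number passed to `breaks` is treated as a suggestion,
# so the number of bins drawn may differ slightly from the request.
hist(beaver1$temp, breaks = 3,
     main = "Few bins", xlab = "Temperature (degrees C)")
hist(beaver1$temp, breaks = 50,
     main = "Many bins", xlab = "Temperature (degrees C)")
```

If you need an exact set of bins, `breaks` also accepts a vector of cut points, for example `seq(36, 38, by = 0.1)`.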

Jump to Week 1

Problem Set 2: Data wrangling with base R and thinking through quantitative data

  1. Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
  2. Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
  3. Phillips 9.9, Q1-13. For any problems that ask you to load or download files, simply work through them. No need to document your work here. The link to the data download isn’t working for one of the files mentioned in Phillips. Simply read through the questions to understand how you can load in different file types in R. More explanation on how to load different file types
  4. Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.

Jump to Week 2

Problem Set 3: Descriptive statistics, correlation, and distributions

  • Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail when we cover the tidyverse. You can complete this Problem Set with either base R or dplyr.
  • Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.
  • Urdan ch. 7 work problems, ch. 12 work problems (in 4th edition, correlation chapter questions).

Jump to Week 3

Problem Set 4: Data visualization

Final project prospectus

Your research prospectus should be a 250-500 word description of your research project. In your description, please include the following: 1) your proposed question/hypothesis; 2) what methods you plan to use, including your sampling strategy; 3) why this study is significant (both intellectually and in the ‘real world’). In addition to the above, you may also include the following, as appropriate for your project: 1) a list of potential questions or data sources; 2) a list of groups/agencies etc. to be sampled; 3) a schedule of data collection; 4) citations of papers motivating your study. The more specific you are in your research prospectus, the more feedback you will be able to receive at this stage of your project.

Jump to Week 4

Problem Set 5: Text mining with R

With a text of your choice complete the following analyses and produce a data report. Be sure to document and justify any decisions that you make during the data cleaning and subsetting process. If you are struggling to find a text from your own work you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class. Write up your results in the form of a brief data analysis report.

  1. Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?

  2. If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.

  3. Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.

  4. Either analyze the sentiments or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you created the topic model.

  5. Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?
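As a starting point for the tidying and frequency steps above, here is a hedged sketch in the tidytext style of Silge and Robinson. The three-sentence corpus and the `doc` grouping column are invented for illustration; swap in your own text and grouping variable.

```r
library(dplyr)
library(tidytext)

# Toy two-"document" corpus, invented purely for illustration.
docs <- tibble(
  doc  = c("a", "a", "b"),
  text = c("The river ice is melting earlier every spring.",
           "Melting ice changes how we travel on the river.",
           "Travel by boat replaces travel over river ice.")
)

# One word per row (lowercased by default), with stop words removed.
words <- docs %>%
  unnest_tokens(word, text) %>%
  filter(!is.na(word)) %>%
  anti_join(stop_words, by = "word")

# Overall word frequencies; add slice_head(n = 50) for a top-50 table.
word_counts <- words %>% count(word, sort = TRUE)

# Word frequencies by subgroup, for comparing documents or sections.
words_by_doc <- words %>% count(doc, word, sort = TRUE)

# Bigram frequencies; stop words are left in here for simplicity.
bigram_counts <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  count(bigram, sort = TRUE)

print(head(word_counts))
print(head(bigram_counts))
```

Whether to remove stop words from bigrams, and which stop word lexicon to use, are exactly the kinds of cleaning decisions the report should document and justify.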

Jump to Week 5

Problem Set 6: Working with messy data

This Problem Set is an opportunity to practice working with real-world datasets. You can use a dataset of your choice (e.g. from social media, web scraping, a database, or whitepapers) as an example. With your dataset, complete at least 3 of the following exercises. Document the process of transforming and cleaning the data.

  1. Make a new column based on subsetting or grouping the original data. Use string searches to help with this.

  2. Pivot all or part of the dataframe into either wide or long format.

  3. Using cleaned and wrangled data either a) make a table of the new categories or b) analyze word frequencies.

  4. With an unstructured text field, convert text to lowercase, remove stopwords and analyze the sentiments. Make a custom stopword list to augment an existing stopword list.

  5. Create a clearly labeled, multi-color plot based on your dataset.
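The first three exercises can be sketched with dplyr, tidyr, and stringr as follows. The toy table, its column names, and the search patterns are all made up for illustration; your own dataset will need different patterns and categories.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy survey table, invented for illustration only.
raw <- tibble(
  id         = 1:4,
  comment    = c("Flooding near the road", "road washed out",
                 "Fewer caribou this year", "No change"),
  score_2022 = c(3, 4, 2, 5),
  score_2023 = c(4, 5, 2, 5)
)

# Exercise 1: a new grouping column built from case-insensitive string searches.
tagged <- raw %>%
  mutate(topic = case_when(
    str_detect(str_to_lower(comment), "road|flood") ~ "infrastructure",
    str_detect(str_to_lower(comment), "caribou")    ~ "wildlife",
    TRUE                                            ~ "other"
  ))

# Exercise 2: pivot the two score columns from wide to long format.
long <- tagged %>%
  pivot_longer(cols = starts_with("score_"),
               names_to = "year",
               names_prefix = "score_",
               values_to = "score")

# Exercise 3a: a table of the new categories.
topic_table <- tagged %>% count(topic, sort = TRUE)

print(long)
print(topic_table)
```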

Jump to Week 6

Final project mockup

Details on ELMS.

Jump to Week 8

Problem Set 8: Team data analysis project: DC Trees

For this Problem Set you will be obtaining and analyzing data from Open Data DC about Urban Forestry Street Trees. You will work in your coding team to produce a polished, clear report analyzing these data. At minimum, include the following sections:

  1. Descriptive Statistics
    • Total number of trees, total number of species
    • Most frequently occurring species
    • Which wards have the most and the least species? What other data about these areas would help you make sense of this distribution? Do some searching online to learn more about these regions.
  2. Edible Species Map
    • Make an interactive map of 5 edible species, with clickable and informative labels.
    • Make a table of the distribution of these edible tree species across wards. How does this compare to the previous calculation of the total number of trees across wards? How might these results be interpreted?
  3. Tree Pests
    • Which 5 tree families have the most total pest observations?
    • How do their total numbers of pest observations compare to the total observations? How might these ratios be interpreted?
  4. Predicting tree heights
    • Fit linear models predicting MAX_CROWN_HEIGHT for all trees, for trees in the genus Pinus, and for trees in another genus of your choice. Which variables are most useful for predicting tree height? What other variables might improve the model, if any?
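For the tree height section, the general pattern of fitting one model to all rows and another to a subset might look like the sketch below. The data are simulated, and the `GENUS` and `DBH` column names are assumptions made for this example; check the actual column names in the Open Data DC table before adapting it.

```r
# Simulated stand-in for the street-tree table; every value here is made up.
set.seed(42)
n <- 200
trees <- data.frame(
  GENUS = sample(c("Pinus", "Quercus"), n, replace = TRUE),  # hypothetical column
  DBH   = runif(n, min = 2, max = 40)                        # hypothetical trunk diameter
)
trees$MAX_CROWN_HEIGHT <- 5 + 0.8 * trees$DBH + rnorm(n, sd = 3)

# Model fitted to all trees.
fit_all <- lm(MAX_CROWN_HEIGHT ~ DBH, data = trees)

# Same model restricted to a single genus.
fit_pinus <- lm(MAX_CROWN_HEIGHT ~ DBH,
                data = subset(trees, GENUS == "Pinus"))

# Coefficients, standard errors, and R-squared for comparing models.
print(summary(fit_all))
print(coef(fit_pinus))
```

Additional predictors can be added on the right-hand side of the formula (for example `~ DBH + GENUS`), and `summary()` on each fit gives a basis for comparing which variables help.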

Jump to Week 9

Problem Set 9: t-tests, ANOVA, and chi-square

Jump to Week 10

Problem Set 10: Cultural Domain Analysis

Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?).

  1. Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from a wide to long dataframe. Add a new column that contains the rank of each issue as named by each respondent.

The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)

  2. What are the top 10 most frequently mentioned issues in response to question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”)? Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?

  3. Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the overall goal of the researchers might have been, but we can create our own classification systems for these responses. This means that you are grouping responses according to some shared criteria such as “water issues” or “changes in animal behavior”. Briefly explain how you decided to group the responses and then calculate the salience for these issue groups. Produce a bar chart of Smith’s S for the most frequently occurring issues, with a vertical line marking Smith’s S = 0.1.

If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.

  4. This sample contains individuals from two different villages. Recalculate the issue salience metrics, grouping responses by community. Produce a faceted figure detailing the most salient issues for each community.

  5. Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?

Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.
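The reshaping and salience steps in this Problem Set can be sketched as follows. The column names follow the starter code, but the responses are invented, and the sketch assumes the usual definition of Smith's S: an item at rank r in a respondent's list of length L scores (L - r + 1) / L, and scores are summed per item and divided by the number of respondents.

```r
library(dplyr)
library(tidyr)

# Toy responses using the column names from the starter code; answers are invented.
surv <- tibble(
  ID      = 1:4,
  Village = c("A", "A", "B", "B"),
  X8.1..PF.Issue = c("erosion", "water",   "erosion", "travel"),
  X8.2..PF.Issue = c("water",   "erosion", "travel",  NA),
  X8.3..PF.Issue = c("travel",  NA,        NA,        NA)
)

# Wide to long; rank is taken from the order of the answer columns.
pf_long <- surv %>%
  pivot_longer(cols = starts_with("X8."),
               names_to = "slot", values_to = "issue") %>%
  group_by(ID) %>%
  mutate(rank = row_number()) %>%
  ungroup() %>%
  filter(!is.na(issue)) %>%
  mutate(issue = tolower(issue))

# Count every respondent, including any who listed nothing.
n_resp <- n_distinct(surv$ID)

# Smith's S per issue: sum of (L - r + 1) / L over respondents, divided by N.
smith_s <- pf_long %>%
  group_by(ID) %>%
  mutate(list_len = n(),
         weight   = (list_len - rank + 1) / list_len) %>%
  group_by(issue) %>%
  summarise(mentions = n(),
            smith_s  = sum(weight) / n_resp,
            .groups  = "drop") %>%
  arrange(desc(smith_s))

print(smith_s)
```

For the per-community version, the same pipeline can be grouped by `Village` as well as `issue`, with the divisor changed to the number of respondents in each village.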

Jump to Week 11

Problem Set 11: Network analysis in R

In this Problem Set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.

  1. Create a summary table of the two networks. At minimum, include the following information:

    • Are they directed/undirected?
    • The number of nodes and edges
    • Mean node betweenness, indegree, and outdegree

     What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
  2. Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?

  3. Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?

  4. Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.

Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.
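The summary, density, and community detection steps above can be sketched with the igraph package. The edge list below is a toy example with invented node names, not the MovieGalaxies data, and fast greedy modularity optimisation is only one of several community detection algorithms you could choose.

```r
library(igraph)

# Toy interaction edge list; the names and ties are invented.
edges <- data.frame(
  from = c("A", "A", "A", "B", "D", "D", "E"),
  to   = c("B", "C", "D", "C", "E", "F", "F")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Summary metrics for one network.
net_summary <- data.frame(
  directed         = is_directed(g),
  nodes            = vcount(g),
  edges            = ecount(g),
  mean_betweenness = mean(betweenness(g)),
  mean_degree      = mean(degree(g)),  # use mode = "in" or "out" if directed
  density          = edge_density(g)
)
print(net_summary)

# Community detection on an undirected graph without multi-edges.
comm <- cluster_fast_greedy(g)
print(membership(comm))
print(modularity(comm))

# Plot with node size scaled by degree centrality.
plot(g, vertex.size = 10 + 4 * degree(g), main = "Toy network")
```

Building the same summary for each film and combining the rows with `rbind()` gives the comparison table the first question asks for.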

Jump to Week 12

Problem Set 12: Advanced visualizations and analysis

This problem set asks you to step out on your own to analyze and present data using the tools you have learned throughout the semester. You may use the dataset that you plan to present in your final project or a different dataset of your choosing. The goal is that these graphics and analyses will be able to be used in your final project.

  1. Create and interpret a static graphic.

  2. Create and interpret an interactive graphic.

  3. Create and interpret a summary table, regression model or other statistical analysis.

Jump to Week 13

Additional course resources

Data sets

Other useful tools