Course overview

This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. It is organized around three themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics and data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, from question design through ethical best practices. In the second theme, students are introduced to basic statistical theory and contemporary trends in quantitative social science. Finally, the third theme gives students space to apply new skills in data analysis, management, and visualization using R. Together, these themes help students build the theoretical and methodological foundations for conducting independent quantitative anthropological research.

Required texts

Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
Bernard, H. Russell. 2006. Research Methods in Anthropology, Fourth Edition. Rowman Altamira.
Drennan, Robert D. Statistics for Archaeologists: A Common Sense Approach, Second Edition. Springer.
Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.

Required software

R
RStudio

Required online texts

Additional materials

Below are webpages with additional course materials and handouts. You are on your honor not to look up problem set answers online. All solutions must be presented in your own words.

Semester schedule

Week 1: Course overview and intro to R

Reading
- Bernard 2006: ch. 16 (Introduction to Qualitative and Quantitative Analysis) [pg. 451-462]
- Drennan ch. 1 (Batches of Numbers) [pg. 3-15]
- Phillips ch. 2 Installing R and RStudio
Extra practice: Phillips 4-6 The Basics, Scalars and Vectors, Vector functions
R Tutorial: Getting Started with R
Lecture slides: Introduction to quantitative research in Anthropology
Lecture slides: Writing code for reproducible research
- On ELMS
Problem set 1

Schedule

Syllabus overview
- Problem set leader assignments
Lecture on writing code for reproducible research
R tutorial: Getting Started with R
Collaborative work time

Week 2: Understanding Quantitative Data

Reading
- Urdan 2010: ch. 1 (Introduction to Social Science Research) [pg. 1-11]
Extra practice: Phillips 7-9 Indexing Vectors, Matrices and Dataframes, Importing, saving and managing data
R Tutorial: Data wrangling with base R
R Tutorial: RMarkdown
Lecture slides: Working with quantitative data
- On ELMS
Problem set 2
DUE: Problem set 1

Schedule

Problem Set 1 Review
Theory lecture: Understanding quantitative data
R tutorial: Data wrangling with base R; RMarkdown
Collaborative work time

Week 3: Distributions and Descriptive Statistics

Reading
- Urdan 2010: ch. 2-4 (Measures of Central Tendency, Measures of Variability, The Normal Distribution) [pg. 13-36]
- Drennan ch. 2-3 (The Level or Center of a Batch; The Spread or Dispersion of a Batch) [pg. 17-36]
Extra practice: Phillips 10 Advanced Dataframe Manipulation
R Tutorial: Descriptive statistics and advanced dataframe manipulation
Lecture slides: Normal distribution, central tendency, variability
- On ELMS
Problem set 3
DUE: Problem set 2

Schedule

Problem Set 2 Review
Theory lecture: Normal distribution, central tendency, variability
Final project discussion
R tutorial: Descriptive statistics and advanced dataframe manipulation
Collaborative work time

Week 4: Data visualization and EDA

Reading
- Drennan ch. 4, 6 (Comparing Batches; Categories) [pg. 37-49, 63-75]
Extra practice: R4DS: Chapter 3 Data visualization, R4DS: Chapter 7 Exploratory Data Analysis
R Tutorial: Data visualization
Lecture slides: Data visualization and EDA
- On ELMS
Problem set 4
DUE: Problem set 3

Schedule

Problem Set 3 Review
Theory lecture: Data visualization and EDA
R tutorial: Data visualization
Collaborative work time

Week 5: Confidence and Correlation

Reading
- Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
- Bernard 2006, part of ch. 19, pg. 584-587
- Drennan ch. 9 (Confidence and Population Means) [pg. 107-130]
Extra practice: R4DS: Section 5.1-5.4 Data transformation
R Tutorial: Tidyverse
Lecture slides: Confidence intervals, standard error, statistical significance, correlation
- On ELMS
Problem set 5
DUE: Problem set 4
DUE: Final project prospectus due by midnight

Schedule

Problem Set 4 Review
Theory lecture: Confidence intervals, standard error, statistical significance, correlation
R tutorial: Tidyverse
Collaborative work time

Week 6: Regression

Reading
- Urdan 2010: ch. 13 (Regression) [pg. 145-160]
- Drennan ch. 15 (Relating a measurement variable to another measurement variable) [pg. 199-220]
Extra practice: Phillips 15 Regression
R Tutorial: Regression in R
Lecture slides: Regression
- On ELMS
Problem set 6
DUE: Problem set 5

Schedule

Problem Set 5 Review
Theory lecture: Regression
R tutorial: Regression in R
Collaborative work time

Week 7: Research design and sampling

Reading
- Bernard 2006: ch. 6-8, 10 (Sampling; Sampling Theory; Nonprobability Sampling and Choosing Informants; Structured Interviewing I: Questionnaires) [pg. 146-209, 251-298]
- Drennan ch. 7, 11 (Sampling; Categories and Population Proportions) [pg. 79-96, 139-143]
- Recommended: Fowler, Floyd. 2014. Survey Research Methods, Fifth Edition. SAGE Publications, Inc., excerpts from ch. 1, 5, 6, 7.
R Tutorial: Final project overview and dashboard tutorial
Lecture slides: Sampling, research design and data collection
- On ELMS
Problem set 7
DUE: Problem set 6

Schedule

Problem Set 6 Review
Theory lecture: Sampling, research design and data collection
R Tutorial: Flexdashboards
Collaborative work time

Week 8: Comparing sample means (t-tests and ANOVA)

Reading
- Urdan 2010: ch. 9 (t tests) [pg. 93-104]
- Phillips 13.1-13.3 (Hypothesis tests: t-test)
- Drennan ch. 12 (Comparing Two Sample Means) [pg. 147-163]
- Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
- Drennan ch. 13 (Comparing Means of More Than Two Samples) [pg. 165-179]
Extra practice: R4DS: Section 5.5-5.7
Extra practice: R4DS: Chapter 13 Relational data
R Tutorial: t-tests, ANOVA and joining dataframes
Lecture slides: Hypothesis tests, t-tests, ANOVA
- On ELMS
Problem set 8
DUE: Problem set 7

Schedule

Problem Set 7 Review
Theory lecture: Hypothesis tests, t-tests, ANOVA
R tutorial: t-tests, ANOVA and joining dataframes
Collaborative work time

Week 9: Categorical data

Reading
- Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
- Drennan ch. 14 (Comparing Proportions of Different Samples) [pg. 181-196]
Extra practice: R4DS 15: Factors
R Tutorial: Interactive graphics
Lecture slides: Working with categorical data and chi-square tests
- On ELMS
Problem set 9
DUE: Problem set 8

Schedule

Problem Set 8 Review
Theory lecture: Working with categorical data and chi-square tests
R tutorial: Factors and chi-square tests
Collaborative work time

Week 10: Text mining with R

Reading
- Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
- Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2) 254-284.
- Recommended: Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 4-6
Extra practice: R4DS ch. 14 Strings
R Tutorial: Text analysis
Lecture slides: Text analysis
- On ELMS
Problem set 10
DUE: Problem set 9

Schedule

Problem Set 9 Review
Theory lecture: Text analysis
R tutorial: Text analysis
Collaborative work time

Week 11: Cultural domain analysis and MDS

Reading
- Bernard 2006: ch. 11 (Structured Interviewing II: Cultural Domain Analysis) [pg. 299-317]
- Bernard 2006, part of ch. 21: pg. 677-689
- Drennan ch. 23 (Multidimensional Scaling) [pg. 285-289]
- Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds). The Ethnographer’s Toolkit, Vol. 3. Walnut Creek, CA: Altamira Press.
- Recommended: Weller, Susan. 2014. Structured Interviewing and Questionnaire Construction.
R Tutorial: Cultural domain analysis and MDS
Lecture slides: Cultural domain analysis and MDS
- On ELMS
Problem set 11
DUE: Problem set 10

Schedule

Problem Set 10 Review
Theory lecture: Cultural domain analysis and MDS
R tutorial: Cultural domain analysis and MDS
Collaborative work time

Week 12: Network Analysis

Reading
- Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
- Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
- Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology. 59(1):74-97.
- Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
- Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.
R Tutorial: Network analysis
Lecture slides: Network analysis in Anthropology
- On ELMS
Problem set 12
DUE: Problem set 11

Schedule

Problem Set 11 Review
Theory lecture: Network analysis in Anthropology
R tutorial: Network analysis with R
Collaborative work time

Week 13: Unpacking quantitative approaches

Reading
- Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
- Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.
- Recommended: Merry, Sally. 2011. Measuring the World: Indicators, Human Rights and Global Governance. Current Anthropology 52:S83-S95.
Problem set 13
Lecture slides
- On ELMS
DUE: Problem set 12

Schedule

Problem Set 12 Review
Theory discussion: Unpacking quantitative approaches
Collaborative work time

Week 14: Quantitative analysis of qualitative data and PCA, cluster and factor analysis

Reading
- Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]
- Drennan ch. 24 (Principal Components Analysis) [pg. 299-303]
- Bernard 2006, part of ch. 21: pg. 689-692
- Drennan ch. 25 (Cluster Analysis) [pg. 309-318]
Lecture slides: PCA, cluster and factor analysis
- On ELMS
R Tutorial: Qualitative data analysis with R
DUE: Problem set 13

Schedule

Problem Set 13 Review
Theory lecture: Working quantitatively with qualitative data
Theory lecture: PCA, cluster and factor analysis
R Tutorial: Qualitative analysis in R
Final projects discussion
Wrapping up

Problem sets

Problem Set 1: Intro to R

Phillips 4.5, Q1-5
Phillips 5.4, Q1-9
Drennan Ch. 1: Using the datasets from the chapter, complete the following problems:
1. Recreate the Kiskiminetas River Valley histogram in the chapter. Note the number of bins and how values are assigned to each bin. Are values that fall exactly on a cutoff counted in the bin below or the bin above? How can you adjust this?
2. Make two histograms of the scraper length data with different bin sizes. Do you notice anything different about the shape of the distribution when you change the number of breaks? You can load the .RData file by selecting “open with” RStudio when prompted, or download the .csv file linked above.

Problem Set 2: Data wrangling with base R and thinking through quantitative data

Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
Phillips 9.9, Q1-13
Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.

Problem Set 3: Descriptive statistics and normal distributions

Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail in the tidyverse unit. You can complete this problem set with base R.
Drennan Ch. 2 Q1-2 and Ch. 3 Q1-3. Hint: Look at the help file for mean() if you’re stuck. For the Ch. 3 questions, make histograms instead of stem-and-leaf plots. Nanxiong data
Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.

Problem Set 4: Data visualization

R4DS: Chapter 3 Data Visualization - Section 3.2, 3.3, 3.5, 3.6 exercises
R4DS: Chapter 7 Exploratory Data Analysis - Section 7.3
Drennan ch. 4 practice, Q1-4. If you’re stuck, consult the R companion for Drennan.

Problem Set 5: Correlation, confidence intervals, and significance

Urdan ch. 7 exercises and the correlation chapter exercises (ch. 8 in the 3rd edition; ch. 12 in the 4th edition)

Problem Set 6: Regression

Phillips 15.6 regression, Q1-8
Drennan ch. 15 practice. RSHoes data, Yenang data

Problem Set 7: Data wrangling with tidyverse

R4DS: Chapter 5 Data transformation - Section 5.2.4, 5.3.1, 5.4.1 exercises. Hint: don’t forget to load the following libraries: library(tidyverse); library(nycflights13)

Problem Set 8: t-tests, ANOVA

Phillips 13.6 t-test questions, Q1 & Q4
Drennan ch. 12 practice, Q1-4. For Q1, make a boxplot instead of a stem-and-leaf diagram. Zirconium data
Phillips 14.8 ANOVA, Q1-4
Drennan ch. 13 practice, Q1-4. ArchaicPts data and Neolithic data

Problem Set 9: Advanced tidyverse and categorical data

R4DS: Chapter 5 Data transformation - Section 5.5.2, Q1-5, 5.6.7, Q2 exercises. Hint: use the following code to create a not_cancelled dataframe - not_cancelled <- flights %>% filter(!is.na(dep_delay), !is.na(arr_delay))
R4DS: Chapter 13 Relational Data - Section 13.2.1 [no need to code for these, think through and describe how these tasks could be accomplished], 13.4.6, Q1-2, Q4 (pick one weather variable), Q5 exercises
Phillips 13.6 chi-sq questions, Q2, Q5, Q6

Problem Set 10: Text mining in R

With a text of your choice, complete the following analyses and write up your results as a brief data analysis report. Be sure to document and justify any decisions you make during data cleaning and subsetting. If you cannot find a suitable text from your own work, you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class.

Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?
If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.
Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.
Either analyze sentiment or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you built the topic model.
Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?

Problem Set 11: Cultural domain analysis and MDS

Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?).

Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from wide to long format. Add a new column that contains the rank of each issue as listed by each respondent.

The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)

What are the top 10 most frequently mentioned issues in response to the question: “8) What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”. Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?
Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the researchers’ overall goal might have been, but we can create our own classification systems for these responses. Briefly explain how you decided to group the responses, and then calculate the salience for these issue groups. Produce a bar chart detailing the most salient issues.

If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.

This sample contains individuals from two different villages. Recalculate the issue salience metrics, grouping responses by community. Produce a faceted figure detailing the most salient issues for each community.
Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?
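For the salience calculations above, one widely used measure for ranked free lists is Smith's salience index, which gives more weight to issues mentioned earlier in a respondent's list. The definition below is a reference sketch. The notation is introduced here rather than taken from the course materials, and other salience measures could also be justified:

$$
s_{ij} = \frac{L_i - r_{ij} + 1}{L_i}, \qquad S_j = \frac{1}{N}\sum_{i=1}^{N} s_{ij}
$$

Here $L_i$ is the number of issues respondent $i$ listed (at most 3 for question 8), $r_{ij}$ is the rank at which respondent $i$ listed issue group $j$, $s_{ij} = 0$ if respondent $i$ did not mention group $j$, and $N$ is the number of respondents (the respondents in a single village when you compare communities). For a complete three-item list, first-, second-, and third-ranked mentions receive weights of 1, 2/3, and 1/3. If grouping puts two of one respondent's answers into the same category, one option (an assumption, not a course requirement) is to keep only that respondent's highest-ranked mention so that $s_{ij}$ never exceeds 1.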

Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.

Problem Set 12: Network analysis in R

In this problem set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.

Create a summary table of the two networks. At minimum, include the following information:
- Are they directed/undirected?
- The number of nodes and edges
- Mean node betweenness, indegree, and outdegree
What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?
Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?
Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.
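For the density question above, density is commonly defined as the proportion of possible ties that are actually present. Assuming a network with $n$ nodes, $m$ edges, and no self-loops (notation introduced here for illustration, not taken from the MovieGalaxies data):

$$
D_{\text{undirected}} = \frac{2m}{n(n-1)}, \qquad D_{\text{directed}} = \frac{m}{n(n-1)}
$$

Which version applies depends on whether you classified each network as directed or undirected in your summary table. In R, the igraph package's edge_density() function computes this proportion from a graph object.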

Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.

Problem Set 13: Advanced visualizations and analysis

This problem set asks you to step out on your own to analyze and present data using the tools you have learned throughout the semester. You can use one dataset for all the questions or a different dataset for each one. Treat these prompts as mini data reports: describe the analysis used, your motivation for choosing it, where the data come from and how they were measured, and how you interpret the results. Submit this problem set as a link to an RPubs document. If you would prefer to create other complex analyses or visualizations that are more directly in line with your final project instead of those outlined below, reach out to me in advance.

Working with a dataset of your choice, create a heatmap. Document all code used to create the heatmap and label the figure as though it is going to be submitted for publication. Write a figure caption describing the data used to create the figure and your interpretation of the results. Why is a heatmap the best way to represent these data?
Analyze a dataset of your choice using regression models. Create both a figure and table and interpret your results.
Create and interpret an interactive graphic of your choice.

Additional course resources

Data sets

Philip A. Loring, Anne Beaudreau, and Cecile Tang. 2020. Alaska Native Service Survey of Native Foods, Yukon River communities, 1940s-1970s. Arctic Data Center. doi:10.18739/A2GX44V7K. https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2GX44V7K
William B. Bowden. 2013. Perceptions and implications of thawing permafrost and climate change in two Inupiaq villages of arctic Alaska. Arctic Data Center. doi:10.18739/A2GB1XH83. https://arcticdata.io/catalog/view/doi%3A10.18739%2FA23Z48
Tiffany Stephens and Ginny Eckert. 2019. Boat-based counts of sea otters at specific sites in Southeast Alaska. Knowledge Network for Biocomplexity. urn:uuid:b910f74b-171b-4d2b-b065-fb21823a8e84. https://knb.ecoinformatics.org/view/urn%3Auuid%3Ab910f74b-171b-4d2b-b065-fb21823a8e84#urn%3Auuid%3A7eba259b-eeb5-4375-9596-e797bbb0b27d

R references

Getting started with R

Other useful tools

Version control with Git
- Git Guide
Data cleaning with OpenRefine
- OpenRefine
- Data carpentry: OpenRefine tutorial
Online publishing with Rpubs
- Getting Started with RPubs
Coding conventions
- Google’s R Style Guide
- tidyverse style guide

Specific analysis techniques

Spatial analysis
- An Introduction to Spatial Data Analysis and Visualisation in R
- Spatial Data Science with applications in R
Data visualization
Markdown
- Markdown Table Generator
- R Markdown Gallery

ANTH630: Quantification and Statistics in Applied Anthropology

Dr. Madeline Brown

Spring 2022
