Course overview
This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. This is accomplished through three main themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics / data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, scaling from question design through ethical best practices. In the second theme, students will be introduced to basic statistical theories and contemporary trends in quantitative social science. Finally, the third theme creates space for students to apply new skills in data analysis, management and visualization using R. Collectively, these themes enable students to build theoretical and methodological foundations for conducting independent quantitative anthropological research.
Required texts
- Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
- Bernard, H. Russell. 2006. Research Methods in Anthropology, Fourth Edition. Rowman Altamira.
- Drennan, Robert D. Statistics for Archaeologists: A Common Sense Approach, Second Edition. Springer.
- Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
Additional materials
Below are webpages with additional course materials and handouts. You are on your honor to not look up problem set answers online. All solutions must be presented in your own words.
Semester schedule
Week 1: Course overview and intro to R
Schedule
- Syllabus overview
- Problem set leader assignments
- Lecture on writing code for reproducible research
- R tutorial: Getting Started with R
- Collaborative work time
Week 2: Understanding Quantitative Data
Schedule
- Problem Set 1 Review
- Theory lecture: Understanding quantitative data
- R tutorial: Data wrangling with base R; RMarkdown
- Collaborative work time
Week 3: Distributions and Descriptive Statistics
Schedule
- Problem Set 2 Review
- Theory lecture: Normal distribution, central tendency, variability
- Final project discussion
- R tutorial: Descriptive statistics and advanced dataframe manipulation
- Collaborative work time
Week 4: Data visualization and EDA
Schedule
- Problem Set 3 Review
- Theory lecture: Data visualization and EDA
- R tutorial: Data visualization
- Collaborative work time
Week 5: Confidence and Correlation
- Reading
- Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
- Bernard 2006, part of ch. 19, pg. 584-587
- Drennan ch. 9 (confidence and population means) [pg. 107-130]
- Extra practice: R4DS: Section 5.1-5.4 Data transformation
- R Tutorial: Tidyverse
- Lecture slides: Confidence intervals, standard error, statistical significance, correlation
- Problem set 5
- DUE: Problem set 4
- DUE: Final project prospectus due by midnight
Schedule
- Problem Set 4 Review
- Theory lecture: Confidence intervals, standard error, statistical significance, correlation
- R tutorial: Tidyverse
- Collaborative work time
Week 6: Regression
- Reading
- Urdan 2010: ch. 13 (Regression) [pg. 145-160]
- Drennan ch. 15 (Relating a measurement variable to another measurement variable) [pg. 199-220]
- Extra practice: Phillips 15 Regression
- R Tutorial: Regression in R
- Lecture slides: Regression
- Problem set 6
- DUE: Problem set 5
Schedule
- Problem Set 5 Review
- Theory lecture: Regression
- R tutorial: Regression in R
- Collaborative work time
Week 7: Research design and sampling
- Reading
- Bernard 2006: ch. 6-8, 10 (Sampling; Sampling Theory; Nonprobability Sampling and Choosing Informants; Structured Interviewing I: Questionnaires) [pg. 146-209, 251-298]
- Drennan ch. 7, 11 [Sampling; Categories and population proportions] [pg. 79-96, 139-143]
- Recommended: Fowler, Floyd. 2014. Survey Research Methods, Fifth Edition. SAGE Publications, Inc., excerpts from ch. 1, 5, 6, 7.
- R Tutorial: Final project overview and dashboard tutorial
- Lecture slides: Sampling, research design and data collection
- Problem set 7
- DUE: Problem set 6
Schedule
- Problem Set 6 Review
- Theory lecture: Sampling, research design and data collection
- R Tutorial: Flexdashboards
- Collaborative work time
Week 8: Comparing sample means (t-tests and ANOVA)
- Reading
- Urdan 2010: ch. 9 (t tests) [pg. 93-104]
- Phillips 13.1-13.3 [Hypothesis tests – t-test]
- Drennan ch. 12 [comparing two sample means][pg. 147-163]
- Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
- Drennan ch. 13 (Comparing means of more than two samples)[pg. 165-179]
- Extra practice: R4DS: Section 5.5-5.7
- Extra practice: R4DS Section 13 Relational data
- R Tutorial: t-tests, ANOVA and joining dataframes
- Lecture slides: Hypothesis tests, t-tests, ANOVA
- Problem set 8
- DUE: Problem set 7
Schedule
- Problem Set 7 Review
- Theory lecture: Hypothesis tests, t-tests, ANOVA
- R tutorial: t-tests, ANOVA and joining dataframes
- Collaborative work time
Week 9: Categorical data
- Reading
- Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
- Drennan ch. 14 (comparing proportions of different samples) [pg. 181-196]
- Extra practice: R4DS 15: Factors
- R Tutorial: Interactive graphics
- Lecture slides: Working with categorical data and chi-square tests
- Problem set 9
- DUE: Problem set 8
Schedule
- Problem Set 8 Review
- Theory lecture: Working with categorical data and chi-square tests
- R tutorial: Factors and chi-square tests
- Collaborative work time
Week 10: Text mining with R
- Reading
- Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
- Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2) 254-284.
- Recommended: Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 4-6
- Extra practice: R4DS ch. 14 Strings
- R Tutorial: Text analysis
- Lecture slides: Text analysis
- Problem set 10
- DUE: Problem set 9
Schedule
- Problem Set 9 Review
- Theory lecture: Text analysis
- R tutorial: Text analysis
- Collaborative work time
Week 11: Cultural domain analysis and MDS
- Reading
- Bernard 2006: ch. 11 (Structured Interviewing II: Cultural Domain Analysis) [pg. 299-317]
- Bernard 2006, part of ch. 21: pg. 677-689
- Drennan ch. 23 (Multidimensional scaling)[pg. 285-289]
- Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds). The Ethnographers Toolkit, Vol 3. Walnut Creek, CA: Altamira Press.
- Recommended: Weller, Susan. 2014. Structured Interviewing and Questionnaire Construction.
- R Tutorial: Cultural domain analysis and MDS
- Lecture slides: Cultural domain analysis and MDS
- Problem set 11
- DUE: Problem set 10
Schedule
- Problem Set 10 Review
- Theory lecture: Cultural domain analysis and MDS
- R tutorial: Cultural domain analysis and MDS
- Collaborative work time
Week 12: Network Analysis
- Reading
- Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
- Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
- Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology. 59(1):74-97.
- Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
- Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.
- R Tutorial: Network analysis
- Lecture slides: Network analysis in Anthropology
- Problem set 12
- DUE: Problem set 11
Schedule
- Problem Set 11 Review
- Theory lecture: Network analysis in Anthropology
- R tutorial: Network analysis with R
- Collaborative work time
Week 13: Unpacking quantitative approaches
- Reading
- Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
- Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.
- Recommended: Merry, Sally. 2011. Measuring the World: Indicators, Human Rights and Global Governance. Current Anthropology 52:S83-S95.
- Problem set 13
- Lecture slides
- DUE: Problem set 12
Schedule
- Problem Set 12 Review
- Theory discussion: Unpacking quantitative approaches
- Collaborative work time
Week 14: Quantitative analysis of qualitative data and PCA, cluster and factor analysis
- Reading
- Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]
- Drennan ch. 24 (Principal components analysis)[pg. 299-303]
- Bernard 2006, part of ch. 21: pg. 689-692
- Drennan ch. 25 (cluster analysis)[pg. 309-318]
- Lecture slides: PCA, cluster and factor analysis
- R Tutorial: Qualitative data analysis with R
- DUE: Problem set 13
Schedule
- Problem Set 13 Review
- Theory lecture: Working quantitatively with qualitative data
- Theory lecture: PCA, cluster and factor analysis
- R Tutorial: Qualitative analysis in R
- Final projects discussion
- Wrapping up
Problem sets
Problem Set 1: Intro to R
- Phillips 4.5, Q1-5
- Phillips 5.4, Q1-9
- Drennan Ch. 1: Using the datasets from the chapter, complete the following problems:
- Recreate the Kiskiminetas River Valley histogram in the chapter. Note the number of bins and how the bin members are decided. Is the cutoff at the bottom or top of the range? How can you adjust this?
- Make two histograms of the scraper length data with different bin sizes. Do you notice anything different in the data distribution when you change the number of breaks? You can load the .RData file by selecting “open with” RStudio when prompted or download the .csv file linked above.
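If you are unsure how bin boundaries are controlled in base R, the sketch below may help. It uses made-up lengths (not the Drennan data) and the `breaks` and `right` arguments of `hist()`; the variable names are hypothetical. Passing `plot = FALSE` returns the bin counts without drawing, which makes the effect of each choice easy to inspect.

```r
# Hypothetical scraper lengths in mm (made-up values, not the chapter data)
lengths_mm <- c(20, 22, 25, 25, 28, 30, 30, 31, 35, 40)

# Explicit break points give full control over the bin width
narrow <- hist(lengths_mm, breaks = seq(20, 40, by = 5), plot = FALSE)

# right = TRUE (the default): bins are closed at the top, so 30 lands in 20-30
upper_closed <- hist(lengths_mm, breaks = seq(20, 40, by = 10),
                     right = TRUE, plot = FALSE)

# right = FALSE: bins are closed at the bottom, so 30 lands in 30-40
lower_closed <- hist(lengths_mm, breaks = seq(20, 40, by = 10),
                     right = FALSE, plot = FALSE)

narrow$counts        # 4 3 2 1
upper_closed$counts  # 7 3
lower_closed$counts  # 5 5
```

Dropping `plot = FALSE` draws the histogram. Note that a single number passed to `breaks` is treated only as a suggestion, so a vector of break points is the more predictable option.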
Jump to Week 1
Problem Set 2: Data wrangling with base R and thinking through quantitative data
- Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
- Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
- Phillips 9.9, Q1-13
- Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.
Jump to Week 2
Problem Set 3: Descriptive statistics and normal distributions
- Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail when we cover tidyverse. You can complete this problem set with base R.
- Drennan Ch. 2 Q1-2 and Ch. 3 Q1-3. Hint: Look at the help file for mean() if you’re stuck. For Ch. 3 questions, instead of stem-and-leaf plots, make histograms. Data: Nanxiong data
- Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.
Jump to Week 3
Problem Set 5: Correlation, confidence intervals, and significance
- Urdan ch. 7 exercises and ch. 12 exercises (the correlation chapter questions in the 4th edition)
Jump to Week 5
Problem Set 9: Advanced tidyverse and categorical data
- R4DS: Chapter 5 Data transformation - Section 5.5.2, Q1-5, 5.6.7, Q2 exercises. Hint: use the following code to create a not_cancelled dataframe: not_cancelled <- flights %>% filter(!is.na(dep_delay), !is.na(arr_delay))
- R4DS: Chapter 13 Relational Data - Section 13.2.1 [no need to code for these, think through and describe how these tasks could be accomplished], 13.4.6, Q1-2, Q4 (pick one weather variable), Q5 exercises
- Phillips 13.6 chi-sq questions, Q2, Q5, Q6
Jump to Week 9
Problem Set 10: Text mining in R
With a text of your choice complete the following analyses and produce a data report. Be sure to document and justify any decisions that you make during the data cleaning and subsetting process. If you are struggling to find a text from your own work you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class. Write up your results in the form of a brief data analysis report.
Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?
If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.
Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.
Either analyze the sentiments or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you created the topic model.
Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?
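As a rough illustration of the word and bigram counting step, here is a minimal sketch in base R. The two sentences and the tiny stop word list are invented for the example, and all object names are hypothetical; in practice you would likely use a fuller stop word list and the text mining tools covered in class.

```r
# Hypothetical two-document corpus and a deliberately tiny stop word list
text <- c("The permafrost is thawing and the ground is sinking.",
          "Thawing ground damages the roads and the houses.")
stop_words <- c("the", "is", "and")

# Lowercase, strip everything except letters and spaces, then tokenize
clean  <- gsub("[^a-z ]", "", tolower(text))
tokens <- strsplit(clean, " +")

# Word frequencies after removing stop words and empty strings
words <- unlist(tokens)
words <- words[!words %in% stop_words & nzchar(words)]
word_counts <- sort(table(words), decreasing = TRUE)

# Bigrams are built within each document, before stop word removal
make_bigrams <- function(w) {
  if (length(w) < 2) character(0) else paste(head(w, -1), tail(w, -1))
}
bigram_counts <- sort(table(unlist(lapply(tokens, make_bigrams))),
                      decreasing = TRUE)

head(word_counts, 50)
head(bigram_counts, 50)
```

Whether to drop stop words before or after forming bigrams is one of the cleaning decisions worth documenting, since it changes which word pairs can appear.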
Jump to Week 10
Problem Set 11: Cultural domain analysis and MDS
Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?).
- Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from a wide to long dataframe. Add a new column that contains the rank of each issue as named by each respondent.
The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)
What are the top 10 most frequently mentioned issues in response to the question: “8) What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”. Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?
Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the overall goal of the researchers might have been, but we can create our own classification systems for these responses. Briefly explain how you decided to group the responses and then calculate the salience for these issue groups. Produce a barchart detailing the most salient issues.
If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.
This sample contains individuals from two different villages. Recalculate the issue salience metrics grouping responses by each community. Produce a faceted figure detailing the most salient issues for each community.
Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?
Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.
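The wide-to-long conversion and the salience calculation can be sketched on toy data as follows. This is only an illustration under stated assumptions: the data frame, its column names, and the responses are invented, and salience is computed here as Smith's S, which is one common free-list index and may not be the one intended for this problem set. For each mention, the score is (L - r + 1) / L, where L is the length of that respondent's list and r is the rank of the item; scores are summed per item and divided by the number of respondents.

```r
# Hypothetical survey: one row per respondent, up to three ranked answers
surv <- data.frame(
  ID      = 1:3,
  Village = c("A", "A", "B"),
  issue1  = c("erosion", "roads", "erosion"),
  issue2  = c("roads", "erosion", NA),
  issue3  = c("housing", NA, NA)
)

# Wide to long: one row per respondent-issue pair, with the rank as a column
long <- reshape(surv, direction = "long",
                varying = c("issue1", "issue2", "issue3"),
                v.names = "issue", timevar = "rank", times = 1:3,
                idvar = "ID")
long <- long[!is.na(long$issue), ]

# List length per respondent, then the per-mention score (L - r + 1) / L
long$list_length <- ave(long$rank, long$ID, FUN = length)
long$score <- (long$list_length - long$rank + 1) / long$list_length

# Smith's S (assumed index): summed scores divided by number of respondents
salience <- tapply(long$score, long$issue, sum) / nrow(surv)
sort(salience, decreasing = TRUE)
```

To repeat the calculation by community, the same steps can be applied within each value of `Village`, dividing by the number of respondents in that village rather than in the whole sample.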
Jump to Week 11
Problem Set 12: Network analysis in R
In this problem set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.
Create a summary table of the two networks. At minimum, include the following information:
- Are they directed/undirected?
- The number of nodes and edges
- Mean node betweenness, indegree, and outdegree
What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?
Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?
Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.
Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.
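To make the degree and density questions concrete, the sketch below computes them by hand from a small directed edge list. The edge list is invented (it is not the MovieGalaxies data) and the object names are hypothetical. Betweenness and community detection are not shown, since those are easier with a network package such as the one used in the tutorial.

```r
# Hypothetical directed edge list: each row is one tie from -> to
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "A", "A", "A")
)

nodes <- sort(unique(c(edges$from, edges$to)))
n <- length(nodes)
m <- nrow(edges)

# Outdegree counts ties sent; indegree counts ties received
outdegree <- as.vector(table(factor(edges$from, levels = nodes)))
indegree  <- as.vector(table(factor(edges$to,   levels = nodes)))
names(outdegree) <- names(indegree) <- nodes

# Directed density: observed ties over the n * (n - 1) possible ties
density_directed <- m / (n * (n - 1))

summary_table <- data.frame(
  nodes = n, edges = m,
  mean_indegree  = mean(indegree),
  mean_outdegree = mean(outdegree),
  density = density_directed
)
summary_table
```

Because the denominator grows with the square of the number of nodes, density tends to fall as networks get larger, which is worth keeping in mind when comparing the two films.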
Jump to Week 12
Problem Set 13: Advanced visualizations and analysis
This problem set asks you to step out on your own to analyze and present data using the tools you have learned throughout the semester. You can use one dataset for all the questions or different datasets for each one. Treat these prompts as mini data reports. This means you should describe the analysis used, motivation for using this particular analysis, where the data come from, how they are measured, and finally, how you interpret the results. This problem set should be submitted as a link to an RPubs document. If there are other complex data analyses or visualizations that are more directly in line with your final project that you would like to make instead of those outlined below, reach out to me in advance.
Working with a dataset of your choice, create a heatmap. Document all code used to create the heatmap and label the figure as though it is going to be submitted for publication. Write a figure caption describing the data used to create the figure and your interpretation of the results. Why is a heatmap the best way to represent these data?
Analyze a dataset of your choice using regression models. Create both a figure and table and interpret your results.
Create and interpret an interactive graphic of choice.
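For the regression prompt, one possible shape for the figure-plus-table output is sketched below using the `cars` dataset that ships with R. The dataset choice and object names are only placeholders; your own analysis should use data you can describe and justify.

```r
# Simple linear regression of stopping distance on speed (built-in cars data)
fit <- lm(dist ~ speed, data = cars)

# Coefficient table: estimate, standard error, t value, and p value
coef_table <- summary(fit)$coefficients
round(coef_table, 3)

# Scatterplot with the fitted line, with axis labels that include units
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit)
```

The coefficient table and the plot together cover the "figure and table" requirement; the interpretation of the slope, its uncertainty, and the fit of the model still needs to be written in your own words.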
Jump to Week 14
Additional course resources
Specific analysis techniques
- Spatial analysis
- Data visualization
- Markdown