Course overview

This course introduces graduate students to theory and methods in quantitative anthropological and archaeological research. This is accomplished through three main themes threaded throughout the semester: 1) asking quantitative questions in anthropology, 2) statistics / data science theory, and 3) data analysis and management. In the first theme, we consider how to design quantitative anthropological research studies, ranging from question design to ethical best practices. In the second theme, students will be introduced to basic statistical theories and contemporary trends in quantitative social science. Finally, the third theme creates space for students to apply new skills in data analysis, management and visualization using R. Collectively, these themes enable students to build theoretical and methodological foundations for conducting independent quantitative anthropological research.

Required texts

Urdan, Timothy. 2010. Statistics in Plain English, Third Edition. Routledge.
Bernard, H. Russell. 2006. Research Methods in Anthropology, Fourth Edition. Rowman AltaMira.
Drennan, Robert D. Statistics for Archaeologists: A Common Sense Approach, Second Edition. Springer.
Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.

Required software

R
RStudio

Required online texts

Additional materials

Below are webpages with additional course materials and handouts. You are on your honor to not look up problem set answers online. All solutions must be presented in your own words.

Semester schedule

Week 1: Course overview and intro to R

Reading
- Bernard 2006: ch. 16 (Introduction to Qualitative and Quantitative Analysis) [pg. 451-462]
- Drennan ch. 1 (Batches of Numbers) [pg. 3-15]
- Phillips ch. 2 (Installing R and RStudio)
Extra practice: Phillips ch. 4-6 (The Basics; Scalars and Vectors; Vector Functions)
R Tutorial: Getting Started with R
Lecture slides: Introduction to quantitative research in Anthropology
- On ELMS
Problem set 1

Schedule

Syllabus overview
- Problem set leader assignments
Getting R installed
R tutorial: Getting Started with R
Collaborative work time

Week 2: Understanding Quantitative Data

Reading
- Urdan 2010: ch. 1 (Introduction to Social Science Research) [pg. 1-11]
Extra practice: Phillips ch. 7-9 (Indexing Vectors; Matrices and Dataframes; Importing, Saving and Managing Data)
R Tutorial: Data wrangling with base R
Lecture slides: Variables, metrics, and databases
- On ELMS
Problem set 2
DUE: Problem set 1

Schedule

Problem Set 1 Review
Theory lecture: Understanding quantitative data
R tutorial: Data wrangling with base R
Collaborative work time

Week 3: Distributions and Descriptive Statistics

Reading
- Urdan 2010: ch. 2-4 (Measures of Central Tendency, Measures of Variability, The Normal Distribution) [pg. 13-36]
- Drennan ch. 2-3 (The Level or Center of a Batch; The Spread or Dispersion of a Batch) [pg. 17-36]
Extra practice: Phillips ch. 10 (Advanced Dataframe Manipulation)
R Tutorial: Descriptive statistics and advanced dataframe manipulation
Lecture slides: Normal distribution, central tendency, variability
- On ELMS
Problem set 3
DUE: Problem set 2

Schedule

Problem Set 2 Review
Theory lecture: Normal distribution, central tendency, variability
R tutorial: Descriptive statistics and advanced dataframe manipulation
Collaborative work time

Week 4: Data visualization and EDA

Reading
- Drennan ch. 4, 6 (Comparing Batches; Categories) [pg. 37-49, 63-75]
Extra practice: R4DS ch. 3 (Data Visualization) and ch. 7 (Exploratory Data Analysis)
R Tutorial: Data visualization
Lecture slides: Data visualization and EDA
- On ELMS
Problem set 4
DUE: Problem set 3

Schedule

Problem Set 3 Review
Theory lecture: Data visualization and EDA
R tutorial: Data visualization
Collaborative work time

Week 5: Confidence and Correlation

Reading
- Urdan 2010: ch. 6-8 (Standard Errors; Statistical Significance, Effect Size, Confidence Intervals; Correlation) [pg. 49-92]
- Bernard 2006: part of ch. 19 [pg. 584-587]
- Drennan ch. 9 (Confidence and Population Means) [pg. 107-130]
Extra practice: R4DS ch. 5 (Data Transformation), sections 5.1-5.4
R Tutorial: Tidyverse
Lecture slides: Confidence intervals, standard error, statistical significance, correlation
- On ELMS
Problem set 5
DUE: Problem set 4
DUE: Final project prospectus (by midnight)

Schedule

Problem Set 4 Review
Theory lecture: Confidence intervals, standard error, statistical significance, correlation
R tutorial: Tidyverse
Collaborative work time

Week 6: Regression

Reading
- Urdan 2010: ch. 13 (Regression) [pg. 145-160]
- Drennan ch. 15 (Relating a measurement variable to another measurement variable) [pg. 199-220]
Extra practice: Phillips ch. 15 (Regression)
R Tutorial: Regression in R
Lecture slides: Regression
- On ELMS
Problem set 6
DUE: Problem set 5

Schedule

Problem Set 5 Review
Theory lecture: Regression
R tutorial: Regression in R
Collaborative work time

Week 7: Research design and sampling

Reading
- Bernard 2006: ch. 6-8, 10 (Sampling; Sampling Theory; Nonprobability Sampling and Choosing Informants; Structured Interviewing I: Questionnaires) [pg. 146-209, 251-298]
- Drennan ch. 7, 11 (Sampling; Categories and Population Proportions) [pg. 79-96, 139-143]
- Recommended: Fowler, Floyd. 2014. Survey Research Methods, Fifth Edition. SAGE Publications, Inc., excerpts from ch. 1, 5, 6, 7.
R Tutorial: Final project overview and dashboard tutorial
Lecture slides: Sampling, research design and data collection
- On ELMS
Problem set 7
DUE: Problem set 6

Schedule

Problem Set 6 Review
Theory lecture: Sampling, research design and data collection
R tutorial: Flexdashboards
Collaborative work time

SPRING BREAK

Week 8: Comparing two sample means

Reading
- Urdan 2010: ch. 9 (t tests) [pg. 93-104]
- Phillips 13.1-13.3 (Hypothesis Tests: t-test)
- Drennan ch. 12 (Comparing Two Sample Means) [pg. 147-163]
Extra practice: R4DS ch. 5 (Data Transformation), sections 5.5-5.7
R Tutorial: Advanced tidyverse and t-tests
Lecture slides: Hypothesis tests and t-tests
- On ELMS
Problem set 8
DUE: Problem set 7

Schedule

Problem Set 7 Review
Theory lecture: Hypothesis tests and t-tests
R tutorial: Advanced tidyverse and t-tests
Collaborative work time

Week 9: Comparing >2 sample means

Reading
- Urdan 2010: ch. 10 (One-way analysis of variance) [pg. 105-118]
- Drennan ch. 13 (Comparing Means of More Than Two Samples) [pg. 165-179]
Extra practice: R4DS ch. 13 (Relational Data)
R Tutorial: Relational data and ANOVA
Lecture slides: ANOVA
- On ELMS
Problem set 9
DUE: Problem set 8

Schedule

Problem Set 8 Review
Theory lecture: ANOVA
R tutorial: Relational data and ANOVA
Collaborative work time

Week 10: Categorical data

Reading
- Urdan 2010: ch. 14 (The Chi-Square Test of Independence) [pg. 161-168]
- Drennan ch. 14 (Comparing Proportions of Different Samples) [pg. 181-196]
Extra practice: R4DS ch. 15 (Factors)
R Tutorial: Interactive graphics
Lecture slides: Working with categorical data and chi-square tests
- On ELMS
Problem set 10
DUE: Problem set 9

Schedule

Problem Set 9 Review
Theory lecture: Working with categorical data and chi-square tests
R tutorial: Factors and chi-square tests
Collaborative work time

Week 11: Special Topics: Text mining with R

Reading
- Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 1-3.
- Abramson et al. 2018. The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography 19(2) 254-284.
- Recommended: Silge, Julia and David Robinson. 2017. Tidy Text Mining with R. Ch. 4-6
Extra practice: R4DS ch. 14 (Strings)
R Tutorial: Text analysis
Lecture slides: Text analysis
- On ELMS
Problem set 11
DUE: Problem set 10

Schedule

Problem Set 10 Review
Theory lecture: Text analysis
R tutorial: Text analysis
Collaborative work time

Week 12: Special Topics: Cultural domain analysis and MDS

Reading
- Bernard 2006: ch. 11 (Structured Interviewing II: Cultural Domain Analysis) [pg. 299-317]
- Bernard 2006: part of ch. 21 [pg. 677-689]
- Drennan ch. 23 (Multidimensional Scaling) [pg. 285-289]
- Borgatti, Stephen. 1998. Elicitation Techniques for Cultural Domain Analysis. In J. Schensul and M. LeCompte (eds.), The Ethnographer’s Toolkit, Vol. 3. Walnut Creek, CA: AltaMira Press.
- Recommended: Weller, Susan. 2014. Structured Interviewing and Questionnaire Construction.
R Tutorial: Cultural domain analysis and MDS
Lecture slides: Cultural domain analysis and MDS
- On ELMS
Problem set 12
DUE: Problem set 11

Schedule

Problem Set 11 Review
Theory lecture: Cultural domain analysis and MDS
R tutorial: Cultural domain analysis and MDS
Collaborative work time

Week 13: Special Topics: Networks

Reading
- Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network Analysis in the Social Sciences. Science 323:892–896.
- Crona, B., & Bodin, Ö. (2006). What You Know is Who You Know? Communication Patterns Among Resource Users as a Prerequisite for Co-management. Ecology and Society 11(2):7.
- Ready and Power. 2018. Why Wage Earners Hunt. Current Anthropology. 59(1):74-97.
- Baggio, J. A., BurnSilver, S. B., Arenas, A., Magdanz, J. S., Kofinas, G. P., and De Domenico, M. (2016). Multiplex social ecological network analysis reveals how social changes affect community robustness more than resource depletion. PNAS 113(48):13708–13713.
- Crabtree et al. (2019). Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia. Human Ecology 47:165–177.
R Tutorial: Network analysis
Lecture slides: Network analysis in Anthropology
- On ELMS
Problem set 13
DUE: Problem set 12

Schedule

Problem Set 12 Review
Theory lecture: Network analysis in Anthropology
R tutorial: Network analysis with R
Collaborative work time

Week 14: Unpacking quantitative approaches

Reading
- Walter, Maggie and Chris Andersen. 2013. Indigenous Statistics: A Quantitative Research Methodology. Routledge.
- Merry, Sally. 2015. Quantification and the Paradox of Measurement. Current Anthropology 56(2):205-229.
- Recommended: Merry, Sally. 2011. Measuring the World: Indicators, Human Rights and Global Governance. Current Anthropology 52:S83-S95.
Lecture slides
- On ELMS
DUE: Problem set 13

Schedule

Problem Set 13 Review
Theory discussion: Unpacking quantitative approaches
Collaborative work time

Week 15: PCA, cluster and factor analysis

Reading
- Urdan 2010: ch. 15 (Factor Analysis and Reliability Analysis: Data Reduction Techniques) [pg. 169-182]
- Drennan ch. 24 (Principal Components Analysis) [pg. 299-303]
- Bernard 2006: part of ch. 21 [pg. 689-692]
- Drennan ch. 25 (Cluster Analysis) [pg. 309-318]
Lecture slides: PCA, cluster and factor analysis
- On ELMS

Schedule

Problem Set 14 Review
Theory lecture: PCA, cluster and factor analysis
Wrapping up

Problem sets

Problem Set 1: Intro to R

Phillips 4.5, Q1-5
Phillips 5.4, Q1-9
Drennan Ch. 1: Using the datasets from the chapter, complete the following problems:
1. Recreate the Kiskiminetas River Valley histogram in the chapter. Note the number of bins and how the bin members are decided. Is the cutoff at the bottom or top of the range? How can you adjust this?
2. Make two histograms of the scraper length data with different bin sizes. Do you notice anything different in the data distribution when you change the number of breaks? You can load the .RData file by choosing “Open With” and selecting RStudio.


Problem Set 2: Data wrangling with base R and thinking through quantitative data

Phillips 7.4, Q0-10. Here you are working with this table using each column as a separate vector. This will help you better understand indexing.
Phillips 8.7, Q1-10. In this section, you answer similar questions, but using a dataframe.
Phillips 9.9, Q1-13
Urdan ch. 1 work problems. These problems ask you to think through data sampling, statistics, and types of data in an example study.


Problem Set 3: Descriptive statistics and normal distributions

Phillips 10.6, Q1-8. You’ll notice Phillips uses dplyr in this chapter. Don’t worry if pipes don’t make sense yet; we will cover them in detail when we get to the tidyverse. You can complete this problem set with base R.
Drennan Ch. 2 Q1-2 and Ch. 3 Q1-3. Hint: Look at the help file for mean() if you’re stuck. For Ch. 3 questions, instead of stem-and-leaf plots, make histograms.
Urdan ch. 4 work problems. These problems ask you to think through the theory behind normal distributions.


Problem Set 4: Data visualization

R4DS: Chapter 3 Data Visualization - Section 3.2, 3.3, 3.5, 3.6 exercises
R4DS: Chapter 7 Exploratory Data Analysis - Section 7.3
Drennan ch. 4 practice, Q1-4. If you’re stuck, consult the R companion to Drennan.


Problem Set 5: Correlation, confidence intervals, and significance

Urdan ch. 7 exercises and ch. 12 exercises (the correlation chapter questions in the 4th edition)


Problem Set 6: Regression

Phillips 15.6 regression, Q1-8
Drennan ch. 15 practice


Problem Set 7: Data wrangling with tidyverse

R4DS: Chapter 5 Data transformation - Section 5.2.4, 5.3.1, 5.4.1 exercises. Hint: don’t forget to load the required packages with library(tidyverse) and library(nycflights13).


Problem Set 8: t-tests in R

R4DS: Chapter 5 Data transformation - Section 5.5.2 (Q1-5) and Section 5.6.7 (Q2) exercises. Hint: use the following code to create a not_cancelled dataframe: not_cancelled <- flights %>% filter(!is.na(dep_delay), !is.na(arr_delay))
Phillips 13.6 t-test questions, Q1 & Q4
Drennan ch. 12 practice, Q1-4. For Q1, make a boxplot instead of a stem-and-leaf diagram.


Problem Set 9: ANOVA

R4DS: Chapter 13 Relational Data - Section 13.2.1 exercises (no need to code these; think through and describe how the tasks could be accomplished) and Section 13.4.6, Q1-2, Q4 (pick one weather variable), and Q5.
Phillips 14.8 ANOVA, Q1-4
Drennan ch. 13 practice, Q1-4.


Problem Set 10: Categorical data

Phillips 13.6 chi-square questions, Q2, Q5, Q6


Problem Set 11: Text mining in R

Using a text of your choice, complete the following analyses. Be sure to document and justify any decisions you make during the data cleaning and subsetting process. If you are struggling to find a text from your own work, you can download one from Project Gutenberg or examine a different open-ended question from the permafrost survey we covered in class. Write up your results in the form of a brief data analysis report.

Wrangle and tidy up the text by removing stop words, missing values and any extra characters that do not add to the analysis. Produce a table of the top 50 words and bigrams. How do you interpret these results?
If there are multiple groups, sections or documents in your text, compare the word frequencies across these different subgroups. Alternatively, you may tag parts of speech and compare the frequency of different parts of speech in your text.
Produce two different figures of your choice based on your analysis of word frequencies. Make sure each figure is accompanied by a descriptive caption.
Either analyze the sentiments or create a topic model based on your text. Create one figure or table based on this analysis. Explain how you chose which sentiments to focus on or how you created the topic model.
Interpret the results from your analyses and figures. Did you find anything surprising or noteworthy from this analysis? What further questions do you have about the text that remain unanswered? What additional data or analyses would you need in order to answer those questions?


Problem Set 12: Cultural domain analysis and MDS

Using the Thawing Permafrost and Rural Communities Survey, answer the following questions and create the associated graphics. I’ve written the problems below with reference to question 8, but you might also consider using question 54 (“What do you see as the 3 biggest changes you will have to make in your day-to-day life because of thawing permafrost?”).

Convert the columns associated with question 8 (“What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”) from a wide to a long dataframe. Add a new column that contains the rank of each issue as named by each respondent.

The following code might help you get started: pfissues <- surv %>% select(ID, Village, X8.1..PF.Issue, X8.2..PF.Issue, X8.3..PF.Issue)

What are the top 10 most frequently mentioned issues in response to the question: “8) What do you think are the 3 biggest issues that will result from thawing permafrost in this area?”. Looking at this list, are there any issues that stand out to you? How useful is this first round of analysis on the raw data?
Remove missing values, convert to lowercase, and categorize the responses into meaningful groups. We don’t know what the overall goal of the researchers might have been, but we can create our own classification systems for these responses. Briefly explain how you decided to group the responses and then calculate the salience for these issue groups. Produce a barchart detailing the most salient issues.

If you are stuck on how to get started, look back at the lesson on text analysis and string data wrangling.

This sample contains individuals from two different villages. Recalculate the issue salience metrics grouping responses by each community. Produce a faceted figure detailing the most salient issues for each community.
Briefly reflect on what you have learned about the perception of issues related to thawing permafrost in each of the communities in the survey. In what ways is this type of analysis meaningful or less informative compared to other analytical techniques you might use on this same set of question responses?
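The questions above ask for issue salience without fixing a formula. One common option, assumed here (check it against Bernard ch. 11 and the in-class tutorial before relying on it), is Smith’s S. If respondent j lists L_j issues and item i appears at rank r_ij, that mention scores (L_j - r_ij + 1) / L_j, and an item the respondent did not mention scores 0. Averaging over all N respondents gives the item’s salience:

```latex
S_i = \frac{1}{N} \sum_{j=1}^{N} s_{ij},
\qquad
s_{ij} =
\begin{cases}
\dfrac{L_j - r_{ij} + 1}{L_j} & \text{if respondent } j \text{ names item } i \\[4pt]
0 & \text{otherwise}
\end{cases}
```

With a full three-item answer to question 8, ranks 1, 2, and 3 score 1, 2/3, and 1/3. A respondent who named only two issues has L_j = 2, so those two mentions score 1 and 1/2.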

Note: If you have a dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.


Problem Set 13: Network analysis in R

In this problem set, we will analyze the network of character interactions from two classic Indiana Jones films: Raiders of the Lost Ark and The Last Crusade. Data are taken from the MovieGalaxies database. Using these networks, answer the following questions. Data can be downloaded here and here.

Create a summary table of the two networks. At minimum, include the following information:
- Are they directed/undirected?
- The number of nodes and edges
- Mean node betweenness, indegree, and outdegree
What do these metrics tell you about the character interactions in each movie? How do you think indegree and outdegree were calculated?
Plot each network with nodes sized based on a centrality metric of choice and a clear title. Can you compare the node centrality measures across each movie? Why or why not? What do you notice about the overall structure of each movie’s network? Who are the more central nodes and who are more peripheral? How do you interpret these patterns?
Calculate the network densities for each movie. What does this tell you about the character interactions? Can you compare the density for these networks? Why or why not?
Run a community detection algorithm of choice on each network and compare the number of communities and their composition for each film. What do you think drives the modularity in these networks? You may need to look into the plot of each film in order to interpret these results.
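For reference when answering the density question, the standard definition (assumed here; compare it with how your chosen R package computes density) is the share of possible ties that are actually present. For a network with n nodes, m edges, and no self-loops:

```latex
D_{\text{undirected}} = \frac{m}{n(n-1)/2} = \frac{2m}{n(n-1)}
\qquad
D_{\text{directed}} = \frac{m}{n(n-1)}
```

For example, an undirected network of 4 nodes with 3 edges has density 2 × 3 / (4 × 3) = 0.5. If you use the igraph package, its edge_density() function computes this ratio (ignoring self-loops by default).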

Note: If you have a network dataset from your own research that you would like to analyze in place of the provided datasets, please contact me in advance and we can discuss alternatives.


Additional course resources

Data sets

Philip A. Loring, Anne Beaudreau, and Cecile Tang. 2020. Alaska Native Service Survey of Native Foods, Yukon River communities, 1940s-1970s. Arctic Data Center. doi:10.18739/A2GX44V7K. https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2GX44V7K
William B. Bowden. 2013. Perceptions and implications of thawing permafrost and climate change in two Inupiaq villages of arctic Alaska. doi:10.18739/A2GB1XH83 https://arcticdata.io/catalog/view/doi%3A10.18739%2FA23Z48
Tiffany Stephens and Ginny Eckert. 2019. Boat-based counts of sea otters at specific sites in Southeast Alaska. Knowledge Network for Biocomplexity. urn:uuid:b910f74b-171b-4d2b-b065-fb21823a8e84. https://knb.ecoinformatics.org/view/urn%3Auuid%3Ab910f74b-171b-4d2b-b065-fb21823a8e84#urn%3Auuid%3A7eba259b-eeb5-4375-9596-e797bbb0b27d

R references

Getting started with R

Other useful tools

Version control with Git
- Git Guide
Data cleaning with OpenRefine
- OpenRefine
- Data carpentry: OpenRefine tutorial
Online publishing with Rpubs
- Getting Started with RPubs

Specific analysis techniques

Spatial analysis
- An Introduction to Spatial Data Analysis and Visualisation in R
- Spatial Data Science with applications in R
Data visualization
- Data Visualization with R
Markdown
- Markdown Table Generator
- R Markdown Gallery

ANTH630: Quantification and Statistics in Applied Anthropology

Dr. Madeline Brown

Spring 2021
