
This lesson introduces R basics including exploratory data analysis with the tidyverse package in R.

Installing packages

Base R has many useful functions, but where R really shines is in the 22,977 (and counting) packages that you can download to extend R’s functionality.

Let’s install and then load the tidyverse suite of packages. You only need to install a package once, but you have to load the library every time you start a new R session.

#install.packages("tidyverse")
library(tidyverse)
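
One way to avoid reinstalling a package every time a script runs is to check for it first. The lines below are a minimal sketch of that pattern, not part of the original lesson; requireNamespace() is a base R function that reports whether a package is available.

```r
# Install tidyverse only if it is not already available on this computer
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Loading still has to happen in every new R session
library(tidyverse)
```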

Packages can also be installed by using the “Tools” –> “Install Packages” menu in RStudio.

Loading data files

Let’s start working with some real data. Here we will work with the Open Data DC Urban Forestry Street Trees dataset. First, download the dataset and then we will load it into R. For the tutorial, I will be loading an older version of this file that I have uploaded online. This means our output may look a bit different.

Details on how to read data files from a Windows operating system: intro2r link.

urbantrees <- read.csv("https://maddiebrown.github.io/ANTH630/data/Urban_Forestry_Street_Trees_2024.csv")

Examining data structure

Let’s examine the structure of our dataset.

str(urbantrees)
'data.frame':   211117 obs. of  54 variables:
 $ X               : num  -77 -77 -77 -77 -77 ...
 $ Y               : num  38.9 38.9 38.9 38.9 38.9 ...
 $ SCI_NM          : chr  "Quercus montana" "Acer rubrum" "Quercus robur fastigiata" "Tilia americana" ...
 $ CMMN_NM         : chr  "Rock chestnut oak" "Red maple" "Columnar English oak" "American linden" ...
 $ GENUS_NAME      : chr  "Quercus" "Acer" "Quercus" "Tilia" ...
 $ FAM_NAME        : chr  "Fagaceae" "Sapindaceae" "Fagaceae" "Tiliaceae" ...
 $ DATE_PLANT      : chr  "2018/02/01 18:50:34+00" "" "" "" ...
 $ FACILITYID      : chr  "31982-090-3001-0269-000" "31982-100-3005-0155-000" "10150-300-3001-0050-000" "32691-092-3001-0105-000" ...
 $ VICINITY        : chr  "922 C ST SE" "1017 C ST SE" "3029 15TH ST NW" "904 D ST SE" ...
 $ WARD            : int  6 6 1 6 6 1 6 1 6 1 ...
 $ TBOX_L          : num  99 8 6 9 8 99 12 9 9 12 ...
 $ TBOX_W          : num  7 4 3 4 4 4 4 3 4 5 ...
 $ WIRES           : chr  "None" "None" "None" "None" ...
 $ CURB            : chr  "Permanent" "Permanent" "Permanent" "Permanent" ...
 $ SIDEWALK        : chr  "Permanent" "Permanent" "Permanent" "Permanent" ...
 $ TBOX_STAT       : chr  "Plant" "Plant" "Plant" "Plant" ...
 $ RETIREDDT       : chr  "" "" "" "" ...
 $ DBH             : num  5.7 17.7 10.9 13.4 11.9 9.3 1.6 5.5 24.5 21 ...
 $ DISEASE         : chr  "" "" "" "" ...
 $ PESTS           : chr  "" "" "" "" ...
 $ CONDITION       : chr  "Excellent" "Fair" "Fair" "Good" ...
 $ CONDITIODT      : chr  "2024/02/28 23:57:09+00" "2021/02/17 22:21:46+00" "2021/09/13 18:55:03+00" "2020/02/14 01:33:24+00" ...
 $ OWNERSHIP       : chr  "UFA" "UFA" "UFA" "UFA" ...
 $ TREE_NOTES      : chr  "Elevated street side. Feb 2024." "P dead wood only and r small mulberry at base, be careful of roots" "" "" ...
 $ MBG_WIDTH       : num  13.1 39.4 29.5 29.5 39.4 ...
 $ MBG_LENGTH      : num  19.7 45.9 46.5 45.9 45.9 ...
 $ MBG_ORIENTATION : num  90 90 163 0 90 ...
 $ MAX_CROWN_HEIGHT: num  18.9 45.9 37.4 41.5 32.6 ...
 $ MAX_MEAN        : num  14.3 30.7 21.3 22.6 21.2 ...
 $ MIN_CROWN_BASE  : num  0.0533 -0.1557 -0.2178 0.1589 -0.1809 ...
 $ DTM_MEAN        : num  82.3 81.2 202.9 77 81.1 ...
 $ PERIM           : num  65.6 183.7 170.6 164 177.2 ...
 $ CROWN_AREA      : num  215 1259 743 1130 1119 ...
 $ CICADA_SURVEY   : chr  "" "" "" "" ...
 $ ONEYEARPHOTO    : chr  "" "" "" "" ...
 $ SPECIALPHOTO    : chr  "" "" "" "" ...
 $ PHOTOREMARKS    : chr  "" "" "" "" ...
 $ ELEVATION       : chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
 $ SIGN            : chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
 $ TRRS            : int  NA NA NA NA NA NA NA NA NA NA ...
 $ WARRANTY        : chr  "2017-2018" "Unknown" "Unknown" "Unknown" ...
 $ CREATED_USER    : chr  "" "" "" "" ...
 $ CREATED_DATE    : chr  "" "" "" "" ...
 $ EDITEDBY        : chr  "sward" "jchapman" "mmcphee" "sward" ...
 $ LAST_EDITED_USER: chr  "sward" "jchapman" "mmcphee" "sward" ...
 $ LAST_EDITED_DATE: chr  "2024/02/28 23:57:52+00" "2021/02/17 22:21:47+00" "2021/09/13 18:54:32+00" "2020/02/14 01:34:14+00" ...
 $ GIS_ID          : logi  NA NA NA NA NA NA ...
 $ GLOBALID        : chr  "{0B358D52-AAD4-41AC-B1AF-B19740DBC02A}" "{0F7845B3-E5DE-480B-96EC-B595354BCA5C}" "{EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7}" "{ADB853B2-E32F-4BB4-B949-DE7B5656DCD5}" ...
 $ CREATOR         : logi  NA NA NA NA NA NA ...
 $ CREATED         : logi  NA NA NA NA NA NA ...
 $ EDITOR          : logi  NA NA NA NA NA NA ...
 $ EDITED          : logi  NA NA NA NA NA NA ...
 $ SHAPE           : logi  NA NA NA NA NA NA ...
 $ OBJECTID        : int  40100904 40100905 40100906 40100907 40100908 40100909 40100910 40100911 40100912 40101121 ...

We can also look at the first six rows.

head(urbantrees)
          X        Y                   SCI_NM              CMMN_NM GENUS_NAME
1 -76.99281 38.88609          Quercus montana    Rock chestnut oak    Quercus
2 -76.99206 38.88599              Acer rubrum            Red maple       Acer
3 -77.03567 38.92727 Quercus robur fastigiata Columnar English oak    Quercus
4 -76.99334 38.88417          Tilia americana      American linden      Tilia
5 -76.99838 38.88728         Acer platanoides         Norway maple       Acer
6 -77.03931 38.92800           Quercus lyrata          Overcup oak    Quercus
     FAM_NAME             DATE_PLANT              FACILITYID          VICINITY
1    Fagaceae 2018/02/01 18:50:34+00 31982-090-3001-0269-000       922 C ST SE
2 Sapindaceae                        31982-100-3005-0155-000      1017 C ST SE
3    Fagaceae                        10150-300-3001-0050-000   3029 15TH ST NW
4   Tiliaceae                        32691-092-3001-0105-000       904 D ST SE
5 Sapindaceae                        30060-020-3001-0101-000     208 6TH ST SE
6    Fagaceae 2011/02/17 05:00:00+00 14582-160-3005-0656-000 1653 HOBART ST NW
  WARD TBOX_L TBOX_W WIRES      CURB  SIDEWALK TBOX_STAT RETIREDDT  DBH
1    6     99      7  None Permanent Permanent     Plant            5.7
2    6      8      4  None Permanent Permanent     Plant           17.7
3    1      6      3  None Permanent Permanent     Plant           10.9
4    6      9      4  None Permanent Permanent     Plant           13.4
5    6      8      4  None Permanent Permanent     Plant           11.9
6    1     99      4  None Permanent Permanent     Plant            9.3
    DISEASE PESTS CONDITION             CONDITIODT OWNERSHIP
1                 Excellent 2024/02/28 23:57:09+00       UFA
2                      Fair 2021/02/17 22:21:46+00       UFA
3                      Fair 2021/09/13 18:55:03+00       UFA
4                      Good 2020/02/14 01:33:24+00       UFA
5                      Good 2020/09/16 19:38:17+00       UFA
6 Hypoxylon            Dead 2023/05/22 19:49:55+00       UFA
                                                          TREE_NOTES MBG_WIDTH
1                                    Elevated street side. Feb 2024.  13.12336
2 P dead wood only and r small mulberry at base, be careful of roots  39.37008
3                                                                     29.53926
4                                                                     29.52756
5 Arborist removed some deadwood and scheduled for pruning on 1/5/17  39.37008
6                                                                     29.52756
  MBG_LENGTH MBG_ORIENTATION MAX_CROWN_HEIGHT MAX_MEAN MIN_CROWN_BASE  DTM_MEAN
1   19.68504         90.0000         18.91814 14.26427     0.05331409  82.26296
2   45.93176         90.0000         45.90728 30.68867    -0.15571680  81.22527
3   46.50863        163.3008         37.41346 21.32403    -0.21777124 202.87526
4   45.93176          0.0000         41.53025 22.57074     0.15885049  77.00650
5   45.93176         90.0000         32.59466 21.19249    -0.18092014  81.08643
6   32.80840          0.0000         37.61407 18.87533    -0.57492221 187.02505
     PERIM CROWN_AREA CICADA_SURVEY ONEYEARPHOTO SPECIALPHOTO PHOTOREMARKS
1  65.6168   215.2780                                                     
2 183.7270  1259.3763                                                     
3 170.6037   742.7091                                                     
4 164.0420  1130.2095                                                     
5 177.1654  1119.4456                                                     
6 144.3570   688.8896                                                     
  ELEVATION    SIGN TRRS  WARRANTY CREATED_USER CREATED_DATE EDITEDBY
1   Unknown Unknown   NA 2017-2018                              sward
2   Unknown Unknown   NA   Unknown                           jchapman
3   Unknown Unknown   NA   Unknown                            mmcphee
4   Unknown Unknown   NA   Unknown                              sward
5   Unknown Unknown   NA   Unknown                              sward
6   Unknown Unknown   NA 2010-2011                            jmiller
  LAST_EDITED_USER       LAST_EDITED_DATE GIS_ID
1            sward 2024/02/28 23:57:52+00     NA
2         jchapman 2021/02/17 22:21:47+00     NA
3          mmcphee 2021/09/13 18:54:32+00     NA
4            sward 2020/02/14 01:34:14+00     NA
5            sward 2020/09/16 19:51:11+00     NA
6          jmiller 2023/05/22 19:50:08+00     NA
                                GLOBALID CREATOR CREATED EDITOR EDITED SHAPE
1 {0B358D52-AAD4-41AC-B1AF-B19740DBC02A}      NA      NA     NA     NA    NA
2 {0F7845B3-E5DE-480B-96EC-B595354BCA5C}      NA      NA     NA     NA    NA
3 {EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7}      NA      NA     NA     NA    NA
4 {ADB853B2-E32F-4BB4-B949-DE7B5656DCD5}      NA      NA     NA     NA    NA
5 {300EF1F5-F440-4E16-BBC0-69C6CDD772CA}      NA      NA     NA     NA    NA
6 {0BEFB0A1-AAF4-4958-849C-CFBFBA3D4E78}      NA      NA     NA     NA    NA
  OBJECTID
1 40100904
2 40100905
3 40100906
4 40100907
5 40100908
6 40100909

Exploratory data analysis

In tidyverse, the basic operator for linking functions is %>%, known as the pipe operator. It passes the result on its left to the function on its right, so we can use it to string many functions together.
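
To see what the pipe does, it can help to compare it with the same steps written as nested function calls. This is a small sketch using a made-up data frame (toytrees is a hypothetical stand-in for urbantrees), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  CMMN_NM = c("Red maple", "Pin oak", "Red maple", "Willow oak"),
  WARD = c(6, 1, 6, 1)
)

# Nested calls are read from the inside out
nested <- head(select(filter(toytrees, WARD == 6), CMMN_NM), 1)

# The pipe passes the result on the left in as the first argument on the right
piped <- toytrees %>% filter(WARD == 6) %>% select(CMMN_NM) %>% head(1)

# Both versions run the same functions in the same order
identical(nested, piped)
```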

Subsetting data

The basic function for subsetting columns/variables in tidyverse is select().

urbantrees %>% select(CMMN_NM)

The basic function for selecting particular rows is filter().

urbantrees %>% filter(CMMN_NM == "Red maple" & DISEASE == "Ganoderma Root Rot")
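
The same subsetting can be written in base R with square brackets. The sketch below compares the two on a made-up data frame (toytrees is hypothetical); it is meant as an illustration, not as the only way to do it.

```r
library(dplyr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  CMMN_NM = c("Red maple", "Pin oak", "Red maple", "Willow oak"),
  DISEASE = c("Ganoderma Root Rot", "", "", "BLS"),
  WARD = c(6, 1, 6, 1)
)

# tidyverse: filter() picks rows, select() picks columns
tidy_result <- toytrees %>%
  filter(CMMN_NM == "Red maple" & DISEASE == "") %>%
  select(CMMN_NM, WARD)

# base R: dataframe[rows, columns]
base_result <- toytrees[toytrees$CMMN_NM == "Red maple" & toytrees$DISEASE == "",
                        c("CMMN_NM", "WARD")]

tidy_result
base_result
```

The two results hold the same values. The base R version keeps the original row number as the row name, while filter() renumbers the rows.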

We can also select all the unique values within a particular variable. For example, we might be interested in knowing what all the unique ward numbers are.

urbantrees %>% distinct(WARD)
   WARD
1     6
2     1
3     2
4     7
5     8
6     4
7     3
8     5
9    NA
10   10
11    0
12    9
13   88
14   99

We can also ask R to tell us how many distinct values there are within a variable.

n_distinct(urbantrees$FAM_NAME)
[1] 104

Try it

Recalling what we learned about subsetting dataframes, try to complete the following tasks using base R and/or tidyverse.

  1. Select the first 6 observations where the tree genus is Quercus. Hint: Use head().
  2. Show all the unique species names within the family Rosaceae.
  3. The previous line of code showed us that bur oaks are listed as being in Rosaceae. This seems odd. Write code to show all the different family names that are associated with bur oaks.
Click for solution
urbantrees %>% filter(GENUS_NAME=="Quercus") %>% head()
          X        Y                   SCI_NM              CMMN_NM GENUS_NAME
1 -76.99281 38.88609          Quercus montana    Rock chestnut oak    Quercus
2 -77.03567 38.92727 Quercus robur fastigiata Columnar English oak    Quercus
3 -77.03931 38.92800           Quercus lyrata          Overcup oak    Quercus
4 -77.00198 38.88539        Quercus palustris              Pin oak    Quercus
5 -77.04009 38.93254          Quercus phellos           Willow oak    Quercus
6 -77.04090 38.92535        Quercus palustris              Pin oak    Quercus
  FAM_NAME             DATE_PLANT              FACILITYID          VICINITY
1 Fagaceae 2018/02/01 18:50:34+00 31982-090-3001-0269-000       922 C ST SE
2 Fagaceae                        10150-300-3001-0050-000   3029 15TH ST NW
3 Fagaceae 2011/02/17 05:00:00+00 14582-160-3005-0656-000 1653 HOBART ST NW
4 Fagaceae                        30030-030-3001-0237-000 OPP 319 3RD ST SE
5 Fagaceae                        16890-178-3005-0043-000   1737 PARK RD NW
6 Fagaceae                        15408-165-3005-0467-000 1741 LANIER PL NW
  WARD TBOX_L TBOX_W WIRES      CURB  SIDEWALK TBOX_STAT RETIREDDT  DBH
1    6     99      7  None Permanent Permanent     Plant            5.7
2    1      6      3  None Permanent Permanent     Plant           10.9
3    1     99      4  None Permanent Permanent     Plant            9.3
4    6      9      4  None Permanent Permanent     Plant           24.5
5    1     12      5  None Permanent Permanent     Plant           21.0
6    1     99      5  None Permanent Flexipave     Plant           28.1
    DISEASE PESTS CONDITION             CONDITIODT OWNERSHIP
1                 Excellent 2024/02/28 23:57:09+00       UFA
2                      Fair 2021/09/13 18:55:03+00       UFA
3 Hypoxylon            Dead 2023/05/22 19:49:55+00       UFA
4                      Fair 2020/11/16 21:32:38+00       UFA
5                 Excellent 2022/11/18 21:24:48+00       UFA
6                      Fair 2022/08/18 19:26:54+00       UFA
                                                                                                                                                                                                                      TREE_NOTES
1                                                                                                                                                                                                Elevated street side. Feb 2024.
2                                                                                                                                                                                                                               
3                                                                                                                                                                                                                               
4 Bread loaf-sized Inonatus at base. Three.“Black crust” Kretzschmeria conk, fist-sized, on root flare, edge of sidewalk.  Grew one inch DBH since 2017. Another shelf conk at 15’ up.  Dieback sprinkled thru crown, June 2019.
5                                                                                                                                                                                                                               
6                                                                                                                                                                         P. Beginning of bls potentiallyWash gas disrupted soil
  MBG_WIDTH MBG_LENGTH MBG_ORIENTATION MAX_CROWN_HEIGHT MAX_MEAN MIN_CROWN_BASE
1  13.12336   19.68504        90.00000         18.91814 14.26427     0.05331409
2  29.53926   46.50863       163.30076         37.41346 21.32403    -0.21777124
3  29.52756   32.80840         0.00000         37.61407 18.87533    -0.57492221
4  39.37008   65.61680        90.00000         67.73044 56.71571     0.01390713
5  38.23960   78.39118       150.94540         54.32866 35.59290    -0.13329457
6  55.73578   83.25091        53.74616         61.28306 41.10743    -1.44659974
   DTM_MEAN    PERIM CROWN_AREA CICADA_SURVEY ONEYEARPHOTO SPECIALPHOTO
1  82.26296  65.6168   215.2780                                        
2 202.87526 170.6037   742.7091                                        
3 187.02505 144.3570   688.8896                                        
4  72.45985 216.5354  1668.4045                                        
5 198.98500 249.3438  1636.1128                                        
6 186.02665 295.2756  2755.5584                                        
  PHOTOREMARKS ELEVATION    SIGN TRRS  WARRANTY CREATED_USER CREATED_DATE
1                Unknown Unknown   NA 2017-2018                          
2                Unknown Unknown   NA   Unknown                          
3                Unknown Unknown   NA 2010-2011                          
4                Unknown Unknown   NA   Unknown                          
5                Unknown Unknown   NA   Unknown                          
6                Unknown Unknown   NA                                    
  EDITEDBY LAST_EDITED_USER       LAST_EDITED_DATE GIS_ID
1    sward            sward 2024/02/28 23:57:52+00     NA
2  mmcphee          mmcphee 2021/09/13 18:54:32+00     NA
3  jmiller          jmiller 2023/05/22 19:50:08+00     NA
4    sward            sward 2020/11/16 21:32:41+00     NA
5  jmiller          jmiller 2022/11/18 21:23:51+00     NA
6  mmcphee          mmcphee 2022/08/18 19:26:19+00     NA
                                GLOBALID CREATOR CREATED EDITOR EDITED SHAPE
1 {0B358D52-AAD4-41AC-B1AF-B19740DBC02A}      NA      NA     NA     NA    NA
2 {EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7}      NA      NA     NA     NA    NA
3 {0BEFB0A1-AAF4-4958-849C-CFBFBA3D4E78}      NA      NA     NA     NA    NA
4 {CFA5BDF4-B306-4D54-A501-FCE33E3C5146}      NA      NA     NA     NA    NA
5 {A09B6E85-1A6C-4A13-8011-E93255AEAF21}      NA      NA     NA     NA    NA
6 {4B386921-E1E9-455D-B878-8488AD418224}      NA      NA     NA     NA    NA
  OBJECTID
1 40100904
2 40100906
3 40100909
4 40100912
5 40101121
6 40101124

urbantrees %>% filter(FAM_NAME=="Rosaceae") %>% distinct(CMMN_NM)
                                CMMN_NM
1                 Bradford callery pear
2                                Cherry
3                 Shadblow serviceberry
4                    Prunus x yedoensis
5                    Cherry (Snowgoose)
6                      Purple leaf plum
7                             Crabapple
8                Alleghany serviceberry
9                        Yoshino cherry
10                          Chokecherry
11                         Okame cherry
12                       Kwanzan cherry
13                   Downy serviceberry
14                                     
15                     Arnold crabapple
16                     Golden rain tree
17                         Serviceberry
18      Autumn brilliance service berry
19               Donald Wyman Crabapple
20                 Adirondack Crabapple
21              Whitehouse callery pear
22               Crimson Cloud hawthorn
23                          Honeylocust
24             Crabapple (Harvest Gold)
25                         Crape myrtle
26                    Radiant crabapple
27                  Washington hawthorn
28                       Eastern redbud
29                     Japanese Apricot
30                     Lavalle hawthorn
31                               Redbud
32                    Other (See Notes)
33                  Snowdrift crabapple
34                   Prunus x yodoensis
35                    American hornbeam
36               Canada Red Chekecherry
37           Winter King Green hawthorn
38                             Blackgum
39                            Hackberry
40                     Snowgoose cherry
41       Ivory Silk Japanese tree lilac
42                        Trident maple
43                     Chinese pistache
44                   Thunder cloud plum
45                         Higan Cherry
46                      Swamp white oak
47                  Kentucky coffeetree
48                    Flowering Dogwood
49                         Silver maple
50                           Yellowwood
51                    Red horsechestnut
52                    Hardy Rubber Tree
53                          Hedge maple
54                          River birch
55          Moonglow Sweet Bay Magnolia
56                                  Elm
57       Autumn Brilliance serviceberry
58                                Lilac
59                   Chinese flame tree
60                    Sweetbay magnolia
61                         Bald cypress
62                         Deodar cedar
63                          Scarlet oak
64 Autumn Brilliance Apple serviceberry
65                       Staghorn sumac
66                     Japanese zelkova
67          Green Vase Japanese zelkova
68                    American sycamore
69                          Chinese elm
70              Shademaster honeylocust
71               Dura heat' river birch
72                  Carolina silverbell
73                     Cornelian Cherry
74                         Black Cherry
75                              Bur oak
76                    Southern magnolia
77                            Tuliptree
78                          Katsuratree
79                            Persimmon
80       Autumn Brilliance Serviceberry
81                 Thunder cloud  plum 
82                             Sweetgum
83                              Red oak
84                           Willow oak
85                    London plane tree

urbantrees %>% filter(CMMN_NM=="Bur oak") %>% distinct(FAM_NAME) 
     FAM_NAME
1    Fagaceae
2   Fagaceae 
3 Sapindaceae
4    Ulmaceae
5            
6    Rosaceae
7        Null
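
The output above lists "Fagaceae" twice because one entry carries a trailing space, and distinct() and n_distinct() treat the two as different values. Below is a minimal sketch of stripping that whitespace before counting, using a small made-up vector (famnames is a hypothetical stand-in for urbantrees$FAM_NAME) and the base R function trimws().

```r
library(dplyr)

# Hypothetical family names, including one with a trailing space
famnames <- c("Fagaceae", "Fagaceae ", "Rosaceae", "", "Null")

# Counted as typed, the two spellings of Fagaceae are separate values
n_distinct(famnames)

# trimws() removes leading and trailing whitespace before counting
n_distinct(trimws(famnames))
```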

Summarizing data

In order to answer questions about our data, we need to summarize it in various ways. Below are two ways to make a table of the counts of the number of trees that have various diseases.

table(urbantrees$DISEASE)

                    Armillaria Root Rot                 B&B                 BLS 
             209039                  35                   2                 279 
           Butt Rot                 DED  Ganoderma Root Rot           Hypoxylon 
                152                 144                 441                 222 
           jchapman             jconlon             jmiller           mlehtonen 
                  5                   1                  14                   1 
            mmcphee            msampson        None present      Powdery Mildew 
                  1                   4                 191                  31 
           Root Rot               sdoan              smckim               sward 
                 74                   3                   1                   8 
         Trunk Root           Trunk Rot 
                 40                 429 

urbantrees %>% group_by(DISEASE) %>% count() %>% arrange(desc(n))
# A tibble: 22 × 2
# Groups:   DISEASE [22]
   DISEASE                   n
   <chr>                 <int>
 1 ""                   209039
 2 "Ganoderma Root Rot"    441
 3 "Trunk Rot"             429
 4 "BLS"                   279
 5 "Hypoxylon"             222
 6 "None present"          191
 7 "Butt Rot"              152
 8 "DED"                   144
 9 "Root Rot"               74
10 "Trunk Root"             40
# ℹ 12 more rows
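
dplyr also offers a shorthand for this pattern: count() accepts the grouping variable directly and has a sort argument. The sketch below compares the two forms on a made-up data frame (toytrees is hypothetical).

```r
library(dplyr)

# Hypothetical toy data: three blanks, two BLS, one DED
toytrees <- data.frame(DISEASE = c("", "", "", "BLS", "BLS", "DED"))

# The longer form used above
long_way <- toytrees %>% group_by(DISEASE) %>% count() %>% arrange(desc(n))

# Shorthand: count() groups by the named column, sort = TRUE orders by n descending
short_way <- toytrees %>% count(DISEASE, sort = TRUE)

long_way
short_way
```

The counts match; the only difference is that the longer form returns a grouped tibble.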

In tidyverse we can also create new summarized dataframes, such as the one below that tells us the mean height of the trees as well as the tallest height. We can then look up the genus of the tallest tree with which.max(), which returns the row position of the largest value.

urbantrees %>% summarise(meanheight=mean(MAX_CROWN_HEIGHT, na.rm=TRUE), maxheight=max(MAX_CROWN_HEIGHT, na.rm=TRUE))
  meanheight maxheight
1   36.66681  182.9099

# which.max() gives the row number of the tallest tree; indexing with max() would instead use the height value as a row number
urbantrees$GENUS_NAME[which.max(urbantrees$MAX_CROWN_HEIGHT)]
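
The same kind of summary can be computed per group by adding group_by() before summarise(). The sketch below uses a small made-up data frame (toytrees is hypothetical) and which.max(), which returns the position of the largest value and skips NA.

```r
library(dplyr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  WARD = c(1, 1, 6, 6),
  GENUS_NAME = c("Quercus", "Acer", "Tilia", "Ulmus"),
  MAX_CROWN_HEIGHT = c(18.9, 45.9, NA, 37.4)
)

# One row per ward: mean height and the genus of the tallest tree in that ward
byward <- toytrees %>%
  group_by(WARD) %>%
  summarise(
    meanheight = mean(MAX_CROWN_HEIGHT, na.rm = TRUE),
    tallestgenus = GENUS_NAME[which.max(MAX_CROWN_HEIGHT)]
  )

byward
```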

Try it

  1. Make a table of plant genus counts.
  2. Make a table of the number of trees per ward arranged in descending order. How does this fit into what you know about these wards?
  3. Make a table summarizing the number of hickory and pawpaw trees per ward. Keep both the pawpaw and hickory counts as separate rows. Hint: Use the | operator in your filter() function to keep all rows matching either condition.
Click for solution
urbantrees %>% group_by(GENUS_NAME) %>% count()
table(urbantrees$GENUS_NAME)

urbantrees %>% group_by(WARD) %>% count() %>% arrange(desc(n))

urbantrees %>% group_by(WARD, CMMN_NM) %>% filter(CMMN_NM == "Pawpaw" | CMMN_NM == "Hickory") %>% count()
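
When the list of names to match grows, the %in% operator is a compact alternative to chaining several | conditions. This is a sketch on a made-up data frame (toytrees is hypothetical), not a replacement for the solution above.

```r
library(dplyr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  CMMN_NM = c("Pawpaw", "Hickory", "Red maple", "Pawpaw"),
  WARD = c(3, 3, 6, 4)
)

# Two conditions joined with |
with_or <- toytrees %>%
  filter(CMMN_NM == "Pawpaw" | CMMN_NM == "Hickory") %>%
  count(WARD, CMMN_NM)

# The same rows selected with %in%
with_in <- toytrees %>%
  filter(CMMN_NM %in% c("Pawpaw", "Hickory")) %>%
  count(WARD, CMMN_NM)

identical(with_or, with_in)
```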

Exporting data

Now that we’ve made some summary tables of our tree data, we might want to download these tables to our computers. Let’s export this table as a .csv file.

diseasecounts <- urbantrees %>% group_by(DISEASE) %>% count() %>% arrange(desc(n))

write.csv(diseasecounts, "diseasecounts.csv")

The file will be saved into your working directory. You can see where your working directory is set using the getwd() function. More details on working directories can be found in An Introduction to R.
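
By default write.csv() also writes the row names as an unnamed first column. If you do not want that column, row.names = FALSE leaves it out. The sketch below writes a small made-up table (toycounts is hypothetical) to a temporary folder and reads it back to check the result.

```r
# Hypothetical summary table with two of the disease counts shown earlier
toycounts <- data.frame(DISEASE = c("BLS", "DED"), n = c(279L, 144L))

# tempdir() is used here so the example does not clutter your working directory
outfile <- file.path(tempdir(), "diseasecounts.csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(toycounts, outfile, row.names = FALSE)

# Read the file back in to confirm what was saved
readback <- read.csv(outfile)
names(readback)
```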

ifelse() statements

Another common form of logical testing in R is the ifelse() function. You pass it a logical test, and for each element it returns one value where the test is true and another value where it is false. This can be used to make new variables, subset data, color points on a graph and much more.

Let’s annotate the urban tree data according to whether or not the tree is in fair condition and located in ward 6.

head(ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", "other"))
[1] "other"               "fair tree in ward 6" "other"              
[4] "other"               "other"               "other"              

# now we can add this to our tree dataset
urbantrees$wardsixfair <- ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", "other")

#and take a look at our new variable and double check that it worked as intended
urbantrees %>% select(CMMN_NM, CONDITION, WARD, wardsixfair) %>% head(10)
                   CMMN_NM CONDITION WARD         wardsixfair
1        Rock chestnut oak Excellent    6               other
2                Red maple      Fair    6 fair tree in ward 6
3     Columnar English oak      Fair    1               other
4          American linden      Good    6               other
5             Norway maple      Good    6               other
6              Overcup oak      Dead    1               other
7  Redmond American Linden      Good    6               other
8          New Harmony elm Excellent    1               other
9                  Pin oak      Fair    6 fair tree in ward 6
10              Willow oak Excellent    1               other
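
One thing to watch for: the WARD column contains some NA values, and ifelse() returns NA wherever its test is NA. The sketch below shows this on two short made-up vectors (condition and ward are hypothetical) along with one possible workaround using %in%, which returns FALSE rather than NA for missing values.

```r
# Hypothetical values; the third tree has no ward recorded
condition <- c("Fair", "Good", "Fair", "Fair")
ward <- c(6, 6, NA, 1)

# TRUE & NA is NA, so the third tree is labeled NA instead of "other"
naive <- ifelse(condition == "Fair" & ward == 6, "fair tree in ward 6", "other")
naive

# ward %in% 6 is FALSE for the missing ward, so every tree gets a label
safer <- ifelse(condition == "Fair" & ward %in% 6, "fair tree in ward 6", "other")
safer
```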

Try it

ifelse() statements can also be nested. How might you write code to output the annotation “fair tree in ward 6” for fair trees in ward 6, as well as the annotation “good tree in ward 6” for good trees in ward 6? You can put these ifelse() statements in the same line of code.

Click for solution
ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", ifelse(urbantrees$CONDITION == "Good" & urbantrees$WARD == "6", "good tree in ward 6", "other"))
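
Nested ifelse() calls become hard to read as the number of conditions grows. dplyr's case_when() is one alternative that lists each condition on its own line. This is a sketch on made-up vectors (condition and ward are hypothetical), assuming dplyr is loaded.

```r
library(dplyr)

# Hypothetical values standing in for urbantrees$CONDITION and urbantrees$WARD
condition <- c("Fair", "Good", "Excellent", "Fair")
ward <- c(6, 6, 6, 1)

# Conditions are checked in order; TRUE acts as the catch-all at the end
annotation <- case_when(
  condition == "Fair" & ward == 6 ~ "fair tree in ward 6",
  condition == "Good" & ward == 6 ~ "good tree in ward 6",
  TRUE ~ "other"
)

annotation
```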

Plotting data

For this tutorial, we will use ggplot2 to plot data. In this package, you initialize a ggplot() object and then add aesthetic layers such as color controls, lines, points or text annotations.

First, we will make a basic scatterplot. This shows the perimeter of the crown by the mean crown height. Points are colored according to ward number.

ggplot(urbantrees, aes(PERIM, MAX_MEAN, color=as.factor(WARD))) + geom_point() + ggtitle("DC Tree Attributes")

Customizing aesthetics

There are multiple aesthetic parameters that can be customized in ggplots. These include color, fill, linetype, size, shape, font, and more. Which ones are available depends on the geom you are working with. We will explore some of these graphical parameters further as this tutorial introduces different geoms. Here is a vignette about aesthetic customization in ggplot2.
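
An aesthetic can either be mapped to a variable (inside aes()) or set to one fixed value (outside aes()). The sketch below shows both on a made-up data frame (toytrees is hypothetical, with a few values borrowed from the first rows of the dataset); the specific color, size and shape are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  PERIM = c(65.6, 183.7, 170.6, 164.0),
  MAX_MEAN = c(14.3, 30.7, 21.3, 22.6),
  WARD = c(6, 6, 1, 6)
)

# Mapped: color varies with a variable, so it goes inside aes()
mapped <- ggplot(toytrees, aes(PERIM, MAX_MEAN, color = as.factor(WARD))) +
  geom_point()

# Set: one fixed value for every point, so it goes outside aes()
fixed <- ggplot(toytrees, aes(PERIM, MAX_MEAN)) +
  geom_point(color = "darkgreen", size = 3, shape = 17)

mapped
fixed
```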

Different geoms

geom_col()
geom_point()
geom_line()
geom_smooth()
geom_histogram()
geom_boxplot()
geom_text()
geom_density()
geom_errorbar()
geom_hline()
geom_abline()

Bar plots

Bar plots are great for showing frequencies or proportions across different groups. For instance, we may want to calculate the number of pawpaw trees per ward and then plot this in a bar graph with ggplot2.

npawpawbyward <- urbantrees %>% group_by(WARD, CMMN_NM) %>% filter(CMMN_NM == "Pawpaw") %>% count()

ggplot(npawpawbyward, aes(x=WARD, y=n)) + geom_col()
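
geom_col() expects the counts to be calculated already. As an alternative sketch, geom_bar() counts the rows in each group itself, and wrapping WARD in factor() treats the ward numbers as categories rather than a continuous axis. The data frame below is made up (toytrees and pawpaws are hypothetical).

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(
  CMMN_NM = c("Pawpaw", "Pawpaw", "Red maple", "Pawpaw"),
  WARD = c(3, 3, 6, 5)
)

# Keep one row per pawpaw tree; no counting step needed
pawpaws <- toytrees %>% filter(CMMN_NM == "Pawpaw")

# geom_bar() counts rows per ward; factor() makes the x axis discrete
p <- ggplot(pawpaws, aes(x = factor(WARD))) +
  geom_bar() +
  labs(x = "Ward", y = "Number of pawpaw trees")

p
```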

Try it

  1. Create a new dataframe summarizing the number of maple trees with each type of disease. Remove all rows with blank cells. Hint: Observations with blank cells are marked by an empty string "".
  2. Plot the data as a barplot in order from least to most common disease. Hint: Use the reorder() function to control the order of the x axis variable.
  3. Flip the coordinates of the plot using coord_flip().
  4. Finish your plot by adding in x and y axis labels and a title. Hint: Use the labs() and ggtitle() functions.
Click for solution

ndisease <- urbantrees %>% group_by(CMMN_NM, DISEASE) %>% filter(CMMN_NM == "Red maple" & DISEASE != "") %>% count()

ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col() + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")

Colors

R has many built-in colors. You can view them by using the colors() function.
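
colors() returns a character vector of color names, so the usual vector tools work on it. A short sketch for browsing and searching the names (allcolors and greens are hypothetical variable names):

```r
# Store the built-in color names in a vector
allcolors <- colors()

# How many are there, and what do the first few look like?
length(allcolors)
head(allcolors)

# Search for color names containing "green"
greens <- grep("green", allcolors, value = TRUE)
head(greens)

# Check whether a particular name is available
"forestgreen" %in% allcolors
```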

Let’s add color to our plot of maple tree diseases. You can directly assign a color as an aesthetic trait in ggplot or assign the colors to a variable.

Try it

  1. In the geom_col() function of your previous plot code, add in colors with both the fill= and color= arguments.
  2. Try setting fill to the disease variable inside the aes() call of your ggplot() function. What happens?
Click for solution
ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col(fill="green", color="blue") + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")


ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n, fill=DISEASE)) + geom_col() + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")

Combining multiple plots into a single figure

Sometimes it is helpful to combine multiple plots into a single output. The package ggpubr can help with this.

library(ggpubr)

barplot <- ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col(fill="green", color="blue") + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")

boxplot <- ggboxplot(urbantrees, "WARD", "DBH", outliers = FALSE, title="DBH of trees by Ward")
names(urbantrees)
 [1] "X"                "Y"                "SCI_NM"           "CMMN_NM"         
 [5] "GENUS_NAME"       "FAM_NAME"         "DATE_PLANT"       "FACILITYID"      
 [9] "VICINITY"         "WARD"             "TBOX_L"           "TBOX_W"          
[13] "WIRES"            "CURB"             "SIDEWALK"         "TBOX_STAT"       
[17] "RETIREDDT"        "DBH"              "DISEASE"          "PESTS"           
[21] "CONDITION"        "CONDITIODT"       "OWNERSHIP"        "TREE_NOTES"      
[25] "MBG_WIDTH"        "MBG_LENGTH"       "MBG_ORIENTATION"  "MAX_CROWN_HEIGHT"
[29] "MAX_MEAN"         "MIN_CROWN_BASE"   "DTM_MEAN"         "PERIM"           
[33] "CROWN_AREA"       "CICADA_SURVEY"    "ONEYEARPHOTO"     "SPECIALPHOTO"    
[37] "PHOTOREMARKS"     "ELEVATION"        "SIGN"             "TRRS"            
[41] "WARRANTY"         "CREATED_USER"     "CREATED_DATE"     "EDITEDBY"        
[45] "LAST_EDITED_USER" "LAST_EDITED_DATE" "GIS_ID"           "GLOBALID"        
[49] "CREATOR"          "CREATED"          "EDITOR"           "EDITED"          
[53] "SHAPE"            "OBJECTID"         "wardsixfair"     

ggarrange(barplot, boxplot, nrow=2)
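
ggarrange() can also label each panel, and the combined figure can be saved to a file with ggsave() from ggplot2. The sketch below assumes ggpubr is installed and uses a made-up data frame (toytrees is hypothetical); the file name and dimensions are arbitrary.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical toy data standing in for urbantrees
toytrees <- data.frame(WARD = c(1, 1, 6, 6), DBH = c(10.9, 9.3, 5.7, 17.7))

# Two simple plots to combine
p1 <- ggplot(toytrees, aes(x = factor(WARD))) + geom_bar()
p2 <- ggplot(toytrees, aes(x = factor(WARD), y = DBH)) + geom_boxplot()

# labels adds a letter to the corner of each panel
combined <- ggarrange(p1, p2, nrow = 2, labels = c("A", "B"))

# Save to a temporary folder; width and height are in inches by default
outfile <- file.path(tempdir(), "combined.png")
ggsave(outfile, combined, width = 6, height = 8)
```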