This lesson introduces R basics including exploratory data analysis
with the tidyverse package in R.
Base R has many useful functions, but where R really shines is through the 22,977 (and counting) packages that you can download to extend R’s functionality.
Let’s install and then load the tidyverse suite of
packages. You only need to install a package once, but you have to load
the library every time you start a new R session.
#install.packages("tidyverse")
library(tidyverse)
Packages can also be installed by using the “Tools” –> “Install Packages” menu in RStudio.
Let’s start working with some real data. Here we will work with the Open Data DC Urban Forestry Street Trees dataset. First, download the dataset and then we will load it into R. For the tutorial, I will be loading an older version of this file that I have uploaded online. This means our output may look a bit different.
Details on how to read data files from a Windows operating system: intro2r link.
urbantrees <- read.csv("https://maddiebrown.github.io/ANTH630/data/Urban_Forestry_Street_Trees_2024.csv")
Let’s examine the structure of our dataset.
str(urbantrees)
'data.frame': 211117 obs. of 54 variables:
$ X : num -77 -77 -77 -77 -77 ...
$ Y : num 38.9 38.9 38.9 38.9 38.9 ...
$ SCI_NM : chr "Quercus montana" "Acer rubrum" "Quercus robur fastigiata" "Tilia americana" ...
$ CMMN_NM : chr "Rock chestnut oak" "Red maple" "Columnar English oak" "American linden" ...
$ GENUS_NAME : chr "Quercus" "Acer" "Quercus" "Tilia" ...
$ FAM_NAME : chr "Fagaceae" "Sapindaceae" "Fagaceae" "Tiliaceae" ...
$ DATE_PLANT : chr "2018/02/01 18:50:34+00" "" "" "" ...
$ FACILITYID : chr "31982-090-3001-0269-000" "31982-100-3005-0155-000" "10150-300-3001-0050-000" "32691-092-3001-0105-000" ...
$ VICINITY : chr "922 C ST SE" "1017 C ST SE" "3029 15TH ST NW" "904 D ST SE" ...
$ WARD : int 6 6 1 6 6 1 6 1 6 1 ...
$ TBOX_L : num 99 8 6 9 8 99 12 9 9 12 ...
$ TBOX_W : num 7 4 3 4 4 4 4 3 4 5 ...
$ WIRES : chr "None" "None" "None" "None" ...
$ CURB : chr "Permanent" "Permanent" "Permanent" "Permanent" ...
$ SIDEWALK : chr "Permanent" "Permanent" "Permanent" "Permanent" ...
$ TBOX_STAT : chr "Plant" "Plant" "Plant" "Plant" ...
$ RETIREDDT : chr "" "" "" "" ...
$ DBH : num 5.7 17.7 10.9 13.4 11.9 9.3 1.6 5.5 24.5 21 ...
$ DISEASE : chr "" "" "" "" ...
$ PESTS : chr "" "" "" "" ...
$ CONDITION : chr "Excellent" "Fair" "Fair" "Good" ...
$ CONDITIODT : chr "2024/02/28 23:57:09+00" "2021/02/17 22:21:46+00" "2021/09/13 18:55:03+00" "2020/02/14 01:33:24+00" ...
$ OWNERSHIP : chr "UFA" "UFA" "UFA" "UFA" ...
$ TREE_NOTES : chr "Elevated street side. Feb 2024." "P dead wood only and r small mulberry at base, be careful of roots" "" "" ...
$ MBG_WIDTH : num 13.1 39.4 29.5 29.5 39.4 ...
$ MBG_LENGTH : num 19.7 45.9 46.5 45.9 45.9 ...
$ MBG_ORIENTATION : num 90 90 163 0 90 ...
$ MAX_CROWN_HEIGHT: num 18.9 45.9 37.4 41.5 32.6 ...
$ MAX_MEAN : num 14.3 30.7 21.3 22.6 21.2 ...
$ MIN_CROWN_BASE : num 0.0533 -0.1557 -0.2178 0.1589 -0.1809 ...
$ DTM_MEAN : num 82.3 81.2 202.9 77 81.1 ...
$ PERIM : num 65.6 183.7 170.6 164 177.2 ...
$ CROWN_AREA : num 215 1259 743 1130 1119 ...
$ CICADA_SURVEY : chr "" "" "" "" ...
$ ONEYEARPHOTO : chr "" "" "" "" ...
$ SPECIALPHOTO : chr "" "" "" "" ...
$ PHOTOREMARKS : chr "" "" "" "" ...
$ ELEVATION : chr "Unknown" "Unknown" "Unknown" "Unknown" ...
$ SIGN : chr "Unknown" "Unknown" "Unknown" "Unknown" ...
$ TRRS : int NA NA NA NA NA NA NA NA NA NA ...
$ WARRANTY : chr "2017-2018" "Unknown" "Unknown" "Unknown" ...
$ CREATED_USER : chr "" "" "" "" ...
$ CREATED_DATE : chr "" "" "" "" ...
$ EDITEDBY : chr "sward" "jchapman" "mmcphee" "sward" ...
$ LAST_EDITED_USER: chr "sward" "jchapman" "mmcphee" "sward" ...
$ LAST_EDITED_DATE: chr "2024/02/28 23:57:52+00" "2021/02/17 22:21:47+00" "2021/09/13 18:54:32+00" "2020/02/14 01:34:14+00" ...
$ GIS_ID : logi NA NA NA NA NA NA ...
$ GLOBALID : chr "{0B358D52-AAD4-41AC-B1AF-B19740DBC02A}" "{0F7845B3-E5DE-480B-96EC-B595354BCA5C}" "{EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7}" "{ADB853B2-E32F-4BB4-B949-DE7B5656DCD5}" ...
$ CREATOR : logi NA NA NA NA NA NA ...
$ CREATED : logi NA NA NA NA NA NA ...
$ EDITOR : logi NA NA NA NA NA NA ...
$ EDITED : logi NA NA NA NA NA NA ...
$ SHAPE : logi NA NA NA NA NA NA ...
$ OBJECTID : int 40100904 40100905 40100906 40100907 40100908 40100909 40100910 40100911 40100912 40101121 ...
We can also look at the first six rows.
head(urbantrees)
X Y SCI_NM CMMN_NM GENUS_NAME
1 -76.99281 38.88609 Quercus montana Rock chestnut oak Quercus
2 -76.99206 38.88599 Acer rubrum Red maple Acer
3 -77.03567 38.92727 Quercus robur fastigiata Columnar English oak Quercus
4 -76.99334 38.88417 Tilia americana American linden Tilia
5 -76.99838 38.88728 Acer platanoides Norway maple Acer
6 -77.03931 38.92800 Quercus lyrata Overcup oak Quercus
FAM_NAME DATE_PLANT FACILITYID VICINITY
1 Fagaceae 2018/02/01 18:50:34+00 31982-090-3001-0269-000 922 C ST SE
2 Sapindaceae 31982-100-3005-0155-000 1017 C ST SE
3 Fagaceae 10150-300-3001-0050-000 3029 15TH ST NW
4 Tiliaceae 32691-092-3001-0105-000 904 D ST SE
5 Sapindaceae 30060-020-3001-0101-000 208 6TH ST SE
6 Fagaceae 2011/02/17 05:00:00+00 14582-160-3005-0656-000 1653 HOBART ST NW
WARD TBOX_L TBOX_W WIRES CURB SIDEWALK TBOX_STAT RETIREDDT DBH
1 6 99 7 None Permanent Permanent Plant 5.7
2 6 8 4 None Permanent Permanent Plant 17.7
3 1 6 3 None Permanent Permanent Plant 10.9
4 6 9 4 None Permanent Permanent Plant 13.4
5 6 8 4 None Permanent Permanent Plant 11.9
6 1 99 4 None Permanent Permanent Plant 9.3
DISEASE PESTS CONDITION CONDITIODT OWNERSHIP
1 Excellent 2024/02/28 23:57:09+00 UFA
2 Fair 2021/02/17 22:21:46+00 UFA
3 Fair 2021/09/13 18:55:03+00 UFA
4 Good 2020/02/14 01:33:24+00 UFA
5 Good 2020/09/16 19:38:17+00 UFA
6 Hypoxylon Dead 2023/05/22 19:49:55+00 UFA
TREE_NOTES MBG_WIDTH
1 Elevated street side. Feb 2024. 13.12336
2 P dead wood only and r small mulberry at base, be careful of roots 39.37008
3 29.53926
4 29.52756
5 Arborist removed some deadwood and scheduled for pruning on 1/5/17 39.37008
6 29.52756
MBG_LENGTH MBG_ORIENTATION MAX_CROWN_HEIGHT MAX_MEAN MIN_CROWN_BASE DTM_MEAN
1 19.68504 90.0000 18.91814 14.26427 0.05331409 82.26296
2 45.93176 90.0000 45.90728 30.68867 -0.15571680 81.22527
3 46.50863 163.3008 37.41346 21.32403 -0.21777124 202.87526
4 45.93176 0.0000 41.53025 22.57074 0.15885049 77.00650
5 45.93176 90.0000 32.59466 21.19249 -0.18092014 81.08643
6 32.80840 0.0000 37.61407 18.87533 -0.57492221 187.02505
PERIM CROWN_AREA CICADA_SURVEY ONEYEARPHOTO SPECIALPHOTO PHOTOREMARKS
1 65.6168 215.2780
2 183.7270 1259.3763
3 170.6037 742.7091
4 164.0420 1130.2095
5 177.1654 1119.4456
6 144.3570 688.8896
ELEVATION SIGN TRRS WARRANTY CREATED_USER CREATED_DATE EDITEDBY
1 Unknown Unknown NA 2017-2018 sward
2 Unknown Unknown NA Unknown jchapman
3 Unknown Unknown NA Unknown mmcphee
4 Unknown Unknown NA Unknown sward
5 Unknown Unknown NA Unknown sward
6 Unknown Unknown NA 2010-2011 jmiller
LAST_EDITED_USER LAST_EDITED_DATE GIS_ID
1 sward 2024/02/28 23:57:52+00 NA
2 jchapman 2021/02/17 22:21:47+00 NA
3 mmcphee 2021/09/13 18:54:32+00 NA
4 sward 2020/02/14 01:34:14+00 NA
5 sward 2020/09/16 19:51:11+00 NA
6 jmiller 2023/05/22 19:50:08+00 NA
GLOBALID CREATOR CREATED EDITOR EDITED SHAPE
1 {0B358D52-AAD4-41AC-B1AF-B19740DBC02A} NA NA NA NA NA
2 {0F7845B3-E5DE-480B-96EC-B595354BCA5C} NA NA NA NA NA
3 {EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7} NA NA NA NA NA
4 {ADB853B2-E32F-4BB4-B949-DE7B5656DCD5} NA NA NA NA NA
5 {300EF1F5-F440-4E16-BBC0-69C6CDD772CA} NA NA NA NA NA
6 {0BEFB0A1-AAF4-4958-849C-CFBFBA3D4E78} NA NA NA NA NA
OBJECTID
1 40100904
2 40100905
3 40100906
4 40100907
5 40100908
6 40100909
In tidyverse, the basic operator for linking functions is
%>%, the pipe operator. We can use it to string many
functions together.
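To see what the pipe does, the sketch below writes the same computation twice: once as nested calls and once as a pipeline. The vector `heights` is made up for illustration and is not part of the street trees dataset.

```r
library(dplyr)  # attaches the %>% operator

# Hypothetical tree heights, for illustration only
heights <- c(18.9, 45.9, 37.4, 41.5)

# Nested calls are read from the inside out
nested <- round(mean(heights), 1)

# The pipe passes the result on its left as the first argument of the next function
piped <- heights %>% mean() %>% round(1)

# Both approaches give the same answer
identical(nested, piped)
```

The piped version reads left to right in the order the steps happen, which becomes more helpful as more functions are chained.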
The basic function for subsetting columns/variables in tidyverse is
select().
urbantrees %>% select(CMMN_NM)
The basic function for selecting particular rows is
filter().
urbantrees %>% filter(CMMN_NM == "Red maple" & DISEASE == "Ganoderma Root Rot")
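The difference between combining conditions with & and with | is easiest to see on a small table. The data frame `toytrees` below is a hypothetical stand-in for the real dataset, so the row counts are only illustrative.

```r
library(dplyr)

# Hypothetical mini version of the tree data, for illustration only
toytrees <- data.frame(
  CMMN_NM = c("Red maple", "Red maple", "Pin oak", "Willow oak"),
  DISEASE = c("Ganoderma Root Rot", "", "Ganoderma Root Rot", ""),
  WARD    = c(6, 1, 6, 1)
)

# & keeps rows where both conditions are TRUE
both <- toytrees %>%
  filter(CMMN_NM == "Red maple" & DISEASE == "Ganoderma Root Rot")

# | keeps rows where at least one condition is TRUE
either <- toytrees %>%
  filter(CMMN_NM == "Red maple" | DISEASE == "Ganoderma Root Rot")

# select() accepts several columns at once, and can follow filter() in a pipeline
either %>% select(CMMN_NM, WARD)
```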
We can also select all the unique observations within a particular variable. For example, we might be interested in knowing what all the unique ward names are.
urbantrees %>% distinct(WARD)
WARD
1 6
2 1
3 2
4 7
5 8
6 4
7 3
8 5
9 NA
10 10
11 0
12 9
13 88
14 99
We can also ask R to tell us how many distinct values there are within a variable.
n_distinct(urbantrees$FAM_NAME)
[1] 104
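Note that missing values count as their own distinct value, as the NA row in the ward list suggests. A minimal sketch with made-up ward numbers shows how the count changes when NA is dropped.

```r
library(dplyr)

# Hypothetical ward values, for illustration only
wards <- c(6, 1, 6, NA, 1, NA)

# By default NA is counted as one of the distinct values
n_with_na <- n_distinct(wards)

# na.rm = TRUE drops missing values before counting
n_without_na <- n_distinct(wards, na.rm = TRUE)
```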
Recalling what we learned about subsetting dataframes, try to
complete the following tasks using base R and/or
tidyverse.
urbantrees %>% filter(GENUS_NAME=="Quercus") %>% head()
X Y SCI_NM CMMN_NM GENUS_NAME
1 -76.99281 38.88609 Quercus montana Rock chestnut oak Quercus
2 -77.03567 38.92727 Quercus robur fastigiata Columnar English oak Quercus
3 -77.03931 38.92800 Quercus lyrata Overcup oak Quercus
4 -77.00198 38.88539 Quercus palustris Pin oak Quercus
5 -77.04009 38.93254 Quercus phellos Willow oak Quercus
6 -77.04090 38.92535 Quercus palustris Pin oak Quercus
FAM_NAME DATE_PLANT FACILITYID VICINITY
1 Fagaceae 2018/02/01 18:50:34+00 31982-090-3001-0269-000 922 C ST SE
2 Fagaceae 10150-300-3001-0050-000 3029 15TH ST NW
3 Fagaceae 2011/02/17 05:00:00+00 14582-160-3005-0656-000 1653 HOBART ST NW
4 Fagaceae 30030-030-3001-0237-000 OPP 319 3RD ST SE
5 Fagaceae 16890-178-3005-0043-000 1737 PARK RD NW
6 Fagaceae 15408-165-3005-0467-000 1741 LANIER PL NW
WARD TBOX_L TBOX_W WIRES CURB SIDEWALK TBOX_STAT RETIREDDT DBH
1 6 99 7 None Permanent Permanent Plant 5.7
2 1 6 3 None Permanent Permanent Plant 10.9
3 1 99 4 None Permanent Permanent Plant 9.3
4 6 9 4 None Permanent Permanent Plant 24.5
5 1 12 5 None Permanent Permanent Plant 21.0
6 1 99 5 None Permanent Flexipave Plant 28.1
DISEASE PESTS CONDITION CONDITIODT OWNERSHIP
1 Excellent 2024/02/28 23:57:09+00 UFA
2 Fair 2021/09/13 18:55:03+00 UFA
3 Hypoxylon Dead 2023/05/22 19:49:55+00 UFA
4 Fair 2020/11/16 21:32:38+00 UFA
5 Excellent 2022/11/18 21:24:48+00 UFA
6 Fair 2022/08/18 19:26:54+00 UFA
TREE_NOTES
1 Elevated street side. Feb 2024.
2
3
4 Bread loaf-sized Inonatus at base. Three.“Black crust” Kretzschmeria conk, fist-sized, on root flare, edge of sidewalk. Grew one inch DBH since 2017. Another shelf conk at 15’ up. Dieback sprinkled thru crown, June 2019.
5
6 P. Beginning of bls potentiallyWash gas disrupted soil
MBG_WIDTH MBG_LENGTH MBG_ORIENTATION MAX_CROWN_HEIGHT MAX_MEAN MIN_CROWN_BASE
1 13.12336 19.68504 90.00000 18.91814 14.26427 0.05331409
2 29.53926 46.50863 163.30076 37.41346 21.32403 -0.21777124
3 29.52756 32.80840 0.00000 37.61407 18.87533 -0.57492221
4 39.37008 65.61680 90.00000 67.73044 56.71571 0.01390713
5 38.23960 78.39118 150.94540 54.32866 35.59290 -0.13329457
6 55.73578 83.25091 53.74616 61.28306 41.10743 -1.44659974
DTM_MEAN PERIM CROWN_AREA CICADA_SURVEY ONEYEARPHOTO SPECIALPHOTO
1 82.26296 65.6168 215.2780
2 202.87526 170.6037 742.7091
3 187.02505 144.3570 688.8896
4 72.45985 216.5354 1668.4045
5 198.98500 249.3438 1636.1128
6 186.02665 295.2756 2755.5584
PHOTOREMARKS ELEVATION SIGN TRRS WARRANTY CREATED_USER CREATED_DATE
1 Unknown Unknown NA 2017-2018
2 Unknown Unknown NA Unknown
3 Unknown Unknown NA 2010-2011
4 Unknown Unknown NA Unknown
5 Unknown Unknown NA Unknown
6 Unknown Unknown NA
EDITEDBY LAST_EDITED_USER LAST_EDITED_DATE GIS_ID
1 sward sward 2024/02/28 23:57:52+00 NA
2 mmcphee mmcphee 2021/09/13 18:54:32+00 NA
3 jmiller jmiller 2023/05/22 19:50:08+00 NA
4 sward sward 2020/11/16 21:32:41+00 NA
5 jmiller jmiller 2022/11/18 21:23:51+00 NA
6 mmcphee mmcphee 2022/08/18 19:26:19+00 NA
GLOBALID CREATOR CREATED EDITOR EDITED SHAPE
1 {0B358D52-AAD4-41AC-B1AF-B19740DBC02A} NA NA NA NA NA
2 {EA1C7F1D-8FF6-4A3A-BFBD-0147BABCA5F7} NA NA NA NA NA
3 {0BEFB0A1-AAF4-4958-849C-CFBFBA3D4E78} NA NA NA NA NA
4 {CFA5BDF4-B306-4D54-A501-FCE33E3C5146} NA NA NA NA NA
5 {A09B6E85-1A6C-4A13-8011-E93255AEAF21} NA NA NA NA NA
6 {4B386921-E1E9-455D-B878-8488AD418224} NA NA NA NA NA
OBJECTID
1 40100904
2 40100906
3 40100909
4 40100912
5 40101121
6 40101124
urbantrees %>% filter(FAM_NAME=="Rosaceae") %>% distinct(CMMN_NM)
CMMN_NM
1 Bradford callery pear
2 Cherry
3 Shadblow serviceberry
4 Prunus x yedoensis
5 Cherry (Snowgoose)
6 Purple leaf plum
7 Crabapple
8 Alleghany serviceberry
9 Yoshino cherry
10 Chokecherry
11 Okame cherry
12 Kwanzan cherry
13 Downy serviceberry
14
15 Arnold crabapple
16 Golden rain tree
17 Serviceberry
18 Autumn brilliance service berry
19 Donald Wyman Crabapple
20 Adirondack Crabapple
21 Whitehouse callery pear
22 Crimson Cloud hawthorn
23 Honeylocust
24 Crabapple (Harvest Gold)
25 Crape myrtle
26 Radiant crabapple
27 Washington hawthorn
28 Eastern redbud
29 Japanese Apricot
30 Lavalle hawthorn
31 Redbud
32 Other (See Notes)
33 Snowdrift crabapple
34 Prunus x yodoensis
35 American hornbeam
36 Canada Red Chekecherry
37 Winter King Green hawthorn
38 Blackgum
39 Hackberry
40 Snowgoose cherry
41 Ivory Silk Japanese tree lilac
42 Trident maple
43 Chinese pistache
44 Thunder cloud plum
45 Higan Cherry
46 Swamp white oak
47 Kentucky coffeetree
48 Flowering Dogwood
49 Silver maple
50 Yellowwood
51 Red horsechestnut
52 Hardy Rubber Tree
53 Hedge maple
54 River birch
55 Moonglow Sweet Bay Magnolia
56 Elm
57 Autumn Brilliance serviceberry
58 Lilac
59 Chinese flame tree
60 Sweetbay magnolia
61 Bald cypress
62 Deodar cedar
63 Scarlet oak
64 Autumn Brilliance Apple serviceberry
65 Staghorn sumac
66 Japanese zelkova
67 Green Vase Japanese zelkova
68 American sycamore
69 Chinese elm
70 Shademaster honeylocust
71 Dura heat' river birch
72 Carolina silverbell
73 Cornelian Cherry
74 Black Cherry
75 Bur oak
76 Southern magnolia
77 Tuliptree
78 Katsuratree
79 Persimmon
80 Autumn Brilliance Serviceberry
81 Thunder cloud plum
82 Sweetgum
83 Red oak
84 Willow oak
85 London plane tree
urbantrees %>% filter(CMMN_NM=="Bur oak") %>% distinct(FAM_NAME)
FAM_NAME
1 Fagaceae
2 Fagaceae
3 Sapindaceae
4 Ulmaceae
5
6 Rosaceae
7 Null
In order to answer questions about our data, we need to summarize it in various ways. Below are two ways to make a table of the counts of the number of trees that have various diseases.
table(urbantrees$DISEASE)
Armillaria Root Rot B&B BLS
209039 35 2 279
Butt Rot DED Ganoderma Root Rot Hypoxylon
152 144 441 222
jchapman jconlon jmiller mlehtonen
5 1 14 1
mmcphee msampson None present Powdery Mildew
1 4 191 31
Root Rot sdoan smckim sward
74 3 1 8
Trunk Root Trunk Rot
40 429
urbantrees %>% group_by(DISEASE) %>% count() %>% arrange(desc(n))
# A tibble: 22 × 2
# Groups: DISEASE [22]
DISEASE n
<chr> <int>
1 "" 209039
2 "Ganoderma Root Rot" 441
3 "Trunk Rot" 429
4 "BLS" 279
5 "Hypoxylon" 222
6 "None present" 191
7 "Butt Rot" 152
8 "DED" 144
9 "Root Rot" 74
10 "Trunk Root" 40
# ℹ 12 more rows
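The group_by(), count() and arrange() chain can also be written more compactly. The sketch below, on a hypothetical `toytrees` table, uses count() with sort = TRUE and additionally converts blank strings to NA with na_if() so that unrecorded diseases are treated as missing. Whether blanks should count as missing in the real dataset is an assumption you would need to check.

```r
library(dplyr)

# Hypothetical disease column, for illustration only
toytrees <- data.frame(
  DISEASE = c("", "BLS", "", "Trunk Rot", "BLS", "")
)

disease_counts <- toytrees %>%
  mutate(DISEASE = na_if(DISEASE, "")) %>%  # treat blank strings as missing
  count(DISEASE, sort = TRUE)               # count each value, largest count first

disease_counts
```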
In tidyverse we can also create new summarized
dataframes, such as the one below that tells us the mean height of the
trees as well as the tallest height and the genus of the tallest
tree.
# which.max() gives the row position of the tallest tree, which we use to look up its genus
urbantrees %>% summarise(meanheight=mean(MAX_CROWN_HEIGHT, na.rm=TRUE), maxheight=max(MAX_CROWN_HEIGHT, na.rm=TRUE), tallestspecies=GENUS_NAME[which.max(MAX_CROWN_HEIGHT)])
meanheight maxheight tallestspecies
1 36.66681 182.9099 Ulmus
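When looking up which row holds a maximum, it helps to separate the largest value (max()) from the position of the largest value (which.max()). Only the position can be used to index another column. Here is a minimal sketch on made-up data; the table `toytrees` and the column name `tallestgenus` are hypothetical.

```r
library(dplyr)

# Hypothetical heights, including one missing value
toytrees <- data.frame(
  GENUS_NAME       = c("Quercus", "Acer", "Ulmus", "Tilia"),
  MAX_CROWN_HEIGHT = c(18.9, 45.9, NA, 41.5)
)

toysummary <- toytrees %>%
  summarise(
    meanheight   = mean(MAX_CROWN_HEIGHT, na.rm = TRUE),
    maxheight    = max(MAX_CROWN_HEIGHT, na.rm = TRUE),
    # which.max() returns the index of the largest value and skips NA
    tallestgenus = GENUS_NAME[which.max(MAX_CROWN_HEIGHT)]
  )

toysummary
```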
Use the | operator in your filter() function
to keep all rows matching either condition.
urbantrees %>% group_by(GENUS_NAME) %>% count()
table(urbantrees$GENUS_NAME)
urbantrees %>% group_by(WARD) %>% count() %>% arrange(desc(n))
urbantrees %>% group_by(WARD, CMMN_NM) %>% filter(CMMN_NM == "Pawpaw" | CMMN_NM == "Hickory") %>% count()
Now that we’ve made some summary tables of our tree data, we might want to download these tables to our computers. Let’s export this table as a .csv file.
diseasecounts <- urbantrees %>% group_by(DISEASE) %>% count() %>% arrange(desc(n))
write.csv(diseasecounts, "diseasecounts.csv")
The file will be saved into your working directory. You can see where
your working directory is set using the getwd() function.
More details on working directories can be found in An Introduction to
R.
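If you want more control over where the file goes, you can build the path yourself. The sketch below writes a small, made-up table to R's temporary folder and reads it back in; the object and file names are hypothetical.

```r
# Hypothetical summary table, for illustration only
diseasecounts_toy <- data.frame(DISEASE = c("BLS", "Trunk Rot"), n = c(279, 429))

# Build a path inside a temporary folder so the example does not clutter your working directory
outfile <- file.path(tempdir(), "diseasecounts_toy.csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(diseasecounts_toy, outfile, row.names = FALSE)

# Read the file back in to confirm it was written as expected
readback <- read.csv(outfile)
readback
```

Without row.names = FALSE, write.csv() adds an unnamed first column of row numbers, which shows up as a column called X when the file is read back into R.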
ifelse() statements
Another common form of logical testing in R is the
ifelse() statement. In this case, you pass a logical test
to R and if the output is true, a certain action is performed, then if
it is false, another action is performed. This can be used to make new
variables, subset data, color points on a graph and much more.
Let’s annotate the urban tree data according to whether or not the tree is in fair condition and located in ward 6.
head(ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", "other"))
[1] "other" "fair tree in ward 6" "other"
[4] "other" "other" "other"
# now we can add this to our tree dataset
urbantrees$wardsixfair <- ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", "other")
#and take a look at our new variable and double check that it worked as intended
urbantrees %>% select(CMMN_NM, CONDITION, WARD, wardsixfair) %>% head(10)
CMMN_NM CONDITION WARD wardsixfair
1 Rock chestnut oak Excellent 6 other
2 Red maple Fair 6 fair tree in ward 6
3 Columnar English oak Fair 1 other
4 American linden Good 6 other
5 Norway maple Good 6 other
6 Overcup oak Dead 1 other
7 Redmond American Linden Good 6 other
8 New Harmony elm Excellent 1 other
9 Pin oak Fair 6 fair tree in ward 6
10 Willow oak Excellent 1 other
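One thing to watch for: the WARD variable contains some NA values, and a logical test involving NA can itself return NA. The sketch below uses made-up vectors to show what ifelse() does in that case, along with one possible way to give every tree a label.

```r
# Hypothetical values, for illustration only
condition <- c("Fair", "Fair", "Good", "Fair")
ward      <- c(6, NA, 6, 1)

# TRUE & NA is NA, so the second tree gets NA rather than "other"
with_na <- ifelse(condition == "Fair" & ward == 6, "fair tree in ward 6", "other")

# %in% returns FALSE for NA, so every tree receives a label
no_na <- ifelse(condition == "Fair" & ward %in% 6, "fair tree in ward 6", "other")
```

Whether a tree with an unknown ward should be labelled "other" or left as NA depends on your analysis, so treat the second version as one option rather than the correct one.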
ifelse() statements can also be nested. How might you
write code to output the annotation “fair tree in ward 6” for fair trees
in ward 6, as well as the annotation “good tree in ward 6” for good
trees in ward 6? You can put these ifelse() statements in
the same line of code.
ifelse(urbantrees$CONDITION == "Fair" & urbantrees$WARD == "6", "fair tree in ward 6", ifelse(urbantrees$CONDITION == "Good" & urbantrees$WARD == "6", "good tree in ward 6", "other"))
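Nested ifelse() calls become hard to read once there are more than two or three cases. As an alternative, dplyr offers case_when(), which checks the conditions in order and uses the first one that matches. The sketch below applies it to a hypothetical `toytrees` table; the column name `wardsixcondition` is also made up.

```r
library(dplyr)

# Hypothetical mini version of the tree data, for illustration only
toytrees <- data.frame(
  CONDITION = c("Fair", "Good", "Good", "Excellent"),
  WARD      = c(6, 6, 1, 6)
)

toytrees <- toytrees %>%
  mutate(wardsixcondition = case_when(
    CONDITION == "Fair" & WARD == 6 ~ "fair tree in ward 6",
    CONDITION == "Good" & WARD == 6 ~ "good tree in ward 6",
    TRUE                            ~ "other"  # fallback for everything else
  ))

toytrees
```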
For this tutorial, we will use ggplot2 to plot data. In
this package, you initialize a ggplot() object and then add
aesthetic layers such as color controls, lines, points or text
annotations.
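The two steps described above (initialize, then add layers) can be sketched separately. The data frame `toytrees` is a hypothetical four-row stand-in for the real dataset.

```r
library(ggplot2)

# Hypothetical mini version of the tree data, for illustration only
toytrees <- data.frame(
  PERIM    = c(65.6, 183.7, 170.6, 164.0),
  MAX_MEAN = c(14.3, 30.7, 21.3, 22.6),
  WARD     = c(6, 6, 1, 6)
)

# Step 1: initialize the plot with the data and aesthetic mappings.
# On its own this draws axes but no points.
p <- ggplot(toytrees, aes(x = PERIM, y = MAX_MEAN, color = as.factor(WARD)))

# Step 2: add layers and labels with +
p <- p + geom_point() + labs(color = "Ward")

# Typing the object's name draws the plot
p
```

Because the plot is stored in an object, you can keep adding layers to `p` later without retyping the earlier code.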
First, we will make a basic scatterplot. This shows the perimeter of the crown by the mean crown height. Points are colored according to ward number.
ggplot(urbantrees, aes(PERIM, MAX_MEAN, color=as.factor(WARD))) + geom_point() + ggtitle("DC Tree Attributes")
There are multiple aesthetic parameters that can be customized in
ggplots. These include color, fill, linetype, size, shape, font, and
more. Which ones apply depends on which geom you are working with.
We will explore some of these graphical parameters further as this
tutorial introduces different geoms. Here
is a vignette about aesthetic customization in ggplot2.
geom_col()
geom_point()
geom_line()
geom_smooth()
geom_histogram()
geom_boxplot()
geom_text()
geom_density()
geom_errorbar()
geom_hline()
geom_abline()
Bar plots are great for showing frequencies or proportions across
different groups. For instance, we may want to calculate the number of
pawpaw trees per ward and then plot this in a bar graph with
ggplot2.
npawpawbyward <- urbantrees %>% group_by(WARD, CMMN_NM) %>% filter(CMMN_NM == "Pawpaw") %>% count()
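ggplot(npawpawbyward, aes(x=WARD, y=n)) + geom_col()
Make a bar plot of the number of Red maple trees with each disease, leaving out trees where DISEASE is "".
Use the reorder() function to control the
order of the x axis variable.
Flip the axes with coord_flip().
Add axis labels and a title with the labs() and ggtitle() functions.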
ndisease <- urbantrees %>% group_by(CMMN_NM, DISEASE) %>% filter(CMMN_NM == "Red maple" & DISEASE != "") %>% count()
ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col() + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")
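To see what reorder() is doing behind the scenes, it can be run on its own outside of a plot. It turns the first argument into a factor whose levels are sorted by the values in the second argument. The vectors below are made up for illustration.

```r
# Hypothetical disease counts, for illustration only
disease <- c("BLS", "Trunk Rot", "DED")
n       <- c(5, 9, 2)

# Levels are ordered from the smallest n to the largest
ascending <- reorder(disease, n)
levels(ascending)

# Negating the values reverses the order
descending <- reorder(disease, -n)
levels(descending)
```

ggplot2 draws a discrete axis in the order of the factor levels, which is why wrapping the x variable in reorder() changes the order of the bars.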
R has many built-in colors. You can view them by using the
colors() function.
Let’s add color to our plot of maple tree diseases. You can directly assign a color as an aesthetic trait in ggplot or assign the colors to a variable.
Within the geom_col() function of your previous plot code,
add in colors with both the fill= and color=
arguments.
Then map fill to the disease variable from
within the aes() argument of your ggplot()
function. What happens?
ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col(fill="green", color="blue") + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")
ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n, fill=DISEASE)) + geom_col() + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")
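The two plots above differ in where the color is specified. A value given inside geom_col() is a fixed setting applied to every bar, while a variable given inside aes() is a mapping, so ggplot2 picks one color per value and adds a legend. The sketch below shows the difference on a hypothetical three-row table.

```r
library(ggplot2)

# Hypothetical disease counts, for illustration only
toydisease <- data.frame(
  DISEASE = c("BLS", "Trunk Rot", "DED"),
  n       = c(5, 9, 2)
)

# Setting: every bar gets the same fill and outline color, no legend
fixedplot <- ggplot(toydisease, aes(x = DISEASE, y = n)) +
  geom_col(fill = "green", color = "blue")

# Mapping: fill depends on the DISEASE variable, and a legend is added
mappedplot <- ggplot(toydisease, aes(x = DISEASE, y = n, fill = DISEASE)) +
  geom_col()

# The mapping is stored with the plot; the fixed setting is not
names(fixedplot$mapping)
names(mappedplot$mapping)
```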
Sometimes it is helpful to combine multiple plots into a single
output. The package ggpubr can help with this.
library(ggpubr)
barplot <- ggplot(ndisease, aes(x=reorder(DISEASE, n), y=n)) + geom_col(fill="green", color="blue") + coord_flip() + labs(x="Disease", y="Number of trees") + ggtitle("Prevalence of diseases in Red Maples in DC")
boxplot <- ggboxplot(urbantrees, "WARD", "DBH", outliers = FALSE, title="DBH of trees by Ward")
names(urbantrees)
[1] "X" "Y" "SCI_NM" "CMMN_NM"
[5] "GENUS_NAME" "FAM_NAME" "DATE_PLANT" "FACILITYID"
[9] "VICINITY" "WARD" "TBOX_L" "TBOX_W"
[13] "WIRES" "CURB" "SIDEWALK" "TBOX_STAT"
[17] "RETIREDDT" "DBH" "DISEASE" "PESTS"
[21] "CONDITION" "CONDITIODT" "OWNERSHIP" "TREE_NOTES"
[25] "MBG_WIDTH" "MBG_LENGTH" "MBG_ORIENTATION" "MAX_CROWN_HEIGHT"
[29] "MAX_MEAN" "MIN_CROWN_BASE" "DTM_MEAN" "PERIM"
[33] "CROWN_AREA" "CICADA_SURVEY" "ONEYEARPHOTO" "SPECIALPHOTO"
[37] "PHOTOREMARKS" "ELEVATION" "SIGN" "TRRS"
[41] "WARRANTY" "CREATED_USER" "CREATED_DATE" "EDITEDBY"
[45] "LAST_EDITED_USER" "LAST_EDITED_DATE" "GIS_ID" "GLOBALID"
[49] "CREATOR" "CREATED" "EDITOR" "EDITED"
[53] "SHAPE" "OBJECTID" "wardsixfair"
ggarrange(barplot, boxplot, nrow=2)