All the steps are outlined here, but here’s a link to the script that will run everything at once.
In which year did I observe the most individual birds? How many?
yearly.df = df %>%
  group_by(year) %>%
  summarise(yearly_total = sum(count), .groups="drop") %>%
  mutate(year = as.numeric(year))
# Year with the largest total. which() returns every year tied for the maximum,
# so no arrange() is needed; with arrange(desc(yearly_total)) it would be yearly.df$year[1].
max_bird_year = yearly.df$year[ which(yearly.df$yearly_total == max(yearly.df$yearly_total)) ]
cat("You observed the most individual birds in", max_bird_year )
## You observed the most individual birds in 2014
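The question also asks how many birds were counted in that year, which the message above does not print. A minimal sketch of pulling both values from a single summary row is shown here on a small made-up data frame (`toy_df` and its values are invented for illustration and are not the eBird data):

```r
library(dplyr)

# Hypothetical toy data standing in for df; values are invented
toy_df <- tibble(
  year  = c(2014, 2014, 2015, 2015, 2016),
  count = c(10, 25, 5, 7, 30)
)

# Total individuals per year, then keep the row with the largest total
top_year <- toy_df %>%
  group_by(year) %>%
  summarise(yearly_total = sum(count), .groups = "drop") %>%
  slice_max(yearly_total, n = 1)

cat("You observed the most individual birds in", top_year$year,
    "with", top_year$yearly_total, "total birds.\n")
```

Because `slice_max()` keeps ties by default, `top_year` can hold more than one row if two years share the maximum.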
In that year how many different species of birds did I observe?
cat("You observed", length( unique( filter(df, year == max_bird_year)$scientific_name ) ), "species of bird in", max_bird_year )
## You observed 210 species of bird in 2014
In which state did I most frequently observe Red-winged Blackbirds?
b.bird = df %>%
  filter( common_name == "Red-winged Blackbird" ) %>%
  mutate(state = substr(location, 4, 5)) %>%   # "US-MO" -> "MO"
  group_by(state) %>%
  summarise(Count = sum(count), .groups="drop") %>%
  arrange( desc(Count) )
# States with no Red-winged Blackbird observations drop out at the filter() step,
# so b.bird only lists states where the species was seen.
cat("You observed Red-winged Blackbirds most frequently in ", b.bird$state[1], ", with ", b.bird$Count[1], " total birds.", sep = "")
## You observed Red-winged Blackbirds most frequently in MO, with 596 total birds.
Filter observations for a duration between 5 and 200 minutes. Calculate the mean rate per checklist at which I encounter species each year. Specifically, calculate the number of species in each checklist divided by its duration, and then take the mean for the year.
df %>%
  filter(duration >= 5, duration <= 200) %>%
  group_by(list_ID, year) %>%
  # number of species rows in each checklist (not yet divided by duration)
  summarise(num_of_unique = length(common_name), .groups="drop") %>%
  group_by(year) %>%
  # number of checklists and mean species count for each year
  summarise(num_of_lists = length(unique(list_ID)),
            Mean_species = mean(num_of_unique), .groups="drop")
## # A tibble: 13 x 3
##     year num_of_lists Mean_species
##    <int>        <int>        <dbl>
##  1  2003            2         4.5
##  2  2004           41         4.32
##  3  2009            1         8
##  4  2013            1        14
##  5  2014           77        19.1
##  6  2015           38        15.9
##  7  2016            6        19.3
##  8  2017           55        22.0
##  9  2018           22        17.7
## 10  2019           13        20.9
## 11  2020           40        17.1
## 12  2021           45        15.1
## 13  2022           14        13.6
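The summary above averages the number of species per checklist; it does not divide by duration as the question describes. A sketch of the rate version follows, on invented toy data (`toy_lists` is hypothetical, and the sketch assumes `duration` is in minutes and is the same on every row of a checklist):

```r
library(dplyr)

# Hypothetical checklists; names and values are invented for illustration
toy_lists <- tibble(
  list_ID     = c("A", "A", "B", "B", "B", "C"),
  year        = c(2014, 2014, 2014, 2014, 2014, 2015),
  common_name = c("Canada Goose", "Mallard",
                  "Canada Goose", "Mallard", "Killdeer",
                  "Mallard"),
  duration    = c(10, 10, 30, 30, 30, 2)
)

rate_df <- toy_lists %>%
  filter(duration >= 5, duration <= 200) %>%        # checklist C (2 min) is dropped
  group_by(list_ID, year) %>%
  # species per minute for each checklist
  summarise(species_per_min = n_distinct(common_name) / first(duration),
            .groups = "drop") %>%
  group_by(year) %>%
  # mean rate across that year's checklists
  summarise(num_of_lists = n(),
            mean_rate    = mean(species_per_min),
            .groups = "drop")
```

Here checklist A contributes 2/10 and checklist B contributes 3/30 species per minute, so the 2014 mean is 0.15.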
Create a tibble that includes the complete observations for the top 10 most frequently observed species. First generate a top 10 list, and then use this list to filter all observations. Export this tibble as a .csv file saved to a folder called “Results” within your R project, and add a link to it in the markdown document.
# generate a list of top ten most observed (i.e. highest count)
tops_df = df %>%
  group_by(common_name) %>%
  summarise(num_observed = sum(count), .groups="drop") %>% # summarise across all lists/states/etc.
  arrange( desc(num_observed) )
topten_list = tops_df$common_name[1:10]
topten_df = df %>% filter(common_name %in% topten_list)
knitr::kable(head(topten_df), "simple")
X | list_ID | common_name | scientific_name | date | time | count | duration | location | latitude | longitude | count_tot | month | year |
23 | S21177034 | Canada Goose | Branta canadensis | 2015-01-03 | 02:00 PM | 4 | 90 | US-IL | 39.92433 | -91.41487 | 11 | 1 | 2015 |
24 | S37097000 | Canada Goose | Branta canadensis | 2017-05-23 | 02:06 PM | 1 | 50 | US-VT | 44.09702 | -73.34205 | 44 | 5 | 2017 |
25 | S1740462 | Canada Goose | Branta canadensis | 2004-10-05 | 05:00 PM | 2 | 0 | US-VT | 43.78300 | -73.31543 | 12 | 10 | 2004 |
26 | S22598375 | Canada Goose | Branta canadensis | 2015-03-30 | 04:30 PM | 65 | 30 | US-VT | 43.78300 | -73.31543 | 81 | 3 | 2015 |
27 | S22635608 | Canada Goose | Branta canadensis | 2015-04-01 | 04:30 PM | 2 | 20 | US-VT | 43.78300 | -73.31543 | 16 | 4 | 2015 |
28 | S37075088 | Canada Goose | Branta canadensis | 2017-05-22 | 06:17 PM | 1 | 9 | US-VT | 43.78300 | -73.31543 | 61 | 5 | 2017 |
# write.csv(topten_df, "Results/top_ten_species.csv")
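The export line above is commented out and assumes the Results folder already exists. A hedged sketch of an export that creates the folder first is shown here with a made-up data frame (`toy_topten`, `out_dir`, and `out_file` are illustrative names; `tempdir()` is used only so the sketch runs anywhere):

```r
# Hypothetical stand-in for topten_df; values are invented
toy_topten <- data.frame(
  common_name = c("Canada Goose", "Mallard"),
  count       = c(4, 2)
)

# tempdir() keeps this sketch runnable anywhere; inside the R project,
# out_dir would simply be "Results"
out_dir <- file.path(tempdir(), "Results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# row.names = FALSE stops write.csv() from adding a row-number column,
# which read.csv() would otherwise import as a column named X
out_file <- file.path(out_dir, "top_ten_species.csv")
write.csv(toy_topten, out_file, row.names = FALSE)

# Read the file back to confirm the round trip
check <- read.csv(out_file)
```

In the markdown document, the saved file could then be linked with ordinary link syntax pointing at `Results/top_ten_species.csv`.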