All the steps are outlined here, but here’s a link to the script that will run everything at once.


In which year did I observe the most individual birds? How many?

yearly.df = df %>%
  group_by(year) %>% 
  summarise(yearly_total = sum(count), .groups = "drop") %>%  # total individuals per year
  mutate(year = as.numeric(year)) %>% 
  arrange(desc(yearly_total))  # largest total first

# max_bird_year = yearly.df$year[1]  # if you arrange

max_bird_year = yearly.df$year[which(yearly.df$yearly_total == max(yearly.df$yearly_total))]  # if you don't arrange

cat("You observed the most individual birds in", max_bird_year)
## You observed the most individual birds in 2014
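The question also asks how many birds were seen that year, which the code above does not print. The following is a minimal base-R sketch that returns both the year and its total. It uses a small made-up data frame (`toy`) in place of `df` and assumes the same `year` and `count` columns. Note that `which.max()` keeps only the first year when two years tie, whereas `which(x == max(x))` returns every tied year.

```r
# Hypothetical toy data standing in for the eBird export `df`
toy <- data.frame(
  year  = c(2014, 2014, 2015, 2017, 2017),
  count = c(120,   30,   80,   60,   40)
)

# Sum individual birds per year (result is named by year)
yearly_total <- tapply(toy$count, toy$year, sum)

# Year with the largest total, and that total
best <- which.max(yearly_total)
max_bird_year <- as.numeric(names(yearly_total)[best])
max_bird_total <- unname(yearly_total[best])

cat("Most individual birds:", max_bird_total, "in", max_bird_year, "\n")
```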

In that year, how many different species of birds did I observe?

cat("You observed", length(unique(filter(df, year == max_bird_year)$scientific_name)), "species of bird in", max_bird_year)
## You observed 210 species of bird in 2014
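Counting species here means keeping one year's rows and counting the distinct scientific names. The following is a base-R sketch on hypothetical toy rows. In dplyr, `n_distinct()` performs the same count.

```r
# Hypothetical toy rows: the goose appears twice in 2014
toy <- data.frame(
  year = c(2014, 2014, 2014, 2015),
  scientific_name = c("Branta canadensis", "Agelaius phoeniceus",
                      "Branta canadensis", "Cardinalis cardinalis")
)
max_bird_year <- 2014

# Keep only that year's rows, then count distinct scientific names
n_species <- length(unique(toy$scientific_name[toy$year == max_bird_year]))
cat("You observed", n_species, "species of bird in", max_bird_year, "\n")
```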

In which state did I most frequently observe Red-winged Blackbirds?

b.bird = df %>% filter(common_name == "Red-winged Blackbird") %>% 
  mutate(state = substr(location, 4, 5)) %>%  # "US-MO" -> "MO"
  ungroup() %>% 
  group_by(state) %>% 
  summarise(Count = sum(count), .groups = "drop") %>%
  arrange(desc(Count)) 

# States with no Red-winged Blackbird observations are removed by the filter above, so only states where the species was seen appear in b.bird.

cat("You observed the most Red-winged Blackbirds in ", b.bird$state[1], ", with ", b.bird$Count[1], " total birds.", sep = "")
## You observed the most Red-winged Blackbirds in MO, with 596 total birds.
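The state is extracted from `location` by character position. For a value such as `"US-MO"`, characters 4 to 5 give the two-letter code. The following base-R sketch runs on hypothetical rows. It assumes every `location` starts with a two-letter country code and a hyphen, as in the rows shown later in this section.

```r
# Hypothetical toy rows
toy <- data.frame(
  common_name = c("Red-winged Blackbird", "Red-winged Blackbird",
                  "Canada Goose", "Red-winged Blackbird"),
  location = c("US-MO", "US-IL", "US-MO", "US-MO"),
  count = c(10, 25, 5, 40)
)

rwbl <- toy[toy$common_name == "Red-winged Blackbird", ]
rwbl$state <- substr(rwbl$location, 4, 5)   # "US-MO" -> "MO"

# Total individuals per state, largest first
by_state <- sort(tapply(rwbl$count, rwbl$state, sum), decreasing = TRUE)
cat("Most Red-winged Blackbirds in ", names(by_state)[1],
    ", with ", by_state[[1]], " total birds.\n", sep = "")
```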

Filter observations to checklists lasting between 5 and 200 minutes. For each year, calculate the mean rate at which I encounter species per checklist. To do this, divide the number of species on each checklist by its duration, then take the mean of those rates for the year.

df %>% filter(duration >= 5, duration <= 200) %>% 
  group_by(list_ID, year) %>%
  summarise(num_of_unique = length(common_name), .groups = "drop") %>%  # species entries on each checklist
  group_by(year) %>%
  summarise(num_of_lists = length(unique(list_ID)),
            Mean_species = mean(num_of_unique), .groups = "drop") %>%  # mean species per checklist for each year
  arrange(year) 
# Note: this does not divide by duration, so Mean_species is species per checklist,
# not species per minute.
## # A tibble: 13 x 3
##     year num_of_lists Mean_species
##    <int>        <int>        <dbl>
##  1  2003            2         4.5 
##  2  2004           41         4.32
##  3  2009            1         8   
##  4  2013            1        14   
##  5  2014           77        19.1 
##  6  2015           38        15.9 
##  7  2016            6        19.3 
##  8  2017           55        22.0 
##  9  2018           22        17.7 
## 10  2019           13        20.9 
## 11  2020           40        17.1 
## 12  2021           45        15.1 
## 13  2022           14        13.6
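The summary above averages the number of species per checklist. It never divides by `duration`, so it is not the rate described in the task. The following is a minimal base-R sketch of that rate on hypothetical toy rows. It computes distinct species on each checklist divided by that checklist's duration in minutes, then averages those rates within each year. The sketch assumes `duration` is constant within a checklist, so it takes the first value per `list_ID`.

```r
# Hypothetical toy rows; checklist C (300 min) should be filtered out
toy <- data.frame(
  list_ID     = c("A", "A", "A", "B", "B", "C", "D"),
  year        = c(2014, 2014, 2014, 2014, 2014, 2015, 2015),
  common_name = c("Canada Goose", "Mallard", "Blue Jay",
                  "Canada Goose", "Mallard", "Blue Jay", "Mallard"),
  duration    = c(30, 30, 30, 10, 10, 300, 20)
)

# Same filter as above: keep checklists lasting 5 to 200 minutes
kept <- toy[toy$duration >= 5 & toy$duration <= 200, ]

# One row per checklist: its year, distinct species, and duration
per_list <- do.call(rbind, lapply(split(kept, kept$list_ID), function(d) {
  data.frame(list_ID   = d$list_ID[1],
             year      = d$year[1],
             n_species = length(unique(d$common_name)),
             duration  = d$duration[1])
}))
per_list$rate <- per_list$n_species / per_list$duration  # species per minute

# Mean rate across checklists within each year
mean_rate <- tapply(per_list$rate, per_list$year, mean)
print(mean_rate)
```

In this toy data, checklist A gives 3/30 and B gives 2/10, so 2014 averages 0.15 species per minute. Checklist D gives 1/20 for 2015.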

Create a tibble that includes the complete observations for the top 10 most frequently observed species. First generate a top 10 list, then use this list to filter all observations. Export this tibble as a .csv file saved to a folder called “Results” within your R project, and add a link to it in the markdown document.

# rank species by total individuals observed (highest summed count first)
tops_df = df %>% group_by(common_name) %>% 
  summarise(num_observed = sum(count), .groups="drop") %>%  # summarise across all lists/states/etc. 
  arrange( desc(num_observed) )

topten_list = tops_df$common_name[1:10]

topten_df = df %>% filter(common_name %in% topten_list)
knitr::kable(head(topten_df), "simple")
 X  list_ID    common_name   scientific_name    date        time      count  duration  location  latitude  longitude  count_tot  month  year
--  ---------  ------------  -----------------  ----------  --------  -----  --------  --------  --------  ---------  ---------  -----  ----
23  S21177034  Canada Goose  Branta canadensis  2015-01-03  02:00 PM      4        90  US-IL     39.92433  -91.41487         11      1  2015
24  S37097000  Canada Goose  Branta canadensis  2017-05-23  02:06 PM      1        50  US-VT     44.09702  -73.34205         44      5  2017
25  S1740462   Canada Goose  Branta canadensis  2004-10-05  05:00 PM      2         0  US-VT     43.78300  -73.31543         12     10  2004
26  S22598375  Canada Goose  Branta canadensis  2015-03-30  04:30 PM     65        30  US-VT     43.78300  -73.31543         81      3  2015
27  S22635608  Canada Goose  Branta canadensis  2015-04-01  04:30 PM      2        20  US-VT     43.78300  -73.31543         16      4  2015
28  S37075088  Canada Goose  Branta canadensis  2017-05-22  06:17 PM      1         9  US-VT     43.78300  -73.31543         61      5  2017
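dir.create("Results", showWarnings = FALSE)  # create the Results folder if it does not exist yet
write.csv(topten_df, "Results/top_ten_species.csv", row.names = FALSE)  # no extra index column in the file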
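The same top-N filter and export can be sketched in base R. The toy data, the cutoff `n_top = 2` (instead of 10), and writing to `tempdir()` rather than the project's Results folder are all assumptions for illustration. Passing `row.names = FALSE` stops `write.csv()` from adding an unnamed index column. Such a column is what comes back as `X` when a file is read with `read.csv()`.

```r
# Hypothetical toy observations
toy <- data.frame(
  common_name = c("Canada Goose", "Mallard", "Blue Jay", "Canada Goose",
                  "Mallard", "Northern Cardinal"),
  count = c(65, 12, 3, 4, 20, 1)
)
n_top <- 2  # 10 in the real analysis

# Rank species by total individuals observed, then keep the top names
totals <- sort(tapply(toy$count, toy$common_name, sum), decreasing = TRUE)
top_names <- names(totals)[seq_len(min(n_top, length(totals)))]

# Keep every observation of those species
top_df <- toy[toy$common_name %in% top_names, ]

# Export; create the folder first so write.csv() does not fail
out_dir <- file.path(tempdir(), "Results")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "top_ten_species.csv")
write.csv(top_df, out_file, row.names = FALSE)
```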