SCBI data

Example

Let’s use data from the Smithsonian Conservation Biology Institute (SCBI) ForestGEO plot. The project introduces its data at https://scbi-forestgeo.github.io/SCBI-ForestGEO-Data/ as follows:

“This is the public data portal for the SCBI ForestGEO plot, which points to archive locations for our various data products (some in this repository, many elsewhere).”

SCBI datasets are scattered across multiple GitHub organizations and repositories. I’ll use the ghr package to find and access some of those datasets, and the purrr package, mostly to apply a function to each element of a vector.

library(ghr)
library(purrr)

(I’ll also use fs to manipulate paths, and readr to read files. I’ll refer to functions from these packages using the syntax package::function().)

Climate

Climate data from SCBI is stored in the “Climate” repository of the GitHub organization “forestgeo”.

ghr_ls() lists GitHub directories in a way similar to how fs::dir_ls() lists local directories.

ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI")
#> [1] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI" 
#> [2] "Met_Station_Data/SCBI/Front Royal weather station"
#> [3] "Met_Station_Data/SCBI/README.md"

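I can pass a regular expression to the regexp argument to focus, for example, on .csv files. The pattern "[.]csv$" matches a literal dot followed by "csv" at the end of the path.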

# All files
ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI/Front Royal weather station")
#> [1] "Met_Station_Data/SCBI/Front Royal weather station/Front Royal_NOAA_11162015.csv"
#> [2] "Met_Station_Data/SCBI/Front Royal weather station/README.md"

# .csv files
ghr_ls(
  "forestgeo/Climate/Met_Station_Data/SCBI/ForestGEO_met_station-SCBI", 
  regexp = "[.]csv$"
)
#>  [1] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2009.csv"
#>  [2] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2010.csv"
#>  [3] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2011.csv"
#>  [4] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2012.csv"
#>  [5] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2013.csv"
#>  [6] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2014.csv"
#>  [7] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2015.csv"
#>  [8] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2016.csv"
#>  [9] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2017.csv"
#> [10] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2018.csv"
#> [11] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2019.csv"
#> [12] "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/mettower_metadata.csv"
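
To see what the pattern "[.]csv$" does independently of GitHub, here is a minimal sketch that applies the same regular expression to a hand-typed vector with base R's grepl(). The vector `listing` is illustrative input copied from the listings above, and I am only assuming that ghr_ls() filters its results in a comparable way.

```r
# A few entries copied from the listings above (illustrative input only)
listing <- c(
  "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/SCB_Metdata_5min_2009.csv",
  "Met_Station_Data/SCBI/ForestGEO_met_station-SCBI/mettower_metadata.csv",
  "Met_Station_Data/SCBI/README.md"
)

# "[.]" matches a literal dot; "$" anchors the match to the end of the string
is_csv <- grepl("[.]csv$", listing)
csv_files <- listing[is_csv]
csv_files

# An unescaped "." matches any character, so a name ending in "_csv" would
# match ".csv$" but not "[.]csv$"
loose <- grepl(".csv$", "README_csv")
strict <- grepl("[.]csv$", "README_csv")
```

Writing the dot as "[.]" (or "\\.") is what keeps the filter from matching names that merely end in "csv".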

Species list

In addition to climate data, SCBI has a number of species-list datasets. These datasets are in the “species_lists” folder of the “SCBI-ForestGEO-Data” repository, under the GitHub organization “SCBI-ForestGEO”.

species_lists <- "SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists"
# Get the last part of the path
(subdirs <- fs::path_file(ghr_ls(species_lists)))
#> [1] "Full plant list"   "GenBank"           "Tree ecology"     
#> [4] "insects_pathogens"

Let’s explore the .csv files in each subdirectory. To reduce duplication I first create the vector paths, which stores the path to every subdirectory I want to explore. Then I apply ghr_ls() to each element of paths and use regexp to focus on .csv files only.

(paths <- fs::path(species_lists, subdirs))
#> SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/Full plant list
#> SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/GenBank
#> SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/Tree ecology
#> SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/insects_pathogens
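
The two path steps above (keep the last component, then prepend the common prefix) can be sketched without GitHub access or extra packages. In this sketch base R's basename() and file.path() stand in for fs::path_file() and fs::path(); that substitution is my choice for a dependency-free illustration, and `listing` is typed by hand rather than returned by ghr_ls().

```r
species_lists <- "SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists"

# Hand-typed stand-in for the result of a directory listing (assumption)
listing <- c(
  "species_lists/Full plant list",
  "species_lists/GenBank",
  "species_lists/Tree ecology",
  "species_lists/insects_pathogens"
)

# Keep only the last component of each path
subdirs <- basename(listing)

# Prepend the organization/repository/folder prefix to each subdirectory
paths <- file.path(species_lists, subdirs)
paths
```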

paths %>% 
  map(~ ghr_ls(.x, regexp = "[.]csv$"))
#> Warning: Nothing in 'SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/
#> GenBank' matches '[.]csv$'
#> [[1]]
#> [1] "species_lists/Full plant list/SCBI_all_sp_woody_&_herb.csv"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "species_lists/Tree ecology/SCBI_ForestGEO_sp_ecology.csv"         
#> [2] "species_lists/Tree ecology/SCBI_ForestGEO_sp_ecology_metadata.csv"
#> 
#> [[4]]
#> [1] "species_lists/insects_pathogens/insects_pathogens metadata.csv"
#> [2] "species_lists/insects_pathogens/insects_pathogens.csv"

Instead of the file names I can show the first few rows of each file. I use ghr_ls_download_url() to get the download URL of each .csv file in each subdirectory. To iterate over all subdirectories I use purrr::map().

download_urls <- paths %>% 
  map(~ ghr_ls_download_url(.x, regexp = "[.]csv$"))
#> Warning: Nothing in 'SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/
#> GenBank' matches '[.]csv$'
download_urls
#> [[1]]
#> [1] "https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/species_lists/Full%20plant%20list/SCBI_all_sp_woody_%26_herb.csv"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/species_lists/Tree%20ecology/SCBI_ForestGEO_sp_ecology.csv"         
#> [2] "https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/species_lists/Tree%20ecology/SCBI_ForestGEO_sp_ecology_metadata.csv"
#> 
#> [[4]]
#> [1] "https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/species_lists/insects_pathogens/insects_pathogens%20metadata.csv"
#> [2] "https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/species_lists/insects_pathogens/insects_pathogens.csv"
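
Notice that the download URLs percent-encode characters such as spaces ("%20") and ampersands ("%26"). The sketch below reproduces the first URL with utils::URLencode(). The helper raw_url() is hypothetical, and the URL layout (host, owner, repository, branch, path) is inferred from the output above; I am not claiming this is how ghr builds its URLs.

```r
# Hypothetical helper: build a raw.githubusercontent.com URL from its parts.
# The URL layout is inferred from the output above, not from ghr's source.
raw_url <- function(owner, repo, branch, path) {
  # Encode each path segment separately so that "/" separators are preserved
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  encoded <- vapply(
    segments,
    function(x) utils::URLencode(x, reserved = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
  paste(
    "https://raw.githubusercontent.com",
    owner, repo, branch,
    paste(encoded, collapse = "/"),
    sep = "/"
  )
}

url <- raw_url(
  owner = "SCBI-ForestGEO",
  repo = "SCBI-ForestGEO-Data",
  branch = "master",
  path = "species_lists/Full plant list/SCBI_all_sp_woody_&_herb.csv"
)
url
```

Encoding segment by segment matters here: with reserved = TRUE, URLencode() would otherwise turn the "/" separators into "%2F" as well.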

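And finally I use readr::read_csv() to read each dataset into R. I first flatten the list with unlist(), which also drops the empty element for “GenBank”, and then use purrr::map() to iterate over each download URL. This is why four subdirectories yield five datasets.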

download_urls %>% 
  unlist() %>% 
  map(~ head(readr::read_csv(.x)))
#> [[1]]
#> # A tibble: 6 x 4
#>   FAMILY      `SCIENTIFIC NAME`    `COMMON NAME`             life_form 
#>   <chr>       <chr>                <chr>                     <chr>     
#> 1 Acanthaceae Ruellia strepens     limestone wild petunia    herbaceous
#> 2 Adoxaceae   Sambucus canadensis  American black elderberry woody     
#> 3 Adoxaceae   Sambucus pubens      Red elderberry            woody     
#> 4 Adoxaceae   Viburnum acerifolium Mapleleaf viburnum        woody     
#> 5 Adoxaceae   Viburnum prunifolium Black haw                 woody     
#> 6 Adoxaceae   Viburnum recognitum  Southern arrowood         woody     
#> 
#> [[2]]
#> # A tibble: 6 x 17
#>   family genus species author spcode canopy_position live_form
#>   <chr>  <chr> <chr>   <chr>  <chr>  <chr>           <chr>    
#> 1 Sapin… Acer  negundo Michx. acne   understory      tree     
#> 2 Sapin… Acer  platan… L.     acpl   canopy          tree     
#> 3 Sapin… Acer  rubrum  L.     acru   canopy          tree     
#> 4 Simar… Aila… altiss… (Mill… aial   canopy          tree     
#> 5 Rosac… Amel… arborea (Mich… amar   understory      small tr…
#> 6 Annon… Asim… triloba (L.) … astr   understory      small tr…
#> # … with 10 more variables: native_status <chr>, habitat <chr>,
#> #   successional_status <chr>, drought_tolerance <chr>,
#> #   deer_herbivory <chr>, flower_reproduction <chr>, fruit_type <chr>,
#> #   dispersal_vector <chr>, IUCN_status <chr>, References <chr>
#> 
#> [[3]]
#> # A tibble: 6 x 3
#>   Column Field        Description                                          
#>   <chr>  <chr>        <chr>                                                
#> 1 1 / A  family       Plant family name                                    
#> 2 2 / B  genus        Genus name                                           
#> 3 3 / C  species      Species epithet (it could include variety or subspec…
#> 4 4 / D  author       Author of species name as Flora of Virginia (2012)   
#> 5 5 / E  spcode       Species code used at SCBI, four characters refers to…
#> 6 6 / F  canopy_posi… Most common canopy position for individuals within a…
#> 
#> [[4]]
#> # A tibble: 6 x 4
#>   Column Field         Description                   Variable.Codes        
#>   <chr>  <chr>         <chr>                         <chr>                 
#> 1 1 / A  pest_pathoge… common name of pest or patho… -                     
#> 2 2 / B  pest_pathoge… scientific name of pest or p… -                     
#> 3 3 / C  tree_species… tree species affected by pes… -                     
#> 4 4 / D  pathogen_type indicates pathogen type       fungus, insect        
#> 5 5 / E  origin        indicates whether pest/patho… native, exotic, cosmo…
#> 6 6 / F  native range  Geographic origin of pest/pa… -                     
#> 
#> [[5]]
#> # A tibble: 6 x 17
#>   pest_pathogen_c… pest_pathogen_s… tree_species_af… pathogen_type origin
#>   <chr>            <chr>            <chr>            <chr>         <chr> 
#> 1 Emerald ash bor… Agrilus planipe… Fraxinus spp., … Insect        exotic
#> 2 Balsam woolly a… Adelgis piceae   Abies balsamea,… Insect        exotic
#> 3 Elongate hemloc… Fiorinia externa Tsuga spp        Insect        exotic
#> 4 Gypsy moth       Lymantria dispar Quercus and oth… Insect        exotic
#> 5 Hemlock woolly … Adelgis tsugae   Tsuga canadensi… Insect        exotic
#> 6 Woolly beech sc… Cryptococcus fa… Fagus spp.       Insect        exotic
#> # … with 12 more variables: `native range` <chr>,
#> #   year_introduced_north_america <chr>, Virginia_Blue_Ridge_status <chr>,
#> #   Virginia_source <chr>, Virginia_year_intro <chr>, SNP_year_ID <chr>,
#> #   SNP_notes <chr>, SCBI_year_ID <chr>, SCBI_observation_type <chr>,
#> #   SCBI_notes <chr>, general_notes <chr>, citation <chr>
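
The output above is an unnamed list, so it does not say which file each table came from. One way to make it self-describing is to name each element by its file name before reading. The sketch below shows the idea offline: it writes two tiny .csv files (with a few values copied from the output above) to a temporary directory, and uses base R's lapply() and read.csv() as stand-ins for purrr::map() and readr::read_csv(). The file names are made up for the illustration.

```r
# Write two tiny .csv files to a temporary directory (illustration only)
dir <- tempfile("species_lists_")
dir.create(dir)
write.csv(
  data.frame(genus = c("Acer", "Asimina"), spcode = c("acru", "astr")),
  file.path(dir, "ecology.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(Field = "genus", Description = "Genus name"),
  file.path(dir, "ecology_metadata.csv"),
  row.names = FALSE
)

# A nested list shaped like `download_urls`, including one empty element
files <- list(
  file.path(dir, "ecology.csv"),
  character(0),
  file.path(dir, "ecology_metadata.csv")
)

# unlist() flattens the list and drops the empty element
flat <- unlist(files)

# Name each element by its file name so the result is self-describing
names(flat) <- basename(flat)

# Read the first rows of every file; lapply() carries the names over
previews <- lapply(flat, function(f) head(read.csv(f)))
names(previews)
```

With purrr, the equivalent step would be to call purrr::set_names() on the flattened vector before purrr::map().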
