Setting up necessary packages and scripts

# load packages ----
library(log4r)
## 
## Attaching package: 'log4r'
## The following object is masked from 'package:base':
## 
##     debug
library(TeachingDemos)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pracma)
## 
## Attaching package: 'pracma'
## 
## The following object is masked from 'package:purrr':
## 
##     cross
library(ggmosaic)

# load any additional packages here...

setwd("/Users/elyre/Desktop/Comp Bio/RemesBiol6100/Stupidity")
# source function files ----
source("barracudar/DataTableTemplate.R")
source("barracudar/AddFolder.R")
source("barracudar/BuildFunction.R")
source("barracudar/MetaDataTemplate.R")
source("barracudar/CreatePaddedLabel.R")
source("barracudar/InitiateSeed.R")
source("barracudar/SetUpLog.R")
source("barracudar/SourceBatch.R")

Question 2

Each year’s folder contains one file with “countdata” in its name, and that is the only file you will use from that folder. Using for loops, iterate through each year’s folder to gather the names of these “countdata” .csv files.


```r
setwd("/Users/elyre/Desktop/Comp Bio/RemesBiol6100/Stupidity")
WorkingDirectory<-paste0(getwd(),"/NEON_count-landbird")

filelist <- list.files(WorkingDirectory ,pattern="BART")

# use a for loop for number of files that we're concerned with, pull out files

# paste() or paste0() function concatenates strings
# paste0("Here is ","the ","filepath: ", filelist[1]) # example

# make an empty character vector, one slot per folder
filenames <- character(length(filelist))

# collect the countdata file name from each folder
for (i in seq_along(filelist)) {
  folder <- file.path(WorkingDirectory, filelist[i]) # path to this year's folder

  # keep the first file whose name contains "countdata" (one is expected per folder)
  filenames[i] <- list.files(folder, pattern = "countdata")[1]
}

print(filenames)
```

```
## [1] "NEON.D01.BART.DP1.10003.001.brd_countdata.2015-06.basic.20231226T232626Z.csv"
## [2] "NEON.D01.BART.DP1.10003.001.brd_countdata.2017-06.basic.20231227T094709Z.csv"
## [3] "NEON.D01.BART.DP1.10003.001.brd_countdata.2018-06.basic.20231228T172744Z.csv"
## [4] "NEON.D01.BART.DP1.10003.001.brd_countdata.2019-06.basic.20231227T184129Z.csv"
## [5] "NEON.D01.BART.DP1.10003.001.brd_countdata.2020-06.basic.20231227T224944Z.csv"
## [6] "NEON.D01.BART.DP1.10003.001.brd_countdata.2020-07.basic.20231227T225020Z.csv"
## [7] "NEON.D01.BART.DP1.10003.001.brd_countdata.2021-06.basic.20231228T010546Z.csv"
## [8] "NEON.D01.BART.DP1.10003.001.brd_countdata.2022-06.basic.20231229T053256Z.csv"
```
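
The question asks for a for loop, but as a cross-check the same file names can probably be gathered in one call, without visiting each folder. The sketch below is only an illustration: `find_countdata()` is a hypothetical helper, and the demo folders and files are made up in a temporary directory so the block runs on its own.

```r
# Hypothetical helper: return the full paths of every "countdata" csv under root_dir.
find_countdata <- function(root_dir) {
  list.files(
    path = root_dir,
    pattern = "countdata.*\\.csv$", # file name contains "countdata" and ends in .csv
    recursive = TRUE,               # look inside every sub-folder
    full.names = TRUE               # return paths, not just file names
  )
}

# Made-up folder layout that mimics one folder per sampling bout
demo_root <- file.path(tempdir(), "neon_demo")
dir.create(file.path(demo_root, "site.2015-06"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "site.2017-06"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "site.2015-06", "brd_countdata.2015-06.csv"))
file.create(file.path(demo_root, "site.2015-06", "brd_perpoint.2015-06.csv"))
file.create(file.path(demo_root, "site.2017-06", "brd_countdata.2017-06.csv"))

count_files <- find_countdata(demo_root)
print(basename(count_files))
```

Because the full paths are returned, later steps could read each file directly instead of calling `setwd()` first.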

Questions 3, 4, and 5

Starting with pseudo-code, generate functions for 1) cleaning the data for any empty/missing cases, 2) extracting the year from each file name, 3) calculating abundance for each year (total number of individuals found), and 4) calculating species richness for each year (number of unique species found).
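
One way the four tasks could be split into separate functions is sketched below. All function names, the toy data, and the column names `species` and `count` are assumptions for illustration, not the layout of the NEON files. In particular, summing a count column is only one way to define abundance; counting rows is another if every row is a single bird.

```r
# Hypothetical helpers; names and column arguments are assumptions.

# 1) Drop rows with a missing value in any of the chosen columns.
clean_data <- function(df, cols = names(df)) {
  df[stats::complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

# 2) Find the first "YYYY-MM" stamp in a file name and keep the year.
extract_year <- function(file_name) {
  stamp <- regmatches(file_name, regexpr("[0-9]{4}-[0-9]{2}", file_name))
  as.integer(substr(stamp, 1, 4))
}

# 3) Abundance as the sum of a per-row count column.
calc_abundance <- function(df, count_col) {
  sum(df[[count_col]])
}

# 4) Species richness as the number of distinct values in a species column.
calc_richness <- function(df, species_col) {
  length(unique(df[[species_col]]))
}

# Toy data: four records, one with a missing species code
toy <- data.frame(
  species = c("AMRO", "AMRO", "BCCH", NA),
  count   = c(1, 2, 1, 3)
)
toy_clean <- clean_data(toy)

toy_year <- extract_year(
  "NEON.D01.BART.DP1.10003.001.brd_countdata.2015-06.basic.20231226T232626Z.csv"
)
toy_abundance <- calc_abundance(toy_clean, "count")
toy_richness  <- calc_richness(toy_clean, "species")
```

Matching the year with a pattern, rather than by character position, should keep working if the file name prefix changes length.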

Create an initial empty data frame to hold the above summary statistics. You should have 4 columns: one for the file name, one for abundance, one for species richness, and one for year.
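
A minimal sketch of such a data frame, assuming one row per countdata file (eight here) and hypothetical column names. Each column is pre-filled with `NA` of the right type so that numbers stay numeric when the loop fills them in.

```r
n_files <- 8 # assumption: one row per countdata file

# Hypothetical column names; NA placeholders fix the type of each column up front
summary_stats <- data.frame(
  file_name        = rep(NA_character_, n_files),
  abundance        = rep(NA_integer_, n_files),
  species_richness = rep(NA_integer_, n_files),
  year             = rep(NA_integer_, n_files),
  stringsAsFactors = FALSE
)

str(summary_stats)
```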

Using a for loop, run your created functions as a batch process for each folder, changing the working directory as necessary to read in the correct files, calculating summary statistics with your created functions, and then writing them out into your summary statistics data frame.

getinfo <- function(filelist, filenames) {
  # NOTE: relies on the global WorkingDirectory defined in Question 2

  # one row per folder; everything is stored as character because a matrix holds a single type
  metadata <- matrix(NA_character_, length(filelist), 4)
  colnames(metadata) <- c("File","Year","Total # Individuals", "Species Richness")

  for (i in seq_along(filelist)) { # for every folder (there is only one good csv per folder)

    # read the countdata csv from the folder listed at filelist[i]
    # (read.csv already returns a data frame)
    dfcsv <- read.csv(file.path(WorkingDirectory, filelist[i], filenames[i]))

    # GETTING INFORMATION

    # get rid of empty/missing cases
    # columns after 20 are either all NA or have no NA, so only the first 20 columns are checked
    dfcsv <- na.omit(dfcsv[, 1:20])

    # extract the year from the file name (characters 43 to 46)
    year <- str_sub(filenames[i], 43, 46)

    # calculate total individuals found (each remaining row is counted as one individual)
    total_individuals <- nrow(dfcsv)

    # calculate total number of unique species (column 12 is used as the species identifier)
    species_richness <- length(unique(dfcsv[, 12]))

    # fill in row i; the "File" column stores the folder name from filelist
    metadata[i, 1:4] <- c(filelist[i], year, total_individuals, species_richness)
  }

  return(metadata)
}

StatisticsDF<-getinfo(filelist,filenames)


print(StatisticsDF)
##      File                                                                     
## [1,] "NEON.D01.BART.DP1.10003.001.2015-06.basic.20240127T000425Z.RELEASE-2024"
## [2,] "NEON.D01.BART.DP1.10003.001.2017-06.basic.20240127T000425Z.RELEASE-2024"
## [3,] "NEON.D01.BART.DP1.10003.001.2018-06.basic.20240127T000425Z.RELEASE-2024"
## [4,] "NEON.D01.BART.DP1.10003.001.2019-06.basic.20240127T000425Z.RELEASE-2024"
## [5,] "NEON.D01.BART.DP1.10003.001.2020-06.basic.20240127T000425Z.RELEASE-2024"
## [6,] "NEON.D01.BART.DP1.10003.001.2020-07.basic.20240127T000425Z.RELEASE-2024"
## [7,] "NEON.D01.BART.DP1.10003.001.2021-06.basic.20240127T000425Z.RELEASE-2024"
## [8,] "NEON.D01.BART.DP1.10003.001.2022-06.basic.20240127T000425Z.RELEASE-2024"
##      Year   Total # Individuals Species Richness
## [1,] "2015" "453"               "40"            
## [2,] "2017" "411"               "34"            
## [3,] "2018" "512"               "36"            
## [4,] "2019" "372"               "39"            
## [5,] "2020" "447"               "43"            
## [6,] "2020" "50"                "16"            
## [7,] "2021" "869"               "45"            
## [8,] "2022" "578"               "37"
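
The quotes around every value in the printout show that the result is a character matrix, so the counts are stored as text. A possible follow-up step, sketched here on a toy stand-in for the matrix (the folder names `folder_a` and `folder_b` are made up, the numbers are the first two rows above), is to convert it to a data frame with integer columns:

```r
# Toy stand-in for the character matrix returned by getinfo()
stats_matrix <- matrix(
  c("folder_a", "2015", "453", "40",
    "folder_b", "2017", "411", "34"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("File", "Year", "Total # Individuals", "Species Richness"))
)

# Convert to a data frame, then turn columns 2 to 4 from text into integers
stats_df <- as.data.frame(stats_matrix, stringsAsFactors = FALSE)
stats_df[2:4] <- lapply(stats_df[2:4], as.integer)

str(stats_df)
```

With integer columns, the yearly values can be summed, sorted, or plotted without converting them each time.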