vignettes/search_azmpdata.Rmd
search_azmpdata.Rmd
This vignette will describe how to effectively analyze, both spatially and temporally, the data contained within the azmpdata
package. To do so, basic R functions combined with built-in functions, variable_lookup()
, azmpmap()
, area_indexer()
, and plot_availability()
will be explained. A suggested work flow is shown below to help the user navigate the package.
Listing all the available dataframes in a package can be a quick way to see a high level overview of the datasets. The output from the following code will display all the dataframe names as well as brief, high-level descriptions. This data will appear in a new window.
data(package = 'azmpdata')
You can browse through the help documentation for all datasets, and functions in the package by calling the code below. This information will come up in the R studio help window.
help(package = 'azmpdata')
Clicking on a specific dataframe name or function in the help window will bring you to a detailed help page. For dataframes, this help page will include variable names and definitions, as well as information about the source of the data and the proper citation.
R studio provides some built in capability to be able to search through the help files for a given package. The function help.search()
can be used to search through titles and descriptions of help files in a package.
help.search(pattern = 'section', package = 'azmpdata')
This code will bring up all the help files (in the R studio help window), which match the pattern
provided. Note, that this does not search through the entire text of the help file, for more details see lookup_variable()
in the next section.
This is the first step in the suggested work flow for the azmpdata
package. We have created a custom search function in azmpdata
which allows a user to search the package in two ways. One, search for keywords in the names of all variables, included in all datasets of the package. Two, search for keywords in the entirety of the package help text (including the details
sections, which often contains variable definitions and is not searched through when using help.search()
).
The output of this function will be a table with three columns, keyword
, or the search keyword that was provided, that has been matched (this is useful if providing multiple keywords), variable
, the name of the variable which matched your keyword search, and dataframe
which will be the name of the dataframe in which your matched variable is contained, or if searching through the help text, the name of the dataframe where your keyword was found in the help text.
This function can be provided with a single keyword,
variable_lookup('nitrate')
## # A tibble: 10 × 3
## keyword variable dataframe
## <chr> <chr> <chr>
## 1 nitrate integrated_nitrate_0_50 Derived_Annual_Sections
## 2 nitrate integrated_nitrate_0_50 Derived_Annual_Stations
## 3 nitrate integrated_nitrate_0_50 Derived_Occupations_Sections
## 4 nitrate integrated_nitrate_0_50 Derived_Occupations_Stations
## 5 nitrate integrated_nitrate_50_150 Derived_Annual_Sections
## 6 nitrate integrated_nitrate_50_150 Derived_Annual_Stations
## 7 nitrate integrated_nitrate_50_150 Derived_Occupations_Sections
## 8 nitrate integrated_nitrate_50_150 Derived_Occupations_Stations
## 9 nitrate nitrate Discrete_Occupations_Sections
## 10 nitrate nitrate Discrete_Occupations_Stations
or multiple keywords.
variable_lookup(c('nitrate', 'integrated'))
## # A tibble: 32 × 3
## keyword variable dataframe
## <chr> <chr> <chr>
## 1 integrated integrated_chlorophyll_0_100 Derived_Annual_Sections
## 2 integrated integrated_chlorophyll_0_100 Derived_Annual_Stations
## 3 integrated integrated_chlorophyll_0_100 Derived_Occupations_Sections
## 4 integrated integrated_nitrate_0_50 Derived_Annual_Sections
## 5 integrated integrated_nitrate_0_50 Derived_Annual_Stations
## 6 integrated integrated_nitrate_0_50 Derived_Occupations_Sections
## 7 integrated integrated_nitrate_0_50 Derived_Occupations_Stations
## 8 integrated integrated_nitrate_50_150 Derived_Annual_Sections
## 9 integrated integrated_nitrate_50_150 Derived_Annual_Stations
## 10 integrated integrated_nitrate_50_150 Derived_Occupations_Sections
## # … with 22 more rows
In order to search through the help text, use the option search_help
variable_lookup('nitrate', search_help = TRUE)
## Joining, by = c("keyword", "variable", "dataframe")
## # A tibble: 19 × 3
## keyword variable dataframe
## <chr> <chr> <chr>
## 1 nitrate integrated_nitrate_0_50 Derived_Annual_Sections
## 2 nitrate integrated_nitrate_0_50 Derived_Annual_Stations
## 3 nitrate integrated_nitrate_0_50 Derived_Occupations_Sections
## 4 nitrate integrated_nitrate_0_50 Derived_Occupations_Stations
## 5 nitrate integrated_nitrate_50_150 Derived_Annual_Sections
## 6 nitrate integrated_nitrate_50_150 Derived_Annual_Stations
## 7 nitrate integrated_nitrate_50_150 Derived_Occupations_Sections
## 8 nitrate integrated_nitrate_50_150 Derived_Occupations_Stations
## 9 nitrate nitrate Discrete_Occupations_Sections
## 10 nitrate nitrate Discrete_Occupations_Stations
## 11 nitrate <NA> area_indexer
## 12 nitrate <NA> Derived_Annual_Sections
## 13 nitrate <NA> Derived_Annual_Stations
## 14 nitrate <NA> Derived_Occupations_Sections
## 15 nitrate <NA> Derived_Occupations_Stations
## 16 nitrate <NA> Discrete_Occupations_Sections
## 17 nitrate <NA> Discrete_Occupations_Stations
## 18 nitrate <NA> plot_availability
## 19 nitrate <NA> variable_lookup
Note, that if the search returns a match to your keyword in the text of a help file, the output will not return a specific variable name in the variable
column (instead it will return NA), and will give you only the dataframe name in the dataframe
column.
Once you have your search results, you may want to call the dataframe which matched your search terms. This is very straight-forward if using our custom search function variable_lookup()
ans <- variable_lookup(keywords = 'wet weight', search_help = TRUE)
df <- get(ans$dataframe[1])
Once the user has a general idea of what is contained within the package using variable_lookup()
, the next logical step, step 2 of the workflow, is to locate where the samples are taken. To do so, we have created the azmpmap()
function. This function generates a leaflet plot of the geospatial files. The user then has the option to focus in on different stations, sections, and areas with the ability to double-click to see the name, depth, and associated dataframes. To try out azmpmap()
use the following code:
azmpmap()
At this point the user knows generally what is contained within the package from variable_lookup()
as well as where data are sampled using azmpmap()
. It is then recommended that the user utilize the information provided in the popups to dive deeper into their search, using area_indexer()
or step three of the workflow.area_indexer()
assembles a dataframe consisting of the years and areas where the azmpdata packages data has been collected, as well as the associated file where the data can be found. The user also has the ability to look at associated parameters and months with the specification.
For example, if a user was interested in the nitrate in station P5 from years 2010 to 2020, the following code would be used:
k <- area_indexer(parameters="nitrate", areanames="P5", years=c(2010:2020), fuzzyParameters = F)
head(k)
## year area section station datafile parameter
## 10510 2010 NA NA P5 Discrete_Occupations_Stations nitrate
## 10648 2011 NA NA P5 Discrete_Occupations_Stations nitrate
## 10765 2012 NA NA P5 Discrete_Occupations_Stations nitrate
## 10951 2013 NA NA P5 Discrete_Occupations_Stations nitrate
## 11033 2014 NA NA P5 Discrete_Occupations_Stations nitrate
## 11169 2015 NA NA P5 Discrete_Occupations_Stations nitrate
Basically, area_indexer()
is a good way to filter through the multiple dataframes to get exactly what the user is interested in.
The user now knows what is contained within the package, where data are sampled, and how to properly filter through the data. It is then recommended that the user completed step 4, plot_availability()
to understand when the data are sampled. This function generates a score card of samples per month based on areaType
, areaName
, and parameters
.
For example, if a user wanted to know the frequency (ie. samples/month) for the nitrate at station P5, they would type in the following:
plot_availability(areaType = "station", areaName="P5", parameters="nitrate")
The package azmpdata
contains a wealth of datasets to help analyze AZMP data. It is recommended that the users utilize the built in functions in the following order 1) variable_lookup()
, 2) azmpmap()
3) area_indexer()
, and 4) plot_availability()
to easily search through the data and efficiently analyze both spatially and temporally. For more information on base R search functions please feel free to do your own research. For questions about the custom azmpdata
search please contact developers or bring an issue to the github board (listed in the package description under BugReports).