library(azmpdata)
#> casaultb/azmpdata status:
#> (Package ver: 0.2019.0.9100) Up to date
#> (Data ver:2021-01-14 ) Up to date
#> azmpdata:: Indexing all available monthly azmpdata...This package should be updated on an annual basis or whenever there are major changes made to the data set. The update process should only be undertaken by developers or data managers. The following document describes the procedure for updating the data package
Add new raw data files
For physical data: Raw data files (in either .dat or .csv) can be
added to inst/extdata. It is best if this folder is kept
organized by variable, for example the
inst/extdata/airTemperature folder contains all the raw
airTemperature data. It is important that the raw data format be
identical to the previously used formats for that specific variable. The
format does vary between variables but should be kept the same between
years within a variable. If you are unsure of the correct format, please
open an existing data file and review the format before uploading new
raw data!
For biological data: Raw data files are uploaded to the FTP server at ‘ftp://ftp.dfo-mpo.gc.ca/AZMP_Maritimes/AZMP_Reporting/outputs’. The formatting of these data files is important and should be maintained between years.
For more details on the formatting of data, please see the
style_guide vignette.
Processing Scripts Data is loaded into the package
using individual scripts for each dataframe. These scripts are contained
in inst/extdata and organized by folder depending on the
data type. These scripts output package data objects.
Source all scripts To recreate all dataframes use
the R script ‘source_all.R’ this will regenerate all dataframes in the
package and update the datadate.txt file which identifies the data on
which data was last updated. This datadate.txt file is important for
versioning the package and ensuring users have the latest data. If
running dataframe scripts individually, please ensure datadate.txt is
updated using the data_update() function.
Ensuring all variables are included in documentation
If you have added new data to the package, which includes new variables.
They may not be properly documented and you should ensure that BOTH
sources of documentation are updated. The first source of documentation
is the package Roxygen documentation. This is contained in the script
R/data.R, where each dataframe is documented in a standard
format. New variables should be added to the appropriate dataframe in
the same format which requires the variable name and definition. If the
new variable’s citation information is different than what is currently
included with the dataframe, please add the new relevant citation
information as well. The second documentation source which should be
updated (note this order is not absolutely required), is the variable
look up table, inst/extdata/lookup/variable_look_up.csv.
This csv file is easiest to update through excel. If a new variable is
added, all columns should be filled out including variable name,
definition, the category of the data and which dataframe in which it is
included, as well as relevant citation information. It is very important
for the integrity of the package that BOTH of these sources are updated
to include any new varibale information so that users of the package can
have a complete context of each data product.Once the documentation has
been updated, the command devtools::document() should be
run so that any updates to the .Rd documentation files take
effect.