update_azmpdata

library(azmpdata)
#> casaultb/azmpdata status:
#>  (Package ver: 0.2019.0.9100) Up to date
#>  (Data ver:2021-01-14 ) Up to date
#>  azmpdata:: Indexing all available monthly azmpdata...

This package should be updated on an annual basis or whenever there are major changes made to the data set. The update process should only be undertaken by developers or data managers. The following document describes the procedure for updating the data package

Step 1

Add new raw data files

For physical data: Raw data files (in either .dat or .csv) can be added to inst/extdata. It is best if this folder is kept organized by variable, for example the inst/extdata/airTemperature folder contains all the raw airTemperature data. It is important that the raw data format be identical to the previously used formats for that specific variable. The format does vary between variables but should be kept the same between years within a variable. If you are unsure of the correct format, please open an existing data file and review the format before uploading new raw data!

For biological data: Raw data files are uploaded to the FTP server at ‘ftp://ftp.dfo-mpo.gc.ca/AZMP_Maritimes/AZMP_Reporting/outputs’. The formatting of these data files is important and should be maintained between years.

For more details on the formatting of data, please see the style_guide vignette.

Step 2

Processing Scripts Data is loaded into the package using individual scripts for each dataframe. These scripts are contained in inst/extdata and organized by folder depending on the data type. These scripts output package data objects.

Step 3

Source all scripts To recreate all dataframes use the R script ‘source_all.R’ this will regenerate all dataframes in the package and update the datadate.txt file which identifies the data on which data was last updated. This datadate.txt file is important for versioning the package and ensuring users have the latest data. If running dataframe scripts individually, please ensure datadate.txt is updated using the data_update() function.

Step 4

Ensuring all variables are included in documentation If you have added new data to the package, which includes new variables. They may not be properly documented and you should ensure that BOTH sources of documentation are updated. The first source of documentation is the package Roxygen documentation. This is contained in the script R/data.R, where each dataframe is documented in a standard format. New variables should be added to the appropriate dataframe in the same format which requires the variable name and definition. If the new variable’s citation information is different than what is currently included with the dataframe, please add the new relevant citation information as well. The second documentation source which should be updated (note this order is not absolutely required), is the variable look up table, inst/extdata/lookup/variable_look_up.csv. This csv file is easiest to update through excel. If a new variable is added, all columns should be filled out including variable name, definition, the category of the data and which dataframe in which it is included, as well as relevant citation information. It is very important for the integrity of the package that BOTH of these sources are updated to include any new varibale information so that users of the package can have a complete context of each data product.Once the documentation has been updated, the command devtools::document() should be run so that any updates to the .Rd documentation files take effect.

Emily Chisholm

Step 1

Step 2

Step 3

Step 4