Title: One Health Data Cleaning and Quality Checking Package
Description: This package provides useful functions to orchestrate analytics and data cleaning pipelines for One Health projects.
Authors: Collin Schwantes [cre, aut], Johana Teigen [aut], Ernest Guevarra [aut], Dean Marchiori [aut], Melinda Rostal [aut], EcoHealth Alliance [cph, fnd] (https://ror.org/02zv3m156)
Maintainer: Collin Schwantes <[email protected]>
License: MIT + file LICENSE
Version: 0.3.11
Built: 2024-11-22 02:51:15 UTC
Source: https://github.com/ecohealthalliance/ohcleandat
Compares two columns. Where there are differences, it extracts the values and compiles a correctly formatted validation log. This is intended to be used when an automated formatting correction is proposed in the data, but the actual updating of the records must happen via the validation log.
autobot(data, old_col, new_col, key)
data: data.frame or tibble
old_col: The existing column with formatting issues
new_col: The new column with corrections applied
key: Column that uniquely identifies the records in data

Value: tibble formatted as a validation log
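The core idea can be sketched in base R: compare the two columns, keep the rows that differ, and reshape them into log rows. The column names here (`farm_id`, `farm_id_new`, `row_id`) and the log layout are hypothetical illustrations, not the exact format `autobot()` emits.

```r
# Hypothetical input: an original column, a proposed correction, and a key
data <- data.frame(
  row_id      = 1:3,
  farm_id     = c("f-01", "F-02", "f-03"),
  farm_id_new = c("F-01", "F-02", "F-03"),
  stringsAsFactors = FALSE
)

# Keep only the records where the proposed correction differs
changed <- data[data$farm_id != data$farm_id_new, ]

# Reshape into a simple log: one row per proposed change
log <- data.frame(
  key       = changed$row_id,
  field     = "farm_id",
  old_value = changed$farm_id,
  new_value = changed$farm_id_new
)
log  # rows for row_id 1 and 3 only
```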
Returns rows in x without a match in y, returning only the selected columns. It is a thin wrapper around dplyr::anti_join().
check_id_existence(x, y, by, select_cols, ...)
x: data.frame or tibble containing the match id to check for non-existence in y
y: data.frame or tibble to check for non-existence of the match id from x
by: character containing the match id or, if named differently, a named character vector like c("a" = "b")
select_cols: character vector of columns to select in the output. Note that during the join, columns with identical names in both data sets will have a suffix of .x or .y added to disambiguate. These need to be added to ensure the correct column is returned.
...: other arguments passed to dplyr::anti_join

Value: tibble of rows from x without a match in y

See also: dplyr::anti_join
## Not run: 
check_id_existence(
  x, y,
  by = c("Batch_ID" = "batch_id"),
  select_cols = c("Batch_ID", "iDate", "Farm_ID")
)
## End(Not run)
A table that links classes to readr
column types.
Created from csv file of the same name in inst/
class_to_col_type
A data frame with 9 rows and 3 columns:
- Type of column as described in readr
- Class of R object that matches that column type
- Abbreviation for that column type from readr
...
class_to_col_type <- read.csv(file = "inst/class_to_col_type.csv")
usethis::use_data(class_to_col_type, overwrite = TRUE)
Checks whether a validation log already exists and appends new records from the current run.
combine_logs(existing_log, new_log)
existing_log: tibble. The existing validation log
new_log: tibble. The newly generated validation log

Value: tibble. The appended validation log for upload
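Conceptually this is an append with a guard for the first run; a minimal base-R sketch (the real column names and any de-duplication rules belong to the package and are not shown here):

```r
# Hypothetical logs sharing the same columns
existing_log <- data.frame(key = 1, field = "farm_id",   issue = "format")
new_log      <- data.frame(key = 2, field = "animal_id", issue = "format")

# If no existing log was found, use the new log as-is; otherwise append
combined <- if (is.null(existing_log)) new_log else rbind(existing_log, new_log)
nrow(combined)  # 2
```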
Takes a validation log and applies the required changes to the data
correct_data(validation_log, data, primary_key)
validation_log: tibble. A validation log
data: tibble. The original unclean data
primary_key: character. The quoted column name for the unique identifier in data

Value: tibble. The semi-clean data set
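A minimal sketch of the mechanic, assuming hypothetical log columns `key`, `field`, and `new_value`: match each log row to the data by primary key and write the corrected value back.

```r
data <- data.frame(id = 1:3,
                   species = c("catle", "goat", "shep"),
                   stringsAsFactors = FALSE)

# Hypothetical log: which record, which field, what the value should become
log <- data.frame(key = c(1, 3),
                  field = "species",
                  new_value = c("cattle", "sheep"),
                  stringsAsFactors = FALSE)

# Apply each correction by locating the record via the primary key
for (i in seq_len(nrow(log))) {
  row <- match(log$key[i], data$id)
  data[row, log$field[i]] <- log$new_value[i]
}

data$species  # "cattle" "goat" "sheep"
```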
Creates custom validation log for 'other: explain' free text responses that may contain valid multi-choice options.
create_freetext_log(response_data, form_schema, url, lookup)
response_data: data.frame. ODK questionnaire response data
form_schema: data.frame. ODK flattened form schema data
url: The ODK submission URL excluding the uuid identifier
lookup: a tibble formatted as a lookup to match questions with their free text responses. The format must match the output of othertext_lookup().
This function needs to link a survey question with its corresponding free text response. Users can use the othertext_lookup() function to handle this, or provide their own tibble in the same format. See below:

tibble::tribble(
  ~name,        ~other_name,
  "question_1", "question_1_other"
)

Value: data.frame validation log
## Not run: 
# Using the othertext_lookup helper
test_a <- create_freetext_log(
  response_data = animal_owner_semiclean,
  form_schema = animal_owner_schema,
  url = "https://odk.xyz.io/#/projects/5/forms/project/submissions",
  lookup = ohcleandat::othertext_lookup(questionnaire = "animal_owner")
)

# Using a custom lookup table
mylookup <- tibble::tribble(
  ~name,            ~other_name,
  "f2_species_own", "f2a_species_own_oexp"
)

test_b <- create_freetext_log(
  response_data = animal_owner_semiclean,
  form_schema = animal_owner_schema,
  url = "https://odk.xyz.io/#/projects/5/forms/project/submissions",
  lookup = mylookup
)
## End(Not run)
Create Validation Log for Questionnaire data
create_questionnaire_log(data, form_schema, pkey, rule_set, url)
data: data frame. Input data to be validated
form_schema: data frame. The ODK form schema data
pkey: character. A character vector giving the column name of the primary key or unique row identifier in the data
rule_set: a rule set of class validator from the validate package
url: The ODK submission URL excluding the uuid identifier

Value: a data frame formatted as a validation log for human review
Creates a rules file from a template to show general structure of the rule file.
create_rules_from_template(
  name,
  dir = "R",
  open = TRUE,
  showWarnings = FALSE,
  overwrite_file = FALSE
)
name: String. Name of the rule set function, e.g. create_rules_my_dataset
dir: String. Name of the directory where the file should be created. If it doesn't exist, the folder will be created.
open: Logical. Should the file be opened?
showWarnings: Logical. Should dir.create() show warnings?
overwrite_file: Logical. Should a rules file with the same name be overwritten?

Value: String. File path of the newly created file
## Not run: 
# create a ruleset and immediately open it
create_rules_from_template(name = "create_rules_field_data")

# create a ruleset and don't open it
create_rules_from_template(name = "create_rules_lab_data", open = FALSE)

# create a ruleset and store it in a different folder
create_rules_from_template(
  name = "create_rules_lab_data",
  dir = "/path/to/rulesets",
  open = FALSE
)
## End(Not run)
This is the metadata that describes the data themselves. This metadata can be generated then joined to pre-existing metadata via field names.
create_structural_metadata(
  data,
  primary_key = "",
  foreign_key = "",
  additional_elements = tibble::tibble()
)
data: Any named object. Expects a table but will work superficially with lists or named vectors.
primary_key: Character. Name of the field that serves as a primary key
foreign_key: Character. Field or fields that are foreign keys
additional_elements: Empty tibble with structural metadata elements and their types.
The metadata table produced has the following elements:

name: The name of the field. This is taken as-is from data.
description: Description of that field. May be provided by a controlled vocabulary.
units: Units of measure for that field. May or may not apply.
term_uri: Universal Resource Identifier for a term from a controlled vocabulary or schema.
comments: Free text providing additional details about the field.
primary_key: TRUE or FALSE. Uniquely identifies each record in the data.
foreign_key: TRUE or FALSE. Allows for linkages between data sets; uniquely identifies records in a different data set.

Value: dataframe with standard metadata requirements
## Not run: 
df <- data.frame(a = 1:10, b = letters[1:10])
df_metadata <- ohcleandat::create_structural_metadata(df)
write.csv(df_metadata, "df_metadata.csv")

# Additional elements can be added via a tibble
additional_elements <- tibble::tibble(
  table_name = NA_character_,
  created_by = NA_character_,
  updated = NA
)

df_metadata <- ohcleandat::create_structural_metadata(
  df,
  additional_elements = additional_elements
)

# Let's pretend we are using a dataset that already exists in Airtable.
# In Airtable, you can add field descriptions directly in the base. We want
# those exported and properly formatted in our ohcleandat workflow.
base <- "appMyBaseID"
table_name <- "My Table"

airtable_metadata <- airtabler::air_generate_metadata_from_api(
  base = base,
  field_names_to_snake_case = FALSE
) |>
  dplyr::filter(table_name == {table_name}) |>
  dplyr::select(field_name, field_desc, primary_key)

airtable_df <- airtabler::fetch_all(base = base, table_name = table_name)
airtable_df_metadata <- ohcleandat::create_structural_metadata(airtable_df)

metadata_joined <- dplyr::left_join(
  airtable_df_metadata, airtable_metadata,
  by = c("name" = "field_name")
)

metadata_updated <- metadata_joined |>
  dplyr::mutate(
    description = field_desc,
    primary_key = primary_key.y
  ) |>
  dplyr::select(-matches('\\.[xy]|field_desc'))

# ODK: get all choices from an ODK form
dotenv::load_dot_env()

ruODK::ru_setup(
  svc = "https://odk.server.org/v1/projects/5/forms/myproject.svc",
  un = Sys.getenv("ODK_USERNAME"),
  pw = Sys.getenv("ODK_PASSWORD"),
  tz = "GMT",
  odkc_version = "1.1.2"
)

schema <- ruODK::form_schema_ext()

schema$choices_flat <- schema$`choices_english_(en)` |>
  purrr::map_chr(\(x) {
    if ("labels" %in% names(x)) {
      paste(x$labels, collapse = ", ")
    } else {
      ""
    }
  })

# keep only the columns needed for the join below
schema_simple <- schema |> dplyr::select(ruodk_name, choices_flat)

data_odk <- ruODK::odata_submission_get()
data_odk_rect <- ruODK::odata_submission_rectangle(data_odk)

odk_metadata <- ohcleandat::create_structural_metadata(data_odk_rect)

odk_metadata_joined <- dplyr::left_join(
  odk_metadata, schema_simple,
  by = c("name" = "ruodk_name")
)

odk_metadata_choices <- odk_metadata_joined |>
  dplyr::mutate(description = choices_flat) |>
  dplyr::select(-choices_flat)
## End(Not run)
Collates free text responses from 'other' and 'notes' fields in the survey data. Some language detection is performed and placed in the log notes section for possible translation.
create_translation_log(response_data, form_schema, url)
response_data: data.frame of ODK questionnaire responses
form_schema: data.frame of the flattened ODK form schema
url: The ODK submission URL excluding the uuid identifier

Value: data.frame validation log
## Not run: 
create_translation_log(
  response_data = semi_clean_data,
  form_schema = odk_schema_data,
  url = "https://odk.xyz.io/#/projects/project-name/submissions"
)
## End(Not run)
Create Validation Log
create_validation_log(data, pkey, rule_set, ...)
data: data frame. Input data to be validated
pkey: character. A character vector giving the column name of the primary key or unique row identifier in the data
rule_set: a rule set of class validator from the validate package
...: other arguments passed to validate::confront

Value: a data frame formatted as a validation log for human review
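The underlying validate workflow looks roughly like this; the rule set and data are invented for illustration, and the step that turns confront() output into the package's log format is omitted.

```r
library(validate)

# A toy rule set of class `validator`
rules <- validator(
  age >= 0,
  species %in% c("cattle", "goat", "sheep")
)

toy <- data.frame(age = c(4, -1), species = c("cattle", "camel"))

# Confront the data with the rules; each failing cell is a candidate log row
cf <- confront(toy, rules)
summary(cf)[, c("name", "items", "passes", "fails")]
```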
A function that extracts the top guess of the language of a piece of text.
detect_language(text)
text: character. Any text string

Utilizes the stringi package encoding detector as the means to infer language.

Value: character estimate of the language abbreviation
detect_language(text = "buongiorno")
Downloads files from dropbox into a given directory
download_dropbox(dropbox_path, dropbox_filename, download_path, ...)
dropbox_path: character. The formal folder path on dropbox
dropbox_filename: character. The formal file name on dropbox
download_path: character. Local file path to download the file to
...: other arguments passed to rdrop2::drop_download

Value: returns the file path if successful
## Not run: 
download_dropbox(
  dropbox_path = "XYZ/Project-Datasets",
  dropbox_filename = "Project dataset as at 01-02-2024.xlsx",
  download_path = here::here("data"),
  overwrite = TRUE
)
## End(Not run)
For a given Google Drive folder this function will find and download all files matching a given pattern.
download_googledrive_files(
  key_path,
  drive_path,
  search_pattern,
  MIME_type = NULL,
  out_path
)
key_path: character. Path to the Google authentication key
drive_path: character. The Google Drive folder path
search_pattern: character. A search pattern for files in the Google Drive
MIME_type: character. Google Drive file type, file extension, or MIME type
out_path: character. The local directory files will be downloaded to

Note: this relies on the googledrive::drive_ls() function, which uses a search function and is not deterministic when searching recursively. Please pay attention to what is returned.

Value: a character vector of the files downloaded
## Not run: 
download_googledrive_files(
  key_path = here::here("./key.json"),
  drive_path = "https://drive.google.com/drive/u/0/folders/asdjfnasiffas8ef7y7y89rf",
  search_pattern = ".*\\.xlsx",
  out_path = here::here("data/project_data/")
)
## End(Not run)
Uploads a local file to Dropbox and handles authentication. Automatically zips files over 300 MB by default.
dropbox_upload(log, file_path, dropbox_path, compress = TRUE)
log: dataframe. Validation log for OH cleaning pipelines. Will work with any tabular data.
file_path: character. Local file path for upload
dropbox_path: character. Relative Dropbox path
compress: logical. Should files over 300 MB be compressed?

This is a wrapper around rdrop2::drop_upload() which first reads in a local CSV file and then uploads it to a Dropbox path.

Value: performs the Dropbox upload
## Not run: 
dropbox_upload(
  kzn_animal_ship_semiclean,
  file_path = here::here("outputs/data.csv"),
  dropbox_path = "XYZ/Data/semi_clean_data"
)
## End(Not run)
Loops over elements in the structural metadata and adds them to frictionless metadata schema. Will overwrite existing values.
expand_frictionless_metadata(
  structural_metadata,
  resource_name,
  resource_path,
  data_package_path,
  prune_datapackage = TRUE
)
structural_metadata: Dataframe. Structural metadata from create_structural_metadata()
resource_name: Character. Item within the datapackage to be updated
resource_path: Character. Path to the csv file
data_package_path: Character. Path to the datapackage.json file
prune_datapackage: Logical. Should properties not in the structural metadata be removed?

Value: updates the datapackage, returns nothing
## Not run: 
# read in file
data_path <- "my/data.csv"
data <- read.csv(data_path)

# create structural metadata
data_codebook <- create_structural_metadata(data)

# update structural metadata
write.csv(data_codebook, "my/codebook.csv", row.names = FALSE)
data_codebook_updated <- read.csv("my/codebook.csv")

# create a frictionless package - this is done automatically with the
# deposits package
my_package <- create_package() |>
  add_resource(resource_name = "data", data = data_path)

write_package(my_package, "my")

expand_frictionless_metadata(
  structural_metadata = data_codebook_updated,
  resource_name = "data",
  resource_path = data_path,
  data_package_path = "my/datapackage.json"
)
## End(Not run)
Downloads existing validation logs that are stored on dropbox
get_dropbox_val_logs(file_name, folder, path_name)
file_name: character. File name, with extension, of the validation log. Note that the file may have been zipped on upload if it is over 300 MB. The file will be automatically unzipped on download, so provide the extension of the uncompressed file, not the zipped file, e.g. "val_log.csv" even if it is stored on Dropbox as "val_log.zip".
folder: character. The folder the log is saved in on Dropbox. Can be NULL if not in a subfolder.
path_name: character. The default Dropbox path

This function will check whether the log exists and return NULL if not. Otherwise it will download the file locally to the 'dropbox_validations' directory and read it into the session.

Value: tibble. A validation log
## Not run: 
get_dropbox_val_logs(file_name = "log.csv", folder = NULL)
## End(Not run)
This function handles the authentication and pulling of questionnaire form schema information.
get_odk_form_schema(
  url,
  un = Sys.getenv("ODK_USERNAME"),
  pw = Sys.getenv("ODK_PASSWORD"),
  odkc_version = Sys.getenv("ODKC_VERSION")
)
url: character. The survey URL
un: character. The ODK account username
pw: character. The ODK account password
odkc_version: character. The ODK Central version string

This is a wrapper around the ruODK package. It handles the setup and authentication. See https://github.com/ropensci/ruODK

Value: data frame of the survey form schema
## Not run: 
get_odk_form_schema(
  url = "https://odk.xyz.io/v1/projects/5/forms/survey.svc",
  un = Sys.getenv("ODK_USERNAME"),
  pw = Sys.getenv("ODK_PASSWORD"),
  odkc_version = Sys.getenv("ODKC_VERSION")
)
## End(Not run)
This function handles the authentication and pulling of response data for ODK questionnaires. The raw return list is 'rectangularized' into a data frame first. See the ruODK package for more info on how this happens.
get_odk_responses(
  url,
  un = Sys.getenv("ODK_USERNAME"),
  pw = Sys.getenv("ODK_PASSWORD"),
  odkc_version = Sys.getenv("ODKC_VERSION")
)
url: character. The survey URL
un: character. The ODK account username
pw: character. The ODK account password
odkc_version: character. The ODK version

This is a wrapper around the ruODK package. It handles the setup and authentication. See https://github.com/ropensci/ruODK

Value: data.frame of flattened survey responses
## Not run: 
get_odk_responses(
  url = "https://odk.xyz.io/v1/projects/5/forms/survey.svc",
  un = Sys.getenv("ODK_USERNAME"),
  pw = Sys.getenv("ODK_PASSWORD"),
  odkc_version = Sys.getenv("ODKC_VERSION")
)
## End(Not run)
Get Precision
get_precision(x, func = c, ...)
x: Numeric. Vector of gps points
func: Function. Apply some function to the vector of precisions. The default is c so that all values are returned.
...: Additional arguments to pass to func

Value: output of func, likely a vector

Author: Nathan Layman
x <- c(1, 100, 1.11)
get_precision(x, func = min)
This function maps the relationship between animal species and hum_anim_id codes. This is for use in id_checker()
get_species_letter(
  species = c("human", "cattle", "small_mammal", "sheep", "goat")
)
species: character. The species identifier. See argument options

Value: character. The hum_anim_id code
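The mapping can be pictured as a named lookup vector; the letter codes below are purely hypothetical placeholders, not the codes the package actually assigns.

```r
# Hypothetical species-to-code lookup (codes invented for illustration)
species_letter <- c(human = "H", cattle = "C", small_mammal = "M",
                    sheep = "S", goat = "G")

get_species_letter_sketch <- function(species = names(species_letter)) {
  species <- match.arg(species)   # mimic the argument-options behaviour
  unname(species_letter[species])
}

get_species_letter_sketch("cattle")  # "C"
```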
Uses the column class to set the readr column type.
guess_col_type(data, default_col_abv = "c")
data: data.frame. Data whose column types you would like to guess
default_col_abv: string. Column type abbreviation from readr to use as the default

Value: character vector of column abbreviations
data <- data.frame(
  time = Sys.time(),
  char = "hello",
  num = 1,
  log = TRUE,
  date = Sys.Date(),
  list_col = list("hello")
)

guess_col_type(data)

## change the default column abbreviation
guess_col_type(data, default_col_abv = "g")
General function for checking and correcting ID columns.
id_checker(col, type = c("animal", "hum_anim", "site"), ...)
col: The vector of IDs to be checked
type: The ID type; see argument options for allowable settings
...: other arguments passed to get_species_letter

In order to use the autobot process for correcting ID columns, a new 'corrected' column is created by the user using the id_checker() function. It takes an existing vector of IDs and an ID type (animal, mosquito, etc.) and applies the bespoke corrections. This can then be consumed by the autobot log.

Value: vector of corrected IDs
## Not run: 
# with a species identifier
data |>
  mutate(animal_id_new = id_checker(animal_id, type = "animal", species = "cattle"))

data |>
  mutate(farm_id_new = id_checker(farm_id, type = "site"))
## End(Not run)
Several HTML reports are emailed via an automated process. To do this, a secure URL is generated as a download link. This function is intended for use in an opinionated targets pipeline.
make_report_urls(aws_deploy_target, pattern = "")
aws_deploy_target: List. Output from aws_s3_upload
pattern: String. Regex pattern for matching file paths

Value: character. URL for the report

Author: Collin Schwantes
Takes a file path, removes the extension, and replaces it with .zip.
make_zip_path(file_path)
file_path: character. A file path

Value: character. String where the extension is replaced by .zip
file_path <- "hello.csv"
make_zip_path(file_path)

file_path_with_dir <- "foo/bar/hello.csv"
make_zip_path(file_path_with_dir)
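The transformation amounts to a one-line regex substitution; a base-R equivalent (a sketch, not the package's exact implementation):

```r
make_zip_path_sketch <- function(file_path) {
  # drop the final extension and append .zip
  sub("\\.[^.]*$", ".zip", file_path)
}

make_zip_path_sketch("hello.csv")          # "hello.zip"
make_zip_path_sketch("foo/bar/hello.csv")  # "foo/bar/hello.zip"
```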
This function fuzzes gps points by first adding error then rounding to a certain number of digits.
obfuscate_gps(
  x,
  precision = 2,
  fuzz = 0.125,
  type = c("lat", "lon"),
  func = min,
  ...
)

obfuscate_lat(x, precision = 2, fuzz = 0.125)

obfuscate_lon(x, precision = 2, fuzz = 0.125)
x: Numeric. Vector of gps points
precision: Integer. Number of digits to keep. See get_precision()
fuzz: Numeric. Positive number indicating how much error to introduce to the gps measurements. This is used to generate the random uniform distribution.
type: Character. One of "lat" or "lon"
func: Function. Function used in get_precision()
...: Additional arguments for func

Value: Numeric. A vector of fuzzed and rounded GPS points. obfuscate_lat() and obfuscate_lon() each return a numeric vector.
# make data
gps_data <- data.frame(
  lat = c(1.0001, 10.22223, 4.00588),
  lon = c(2.39595, 4.506930, -60.09999901)
)

# Default obfuscation settings correspond to roughly a 27 by 27 km area
gps_data$lat |> obfuscate_gps(type = "lat")

# Obfuscation can be made more or less precise by changing the number of
# decimal points included or modifying the amount of fuzz (error) introduced
gps_data$lon |> obfuscate_gps(precision = 4, fuzz = 0.002, type = "lon")

### working at the poles
gps_data_poles <- data.frame(
  lat = c(89.0001, 89.22223, -89.8881),
  lon = c(2.39595, 4.506930, -60.09999901)
)
gps_data_poles$lat |> obfuscate_gps(fuzz = 1, type = "lat")

### working at the 180th meridian
gps_data_180 <- data.frame(
  lat = c(2, 3, 4),
  lon = c(179.39595, -179.506930, -178.09999901)
)
gps_data_180$lon |> obfuscate_gps(fuzz = 1, type = "lon")

### working with NA GPS data
gps_data_180 <- data.frame(
  lat = c(2, 3, 4),
  lon = c(179.39595, NA, -178.09999901)
)
gps_data_180$lon |> obfuscate_gps(fuzz = 1, type = "lon")

### GPS is on the fritz!
## Not run: 
gps_data_fritz <- data.frame(
  lat = c(91, -91, 90),
  lon = c(181.0001, -181.9877, -178.09999901)
)
gps_data_fritz$lon |> obfuscate_gps(fuzz = 1, type = "lon")
gps_data_fritz$lat |> obfuscate_gps(fuzz = 1, type = "lat")
## End(Not run)
Provides a lookup table matching ODK survey questions with their free text response question.
othertext_lookup(questionnaire = c("animal_owner"))
questionnaire: The ODK questionnaire. Used to ensure the correct lookup table is found.

In many ODK surveys, a multiple choice question can have a response for 'other', where the respondent can add free text as a response. There is no consistent link in the response data to match the captured responses and the other free text collected. This function provides a manual lookup reference so free text responses can be compared to the original questions in the validation workflow.

This function can be expanded by providing a tibble with two columns, name and other_name, which maps the question name in ODK to the question name containing the 'other' or free text response.

Value: tibble
othertext_lookup(questionnaire = c("animal_owner"))
Method to remove properties from the metadata for a dataset in a datapackage.
prune_datapackage(my_data_schema, structural_metadata)
my_data_schema: list. Schema object from frictionless
structural_metadata: dataframe. Structural metadata for a dataset

Value: the pruned data schema
For a given excel file, this will detect all sheets, and iteratively read all sheets and place them in a list.
If primary keys are added, the primary key is the triplet of the file, sheet name, and row number e.g. "file_xlsx_sheet1_1". Row numbering is based on the data ingested into R. R automatically skips empty rows at the beginning of the spreadsheet so id 1 in the primary key will belong to the first row with data.
read_excel_all_sheets(file, add_primary_key_field = FALSE, primary_key = "primary_key")
file |
character. File path to an excel file |
add_primary_key_field |
Logical. Should a primary key field be added? |
primary_key |
character. The column name for the unique identifier to be added to the data. |
list
The primary key method is possible because Excel forces sheet names to be unique.
## Not run:
# Adding primary key field
read_excel_all_sheets(file = "test_pk.xlsx", add_primary_key_field = TRUE)

# Don't add primary key field
read_excel_all_sheets(file = "test_pk.xlsx")
## End(Not run)
For a given sheet id, this handles authentication and reads in a specified sheet, or all sheets.
read_googlesheets(key_path, sheet = "all", ss, add_primary_key_field = FALSE, primary_key = "primary_key", ...)
key_path |
character path to Google authentication key json file |
sheet |
Sheet to read, in the sense of "worksheet" or "tab". |
ss |
Something that identifies a Google Sheet such as drive id or URL |
add_primary_key_field |
Logical. Should a primary key field be added? |
primary_key |
character. The column name for the unique identifier to be added to the data. |
... |
other arguments passed to |
tibble
## Not run:
read_googlesheets(ss = kzn_animal_ship_sheets, sheet = "all")
## End(Not run)
Filters out records matching a given string.
remove_deletions(x, val = "Delete")
x |
input vector |
val |
The value to check for inequality. Defaults to 'Delete' |
To be used within dplyr::filter(). The function returns a logical vector
with TRUE for values that are not equal to the val argument, and it also
protects against NA values.
Used within verbs such as dplyr::if_all(), this can work effectively across
all columns in a data frame. See examples.
logical vector
## Not run:
data |> filter(if_all(everything(), remove_deletions))
## End(Not run)
Unlike setdiff(), this function takes the union of x and y and then removes values that are in the intersection, returning values that are unique to x and values that are unique to y (the symmetric difference).
set_diff(x, y)
x |
a set of values. |
y |
a set of values. |
Unique values from x and y; NULL if there are no unique values.
a <- 1:3
b <- 2:4
set_diff(a, b) # returns 1, 4

x <- 1:3
y <- 1:3
set_diff(x, y) # returns NULL
This function overwrites the descriptive metadata associated with a frictionless datapackage. It does NOT validate the metadata, or check for conflicts with existing descriptive metadata. It is very easy to create invalid metadata.
update_frictionless_metadata(descriptive_metadata, data_package_path)
descriptive_metadata |
List of descriptive metadata terms. |
data_package_path |
Character. Path to datapackage.json file |
invisibly writes datapackage.json
## Not run:
descriptive_metadata <- list(
  title = "Example Dataset",
  description = "This is the abstract but it needs more detail",
  creator = list(
    list(name = "A. Person"),
    list(name = "B. Person"),
    list(name = "C. Person"),
    list(name = "F. Person")
  )
  # , accessRights = "open"
)

update_frictionless_metadata(
  descriptive_metadata = descriptive_metadata,
  data_package_path = "data_examples/datapackage.json"
)
## End(Not run)
Appends rows and/or columns to existing metadata, changes the primary key, and/or adds foreign keys.
update_structural_metadata(data, metadata, primary_key = "", foreign_key = "", additional_elements = tibble::tibble())
data |
Any named object. Expects a table but will work superficially with lists or named vectors. |
metadata |
Data frame. Output from |
primary_key |
Character. OPTIONAL Primary key in the data |
foreign_key |
Character. OPTIONAL Foreign key or keys in the data |
additional_elements |
data frame. OPTIONAL Empty tibble with structural metadata elements and their types. |
data.frame
See vignette on metadata for examples
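A hedged sketch of a typical call, assuming structural metadata was generated earlier in the pipeline. The object names and key name below are illustrative, not from the package.

```r
## Not run:
# Add a column to the data, then update the metadata so it describes
# the new column and records the primary key
my_data$row_id <- seq_len(nrow(my_data))

updated_metadata <- update_structural_metadata(
  data = my_data,
  metadata = existing_metadata,  # structural metadata created previously
  primary_key = "row_id"
)
## End(Not run)
```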
Validation correction checks to be run on data before and after corrections are applied, to test expectations.
validation_checks(validation_log, before_data, after_data, idcol)
validation_log |
tibble Validation log |
before_data |
tibble Data before corrections |
after_data |
tibble Data after corrections |
idcol |
character the primary key for the 'after_data' |
As part of the OH cleaning pipelines, raw data is converted to 'semi-clean' data through a process of upserting records from an external Validation Log. To ensure these corrections were made as expected, some checks are performed in this function.
If no existing log exists, no changes are made to the data:
Same variables
Same rows
No unequal values
If a log exists but no changes are recommended, no changes are made to the data:
Same variables
Same rows
No unequal values
If a log exists and changes are recommended, the number of changes matches the log:
Same variables
Same rows
Number of changing records in the data matches the number of records in the log
Correct fields and records are being updated
Checks that the before and after data have the same variables and rows.
Checks that the variable names and row indexes are the same in the logs and the changed data.
NULL if passed or stops with error
## Not run:
validation_checks(
  validation_log = kzn_animal_ship_existing_log,
  before_data = kzn_animal_ship,
  after_data = kzn_animal_ship_semiclean,
  idcol = "animal_id"
)
## End(Not run)