A container describing the datasets associated with a project

Details

This helper class provides:

  1. Utilities – Dataset-related utility methods, e.g. confirming access permissions, collecting the newest refresh tables, etc.

  2. Parameterisation – General interface to parameterise a set of datasets to be used by a user-defined function and/or method

  3. Validation – Can be used by user-defined operations to validate the inclusion of projects

  4. Documentation – An inline documentation of the datasets associated with a project

Several options are available to modify this instance's behaviour; each can be set either globally or on a per-call basis:

  1. SAILDB.QUIET: Determines whether the DatasetContainer methods will log debug information

  2. SAILDB.NO.WARN: Determines whether warnings will be logged to the console

  3. SAILDB.THROW.ERRORS: Specifies whether the current thread should be halted when an error is encountered; you are expected to wrap your saildb::Connection calls with an error handler if you deactivate this option
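
As a minimal sketch of the global route, assuming these are ordinary R options read through getOption() (as the retrieve() usage suggests), they can be set with base R's options(). The values and the stand-in risky.call function below are illustrative and not part of saildb.

```r
# Set the options globally; the option names are taken from the list above,
# the values are illustrative. options() returns the previous values.
old = options(SAILDB.QUIET = TRUE, SAILDB.THROW.ERRORS = FALSE)

# Read an option with a fallback, mirroring the defaults shown in retrieve()
quiet.during = getOption("SAILDB.QUIET", FALSE)

# With SAILDB.THROW.ERRORS deactivated you are expected to handle errors yourself;
# risky.call is a hypothetical stand-in for a saildb::Connection call
risky.call = function () stop("query failed")
result = tryCatch(risky.call(), error = function (e) {
  message("Handled: ", conditionMessage(e))
  FALSE
})

# Restore the previous option values
options(old)
```

For a per-call override, methods such as retrieve() expose matching arguments (stop.on.error, suppress.warnings) whose defaults are read from these options.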

Public fields

reference

(list)
A private reference to the datasets contained by this instance

Active bindings

reference

(list)
A private reference to the datasets contained by this instance

datasets

(character|NA)
A read-only field describing the datasets contained by this instance

Methods


Method new()

Initialise a new DatasetContainer

Usage

DatasetContainer$new(...)

Arguments

...

A set of datasets to be contained by this instance; all arguments should be named unless given as a DatasetContainer$ref. See the Examples section of this method

Returns

A new DatasetContainer instance

Examples

# Initialise the container...
datasets = DatasetContainer$new(
  # Some reference table unknown to SAILDB.METADATA; character string references must be a named argument and can't be a reserved name - see DatasetContainer$is.reserved
  some.table               = 'SAILREFRV.SOME_TABLE',
  # Some project table unknown to SAILDB.METADATA; refresh dates should be appended to the name as usual - these can be refreshed using the DatasetContainer$pull.refresh method
  other.table              = 'SAILXXXV.OTHER_TABLE_20240905',
  # Include some reference table; no schema is needed since this is retrieved from SAILREFRV. Note that the argument name _must_ match DatasetContainer$ref's $ref property if you are using a named argument
  sailref.wimd2019.sm.area = DatasetContainer$ref('sailref.wimd2019.sm.area'),
  # Include some reference table from a project schema; as above, the argument name must match the reference name
  adde.deaths              = DatasetContainer$ref('adde.deaths', 'SAILXXXV'),
  # Include a specific refresh of some reference table from a project schema; as above, the argument name must match the reference name
  abde.births              = DatasetContainer$ref('abde.births', 'SAILXXXV', date='20240905'),
  # References built with DatasetContainer$ref may also be passed unnamed
  DatasetContainer$ref('wdsd.pers', 'SAILXXXV', '20240905')
)


Method get()

Get the table reference name of the dataset in the shape of [SCHEMA].[TABLE]

Usage

DatasetContainer$get(dataset)

Arguments

dataset

(character)
The name of the dataset to select

Returns

Either (a) a character string representing the table reference if contained by this instance; otherwise (b) an NA value


Method set()

Sets the key-value pair for the given dataset

Usage

DatasetContainer$set(dataset, reference)

Arguments

dataset

(character)
The name of the dataset to update

reference

(character|DatasetContainer$ref)
The table reference in the shape of [SCHEMA].[TABLE], or a reference structure generated by the DatasetContainer$ref static method

Returns

The reference to this instance which can be used for chaining, e.g. dataset$set('gp.event', 'schema.table')$set('pedw.spell', 'schema.table')


Method retrieve()

Retrieves information relating to the specified dataset(s)

Usage

DatasetContainer$retrieve(
  dataset,
  properties = NA,
  stop.on.error = getOption("SAILDB.THROW.ERRORS", TRUE),
  suppress.warnings = getOption("SAILDB.NO.WARN", FALSE)
)

Arguments

dataset

(character|vector|list)
The name of the dataset, or a vector/list of dataset names

properties

(list|vector)
An optional flat list or vector of characters specifying which properties to retrieve; see the Details table of this method for the available keys

stop.on.error

(logical)
Whether to stop the execution of the parent thread when an error is encountered; if FALSE, a FALSE logical is returned instead; defaults to option(SAILDB.THROW.ERRORS=TRUE)

suppress.warnings

(logical)
Whether to suppress warnings; defaults to option(SAILDB.NO.WARN=FALSE)

Details

Key        Parent     Details
date       Top-level  The selected table date
schema     Top-level  The dataset schema
relation   Top-level  The dataset type
reference  Top-level  The table [SCHEMA].[TABLE] reference
dataset    Top-level  The dataset object
ref        $dataset   The key used to contain this reference
alt        $dataset   Whether this dataset prepends a prefix to its table name
tag        $dataset   How dates are appended to the table name
name       $dataset   The name of the dataset
origin     $dataset   Where this dataset originated
static     $dataset   Whether this dataset is copied to a project, or is hosted in its origin schema

Returns

Either:

  1. If the dataset is present and no properties provided – a list describing the dataset (or a list of lists if multiple datasets were selected)

  2. If the dataset is present with the given properties – a key-value pair list containing the information requested (or a list of lists if multiple datasets were selected)

  3. If no dataset by this name is present – an NA value


Method require.datasets()

Tests whether the given datasets have been defined within the DatasetContainer

Usage

DatasetContainer$require.datasets(datasets, assert = TRUE)

Arguments

datasets

(character|list|vector)
A scalar character describing a single dataset, or a list/vector of characters describing which datasets should be tested. If the specified dataset element is named, e.g. list(abde.births='ABDE_BIRTHS_20240630'), this method will test both that abde.births is present and that it refers to the ABDE_BIRTHS_20240630 table. Note that including a schema, e.g. SAIL####V.ABDE_BIRTHS_20240630, will test both the schema and the dataset name.

assert

(logical|NA)
An optional logical that describes whether to throw an error and to stop the execution of the current thread if one or more of the datasets aren't present; defaults to TRUE

Returns

A logical describing whether the datasets are contained by this instance


Method require.like()

Tests whether a dataset's reference refers to a specific relation type

Usage

DatasetContainer$require.like(datasets, relation, assert = TRUE)

Arguments

datasets

(character|list|vector)
The name of the dataset, or a list/vector of characters, specifying dataset(s) to test

relation

(character)
A scalar character specifying the relation comparator, which can be one of: BASE, REFERENCE, SESSION, PROJECT, WORKSPACE, ENCRYPTION or UNKNOWN - see SAILDB.DEF$DREF.RELATION for more details

assert

(logical|NA)
An optional logical that describes whether to throw an error and to stop the execution of the current thread if one or more of the datasets don't refer to the given relation type; defaults to TRUE

Details

Type        Schema regex        Details
BASE        ^BASE               A base schema from which datasets are cloned
REFERENCE   ^SAIL[A-Z][\w]+$    A dataset reference schema containing the datasets
SESSION     ^SESSION$           A global temporary table for this session
PROJECT     ^SAIL\d+V$          A project table cloned from BASE, REFERENCE or ENCRYPTION
WORKSPACE   ^SAILW\d+V$         A project workspace containing user-defined tables
ENCRYPTION  ^SAILX\d+V$         A project-level encryption table
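
To illustrate how the schema patterns above distinguish relation types, here is a rough base-R sketch. The classify.schema helper is hypothetical and not part of saildb; the patterns are copied from the table, but the order in which they are tried is an assumption (the REFERENCE pattern also matches workspace and encryption schema names, so the narrower patterns are checked first).

```r
# Schema patterns copied from the table above; the matching order is an assumption
relation.patterns = c(
  BASE       = "^BASE",
  SESSION    = "^SESSION$",
  WORKSPACE  = "^SAILW\\d+V$",
  ENCRYPTION = "^SAILX\\d+V$",
  PROJECT    = "^SAIL\\d+V$",
  REFERENCE  = "^SAIL[A-Z][\\w]+$"
)

# Hypothetical helper: return the first relation whose pattern matches the schema
classify.schema = function (schema) {
  for (relation in names(relation.patterns)) {
    if (grepl(relation.patterns[[relation]], schema, perl = TRUE)) {
      return(relation)
    }
  }
  # Nothing matched
  "UNKNOWN"
}

schemas = c("SAILREFRV", "SAIL1234V", "SAILW1234V", "SAILX1234V", "SESSION", "OTHER")
relations = vapply(schemas, classify.schema, character(1))
print(relations)
```

The actual lookup used by require.like() is defined by SAILDB.DEF$DREF.RELATION and may resolve overlapping patterns differently.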

Returns

A logical describing whether the dataset is of the specified relation type


Method require.access()

Tests whether the client has the privileges to interface with the datasets defined within this container.

Note that 'privileges' could describe any of the following: SELECT, UPDATE, INSERT etc. Please see IBM's documentation on user privileges for more information, or see SAILDB.DEF$PRIVILEGES

Usage

DatasetContainer$require.access(db, privileges, assert = TRUE)

Arguments

db

(saildb::Connection)
An active, valid saildb::Connection database connection

privileges

(list)
A named list in which the key describes the dataset, and where the value describes which privileges are required, e.g. list(gp.events=c('INSERT', 'SELECT'), pedw.spell='ALL'). Note that ALL expands to all privileges described by SAILDB.DEF$PRIVILEGES

assert

(logical|NA)
An optional logical that describes whether to throw an error and to stop the execution of the current thread if the client lacks one or more of the required privileges; defaults to TRUE

Returns

A logical reflecting whether the client has the privileges to interface with the datasets


Method require.shape()

Tests whether the given dataset(s) meet the selected criteria by ensuring each of its columns and their datatypes match those specified

Usage

DatasetContainer$require.shape(db, criteria, assert = TRUE)

Arguments

db

(saildb::Connection)
An active, valid saildb::Connection database connection

criteria

(list)
A list of column criteria for one or more dataset(s), e.g. list(gp.event=list(ALF_PE='BIGINTEGER')), in which each key of each inner list gives a column name and each value specifies that column's required type. Note that types are fuzzy matched to exclude matching on size unless explicitly specified, e.g. list(some.table=list(SOME_TEXT_COL='VARCHAR')) vs. list(some.table=list(SOME_TEXT_COL='VARCHAR(200)')). Note also that names, columns and types are case sensitive.

assert

(logical|NA)
An optional logical that describes whether to throw an error and to stop the execution of the current thread if one or more of the criteria aren't met; defaults to TRUE

Returns

A logical reflecting whether the criteria have been met
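
The fuzzy type matching described for the criteria argument can be pictured with the following base-R sketch. The type.matches helper is hypothetical and only approximates the documented behaviour (size ignored unless stated, comparison case sensitive); it is not saildb's implementation.

```r
# Hypothetical helper, not part of saildb: compares a required type against
# the actual column type, ignoring size unless the requirement states one
type.matches = function (required, actual) {
  if (grepl("(", required, fixed = TRUE)) {
    # Size given explicitly, e.g. 'VARCHAR(200)': require an exact match
    return(identical(required, actual))
  }
  # No size given: strip any trailing '(...)' from the actual type first
  identical(required, sub("\\(.*\\)$", "", actual))
}

type.matches("VARCHAR", "VARCHAR(200)")       # size ignored
type.matches("VARCHAR(200)", "VARCHAR(50)")   # size compared
type.matches("varchar", "VARCHAR(200)")       # case sensitive
```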


Method [()

Subscript operator overload to retrieve one or more table references derived from the indexed dataset(s)

Usage

DatasetContainer$[(datasets)

Arguments

datasets

(character|vector)
The datasets to index

Returns

The dataset references, if contained by this instance

Examples

some.container[c('gp.event', 'other.table')]


Method [[()

Subscript operator overload to retrieve a single table reference derived from the indexed dataset

Usage

DatasetContainer$[[(dataset)

Arguments

dataset

(character)
The dataset to index

Returns

A dataset reference, if contained by this instance

Examples

some.container[['gp.event']]


Method length()

Compute the length of the datasets contained by this instance

Usage

DatasetContainer$length(...)

Arguments

...

Optional varargs from the length call

Returns

An integer describing the number of datasets referenced by this instance


Method clone()

The objects of this class are cloneable with this method.

Usage

DatasetContainer$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
