DatasetContainer
SAIL-DatasetContainer.Rd
A container describing the datasets associated with a project
Details
This helper class can be used for:
Utilities – Dataset-related utility methods, e.g. confirming access permissions, collecting the newest refresh tables etc
Parameterisation – General interface to parameterise a set of datasets to be used by a user-defined function and/or method
Validation – Can be used by user-defined operations to validate the inclusion of projects
Documentation – An inline documentation of the datasets associated with a project
Several options modify this instance's behaviour; each can be set either globally or on a per-call basis:
SAILDB.QUIET: Determines whether the DatasetContainer methods will log debug information
SAILDB.NO.WARN: Determines whether warnings will be logged to the console
SAILDB.THROW.ERRORS: Specifies whether the current thread should be halted when an error is encountered; you are expected to wrap your saildb::Connection calls with an error handler if you deactivate this option
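As a rough illustration of the global versus per-call distinction, the sketch below sets the three options with base R's options() and resolves a value with getOption(). The resolve.option helper is hypothetical and not part of saildb, and the assumption that TRUE for SAILDB.QUIET silences debug logging is inferred from the option name only.

```r
# Set the options globally; options() returns the previous values so they can be restored
previous <- options(
  SAILDB.QUIET = TRUE,         # assumption: TRUE silences debug logging
  SAILDB.NO.WARN = FALSE,      # warnings are still logged to the console
  SAILDB.THROW.ERRORS = TRUE   # errors halt execution
)

# Hypothetical helper (not part of saildb): a per-call value wins,
# otherwise fall back to the global option, then to a default
resolve.option <- function(value = NULL, name, default) {
  if (!is.null(value)) value else getOption(name, default)
}

quiet.global <- resolve.option(NULL, 'SAILDB.QUIET', FALSE)   # taken from the global option
quiet.local <- resolve.option(FALSE, 'SAILDB.QUIET', FALSE)   # per-call override

# Restore whatever was set before this example ran
options(previous)
```

Restoring the previous values keeps the change local to the script, which matters when other code in the same session relies on the defaults.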
Active bindings
reference
(list)
A private reference to the datasets contained by this instance
datasets
(character|NA)
A read-only field describing the datasets contained by this instance
Methods
Method new()
Initialise a new DatasetContainer
Usage
DatasetContainer$new(...)
Arguments
...
A set of datasets to be contained by this instance; all arguments should be named unless given as a DatasetContainer$ref. See the $initialize Examples section
Examples
\dontrun{
# Initialise the container...
datasets = DatasetContainer$new(
  # Some reference table unknown to SAILDB.METADATA; character string references must be
  # a named argument and can't be a reserved name - see DatasetContainer$is.reserved
  some.table = 'SAILREFRV.SOME_TABLE',
  # Some project table unknown to SAILDB.METADATA; refresh dates should be appended to the
  # name as usual - these can be refreshed using the DatasetContainer$pull.refresh method
  other.table = 'SAILXXXV.OTHER_TABLE_20240905',
  # Include some reference table; no schema is needed since this is retrieved from SAILREFRV.
  # Note that a named argument's name must match the $ref property of the DatasetContainer$ref
  sailref.wimd2019.sm.area = DatasetContainer$ref('sailref.wimd2019.sm.area'),
  # Include some reference table from a project schema; as above, the argument name must match the reference name
  adde.deaths = DatasetContainer$ref('adde.deaths', 'SAILXXXV'),
  # Include a specific refresh of some reference table from a project schema; as above, the argument name must match
  abde.births = DatasetContainer$ref('abde.births', 'SAILXXXV', date='20240905'),
  # Arguments given as a DatasetContainer$ref may be left unnamed
  DatasetContainer$ref('wdsd.pers', 'SAILXXXV', '20240905')
)
}
Method get()
Get the table reference name of the dataset in the shape of [SCHEMA].[TABLE]
Method set()
Sets the key-value pair for the given dataset
Method retrieve()
Retrieves information relating to the specified dataset(s)
Arguments
dataset
(character|vector|list)
The name of the dataset, or a vector/list of dataset names
properties
(list|vector)
An optional flat list or vector of characters specifying which properties to retrieve; see the DatasetContainer$get example section for more information
stop.on.error
(logical)
Whether to return a FALSE logical when an error is encountered instead of stopping the execution of the parent thread; defaults to option(SAILDB.THROW.ERRORS=TRUE)
suppress.warnings
(logical)
Whether to suppress warnings; defaults to option(SAILDB.NO.WARN=FALSE)
Details
Key | Parent | Details |
date | Top-level | The selected table date |
schema | Top-level | The dataset schema |
relation | Top-level | The dataset type |
reference | Top-level | The table [SCHEMA].[TABLE] reference |
dataset | Top-level | The dataset object |
ref | $dataset | The key used to contain this reference |
alt | $dataset | Whether this dataset prepends a prefix to its table name |
tag | $dataset | How dates are appended to the table name |
name | $dataset | The name of the dataset |
origin | $dataset | Where this dataset originated |
static | $dataset | Whether this dataset is copied to a project, or is hosted in its origin schema |
Returns
Either:
If the dataset is present and no properties provided – a list describing the dataset (or a list of lists if multiple datasets were selected)
If the dataset is present with the given properties – a key-value pair list containing the information requested (or a list of lists if multiple datasets were selected)
If no dataset by this name is present – an NA value
Method require.datasets()
Tests whether the given datasets have been defined within the DatasetContainer
Arguments
datasets
(character|list|vector)
A scalar character describing a single dataset, or a list/vector of characters describing which datasets should be tested. If the specified dataset element is named, e.g. list(abde.births='ABDE_BIRTHS_20240630'), this method will test both that abde.births is present and that it refers to the ABDE_BIRTHS_20240630 table. Note that including a schema, e.g. SAIL####V.ABDE_BIRTHS_20240630, will test both the schema and the dataset name.
assert
(logical|NA)
An optional logical that describes whether to throw an error and stop the execution of the current thread if one or more of the datasets aren't present; defaults to TRUE
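The named and unnamed forms of the datasets argument can be pictured with a small base R sketch. It does not call the package: the contained vector stands in for a container, the has.dataset helper is hypothetical, and the SAIL0000V schema is a made-up placeholder value.

```r
# Stand-in for a container: dataset name -> [SCHEMA].[TABLE] reference (hypothetical values)
contained <- c(
  abde.births = 'SAIL0000V.ABDE_BIRTHS_20240630',
  adde.deaths = 'SAIL0000V.ADDE_DEATHS_20240630'
)

# Hypothetical helper mirroring the checks described above
has.dataset <- function(contained, name, expected = NULL) {
  # The dataset must be present under this name
  if (!name %in% names(contained)) return(FALSE)
  # Unnamed form: presence alone is enough
  if (is.null(expected)) return(TRUE)

  reference <- contained[[name]]
  if (grepl('.', expected, fixed = TRUE)) {
    # A schema was included, so compare the whole [SCHEMA].[TABLE] reference
    identical(reference, expected)
  } else {
    # Table only: compare the part after the schema
    identical(sub('^[^.]+\\.', '', reference), expected)
  }
}

has.dataset(contained, 'abde.births')                                    # TRUE
has.dataset(contained, 'abde.births', 'ABDE_BIRTHS_20240630')            # TRUE
has.dataset(contained, 'abde.births', 'SAIL9999V.ABDE_BIRTHS_20240630')  # FALSE
```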
Method require.like()
Tests whether a dataset's reference refers to a specific relation type
Arguments
datasets
(character|list|vector)
The name of the dataset, or a list/vector of characters, specifying the dataset(s) to test
relation
(character)
A scalar character specifying the relation comparator, which can be one of: BASE, REFERENCE, SESSION, PROJECT, WORKSPACE, ENCRYPTION or UNKNOWN; see SAILDB.DEF$DREF.RELATION for more details
assert
(logical|NA)
An optional logical that describes whether to throw an error and stop the execution of the current thread if one or more of the datasets aren't present; defaults to TRUE
Details
Type | Schema regex | Details |
BASE | ^BASE | A base schema from which datasets are cloned |
REFERENCE | ^SAIL[A-Z][\w]+$ | A dataset reference schema containing the datasets |
SESSION | ^SESSION$ | A global temporary table for this session |
PROJECT | ^SAIL\d+V$ | A project table cloned from BASE, REFERENCE or ENCRYPTION |
WORKSPACE | ^SAILW\d+V$ | A project workspace containing user-defined tables |
ENCRYPTION | ^SAILX\d+V$ | A project-level encryption table |
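The schema patterns in this table can be tried out with base R alone. The sketch below is not the package's implementation: classify.schema is a hypothetical helper, and the order in which patterns are tested is an assumption, chosen because the REFERENCE pattern also matches workspace and encryption schemas and so has to be tested last.

```r
# Patterns copied from the table above; the ordering is an assumption
relation.patterns <- c(
  BASE       = '^BASE',
  SESSION    = '^SESSION$',
  WORKSPACE  = '^SAILW\\d+V$',
  ENCRYPTION = '^SAILX\\d+V$',
  PROJECT    = '^SAIL\\d+V$',
  REFERENCE  = '^SAIL[A-Z][\\w]+$'
)

# Hypothetical helper: return the first relation whose pattern matches the schema
classify.schema <- function(schema) {
  for (relation in names(relation.patterns)) {
    if (grepl(relation.patterns[[relation]], schema, perl = TRUE)) {
      return(relation)
    }
  }
  # Nothing matched
  'UNKNOWN'
}

classify.schema('SAILREFRV')    # "REFERENCE"
classify.schema('SAILW1234V')   # "WORKSPACE"
classify.schema('SAIL1234V')    # "PROJECT"
classify.schema('MYSCHEMA')     # "UNKNOWN"
```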
Method require.access()
Tests whether the client has the privileges to interface with the datasets defined within this container.
Note that 'privileges' could describe any of the following: SELECT, UPDATE, INSERT etc. Please see IBM's documentation on user privileges for more information, or see SAILDB.DEF$PRIVILEGES
Arguments
db
(saildb::Connection)
An active, valid saildb::Connection database connection
privileges
(list)
A named list in which the key describes the dataset, and where the value describes which privileges are required, e.g. list(gp.events=c('INSERT', 'SELECT'), pedw.spell='ALL'). Note that ALL expands to all privileges described by SAILDB.DEF$PRIVILEGES
assert
(logical|NA)
An optional logical that describes whether to throw an error and stop the execution of the current thread if one or more of the datasets aren't present; defaults to TRUE
Method require.shape()
Tests whether the given dataset(s) meet the selected criteria by ensuring each of its columns and their datatypes match those specified
Arguments
db
(saildb::Connection)
An active, valid saildb::Connection database connection
criteria
(list)
A list of column criteria for one or more dataset(s), e.g. list(gp.event=list(ALF_PE='BIGINTEGER')), in which each key of each inner list gives a column name and each value specifies that column's required type. Note that types are fuzzy matched, ignoring size unless it is explicitly specified, e.g. list(some.table=list(SOME_TEXT_COL='VARCHAR')) vs. list(some.table=list(SOME_TEXT_COL='VARCHAR(200)')). Names, columns and types are case sensitive.
assert
(logical|NA)
An optional logical that describes whether to throw an error and stop the execution of the current thread if one or more of the datasets aren't present; defaults to TRUE
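One way to read the fuzzy type matching described for criteria is sketched below in base R. This is an interpretation under stated assumptions, not the package's code: type.matches is a hypothetical helper, and it assumes a size is always written as a trailing parenthesised suffix.

```r
# Hypothetical helper: does a column's actual type satisfy the required type?
type.matches <- function(required, actual) {
  # Drop a trailing size specifier such as "(200)"
  strip.size <- function(type) sub('\\(.*\\)$', '', type)

  if (grepl('(', required, fixed = TRUE)) {
    # A size was given explicitly, so the full type must match
    identical(required, actual)
  } else {
    # No size given: compare the base types only (case sensitive)
    identical(required, strip.size(actual))
  }
}

type.matches('VARCHAR', 'VARCHAR(200)')        # TRUE
type.matches('VARCHAR(200)', 'VARCHAR(200)')   # TRUE
type.matches('VARCHAR(200)', 'VARCHAR(50)')    # FALSE
type.matches('varchar', 'VARCHAR(200)')        # FALSE
```

The last call returns FALSE because the comparison is case sensitive, in line with the note on names, columns and types.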
Method [()
Subscript operator overload to retrieve one or more table references derived from the indexed dataset(s)
Method [[()
Subscript operator overload to retrieve one or more table references derived from the indexed dataset(s)
Method length()
Compute the length of the datasets contained by this instance
Examples
## ------------------------------------------------
## Method `DatasetContainer$new`
## ------------------------------------------------
if (FALSE) { # \dontrun{
# Initialise the container...
datasets = DatasetContainer$new(
  # Some reference table unknown to SAILDB.METADATA; character string references must be
  # a named argument and can't be a reserved name - see DatasetContainer$is.reserved
  some.table = 'SAILREFRV.SOME_TABLE',
  # Some project table unknown to SAILDB.METADATA; refresh dates should be appended to the
  # name as usual - these can be refreshed using the DatasetContainer$pull.refresh method
  other.table = 'SAILXXXV.OTHER_TABLE_20240905',
  # Include some reference table; no schema is needed since this is retrieved from SAILREFRV.
  # Note that a named argument's name must match the $ref property of the DatasetContainer$ref
  sailref.wimd2019.sm.area = DatasetContainer$ref('sailref.wimd2019.sm.area'),
  # Include some reference table from a project schema; as above, the argument name must match the reference name
  adde.deaths = DatasetContainer$ref('adde.deaths', 'SAILXXXV'),
  # Include a specific refresh of some reference table from a project schema; as above, the argument name must match
  abde.births = DatasetContainer$ref('abde.births', 'SAILXXXV', date='20240905'),
  # Arguments given as a DatasetContainer$ref may be left unnamed
  DatasetContainer$ref('wdsd.pers', 'SAILXXXV', '20240905')
)
} # }
## ------------------------------------------------
## Method `DatasetContainer$[`
## ------------------------------------------------
if (FALSE) { # \dontrun{
some.container[c('gp.event', 'other.table')]
} # }
## ------------------------------------------------
## Method `DatasetContainer$[[`
## ------------------------------------------------
if (FALSE) { # \dontrun{
some.container[['gp.event']]
} # }