Sherlock Holmes was a consulting detective who had spectacular powers of
deduction and logical reasoning. Within sherlock's causal segmentation
framework, sherlock_calculate takes data from a segmentation "case",
the roles of the different variables, and specifications for assessing the
conditional treatment effects required for deriving a segmentation. As
the workhorse of the framework, this function is the most computationally
demanding, since it computes all of the nuisance parameters required for
subsequent analyses. The complementary functions watson_segment and
mycroft_assess can be used once Sherlock has consulted on the causal
segmentation case.
sherlock_calculate(data_from_case, baseline, exposure, outcome, segment_by,
ids = NULL, treatment_cost = NULL, cv_folds = 5L,
split_type = c("inner", "outer"), ps_learner, or_learner, cate_learner,
use_cv_selector = FALSE)
Arguments
data_from_case |
Rectangular input data, whether a data.frame,
data.table, or tibble. |
baseline |
A character vector specifying the column names in
data_from_case that correspond to the baseline covariates (conditioning
set). These variables should temporally precede the exposure and outcome. |
exposure |
A character string (of length one) specifying the
column in data_from_case corresponding to the exposure or treatment. This
variable should follow those in baseline in time but precede the
response variable outcome. |
outcome |
A character string (of length one) specifying the
column in data_from_case corresponding to the response variable. |
segment_by |
A character vector specifying the column names in
data_from_case that correspond to the covariates over which segmentation
should be performed. This should be a strict subset of baseline. |
ids |
A character string (of length one) specifying the column
in data_from_case that gives observation-level IDs. The default value of
NULL assumes that all rows of data_from_case are independent. |
treatment_cost |
A character string (of length one) specifying
the column in data_from_case that provides the cost associated with treating
the given unit. The default value of NULL assumes that all units are
equally costly to treat. |
cv_folds |
A numeric specifying the number of cross-validation
folds to be used for sample-splitting when estimating nuisance parameters. |
split_type |
A character string (of length one) indicating the
sample-splitting "level" at which estimation of the CATE is performed. The
choices are "inner", for estimation of the CATE within folds (i.e., at
the same level at which nuisance parameters are estimated), and "outer", in
which case the CATE is estimated at the "full-sample" level. |
ps_learner |
An instantiated learner object (class inheriting
from Lrnr_base), from sl3, a list of
specifications, or a constant rate between 0 and 1, to be used for
estimation of the propensity score (the probability of receiving treatment,
conditional on covariates). If a list: each entry may be an
instantiated learner object, or a list in which one item is an
instantiated learner object whose modeling requires specification and the
other item is a list of character vectors, where each vector specifies an
interaction term. If a constant rate: this rate represents the population
probability of being assigned to treatment in an A/B test. Note that the
outcome of this estimation task is strictly binary, so algorithms or
ensemble models should be set up accordingly. |
or_learner |
Either an instantiated learner object (class inheriting
from Lrnr_base), from sl3, or a list of
specifications, to be used for estimation of the outcome regression (the
mean of the response variable, conditional on exposure and covariates). If
a list: each entry may be an instantiated learner object, or a
list in which one item is an instantiated learner object whose modeling
requires specification and the other item is a list of character vectors,
where each vector specifies an interaction term. |
cate_learner |
Either an instantiated learner object (class inheriting
from Lrnr_base), from sl3, or a list of
specifications, to be used to estimate the CATE, based on a regression of a
doubly robust pseudo-outcome on the specified segmentation covariates. If
a list: each entry may be an instantiated learner object, or a
list in which one item is an instantiated learner object whose modeling
requires specification and the other item is a list of character vectors,
where each vector specifies an interaction term. Note that the outcome of
this estimation task is derived from the other nuisance parameter estimates
and should be expected to always be continuous-valued, so algorithms or
ensemble models should be set up accordingly. |
use_cv_selector |
If TRUE, cross-validation is used to
choose the best among a list of learners when fitting ps_learner,
or_learner, or cate_learner. If FALSE (default), then
the default metalearner for the outcome type (from sl3) will be used.
This argument is ignored for a learner that is not a list
but is instead a single instantiated learner object. |
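
Examples

The following is a minimal sketch of how the arguments above might fit together, not an example taken from the package itself. The simulated data, its column names (id, tenure, device, treated, engaged), and the helper check_roles are all hypothetical and exist only for illustration. The helper encodes two constraints stated in the argument descriptions: exposure and outcome are single column names, and segment_by is a strict subset of baseline. The call to sherlock_calculate is guarded so that it only runs when both sherlock and sl3 are installed; it assumes a known A/B assignment rate of 0.5 for ps_learner and uses sl3's Lrnr_glm for the other two learners.

```r
# Simulated stand-in for a segmentation "case"; every column name is hypothetical.
set.seed(11249)
n <- 500
data_from_case <- data.frame(
  id = seq_len(n),
  tenure = rpois(n, lambda = 12),
  device = factor(sample(c("tv", "mobile", "web"), n, replace = TRUE)),
  treated = rbinom(n, size = 1, prob = 0.5)
)
data_from_case$engaged <- rbinom(
  n,
  size = 1,
  prob = plogis(-0.5 + 0.03 * data_from_case$tenure + 0.4 * data_from_case$treated)
)

# Variable roles, following the argument descriptions.
baseline <- c("tenure", "device")
exposure <- "treated"
outcome <- "engaged"
segment_by <- "device"

# Hypothetical helper (not part of sherlock): checks the documented constraints.
check_roles <- function(data, baseline, exposure, outcome, segment_by) {
  stopifnot(
    is.character(baseline),
    is.character(exposure), length(exposure) == 1L,
    is.character(outcome), length(outcome) == 1L,
    is.character(segment_by),
    all(c(baseline, exposure, outcome) %in% names(data)),
    # segment_by must be a strict subset of baseline
    all(segment_by %in% baseline),
    length(unique(segment_by)) < length(unique(baseline))
  )
  invisible(TRUE)
}
check_roles(data_from_case, baseline, exposure, outcome, segment_by)

# Only attempt the call when the required packages are available.
if (requireNamespace("sherlock", quietly = TRUE) &&
    requireNamespace("sl3", quietly = TRUE)) {
  case_results <- sherlock::sherlock_calculate(
    data_from_case = data_from_case,
    baseline = baseline,
    exposure = exposure,
    outcome = outcome,
    segment_by = segment_by,
    cv_folds = 5L,
    split_type = "inner",
    ps_learner = 0.5,                    # assumed constant A/B assignment rate
    or_learner = sl3::Lrnr_glm$new(),    # single learner object
    cate_learner = sl3::Lrnr_glm$new(),  # pseudo-outcome is continuous
    use_cv_selector = FALSE
  )
}
```

Running the role check before the main call is a design choice of this sketch: sherlock_calculate is described as the most demanding step, so catching a misnamed column or a segment_by entry missing from baseline up front avoids wasting a long fit.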