Sherlock Consults by Inspecting the Data and Evaluating Conditional Effects

Sherlock Holmes was a consulting detective with spectacular powers of deduction and logical reasoning. Within sherlock's causal segmentation framework, `sherlock_calculate` takes the data from a segmentation "case", the roles of the different variables, and specifications of the learners used to estimate the conditional average treatment effects (CATEs) from which a segmentation is derived. As the workhorse of the framework, it is the most computationally demanding function: it computes all of the nuisance parameter estimates (the propensity score and the outcome regression), as well as the CATE estimates, required for subsequent analyses. The complementary functions `watson_segment` and `mycroft_assess` can be used once Sherlock has consulted on the causal segmentation case.

sherlock_calculate(data_from_case, baseline, exposure, outcome, segment_by,
  ids = NULL, treatment_cost = NULL, cv_folds = 5L,
  split_type = c("inner", "outer"), ps_learner, or_learner, cate_learner,
  use_cv_selector = FALSE)

Arguments

data_from_case	Rectangular input data, whether a `data.frame`, `data.table`, or `tibble`.
baseline	A `character` vector specifying the column names in `data_from_case` that correspond to the baseline covariates (the conditioning set). These variables should temporally precede the exposure and outcome.
exposure	A `character` string (of length one) specifying the column in `data_from_case` corresponding to the exposure or treatment. This variable should follow those in `baseline` in time but precede the response variable `outcome`.
outcome	A `character` string (of length one) specifying the column in `data_from_case` corresponding to the response variable.
segment_by	A `character` vector specifying the column names in `data_from_case` that correspond to the covariates over which segmentation should be performed. This must be a strict subset of `baseline`.
ids	A `character` string (of length one) specifying the column in `data_from_case` that gives observation-level IDs. The default value of `NULL` assumes that all rows of `data_from_case` are independent.
treatment_cost	A `character` string (of length one) specifying the column in `data_from_case` that provides the cost associated with treating the given unit. The default value of `NULL` assumes that all units are equally costly to treat.
cv_folds	A `numeric` specifying the number of cross-validation folds to be used for sample-splitting when estimating nuisance parameters.
split_type	A `character` string (of length one) indicating the sample-splitting "level" at which the CATE is estimated. The choices are "inner", for estimating the CATE within folds (i.e., at the same level at which the nuisance parameters are estimated), and "outer", for estimating the CATE at the full-sample level.
ps_learner	Either an instantiated learner object (class inheriting from `Lrnr_base`) from sl3, a `list` of specifications, or a constant rate between 0 and 1, used to estimate the propensity score (the probability of receiving treatment, conditional on covariates). If a `list`, each entry may be either an instantiated learner object or a list containing an instantiated learner object whose model requires further specification together with a list of character vectors, each of which specifies an interaction term. If a constant rate is supplied, it represents the population probability of being assigned to treatment in an A/B test. Note that the outcome of this estimation task is strictly binary, so algorithms or ensemble models should be set up accordingly.
or_learner	Either an instantiated learner object (class inheriting from `Lrnr_base`) from sl3 or a `list` of specifications, used to estimate the outcome regression (the mean of the response variable, conditional on exposure and covariates). If a `list`, each entry may be either an instantiated learner object or a list containing an instantiated learner object whose model requires further specification together with a list of character vectors, each of which specifies an interaction term.
cate_learner	Either an instantiated learner object (class inheriting from `Lrnr_base`) from sl3 or a `list` of specifications, used to estimate the CATE by regressing a doubly robust pseudo-outcome on the specified segmentation covariates. If a `list`, each entry may be either an instantiated learner object or a list containing an instantiated learner object whose model requires further specification together with a list of character vectors, each of which specifies an interaction term. Note that the outcome of this estimation task is derived from the other nuisance parameter estimates and is always continuous-valued, so algorithms or ensemble models should be set up accordingly.
use_cv_selector	If `TRUE`, cross-validation is used to choose the best among a list of learners when fitting `ps_learner`, `or_learner`, or `cate_learner`. If `FALSE` (default), the default metalearner for the outcome type (from sl3) is used instead. This argument is ignored for a learner that is not a list but is instead an instantiated learner object.
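The column-role constraints in the arguments (`exposure` and `outcome` naming exactly one column each, `segment_by` being a strict subset of `baseline`) can be checked before calling `sherlock_calculate`. The sketch below uses a hypothetical helper, `check_roles()`, which is not part of sherlock; it only encodes the requirements stated above, in base R, and the toy columns `W1`, `W2`, `A`, `Y` are illustrative.

```r
# Hypothetical helper (NOT part of sherlock): checks the column-role
# constraints described in the argument list before fitting anything.
check_roles <- function(data, baseline, exposure, outcome, segment_by) {
  needed <- unique(c(baseline, exposure, outcome, segment_by))
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0L) {
    stop("columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  # exposure and outcome must each name exactly one column
  if (length(exposure) != 1L || length(outcome) != 1L) {
    stop("`exposure` and `outcome` must be character strings of length one")
  }
  # strict subset: every segment_by column is in baseline, and baseline
  # contains at least one column that is not in segment_by
  if (!all(segment_by %in% baseline) ||
      length(setdiff(baseline, segment_by)) == 0L) {
    stop("`segment_by` must be a strict subset of `baseline`")
  }
  invisible(TRUE)
}

set.seed(1)
df <- data.frame(W1 = rnorm(10), W2 = rnorm(10),
                 A = rbinom(10, 1, 0.5), Y = rnorm(10))
check_roles(df, baseline = c("W1", "W2"), exposure = "A",
            outcome = "Y", segment_by = "W1")   # passes silently
# segment_by = c("W1", "W2") would fail: it equals baseline
```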
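The `ids` and `cv_folds` arguments together govern sample-splitting. Since the `NULL` default for `ids` treats every row as independent, a natural reading is that rows sharing an ID are dependent and should never be split between training and validation sets. The sketch below assumes exactly that: it assigns folds to IDs rather than to rows. The function `make_cluster_folds()` is hypothetical and is not sherlock's internal fold-construction code.

```r
# Hypothetical sketch (NOT sherlock's internals): V-fold assignment in which
# all rows sharing an ID land in the same fold.
make_cluster_folds <- function(n, cv_folds = 5L, ids = NULL, seed = 1L) {
  set.seed(seed)
  if (is.null(ids)) ids <- seq_len(n)   # NULL: each row is its own cluster
  unique_ids <- unique(ids)
  if (length(unique_ids) < cv_folds) stop("fewer clusters than folds")
  # assign each cluster (not each row) to a fold, keeping fold sizes balanced
  cluster_fold <- sample(rep_len(seq_len(cv_folds), length(unique_ids)))
  # map every row to the fold of its cluster
  cluster_fold[match(ids, unique_ids)]
}

ids <- rep(1:20, each = 3)   # 20 units observed 3 times each
fold <- make_cluster_folds(length(ids), cv_folds = 5L, ids = ids)
table(fold)                  # each fold holds 4 units, i.e. 12 rows
```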
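To make the two `split_type` options concrete, the sketch below assumes a cross-fitted pseudo-outcome `phi` is already available (it is simulated directly here) and uses `lm()` as a stand-in CATE learner. How "inner" is read is an assumption: within each fold, the CATE regression is fit on the training rows and predicts the held-out rows, mirroring how nuisance parameters are estimated. sherlock's actual implementation may differ.

```r
# Sketch of "outer" vs "inner" CATE estimation (assumed reading of split_type).
set.seed(42)
n <- 500
V <- runif(n)                            # segmentation covariate (segment_by)
phi <- 1 + 2 * V + rnorm(n)              # stand-in pseudo-outcome; true CATE = 1 + 2V
cv_folds <- 5L
fold <- sample(rep_len(seq_len(cv_folds), n))   # folds shared with nuisance fits
dat <- data.frame(phi = phi, V = V)

# "outer": a single CATE regression on the full sample
cate_outer <- unname(predict(lm(phi ~ V, data = dat), newdata = dat))

# "inner": one CATE regression per fold, fit on training rows and used to
# predict the held-out rows of that fold
cate_inner <- numeric(n)
for (v in seq_len(cv_folds)) {
  valid <- fold == v
  fit_v <- lm(phi ~ V, data = dat[!valid, ])
  cate_inner[valid] <- predict(fit_v, newdata = dat[valid, ])
}
```

The "outer" fit uses all rows for a single CATE model, while the "inner" variant never predicts a row with a model that saw that row.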
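The `ps_learner` argument accepts either a learner or a constant rate. The base R sketch below contrasts the two situations: in an A/B test the assignment probability is known by design, so a constant vector suffices, whereas in observational data it must be estimated with a learner suited to a binary outcome. Logistic regression via `glm()` is used purely as an illustrative stand-in for an sl3 learner; the rate 0.3 and the data-generating model are assumptions.

```r
# Constant vs estimated propensity scores (illustrative sketch).
set.seed(7)
n <- 1000
W <- rnorm(n)

# A/B test: assignment probability is fixed by design, so a constant rate
# (here 0.3) plays the role of the propensity score for every unit
ps_const <- rep(0.3, n)

# Observational data: assignment depends on W, so the propensity score must
# be estimated; the target A is binary, hence a binomial model
A <- rbinom(n, 1, plogis(-0.5 + W))
ps_fit <- glm(A ~ W, family = binomial())
ps_hat <- unname(predict(ps_fit, type = "response"))  # P(A = 1 | W)
summary(ps_hat)
```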
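The `cate_learner` description refers to a regression of a doubly robust pseudo-outcome on the segmentation covariates. A common form is the augmented inverse probability weighted (AIPW) pseudo-outcome, combining the propensity score $g(W)$ and the outcome regression $Q(A, W)$: $\phi = \frac{A - g(W)}{g(W)\{1 - g(W)\}}\{Y - Q(A, W)\} + Q(1, W) - Q(0, W)$. The sketch below assumes this form (sherlock's exact construction may differ), fits the nuisances in-sample for brevity rather than with `cv_folds`, and uses `lm()`/`glm()` as stand-in learners on simulated data.

```r
# DR-learner sketch: AIPW pseudo-outcome regressed on segment_by only.
set.seed(2024)
n <- 2000
W1 <- rnorm(n); W2 <- rnorm(n)                 # baseline covariates
A <- rbinom(n, 1, plogis(0.3 * W1 - 0.3 * W2)) # exposure
Y <- W1 + W2 + A * (1 + W1) + rnorm(n)         # true CATE given W1: 1 + W1
dat <- data.frame(W1, W2, A, Y)

# nuisance estimates (in-sample here; sherlock uses sample-splitting)
g_hat <- predict(glm(A ~ W1 + W2, family = binomial(), data = dat),
                 type = "response")            # propensity score
or_fit <- lm(Y ~ A * (W1 + W2), data = dat)    # outcome regression
Q1 <- predict(or_fit, newdata = transform(dat, A = 1))
Q0 <- predict(or_fit, newdata = transform(dat, A = 0))
QA <- ifelse(A == 1, Q1, Q0)

# doubly robust (AIPW) pseudo-outcome
phi <- (A - g_hat) / (g_hat * (1 - g_hat)) * (Y - QA) + Q1 - Q0

# CATE: regress the pseudo-outcome on the segmentation covariate W1 only
cate_fit <- lm(phi ~ W1, data = data.frame(phi = phi, W1 = W1))
coef(cate_fit)   # roughly (1, 1) in this simulation
```

Because the pseudo-outcome is a weighted residual plus a difference of predictions, it is continuous-valued even when `Y` is binary, which is why the CATE learner should be a regression method.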
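With `use_cv_selector = TRUE`, a single best learner is chosen by cross-validation rather than combined by a metalearner. The idea can be sketched in base R as computing each candidate's cross-validated mean squared error and keeping the minimizer. The three candidate learners and the simulated data are illustrative assumptions; this is not sl3's implementation.

```r
# Cross-validation selector sketch: pick the candidate with lowest CV risk.
set.seed(3)
n <- 600
x <- runif(n, -1.5, 1.5)
y <- sin(2 * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# hypothetical candidates: each takes (train, test) and returns test predictions
learners <- list(
  mean   = function(train, test) rep(mean(train$y), nrow(test)),
  linear = function(train, test) predict(lm(y ~ x, data = train), newdata = test),
  cubic  = function(train, test) predict(lm(y ~ poly(x, 3), data = train),
                                         newdata = test)
)

fold <- sample(rep_len(1:5, n))
cv_risk <- sapply(learners, function(fit_predict) {
  sq_err <- numeric(n)
  for (v in 1:5) {
    test_idx <- fold == v
    pred <- fit_predict(dat[!test_idx, ], dat[test_idx, ])
    sq_err[test_idx] <- (dat$y[test_idx] - pred)^2
  }
  mean(sq_err)                # cross-validated mean squared error
})
best <- names(which.min(cv_risk))
```

A metalearner would instead weight and combine all candidates; the selector trades that flexibility for a single, more interpretable fitted learner.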