Linden and Yarnold¹ recently proposed classification
tree analysis (CTA), a machine-learning procedure, as an alternative to
conventional methods for analyzing mediation effects in
treatment-outcome research. They note that CTA may have a number of
advantages in this regard. It requires no assumptions about the
distribution of variables or the functional form of the best-fitting
model, for example, thus affording greater potential flexibility in
identifying complex forms of association among variables with varying
scales of measurement (e.g., binary and continuous). The authors further
argue that CTA, unlike conventional approaches to testing mediation,
“will not generate a model if a treatment-mediator-outcome relationship
does not exist” (p. 359) and, conversely, that CTA “will
systematically identify a treatment-by-mediator interaction if it
exists, as well as any other interaction between variables” (p. 359).
Using data from the Job Search Intervention Study (JOBS II), they find
that structural equation modeling, a conventional approach to testing
mediation, failed to indicate support for job-search self-efficacy as a
mediator of the effects of the intervention on whether the study
participant was reemployed at follow-up. In contrast, the authors
conclude that CTA applied to the same set of variables revealed that
job-search self-efficacy was a mediator of intervention effects on
employment among those in the treatment group.
There are, however, two problematic aspects of the authors’ approach.
First, as they point out, causal inference of a mediational pathway
depends on the assumption that the associations involved exist net of
potential confounding variables. In a randomized controlled trial such as
JOBS II, in which participants are randomized to treatment but not to
levels of the mediator, confounding of the association between the
mediator and outcome is of particular concern.² The authors seek to address
this concern by conducting a CTA that includes a set of 9 potential
confounders (e.g., income, age, initial level of depressive symptoms) as
candidate predictors. This approach does not ensure statistical control
for these variables, however, for at least two reasons. To begin, there is
no guarantee that the variables involved will actually be included in
the resulting classification tree; failing to meet the criterion for
inclusion does not rule out the possibility that a given potential
confounder, or combination of such confounders, nonetheless shares an
association with the mediator and outcome to an extent that renders
the mediator-outcome association non-significant. This concern proves pertinent to the
authors’ analysis as only 3 of the potential confounders earn entry into
the resulting classification tree for reemployment status. A second
concern with the approach taken by the authors is that whatever
covariates are included in the classification tree may be included in
positions that are inadequate for the purpose of controlling for
confounding of the mediator-outcome association. Gender, for example, in
their analysis is included in the control group branch of the tree,
whereas job-search self-efficacy, the potential mediator, is included
only in the treatment group arm. As a further example, education is
included on the treatment group arm, but only after job-search
self-efficacy’s role in the model has already been established
through its inclusion on a higher tier of the model. An alternative
approach for addressing confounding in the context of CTA would be to
residualize the candidate mediator on possible confounders and then fit
a classification tree with this adjusted variable. If the residualized
mediator continues to discriminate across levels of the outcome,
conditional on treatment status, it could be concluded that the
mediator-outcome portion of the potential mediational pathway of interest
is evident independent of the measured confounders. Applying this
approach, I find that job-search self-efficacy continues to emerge
as a discriminating variable for reemployment in the treatment arm
branch of the classification tree that I fit with the same JOBS II data.
However, it cannot be assumed that this more rigorous approach to taking
into account confounders will always yield the same result as one in
which they are merely included as candidate predictors within a CTA.
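The residualization step suggested above can be illustrated with a minimal sketch. It is not the CTA procedure used by Linden and Yarnold: the data are simulated, the variable names are hypothetical, and a single cut-point search that maximizes the mean of sensitivity and specificity stands in for a full classification tree. The sketch only shows the order of operations: regress the mediator on the measured confounders, keep the residuals, and then ask whether the residuals still discriminate the outcome within the treatment arm.

```python
import random

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def residualize(mediator, confounders):
    """Residuals from an OLS regression of the mediator on the confounders."""
    rows = [[1.0] + list(c) for c in confounders]  # prepend an intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, mediator)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, mediator)]

def best_cut(x, y):
    """Cut-point on x with the highest mean of sensitivity and specificity
    for a binary y (a simplified stand-in for one node of a tree)."""
    values = sorted(set(x))
    n1 = sum(y)
    n0 = len(y) - n1
    best_score, best_point = 0.5, None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        tp = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
        tn = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
        score = (tp / n1 + tn / n0) / 2
        score = max(score, 1 - score)  # allow either direction of the split
        if score > best_score:
            best_score, best_point = score, cut
    return best_score, best_point

# Simulated data (hypothetical; not the JOBS II data).
random.seed(0)
n = 400
treated = [i % 2 for i in range(n)]
confounders = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
mediator = [4.0 + 0.3 * c[0] + 0.2 * c[1] + 0.2 * t + random.gauss(0, 0.5)
            for c, t in zip(confounders, treated)]
outcome = [1 if m + random.gauss(0, 0.5) > 4.0 else 0 for m in mediator]

# Step 1: remove the part of the mediator explained by the confounders.
residual = residualize(mediator, confounders)

# Step 2: search for a discriminating cut-point within the treatment arm only.
arm = [(r, y) for r, y, t in zip(residual, outcome, treated) if t == 1]
score, cut = best_cut([r for r, _ in arm], [y for _, y in arm])
print(f"balanced accuracy {score:.3f} at residual cut-point {cut:.3f}")
```

In practice the residualized variable would be passed to the same CTA software used for the original analysis; the cut-point search here only makes the logic concrete.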
A more fundamental concern with the CTA approach to identifying
mediation employed by the authors is that the mediator is not assured
of having a consistent definition across the predictor-mediator and
mediator-outcome segments of the mediational pathway. This is a
result of the optimal cut-point on the mediator being determined
separately for the two segments, through an optimal discriminant
analysis for the treatment-mediator segment and CTA for the
mediator-outcome segment. Thus, in the case of the mediational pathway
for reemployment, treatment status predicts a dichotomous measure of
job-search self-efficacy determined by a cut-score of 3.92 (i.e., high
job-search self-efficacy corresponds to values above 3.92 and low to
values equal to and below 3.92), whereas the dichotomous measure of
job-search self-efficacy predicting reemployment is determined by a
different cut-score of 4.92. Yet, it is essential by definition that the
mediator in a mediational pathway that is influenced by the initial
variable in the pathway (in this case, treatment) be the same variable
that then influences the outcome (in this case, reemployment). Here,
nearly one half of the sample (47.3%; n = 426) has a
different high-low classification on job-search self-efficacy
depending on the cut-score used, suggesting considerable divergence
between the two versions of this variable. One approach to addressing this
concern would be to define a new mediator that reflects the overlapping
portion of the differing definitions. In the present example, this could
be a dichotomous measure of job-search self-efficacy defined by a
cut-score of greater than 3.92, which would ensure that all those with scores
classified as relatively high on the measure continue to be classified
as such; alternatively, a cut-score of 4.92 could be used if priority is
given to ensuring that all those classified as low remain in this
category. Still another option would be to use the mid-point between the
two cut-scores, 4.42. One could then evaluate the mediational pathway
of interest using one or more of these options. Applying this approach
using PROC CAUSALMED in SAS,³ allowing for the
suggested treatment-mediator interaction and including all covariates, I
find evidence of an indirect (mediated) pathway when using the lower-bound
cut-off score of 3.92 for job-search self-efficacy (odds ratio for the
natural indirect effect = .959; 95% CI: .893, .998), but not
when using the other two cut-off scores, although the differences in
estimates are admittedly small in this case (complete results are
available upon request).
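The three candidate definitions of the mediator, and the disagreement between the two original cut-scores, can be made concrete with a short sketch. The cut-scores 3.92 and 4.92 come from the text above; the scores themselves are simulated on an assumed 1 to 5 scale and are hypothetical, so the printed disagreement rate will not reproduce the 47.3% reported for JOBS II. A participant is classified differently by the two cut-scores exactly when his or her score falls above 3.92 and at or below 4.92.

```python
import random

LOWER_CUT = 3.92  # cut-score for the treatment-mediator segment (from the text)
UPPER_CUT = 4.92  # cut-score for the mediator-outcome segment (from the text)
MID_CUT = round((LOWER_CUT + UPPER_CUT) / 2, 2)  # mid-point option, 4.42

def dichotomize(scores, cut):
    """High (1) if the score is above the cut-score, low (0) otherwise."""
    return [1 if s > cut else 0 for s in scores]

def disagreement_rate(scores, cut_a, cut_b):
    """Share of cases whose high-low class differs between two cut-scores."""
    a = dichotomize(scores, cut_a)
    b = dichotomize(scores, cut_b)
    return sum(1 for u, v in zip(a, b) if u != v) / len(scores)

# Simulated self-efficacy scores (hypothetical; assumed 1-5 response scale).
random.seed(1)
scores = [min(5.0, max(1.0, random.gauss(4.0, 0.8))) for _ in range(900)]

rate = disagreement_rate(scores, LOWER_CUT, UPPER_CUT)
in_band = sum(1 for s in scores if LOWER_CUT < s <= UPPER_CUT) / len(scores)

# The three candidate mediators discussed in the text.
high_lower = dichotomize(scores, LOWER_CUT)  # keeps everyone ever classed as high
high_mid = dichotomize(scores, MID_CUT)      # compromise definition
high_upper = dichotomize(scores, UPPER_CUT)  # keeps everyone ever classed as low

print(f"disagreement between 3.92 and 4.92: {rate:.1%}")
print("classified high:", sum(high_lower), sum(high_mid), sum(high_upper))
```

Each of the three dichotomized variables could then be entered in turn as the mediator in a conventional causal mediation analysis, which is how the sensitivity of the conclusion to the cut-score is examined above.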
To summarize, Linden and Yarnold (2018) make a significant contribution
by introducing CTA as a promising strategy for identifying mediated
effects in intervention research. Further refinements to their approach,
however, are recommended to more fully incorporate fundamental
assumptions that accompany all tests for mediation as well as to
evaluate and confirm potential mediational pathways using conventional
procedures.
References
1. Linden A, Yarnold PR. Identifying causal mechanisms
in health care interventions using classification tree analysis. J
Eval Clin Pract. 2018;24:353–361.
https://doi.org/10.1111/jep.12848
2. Imai K, Keele L, Tingley D. A general approach to
causal mediation analysis. Psychol Methods. 2010;15:309–334.
3. Yung Y, Lamm M, Zhang W. Causal Mediation
Analysis with the CAUSALMED Procedure. Paper SAS1991-2018. Cary, NC:
SAS Institute Inc.; 2018.