Figure 3: Plot of pEC3 versus the predicted
rate-determining barrier for aldehydes (triangles), ketones and
1,3-diones (circles). Data points are colored by logP (red=high, green=low).
R2=0.36, n=16.
3.3 Quantitative molecular model
Other factors such as clogP have been included in a number of QSAR
models related to the LLNA. Because this parameter also appeared to help
differentiate further between active and inactive compounds, as
shown in Figure 3, an additional two-parameter linear regression model was
constructed on the data. The goal here is to produce a conservative
model using a small number of descriptors that have known relevance to
skin sensitization. This is partly because of the REACH requirements for
simple, robust and interpretable QSAR models for use within a regulatory
framework, and partly because of the small number of datapoints available for
such modelling exercises.
Starting with all 22 compounds, a multiple linear regression model was
constructed with the two variables, resulting in equation 1. The model
results in an explained variance of 66% with a p value of 0.008,
considerably better than that observed with the energy barrier
associated with TS2 alone. The model predicts that as the barrier to
reaction decreases, pEC3 will increase. In addition, as the logP
decreases, pEC3 is also predicted to increase.
pEC3=-0.377(±0.138)*clogP – 0.127(±0.0436)*ETS2 + 5.69 (±1.30) Equation 1
n=22, r2=0.66, r2adj=0.38, p=0.008
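The direction of the two effects in Equation 1 can be made concrete with a minimal Python sketch. Only the central coefficient estimates are taken from Equation 1; the function name `predict_pec3_eq1` and the descriptor values passed to it are hypothetical, chosen purely to illustrate the trends, and ETS2 is assumed to be given in the same units as the barriers used to fit the model.

```python
def predict_pec3_eq1(clogp: float, e_ts2: float) -> float:
    """Evaluate Equation 1 using its central coefficient estimates only.

    The reported standard errors are ignored, so this returns a point
    estimate with no uncertainty attached.
    """
    return -0.377 * clogp - 0.127 * e_ts2 + 5.69


# Hypothetical descriptor values (not taken from the dataset).
base = predict_pec3_eq1(clogp=2.0, e_ts2=20.0)
lower_barrier = predict_pec3_eq1(clogp=2.0, e_ts2=15.0)  # barrier reduced
lower_logp = predict_pec3_eq1(clogp=1.0, e_ts2=20.0)     # logP reduced

print(f"baseline:      {base:.3f}")
print(f"lower barrier: {lower_barrier:.3f}")
print(f"lower logP:    {lower_logp:.3f}")
```

Because both coefficients are negative, lowering either the barrier or the logP while holding the other descriptor fixed raises the predicted pEC3.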
Subsequently, the known SNAr compound 18 was excluded from the
analysis, and the logP of 20 was corrected because of the large
difference between the clogP value used here (1.82) and the values from
two other commonly used methods (0.58 for clogP[67] and 0.82 for
ACD[68]). The latter value, intermediate between the other two, was
used. SNAr compounds 2 and 4 were not excluded, as they are not
predicted to function via an SNAr mechanism according to the QSAR
methodology of Promkatkaew et al. [54] The updated
multiple regression model was developed for the 21 compounds, as shown in
equation 2. The model has slightly improved statistics, with
r2=0.71 and a p-value of 0.002. Again, it is found
that both increasing logP and increasing barrier height lead to decreased
sensitization.
pEC3=-0.425(±0.124)*clogP – 0.152(±0.0412)*ETS2 + 6.17 (±1.22) Equation 2
n=21, r2=0.71, r2adj =0.46, p=0.002
As a final exercise, the dataset of 22 compounds was partitioned into a
training set (N=14) and a test set (N=8). As in the work of Roberts et
al., we generated the QSAR model on a set consisting primarily of
nonfunctional aldehydes and ketones to avoid confounding effects. A
small number of additional exemplars that help cover the full range in
pEC3 were also included (Table 2). All compounds were
predicted using equation 3. Compounds 2, 4 and 18 were also predicted
using our previously reported model for the SNAr domain.[54] Again, a
two-parameter model was fitted using the training data, resulting in
equation 3. The training set explained variance is somewhat lower than
that observed with the larger combined set (r2adj=0.40); however, the
descriptor coefficients are qualitatively similar. Predictions on the
test set show that the compounds are quite well ranked
(r2=0.49). A noticeable outlier in Figure 3 is
compound 17, which on further analysis of the structure
can also potentially function via the acyl reaction
domain.[5] This could account for its
low predicted activity from this Schiff-base-derived model. When
compounds 20 (SNAr domain) and
17 (acyl domain) are excluded from the test set,
an r2 of 0.62 is observed.
pEC3 = -0.388(±0.150)*clogP – 0.172(±0.061)*ETS2 + 6.671 (±1.841) Equation 3
Training set (n=14, r2=0.49, r2adj=0.40, p=0.02)
Test set (n=8, r2=0.49); test set excluding 17 and 20 (n=6, r2=0.62)
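The training/test workflow described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the procedure actually used: the descriptor values are synthetic placeholders generated at random (the real values are those in Table 2), the helper names are hypothetical, and the test-set r2 is assumed here to be the squared Pearson correlation between observed and predicted pEC3, since the text does not state how it was computed.

```python
import numpy as np


def fit_two_parameter_model(clogp, e_ts2, pec3):
    """Ordinary least squares fit of pEC3 = a*clogP + b*ETS2 + c."""
    clogp = np.asarray(clogp, dtype=float)
    e_ts2 = np.asarray(e_ts2, dtype=float)
    pec3 = np.asarray(pec3, dtype=float)
    # Design matrix with a trailing column of ones for the intercept.
    design = np.column_stack([clogp, e_ts2, np.ones_like(clogp)])
    coef, *_ = np.linalg.lstsq(design, pec3, rcond=None)
    return coef  # array([a, b, c])


def predict(coef, clogp, e_ts2):
    """Apply fitted coefficients to new descriptor values."""
    a, b, c = coef
    return (a * np.asarray(clogp, dtype=float)
            + b * np.asarray(e_ts2, dtype=float) + c)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation (assumed definition of test-set r2)."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)


# Synthetic placeholder data for 22 hypothetical compounds (NOT the real set).
rng = np.random.default_rng(0)
clogp = rng.uniform(0.0, 4.0, size=22)
e_ts2 = rng.uniform(10.0, 30.0, size=22)
pec3 = -0.4 * clogp - 0.15 * e_ts2 + 6.0 + rng.normal(0.0, 0.3, size=22)

# 14 training compounds and 8 test compounds, mirroring the split sizes.
train, test = slice(0, 14), slice(14, 22)
coef = fit_two_parameter_model(clogp[train], e_ts2[train], pec3[train])
test_r2 = squared_correlation(pec3[test],
                              predict(coef, clogp[test], e_ts2[test]))
print("coefficients (clogP, ETS2, intercept):", coef)
print("test-set squared correlation:", test_r2)
```

With only 14 training points and three fitted parameters, the coefficient estimates are sensitive to which compounds fall in the training set, which is one reason the standard errors in Equation 3 are wider than those in Equations 1 and 2.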