Figure 3: Plot of pEC3 versus the predicted rate-determining barrier for aldehydes (triangles), ketones and 1,3-diones (circles). Data are colored by logP (red = high, green = low). r2=0.36, n=16.

3.3 Quantitative molecular model

Other factors such as clogP have been included in a number of QSAR models related to the LLNA. Given that this parameter also appeared to help differentiate further between active and inactive compounds, as shown in Figure 3, an additional two-parameter linear regression model was constructed on the data. The goal here is to produce a conservative model using a small number of descriptors that have known relevance to skin sensitization. This is in part due to the REACH requirements for simple, robust and interpretable QSAR models for use within a regulatory framework, and in part due to the small number of data points available for such modelling exercises.
Starting with all 22 compounds, a multiple linear regression model was constructed with the two variables, resulting in equation 1. The model has an explained variance of 66% with a p-value of 0.008, considerably better than that observed with the energy barrier associated with TS2 alone. The model predicts that as the barrier to reaction decreases, pEC3 will increase. In addition, as logP decreases, pEC3 will likewise increase.
pEC3 = -0.377(±0.138)*clogP - 0.127(±0.0436)*ETS2 + 5.69(±1.30)         Equation 1
n=22, r2=0.66, r2adj=0.38, p=0.008
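As a sketch of how Equation 1 can be applied, the snippet below evaluates the fitted model for a given clogP and TS2 barrier. The coefficients are copied from Equation 1; the function name and the example inputs are hypothetical and do not correspond to any compound in the dataset, and the barrier is assumed to be expressed in the same energy units that were used for the fit.

```python
# Coefficients copied from Equation 1 (standard errors omitted).
CLOGP_COEF = -0.377
ETS2_COEF = -0.127
INTERCEPT = 5.69


def predict_pec3_eq1(clogp: float, e_ts2: float) -> float:
    """Predicted pEC3 from clogP and the TS2 barrier (same units as the fit)."""
    return CLOGP_COEF * clogp + ETS2_COEF * e_ts2 + INTERCEPT


# Hypothetical inputs, for illustration only (not values from the dataset).
base = predict_pec3_eq1(clogp=2.0, e_ts2=20.0)
lower_barrier = predict_pec3_eq1(clogp=2.0, e_ts2=15.0)
lower_clogp = predict_pec3_eq1(clogp=1.0, e_ts2=20.0)

# Both coefficients are negative, so lowering either descriptor raises pEC3.
print(lower_barrier > base, lower_clogp > base)  # True True
```

Because both coefficients are negative, a lower barrier or a lower clogP each gives a higher predicted pEC3, i.e. a more potent predicted sensitizer.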
Subsequently, the known SNAr compound 18 was excluded from the analysis, and the logP of 20 was corrected because of the large difference between the clogP value used here (1.82) and the values from two other commonly used methods (0.58 for clogP[67] and 0.82 for ACD[68]). The latter value (0.82), intermediate between the other two values, was used. SNAr compounds 2 and 4 were not excluded, as they are not predicted to react via an SNAr mechanism according to the QSAR methodology of Promkatkaew et al. [54] The updated multiple regression model was developed for the 21 compounds, as shown in equation 2. The model has slightly improved statistics, with r2=0.71 and a p-value of 0.002. Again, it is found that increasing either logP or barrier height leads to decreased sensitization.
pEC3 = -0.425(±0.124)*clogP - 0.152(±0.0412)*ETS2 + 6.17(±1.22)         Equation 2
n=21, r2=0.71, r2adj=0.46, p=0.002
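To give a sense of scale for the clogP correction applied to compound 20, the sketch below computes how much the predicted pEC3 changes under Equation 2 when clogP moves from 1.82 to 0.82 at a fixed barrier. This is illustrative arithmetic on the published coefficient, not a calculation reported in the text, and the function name is hypothetical.

```python
CLOGP_COEF_EQ2 = -0.425  # clogP coefficient copied from Equation 2


def pec3_shift(old_clogp: float, new_clogp: float) -> float:
    """Change in predicted pEC3 when only clogP changes (barrier held fixed)."""
    return CLOGP_COEF_EQ2 * (new_clogp - old_clogp)


# clogP values for compound 20 as quoted in the text: 1.82 replaced by 0.82.
shift = pec3_shift(old_clogp=1.82, new_clogp=0.82)
print(f"{shift:+.3f}")  # +0.425
```

Because the model is linear in both descriptors, this shift is the same for any barrier height.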
 
As a final exercise, the dataset of 22 compounds was partitioned into a training set (n=14) and a test set (n=8). As with the work of Roberts et al., we generated the QSAR model on a set consisting primarily of nonfunctional aldehydes and ketones to avoid confounding effects. A small number of additional exemplars that help cover the full range in pEC3 were also included (Table 2). Again, a two-parameter model was fitted using the training data, resulting in equation 3. All compounds were then predicted using equation 3. Compounds 2, 4 and 18 were also predicted using our previously reported model for the SNAr domain.[54] The training set explained variance is somewhat lower than that observed with the larger combined set (r2=0.49, r2adj=0.40); however, the descriptor coefficients are qualitatively similar. Predictions on the test set show that the compounds are quite well ranked (r2=0.49). A noticeable outlier in Figure 3 is compound 17, which on further analysis of its structure can also potentially react via the acyl reaction domain.[5] This could account for its low predicted activity from this Schiff-base-derived model. When compounds 20 (SNAr domain) and 17 (acyl domain) are excluded from the test set, an r2 of 0.62 is observed.
pEC3 = -0.388(±0.150)*clogP - 0.172(±0.061)*ETS2 + 6.671(±1.841)         Equation 3
Training set: n=14, r2=0.49, r2adj=0.40, p=0.02
Test set: n=8, r2=0.49; test set excluding compounds 17 and 20: n=6, r2=0.62
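The fit statistics quoted above can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow used in this study: it assumes that r2 is the coefficient of determination (1 - SS_res/SS_tot), which the text does not state explicitly, and it uses the standard adjusted-r2 formula for a model with an intercept. The function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, n_descriptors=2):
    """Adjusted r2 for n data points and a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_descriptors - 1)


# Training-set values quoted with Equation 3: n=14, r2=0.49, two descriptors.
print(round(adjusted_r_squared(0.49, 14), 2))  # 0.4
```

With n=14 and two descriptors, the penalty factor (n - 1)/(n - 3) is 13/11, so an r2 of 0.49 corresponds to an adjusted value of about 0.40, in line with the training-set statistics quoted for Equation 3.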