2.5 Statistical analysis
Descriptive analysis: The median, interquartile range (IQR), percentiles, and mean ± standard deviation were used to characterize the air pollutants. Spearman’s correlation analysis was used to assess the strength of the correlations among the environmental factors. The independent-samples t-test or Mann-Whitney U test (rank-sum test), and the chi-square test or Fisher’s exact test, were used to compare characteristics between the eczema and non-eczema groups.
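As an illustration only, the descriptive summaries and Spearman’s rank correlation can be sketched in NumPy. The paper performed these steps in R. The function names (`describe`, `average_ranks`, `spearman_rho`) and the example concentrations are hypothetical. `np.percentile` uses linear interpolation by default, which matches R’s default quantile definition (type 7).

```python
import numpy as np

def describe(x):
    """Mean, SD, median and IQR of one pollutant series."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return {"mean": x.mean(), "sd": x.std(ddof=1), "median": med, "iqr": q3 - q1}

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # sorted positions i..j share one value: assign the mean of ranks i+1..j+1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical weekly concentrations of two pollutants
pm25 = [35.0, 42.0, 28.0, 51.0, 47.0]
no2 = [30.0, 38.0, 25.0, 44.0, 40.0]
summary = describe(pm25)
rho = spearman_rho(pm25, no2)
```

Because Spearman’s rho depends only on ranks, it captures any monotone relationship between two pollutants, not only linear ones.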
Generalized linear models (GLM): The GLM is a direct generalization of the ordinary linear model in which a link function relates the expected value of the outcome to a linear predictor; with a logit link, it yields logistic regression. Binary logistic models were used to estimate the effects of pollutant exposure during pregnancy on eczema at age 1, eczema at age 2, and cumulative eczema. A multinomial logistic model was used to estimate the effects of pollutant exposure during pregnancy and the first two postnatal years on four eczema patterns: no eczema, eczema only at age 1, eczema only at age 2, and persistent eczema, with the no-eczema group as the reference (OR = 1). This model was fitted with the ‘nnet’ package in R.
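To show how a logit-link GLM produces odds ratios, here is a minimal NumPy sketch of binary logistic regression fitted by Newton-Raphson (iteratively reweighted least squares), with ORs and Wald 95% confidence intervals. It is not the ‘nnet’ routine used in the paper. The simulated exposure, covariate and coefficients are hypothetical. A multinomial model extends this by fitting one coefficient vector per non-reference category against the no-eczema group.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Binary logistic regression by Newton-Raphson (IRLS).

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes (e.g. eczema at age 1: yes/no).
    Returns coefficients on the log-odds scale and Wald standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # inverse of the logit link
        w = p * (1.0 - p)                      # binomial variance function
        grad = X.T @ (y - p)                   # score vector
        info = X.T @ (X * w[:, None])          # Fisher information
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

def odds_ratios(beta, se, z=1.959964):
    """OR = exp(beta) with a Wald 95% confidence interval."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)

# Hypothetical simulated cohort: one standardized pollutant plus one binary confounder
rng = np.random.default_rng(0)
n = 5000
exposure = rng.normal(size=n)
covariate = rng.binomial(1, 0.5, size=n)
X = np.column_stack([np.ones(n), exposure, covariate])
true_beta = np.array([-1.0, 0.4, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta, se = fit_logistic(X, y)
or_, lo, hi = odds_ratios(beta, se)   # or_[1] is the OR per 1-unit exposure increase
```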
The distributed lag model (DLM): A sensitive period is a time window during which exposure has a stronger effect on development and disease risk than at other times. In this study, the DLM was used to identify sensitive windows for the effect of air pollutants on the onset of eczema. These windows were identified by dividing the exposure period into weeks and adding a cross-basis function of the weekly exposures to the generalized linear model, which jointly models the lag effect of exposure and the exposure-response relationship. The analysis was implemented with the ‘dlnm’ package in R.
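The cross-basis idea can be sketched in NumPy under stated assumptions. The sketch assumes a linear exposure-response and a cubic polynomial lag-response over 40 gestational weeks. The paper does not report which basis functions were passed to ‘dlnm’, so these choices, the function names and the coefficient values are hypothetical. The k columns of the cross-basis W are added to the logistic GLM in place of the 40 raw weekly exposures. The fitted coefficients θ and their covariance are then mapped back to week-specific log-odds effects β = Bθ with pointwise confidence intervals.

```python
import numpy as np

def lag_basis(n_lags, degree=3):
    """Polynomial lag basis over weeks 0..n_lags-1, shape (n_lags, degree + 1)."""
    lags = np.arange(n_lags) / (n_lags - 1)        # rescale to [0, 1] for stability
    return np.vander(lags, degree + 1, increasing=True)

def cross_basis(weekly_exposure, basis):
    """Linear exposure-response x polynomial lag-response cross-basis.

    weekly_exposure: (n, L) matrix, column l = exposure in week l.
    basis: (L, k) lag basis. Returns W = X @ B, shape (n, k).
    """
    return np.asarray(weekly_exposure, dtype=float) @ basis

def lag_effects(theta, cov_theta, basis, z=1.959964):
    """Map cross-basis coefficients back to week-specific log-odds effects."""
    beta = basis @ theta
    # pointwise SE: sqrt of the diagonal of B @ cov_theta @ B.T
    se = np.sqrt(np.einsum("ij,jk,ik->i", basis, cov_theta, basis))
    return beta, beta - z * se, beta + z * se

# Hypothetical data: 100 children x 40 weekly exposures during pregnancy
rng = np.random.default_rng(1)
weekly = rng.normal(size=(100, 40))
B = lag_basis(40, degree=3)
W = cross_basis(weekly, B)                  # covariates to enter the GLM

# Hypothetical GLM output for the 4 cross-basis columns
theta = np.array([0.1, -0.2, 0.3, -0.1])
beta_week, lo_week, hi_week = lag_effects(theta, 0.01 * np.eye(4), B)
```

Weeks whose interval lies entirely above zero would be read as candidate sensitive windows. Constraining the 40 weekly coefficients to a smooth low-dimensional curve avoids the instability of entering highly correlated weekly exposures separately.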
Weighted quantile sum (WQS) model: The WQS model was used to evaluate the joint effect of exposure to air pollutants on the onset of eczema in children over different time periods (Garcia-Serna et al., 2022). All pollutants were included in the model, and their association with childhood eczema was estimated in the positive direction. The resulting WQS index, a weighted linear index, is interpreted as the overall mixture effect, and the weight of each pollutant, ranging from 0 to 1, indicates how much that pollutant contributes to the index. When constructing the model, the parameter “q” was set to 4, so that pollutant concentrations were scored into quartiles and the estimated effect corresponds to a one-quartile increase in the WQS index. The number of bootstrap samples (b) used to estimate the model parameters was set to 1000. The analysis was implemented with the ‘gWQS’ package in R.
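A minimal NumPy sketch of how the WQS index is assembled: each pollutant is scored into q quantile categories (0 to q − 1), and the index is the weighted sum of those scores. In ‘gWQS’ the weights are estimated from b bootstrap samples under a sign constraint. That estimation step is not reproduced here, and the weights below are hypothetical values supplied by hand. The exponentiated coefficient of the index in a logistic model would be the OR per one-quantile increase in the mixture.

```python
import numpy as np

def quantile_scores(x, q=4):
    """Score one pollutant into quantile categories 0..q-1 (q=4: quartiles)."""
    x = np.asarray(x, dtype=float)
    inner_cuts = np.quantile(x, np.arange(1, q) / q)
    # right-closed bins: a value equal to a cut point falls in the lower bin
    return np.searchsorted(inner_cuts, x, side="left")

def wqs_index(exposures, weights, q=4):
    """WQS_i = sum_j w_j * score_ij.

    exposures: (n, p) pollutant concentrations.
    weights: (p,) non-negative weights summing to 1 (each pollutant's share).
    """
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("WQS weights must be non-negative and sum to 1")
    X = np.asarray(exposures, dtype=float)
    scores = np.column_stack([quantile_scores(col, q) for col in X.T])
    return scores @ w

# Hypothetical concentrations of 5 pollutants for 200 children
rng = np.random.default_rng(4)
pollutants = rng.lognormal(size=(200, 5))
weights = [0.4, 0.3, 0.1, 0.1, 0.1]          # hypothetical, not estimated
index_q4 = wqs_index(pollutants, weights, q=4)
index_q10 = wqs_index(pollutants, weights, q=10)   # decile scoring
```

Because the weights sum to 1, the index stays on the same 0 to q − 1 scale as a single pollutant’s quantile score, which is what makes “per one-quantile increase” interpretable for the whole mixture.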
Principal component analysis (PCA): PCA is a multivariate statistical method that transforms multiple correlated air pollutants into a set of uncorrelated variables, called principal components (PCs), which capture the most important features of the raw data and are arranged in descending order of explained variance. The analysis was implemented with the ‘factoextra’ package in R.
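For illustration, PCA can be sketched in NumPy as an eigendecomposition of the correlation matrix of the standardized pollutants. Standardizing first is an assumption, since the paper does not say whether the pollutants were scaled before the ‘factoextra’ analysis. The simulated data are hypothetical. The sketch shows both properties stated above: components sorted by descending variance and mutually uncorrelated scores.

```python
import numpy as np

def pca(data):
    """PCA of standardized variables via the correlation matrix.

    data: (n, p) matrix, one column per pollutant.
    Returns explained-variance ratios (descending), loadings (p, p), scores (n, p).
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each pollutant
    corr = np.cov(Z, rowvar=False)                       # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                    # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                                 # principal component scores
    return eigvals / eigvals.sum(), eigvecs, scores

# Hypothetical: 4 pollutants sharing one common source plus independent noise
rng = np.random.default_rng(2)
common = rng.normal(size=(500, 1))
pollutants = common + 0.5 * rng.normal(size=(500, 4))
ratios, loadings, scores = pca(pollutants)
```

With a strong shared source, the first component absorbs most of the variance, which is the usual motivation for summarizing correlated pollutants with a few PCs.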
Sensitivity analysis: To verify the robustness of the WQS model, the parameter “q” was first changed from 4 to 10, so that the estimated joint effect of air pollutant exposure on childhood eczema corresponds to each decile (10-percentile) increase. Second, the number of bootstrap samples (b) used to estimate the model parameters was set to 100, 1000, 2000, and 3000 in turn. The model was considered robust if the results did not differ materially across these settings.
The variance inflation factor (VIF) was used to diagnose collinearity among the covariates in each multivariable model. P < 0.05 was used as the threshold for statistical significance, and all statistical analyses were performed in R 4.1.2.
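The VIF for covariate j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing covariate j on all the other covariates. A minimal NumPy sketch follows. The simulated covariates are hypothetical, and the paper does not state which VIF cut-off was applied.

```python
import numpy as np

def vif(covariates):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of
    column j on all other columns plus an intercept."""
    X = np.asarray(covariates, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical covariates: a and b independent, c nearly duplicates a
rng = np.random.default_rng(3)
a, b = rng.normal(size=(2, 2000))
c = a + 0.05 * rng.normal(size=2000)
vif_ok = vif(np.column_stack([a, b]))        # both close to 1
vif_bad = vif(np.column_stack([a, b, c]))    # a and c strongly inflated
```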