Methods
Case reports from NSW, Australia, covering the period from the beginning of the epidemic in January 2020 to its peak at the end of March 2020, were accessed (NSW Government, 2020a). Case reports in which the source of infection was determined to be locally acquired, and in which a date of notification and postcode of residence were reported, were selected for inclusion in the study. A time-series of case reports was then created. For each reported case in the time-series, the weather observation station closest to the reported postcode that recorded rainfall, temperature and humidity for the period January to March 2020 was identified (NSW Government, 2020b). Daily observations of the following meteorological variables were downloaded: rainfall (mm), and temperature (°C) and relative humidity (%) recorded at 9am and at 3pm (Australian Government Bureau of Meteorology, 2020). The daily median values across all selected weather observation stations were calculated, and time-series of median rainfall, and of 9am and 3pm temperature and relative humidity, were created. Two additional time-series were then created by determining the daily difference between 9am and 3pm temperature, and between 9am and 3pm relative humidity. Thus, 7 predictor time-series were available for modelling.
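The construction of the 7 predictor time-series can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code used in the study. The station records, field names and the function daily_median_series are hypothetical. Two details are not stated in the text and are assumed here: the difference is taken as 3pm minus 9am, and it is computed from the daily medians rather than station by station.

```python
from statistics import median

# Hypothetical station records (illustrative values, not study data):
# (date, station_id, rainfall_mm, temp_9am, temp_3pm, rh_9am, rh_3pm)
observations = [
    ("2020-03-01", "A", 0.0, 20.0, 26.0, 80.0, 55.0),
    ("2020-03-01", "B", 2.0, 19.0, 25.0, 85.0, 60.0),
    ("2020-03-01", "C", 1.0, 21.0, 24.0, 70.0, 50.0),
    ("2020-03-02", "A", 0.0, 18.0, 27.0, 75.0, 45.0),
    ("2020-03-02", "B", 0.0, 17.0, 23.0, 78.0, 52.0),
]

FIELDS = ("rainfall", "temp_9am", "temp_3pm", "rh_9am", "rh_3pm")


def daily_median_series(records):
    """Return {date: {variable: value}} holding 7 predictors per day."""
    # Group the numeric readings of all stations by calendar day.
    by_day = {}
    for date, _station, *values in records:
        by_day.setdefault(date, []).append(values)

    series = {}
    for date in sorted(by_day):
        # Transpose the rows so that each column holds one variable.
        columns = zip(*by_day[date])
        med = {name: median(col) for name, col in zip(FIELDS, columns)}
        # Assumption: difference = 3pm minus 9am, taken on the medians.
        med["temp_diff"] = med["temp_3pm"] - med["temp_9am"]
        med["rh_diff"] = med["rh_3pm"] - med["rh_9am"]
        series[date] = med
    return series


series = daily_median_series(observations)
```

Each entry of series then holds five daily medians and two derived differences, matching the 7 predictors described above.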
A correlation matrix was used to select meteorological variables so as to avoid multicollinearity in the analysis. Variables with a correlation coefficient <0.6 were retained. Each remaining variable was included in a univariate generalized additive model (GAM) with the daily number of reported cases as the dependent variable. Variables with a p value <0.1 in univariate analyses were then included in a multivariate GAM, and the best-fitting model based on the Akaike Information Criterion (AIC) was selected using a backward algorithm. COVID-19 cases were assumed to follow a negative binomial distribution, given that the variance of the daily reported cases was greater than the mean. Meteorological variables were analyzed using a 14-day exponential moving average (EMA), based on the assumed incubation period of SARS-CoV-2. Natural splines with two degrees of freedom were also included to account for additional short-term trend. A sensitivity analysis was performed by modifying the EMA window from 14 days to 10 and 21 days, respectively. R software (version 3.5.3, http://cran.r-project.org; R Foundation for Statistical Computing, Vienna, Austria) was used to perform all statistical analyses and visualization.
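The pre-modelling steps described above (EMA smoothing, correlation screening and the overdispersion check) can be sketched as follows. The sketch is in Python for illustration only and is not the R code used in the study. All function names are hypothetical, and the GAM fitting itself is not shown. Several details are assumptions because the text does not specify them: the EMA uses the common smoothing factor 2/(N+1) and is seeded with the first observation, the screening compares absolute Pearson correlations, and variables are retained greedily in the order supplied.

```python
from math import sqrt
from statistics import mean, variance


def ema(values, span):
    """Exponential moving average; alpha = 2 / (span + 1) is an assumption."""
    alpha = 2.0 / (span + 1)
    smoothed, prev = [], None
    for v in values:
        # Seed with the first observation, then update recursively.
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_predictors(predictors, threshold=0.6):
    """Greedily keep variables whose |r| with every kept variable is < threshold."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def is_overdispersed(counts):
    """True when the sample variance exceeds the mean, which motivates a
    negative binomial rather than a Poisson distribution for the counts."""
    return variance(counts) > mean(counts)


# Sensitivity analysis: repeat the smoothing with each EMA window.
rainfall = [0.0, 15.0, 3.0, 0.0, 8.0]  # hypothetical daily medians
smoothed_by_window = {span: ema(rainfall, span) for span in (10, 14, 21)}
```

Under these assumptions, a shorter window such as 10 days gives recent observations more weight than the 14-day or 21-day windows.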