Table 3: Confusion matrix of the GLM fit on the original Kariki_Farm data without RST-based interaction detection
The attributes identified as critical for prediction by the fitted logistic regression model were Low_temp, Dewpoint_high, Windspeed_high, and windspeed_avg.
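The text does not say how "critical" attributes were read off the fitted model. A common convention, assumed here, is to keep predictors whose two-sided Wald z-test p-value falls below 0.05, as reported by R's `summary()` for a binomial `glm` fit. The following minimal sketch fits a logistic regression by Newton-Raphson in plain Python and applies that rule. The function names, the 0.05 threshold, and the synthetic data are illustrative assumptions, not the authors' code.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fisher_information(X, beta):
    """Fitted probabilities and the information matrix X^T W X, W = diag(p * (1 - p))."""
    k = len(beta)
    probs = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
    info = [[sum(p * (1 - p) * row[i] * row[j] for row, p in zip(X, probs))
             for j in range(k)] for i in range(k)]
    return probs, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood; each row of X starts with 1.0 (intercept)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        probs, info = fisher_information(X, beta)
        # Score vector X^T (y - p)
        grad = [sum((yi - p) * row[j] for row, yi, p in zip(X, y, probs))
                for j in range(len(beta))]
        cov = invert(info)
        step = [sum(c * g for c, g in zip(cov_row, grad)) for cov_row in cov]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = fisher_information(X, beta)
    return beta, invert(info)  # coefficients and their estimated covariance matrix


def wald_screen(names, X, y, alpha=0.05):
    """Two-sided Wald z-test per coefficient; names[0] is assumed to be the intercept."""
    beta, cov = fit_logistic(X, y)
    pvals = {}
    for j, name in enumerate(names):
        z = beta[j] / math.sqrt(cov[j][j])
        pvals[name] = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    selected = [n for n in names[1:] if pvals[n] < alpha]
    return pvals, selected


# Hypothetical synthetic data: the outcome depends on x_signal only.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    x_signal, x_noise = rng.gauss(0, 1), rng.gauss(0, 1)
    prob = 1.0 / (1.0 + math.exp(-(0.5 + 2.0 * x_signal)))
    X.append([1.0, x_signal, x_noise])
    y.append(1 if rng.random() < prob else 0)

pvals, selected = wald_screen(["(Intercept)", "x_signal", "x_noise"], X, y)
print(selected)
```

With real data, the rows of `X` would hold the weather attributes (for example Low_temp or Dewpoint_high) instead of the synthetic columns. Note that p-value screening measures statistical significance, not predictive importance, so it is only one possible reading of "critical".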
In the second experiment, we used rough sets to detect interaction terms for the model. After loading the data, we applied rough set theory (RST) to determine the critical features through their lower and upper approximations. We then derived the reduct (feature subset) from the lower approximations, i.e., the positive region, using a greedy heuristic method for feature selection, which is a wrapper feature-selection algorithm.

The resulting reduct contained 12 variables, including high, average, and low temperature; high and low dew point; high and low humidity; high and average wind speed; and high and low pressure. The testing accuracy was 84.25%, an increase of 1.31% over the first experiment. The attributes identified as critical for prediction by the fitted logistic regression model were Dewpoint_high, Windspeed_high, windspeed_avg, humidity_high, and High.hpa. In contrast, humidity and high pressure had not been identified as essential attributes in the first experiment.
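The approximation and reduct steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the attributes have already been discretized, it uses the dependency degree (size of the positive region divided by the number of objects) as the greedy criterion, as in the QuickReduct algorithm, and it runs on a small hypothetical weather table. The text does not specify which quality measure its greedy heuristic used, so other criteria (for example entropy- or Gini-based ones) may have been used instead.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: sets of row indices sharing the same values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def lower_upper(rows, attrs, concept):
    """Lower and upper approximations of a set of row indices with respect to attrs."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper


def positive_region(rows, attrs, decision):
    """Union of the lower approximations of every decision class."""
    pos = set()
    for concept in partition(rows, [decision]):
        pos |= lower_upper(rows, attrs, concept)[0]
    return pos


def dependency(rows, attrs, decision):
    """Dependency degree: fraction of objects in the positive region."""
    return len(positive_region(rows, attrs, decision)) / len(rows)


def greedy_reduct(rows, conditions, decision):
    """Forward greedy search: add the attribute that most increases the dependency."""
    target = dependency(rows, conditions, decision)
    reduct, current = [], dependency(rows, [], decision)
    while current < target:
        candidates = [a for a in conditions if a not in reduct]
        best = max(candidates, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
        current = dependency(rows, reduct, decision)
    return reduct


# Hypothetical discretized weather table; rain = yes only when humidity is high and wind is low.
cols = ("temp", "humidity", "wind", "rain")
data = [
    ("high", "high", "low", "yes"),
    ("low", "high", "low", "yes"),
    ("high", "low", "low", "no"),
    ("low", "low", "high", "no"),
    ("high", "high", "high", "no"),
    ("low", "high", "high", "no"),
]
rows = [dict(zip(cols, r)) for r in data]

print(greedy_reduct(rows, ["temp", "humidity", "wind"], "rain"))  # → ['wind', 'humidity']
```

On this toy table, temperature is discarded because wind and humidity together already classify every object consistently. Greedy forward selection preserves the dependency of the full attribute set but is not guaranteed to return a minimal reduct; a backward pruning pass is sometimes added for that reason.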