Table 3: Confusion matrix for the GLM fit on the original Kariki_Farm
data, without interaction detection using RST
The critical attributes for prediction, as determined by the fitted
logistic regression model, were Low_temp, Dewpoint_high,
Windspeed_high, and windspeed_avg.
In the second experiment, we used Rough Set Theory (RST) to detect
interaction terms for the model. After the data was loaded, RST was used
to determine the critical features through lower and upper
approximations. Once the approximations were derived, the next step was
to formulate the reduct (feature subset) from the positive region (the
lower approximations); the method employed was the greedy heuristic
method for feature selection, which is a wrapper feature selection
algorithm. The resulting reduct had 12 important variables: high temp,
avg temp, low temp, high dew-point, low dew-point, high humidity, low
humidity, high wind-speed, avg wind-speed, high pressure, and low
pressure. The accuracy was 84.25%, an increase in testing accuracy of
1.31% over the first experiment. The important attributes for
prediction, as determined by the fitted logistic regression model, were
Dewpoint_high, Windspeed_high, windspeed_avg, humidity_high, and
High.hpa. By contrast, high humidity and high pressure were not
considered essential attributes in the prediction model in the first
experiment.
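
The reduct construction described above can be illustrated with a
minimal sketch. It is not the implementation used in the experiments:
the toy decision table, the attribute names (temp, humidity, wind, rain),
and the use of the dependency degree as the greedy criterion are all
assumptions made for illustration. It computes the lower and upper
approximations of one decision class and then grows a feature subset
greedily until it covers the same positive region as the full
attribute set.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes that agree on `attrs`."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper

def dependency(rows, attrs, decision):
    """Share of rows in the positive region of `decision` given `attrs`."""
    positive = set()
    for concept in partition(rows, [decision]):
        positive |= approximations(rows, attrs, concept)[0]
    return len(positive) / len(rows)

def greedy_reduct(rows, conditions, decision):
    """Add the attribute that raises the dependency most until the subset
    matches the dependency of the full condition set (assumed criterion)."""
    full = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < full:
        candidates = [a for a in conditions if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical, already discretised toy table (not the Kariki_Farm data).
# Rows 0 and 6 share all condition values but disagree on the decision.
rows = [
    {"temp": "high", "humidity": "high", "wind": "low",  "rain": "yes"},
    {"temp": "high", "humidity": "low",  "wind": "low",  "rain": "no"},
    {"temp": "low",  "humidity": "high", "wind": "high", "rain": "yes"},
    {"temp": "low",  "humidity": "low",  "wind": "high", "rain": "no"},
    {"temp": "high", "humidity": "high", "wind": "high", "rain": "yes"},
    {"temp": "low",  "humidity": "low",  "wind": "low",  "rain": "no"},
    {"temp": "high", "humidity": "high", "wind": "low",  "rain": "no"},
]
conditions = ["temp", "humidity", "wind"]

rainy = {i for i, r in enumerate(rows) if r["rain"] == "yes"}
lower, upper = approximations(rows, conditions, rainy)
reduct = greedy_reduct(rows, conditions, "rain")

print(sorted(lower), sorted(upper))  # [2, 4] [0, 2, 4, 6]
print(reduct)                        # ['humidity', 'wind']
```

In this toy table the conflicting rows 0 and 6 fall outside the lower
approximation but inside the upper one, and the greedy search stops
once humidity and wind together cover the same positive region as all
three attributes.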