Table 1: Kariki Farm Dataset Dimension from wunderground.com
First, the data were loaded and preprocessed to ensure they were clean
and contained no missing values. Next, we used rough set theory to
detect the interaction terms in the dataset. The first step was
discretization, which converts numerical attributes to nominal ones and
makes the data easier to evaluate and manage. Discretization is a data
transformation that finds cut points in the range of each attribute and
divides that range into intervals. Values lying within the same interval
are then mapped to the same value, which reduces the size of the
attribute value set (Hassanien, Abdelhafez, & Own, 2008).
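
The discretization step can be illustrated with a minimal sketch. The
text does not state which cut-point strategy was used, so equal-width
intervals are assumed here purely for illustration; the function names
and the temperature readings are hypothetical and are not taken from the
Kariki Farm dataset.

```python
from bisect import bisect_right
from typing import List


def equal_width_cuts(values: List[float], n_bins: int) -> List[float]:
    """Return the n_bins - 1 interior cut points that split the observed
    range of `values` into intervals of equal width (assumed strategy)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values: List[float], cuts: List[float]) -> List[int]:
    """Map each value to the index of the interval it falls in."""
    # bisect_right counts the cut points <= v, which is the interval index.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical temperature readings, for illustration only.
temperatures = [14.0, 15.5, 18.0, 21.0, 23.5, 26.0]
cuts = equal_width_cuts(temperatures, n_bins=3)
codes = discretize(temperatures, cuts)

print(cuts)   # [18.0, 22.0]
print(codes)  # [0, 0, 1, 1, 2, 2]
```

Six distinct readings collapse to three nominal codes, which is the
reduction in the attribute value set described above.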
Next, the indiscernibility relation was used to group the objects in the
dataset that cannot be distinguished from one another on a given set of
attributes. From this relation, we can deduce the lower and upper
approximations and the boundary region of a subset of interest. The
lower approximation contains the objects that certainly belong to the
subset, the upper approximation contains those that possibly belong to
it, and the boundary region is the difference between the two. After
deducing the approximations, the next step is to formulate the reduct
(feature subset) from the lower (positive) region; the method employed
here was the greedy heuristic method for feature selection, which is a
wrapper feature selection algorithm (Janusz & Ślęzak, 2014).
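
The rough set steps described above can be sketched on a toy decision
table. This is a simplified illustration, not the implementation used in
the experiments: the forward-selection loop assumes the dependency
degree (the share of objects in the positive region) as the attribute
quality measure, omits any pruning step, and may therefore return a
superreduct rather than a minimal reduct. All function names and the
weather rows are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attrs):
    """Group row indices whose values agree on every attribute in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Return (lower, upper, boundary) of the index set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper, upper - lower


def dependency(rows, attrs, decision):
    """Share of rows in the positive region, i.e. the union of the lower
    approximations of all decision classes."""
    positive = set()
    for target in indiscernibility_classes(rows, [decision]):
        positive |= approximations(rows, attrs, target)[0]
    return len(positive) / len(rows)


def greedy_reduct(rows, conditions, decision):
    """Forward greedy selection: keep adding the attribute that raises
    the dependency degree most until it matches the full attribute set."""
    full = dependency(rows, conditions, decision)
    chosen = []
    while dependency(rows, chosen, decision) < full:
        best = max((a for a in conditions if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen


# Hypothetical, already discretized decision table (illustration only).
rows = [
    {"temp": "high", "humidity": "low",  "wind": "calm",  "rain": "no"},
    {"temp": "high", "humidity": "high", "wind": "calm",  "rain": "yes"},
    {"temp": "low",  "humidity": "high", "wind": "windy", "rain": "no"},
    {"temp": "low",  "humidity": "low",  "wind": "windy", "rain": "no"},
    {"temp": "high", "humidity": "high", "wind": "windy", "rain": "yes"},
    {"temp": "low",  "humidity": "high", "wind": "calm",  "rain": "no"},
]
no_rain = {i for i, r in enumerate(rows) if r["rain"] == "no"}

lower, upper, boundary = approximations(rows, ["temp"], no_rain)
print(sorted(lower), sorted(boundary))  # [2, 3, 5] [0, 1, 4]

reduct = greedy_reduct(rows, ["temp", "humidity", "wind"], "rain")
print(reduct)  # ['temp', 'humidity']
```

With temperature alone, the low-temperature rows certainly belong to the
"no rain" subset, while the high-temperature rows fall in the boundary
region. Adding humidity removes the boundary, so the sketch stops
without selecting wind.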
The experimental framework is shown in the diagram below: