ROUGH SET AS A FEATURE INTERACTION DETECTION MODEL
Rough Set theory is a knowledge discovery method widely applied in relational databases. It is a machine learning model that operates on the information granulation of the data it works on; that is, it seeks to identify interactions in the data even when prior information is incomplete or absent. Professor Pawlak first introduced it in 1982. Rough set analysis can be divided into two parts: the first forms concepts and rules through classification, while the second concerns knowledge discovery through target classification. Combined with machine learning methods, rough sets have been used for preprocessing, feature selection, and instance selection (Bello, R., 2017). The fundamental concepts of Rough Set theory are explained below:
Indiscernibility Relation: the relation between two or more objects whose values are identical on the considered subset of attributes (Rissino, S. et al., 2009).
Let U be the universe of objects, A the set of attributes, and P ⊆ A. The indiscernibility relation IND(P) is defined as $IND(P) = \{(x, y) \in U \times U : \forall a \in P,\ a(x) = a(y)\}$
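The definition of IND(P) can be illustrated with a minimal Python sketch that groups objects into equivalence classes by their values on P. The function name `indiscernibility_classes`, the dictionary layout, and the toy attributes `headache` and `temp` are assumptions made for illustration only; they do not come from the cited sources.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Partition object ids into the equivalence classes of IND(attributes).

    `table` maps an object id to a dict of attribute values (assumed layout).
    Two objects fall in the same class when they agree on every attribute.
    """
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)  # values of the object on P
        groups[key].add(obj)
    return list(groups.values())

# Hypothetical toy data, not taken from the source.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}

by_headache = indiscernibility_classes(table, ["headache"])
by_both = indiscernibility_classes(table, ["headache", "temp"])
```

With P = {headache}, x1 and x2 are indiscernible, as are x3 and x4; adding `temp` to P separates x3 from x4, showing how a larger attribute subset yields a finer partition.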
A set is a collection of objects that share similar characteristics (Rissino, S. et al., 2009).
When the boundary region of a set X is non-empty, that is, $\underline{B}(X) \neq \overline{B}(X)$, the set X is called a Rough Set.
Approximations: these are based on the three regions of rough set theory, namely the lower approximation, the upper approximation, and the boundary approximation (Rissino, S. et al., 2009).
The lower approximation of a subset is the set of objects that positively belong to the target set. Let B ⊆ C be a subset of the condition attributes and X ⊆ U; the B-lower approximation of X is the set of all elements of U that can be classified with certainty as elements of X, where B(x) denotes the equivalence class of x under IND(B):
$\underline{B}(X) = \{x \in U : B(x) \subseteq X\}$
The upper approximation is the set of objects that possibly belong to the target set:
$\overline{B}(X) = \{x \in U : B(x) \cap X \neq \emptyset\}$
The boundary approximation (boundary region) is the collection of elementary sets whose objects cannot be decisively classified as belonging or not belonging to X using the attributes in B:
$BN_B(X) = \overline{B}(X) - \underline{B}(X)$
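The three regions can be computed directly from the equivalence classes of IND(B). The following is a minimal sketch under assumed names (`equivalence_classes`, `approximations`) and a hypothetical toy table; it is meant only to make the definitions concrete, not to reproduce any particular library.

```python
from collections import defaultdict

def equivalence_classes(table, attributes):
    """Return the elementary sets B(x) of IND(attributes) as a list of sets."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj)
    return list(groups.values())

def approximations(table, attributes, target):
    """Return (lower, upper, boundary) approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:      # B(x) is contained in X: certain members
            lower |= block
        if block & target:       # B(x) intersects X: possible members
            upper |= block
    return lower, upper, upper - lower

# Hypothetical toy data, not taken from the source.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}
X = {"x1", "x3"}  # assumed target set, e.g. objects labelled as having flu

lower, upper, boundary = approximations(table, ["headache", "temp"], X)
is_rough = bool(boundary)  # X is rough when the boundary region is non-empty
```

In this toy table x1 and x2 are indiscernible but only x1 belongs to X, so both land in the boundary region and X is a rough set with respect to the chosen attributes.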
Decision Table / Information System: this is the primary way of storing data in rough sets; each row is an object and each column an attribute. It represents the input data gathered from the domain or environment in which the rough sets will be applied (Rissino, S. et al., 2009).
Reduct: this process in Rough Set theory performs dimensionality reduction by removing redundant or irrelevant attributes. It is the outcome of the feature selection step of the rough set technique, and its end products are reduced decision tables called Decision Reducts. Reducts can be generated with the QuickReduct method or with a greedy heuristic method; the latter combines greedy search with a heuristic function such as entropy to find a minimal subset of features necessary for decision-making.
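A greedy reduct search can be sketched as follows. This sketch assumes the dependency degree (the fraction of objects whose equivalence class carries a single decision value) as the heuristic, which is a common choice for QuickReduct but is not defined in the text above; an entropy-based score could be substituted. The function names, the list-of-dicts table layout, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on `attrs` (classes of IND(attrs))."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def dependency(rows, attrs, decision):
    """Fraction of rows lying in classes that agree on the decision value."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedily add the attribute that most increases the dependency degree."""
    reduct = []
    current = dependency(rows, reduct, decision)
    target = dependency(rows, conditions, decision)
    while current < target:
        best_attr, best_score = None, current
        for a in conditions:
            if a in reduct:
                continue
            score = dependency(rows, reduct + [a], decision)
            if score > best_score:
                best_attr, best_score = a, score
        if best_attr is None:  # no single attribute helps; greedy search stalls
            break
        reduct.append(best_attr)
        current = best_score
    return reduct

# Hypothetical decision table: three condition attributes, decision "flu".
rows = [
    {"headache": "yes", "temp": "high", "muscle": "yes", "flu": "yes"},
    {"headache": "yes", "temp": "normal", "muscle": "yes", "flu": "no"},
    {"headache": "no", "temp": "high", "muscle": "no", "flu": "yes"},
    {"headache": "no", "temp": "normal", "muscle": "no", "flu": "no"},
]
reduct = quick_reduct(rows, ["headache", "temp", "muscle"], "flu")
```

In this toy table `temp` alone determines the decision, so the search stops after selecting it. Because the search is greedy, the result is a reduct candidate rather than a guaranteed minimal one, and it can stall when attributes are informative only in combination.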