ROUGH SET AS A FEATURE INTERACTION DETECTION MODEL
Rough Set theory is a knowledge discovery method widely applied to
relational databases. As a machine learning model, it bases its
functionality on the information granulation of the data it works on:
objects that cannot be told apart on the available attributes are
grouped together, which lets it identify interactions in the data even
when prior information is incomplete or absent. Professor Pawlak first
introduced it in 1982. Rough Set analysis can be divided into two parts:
the first forms concepts and rules through classification, while the
second concerns knowledge discovery through target classification.
Coupled with machine learning methods, rough sets have been used in
several lines of research, including preprocessing, feature selection,
and instance selection (Bello, R., 2017). The fundamental concepts of
Rough Set theory are explained below:
Indiscernibility Relation: the relation between two objects of the
universe whose values are identical on every attribute in the
considered subset of attributes (Rissino, S. et al., 2009).
Let U be the universe of objects, A the set of attributes, and P ⊆ A.
The indiscernibility relation IND(P) can be defined as
IND(P) = {(x, y) ∈ U × U : for all a ∈ P, a(x) = a(y)}
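The definition of IND(P) can be illustrated with a minimal Python sketch.
The function name indiscernibility_classes, the dictionary layout, and
the toy medical attributes are assumptions made for illustration only;
they do not come from the cited sources. Objects that agree on every
attribute in P end up in the same equivalence class.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Group object ids into the equivalence classes of IND(attrs).

    Two objects fall into the same class exactly when they have the
    same value for every attribute listed in attrs.
    """
    groups = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attrs)  # the object's values on attrs
        groups[key].add(obj_id)
    return list(groups.values())


# Hypothetical toy data: object id -> attribute values.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}

for attrs in (["headache"], ["headache", "temp"]):
    classes = indiscernibility_classes(table, attrs)
    print(attrs, sorted(sorted(c) for c in classes))
# ['headache'] [['x1', 'x2'], ['x3', 'x4']]
# ['headache', 'temp'] [['x1', 'x2'], ['x3'], ['x4']]
```

Adding an attribute to P can only split classes, never merge them, which
is why the second partition is finer than the first.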
A set is a grouping of objects that share similar characteristics
(Rissino, S. et al., 2009). When the boundary region of a set X is
non-empty, that is, when its lower approximation B_*(X) differs from its
upper approximation B^*(X), the set is called a Rough Set.
Approximations: these are based on the three regions of rough set
theory, namely the lower approximation, the upper approximation, and the
boundary approximation (Rissino, S. et al., 2009).
A Lower Approximation of a subset can be defined as the set of objects
that certainly belong to the target set. Let B ⊆ C and X ⊆ U; the
B-lower approximation of X is the set of all elements of U that can be
classified with certainty as elements of X:
B_*(X) = {x ∈ U : B(x) ⊆ X}
where B(x) is the set of objects indiscernible from x on the attributes
in B.
Upper Approximation can be defined as the set of objects that possibly
belong to the target set, that is, the objects whose indiscernibility
class B(x) shares at least one element with X:
B^*(X) = {x ∈ U : B(x) ∩ X ≠ ∅}
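Boundary Approximation can be defined as the collection of elementary
sets of objects that cannot be decisively classified as inside or
outside X using the attributes in B, that is, the objects that lie in
the upper approximation of X but not in its lower approximation.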
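The three regions can be computed directly from the equivalence classes
of the indiscernibility relation. The following Python sketch is only an
illustration: the helper names partition and approximations, the toy
table, and the target set flu are hypothetical and not taken from the
cited sources.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of the indiscernibility relation on attrs."""
    groups = defaultdict(set)
    for obj_id, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj_id)
    return list(groups.values())


def approximations(table, attrs, target):
    """Return (lower, upper, boundary) of target with respect to attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # class lies entirely inside X: certain members
            lower |= block
        if block & target:    # class overlaps X: possible members
            upper |= block
    return lower, upper, upper - lower


# Hypothetical toy data: object id -> attribute values.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}
flu = {"x1", "x3"}  # assumed target set X

lower, upper, boundary = approximations(table, ["headache", "temp"], flu)
print(sorted(lower), sorted(upper), sorted(boundary))
# ['x3'] ['x1', 'x2', 'x3'] ['x1', 'x2']
print("rough set" if boundary else "crisp set")
# rough set
```

Here x1 and x2 are indiscernible but only x1 belongs to the target set,
so both fall into the boundary and the target set is rough with respect
to the two attributes.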
Decision Table / Information System: this is the primary way of storing
data in rough set analysis. Each row is an object and each column is an
attribute, and the table represents the input data gathered from the
domain or environment in which rough sets will be applied
(Rissino, S. et al., 2009).
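As a rough illustration, a decision table can be held as a list of rows
whose keys are split into condition attributes and one decision
attribute. The attribute names, the data, and the helper
inconsistent_rows below are assumptions for this sketch, not part of the
cited sources. The helper flags rows that agree on all conditions but
disagree on the decision; these are the objects that end up in a
boundary region.

```python
from collections import defaultdict

# Hypothetical split of the columns into conditions and a decision.
CONDITIONS = ["headache", "temp"]
DECISION = "flu"

# Hypothetical decision table: one dict per object (row).
decision_table = [
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "temp": "high", "flu": "no"},
    {"headache": "no", "temp": "high", "flu": "yes"},
    {"headache": "no", "temp": "normal", "flu": "no"},
]


def inconsistent_rows(rows, conditions, decision):
    """Indices of rows that share condition values but differ in decision."""
    decisions = defaultdict(set)
    for row in rows:
        decisions[tuple(row[c] for c in conditions)].add(row[decision])
    return [
        i
        for i, row in enumerate(rows)
        if len(decisions[tuple(row[c] for c in conditions)]) > 1
    ]


conflicts = inconsistent_rows(decision_table, CONDITIONS, DECISION)
print(conflicts)
# [0, 1]
```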
Reduct: a reduct is the outcome of dimensionality reduction in Rough Set
theory, in which redundant or irrelevant attributes are removed. It is
the result of the feature selection process of the rough set
methodology, and the end products are reduced decision tables called
Decision Reducts. Reducts can be generated with the QuickReduct method
or with a greedy heuristic method, which combines greedy search with a
heuristic function such as entropy to find a minimal subset of the
features necessary for decision-making.
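A greedy reduct search in the spirit of QuickReduct can be sketched as
follows. This is a simplified illustration under stated assumptions, not
a definitive implementation: it scores candidate attributes with the
rough set dependency degree (the fraction of objects whose
indiscernibility class carries a single decision value) rather than
entropy, and the function names and toy data are hypothetical. A greedy
search of this kind returns a small attribute subset that preserves the
dependency of the full attribute set, but it is not guaranteed to find
the smallest one.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Row indices grouped into indiscernibility classes on attrs."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def dependency(rows, attrs, decision):
    """Fraction of rows whose class on attrs has one decision value."""
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, conditions, decision):
    """Greedily add the attribute that raises the dependency degree most."""
    target = dependency(rows, conditions, decision)  # score of the full set
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidates = [a for a in conditions if a not in reduct]
        scores = {a: dependency(rows, reduct + [a], decision) for a in candidates}
        chosen = max(candidates, key=lambda a: scores[a])  # ties: first listed
        reduct.append(chosen)
        best = scores[chosen]
    return reduct


# Hypothetical toy decision table.
rows = [
    {"headache": "yes", "temp": "high", "cough": "no", "flu": "yes"},
    {"headache": "no", "temp": "high", "cough": "no", "flu": "no"},
    {"headache": "yes", "temp": "normal", "cough": "no", "flu": "no"},
    {"headache": "no", "temp": "normal", "cough": "yes", "flu": "no"},
    {"headache": "yes", "temp": "high", "cough": "yes", "flu": "yes"},
]

print(quick_reduct(rows, ["headache", "temp", "cough"], "flu"))
# ['headache', 'temp']
```

In this toy table the attribute cough is dropped because headache and
temp together already determine the decision for every object.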