[Figure 7]
Subsequent packets are encrypted. Figure 6 illustrates this scenario. To
prevent such LE packets from skewing the entropy calculation, our
algorithms wait until N Sequential High Entropy Packets have been detected
before calculating entropy. Unfortunately, there is no principled way to
choose N in advance, so we determine its value experimentally; for our
datasets, N = 2 appears to work best.
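As a minimal sketch of the N Sequential High Entropy Packets check, the Python below assumes that packet entropy is the Shannon entropy of the payload's byte distribution (in bits per byte, 0 to 8) and that a packet counts as HE when its entropy exceeds a per-packet threshold. The entropy measure and threshold are not given in this section, so `byte_entropy`, `find_he_run_start`, and `packet_threshold` are hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0).

    Assumption: the paper's entropy measure is byte-level Shannon entropy.
    """
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def find_he_run_start(
    payloads: Sequence[bytes], packet_threshold: float, n: int = 2
) -> Optional[int]:
    """Return the index of the first packet in the first run of n consecutive
    high-entropy packets, or None if the flow never contains such a run.

    packet_threshold is a hypothetical per-packet HE cutoff (bits per byte).
    """
    run = 0
    for i, payload in enumerate(payloads):
        if byte_entropy(payload) > packet_threshold:
            run += 1
            if run == n:
                return i - n + 1
        else:
            run = 0  # a single LE packet breaks the run
    return None


# Usage: two plaintext handshake packets followed by high-entropy packets.
handshake = [b"GET / HTTP/1.1\r\nHost: a\r\n\r\n", b"aaaa"]
encrypted = [bytes(range(256)), bytes(range(255, -1, -1)), bytes(range(256))]
print(find_he_run_start(handshake + encrypted, packet_threshold=7.0))  # 2
```

Resetting the counter on every LE packet is what keeps isolated high-entropy packets inside a plaintext handshake from triggering the entropy calculation too early.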
We now briefly describe the flow-based and packet-based algorithms.
Recall that both algorithms aim to label a flow as HE or LE; the former
does so by examining the entire flow's data, whereas the latter examines
each packet separately.
1) Flow-based Entropy:
After detecting N Sequential High Entropy Packets, we capture the payload
of all subsequent packets and calculate the cumulative entropy of the
resulting data (including the initial HE packets). We then compare the
cumulative entropy with the threshold, as described earlier: if it is
greater than the threshold, the flow is identified as HE; otherwise, it
is identified as LE.
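The flow-based procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: entropy is taken to be byte-level Shannon entropy, `packet_threshold` and `flow_threshold` are hypothetical parameters, and a flow that never yields N consecutive HE packets is assumed to be labeled LE, since the text does not cover that case.

```python
import math
from collections import Counter
from typing import Sequence


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (assumed measure)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def classify_flow_cumulative(
    payloads: Sequence[bytes],
    packet_threshold: float,  # hypothetical per-packet HE cutoff
    flow_threshold: float,    # hypothetical cumulative-entropy cutoff
    n: int = 2,
) -> str:
    """Label a flow HE or LE from the entropy of its concatenated payloads."""
    run = 0
    for i, payload in enumerate(payloads):
        run = run + 1 if byte_entropy(payload) > packet_threshold else 0
        if run == n:
            # Join the n initial HE packets and every payload after them.
            data = b"".join(payloads[i - n + 1:])
            return "HE" if byte_entropy(data) > flow_threshold else "LE"
    # Assumption: no run of n HE packets means the flow is treated as LE.
    return "LE"


he = bytes(range(256))
print(classify_flow_cumulative([b"hello", he, he, he], 7.0, 7.5))          # HE
print(classify_flow_cumulative([b"hello", he, he, b"a" * 2000], 7.0, 7.5))  # LE
```

The second call shows the property this variant relies on: one long low-entropy tail dominates the pooled byte distribution, so the cumulative entropy falls below the flow threshold even though the flow began with HE packets.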
2) Packet-based Entropy:
After detecting N Sequential High Entropy Packets, we calculate the
entropy of each packet and classify it as HE or LE. At the end of the
flow, we count the HE and LE packets, denoted N(HE) and N(LE). If
N(HE)/(N(HE) + N(LE)) is greater than a threshold, which we call the High
Entropy Packet Percentage Threshold, we consider the flow HE; otherwise,
it is LE. Figure 7 illustrates this approach.
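A minimal Python sketch of the packet-based rule follows, assuming byte-level Shannon entropy per packet and expressing the High Entropy Packet Percentage Threshold as a fraction between 0 and 1. Whether the counts include the initial N HE packets is not stated, so this sketch includes them (mirroring the flow-based variant); the function and parameter names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (assumed measure)."""
    if not payload:
        return 0.0
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in Counter(payload).values())


def classify_flow_per_packet(
    payloads: Sequence[bytes],
    packet_threshold: float,      # hypothetical per-packet HE cutoff
    percentage_threshold: float,  # High Entropy Packet Percentage Threshold, as a fraction
    n: int = 2,
) -> str:
    """Label a flow HE if N(HE) / (N(HE) + N(LE)) exceeds percentage_threshold."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Classify every packet once; a live system would do this as packets arrive.
    is_he: List[bool] = [byte_entropy(p) > packet_threshold for p in payloads]
    run = 0
    for i, flag in enumerate(is_he):
        run = run + 1 if flag else 0
        if run == n:
            tail = is_he[i - n + 1:]  # the n initial HE packets and all later ones
            n_he = sum(tail)
            n_le = len(tail) - n_he
            return "HE" if n_he / (n_he + n_le) > percentage_threshold else "LE"
    # Assumption: a flow without n consecutive HE packets is labeled LE.
    return "LE"


le, he = b"a" * 100, bytes(range(256))
flow = [le, he, he, le, he]  # counted packets: HE, HE, LE, HE -> ratio 0.75
print(classify_flow_per_packet(flow, 7.0, 0.7))  # HE
print(classify_flow_per_packet(flow, 7.0, 0.8))  # LE
```

Unlike the cumulative variant, each packet contributes one vote regardless of its size, so a single large plaintext packet cannot outweigh many small encrypted ones.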