3.1.3 Entropy Background
Let \(w\) be a word composed of \(N\) characters over an alphabet \(\Sigma = \{a_0, a_1, \ldots, a_{m-1}\}\). For each \(a_i \in \Sigma\), we count its occurrences, denoted \(n_i\), and compute the frequency \(f_i\) of \(a_i\) as \(n_i/N\). The entropy of \(w\) is then estimated as:

\[H_{N}^{\text{MLE}}(w) = -\sum_{i=0}^{m-1} f_i \log_2 f_i\]
where MLE stands for maximum likelihood estimator. When the characters are bytes (\(m = 256\)), the maximum entropy is 8 bits per byte. The above estimator differs slightly from the original definition of entropy for a distribution \(p = (p_0, \ldots, p_{m-1})\), which is:

\[H(p) = -\sum_{i=0}^{m-1} p_i \log_2 p_i\]
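To make the estimator concrete, the following is a minimal sketch in Python that computes \(H_{N}^{\text{MLE}}\) for byte-valued characters (\(m = 256\)); the function name mle_entropy and the example inputs are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def mle_entropy(data: bytes) -> float:
    """Plug-in (MLE) entropy estimate of `data` in bits per byte."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)  # n_i: occurrences of each byte value a_i
    # f_i = n_i / N;  H_N^MLE(w) = -sum_i f_i * log2(f_i)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A word covering all 256 byte values uniformly reaches the maximum of
# 8 bits per byte, while a single repeated byte gives zero entropy.
print(mle_entropy(bytes(range(256)) * 16))  # 8.0
print(mle_entropy(b"A" * 4096))             # 0.0
```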
However, the approximation \(H_{N}^{\text{MLE}}(w) \approx H(p)\) is valid only when \(N \gg m\). In practice, the condition \(N \gg m\) can be a problem because some packets are short, carrying fewer than \(m\) bytes of payload. [9] introduces an approach to address this problem. First, the authors estimate the average sample entropy \(H_N(p)\), defined as the average of \(H_{N}^{\text{MLE}}(w)\) over all words \(w\) of length \(N\). When \(p\) is the uniform distribution \(\mu\), \(H_N(\mu)\) can be calculated as:
where \(c = N/m\) and the \(o(1)\) term is less than 0.004. Similar to [9], we use the Monte-Carlo method to estimate the standard deviation \(\mathrm{SD}(H_{N}^{\text{MLE}})\). Finally, we use \(H_N(\mu) - 4 \cdot \mathrm{SD}(H_{N}^{\text{MLE}})\) as the entropy threshold to decide whether the data under investigation is high or low entropy. Note that the above methodology applies to both the entire flow and individual packets: in the former case we consider the data of the entire flow, while in the latter we consider the payload of each packet individually.
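As an illustration of this thresholding step, the sketch below estimates both the mean and the standard deviation of \(H_{N}^{\text{MLE}}\) for uniformly random words by Monte-Carlo simulation (whereas the text above computes \(H_N(\mu)\) analytically and uses Monte-Carlo only for the standard deviation), and then designates data as high entropy when its estimated entropy is at least \(H_N(\mu) - 4 \cdot \mathrm{SD}\). The helper names, the number of trials, and the example payloads are assumptions made for illustration.

```python
import math
import os
from collections import Counter

def mle_entropy(data: bytes) -> float:
    """Plug-in entropy estimate in bits per byte (see the previous sketch)."""
    n = len(data)
    counts = Counter(data)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def entropy_threshold(n: int, num_trials: int = 2000) -> float:
    """Estimate H_N(mu) - 4*SD over uniformly random words of length n."""
    samples = [mle_entropy(os.urandom(n)) for _ in range(num_trials)]
    mean = sum(samples) / num_trials
    sd = (sum((s - mean) ** 2 for s in samples) / (num_trials - 1)) ** 0.5
    return mean - 4 * sd

def is_high_entropy(data: bytes) -> bool:
    """Designate a packet payload (or a whole flow's data) as high entropy."""
    return mle_entropy(data) >= entropy_threshold(len(data))

# Example: a random (encrypted-looking) payload vs. plain ASCII of similar size.
print(is_high_entropy(os.urandom(1500)))                      # typically True
print(is_high_entropy(b"GET /index.html HTTP/1.1\r\n" * 60))  # False
```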
B. HE Flow Detection
We now present the flow- and packet-based algorithms used to designate a flow as HE. We then compare these algorithms by applying them to data of known, popular types. Although they encrypt user data, many encryption protocols exchange packets at the beginning of a flow that are not encrypted (and thus low entropy). After the initial exchange,