A Project on Covert Timing Channels
1. Introduction
The goal of this project is to learn about how to design, detect, and implement a covert timing channel using a stream of packets generated by an application. We will do this project in steps. At this point we will get started with the first simple step. CourseNana.COM
2. Background
Covert communication is method of exchanging secret messages in which the communication is hidden. A related term is steganography which deals with methods to write/embed hidden messages in such a way that no one, other than the sender and the intended receiver, know the existence of the message. The word steganography is of Greek origin and means "concealed writing" from the Greek words steganos meaning "covered or protected", and graphei meaning "writing" [From Wikipedia]. Steganography and covert channels have a long history and was used in WWII to send secret messages to spies behind enemy lines. In the computer and network security, covert channels and steganography fall in the broad category of security through obscurity. CourseNana.COM
The advantage of a covert channel over cryptography is that messages do not attract attention to themselves. Plainly visible encrypted messages — no matter how unbreakable — will arouse suspicion. The very presence of encrypted messages may be incriminating in countries where encryption is illegal. In such cases the communication channel must itself be hidden and this is achieved using covert channels / steganography. Note that, cryptography protects the contents of a message. Covert communication on the other hand protects both the message and the communicating parties. CourseNana.COM
Typically, steganography refers to the concealment of information within a document file, image file, or program. Media files are ideal for steganographic transmission because of their large size. As a simple example, a sender might start with an innocuous image file and adjust the color of every 100th pixel to correspond to a letter in the alphabet. The overall change is so small that someone not specifically looking for it is unlikely to notice it. Another method is called the Least Significant Bit (LSB) substitution. In this method, the least significant bit of each pixel in a digital image is modified by the bits of the secret message. Since the LSB contributes very little to the overall (intensity/color/brightness) of each pixel, the change in the image will be imperceptible to the naked eye. CourseNana.COM
Covert channels are communication channels that are established over some overt medium. For example, we can uses a stream of network packets (for example stream of packets generated by a Skype call) as the overt carrier for a covert communication channel. As usual, we have our three characters Alice, Bob, and Eve. Alice and Bob are in a repressive country where all communication is monitored and they want to establish a covert channel to exchange secret messages. Eve is a warden who can look at all network packets and try to detect if any covert communication is being used to plan a uprising against the repressive state. CourseNana.COM
To setup a covert communication channel, Alice and Bob first initiate an overt application let say a (computer to computer) Skype call and they start a regular innocuous conversation. Their interactive conversation produces a stream of IP packets from Alice to Bob and Bob to Alice. For the time being let us only consider covert channel in one direction from Alice to Bob and hence only consider the IP packets stream from Alice to Bob. There are two ways in which Alice can send a secret message. She can replace some unused bits in the protocol header with the bits of the covert message. This is called a covert storage channel. These types of channels are easily detectable since the protocol header fields that are not used are well known to Eve and she can check bits to detect the covert channel, identify Alice and Bob and the covert message. The other method that Alice can uses is to alter the inter-packet delays of the IP packet using a pre-established shared key to modulate the bits of the secret message. This is called a covert timing channel and will be the focus of our study. CourseNana.COM
To make things more concrete lets consider a concrete example. For simplicity, we will assume that Alice has buffered a large number of the IP packets that she has generated (as a result of her talking). This is obviously not realistic [why?] but for this first step let's make this assumption. Each packet has two attributes 1) a sequence number and 2) the time when the packet was generated. Thus, P(n, tn) denotes packet n which was generated at time tn. We will assume that the first packet is numbered 1 and is generated at time 0, i.e., t1 = 0. Note that the time field gives the cumulative time. To obtain the inter-packet delay, we can take the time difference between the consecutive packets. This packet stream is the unmodified overt traffic. Alice and Bob have a priori decided that an inter-packet delay of 0.5 will be used to code 1 and an inter-packet delay of 0.1 will be used to code a 0. So if Alice wants to send the alphabet “b” (which isd 0110 0010) she will generate the following sequence of packets P(1, T0), P(2, T1=T0+0.1), P(3, T2=T1+0.5), P(3, T3 = T2+0.1), P(4, T4 = T3+0.1) for the first 4 bits starting from the LSB. This is shown in the Figure below. CourseNana.COM
If the timing between the packets are not altered by the network or by Eve, then Bob can observe the inter-packet delays, translate them to binary bits and then determine the corresponding character. In this assignment, we will try to design a method of modulating the bits into inter-packet delays such that Eve is not able to discover the channel. We will assume that the network or Eve will not modify the inter-packet delays. CourseNana.COM
4. A First Simple Design
Alice ad Bob are having a Skype call. When Alice talks, the Skype application generates packets. This is called the overt packet stream. In her computer Alice has created a packet buffer (in memory) which holds a bunch of packets (as many as required to send the secret message) and then a using aspecially designed packet scheduler releases the packet with interpacket delays that encode the secret message. The overall system in Alice's computer is shown in the Figure below. In this part of the project we will consider a few simple scheme for encoding a secret message in the inter-packet delays. CourseNana.COM
4.1 Secret Message
The secret message that Alice wants to send to Bob is ``this is a secret message\'' The characters are encoded using 8 bits ASCII. Write a code to convert the secret message into a sequence of bits. Include the spaces as well. You can use the bin and ord functions as shown below. CourseNana.COM
Include the code that generates the bit sequence in a file named "secret_message_bits" CourseNana.COM
4.2 Obtaining the Baseline
Alice has buffered a sequence of packets that was generated by Skype. This is given in a excel file. This called the overt packet stream. Write a Python code to plot the histogram of the inter-packet delays of the overt packet stream. This is the baseline. This statistical characteristic of the packet stream is known to Eve and she will use this to see if any given packet stream between Alice and Bob is suspicious. In Section 7 (Detection) we will learn of techniquess of how Eve will actually do this. CourseNana.COM
4.3 A Simple Modulation Scheme
Alice and Bob decide to use the following modulation scheme to map the bits to the inter-packet delay. A delay of 0.25 is used to encode a bit 0 and delay of 0.75 is used to encode a bit 1. Write a Python code that will generate the modified packet stream that contains the secret message. Note that you will not need all the packets that you are given. CourseNana.COM
4.4 Histogram of Inter-packet Delays
Plot the histogram of the inter-packet delay of the covert packet stream. Plot the histogram for the part of the packet stream that you need to encode the secret message. Will Eve be suspicious? CourseNana.COM
4.5 A Better Modulation Scheme
Alice and Bob decide to use the following modulation scheme. Let m, min, and max denote the median, min, and max of the inter-packet delay of the overt packet stream. If Alice needs to send a 0 she randomly generates a delay between m and min. If she want to send a 1 she randomly generates a delay between m and max. First, compute m, min, and max of the overt packet stream. Next, modify the code in 4.3, to generate the covert packet stream that contains the secret message. CourseNana.COM
4.6 Histogram of Inter-packet Delays
Plot the histogram of the inter-packet delays of the overt packet stream and that of the new covert packet stream. Again, plot the histogram of the part of the packet stream that you need to encode the secret message. Do you think Eve will be suspicious? CourseNana.COM
4.7 Answer the following questions.
1. How can you improve upon the method in 4.5?
2. We assumed the Alice will buffer up the packets and we mentioned that it was unrealistic. Why?
3. We have assumed that the network does not alter the inter-packet delays. What would be the problem if it did? Can you suggest methods to mitigate the effect of the changes of the inter-packet delay (noise)?
5. A Real Implementation
In this section, we make the implementation more realistics and address some engineering issues. CourseNana.COM
5.1 Overview
In the previous part, we assumed that Alice buffers as many packets she requires to transmit the secret message. For eaxmple, if the secret message is 32 bits then Alice will buffer 33 packets and release the packets with appropriate delays based on the encoding scheme to encode the secret message. If the secret message is small, this will work as Alice needs to buffer a small number of packets. However, this will not work if the secret message is long as this will require Alice to buffer large number of packets and this is unealistic (you have found and written good reasons in answering the related question in 4.7 above). CourseNana.COM
Alice follows the following strategy. Before starting to transmit the secret message she buffers i (0≥i≤B) packets and then starts to release the packets to transmit the secret message. In order to determine what should i be we need to discuss two system states that we need to worry about - buffer overflow and buffer underflow. Let's understand what these are, why these can occur and what are the factors that determine when they will occur. CourseNana.COM
- Buffer Overflow: This happens when the buffer already has B packets and another packet arrives from the application. Recall we have said that the number of packets in the buffer cannot exceed B. To build intuition as to when this happens let's consider a specific scenario. Suppose we have set B=10, m=32 and i=8. Since i=8, Alice will first buffer 8 packets. As soon as the 8th packet arrives from the application, she will start transmitting packets with inter-packet delays that encode the secret message. While she release the packet from the buffer, new packets may arrive from the application that will be appended to the buffer. So the number of packets in the buffer will keep changing - decrease when a packet is transmitted and increase when a packet is generated by the application. Suppose at some time there 7 packets in the buffer and before the next packet is to be transmitted 4 packets arrive in quick succession from the application. This will cause the number of packets in the buffer to go beyond 10 and that will be a violation of policy at most 10 packets can be buffered. Essentially, if packets arrive faster than they are transmitted out, there will be buffer overflow. If i is set close to B there is likely to be an overflow.
- Buffer Underflow: This happens when a packet must be transmitted (to encode a bit of the secret message) but there are no packets in the buffer. Recall the constraint that once Alice starts to transmit the secret message she cannot stop. Hence, if packets arrive slower than they are transmitted there is likely be buffer underflow. If i is set close to 0, there is likely to be an underflow.
5.2 Assumptions
We will consider that the source generates packet following well-known IPD distributions. Specifically, we will consider two cases a) Exponential and b) Uniform. The sender (Alice) also knows this distribution and follows it to inject the delay between the packets to embed the secret message. It is important to note that the source and the sender are independent. Hence, even though they follow the same distribution, the sequence of delays generated by the source will be different from the sequence of delays generated by the sender. CourseNana.COM
To embed a 0, the sender generates a delay between the minimum value (min) and the median. To embed a 1 the sender generates a delay between the median value and the maximum value (max). Note that for the Uniform distribution the min, max and median are easy to determine. For the Exponential distribution min is 0, the max is ∞. What is the median value of an Exponential distribution with rate parameter λ pkts/sec? CourseNana.COM
The secret message is a randomly generated sequence of 1s and 0s of size m bits and is given. We will consider two values m=16,32. CourseNana.COM
The sender has a buffer of size B and initially the sender buffers i packets before starting to send the secret message. CourseNana.COM
5.3 Project Steps
For buffer size B=20 we want to find out the probability of overflow and underflow, when the IPD follows the Exponential with λ=1 pkts/sec and i=2,6,10,14,18. Use message size m=16,32 bits. Tabulate the results. Remember that to determine the probability you need to run multiple (say 500) experiments for each parameter, i.e., for B=20,m=16,i=2 run 500 experiments and determine the probability of overflow and underflow. Similarly for other values of i and m. The max value of an Exponential distribution is ∞. For this study we can limit the max value to say 5 ~secs$. CourseNana.COM
Ignore this step For buffer size B=20 we want to find out the probability of overflow and underflow, when the IPD follows the Uniform distribution in the range (0,1) and i=2,6,10,14,18. Use message size m=16,32 bits. Tabulate the results. Remember that to determine the probability you need to run multiple (say 500) experiments for each parameter, i.e., for B=20,m=16,i=2 run 500 experiments and determine the probability of overflow and underflow. Similarly for other values of i and m. CourseNana.COM
- Propose methods to deal with buffer overflow and underflow.
5.4 Notes on Simulating the Implementation
For steps 1 and 2, since the source and the sender are independent processes, a proper way to simulate would be using a discrete event simulation module such as simpy in Python. However, we can simplify and just use standard Python. To do this, we can pre-generate the times when the source generates packets and store it in a list. Then we can write the code to simulate the buffer, the encoding scheme, and the sender. This can be done in a single "process." Based on this, following is a very rough set of steps to simulate the system. CourseNana.COM
For each experiment we can break it down to the following steps CourseNana.COM
Generate the random bit pattern of 1s and 0s of size m which is the secret message. CourseNana.COM
Generate a sequence of times when the source will generate the packets. This is based inter-packet delay (IPD) distribution of the packets generate by the source. You can intuit what is the worst case number of packets that you need. CourseNana.COM
For the buffer you need to keep some variables such as B: buffer size, i: the initial buffer size to start sending he secret message bits and CB: current buffer size. CourseNana.COM
- Do the experiment multiple times to calculate the different probabilities.
6 A Simple Analysis
In this section we will do a simple analysis of above approach using the Gambler Ruin's problem. Keep in mind that it is approxiumate. The idea to see how the implementation can be mapped into a well known problem. CourseNana.COM
6.1 Gambler's Ruin Problem
Two players A and B play a game which consists of a sequence of rounds; in each round they bet 1 dollar. If A wins the round, A gets 1 dollar from B, if he loses, A gives 1 dollar to B. The probability that A wins a round is p and hence the probability that A loses a round is q=1−p. Suppose A and B combined have N dollars of which A has i dollars and B N−i dollars and they continue to play until A has N dollars and wins the game (and hence B becomes bankrupt) or A has 0 dollars and loses the game (hence A becomes bankrupt). We want to find the probability Pi that A wins the game starting with i dollars. CourseNana.COM
The basic strategy is to condition on the first step and using the Law of Total Probability CourseNana.COM
Pi=pPi+1+qPi−1 1≤i≤N−1with P0=0 and PN=1. CourseNana.COM
Since p+q=1, we can rewrite the above equation as CourseNana.COM
(p+q)Pi=pPi+1+qPi−1 1≤i≤N−1pPi+1−pPi=qPi−qPi−1Pi+1−Pi=qp(Pi−Pi−1)We can show that: CourseNana.COM
Pi={1−(qp)i1−(qp)P1 if qp≠1iP1 if qp=1Considering qp≠1 and using the fact PN=1, we have CourseNana.COM
PN=1=1−(qp)N1−(qp)P1From which we obtain: CourseNana.COM
P1=1−(qp)1−(qp)NSimilarly, if qp=1, then PN=1=NP1 which implies that P1=1N and thus Pi=iN, now we have: CourseNana.COM
Pi={1−(qp)i1−(qp)N if qp≠1iN if qp=1Notes and Remarks
- We first show that p=q is a limiting case of p≠q. Let x=qp. We want to find the limit of Pi as x→1.
- Does the game go forever?
Consider the case with qp=1. P(A wins starting with i))=Pi=iN. Similarly, P(A loses starting with i)=P(B wins starting with N−i)=N−iN. Since the probability of A winning or B winning is equal to 1, then the probability of neither winning (game going forever) is 0. CourseNana.COM
Let i=N−i, i.e., i=N2 . Also let p=0.49 CourseNana.COM
3.1. N=20⇒Pi=0.4 CourseNana.COM
3.2. N=100⇒Pi=0.12 CourseNana.COM
3.3. N=200⇒Pi=0.02 CourseNana.COM
If p<q which implies x=qp>1, then CourseNana.COM
This implies that with probability 1, the Gambler will get "ruined" if she chooses to play the game for an "infinite" number of rounds. CourseNana.COM
6.2 A Simple Analysis
Coming back to our project, can you map the implementation that you did in Section 5 to the Gambler's ruin problem? Your goal is to create a table as in Section 5.6 based on the results that we have derived in the above section. There are many similarities but there are some limitations as well. CourseNana.COM
6.3 Model
Write a short paragraph how the implementation in Section 5 can be mapped to the Gambler's Ruin problem. What is the limitation of using the model? CourseNana.COM
6.5 Model Results
For the parameters given in Step 1 in Section 5.3 (i.e., for the case of the Exponential distribution) determine the overflow and underflow probabilities for different values of i using the Gambler's Ruin problem. CourseNana.COM
7 Detection
In this section we will investigate some simple appropaches to detect timimng channels. This is the task of Eve. CourseNana.COM
As mentioned in Section 4.2, we assume that Eve has a packet stream that is "clean" i.e., it is not modified by a timing channel. Eve can derive many features from this packet stream. One that we have focussed in Section 4.2 is the distribution of the inter-packet delay. (You may reflect on what other fetaures you could consider.) CourseNana.COM
Given the baseline inter-packet delay distribution, what Eve will do is the following CourseNana.COM
- Take a sample of the packet stream when Alice and Bob talk
- Extract the inter-packet delay distribution
- Compare with the baseline
- Flag if the distribution are different
There are many engineering issues with each of the above steps but here we will focus on Steps 3 and 4. The basic problem is the following. We have two sets of inter-packet delay samples. We want to know if they are from the same distribution. You can guess that this must be common problem and there must be many different approaches. Yes there are many methods. There are qualitative approcheas such as l comparing the boxplots, the cummulative distribution finctions, the histograms, the Q-Q plot, and the Kernel density functions. There are quantitative statistical test such the T-test, Mann–Whitney U Test, the Kolmogorov–Smirnov test, among others. For this project we will focus on the Q-Q plot. CourseNana.COM
7.1 Q-Q Plot
In Q-Q plot [Q stands for quantile], the quantiles of the two distributions are plotted against each other. If the distributions are the same, we should get a 45-degree line. CourseNana.COM
We will use two Python modules numpy and statsmodels. You must install these modules either using pip3 or conda. CourseNana.COM
The basic function is qqplot. It by default plots quantiles with respect to quantiles of the standard normal (Z) distributtion. CourseNana.COM
import numpy as np
import statsmodels.api as sm
import pylab as py
# Generate 100 samples from a standard normal (Z) distribution
sample_data1 = np.random.normal(0, 1, 100)
sm.qqplot(sample_data1, line ='45')
py.show()
[ I have not figured out how to resolve the warning. If someone does, please let me know]] CourseNana.COM
In the above code see what happens when you increase the number of samples to 1000 and then to 10000. CourseNana.COM
Next we generate sample from an Exponential distribution with rate λ=1 and get the Q-Q plot. CourseNana.COM
# Generate 100 samples from an exponential distribution with rate $\lambda =1 $
sample_data1 = np.random.exponential(1, 100)
sm.qqplot(sample_data1, line ='45')
py.show()
7.2 Q-Q Plot with 2 Samples
qqplot_2samples gives the Q-Q plots for the quantiles derived from two data sets. We will consider the size of both samples to be the same althougth it is not required. In following code see what happens as you increase the number of samples CourseNana.COM
# Generate 100 samples from an exponential distribution with rate $\lambda =1 $
sample_data1 = np.random.exponential(1, 100)
# Generate another 100 samples from the exponential distribution with rate $\lambda =1 $
sample_data2 = np.random.exponential(1, 100)
sm.qqplot_2samples(sample_data1, sample_data2, line ='45')
py.show()
7.3 A Simple Task
For this project, using the qqplot_2samples, get the Q-Q plot of the inter-packet delays generated in Section 4.5 with that of the baseline. Use only the inter-packet delays that contain the the secret message. Use the same number of inter-packet delays from the baseline traffic. CourseNana.COM
Include the code and that will generate the plot. CourseNana.COM