Adaptive Signal Processing and Machine Intelligence Coursework Prof. Danilo P. Mandic
Contents

Guidelines
1 Classical and Modern Spectrum Estimation
  1.1 Properties of Power Spectral Density (PSD)
  1.2 Periodogram-based Methods Applied to Real-World Data
  1.3 Correlation Estimation
  1.4 Spectrum of Autoregressive Processes
  1.5 Real World Signals: Respiratory Sinus Arrhythmia from RR-Intervals
  1.6 Robust Regression
Guidelines
The coursework comprises four assignments, whose individual scores yield 80% of the final mark. The remaining 20% accounts for presentation and organisation. Students are allowed to discuss the coursework but must code their own MATLAB scripts, produce their own figures and tables, and provide their own discussion of the coursework assignments.
General directions and notation:
◦ The simulations should be coded in MATLAB, a de facto standard in the implementation and validation of signal processing algorithms.
◦ The report should be clear, well-presented, and should include the answers to the assignments in chronological order and with appropriate labelling. Students are encouraged to submit through Blackboard (in PDF format only), although a hardcopy submission at the undergraduate office will also be accepted.
◦ The report should document the results and the analysis in the assignments, in the form of figures (plots), tables, and equations, and not by listing MATLAB code as proof of correct implementation.
◦ The students should use the following notation: boldface lowercase letters (e.g. x) for vectors, lowercase letters with a (time) argument (x(n)) for scalar realisations of random variables and for elements of a vector, and uppercase letters (X) for random variables. Column vectors will be assumed unless otherwise stated, that is, x ∈ R^{N×1}.
◦ In this coursework, the typewriter font, e.g. mean, is used for MATLAB functions.
Presentation:
◦ The length limit for the report is 42 pages. This corresponds to ten pages per assignment, in addition to one page for the front cover and one page for the table of contents; there are no page restrictions per assignment, only for the full report (42 pages).
◦ The final mark also considers the presentation of the report; this includes legible and correct figures, tables, and captions, appropriate titles, a table of contents, and a front cover with student information.
◦ The figures and code snippets (only if necessary) included in the report must be carefully chosen, for clarity and to meet the page limit.
◦ Do not insert unnecessary MATLAB code or the statements of the assignment questions in the report.
◦ For figures, (i) decide which type of plot is the most appropriate for each signal (e.g. solid line, non-connected points, stems), (ii) export figures in a correct format: without grey borders and with legible captions and lines, and (iii) avoid the use of screenshots when providing plots and data; use figures and tables instead.
◦ Avoid terms like "good estimate", "is (very) close", "somewhat similar", etc.; use formal language and quantify your statements (e.g. in dB, seconds, samples).
◦ Note that you should submit two files to Blackboard: the report in PDF format and all the MATLAB code files compressed in ZIP/RAR format. Name the MATLAB script files according to the part they correspond to (e.g. SEASP_Part_X-Y-Z.m).
1 Classical and Modern Spectrum Estimation
Background. For a discrete-time deterministic sequence {x(n)} with finite energy, $\sum_{n=-\infty}^{\infty} |x(n)|^2 < \infty$, the Discrete Time Fourier Transform (DTFT) is defined as
$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n} \qquad \text{(DTFT)}. \tag{1}$$
We often use the symbol X(ω) to replace the more cumbersome X(e^{jω}). The corresponding inverse DTFT is given by
$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{j\omega n} \, d\omega \qquad \text{(inverse DTFT)}. \tag{2}$$
This can be verified by substituting (2) into (1). The energy spectral density is then defined as
$$S(\omega) = |X(\omega)|^2 \qquad \text{(Energy Spectral Density)}. \tag{3}$$
A straightforward calculation gives
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} x(n)\, x^*(m)\, \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-j\omega(n-m)}\, d\omega = \sum_{n=-\infty}^{\infty} |x(n)|^2. \tag{4}$$
In the process, we have used the equality $\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\omega(n-m)}\, d\omega = \delta_{n,m}$ (the Kronecker delta). Equation (4) can now be restated as
$$\sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\, d\omega \qquad \text{(Parseval's theorem)}. \tag{5}$$
For random sequences we cannot guarantee finite energy for every realisation (and hence no DTFT). However, a random signal usually has a finite average power, and can therefore be characterised by an average power spectral density (PSD). We assume zero-mean data, E{x(n)} = 0, so that the autocovariance function (ACF) of a random signal x(n) is defined as
$$r(k) = E\{x(n)\, x^*(n-k)\} \qquad \text{(Autocovariance function, ACF)}. \tag{6}$$
The Power Spectral Density (PSD) is defined as the DTFT of the ACF, that is,
$$P(\omega) = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k} \qquad \text{(Definition 1 of Power Spectral Density)}. \tag{7}$$
The inverse DTFT of P(ω) is given by $r(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} P(\omega) e^{j\omega k}\, d\omega$, and it is readily verified that substituting (7) into this expression gives $\frac{1}{2\pi}\int_{-\pi}^{\pi} \big[\sum_{l=-\infty}^{\infty} r(l) e^{j\omega(k-l)}\big]\, d\omega = r(k)$. Observe that
$$r(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\, d\omega. \tag{8}$$
Since from (6) r(0) = E{|x(n)|²} measures the (average) signal power, the name PSD for P(ω) is fully justified, as from (8) it represents the distribution of the (average) signal power over frequency. The second definition of the PSD is given by
$$P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\} \qquad \text{(Definition 2 of Power Spectral Density)}. \tag{9}$$
1.1 Properties of Power Spectral Density (PSD)
Approximation in the definition of PSD. Show analytically and through simulations that the definition of PSD in (7) is equivalent to that in (9) under the mild assumption that the covariance sequence r(k) decays rapidly, that is,
$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=-(N-1)}^{N-1} |k|\, |r(k)| = 0. \tag{10}$$
Provide a simulation for the case when this equivalence does not hold. Explain the reasons.
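As a starting point (a sketch, not a model answer), the MATLAB fragment below compares the two definitions on a simulated signal; the AR(1) coefficient, the signal length, and the frequency grid are illustrative choices only.
% Sketch: compare Definition 1 (DTFT of the ACF) with Definition 2 (normalised
% squared magnitude of the data DTFT) for a signal with a rapidly decaying ACF.
% A pure sinusoid (whose ACF does not decay) can serve as the counter-example.
N = 256;
x = filter(1, [1 -0.7], randn(N,1));        % illustrative AR(1) signal
K = 1024; w = 2*pi*(0:K-1)'/K;              % frequency grid
r = xcorr(x, 'biased');                     % ACF estimate, lags -(N-1)..(N-1)
P1 = real(exp(-1j*w*(-(N-1):(N-1))) * r);   % Definition 1: DTFT of the ACF
P2 = abs(exp(-1j*w*(0:N-1)) * x).^2 / N;    % Definition 2: (1/N)|DTFT of x|^2
figure; plot(w/pi, [P1 P2]);
xlabel('Frequency (\pi radians)'); ylabel('PSD');
legend('Definition 1 (ACF)', 'Definition 2');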
1.2 Periodogram-based Methods Applied to Real–World Data
Now consider two real–world datasets: a) The sunspot time series1 and b) an electroencephalogram (EEG) experiment. a) Apply one periodogram-based spectral estimation technique (possibly after some preprocessing) to the sunspot time series. Explain what aspect of the spectral estimate changes when the mean and trend from the data are removed (use the MATLAB commands mean and detrend). Explain how the perception of the periodicities in the data changes when the data is transformed by first applying the logarithm to each data sample and then subtracting the sample mean from this logarithmic data.
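One possible starting point (a sketch only, not a prescribed solution) for the sunspot preprocessing comparison is given below; the offset added before the logarithm is an assumption to avoid log(0) for years with zero counts.
% Sketch: periodogram of the sunspot series, raw, detrended, and after a log
% transform with the mean removed.
load sunspot.dat                               % column 1: year, column 2: count
x  = sunspot(:,2); fs = 1;                     % one sample per year
[Praw, f] = periodogram(x,             [], [], fs);
[Pdet, ~] = periodogram(detrend(x),    [], [], fs);
xl = log(x + 1);                               % offset of 1 avoids log(0)
[Plog, ~] = periodogram(xl - mean(xl), [], [], fs);
figure; plot(f, 10*log10([Praw Pdet Plog]));
xlabel('Cycles per year'); ylabel('PSD (dB)');
legend('raw', 'detrended', 'log + mean removed');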
The basis for brain computer interface (BCI). b) The electroencephalogram (EEG) signal was recorded from an electrode located at the posterior/occipital (POz) region of the head. The subject observed a flashing visual stimulus (flashing at a fixed rate of X Hz, where X is some integer value in the range [11, . . . , 20]). This induced a response in the EEG, known as the steady state visual evoked potential (SSVEP), at the same frequency. Spectral analysis is required to determine the value of 'X'. The recording is contained in the EEG_Data_Assignment1.mat file², which contains the following elements:
◦ POz – Vector containing the EEG samples (expressed in Volts) obtained from the POz location on the scalp,
◦ fs – Scalar denoting the sampling frequency (1200 Hz in this case).
Read the readme_Assignment1.txt file for more information.
Apply the standard periodogram approach to the entire recording, as well as the averaged periodogram with different window lengths (10 s, 5 s, 1 s), to the EEG data. Can you identify the peaks in the spectrum corresponding to the SSVEP? There should be a peak at the same frequency as the frequency of the flashing stimulus (integer X in the range [11, . . . , 20]), known as the fundamental frequency response peak, and at some integer multiples of this value, known as the harmonics of the response. It is important to note that the subject was tired during the recording, which induced a strong response within 8-10 Hz (the so-called alpha rhythm); this is not the SSVEP. Also note that a power-line interference was induced in the recording apparatus at 50 Hz, and this too is not the SSVEP.
To enable a fair comparison across all spectral analysis approaches, you should keep the number of frequency bins the same. Hint: it is recommended to have 5 DFT samples per Hz. How does the standard periodogram approach compare with the averaged periodogram of window length 10 s? Hint: observe how straightforward it is to distinguish the estimated SSVEP peaks from other spurious EEG activity in the surrounding spectrum. In the case of the averaged periodogram, what is the effect of making the window size very small, e.g. 1 s?

¹ Included in MATLAB; use load sunspot.dat
² Download the EEG recording from http://www.commsp.ee.ic.ac.uk/~mandic/EEG_Data.zip
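A sketch of one possible implementation of the averaged periodogram on a fixed frequency grid is given below. It assumes POz and fs have been loaded from EEG_Data_Assignment1.mat, and it wraps each segment modulo nfft before the FFT; this particular wrapping trick (which yields the segment's full-length DTFT sampled at the nfft bin frequencies) is one choice among several, not the required method. Setting the window length equal to the full record length recovers the standard periodogram on the same bins.
% Sketch: averaged periodograms of the EEG on a common grid of 5 DFT samples
% per Hz (0.2 Hz spacing), so that all window lengths share the same bins.
load('EEG_Data_Assignment1.mat');               % provides POz and fs (1200 Hz)
x    = POz(:) - mean(POz);
nfft = 5*fs;                                    % 5 DFT samples per Hz
f    = (0:nfft/2)'*fs/nfft;                     % one-sided frequency axis
for winSec = [10 5 1]                           % window lengths in seconds
    L    = winSec*fs;
    nseg = floor(length(x)/L);
    P    = zeros(nfft,1);
    for s = 1:nseg
        seg  = x((s-1)*L + (1:L));
        segw = accumarray(mod((0:L-1)', nfft)+1, seg, [nfft 1]);  % wrap mod nfft
        P    = P + abs(fft(segw)).^2/(L*fs*nseg);                 % average
    end
    figure; plot(f, 10*log10(P(1:nfft/2+1))); xlim([0 60]);
    xlabel('Frequency (Hz)'); ylabel('PSD (dB/Hz)');
    title(sprintf('Averaged periodogram, %d s window', winSec));
end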
1.3 Correlation Estimation
Unbiased correlation estimation and preservation of non-negative spectra. Recall that the correlation-based definition of the PSD leads to the so-called correlogram spectral estimator, given by
$$\hat{P}(\omega) = \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k}, \tag{11}$$
where the estimated autocorrelation function r̂(k) can be computed using the biased or unbiased estimators, given respectively by
$$\hat{r}(k) = \frac{1}{N} \sum_{n=k+1}^{N} x(n)\, x^*(n-k), \tag{12}$$
$$\hat{r}(k) = \frac{1}{N-k} \sum_{n=k+1}^{N} x(n)\, x^*(n-k). \tag{13}$$
Although it may seem that the unbiased estimate is more appropriate, as its mean matches the true ACF, observe that this estimate can be highly erratic for larger lags k (close to N), where fewer samples are available to form the estimate. As a consequence, the estimated ACF may not be positive definite, resulting in negative PSD values. a) Write a MATLAB script which calculates both biased and unbiased ACF estimates of a signal, and then use these ACF estimates to compute the corresponding correlogram in Eq. (11). Validate your code for different signals, e.g. WGN, noisy sinusoidal signals, and filtered WGN. Explain how the spectral estimates based on (12)-(13) differ from one another. In particular, how does the correlogram corresponding to the unbiased ACF estimate behave for large lags (i.e. k close to N)? Does the unbiased ACF estimate result in negative values for the estimated PSD?
[10]
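A minimal sketch for part a), assuming a noisy sinusoid as the test signal (the frequency and noise level are arbitrary illustrative choices):
% Sketch: biased and unbiased ACF estimates and the corresponding correlograms
% of Eq. (11); a linear scale is used so negative values of the unbiased-ACF
% estimate remain visible.
N  = 256; n = (0:N-1)';
x  = sin(2*pi*0.1*n) + 0.5*randn(N,1);         % illustrative noisy sinusoid
[rb, lags] = xcorr(x, 'biased');               % Eq. (12)
ru         = xcorr(x, 'unbiased');             % Eq. (13)
K  = 1024; w = 2*pi*(0:K-1)'/K;                % frequency grid
E  = exp(-1j*w*lags);                          % DTFT kernel, K x (2N-1)
Pb = real(E*rb);  Pu = real(E*ru);             % correlograms, Eq. (11)
figure; plot(w/pi, [Pb Pu]);
xlabel('Frequency (\pi radians)'); ylabel('PSD estimate');
legend('biased ACF', 'unbiased ACF');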
Plotting the PSD in dB. Depending on the estimation approach, the spectral estimate P̂(ω) can be asymptotically unbiased with variance µP²(ω), where µ > 0 is a constant. When several realisations of a random signal are available, it is possible to present the estimated PSD as a confidence interval defined by P̂(ω) ± µσ̂_P(ω), where P̂(ω) and σ̂_P(ω) are respectively the mean and standard deviation of the estimated PSDs of the available observations. A drawback of this approach is that, as stated earlier, the standard deviation is proportional to the value of the PSD, and therefore the confidence interval widens in regions where the PSD increases; it is precisely these regions that we are most interested in. Fig. 1 shows an overlay plot of 100 realisations of the PSD of two sinusoids immersed in i.i.d. WGN, showing the mean (top) and the standard deviation of the set (bottom). By plotting the PSD estimates in decibels we obtain a more condensed presentation, due to the contraction property of the logarithm. b) Use your code from the previous section (only the biased ACF estimator) to generate the PSD estimate of several realisations of a random process and plot them as in Fig. 1. Generate different signals composed of sinusoids corrupted by noise and elaborate on how dispersed the different realisations of the spectral estimate are. Hint: use the fft and fftshift commands in MATLAB.
c) Plot your estimates in dB, together with their associated standard deviation (again as in Fig. 1, for comparison). How spread out are the estimates now? Comment on the benefits of this representation.
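A sketch for parts b) and c), assuming two sinusoids in unit-variance WGN (the frequencies and number of realisations are illustrative):
% Sketch: overlay of the biased-ACF PSD estimates of 100 realisations, their
% mean and standard deviation, and the same quantities in dB.
N = 256; R = 100; n = (0:N-1)'; K = 512;
w = 2*pi*(0:K-1)'/K; lags = -(N-1):(N-1);
E = exp(-1j*w*lags);                           % DTFT kernel for the ACF lags
P = zeros(K, R);
for r = 1:R
    x = sin(2*pi*0.12*n) + sin(2*pi*0.27*n) + randn(N,1);
    P(:,r) = real(E*xcorr(x, 'biased'));       % biased-ACF correlogram
end
figure;
subplot(2,1,1); plot(w/pi, P, 'c'); hold on; plot(w/pi, mean(P,2), 'b');
title('PSD estimates (different realisations and mean)');
subplot(2,1,2); plot(w/pi, std(P,0,2));
title('Standard deviation of the PSD estimate'); xlabel('Frequency (\pi radians)');
figure;                                        % part c): the same in dB
PdB = 10*log10(abs(P));                        % abs() guards tiny negative values
subplot(2,1,1); plot(w/pi, PdB, 'c'); hold on; plot(w/pi, mean(PdB,2), 'b');
subplot(2,1,2); plot(w/pi, std(PdB,0,2)); xlabel('Frequency (\pi radians)');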
Frequency estimation by MUSIC. In order to accurately estimate the spectrum of closely-spaced sine waves using the periodogram, a large number of samples N is required, since the frequency resolution of the periodogram is proportional to 1/N. On the other hand, subspace methods assume a harmonic model consisting of a sum of sine waves, possibly complex, in additive noise. In this setting, the noise is also complex-valued.
For illustration, consider a complex-valued signal of 30 samples in length, generated using the following code:
n = 0:30;
noise = 0.2/sqrt(2)*(randn(size(n))+1j*randn(size(n)));
x = exp(1j*2*pi*0.3*n)+exp(1j*2*pi*0.32*n)+noise;
The signal consists of two complex exponentials (sine waves) with frequencies of 0.3 Hz and 0.32 Hz and additive complex white Gaussian noise. The noise has zero mean and a standard deviation of 0.2. The spectral estimate using the periodogram (rectangular window, 128 frequency bins and unit sampling rate) is shown in Fig. 2. Observe that the periodogram was not able to identify the two lines in the spectrum; this is because the resolution of the periodogram is proportional to 1/N, which is greater than the separation between the two frequencies.
Figure 1: PSD estimates of two sinusoids immersed in noise. Top: an overlay plot of 100 realisations and their mean. Bottom: standard deviation of the 100 estimates. (Frequency axes in π radians.)
Figure 2: Periodogram power spectral density estimate (power/frequency in dB/Hz) of two complex exponentials with closely-spaced frequencies.

d) Familiarise yourself with the generation of complex exponential signals, and generate signals of different frequencies and lengths. Verify that by considering more data samples the periodogram starts showing the correct line spectra.
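A sketch for part d), reusing the signal model above with increasing data lengths (the lengths shown are arbitrary examples):
% Sketch: periodogram of the two closely-spaced complex exponentials for
% increasing data lengths, with 128 frequency bins and unit sampling rate.
for N = [30 60 120]
    n = 0:N;
    noise = 0.2/sqrt(2)*(randn(size(n)) + 1j*randn(size(n)));
    x = exp(1j*2*pi*0.3*n) + exp(1j*2*pi*0.32*n) + noise;
    [P, f] = periodogram(x, rectwin(length(x)), 128, 1);   % as in Fig. 2
    figure; plot(f, 10*log10(P));
    xlabel('Hz'); ylabel('Power/frequency (dB/Hz)');
    title(sprintf('Periodogram, %d samples', length(x)));
end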
e) Use the following code to find the desired line spectra using the MUSIC method.
[X,R] = corrmtx(x,14,'mod');
[S,F] = pmusic(R,2,[ ],1,'corr');
plot(F,S,'linewidth',2); set(gca,'xlim',[0.25 0.40]);
grid on; xlabel('Hz'); ylabel('Pseudospectrum');
Explain the operation of the first three lines in the code using the MATLAB documentation and the lecture notes. What is the meaning of the input arguments for the functions corrmtx and pmusic? Does the spectrum estimated using the MUSIC algorithm provide more detailed information? State briefly the advantages and disadvantages of the periodogram and the MUSIC algorithms and comment on the bias and variance. How accurate would a general spectrum estimate be when using MUSIC?

1.4 Spectrum of Autoregressive Processes
In many spectrum estimation applications only short data lengths are available; thus, classical spectrum estimation techniques based on the Fourier transform will not be able to resolve frequency components spaced close to one another. In order to solve this problem, we can use modern spectrum estimation methods based on pole-zero modelling of the data. Consider a general ARMA(p, q) process given by
$$y(n) = a_1 y(n-1) + \cdots + a_p y(n-p) + w(n) + b_1 w(n-1) + \cdots + b_q w(n-q),$$
where w(n) is white noise with variance σ_w². The power spectrum of y has the form
$$P_y(e^{j\omega}) = \sigma_w^2 \, \frac{\left| 1 + \sum_{k=1}^{q} b_k e^{-jk\omega} \right|^2}{\left| 1 - \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2}.$$
Thus, the power spectrum can be estimated through the parameters (a_1, ..., a_p, b_1, ..., b_q). The assumption of an underlying model for the data is the key difference between classical and modern spectrum estimation methods. For an AR process in particular, the data is modelled as the output of an all-pole filter driven by white noise, and the power spectrum is given by
$$P_y(e^{j\omega}) = \frac{\sigma_w^2}{\left| 1 - \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2}.$$
The parameters σ_w and a = [a_1, ..., a_p]^T can be estimated from a set of (p + 1) linear equations,
$$\begin{bmatrix} r_x(0) & r_x(1) & \cdots & r_x(p) \\ r_x(1) & r_x(0) & \cdots & r_x(p-1) \\ \vdots & \vdots & \ddots & \vdots \\ r_x(p) & r_x(p-1) & \cdots & r_x(0) \end{bmatrix} \begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_p \end{bmatrix} = \begin{bmatrix} \sigma_w^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
where r_x(k) could be calculated using the biased autocorrelation estimate
$$r_x(k) = \frac{1}{N} \sum_{n=0}^{N-1-k} x(n+k)\, x(n)$$
or the unbiased autocorrelation estimate
$$r_x(k) = \frac{1}{N-k} \sum_{n=0}^{N-1-k} x(n+k)\, x(n).$$
a) Based on your answers in Section 1.3, elaborate on the shortcomings of using the unbiased ACF estimate when finding the AR parameters.
b) Generate 1000 samples of data in MATLAB according to the following equation,
$$x(n) = 2.76 x(n-1) - 3.81 x(n-2) + 2.65 x(n-3) - 0.92 x(n-4) + w(n),$$
where w ∼ N(0, 1), and discard the first 500 samples (x = x(500:end)) to remove the transient output of the filter. Estimate the power spectral density of the signal using model orders p = 2, ..., 14 and comment on the effects of increasing the order of the (assumed) underlying model, by comparing the estimates to the true Power Spectral Density. Only plot the results of the model orders which produced the best results.
c) Repeat the experiment in b) for a data length of 10,000 samples. What happens to the PSD estimate when the chosen model order is lower (under-modelling) or higher (over-modelling) than the correct AR(4) model order?
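A sketch for parts b) and c); it assumes the Yule-Walker route via aryule (which uses the biased ACF), and the model orders plotted are illustrative:
% Sketch: generate the AR(4) data, then compare AR(p) spectrum estimates
% (Yule-Walker via aryule) with the true spectrum. Try the full range p = 2..14.
N = 1000;
a_true = [1 -2.76 3.81 -2.65 0.92];            % denominator of the generating filter
x = filter(1, a_true, randn(N,1));
x = x(500:end);                                % discard the transient, as instructed
[H, f] = freqz(1, a_true, 512, 1);             % true AR(4) spectrum (sigma_w^2 = 1)
figure; plot(f, 10*log10(abs(H).^2), 'k'); hold on;
for p = [2 4 8 14]
    [a_hat, e] = aryule(x, p);                 % AR coefficients and noise variance
    [Hp, fp] = freqz(sqrt(e), a_hat, 512, 1);  % estimated AR(p) spectrum
    plot(fp, 10*log10(abs(Hp).^2));
end
xlabel('Normalised frequency (cycles/sample)'); ylabel('PSD (dB)');
legend('true', 'p = 2', 'p = 4', 'p = 8', 'p = 14');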
1.5 Real World Signals: Respiratory Sinus Arrhythmia from RR-Intervals
Important change to Section 1.5 of the CW:
• If you have taken Adaptive Signal Processing last year, then you already have your own ECG data from the wrists; please proceed with this assignment.
• If you do not have your own ECG recordings, then we will provide the data. We will send an email to the class with the data and explanations.
Respiratory sinus arrhythmia (RSA) refers to the modulation of cardiac function by respiratory effort. This can be readily observed as the speeding up of the heart rate during inspiration ("breathing in") and the slowing down of the heart rate during expiration ("breathing out"). The strength of RSA in an individual can be used to assess cardiovascular health. Breathing at regular rates will highlight the presence of RSA in the cardiac (ECG) data.
a) Apply the standard periodogram as well as the averaged periodogram with different window lengths (e.g. 50 s, 150 s) to obtain the power spectral density of the RRI data. Plot the PSDs of the RRI data obtained from the three trials separately.
b) Explain the differences between the PSD estimates of the RRI data from the three trials. Can you identify the peaks in the spectrum corresponding to the frequencies of respiration for the three experiments?
c) Plot the AR spectrum estimate for the RRI signals for the three trials³. To find the optimal AR model order, experiment with your model order until you observe a peak in the spectrum (approximately) corresponding to the theoretical respiration rate. List the differences you observe between your estimated AR spectrum and the periodogram estimate in Part a).
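A sketch for parts a) and c), for one trial. The variable names rri and fs_rri are placeholders (the provided data may use different names), the resampling rate of 4 Hz, the window lengths, and the AR order 10 are assumptions to be adapted:
% Sketch: standard and averaged periodograms of the RRI signal, plus an AR
% spectrum estimate, for one trial. rri is the uniformly resampled RR-interval
% series and fs_rri its sampling frequency (e.g. 4 Hz) -- both placeholders.
rri = detrend(rri - mean(rri));
nfft = 2048;
[Pstd, f] = periodogram(rri, [], nfft, fs_rri);                 % standard
[P50,  ~] = pwelch(rri, rectwin(50*fs_rri),  0, nfft, fs_rri);  % 50 s windows
[P150, ~] = pwelch(rri, rectwin(150*fs_rri), 0, nfft, fs_rri);  % 150 s windows
[a_hat, e] = aryule(rri, 10);                                   % AR order: try several
[H, fa]    = freqz(sqrt(e), a_hat, nfft/2, fs_rri);
figure; plot(f, 10*log10([Pstd P50 P150]), fa, 10*log10(abs(H).^2));
xlabel('Frequency (Hz)'); ylabel('PSD (dB/Hz)');
legend('periodogram', 'averaged, 50 s', 'averaged, 150 s', 'AR(10)');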
1.6 Robust Regression
Load the file⁴ PCAPCR.mat, which includes the data matrices X ∈ R^{N×d_x} and Y ∈ R^{N×d_y}, described below.

Training Data | Note
X       | Input variables, some of which are collinear. Each column represents N measurements of an input variable.
Xnoise  | Noise-corrupted input matrix Xnoise = X + N_X, where the elements of N_X were drawn from a zero-mean Gaussian distribution.
Y       | Output variables obtained from Y = XB + N_Y, where the coefficient matrix B is unknown. Each column in Y represents N measurements of an output variable. The elements of N_Y were drawn from a zero-mean Gaussian distribution.
Figure 3: Principle of PCA: illustration of the signal and noise subspaces.

Using the MATLAB command svd, obtain the singular value decomposition of the matrices X and Xnoise.
a) Plot the singular values of X and Xnoise (hint: use the stem command), and identify the rank of the input data X. Plot the squared error between each singular value of X and Xnoise. Explain the effect of noise on the singular values, and state at what point it would become hard to identify the rank of the matrix Xnoise.
[5]
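A sketch for part a), assuming X and Xnoise are loaded from PCAPCR.mat:
% Sketch: singular values of X and Xnoise, and the squared error between
% corresponding singular values.
load('PCAPCR.mat');                      % provides X, Xnoise and Y (see table above)
sX  = svd(X);
sXn = svd(Xnoise);
figure;
subplot(2,1,1); stem([sX sXn]); ylabel('Singular value'); legend('X', 'X_{noise}');
subplot(2,1,2); stem((sX - sXn).^2); ylabel('Squared error'); xlabel('Index');
r = rank(X);                             % numerical rank of the noiseless input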
b) Using only the r most significant principal components (as determined by the identified rank), create a low-rank approximation of Xnoise, denoted by X̃noise. Compare the difference (error) between the variables (columns) of the noiseless input matrix, X, and those in the noise-corrupted matrix Xnoise and the denoised matrix X̃noise.
[5]
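A sketch for part b), where r is the rank identified in part a):
% Sketch: rank-r approximation of Xnoise from its truncated SVD, and
% column-wise squared errors before and after denoising.
[U, S, V] = svd(Xnoise, 'econ');
Xtilde = U(:,1:r)*S(1:r,1:r)*V(:,1:r)';        % keep the r largest singular values
errNoisy    = sum((X - Xnoise).^2, 1);         % squared error per variable, noisy
errDenoised = sum((X - Xtilde).^2, 1);         % squared error per variable, denoised
figure; stem([errNoisy.' errDenoised.']);
xlabel('Variable (column)'); ylabel('Squared error'); legend('X_{noise}', 'denoised');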
The output data are obtained as Y = XB + N_Y. The ordinary least squares (OLS) estimate of the unknown regression matrix, B, is then given by
$$\hat{B}_{OLS} = (X^T X)^{-1} X^T Y.$$
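As a sketch of how the OLS estimate can be computed in MATLAB; the regressor here is taken to be the observed, noise-corrupted input Xnoise, which is an assumption — use whichever matrix the subsequent questions specify:
% Sketch: OLS estimate of B and the corresponding output estimate. The backslash
% operator solves the normal equations without forming an explicit inverse.
B_ols = (Xnoise' * Xnoise) \ (Xnoise' * Y);    % equivalent to (X^T X)^{-1} X^T Y
Y_ols = Xnoise * B_ols;                        % OLS reconstruction of the output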