Audio Watermarking and Speech Enhancement
Using Kalman Filter
Project Report
Name of the Course: Digital Signal Processing
Course Code: BECE303L
Slot: F1
Submitted by:
SENTHIL KUMAR - 21BEC1228
KABHILAN.T - 21BEC1296
ASHWANTH.M - 21BEC1703
SIDDARTH.T - 21BEC1783
Faculty In charge:
DHEEREN KU MAHAPATRA
VIT CHENNAI
ACKNOWLEDGEMENT
We would like to express our gratitude to our project guide, Prof.
Dheeran Ku Mahapatra. This project has shed light on this field and
opened a new horizon for us. It could further inspire juniors and
other students to take the model to the next level. The project also
enriched our practical knowledge, which is often lacking among
present-day students. We undertook this project not only for marks
but also to increase our knowledge.
ABSTRACT:
In this project, we study audio watermarking, which enables copyright
protection and ownership verification, and we use a Kalman filter to
enhance the watermarked audio. Noise removal is important because noise
corrupts speech and causes severe difficulties in many communication
environments. Degraded speech severely affects the ability of a listener,
whether hearing-impaired or of normal hearing, to understand what the
speaker is saying. The Kalman filter is an efficient recursive filter that
estimates the internal state of a linear dynamic system from a series of
noisy measurements. In summary, we first watermark the audio signal and
then enhance the watermarked signal with a Kalman filter.
INTRODUCTION:
Watermarking is the process of embedding information into a signal (e.g.,
audio, video or pictures) in a way that is difficult to remove. If the signal is
copied, then the information is also carried in the copy. Watermarking has
become increasingly important to enable copyright protection and
ownership verification. The Kalman filter is a mathematical power tool that
is playing an increasingly important role in computer graphics as we
include sensing of the real world in our systems. The good news is you
don’t have to be a mathematical genius to understand the effective use of
Kalman filters. Speech enhancement has been a hot research area in recent
years with the fast development of multimedia communications and other
applications. The presence of background noise in speech significantly
reduces the intelligibility of speech. Noise reduction or speech
enhancement algorithms are used to suppress such background noise and
improve the perceptual quality and intelligibility of speech. Removing
various types of noise is difficult due to the random nature of the noise and
the inherent complexities of the speech. Noise reduction techniques usually
have a trade-off between the amount of noise removal and speech
distortions introduced due to the processing of the speech signal. Several
techniques have been proposed for speech enhancement, such as spectral
subtraction, the Wiener filter, the Kalman filter and weighted filtering.
COMPONENTS REQUIRED:
MATLAB
Audio Signal
Watermarking Systems:
A watermarking system is usually divided into three distinct steps:
1. Embedding
2. Attack
3. Detection
In embedding, an algorithm accepts the host and the data to be embedded
and produces a watermarked signal. The term attack arises from copyright
protection applications, where third parties may attempt to remove the
digital watermark by modifying the signal during transmission or storage.
Detection (often called extraction) is an algorithm applied to the
attacked signal in an attempt to extract the watermark from it. If the
signal was unmodified during transmission, the watermark is still present
and can be extracted.
The performance of a technique is judged by the robustness and
imperceptibility (inaudibility) of the audio watermark. Inaudibility means
that the watermarked audio and the original audio must sound identical to
a listener. Robustness means the resistance of the watermark against
removal or degradation. The watermark should survive intentional attacks
such as random cropping, noise addition, re-quantization, resampling,
compression and filtering, and its removal should degrade the audio.
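The two criteria above can be checked numerically. The following is a minimal sketch on toy data, not the project's measurement procedure: the synthetic host, the random watermark bits, the 8 kHz rate and all variable names are assumptions made for illustration. It embeds bits in the LSBs, measures the distortion as a signal-to-noise ratio (imperceptibility), applies a re-quantization attack and then measures the bit error rate (robustness).

```matlab
% Hypothetical toy data: a synthetic 8-bit host and a random bit watermark.
rng(0);                                   % fixed seed so the sketch is repeatable
n    = 8000;                              % one second at an assumed 8 kHz rate
host = uint8(round(127.5 + 100*sin(2*pi*440*(0:n-1)'/8000)));
bits = uint8(rand(2000, 1) > 0.5);        % watermark bits (0 or 1)
m    = numel(bits);

% 1. Embedding: overwrite the LSB of the first m samples with the watermark.
marked      = host;
marked(1:m) = bitset(host(1:m), 1, bits);

% Imperceptibility: signal-to-noise ratio of the embedding distortion.
err    = double(marked) - double(host);
snr_db = 10*log10(sum(double(host).^2) / sum(err.^2));

% 2. Attack: re-quantize to 7 bits, which clears every LSB.
attacked = bitshift(bitshift(marked, -1), 1);

% 3. Detection: read the LSBs back and compute the bit error rate.
ber_clean    = mean(bitget(marked(1:m), 1)   ~= bits);
ber_attacked = mean(bitget(attacked(1:m), 1) ~= bits);

fprintf('SNR after embedding: %.1f dB\n', snr_db);
fprintf('Bit error rate without attack: %.3f, after re-quantization: %.3f\n', ...
        ber_clean, ber_attacked);
```

In this sketch a high SNR indicates that the embedding is hard to hear, while a bit error rate near 0.5 after the attack indicates that the watermark did not survive it.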
Least Significant Bit Method:
This method is based on modifying the Least Significant Bit (LSB) of each
value. It originates in image watermarking, where the least significant
bit of each 8-bit pixel is overwritten with a bit from the watermark. Here
the same idea is applied to audio: the LSB of each 8-bit audio sample is
overwritten with one bit of the watermark image.
Advantages:
It is a simple method.
It may survive simple transformations such as cropping, provided the
watermarked samples are retained.
The hidden watermark adds a basic layer of protection, since it is not
apparent to a casual listener.
Disadvantages:
Added noise or lossy compression is likely to destroy the watermark.
An attack that simply sets the LSB of every sample to 1 fully defeats
the watermark with a negligible impact on the cover object.
In the same way, an attacker can modify the embedded watermark.
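As a small illustration of the method and of the attack described above, the following sketch embeds bits with bitset, reads them back with bitget, and then forces every LSB to 1. The sample values and watermark bits are hypothetical and chosen only for the example.

```matlab
% Embedding one bit in one sample (hypothetical value 180 = binary 10110100).
sample = uint8(180);
marked = bitset(sample, 1, 1);            % write a watermark bit of 1 into the LSB
disp(dec2bin(marked, 8));                 % shows the sample with its LSB set

% The attack: force the LSB of every sample to 1.
wm_bits   = uint8([1 0 0 1 1 0 1 0]);             % hypothetical watermark bits
cover     = uint8([12 200 37 90 255 0 128 64]);   % hypothetical cover samples
embedded  = bitset(cover, 1, wm_bits);            % LSB embedding
attacked  = bitor(embedded, uint8(1));            % attacker sets all LSBs to 1
recovered = bitget(attacked, 1);                  % what a detector now reads

% Each sample changes by at most one level, yet the watermark is lost.
max_change = max(abs(double(attacked) - double(embedded)));
```

The detector reads only ones after the attack, while no sample has moved by more than one quantization level, which is why the impact on the cover object is negligible.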
Insertion of watermark:
Algorithm:
● Loading Data:
1. Using audioread, the host (original) signal is acquired. The
function returns the sample amplitudes as an array, together with the
sampling rate. These values are then shifted and scaled into the range 0
to 255 and converted to unsigned 8-bit integers.
2. Using imread, the image to be embedded as the watermark is
read.
3. The size (number of pixels) of the image is obtained.
● Comparing lengths of host and image:
1. If the number of samples in the host is less than the number of
bits in the image, not every image bit can be placed in the host, so
watermarking cannot be done.
2. Otherwise, the host is prepared by converting its samples from
decimal form to binary form.
● Preparing watermark:
1. The pixel information from the image is also converted to
binary form.
● The bits of each pixel in the image are stored as a one-dimensional
array.
● The bit array is inserted into the host by replacing the LSB of each
sample with a bit from the image.
● The audiowrite command is used to create the watermarked audio file.
MATLAB Code:
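clc
clear
close all

% Load the host audio and plot it against time
[host, f] = audioread('[Link]');
dt = 1/f;
t = (0:length(host)-1)*dt;
subplot(1,2,1)
plot(t, host)
title('Original Audio')

% Map amplitudes to unsigned 8-bit integers (0 to 255).
% Amplitudes outside [-0.5, 0.5] saturate in this mapping.
host = uint8(255*(host + 0.5));

% Load the watermark image and count its bits (8 per pixel)
wm = imread('[Link]');
[r, c] = size(wm);
wm_l = length(wm(:))*8;

if length(host) < wm_l
    disp('The host audio is too short to hold the watermark image')
else
    host_bin = dec2bin(host, 8);      % one 8-character row per sample
    wm_bin = dec2bin(wm(:), 8);       % one 8-character row per pixel

    % Flatten the watermark bits into a one-dimensional array,
    % column by column (all first bits, then all second bits, ...)
    wm_str = zeros(wm_l, 1);
    for j = 1:8
        for i = 1:length(wm(:))
            ind = (j-1)*length(wm(:)) + i;
            wm_str(ind, 1) = str2double(wm_bin(i, j));
        end
    end

    % Replace the LSB (8th character) of each host sample with a watermark bit
    for i = 1:wm_l
        host_bin(i, 8) = dec2bin(wm_str(i));
    end

    % Convert back to amplitudes, then plot, save and play the result
    host_new = bin2dec(host_bin);
    host_new = (double(host_new)/255 - 0.5);
    subplot(1,2,2)
    plot(t, host_new)
    title('Watermarked Audio')
    audiowrite('host_new.wav', host_new, f)
    soundsc(host_new, f);
end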
Image and audio used:
● The chosen image is in PNG format. A grayscale image stores the
intensity of each pixel as a value from 0 (black) to 255 (white).
● The sampling rate of the audio file is 8000 Hz. The audio is 1
minute long.
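A quick capacity check shows whether such a host can hold the watermark. This is a worked example under one assumption: the image side length of 50 pixels is hypothetical, chosen because it gives the 20000 watermark bits that the extraction script expects.

```matlab
fs        = 8000;            % sampling rate in Hz, as stated above
duration  = 60;              % audio length in seconds, as stated above
n_samples = fs * duration;   % host samples, each carrying one watermark bit

img_side  = 50;              % assumed side length of a square grayscale image
n_bits    = img_side^2 * 8;  % 8 bits per pixel

fprintf('Host capacity: %d bits, watermark needs: %d bits\n', n_samples, n_bits);
assert(n_bits <= n_samples, 'Host audio is too short for this watermark');
```

Under these assumptions the watermark occupies only the first 20000 of the 480000 available samples, so most of the host is left unmodified.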
Output:
From the above output, it is clear that the watermark is
imperceptible to the naked eye, so it can pass undetected
unless it is looked for specifically. Once the signal
is watermarked, we enhance it using a Kalman filter.
Kalman filter:
Fig: Flowchart of Kalman Filter
Fig: Mechanism of Kalman Filter in speech Enhancement
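As a minimal sketch of the predict and update cycle of a Kalman filter, the following estimates a constant value from noisy measurements. The signal model, the noise variances Q and R, and all variable names are assumptions chosen for illustration. It is a scalar textbook case, not the speech-enhancement filter used in this project.

```matlab
rng(1);                                  % fixed seed so the sketch is repeatable
n     = 200;
truth = 0.5 * ones(n, 1);                % hypothetical constant clean value
z     = truth + 0.3 * randn(n, 1);       % noisy measurements

Q = 1e-5;                                % assumed process noise variance
R = 0.09;                                % assumed measurement noise variance
x_hat  = zeros(n, 1);                    % state estimates
P      = 1;                              % initial error variance
x_prev = 0;                              % initial state estimate

for k = 1:n
    % Predict: the state is modelled as constant, so only P grows by Q
    x_pred = x_prev;
    P_pred = P + Q;
    % Update: blend the prediction with the new measurement
    K        = P_pred / (P_pred + R);            % Kalman gain
    x_hat(k) = x_pred + K * (z(k) - x_pred);     % corrected estimate
    P        = (1 - K) * P_pred;                 % corrected error variance
    x_prev   = x_hat(k);
end

mse_raw      = mean((z - truth).^2);
mse_filtered = mean((x_hat - truth).^2);
fprintf('MSE of measurements: %.4f, MSE of estimates: %.4f\n', mse_raw, mse_filtered);
```

The gain K starts close to 1, so early estimates follow the measurements, and it shrinks as the error variance P falls, so later estimates change only slightly with each new sample.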
Working:
The main aim of the work is speech enhancement using Kalman filters.
Initially, we took an audio signal and watermarked it using the Least
Significant Bit method. The watermark is then extracted to verify it, and
the watermarked audio is enhanced by removing noise using the Kalman
filter. As speech is stationary only over short intervals, we took small
frames of speech by windowing. In this work, we observed the algorithm
with two windowing techniques, rectangular and Hamming, using a frame
length of 240 samples. The segmented noisy speech is saved as a matrix in
which each row holds one window of 240 samples, and the windows are
processed in a loop, one at a time.
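The framing step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the input is a random stand-in for the noisy speech, the frames do not overlap, and the Hamming window is computed from its formula so that no toolbox function is needed.

```matlab
rng(2);
frame_len = 240;                          % frame length in samples, as stated above
x = randn(1000, 1);                       % hypothetical stand-in for the noisy speech

% Keep only whole frames and store one frame per row
n_frames = floor(length(x) / frame_len);
frames   = reshape(x(1:n_frames*frame_len), frame_len, n_frames)';

% Rectangular and Hamming windows as row vectors of length frame_len
w_rect = ones(1, frame_len);
w_hamm = 0.54 - 0.46*cos(2*pi*(0:frame_len-1)/(frame_len-1));

% Apply each window to every row (implicit expansion, MATLAB R2016b or later)
frames_rect = frames .* w_rect;
frames_hamm = frames .* w_hamm;

% Process one window at a time
for i = 1:n_frames
    current = frames_hamm(i, :);          % 1-by-240 windowed frame
    % ... per-frame processing would go here ...
end
```

The rectangular window leaves each frame unchanged, while the Hamming window tapers the frame edges to 0.08 of their value, which reduces discontinuities between frames.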
MATLAB Code:
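clc
clear
close all

% Watermark size, hard-coded to match the embedded image
wm_sz = 20000;          % number of watermark bits
px_sz = wm_sz/8;        % number of pixels (8 bits each)
im_sz = sqrt(px_sz);    % side length of the square image

% Read the watermarked audio and recover its 8-bit samples
host_new = audioread('host_new.wav');
host_new = uint8(255*(host_new + 0.5));
host_bin = dec2bin(host_new, 8);

% Collect the LSBs and regroup them into 8-bit pixel values
wm_bin_str = host_bin(1:wm_sz, 8);
wm_bin = reshape(wm_bin_str, px_sz, 8);
wm_str = zeros(px_sz, 1, 'uint8');
for i = 1:px_sz
    wm_str(i, :) = bin2dec(wm_bin(i, :));
end

% Rebuild and display the watermark image
wm = reshape(wm_str, im_sz, im_sz);
figure(1);
imshow(wm)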

% Kalman filter enhancement of the watermarked audio
[x, fs] = audioread('host_new.wav');
orig = x;
no = orig;
N = length(x);              % length of the input signal
F = zeros(5, N);            % regressor vectors built from past input samples
I = eye(5);                 % identity (transition) matrix
H = zeros(5, N);            % regressor vectors built from past output samples
sig = zeros(5, 5*N);        % a priori / a posteriori error covariance matrices
K = zeros(5, N);            % Kalman gain
XX = zeros(5, N);           % Kalman coefficients for yy
y = zeros(1, N);            % required (desired) signal
vv = zeros(1, N);           % predicted state error vector
yy = zeros(1, N);           % estimated error sequence
Q = 0.0001*eye(5, 5);       % process noise covariance
R = 0.1;                    % measurement noise covariance
y = x(1:N);                 % y holds the samples that are filtered
sig(1:5, 1:5) = 0.1*I;      % initial error covariance
for k = 6:N
    % Regressors from the five previous input and output samples
    F(1:5,k) = -[y(k-1); y(k-2); y(k-3); y(k-4); y(k-5)];
    H(1:5,k) = -[yy(k-1); yy(k-2); yy(k-3); yy(k-4); yy(k-5)];
    P = sig(1:5, 5*k-29:5*k-25);            % previous error covariance
    S = F(1:5,k)'*P*F(1:5,k) + R;           % scalar innovation variance
    K(1:5,k) = P*F(1:5,k)/S;                % Kalman gain
    sig(1:5, 5*k-24:5*k-20) = P - P*F(1:5,k)*(F(1:5,k)'*P)/S + Q;   % error covariance matrix
    XX(1:5,k) = (I - K(1:5,k)*F(1:5,k)')*XX(1:5,k-1) + K(1:5,k)*y(k);   % a posteriori estimate X(k)
    % Estimated speech sample (only the first three coefficients are used,
    % as in the original listing)
    orig(k) = y(k) - F(1:3,k)'*XX(1:3,k);
    yy(k) = H(1:5,k)'*XX(1:5,k) + orig(k);  % output sequence fed back into H
end
tt = 1:length(x);   % sample index for the combined plot
figure (2);
subplot (311);
plot(x);
title ('ORIGINAL SIGNAL');
subplot (313);
plot (orig);
title ('Enhanced Speech Signal');
figure (3);
plot (tt, x, tt, orig);
title ('Combined plot');
legend ('original','estimated');
audiowrite('host_new.wav', orig, fs)
soundsc(orig, fs);
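Written out, each pass of the loop in the listing above performs the following recursion. The symbols are notation introduced here for readability and are assumptions about naming only: f_k stands for the column F(1:5,k), P_k for the 5-by-5 block of sig belonging to step k, and x-hat_k for the column XX(1:5,k). K_k, Q, R and I are as in the code.

```latex
\begin{aligned}
\mathbf{f}_k &= -\begin{bmatrix} y_{k-1} & y_{k-2} & y_{k-3} & y_{k-4} & y_{k-5} \end{bmatrix}^{\mathsf T} \\
\mathbf{K}_k &= \frac{\mathbf{P}_{k-1}\,\mathbf{f}_k}{\mathbf{f}_k^{\mathsf T}\,\mathbf{P}_{k-1}\,\mathbf{f}_k + R} \\
\mathbf{P}_k &= \mathbf{P}_{k-1} - \frac{\mathbf{P}_{k-1}\,\mathbf{f}_k\,\mathbf{f}_k^{\mathsf T}\,\mathbf{P}_{k-1}}{\mathbf{f}_k^{\mathsf T}\,\mathbf{P}_{k-1}\,\mathbf{f}_k + R} + \mathbf{Q} \\
\hat{\mathbf{x}}_k &= \left(\mathbf{I} - \mathbf{K}_k\,\mathbf{f}_k^{\mathsf T}\right)\hat{\mathbf{x}}_{k-1} + \mathbf{K}_k\,y_k
\end{aligned}
```

Because the measurement is a single sample, the denominator is a scalar, so the matrix inverse in the listing reduces to a division.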
Output of Kalman filter:
Conclusion:
For watermarking, we used the LSB technique because it has little effect
on the signal and is a very simple method. It may survive simple
transformations such as cropping, although added noise or lossy
compression is likely to damage the watermark. For speech enhancement, we
used a Kalman filter because it operates in the time domain. In practice,
signals are not stationary and vary randomly, and the Kalman filter is
suitable for both stationary and non-stationary signals.