Linear Systems Tutorial
authored by David Knill (1/17/98)
A linear
system is characterized by a linear transformation or operator, T(.) that maps
an input function, f(x), to an output function, g(x):
g(x) = T{f(x)},
In order to be linear, the transformation T(.) must
satisfy the principle of superposition. Superposition has two parts: additivity and homogeneity.
A transformation is additive if the transformation of the
sum of two functions is equal to the sum of the transformation of the
individual functions.
T{f1(x) + f2(x)} =
T{f1(x)} + T{f2(x)}
A transformation is homogeneous if the transformation of
a scaled version of a function is equal to the scaled transformation of the
function
T{a f(x)} = a T{f(x)}
Linear systems may be continuous or discrete; that is
they may have continuous functions as inputs and outputs (e.g. electrical
circuits), or they may have discrete
functions as inputs and outputs. Discrete functions are functions which
may be represented by sequences of
numbers (possibly infinite).
When modeling continuous systems, one is typically forced
to approximate them by discrete systems, in order to represent them in a
computer system. Moreover, since
no computer can represent an infinite sequence of numbers (with the
exception of periodic sequences), we are also forced to
approximate systems as being finite, discrete systems.
For such systems, the inputs and outputs can be represented by vectors. The transformation
computed by any linear discrete system can be represented by a matrix; that is,
if f is an input vector and g is the vector output of the system,
g
= Af,
where A is known as the system matrix.
Knowing that a system is linear is a great help in
building a compact model of the system which we can use to predict the output
of the system to any input - All we need do is specify the system matrix.
In this tutorial, we will use the specific example of
image formation in an optical system as an example through which to learn the
basics of linear systems. We will
begin by stepping through the example experiment described in Chapter 2
of Wandell - measuring the line spread function of the human eye.
From the point of view of linear systems analysis, we can
think of the goal of the experiment described in Wandell's book to be to
determine the system matrix for the human eye - a matrix which, when applied to
a discrete representation of an
input image gives as output a discrete representation of the pattern of light
which appears on the retina. To see how to do this quickly, note that if we
apply a matrix to the vector [1,0,0,...,0], we obtain the first column of the
matrix. This is shown in the following example (imagine that A is some unknown
system matrix)
A = [.1,0,0;.2,.1,0;.5,.2,.2;.3,.5,.5;0,.1,.3;0,0,0];
f1 = [1,0,0]';
A*f1
Note that the result is the first column of A.
Exercise 1
Find the
second and third columns of A using a similar method
Considering only one-dimensional images for the moment
(e.g. patterns of vertical lines), we can understand how images are formed at
the retina by measuring the pattern of light appearing at the retina in
response to each of a set of vertical lines of unit intensity placed at
different horizontal positions on a computer screen. Each of these lines is
represented by a vector, [0,0,...0,1,0,...,0.0], where the position of the 1 in
the vector corresponds to the position of the line.
Let's create an example system matrix for the human eye
A = line_spread (256);
A is a 256 x 256 matrix specifying the one-dimensional
retinal image of any one-dimnensional vector (represented by a 256-element
vector). Now lets create a test image for our model eye. We'll make an image of
a thin line
x = zeros(256,1);
x(128) = 1;
plot(x);
Applying the system matrix A to our test image we obtain
the image of the line
y = A*x;
plot (y);
Notice that our model eye has blurred the image of the
line.
Exercise 2
Which column
of the system matrix have we just estimated? Test your answer by plotting both
the chosen column of A and y. They should be the same.
To measure the acuity of the eye, we might try seeing how
far apart two lines have to be before their image can be distinguished from
that of a single bar (of unknown width). For example, let's make an image of a
pair of lines 2 units apart (for the simulation, 1 unit corresponds to 1/25 th
of an arc-minute).
x = zeros(256,1);
x(128) = 1;
x(130) = 1;
y = A*x;
plot (y);
We can look at these as real images
x = zeros(256,1);
x(128) = 1;
y = A*x;
test_im1 = x(:, ones(1, 256));
test_im2 = y(:, ones(1, 256));
test_im1 = test_im1*max(size(gray)) / (max(test_im1(:)));
test_im2 = test_im2*max(size(gray)) / (max(test_im2(:)));
image(test_im1'), colormap(gray);
figure(2)
image(test_im2'), colormap(gray);
(Make sure you understand the image manipulation commands
in
this block of code).
Exercise 3
a. Find out
how far apart the bars have to be in order for
you to be
able to tell that the image contains two lines rather than one thick line.
b. Now,
estimate two other columns of the system matrix (don't just print the columns).
Notice that the columns look like shifted versions of
each other. That is because the system is "shift-invariant".
Shift-invariant linear systems are an important class of linear systems,
because they have several simplifying properties. A shift-invariant linear
system has the property that simple shifts in the input map to simple shifts of
the
outputs. This allows us to describe the response of the
system by its response to a single stimulus. In our case, the logical candidate
for such a stimulus is a line in the middle of the image. This stimulus is
represented
by the vector [0,0,...,0,1,0,...,0,0]. Because of the
importance of this sequence in linear systems analysis, it is given a special
name - the impulse function. The response of the system to this input is called
the impulse response of the system. For shift-invariant linear systems, the
impulse response completely defines the system.
Note that we are implicitly treating the middle of the
vector (the 129th element) as center of our coordinate system (x = 0). We will use
this convention throughout.
CONVOLUTION
Rather than constructing a system matrix to predict the
output of a shift-invariant system, we can use an operation known as
convolution to do so. This has many advantages - first, it is usually
faster, second, the mechanics of
calculating a convolution more directly mirrors what the system is physically
doing, and third, convolution has a number of very useful mathematical properties
(which we will come to soon). The first step in introducing convolution is to
note that the system matrix we just defined is symmetric; that is, A' = A.
Exercise 4
Check this
assertion.
In fact, this is only true for impulse response functions
which are symmetric. We'll take up the case of asymmetric impulse response functions
later. To find the value of the i'th element of the output vector, one has to
compute the inner product of the i'th row of the system matrix with the input
vector (this is simply how matriox multiplication works. Since the system
matrix is symmetric, the i'th row is equal to the i'th column. Each column, however,
is simply a shifted version of the impulse response; therefore, each row is a
shifted version of the impulse response, and we can
compute the value of the i'th element of the output
vector by taking the inner product of the input vector with the
appropriately shifted version of
the impulse response function.
This just seems like a bit of algebraic manipulation;
however, notice that we have just described a simple operation which allows one
to predict the value of the output at any given point as the weighted sum of
the input
sequence. Happily, the weights to use for the sum are
given directly by the impulse response of the system. Moreover, this particular
way of describing an image formation system has more intuitive appeal then the
abstract system matrix formulation - it's logical to think of the luminance at
each point in the retinal image resulting from blurring of
the image; that is, the light from neighboring points in
a test image "spread" to the point, with the influence of neighboring
points being given by the weights assigned by the impulse response function.
In order to compute the output of a shift-invariant
linear system, one need merely "slide" the impulse response over the
input, taking the inner-product as you go, to compute the output. This is
demonstrated in the following demo. The demo simulates the convolution of a
simple one-dimensional image made up of discrete steps in luminance with a
gaussian impulse response function
gdemo3
In the following demo, we convolve a noise image with a
gaussian impulse response function
gdemo2
Generally, we use the term "filter" to describe
a shift-invariant linear system; that is, we say that the system filters the
input, or applies a filter to the input. The filter is characterized by it's
kernel, which, in each
of the cases just presented, is given by the impulse
response function.
At this point, we should go over a couple of important
technical details about shift-invariant linear systems and convolution.
First, what
happens if the impulse response function is not symmetric?
Let's create the system matrix for such a filter
A = line_spread_asymm (256);
If we plot the 129th row and the 129th column on the same
plot, we see that they are not the same (A != A').
plot(1:256, A(129,:), 1:256, A(:,129));
Exercise 5
Check that each column of A is a shifted version of the
same
function. Check that each row is also a shifted version
of one function.
The system represented by A is therefore shift-invariant,
but is not symmetric. In this case, the output of the system can still be
computed by convolution with a kernel, but the kernel of the filter represented
by A is related to the impulse response function by f(x) = I(-x), where I(x) is
the impulse response function and f(x) is the kernel of
the filter. This is the proper general relationship between a filter's kernel
and its impulse response function. For symmetric filters, I(-x) = I(x), so f(x)
= I(-x) = I(x).
The next thing to note about convolution is that it is
technically defined as a summation (for continuous signals, it is an integral)
from-infinity to infinity. Of course, we are using finite-length signals, as we
always must. This is not a big problem for representing the filter kernel,
which typically goes to zero fairly quickly as one moves away
from x=0. Such filters are spatially localized, a fact
which generally allows us to represent the kernel very compactly. For example, for
a Gaussian kernel with a standard deviation of 4, we can effectively represent
the kernel with a vector of length 24.
std_dev = 4;
mean = 0;
x = -12:11;
filt = gauss1(x, std_dev, mean);
plot(filt);
The limitation of being able to represent only finite
length vectors is a problem for the input signal. Essentially, we are always
stuck with the situation of being able to represent only a segment of the input
signal. How then, are we to calculate the output of the filter near the
boundaries of the finite-length signal? In the matrix representation of
convolution which we have used, we simply truncated the filter. When shifts of
the kernel bring non-zero values of the kernel across the boundaries of the
matrix, we simply truncated the function. In the demos, we do something
different. We pad the signals with zeros on either side - we must pad the
signal by half the size of the filter kernel on either side - and then slide
the kernel across the padded signal. This leads to "edge effects" in
the filtered output, caused by the hard edges imposed on the signal. When studying
the effects of a filter in a simulation like this, one must be careful to look
only at the output away from the edges.
Try the following example. It reads in an image, samples
a line from the image to create a one-dimensional signal, then convolves the
signal with the gaussian filter just created.
im = imread('morning.bmp');
im = double(im);
figure;
image(im);
imgline(im);
sig = getline(gcf);
figure;
convdemo(sig,filt);
This demo uses the function 'conv' to do the convolution.
Try the following
y=conv(sig,filt);
plot(y);
size(y), size(sig), size(filt)
To extract the filtered signal sample, extract the
sub-vector
y2=y(length(filt)/2+1:length(y)-length(filt)/2+1);
Now plot the original signal along with the filtered
signal
plot(1:240,y2,1:240,sig);
Notice the edge effects at the boundaries of the signal,
due to blurring with the padded zeros
One way to do the convolution would be to create a system
matrix based on the filter's kernel. Since most of the
entries in the matrix would be zero, however, this is a very
inefficient way to perform a convolution. Rather, for
each
point in the output vector, one can compute a weighted
sum
of the filter's kernel (in the case above, only 24
points) and the equivalently-sized neighborhood of the corresponding point in
the input vector.
Assignment:
Write
your own 1-dimensional convolution routine. Call
it myconv1D (filter, signal). It should take two vectors
as
arguments. The output y = myconv1D (filter, signal)
should
be an N dimensional vector (where N is the dimensionality
of the signal) computed by convolving the filter with a
zero-padded input vector.
You can test your function both by comparing its output
to that of the function 'conv' and by running 'convdemo' on it. To run
convdemo, use the call
convdemo (signal, filter, 'myconv1D');