\documentclass[11pt]{article}
\usepackage{fullpage}
\begin{document}
\thispagestyle{empty}
\begin{center}
\Huge\bf
Homework \#\,\,1\\
\Large\bf
15-486/782: Artificial Neural Networks\\
Dave Touretzky, Fall 2006\\[.2in]
\end{center}
\begin{itemize}
\item Due September 6, 2006.
\item Read HK\&P sections 5.1 through 5.4 first.
\item Software you need is in \verb+/afs/cs/academic/class/15782-f06/matlab/perceptron+
\item Answers must be typed. Handwritten answers will not be accepted.
\end{itemize}
\section*{Problems}
\begin{enumerate}
\item
Suppose you want to train a two-input perceptron on the following
classification problem:
$$ \mathit{Patterns} = \left[\begin{array}{ccc}
2 & 1 & 3 \\ 6 & 3 & 9 \end{array}\right]
\qquad\qquad\qquad
\mathit{Desired} = \left[\begin{array}{ccc} 0 & 1 & 1 \end{array}\right] $$
Prove that the perceptron cannot learn this task, using inequalities
expressed in terms of the weights $w_0$, $w_1$, and $w_2$.
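In case it helps to fix notation, we assume the usual convention that the
perceptron outputs a thresholded weighted sum with bias weight $w_0$
(up to the tie-breaking convention at zero):
$$ y \;=\; \Theta\!\left(w_0 + w_1 x_1 + w_2 x_2\right), \qquad
\Theta(u) = \left\{\begin{array}{ll} 1 & \mbox{if } u > 0, \\
0 & \mbox{otherwise,} \end{array}\right. $$
so each pattern/target pair contributes one inequality on $w_0$, $w_1$,
and $w_2$.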
\item The proof of the perceptron convergence theorem states that
if $\hat{w}$ is a weight vector that correctly classifies all
patterns, and $\overline{w}^{(\tau)}$ is the weight vector at step
$\tau$ of the algorithm, then $\hat{w}\cdot\overline{w}^{(\tau)}$
increases at each step. Modify the \verb+perceptron+ program to
demonstrate this by displaying the value of this dot product at each
step. Turn in your source code and a sample run.
Note: in order to do this you will need to know the correct weight
vector at the start of the run. You can calculate this vector
directly from the slope and y-intercept in the perceptron demo.
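For example, a decision boundary with slope $m$ and y-intercept $b$ satisfies
$$ w_1 x + w_2 y + w_0 = 0 \quad\Longleftrightarrow\quad
y = -\frac{w_1}{w_2}\,x - \frac{w_0}{w_2} \qquad (w_2 \neq 0), $$
so any positive multiple of $\langle -b,\, -m,\, 1\rangle$ (or its negation,
depending on which side of the line should output 1) can serve as $\hat{w}$.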
\item Run the \verb+bowl+ demo with learning rates of 0.1, 0.142, and 0.15.
Hand in a printout of each run. What can you say about the network's
convergence behavior at each learning rate?
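A standard fact about gradient descent may help in interpreting the three
runs: on a quadratic error surface $E$, the update
$$ \overline{w}^{(\tau+1)} = \overline{w}^{(\tau)}
- \eta\,\nabla E\!\left(\overline{w}^{(\tau)}\right) $$
converges only when the learning rate satisfies $0 < \eta < 2/\lambda_{\max}$,
where $\lambda_{\max}$ is the largest eigenvalue of the Hessian of $E$.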
\item Consider the function $f(x,y) = \exp \left(-(0.6y-0.7)^2-(x-0.4)^2\right)$.
The following code will graph $f$ for you:
\begin{quotation}\tt\noindent
[x,y] = meshgrid(0 : 0.1 : 1);\\
z = exp(-(0.6*y-0.7).\verb_^_2-(x-0.4).\verb_^_2);\\
surf(x,y,z)\\
box on, rotate3d on
\end{quotation}
Problem: (a) Train an LMS network to approximate $f(x,y)$ over the
unit square ($0\leq x\leq 1$, $0\leq y\leq 1$). You may use the code
in \verb+lms3d.m+ to get started, if you wish. (b) What is the shape
of your approximation function? (c) By looking at the weights of your
trained network, you can see the first degree polynomial that the
neural network has devised to approximate $f$. Write down this
polynomial, and hand it in along with the code you wrote to solve this
problem.
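For reference in part (c): a single linear unit with bias $w_0$ and input
weights $w_1$ and $w_2$ computes
$$ \hat{f}(x,y) = w_0 + w_1 x + w_2 y, $$
which is the first-degree polynomial in question; its coefficients can be
read directly off the trained weights.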
\item How much information does it take to describe a two-input perceptron?
The classical description uses a vector of three real-valued
parameters: $\vec{w} = \langle w_0, w_1, w_2 \rangle$. But the
perceptron's decision boundary is a line, which can be uniquely
specified with just two parameters, e.g., slope and intercept.
Jack says: ``I claim a perceptron can be described with less
information than three real numbers. Here's how I would do it with
just two real values: set $s_0=w_0/w_2$, and $s_1=w_1/w_2$. From the
description $\langle s_0, s_1 \rangle$, I can construct a weight
vector $\langle s_0, s_1, 1\rangle$ that describes a line with the
same slope and intercept as the one described by $\vec{w}$. So it
should behave exactly the same as $\vec{w}$ for all inputs.''
Jill replies: ``I claim a perceptron requires more than just two real
numbers to describe. A line is not an inequality. Consider the case
where $w_2$ is negative. How will that affect the results of your
transformation?''
\begin{enumerate}
\item Whose claim is correct, and why?
\item How much information does it really take to correctly describe
a two-input perceptron? (Don't worry about the case where $w_2=0$.)
\end{enumerate}
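A fact that may be useful here: scaling a weight vector by a constant
$c \neq 0$ leaves the decision boundary unchanged, but only a
\emph{positive} $c$ leaves the classification unchanged, since
$$ \mathrm{sign}\left(c\,\vec{w}\cdot\vec{x}\right)
= \mathrm{sign}(c)\;\mathrm{sign}\left(\vec{w}\cdot\vec{x}\right). $$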
\end{enumerate}
\end{document}