\slidehead{Outline \red \ } \bk
\centerline{[read Chapter 2]}
\centerline{[suggested exercises 2.2, 2.3, 2.4, 2.6]}
\bigskip
\bi
\item Learning from examples
\item General-to-specific ordering over hypotheses
\item Version spaces and candidate elimination algorithm
\item Picking new examples
\item The need for inductive bias
\ei
\bigskip
Note: a simple approach assuming no noise; illustrates the key concepts
\slidehead{Training Examples for {\em EnjoySport} \red \ } \bk
\begin{center}
\normalsize %\small
\begin{tabular}{|ccccccc|} \hline
Sky & Temp & Humid & Wind & Water & Forecst & EnjoySpt
\\ \hline
Sunny & Warm & Normal & Strong & Warm & Same & Yes \\
Sunny & Warm & High & Strong & Warm & Same & Yes \\
Rainy & Cold & High & Strong & Warm & Change & No \\
Sunny & Warm & High & Strong & Cool & Change & Yes \\
\hline
\end{tabular}
\bigskip
\bigskip
What is the general concept?
\end{center}
\slidehead{Representing Hypotheses \red \ } \bk
Many possible representations
\bigskip
Here, $h$ is conjunction of constraints on attributes
\bigskip
Each constraint can be
\bi
\item a specific value (e.g., $Water=Warm$)
\item don't care (e.g., ``$Water=?$'')
\item no value allowed (e.g., ``$Water=\emptyset$'')
\ei
For example,
\begin{tabular}{ccccccc}
Sky & AirTemp & Humid & Wind & Water & Forecst
\\
$\langle Sunny$ & $?$ & $?$ & $Strong$ & $?$ & $Same \rangle$
\end{tabular}
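Such a hypothesis can be sketched in Python as a tuple of constraints (a hypothetical encoding: a literal value, \verb|'?'| for don't care, \verb|None| for the empty constraint $\emptyset$):

```python
# Hypothetical encoding of a conjunctive hypothesis: each constraint is
# a literal value, '?' (don't care), or None (no value allowed).

def satisfies(hypothesis, instance):
    """True iff every constraint in the hypothesis accepts the instance."""
    # '?' accepts anything; a literal must match exactly; None rejects all
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(satisfies(h, x))  # True
```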
\slidehead{Prototypical Concept Learning Task \red \ } \bk
\bi
\item
{\bf Given:}
\bi
\item
Instances $X$: Possible days, each described by the attributes
{\em Sky, AirTemp, Humidity, Wind, Water, Forecast}
\item
Target function $c$: $EnjoySport: X \rightarrow \{0,1 \}$
\item
Hypotheses $H$: Conjunctions of literals. E.g.,
\[\langle ?, Cold, High, ?, ?, ? \rangle. \]
\item
Training examples $D$: Positive and negative examples of the target function
\[\langle x_1, c(x_1) \rangle , \ldots \langle x_m, c(x_m) \rangle\]
\ei
\item {\bf Determine:}
A hypothesis $h$ in $H$ such that $h(x)=c(x)$ for all $x$ in $D$.
%\begin{itemize}
%\item[] ?
%A hypothesis $h$ in $H$ such that $h(x)=c(x)$ for all $x$ in $X$.
%
%\item
%(A hypothesis $h$ in $H$ such that $h(x)=c(x)$ for all $x$ in $D$.)
%\end{itemize}
\ei
\newpage
\vspace*{2in}
\begin{description}
\item
{\bf The inductive learning hypothesis:} Any hypothesis found to approximate
the target function well over a sufficiently large set of training examples
will also approximate the target function well over other unobserved examples.
\end{description}
\slidehead{Instances, Hypotheses, and More-General-Than \red \ } \bk
\hbox{
\psfig{file=./bookps/vs-gen-spec.ps,width=7in}}
\slidehead{\pgm{Find-S} Algorithm \red \ } \bk
\be
\item Initialize $h$ to the most specific hypothesis in $H$
\item For each positive training instance $x$
\bi
\item For each attribute constraint $a_{i}$ in $h$
\bi
\item[] If the constraint $a_{i}$ in $h$ is satisfied by $x$
\item[] Then do nothing
\item[] Else replace $a_{i}$ in $h$ by the next more general constraint that is satisfied by $x$
\ei
\ei
\item Output hypothesis $h$
\ee
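The steps above can be sketched in Python (a minimal sketch for conjunctive hypotheses; the data follow the \emph{EnjoySport} table):

```python
# Minimal sketch of Find-S for conjunctive hypotheses.
# Constraints: a literal value or '?' (don't care); h = None stands in
# for the all-empty (most specific) initial hypothesis.

def find_s(examples):
    """Return the maximally specific hypothesis covering the positives."""
    h = None
    for x, label in examples:
        if label != 'Yes':
            continue            # Find-S ignores negative examples
        if h is None:
            h = list(x)         # first positive: adopt it verbatim
        else:
            # minimally generalize: '?' wherever the constraint fails
            h = [c if c == v else '?' for c, v in zip(h, x)]
    return tuple(h)

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(data))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```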
\slidehead{Hypothesis Space Search by \pgm{Find-S} \red \ } \bk
\hbox{
\psfig{file=./bookps/finds-gen-spec.ps,width=8in}}
\slidehead{Complaints about \pgm{Find-S} \red \ } \bk
\bi
\item Can't tell whether it has learned the concept
\item Can't tell when the training data is inconsistent
\item Picks a maximally specific $h$ (why?)
\item Depending on $H$, there might be several!
\ei
\slidehead{Version Spaces \red \ } \bk
\begin{quote}
A hypothesis $h$ is {\bf consistent} with a set of training examples $D$ of
target concept $c$ if and only if $h(x)=c(x)$ for each training example
$\langle x, c(x) \rangle$ in $D$.
\[Consistent(h,D) \equiv (\forall \langle x, c(x) \rangle \in D)\ h(x)=c(x) \]
\end{quote}
\bigskip
\bigskip
\begin{quote}
The {\bf version space}, $VS_{H,D}$, with respect to
hypothesis space $H$ and training examples $D$, is the subset of hypotheses
from $H$ consistent with all training examples in $D$.
\[VS_{H,D} \equiv \{h \in H|Consistent(h,D)\} \]
\end{quote}
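The $Consistent(h,D)$ predicate translates directly into code (a sketch with hypotheses encoded as constraint tuples, \verb|'?'| meaning don't care; the two-attribute data are hypothetical):

```python
# Sketch of Consistent(h, D): h agrees with the label of every example.

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """True iff h(x) = c(x) for every <x, c(x)> in the examples."""
    return all(matches(h, x) == label for x, label in examples)

D = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
print(consistent(('Sunny', '?'), D))  # True
print(consistent(('?', '?'), D))      # False: predicts True on the negative
```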
\slidehead{The \pgm{List-Then-Eliminate} Algorithm: \red \ } \bk
\begin{enumerate}
\item
$VersionSpace \leftarrow$ a list containing every hypothesis in $H$
\item
For each training example, $\langle x, c(x) \rangle$
\item[] remove from $VersionSpace$ any hypothesis $h$ for which $h(x) \neq c(x)$
\item Output the list of hypotheses in $VersionSpace$
\end{enumerate}
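A minimal sketch of the algorithm for conjunctive hypotheses (the $\emptyset$ constraints are omitted since all such hypotheses classify every instance negative; the small attribute domains are hypothetical):

```python
from itertools import product

# List-Then-Eliminate: enumerate every conjunctive hypothesis, then
# drop each one that misclassifies some training example.

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(domains, examples):
    # every constraint is either an attribute value or '?'
    version_space = list(product(*[vals + ['?'] for vals in domains]))
    for x, label in examples:
        version_space = [h for h in version_space if matches(h, x) == label]
    return version_space

domains = [['Sunny', 'Rainy'], ['Warm', 'Cold']]
examples = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
print(list_then_eliminate(domains, examples))
# [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]
```

Enumerating $H$ is exponential in the number of attributes, which is exactly why the boundary-set representation on the next slides matters.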
\slidehead{Example Version Space \red \ } \bk
\hbox{\psfig{file=./bookps/figure-vs3.ps,width=7in}}
\slidehead{Representing Version Spaces \red \ } \bk
\begin{description}
\item
The {\bf General boundary}, G, of version space $VS_{H,D}$ is the set of its
maximally general members
\bigskip
\item
The {\bf Specific boundary}, S, of version space $VS_{H,D}$ is the set of its
maximally specific members
\bigskip
\item Every member of the version space lies between these boundaries
\[VS_{H,D} = \{h \in H| (\exists s \in S)(\exists g \in G) (g \geq h \geq
s)\}\]
where $x \geq y$ means $x$ is more general or equal to $y$
\end{description}
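For conjunctions, the $x \geq y$ ordering reduces to a constraint-by-constraint check (a sketch ignoring $\emptyset$ constraints):

```python
# h1 >= h2 iff every instance satisfying h2 also satisfies h1.
# For conjunctions: each constraint of h1 is '?' or equals h2's.

def more_general_or_equal(h1, h2):
    return all(c1 == '?' or c1 == c2 for c1, c2 in zip(h1, h2))

print(more_general_or_equal(('Sunny', '?'), ('Sunny', 'Warm')))  # True
print(more_general_or_equal(('Sunny', 'Warm'), ('Sunny', '?')))  # False
```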
\slidehead{Candidate Elimination Algorithm \red \ } \bk
$G \la$ maximally general hypotheses in $H$

$S \la$ maximally specific hypotheses in $H$

For each training example $d$, do
\bi
\item If $d$ is a positive example
\bi
\item Remove from $G$ any hypothesis inconsistent with $d$
\item For each hypothesis $s$ in $S$ that is not consistent with $d$
\bi
\item Remove $s$ from $S$
\item Add to $S$ all minimal generalizations $h$ of $s$ such that
\be
\item $h$ is consistent with $d$, and
\item some member of $G$ is more general than $h$
\ee
\item Remove from $S$ any hypothesis that is more general than another
hypothesis in $S$
\ei
\ei
\item If $d$ is a negative example
\bi
\item Remove from $S$ any hypothesis inconsistent with $d$
\item For each hypothesis $g$ in $G$ that is not consistent with $d$
\bi
\item Remove $g$ from $G$
\item Add to $G$ all minimal specializations $h$ of $g$ such that
\be
\item $h$ is consistent with $d$, and
\item some member of $S$ is more specific than $h$
\ee
\item Remove from $G$ any hypothesis that is less general than another
hypothesis in $G$
\ei
\ei
\ei
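The full loop can be sketched in Python for conjunctive hypotheses (one possible implementation, not the only one: a conjunction has a unique minimal generalization, and its minimal specializations substitute attribute values for \verb|'?'|):

```python
# Sketch of Candidate-Elimination for conjunctive hypotheses.

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 >= h2 in the general-to-specific ordering (no empty constraints)."""
    return all(c1 == '?' or c1 == c2 for c1, c2 in zip(h1, h2))

def candidate_elimination(domains, examples):
    n = len(domains)
    G = {('?',) * n}    # most general hypothesis
    S = {(None,) * n}   # most specific: all-empty constraints
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            new_S = set()
            for s in S:
                if matches(s, x):
                    new_S.add(s)
                    continue
                # unique minimal generalization of a conjunction covering x
                h = tuple(v if c is None or c == v else '?'
                          for c, v in zip(s, x))
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            # drop members more general than another member of S
            S = {s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t)
                            for t in new_S)}
        else:
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                # minimal specializations: substitute values for '?'
                for i, vals in enumerate(domains):
                    if g[i] != '?':
                        continue
                    for v in vals:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if any(more_general_or_equal(h, s) for s in S):
                                new_G.add(h)
            # drop members less general than another member of G
            G = {g for g in new_G
                 if not any(g != t and more_general_or_equal(t, g)
                            for t in new_G)}
    return S, G

domains = [['Sunny', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Light'], ['Warm', 'Cool'], ['Same', 'Change']]
data = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),
        (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),
        (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
        (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(domains, data)
print(sorted(S))  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(sorted(G))  # [('?', 'Warm', '?', '?', '?', '?'),
                  #  ('Sunny', '?', '?', '?', '?', '?')]
```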
\slidehead{Example Trace \red \ } \bk
\hbox{
\psfig{file=./bookps/vsexamp-initialize.ps,width=7in}}
\slidehead{What Next Training Example? \red \ } \bk
\hbox{\psfig{file=./bookps/figure-vs3.ps,width=7in}}
\slidehead{How Should These Be Classified? \red \ } \bk
\hbox{\psfig{file=./bookps/figure-vs3.ps,width=7in}}
\bigskip
\bi
\item[]
\[ \langle Sunny\ Warm\ Normal\ Strong\ Cool\ Change \rangle \]
\item[]
\[ \langle Rainy\ Cool\ Normal\ Light\ Warm\ Same \rangle \]
\item[]
\[ \langle Sunny\ Warm\ Normal\ Light\ Warm\ Same \rangle \]
\ei
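Such queries can be answered from the boundary sets alone (a sketch; the $S$ and $G$ below are assumed to be those the \emph{EnjoySport} trace converges to):

```python
# Classifying with a partially learned concept: unanimously positive if
# every member of S matches x, unanimously negative if no member of G
# matches x, otherwise the version space members disagree.

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def classify(S, G, x):
    if all(matches(s, x) for s in S):
        return '+'   # every hypothesis in the version space says yes
    if not any(matches(g, x) for g in G):
        return '-'   # every hypothesis in the version space says no
    return '?'       # boundary members disagree: need more training data

S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
print(classify(S, G, ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')))  # '+'
print(classify(S, G, ('Rainy', 'Cool', 'Normal', 'Light', 'Warm', 'Same')))     # '-'
print(classify(S, G, ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')))     # '?'
```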
\slidehead{What Justifies this Inductive Leap? \red \ } \bk
\[+\ \ \langle Sunny\ Warm\ Normal\ Strong\ Cool\ Change \rangle \]
\[+\ \ \langle Sunny\ Warm\ Normal\ Light\ Warm\ Same \rangle \]
\rule{7in}{.2mm}\vskip -3mm
\[S:\ \ \langle Sunny\ Warm\ Normal\ ?\ ?\ ? \rangle \]
\bigskip
Why believe we can classify the unseen instance
\[ \langle Sunny\ Warm\ Normal\ Strong\ Warm\ Same \rangle\ ? \]
\slidehead{An UNBiased Learner \red \ } \bk
\bi
\item[]
Idea: Choose $H$ that expresses every teachable concept (i.e., $H$ is the power
set of $X$)
\item[]
Consider $H' =$ disjunctions, conjunctions, negations over previous $H$.
E.g.,
\[ \langle Sunny\ Warm\ Normal\ ?\ ?\ ? \rangle \ \tor \ \neg \langle ?\ ?\ ?\ ?\ ?\ Change \rangle \]
\ei
What are $S$, $G$ in this case?
\bi
\item[]
S $\la$
\item[]
G $\la$
\ei
\slidehead{Inductive Bias \red \ } \bk
Consider
\bi
\item concept learning algorithm $L$
\item instances $X$, target concept $c$
\item training examples $D_c = \{\langle x, c(x) \rangle \}$
\item let $L(x_i,D_c)$ denote the classification assigned to the
instance $x_i$ by $L$ after training on data $D_c$.
\ei
\vspace*{.2in}
{\bf Definition}:
\begin{quote}The {\bf inductive bias}
of $L$ is any minimal set of assertions $B$ such that for any target concept
$c$ and corresponding training examples $D_c$
\end{quote}
\[
(\forall x_i \in X) [(B \tand D_c \tand x_i) \entails L(x_i,D_c)]
\]
where $A \entails B$ means $A$ logically entails $B$
\slidehead{Inductive Systems and Equivalent Deductive Systems \red \ } \bk
\centerline{\hbox{
\psfig{file=./bookps/figure-vs-bias-new.ps,width=5.6in}}}
\slidehead{Three Learners with Different Biases \red \ } \bk
\be
\item
{\em Rote learner:} Store examples; classify $x$ iff it matches a previously
observed example.
\item
{\em Version space candidate elimination algorithm}
\item
{\em Find-S}
\ee
\slidehead{Summary Points \red \ } \bk
\be
\item Concept learning as search through $H$
\item General-to-specific ordering over $H$
\item Version space candidate elimination algorithm
\item $S$ and $G$ boundaries characterize learner's uncertainty
\item Learner can generate useful queries
\item Inductive leaps possible only if learner is biased
\item Inductive learners can be modelled by equivalent deductive systems
\ee