\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath, amsfonts}
\usepackage{enumerate}
\usepackage{graphicx}
\usepackage{titling}
\usepackage{float}
\usepackage{enumitem}
\usepackage{url}
\usepackage{xcolor}
\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Commands for customizing the assignment %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand \duedate {5 p.m. Wednesday, February 25, 2015}
\title{
10-601 Machine Learning: Homework 5 \\
\vspace{0.2cm}
\large{
Due \duedate{}
}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Useful commands for typesetting the questions %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand \expect {\mathbb{E}}
\newcommand \mle [1]{{\hat #1}^{\rm MLE}}
\newcommand \map [1]{{\hat #1}^{\rm MAP}}
\newcommand \argmax {\operatorname*{argmax}}
\newcommand \argmin {\operatorname*{argmin}}
\newcommand \code [1]{{\tt #1}}
\newcommand \datacount [1]{\#\{#1\}}
\newcommand \ind [1]{\mathbb{I}\{#1\}}
\newcommand \bs [1]{\boldsymbol{#1}}
\newcommand{\HH}{{H}}
\newcommand{\comment}[1]{\textcolor{blue}{\textsc{\textbf{[#1]}}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%
% Document configuration %
%%%%%%%%%%%%%%%%%%%%%%%%%%
% Don't display a date in the title and remove the white space
\predate{}
\postdate{}
\date{}
% Don't display an author and remove the white space
\preauthor{}
\postauthor{}
\author{}
%%%%%%%%%%%%%%%%%%
% Begin Document %
%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\section*{Instructions}
\begin{itemize}
\item {\bf Late homework policy:} Homework is worth full credit if
submitted before the due date, half credit during the next 48 hours,
and zero credit after that. You {\em must} turn in at least $n-1$
of the $n$ homeworks to pass the class, even if for zero credit.
\item {\bf Collaboration policy:} Homeworks must be done individually,
except where otherwise noted in the assignments. ``Individually''
means each student must hand in their own answers, and each student
must write and use their own code in the programming parts of the
assignment. You may collaborate with others on this problem set and
consult external sources. \textbf{\em However, you must write your
own solutions and fully list your collaborators/external references
for each problem.} We assume that, as participants in a graduate
course, you will take responsibility for making sure you personally
understand the solution to any work arising from such collaboration.
\item {\bf Online submission:} You must submit your solutions online
on \href{https://autolab.cs.cmu.edu/courses/27/assessments/342}{autolab}.
We recommend that you use \LaTeX{} to type your solutions to the
written questions, but we will accept scanned solutions as well. On
the Homework 5 autolab page, you can download the
\href{https://autolab.cs.cmu.edu/courses/27/assessments/342/attachments/31}{template},
which is a tar archive containing a blank placeholder pdf for the
written questions and one Octave source file for each of the
programming questions. Replace each pdf file with one that contains
your solutions to the written questions and fill in each of the
Octave source files with your code. When you are ready to submit,
create a new tar archive of the top-level directory and submit your
archived solutions online by clicking the ``Submit File'' button.
You should submit a single tar archive identical to the template,
except with each of the Octave source files filled in and with the
blank pdf replaced by your solutions for the written questions. You
are free to submit as many times as you like (which is useful since
you can see the autograder feedback immediately).
\textbf{\emph{DO NOT}} change the name of any of the files or
folders in the submission template. In other words, your submitted
files should have exactly the same names as those in the submission
template. Do not modify the directory structure.
\end{itemize}
\section*{Problem 1: VC Dimension}
Recall that a set of points is \emph{shattered} by a class of functions $H$ if every possible $\{-1,+1\}$ labeling of the points can be produced by some function in $H$. The \emph{Vapnik-Chervonenkis} (VC) dimension of $H$ is the size of the largest set of points that $H$ can shatter. See the \href{http://www.cs.cmu.edu/~ninamf/courses/601sp15/slides/08_Theory_2-9-2015.pdf}{lecture notes}, \href{http://youtu.be/3qbLJdjE3Nw}{video}, and \href{http://youtu.be/uoBw7DTZDQM}{recitation video} for more information.
In this problem, we will explore a hypothesis space where each hypothesis is a combination of two simpler hypotheses. More precisely, given two hypotheses $h_1$ and $h_2$, we define $h = h_1 \cap h_2$ as a new hypothesis that labels an example $+1$ only if both $h_1$ and $h_2$ give the label $+1$, and $-1$ otherwise. We can extend this to \emph{sets} of hypotheses: given two sets of hypotheses $H_1$ and $H_2$, define $H^* = \{ h_1 \cap h_2 : h_1 \in H_1, h_2 \in H_2 \}$ as the set of all intersections of hypothesis pairs from the two classes $H_1$ and $H_2$.
As an example, let $H_1$ be the set of classifiers in $\mathbb{R}$ that assign the label $+1$ if the example is larger than some threshold $a$, and let $H_2$ be the set of classifiers in $\mathbb{R}$ that assign the label $+1$ if the example is smaller than some threshold $b$. Then $H^*$ would be the set of all intervals $(a, b)$ in $\mathbb{R}$ that assign $+1$ if the example is inside the interval. As another example, when $H_1$ and $H_2$ are both the set of all axis-aligned squares in $\mathbb{R}^2$, $H^*$ is the set of all axis-aligned rectangles. This example is illustrated below: on the left, we have a single square classifier $h_1$; in the middle, we have another square classifier $h_2$; and on the right, we have $h_1 \cap h_2$, which is a rectangle classifier.
\begin{figure}[H]
\centering
\includegraphics[width=0.33\textwidth]{h1.pdf}\hfill%
\includegraphics[width=0.33\textwidth]{h2.pdf}\hfill%
\includegraphics[width=0.33\textwidth]{intersect.pdf}
\end{figure}
\noindent Keep in mind that these are only examples. We are looking for results that apply generally to any pair of hypothesis classes.
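To make the intersection construction concrete, here is a minimal Python sketch (illustrative only; the function names are ours, and the graded programming questions use Octave). It builds $h = h_1 \cap h_2$ from the two threshold classifiers of the interval example:

```python
# Illustrative sketch (not part of the assignment): the intersection
# construction h = h1 ∩ h2 from the interval example above.

def above(a):
    """A hypothesis in H1: label +1 iff x > a."""
    return lambda x: +1 if x > a else -1

def below(b):
    """A hypothesis in H2: label +1 iff x < b."""
    return lambda x: +1 if x < b else -1

def intersect(h1, h2):
    """h = h1 ∩ h2: label +1 only when both h1 and h2 say +1."""
    return lambda x: +1 if h1(x) == +1 and h2(x) == +1 else -1

h = intersect(above(0.0), below(1.0))    # the interval (0, 1)
print([h(x) for x in (-0.5, 0.5, 1.5)])  # [-1, 1, -1]
```

As the printed labels show, only the point inside the interval $(0,1)$ receives $+1$, matching the description above.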
\begin{enumerate}[label=\textbf{\alph*.}]
\item \label{shattering_coef_bound} \textbf{[15 pts]} Suppose that the \emph{shattering coefficient} of $H_1$ is $H_1[n]$ (i.e. the maximum number of ways that the hypothesis space $H_1$ can label a set of $n$ points is $H_1[n]$). Similarly, suppose that the shattering coefficient of $H_2$ is $H_2[n]$. Show that $H^*[n] \le H_1[n] H_2[n]$.
\item \label{vc_shattering} \textbf{[5 pts]} Show that if the VC dimension of a hypothesis space $H$ is $d$, then the shattering coefficient $H[d]$ is $2^d$.
\item \textbf{[15 pts]} Let $H$ be a hypothesis space with VC dimension $d$. Define $H^*$ as the hypothesis space produced by all intersections of pairs of hypotheses from $H$ (assuming that $H_1 = H_2 = H$ in our above definitions). Use your results from \ref{shattering_coef_bound} and \ref{vc_shattering} to show that the VC dimension $d_*$ of $H^*$ is bounded by $\mathcal{O}(d \log d)$. You may use the fact that if $2^x \le x^y$, then $x \le k \cdot y \log y$ for some constant $k$.
\textbf{Hint:} Since $d_*$ is the VC dimension of $H^*$, then by definition, there exists a set $S$ of $d_*$ points that is shattered by $H^*$. By Sauer's lemma, we know that the maximum number of ways that $H$ can label $S$ is bounded by $\mathcal{O}(d_*^{d})$. That is, $H[d_*] = \mathcal{O}(d_*^{d})$.
\item For each one of the following function classes, find the VC dimension. State your reasoning.
\begin{enumerate}[label=\textbf{\roman*.}]
\item \textbf{[4 pts]} Half spaces in $\mathbb{R}$, where examples on one side of the boundary are labeled $+1$, and examples on the other side are labeled $-1$.
\item \textbf{[4 pts]} Half spaces in $\mathbb{R}^2$, where examples on one side of the line are labeled $+1$, and examples on the other side are labeled $-1$.
\item \textbf{[7 pts]} Axis-aligned squares in $\mathbb{R}^2$, where points are labeled $+1$ inside the square, and $-1$ outside (as in the illustrations above).
\end{enumerate}
\end{enumerate}
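When reasoning about small cases like those above, it can help to check shattering by brute force. Below is an illustrative Python sketch (helper names are ours, not part of the assignment) that enumerates the labelings a finite pool of hypotheses induces on a point set; here the pool is a coarse grid of one-sided thresholds in $\mathbb{R}$. Note that a finite grid can certify that a labeling is achievable but cannot, in general, prove that one is impossible for the full class.

```python
# Illustrative sketch: brute-force check of the labelings a finite
# pool of hypotheses induces on a small point set.

def labelings(points, hypotheses):
    """The set of distinct labelings induced on the given points."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def shatters(points, hypotheses):
    """True iff every {-1,+1} labeling of the points is realized."""
    return len(labelings(points, hypotheses)) == 2 ** len(points)

# One-sided thresholds in R: h_a(x) = +1 iff x > a, over a coarse grid.
# (The a=a default argument pins down the threshold at definition time.)
grid = [i / 10 for i in range(-20, 21)]
H1 = [lambda x, a=a: +1 if x > a else -1 for a in grid]

print(shatters([0.5], H1))       # True: one point can be labeled both ways
print(shatters([0.2, 0.8], H1))  # False: the labeling (+1, -1) never occurs
```

For two points $x_1 < x_2$ and thresholds of this form, any hypothesis labeling $x_1$ with $+1$ must also label $x_2$ with $+1$, which is why the second check fails.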
\pagebreak
\section*{Problem 2: Graphical Models}
Below is depicted a graphical model with four \emph{discrete} random variables that can be used to predict whether school will be closed due to inclement weather. \textbf{Note:} We will cover concepts that will help in completing this problem in recitation on Feb 19th and in lecture on Feb 23rd. The videos will be posted on the website.
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{snow_day.pdf}
\end{figure}
\begin{enumerate}[label=\textbf{\alph*.}]
\item Answer the following questions about the conditional independence structure in the model:
\begin{enumerate}[label=\textbf{\roman*.}]
\item \textbf{[4 pts]} Which variables are independent of \textbf{temperature} given that \textbf{snow} is observed?
\item \textbf{[4 pts]} Which variables are independent of \textbf{snow} given that no variables are observed?
\item \textbf{[4 pts]} Which variables are independent of \textbf{snow} given that \textbf{temperature} is observed?
\item \textbf{[4 pts]} Which variables are independent of \textbf{school cancellation} given that \textbf{snow} and \textbf{roads salted} are observed?
\end{enumerate}
Suppose the random variables in the above graphical model have the following parameters:
The variable \textbf{temperature} does not depend on any other variable, and so it has the following prior distribution:
\begin{center}
\begin{tabular}{|c|c|} \hline
$p(\textbf{temperature} = cold)$ & $p(\textbf{temperature} = warm)$ \\ \hline \hline
0.4 & 0.6 \\ \hline
\end{tabular}
\end{center}
The variable \textbf{snow} only depends on the value of \textbf{temperature}:
\begin{center}
\begin{tabular}{|c||c|c|c|} \hline
$\textbf{temperature}$ & $p(\textbf{snow} = none \mid \textbf{temp})$ & $p(\textbf{snow} = light \mid \textbf{temp})$ & $p(\textbf{snow} = heavy \mid \textbf{temp})$ \\ \hline \hline
$cold$ & 0.4 & 0.4 & 0.2 \\ \hline
$warm$ & 0.9 & 0.08 & 0.02 \\ \hline
\end{tabular}
\end{center}
The variable \textbf{roads salted} only depends on the value of \textbf{snow}:
\begin{center}
\begin{tabular}{|c||c|c|} \hline
$\textbf{snow}$ & $p(\textbf{roads salted} = T \mid \textbf{snow})$ & $p(\textbf{roads salted} = F \mid \textbf{snow})$ \\ \hline \hline
$none$ & 0.01 & 0.99 \\ \hline
$light$ & 0.9 & 0.1 \\ \hline
$heavy$ & 0.97 & 0.03 \\ \hline
\end{tabular}
\end{center}
The variable \textbf{school cancellation} depends on both \textbf{snow} and \textbf{roads salted}. For brevity, the condition ``$\textbf{snow}, \textbf{roads salted}$'' is replaced with ``$\hdots$'':
\begin{center}
\begin{tabular}{|c|c||c|c|} \hline
$\textbf{snow}$ & $\textbf{roads salted}$ & $p(\textbf{school cancellation} = T \mid \hdots)$ & $p(\textbf{school cancellation} = F \mid \hdots)$ \\ \hline \hline
$none$ & T & 0.01 & 0.99 \\ \hline
$none$ & F & 0.01 & 0.99 \\ \hline
$light$ & T & 0.2 & 0.8 \\ \hline
$light$ & F & 0.4 & 0.6 \\ \hline
$heavy$ & T & 0.95 & 0.05 \\ \hline
$heavy$ & F & 0.99 & 0.01 \\ \hline
\end{tabular}
\end{center}
\item \textbf{[4 pts]} The joint probability is given by $p(\textbf{temperature}, \hspace{0.1cm} \textbf{snow}, \hspace{0.1cm} \textbf{roads salted}, \hspace{0.1cm} \textbf{school cancellation})$. Write the factorized form of the joint probability (as a product of simpler probabilities) for the model above.
\item Using the above model, compute the following quantities. Show your work.
\begin{enumerate}[label=\textbf{\roman*.}]
\item \textbf{[6 pts]} What is the probability
\begin{center}
$p(\textbf{temperature} = cold, \hspace{0.1cm} \textbf{snow} = light, \hspace{0.1cm} \textbf{roads salted} = F, \hspace{0.1cm} \textbf{school cancellation} = T)$?
\end{center}
\item \textbf{[6 pts]} Compute the distribution $p(\textbf{snow} \mid \textbf{school cancellation} = T, \textbf{temperature} = cold)$.
\item \textbf{[6 pts]} Compute the distribution $p(\textbf{snow} \mid \textbf{school cancellation} = F, \textbf{temperature} = cold)$.
\item \textbf{[6 pts]} Compute the distribution $p(\textbf{school cancellation} \mid \textbf{snow} = light)$.
\item \textbf{[6 pts]} What is the probability
\begin{center}
$p(\textbf{school cancellation} = T \mid \textbf{temperature} = cold, \hspace{0.1cm} \textbf{snow} = light, \hspace{0.1cm} \textbf{roads salted} = F)$?
\end{center}
\end{enumerate}
\end{enumerate}
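As a sanity check on hand computations for the questions above, the conditional probability tables can be transcribed into code and the factorized joint verified to sum to 1. A minimal Python sketch (illustrative only; the variable names are ours):

```python
# Illustrative sketch: the CPTs above as Python dictionaries, plus a
# check that the factorized joint distribution sums to 1.

p_temp = {"cold": 0.4, "warm": 0.6}
p_snow = {  # p(snow | temperature)
    "cold": {"none": 0.4, "light": 0.4, "heavy": 0.2},
    "warm": {"none": 0.9, "light": 0.08, "heavy": 0.02},
}
p_salt = {  # p(roads salted | snow)
    "none":  {True: 0.01, False: 0.99},
    "light": {True: 0.9,  False: 0.1},
    "heavy": {True: 0.97, False: 0.03},
}
p_cancel = {  # p(school cancellation | snow, roads salted)
    ("none", True):  {True: 0.01, False: 0.99},
    ("none", False): {True: 0.01, False: 0.99},
    ("light", True):  {True: 0.2,  False: 0.8},
    ("light", False): {True: 0.4,  False: 0.6},
    ("heavy", True):  {True: 0.95, False: 0.05},
    ("heavy", False): {True: 0.99, False: 0.01},
}

def joint(t, s, r, c):
    """Joint probability, factorized according to the graph."""
    return p_temp[t] * p_snow[t][s] * p_salt[s][r] * p_cancel[(s, r)][c]

total = sum(joint(t, s, r, c)
            for t in p_temp for s in ("none", "light", "heavy")
            for r in (True, False) for c in (True, False))
print(round(total, 10))  # 1.0
```

If the transcription is correct, the joint must sum to 1 (up to floating-point error), since each CPT row sums to 1; this check catches copying mistakes before they propagate into the conditional computations.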
\section*{Problem 3: Extra Credit}
In this optional extra-credit problem, we will derive the VC dimension of half-spaces in $\mathbb{R}^n$. Note that all linear classifiers (logistic regression, the perceptron, support vector machines, etc.) fall into this class. Let $\HH_n$ be the set of half-spaces in $\mathbb{R}^n$.
\begin{enumerate}[label=\textbf{\alph*.}]
\item \textbf{[5 pts] Lower bound.} Prove that VC-dim$(\HH_n) \geq n+1$ by
presenting a set of $n+1$ points in $n$-dimensional space that
half-spaces can partition in all possible ways.
(That is, show explicitly how to realize each labeling with a half-space.)
\item \textbf{[5 pts] Upper bound.} The following is ``Radon's
Theorem,'' from the 1920s.
{\bf Theorem. } {\em Let $S$ be a set of $n+2$ points in $n$ dimensions.
Then $S$ can be partitioned into two (disjoint) subsets $S_1$ and
$S_2$ whose convex hulls intersect.}
Show that Radon's Theorem implies that the VC dimension of half-spaces
is {\em at most} $n+1$. Conclude that VC-dim$(\HH_n) = n+1$.
\item \textbf{[5 pts]} Now we prove Radon's Theorem. We
will need the following standard fact from linear algebra: if $x_1,
\ldots, x_{n+1}$ are $n+1$ points in $n$-dimensional space, then they
are linearly dependent. That is, there exist real values $\lambda_1,
\ldots, \lambda_{n+1}$, {\em not all zero}, such that $\lambda_1 x_1 +
\cdots + \lambda_{n+1} x_{n+1} = 0$.
%% originally had this, which is not exactly correct:
%% That is, there exist real values $\lambda_1, \ldots, \lambda_n$
%% such that $x_{n+1} = \lambda_1x_1 + \ldots + \lambda_nx_n$.
You may now prove Radon's Theorem however you wish. However, as a
suggested first step, prove the following. For any set of $n+2$ points $x_1,
\ldots, x_{n+2}$ in $n$-dimensional space, there exist $\lambda_1,
\ldots, \lambda_{n+2}$ {\em not all zero} such that $\sum_i \lambda_i
x_i = 0$ and $\sum_i \lambda_i = 0$. (This is called {\em affine
dependence}.) %%Now, think about the lambdas...
\end{enumerate}
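For intuition only (this is a numeric instance, not a proof), here is a Python sketch checking an affine dependence for four concrete points in $\mathbb{R}^2$, with coefficients chosen by hand:

```python
# Illustrative numeric check (not a proof): an affine dependence for
# four points in R^2, i.e. coefficients, not all zero, with
# sum(lam) == 0 and sum(lam_i * x_i) == 0.

pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
lam = [1, 1, -1, -1]  # hand-chosen: (0,0) + (1,1) - (1,0) - (0,1) = (0,0)

# The two defining conditions of affine dependence:
assert sum(lam) == 0
assert all(sum(l * p[i] for l, p in zip(lam, pts)) == 0 for i in range(2))

# Normalizing the positive-coefficient part gives a convex combination
# of {(0,0), (1,1)}; by the two conditions it equals the corresponding
# convex combination of {(1,0), (0,1)}.
pos = sum(l for l in lam if l > 0)
common = tuple(sum(l * p[i] for l, p in zip(lam, pts) if l > 0) / pos
               for i in range(2))
print(common)  # (0.5, 0.5)
```

Here $(0.5, 0.5)$ lies on both diagonals of the unit square, i.e. in the convex hulls of both $\{(0,0),(1,1)\}$ and $\{(1,0),(0,1)\}$, which is the phenomenon Radon's Theorem guarantees in general.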
\end{document}