\documentclass[12pt]{article}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{latexsym}
\usepackage{fullpage}
\usepackage[parfill]{parskip}
\usepackage{mysymbols}
\title{10708 Graphical Models: Homework 3\\
{\small Due October 29th, beginning of class} }
\date{October 15, 2006}
\begin{document}
\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\bf Instructions}: There are six questions on this assignment. Each
question has the name of one of the TAs beside it; direct any
inquiries regarding a question to that TA. Please submit your
homework in two parts, one for each TA, and put the TA's name at the
top of each part.
\textit{Note}: Starting with this homework, you \textit{will} be penalized
points for not splitting the homework.
The
last problem involves coding, which should be done in MATLAB. Do
{\it not} attach your code to the writeup. Instead, copy your
implementation to
\begin{center}
\begin{verbatim}
/afs/andrew.cmu.edu/course/10/708/Submit/your_andrew_id/HW3
\end{verbatim}
\end{center}
Refer to the web page for policies regarding collaboration, due dates, and extensions.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Variable Elimination {\small Dhruv [10 pts]}}
\begin{figure}[ht]
\begin{center}
\includegraphics[bb= 0 0 468 544,scale=0.5]{ve2.png}
\caption{World Domination network}
\label{fig:wd}
\end{center}
\end{figure}
Your friendly neighbourhood TA, like most other TAs, is intent on world
domination. His first step, obviously, was to build a graphical model,
as shown in Figure~\ref{fig:wd}. The variables are Graduate (G),
Free Food (FF), The Force (TF), Knowledge (K), Money (M), Power (R),
and World Domination (WD). All variables are binary-valued, $\{T, F\}$.
The CPT parameters are:
\beq
P(G = T|TF = T, FF = T) = 0.9,& P(G = T|TF = T, FF = F) = 0.7\\
P(G = T|TF = F, FF = T) = 0.5,& P(G = T|TF = F, FF = F) = 0.1
\eeq
\beq
P(FF = T|TF = T) = 0.8,& P(FF = T|TF = F) = 0.6\\
P(TF = T) = 0.1&\\
P(K = T|G = T) = 0.7,& P(K = T|G = F) = 0.6\\
P(R = T|M = T) = 0.7,& P(R = T|M = F) = 0.1\\
P(M = T|G = T) = 0.6,& P(M = T|G = F) = 0.5
\eeq
\beq
P(WD = T|M = T, R = T) = 0.7, &P(WD = T|M = T, R = F) = 0.5\\
P(WD = T|M = F, R = T) = 0.6, &P(WD = T|M = F, R = F) = 0.05
\eeq
Help your friendly TA make some urgent inferences from his world domination network; but make sure
you're not baited by his nemesis: Exponential Computational Complexity.
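For concreteness, the CPTs above can be stored as multidimensional arrays. The sketch below (Python with NumPy, for illustration only; the coding portion of this assignment is done in MATLAB) encodes three of them, indexing values as $0 = F$, $1 = T$:

```python
import numpy as np

# Illustrative encoding of three CPTs from the World Domination network,
# with index 0 = F and index 1 = T.  (Python/NumPy sketch only; the
# assignment itself is to be coded in MATLAB.)
p_tf = np.array([0.9, 0.1])                  # P(TF)
p_ff_given_tf = np.array([[0.4, 0.6],        # P(FF | TF = F)
                          [0.2, 0.8]])       # P(FF | TF = T)
p_g = np.array([[[0.9, 0.1], [0.5, 0.5]],    # P(G | TF = F, FF = F/T)
                [[0.3, 0.7], [0.1, 0.9]]])   # P(G | TF = T, FF = F/T)

# Sanity check: each conditional distribution sums to 1.
assert np.allclose(p_ff_given_tf.sum(axis=-1), 1.0)
assert np.allclose(p_g.sum(axis=-1), 1.0)
```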
\ben
\item
How likely is the TA to take over the world, if he manages to graduate? $P(WD = T|G = T)$ = ?
\item
$P(WD = T|FF = T)$ = ?
\item
Should we even be worried about him graduating? $P(G = T)$ = ?
\item
$P(M = T|K = T)$ = ?
\een
Additionally, for the first query [$P(WD = T | G = T)$], report the elimination ordering you used and the factor produced after eliminating each variable.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conditional Probabilities in Variable Elimination {\small Amr [13 pts]}}
Consider a factor produced as a product of some of the CPDs in a Bayesian
network $\mathcal{B}$:
\beq
\tau({\bf W}) = \prod_{i = 1}^{k}P(Y_{i}|{\bf Pa}_{Y_{i}})
\eeq
where ${\bf W} = \cup_{i=1}^{k}(\{Y_{i}\}\cup{\bf Pa}_{Y_{i}})$.
\ben
\item
Show that $\tau$ is a conditional probability \emph{in some
network}. \emph{Hint}: Partition $\bf W$ into two disjoint sets,
$\bf Y$ and $\bf Z$, i.e., ${\bf W} = {\bf Y} \cup {\bf Z}$, and
${\bf Y} \cap {\bf Z}= \emptyset$. Then show that $\tau({\bf W}) =
P({\bf Y}|{\bf Z})$. See also \textbf{KF, Section 8.3.1.3}.
\item
Show that the intermediate factors produced by the variable
elimination algorithm are also conditional probabilities \emph{in
some network}.
\een
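Before attempting the proof in part 1, it can help to check the claim numerically on a small instance. The sketch below (Python/NumPy, illustration only) forms $\tau(G, FF, TF) = P(G|TF,FF)\,P(FF|TF)$ from the CPTs of Question 1 and verifies that, for each value of $TF$, $\tau$ sums to 1 over $(G, FF)$, i.e.\ that $\tau = P(G, FF \mid TF)$:

```python
import numpy as np

# tau(G, FF, TF) = P(G | TF, FF) * P(FF | TF), using the Question 1 CPTs
# (index 0 = F, 1 = T).  If tau is the conditional P(G, FF | TF), it must
# sum to 1 over (FF, G) for each fixed value of TF.
p_ff_given_tf = np.array([[0.4, 0.6], [0.2, 0.8]])       # axes [TF, FF]
p_g_given_tf_ff = np.array([[[0.9, 0.1], [0.5, 0.5]],
                            [[0.3, 0.7], [0.1, 0.9]]])   # axes [TF, FF, G]

tau = p_g_given_tf_ff * p_ff_given_tf[:, :, None]        # axes [TF, FF, G]
print(tau.sum(axis=(1, 2)))   # one distribution per TF value: [1. 1.]
```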
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q3
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Triangulation {\small Dhruv [7 pts]}}
\label{sec:triangulate}
\begin{figure}[!h]
\begin{center}
\includegraphics[bb=0 0 308 320,width=2in]{triangulate.png}
\caption{Bayes net for question~\ref{sec:triangulate}}
\label{fig:triangulate}
\end{center}
\end{figure}
\begin{enumerate}
\item Moralize the Bayes net in Figure~\ref{fig:triangulate}.
\item Supply a perfect elimination ordering ({\it i.e.}, one that yields no fill edges). \label{elim-ord1}
\item Supply an elimination ordering that yields a triangulated graph with at least 5 nodes in one or more cliques. \label{elim-ord2}
\item Draw clique trees for the elimination orderings in parts \ref{elim-ord1} and \ref{elim-ord2}.
\end{enumerate}
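When checking candidate orderings by hand, it is easy to miscount fill edges. A small helper like the one below (Python, illustration only; the function name and example graph are our own) simulates elimination on an undirected graph and counts the fill edges an ordering introduces; a perfect elimination ordering is one for which the count is 0:

```python
from itertools import combinations

def count_fill_edges(edges, order):
    """Count fill edges added when eliminating vertices in the given order."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill = 0
    for v in order:
        neighbors = list(adj[v])
        for a, b in combinations(neighbors, 2):
            if b not in adj[a]:        # a fill edge: connect v's neighbors
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
        for u in neighbors:            # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return fill

# A 4-cycle is not triangulated: every ordering adds at least one chord.
print(count_fill_edges([(1, 2), (2, 3), (3, 4), (4, 1)], [1, 2, 3, 4]))  # 1
```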
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Clique Tree Representation {\small Dhruv [20 pts]}}
In this question you will formalize the relationship between clique
trees for a Bayesian network and the probability distribution the
network encodes. In summary, if $P$ factorizes according to a
Bayesian network, then any clique tree $\mathcal{T}$ for this BN is
an I-map for $P$; moreover, $P$ also factorizes according to
$\mathcal{T}$ in a way that we will make explicit below.
\begin{enumerate}
\item \textbf{[Clique tree I-map]} In a clique tree, consider a
separator ${\bf S}_{ij}$ between two cliques ${\bf C}_{i}$ and ${\bf
C}_{j}$. Let ${\bf X}$ be any set of variables in the ${\bf C}_{i}$
side of the tree, and ${\bf Y}$ be any set of variables in the ${\bf
C}_{j}$ side of the tree. \textbf{Prove} that $P \models ({\bf X}
\perp {\bf Y} \mid {\bf S}_{ij})$.\\
(Hint: Consider using an independence property we derived in HW1)
\item \textbf{[Clique tree factorization]} Using the independencies
above, \textbf{prove} that in a clique tree for a BN, \emph{when the
clique tree is calibrated}, we can represent the joint distribution
by:
\[
P({\bf X}) = \frac{\prod_i P({\bf C}_i)}{\prod_{ij} P({\bf S}_{ij})}.
\]
You may not ``prove'' this as a corollary of the correctness of BP in
clique trees.
\\
(Hint: combine the chain rule of probabilities with the definition of
conditional probabilities.)
\end{enumerate}
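As a quick sanity check of the formula (not a substitute for the requested proof), consider the smallest nontrivial case: a chain $A - B - C$ with cliques ${\bf C}_1 = \{A, B\}$, ${\bf C}_2 = \{B, C\}$, and separator ${\bf S}_{12} = \{B\}$. The chain rule together with $(A \perp C \mid B)$ gives
\[
P(A, B, C) = P(A, B)\, P(C \mid A, B) = P(A, B)\, P(C \mid B) = \frac{P(A, B)\, P(B, C)}{P(B)},
\]
which is exactly the claimed form $\prod_i P({\bf C}_i) / \prod_{ij} P({\bf S}_{ij})$.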
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Variable Elimination in Clique Trees {\small Dhruv [20 pts]}}
Consider a chain graphical model with the structure $X_1 - X_2 -
\cdots - X_n$, where each $X_i$ takes on one of $d$ possible
assignments. You can form the following clique tree for this GM:
$\textbf{C}_1 - \textbf{C}_2 - \cdots - \textbf{C}_{n-1}$, where
$Scope[\textbf{C}_i] = \{X_i, X_{i+1}\}$. You can assume that this
clique tree has already been calibrated. Using this clique tree, we
can directly obtain $P(X_i, X_{i+1})$. As promised in class, your goal
in this question is to compute $P(X_i, X_j)$ for any $j > i$.
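To make the ``directly obtain'' step concrete: in a calibrated tree, each clique belief is proportional to the marginal over its scope, so $P(X_i, X_{i+1})$ is just the normalized belief. A toy sketch (Python/NumPy, with made-up numbers, illustration only):

```python
import numpy as np

# Unnormalized calibrated belief of clique C_i over (X_i, X_{i+1}),
# with d = 2 values per variable.  Toy numbers for illustration only.
beta = np.array([[2.0, 1.0],
                 [1.0, 4.0]])

pairwise = beta / beta.sum()      # P(X_i, X_{i+1}): normalize the belief
marginal = pairwise.sum(axis=1)   # P(X_i): sum out X_{i+1}
print(pairwise)
print(marginal)                   # [0.375 0.625]
```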
\begin{enumerate}
%\subsection{}
\item Briefly describe how variable elimination can be used
to compute $P(X_i,X_j)$, for some $j>i$, in linear time, given the
calibrated clique tree.
%\subsection{}
\item What is the running time of the algorithm in part 5.1
if you wanted to compute $P(X_i,X_j)$ for all $\binom{n}{2}$ choices
of $i$ and $j$?
%\subsection{}
\item Consider a particular chain $X_1 - X_2 - X_3 - X_4$.
Show that by caching $P(X_1,X_3)$, you can compute $P(X_1,X_4)$ more
efficiently than directly applying variable elimination as described
in part 5.1.
%\subsection{}
\item Using the intuition from part 5.3, design a dynamic
programming algorithm (caching partial results) which computes
$P(X_i,X_j)$ for all $\binom{n}{2}$ choices of $i$ and $j$ in time
asymptotically much lower than the complexity you described in part
5.2. What is the asymptotic running time of your algorithm?
\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Q6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Variable Elimination {\small Amr [30 pts]}}
\subsection{[15 pts]}
Implement the variable elimination algorithm from class. You should
\textbf{not} implement pruning for inactive variables, but we
\textbf{require} that you implement the min-fill heuristic in order
to select an elimination order. However, we will \textbf{NOT}
require that your min-fill implementation be \emph{query-specific},
\emph{i.e.}, you should apply the min-fill heuristic once on
the whole graph to get an elimination order, which will then be used
to answer all the queries\footnote{In practice, as we discussed
in class, taking the evidence variables into consideration results in
a better elimination order; however, for simplicity we
will ignore this point.}.
You can reuse any code you wrote for HW2, and you are free to use
the posted solution for HW2.
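For reference, the min-fill heuristic can be sketched as follows (Python, illustration only; your submitted implementation must be in MATLAB, and the function name and tie-breaking rule here are our own choices): repeatedly eliminate the vertex whose elimination would add the fewest fill edges, connecting its neighbors as you go.

```python
from itertools import combinations

def min_fill_order(edges, nodes):
    """Greedy min-fill elimination ordering; ties broken arbitrarily."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = []
    while adj:
        # Fill edges that eliminating v would add right now.
        def fill_cost(v):
            return sum(1 for a, b in combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill_cost)
        for a, b in combinations(adj[v], 2):   # connect v's neighbors
            adj[a].add(b)
            adj[b].add(a)
        for u in adj[v]:                       # remove v from the graph
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order

# On a chain, every elimination step is fill-free.
print(min_fill_order([(1, 2), (2, 3), (3, 4)], [1, 2, 3, 4]))
```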
\textbf{Important:} Submit your implementation to your AFS code
directory, and in addition submit a script called
\texttt{run.m} that, when invoked, reproduces all the output
in parts 6.1 and 6.2\footnote{Yes, we are aware that for 6.2 the
results depend on the machine used to run your code, but that is
fine.}. Moreover, answer the following questions in your writeup,
reporting all probabilities to \emph{four significant digits}.
\begin{enumerate}
\item [1] Using the network in Figure~\ref{fig:wd}, what is the elimination order
produced by your min-fill implementation, and how many fill edges does
it add?
\item [2] Using the Alarm network in \texttt{alarm.m}, compute the values
of the following queries.
\begin{enumerate}
\item[(a)] How many fill edges does your min-fill implementation add to this network?
\item[(b)] $P$(StrokeVolume = High$\;|\;$Hypovolemia = True, ErrCauter = True, PVSat = Normal, Disconnect = True, MinVolSet = Low)
\item[(c)] $P$(HRBP = Normal $\;|\;$ LVEDVolume = Normal, Anaphylaxis = False, Press = Zero, VentTube = Zero, BP = High)
\item[(d)] $P$(LVFailure = False $\;|\;$ Hypovolemia = True, MinVolSet = Low, VentLung = Normal, BP = Normal)
\item[(e)] $P$(PVSat = Normal, CVP = Normal $\;|\;$ LVEDVolume = High, Anaphylaxis = False, Press = Zero)
\end{enumerate}
\end{enumerate}
\subsection{[5 pts]}
Another naive way of answering a conditional probability query of
the form $P(X=x|Y=y)$ is as follows:
\begin{equation}\label{eq:naive}
P(X=x|Y=y) = \frac{P(X=x,Y=y)}{P(Y=y)} =
\frac{\sum_{z}{P(X=x,Y=y,Z=z)}}{\sum_{x',z}{P(X=x',Y=y,Z=z)}}
\end{equation}
where $X$ and $Y$ are two subsets of the variables in the network,
and $Z = \mathrm{Var(Network)} \setminus (X \cup Y)$, i.e., all the
variables in the network other than those in $X$ and $Y$. Each of the
terms in the above summations can be evaluated by simple
multiplications of the factors in the network, as you were asked to
implement in HW2. We would like to compare the time consumed by the
variable elimination algorithm you implemented and the above naive
approach. However, as you will see below, you should NOT attempt to
run the naive approach on the \emph{alarm} network. Instead, we will
estimate its running time as follows:
\begin{enumerate}
\item Compute the time needed to evaluate each of the terms in the
summations above --- that is, the time needed to execute a call to
\emph{assignProb}. You can do that by simply running
\emph{assignProb} once using any random assignment to the variables
in the network. \textbf{Write} down this time in your submission.
\item Compute the number of calls to \emph{assignProb} needed to evaluate
Eq.~\eqref{eq:naive}. \textbf{Write} down a formula for this number of
calls in terms of the dimensionality of the variables in $X$, $Y$, and $Z$.
\item Estimate the time needed by the naive approach based on the
above two quantities\footnote{We will ignore the time needed to sum
the terms in the summations in Eq.~\eqref{eq:naive}.}.
\end{enumerate}
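To see where the call count in step 2 comes from, here is the naive approach of Eq.~(\ref{eq:naive}) on a toy three-variable chain (Python sketch with made-up CPTs; \texttt{assign\_prob} plays the role of \emph{assignProb}). The denominator alone costs $|Val(A)| \cdot |Val(B)| = 4$ full-joint evaluations:

```python
from itertools import product

# Naive evaluation of P(X=x | Y=y) by enumerating full joint assignments,
# on a toy chain A -> B -> C with made-up binary CPTs.  Each term in the
# sums is one full-joint evaluation (the role of assignProb in HW2).
p_a = [0.6, 0.4]                     # P(A)
p_b = [[0.7, 0.3], [0.2, 0.8]]       # P(B | A)
p_c = [[0.9, 0.1], [0.5, 0.5]]       # P(C | B)

def assign_prob(a, b, c):            # one full-joint term
    return p_a[a] * p_b[a][b] * p_c[b][c]

# P(A=1 | C=1): sum out B in the numerator, (A, B) in the denominator.
num = sum(assign_prob(1, b, 1) for b in range(2))
den = sum(assign_prob(a, b, 1) for a, b in product(range(2), range(2)))
print(num / den)                     # 0.56
```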
Using the above procedure, compare the running times of variable
elimination and the naive approach on each of the queries in part
6.1.2.
\subsection{[10 pts]}
Create a na\"{i}ve Bayes network on binary variables ${\cal V} = \{C,X_1,\ldots,X_k\}$ where $C$ is the class and $X_1,X_2,\ldots,X_k$ are the features. Choose some parameterization of the network such that each parameter $0 < \theta_{x_i|{\bf u}_i} < 1$. In the context of variable elimination
\begin{enumerate}
\item[1] What ordering on ${\cal V}$ has minimum induced treewidth (call it $\prec_{o}$)?
\item[2] What ordering on ${\cal V}$ has maximum induced treewidth (call it $\prec_{w}$)?
\end{enumerate}
Using your implementation of variable elimination:
\begin{enumerate}
\item[3] For $k = 1 \ldots 10$ compute $\sum_{\cal V}P(C,X_1,\ldots,X_k)$ using $\prec_{o}$ and $\prec_{w}$. Plot the running time of each method vs. $k$.
\\
{\small ({\it Note}: Yes, we know that the quantity you are computing is 1.0. The point of the question is to compare running times.)}
\end{enumerate}
\end{document}