\documentclass[11pt,twoside]{scrartcl}
%opening
\newcommand{\lecid}{15-414}
\newcommand{\leccourse}{Bug Catching: Automated Program Verification}
\newcommand{\lecdate}{} %e.g. {October 21, 2013}
\newcommand{\lecnum}{24}
\newcommand{\lectitle}{CEGAR \& Craig Interpolation}
\newcommand{\lecturer}{Matt Fredrikson}
\usepackage{lecnotes}
\usepackage[irlabel]{bugcatch}
\usepackage{tikz}
\usetikzlibrary{automata,shapes,positioning,matrix,shapes.callouts,decorations.text,patterns,trees}
\usepackage{listings}
\definecolor{mygray}{rgb}{0.5,0.5,0.5}
\definecolor{backgray}{gray}{0.95}
\lstdefinestyle{whyml}{
belowcaptionskip=1\baselineskip,
breaklines=true,
language=[Objective]Caml,
showstringspaces=false,
numbers=left,
xleftmargin=2em,
framexleftmargin=1.5em,
numbersep=5pt,
numberstyle=\tiny\color{mygray},
basicstyle=\footnotesize\ttfamily,
keywordstyle=\color{blue},
commentstyle=\itshape\color{purple!40!black},
tabsize=2,
backgroundcolor=\color{backgray},
escapechar=\%,
morekeywords={predicate,invariant}
}
%% \traceget{v}{i}{\zeta} is the state of trace v at time \zeta of the i-th discrete step
\newcommand{\traceget}[3]{{#1}_{#2}(#3)}
\def\limbo{\mathrm{\Lambda}}
%% the last state of a trace
\DeclareMathOperator{\tlast}{last}
%% the first state of a trace
\DeclareMathOperator{\tfirst}{first}
\begin{document}
\lstset{
basicstyle=\ttfamily\small,
mathescape
}
%% the name of a trace
\newcommand{\atrace}{\sigma}%
%% the standard interpretation naming conventions
\newcommand{\stdI}{\dTLint[state=\omega]}%
\newcommand{\Ip}{\dTLint[trace=\atrace]}%
\def\I{\stdI}%
% \let\tnext\ctnext
% \let\tbox\ctbox
% \let\tdiamond\ctdiamond
\maketitle
\thispagestyle{empty}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
In the previous lecture we saw how to create a Kripke structure whose language is equivalent to the trace semantics of a program. However, this structure is problematic for model checking because it may have infinitely many states. We began describing a way to address this using predicate abstraction, which overapproximates the Kripke structure by partitioning its states into a finite number of abstract states.
Today we will continue with predicate abstraction, and see how to create an abstract transition structure for an arbitrary program. The good news is that it is always feasible to do so, as there are finitely many abstract states and the transitions can be computed using familiar techniques. The bad news is that crucial information often gets lost in the approximation, leaving us unable to find real bugs or verify their absence. We'll see how to incrementally fix this using a technique called refinement, which leads to interesting new questions about automated software verification.
\section{Review: Predicate abstraction}
\begin{definition}\label{def:concret}
Given a set of predicates $A \subseteq \hat{\Sigma}$, let $\gamma(A)$ be the set of program states $\sigma \in \mathcal{S}$ that satisfy the conjunction of predicates in $A$:
\[
\textstyle
\gamma(A) = \{\sigma\in\mathcal{S} : \sigma\models\bigwedge_{a\in A} a\}
\]
\end{definition}
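As a small illustration of Definition~\ref{def:concret}, suppose that $\hat{\Sigma} = \{0 \le i,\ 0 \le x\}$. The second predicate is an assumption made only for this example and is not used in the rest of these notes. Then
\begin{align*}
\gamma(\{0\le i\}) &= \{\sigma\in\mathcal{S} : \sigma\models 0\le i\}, \\
\gamma(\{0\le i,\ 0\le x\}) &= \{\sigma\in\mathcal{S} : \sigma\models 0\le i \land 0\le x\}, \\
\gamma(\emptyset) &= \mathcal{S},
\end{align*}
where the last line holds because the empty conjunction is true in every state. Adding predicates to $A$ can only shrink $\gamma(A)$.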
\begin{definition}[Abstract Transition Structure]\label{def:abstrans}
Given a program $\asprg$, a set of abstract atomic predicates $\hat{\Sigma}$, and control flow transition relation $\epsilon(\asprg)$, let $L$ be a set of \emph{locations} given by the inductively-defined function $\plocs{\asprg}$, $\ilocs{\asprg}$ be the \emph{initial} locations of $\asprg$, and $\flocs{\asprg}$ be the \emph{final} locations of $\asprg$. The abstract transition structure $\hat{K_\asprg} = (\hat{W},\hat{I},\hat{\stepto},\hat{v})$ is a tuple containing:
\begin{itemize}
\item $\hat{W} = \plocs{\asprg} \times \powerset{\hat{\Sigma}}$ are the states defined as pairs of program locations and sets of abstraction predicates.
\item $\hat{I} = \{\langle\ell, A\rangle \in \hat{W} : \ell \in \ilocs{\asprg}\}$ are the initial states corresponding to initial program locations.
\item $\hat{\stepto} = \{(\langle\ell, A\rangle, \langle\ell', A'\rangle) :~\text{for}~(\ell, \bsprg, \ell') \in \epsilon(\asprg)~\text{where there exist}~\sigma,\sigma'~\text{such that}~\sigma\in\gamma(A), \sigma'\in\gamma(A')~\text{and}~(\sigma,\sigma')\in\llbracket\bsprg\rrbracket\}$ is the transition relation.
\item $\hat{v}(\langle\ell, A\rangle) = \langle\ell, A\rangle$ is the labeling function, which is in direct correspondence with states.
\end{itemize}
\end{definition}
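To get a feel for the size of $\hat{W}$, here is a rough count. It assumes a program whose locations are exactly five labelled points $\ell_0,\ldots,\ell_4$, as in the example used later in these notes, and a predicate set with $n$ elements:
\[
|\hat{W}| = |\plocs{\asprg}| \cdot |\powerset{\hat{\Sigma}}| = 5 \cdot 2^{n}.
\]
With the single predicate $\hat{\Sigma} = \{0 \le i\}$ this gives $5\cdot 2^1 = 10$ abstract states, so the structure is finite even though $\mathcal{S}$ is not. The count doubles with every predicate that is added.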
\begin{theorem}\label{thm:existential}
For any trace $\langle\ell_0,\sigma_0\rangle,\langle\ell_1,\sigma_1\rangle,\ldots$ of $K_\asprg$, there exists a corresponding trace of $\hat{K_\asprg}$ $\langle\hat{\ell}_0,A_0\rangle,\langle\hat{\ell}_1,A_1\rangle,\ldots$ such that for all $i\ge0$, $\ell_i = \hat{\ell}_i$ and $\sigma_i \in \gamma(A_i)$.
\end{theorem}
\begin{theorem}\label{thm:abstrans}
Let $A, B \subseteq \hat{\Sigma}$ be sets of predicates over program states, and $\bsprg$ be a program. Then there exist states $\sigma\in\gamma(A)$ and $\sigma'\in\gamma(B)$ such that $(\sigma,\sigma')\in\llbracket\bsprg\rrbracket$ if and only if $\bigwedge_{a\in A} a \limply \dibox{\bsprg}{\bigvee_{b\in B} \lnot b}$ is not valid.
\end{theorem}
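As a worked instance of Theorem~\ref{thm:abstrans}, take $A = B = \{0 \le i\}$ and let $\bsprg$ be the assignment $\pumod{i}{i-1}$. The formula to check is
\[
0\le i \limply \dibox{\pumod{i}{i-1}}{\lnot(0\le i)}
\quad\lbisubjunct\quad
0\le i \limply 0 > i-1 .
\]
It is falsified by any state with $i = 1$, so it is not valid, and the theorem tells us that the abstract edge exists. A concrete witness is a state $\sigma$ with $i=1$ and its successor $\sigma'$ with $i=0$. For contrast, consider the hypothetical choice $B = \{0 > i\}$ with $\bsprg$ being $\pumod{i}{i+1}$, neither of which is taken from the running example. The formula becomes $0\le i \limply 0 \le i+1$, which is valid, so no such edge exists.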
\paragraph{Spurious counterexamples}
We looked at a modification of the earlier example from the previous lecture.
\begin{lstlisting}
$\ell_0$: i := abs(N)+1;
$\ell_1$: while(0 $\le$ x $<$ N) {
$\ell_2$: i := i - 1;
$\ell_3$: x := x + 1;
$\ell_4$: }
\end{lstlisting}
This yields the following control flow transitions.
\begin{center}
\includegraphics[width=0.45\textwidth]{controlflow4.pdf}
\end{center}
Consider the following counterexample on the predicate set $\hat{\Sigma} = \{0\le i\}$.
\begin{enumerate}
\item $\langle\ell_0,0\le i\rangle\hat{\stepto}\langle\ell_1,0\le i\rangle$. This edge is in $\hat{K_\asprg}$ because $0\le i\limply\dibox{\pumod{i}{\mathtt{abs}(N)+1}}{0>i}$ is equivalent to $0\le i\limply0>\mathtt{abs}(N)+1$, which is not valid.
\item $\langle\ell_1,0\le i\rangle\hat{\stepto}\langle\ell_2,0\le i\rangle$. This edge exists because $0\le i\limply\dibox{\ptest{0\le x<N}}{0>i}$ is equivalent to $0\le i\limply(0\le x<N\limply0>i)$, which is not valid.
\item $\langle\ell_2,0\le i\rangle\hat{\stepto}\langle\ell_3,0>i\rangle$. This edge exists because $0\le i\limply\dibox{\pumod{i}{i-1}}{0\le i}$ is equivalent to $0\le i\limply0 \le i-1$ and is not valid, seen from the assignment $i=0$.
\item $\langle\ell_3,0>i\rangle\hat{\stepto}\langle\ell_1,0>i\rangle$. This edge exists because $0>i\limply\dibox{\pumod{x}{x+1}}{0\le i}$ is equivalent to $0>i\limply0\le i$, which is not valid.
\item $\langle\ell_1,0>i\rangle\hat{\stepto}\langle\ell_4,0>i\rangle$. This edge exists because $0>i\limply\dibox{\ptest{\lnot(0\le x<N)}}{0\le i}$ is equivalent to $0>i\limply(\lnot(0\le x<N)\limply0\le i)$, which is not valid.
\end{enumerate}
This abstract trace ends in a state where $0>i$. This leads us to ask whether the following DL formula is valid:
\[
0 \le i \limply \dibox{\pumod{i}{\mathtt{abs}(N)+1};\ptest{0\le x<N};\pumod{i}{i-1};\pumod{x}{x+1};\ptest{\lnot(0\le x<N)}}{0\le i}
\]
It is valid, so the counterexample is spurious: it is a trace of the abstraction that no concrete execution follows. To rule it out, we write down the constraints that a concrete execution along this path would have to satisfy:
\[
i_0 = |N|+1 \land 0\le x<N \land i_1 = i_0-1 \land x_1 = x+1 \land \lnot(0\le x_1<N) \land 0 > i_1
\]
This is called the \emph{path formula} for the counterexample. Each conjunct other than the last corresponds to a constraint imposed by executing one of the statements on the path. Assignments introduce equalities, and tests introduce direct assertions; updates to variables are accounted for by indexing variable names in the manner of static single-assignment (SSA) form. The final conjunct is the negated safety property.
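A short derivation shows why no concrete execution can satisfy these constraints. It assumes that the SSA names $i_0$ and $i_1$ denote the value of $i$ after the first and second assignment on the path, and it uses only the two equalities contributed by those assignments:
\begin{align*}
i_0 &= |N|+1 && \text{(from $\pumod{i}{\mathtt{abs}(N)+1}$)} \\
i_1 &= i_0 - 1 = |N| && \text{(from $\pumod{i}{i-1}$)} \\
i_1 &\ge 0 && \text{(since $|N| \ge 0$)}
\end{align*}
The last line contradicts the negated safety property $0 > i_1$, so the path formula is unsatisfiable. Note that neither test on $x$ is needed for the contradiction.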
Now consider the interpolant corresponding to:
\[
\begin{array}{ll}
\ausfml &\equiv i_0 = |N|+1 \\
\busfml &\equiv 0\le x<N \land i_1 = i_0-1 \land x_1 = x+1 \land \lnot(0\le x_1<N) \land 0 > i_1
\end{array}
\]
Such an interpolant is a fact about the program state that must hold immediately after executing the assignment $\pumod{i}{\mathtt{abs}(N)+1}$. One such fact is $i_0 = |N| + 1$ (i.e., just $\ausfml$). But another is $i_0 > |N|-x$. We can repeat this for each step of the counterexample, moving one more conjunct of the path formula from $\busfml$ to $\ausfml$ each time and deriving interpolants along the way to learn useful facts.
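\begin{itemize}
\item
$\ausfml \equiv i_0 = |N|+1$,\\
$\busfml \equiv 0\le x<N \land i_1 = i_0-1 \land x_1 = x+1 \land \lnot(0\le x_1<N) \land 0 > i_1$,\\
$\cusfml \equiv i_0 > |N| - x$
\item
$\ausfml \equiv i_0 = |N|+1 \land 0\le x<N$,\\
$\busfml \equiv i_1 = i_0-1 \land x_1 = x+1 \land \lnot(0\le x_1<N) \land 0 > i_1$,\\
$\cusfml \equiv i_0 > |N|-x \land 0 \le x<N$
\item
$\ausfml \equiv i_0 = |N|+1 \land 0\le x<N \land i_1 = i_0-1$,\\
$\busfml \equiv x_1 = x+1 \land \lnot(0\le x_1<N) \land 0 > i_1$,\\
$\cusfml \equiv i_1 \ge |N|-x \land 0 \le x<N$
\item
$\ausfml \equiv i_0 = |N|+1 \land 0\le x<N \land i_1 = i_0-1 \land x_1 = x+1$,\\
$\busfml \equiv \lnot(0\le x_1<N) \land 0 > i_1$,\\
$\cusfml \equiv i_1 > |N|-x_1 \land 0\le x_1\le N$
\item
$\ausfml \equiv i_0 = |N|+1 \land 0\le x<N \land i_1 = i_0-1 \land x_1 = x+1 \land \lnot(0\le x_1<N)$,\\
$\busfml \equiv 0 > i_1$,\\
$\cusfml \equiv 0 \le i_1$
\end{itemize}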
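As a sanity check on the final step, recall the usual requirements on a Craig interpolant $\cusfml$ for a pair $\ausfml,\busfml$ whose conjunction is unsatisfiable: $\ausfml\limply\cusfml$ is valid, $\cusfml\land\busfml$ is unsatisfiable, and $\cusfml$ mentions only symbols that occur in both $\ausfml$ and $\busfml$. This is the standard formulation, stated here as an assumption about the definition in use. Take $\ausfml$ to be all path constraints up to and including the loop-exit test, $\busfml \equiv 0 > i_1$, and $\cusfml \equiv 0 \le i_1$. Then
\begin{align*}
\ausfml \limply \cusfml &: && i_1 = i_0 - 1 = (|N|+1)-1 = |N| \ge 0, \\
\cusfml \land \busfml &: && 0 \le i_1 \land 0 > i_1 ~\text{is unsatisfiable}, \\
\text{symbols of}~\cusfml &: && i_1 ~\text{occurs in both}~\ausfml~\text{and}~\busfml .
\end{align*}
So $0 \le i_1$ separates what the path establishes from what the violation would require.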
We can obtain a set of predicates with which to refine our abstraction by dropping the subscripts from each interpolant above. This leaves us with
\[
\hat{\Sigma} = \{0 \le i, i > 0, i > |N|-x, i \ge |N|-x, 0 \le x < N, 0 \le x \le N\}
\]
Using these predicates will ensure that we do not encounter the same spurious counterexample again in future attempts at model checking. To see why, notice the following sequence of observations.
\begin{enumerate}
\item After executing the first assignment $\pumod{i}{\mathtt{abs}(N)+1}$, we have that $i > 0$ must hold. In other words, $\dibox{\pumod{i}{\mathtt{abs}(N)+1}}{i > 0} \lbisubjunct |N|+1 > 0$ is valid.
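\item After executing the test $\ptest{0\le x < N}$ (i.e., entering the loop) starting in a state where $i > 0$ holds, $i > |N|-x \land 0 \le x < N$ holds. This follows from the validity of $(i > 0 \limply \dibox{\ptest{0\le x < N}}{(i > |N|-x \land 0\le x < N)}) \lbisubjunct (i > 0 \land 0\le x < N \limply i > |N| - x \land 0 \le x < N)$.
\item Starting in a state where $i > |N|-x \land 0\le x < N$ holds and executing the assignment $\pumod{i}{i-1}$ leads to a state where $i \ge |N|-x \land 0\le x < N$ holds. This follows from the validity of $(i > |N|-x \land 0\le x < N \limply \dibox{\pumod{i}{i-1}}{(i \ge |N|-x \land 0\le x < N)}) \lbisubjunct (i > |N|-x \land 0\le x < N \limply i-1 \ge |N|-x \land 0\le x < N)$.
\item Starting in a state where $i \ge |N|-x \land 0\le x < N$ holds and executing the assignment $\pumod{x}{x+1}$ leads to a state where $i > |N|-x \land 0\le x\le N$ holds. This follows from the validity of $(i \ge |N|-x \land 0\le x < N \limply \dibox{\pumod{x}{x+1}}{(i > |N|-x \land 0 \le x \le N)}) \lbisubjunct (i \ge |N|-x \land 0\le x < N \limply i > |N|-(x+1) \land 0\le x+1\le N)$.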
\item Starting in a state where $i > |N|-x \land 0\le x\le N$ holds and executing the test $\ptest{\top}$ (i.e., going back to the top of the loop) leads to a state where $i > 0$ necessarily holds. This follows from the validity of $(i > |N|-x \land 0 \le x \le N \limply \dibox{\ptest{\top}}{i > 0}) \lbisubjunct (i > |N|-x \land 0 \le x \le N \limply i > 0)$.
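\item Starting in a state where $i > 0$ holds and executing the test $\ptest{\lnot(0\le x < N)}$ (i.e., exiting the loop) leads to a state where $0 \le i$ necessarily holds. This follows from the validity of $(i > 0 \limply \dibox{\ptest{\lnot(0 \le x < N)}}{0 \le i}) \lbisubjunct (i > 0 \land \lnot(0 \le x < N) \limply 0 \le i)$.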
\end{enumerate}
Most automated techniques for deriving interpolants from counterexample path formulas do so in such a way that the interpolant at each step of the path formula is sufficient to prove the interpolant at the following step~\cite{Henzinger2004}. In other words, if $\cusfml_i,\cusfml_{i+1}$ are the interpolants for steps $i$ and $i+1$ of the counterexample, and $\asprg$ is the statement executed at that step, then $\cusfml_i\limply\dibox{\asprg}{\cusfml_{i+1}}$ is valid. This guarantees that the counterexample used to refine the abstraction will no longer be a trace in the new transition structure.
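For the running example, this chain can be checked by hand on the last three statements of the counterexample. The sketch below drops the SSA subscripts, assumes all variables range over the integers, and uses the labels $\cusfml_2,\ldots,\cusfml_5$, which are introduced here only for this illustration:
\begin{align*}
\cusfml_2 &\equiv i > |N|-x \land 0\le x < N, &
\cusfml_3 &\equiv i \ge |N|-x \land 0\le x < N, \\
\cusfml_4 &\equiv i > |N|-x \land 0\le x \le N, &
\cusfml_5 &\equiv 0 \le i .
\end{align*}
\begin{itemize}
\item $\cusfml_2\limply\dibox{\pumod{i}{i-1}}{\cusfml_3}$ reduces to $\cusfml_2 \limply (i-1 \ge |N|-x \land 0\le x < N)$, which holds because $i > k$ and $i - 1 \ge k$ are equivalent over the integers.
\item $\cusfml_3\limply\dibox{\pumod{x}{x+1}}{\cusfml_4}$ reduces to $\cusfml_3 \limply (i > |N|-(x+1) \land 0\le x+1\le N)$, which holds because $i \ge |N|-x > |N|-x-1$ and $0 \le x < N$ gives $1 \le x+1 \le N$.
\item $\cusfml_4\limply\dibox{\ptest{\lnot(0\le x < N)}}{\cusfml_5}$ reduces to $\cusfml_4 \land \lnot(0\le x < N) \limply 0 \le i$, which holds because $x \le N \le |N|$ gives $|N|-x \ge 0$ and hence $i > 0$.
\end{itemize}
Each step needs only the interpolant before it, not the full history of the path.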
\paragraph{Localizing abstraction}
The way in which we derived these predicates gives us yet more information. In particular, we derived each interpolant by splitting the counterexample path formula at each of the program locations along the path. So in addition to telling us which predicates are relevant to refining out the counterexample, this procedure also tells us where in the program they are relevant. We can use this information to greatly reduce the state space of the abstraction by localizing the set of abstraction predicates to each control flow location. So rather than using $\hat{W} = \plocs{\asprg} \times \powerset{\hat{\Sigma}}$ as the state space of the abstraction, we define $\hat{\Sigma}$ to be a function from control flow locations to sets of predicates:
\begin{multline*}
\hat{\Sigma} : \plocs{\asprg} \to \powerset{\text{predicates over program states}}, \\
\text{where}~\hat{\Sigma}(\ell)~\text{is the set of predicates interpolated at location}~\ell
\end{multline*}
Applying this to our running example, we are left with the abstraction depicted below. Notice that there are no paths to a state where $0 > i$ holds, so we can perform exhaustive model checking to conclude that the safety property always holds.
\begin{center}
\includegraphics[width=0.6\textwidth]{abstraction.pdf}
\end{center}
This approach is called \emph{lazy abstraction}~\cite{Henzinger2002}, because the abstraction is constructed as needed and only in relevant locations as required by counterexamples. It is a popular approach for software model checking, and has been implemented in numerous tools available today.
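To get a rough sense of the saving that localization can provide, compare the two state-space sizes for the running example. With the six global predicates derived above and five locations, the unlocalized abstraction has $5\cdot 2^{6} = 320$ states. Now suppose, purely as an assumption for this illustration, that each interpolant is attached to the location reached after the corresponding step of the counterexample; the abstraction depicted above may use a different assignment:
\begin{align*}
\hat{\Sigma}(\ell_0) &= \emptyset, &
\hat{\Sigma}(\ell_1) &= \{i > |N|-x,\ 0\le x\le N\}, \\
\hat{\Sigma}(\ell_2) &= \{i > |N|-x,\ 0\le x < N\}, &
\hat{\Sigma}(\ell_3) &= \{i \ge |N|-x,\ 0\le x < N\}, \\
\hat{\Sigma}(\ell_4) &= \{0 \le i\}.
\end{align*}
The localized state space then has $\sum_{\ell} 2^{|\hat{\Sigma}(\ell)|} = 1 + 4 + 4 + 4 + 2 = 15$ states. The exact numbers depend on the assumed assignment, but the gap grows quickly as refinement adds predicates.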
\bibliography{platzer,bibliography}
\end{document}