\begin{slide}{}

\center{``A Data-Parallel Programming Model for \\
        Reconfigurable Architectures''}
\center{\em{Paper by Steven A. Guccione \\
            and Mario J. Gonzalez}}

\center{Peter A. Dinda}

\center{4/12/94}
\end{slide}


\begin{slide}{}
{\em Overview}
\begin{itemize}
\item Reconfigurable Architectures
\item Programming Model
\item Compiling an Example
\end{itemize}
\end{slide}


\begin{slide}{}
{\em Reconfigurable Architectures: FPGAs} \\
Example: Xilinx 4010
\begin{itemize}
\item{$20\times20$ matrix of CLBs and routing support}
\item{$\sim$10,000 gates, 1120 flip-flops, 12,800 bits SRAM, 160 I/O pins}
\item{Implementing Devices}
   \begin{itemize}
   \item 24 bit Accumulator: 13 CLBs, 32 ns
   \item 16:1 MUX: 5 CLBs, 16 ns
   \item 16 bit adder: 9 CLBs, 20.5 ns
   \end{itemize}
\item{Programmable in-circuit in milliseconds}
\end{itemize}
\end{slide}

\begin{slide}{}
{\em Reconfigurable Architectures: Machines}\\
Example: Virtual Computer
\begin{itemize}
\item Attached Processor for Workstations
\item Logic Pipelines composed of SRAM, XC4010s, ICUBE communication ASICs
\item Configuration Control
\end{itemize}
\end{slide}

\begin{slide}{}
{\em Programming Model: Motivation and Vectors}
\begin{itemize}
\item Motivation: Want to do large amounts of work between reconfigurations,
      but minimize reconfiguration time
\item Vector model
    \begin{itemize}
    \item Can configure any possible type of vector functional unit
    \item Deep pipelines
    \item Long vectors
    \end{itemize}
\item But data dependencies limit pipeline depth
\end{itemize}
\end{slide}
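
\begin{slide}{}
{\em Programming Model: Amortizing Reconfiguration (sketch)} \\
Illustrative only; the symbols and numbers below are assumptions, not
figures from the paper.
\begin{itemize}
\item A pipeline of depth $d$ with clock period $t_{clk}$ processes a
      vector of length $n$ in about $d+n-1$ cycles
\item Including an assumed reconfiguration time $T_{config}$:
$$T_{total} \approx T_{config} + (d+n-1)\,t_{clk}$$
\item Assuming $T_{config}=1$ ms and $t_{clk}=20$ ns, one reconfiguration costs
$$T_{config}/t_{clk} = 10^{-3}/(20\times 10^{-9}) = 50{,}000 \mbox{ cycles}$$
\item Under these assumptions, vectors need $n \gg 50{,}000$ elements per
      configuration before the reconfiguration cost becomes small
\end{itemize}
\end{slide}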

\begin{slide}{}
{\em Programming Model: Vector-based Data Parallel Model}
\begin{itemize}
\item Eliminate data dependencies by supporting only data parallel semantics:
``All operations are performed in parallel on a vector of data''
\item Solution: data parallel semantics on top of 
                a vector implementation
\item Extend with scan primitives
\item Result: C with vector operators and function calls to scan primitives
\end{itemize}
\end{slide}
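
\begin{slide}{}
{\em Programming Model: Scan Primitives (sketch)} \\
One common definition of a scan (prefix) operation, given here as an
assumption; the paper's exact primitives may differ.
\begin{itemize}
\item For a vector $[a_0, a_1, \ldots, a_{n-1}]$ and an associative
      operator $\oplus$, the inclusive scan is
$$[a_0,\; a_0 \oplus a_1,\; \ldots,\; a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}]$$
\item Example with $\oplus = +$:
$$[3, 1, 4, 1, 5] \rightarrow [3, 4, 8, 9, 14]$$
\item A scan packages the cross-element dependence of a running sum,
      maximum, etc. into a single primitive
\end{itemize}
\end{slide}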

\begin{slide}{}
{\em Compiling an Example}\\
$$e^x-1=x+x^2/2!+x^3/3!+\cdots$$
\begin{itemize}
\item Form data flow graph
\item Replace computations with functional units
\item Insert delay stages so that pipelines (paths through the graph)
      stay aligned where they merge
\item Give this RTL representation to synthesis and routing tools
\item Use optimizations like common subexpression elimination and
      strength reduction to reduce logic
\end{itemize}
\end{slide}
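
\begin{slide}{}
{\em Compiling an Example: Reducing Logic (sketch)} \\
A hand-worked illustration truncated to three terms; this is not the
data flow graph from the paper.
\begin{itemize}
\item Truncated series:
$$e^x-1 \approx x + x^2/2! + x^3/3!$$
\item Common subexpression elimination: compute $t = x \cdot x$ once,
      then $x^3 = t \cdot x$
\item Strength reduction: replace the divisions by $2!$ and $3!$ with
      multiplications by the constants $1/2$ and $1/6$
$$e^x-1 \approx x + \frac{1}{2}\,t + \frac{1}{6}\,t\,x$$
\item Equivalent nested form:
$$x\left(1 + \frac{x}{2}\left(1 + \frac{x}{3}\right)\right)$$
\item Delay stages: $x$ feeds the final adder directly, so it must be
      delayed to match the latency of the longer path through $t\,x$
\end{itemize}
\end{slide}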

\begin{slide}{}
{\em What wasn't covered}
\begin{itemize}
\item Authors claim to have coded several algorithms in this model, but
      present no results
\item No discussion of pipelining within individual functional units
\item How to deal with a data flow graph that exceeds logic and pin
      resources?
\end{itemize}
\end{slide}







