\begin{flushleft}
{\bf 2. URM Model Description}
\end{flushleft}

The URM  model is a  three dimensional,  time-dependent Eulerian model
that accounts for the transport, deposition and chemical  evolution of
various pollutants in the atmosphere. A brief description of the model
is given here.

\begin{flushleft}
{\bf 2.1 Algorithm}
\end{flushleft}

The  URM  model  is  based  on  the  atmospheric  diffusion  equation,
$$\frac{\partial c_i}{\partial t}~+~\nabla \cdot ({\bf{u}}c_i)~=~\nabla
\cdot ({\bf{K}} \nabla c_i)~+~f_i~+~S_i
\eqno(1)$$ Here, $c_i$ is the concentration of the $i$th pollutant
among $p$ species, i.e., $i = 1, \ldots, p$, {\bf{u}} describes
the velocity field, {\bf{K}} is the diffusivity tensor, $f_i(c_1,
\ldots, c_p)$ is the chemical reaction term and $S_i$ is the net source
term. The detailed description  of the model can be found  in {\it Kumar et
al.} [1994].  The  major components of the model that  have  been
parallelized are described here.

In solving equation (1), the model uses the operator splitting
method. The solution is advanced in time as
$$c^{n+1}~=~L_{xy}(\Delta t/2)~L_{cz}(\Delta t)~L_{xy}(\Delta
t/2)~c^{n} \eqno(2)$$ where $L_{xy}$ is the two-dimensional horizontal
transport  operator  and  $L_{cz}$  is  the  chemistry   and  vertical
transport operator. Since the diffusion-dominated vertical transport
has time scales very similar to those of the chemistry, and since the solution
of  a diffusive  process  involves an exponential structure similar to
that of chemical decay, chemistry  and vertical transport are combined
in a single operator, $L_{cz}$.  The Streamline Upwind Petrov-Galerkin
(SUPG) finite  element method is used for  the solution  of horizontal
transport  [{\it  Odman and  Russell},  1991a].  For  the chemistry and
vertical  transport equations, the hybrid  scheme of  {\it  Young  and
Boris} [1978]  for stiff systems of ordinary differential equations is
used.
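
The ordering of the operators in equation (2) can be illustrated with a
minimal Python sketch. It is not the URM implementation: first-order
upwind advection on a periodic 1-dimensional grid stands in for the SUPG
horizontal transport operator, and exact first-order decay stands in for
the combined chemistry and vertical transport operator. The function
names \verb|l_xy|, \verb|l_cz| and \verb|advance|, the grid, and all
parameter values are assumptions chosen only for illustration.

\begin{verbatim}
import math

def l_xy(c, dt, u=1.0, dx=1.0):
    """Toy horizontal transport operator (assumption, not SUPG):
    first-order upwind advection on a periodic 1-D grid.
    Requires u > 0 and u*dt/dx <= 1."""
    nu = u * dt / dx
    # c[i - 1] wraps around at i = 0, giving periodic boundaries
    return [c[i] - nu * (c[i] - c[i - 1]) for i in range(len(c))]

def l_cz(c, dt, decay=0.1):
    """Toy chemistry/vertical operator (assumption): exact
    first-order decay applied at every grid point."""
    factor = math.exp(-decay * dt)
    return [factor * ci for ci in c]

def advance(c, dt):
    """One time step in the order given by equation (2)."""
    c = l_xy(c, dt / 2)   # horizontal transport, half step
    c = l_cz(c, dt)       # chemistry and vertical transport, full step
    c = l_xy(c, dt / 2)   # horizontal transport, half step
    return c

c0 = [0.0, 1.0, 0.0, 0.0]
c1 = advance(c0, 0.5)
\end{verbatim}

Because the toy transport operator conserves mass on a periodic grid,
the total mass in \verb|c1| differs from that in \verb|c0| only by the
decay factor applied by the middle operator.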

This  use   of  a  2-dimensional   horizontal  transport  operator  is
significant for two  reasons, both related to  the size of the domains
that URM is  designed  to model. First, to provide a given accuracy, a
well-chosen  multiscale grid is  significantly  more efficient  from a
computational standpoint  than is a uniform grid, because it  requires
evaluation  of the  $~L_{cz}$ operator at fewer points. Unfortunately,
it  is not  clear how  to use 1-dimensional  transport operators  with
multiscale  grids, due to  their  non-uniform  sampling  and  internal
dependent nodes.  Second,  in  conditions where significant cross-flow
components  exist, a  2-dimensional method  can use a larger time step
than  a  1-dimensional method  to achieve the  same  accuracy.   Large
domains tend  to cover more  heterogeneous geographic regions  than do
small  domains,  and therefore are more  likely  to have variable flow
behavior, causing significant cross-flow components  for  any grid.
For both these reasons, 2-dimensional transport operators on multiscale
grids are preferable to 1-dimensional operators on uniform grids for
modeling large regions. The structural differences between
1-dimensional  uniform  grid  transport  operators  and  2-dimensional
multiscale   transport  operators  have  a   significant   impact   on
parallelization.   This  is  the  major  difference  between the  work
reported here and that of {\it Dabdub and Seinfeld} [1994].

\begin{flushleft}
{\bf 2.2 Model structure}
\end{flushleft}

Figure  1 shows a simplified flow chart of  the  structure of the  URM
model.  It performs  most of the computation  within the hourly  loop.
First, it reads the meteorological and emissions  input data, which is
specified for each hour.  Based  on the velocity  data and the spatial
dimensions  of  the  computational grid, it calculates  the number  of
integration time steps (NSTEPS) for that hour.  Then  it  executes the
time  step  loop, where  it  calls the  horizontal  transport  routine
(HORIZ), the chemistry  and vertical transport  routine  (CHMVRT)  and
then again  HORIZ.  Each call to HORIZ solves the horizontal transport
equation for  half  an integration  time step.   CHMVRT integrates the
chemistry and vertical transport equation for a full time step.  HORIZ
performs three main  functions:  triangular (LDU) decomposition of the
coefficient  matrix, solution of the advective-diffusive equation  for
various  chemical  species,  and   the  filtering  of  the   resulting
concentration  vector to remove numerical noise.  LDU decomposition is
performed only once every hour,  as the  meteorological data changes on
an  hourly  basis.  A  nonlinear  streamline  filter  [{\it  Odman and
Russell},  1993]  follows  the   advection   step  to  avoid  negative
concentrations and is applied only before the CHMVRT step. After the
time step loop, the model prints the hourly output data; execution
stops once the model has run for the specified number of hours.
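
The control flow described above can be summarized in a short Python
sketch. It records only the order of the calls and performs no
numerical work. HORIZ, CHMVRT and NSTEPS are the names used in the
model; the labels \verb|READ_INPUT|, \verb|LDU| and \verb|WRITE_OUTPUT|,
the \verb|state| dictionary, and the \verb|nsteps_for_hour| callback are
hypothetical stand-ins, since the way NSTEPS is derived from the
velocity data and grid dimensions is not reproduced here.

\begin{verbatim}
def horiz(state, log):
    """Stub for HORIZ: horizontal transport for half a time step."""
    if not state["ldu_done"]:
        log.append("LDU")        # decomposition, once per hour
        state["ldu_done"] = True
    log.append("HORIZ")

def chmvrt(log):
    """Stub for CHMVRT: chemistry and vertical transport, full step."""
    log.append("CHMVRT")

def run_model(n_hours, nsteps_for_hour):
    """Return the sequence of routine calls made over n_hours."""
    log = []
    for hour in range(n_hours):
        log.append("READ_INPUT")        # hourly meteorology, emissions
        state = {"ldu_done": False}     # new input data: redo LDU
        nsteps = nsteps_for_hour(hour)  # NSTEPS for this hour
        for _ in range(nsteps):
            horiz(state, log)           # first half step
            chmvrt(log)                 # full step
            horiz(state, log)           # second half step
        log.append("WRITE_OUTPUT")      # hourly output
    return log

calls = run_model(2, lambda hour: 3)
\end{verbatim}

With two hours of three time steps each, the sketch performs the
decomposition twice, calls HORIZ twelve times and CHMVRT six times.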

The main data  structure  used in  the model is  a 3-dimensional array
representing  the concentration  of  the species in  the volume  being
modeled.  The three dimensions  are  horizontal  grid nodes,  vertical
layers, and  chemical species\footnote{Although the nodes within  each
layer are ordered linearly in this structure,  each element within the
node  dimension  actually  represents  a  point  of  the 2-dimensional
horizontal multiscale grid.}.  The two main phases of the application,
the horizontal transport and the chemistry calculations, access
the  data structure  along  orthogonal  axes.   During the  horizontal
transport phase, the calculation is performed over  all  layers,  then
over all species, and finally  over all grid  points, so the  order of
the  dimensions in terms of decreasing locality of
reference\footnote{For FORTRAN, this is the order in which the
dimensions appear in the declaration statement; it is reversed for C}
(for  best  performance)  is: grid  points,  species,
layers.  In the case of chemistry and vertical transport the order is:
species, layers, grid points.  To ensure efficient data access in each
of the phases, a transpose is performed  on the data structure between
the  phases.  Before the chemistry  phase,  the concentration array is
partitioned into  individual arrays for  each  grid point,  which  are
again  packed  together into  the main  concentration array after  the
chemistry calculation.
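
The transpose between the two layouts can be sketched in Python with
nested lists. This is an illustration under stated assumptions, not the
model's FORTRAN code: the function names and the small array sizes are
hypothetical. In nested Python lists, as in C, the last index varies
fastest, so the index order is the reverse of the FORTRAN declaration
order given above.

\begin{verbatim}
def to_chemistry_layout(c):
    """Split c[layer][species][node] (transport layout) into one
    array per grid point: cols[node][layer][species]."""
    n_layers, n_species, n_nodes = len(c), len(c[0]), len(c[0][0])
    return [[[c[l][s][n] for s in range(n_species)]
             for l in range(n_layers)]
            for n in range(n_nodes)]

def to_transport_layout(cols):
    """Pack the per-node arrays back into c[layer][species][node]."""
    n_nodes, n_layers, n_species = (len(cols), len(cols[0]),
                                    len(cols[0][0]))
    return [[[cols[n][l][s] for n in range(n_nodes)]
             for s in range(n_species)]
            for l in range(n_layers)]

def make_example(n_layers, n_species, n_nodes):
    """Test array whose values encode their own indices."""
    return [[[100 * l + 10 * s + n for n in range(n_nodes)]
             for s in range(n_species)]
            for l in range(n_layers)]

c = make_example(2, 3, 4)          # 2 layers, 3 species, 4 nodes
cols = to_chemistry_layout(c)      # before the chemistry phase
restored = to_transport_layout(cols)  # after the chemistry phase
\end{verbatim}

Each entry of \verb|cols| holds all layers and species of a single grid
point, so the chemistry and vertical transport calculation for that
point touches only one small array; packing the arrays back recovers
the original transport layout.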

