\begin{flushleft}
{\bf 1. Introduction}
\end{flushleft}

Tropospheric ozone has been and continues to be a major air quality
problem.  Dozens of cities in the United States exceed the federal
ozone standard, and considerable effort is devoted to ozone abatement
in the U.S. alone.  It is difficult to formulate ozone abatement
strategies because ozone is a secondary pollutant formed as a result
of complex, non-linear reactions between nitrogen oxides (NO$_x$) and
reactive organic gases (ROG).  These precursors and/or secondary
pollutants associated with ozone formation are transported from one
area to another, making ozone modeling and abatement a regional-scale
problem rather than solely an urban problem.  Mathematical models are
the most technically defensible tools available at present to study
the evolution of ozone in the troposphere and to formulate control
strategies for its abatement.

Several urban and regional-scale air quality models have been
developed over the last decade.  Urban-scale models include the
Urban Airshed Model (UAM) [{\it Morris et al.}, 1989], CALGRID [{\it
Yamartino et al.}, 1992], and the Carnegie/California Institute of
Technology (CIT) model [{\it McRae et al.}, 1982].  Regional-scale
models include the Regional Oxidant Model (ROM) [{\it Lamb}, 1986] and
the Regional Acid Deposition Model (RADM) [{\it Chang et al.}, 1987].
Current urban and regional-scale air quality models have fixed spatial
scales.  Urban-scale models have a grid size of about 5 km.  The
cited regional-scale models use a relatively coarse spatial resolution of
20--400 km because of the great computational cost associated with
fine-scale resolution; however, significant processes such
as power plant plume dynamics, cloud dynamics, and urban-to-rural
transitions cannot be resolved at the coarse scales used in these
regional-scale models.  It is also well understood that long-range
transport of O$_3$ and its precursors plays an important role in urban
O$_3$ problems, and vice versa.  It is therefore desirable to use a model
that can employ multiple scales to capture the dynamics of
pollutants effectively at both urban and regional scales.  An Urban-to-Regional
Multiscale (URM) model has been developed [{\it Odman and Russell},
1991; {\it Kumar et al.}, 1994] that can include fine scales
wherever necessary within the coarse grid mesh to capture the
details of the atmospheric processes more effectively.  Applications of the
URM model to southern California [{\it Kumar et al.}, 1994] and
the northeastern United States [{\it Kumar and Russell}, 1995a] have
indicated that this model can be significantly more computationally
efficient than a uniform-scale model.
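
As a rough, hypothetical illustration of where this efficiency comes
from (the domain and grid sizes below are assumed for the example and
are not taken from the cited applications), consider a 400~km $\times$
400~km domain in which 5~km resolution is needed only over a 100~km
$\times$ 100~km urban area.  Ignoring any transition zone of
intermediate cell sizes, the number of horizontal grid cells required
by a uniform 5~km grid and by a 20~km grid refined to 5~km over the
urban area is
\begin{eqnarray*}
N_{\rm uniform} &=& \left(\frac{400\,{\rm km}}{5\,{\rm km}}\right)^2 = 6400, \\
N_{\rm multiscale} &=& \left(\frac{400\,{\rm km}}{20\,{\rm km}}\right)^2
 - \left(\frac{100\,{\rm km}}{20\,{\rm km}}\right)^2
 + \left(\frac{100\,{\rm km}}{5\,{\rm km}}\right)^2
 = 400 - 25 + 400 = 775,
\end{eqnarray*}
so in this example the multiscale grid needs roughly one-eighth as many
cells.  Because the number of vertical layers multiplies both counts
equally, and assuming comparable time steps and per-cell cost, the
computational work would scale in roughly the same ratio.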

The URM model has recently been extended to include sub-grid scale
processes (plume and cloud dynamics) [{\it Kumar and Russell}, 1995b]
and will soon be coupled with complex aerosol physics as a step toward
a next-generation air quality model.  Even with the current
availability of fast computers, the computational demands of these
next-generation models will be significant.  There is general
consensus in the supercomputing community that the best way to
tackle many of the biggest problems in science today is to use
massively parallel architectures.  Air quality models are well suited
to parallelization, although achieving scalable performance can be
challenging.  This work reports on the parallelization of the URM
model and its implementation on various architectures.  The discussion,
while based on this effort, is applicable to other efforts applying
high-performance computing to environmental modeling.

Previous work on parallelization of air quality models includes that of {\it Saylor
and Fernandes} [1993], who described the parallelization of the STEM-II acid
deposition and photochemical oxidant model [{\it Carmichael et al.}, 1991] on a
5-processor shared-memory IBM 3090-600J using IBM parallel FORTRAN.  {\it Dabdub
and Seinfeld} [1994] parallelized the CIT model [{\it McRae et al.}, 1982] on
the 512-processor distributed-memory Intel Touchstone Delta using NX as the
message-passing protocol.  Both the CIT and STEM-II models differ from the URM
model in their treatment of horizontal transport (discussed in detail in
Section 2.1): they use two orthogonal one-dimensional horizontal transport
operators rather than the two-dimensional operator used in the URM model.  Thus,
the parallelization approaches used for the CIT and STEM-II models are not
directly applicable to the URM model\footnote{The CIT work is closer to the
present effort in that it is also based on a distributed-memory architecture
and investigates scaling issues.}.  The use of a two-dimensional horizontal
operator is an essential part of the URM model, as it is what allows the use of
multiple scales.  The implications of the differences in operator splitting
between the CIT and URM models for parallelism are discussed in Section 3.1.
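
For illustration only, the two styles of splitting can be written
generically as follows; this is a sketch, not the exact operator
sequence of either code, and the symbols are introduced here for the
example.  Let $\mathbf{c}^n$ denote the concentration field at time
step $n$, $L_x$ and $L_y$ one-dimensional horizontal transport
operators, $L_{xy}$ a two-dimensional horizontal transport operator,
and $L_{cz}$ an operator for chemistry and vertical transport.  A
symmetric one-dimensional splitting and a corresponding
two-dimensional splitting can then be written as
\begin{eqnarray*}
\mathbf{c}^{n+2} &=& L_x(\Delta t)\,L_y(\Delta t)\,L_{cz}(2\Delta t)\,L_y(\Delta t)\,L_x(\Delta t)\,\mathbf{c}^{n}, \\
\mathbf{c}^{n+1} &=& L_{xy}(\Delta t/2)\,L_{cz}(\Delta t)\,L_{xy}(\Delta t/2)\,\mathbf{c}^{n}.
\end{eqnarray*}
In the first form, $L_x$ acts on each grid row independently of every
other row (and $L_y$ on each column), so rows and columns are natural
units of parallel work.  The operator $L_{xy}$, by contrast, couples
neighboring cells in both horizontal directions within a single step;
this is what allows cells of different sizes to coexist in one mesh,
but it also means that a row-and-column decomposition does not carry
over directly.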

The parallelization of the present application was achieved through
the use of the PVM (Parallel Virtual Machine) software system,
developed at Oak Ridge National Laboratory [{\it Beguelin et al.},
1991; {\it Geist and Sunderam}, 1991].  PVM provides a set of routines
for communicating among processes in a networked environment.  Because
PVM is based on a distributed-memory model, a parallel program can be
developed and run on a cluster of workstations and then transferred to
a dedicated massively parallel machine without major code changes.
PVM also provides a unified framework within which large parallel
systems, such as workstation clusters and massively parallel
supercomputers, can be linked together in a straightforward and
efficient manner to form a heterogeneous distributed computing
environment.  Thus, this type of parallel implementation is more
generally applicable than one developed for a specific architecture.

An important issue in the development of parallel air quality models
is the portability of the model across different architectures.  This
is significant because these models are meant for eventual use by a
broad community. At present, air quality models are run and analyzed
by a small community of highly specialized air quality
scientists. Consequently, there is a technological gap between those
who study air quality and those who make policy decisions based on
those studies.  There is a desire to develop a comprehensive modeling
system (CMS) that can be used directly by air quality managers to make
scientifically based policy decisions.  {\it Hansen et al.} [1994]
discuss an initiative being carried out by the Consortium for Advanced
Modeling of Regional Air Quality (CAMRAQ) in that direction.  For a
CMS to be accessible to the large community of air quality managers,
scientists, regulators, etc., it is important that the system be
designed to be highly portable across a variety of
computational platforms.  The present work on URM provides an
opportunity to explore various issues in portability across different
parallel and distributed systems.  This application has been
parallelized and executed on a wide variety of systems, including a
16-processor Cray C90, a dedicated workstation cluster, and a group
of idle workstations.  Furthermore, the current structure is designed
to allow direct use on massively parallel processors (MPPs) and on
heterogeneous systems combining MPPs with other machines.
Ports to two such machines, the Intel Paragon and the Cray T3D,
are well under way.

This paper presents performance results for the application on
various platforms and describes the problems encountered while
porting the application, along with their solutions.

