%%%%%%%%	Article style document.		%%%%%%%%

\documentstyle[times,11pt,epsf]{article}
%\documentstyle[titlepage,times,11pt,picture]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Get the support for EPSF in subdirectory
% Will typeset incorporated pictures on any unix system
% Will typeset and preview incorporated picture on macintoshes (Textures)
%      if custom EPSF.sty is installed. Available from tomstr@cs.cmu.edu 
\input epsf.sty

\def\mac{mac}
\ifx\sys\mac\def\figpath{:fig:}
  \else\def\figpath{./paper.fig/}
\fi

%%%%%%%%	Margin declarations.		%%%%%%%%

\oddsidemargin  0in
\evensidemargin 0in
\topmargin -0.3in
\textwidth 6.5in
\textheight 8.75in


%%%%%%%%	Macro definitions.		%%%%%%%%

% allows multiple line comments enclosed by brackets.
\newcommand{\comment}[1]{}

% define un-numbered footnote for all kinds of disclaimers and support messages
\makeatletter
\long\def\unmarkedfootnote#1{{\long\def\@makefntext##1{##1}\footnotetext{#1}}}
\makeatother

% allows multiple line comments enclosed by brackets.
\newcommand{\ignore}[1]{}

% allows multiple line comments enclosed by brackets (more visible).
\newcommand{\COMMENT}[1]{}

% allows multiple line comments enclosed by brackets.  These are to be
% omitted from the current version of the document but should be included
% in the final version.
\newcommand{\OMIT}[1]{}

% changes output to double spaced.
\newcommand{\doublespace}{\baselineskip 22pt}

% changes output to single spaced.
\newcommand{\singlespace}{\baselineskip 16pt}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%			DOCUMENT STARTS HERE				%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

% set line spacing to \singlespace or \doublespace.
\singlespace

% choose the output format for the bibliography.
% @use(bibliography = /afs/cs/usr/prs/bib/nectar.bib)
% @use(bibliography = /afs/cs/usr/prs/bib/network.bib)

\bibliographystyle{plain}

%%%%%%%%		PAPER		%%%%%%%%


%%%%%%%%		TITLE PAGE		%%%%%%%%

\title{
Heterogeneous Distributed \\
Environmental Modeling
}

\author{
Bernd Bruegge$^{*}$, Erik Riedel$^{*}$, Armistead Russell$^{+}$, Edward Segall$^{*}$, Peter Steenkiste$^{*}$\\
\\
School of Computer Science$^{*}$ and Department of Mechanical Engineering$^{+}$\\
Carnegie Mellon University\\
Pittsburgh, PA 15213\\
}
\maketitle
\unmarkedfootnote{
This research was supported in part by the 
National Science Foundation under Award ASC\ 9217365
and in part by the Defense Advanced Research Projects
Agency (DOD) monitored by DARPA/CMO under Contract 
MDA972-90-C-0035.

The authors acknowledge the use of computational
resources provided by the Pittsburgh Supercomputing Center (PSC).
}

%%%%%%%%		ABSTRACT		%%%%%%%%

\begin{abstract}
\normalsize
Environmental models are playing an ever-increasing role in
determining strategies to improve environmental quality, in part due
to the regulatory requirement of their use.  These models have evolved
to become extremely complex and computationally intensive.  Their
complexity is continuously increasing as better and more extensive
science is included.  This places significant demand on the
computational environments in which they are applied, not only in
terms of providing computational cycles, but also in the need for
greater flexibility and ease of use.  High-performance distributed
computing environments have the potential to provide these attributes.
Toward this end, as part of a National Science Foundation Grand
Challenge project, heterogeneous computing environments are being
developed for air quality modeling. A variety of issues are being
addressed related to distributed computation in a range of
environments, and to linking networks of workstations, massively
parallel machines and traditional supercomputers.  In addition,
advanced interfaces are being developed between the models and the
input and output data sets to take advantage of new monitoring and
visualization tools.
\end{abstract}

\newpage

%%%%%%%%		PAPER BODY		%%%%%%%%

\section{Introduction}

Photochemical smog, containing ozone (a strong oxidant), acids, toxics
and aerosols (which are responsible for visibility degradation and
health effects), impacts virtually every major city. In spite of very
stringent pollutant controls, Los Angeles still experiences ozone
levels over the national standard of 0.12 ppm up to about 100 days per
year. Mexico City exceeds the Mexican standard up to about 300 days
per year. One reason for the lack of progress is the deficiencies in
our understanding of the processes that can lead to moderate high
levels of ozone. To attack this problem, comprehensive,
three-dimensional air quality models that can follow pollutant
dynamics have been developed and applied to identify the best methods
to improve air quality. These models are used to predict how changes
in pollutant emissions due to emissions controls will affect future air
quality. As such, they are central to air quality management.

An ongoing NSF Grand Challenge Environmental Modeling project
addresses the problem of effectively using large scale environmental
models for policy evaluation and development in three ways: first, the
development of more accurate models; second, the use of
high-performance computing technology for rapid model execution; and
third, the use of software engineering technology for the
development of problem solving environments for environmental
scientists.  All three aspects of the project are discussed in detail
below.

\section{Environmental modeling}
Computationally, there are a variety of interesting aspects to
comprehensive air quality models. In essence, air quality models solve
a system of 35 to 100 time-dependent, coupled, stiff, highly
non-linear PDEs, following pollutant evolution over a number of days
and thousands of kilometers. As such, they are computationally
demanding. In addition, their structure indicates that they are highly
parallelizable, and that portions should also vectorize well. This
indicates that a distributed computational environment is well-suited
for model execution. On the other hand, all-to-all data dependences on
critical paths lead to high burst communication rates, which in turn
lead to bottlenecks in some parallel and distributed computing
environments. In addition, the input data, which may be many gigabytes
in size, may be distributed among various sites. Finally, the models
are generally complex, requiring a good deal of operator
expertise. This inhibits widespread use which, in turn, presents an
obstacle to applying the scientifically most sound approaches to
environmental management. The goal is therefore a system that allows
the models, their inputs, and their results to be managed and analyzed
more easily by a broader community, including scientists,
government agencies and industry. In an effort that is part of the
National Science Foundation (NSF)'s High Performance Computing and
Communications (HPCC) Grand Challenge program, these issues are being
attacked in an interdisciplinary fashion. The product of this research
is an approach that will facilitate bringing the best science to bear
in air quality management.

Air quality models are based on numerical solution of the atmospheric
advection-diffusion-reaction equation (ADRE), which describes the
dynamics of compounds undergoing turbulent transport, diffusion and
chemical reaction:

$$\frac{\partial c_i}{\partial t}~+~\nabla \cdot ({\bf u}c_i)~=~\nabla
\cdot ({\bf K} \nabla c_i)~+~f_i~+~S_i \eqno(1)$$ Here, $c_i$ is the
concentration of the $i$th pollutant among $p$ species, i.e.,
$i = 1, \ldots, p$, ${\bf u}$ describes the velocity field, ${\bf K}$
is the diffusivity tensor, $f_i(c_1, \ldots, c_p)$ is the chemical
reaction term and $S_i$ is the emission rate of species $i$.  The
major components of the model that have been parallelized will be
described. A detailed description of the model can be found in
\cite{kumar:euler:geophysical}.

The resulting set of PDEs is coupled through the chemical reaction
term. In general, species have a wide range of lifetimes ---
milliseconds to months. This makes the set of equations not only
highly non-linear, but also stiff. Traditionally, the set of PDEs is
transformed into ODEs using operator splitting:

The solution is advanced in time as $$c^{n+1}~=~L_{xy}~(\Delta
t/2)~L_{cz}~(\Delta t)~L_{xy}~(\Delta t/2)~c^{n} \eqno(2)$$ $L_{xy}$
is the two dimensional horizontal transport operator and $L_{cz}$ is
the chemistry and vertical transport operator.  Since the
diffusion-dominated vertical transport has time scales very similar to
the chemistry, and since the solution of a diffusive process involves
an exponential structure similar to chemical decay, chemistry and
vertical transport are combined in a single operator, $L_{cz}$.  The
Streamline Upwind Petrov-Galerkin (SUPG) finite element method is used
for the solution of horizontal transport
\cite{odman:multiscale:atmospheric}.  For chemistry and the vertical
transport equations, the hybrid scheme of
\cite{young:equations:reactive} for stiff systems of ordinary
differential equations is used.
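As a concrete sketch of equation (2), consider a toy problem in which both operators reduce to scalar linear decay, so each half-step has an exact exponential solution. The rates and step size below are illustrative, not values from the URM model:

```python
import math

# Toy sketch of the operator-split step in equation (2): both the
# "transport" and "chemistry" operators are scalar linear decays here,
# so each sub-step is an exact exponential.  Rates are illustrative.
A, K = -0.3, -2.0          # transport rate, stiff chemistry/vertical rate

def l_xy(c, dt):           # horizontal transport operator L_xy
    return c * math.exp(A * dt)

def l_cz(c, dt):           # combined chemistry/vertical operator L_cz
    return c * math.exp(K * dt)

def strang_step(c, dt):
    # c^{n+1} = L_xy(dt/2) L_cz(dt) L_xy(dt/2) c^n
    c = l_xy(c, dt / 2)
    c = l_cz(c, dt)
    return l_xy(c, dt / 2)

c, dt = 1.0, 0.1
for _ in range(10):        # advance to t = 1
    c = strang_step(c, dt)
exact = math.exp((A + K) * 1.0)
print(c, exact)            # the two agree for this commuting scalar case
```

For this commuting scalar case the split solution matches the exact solution; with non-commuting transport and chemistry operators, Strang splitting of this symmetric form is accurate to second order in $\Delta t$.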

One reason for splitting the operators is that specialized algorithms
can be used to treat each process. Ultimately, the resulting set of
ODEs may include millions of equations to be solved for time periods
of days. On a single-processor machine, this can take days or weeks
depending on the application. However, the structure of the models
lends them to being parallelized and, of particular interest, to being
parallelized for execution in distributed environments.

As an example, the air quality model used in the project, called the
Urban-to-Regional Multiscale (URM) Airshed
Model \cite{odman:multiscale:atmospheric,kumar:euler:geophysical},
is being applied to develop control strategies for improving air
quality in the Northeastern United States. The modeling domain is
about 1200 km (East-West) by 1200 km (North-South) by 3 km
(vertically). Horizontal grid resolution ranges from 4.2 km to 72
km. Vertical grid resolution ranges from 30 m to 1500 m. Approximately
10,000 computational nodes are used, and 50 species are being
followed. On a single-processor DEC Alpha workstation, a simulation of
this size may take 5 days. Given that one may need to test hundreds of
strategies, roughly in succession (to use knowledge gained from
previous tests), the computational delay to do so is burdensome and
often prohibitive. In approaching this problem, a variety of
distributed environments were tested and utilized to address a
specific issue: whether regulations should specify a maximum emissions
level over the whole domain, or if emissions controls in specific
locations were more effective. As is usual when dealing with such
problems, the answers were needed in a matter of days, thus
exemplifying one need for high performance computing. Another is that
these models are central to computational experimentation, so slow
turnaround, or reducing the scope of the model, can lead to inferior
results. The greater performance needed to achieve these goals has
been achieved by creation of a distributed, heterogeneous version of
the URM Airshed model.

\section{Heterogeneous Computing}

Environmental modeling is a computationally expensive process, and,
like many grand challenge and national challenge applications, these
models can make ready use of parallel and distributed computer systems
to obtain the necessary cycles. A wide variety of such systems is
available, including shared-memory systems (Cray C90) and
tightly-coupled (Intel Paragon\cite{paragon:xps}, Cray T3D
\cite{t3d:overview}) and loosely-coupled distributed-memory systems
(e.g. workstation clusters).

An interesting option is heterogeneous computing, in which an
application is distributed over a heterogeneous collection of machines
connected by a high-speed local-area or wide-area
network. Heterogeneous computing not only combines the resources of
multiple computer systems, but also has the potential of performing
the computation very efficiently by mapping each component (task) of
the application onto the architecture that is most appropriate for
that task. The choice of architecture can be driven by the nature of
the computation (scalar versus vectorizable versus data-parallel code)
or by specific features of the system such as large memory size.
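As an illustration of such a mapping (the task names echo those used later in the paper, but the machine capabilities and the greedy assignment below are hypothetical, not the project's actual scheduler):

```python
# Hypothetical sketch: map each application task onto the first machine
# whose strengths match the task's computation type.  The machine list,
# capability sets, and task categories are illustrative only.
MACHINES = {
    "Cray C90": {"vector"},                        # vectorizable code
    "Intel Paragon": {"data-parallel"},            # tightly coupled MPP
    "Alpha cluster": {"scalar", "data-parallel"},  # workstation cluster
}

TASKS = [
    ("Horiz", "data-parallel"),
    ("ChemVert", "vector"),
    ("I/O preprocessing", "scalar"),
]

def assign(tasks, machines):
    """Pick, for each task, the first machine supporting its code type."""
    plan = {}
    for name, kind in tasks:
        plan[name] = next(m for m, kinds in machines.items() if kind in kinds)
    return plan

print(assign(TASKS, MACHINES))
```

A real scheduler would also weigh memory size, network distance, and current load, but even this greedy sketch shows how the computation type alone can drive the placement.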

An issue that is at the core of both distributed and heterogeneous
computing is that of {\em portability}. Users want to execute the
application on a variety of different platforms. The preferred
platform at any given time will depend on convenience, availability,
and application-specific factors such as input sizes and running time
estimates. This issue is even more critical for heterogeneous
computing applications, since the set of platforms used for a specific
execution can change dramatically from user to user and from time to
time. While portability of sequential programs is relatively well
understood, developing portable distributed applications is much
harder.  Thus, distributed applications often must be substantially
reworked when moved between platforms in order to achieve correct
execution, acceptable performance, or both.

In this project, we investigate how applications can be distributed
over heterogeneous sets of systems connected by high-speed
networks. We are using the Airshed model to drive the research. We
approached the problem from two different angles. We first distributed
the model by hand using PVM (Parallel Virtual
Machine \cite{pvm:dmcc6,pvm:user}), and showed that exploiting a
combination of task and data parallelism can create a parallel
application that runs efficiently on a variety of parallel
architectures. Parallelization by hand provides implementation
flexibility and exposes to user control the problems associated with
heterogeneous computing. An ongoing part of our work is to develop
tools that automate some of the tasks required for heterogeneous
computing. This research is done in cooperation with the Fx
parallelizing compiler group at CMU.


\section{Parallelism in URM Airshed}

The URM Airshed model (or simply, Airshed) is a three dimensional,
time-dependent Eulerian mathematical model that simulates the transport,
deposition and chemical evolution of various pollutants in the
atmosphere. The program uses a three-dimensional array to model the
atmosphere. This array represents a three-dimensional multiscale grid
of chemical concentrations. Typical dimensions are 5-20 horizontal
layers, 5000-10000 grid cells inside each layer, and 50-100 chemical
species.
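At the upper end of these ranges, and assuming 8-byte double-precision values (our assumption, not a figure from the paper), the concentration array alone is sizable:

```python
# Back-of-the-envelope size of the concentration array at the upper end
# of the dimensions quoted above, assuming 8-byte double precision.
layers, cells, species = 20, 10_000, 100
bytes_total = layers * cells * species * 8
print(bytes_total / 2**20, "MiB")   # on the order of 150 MiB
```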

Airshed spends most of its time in two main computational phases: a
horizontal transport phase, and a combined chemistry and vertical
transport phase. Both phases can be solved using data parallelism,
although to do so, the two phases must distribute the concentration
array along the horizontal and vertical dimensions, respectively
(Figure \ref{phases}). Since the two phases access the data structure
in different directions, they are separated in the distributed
implementation by a global transpose operation that serves to both
redistribute the data and orient it for efficient access by each
phase.
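The phase structure and the role of the transpose can be sketched with plain Python lists standing in for the concentration array; the per-phase "work" below is a placeholder, not the actual transport or chemistry computation:

```python
# Sketch of the two-phase decomposition with a global transpose.
# Sizes and the stand-in arithmetic per phase are illustrative only.
LAYERS, COLUMNS = 3, 4
conc = [[10 * z + x for x in range(COLUMNS)] for z in range(LAYERS)]

def horiz(layer):          # Phase 1: each worker owns one horizontal layer
    return [v + 1 for v in layer]      # stand-in for transport work

def chem_vert(column):     # Phase 2: each worker owns one vertical column
    return [v * 2 for v in column]     # stand-in for chemistry work

def transpose(grid):       # global transpose: redistributes the data
    return [list(col) for col in zip(*grid)]

after_horiz = [horiz(layer) for layer in conc]        # layer-distributed
by_column   = transpose(after_horiz)                  # redistribute
after_chem  = [chem_vert(col) for col in by_column]   # column-distributed
result      = transpose(after_chem)                   # back to layer order
print(result[0])
```

In the distributed implementation the transpose is the all-to-all communication step: every worker holding layers must exchange data with every worker holding columns.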

\begin{figure}[htb]
\centerline{
\epsfxsize=6in
\epsfbox{phasesII.eps}
}
\caption{Data parallelism: Two-phase domain decomposition with global
transpose between phases}
\label{phases}
\end{figure}

Data parallelism in the Horiz and ChemVert routines is the main source
of parallelism in this application. However, for a typical execution,
a non-trivial amount of time (10\%) is spent in the rest of the
computation. When large numbers of processors are used, these parts of
the model become a sequential bottleneck once the ``easy'' data
parallelism has been exploited, and must be parallelized as well in
order to achieve scalable speedup of the entire application. The types
of parallelism available in these tasks fall into two classes:
parallelization of secondary tasks such as communication and I/O, and
pipelining of several data preprocessing tasks with the main
computation. These optimizations, applied in isolation, would result
in negligible speedup because each optimization applies to only a
relatively small fraction of the sequential computation. However, once
the main computation has been parallelized, their combined effect is
significant.
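Amdahl's law makes this bottleneck concrete. A sketch using the roughly 10\% sequential fraction quoted above (the processor counts are illustrative):

```python
# Amdahl's-law sketch: speedup on p processors when a fraction f of the
# work remains sequential.  f = 0.10 is the figure quoted in the text.
def speedup(p, f):
    """Ideal speedup with sequential fraction f and perfect parallel scaling."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (4, 10, 64):
    print(p, round(speedup(p, 0.10), 2))
# Even with perfect scaling of the parallel 90%, speedup is capped at 1/f = 10.
```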

The variety of parallelism present in Airshed is by no means
unique. Many other applications (especially other physical simulations
based on multiple time scales or geometries) exhibit a range of types
of parallelism.

\section{Performance Using PVM}

After parallelizing Airshed, we executed it on a number of systems,
including the Pittsburgh Supercomputing Center (PSC)'s Cray C90, the
PSC DEC Alpha Supercluster, the Cray C90-T3D heterogeneous system, and
a group of Alpha workstations distributed in the Carnegie Mellon
Computer Science Department.

The first distributed Airshed implementation parallelized only the
Horiz and ChemVert data-parallel components. We observed respectable
but lower than desired speedups on different platforms; e.g. 6.5 on a
10-node DEC Alpha cluster. A careful analysis of execution traces
confirmed our earlier observation: nearly linear speedup occurs in
the data-parallel computational phases, but the sequential code
reduces the overall application speedup. In other words, in this
application (like many others), parallelism in inner loops does not
translate directly into scalable speedup. Note that the mixture of
types of parallelism raises the difficult problem of how tasks should
be mapped onto the distributed system. In general, the optimal mapping
will depend on the characteristics of both the system and the
application.

This hand parallelization effort exposed a number of challenges
associated with writing distributed applications using explicit
message passing. Although the same message passing library (PVM) was
used on each system, moving the application across the different
systems involved significant effort. Besides the obvious problems
(e.g. non-compatible versions of PVM on different systems),
differences in compilation environments, execution environments and
file systems have forced the development of a system called ARO (A
Restructuring Organizer) that supports compilation and execution on
these different environments in a consistent way. This environment is
designed to support different performance optimizations on each system
as well, via conditional compilation.

\section{Fx}

Our experience with the hand-parallelized code clearly shows the need
for tools to support heterogeneous distributed computing. These tools
should not only hide the compilation and execution details of each
system, as ARO does, but should simplify development of the
distributed program by raising the level of abstraction to something
higher than the message-passing level. It should also automate (to a
significant degree) performance optimization, which typically requires
a detailed understanding of the program.

We are working with the Fx parallelizing compiler group at CMU to
achieve this goal. Fx is a FORTRAN dialect that combines data
parallelism, as it is found in High-Performance Fortran (HPF), with
task parallelism\cite{FORTRAN:TASK}. Fx is quite expressive -- using
it, we were able to express in a prototype ``skeleton'' program all
significant task and data parallelism present in Airshed. Our current
activities focus on restructuring Airshed and in tuning the Fx
compiler so that we can realize this potential with the actual Airshed
program.

At the time of writing, we have successfully used Fx to distribute the
ChemVert phase of the computation across a number of platforms
(Figure~\ref{FxSpeedups}; note that in this figure, the T3D numbers are
based on PVM, since an Fx port to the T3D was not available at that
time).  Observe that on ChemVert, both PVM and Fx are able to achieve
almost linear speedup. However, in contrast to the PVM approach, Fx
allows us to achieve this performance using a single source code for
both sequential and parallel versions. Further, in Fx, changing from
one supported data distribution to another is simply a matter of
changing a keyword, since distribution and indexing operations are
handled by the compiler. Thus, the Fx code is cleaner, more malleable
and more compact.
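To make the keyword change concrete, the index-to-processor mappings behind the two most common distributions (block and cyclic, as in HPF-style DISTRIBUTE directives) can be sketched as follows; the function names are ours:

```python
# Illustrative index-to-owner mappings for block vs. cyclic distribution
# of n array elements over p processors (HPF/Fx-style semantics).
def block_owner(i, n, p):
    """Owner of index i when n elements form p contiguous blocks."""
    return i // -(-n // p)        # ceil(n/p) elements per block

def cyclic_owner(i, p):
    """Owner of index i when indices are dealt round-robin to p processors."""
    return i % p

n, p = 12, 4
print([block_owner(i, n, p) for i in range(n)])  # contiguous runs per owner
print([cyclic_owner(i, p) for i in range(n)])    # round-robin ownership
```

In a compiled system like Fx, switching between these mappings is a one-keyword change because all the indexing and communication implied by the mapping is generated by the compiler rather than written by hand.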


\begin{figure}[htb]
\centerline{
\epsfxsize=4in
\epsfbox{FxSpeedups.eps}
}
\caption{Near-Linear Speedup of ChemVert phase}
\label{FxSpeedups}
\end{figure}


%The tradeoff for these advantages is the classic tradeoff between
%high-level and low-level languages -- if a data distribution is not
%supported in Fx, we cannot use it, while we could conceivably change
%the PVM implementation to use any required distribution.  We have in
%fact done this in the past, using non-uniform block distributions to
%support a mix of workstations of differing speeds.  However, this and
%other obviously useful distributions are likely to be supported by Fx
%in the near future. Moreover, features offered by Fx that would be
%relatively difficult to program by hand (such as switching between a
%cyclic distribution for the node dimension in ChemVert and a block
%distribution for the layer and species dimensions in Horiz) are likely
%to offer such a significant performance and coding benefit that we
%find the Fx approach to be very promising.

The next step is to exploit the other types of parallelism mentioned
above across both homogeneous and heterogeneous systems.

\section{Software Engineering Issues}

While their high computational requirement is one reason advanced
environmental models are not more widely used, a second, often more
severe bottleneck is their ``unfriendliness''. Generally, the more
scientifically advanced a model, the harder it is to use. This is a
significant problem. A recent National Academy report underlined
that one reason greater progress has not been made in
improving air quality is that the best science has not been used to
address the problem.

To address this issue, the Geographic and Environmental Modeling
System (GEMS) has been developed by our group. The most important goal
of GEMS from the end user's perspective is ease of use. The intent is
to provide an environmental spreadsheet for the exploration of ``What
if...?'' scenarios by a scientist or regulator. To do this, a system
must provide a consistent interface to a set of physically distributed
and heterogeneous computing resources that cooperate to produce a
given analysis, and must also facilitate the necessary communication
among these resources.

GEMS is intended not only to help the regulatory end users, but also
help the model developers incorporate new science more rapidly into
their models.  Our long-term goal is to provide a general
application-specific framework for developing scientific applications
structured like Airshed, while our short-term goal is to develop a
prototype system specifically to solve the air quality modeling
problem.  Several enabling technologies in user interfaces, process
control, database management, and visualization are now available
within GEMS to provide a usable, extensible and flexible framework for
environmental modeling. In order to make the best use of these
technologies, we are engaged in ongoing dialog between developers and
users.

\section{Use of GEMS}

GEMS works with several distinct types of data:

\begin{itemize}
\parskip=0em
\partopsep=0em
\topsep=0em
\parsep=0em
\itemsep=.1em
\item{\em Map data} extracted from the U.S. Census Department TIGER
files. This data provides a vector map down to street level broken
down by state, county, civil district, census tract, and census block.
\item{\em Imagery data} provided by the LANDSAT satellite system. This
data provides a raster representation of a picture of the earth's
surface taken from space at 30-meter resolution.
\item{\em Elevation data} provided by the LANDSAT satellite
system. This data provides elevation figures at the same 30-meter
resolution as the imagery data. This allows the display of physical
features such as mountain ranges and provides a representation of the
topology of the area being viewed.
\item{\em Population data} provided by the U.S. Census Department
PL94-171 and STF1A data sets. This data provides population figures
and other statistics, as collected by the Census Department. Data in
these sets are registered to the divisions provided by the TIGER data,
again down to the detail of census blocks.
\item{\em Gridded Model data} provided by the U.S. EPA, the National
Weather Service, and local environmental protection agencies.  This
data provides information on the meteorology and initial pollutant
concentrations, which serve as the inputs for the air quality
model. The output from the air quality model consists of pollutant
concentrations and fluxes gridded in the same manner.
\item{\em Emission Source data} provided by the U.S. EPA and local
environmental agencies. This data provides information on the sources
of pollutants in a region under study. This information, together
with the meteorological conditions, constitutes the major input
parameters to the air quality model.

\end{itemize}

To perform a model calculation, the user determines the appropriate
parameters for a run, including the choice of which code is to be
executed and the machine(s) on which the code should run. When the
user initiates a remote execution, a process is started on the remote
machine(s) and the parameters chosen are passed to the remote program.

Each execution produces a considerable amount of data.  For this data
to provide useful insight into the workings of the atmosphere, it must
be visualized in a form that users can readily interpret. We have made
use of an existing visualization system PV-Wave from Precision
Numerics, Inc.\cite{PVWAVE92} to produce a visual representation of
the model outputs.

\begin{figure}[htb]
\centerline{
\epsfxsize=5in
\epsfbox{/afs/me.cmu.edu/tmp/epri/gatech-final/normal.ps}
}
\caption{The GEMS User Interface}
\label{GEMS}
\end{figure}


Figure \ref{GEMS} illustrates the analyses that can be performed using
GEMS. The figure is a snapshot of the GEMS user interface as it would
be seen by a scientist or regulator during a modeling session. The
view is of the Los Angeles basin, and data is based on measurements
made during the SCAQS period in 1987 with calculations based on the
CIT Airshed Model. Data from U.S. Census Department TIGER maps forms
the geographic base for the main map. Overlaid on the geographic data
is a surface shading that depicts the ozone concentrations over the
region at 2:00 PM on August 27, 1987. The color scale in the upper
left shows values going from very low (blue, in the color original) to
very high (red) shading. The Toolbox at the left of the picture is the
region of user interaction with the system. The current selection of
date, output format, and species can be seen in the Chemical Lab tool
depicted there. Other available tools control details of the map and
choice of scenario, display of population data, and the display and
editing of emissions sources. The bottom left section shows a plot of
the average ozone concentration in the entire modeling region as it
varies over a 24 hour period. The interface code is written in C++ and
uses the X/Motif user interface toolkit. GEMS currently runs on
Digital DECstation, DEC Alpha AXP, and Sun SPARCstation platforms.

\section{Frameworks}

The GEMS system is composed of five software subsystems: User
Interface, Execution, Data Management, Visualization, and
Monitoring. One goal of the GEMS system is to connect these five
subsystems together in a single coherent object-oriented framework.

The two aspects of GEMS that are most crucial in achieving our goals
are: an underlying object-oriented system model, and a set of protocols
for the interaction between the GEMS components that allow new
components to be easily incorporated into the framework.

Frameworks provide a higher level of reuse than abstract classes, and
allow considerable leverage in the rapid development of software
systems. The eventual goal of GEMS is to provide an
application-specific framework to support software development in the
environmental modeling community.  The GEMS framework provides a
plug-and-play architecture that allows a developer to rapidly build
software systems to support environmental modeling.

One of the main problems for such an architecture is defining the
protocols for the interaction between the components of one subsystem
and components of one of the other subsystems. This task is especially
difficult if the individual components were not developed with this
specific interaction in mind. These issues are described in more
detail in \cite{GEMS:ECOOP94}.

\section{User-Centered Analysis}

In the traditional model of software development, the end user and
domain expert interact with the development team only at the beginning
and end of the life cycle.  In the development of the GEMS system, we
have chosen to take the more realistic view that this type of
interaction, where the user can come up with a well-defined set of
requirements that the development team can transform into an
acceptable system, should be the exception rather than the rule. There
is significant evidence that limited interaction with users leads to
the undesirable situation, present in so many software projects, that
70-80 percent of a project's time and resources are spent in the
maintenance phase. Users may feel perfectly satisfied with the
requirements they have set out in the Problem Statement, but when the
system is finally ready for delivery its functionality rarely matches
the client's expectations. We felt that an organizational approach
based on negotiation between the developers and users throughout the
entire process is required in order to produce high-quality
software\cite{GEMS:JCSE}.

Close interaction between users and
developers is beneficial to both sides. The software developer needs
to gain a significant understanding of the problem domain under
consideration in order to be able to provide the best solution to the
problem. It is also very helpful for clients to gain some insight into
the field of available software tools and technologies, since this
will give them a better idea of how a new software system will
interact with existing capabilities. It will also enable making better
decisions about the requirements for such a system. Such an approach
requires users who are willing to invest their time into working with
the developers and giving them feedback on the progress of the
development. Although it may not always be easy to have software
developers interact with users, we strongly believe that it is well
worth the effort on both sides, that it can serve to avoid many of the
problems that arise in the more traditional approaches, and that it
will produce a higher quality software product as a result. 

\section{Conclusion}

As greater computational and regulatory demands are applied to air
quality modeling, the need to provide modeling systems that take
advantage of advanced computational environments is increasing.  A
comprehensive, distributed modeling system has been developed to ease
the use of complex models such as the URM Airshed model, and to
provide the computational power required by this and related models.
GEMS provides ease of model use and result evaluation. It includes a
custom user interface (built on top of commercial visualization
programs), a geographic information system, and data management
capabilities.  At this stage, use of heterogeneous computational
systems is facilitated by PVM, using manual parallelization.  This
parallelization effort highlighted the need for better tools to
support heterogeneous distributed computing.  A second effort uses Fx,
which is proving to be quite expressive and powerful.  The environment
resulting from the combination of GEMS with PVM and/or Fx is being
used to address a variety of air quality issues, including how the
use of alternative fuels may affect air pollution and what strategies
are most effective for reducing regional smog in the Northeast.

%\newpage
\bibliography{paper}

\end{document}


