To: Distribution
From: 
 David K. Kahaner
 US Office of Naval Research Asia
 (From outside US):  23-17, 7-chome, Roppongi, Minato-ku, Tokyo 106 Japan
 (From within  US):  Unit 45002, APO AP 96337-0007
  Tel: +81 3 3401-8924, Fax: +81 3 3403-9670
  Email: kahaner@cs.titech.ac.jp
Re: High Performance Computing in Japan: Supercomputing
28 June 1992
This file is named "jhpc-sc.92"

ABSTRACT. A summary of high performance computing in Japan (part 1 of 2).

The following report was co-authored by Dr. U. Wattenberg, of the Tokyo
Office of the German National Research Center for Computer Science. A
much shorter version is to be published in the Sept 1992 issue of IEEE
Spectrum. Its level and content are addressed toward readers of
that journal, who may not be experts in computing; this report has more
detail, but is still far from complete. For similar reasons, we have
included only a very few references, although almost every topic treated
deserves a careful citation. We would also like to thank the many people
who helped and gave us timely, understanding advice.  All errors are, of
course, entirely our responsibility.

For electronic distribution this report is broken into two parts, this
part on supercomputing [file "jhpc-sc.92"], and a second on parallel
computing [file "jhpc-pp.92"].

SUPERCOMPUTING AND PARALLEL COMPUTING: THE VIEW FROM JAPAN
Contents: 
   [Sections 1-10 in file "jhpc-sc.92", Sections 11-19 in file "jhpc-pp.92"]
 1.  Introduction
 2.  Research and Development in Japan
 3.  Early government support for supercomputing research in Japan,
     The Superspeed Project
 4.  Supercomputing: How many supercomputers are in Japan?
 5.  Characteristics of supercomputers: Architecture and Performance
 6.  Characteristics of supercomputers: Technology
 7.  Supercomputer performance measurement
 8.  Supercomputer software
 9.  Brief summary of major Japanese supercomputer characteristics
10.  Japanese supercomputers in the US--very few
       ----------remaining sections in file "jhpc-pp.92"
11.  Parallel computing: Early beginning and cautious progress
12.  Japanese parallel computers: A start with applications in physics
13.  Japanese parallel computers: Dataflow machines are still "in"
14.  Japanese parallel computers: Logic programming
15.  Japanese parallel computers: Semi-commercial and in-house use
16.  Japanese parallel computers: Other massively parallel systems
17.  The Real World Computing program
18.  Summary
19.  References
 

SUPERCOMPUTING AND PARALLEL COMPUTING: THE VIEW FROM JAPAN

    Dr. David K. Kahaner
    US Office of Naval Research Asia
  (From outside US):  23-17, 7-chome, Roppongi, Minato-ku, Tokyo 106 Japan
  (From within  US):  Unit 45002, APO AP 96337-0007
    Tel: +81 3 3401-8924,  Fax: +81 3 3403-9670
    Email: kahaner@xroads.cc.u-tokyo.ac.jp

    Dr. Ulrich Wattenberg
    German National Research Center for Computer Science (GMD)
    Deutsches Kulturzentrum
    7-5-56 Akasaka
    Minato-ku, Tokyo 107 Japan
    Tel: +81 3 3586-7104, Fax: +81 3 3586-7187
     Email: wattenberg@gmd.co.jp

1. Introduction

        This paper discusses supercomputing and also parallel computing
activities in Japan. We focus on commercial, pre-commercial, and
experimental prototypes (distinctions between these are sometimes
arbitrary and made for purposes of clarity) and attempt to give a sense
of the important systems and ideas, but make no effort to be exhaustive.
The emphasis is on systems, rather than research in algorithms, software
or tools, which need to be treated in a separate report. Also omitted
for lack of space is any significant discussion of high performance
workstations, networking or communications technology.

        The term supercomputer usually refers to a vendor's latest
offering, and thus is poorly defined. Other papers in this issue treat
the definition of this and related terms; here we use supercomputer to
mean, informally, a large scale, multi-user computer, suitable for a
variety of computational tasks but especially good for numerical
applications based upon arrays (vectors) of floating point numbers. It
is provided with a complete entourage of peripheral devices such as high
speed disks, large memory, etc. But what makes a supercomputer today is
not any single piece of hardware but a combination of fast processing,
large memory, and fast I/O.  It also requires certain kinds of software:
common networkable operating systems, compilers which aid in improving
performance by optimizing, vectorizing, and parallelizing, as well as a
large collection of application software and software tools.  The
category specifically includes Cray Research Inc's Y-MP, NEC's SX-3, and
others
with up to 16 independent processors sharing one commonly addressable
memory.  This description is useful for our discussion, and in no way
suggests that computers not included in the category cannot perform very
significant and cost effective computation.

        We group under the umbrella of parallel computer those systems
with a large number of individual processing elements: more than 64 and
potentially hundreds or thousands. The category includes products
(Thinking Machines CM-200, Sharp DDP, hypercube multiprocessors, etc.),
prototypes (Fujitsu AP1000, NEC Cenju II), as well as university and
other experimental systems (ETL EM-4, Kyushu University KRPP). (None of
these lists is exhaustive.  These machines can have raw performance
better than that of the supercomputers in the preceding paragraph, but
they are neither general purpose nor in mainstream use at this time.) At
one time
a useful distinction could be made between shared and distributed memory
computers, with parallel computers mostly being those in which each
processor had its own local memory. But this distinction is blurring, as
many parallel computers have physically distributed memory that can be
treated as a common shared memory, and shared memory computers can
usually have their memory partitioned so that it is available to
individual processors.

	To understand the high performance computing environment in
Japan, it is useful to have a brief overview of the roles of government
and industry in Japanese research and development funding. The next
section gives an introduction to this topic.

2. Research and Development in Japan
# Government plays a small role in research #

	In Japan, information technology is the most important area of
research besides life science and environmental research. The budget for
R&D in information processing in Japan amounted in 1989 to 1012 billion
Yen, of which 958 billion were spent by industry, 24 billion by private
research institutes, 23 billion by universities, and 5 billion by
governmental research institutes [1]. (There are approximately 125 Yen
per US dollar.)

(a) R&D at universities
	There are about 500 universities in Japan with some 100 of them
in Tokyo and its suburbs. Most of the universities, however, are private
and are, with some exceptions, mainly concerned with education. Even at
national universities, intensive research is concentrated at the seven
so-called imperial universities, the first ones, founded in the
seventies and eighties of the last century in each part of Japan: Tokyo,
Kyoto, Osaka, Tohoku (Sendai), Hokkaido (Sapporo), Nagoya, and Kyushu
(Kita-Kyushu).  Within this group, Tokyo University has traditionally
taken a central role and often advises on government projects. After the
war, some other universities achieved a higher profile, e.g. Kobe,
Hiroshima, and Tsukuba University (outside Tokyo). There is little
project funding by the supervising Ministry of Education (Mombusho, also
written MESC). Private universities, especially Keio and Waseda, both in
Tokyo, also engage in research in science and technology.

(b) R&D at national laboratories and other non-profit research 
laboratories
	National laboratories in the field of science and technology in
Japan are supervised by several different ministries or agencies with
little cross-funding of research projects. A leading role in this field
is played by the Electrotechnical Laboratory (ETL) in the science city
Tsukuba. There, fewer than 200 researchers out of 700  are concerned
with information processing, but the principal researchers always play a
leading role not only in preparing MITI (Ministry of International Trade
and Industry) projects but also in implementing them. There are also
some quasi-national laboratories established for a limited period of
time, e.g. ICOT (Institute for New Generation Computer Technology--see
Section 14), associated with the Fifth Generation Computer Systems project.
After the project is finished, the researchers return to their parent
organizations.

(c) R&D in the computer industry
	As mentioned above, industry spent 958 billion Yen on R&D in
1989, with a growth rate of 25% compared with 1988. Half of that amount
was spent in the computer industry proper, the other half being spent on
other industrial sectors. It has to be remembered that most of the
budget was spent on development, with only about six percent for any
kind of long term research, including parallel, neural and optical
computing.  Thus, when long term research is considered, government and
universities were spending about as much as the industrial sector, about
50 billion yen each.  Also private is Nippon Telegraph and Telephone
(NTT), which carries out long term research in several broad fields.

(d) Cooperative research between all three sectors
	Some years ago, the key phrase "san-gaku-kan" began to appear in
every document on Japanese research policy. It is a short form for
research cooperation between industry (san), universities (gaku) and
governmental research institutes (kan).  Discussions showed that in
Japan this cooperation was not (and is not yet) well established; the
biggest problem exists within the government itself. In principle, the
Ministry of Education is concerned with basic research, the Science and
Technology Agency (STA) with "big science", e.g. nuclear energy, air and
space development, and MITI with applied research, but there is
naturally an overlap between these areas. In order to minimize
organizational problems, there is no cross-funding between MITI and the
Ministry of Education. Within the (new) Real World Computing project
(see Section 17), which will be closer to basic research than MITI
projects in the past, a softening of these strict regulations is
expected. At the same time, Japan is changing its laws and regulations
to make participation by foreign researchers in national projects
easier.

3. Early government support for supercomputing research in Japan,
   The Superspeed Project

	At the end of the seventies, as it became apparent that new
computer architectures and new devices would be necessary for future
needs in information processing, MITI followed its usual practice of
bringing together experts from universities, governmental research
laboratories and industry to formulate a project proposal. The outcome
was quite unusual, as MITI decided to run two large projects in
parallel: the High Speed Computing System for Scientific and
Technological Uses Project, dubbed the Superspeed Project (1981-1989, 23
billion Yen), and the Fifth Generation Computer System Project
(1982-1991, 55 billion Yen). Whereas the FGCS Project aimed at a risky,
new computing paradigm, cutting relationships to existing computer
systems (Section 14), the Superspeed Project can be seen more as an
extension of the present systems. It aimed at the development of a
high-speed computing system for scientific and technical applications.
The target system was supposed to operate at a rate of more than 10
GFLOPS, which was 100 to 1000 times faster than
the speed of conventional computers at that time. Two major R&D projects
were conducted: one on high speed novel devices and one on computer
architecture, algorithms and languages for parallel computing.

	The six major vertically integrated computer/semiconductor
companies - Fujitsu, Hitachi, Mitsubishi, NEC, Oki, Toshiba - together
with the ETL participated in the project. Matsushita and Sony wanted to
join the project but were not admitted, in order to discourage excessive
competition. The research on high speed devices was divided up among the
six participating firms: NEC, Toshiba, Hitachi, and Mitsubishi
researched gallium arsenide (GaAs) chips; Fujitsu, Hitachi, and NEC,
Josephson junctions; Fujitsu and Oki, HEMT (high electron mobility
transistor) devices.

	The research on parallel processing was divided into three
subgroups: a high speed parallel (4 CPU) subproject (called
PHI-Parallel, Hierarchical Intelligent computer project); the Sigma-I
dataflow subproject; and a satellite image processing subproject. Of the
three, PHI was the most important. In a practical approach to developing
a 4 CPU machine as quickly as possible, the subproject combined four of
Fujitsu's existing one processor VP 2000 supercomputers. To this
combination was added a large high-speed common memory. Since each of
the VPs already had its own memory, the concept of a hierarchical memory
structure appeared. The idea was that a user shouldn't have to know
about this hierarchy and could treat the memory as "flat".

	The project was safely concluded in 1990 by demonstrating the
PHI system to the evaluation team. The prototype high speed parallel
system using 4 processors ran at over 10 GFLOPs, peak, and had real
performance of over 1GFLOP. NEC wrote and tested one benchmark that
solved a very large (32K) system of linear equations in under 11 hours.
This was not a prototype of a machine that could be directly
commercialized. Gallium arsenide devices-- HEMT and MESFET-- were used,
though not as extensively as envisioned; Josephson junction devices were
not used at all, although advances in Josephson junctions put Japan in
the lead in this area.  Less tangibly, the project focused the private
sector on supercomputers at a critical time, earlier and more heavily
than they would have done individually. Of course, cooperation also
meant that work was done faster and more economically.  Individually,
the Japanese companies were also investing heavily; some estimates were
as high as 3-4 times the government figure, $300-500 million by each of
the three. [2]
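
	The linear-equation benchmark mentioned above gives a rough
feel for sustained speed. The Python sketch below is our own estimate,
not part of the project's evaluation. It assumes the system was dense
and solved by Gaussian elimination (about 2n^3/3 operations), and it
reads "32K" as 32,768 unknowns; both are assumptions. Because the run
took less than 11 hours, the result is only a lower bound on the
average rate.

```python
def lu_flops(n: int) -> float:
    """Approximate operation count for dense Gaussian elimination."""
    return (2.0 / 3.0) * n ** 3


def sustained_gflops(n: int, hours: float) -> float:
    """Average rate in GFLOPs if the solve takes the given time."""
    seconds = hours * 3600.0
    return lu_flops(n) / seconds / 1e9


N = 32 * 1024   # assumption: "32K" read as 32,768 unknowns
HOURS = 11.0    # upper bound on the run time quoted in the text
print(f"at least {sustained_gflops(N, HOURS):.2f} GFLOPs sustained")
```

A shorter run time, a different reading of "32K", or a different
solution method would change the figure accordingly.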

	The second architectural subproject, the Sigma-I Dataflow
subproject, focused on developing a machine with 128 processors, a
precursor to a massively parallel machine with 1024 processors within
ETL. The research group around Toshio Shimada successfully completed the
128 processor machine in 1989, but apparently the basic design was
given up in favor of other approaches to (modified) dataflow machines.

	The third subproject was the satellite image data processing
system. Three types of architecture were explored: Toshiba focused on a
high speed 3 dimensional display processor using 16 very fast VLSI
processors; Mitsubishi developed a cellular array processor CAP with
4096 PEs, operating in a SIMD mode;  Oki worked on a 2 dimensional
display using 8 processors for use in global data processing networks.

	The Superspeed Project was considered by the Japanese companies
as helpful, but the results have not yet been incorporated into
individual products. Also some differences can be observed between the
attitude of the Big Three (Fujitsu, Hitachi and NEC), which could have
done without the governmental subsidies, and that of the smaller firms
(Mitsubishi, Oki, Toshiba), which would have had more difficulty
embarking on these new directions without the extra funds.


4. Supercomputing
 # How many supercomputers are in Japan? #
 
        There are between four and five hundred supercomputers installed
worldwide (this excludes IBM installations which are difficult to
count); about 125 of these are now in Japan. Three large Japanese
electronics companies, NEC, Fujitsu, and Hitachi, produce shared memory
supercomputers with some parallel features; these are products, and are
supported and marketed as such.  Within Japan, Fujitsu has almost half
of the supercomputer installations; Cray, Hitachi, and NEC share the
balance. (Counting replacements and upgrades, about 250 supercomputers
have been installed in Japan, with Fujitsu again providing about half of
these.)
 
         There are about 40 supercomputers at Japanese universities, but
the number can be misleading because at least a third are older
machines or others with very modest performance.  The count also includes
computers with nonstandard operating systems, few standard application
software products, and inadequate networking. (These might still be
appropriate for training and some applications.) Today, high performance
workstations have significant computational capability and memory size
although they are not counted as supercomputers.  Thus, a supercomputer
count reflects systems that (at time of installation) were unequivocally
viewed as supercomputers by the vendor (supplier), purchaser, and by
most knowledgeable members of the community. Such a number is best
understood only as a qualitative measure.

	Most Japanese university scientists can get supercomputer time,
but rarely on the top-end machines, which are mostly at industrial labs
or in the prestigious national universities.  Access to supercomputers at
Japanese universities has improved markedly in the past two or three
years, although in our opinion, it is still below what is available to US
academics.  There is nothing comparable to the US NSF supercomputer
centers. (Supercomputer centers are established at major universities,
at several government laboratories, as well as at private corporations.
Recruit's Institute for Supercomputing Research and the Institute of
Computational Fluid Dynamics are good examples of the latter.)

	Networking has improved recently.  But academic networking is
not as ubiquitous as it is in the US; the prestigious universities have
excellent services while many other universities have none.  There are
more high performance networks in the US than in Japan.  Network
interconnectivity in the US is also much better than in Japan; several
more or less independent Japanese networks are supported by different
Ministries. Researchers in Japan sometimes communicate with each other
or with colleagues in Europe by transiting through the US. (This is
changing. For example the Japanese government is establishing a direct
link to Europe for collaborative research within the context of the Real
World Computing project--see Section 17.) Counterparts to very high
performance networking projects in progress or planned in the US have
not yet jelled in Japan.  However, Japan has excellent and in some cases
unique technology including a large infrastructure in the ISDN, and
their networking difficulties seem to be more social, organizational, or
cultural than technological.  Nevertheless, research in supercomputing
lags that of the West, except for applications developers working on
commercial software packages.  There are one or two supercomputer
conferences each year with small technical programs--fewer than one
third the number of papers presented at US conferences.

5. Characteristics of Supercomputers: Architecture and Performance

        Today's supercomputers have a large memory, 1-32 Gigabytes, and
several (currently up to 16) independent and very high performance CPUs
(sometimes called Functional Units--FUs). Within each CPU there are
several pipelines (pipes) consisting of the components that add,
multiply, etc. (Within a CPU the pipes have only one instruction path
and must all carry out the same calculation, whereas different
instructions can be executing on the independent CPUs.) A floating point
operation (FLOP) is not achieved until the pipe has been filled, but
once this happens a new FLOP occurs each clock cycle (hence the term,
pipe).  Data can be moved from/to memory at rates up to a few Gigabytes
per second, but this is not fast enough to keep up with the arithmetic
performance.  Thus some kind of memory hierarchy is employed. For
example, within each CPU, data from memory go first to registers which
are built of the fastest and most expensive SRAM (static random access
memory) chips and have a capacity up to about one megabyte.  Data from
the registers can be operated on by the pipelined arithmetic units at
the peak hardware speed under certain circumstances.

	An essential difference between US and Japanese supercomputers
has been that US supercomputers have more CPUs with each having a small
number of pipes. On the other hand, Japanese machines have had fewer
CPUs but each has more pipes, as many as 16. This is mostly because
US companies have more experience building multi-CPU machines; the
distinction is slowly changing as the Japanese add more CPUs to their
systems.

        Peak performance can be computed from the hardware
specifications of the machine. It is obtained by dividing the total
number of independent add and multiply pipes by the clock cycle time in
nanoseconds (ns) to produce a result in Gigaflops (GFLOPs, billions of
64 bit floating point operations per second). Performance of Japanese
supercomputers is always specified in terms of the peak that the
hardware can achieve.  Peak performance varies from about 5GFLOPs for
the Fujitsu VP2600 to 32GFLOPs for the Hitachi S-3800. The Cray Y-MP C90
has a peak speed of about 15GFLOPs. NEC's SX-3 has a peak of 26GFLOPs.
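
        As a rough illustration of the peak calculation described above,
the following Python sketch divides a pipe count by the cycle time. The
function names are ours, and the pipe counts are hypothetical values
back-calculated from the quoted peaks and the cycle times given in
Section 6; they are not vendor specifications.

```python
def peak_gflops(total_pipes: int, cycle_ns: float) -> float:
    """Peak rate in GFLOPs: one result per pipe per clock cycle."""
    if cycle_ns <= 0:
        raise ValueError("cycle time must be positive")
    return total_pipes / cycle_ns


def implied_pipes(peak: float, cycle_ns: float) -> float:
    """Invert the formula: pipes needed for a given peak in GFLOPs."""
    return peak * cycle_ns


# Hypothetical pipe counts (assumptions, not taken from the report).
examples = {
    "2.0 ns clock, 64 pipes": (64, 2.0),
    "3.2 ns clock, 16 pipes": (16, 3.2),
}
for label, (pipes, cycle) in examples.items():
    print(f"{label}: {peak_gflops(pipes, cycle):.1f} GFLOPs peak")
```

Under these assumptions a 2.0 ns clock with 64 add and multiply pipes
reaches 32GFLOPs, and a 3.2 ns clock with 16 pipes reaches 5GFLOPs. The
formula says nothing about how the pipes are divided among CPUs.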

	Of course, most real applications will exhibit performance far
below the peak. Actual performance is measured in terms of throughput,
performance on specific applications or benchmarks, etc. (Informally,
many scientists assume that usable speed is one order of magnitude less
than claimed peak.) This can be heavily influenced by how rapidly and in
what quantity data can be moved around.  The startup time to fill a pipe
from a register is an overhead, and it will reduce the computing speed
unless it can be amortized over a sufficiently large number of
calculations. If there are many pipes then subdividing arrays to utilize
them all reduces the number of elements handled by each pipe and
increases the relative importance of the startup.  Also, bandwidth
between memory and registers
must match the realizable speed of the CPUs.  There is additional
overhead (memory latency) arising in the process of fetching numbers
from memory for deposit in the registers, and this depends on the type
of memory chips used, how skillfully irregular retrievals are carried
out, and whether bank or other conflicts in memory are avoided.  In real
problems, there are significant fractions of the program that require
floating point computation of scalars as distinguished from arrays. Some
supercomputers such as Fujitsu VP2000 have two separate scalar
arithmetic units for each CPU which operate concurrently with the vector
(array) unit.  Like data movement, these scalar units are not relevant
in computing peak performance, but are important in measuring real
performance.
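
	The effect of pipe startup can be sketched with a very simple
model, given here in Python. It is our own simplification, not a
description of any particular machine: an array of n elements is split
evenly over the pipes, each pipe pays a fixed startup cost once, and
then delivers one result per clock cycle. The startup cost of 50 cycles
is a hypothetical value chosen only for illustration, and memory
bandwidth, latency, and scalar work are all ignored.

```python
import math


def fraction_of_peak(n: int, pipes: int, startup_cycles: int) -> float:
    """Fraction of peak achieved on one vector operation of length n.

    Simplified model (an assumption): the n elements are split evenly
    over the pipes, each pipe pays the startup once, then delivers one
    result per cycle.
    """
    per_pipe = math.ceil(n / pipes)      # elements through each pipe
    cycles = startup_cycles + per_pipe   # elapsed clock cycles
    ideal = n / pipes                    # cycles if startup were free
    return ideal / cycles


STARTUP = 50  # hypothetical startup cost in clock cycles
for pipes in (1, 4, 16):
    for n in (100, 1000, 10000):
        f = fraction_of_peak(n, pipes, STARTUP)
        print(f"pipes={pipes:2d} n={n:5d} fraction of peak={f:.2f}")
```

In this model, for a fixed array length, adding pipes raises the peak
but lowers the fraction of it that is achieved, because each pipe
processes fewer elements over which to amortize the same startup.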

	To summarize the preceding paragraphs, the key to building a
high performance supercomputer is to balance memory capability,
arithmetic processor performance, data movement capability, etc. Each
component plays a crucial role. This is generally related to the overall
architectural design of the system, and is an area in which Cray has
been particularly strong.


6. Characteristics of Supercomputers: Technology

	Another way to make machines faster is to use faster components,
hardware, and devices, and the Japanese have excelled here.

        In this area the key ingredient is the parent company's use of
its highest level and most sophisticated technology.  NEC states this
explicitly in their 1990 annual report, "the actual performance of a
supercomputer is determined by its scalar performance.... NEC's approach
to supercomputer architecture is clear.  Our first priority is to
provide high-speed single processor systems which have vector processing
functions and are driven by the fastest technologies, while giving due
consideration to ease of programming and ease of use; we also seek to
provide shared memory multiprocessor systems to further improve
performance." Hitachi's chief engineer Michihiro Hirai says that
"hardware technology is one of the key determining factors of the
supercomputer's performance." Similarly, Fujitsu Director Toshio
Hiraguri says "conclusively, the past breakthroughs in computer hardware
technology resulted from challenging what appeared to be technological
limits in the field of large-scale computers." The Japanese see four
major hardware tasks as being key to additional performance: faster
chips, smaller size, heat reduction, and elimination of logic bugs.

        Supercomputers from NEC, Fujitsu, and Hitachi use tried and true
emitter-coupled logic (ECL) semiconductor technology for basic processor
chips, but have pushed their capabilities in this area quite far. For
example, clock cycle time varies from 3.2 nanoseconds (Fujitsu) and 2.5
nanoseconds (NEC) to about 2.0 nanoseconds (Hitachi). These figures are
better than those of U.S. products (the Cray Y-MP C90 has a cycle time
of 4.2 nanoseconds).  Faster clocks translate into better performance.
Another example of technology push is in the area of lithography, the
process of outlining circuits.  Beginning as an optical process
generating 10-micron line widths in the 1960s, the practice is now an
X-ray process in the 0.8-0.5 micron range.  As line widths become
narrower, more highly packed chips can be built.  The Japanese are
aggressively working to reduce line width, and also to improve width
variability in the hopes that the former will translate into direct
performance improvements, the latter into less conservative designs--
hence also improved performance.  ECL gate densities are also improving.
Hitachi's newly announced (1992) supercomputer uses 25,000-gate arrays,
NEC's (introduced in late 1989) has 20,000-gate arrays, and Fujitsu's
(also introduced in 1989) uses 15,000-gate arrays.

        High end Japanese machines all have water cooled CPUs, but
slightly slower air cooled versions are also available. In addition, air
cooling is used in peripheral devices. Fujitsu uses GaAs (gallium
arsenide) chips in some of its peripherals so these can be effectively
cooled by air (GaAs can run cooler than silicon).  Generally, the use of
exotic device technology has been fairly conservative, although there
are research projects at all the large Japanese companies.  Thus far
GaAs is not being used for CPU chips in any commercial Japanese
machines, nor are even more sophisticated Josephson junction circuits.
Fujitsu used the Superspeed project results (Section 3) to develop a
hybrid Josephson junction-VLSI device, and plans to use it in its next
generation supercomputers, probably out in the mid-nineties.  (It takes
3-5 years to produce a large scale supercomputer product.) Similarly,
NEC developed GaAs logic devices as well as memory chips and has
designed a multichip package for supercomputers.  GaAs is seen as slowly
replacing ECL although the Japanese are convinced that there are still
performance gains to be obtained with silicon.
	
	The issue of silicon versus GaAs is interesting and also shows
how difficult it is to make predictions about technological
development.  In
1989 Spectrum [3] commented on plans by Seymour Cray to develop a GaAs
based supercomputer.  "The company hopes the GaAs circuitry, along with
a four-fold increase in parallelism, will improve on the Cray-2's
performance by a factor of 12." This would have put the projected peak
at about 5GFLOPs. Cray's new machine was to be available in late 1989
but is still under development at this time.

	As a final example of the Japanese pushing technological
excellence to build high performance systems, we note that NEC has
demonstrated that it can build, in a single 17mm-square air cooled chip,
a 64 bit floating point processor with peak performance of 0.2GFLOPs. A
1GFLOP processor would only require five such chips.

	Supercomputer reliability is another area where Japanese
technology plays a role.  Where hardware is concerned, the Japanese
have a substantial advantage, and although the Cray Y-MP series (which
is based on VLSI technology) has been markedly better than Cray's
previous generations, it lags behind the Japanese.

	In summary, Japanese industry's "pursuit of more powerful single
processors is supported by a wide spectrum of in-house technologies
ranging from single crystal production, VLSI fabrication, packaging, to
assembling, and testing. The cost of hardware development is diffused by
employing the same components in supercomputers and mainframes."[4]

7. Supercomputer Performance Measurement

	There is a great deal of controversy surrounding the selection
of performance benchmarks, and whether they are oversimplified and fail
to bring in adequate complexity. Not only the selection of problems but
also the extent to which automatic or manual optimization can be
performed is relevant. Taking programs and making them run efficiently
on a given
supercomputer can either be done by hand or automatically.  Automatic
vectorizers take low-level loops and try to make use of all the pipes on
a single CPU to gain efficiency. Autotaskers operate at a higher level,
partitioning the program and causing separate parts to run on different CPUs
concurrently.  Performance of the Japanese single CPU systems has
significantly improved to the point where they are internationally
competitive.  For example, in a recent benchmark [5] on a typical
aerodynamics program, the authors comment that "one cannot avoid being
impressed by the demonstrated power of all the [US and Japanese]
benchmarked computers, due both to the speed of the hardware components
and to the capability of their compilers. A sustained speed of at least
1GFLOPs was achieved by all computers [on one CPU]." Nevertheless this
is very application dependent.

	Today, the trend is toward purchasing high end supercomputers
with more than one CPU, and a distinction should be made between single
and multiple CPU performance. Overall throughput performance relates to
running a mix of programs each on one CPU. Peak performance relates to
running one program using several CPUs. Computer Center managers are
mostly concerned with throughput; headlines focus on peak.  Quoted peak
performance figures, such as 32 or 26GFLOPs, are "number of
processors" times single processor peak (the second situation above),
but this peak figure is much more difficult to achieve in practice than
the peak on a single CPU.  Not only does it demand more balance from the
hardware but also demands more from the system software and the system
autotasking capabilities.  Japanese supercomputers have very fast CPUs,
because of the excellent fundamental technology mentioned above, and
also because of their use of many data pipes. At the time of the writing
of this report only NEC among the Japanese supercomputer manufacturers
has a multiprocessor system installed outside its own facilities.  NEC's
single processor performance has been excellent, but demonstrated
performance on the (new) multiple CPU system has been substantially
below its potential peak.  This is a combination of lack of bandwidth to
memory (balance) and lack of multiprocessor experience.  Neither Fujitsu
nor Hitachi has multi-CPU systems in the field to test, and we can only
speculate that their situation will be similar.  As Japanese computer
vendors gain more experience with multi-CPU systems, performance is
bound to improve. But at the moment it is several years behind Cray.

8. Supercomputer Software

	Software for supercomputers includes compilers, libraries,
operating systems, support for networking, and software tools.

        All three Japanese supercomputers are now available with a
customized version of the Unix operating system.  The use of Unix will
help the migration of application programs onto Japanese systems. People
are just now coming to grips with the need to assess software costs, and
moving to Unix is clearly seen as one way to reduce costs for the end
user as well as the vendor. In Japan, this is a change from the use of
proprietary operating systems that has occurred only in the past two or
three years. For Hitachi it is only just now occurring, and the company
has not totally embraced Unix -- its newest supercomputer is available
in a Unix version, and also with the company's own IBM-like operating
system for compatibility with older Hitachi systems. The situation is
similar for Fujitsu, which also supports both Unix and its own system.

        In the past, applications developed in the West have been
installed very slowly, and this was a major impediment to the purchase
of Japanese supercomputers both inside and outside Japan. Using Unix
will improve this situation. However, using a standard operating system
only means that software portability is improved and development time is
reduced, not that a program will run efficiently. There does not yet
seem to be any route to maximum performance other than incorporating
knowledge of the hardware into the algorithms and software.

        Early Japanese supercomputer software development was limited to
producing Japanese language interfaces for Western software products,
and this is still an important activity.  For example, NEC has recently
moved the latest version of the heavily used engineering analysis system
NASTRAN to its supercomputers, and the company's supercomputer
promotional literature lists about one hundred products (many from the
West) that are available in the areas of structural analysis, fluid
dynamics, mechanical analysis, crash analysis, civil engineering,
magnetics, acoustics, electron device simulation, injection molding,
chemistry, graphics, mathematics, etc.  Other vendors are engaged in
similar projects.  But more recently, first-rate packages designed and
implemented in Japan are appearing.  Good examples are DEQSOL from
Hitachi for the solution of the partial differential equations arising
in engineering simulation, Alpha-flow from Fuji Research Institute for
solution of fluid dynamics problems, Fortran/KR from Fujitsu allowing
object-oriented programming from within a Fortran environment, and AMOSS
from NEC for molecular orbital calculations. There is also a clear trend
to enlist Western scientists who already have experience with
supercomputer implementations. NEC and Fujitsu have established
facilities as well as collaborative research with groups in the US,
Australia, and Europe for this purpose.  Thus Japanese vendors are
becoming more effective at accessing expertise outside of Japan.

        For those users who need to create software (rather than using
existing applications) standard languages such as Fortran and C are
available on all Japanese supercomputers, and the vendors are careful to
ensure that these meet all announced standards, although they have
various enhancements too. To get efficient programs, users can rearrange
their algorithms, insert special directives within their programs, and
also use vendor provided automatic vectorizers and autotasking.
Optimized vendor libraries (providing functionality such as matrix
manipulation, fast Fourier transform, etc.) with simple interfaces are
another good way to obtain efficiency.  The three Japanese supercomputer
companies have large teams of programmers developing these libraries,
and they also support well-known commercial libraries from the West,
IMSL and NAG, and noncommercial projects such as Eispack, Linpack, etc.
If the user interfaces are standardized, then portability is maintained
along with efficiency.  But there is no work originating in Japan with
an eye toward standardization of scientific software.  And there is
almost no research comparable to that in the West on portable numerical
algorithms, as typified, for example, by the Lapack project at the
University of Tennessee and other cooperating institutions.  Nor is
there much pressure to develop standardized software; vendors and users
still develop libraries and user interfaces for their own platforms and
applications.  Japanese computer users can, and do, write their own
application software. People who have studied it from the inside claim
it can be quite good.  One American computer science graduate student
who has been working at a major Japanese government research lab
commented as follows. "As far as I can tell, the guys here can write
software like demons. I know a lot of good hackers back at school [MIT],
as well as from my experience in industry.  These guys are major hackers
-- definitely quite above average, although not in the top range of my
experience."

         In summary, Japanese software for supercomputers, especially
for multiprocessors, is only now emerging.

	Japanese supercomputer vendors have made major strides in
networkability of their products, and support for high speed interfaces
such as HIPPI (High-Performance Parallel Interface) is available.

9. Brief Summary of Major Japanese Supercomputer Characteristics

(a) Hitachi
        Hitachi's newest supercomputers, announced during the spring of
1992, are the HITAC S-3800 and HITAC S-3600. (HITAC is the trade name
used by Hitachi for all their large-scale systems.) Delivery is
scheduled for January 1993. At this point there are no plans to market
the systems in the U.S.

        The S-3800 can be obtained as a multiprocessor system (up to
four processors), with a clock cycle estimated to be 2 nanoseconds
using silicon. This is the fastest clock of any commercial supercomputer
system. The S-3800 and S-3600 are the third generation of HITAC
supercomputers, following the HITAC S-810 and S-820. In the past year
Hitachi has not sold many of their 810/820s because performance lagged
behind that of NEC and Fujitsu products, so a new product has been
needed. Significantly, Hitachi supercomputers will now support the Unix
operating system.

        The S-3800 is water cooled and has six models. The high-end
machine, model 480, has four CPUs, each with a scalar and vector unit,
eight add/multiply pipes, four division pipes, four load pipes, four
load/store pipes, and a mask pipe. Main storage is 2Gbytes, and peak
performance is 8GFLOPs per CPU, for a maximum of 32GFLOPs on the
four-processor model.

        An interesting option is Scientific Animation Graphics, which
allows direct access to high speed storage and can send out either
ordinary (NTSC) or high definition TV (HDTV) signals to an outboard
display unit at the standard video rate of 30 frames per second. HDTV
requires a data transfer rate of 119 Mbytes/second, which is supported.
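
        The 119 Mbytes/second figure can be related to frame size with
a short calculation, sketched here in Python. The rate and the 30
frames per second come from the text above. The 1920 x 1035 raster and
the two bytes per pixel are our own assumptions, chosen because they
reproduce the quoted rate; we do not know the frame format Hitachi
actually uses, and the function names are ours.

```python
def bytes_per_frame(rate_bytes_per_s, frames_per_s):
    """Data that must be delivered for each frame at a given rate."""
    return rate_bytes_per_s / frames_per_s


def video_rate_bytes_per_s(width, height, bytes_per_pixel, frames_per_s):
    """Uncompressed video data rate for a given raster and frame rate."""
    return width * height * bytes_per_pixel * frames_per_s


if __name__ == "__main__":
    # From the report: 119 Mbytes/second at 30 frames per second.
    print(bytes_per_frame(119e6, 30) / 1e6, "Mbytes per frame")
    # ASSUMPTION: 1920 x 1035 active pixels, 2 bytes per pixel.
    print(video_rate_bytes_per_s(1920, 1035, 2, 30) / 1e6, "Mbytes/second")
```

Under these assumptions each frame is just under 4 Mbytes, which gives
a sense of why direct access to high speed storage matters for this
option.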

(b) NEC
        NEC introduced their SX-3 series almost three years ago, and it
is the Japanese computer that created the most stir in the West.
Recently, they made a modification allowing the 2.9 nanosecond cycle
time to be reduced to 2.5 nanoseconds; the new model has the R
designation, SX-3R. The high end model SX-3R/44 has four CPUs and a
total of 64 arithmetic pipes. Peak performance is 6.4GFLOPs per CPU, and
a maximum of 25.6GFLOPs. Up to 8Gbytes of memory are available.  NEC has
been trying hard to sell their supercomputers in the US, even to the
extent of using the court system to force entry into what they felt
was a closed procurement. SX-3 computers run Unix. A few SX-3s have been
installed outside of Japan and performance measurements are beginning to
be published.

(c) Fujitsu
        Fujitsu's VP2000 series supercomputers come in a variety of
models, of which two have peak performance of 5GFLOPs. The VP2600/20
has one CPU, while the VP2400/40 has two. Thus one has a choice of more
synchronous parallelism (VP2600) or asynchronous parallelism (VP2400) to
obtain the same peak performance. Clock cycle time is 3.2 nanoseconds,
and memory is 3Gbytes.  Each CPU has two pipes, but each pipe can
simultaneously perform addition and multiplication. Further, the VP2400
and VP2600 have double and quadruple thick pipes, respectively, meaning
that each CPU on the VP2600 can perform 16 floating point operations per
clock cycle; on the VP2400 the corresponding number is 8 per cycle. Both
Unix and Fujitsu's own operating system are supported, but recently a
new model, the VPX, which has the same performance characteristics but
runs only Unix, has been made available. Recently, Fujitsu has stepped
up its efforts to market its supercomputers in the US, but plans to
focus on private industry rather than government sales.
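
        The peak figures quoted in this section follow from the cycle
times and pipe counts, as the Python sketch below shows. It is a
consistency check under stated assumptions, not vendor data. The cycle
times, pipe counts, and CPU counts are those given above; how many
floating point operations each pipe delivers per cycle is our
assumption, picked so that the quoted peaks are reproduced. For Fujitsu
we read the 16 and 8 operations per cycle as per-CPU totals (two pipes,
each doing an add and a multiply, at quadruple or double width), since
that is the reading that yields 5GFLOPs.

```python
def peak_gflops(cycle_ns, flops_per_cycle_per_cpu, n_cpus=1):
    """Theoretical peak in GFLOPs.

    One operation per cycle at a cycle time of t nanoseconds is 1/t
    GFLOPs, so peak = (operations per cycle per CPU) * (CPUs) / t.
    """
    return n_cpus * flops_per_cycle_per_cpu / cycle_ns


# name: (cycle time in ns, assumed operations per cycle per CPU, CPUs)
MACHINES = {
    # ASSUMPTION: each of the 8 add/multiply pipes yields one add and
    # one multiply per cycle.
    "Hitachi S-3800/480": (2.0, 8 * 2, 4),
    # ASSUMPTION: 64 pipes split evenly over 4 CPUs, one result per
    # pipe per cycle.
    "NEC SX-3R/44": (2.5, 64 // 4, 4),
    # ASSUMPTION: 2 pipes x (add + multiply) x width 4 = 16 per CPU.
    "Fujitsu VP2600/20": (3.2, 2 * 2 * 4, 1),
    # ASSUMPTION: 2 pipes x (add + multiply) x width 2 = 8 per CPU.
    "Fujitsu VP2400/40": (3.2, 2 * 2 * 2, 2),
}

if __name__ == "__main__":
    for name, (cycle_ns, flops, cpus) in MACHINES.items():
        peak = peak_gflops(cycle_ns, flops, cpus)
        print(f"{name}: {peak:.1f} GFLOPs peak")
```

All four machines come out at 16 operations per cycle per system or per
CPU; the differences in quoted peak are then due to clock speed and
number of CPUs.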

10. Japanese Supercomputers in the US--Very Few

        As of this writing, there are no Japanese supercomputers in the
US, except for an SX-3 at NEC's Houston center, two Fujitsu systems sold
to a US subsidiary of a Norwegian company in 1987 and 1988, and a
VP2400 at the company's San Jose facility.  Resistance in the US is
partly due to concern about proven cost effective performance, partly to
the wide availability of US-based Crays with a large software base and
experienced user community, and partly to hesitancy on the part of US
computer center directors (especially within government) to buck
explicit or implicit disapproval from Congress and funding agencies
without any strong countervailing arguments. Both NEC and Fujitsu have
been more successful at selling outside the US, aided, perhaps, by
generous terms.

         What situations will eventually generate sales at fair market
prices in the US, even to skittish government labs?  The two most
probable are (a) a "solution" involving either specific software or
integration into an existing hardware environment, or (b) performance.
Situation (a) might occur in an international organization that is
already using Japanese products outside of the US and wants a high
degree of compatibility in its US facility. For (b), at a recent
conference attended by US experts with purchasing responsibilities, we
asked how much performance would be necessary.  Japanese supercomputers
will have to perform better than their US counterparts to convince these
people, perhaps by as much as a factor of ten and at least by a factor
of three or four. At this time, Japanese supercomputer marketing in the
US is directed toward private sector sales.

	On the other side, US supercomputer makers assert that sales of
non-Japanese supercomputers to Japanese government institutes and
universities are hindered by various unfair practices. It is not the
purpose of this paper to discuss these issues, except to say that they
are under negotiation.

        For future trends, it is clear that the Japanese see that ECL
technology is nearing its limits, but is not yet dead. Cooling will be
improved and circuit density will increase, but these technologies too
will be increasingly difficult to push. The only remaining approaches
are then to make individual processors faster by using radically new
technologies, such as optical, biological, etc., or to develop parallel
systems.


--------------FOR SECTIONS 11-19, SEE FILE "jhpc-pp.92"-------------------