To: Distribution
From: David K. Kahaner, US Office of Naval Research Asia
(From outside US): 23-17, 7-chome, Roppongi, Minato-ku, Tokyo 106 Japan
(From within US): Unit 45002, APO AP 96337-0007
Tel: +81 3 3401-8924, Fax: +81 3 3403-9670
Email: kahaner@cs.titech.ac.jp
Re: High Performance Computing in Japan: Supercomputing
28 June 1992
This file is named "jhpc-sc.92"

ABSTRACT. A summary of high performance computing in Japan (part 1 of 2).

The following report was co-authored by Dr. U. Wattenberg, of the Tokyo Office of the German National Research Center for Computer Science. A much shorter version is to be published in the Sept 1992 issue of IEEE Spectrum. Its level and content are addressed to readers of that journal, who may not be experts in computing; this report has more detail, but is still far from complete. For similar reasons, we have included only a few references, although almost every topic treated deserves a careful citation. We would also like to thank the many people who helped and gave us timely, understanding advice. All errors are, of course, entirely our responsibility.

For electronic distribution this report is broken into two parts: this part on supercomputing [file "jhpc-sc.92"], and a second on parallel computing [file "jhpc-pp.92"].

SUPERCOMPUTING AND PARALLEL COMPUTING: THE VIEW FROM JAPAN

Contents:
[Sections 1-10 in file "jhpc-sc.92", Sections 11-19 in file "jhpc-pp.92"]
1. Introduction
2. Research and Development in Japan
3. Early government support for supercomputing research in Japan, The Superspeed Project
4. Supercomputing: How many supercomputers are in Japan?
5. Characteristics of supercomputers: Architecture and Performance
6. Characteristics of supercomputers: Technology
7. Supercomputer performance measurement
8. Supercomputer software
9. Brief summary of major Japanese supercomputer characteristics
10. Japanese supercomputers in the US--very few
----------remaining sections in file "jhpc-pp.92"
11. Parallel computing: Early beginning and cautious progress
12. Japanese parallel computers: A start with applications in physics
13. Japanese parallel computers: Dataflow machines are still "in"
14. Japanese parallel computers: Logic programming
15. Japanese parallel computers: Semi-commercial and in-house use
16. Japanese parallel computers: Other massively parallel systems
17. The Real World Computing program
18. Summary
19. References

SUPERCOMPUTING AND PARALLEL COMPUTING: THE VIEW FROM JAPAN

Dr. David K. Kahaner
US Office of Naval Research Asia
(From outside US): 23-17, 7-chome, Roppongi, Minato-ku, Tokyo 106 Japan
(From within US): Unit 45002, APO AP 96337-0007
Tel: +81 3 3401-8924, Fax: +81 3 3403-9670
Email: kahaner@xroads.cc.u-tokyo.ac.jp

Dr. Ulrich Wattenberg
German National Research Center for Computer Science (GMD)
Deutsches Kulturzentrum
7-5-56 Akasaka, Minato-ku, Tokyo 107 Japan
Tel: +81 3 3586-7104, Fax: +81 3 3586-7187
Email: wattenberg@gmd.co.jp

1. Introduction

This paper discusses supercomputing and parallel computing activities in Japan. We focus on commercial, pre-commercial, and experimental prototypes (distinctions between these are sometimes arbitrary and made for purposes of clarity) and attempt to give a sense of the important systems and ideas, but make no effort to be exhaustive. The emphasis is on systems rather than on research in algorithms, software or tools, which needs to be treated in a separate report. Also omitted for lack of space is any significant discussion of high performance workstations, networking or communications technology.

The term supercomputer usually refers to a vendor's latest offering, and is thus poorly defined. Other papers in this issue treat the definition of this and related terms; here we use supercomputer to mean, informally, a large scale, multi-user computer, suitable for a variety of computational tasks but especially good for numerical applications based upon arrays (vectors) of floating point numbers.
It is provided with a complete entourage of peripheral devices such as high speed disks, large memory, etc. But what makes a supercomputer today is not just the hardware, but a combination of fast processing, large memory, and fast I/O. It also includes certain kinds of software: common networkable operating systems; compilers that help improve performance by optimizing, vectorizing, and parallelizing; and a large collection of application software and software tools. The category specifically includes Cray Research Inc's Y-MP, NEC's SX-3, and others with up to 16 independent processors sharing one commonly addressable memory. This description is useful for our discussion, and in no way suggests that computers not included in the category cannot perform very significant and cost effective computation.

We group under the umbrella of parallel computers those systems with a large number of individual processing elements: more than 64, and potentially hundreds or thousands. The category includes products (Thinking Machines CM-200, Sharp DDP, hypercube multiprocessors, etc.), prototypes (Fujitsu AP1000, NEC Cenju II), and university and other experimental systems (ETL EM-4, Kyushu University KRPP). (None of these lists is exhaustive. These machines can have raw performance better than that of the supercomputers in the preceding paragraph, but they are neither general purpose nor in mainstream use at this time.)

At one time a useful distinction could be made between shared and distributed memory computers, with parallel computers mostly being those in which each processor had its own local memory. But this distinction is blurring: many parallel computers have physically distributed memory that can be treated as a common shared memory, and shared memory computers can usually have their memory partitioned so that parts of it are available to individual processors.
To understand the high performance computing environment in Japan, it is useful to have a brief overview of the roles of government and industry in Japanese research and development funding. The next section gives an introduction to this topic.

2. Research and Development in Japan
# Government plays a small role in research #

In Japan, information technology is the most important area of research apart from life science and environmental research. The budget for R&D in information processing in Japan amounted in 1989 to 1012 billion yen, of which 958 billion were spent by industry, 24 billion by private research institutes, 23 billion by universities, and 5 billion by governmental research institutes [1]. (There are approximately 125 yen per US dollar.)

(a) R&D at universities

There are about 500 universities in Japan, some 100 of them in Tokyo and its suburbs. Most of the universities, however, are private and are, with some exceptions, mainly concerned with education. Even among national universities, intensive research is concentrated at the seven so-called imperial universities, founded between 1877 and 1939 in different parts of Japan: Tokyo, Kyoto, Osaka, Tohoku (Sendai), Hokkaido (Sapporo), Nagoya, and Kyushu (Fukuoka). Within this group, Tokyo University has traditionally taken a central role and often advises on government projects. After the war, some other universities achieved a higher profile, e.g. Kobe, Hiroshima, and Tsukuba University (outside Tokyo). There is little project funding by the supervising Ministry of Education (Mombusho, also written MESC). Private universities, especially Keio and Waseda, both in Tokyo, also engage in research in science and technology.
(b) R&D at national laboratories and other non-profit research laboratories

National laboratories in the field of science and technology in Japan are supervised by several different ministries or agencies, with little cross-funding of research projects. A leading role in this field is played by the Electrotechnical Laboratory (ETL) in the science city of Tsukuba. There, fewer than 200 of its 700 researchers work on information processing, but the principal researchers always play a leading role not only in preparing MITI (Ministry of International Trade and Industry) projects but also in implementing them. There are also some quasi-national laboratories established for a limited period of time, e.g. ICOT (Institute for New Generation Computer Technology, see Section 14), associated with the Fifth Generation Computer Systems project. After the project is finished, the researchers return to their parent organizations.

(c) R&D in the computer industry

As mentioned above, industry spent 958 billion yen on R&D in 1989, an increase of 25% over 1988. Half of that amount was spent in the computer industry proper and the other half in other industrial sectors. It should be remembered that most of this budget was spent on development, with only about six percent going to any kind of long term research, including parallel, neural and optical computing. Thus, where long term research is concerned, government and universities were spending about as much as the industrial sector, about 50 billion yen each. Nippon Telegraph and Telephone (NTT), which is also private, carries out long term research in several broad fields.

(d) Cooperative research among all three sectors

Some years ago, the key phrase "san-gaku-kan" began to appear in every document on Japanese research policy. It is short for research cooperation between industry (san), universities (gaku) and governmental research institutes (kan).
Discussions showed that in Japan this cooperation was not (and is not yet) well established; the biggest problem lies within the government itself. In principle, the Ministry of Education is concerned with basic research, the Science and Technology Agency (STA) with "big science", e.g. nuclear energy, air and space development, and MITI with applied research, but there is naturally an overlap between these areas. In order to minimize organizational problems, there is no cross-funding between MITI and the Ministry of Education. Within the (new) Real World Computing project (see Section 17), which will be closer to basic research than MITI projects in the past, a softening of these strict regulations is expected. At the same time, Japan is changing its laws and regulations to make participation by foreign researchers in national projects easier.

3. Early government support for supercomputing research in Japan, The Superspeed Project

At the end of the seventies, as it became apparent that new computer architectures and new devices would be necessary for future needs in information processing, MITI followed its usual practice of bringing together experts from universities, governmental research laboratories and industry to formulate a project proposal. The outcome was quite unusual: MITI decided to run two large projects in parallel, the High Speed Computing System for Scientific and Technological Uses Project, dubbed the Superspeed Project (1981-1989, 23 billion yen), and the Fifth Generation Computer System (FGCS) Project (1982-1991, 55 billion yen). Whereas the FGCS Project aimed at a risky new computing paradigm that broke with existing computer systems (Section 14), the Superspeed Project can be seen more as an extension of present systems. It aimed at the development of a high-speed computing system for scientific and technical applications.
The target system was supposed to operate at a rate of more than 10 GFLOPS, which was 100 to 1000 times faster than conventional computers at that time. Two major R&D projects were conducted: one on high speed novel devices and one on computer architecture, algorithms and languages for parallel computing. The six major vertically integrated computer/semiconductor companies (Fujitsu, Hitachi, Mitsubishi, NEC, Oki, Toshiba) participated in the project together with ETL. Matsushita and Sony wanted to join the project but were kept out in order to discourage excessive competition. The research on high speed devices was divided among the six participating firms: NEC, Toshiba, Hitachi, and Mitsubishi researched gallium arsenide (GaAs) chips; Fujitsu, Hitachi, and NEC, Josephson junctions; Fujitsu and Oki, HEMT (high electron mobility transistor) devices.

The research on parallel processing was divided into three subgroups: a high speed parallel (4 CPU) subproject (called PHI: Parallel, Hierarchical Intelligent computer project); the Sigma-I dataflow subproject; and a satellite image processing subproject. Of the three, PHI was the most important. Taking a practical approach to developing a 4 CPU machine as quickly as possible, the subproject combined four of Fujitsu's existing single-processor VP 2000 supercomputers and added a large high-speed common memory. Since each VP already had its own memory, the result was a hierarchical memory structure. The idea was that a user should not have to know about this hierarchy and could treat the memory as "flat".

The project was successfully concluded in 1990 by demonstrating the PHI system to the evaluation team. The prototype high speed parallel system using 4 processors had a peak speed of over 10 GFLOPs and real performance of over 1 GFLOP. NEC wrote and tested one benchmark that solved a very large (32K) system of linear equations in under 11 hours.
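To give a sense of scale for the 32K benchmark, the Python sketch below estimates the work and storage involved. It rests on assumptions not stated in the report: that "32K" means n = 32,768 unknowns, that the matrix is dense and stored as 64 bit words, and that the solver is ordinary Gaussian elimination (LU factorization), which costs roughly (2/3)n^3 floating point operations plus about 2n^2 for the triangular solves.

```python
def lu_solve_flops(n):
    """Approximate flop count for solving a dense n x n system by LU factorization:
    (2/3) n^3 for the factorization plus 2 n^2 for forward and back substitution."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def matrix_bytes(n, word_bytes=8):
    """Storage for a dense n x n matrix of 64 bit (8 byte) floating point numbers."""
    return n * n * word_bytes


def min_sustained_gflops(n, hours):
    """Lower bound on the sustained rate if the whole solve finished within `hours`."""
    return lu_solve_flops(n) / (hours * 3600.0) / 1e9


n = 32 * 1024  # assumed meaning of "32K"
print(f"flops:  {lu_solve_flops(n):.3e}")
print(f"memory: {matrix_bytes(n) / 2**30:.0f} GiB for the matrix alone")
print(f"rate:   at least {min_sustained_gflops(n, 11):.2f} GFLOPs sustained")
```

Under these assumptions the run sustained at least about 0.6 GFLOPs, and a dense matrix of this size occupies about 8 GiB. If the actual benchmark exploited sparsity or other structure, both figures would be much smaller.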
The PHI system was not a prototype of a machine that could be directly commercialized. Gallium arsenide devices (HEMT and MESFET) were used, though not as extensively as envisioned; Josephson junction devices were not used at all, although advances in Josephson junctions put Japan in the lead in this area. Less tangibly, the project focused the private sector on supercomputers at a critical time, earlier and more heavily than the companies would have done individually. Of course, cooperation also meant that work was done faster and more economically. Individually, the Japanese companies were also investing heavily; some estimates were as high as 3-4 times the government figure, $300-500 million by each of the three [2].

The second architectural subproject, the Sigma-I dataflow subproject, focused on developing a machine with 128 processors, a precursor to a massively parallel machine with 1024 processors within ETL. The research group around Toshio Shimada successfully completed the 128 processor machine in 1989, but apparently the basic design was given up in favor of other approaches to (modified) dataflow machines.

The third subproject was the satellite image data processing system. Three types of architecture were explored: Toshiba focused on a high speed three dimensional display processor using 16 very fast VLSI processors; Mitsubishi developed a cellular array processor, CAP, with 4096 PEs operating in SIMD mode; Oki worked on a two dimensional display using 8 processors for use in global data processing networks.

The Japanese companies considered the Superspeed Project helpful, but its results have not yet been incorporated into individual products. Some differences can also be observed between the attitudes of the Big Three (Fujitsu, Hitachi and NEC), which could do without the government subsidies, and the smaller ones (Mitsubishi, Oki, Toshiba), which would have had more difficulty embarking on these new directions without the extra funds.

4. Supercomputing
# How many supercomputers are in Japan? #

There are between four and five hundred supercomputers installed worldwide (this excludes IBM installations, which are difficult to count); about 125 of these are now in Japan. Three large Japanese electronics companies, NEC, Fujitsu, and Hitachi, produce shared memory supercomputers with some parallel features; these are products, and are supported and marketed as such. Within Japan, Fujitsu has almost half of the supercomputer installations, with Cray, Hitachi, and NEC sharing the balance. (Counting replacements and upgrades, about 250 supercomputers have been installed in Japan, with Fujitsu again providing about half of these.)

There are about 40 supercomputers at Japanese universities, but the number could be misleading because at least a third are older machines or others with very modest performance. The count also includes computers with nonstandard operating systems, few standard application software products, and inadequate networking. (These might still be appropriate for training and some applications.) Today, high performance workstations have significant computational capability and memory size, although they are not counted as supercomputers. A supercomputer count thus reflects systems that (at time of installation) were unequivocally viewed as supercomputers by the vendor (supplier), the purchaser, and most knowledgeable members of the community. Such a number is best understood only as a qualitative measure.

Most Japanese university scientists can get supercomputer time, but rarely on a top-end machine; these are mostly at industrial labs or at the prestigious national universities. Access to supercomputers at Japanese universities has improved markedly in the past two or three years, although in our opinion it is still below what is available to US academics. There is nothing comparable to the US NSF supercomputer centers.
(Supercomputer centers are established at major universities, at several government laboratories, and at private corporations. Recruit's Institute for Supercomputing Research and the Institute of Computational Fluid Dynamics are good examples of the latter.)

Networking has improved recently. But academic networking is not as ubiquitous as it is in the US; the prestigious universities have excellent services while many other universities have none. There are more high performance networks in the US than in Japan. Network interconnectivity in the US is also much better than in Japan; several more or less independent Japanese networks are supported by different ministries. Researchers in Japan sometimes communicate with each other or with colleagues in Europe by transiting through the US. (This is changing. For example, the Japanese government is establishing a direct link to Europe for collaborative research within the context of the Real World Computing project; see Section 17.) Counterparts to the very high performance networking projects in progress or planned in the US have not yet taken shape in Japan. However, Japan has excellent and in some cases unique technology, including a large ISDN infrastructure, and its networking difficulties seem to be more social, organizational, or cultural than technological.

Nevertheless, research in supercomputing lags that of the West, except among applications developers working on commercial software packages. There are one or two supercomputer conferences each year with small technical programs, presenting fewer than one third as many papers as US conferences.

5. Characteristics of Supercomputers: Architecture and Performance

Today's supercomputers have a large memory, 1-32 gigabytes, and several (currently up to 16) independent and very high performance CPUs (sometimes called functional units, FUs). Within each CPU there are several pipelines (pipes) consisting of the components that add, multiply, etc.
(Within a CPU the pipes have only one instruction path and must all carry out the same calculation, whereas different instructions can be executing on the independent CPUs.) The first floating point operation (FLOP) is not completed until the pipe has been filled, but once this happens a new FLOP is completed each clock cycle (hence the term pipe). Data can be moved from/to memory at rates up to a few gigabytes per second, but this is not fast enough to keep up with the arithmetic performance. Thus some kind of memory hierarchy is employed. For example, within each CPU, data from memory go first to registers, which are built of the fastest and most expensive SRAM (static random access memory) chips and have a capacity of up to about one megabyte. Data in the registers can be operated on by the pipelined arithmetic units at the peak hardware speed under certain circumstances.

An essential difference between US and Japanese supercomputers has been that US supercomputers have more CPUs, each with a small number of pipes, whereas Japanese machines have had fewer CPUs but more pipes per CPU, as many as 16. This is mostly because US companies have more experience building multi-CPU machines; the distinction is slowly changing as the Japanese add more CPUs to their systems.

Peak performance can be computed from the hardware specifications of the machine. It is obtained by dividing the total number of independent add and multiply pipes by the clock cycle time in nanoseconds (ns), giving a result in GFLOPs (billions of 64 bit floating point operations per second). Performance of Japanese supercomputers is always specified in terms of the peak that the hardware can achieve. Peak performance varies from about 5 GFLOPs for the Fujitsu VP2600 to 32 GFLOPs for the Hitachi S-3800. The Cray Y-MP C90 has a peak speed of about 15 GFLOPs. NEC's SX-3 has a peak of 26 GFLOPs. Of course, most real applications will exhibit performance far below the peak.
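The peak-rate rule above, and two of the effects that the next paragraph describes as pulling real performance below it (pipe startup, and the share of work done in scalar rather than vector mode), can be illustrated with a few lines of Python. This is a simplified model under stated assumptions, not a vendor performance model: the pipe counts are back-computed so that the rule reproduces the quoted peaks (with the cycle times given in Section 6), and the startup length and scalar and vector rates are hypothetical.

```python
import math


def peak_gflops(total_pipes, cycle_ns):
    """Peak rate as described in the text: add/multiply pipes divided by the clock
    cycle in ns. Each full pipe yields one result per cycle, i.e. 1/cycle_ns results
    per ns, which is the same as GFLOPs (10^9 operations per second)."""
    return total_pipes / cycle_ns


def pipe_efficiency(n, pipes, startup_cycles):
    """Fraction of peak reached on one vector operation of length n split evenly over
    `pipes` pipes, when each pipe needs `startup_cycles` cycles to fill before it
    produces one result per cycle. Elapsed cycles = startup + ceil(n / pipes)."""
    per_pipe = math.ceil(n / pipes)
    cycles = startup_cycles + per_pipe
    return n / (pipes * cycles)


def effective_gflops(scalar_fraction, scalar_gflops, vector_gflops):
    """Overall rate when a fraction of the operations runs at scalar speed and the rest
    at vector speed; total time is the sum of the two parts."""
    return 1.0 / (scalar_fraction / scalar_gflops
                  + (1.0 - scalar_fraction) / vector_gflops)


# Cycle times from Section 6; pipe counts are illustrative, not vendor specifications.
machines = {
    "Fujitsu VP2600": (16, 3.2),
    "NEC SX-3": (64, 2.5),
    "Hitachi S-3800": (64, 2.0),
    "Cray Y-MP C90": (64, 4.2),
}
for name, (pipes, cycle_ns) in machines.items():
    print(f"{name:15s} {peak_gflops(pipes, cycle_ns):5.1f} GFLOPs peak")

# Hypothetical 20-cycle startup on a 256-element vector: more pipes, lower efficiency.
for pipes in (4, 16):
    print(f"{pipes:2d} pipes: {pipe_efficiency(256, pipes, 20):.0%} of peak")

# Hypothetical rates: 10% of the work at 0.2 GFLOPs scalar, the rest at 5 GFLOPs vector.
print(f"{effective_gflops(0.10, 0.2, 5.0):.2f} GFLOPs overall")
```

With these made-up numbers, four pipes on a 256-element vector reach about 76% of peak while sixteen pipes reach only about 44%, and a program that does 10% of its operations in scalar mode runs at under 1.5 GFLOPs on a 5 GFLOPs vector unit. Note that the peak formula depends only on the total pipe count, not on how the pipes are divided among CPUs.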
Actual performance is measured in terms of throughput, performance on specific applications or benchmarks, etc. (Informally, many scientists assume that usable speed is one order of magnitude less than claimed peak.) Actual performance can be heavily influenced by how rapidly and in what quantity data can be moved around. The startup time to fill a pipe from a register is an overhead, and it will reduce the computing speed unless it can be amortized over a sufficiently large number of calculations. If there are many pipes, subdividing arrays to use them all reduces the number of elements handled by each pipe and so increases the relative importance of the startup. Also, bandwidth between memory and registers must match the realizable speed of the CPUs. There is additional overhead (memory latency) in fetching numbers from memory into the registers, and this depends on the type of memory chips used, how skillfully irregular retrievals are carried out, and whether bank or other conflicts in memory are avoided. In real problems, significant fractions of the program require floating point computation on scalars rather than arrays. Some supercomputers, such as the Fujitsu VP2000, have two separate scalar arithmetic units for each CPU, which operate concurrently with the vector (array) unit. Like data movement, these scalar units are not relevant in computing peak performance, but are important for real performance.

To summarize the preceding paragraphs, the key to building a high performance supercomputer is to balance memory capability, arithmetic processor performance, data movement capability, etc. Each component plays a crucial role. Achieving this balance depends largely on the overall architectural design of the system, an area in which Cray has been particularly strong.

6.
Characteristics of Supercomputers: Technology

Another way to make machines faster is to use faster components, hardware, and devices, and the Japanese have excelled here. In this area the key ingredient is each parent company's use of its most advanced and sophisticated technology. NEC states this explicitly in its 1990 annual report: "the actual performance of a supercomputer is determined by its scalar performance.... NEC's approach to supercomputer architecture is clear. Our first priority is to provide high-speed single processor systems which have vector processing functions and are driven by the fastest technologies, while giving due consideration to ease of programming and ease of use; we also seek to provide shared memory multiprocessor systems to further improve performance." Hitachi's chief engineer Michihiro Hirai says that "hardware technology is one of the key determining factors of the supercomputer's performance." Similarly, Fujitsu Director Toshio Hiraguri says "conclusively, the past breakthroughs in computer hardware technology resulted from challenging what appeared to be technological limits in the field of large-scale computers."

The Japanese see four major hardware tasks as key to additional performance: faster chips, smaller size, heat reduction, and elimination of logic bugs. Supercomputers from NEC, Fujitsu, and Hitachi use tried and true emitter-coupled logic (ECL) semiconductor technology for basic processor chips, but have pushed its capabilities quite far. For example, clock cycle times are 3.2 ns (Fujitsu), 2.5 ns (NEC), and about 2.0 ns (Hitachi). These are shorter than those of US products (the Cray Y-MP C90 has a cycle time of 4.2 ns). Faster clocks translate into better performance. Another example of technology push is in the area of lithography, the process of outlining circuits.
Beginning as an optical process generating 10-micron line widths in the 1960s, the practice is now an X-ray process in the 0.8-0.5 micron range. As line widths become narrower, more densely packed chips can be built. The Japanese are aggressively working to reduce line width, and also to reduce line-width variability, in the hope that the former will translate directly into performance improvements and the latter into less conservative designs, and hence also into improved performance. ECL gate densities are also improving. Hitachi's newly announced (1992) supercomputer uses 25,000-gate arrays, NEC's (introduced in late 1989) has 20,000-gate arrays, and Fujitsu's (also introduced in 1989) uses 15,000-gate arrays.

High end Japanese machines all have water cooled CPUs, but slightly slower air cooled versions are also available. Air cooling is also used in peripheral devices. Fujitsu uses GaAs (gallium arsenide) chips in some of its peripherals so these can be cooled effectively by air (GaAs can run cooler than silicon). Generally, the use of exotic device technology has been fairly conservative, although there are research projects at all the large Japanese companies. Thus far GaAs is not being used for CPU chips in any commercial Japanese machine, nor are the even more sophisticated Josephson junction circuits. Fujitsu used the Superspeed project results (Section 3) to develop a hybrid Josephson junction-VLSI device, and plans to use it in its next generation supercomputers, probably out in the mid-nineties. (It takes 3-5 years to produce a large scale supercomputer product.) Similarly, NEC developed GaAs logic devices as well as memory chips and has designed a multichip package for supercomputers. GaAs is seen as slowly replacing ECL, although the Japanese are convinced that there are still performance gains to be obtained with silicon. The issue of silicon versus GaAs is interesting and also shows how difficult it is to make predictions based on technological development.
In 1989 Spectrum [3] commented on plans by Seymour Cray to develop a GaAs based supercomputer: "The company hopes the GaAs circuitry, along with a four-fold increase in parallelism, will improve on the Cray-2's performance by a factor of 12." This would have put the projected peak at about 5 GFLOPs. Cray's new machine was to be available in late 1989 but is still under development at this time. As a final example of the Japanese pushing technological excellence to build high performance systems, we note that NEC has demonstrated a 64 bit floating point processor on a single 17mm-square air cooled chip, with a peak performance of 0.2 GFLOPs. A 1 GFLOP processor would require only five such chips.

Supercomputer reliability is another area where Japanese technology plays a role. Where hardware is concerned, the Japanese have a substantial advantage; although the Cray Y-MP series (which is based on VLSI technology) has been markedly better than Cray's previous generations, it still lags behind the Japanese machines. In summary, Japanese industry's "pursuit of more powerful single processors is supported by a wide spectrum of in-house technologies ranging from single crystal production, VLSI fabrication, packaging, to assembling, and testing. The cost of hardware development is diffused by employing the same components in supercomputers and mainframes."[4]

7. Supercomputer Performance Measurement

There is a great deal of controversy surrounding the selection of performance benchmarks, and over whether they are oversimplified and fail to capture adequate complexity. Not only the selection of problems but also the extent to which automatic or manual optimization may be performed is relevant. Making programs run efficiently on a given supercomputer can be done either by hand or automatically. Automatic vectorizers take low-level loops and try to make use of all the pipes on a single CPU to gain efficiency.
Autotaskers operate at a higher level, partitioning the program and causing separate parts to run concurrently on different CPUs. Performance of the Japanese single CPU systems has improved significantly, to the point where they are internationally competitive. For example, in a recent benchmark [5] on a typical aerodynamics program, the authors comment that "one cannot avoid being impressed by the demonstrated power of all the [US and Japanese] benchmarked computers, due both to the speed of the hardware components and to the capability of their compilers. A sustained speed of at least 1GFLOPs was achieved by all computers [on one CPU]." Nevertheless, such results are very application dependent.

Today, the trend is toward purchasing high end supercomputers with more than one CPU, and a distinction should be made between single and multiple CPU performance. Overall throughput performance relates to running a mix of programs, each on one CPU. Peak performance relates to running one program using several CPUs. Computer center managers are mostly concerned with throughput; headlines focus on peak. Quoted peak performance figures, such as 32 or 26 GFLOPs, are the number of processors times the single processor peak (the second situation above), but this figure is much more difficult to achieve in practice than the peak on a single CPU. It demands not only more balance from the hardware but also more from the system software and its autotasking capabilities. Japanese supercomputers have very fast CPUs because of the excellent fundamental technology mentioned above, and also because of their use of many data pipes. At the time of writing, only NEC among the Japanese supercomputer manufacturers has a multiprocessor system installed outside its own facilities. NEC's single processor performance has been excellent, but demonstrated performance on the (new) multiple CPU system has been substantially below its potential peak.
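One standard way to see why the multi-CPU peak is so much harder to reach, not taken from this report, is Amdahl's law: if only a fraction f of a program's single-CPU run time can be spread over p CPUs by the autotasker, the speedup is at most 1/((1 - f) + f/p). The Python sketch below compares that bound with a headline figure formed as "number of processors times single processor peak". The 6.4 GFLOPs single-CPU rate and the parallel fractions are hypothetical, and the model ignores memory bandwidth and synchronization costs, which make real results worse still.

```python
def amdahl_speedup(cpus, parallel_fraction):
    """Amdahl's law: speedup on `cpus` processors when only `parallel_fraction` of the
    single-CPU run time can be split across them; the rest runs on one CPU."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cpus)


def sustained_multi_cpu_gflops(single_cpu_gflops, cpus, parallel_fraction):
    """Rate for one program spread over several CPUs, given its single-CPU rate."""
    return single_cpu_gflops * amdahl_speedup(cpus, parallel_fraction)


single = 6.4  # hypothetical single-CPU rate in GFLOPs
for cpus in (4, 16):
    quoted = cpus * single  # how headline multi-CPU peak figures are formed
    for f in (0.90, 0.99):
        got = sustained_multi_cpu_gflops(single, cpus, f)
        print(f"{cpus:2d} CPUs, {f:.0%} parallel: "
              f"{got:6.1f} of {quoted:6.1f} GFLOPs ({got / quoted:.0%})")
```

Even with 90% of the work parallelized, four CPUs deliver about three times the single-CPU rate, and sixteen CPUs only about six times; the more CPUs a machine has, the closer to fully parallel a program must be before the headline peak becomes meaningful.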
NEC's multiprocessor shortfall reflects a combination of insufficient bandwidth to memory (balance) and lack of multiprocessor experience. Neither Fujitsu nor Hitachi has multi-CPU systems in the field to test, and we can only speculate that their situation will be similar. As Japanese computer vendors gain more experience with multi-CPU systems, performance is bound to improve. But at the moment Japanese multiprocessor performance is several years behind Cray's.

8. Supercomputer Software

Software for supercomputers includes compilers, libraries, operating systems, support for networking, and software tools. All three Japanese supercomputers are now available with a customized version of the Unix operating system. The use of Unix will help the migration of application programs onto Japanese systems. People are just now coming to grips with the need to assess software costs, and moving to Unix is clearly seen as one way to reduce costs for the end user as well as the vendor. In Japan, this move away from proprietary operating systems has occurred only in the past two or three years. For Hitachi it is only just now occurring, and the company has not totally embraced Unix: its newest supercomputer is available in a Unix version, and also with the company's own IBM-like operating system for compatibility with older Hitachi systems. The situation is similar for Fujitsu, which also supports both Unix and its own system. In the past, applications developed in the West have been installed very slowly, and this was a major impediment to the purchase of Japanese supercomputers both in and outside Japan. Using Unix will improve this situation. However, using a standard operating system only means that software portability is improved and development time is reduced, not that a program will run efficiently. There does not yet seem to be any shortcut to maximum performance other than incorporating knowledge of the hardware into the algorithms and software.
Early Japanese supercomputer software development was limited to producing Japanese language interfaces for Western software products, and this is still an important activity. For example, NEC has recently moved the latest version of the heavily used engineering analysis system NASTRAN to its supercomputers, and the company's supercomputer promotional literature lists about one hundred products (many from the West) that are available in the areas of structural analysis, fluid dynamics, mechanical analysis, crash analysis, civil engineering, magnetics, acoustics, electron device simulation, injection molding, chemistry, graphics, mathematics, etc. Other vendors are engaged in similar projects. More recently, however, first-rate packages designed and implemented in Japan have begun to appear. Good examples are DEQSOL from Hitachi for the solution of the partial differential equations arising in engineering simulation, Alpha-flow from Fuji Research Institute for solution of fluid dynamics problems, Fortran/KR from Fujitsu allowing object oriented programming from within a Fortran environment, and AMOSS from NEC for molecular orbital calculations. There is also a clear trend to enlist Western scientists who already have experience with supercomputer implementations. NEC and Fujitsu have established facilities as well as collaborative research with groups in the US, Australia, and Europe for this purpose. Thus Japanese vendors are becoming more effective at accessing expertise outside of Japan. For those users who need to create software (rather than use existing applications), standard languages such as Fortran and C are available on all Japanese supercomputers, and the vendors are careful to ensure that these meet all announced standards, although they have various enhancements too. To get efficient programs, users can rearrange their algorithms, insert special directives within their programs, and also use vendor-provided automatic vectorizers and autotasking.
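As an illustration of why users sometimes have to rearrange their algorithms, the following Python sketch (a generic example, not taken from any vendor's compiler or documentation) contrasts ordinary sequential loop semantics with the load-everything-then-store behavior of a vector pipe. The recurrence a(i) = a(i-1) + b(i) gives different answers under the two semantics, so a vectorizer must leave it in scalar form; rewriting it as recursive doubling, one standard rearrangement, produces the same result from a logarithmic number of whole-vector steps. All function names are hypothetical.

```python
def sequential_recurrence(a, b):
    """Scalar (DO-loop) semantics: each iteration sees the value stored by the previous one."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def vector_semantics(a, b):
    """Naive vector semantics: every operand is loaded before any result is stored."""
    loaded = list(a)  # the whole "vector" is fetched first
    return [loaded[0]] + [loaded[i - 1] + b[i] for i in range(1, len(a))]


def independent_sum(a, b):
    """c(i) = a(i) + b(i): no iteration reads another's result, so it is safe to vectorize."""
    return [x + y for x, y in zip(a, b)]


def recursive_doubling(a, b):
    """Same result as the recurrence, rearranged into about log2(n) whole-vector steps."""
    x = [a[0]] + list(b[1:])  # the recurrence is a running sum of a(0), b(1), b(2), ...
    d = 1
    while d < len(x):
        loaded = list(x)  # each step is one vector operation on the whole array
        x = [loaded[i] + (loaded[i - d] if i >= d else 0) for i in range(len(x))]
        d *= 2
    return x


if __name__ == "__main__":
    a, b = [1, 0, 0, 0], [0, 1, 1, 1]
    print(sequential_recurrence(a, b))  # [1, 2, 3, 4]
    print(vector_semantics(a, b))       # [1, 2, 1, 1]: differs, so not safe to vectorize as written
    print(recursive_doubling(a, b))     # [1, 2, 3, 4]: matches the sequential result
```

The rearranged version performs roughly n log2 n additions instead of n, trading extra arithmetic for a structure that vector pipes can handle; whether that pays off depends on vector length and pipe speed.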
Optimized vendor libraries (providing functionality such as matrix manipulation, fast Fourier transforms, etc.) with simple interfaces are another good way to obtain efficiency. The three Japanese supercomputer companies have large teams of programmers developing these libraries, and they also support well-known commercial libraries from the West, such as IMSL and NAG, and non-commercial packages such as Eispack and Linpack. If the user interfaces are standardized, then portability is maintained along with efficiency. But there is no work originating in Japan with an eye toward standardization of scientific software, and there is almost no research comparable to that in the West on portable numerical algorithms, as typified, for example, by the Lapack project at the University of Tennessee and cooperating institutions. Nor is there much pressure to develop standardized software; vendors and users still develop libraries and user interfaces for their own platforms and applications. Japanese computer users can, and do, write their own application software, and people who have studied it from the inside claim it can be quite good. One American computer science graduate student who has been working at a major Japanese government research lab commented as follows. "As far as I can tell, the guys here can write software like demons. I know a lot of good hackers back at school [MIT], as well as from my experience in industry. These guys are major hackers -- definitely quite above average, although not in the top range of my experience." In summary, Japanese software for supercomputers, especially for multiprocessors, is only now emerging. Japanese supercomputer vendors have made major strides in the networkability of their products, and support for high speed interfaces such as HIPPI (High-Performance Parallel Interface) is available.
9. Brief Summary of Major Japanese Supercomputer Characteristics (a) Hitachi Hitachi's newest supercomputers, announced in the spring of 1992, are the HITAC S-3800 and HITAC S-3600. (HITAC is the trade name used by Hitachi for all its large-scale systems.) Delivery is scheduled for January 1993. At this point there are no plans to market the systems in the US. The S-3800 can be obtained as a multiprocessor system (up to four processors), with a clock cycle estimated to be 2 nanoseconds using silicon technology. This is the fastest clock of any commercial supercomputer system. These are the third generation of HITAC supercomputers, following the HITAC S-810 and S-820. In the past year Hitachi has not sold many of its S-810/820s because their performance lagged that of NEC and Fujitsu products, so the new product has been needed. Significantly, Hitachi supercomputers will now support the Unix operating system. The S-3800 is water cooled and comes in six models. The high-end machine, model 480, has four CPUs, each with a scalar and a vector unit, eight add/multiply pipes, four division pipes, four load pipes, four load/store pipes, and a mask pipe. Main storage is 2Gbytes, and peak performance is 8GFLOPs per CPU, for a maximum of 32GFLOPs for the four processor model. An interesting option is Scientific Animation Graphics, which allows direct access to high speed storage and can send either ordinary (NTSC) or high definition TV (HDTV) signals to an outboard display unit at the standard video rate of 30 frames per second. HDTV requires a data transfer rate of 119 Mbytes/second, which is supported. (b) NEC NEC introduced its SX-3 series almost three years ago, and it is the Japanese computer that has created the most stir in the West. Recently, NEC made a modification allowing the 2.9 nanosecond cycle time to be reduced to 2.5 nanoseconds; the new model has the R designation, SX-3R. The high-end model SX-3R/44 has four CPUs and a total of 64 arithmetic pipes.
Peak performance is 6.4GFLOPs per CPU, for a maximum of 25.6GFLOPs. Up to 8Gbytes of memory are available. NEC has been trying hard to sell its supercomputers in the US, even to the extent of using the court system to force entry into what it felt was a closed procurement. SX-3 computers run Unix. A few SX-3s have been installed outside of Japan, and performance measurements are beginning to be published. (c) Fujitsu Fujitsu's VP2000 series supercomputers come in a variety of models, of which two have a peak performance of 5GFLOPs. The VP2600/20 has one CPU, while the VP2400/40 has two. Thus one has a choice between synchronous parallelism within a single CPU (VP2600) and asynchronous parallelism across two CPUs (VP2400) to obtain the same peak performance. Clock cycle time is 3.2 nanoseconds, and memory is 3Gbytes. Each CPU has two pipes, and each pipe can perform addition and multiplication simultaneously. Further, the VP2400 and VP2600 have double and quadruple thick pipes respectively, meaning that each CPU on the VP2600 can perform 16 floating point operations per clock cycle; on the VP2400 the corresponding number is 8 per cycle. Both Unix and Fujitsu's own operating system are supported, and a new model, the VPX, with the same performance characteristics but running only Unix, has recently been made available. Fujitsu has also stepped up its efforts to market its supercomputers in the US, but plans to focus on private industry rather than government sales. 10. Japanese Supercomputers in the US--Very Few As of this writing, there are no Japanese supercomputers in the US, except for an SX-3 at NEC's Houston center, two Fujitsu systems sold to a US subsidiary of a Norwegian company in 1987 and 1988, and a VP2400 at Fujitsu's San Jose facility.
Resistance in the US is partly due to concern about proven cost-effective performance, partly to the wide availability of US-based Crays with a large software base and experienced user community, and partly to hesitancy on the part of US computer center directors (especially within government) to buck explicit or implicit disapproval from Congress and funding agencies without any strong countervailing arguments. Both NEC and Fujitsu have been more successful at selling outside the US, aided, perhaps, by generous terms. What situations will eventually generate sales at fair market prices in the US, even to skittish government labs? The two most probable are (a) a "solution" involving either specific software or integration into an existing hardware environment, or (b) performance. (a) might occur in an international organization that already uses Japanese products outside the US and wants a high degree of compatibility in its US facility. For (b), we asked how much performance would be necessary at a recent conference attended by US experts with purchasing responsibilities. To convince these people, Japanese supercomputers will have to outperform their US counterparts by at least a factor of three or four, and perhaps by as much as a factor of ten. At this time, Japanese supercomputer marketing in the US is directed toward private sector sales. On the other side, US supercomputer makers assert that sales of non-Japanese supercomputers to Japanese government institutes and universities are hindered by various unfair practices. It is not the purpose of this paper to discuss these issues, except to say that they are under negotiation. As for future trends, the Japanese clearly see that ECL technology is nearing its limits, although it is not yet dead. Cooling will be improved and circuit density will increase, but these technologies too will be increasingly difficult to push.
The only remaining approaches are then to make individual processors faster by using radically new technologies, such as optical or biological ones, or to develop parallel systems. --------------FOR SECTIONS 11-19, SEE FILE "jhpc-pp.92"--------------------