Jon A. Webb
School of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213-3891
webb+@cmu.edu


This paper will appear in the proceedings of the International Conference on Pattern Recognition, October 1994, Jerusalem.



High Performance Computing in Image Processing and Computer Vision


Abstract
Image processing and computer vision are natural applications for High Performance Computing (here considered to be general-purpose parallel supercomputing), but there are many barriers to its effective use in computer vision. These barriers are described, two systems that overcame them (the Carnegie Mellon ALVINN system and the animate vision model at Rochester) are discussed, and a worldwide survey of applications of HPC to image processing and computer vision is presented. Finally, the future of this field is described.
1: Introduction
The inherent parallelism in image processing and computer vision suggests that high performance computing (HPC) should be readily applicable, and historically image processing and computer vision have been the most common areas proposed for the use of HPC. But actual applications have been few, and the impact of HPC on research in image processing and computer vision has been slight.
Here we discuss the barriers to application of HPC to image processing and computer vision, give some success stories, survey current research, and forecast the future of this work.
2: Barriers to the application of HPC to image processing and computer vision
2.1: Interfaces
Video interfaces are a basic requirement for the application of HPC to most image processing and computer vision problems, but are at best an afterthought for supercomputer manufacturers. For example, HiPPI (high performance parallel interface) is an emerging de facto standard for supercomputer peripherals, and there are several HiPPI video output (i.e., graphics) interfaces, but only one manufacturer (PsiTech) offers a HiPPI video input interface.
2.2: Cost
2.2.1: Market requirements: In many industrial image processing applications the maximum that may be spent on computing resources is extremely small compared to the cost of a high performance computer, e.g., $50,000 or less. It is hard to justify using HPC in these areas, even if HPC could offer significant speedup.
2.2.2: Difficulty of sharing resources: Scientific computing has benefited greatly from the sharing of HPC resources, particularly through the supercomputing centers. No single scientific application may be able to justify the expenditure of several million dollars for a supercomputer, but often several together can. However, the need for interaction in many image processing and computer vision domains precludes sharing.
2.2.3: Expendability: In some robotic applications the robot and associated computing resources are placed in relatively hazardous situations. It is almost impossible to justify placing several million dollars' worth of computing equipment in such an environment.
2.3: Size and power
High performance computers are typically large and consume a lot of power, making it impossible to mount them on robot vehicles.
2.4: Latency requirements
2.4.1: Message-passing latency: As shown in [26], the latency requirements in robotics image processing and computer vision, where the sensory processing is part of a control loop, are much more stringent than the latency requirements in scientific computing. This fact has significant architectural implications, primarily for message passing latency, which must be much lower than in scientific computing. Thus, many HPC machines designed for scientific computing are simply unsuitable for robotics image processing and computer vision tasks.
2.4.2: Network latency: Because of cost sharing, HPC machines are generally available to image processing and computer vision researchers over networks at regional supercomputing centers. Latency over these networks is generally unacceptable for robot control applications (which require response times of 100 ms or less) and may, when combined with bandwidth limitations, be unacceptable for interactive applications (which require response times of 1 s to 1 minute).
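To make these budgets concrete, the following is an illustrative timing budget for a vision-based control loop that must respond within 100 ms; every number in it is an assumed value chosen for illustration, not a measurement from any system discussed here.

```python
# Illustrative latency budget for a vision-based control loop.
# All timings are assumed, for illustration only.

def remaining_budget(loop_budget_ms, *costs_ms):
    """Return the time left for communication after the listed costs."""
    return loop_budget_ms - sum(costs_ms)

# A robot control loop that must respond within 100 ms.
frame_acquisition_ms = 33.0   # one frame of 30 Hz video
image_processing_ms  = 40.0   # assumed on-board processing time
control_update_ms    = 5.0    # assumed actuator/command overhead

comm_budget = remaining_budget(100.0, frame_acquisition_ms,
                               image_processing_ms, control_update_ms)
print(f"Time left for all message passing: {comm_budget:.0f} ms")

# Even a modest 50 ms network round trip would exceed the entire remaining
# budget, which is why on-site, low-latency communication matters so much
# in robotics image processing.
```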
3: Success stories in the application of HPC to image processing and computer vision
3.1: The development of the ALVINN system on the Carnegie Mellon Navlab
ALVINN (Autonomous Land Vehicle in a Neural Network) is a neural-network-based road-following system [19]. It is one of the most successful examples of the application of neural networks to a robot control problem; using ALVINN, the robot vehicle is capable of driving for hours at highway speeds. The success of ALVINN has led to the application of similar neural networks to a wide range of robot control and perception problems. There is little doubt that this represents a real advance in image processing and computer vision research.
ALVINN was developed during a period of collaboration between HPC and computer vision researchers. It was implemented on Navlab, a van that had been modified for autonomous control [24]. On board Navlab were several workstations, control hardware, cameras, and a ten-cell, 100 MFLOPS Carnegie Mellon Warp machine.
The presence of Warp on Navlab was critical to the decision to apply neural networks to road following [5]. At the time of the development of this system, no other computer that could be mounted in Navlab could do the backpropagation quickly enough to train the neural networks to learn the road following task in a reasonable time [20]. Indeed, even on Warp, initial training was very slow: eight-hour overnight runs on a laboratory machine were used.
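To give a sense of the computation that had to run on Warp, the following is a minimal sketch of backpropagation for a small two-layer network of the general kind ALVINN used (a low-resolution image in, a vector of steering responses out). The layer sizes, learning rate, and random training data are assumptions made for illustration only; they do not reproduce the actual ALVINN network or its training procedure.

```python
# Minimal backpropagation sketch for a small road-following network.
# Sizes, learning rate, and data are assumed for illustration; this is not
# the actual ALVINN configuration.

import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 30 * 32, 4, 30   # assumed: 30x32 retina, 30 steering units
lr = 0.01

W1 = rng.normal(0, 0.1, (n_hidden, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hidden))
images = rng.random((100, n_in))          # fake "road images" (real training used driving data)
targets = rng.random((100, n_out))        # fake steering-response targets

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(10):
    err = 0.0
    for x, t in zip(images, targets):
        # Forward pass.
        h = sigmoid(W1 @ x)
        y = sigmoid(W2 @ h)
        # Backward pass: gradient of the squared error through the sigmoids.
        delta_out = (y - t) * y * (1 - y)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        W2 -= lr * np.outer(delta_out, h)
        W1 -= lr * np.outer(delta_hid, x)
        err += float(np.sum((y - t) ** 2))
    print(f"epoch {epoch}: squared error {err:.2f}")
```

Each training pass touches every weight, so the cost per image grows with the product of the layer sizes; on the hardware of the time, this was the computation that made a parallel machine attractive for training.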
Eventually, experience with ALVINN led to a vast reduction in the network size, and the algorithm was moved to run on Warp within Navlab for on-line runs. After a period of experimentation with that system, which led to a further reduction in network size, and after the introduction of the Sun 4, the entire algorithm was moved to a workstation.
Thus, HPC played a critical though transitional role in the development of this computer vision advance.
3.2: The development of the animate vision model at Rochester
Animate vision [1] is a paradigm for visual processing in which the goals of the agent are explicitly taken into account in formulating sensory processing. In this model an agent may do far less sensing than in traditional paradigms, because only those sensory steps necessary to accomplish the goals of the agent are actually executed.
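As a toy illustration of the difference, the sketch below contrasts processing an entire frame with processing only a small fixation window selected by the agent's current goal; the frame size, window size, and measure of work are assumptions made purely for illustration.

```python
# Toy illustration of goal-driven (animate) sensing versus whole-frame sensing.
# Frame size, window size, and the "work" measure are assumed for illustration.

import numpy as np

rng = np.random.default_rng(1)
frame = rng.random((480, 512))          # a full video frame

def process(pixels):
    """Stand-in for some per-pixel measurement; returns the work done."""
    return pixels.size

# Traditional paradigm: measure every pixel, whether or not the goal needs it.
full_cost = process(frame)

# Animate vision: the current goal selects a small fixation window,
# and only that window is sensed and processed.
fy, fx, half = 240, 256, 32
window = frame[fy - half:fy + half, fx - half:fx + half]
goal_cost = process(window)

print(f"full frame: {full_cost} pixels, fixation window: {goal_cost} pixels, "
      f"ratio {full_cost / goal_cost:.0f}x")
```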
The development of the animate vision model at Rochester was strongly influenced by experience there with parallel computing, in particular the system they constructed that used the BBN Butterfly for higher-level vision and Datacube hardware for image processing [4].
The further development of the animate vision paradigm has been frustrated by a lack of suitable parallel architectures. Several architectures have been tried: for example, the BBN Butterfly supports the right models of parallelism but has limited I/O bandwidth; transputers are incompatible with other familiar systems and have limited processing capability [3].
3.3: Evaluation of success stories
3.3.1: Interfaces: Both the Carnegie Mellon and Rochester systems explicitly addressed the problem of vision interfaces by using hardware based on Datacube. Particularly in the case of the Carnegie Mellon system, the limited bandwidth between the Datacube and the Warp system (over the VME bus) limited total processing and restricted the class of algorithms that could be efficiently implemented on Warp.
3.3.2: Cost: The supercomputers in both of these systems were paid for by the DARPA Strategic Computing Program, which meant that they did not have to be justified as computer vision systems per se, effectively eliminating the cost issue. On the other hand, since these systems were developed as experimental hardware, they lacked many features important to their success as vision systems, and there was no direct follow-on hardware to which these systems could be ported. And there was no direct application of these systems to commercial problems, partly because of the cost issue.
3.3.3: Size and power: The Warp computer, because of its design, was remarkably small for its computing power, which is one of the reasons it could be mounted in Navlab. Even so, its power consumption (and the associated cooling needs) was a constant problem in Navlab. Size and power were not an issue in the Rochester system, which was fixed in a laboratory environment.
3.3.4: Latency requirements: In both of these systems network latency was not an issue, since the supercomputer was available on-site. Message passing latency played a significant role in both systems:
Carnegie Mellon. Low message passing latency was one of Warp's distinguishing features, and was one of the reasons low-level image processing algorithms could be mapped efficiently onto it. In the neural network algorithm low message passing latency was critical to getting the high performance from backpropagation that was necessary for it to be used in ALVINN [20].
Rochester. The BBN Butterfly had high message passing latency, which made it impossible to use for image processing. This is the reason for the introduction of the Datacube as part of the system. High message passing latency was also critical in leading Rochester to use task-level decomposition of the animate vision system, since message passing latency plays a less significant role when systems are broken down with a larger grain size.
4: Current research in HPC applications to image processing and computer vision
This is a greatly abbreviated survey of the use of HPC for image processing and computer vision in North America, Europe, and Japan.
North America, particularly the United States, is the leader in this area, because of long-term investment by the Advanced Research Projects Agency of the Department of Defense, and the health of the computer industry generally.
European research on HPC applications to image processing and computer vision has been largely done on machines based on transputers and on the AMT DAP, though the transputer is being displaced by new chips from Texas Instruments. Many different groups have constructed transputer-based machines and applied them to problems in industrial computer vision and robot vehicle control. The AMT DAP has been used as the basis of much work in SIMD image processing algorithms.
Japan is a world leader in the application of image processing techniques to industrial systems, but relatively little of its effort has been directed to the development of parallel computer vision systems.
Carnegie Mellon University: Carnegie Mellon is one of only two universities (the other being the Universität der Bundeswehr München) applying parallel computers to outdoor, high-speed robot vehicle guidance. Early work in this project has been discussed above; current work related to HPC uses the MasPar MP-1 [11].
Daimler-Benz: Work here has focused on applying HPC to image processing problems, particularly at the intermediate level [7].
Electrotechnical Laboratory: The primary focus of massively parallel systems work at ETL is on dataflow computers.
Genoa University: Researchers at Genoa University have been studying communications issues in SIMD parallel computers.
Kyushu University: Researchers at Kyushu University have constructed two architectures with image processing as an applications area: the Kyushu University Reconfigurable Parallel Processor [16] and AMP (autonomous multi-processor) [23].
New York University --- Courant Institute of Mathematical Sciences: The technique of geometric hashing, a method for matching scene objects with a model database that is particularly well suited for parallel implementation, was invented at New York University [12].
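To illustrate why the technique parallelizes naturally (each basis pair of model features can be processed independently, both when the hash table is built and when votes are cast), here is a minimal sequential sketch of two-dimensional geometric hashing; the model points, quantization step, and test scene are invented for this example rather than taken from [12].

```python
# Minimal 2-D geometric hashing sketch: build a hash table over all ordered
# basis pairs of model points, then recognize by voting. The model, the
# quantization step BIN, and the test scene are assumed for illustration.

from collections import defaultdict
from itertools import permutations
import numpy as np

BIN = 0.25  # assumed quantization of the invariant coordinates

def invariant_coords(points, b0, b1):
    """Coordinates of all points in the similarity-invariant frame of basis (b0, b1)."""
    origin = points[b0]
    u = points[b1] - origin
    v = np.array([-u[1], u[0]])           # perpendicular to the basis vector
    rel = points - origin
    return np.stack([rel @ u, rel @ v], axis=1) / np.dot(u, u)

def build_table(model):
    """Every (basis pair, point) entry is independent, so this loop parallelizes trivially."""
    table = defaultdict(list)
    for b0, b1 in permutations(range(len(model)), 2):
        for p in invariant_coords(model, b0, b1):
            key = tuple(np.round(p / BIN).astype(int))
            table[key].append((b0, b1))
    return table

def recognize(table, scene, b0=0, b1=1):
    """Vote for model basis pairs using one scene basis pair."""
    votes = defaultdict(int)
    for p in invariant_coords(scene, b0, b1):
        key = tuple(np.round(p / BIN).astype(int))
        for basis in table[key]:
            votes[basis] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

model = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 1.5]])
table = build_table(model)

# A rotated, scaled, translated copy of the model should vote strongly for some basis.
theta, s, t = 0.7, 2.0, np.array([3.0, -1.0])
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
scene = s * model @ R.T + t
print(recognize(table, scene))            # expected: a model basis receiving 5 votes
```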
Okayama University: Researchers here have described the RTA/1, an n-dimensional torus with interconnected local memories, and analyzed its performance on various image processing algorithms, including local operations, the Hough transform, and connected components [14].
Purdue University: Current work has focused on mapping issues for image processing and computer vision algorithms on commercial HPC architectures [18].
Queen Mary College, University of London: Researchers at Queen Mary College are leaders in the application of the AMT DAP to problems in image processing and computer vision, and have also been involved in the development of the DAP and DAP software. They have developed parallel algorithms for a variety of mid-to-high level vision algorithms, such as object matching [10].
State University of New York at Buffalo: A wide variety of geometric algorithms (such as convex hull and line segmentation) have been mapped onto architectures with different interprocessor communication structures, such as pyramid and reduced mesh [15].
Swiss Federal Institute of Technology: The Swiss Federal Institute of Technology developed SYDAMA, a static dataflow machine. Current work is based on the successor machine, SYDAMA-2 [9].
Syracuse University: Researchers there have developed algorithms for many image operations, including image template matching and clustering, and analyzed their mapping onto architectures such as the hypercube and reduced-mesh architectures [21].
Thinking Machines Corporation (TMC): TMC and researchers at MIT developed influential parallel programming models, in particular the scan-vector model, which has been widely used in parallel applications, including computer vision.
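As a small illustration of the scan primitive at the heart of that model, the sketch below uses an exclusive prefix sum to compact the coordinates of foreground pixels into a dense list, a step that recurs in many data-parallel vision pipelines; the toy image and helper names are assumptions made for illustration.

```python
# Scan (prefix sum) used for stream compaction: gather the coordinates of
# foreground pixels into a dense list. The toy image is assumed for illustration.

import numpy as np

def exclusive_scan(x):
    """Exclusive prefix sum: out[i] = x[0] + ... + x[i-1]."""
    inclusive = np.cumsum(x)
    return np.concatenate(([0], inclusive[:-1]))

# Toy binary image: 1 marks a foreground (e.g., edge) pixel.
image = np.array([[0, 1, 0, 0],
                  [1, 1, 0, 1],
                  [0, 0, 0, 0],
                  [1, 0, 1, 0]], dtype=np.int64)

flags = image.ravel()
positions = exclusive_scan(flags)       # destination index for each foreground pixel
n_fg = int(flags.sum())

# Scatter: every foreground pixel writes its own coordinates to its slot;
# on a data-parallel machine this is a single parallel write.
rows, cols = np.divmod(np.arange(flags.size), image.shape[1])
mask = flags == 1
compacted = np.empty((n_fg, 2), dtype=np.int64)
compacted[positions[mask]] = np.stack([rows[mask], cols[mask]], axis=1)

print(compacted)                        # dense list of (row, col) coordinates
```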
Université Paris Sud: Researchers here developed SPHINX, a SIMD pyramid machine, and mapped it onto the Connection Machine [22].
University of British Columbia: Recent work here has been structured around the Vision Engine, a Datacube and transputer-based machine being applied to motion, stereo, and tracking [13].
University of Crete: A focus of their research has been the relationship between the type of image processing operation and parallel efficiency [8].
University of Florence: Researchers here have long been leaders in studies of local parallel algorithms such as optical flow, and have recently turned to implementing their work on the Connection Machine [2].
University of Maryland: A particular focus has been on the application of parallelism to problems in which the natural data mapping is irregular and difficult to parallelize, such as in focus of attention [17].
University of Oulu: Industrial machine vision is a chief topic of research in the Computer Laboratory at the University of Oulu, and parallel computers are being used in order to meet real-time requirements [25].
University of Rochester: Recent work focuses on the implementation of "animate vision" systems on parallel computers, including the Butterfly, as discussed above.
University of Southern California (USC): This group spans all areas: theoretical analysis of algorithms, implementation of algorithms on real parallel computers, computer vision research employing parallel computers, and the design of new programming methodologies for parallel computer vision.
University of Tokyo: Researchers at the University of Tokyo have developed PSM-32, a shared memory MIMD machine, intended for image processing and computer vision [6].
5: The future of HPC applications to image processing and computer vision
The single most important limit to the application of HPC to image processing and computer vision has been the lack of an acknowledged market. While the barriers discussed earlier are largely technical, they can all be overcome given appropriate design.
In the past, HPC application to image processing and computer vision has been sustained by government investment, which cannot by itself generate a large enough market to justify the kinds of optimizations needed to make HPC really usable in these areas. But this is changing. New technologies are making video processing of greater importance:
High bandwidth networks are leading to increased interest in the distribution of video. In these applications many image processing operations, particularly compression, must be done at very high speeds, making HPC a natural base on which to build video servers (for example, NCube Corporation recently introduced such a server). Just as important, future advances in the technology needed to support video distribution will come from advances in image processing and computer vision. Given the quantities of data involved, HPC must be used to conduct this research.
Advanced robotics systems are finally coming of age. Because of cost, size and power constraints these systems will not use HPC per se, but HPC can be used to conduct research in such systems that will be applied to develop the next generation of such systems.
Other applications, such as medical image processing and image databases, are also reaching a stage of maturity that makes it possible to begin addressing the speed issues that must be overcome in a working system.
Scientific computations that are trivially parallelizable have already been parallelized; those remaining require more communication, and so have requirements more like those of image processing and computer vision algorithms.
Acknowledgments
Much of the motivation and some of the ideas in this paper came from a discussion on HPC and computer vision that took place on the Internet in late 1993. The participants in that discussion are too numerous to list here; but contributions by Larry Davis, Chip Weems, Chris Brown, and Ram Nevatia were particularly important.
This research was partially supported by the Advanced Research Projects Agency of the Department of Defense under contract number F19628-93-C-0171, ARPA order number A655, "High Performance Computing Graphics," monitored by Hanscom Air Force Base. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Advanced Research Projects Agency, the Department of Defense, or the U.S. government.
Bibliography
1. Ballard, D.H., Animate Vision. Artificial Intelligence, 1991. 48(1): p. 57-86.

2. Bimbo, A.D. and P. Nesi. Optical Flow Estimation on the Connection-Machine CM-2. in Workshop on Computer Architectures for Machine Perception. 1993. New Orleans, LA: IEEE Computer Society.

3. Brown, C., Personal communication: comments on active vision and parallel systems. 1994.

4. Brown, C.M. Parallel Vision with the Butterfly Computer. in Third International Conference on Supercomputing. 1988. Boston, MA:

5. Crisman, J.D. and J.A. Webb, The Warp Machine on Navlab. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991. 13(5): p. 451-65.

6. Deguchi, K., K. Tago, and I. Morishita. Integrated Parallel Image Processings on a Pipelined MIMD Multi-Processor System PSM. in 10th International Conference on Pattern Recognition. 1990. Atlantic City, NJ:

7. Gerogiannis, D. Programming Intermediate Level Vision Tasks on Parallel Machines. in International Conference on Pattern Recognition. 1992. The Hague, The Netherlands: IEEE Computer Society Press.

8. Gerogiannis, D. and S.C. Orphanoudakis. Efficient Use of Parallelism in Intermediate Level Vision Tasks. in International Conference on Pattern Recognition. 1992. The Hague, The Netherlands: IEEE Computer Society Press.

9. Gunzinger, A. Concept and Realization of a Heterogeneous Multiprocessor System for Real Time Image Processing. in Computer Architectures for Machine Perception. 1991. Paris, France: D.G.A./E.T.C.A., C.N.R.S./I.E.F. and M.E.N./D.R.E.D.

10. Holder, D. and H. Buxton, Polyhedral object recognition with sparse data-validation of interpretations. Image and Vision Computing, 1990. 8(2): p. 124-9.

11. Jochem, T.M. and S. Baluja. A Massively Parallel Road Follower. in Workshop on Computer Architectures for Machine Perception. 1993. New Orleans, LA: IEEE Computer Society.

12. Lamdan, Y. and H.J. Wolfson. Geometric Hashing: A General and Efficient Model-Based Recognition Scheme. in Second International Conference on Computer Vision. 1988. Tampa, FL:

13. Little, J.J. and J. Kam. A Smart Buffer for Tracking Using Motion Data. in Workshop on Computer Architectures for Machine Perception. 1993. New Orleans, LA: IEEE Computer Society.

14. Matsuyama, T., N. Asada, and M. Aoyama. Parallel Image Analysis on Recursive Torus Architecture. in Workshop on Computer Architectures for Machine Perception. 1993. New Orleans, LA: IEEE Computer Society.

15. Miller, R., et al., Efficient Parallel Algorithms for Intermediate-Level Vision Analysis on the Reconfigurable Mesh, in Parallel Architectures and Algorithms for Image Understanding. 1991, Academic Press: p. 185-207.

16. Murakami, K., et al. The Kyushu University Reconfigurable Parallel Processor-Design Philosophy and Architecture. in Information Processing 89. Proceedings of the IFIP 11th World Computer Congress. 1989. San Francisco, CA:

17. Narayanan, P.J., L.T. Chen, and L.S. Davis, Effective Use of SIMD Parallelism in Low- and Intermediate-Level Vision. Computer, 1992. 25(2): p. 68-73.

18. Patel, J.N. and L.H. Jamieson. Evaluating Scalability of the 2-D FFT on Parallel Computers. in Computer Architectures for Machine Perception. 1993. New Orleans, LA: IEEE Computer Society Press.

19. Pomerleau, D.A., Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Computation, 1991. 3(1): p. 88-97.

20. Pomerleau, D.A., et al. Neural Network Simulation at Warp Speed: How We Got 17 Million Connections Per Second. in Proceedings of 1988 IEEE International Conference on Neural Networks. 1988.

21. Ranka, S. and S. Sahni, Clustering on a Hypercube Multicomputer. IEEE Transactions on Parallel and Distributed Systems, 1991. 2(2): p. 129-37.

22. Rougerie, E. and A. Mérigot. Architectural Simulation of a Fine Grain Parallel Pyramid Computer on the Connection Machine. in Computer Architectures for Machine Perception. 1991. Paris, France: D.G.A./E.T.C.A., C.N.R.S./I.E.F. and M.E.N./D.R.E.D.

23. Taniguchi, R.-I. and M. Amimiya. AMP: An Autonomous Multi-processor for Image Processing and Computer Vision. in 10th International Conference on Pattern Recognition. 1990. Atlantic City, NJ:

24. Thorpe, C.E., Vision and Navigation: The Carnegie Mellon Navlab. 1990, Boston: Kluwer Academic Publishers. 367.

25. Vuohtoniemi, V. and T. Seppaenen. Transputer-based Machine Vision Systems Research at the University of Oulu. in Nordic Transputer Applications. 1st and 2nd Nordic Transputer Seminars. 1991. Turku, Finland and Trondheim, Norway:

26. Webb, J.A. Latency and Bandwidth Considerations in Parallel Robotics Image Processing. in Supercomputing '93. 1993. Portland, OR: IEEE Computer Society.