Books and Contributions to Books

Gross, T. and Lam, M. A Retrospective on the Warp Machines. In Sohi, G. et al. (editors), 25 Years of Computer Architecture, pp. 42--45. IEEE, 1998.

Gross, T. and O'Hallaron, D. iWarp: Anatomy of a Parallel Computing System. MIT Press, 1998.

Nicolau, A. and Gelernter, D. and Gross, T. and Padua, D. (Eds). Research Monographs in Parallel and Distributed Computing: Advances in Languages and Compilers for Parallel Processing. The MIT Press, Cambridge, MA, 1991.

Gross, T. Code Optimization of Pipeline Constraints. PhD thesis, Stanford University, September 1983.

Journal Publications

Bolliger, J. and Gross, T. A Framework-Based Approach to the Development of Network-Aware Applications. IEEE Trans. Softw. Eng. 24(5):376--390, May 1998.

Riehle, D. and Brudermann, R. and Gross, T. and Maetzel, K.-U. Pattern Density in the Design of an Object Transport Service. ACM Computing Surveys (to appear).

Suzuoka, T. and Subhlok, J. and Gross, T. A Performance Debugging Tool for High Performance Fortran Programs. Concurrency -- Practice and Experience 9(10):927--945, Oct 1997.

Gross, T. and Hasegawa, A. and Hinrichs, S. and O'Hallaron, D. and Stricker, T. Communication Styles for Parallel Systems. IEEE Computer 27(12):34--44, Dec 1994.

Gross, T. and O'Hallaron, D. and Subhlok, J. Task Parallelism in a High Performance Fortran Framework. IEEE Parallel and Distributed Technology 2(3):16--26, Fall 1994.

Freudenberger, S. and Gross, T. and Lowney, P. Avoidance and Suppression of Compensation Code in a Trace Scheduling Compiler. ACM Trans. on Prog. Lang. Syst. 17(3):1156--1214, 1994.

Stichnoth, J. and O'Hallaron, D. and Gross, T. Generating Communication for Array Statements: Design, Implementation, and Evaluation. Journal of Parallel and Distributed Computing 21(1):150-159, 1994.

Gross, T. An Overview of Programming the iWarp System. Intl. J. of High Speed Computing 5(3):379-401, 1993.

Gross, T. and Steenkiste, P. Structured Dataflow Analysis for Arrays and its Use in an Optimizing Compiler. Software: Practice & Experience 20(2):133-155, February 1990.

Gross, T. R. and Hennessy, J. L. and Przybylski, S. A. and Rowen, C. Measurement and Evaluation of the MIPS Architecture and Processor. ACM Trans. on Computer Systems 6(3):229--258, August 1988.

Annaratone, M. and Arnould, E. and Gross, T. and Kung, H. T. and Lam, M. S. and Menzilcioglu, O. and Webb, J. A. The Warp Machine: Architecture, Implementation and Performance. IEEE Trans. on Computers C-36(12):1523-1538, Dec. 1987.

Gross, T. Software Implementation of Floating Point Arithmetic on a Reduced-Instruction-Set Processor. J. of Parallel and Distributed Computing 2(4):362-375, 1985.

Przybylski, S. and Gross, T. and Hennessy, J. and Jouppi, N. and Rowen, C. Organization and VLSI Implementation of MIPS. Journal of VLSI and Computer Systems 1(2):170--208, Fall 1984.

Hennessy, J.L. and Gross, T.R. Postpass Code Optimization of Pipeline Constraints. ACM Trans. on Prog. Lang. Syst. 5(3):422--448, July 1983.

Selective Conferences

Scherer, A. and Lu, H. and Gross, T. and Zwaenepoel, W. Transparent Adaptive Parallelism on NOWs using OpenMP. In Proc. 7th ACM Symp. on Principles and Practice of Parallel Prog. (PPoPP'99), pp. (to appear). ACM, Atlanta, GA, May 1999.

Bolliger, J. and Gross, T. and Hengartner, U. Bandwidth Modelling for Network-Aware Applications. In Proc. INFOCOM'99. IEEE, New York, March 1999.

Riehle, D. and Gross, T. Role-Model Based Framework Design and Integration. In Proc. OOPSLA'98, pp. (to appear). ACM, Vancouver, October 1998.

Lowekamp, B. and Miller, N. and Sutherland, D. and Gross, T. and Steenkiste, P. and Subhlok, J. A Resource Query Interface for Network-Aware Applications. In Proc. 7th IEEE Symp. High-Performance Distr. Comp., pp. (to appear). July 1998.

O'Hallaron, D. and Shewchuk, J. and Gross, T. Architectural Implications of a Family of Irregular Applications. In Proc. 4th Symp. on High Performance Computer Architecture, pp. 80--89. IEEE, Las Vegas, Feb 1998.

Stichnoth, J. and Gross, T. Code Composition as an Implementation Language for Compilers. In Conf. on Domain-Specific Languages, Proceedings, pp. 119--131. USENIX, Santa Barbara, Oct 1997.

Lueh, G. and Gross, T. Call-cost Directed Register Allocation. In Proc. ACM SIGPLAN'97 Conf. on Prog. Language Design and Implementation, pp. 296--307. ACM, June 1997.

Stricker, T. and Gross, T. Global Address Space, Non-Uniform Bandwidth: A Memory System Performance Characterization of Parallel Systems. In Proc. 3rd Symp. on High Performance Computer Architecture, pp. 168--179. IEEE, San Antonio, Jan 1997.

Subhlok, J. and Gross, T. and Suzuoka, T. Impact of Job Mix on Optimizations for Space Sharing Schedulers. In Proc. Supercomputing'96. ACM/IEEE, Pittsburgh, PA, Nov 1996.

Adl-Tabatabai, A. and Gross, T. and Lueh, G. Code Reuse in an Optimizing Compiler. In Proc. OOPSLA'96, pp. 51--68. ACM, October 1996.

Adl-Tabatabai, A. and Gross, T. Source-Level Debugging of Scalar Optimized Code. In Proc. ACM SIGPLAN'96 Conf. on Prog. Language Design and Implementation, pp. 33--43. ACM, May 1996.

Stricker, T. and Stichnoth, J. and O'Hallaron, D. and Hinrichs, S. and Gross, T. Decoupling Synchronization and Data Transfer in Message Passing Systems of Parallel Computers. In Proc. Intl. Conf. on Supercomputing, pp. 1-10. ACM, Barcelona, July 1995.

Stricker, T. and Gross, T. Optimizing Memory System Performance for Communication in Parallel Computers. In Proc. 22nd Intl. Symp. on Computer Architecture, pp. 308--319. ACM/IEEE, Portofino, Italy, June 1995.

Subhlok, J. and O'Hallaron, D. and Gross, T. and Dinda, P. and Webb, J. Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs. In Supercomputing '94, pp. 330-339. Washington, DC, November 1994.

Gross, T. and Steenkiste, P. Architecture Implications of High-Speed I/O for Distributed-Memory Computers. In Proc. Intl. Conf. on Supercomputing ICS94, pp. 176--185. ACM, Manchester, England, July 1994.

Adl-Tabatabai, A. and Gross, T. Detection and Recovery of Endangered Variables Caused by Instruction Scheduling. In Proc. ACM SIGPLAN'93 Conf. on Prog. Language Design and Implementation, pp. 13--25. ACM, June 1993.

Adl-Tabatabai, A. and Gross, T. Evicted Variables and the Interaction of Global Register Allocation and Symbolic Debugging. In Conf. Record of the 20th Annual ACM Symp. on Principles of Prog. Lang., pp. 371--383. ACM, January 1993.

Subhlok, J. and Stichnoth, J. and O'Hallaron, D. and Gross, T. Exploiting Task and Data Parallelism on a Multicomputer. In Proc. 4th ACM Symp. on Principles and Practice of Parallel Prog. (PPoPP), pp. 13--22. May 1993.

Feldmann, A. and Gross, T. and O'Hallaron, D. and Stricker, T. Subset Barrier Synchronization on Private-Memory Machines. In Proc. SPAA 92, pp. 209--218. ACM, San Diego, June 1992.

Fisher, A. L. and Gross, T. Teaching Empirical Performance Evaluation of Parallel Programs. In Proc. 1992 SIGCSE Technical Symp., pp. 309--313. ACM Special Interest Group on Computer Science Education (SIGCSE), Kansas City, MO, March 1992.

Cate, V. and Gross, T. Combining the Concepts of Compression and Caching for a Two-Level Filesystem. In Proc. Fourth Intl. Conf. on Architectural Support for Prog. Languages and Operating Systems (ASPLOS IV), pp. 200--211. ACM/IEEE, Palo Alto, April 1991.

Fisher, A. L. and Gross, T. Teaching the Programming of Parallel Computers. In Proc. 1991 SIGCSE Technical Symp., pp. 102--107. ACM Special Interest Group on Computer Science Education (SIGCSE), San Antonio, TX, March 1991.

Borkar, S. and Cohn, R. and Cox, G. and Gross, T. and Kung, H. T. and Lam, M. and Levine, M. and Moore, B. and Moore, W. and Peterson, C. and Susman, J. and Sutton, J. and Urbanski, J. and Webb, J. Supporting Systolic and Memory Communication in iWarp. Technical Report CMU-CS-90-197, School of Computer Science, Carnegie Mellon, December 1990. First published in Proc. 17th Intl. Symp. on Computer Architecture, pp. 70-81.

Gross, T. Communication in iWarp Systems. In Proc. Supercomputing '89, pp. 436--445. ACM/IEEE, November 1989.

Gross, T. and Zobel, A. and Zolg, M. Parallel Compilation for a Parallel Machine. In Proc. ACM SIGPLAN '89, pp. 91-100. ACM, Portland, OR, June 1989.

Cohn, R. and Gross, T. and Lam, M. and Tseng, P. S. Architecture and Compiler Tradeoffs for a Wide Instruction Word Microprocessor. In Proc. Third Intl. Conf. on Architectural Support for Prog. Languages and Operating Systems (ASPLOS III), pp. 2-14. ACM/IEEE, Boston, Apr 1989.

Borkar, S. and Cohn, R. and Cox, G. and Gleason, S. and Gross, T. and Kung, H. T. and Lam, M. and Moore, B. and Peterson, C. and Pieper, J. and Rankin, L. and Tseng, P. S. and Sutton, J. and Urbanski, J. and Webb, J. iWarp: An Integrated Solution to High-Speed Parallel Computing. In Proc. Supercomputing '88, pp. 330-339. IEEE Computer Society and ACM SIGARCH, Orlando, Florida, November 1988.

Bruegge, B. and Gross, T. A Program Debugger for a Systolic Array: Design and Implementation. In Proc. Second Workshop on Parallel and Distributed Debugging, pp. 174-182. ACM, Madison, WI, May 1988. SIGPLAN Notices Vol. 24, Nr. 1.

Bruegge, B. and Gross, T. An Integrated Environment for Development and Execution of Real-Time Programs. In Proc. ACM Intl. Conf. on Supercomputing, pp. 153-162. ACM, St. Malo, France, July 1988.

Gross, T. and Sussman, A. Mapping a Single-Assignment Language onto the Warp Systolic Array. In Kahn, G. (editor), Proc. Conf. on Functional Lang. and Computer Architecture, pp. 347--363. ACM, Springer, Portland, OR, Sep 1987.

Siegell, B. and Gross, T. Program-specific and Architecture-specific Simulators. In Barbacci, M. and Koomen, C. (editors), CHDL 87, Proc. 8th Intl. Symp. on Computer Hardware Description Languages and their Applications, pp. 29-45. IFIP WG. 10.2, North Holland/Elsevier, Amsterdam, April 1987.

Gross, T. and Lam, M. Compilation for a High-performance Systolic Array. In Proc. ACM SIGPLAN '86 Symp. on Compiler Construction, pp. 27-38. ACM, Palo Alto, June 1986.

Annaratone, M. and Arnould, E. and Gross, T. and Kung, H. T. and Lam, M. S. and Menzilcioglu, O. and Sarocky, K. and Webb, J. A. Warp Architecture and Implementation. In Conf. Proc. 13th Annual Intl. Symp. on Computer Architecture, pp. 346--356. IEEE/ACM, June 1986.

Gross, T. Floating Point Arithmetic on a Reduced-Instruction-Set Processor. In Kai Hwang (editor), Proc. 7th Symp. on Computer Arithmetic, pp. 86--92. IEEE Computer Society, Urbana, Ill., June 1985.

Rowen, C. and Przybylski, S. and Jouppi, N. and Gross, T. and Shott, J. and Hennessy, J. MIPS: A High Performance 32-Bit NMOS Microprocessor. In Digest of Intl. Solid-State Circuits Conf., pp. 180-181. IEEE, San Francisco, Ca., February 1984.

Hennessy, J. and Jouppi, N. and Przybylski, S. and Rowen, C. and Gross, T. Design of a High Performance VLSI Processor. In Proc. Third Caltech Conf. on VLSI, pp. 33-54. Calif Institute of Technology, Pasadena, Ca., March 1983.

Gross, T.R. and Hennessy, J.L. Optimizing Delayed Branches. In Proceedings: The 15th Annual Microprogramming Workshop Micro 15, pp. 114-120. IEEE, October 1982.

Hennessy, J. and Jouppi, N. and Przybylski, S. and Rowen, C. and Gross, T. and Baskett, F. and Gill, J. MIPS: A Microprocessor Architecture. In Proceedings: The 15th Annual Microprogramming Workshop Micro 15, pp. 17-22. IEEE, October 1982.

Hennessy, J.L. and Jouppi, N. and Baskett, F. and Gross, T.R. and Gill, J. Hardware/Software Tradeoffs for Increased Performance. In Proc. SIGARCH/SIGPLAN Symp. on Architectural Support for Prog. Languages and Operating Systems, pp. 2--11. ACM, Palo Alto, March 1982.

Hennessy, J.L. and Gross, T.R. Code Generation and Reorganization in the Presence of Pipeline Constraints. In Proc. Ninth POPL Conf., pp. 120-127. ACM, January 1982.

Other Conferences and Workshops

Lowekamp, B. and Miller, N. and Sutherland, D. and Gross, T. and Steenkiste, P. and Subhlok, J. Network-aware Parallel Computing with Remos. In Proc. 11th Workshop on Languages and Compilers for Parallel Computing, pp. (to appear). Springer Verlag, Chapel Hill, NC, 1999.

DeWitt, A. and Gross, T. The Potential of Thread-Level Speculation based on Value Profiling. Proc. 3rd Workshop on the Interaction between Compilers and Computer Architecture (ASPLOS-VIII), San Jose.

Gross, T. and Steenkiste, P. A Perspective on Network/Application Coupling. Proc. 8th NOSSDAV Workshop (Network and Operating System Services for Digital Audio and Video). Short paper.

Gross, T. Bisection debugging. In Kamkar, M. (editor), Proc. 3rd Intl. Workshop on Automated Debugging. Linkoeping Electronic Articles in Computer and Information Science, (ISSN 1401-9841, Vol 2, No. 9), May 1997. www.ep.liu.se/ea/cis/1997/009.

Lueh, G. and Gross, T. and Adl-Tabatabai, A. Global Register Allocation Based on Graph Fusion. In Languages and Compilers for Parallel Computing (9th Intl. Workshop, LCPC'96), pp. 246--265. Springer Verlag, San Jose, CA, Aug 1997. LNCS 1239.

Stichnoth, J. and Gross, T. A communication backend for parallel language compilers. In Proc. 8th Intl. Workshop Languages and Compilers for Parallel Computing (LCPC '95), pp. 224--238. Springer Verlag, Columbus, OH, Aug 1995.

Stichnoth, J. and Gross, T. A Communication Backend for Parallel Language Compilers. Proc. Fifth Workshop on Compilers for Parallel Languages, Malaga, Spain (Tech Report UMA-DAC-95/09), pages 65--77.

Adl-Tabatabai, A. and Gross, T. Engineering a Global Optimizer and Code Generator for Reuse. Proc. Fifth Workshop on Compilers for Parallel Languages, Malaga, Spain (Tech Report UMA-DAC-95/09), pages 395--407.

Gross, T. Programming Languages and Compilers for Parallel Computing. SPEEDUP 9(1):6--11, July 1995.

Suzuoka, T. and Subhlok, J. and Gross, T. Performance Debugging Based on Scalability Analysis. In Proceedings of the Fifth Symp. on the Frontiers of Massively Parallel Computation, pp. 406-413. McLean, VA, February 1995.

Gross, T. and Hinrichs, S. and Subhlok, J. Construction and Delivery of Messages for Modular Parallel Programs. In Arabnia, H. (editor), Transputer Research and Applications 7, pp. 176--185. IOS Press, Amsterdam, 1994. Transputer and Occam* Engineering Series.

Stichnoth, J. and O'Hallaron, D. and Gross, T. Generating Communication for Array Statements: Design, Implementation, and Evaluation. In Conf. Record of the 6th Workshop on Languages and Compilers for Parallel Computing. Portland, OR, August 1993.

Adl-Tabatabai, A. and Gross, T. and Lueh, G. and Reinders, J. Modelling Instruction-Level Parallelism for Software Pipelining. In Proc. IFIP WG 10.3 (Concurrent Systems) Working Conf. on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, pp. 321--330. IFIP WG 10.3, North Holland, Orlando, FL, Jan 1993.

Adl-Tabatabai, Ali-Reza and Gross, Thomas. The Effects of Register Allocation and Instruction Scheduling on Symbolic Debugging. In Proc. Supercomputer Debugging Workshop '92, pp. 115--126. Los Alamos National Laboratory, Dallas, October 1992.

Gross, T. and Hinrichs, S. Debugging a Parallel Program: Capturing Inter-Processor Communication in an iWarp Torus. In Proc. Supercomputer Debugging Workshop '92, pp. 239--276. Dallas, October 1992.

Gross, T. and Hinrichs, S. and Lueh, G. and O'Hallaron, D. and Stichnoth, J. and Subhlok, J. Compiling Task and Data Parallel Programs for iWarp. In Proc. Second Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors, pp. 32-35. SIGPLAN Notices 28(1), Jan 93, Boulder, CO, September 1992.

Hinrichs, S. and Gross, T. Utilizing New Communication Features in Compilation for Private-Memory Machines. In Proc. 5th Intl. Workshop, Languages and Compilers for Parallel Computing, chapter 35, pp. 563--576. Springer, 1992.

Gross, T. and Ward, M. The Suppression of Compensation Code. In Proc. 3rd Workshop on Prog. Languages and Compilers for Parallel Computing, pp. 260-273. Univ. of California, Irvine, Irvine, CA, 1990.

Baxter, B. and Cox, G. and Gross, T. and Kung, H. T. and O'Hallaron, D. and Peterson, C. and Webb, J.A. Building Blocks for a New Generation of Application-Specific Computing Systems. In Proc. IEEE Application Specific Array Processor Conf., pp. 190-201. IEEE, Princeton, New Jersey, September 1990.

Annaratone, M. and Arnould, E. and Cohn, R. and Gross, T. and Kung, H. T. and Lam, M. and Menzilcioglu, O. and Sarocky, K. and Senko, J. and Webb, J. Warp Architecture: From Prototype to Production. In Proc. 1987 National Computer Conf., pp. 133-140. AFIPS, Chicago, June 1987.

Bruegge, B. and Chang, C. and Cohn, R. and Gross, T. and Lam, M. and Lieu, P. and Noaman, A. and Yam, D. The Warp Programming Environment. In Proc. 1987 National Computer Conf., pp. 141-148. AFIPS, Chicago, June 1987.

Annaratone, M. and Arnould, E. and Cohn, R. and Gross, T. and Kung, H. T. and Lam, M. and Menzilcioglu, O. and Sarocky, K. and Senko, J. and Webb, J. Architecture of Warp. In Proc. Compcon Spring 87, pp. 264-267. IEEE Computer Society, San Francisco, February 1987.

Bruegge, B. and Chang, C. and Cohn, R. and Gross, T. and Lam, M. and Lieu, P. and Noaman, A. and Yam, D. Programming Warp. In Proc. Compcon Spring 87, pp. 268--271. IEEE Computer Society, San Francisco, February 1987.

Gross, T. and Kung, H. T. and Lam, M. and Webb, J. Warp as a Machine for Low-Level Vision. In Proc. IEEE Intl. Conf. on Robotics and Automation, pp. 790-800. March 1985.

Gross, T. and Hennessy, J. and Jouppi, N. and Przybylski, S. and Rowen, C. and Agarwal, A. and Steenkiste, P. A Perspective on High-Level Language Architecture. In Proc. Intl. Workshop on High-Level Computer Architecture, pp. 3.12--3.14. University of Maryland, Los Angeles, May 1984.

Hennessy, J. and Jouppi, N. and Przybylski, S. and Rowen, C. and Gross, T. Performance Issues in VLSI Processor Design. In Proc. Int. Conf. on Computer Design, pp. 153--156. IEEE, Port Chester, N.Y., October 1983.

Gross, T. Code Optimization Techniques for Pipelined Architectures. In Proc. Compcon Spring 83, pp. 278--285. IEEE Computer Society, San Francisco, March 1983.

Hennessy, J.L. and Jouppi, N. and Gill, J. and Baskett, F. and Strong, A. and Gross, T.R. and Rowen, C. and Leonard, J. The MIPS Machine. In Proc. Compcon Spring 82, pp. 2--7. IEEE, San Francisco, February 1982.

Other Publications

DeWitt, T. and Gross, T. and Lowekamp, B. and Miller, N. and Steenkiste, P. and Subhlok, J. and Sutherland, D. ReMoS: A Resource Monitoring System for Network-Aware Applications. Technical Report CMU-CS-97-194, Carnegie Mellon School of Computer Science, Dec 1997.

Lueh, G. and Gross, T. and Adl-Tabatabai, A. Global Register Allocation Based on Graph Fusion. Technical Report 96-106, Carnegie Mellon University, School of Computer Science, March 1996.

Dinda, P. and Gross, T. and O'Hallaron, D. and Segall, E. and Stichnoth, J. and Subhlok, J. and Webb, J. and Yang, B. The CMU Task Parallel Program Suite. Technical Report CMU-CS-94-131, School of Computer Science, Carnegie Mellon University, March 1994.

Adl-Tabatabai, A. and Gross, T. Symbolic Debugging of Globally Optimized Code: Data Value Problems and Their Solutions. Technical Report CMU-CS-94-105, CMU, January 1994.

Gross, T. Core Assembly Language Instruction Interface for RISC-style Microprocessors. Technical Note, Carnegie Mellon University.

Steenkiste, P. and Gross, T. High-Level Language architecture from a RISC perspective. In Milutinovic, V. (editor), High-Level Language Computer Architecture, pp. 107-130. Computer Science Press, 1988.

Gill, J. and Gross, T. and Hennessy, J. and Jouppi, N. and Przybylski, S. and Rowen, C. Summary of MIPS Instructions. Technical Note 83-237, Stanford University, November 1983.

Gross, T. Guide to MIPS Software. Internal Report, 1983.

Gross, T. and Gill, J. A Short Guide to MIPS Assembly Instructions. Technical Note 83-236, Stanford University, November 1983.





Mar 17, 1999