| |
Papers by Year (Most Recent on the top)
| 2007 |
(in press) Distributed Watchpoints: Debugging Large Multi-Robot Systems | bib abstract | |
Michael De Rosa, Seth Copen Goldstein, Peter Lee, Jason D. Campbell, Padmanabhan Pillai, and Todd C. Mowry.
International Journal of Robotics Research,
2007.
Also appeared as Distributed Watchpoints: Debugging Large Multi-Robot Systems, (icra07).
|
| hide Abstract | Distributed systems frequently exhibit properties of interest which span multiple entities. These properties cannot easily be detected from any single entity, but can be readily be detected by combining the knowledge of multiple entities. Testing for distributed properties is especially important in debugging or verifying software for modular robots. We have developed a technique we call distributed watchpoint triggers which can efficiently recognize distributed conditions. Our watchpoint description language can handle a variety of temporal, spatial, and logical properties spanning multiple robots. This paper presents the specification language, describes the distributed online mechanism for detecting distributed conditions in a running system, and evaluates the performance of our implementation. |
| | @article{mderosa-ijrr-2007,
also = {Distributed Watchpoints: Debugging Large Multi-Robot
Systems, (icra07)},
abstract = {Distributed systems frequently exhibit properties of
interest which span multiple entities. These properties cannot
easily be detected from any single entity, but can be readily be
detected by combining the knowledge of multiple entities. Testing
for distributed properties is especially important in debugging
or verifying software for modular robots. We have developed a
technique we call distributed watchpoint triggers which can
efficiently recognize distributed conditions. Our watchpoint
description language can handle a variety of temporal, spatial,
and logical properties spanning multiple robots. This paper
presents the specification language, describes the distributed
online mechanism for detecting distributed conditions in a
running system, and evaluates the performance of our
implementation.},
author = {De Rosa, Michael and Goldstein, Seth Copen and Lee, Peter
and Campbell, Jason D. and Pillai, Padmanabhan and Mowry, Todd
C.},
journal = {International Journal of Robotics Research},
keywords = {Debugging, Distributed Systems},
title = {Distributed Watchpoints: Debugging Large Multi-Robot
Systems},
year = {2007}
}
|
|
Operation Chaining Asynchronous Pipelined Circuits | bib | |
Girish Venkataramani and Seth Copen Goldstein.
In ICCAD,
November, 2007.
|
| @inproceedings{venkataramani-iccad07,
author = {Venkataramani, Girish and Goldstein, Seth Copen},
title = {Operation Chaining Asynchronous Pipelined Circuits},
booktitle = {ICCAD},
month = {November},
year = {2007},
keywords = {Asychronous Circuits, CAD, Global Critical Path}
}
|
|
(in press) A Modular Robotic System Using Magnetic Force Effectors | bib abstract | |
Brian Kirby, Burak Aksak, James F. Hoburg, Todd C. Mowry, and Padmanabhan Pillai.
In In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS '07),
October, 2007.
|
| hide Abstract | One of the primary impediments to building ensembles with many modular robots is the complexity and number of mechanical mechanisms used to construct the individual modules. As part of the Claytronics project—which aims to build very large ensembles of modular robots—we investigate how to simplify each module by eliminating moving parts and reducing the number of mechanical mechanisms on each robot by using force-at-a-distance actuators. Additionally, we are also investigating the feasibility of using these unary actuators to improve docking performance, implement intermodule adhesion, power transfer, communication, and sensing. |
| | @inproceedings{bkirby-iros07,
author = {Kirby, Brian and Aksak, Burak and Hoburg, James F. and
Mowry, Todd C. and Pillai, Padmanabhan},
title = {A Modular Robotic System Using Magnetic Force Effectors},
booktitle = {In Proceedings of the IEEE International Conference on
Intelligent Robots and Systems (IROS '07)},
year = {2007},
month = {October},
abstract = {One of the primary impediments to building ensembles
with many modular robots is the complexity and number of
mechanical mechanisms used to construct the individual modules.
As part of the Claytronics project---which aims to build very
large ensembles of modular robots---we investigate how to
simplify each module by eliminating moving parts and reducing the
number of mechanical mechanisms on each robot by using
force-at-a-distance actuators. Additionally, we are also
investigating the feasibility of using these unary actuators to
improve docking performance, implement intermodule adhesion,
power transfer, communication, and sensing.},
keywords = {Actuation, Adhesion, Claytronics, Robotics},
url = {http://www.cs.cmu.edu/~claytronics/papers/bkirby-iros07.pdf}
}
|
|
(in press) Declarative Programming for Modular Robots | bib abstract | |
Michael P. Ashley-Rollman, Michael De Rosa, Siddhartha S. Srinivasa, Padmanabhan Pillai, Seth Copen Goldstein, and Jason D. Campbell.
In Workshop on Self-Reconfigurable Robots/Systems and Applications at IROS '07,
October, 2007.
|
| hide Abstract | Because of the timing, complexity, and asynchronicity challenges common in modular robot software we have recently begun to explore new programming models for modular robot ensembles. In this paper we apply two of those models to a metamodule-based shape planning algorithm and comment on the differences between the two approaches. Our results suggest that declarative programming can provide several advantages over more traditional imperative approaches, and that the differences between declarative programming styles can themselves contribute leverage to different parts of the problem domain. |
| | @inproceedings{ashley-rollman-derosa-iros07wksp,
author = {Ashley-Rollman, Michael P. and De Rosa, Michael and
Srinivasa, Siddhartha S. and Pillai, Padmanabhan and Goldstein,
Seth Copen and Campbell, Jason D.},
title = {Declarative Programming for Modular Robots},
booktitle = {Workshop on Self-Reconfigurable Robots/Systems and
Applications at {IROS '07}},
year = {2007},
month = {October},
keywords = {Programming Models, Planning, Claytronics},
abstract = {Because of the timing, complexity, and asynchronicity
challenges common in modular robot software we have recently
begun to explore new programming models for modular robot
ensembles. In this paper we apply two of those models to a
metamodule-based shape planning algorithm and comment on the
differences between the two approaches. Our results suggest that
declarative programming can provide several advantages over more
traditional imperative approaches, and that the differences
between declarative programming styles can themselves contribute
leverage to different parts of the problem domain.}
}
|
|
(in press) Electrostatic Latching for Inter-module Adhesion, Power Transfer, and Communication in Modular Robots | bib abstract | |
Mustafa Emre Karagozler, Jason D. Campbell, Gary K. Fedder, Seth Copen Goldstein, Michael Philetus Weller, and Byung W. Yoon.
In Proceedings of the IEEE International Conference on Intelligent Robots and Systems IROS '07,
October, 2007.
|
| hide Abstract | A simple and robust inter-module latch is possibly the most important component of a modular robotic system. This paper describes a latch based on capacitive coupling which not only provides significant adhesion forces, but can also be used for inter-module power transmission and communication. The key insight that enables electrostatic adhesion to be effective at the macroscale is to combine flexible electrodes with a geometery that uses shear forces to provide adhesion. To measure the effectiveness of our latch we incorporated it into a 28cm x 28cm x 28cm modular robot. The result is a latch which requires almost zero static power and yet can hold over 0.6N/cm^2 of latch area. |
| | @inproceedings{karagozler-iros07,
author = {Karagozler, Mustafa Emre and Campbell, Jason D. and
Fedder, Gary K. and Goldstein, Seth Copen and Weller, Michael
Philetus and Yoon, Byung W.},
title = {Electrostatic Latching for Inter-module Adhesion, Power
Transfer, and Communication in Modular Robots},
booktitle = {Proceedings of the IEEE International Conference on
Intelligent Robots and Systems IROS '07},
year = {2007},
month = {October},
abstract = {A simple and robust inter-module latch is possibly the
most important component of a modular robotic system. This paper
describes a latch based on capacitive coupling which not only
provides significant adhesion forces, but can also be used for
inter-module power transmission and communication. The key
insight that enables electrostatic adhesion to be effective at
the macroscale is to combine flexible electrodes with a geometery
that uses shear forces to provide adhesion. To measure the
effectiveness of our latch we incorporated it into a 28cm x 28cm
x 28cm modular robot. The result is a latch which requires almost
zero static power and yet can hold over 0.6N/cm^2 of latch
area.},
keywords = {Actuation, Adhesion, Claytronics, Robotics},
url = {http://www.cs.cmu.edu/~claytronics/papers/karagozler-iros07.pdf}
}
|
|
(in press) Internal Localization of Modular Robot Ensembles | bib abstract | |
Stanislav Funiak, Padmanabhan Pillai, Jason D. Campbell, and Seth Copen Goldstein.
In Workshop on Self-Reconfiguring Modular Robotics at the IEEE International Conference on Intelligent Robots and Systems (IROS) '07,
October, 2007.
|
| hide Abstract | The determination of the relative position and pose of every robot in a modular robotic ensemble is a necessary preliminary step for most modular robotic tasks. Localization is particularly important when the modules make local noisy observations and are not significantly constrained by inter-robot latches. In this paper, we propose a robust hierarchical approach to the internal localization problem that uses normalized cut to guide an incremental solution to incorporating the observations. A key component of our algorithm is a method to reduce the cost of normalized cut computations. The result is a scalable algorithm that works for large, non-homogeneous ensembles. We evaluate our algorithm in simulation on ensembles of up to 10,000 modules. |
| | @inproceedings{funiak-iros07,
author = {Funiak, Stanislav and Pillai, Padmanabhan and Campbell,
Jason D. and Goldstein, Seth Copen},
title = {Internal Localization of Modular Robot Ensembles},
booktitle = {Workshop on Self-Reconfiguring Modular Robotics at the
IEEE International Conference on Intelligent Robots and Systems
(IROS) '07},
year = {2007},
month = {October},
abstract = {The determination of the relative position and pose of
every robot in a modular robotic ensemble is a necessary
preliminary step for most modular robotic tasks. Localization is
particularly important when the modules make local noisy
observations and are not significantly constrained by inter-robot
latches. In this paper, we propose a robust hierarchical approach
to the {\em internal localization} problem that uses normalized
cut to guide an incremental solution to incorporating the
observations. A key component of our algorithm is a method to
reduce the cost of normalized cut computations. The result is a
scalable algorithm that works for large, non-homogeneous
ensembles. We evaluate our algorithm in simulation on ensembles
of up to 10,000 modules.},
keywords = {Probabilistic Inference, Sensing, Localization,
Distributed Algorithms, Claytronics}
}
|
|
(in press) Meld: A Declarative Approach to Programming Ensembles | bib abstract | |
Michael P. Ashley-Rollman, Seth Copen Goldstein, Peter Lee, Todd C. Mowry, and Padmanabhan Pillai.
In Proceedings of the IEEE International Conference on Intelligent Robots and Systems IROS '07,
October, 2007.
|
| hide Abstract | This paper presents Meld, a programming language for modular robots, i.e., for independently executing robots where inter-robot communication is limited to immediate neighbors. Meld is a declarative language, based on P2, a logic-programming language originally designed for programming overlay networks. By using logic programming, the code for an ensemble of robots can be written from a global perspective, as opposed to a large collection of independent robot views. This greatly simplifies the thought process needed for programming large ensembles. Initial experience shows that this also leads to a considerable reduction in code size and complexity. An initial implementation of Meld has been completed and has been used to demonstrate its effectiveness in the Claytronics simulator. Early results indicate that Meld programs are considerably more concise (more than 20x shorter) than programs written in C++, while running nearly as efficiently. |
| | @inproceedings{ashley-rollman-iros07,
author = {Ashley-Rollman, Michael P. and Goldstein, Seth Copen and
Lee, Peter and Mowry, Todd C. and Pillai, Padmanabhan},
title = {Meld: A Declarative Approach to Programming Ensembles},
booktitle = {Proceedings of the IEEE International Conference on
Intelligent Robots and Systems {IROS '07}},
year = {2007},
month = {October},
keywords = {Programming Languages, Claytronics},
abstract = {This paper presents Meld, a programming language for
modular robots, i.e., for independently executing robots where
inter-robot communication is limited to immediate neighbors. Meld
is a declarative language, based on P2, a logic-programming
language originally designed for programming overlay networks. By
using logic programming, the code for an ensemble of robots can
be written from a global perspective, as opposed to a large
collection of independent robot views. This greatly simplifies
the thought process needed for programming large ensembles.
Initial experience shows that this also leads to a considerable
reduction in code size and complexity. An initial implementation
of Meld has been completed and has been used to demonstrate its
effectiveness in the Claytronics simulator. Early results
indicate that Meld programs are considerably more concise (more
than 20x shorter) than programs written in C++, while running
nearly as efficiently.}
}
|
|
(in press) Movement Primitives for an Orthogonal Prismatic Closed-Lattice-Constrained Self-Reconfiguring Module | bib abstract | |
Michael Philetus Weller, Mustafa Emre Karagozler, Brian Kirby, Jason D. Campbell, and Seth Copen Goldstein.
In Workshop on Self-Reconfiguring Modular Robotics at the IEEE International Conference on Intelligent Robots and Systems (IROS) '07,
October, 2007.
|
| hide Abstract | We describe a new set of prismatic movement primitives for cubic modular robots. Our approach appears more practical than previous metamodule-based approaches. We also describe recent hardware developments in our cubic robot modules that have sufficient stiffness and actuator strength so that when they work together they can realize, in earth’s gravity, all of the motion primitives we describe here. |
| | @inproceedings{weller-iros07,
author = {Weller, Michael Philetus and Karagozler, Mustafa Emre and
Kirby, Brian and Campbell, Jason D. and Goldstein, Seth Copen},
title = {Movement Primitives for an Orthogonal Prismatic
Closed-Lattice-Constrained Self-Reconfiguring Module},
booktitle = {Workshop on Self-Reconfiguring Modular Robotics at the
IEEE International Conference on Intelligent Robots and Systems
(IROS) '07},
year = {2007},
month = {October},
keywords = {Adhesion, Robotics, Planning, Claytronics},
abstract = {We describe a new set of prismatic movement primitives
for cubic modular robots. Our approach appears more practical
than previous metamodule-based approaches. We also describe
recent hardware developments in our cubic robot modules that have
sufficient stiffness and actuator strength so that when they work
together they can realize, in earth's gravity, all of the motion
primitives we describe here.}
}
|
|
Global Critical Path: A Tool for System-Level Timing Analysis | pdf bib abstract | |
Girish Venkataramani, Mihai Budiu, Tiberiu Chelcea, and Seth Copen Goldstein.
In Proceedings of the 44th ACM/IEEE Design Automation Conference,
pages 783–786, June, 2007.
|
| hide Abstract | An effective method for focusing optimization effort on the most important parts of a design is to examine those elements on the critical path. Traditionally, the critical path is defined at the RTL level, as the longest path in the combinational logic between clocked reisters. In this paper, we present a system-level timing analysis technique to define the concept of a Global Critical Path (GCP), for predicting system-level performance. We show how the GCP can be used as a theoretical and practical tool for understanding, summarizing and optimizing the behavior of highly concurrent self-timed circuits. We formally define the GCP and show how it can be constructed using a discrete event model and hardware profiling techniques. The GCP provides valuable insight into the control-path behavior of circuits and in finding system-level bottlenecks. We have incorporated the GCP construction and analysis framework into a high-level synthesis and simulation toolchain, thus enabling complete automation in modeling, analysis and optimization. |
| | @inproceedings{dac07-gcp,
author = {Venkataramani, Girish and Budiu, Mihai and Chelcea,
Tiberiu and Goldstein, Seth Copen},
title = {Global Critical Path: A Tool for System-Level Timing
Analysis},
booktitle = {Proceedings of the 44th ACM/IEEE Design Automation
Conference},
year = {2007},
month = {June},
address = {San Diego, CA},
pages = {783--786},
abstract = {An effective method for focusing optimization effort on
the most important parts of a design is to examine those elements
on the critical path. Traditionally, the critical path is defined
at the RTL level, as the longest path in the combinational logic
between clocked reisters. In this paper, we present a
system-level timing analysis technique to define the concept of a
Global Critical Path (GCP), for predicting system-level
performance. We show how the GCP can be used as a theoretical and
practical tool for understanding, summarizing and optimizing the
behavior of highly concurrent self-timed circuits. We formally
define the GCP and show how it can be constructed using a
discrete event model and hardware profiling techniques. The GCP
provides valuable insight into the control-path behavior of
circuits and in finding system-level bottlenecks. We have
incorporated the GCP construction and analysis framework into a
high-level synthesis and simulation toolchain, thus enabling
complete automation in modeling, analysis and optimization.},
url = {http://www.cs.cmu.edu/~seth/papers/dac07-gcp.pdf},
keywords = {Asychronous Circuits, CAD, Global Critical Path, System
modeling, Hardware profiling}
}
|
|
Self-Resetting Latches for Asynchronous Micro-Pipelines | pdf bib abstract | |
Tiberiu Chelcea, Girish Venkataramani, and Seth Copen Goldstein.
In Proceedings of the 44th ACM/IEEE Design Automation Conference,
pages 986–989, June, 2007.
|
| hide Abstract | Asynchronous circuits are increasingly attractive as low power or high-performance replacements to synchronous designs. A key part of these circuits are asynchronous micropipelines; unfortunatelly, the existing micropipeline styles either improve performance or decrease power consumption, but not both. Very often, the pipeline register plays a crucial role in these cost metrics. In this paper we introduce a new register design, called self-resetting latches, for asynchronous micropipelines which bridges the gap between fast, but power hungry, latch-based designs and slow, but low power, flip-flop designs. The energy-delay metric for large asynchronous systems implemented with self-resetting latches is, on average, 41% better than latch-based designs and 15% better than flip-flop designs. |
| | @inproceedings{dac07-sr,
author = {Chelcea, Tiberiu and Venkataramani, Girish and Goldstein,
Seth Copen},
title = {Self-Resetting Latches for Asynchronous Micro-Pipelines},
booktitle = {Proceedings of the 44th ACM/IEEE Design Automation
Conference},
year = {2007},
month = {June},
address = {San Diego, CA},
pages = {986--989},
keywords = {Asychronous Circuits},
abstract = {Asynchronous circuits are increasingly attractive as low
power or high-performance replacements to synchronous designs. A
key part of these circuits are asynchronous micropipelines;
unfortunatelly, the existing micropipeline styles either improve
performance or decrease power consumption, but not both. Very
often, the pipeline register plays a crucial role in these cost
metrics. In this paper we introduce a new register design, called
self-resetting latches, for asynchronous micropipelines which
bridges the gap between fast, but power hungry, latch-based
designs and slow, but low power, flip-flop designs. The
energy-delay metric for large asynchronous systems implemented
with self-resetting latches is, on average, 41\% better than
latch-based designs and 15\% better than flip-flop designs.},
url = {http://www.cs.cmu.edu/~seth/papers/dac07-sr.pdf}
}
|
|
Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis | pdf bib abstract | |
Tiberiu Chelcea, Girish Venkataramani, and Seth Copen Goldstein.
In Proceedings of the 13th IEEE International Symposium on Asynchronous Circuits and Systems,
pages 117–128, March, 2007.
|
| hide Abstract | Future deep sub-micron technologies will be characterized by large parametric variations, which could make asynchronous design an attractive solution for use on large scale. However, the investment in asynchronous CAD tools does not approach that in synchronous ones. Even when asynchronous tools leverage existing synchronous toolflows, they introduce large area and speed overheads. This paper proposes several heuristic and optimal algorithms, based on timing interval analysis, for improving existing asynchronous CAD solutions by optimizing area. The optimized circuits are 2.4 times smaller for an optimal algorithm and 1.8 times smaller for a heuristic one than the existing solutions. The optimized circuits are also shown to be resilient to large parametric variations, yielding better average-case latencies than their synchronous counterparts. |
| | @inproceedings{chelcea-async07,
author = {Chelcea, Tiberiu and Venkataramani, Girish and Goldstein,
Seth Copen},
title = {Area Optimizations for Dual-Rail Circuits Using
Relative-Timing Analysis},
booktitle = {Proceedings of the 13th IEEE International Symposium on
Asynchronous Circuits and Systems},
year = {2007},
address = {Berkeley, CA},
month = {March},
pages = {117--128},
abstract = {Future deep sub-micron technologies will be
characterized by large parametric variations, which could make
asynchronous design an attractive solution for use on large
scale. However, the investment in asynchronous CAD tools does not
approach that in synchronous ones. Even when asynchronous tools
leverage existing synchronous toolflows, they introduce large
area and speed overheads. This paper proposes several heuristic
and optimal algorithms, based on timing interval analysis, for
improving existing asynchronous CAD solutions by optimizing area.
The optimized circuits are 2.4 times smaller for an optimal
algorithm and 1.8 times smaller for a heuristic one than the
existing solutions. The optimized circuits are also shown to be
resilient to large parametric variations, yielding better
average-case latencies than their synchronous counterparts.},
url = {http://www.cs.cmu.edu/~seth/papers/chelcea-async07.pdf},
keywords = {Asychronous Circuits, CAD}
}
|
| 2006 |
A Better Global Progressive Allocator | pdf bib abstract | |
David Ryan Koes and Seth Copen Goldstein.
In LCTES 06 Student Poster Session,
2006.
|
| hide Abstract | We present an improvement to the simultaneous heuristic allocator component of the global progressive register allocator described in our previous work \cite{koes-pldi2006}. Our improved allocator decomposes the control flow graph into linear traces which are allocated in the same manner as a single basic block. We investigate two methods for handling the control flow within the traces both of which produce better quality allocations than the simultaneous heuristic allocator. |
| | @inproceedings{koes-lctes06,
author = {Koes, David Ryan and Goldstein, Seth Copen},
title = {A Better Global Progressive Allocator},
publisher = {Academia Press},
url = {http://www.cs.cmu.edu/~seth/papers/koes-lctes06.pdf},
booktitle = {LCTES 06 Student Poster Session},
year = {2006},
abstract = {We present an improvement to the simultaneous heuristic
allocator component of the global progressive register allocator
described in our previous work \cite{koes-pldi2006}. Our improved
allocator decomposes the control flow graph into linear traces
which are allocated in the same manner as a single basic block.
We investigate two methods for handling the control flow within
the traces both of which produce better quality allocations than
the simultaneous heuristic allocator.},
keywords = {Compilers:Register Allocation}
}
|
|
A global progressive register allocator | pdf bib abstract | |
David Ryan Koes and Seth Copen Goldstein.
In Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation (PLDI'06),
pages 204–215, 2006.
|
| hide Abstract | This paper describes a global progressive register allocator, a register allocator that uses an expressive model of the register allocation problem to quickly find a good allocation and then progressively find better allocations until a provably optimal solution is found or a preset time limit is reached. The key contributions of this paper are an expressive model of global register allocation based on multi-commodity network flows that explicitly represents spill code optimization, register preferences, copy insertion, and constant rematerialization; two fast, but effective, heuristic allocators based on this model; and a more elaborate progressive allocator that uses Lagrangian relaxation to compute the optimality of its allocations. Our progressive allocator demonstrates code size improvements as large as 16.75% compared to a traditional graph allocator. On average, we observe an initial improvement of 3.47%, which increases progressively to 6.84% as more time is permitted for compilation. |
| | @inproceedings{koes-pldi2006,
author = {Koes, David Ryan and Goldstein, Seth Copen},
title = {A global progressive register allocator},
booktitle = {Proceedings of the 2006 ACM SIGPLAN conference on
Programming language design and implementation (PLDI'06)},
year = {2006},
isbn = {1-59593-320-4},
pages = {204--215},
doi = {http://doi.acm.org/10.1145/1133981.1134006},
publisher = {ACM Press},
address = {New York, NY},
abstract = {This paper describes a {\em global progressive register
allocator}, a register allocator that uses an expressive model of
the register allocation problem to quickly find a good allocation
and then progressively find better allocations until a provably
optimal solution is found or a preset time limit is reached. The
key contributions of this paper are an expressive model of global
register allocation based on multi-commodity network flows that
explicitly represents spill code optimization, register
preferences, copy insertion, and constant rematerialization; two
fast, but effective, heuristic allocators based on this model;
and a more elaborate progressive allocator that uses Lagrangian
relaxation to compute the optimality of its allocations. Our
progressive allocator demonstrates code size improvements as
large as 16.75\% compared to a traditional graph allocator. On
average, we observe an initial improvement of 3.47\%, which
increases progressively to 6.84\% as more time is permitted for
compilation.},
keywords = {Compilers:Register Allocation},
url = {http://www.cs.cmu.edu/~seth/papers/koes-pldi2006.pdf}
}
|
|
Hardware Compilation of Application-Specific Memory Access Interconnect | pdf bib abstract | |
Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein.
IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems,
25(5):756–771,2006.
|
| hide Abstract | A major obstacle to successful high-level synthesis (HLS) of large-scale application-specified integrated circuit systems is the presence of memory accesses to a shared-memory subsystem. The latency to access memory is often not statically predictable, which creates problems for scheduling operations dependent on memory reads. More fundamental is that dependences between accesses may not be statically provable (e.g., if the specification language permits pointers), which introduces memory-consistency problems. Addressing these issues with static scheduling results in overly conservative circuits, and thus, most state-of-the-art HLS tools limit memory systems to those that have predictable latencies and limit programmers to specifications that forbid arbitrary memory-reference patterns. A new HLS framework for the synthesis and optimization of memory accesses (SOMA) is presented. SOMA enables specifications to include arbitrary memory references (e.g., pointers) and allows the memory system to incorporate features that might cause the latency of a memory access to vary dynamically. This results in raising the level of abstraction in the input specification, enabling faster design times. SOMA synthesizes a memory access network (MAN) architecture that facilitates dynamic scheduling and ordering of memory accesses. The paper describes a basic MAN construction technique that illustrates how dynamic ordering helps in efficiently maintaining memory consistency and how dynamic scheduling helps alleviate the variable-latency problem. Then, it is shown how static analysis of the access patterns can be used to optimize the MAN. One optimization changes the MAN interconnect topology to increase concurrence. A second optimization reduces the synchronization overhead necessary to maintain memory consistency. Postlayout experiments demonstrate that SOMA’s application-specific MAN construction significantly improves power and performance for a range of benchmarks. |
| | @article{venkataramani-tcad06,
title = {Hardware Compilation of Application-Specific Memory Access
Interconnect},
author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea,
Tiberiu and Goldstein, Seth Copen},
journal = {IEEE Transactions on Computer Aided Design of Integrated
Circuits and Systems},
year = {2006},
volume = {25},
number = {5},
pages = {756--771},
issn = {0278-0070},
abstract = {{A major obstacle to successful high-level synthesis
(HLS) of large-scale application-specified integrated circuit
systems is the presence of memory accesses to a shared-memory
subsystem. The latency to access memory is often not statically
predictable, which creates problems for scheduling operations
dependent on memory reads. More fundamental is that dependences
between accesses may not be statically provable (e.g., if the
specification language permits pointers), which introduces
memory-consistency problems. Addressing these issues with static
scheduling results in overly conservative circuits, and thus,
most state-of-the-art HLS tools limit memory systems to those
that have predictable latencies and limit programmers to
specifications that forbid arbitrary memory-reference patterns. A
new HLS framework for the synthesis and optimization of memory
accesses (SOMA) is presented. SOMA enables specifications to
include arbitrary memory references (e.g., pointers) and allows
the memory system to incorporate features that might cause the
latency of a memory access to vary dynamically. This results in
raising the level of abstraction in the input specification,
enabling faster design times. SOMA synthesizes a memory access
network (MAN) architecture that facilitates dynamic scheduling
and ordering of memory accesses. The paper describes a basic MAN
construction technique that illustrates how dynamic ordering
helps in efficiently maintaining memory consistency and how
dynamic scheduling helps alleviate the variable-latency problem.
Then, it is shown how static analysis of the access patterns can
be used to optimize the MAN. One optimization changes the MAN
interconnect topology to increase concurrence. A second
optimization reduces the synchronization overhead necessary to
maintain memory consistency. Postlayout experiments demonstrate
that SOMA's application-specific MAN construction significantly
improves power and performance for a range of benchmarks.}},
keywords = {Asychronous Circuits, Spatial
Computing,Phoenix,Network-on-a-chip},
url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-tcad06.pdf}
}
|
|
Leveraging Protocol Knowledge in Slack Matching | pdf bib abstract | |
Girish Venkataramani and Seth Copen Goldstein.
In IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
November, 2006.
|
| hide Abstract | Stalls, due to mis-matches in communication rates, are a major performance obstacle in pipelined circuits. If the rate of data production is faster than the rate of consumption, the resulting design performs slower than when the communication rate is matched. This can be remedied by inserting pipeline buffers (to temporarily hold data), allowing the producer to proceed if the consumer is not ready to accept data. The problem of deciding which channels need these buffers (and how many) for an arbitrary communication profile is called the slack matching problem; the optimal solution to this problem has been shown to be NP-complete. In this paper, we present a heuristic that uses knowledge of the communication protocol to explicitly model these bottlenecks, and an iterative algorithm to progressively remove these bottlenecks by inserting buffers. We apply this algorithm to asynchronous circuits, and show that it naturally handles large designs with arbitrarily cyclic and acyclic topologies, which exhibit various types of control choice. The heuristic is efficient, achieving linear time complexity in practice, and produces solutions that (a) achieve up to 60% performance speedup on large media processing kernels, and (b) can either be verified to be optimal, or the approximation margin can be bounded. |
| | @inproceedings{venkataramani-iccad06,
title = {Leveraging Protocol Knowledge in Slack Matching},
author = {Venkataramani, Girish and Goldstein, Seth Copen},
booktitle = {IEEE/ACM International Conference on Computer-Aided
Design (ICCAD)},
year = {2006},
address = {San Jose, CA},
month = {November},
abstract = {{Stalls, due to mis-matches in communication rates, are
a major performance obstacle in pipelined circuits. If the rate
of data production is faster than the rate of consumption, the
resulting design performs slower than when the communication rate
is matched. This can be remedied by inserting pipeline buffers
(to temporarily hold data), allowing the producer to proceed if
the consumer is not ready to accept data. The problem of deciding
which channels need these buffers (and how many) for an arbitrary
communication profile is called the slack matching problem; the
optimal solution to this problem has been shown to be
NP-complete. \par In this paper, we present a heuristic that uses
knowledge of the communication protocol to explicitly model these
bottlenecks, and an iterative algorithm to progressively remove
these bottlenecks by inserting buffers. We apply this algorithm
to asynchronous circuits, and show that it naturally handles
large designs with arbitrarily cyclic and acyclic topologies,
which exhibit various types of control choice. The heuristic is
efficient, achieving linear time complexity in practice, and
produces solutions that (a) achieve up to 60\% performance
speedup on large media processing kernels, and (b) can either be
verified to be optimal, or the approximation margin can be
bounded. }},
keywords = {Asychronous Circuits, Spatial Computing, CAD, Global
Critical Path},
url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iccad06.pdf}
}
|
|
Brain in a Bottle | pdf bib | |
Seth Copen Goldstein.
In Wild and Crazy Ideas Session of ASPLOS,
October, 2006.
|
| @inproceedings{goldstein-waci06,
author = {Goldstein, Seth Copen},
title = {Brain in a Bottle},
booktitle = {Wild and Crazy Ideas Session of ASPLOS},
year = {2006},
month = {October},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-waci06.pdf},
keywords = {Brain, Parallel Computing, Self-Assembly}
}
|
|
Hierarchical Motion Planning for Self-reconfigurable Modular Robots | pdf bib | |
Preethi Srinivas Bhat, James Kuffner, Seth Copen Goldstein, and Siddhartha Srinivasa.
In 2006 IEEE/RSJ International Confernce on Intelligent Robots and Systems (IROS),
October, 2006.
|
| @inproceedings{bhat06,
author = {Bhat, Preethi Srinivas and Kuffner, James and Goldstein,
Seth Copen and Srinivasa, Siddhartha},
title = {Hierarchical Motion Planning for Self-reconfigurable
Modular Robots},
booktitle = {2006 IEEE/RSJ International Confernce on Intelligent
Robots and Systems (IROS)},
year = {2006},
month = {October},
keywords = {Claytronics, Planning, Modular Robotics},
url = {http://www.cs.cmu.edu/~seth/papers/bhat06.pdf}
}
|
|
Tartan: Evaluating Spatial Computation for Whole Program Execution | pdf bib abstract | |
Mahim Mishra, Timothy J Callahan, Tiberiu Chelcea, Girish Venkataramani, Mihai Budiu, and Seth Copen Goldstein.
In 12th ACM International Conference on Architecture Support for Programming Languages and Operating Systems (ASPLOS),
pages 163–174, October, 2006.
|
| hide Abstract | Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads. |
| | @inproceedings{mahim-asplos06,
title = {Tartan: Evaluating Spatial Computation for Whole Program
Execution},
author = {Mishra, Mahim and Callahan, Timothy J and Chelcea, Tiberiu
and Venkataramani, Girish and Budiu, Mihai and Goldstein, Seth
Copen},
booktitle = {12th ACM International Conference on Architecture
Support for Programming Languages and Operating Systems
(ASPLOS)},
year = {2006},
pages = {163--174},
address = {San Jose, CA},
month = {October},
abstract = {Spatial Computing (SC) has been shown to be an
energy-efficient model for implementing program kernels. In this
paper we explore the feasibility of using SC for more than small
kernels. To this end, we evaluate the performance and energy
efficiency of entire applications on Tartan, a general-purpose
architecture which integrates a reconfigurable fabric (RF) with a
superscalar core. Our compiler automatically partitions and
compiles an application into an instruction stream for the core
and a configuration for the RF. We use a detailed simulator to
capture both timing and energy numbers for all parts of the
system. \par Our results indicate that a hierarchical RF
architecture, designed around a scalable interconnect, is
instrumental in harnessing the benefits of spatial computation.
The interconnect uses static configuration and routing at the
lower levels and a packet-switched, dynamically-routed network at
the top level. Tartan is most energy-efficient when almost all of
the application is mapped to the RF, indicating the need for the
RF to support most general-purpose programming constructs. Our
initial investigation reveals that such a system can provide, on
average, an order of magnitude improvement in energy-delay
compared to an aggressive superscalar core on single-threaded
workloads.},
keywords = {Asychronous Circuits, Spatial Computing, Reconfigurable
Computing,Phoenix, Tartan},
url = {http://www.cs.cmu.edu/~seth/papers/mahim-asplos06.pdf}
}
|
|
Distributed Watchpoints: Debugging Very Large Ensembles of Robots | pdf bib abstract talk | |
Michael De Rosa, Seth Copen Goldstein, Peter Lee, Jason D. Campbell, and Padmanabhan Pillai.
In Robotics: Science and Systems Workshop on Self-Reconfigurable Modular Robots,
August, 2006.
|
| | | |
|
Modeling the Global Critical Path in Concurrent Systems | pdf bib abstract | |
Girish Venkataramani, Tiberiu Chelcea, Mihai Budiu, and Seth Copen Goldstein.
Carnegie Mellon University Technical Report No. CMU-CS-06-144,
August, 2006.
|
| hide Abstract | We show how the global critical path can be used as a practical tool for understanding, optimizing and summarizing the behavior of highly concurrent self-timed circuits. Traditionally, critical path analysis has been applied to DAGs, and thus was constrained to combinatorial sub-circuits. We formally define the global critical path (GCP) and show how it can be constructed using only local information that is automatically derived directly from the circuit. We introduce a form of Production Rules, which can accurately determine the GCP for a given input vector, even for modules which exhibit choice and early termination. The GCP provides valuable insight into the control behavior of the application, which help in formulating new optimizations and re-formulating existing ones to use the GCP knowledge. We have constructed a fully automated framework for GCP detection and analysis, and have incorporated this framework into a high-level synthesis tool-chain. We demonstrate the effectiveness of the GCP framework by re-formulating two traditional CAD optimizations to use the GCP, yielding efficient algorithms which improve circuit power (by up to 9%) and performance (by up to 60%) in our experiments. |
| | @techreport{venkataramani-tr06,
author = {Venkataramani, Girish and Chelcea, Tiberiu and Budiu,
Mihai and Goldstein, Seth Copen},
title = {Modeling the Global Critical Path in Concurrent Systems},
institution = {Carnegie Mellon University},
year = {2006},
number = {CMU-CS-06-144},
month = {August},
abstract = {We show how the global critical path can be used as a
practical tool for understanding, optimizing and summarizing the
behavior of highly concurrent self-timed circuits. Traditionally,
critical path analysis has been applied to DAGs, and thus was
constrained to combinatorial sub-circuits. We formally define the
global critical path (GCP) and show how it can be constructed
using only local information that is automatically derived
directly from the circuit. We introduce a form of Production
Rules, which can accurately determine the GCP for a given input
vector, even for modules which exhibit choice and early
termination. \par The GCP provides valuable insight into the
control behavior of the application, which help in formulating
new optimizations and re-formulating existing ones to use the GCP
knowledge. We have constructed a fully automated framework for
GCP detection and analysis, and have incorporated this framework
into a high-level synthesis tool-chain. We demonstrate the
effectiveness of the GCP framework by re-formulating two
traditional CAD optimizations to use the GCP, yielding efficient
algorithms which improve circuit power (by up to 9\%) and
performance (by up to 60\%) in our experiments.},
keywords = {Asychronous Circuits, Spatial Computing,CAD, Global
Critical Path},
url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-tr06.pdf}
}
|
|
Scalable Shape Sculpting via Hole Motion: Motion Planning in Lattice-Constrained Module Robots | pdf bib abstract | |
Michael De Rosa, Seth Copen Goldstein, Peter Lee, Jason D. Campbell, and Padmanabhan Pillai.
In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA '06),
May, 2006.
|
| hide Abstract | We describe a novel shape formation algorithm for ensembles of 2-dimensional lattice-arrayed modular robots, based on the manipulation of regularly shaped voids within the lattice (“holes”). The algorithm is massively parallel and fully distributed. Constructing a goal shape requires time propor- tional only to the complexity of the desired target geometry. Construction of the shape by the modules requires no global communication nor broadcast floods after distribution of the target shape. Results in simulation show 97.3% shape compliance in ensembles of approximately 60,000 modules, and we believe that the algorithm will generalize to 3D and scale to handle millions of modules. |
| | @inproceedings{derosa-icra06,
author = {De Rosa, Michael and Goldstein, Seth Copen and Lee, Peter
and Campbell, Jason D. and Pillai, Padmanabhan},
title = {Scalable Shape Sculpting via Hole Motion: Motion Planning
in Lattice-Constrained Module Robots},
month = {May},
booktitle = {Proceedings of the 2006 {IEEE} International Conference
on Robotics and Automation (ICRA '06)},
year = {2006},
keywords = {Claytronics, Programmable Matter, Planning, Modular
Robotics},
url = {http://www.cs.cmu.edu/~seth/papers/derosa-icra06.pdf},
abstract = {We describe a novel shape formation algorithm for
ensembles of 2-dimensional lattice-arrayed modular robots, based
on the manipulation of regularly shaped voids within the lattice
(``holes''). The algorithm is massively parallel and fully
distributed. Constructing a goal shape requires time propor-
tional only to the complexity of the desired target geometry.
Construction of the shape by the modules requires no global
communication nor broadcast floods after distribution of the
target shape. Results in simulation show 97.3\% shape compliance
in ensembles of approximately 60,000 modules, and we believe that
the algorithm will generalize to 3D and scale to handle millions
of modules.}
}
|
|
Ultralight Modular Robotic Building blocks for the Rapid Deployment of Planetary Outposts | pdf bib | |
Mustafa Emre Karagozler, Brian Kirby, W.J. Lee, Eugene Marinelli, T.C. Ng, Michael Weller, and Seth Copen Goldstein.
In Revolutionary Aerospace Systems Concepts Academic Linkage (RASC-AL) Forum 2006,
May, 2006.
|
| @inproceedings{karagozler-rascal06,
title = {Ultralight Modular Robotic Building blocks for the Rapid
Deployment of Planetary Outposts},
booktitle = {Revolutionary Aerospace Systems Concepts Academic
Linkage (RASC-AL) Forum 2006},
author = {Karagozler, Mustafa Emre and Kirby, Brian and Lee, W.J.
and Marinelli, Eugene and Ng, T.C. and Weller, Michael and
Goldstein, Seth Copen},
year = {2006},
month = {May},
address = {Cape Canaveral, FL},
url = {http://www.cs.cmu.edu/~seth/papers/karagozler-rascal06.pdf},
keywords = {Claytronics,Modular Robotics,Robotics}
}
|
|
An Analysis of Graph Coloring Register Allocation | pdf bib abstract | |
David Ryan Koes and Seth Copen Goldstein.
Carnegie Mellon University Technical Report No. CMU-CS-06-111,
pages 10, March, 2006.
|
| hide Abstract | Graph coloring is the de facto standard technique for register allocation within a compiler. In this paper we examine the importance of the quality of the coloring algorithm and various extensions of the basic graph coloring technique by replacing the coloring phase of the GNU compiler’s register allocator with an optimal coloring algorithm. We then extend this optimal algorithm to incorporate various extensions such as coalescing and preferential register assignment. We find that using an optimal coloring algorithm has surprisingly little benefit and empirically demonstrate the benefit of the various extensions. |
| | @techreport{koes-tr06,
author = {Koes, David Ryan and Goldstein, Seth Copen},
title = {An Analysis of Graph Coloring Register Allocation},
institution = {Carnegie Mellon University},
year = {2006},
number = {CMU-CS-06-111},
pages = {10},
month = {March},
url = {http://www.cs.cmu.edu/~seth/papers/koes-tr06.pdf},
abstract = {Graph coloring is the de facto standard technique for
register allocation within a compiler. In this paper we examine
the importance of the quality of the coloring algorithm and
various extensions of the basic graph coloring technique by
replacing the coloring phase of the GNU compiler's register
allocator with an optimal coloring algorithm. We then extend this
optimal algorithm to incorporate various extensions such as
coalescing and preferential register assignment. We find that
using an optimal coloring algorithm has surprisingly little
benefit and empirically demonstrate the benefit of the various
extensions.},
keywords = {Compilers:Register Allocation}
}
|
| 2005 |
Demo Abstract: Claytronics---highly scalable communications, sensing, and actuation networks. | pdf bib | |
Burak Aksak, Preethi Srinivas Bhat, Jason D. Campbell, Michael De Rosa, Stanislav Funiak, Phillip B. Gibbons, Seth Copen Goldstein, Carlos Guestrin, Ashish Gupta, Casey Helfrich, James F. Hoburg, Brian Kirby, James Kuffner, Peter Lee, Todd C. Mowry, Padmanabhan Pillai, Ram Ravichandran, Benjamin D. Rister, Srinivasan Seshan, Metin Sitti, and Haifeng Yu.
In Proceedings of the 3rd international conference on Embedded networked sensor systems (SenSys),
pages 299, 2005.
|
| @inproceedings{aksak-sensys05,
author = {Aksak, Burak and Bhat, Preethi Srinivas and Campbell,
Jason D. and De Rosa, Michael and Funiak, Stanislav and Gibbons,
Phillip B. and Goldstein, Seth Copen and Guestrin, Carlos and
Gupta, Ashish and Helfrich, Casey and Hoburg, James F. and Kirby,
Brian and Kuffner, James and Lee, Peter and Mowry, Todd C. and
Pillai, Padmanabhan and Ravichandran, Ram and Rister, Benjamin D.
and Seshan, Srinivasan and Sitti, Metin and Yu, Haifeng},
title = {Demo Abstract: Claytronics---highly scalable
communications, sensing, and actuation networks.},
booktitle = {Proceedings of the 3rd international conference on
Embedded networked sensor systems (SenSys)},
year = {2005},
pages = {299},
url = {http://www.cs.cmu.edu/~seth/papers/aksak-sensys05.pdf},
doi = {http://doi.acm.org/10.1145/1098918.1098964},
keywords = {Claytronics, Programmable Matter}
}
|
|
The impact of the nanoscale on computing systems | pdf bib | |
Seth Copen Goldstein.
In IEEE/ACM International Conference on Computer-Aided Design, 2005 (ICCAD 2005),
pages 655–661, November, 2005.
|
| @inproceedings{goldstein-iccad05,
title = {The impact of the nanoscale on computing systems},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-iccad05.pdf},
booktitle = {IEEE/ACM International Conference on Computer-Aided
Design, 2005 (ICCAD 2005)},
author = {Goldstein, Seth Copen},
year = {2005},
pages = {655-661},
address = {San Jose, CA},
month = {November},
keywords = {Electronic Nanotechnology,molecular electronics}
}
|
|
The Ensemble Principle | pdf bib | |
Seth Copen Goldstein, Todd C. Mowry, Jason D. Campbell, Peter Lee, Padmanabhan Pillai, James F. Hoburg, Phillip B. Gibbons, Carlos Guestrin, James Kuffner, Brian Kirby, Benjamin D. Rister, Michael De Rosa, Stanislav Funiak, Burak Aksak, and Rahul Sukthankar.
In 13th Foresight Conference of Advanced Nanotechnogy,
October, 2005.
|
| @inproceedings{goldstein05,
author = {Goldstein, Seth Copen and Mowry, Todd C. and Campbell,
Jason D. and Lee, Peter and Pillai, Padmanabhan and Hoburg, James
F. and Gibbons, Phillip B. and Guestrin, Carlos and Kuffner,
James and Kirby, Brian and Rister, Benjamin D. and De Rosa,
Michael and Funiak, Stanislav and Aksak, Burak and Sukthankar,
Rahul},
title = {The Ensemble Principle},
booktitle = {13th Foresight Conference of Advanced Nanotechnogy},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein05.pdf},
year = {2005},
month = {October},
address = {San Francisco, CA},
keywords = {Claytronics, Robotics}
}
|
|
SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs | pdf bib abstract | |
Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth Copen Goldstein.
In IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS),
pages 231–236, September, 2005.
|
| hide Abstract | Arbitrary memory dependencies and variable latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOMA, a synthesis framework for constructing Memory Access Network (MAN) architectures that inherently enforce memory consistency in the presence of dynamic memory access dependencies. A fundamental bottleneck in any such network is arbitrating between concurrent accesses to a shared memory resource. To alleviate this bottleneck, SOMA uses an application-specific concurrency analysis technique to predict the dynamic memory parallelism profile of the application. This is then used to customize the MAN architecture. Depending on the parallelism profile, the MAN may be optimized for latency, throughput or both. The optimized MAN is automatically synthesized into gate-level structural Verilog using a flexible library of network building blocks. SOMA has been successfully integrated into an automated C-to-hardware synthesis flow, which generates standard cell circuits from unrestricted ANSI-C programs. Post-layout experiments demonstrate that application specific MAN construction significantly improves power and performance. |
| | @inproceedings{venkataramani-isss05,
title = {SOMA: A Tool for Synthesizing and Optimizing Memory
Accesses in ASICs},
author = {Venkataramani, Girish and Bjerregaard, Tobias and Chelcea,
Tiberiu and Goldstein, Seth Copen},
booktitle = {IEEE/ACM/IFIP International Conference on
Hardware/Software Codesign and System Synthesis (CODES-ISSS)},
year = {2005},
isbn = {1-59593-161-9},
pages = {231-236},
address = {Jersey City, NJ, USA},
month = {September},
abstract = {Arbitrary memory dependencies and variable latency
memory systems are major obstacles to the synthesis of
large-scale ASIC systems in high-level synthesis. This paper
presents SOMA, a synthesis framework for constructing Memory
Access Network (MAN) architectures that inherently enforce memory
consistency in the presence of dynamic memory access
dependencies. A fundamental bottleneck in any such network is
arbitrating between concurrent accesses to a shared memory
resource. To alleviate this bottleneck, SOMA uses an
application-specific concurrency analysis technique to predict
the dynamic memory parallelism profile of the application. This
is then used to customize the MAN architecture. Depending on the
parallelism profile, the MAN may be optimized for latency,
throughput or both. The optimized MAN is automatically
synthesized into gate-level structural Verilog using a flexible
library of network building blocks. SOMA has been successfully
integrated into an automated C-to-hardware synthesis flow, which
generates standard cell circuits from unrestricted ANSI-C
programs. Post-layout experiments demonstrate that application
specific MAN construction significantly improves power and
performance.},
keywords = {Asychronous Circuits, Spatial Computing,Phoenix,
CAD,Compilers:Memory Optimizations},
url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-isss05.pdf}
}
|
|
The Robot is the Tether: Active, Adaptive Power Routing for Modular Robots With Unary Inter-robot Connectors | pdf bib | |
Jason D. Campbell, Padmanabhan Pillai, and Seth Copen Goldstein.
In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005),
pages 4108–15, August, 2005.
|
| @inproceedings{campbell05,
author = {Campbell, Jason D. and Pillai, Padmanabhan and Goldstein,
Seth Copen},
title = {The Robot is the Tether: Active, Adaptive Power Routing for
Modular Robots With Unary Inter-robot Connectors},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS 2005)},
pages = {4108--15},
year = {2005},
address = {Edmonton, Alberta Canada},
month = {August},
keywords = {Claytronics, Robotics},
url = {http://www.cs.cmu.edu/~seth/papers/campbell05.pdf}
}
|
|
Catoms: Moving Robots Without Moving Parts | pdf bib | |
Brian Kirby, Jason D. Campbell, Burak Aksak, Padmanabhan Pillai, James F. Hoburg, Todd C. Mowry, and Seth Copen Goldstein.
In AAAI (Robot Exhibition),
pages 1730–1, July, 2005.
|
| @inproceedings{kirby-aaai05,
author = {Kirby, Brian and Campbell, Jason D. and Aksak, Burak and
Pillai, Padmanabhan and Hoburg, James F. and Mowry, Todd C. and
Goldstein, Seth Copen},
title = {Catoms: Moving Robots Without Moving Parts},
url = {http://www.cs.cmu.edu/~seth/papers/kirby-aaai05.pdf},
booktitle = {AAAI (Robot Exhibition)},
pages = {1730--1},
year = {2005},
month = {July},
address = {Pittsburgh, PA},
keywords = {Claytronics, Robotics}
}
|
|
HLS Support for Unconstrained Memory Accesses | pdf bib abstract | |
Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein.
In IEEE 14th International Workshop on Logic Synthesis (IWLS),
June, 2005.
|
| hide Abstract | A major obstacle in high-level synthesis (HLS) of large-scale ASIC systems is memory access patterns. Typically, most state-of-the-art HLS tools impose constraints on the memory references in the source application, requiring them to exhibit predictable access patterns, and/or requiring dependencies between them to be statically determinable. This paper addresses the HLS problem when such constraints are relaxed. We present an analysis infrastructure that can be used within any HLS toolflow for synthesizing circuits from high-level abstractions, such as ANSI-C, where no assumptions can be made about memory access latencies, and where dependencies between memory references can only be disambiguated dynamically at runtime (pointer aliasing). We start by describing a generic framework to build a dependence-aware, fully distributed, although often conservative, memory-access network (MAN) for a given memory-dependence graph. Then, we propose a suite of optimizations to customize the MAN for the given specification. All these techniques guarantee memory coherency. Experimental results on Mediabench benchmarks, show that such an approach succeeds in maintaining high levels of parallelism, while ensuring memory coherency. The optimizations succeed in lowering the synchronization overhead by as much as 4x. |
| | @inproceedings{venkataramani-iwls05,
title = {{HLS} Support for Unconstrained Memory Accesses},
author = {Venkataramani, Girish and Chelcea, Tiberiu and Goldstein,
Seth Copen},
booktitle = {IEEE 14th International Workshop on Logic Synthesis
(IWLS)},
year = {2005},
address = {Lake Arrowhead, CA},
month = {June},
abstract = {A major obstacle in high-level synthesis (HLS) of
large-scale ASIC systems is memory access patterns. Typically,
most state-of-the-art HLS tools impose constraints on the memory
references in the source application, requiring them to exhibit
predictable access patterns, and/or requiring dependencies
between them to be statically determinable. This paper addresses
the HLS problem when such constraints are relaxed. We present an
analysis infrastructure that can be used within any HLS toolflow
for synthesizing circuits from high-level abstractions, such as
ANSI-C, where no assumptions can be made about memory access
latencies, and where dependencies between memory references can
only be disambiguated dynamically at runtime (pointer aliasing).
We start by describing a generic framework to build a
dependence-aware, fully distributed, although often conservative,
memory-access network (MAN) for a given memory-dependence graph.
Then, we propose a suite of optimizations to customize the MAN
for the given specification. All these techniques guarantee
memory coherency. Experimental results on Mediabench benchmarks,
show that such an approach succeeds in maintaining high levels of
parallelism, while ensuring memory coherency. The optimizations
succeed in lowering the synchronization overhead by as much as
4x.},
keywords = {Asychronous Circuits, Spatial Computing,Phoenix},
url = {http://www.cs.cmu.edu/~seth/papers/venkataramani-iwls05.pdf}
}
|
|
Programmable Matter | pdf bib | |
Seth Copen Goldstein, Jason D. Campbell, and Todd C. Mowry.
IEEE Computer,
38(6):99–101,June, 2005.
|
| @article{goldstein-computer05,
author = {Goldstein, Seth Copen and Campbell, Jason D. and Mowry,
Todd C.},
title = {Programmable Matter},
journal = {IEEE Computer},
volume = {38},
number = {6},
pages = {99--101},
year = {2005},
month = {June},
keywords = {Claytronics, Programmable Matter},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-computer05.pdf}
}
|
|
Adding Faster with Application Specific Early Termination | pdf bib abstract | |
David Ryan Koes, Tiberiu Chelcea, Charles Onyeama, and Seth Copen Goldstein.
Carnegie Mellon University Technical Report No. CMU-CS-05-101,
pages 20, May, 2005.
|
| hide Abstract | This paper presents a methodology for improving the speed of high-speed adders. As a starting point, a previously proposed method, called speculative completion, is used in which fast- terminating additions are automatically detected. Unlike the previous design, the method proposed in this paper is able to adapt dynamically to (1) application-specific behavior and (2) to adder- specific behavior, resulting in a higher detection rate of fast additions and, consequently, a faster average-case speed for addition. Our experimental results show detection rates of over 99%, and adder average-case speed improvements of up to 14.%. |
| | @techreport{koes-tr05,
author = {Koes, David Ryan and Chelcea, Tiberiu and Onyeama, Charles
and Goldstein, Seth Copen},
title = {Adding Faster with Application Specific Early Termination},
institution = {Carnegie Mellon University},
year = {2005},
number = {CMU-CS-05-101},
pages = {20},
month = {May},
url = {http://www.cs.cmu.edu/~seth/papers/koes-tr05.pdf},
abstract = {This paper presents a methodology for improving the
speed of high-speed adders. As a starting point, a previously
proposed method, called speculative completion, is used in which
fast- terminating additions are automatically detected. Unlike
the previous design, the method proposed in this paper is able to
adapt dynamically to (1) application-specific behavior and (2) to
adder- specific behavior, resulting in a higher detection rate of
fast additions and, consequently, a faster average-case speed for
addition. Our experimental results show detection rates of over
99\%, and adder average-case speed improvements of up to 14.\%.},
keywords = {Asychronous Circuits}
}
|
|
Why area might reduce power in nanoscale CMOS | pdf bib abstract | |
Paul Beckett and Seth Copen Goldstein.
In IEEE International Symposium on Circuits and Systems, 2005, (ISCAS 2005),
volume 3, pages 2329–2332,May, 2005.
|
| hide Abstract | In this paper we explore the relationship between power and area. By exploiting parallelism (and thus using more area) one can reduce the switching frequency allowing a reduction in VDD which results in a reduction in power. Under a scaling regime which allows threshold voltage to increase as VDD decreases we find that dynamic and subthreshold power loss in CMOS exhibit a dependence on area proportional to A^((\sigma^-3)/\sigma) while gate leakage power proportional to A^((\sigma^-6)/\sigma) and short circuit power A^((\sigma^-6)/\sigma). Thus, with the large number of devices at our disposal we can exploit techniques such as spatial computing--tailoring the program directly to the hardware--to overcome the negative effects of scaling. The value of s describes the effectiveness of the technique for a particular circuit and/or algorithm--for circuits that exhibit a value of \sigma <= 3, power will be a constant or reducing function of area. We briefly speculate on how \sigma might be influenced by a move to nanoscale technology. |
| | @inproceedings{beckett-iscas05,
title = {Why area might reduce power in nanoscale CMOS},
url = {http://www.cs.cmu.edu/~seth/papers/beckett-iscas05.pdf},
booktitle = {IEEE International Symposium on Circuits and Systems,
2005, (ISCAS 2005)},
author = {Beckett, Paul and Goldstein, Seth Copen},
year = {2005},
pages = {2329-2332},
volume = {3},
month = {May},
address = {Kobe, Japan},
abstract = {In this paper we explore the relationship between power
and area. By exploiting parallelism (and thus using more area)
one can reduce the switching frequency allowing a reduction in
VDD which results in a reduction in power. Under a scaling regime
which allows threshold voltage to increase as VDD decreases we
find that dynamic and subthreshold power loss in CMOS exhibit a
dependence on area proportional to A^((\sigma^-3)/\sigma) while
gate leakage power proportional to A^((\sigma^-6)/\sigma) and
short circuit power A^((\sigma^-6)/\sigma). Thus, with the large
number of devices at our disposal we can exploit techniques such
as spatial computing--tailoring the program directly to the
hardware--to overcome the negative effects of scaling. The value
of s describes the effectiveness of the technique for a
particular circuit and/or algorithm--for circuits that exhibit a
value of \sigma <= 3, power will be a constant or reducing
function of area. We briefly speculate on how \sigma might be
influenced by a move to nanoscale technology.},
keywords = {Electronic Nanotechnology,Power,Energy}
}
|
|
2029 The 3-D Fax Machine Brings Back the House Call | pdf bib | |
Seth Copen Goldstein.
In Headline from the Future, Popular Science Magazine,
pages 34, March, 2005.
|
| @misc{goldstein-popsci05,
title = {2029 The 3-D Fax Machine Brings Back the House Call},
howpublished = {Headline from the Future, Popular Science Magazine},
author = {Goldstein, Seth Copen},
year = {2005},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-popsci05.pdf},
month = {March},
pages = {34},
keywords = {Claytronics}
}
|
|
A Progressive Register Allocator for Irregular Architectures | pdf bib abstract | |
David Ryan Koes and Seth Copen Goldstein.
In Proceedings of the International Symposium on Code Generation and Optimization (CGO'05),
pages 269–280, March, 2005.
|
| hide Abstract | Register allocation is one of the most important optimizations a compiler performs. Conventional graph-coloring based register allocators are fast and do well on regular, RISC-like, architectures, but perform poorly on irregular, CISC-like, architectures with few registers and non-orthogonal instruction sets. At the other extreme, optimal register allocators based on integer linear programming are capable of fully modeling and exploiting the peculiarities of irregular architectures but do not scale well. We introduce the idea of a progressive allocator which finds an initial allocation of quality comparable to a conventional allocator, but as more time is allowed for computation the quality of the allocation approaches optimal. This paper presents a progressive register allocator which uses a multi-commodity network flow model to elegantly represent the intricacies of irregular architectures. We evaluate our allocator substituted for gcc’s local register allocation pass. |
| | @inproceedings{koes-cgo05,
author = {Koes, David Ryan and Goldstein, Seth Copen},
title = {A Progressive Register Allocator for Irregular
Architectures},
booktitle = {Proceedings of the International Symposium on Code
Generation and Optimization (CGO'05)},
month = {March},
year = {2005},
isbn = {0-7695-2298-X},
pages = {269--280},
doi = {http://dx.doi.org/10.1109/CGO.2005.4},
publisher = {IEEE Computer Society},
address = {Washington, DC},
abstract = {Register allocation is one of the most important
optimizations a compiler performs. Conventional graph-coloring
based register allocators are fast and do well on regular,
RISC-like, architectures, but perform poorly on irregular,
CISC-like, architectures with few registers and non-orthogonal
instruction sets. At the other extreme, optimal register
allocators based on integer linear programming are capable of
fully modeling and exploiting the peculiarities of irregular
architectures but do not scale well. We introduce the idea of a
\textit{progressive allocator} which finds an initial allocation
of quality comparable to a conventional allocator, but as more
time is allowed for computation the quality of the allocation
approaches optimal. This paper presents a progressive register
allocator which uses a multi-commodity network flow model to
elegantly represent the intricacies of irregular architectures.
We evaluate our allocator substituted for {\tt gcc}'s local
register allocation pass.},
keywords = {Compilers:Register Allocation},
url = {http://www.cs.cmu.edu/~seth/papers/koes-cgo05.pdf}
}
|
|
Dataflow: A Complement to Superscalar | pdf bib abstract | |
Mihai Budiu, Pedro V. Artigas, and Seth Copen Goldstein.
In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS),
pages 177–186, March, 2005.
|
| hide Abstract | There has been a resurgence of interest in dataflow architectures, because of their potential for exploiting parallelism with low overhead. In this paper we analyze the performance of a class of static dataflow machines on integer media and control-intensive programs and we explain why a dataflow machine, even with unlimited resources, does not always outperform a superscalar processor on general-purpose codes, under the assumption that both machines take the same time to execute basic operations. We compare a program-specific dataflow machine with unlimited parallelism to a superscalar processor running the same program. While the dataflow machines provide very good performance on most data-parallel programs, we show that the dataflow machine cannot always take advantage of the available parallelism. Using the dynamic critical path we investigate the mechanisms used by superscalar processors to provide a performance advantage and their impact on a dataflow model. |
| | @inproceedings{budiu-ispass05,
author = {Budiu, Mihai and Artigas, Pedro V. and Goldstein, Seth
Copen},
title = {Dataflow: A Complement to Superscalar},
booktitle = {IEEE International Symposium on Performance Analysis of
Systems and Software (ISPASS)},
month = {March},
year = {2005},
pages = {177--186},
address = {Austin, TX},
url = {http://www.cs.cmu.edu/~seth/papers/budiu-ispass05.pdf},
abstract = {There has been a resurgence of interest in dataflow
architectures, because of their potential for exploiting
parallelism with low overhead. In this paper we analyze the
performance of a class of static dataflow machines on integer
media and control-intensive programs and we explain why a
dataflow machine, even with unlimited resources, does not always
outperform a superscalar processor on general-purpose codes,
under the assumption that both machines take the same time to
execute basic operations. We compare a program-specific dataflow
machine with unlimited parallelism to a superscalar processor
running the same program. While the dataflow machines provide
very good performance on most data-parallel programs, we show
that the dataflow machine cannot always take advantage of the
available parallelism. Using the dynamic critical path we
investigate the mechanisms used by superscalar processors to
provide a performance advantage and their impact on a dataflow
model.},
confweb = {http://www.ispass.org/ispass2005},
keywords = {Spatial Computing,Phoenix}
}
|
|
Inter-iteration Scalar Replacement in the Presence of Conditional Control Flow | pdf bib | |
Mihai Budiu and Seth Copen Goldstein.
In 3rd Workshop on Optimizations for DSO and Embedded Systems,
March, 2005.
Also appeared as CMU CS Technical Report, CMU-CS-04-103.
|
| @inproceedings{budiu-odes05,
title = {Inter-iteration Scalar Replacement in the Presence of
Conditional Control Flow},
url = {http://www.cs.cmu.edu/~seth/papers/budiu-odes05.pdf},
booktitle = {3rd Workshop on Optimizations for DSO and Embedded
Systems},
author = {Budiu, Mihai and Goldstein, Seth Copen},
year = {2005},
address = {San Jose, CA},
month = {March},
also = {CMU CS Technical Report, CMU-CS-04-103},
keywords = {Phoenix,Compilers:Loop Optimizations,Compilers:Scalar
Replacement}
}
|
|
Nonphotolithographic Nanoscale Memory Density Prospects | pdf bib abstract | |
Andre DeHon, Seth Copen Goldstein, Phil Kuekes, and Patrick Lincoln.
IEEE Transactions on Nanotechnology,
volume 4, pages 215–228,March, 2005.
|
| hide Abstract | Technologies are now emerging to construct molecular-scale electronic wires and switches using bottom-up self-assembly. This opens the possibility of constructing nanoscale circuits and memories where active devices are just a few nanometers square and wire pitches may be on the order of ten nanometers. The features can be defined at this scale without using photolithography. The available assembly techniques have relatively high defect rates compared to conventional lithographic integrated circuits and can only produce very regular structures. Nonetheless, with proper memory organization, it is reasonable to expect these technologies to provide memory densities in excess of 10/sup 11/ b/cm/sup 2/ with modest active power requirements under 0.6 W/Tb/s for random read operations. |
| | @article{lincoln-tnano05,
title = {Nonphotolithographic Nanoscale Memory Density Prospects},
abstract = {Technologies are now emerging to construct
molecular-scale electronic wires and switches using bottom-up
self-assembly. This opens the possibility of constructing
nanoscale circuits and memories where active devices are just a
few nanometers square and wire pitches may be on the order of ten
nanometers. The features can be defined at this scale without
using photolithography. The available assembly techniques have
relatively high defect rates compared to conventional
lithographic integrated circuits and can only produce very
regular structures. Nonetheless, with proper memory organization,
it is reasonable to expect these technologies to provide memory
densities in excess of 10/sup 11/ b/cm/sup 2/ with modest active
power requirements under 0.6 W/Tb/s for random read operations.},
url = {http://www.cs.cmu.edu/~seth/papers/lincoln-tnano05.pdf},
journal = {IEEE Transactions on Nanotechnology},
author = {DeHon, Andre and Goldstein, Seth Copen and Kuekes, Phil
and Lincoln, Patrick},
year = {2005},
month = {March},
volume = {4},
issue = {2},
pages = {215-228},
keywords = {Fault and Defect Tolerance, electronic nanotechnology,
memory density, memory organization, molecular electronics},
doi = {10.1109/TNANO.2004.837849}
}
|
| 2004 |
Defect Tolerance at the End of the Roadmap | bib | |
Mahim Mishra and Seth Copen Goldstein.
In Nano, Quantum and Molecular Computing: Implications to High Level Design and Validation,
2004.
|
| @incollection{mishra-nqmc04,
title = {Defect Tolerance at the End of the Roadmap},
booktitle = {Nano, Quantum and Molecular Computing: Implications to
High Level Design and Validation},
author = {Mishra, Mahim and Goldstein, Seth Copen},
year = {2004},
editor = {Sandeep K. Shukla and R. Iris Bahar},
publisher = {Kluwer Academic Publishers},
isbn = {1-4020-80670},
keywords = {Electronic Nanotechnology,Fault and Defect
Tolerance,Reconfigurable Computing,Phoenix,molecular electronics}
}
|
|
Claytronics: A scalable basis for future robots | pdf bib | |
Seth Copen Goldstein and Todd C. Mowry.
In RoboSphere 2004,
November, 2004.
|
| @inproceedings{goldstein-robosphere04,
author = {Goldstein, Seth Copen and Mowry, Todd C.},
title = {Claytronics: A scalable basis for future robots},
booktitle = {RoboSphere 2004},
address = {Moffett Field, CA},
month = {November},
year = {2004},
keywords = {Claytronics, Robotics},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-robosphere04.pdf}
}
|
|
Claytronics: An Instance of Programmable Matter | pdf bib abstract | |
Seth Copen Goldstein and Todd C. Mowry.
In Wild and Crazy Ideas Session of ASPLOS,
October, 2004.
|
| hide Abstract | Programmable matter refers to a technology that will allow one to control and manipulate three-dimensional physical artifacts (similar to how we already control and manipulate two-dimensional images with computer graphics). In other words, programmable matter will allow us to take a (big) step beyond virtual reality, to synthetic reality, an environment in which all the objects in a user’s environment (including the ones inserted by the computer) are physically realized. Note that the idea is not to transport objects nor is it to recreate an objects chemical composition, but rather to create a physical artifact that will mimic the shape, movement, visual appearance, sound, and tactile qualities of the original object. |
| | @inproceedings{goldstein-waci04,
author = {Goldstein, Seth Copen and Mowry, Todd C.},
title = {Claytronics: An Instance of Programmable Matter},
booktitle = {Wild and Crazy Ideas Session of ASPLOS},
year = {2004},
month = {October},
address = {Boston, MA},
keywords = {Claytronics},
url = {http://www.cs.cmu.edu/~seth/papers/goldstein-waci04.pdf},
abstract = {Programmable matter refers to a technology that will
allow one to control and manipulate three-dimensional physical
artifacts (similar to how we already control and manipulate
two-dimensional images with computer graphics). In other words,
programmable matter will allow us to take a (big) step beyond
virtual reality, to synthetic reality, an environment in which
all the objects in a user's environment (including the ones
inserted by the computer) are physically realized. Note that the
idea is not to transport objects nor is it to recreate an objects
chemical composition, but rather to create a physical artifact
that will mimic the shape, movement, visual appearance, sound,
and tactile qualities of the original object.}
}
|
|
Spatial Computation | pdf bib abstract | |
Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, and Seth Copen Goldstein.
In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS),
pages 14–26, October, 2004.
|
| hide Abstract | This paper describes a computer architecture that relies on the direct translation of high-level language programs into Spatial Computation (SC) hardware structures. SC program implementations are completely distributed, without any centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation SC structures called ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient. In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs, (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors and (3) that ASIC implementations of ASH use 3 orders of magnitude less energy compared to high-end superscalar processors, while being within a factor of two in performance. |
| | @inproceedings{budiu-asplos04,
author = {Budiu, Mihai and Venkataramani, Girish and Chelcea,
Tiberiu and Goldstein, Seth Copen},
title = {Spatial Computation},
booktitle = {International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS)},
pages = {14--26},
month = {October},
address = {Boston, MA},
year = {2004},
url = {http://www.cs.cmu.edu/~seth/papers/budiu-asplos04.pdf},
abstract = {This paper describes a computer architecture that relies
on the direct translation of high-level language | |