Compiling path expressions into VLSI circuits

Path expressions were originally proposed by Campbell and Habermann [1] as a mechanism for process synchronization at the monitor level in software. Not unexpectedly, they also provide a useful notation for specifying the behavior of asynchronous circuits. Motivated by this potential application we investigate how to directly translate path expressions into hardware. Our implementation is complicated in the case of multiple path expressions by the need for synchronization on event names that are common to more than one path. Moreover, since events are inherently asynchronous in our model, all of our circuits must be self-timed. Nevertheless, the circuits produced by our construction have area proportional to N log(N) where N is the total length of the multiple path expression under consideration. This bound holds regardless of the number of individual paths or the degree of synchronization between paths.


Introduction
As the boundary between software and hardware grows less and less distinct, it becomes increasingly important to investigate methods of directly implementing various programming language features in hardware. Since many of the problems in interfacing hardware devices involve some form of process synchronization, language features for synchronization deserve considerable attention in such investigations.
In this paper we consider the problem of directly implementing path expressions ...
Which brings us to the topic of this paper: What is the best way to translate path expressions into circuits? Lauer and Campbell have shown how to compile path expressions into Petri nets [cj], and Patil has shown how to implement Petri nets as circuits by using a PLA-like device called an asynchronous logic array [11]. Thus, an obvious method for compiling path expressions into circuits would be to first translate the path expression into a Petri net and then to implement the Petri net as a circuit using an asynchronous logic array. However, careful examination of Lauer and Campbell's scheme shows that a multiple path expression consisting of M paths each of length K can result in a Petri net with K^M places. Thus, the naive approach will in general be infeasible if the number of individual paths in a multiple path expression is large.
For the case of a path expression with a single path their scheme does result in a Petri net which is comparable in size to the path expression.
However, direct implementation of such a net using Patil's ideas may still result in a circuit with an unacceptably large area. An asynchronous logic array for a Petri net with P places and T transitions will have area proportional to P·T regardless of the number of arcs in the net. Since the nets obtained from path expressions tend to have sparse edge sets, this quadratic behavior may waste significant chip area.
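To make the size argument concrete, the following Python sketch compares the K^M place count quoted above with the N log(N) growth rate stated in the abstract, taking N = M·K as the total length. The function names, the choice of base-2 logarithm, and the omission of constant factors are assumptions made only for illustration; the numbers indicate growth rates, not actual circuit areas.

```python
import math


def petri_net_places(m: int, k: int) -> int:
    """Place count K**M quoted above for M paths, each of length K."""
    return k ** m


def n_log_n(m: int, k: int) -> float:
    """N * log2(N) with N = M * K (constant factors deliberately ignored)."""
    n = m * k
    return n * math.log2(n)


if __name__ == "__main__":
    k = 4  # hypothetical path length
    for m in (1, 2, 4, 8):
        print(f"M={m}  K^M={petri_net_places(m, k)}  N log N={n_log_n(m, k):.0f}")
```

With K fixed, the first quantity grows exponentially in the number of paths M while the second grows only slightly faster than linearly.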
Perhaps the work that is closest to ours is due to I .

The Semantics of Path Expressions
In this section we give a simple but formal semantics for path expressions in terms of partially ordered multisets of events [12]. We also relate our semantics to the one in terms of Petri nets given by Lauer and Campbell.
A simple path expression is a regular expression with an outermost Kleene star. The only operators permitted in the regular expression are (in order of precedence) "*", ";", and "+". The "*" operator is the Kleene star, ";" is the sequencing operator, and "+" represents exclusive choice. Operands are event names from some set of events Σ that we will assume to be fixed in this paper. The outermost Kleene star is usually represented by the delimiting keywords path ... end. Thus (a)* would be represented as path a end.
A multiple path expression is a set of simple path expressions. As we will see shortly, each additional simple path expression further constrains the order in which events can occur. However, we cannot simply take as our semantics for multiple path expressions the intersection of the languages corresponding to the individual path expressions; two events whose order is not explicitly restricted by one of the simple path expressions may be concurrent. For example, in the multiple path expression for the readers and writers problem discussed in the introduction the two read events R_1 and R_2 may occur simultaneously.
Nevertheless, we will still have occasion to use ordinary regular expressions in giving the semantics for path expressions; if R is an ordinary regular expression over Σ, then Σ_R ⊆ Σ will be the set of symbols of Σ that actually appear in R and L_R ⊆ Σ* will be the regular language which corresponds to R.
1. The external world raises REQ_e to indicate that it would like to proceed with event e.
2. The synchronizer raises ACK_e to allow the external world to proceed with event e.
3. The external world lowers REQ_e to indicate that event e has completed.
4. The synchronizer lowers ACK_e, signifying the end of the cycle and permission to begin a new one.
In this implementation, an event will occur during the period between steps 2 and 3 in this protocol, where both REQ and ACK are high.
Thus, multiple occurrences of any event e are non-overlapping in time. The output of the latch at the end of the C gate for e, which is labeled CLR_e, is connected to each of the NOR gates in front of the arbiter which corresponds to event e or to some event mutually exclusive to e.
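The REQ/ACK protocol above can be sketched as a small Python model of a single event's handshake. This is only an illustration under stated assumptions: the class and method names are hypothetical, and step 3 is assumed to be the external world lowering REQ_e, as in a standard four-phase handshake. The model records, after each step, whether the event is in progress (both REQ and ACK high).

```python
from dataclasses import dataclass, field


@dataclass
class EventHandshake:
    """REQ/ACK pair for one event e (illustrative names, not from the paper)."""
    req: bool = False
    ack: bool = False
    # Each entry: (step number, REQ, ACK, event in progress)
    log: list = field(default_factory=list)

    def _record(self, step: int) -> None:
        self.log.append((step, self.req, self.ack, self.req and self.ack))

    def raise_req(self) -> None:
        # Step 1: the external world asks to proceed with e.
        assert not self.req and not self.ack, "cycle must start from idle"
        self.req = True
        self._record(1)

    def raise_ack(self) -> None:
        # Step 2: the synchronizer grants permission.
        assert self.req and not self.ack
        self.ack = True
        self._record(2)

    def lower_req(self) -> None:
        # Step 3 (assumed): the external world signals that e has finished.
        assert self.req and self.ack
        self.req = False
        self._record(3)

    def lower_ack(self) -> None:
        # Step 4: the synchronizer ends the cycle.
        assert not self.req and self.ack
        self.ack = False
        self._record(4)

    def cycle(self) -> None:
        self.raise_req()
        self.raise_ack()
        self.lower_req()
        self.lower_ack()


if __name__ == "__main__":
    h = EventHandshake()
    h.cycle()
    for entry in h.log:
        print(entry)
```

Because each step asserts the state left by the previous one, a second occurrence of the same event cannot begin until the first cycle has returned to idle, which is the non-overlap property noted above.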
The following is an informal description of how the circuit works.
The circuit behaves as shown in the timing diagram in Figure 3-3.
When REQ_e is raised, event e is not allowed to proceed until each sequencer in SEQ_e signals that at least one e-type transition is enabled by negating DIS_e. Once this happens IN_e is raised, provided no mutually exclusive event is executing the second half of its cycle (and hence has its CLR high). If the arbiter decides in favor of some other pending event mutually exclusive to e, the above process repeats until e again gets a chance at the arbiter. Otherwise ACK_e will be raised and latched ...
1. The delay of the main NOR gate plus the 2-input NOR gate is less than that of the main Muller-C element plus the SR flip-flop.
2. The maximum variation in delay for the NOR gates in front of the arbiter is less than the minimum delay of the arbiter.
We begin by introducing some notation that will be needed in the proof. Let the sequencers be denoted by SEQ_1, ..., SEQ_p, corresponding to the path expressions R_1, ..., R_p ∈ M, and let Σ_{R_1}, ..., Σ_{R_p} be the subsets of Σ that actually appear in R_1, ..., R_p respectively. Let I be a set of time intervals, which may include semi-infinite intervals extending from some finite instant to infinity. Each element in I is labelled by an element in Σ. Define T(I) to be the trace which has an element for each element in I and has the obvious partial order defined between elements whose time intervals are non-overlapping. Referring to ...
With this notation in place we state some propositions, or axioms, that describe the properties of the circuit of Figure 3-2. These properties will be used to prove that the circuit is safe and live. The propositions that are not self-evident will be justified in later sections of this paper.
In general, any trace P will have a corresponding layered trace T which preserves most of the parallelism of P. It is easy to show that for any trace P there exists a layered trace T which differs from P only in that the partial order relation of P is a restriction of that of T.
Recognizers are constructed using the following grammar for simple path expressions.
The terminal symbols in the grammar correspond to primitive cells; there is one type of cell for the "+" symbol, one for the "*" symbol ...
These are strictly combinational circuits. The circuit for ";" feeds the RES signal from the circuit at its left into the ENB signal for the circuit to its right. The circuit for "+" broadcasts its ENB signal to its operands and combines the RES signals from its operands in an OR gate. We define ENB and RES to be correct if they meet the following conditions:
- ENB is true for a sub-circuit if each sequence of events satisfying the expression for the sub-circuit may be the next sequence to occur.
- RES is true for a sub-circuit if some sequence of events satisfying the sub-circuit has just occurred, and ENB was true before the beginning of that sequence.
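The ENB/RES wiring described above can be illustrated with a behavioral Python sketch. It is not the gate-level cell design: the class names (Event, Seq, Alt, Path) are hypothetical, each event cell is assumed to hold a one-bit latch recording that its event just occurred while its ENB was true, only the outermost Kleene star is modelled (by feeding the body's RES back into its ENB), and enabled() is assumed to play the role of a negated DIS signal.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    enb: bool = False    # ENB currently seen by this cell
    fired: bool = False  # assumed latch: event just occurred while enabled


@dataclass
class Seq:  # the ";" cell
    left: object
    right: object


@dataclass
class Alt:  # the "+" cell
    left: object
    right: object


def res(node) -> bool:
    """RES of a sub-circuit, read from the event latches."""
    if isinstance(node, Event):
        return node.fired
    if isinstance(node, Seq):
        return res(node.right)            # a sequence ends with its right part
    return res(node.left) or res(node.right)  # "+" ORs its operands' RES


def set_enb(node, enb: bool) -> None:
    """Propagate ENB down the tree as the ";" and "+" cells would."""
    if isinstance(node, Event):
        node.enb = enb
    elif isinstance(node, Seq):
        set_enb(node.left, enb)
        set_enb(node.right, res(node.left))  # left RES feeds right ENB
    else:
        set_enb(node.left, enb)              # "+" broadcasts ENB
        set_enb(node.right, enb)


def leaves(node):
    if isinstance(node, Event):
        yield node
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)


class Path:
    """path <body> end: the outermost star feeds RES back into ENB."""

    def __init__(self, body):
        self.body = body
        set_enb(body, True)  # initially a new iteration may begin

    def enabled(self, name: str) -> bool:
        return any(c.enb for c in leaves(self.body) if c.name == name)

    def step(self, name: str) -> None:
        assert self.enabled(name), f"event {name} is not enabled"
        for cell in leaves(self.body):
            cell.fired = cell.enb and cell.name == name
        set_enb(self.body, res(self.body))


if __name__ == "__main__":
    p = Path(Seq(Event("a"), Alt(Event("b"), Event("c"))))  # path a ; (b + c) end
    for e in ("a", "c", "a", "b"):
        p.step(e)
        print(e, {n: p.enabled(n) for n in "abc"})
```

In this sketch one latch per event occurrence carries the state, so the number of cells is linear in the length of the expression.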

R[A] -> (R[A])*
Connect R to the operand port of a "*" cell.
We shall prove the stronger statement that all ENB signals in the recognizer are correct. This proof is based upon the structure of the recognizer. An ENB signal in a recognizer is set by one of four sources:

R[{e}] -> event e
Use a cell for e as the circuit for R.
- The operand port of a "*" or "+" cell;
- The left operand port of a ";" cell;
- The right operand port of a ";" cell;
... length n that has k types of input events is laid out in this fashion, the area of the layout is no more than O(n(log n + k)). This is due to the structure of the recognizer circuits. All recognizer circuits are trees, which can be laid out with all nodes on a line and edges running parallel to the line using no more than O(log n) wiring tracks [7]. Thus the height of the circuit in Figure 4.7 is O(log n + k) while its width is O(n).

Implementation of the Arbiter
In this section we briefly elaborate on the arbiter shown in Figure 3-2 to show that the conditions of Proposition 6 can be met. The main function of the arbiter is to select a single event from a mutually exclusive set of requests. Furthermore, the arbiter must be fair: any request that remains asserted must eventually be selected.
The following observation helps to simplify the arbiter: a pair of events occurring in any single path expression must be mutually exclusive. This is due to the role that each event plays in enforcing ...
When an enabled event is selected its priority number is reinitialized to the lowest value. On the other hand, if the enabled event is not selected its priority number is incremented by one. It seems that since an event A_i can have at most n-1 neighbors in the conflict graph, and since each time it is blocked at least one of its neighbors is selected with a resulting increment in its own priority, after the nth attempt A_i must have the highest priority among all the neighboring events and hence must be selected. However, an event may never be enabled even if its request is still pending, because sequencing conditions imposed by the path expression may block the event. In order to make this observation concrete consider the following path expression: ...
Assume that the external client always requests permission to perform all three events A, B and C. Let the priorities of all three be 0 initially.
As a result, initially A and B are enabled. Assume that B is selected, making B's priority 0 and A's priority 1. In the next instant, A and B will again be enabled. But now A has the higher priority and will be selected, so that A's priority becomes 0 and B's becomes 1. Continuing in this fashion, it is easy to see that the sequence chosen will be B A B A B A .... The trouble with this scheme is that C will never be enabled even if its request is pending. This example can be extended to the following lemma.
Seitz [14] has shown how to build an arbiter for such a structure using an interlock element, as shown in Figure 5-2.
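The starvation scenario for the priority scheme described above (the sequence B A B A ...) can be reproduced with a short Python simulation. The path expression used here, path B + (A ; (B + (A ; C))) end, is a hypothetical stand-in chosen only because it exhibits the described behavior; it is not claimed to be the expression used in the text. The state numbering, the function name, and the tie-breaking order (B first, matching "assume that B is selected") are likewise assumptions.

```python
# Hypothetical stand-in expression: path B + (A ; (B + (A ; C))) end,
# written as a state machine: state -> {enabled event: next state}.
TRANSITIONS = {
    0: {"A": 1, "B": 0},  # start of an iteration: A and B enabled
    1: {"A": 2, "B": 0},  # after one A: A and B enabled again
    2: {"C": 0},          # after two consecutive A's: only C enabled
}


def run(steps: int, tie_break=("B", "A", "C")) -> list:
    """Select, at each step, the enabled event with the highest priority.

    All three events are assumed to be requested at all times. The selected
    event's priority is reset to 0; every other enabled event's priority is
    incremented by one. Events that are not enabled are left unchanged.
    """
    state = 0
    priority = {"A": 0, "B": 0, "C": 0}
    chosen = []
    for _ in range(steps):
        enabled = TRANSITIONS[state]
        winner = max(enabled, key=lambda e: (priority[e], -tie_break.index(e)))
        for e in enabled:
            priority[e] = 0 if e == winner else priority[e] + 1
        state = enabled[winner]
        chosen.append(winner)
    return chosen


if __name__ == "__main__":
    print(run(6))               # ['B', 'A', 'B', 'A', 'B', 'A']
    print("C" in run(1000))     # False: C is never enabled, hence never chosen
```

In this model C's request is always pending, yet its priority never rises because only enabled events take part in the priority update, so the scheme never steers the sequencer toward the state in which C becomes enabled.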
Circuit operation in Figure 5-2 is most easily visualized starting with neither client requesting, v1 and v2 both near 0 volts, and both outputs high. If any single input, say At,,, is lowered then v1 is driven high.
The conflict graph and the circuit for this expression are shown in ...
An arbiter that can configure itself dynamically for the problem with two readers and one writer is shown in Figure 5 ...
Proof: By induction on the number of rising transitions of the TR's:
1. (First transition): Let the corresponding event be e. By proposition 9 initially all TA's are low and all CLR's are high, hence all IN's are low initially. By proposition 7 all TA's will remain low until the first rising transition of TR_e. By the same proposition DIS_e will not change until the first rising transition of TR_e. If DIS_e were not low, IN_e would remain low (see ...). Hence by proposition 6, TR_e would remain low, a contradiction.
2. (For a succeeding transition): Let the corresponding event be p and that of the previous transition q. While TR_q is high no TA or TR other than TA_q or TR_q can be high (proposition 6 and lemma 19). Until CLR_q goes high, TR_q must remain high (see Figure 3-2). Once CLR_q goes high, all IN_a, with a ∈ Σ_{R_j}, will be low after a short delay (see Figure ...). Assuming the variation in this delay for different a's is less than the delay of the arbiter in lowering TR_q, all TR_a with a ≠ q will continue to remain low until CLR_q is lowered (see Figure 3-2). All TA_a, with a ≠ q, also continue to remain low (proposition 7). But CLR_q remains high at least until TA_q is lowered (see Figure 7). Hence by the time TR_p is raised all TA's will be low. Also TR_p could not have been raised if IN_p were low (proposition 6). But if DIS_p was high when TA_q was last lowered then IN_p would now be low (see ...), assuming the main NOR gate plus the 2-input NOR gate have a lesser delay than the Muller-C element plus the SR flip-flop. Moreover, DIS_p cannot change before TR_p is raised (proposition 7). Hence DIS_p must be low when TR_p is raised.
□
Lemma 21: For any sequencer SEQ_j, TR_e is lowered only if TA_e is high.
Proof: The NOR gate arrangement in front of the arbiter ensures that once TR_e is high it remains high until CLR_e is raised, and this can occur only if TA_e is high (see Figure 3 ...)
... Σ_{R_j}. Hence for each R_j ∈ M such that e_i ∈ Σ_{R_j}, T(Seq(j)) can be extended by e_i to give a prefix of some sequence in L_{R_j}. Thus by proposition 8, the corresponding sequencers SEQ_j with e_i ∈ Σ_{R_j} will have DIS_{e_i} low. This applies to any e_i in the next subset of T.
Therefore at the beginning of any cycle, when REQ_{e_i} for any event e_i in the next subset of T is raised, all DIS_{e_i} inputs to the NOR gate for event e_i (see Figure 3-2) will be low. Also within a finite amount of time all relevant TA_{e_i}'s must go low by proposition 8, since the corresponding TR_{e_i}'s are already low. Hence CLR_{e_i} will go low, and IN_{e_i} will go high for each e_i in the next subset of T. It follows from proposition 6 and lemma 22 that all ACK's corresponding to events in the next subset of T will be raised within a finite amount of time.
The proof for the second half of the cycle is more straightforward.
By lemma 8 once all REQ's are lowered, within a finite time all relevant TA's will be raised, causing the corresponding CLR's to go high. As a result all relevant IN's go low (see Figure 3-2) and hence by proposition 6 all ACK's go low within a finite time, completing the cycle. □