next up previous
Next: An Efficient Representation and Up: Grounding the Lexical Semantics Previous: Model Reconstruction

Event Logic

 

Model reconstruction determines the truth values of the force-dynamic relations on a frame-by-frame basis in the input movie. Intervals of constant truth value for a given force-dynamic relation are taken to be primitive event occurrences. LEONARD uses event logic to infer compound event occurrences from primitive event occurrences. For example, for the image sequence in Figure 1(a), model reconstruction determines that the green block supports the red block from Frame 0 to Frame 13 and that the hand is attached to the red block from Frame 13 to Frame 20. This will be denoted as tex2html_wrap_inline8266 and tex2html_wrap_inline8268 , i.e. that the primitive event types tex2html_wrap_inline8270 and tex2html_wrap_inline8272 occurred during the intervals [0,13) and [13,20) respectively. The compound event type tex2html_wrap_inline8278 might be defined as

displaymath606

i.e.  tex2html_wrap_inline8270 followed by tex2html_wrap_inline8272 . (In the above, I use `;' informally as a sequence operator. The precise definition of `;' will be given momentarily.) The task of the event-logic inference procedure is to infer tex2html_wrap_inline8284 , i.e. that the compound event type tex2html_wrap_inline8278 occurred during the interval [0,20).

Event logic provides a calculus for forming compound event types as expressions over primitive event types. The syntax and semantics of event logic will be described momentarily. Event-logic expressions denote event types, not event occurrences. As such, they do not have truth values. Rather, they are predicates that describe the truth conditions that must hold of an interval for an event to occur. In contrast, an event-occurrence formula does have a truth value. If  tex2html_wrap_inline8290 is an event-logic expression that denotes a primitive or compound event type, and  tex2html_wrap_inline8292 is an interval, then tex2html_wrap_inline8294 is an atomic event-occurrence formula that is true if and only if the truth conditions for the event type  tex2html_wrap_inline8290 hold of the interval  tex2html_wrap_inline8292 .

tex2html_wrap_inline8294 denotes coincidental occurrence, the fact that an occurrence of  tex2html_wrap_inline8290 started at the beginning of  tex2html_wrap_inline8292 and finished at the end of  tex2html_wrap_inline8292 . tex2html_wrap_inline8294 would not hold if an occurrence of  tex2html_wrap_inline8290 did not precisely coincide with  tex2html_wrap_inline8292 , but instead overlapped,gif partially or totally, with  tex2html_wrap_inline8292 . Event types have internal temporal structure that render this distinction important. In the case of primitive event types, that structure is simple. Each primitive event type is derived from a predicate. If  tex2html_wrap_inline8334 is a predicate, then  tex2html_wrap_inline8336 denotes the primitive event type derived from  tex2html_wrap_inline8334 . A primitive event type  tex2html_wrap_inline8336 holds of an interval if the corresponding predicate  tex2html_wrap_inline8334 holds of every instant in that interval.gif This means that tex2html_wrap_inline8354 and tex2html_wrap_inline8356 might have different truth values. For example, if  tex2html_wrap_inline8334 is true of every instant in [0,2) and false of every other instant, then tex2html_wrap_inline8362 is true while tex2html_wrap_inline8364 is false. Event logic takes coincidental occurrence to be a primitive notion. As will be demonstrated below, overlapping occurrence is a derived notion that can be expressed in terms of coincidental occurrence using compound event-logic expressions.

Two auxiliary notions are needed to define the syntax and semantics of event logic. First, there are thirteen possible relations between two intervals. Following [4], I denote these relations as =, <, >, tex2html_wrap_inline8372 , tex2html_wrap_inline8374 , tex2html_wrap_inline8376 , tex2html_wrap_inline8378 , tex2html_wrap_inline8380 , tex2html_wrap_inline8382 , tex2html_wrap_inline8384 , tex2html_wrap_inline8386 , tex2html_wrap_inline8388 , and tex2html_wrap_inline8390 and refer to them collectively as Allen relations throughout the paper. The names tex2html_wrap_inline8372 , tex2html_wrap_inline8376 , tex2html_wrap_inline8380 , tex2html_wrap_inline8384 , and tex2html_wrap_inline8388 are mnemonics for meet, overlap, start, finish, and during respectively. The inverse relations, such as tex2html_wrap_inline8374 , whose names end in tex2html_wrap_inline8404 are the same as the corresponding relations, such as tex2html_wrap_inline8372 , whose names do not end in tex2html_wrap_inline8404 except that the arguments are reversed. Figure 8 depicts all thirteen Allen relations graphically. Second, I define the span of two intervals  tex2html_wrap_inline8292 and  tex2html_wrap_inline8412 , denoted tex2html_wrap_inline8414 , as the smallest super-interval of both  tex2html_wrap_inline8292 and  tex2html_wrap_inline8412 .

   figure643
Figure 8: The Allen relations, the thirteen possible relations between two intervals.

The syntax of event logic is defined as follows. We are given finite disjoint sets of constant symbols along with a finite set of primitive event-type symbols, each of a specified arity. Constant symbols, such as tex2html_wrap_inline8420 and tex2html_wrap_inline8422 , denote objects in the input movie while primitive event-type symbols, such as tex2html_wrap_inline8424 , denote parameterized primitive event types. An atomic event-logic expression is a primitive event-type symbol of arity n applied to a sequence of n constants. For example, tex2html_wrap_inline8430 . An event-logic expression is either an atomic event-logic expression or one of the compound event-logic expressions tex2html_wrap_inline8432 , tex2html_wrap_inline8434 , tex2html_wrap_inline8436 , or tex2html_wrap_inline8438 , where  tex2html_wrap_inline8290 and  tex2html_wrap_inline8442 are event-logic expressions and tex2html_wrap_inline8444 .

Informally, the semantics of compound event-logic expressions is defined as follows:

Formally, the truth of an atomic event-occurrence formula tex2html_wrap_inline8294 is defined relative to a model. Let I be the set of all intervals and O be the set of all objects in the movie. A model M is a map from primitive event-type symbols of arity n to subsets of . (M can be viewed as either a model or as a movie.) M thus maps primitive event-type symbols to relations that take an interval as their first argument, in addition to the remaining object parameters. The semantics of event logic is formally defined by specifying an entailment relation tex2html_wrap_inline8594 as follows:

Figure 9 shows the primitive event types currently used by LEONARD. The definitions in Figure 9 are formulated in terms of the predicates tex2html_wrap_inline8252 , tex2html_wrap_inline8254 , tex2html_wrap_inline8256 , and tex2html_wrap_inline8258 that are produced by model reconstruction as described in Section 2. An object is supported if it is not grounded. Two objects contact if their polygons touch and they are on the same layer. Two polygons p and q touch, denoted  tex2html_wrap_inline8648 , if they intersect and their intersection has zero area. Two objects are attached if there is a rigid or revolute joint that joins them. Determining whether an object x supports an object y is a little more complex and requires counterfactual reasoning. Let  tex2html_wrap_inline8654 be the set of polygons in a scene, tex2html_wrap_inline8656 be the most-preferred model of the scene as produced by model reconstruction, and tex2html_wrap_inline8658 be true if the scene  tex2html_wrap_inline8654 is stable under an interpretation I. Stability analysis, i.e. the tex2html_wrap_inline8664 predicate, is a subroutine used by the model-reconstruction component. An object x supports an object y if y is not grounded in the most-preferred model I and a variant of  tex2html_wrap_inline8654 with x removed is not stable under a variant of I where all objects except for y and those rigidly attached, directly or indirectly, to y are grounded. In Figure 9, tex2html_wrap_inline8684 denotes the variant of  tex2html_wrap_inline8654 with x removed, tex2html_wrap_inline8690 denotes the fact that x is rigidly attached to y, tex2html_wrap_inline8696 denotes the reflexive transitive closure of the tex2html_wrap_inline8698 relation, tex2html_wrap_inline8700 denotes the set of objects that are rigidly attached, directly or indirectly, to y, and tex2html_wrap_inline8704 denotes a variant of I where all objects except for y and those rigidly attached, directly or indirectly, to y are grounded.

   figure694
Figure 9: Definition of the primitive event types used by LEONARD.

Figure 10 shows the compound event-type definitions currently used by LEONARD. tex2html_wrap_inline8092 denotes an event type where x picks y up off of z. It is specified as a sequence of three intervals, where x is not attached to and does not support y in the first interval but is attached to and does support y in the third interval. Additionally, z supports y in the first interval but does not support y in the third interval. Furthermore, several conditions must hold in both the first and third intervals: x must be unsupported, y must not support either x or z, x and z must not support each other, and y must not be attached to z. During the second interval, intermediate between the first and third intervals, either x is attached to y or y is attached to z.gif Additionally, several conditions must hold throughout the entire event: x, y, and z must be distinct and y must be supported. tex2html_wrap_inline8094 denotes an event type where x puts y down on z. It is specified in a fashion that is similar to tex2html_wrap_inline8092 but where the three subevents occur in reverse order. tex2html_wrap_inline8096 denotes an event type where w puts x down on y which is resting on z. It is specified as tex2html_wrap_inline8800 , where z supports but is not attached to y and z is distinct from w, x, and y. tex2html_wrap_inline8098 denotes an event type where w picks x up off of y which is resting on z. It is specified as tex2html_wrap_inline8824 , where z supports but is not attached to y and z is distinct from w, x, and y. tex2html_wrap_inline8100 denotes an event type where w picks x up off of y and puts it down on z which is distinct from y. tex2html_wrap_inline8102 denotes an event type where w first puts y down on z then sometime later stacks x on top of y. Finally, tex2html_wrap_inline8104 denotes an event type where w first unstacks x from on top of y (which is resting on z) and then sometime later picks y up off of z. Figure 1 shows sample movies depicting occurrences of the event types tex2html_wrap_inline8092 and tex2html_wrap_inline8094 . Figures 11 through 15 show sample movies depicting occurrences of the event types tex2html_wrap_inline8096 , tex2html_wrap_inline8098 , tex2html_wrap_inline8100 , tex2html_wrap_inline8102 , and tex2html_wrap_inline8104 respectively.

   figure804
Figure 10: The lexicon of compound event types used by LEONARD.

   figure984
Figure 11: An image sequence depicting a stack event.

   figure1005
Figure 12: An image sequence depicting an unstack event.

   figure1026
Figure 13: An image sequence depicting a move event.

   figure1055
Figure 14: An image sequence depicting an assemble event.

   figure1096
Figure 15: An image sequence depicting a disassemble event.

Nominally, all atomic event-logic expressions are primitive event types. However, we allow giving a name to a compound event-logic expression and using this name in another event-logic expression as short hand for the named expression with appropriate parameter substitution. This is simply a macro-expansion process and, as such, no recursion is allowed. This feature is used in Figure 10 to define tex2html_wrap_inline8890 , tex2html_wrap_inline8892 , and tex2html_wrap_inline8894 in terms of tex2html_wrap_inline8896 ; tex2html_wrap_inline8898 , tex2html_wrap_inline8892 , and tex2html_wrap_inline8902 in terms of tex2html_wrap_inline8904 ; tex2html_wrap_inline8902 in terms of tex2html_wrap_inline8898 , which is itself defined in terms of tex2html_wrap_inline8904 ; and tex2html_wrap_inline8894 in terms of tex2html_wrap_inline8890 , which is itself defined in terms of tex2html_wrap_inline8896 .

The overall goal of the event-classification component is to infer all occurrences of a given set of compound event types from a given set of primitive event occurrences. The model-reconstruction component combined with the primitive event-type definitions given in Figure 9 produces a set of primitive event occurrences for a given scene sequence. Figure 10 lists parameterized compound event types. These are instantiated for all tuples of objects in the scene sequence to yield ground compound event-logic expressions. The event-classification component infers all occurrences of these compound event types that follow from the set of primitive event occurrences. Let us define tex2html_wrap_inline8918 to be tex2html_wrap_inline8920 . The model-reconstruction component combined with the primitive event-type definitions given in Figure 9 produces M. Instantiating the parameterized compound event types from Figure 10 for all object tuples yields a set of event-logic expressions. The event-classification component computes tex2html_wrap_inline8918 for every  tex2html_wrap_inline8290 in this set.

In principle, tex2html_wrap_inline8918 could by implemented as a straightforward application of the formal semantics for event logic as specified above. There is a difficulty in doing so, however. The primitive event types have the property that they are liquid. Liquid events have the following two properties. First, if they are true during an interval  tex2html_wrap_inline8292 , then they are also true during any subinterval of  tex2html_wrap_inline8292 . Second, if they are true during two overlapping intervals  tex2html_wrap_inline8292 and  tex2html_wrap_inline8412 , then they are also true during tex2html_wrap_inline8414 and any subinterval of tex2html_wrap_inline8414 . For example, if an object is supported during [1,10], then it also is supported during [2,5], [3,8], and all other subintervals of [1,10]. Similarly, if an object is supported during [1,5] and [4,10], then it also is supported during [1,10] and all of its subintervals. [48] introduced the notion of liquidity and [68], [22], [69], and [30] have observed that many event types have this property. Because the primitive event types are liquid, they will hold over an infinite number of subintervals. This renders the formal semantics inappropriate for a computational implementation. Even if one limits oneself to intervals with integral endpoints, the primitive event types will hold over quadratically many subintervals of the scene sequence. Furthermore, a straightforward computational implementation of the formal semantics would be inefficient, because it requires quantifying over subintervals to implement  tex2html_wrap_inline8540 and quantifying over pairs of subintervals to implement  tex2html_wrap_inline8958 . The central result of this paper is a novel representation, called spanning intervals, that allows an efficient representation of the infinite sets of subintervals over which liquid event types hold along with an efficient inference procedure that operates on that representation. This representation, and the inference procedure that implements tex2html_wrap_inline8918 , are presented in the next section.


next up previous
Next: An Efficient Representation and Up: Grounding the Lexical Semantics Previous: Model Reconstruction

Jeffrey Mark Siskind
Wed Aug 1 19:08:09 EDT 2001