Model reconstruction determines the truth values of the force-dynamic relations
on a frame-by-frame basis in the input movie.
Intervals of constant truth value for a given force-dynamic relation are taken
to be primitive event occurrences.
LEONARD uses *event logic* to infer compound event occurrences from
primitive event occurrences.
For example, for the image sequence in Figure 1(a), model
reconstruction determines that the green block supports the red block from
Frame 0 to Frame 13 and that the hand is attached to the red block from
Frame 13 to Frame 20.
This will be denoted as and
, i.e. that the primitive event types
and occurred
during the intervals [0,13) and [13,20) respectively.
The compound event type
might be defined as

i.e. followed by . (In the above, I use `;' informally as a sequence operator. The precise definition of `;' will be given momentarily.) The task of the event-logic inference procedure is to infer , i.e. that the compound event type occurred during the interval [0,20).

Event logic provides a calculus for forming compound event types as expressions over primitive event types. The syntax and semantics of event logic will be described momentarily. Event-logic expressions denote event types, not event occurrences. As such, they do not have truth values. Rather, they are predicates that describe the truth conditions that must hold of an interval for an event to occur. In contrast, an event-occurrence formula does have a truth value. If is an event-logic expression that denotes a primitive or compound event type, and is an interval, then is an atomic event-occurrence formula that is true if and only if the truth conditions for the event type hold of the interval .

denotes *coincidental* occurrence, the fact that an occurrence
of started at the beginning of and finished at the end of .
would not hold if an occurrence of did not precisely coincide
with , but instead overlapped,
partially or totally, with .
Event types have internal temporal structure that render this distinction
important.
In the case of primitive event types, that structure is simple.
Each primitive event type is derived from a predicate.
If is a predicate, then denotes the primitive event
type derived from .
A primitive event type holds of an interval if the
corresponding predicate holds of every instant in that
interval.
This means that and might
have different truth values.
For example, if is true of every instant in [0,2) and false of every
other instant, then is true while
is false.
Event logic takes coincidental occurrence to be a primitive notion.
As will be demonstrated below, overlapping occurrence is a derived notion that
can be expressed in terms of coincidental occurrence using compound
event-logic expressions.

Two auxiliary notions are needed to define the syntax and semantics of event
logic.
First, there are thirteen possible relations between two intervals.
Following [4], I denote these relations as =, <, >,
, , , , , ,
, , , and and refer to them
collectively as *Allen relations* throughout the paper.
The names , , , , and are
mnemonics for *meet*, *overlap*, *start*, *finish*, and
*during* respectively.
The inverse relations, such as , whose names end in
are the same as the corresponding relations, such as , whose names do
not end in except that the arguments are reversed.
Figure 8 depicts all thirteen Allen relations graphically.
Second, I define the *span* of two intervals and , denoted
, as the smallest super-interval of both and .

**Figure 8:** The Allen relations, the thirteen possible relations between two
intervals.

The syntax of event logic is defined as follows.
We are given finite disjoint sets of constant symbols
along with a finite set of primitive event-type symbols, each of a specified
arity.
Constant symbols, such as and , denote objects in the input
movie while primitive event-type symbols, such as ,
denote parameterized primitive event types.
An atomic event-logic expression is a primitive event-type symbol of arity *n*
applied to a sequence of *n* constants.
For example, .
An event-logic expression is either an atomic event-logic expression or
one of the compound event-logic expressions , ,
, or ,
where and are event-logic expressions
and .

Informally, the semantics of compound event-logic expressions is defined as follows:

- denotes the non-occurrence of .
An occurrence of coincides with if no occurrence of
coincides with .
Note that could be true, even if an occurrence of
*overlapped*with , so long as no occurrence of*coincided*with . - denotes the occurrence of either or .
- denotes the occurrence of both and .
The occurrences of and need not be simultaneous.
The subscript
*R*specifies a set of allowed Allen relations between the occurrences of and . If occurrences of and coincide with and respectively, and for some , then an occurrence of coincides with the span of and . I abbreviate the special case simply as without any subscript. describes an aggregate event where both and occur simultaneously. I also abbreviate the special case as . describes an aggregate event where an occurrence of is immediately followed by an occurrence of . - An occurrence of coinciding with denotes an
occurrence of at some other interval such that for
some .
can act as a tense operator.
Expressions such as ,
,
,
and specify that
happened in the noncontiguous past, noncontiguous future, contiguous past,
or contiguous future respectively.
The operator can also be used to derive overlapped occurrence
from coincidental occurrence.
An occurrence of coincides
with if an occurrence of overlaps with .
I abbreviate simply as
without any subscript.
Note that while indicates that no occurrence of
*coincided*with , indicates that no occurrence of*overlapped*with .

Formally, the truth of an atomic event-occurrence formula is
defined relative to a model.
Let *I* be the set of all intervals and *O* be the set of all objects in
the movie.
A model *M* is a map from primitive event-type symbols of arity *n* to subsets
of .
(*M* can be viewed as either a model or as a movie.)
*M* thus maps primitive event-type symbols to relations that take an interval
as their first argument, in addition to the remaining object parameters.
The semantics of event logic is formally defined by specifying an entailment
relation as follows:

- if and only if .
- if and only if .
- if and only if or .
- if and only if there exist two intervals and such that , for some , , and .
- if and only if there exists some interval such that for some and .

Figure 9 shows the primitive event types currently
used by LEONARD.
The definitions in Figure 9 are formulated in terms of
the predicates , , , and that
are produced by model reconstruction as described in
Section 2.
An object is supported if it is not grounded.
Two objects contact if their polygons touch and they are on the same layer.
Two polygons *p* and *q* touch, denoted , if they intersect
and their intersection has zero area.
Two objects are attached if there is a rigid or revolute joint that joins
them.
Determining whether an object *x* supports an object *y* is a little more
complex and requires counterfactual reasoning.
Let be the set of polygons in a scene,
be the
most-preferred model of the scene as produced by model reconstruction,
and be true if the scene is stable under
an interpretation *I*.
Stability analysis, i.e. the predicate, is a subroutine used by the
model-reconstruction component.
An object *x* supports an object *y* if *y* is not grounded in the
most-preferred model *I* and a variant of with *x* removed is
not stable under a variant of *I* where all objects except for *y* and those
rigidly attached, directly or indirectly, to *y* are grounded.
In Figure 9, denotes
the variant of with *x* removed, denotes
the fact that *x* is rigidly attached to *y*, denotes
the reflexive transitive closure of the relation,
denotes the set of objects that
are rigidly attached, directly or indirectly, to *y*, and
denotes a variant of *I* where all objects except for *y* and those
rigidly attached, directly or indirectly, to *y* are grounded.

**Figure 9:** Definition of the primitive event types used by LEONARD.

Figure 10 shows the compound event-type definitions
currently used by LEONARD.
denotes an event type where *x* picks *y* up off of *z*.
It is specified as a sequence of three intervals, where *x* is not attached
to and does not support *y* in the first interval but is attached to and does
support *y* in the third interval.
Additionally, *z* supports *y* in the first interval but does not support *y*
in the third interval.
Furthermore, several conditions must hold in both the first and third
intervals: *x* must be unsupported, *y* must not support either *x* or *z*,
*x* and *z* must not support each other, and *y* must not be attached to *z*.
During the second interval, intermediate between the first and third
intervals, either *x* is attached to *y* or *y* is attached
to *z*.
Additionally, several conditions must hold throughout the entire event: *x*,
*y*, and *z* must be distinct and *y* must be supported.
denotes an event type where *x* puts *y* down on *z*.
It is specified in a fashion that is similar to but where
the three subevents occur in reverse order.
denotes an event type where *w* puts *x* down on *y*
which is resting on *z*.
It is specified as , where *z* supports but is not
attached to *y* and *z* is distinct from *w*, *x*, and *y*.
denotes an event type where *w* picks *x* up off
of *y* which is resting on *z*.
It is specified as , where *z* supports but is not
attached to *y* and *z* is distinct from *w*, *x*, and *y*.
denotes an event type where *w* picks *x* up off of *y* and
puts it down on *z* which is distinct from *y*.
denotes an event type where *w* first puts *y* down
on *z* then sometime later stacks *x* on top of *y*.
Finally, denotes an event type where *w* first
unstacks *x* from on top of *y* (which is resting on *z*) and then sometime
later picks *y* up off of *z*.
Figure 1 shows sample movies depicting occurrences of the event
types and .
Figures 11 through 15 show sample movies
depicting occurrences of the event types ,
, , , and
respectively.

**Figure 10:** The lexicon of compound event types used by LEONARD.

**Figure 11:** An image sequence depicting a *stack* event.

**Figure 12:** An image sequence depicting an *unstack* event.

**Figure 13:** An image sequence depicting a *move* event.

**Figure 14:** An image sequence depicting an *assemble* event.

**Figure 15:** An image sequence depicting a *disassemble* event.

Nominally, all atomic event-logic expressions are primitive event types. However, we allow giving a name to a compound event-logic expression and using this name in another event-logic expression as short hand for the named expression with appropriate parameter substitution. This is simply a macro-expansion process and, as such, no recursion is allowed. This feature is used in Figure 10 to define , , and in terms of ; , , and in terms of ; in terms of , which is itself defined in terms of ; and in terms of , which is itself defined in terms of .

The overall goal of the event-classification component is to infer all
occurrences of a given set of compound event types from a given set of
primitive event occurrences.
The model-reconstruction component combined with the primitive event-type
definitions given in Figure 9 produces a set of
primitive event occurrences for a given scene sequence.
Figure 10 lists parameterized compound event types.
These are instantiated for all tuples of objects in the scene sequence to yield
ground compound event-logic expressions.
The event-classification component infers all occurrences of these compound
event types that follow from the set of primitive event occurrences.
Let us define to be .
The model-reconstruction component combined with the primitive event-type
definitions given in Figure 9 produces *M*.
Instantiating the parameterized compound event types from
Figure 10 for all object tuples yields a set of
event-logic expressions.
The event-classification component computes for every in
this set.

In principle, could by implemented as a straightforward
application of the formal semantics for event logic as specified above.
There is a difficulty in doing so, however.
The primitive event types have the property that they are *liquid*.
Liquid events have the following two properties.
First, if they are true during an interval , then they are also true during
any subinterval of .
Second, if they are true during two overlapping intervals and , then
they are also true during and any subinterval of
.
For example, if an object is supported during [1,10], then it also is
supported during [2,5], [3,8], and all other subintervals of [1,10].
Similarly, if an object is supported during [1,5] and [4,10], then it also
is supported during [1,10] and all of its subintervals.
[48] introduced the notion of liquidity and [68],
[22], [69], and [30] have observed that
many event types have this property.
Because the primitive event types are liquid, they will hold over an infinite
number of subintervals.
This renders the formal semantics inappropriate for a computational
implementation.
Even if one limits oneself to intervals with integral endpoints, the primitive
event types will hold over quadratically many subintervals of the scene
sequence.
Furthermore, a straightforward computational implementation of the formal
semantics would be inefficient, because it requires quantifying over
subintervals to implement and quantifying over pairs of
subintervals to implement .
The central result of this paper is a novel representation, called
*spanning intervals*, that allows an efficient representation of the
infinite sets of subintervals over which liquid event types hold along with an
efficient inference procedure that operates on that representation.
This representation, and the inference procedure that implements ,
are presented in the next section.

Wed Aug 1 19:08:09 EDT 2001