The best way to understand what the layers in the environment model (EM) do is to look at an example. The following is a description of an EM used for parking lot surveillance.
The figures below depict a parking lot viewed from the air. Normally, a person walks directly to their car, perhaps puts some baggage in the trunk, gets in, and drives off. In contrast, a potential thief moves through the parking lot, stopping to check several cars (loitering) until finding a desirable target. Whether a person goes to a single car or visits several is therefore a cue that separates normal from suspicious behavior. The EM has four sets of layers, described in turn below: video data, 3D tracking, static environment, and activity recognition. For assimilation, each layer needs a dynamic model (to extrapolate the state) and an update method.
Aerial view of a parking lot showing a typical track of a pedestrian exhibiting normal behavior. The pedestrian walks through the parking lot and stops at a single car.
The same aerial view shown above, but showing a track for suspicious behavior in the parking lot. The pedestrian walks through the parking lot, stopping at several cars in turn, perhaps looking for valuables to steal or a vehicle without an anti-theft device.
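The requirement that every layer pair a dynamic model with an update method can be pictured as a small interface. The following is only a sketch: the names `Layer`, `extrapolate`, `update`, and `StaticEnvironmentLayer`, and the stall coordinates, are hypothetical and are not taken from the original system.

```python
from abc import ABC, abstractmethod


class Layer(ABC):
    """Hypothetical base class: one EM layer = state + dynamic model + update."""

    @abstractmethod
    def extrapolate(self, dt: float) -> None:
        """Advance the state by dt seconds using the layer's dynamic model."""

    @abstractmethod
    def update(self, measurement) -> None:
        """Correct the state with new data from a sensor or another layer."""


class StaticEnvironmentLayer(Layer):
    """Positions of static objects, fixed a priori (no dynamics, no update)."""

    def __init__(self, stall_positions):
        self.state = list(stall_positions)

    def extrapolate(self, dt: float) -> None:
        pass  # no dynamic model: static objects do not move

    def update(self, measurement) -> None:
        pass  # no update: the layout is determined ahead of time


# Made-up stall coordinates, in metres, for illustration only.
stalls = StaticEnvironmentLayer([(2.0, 5.0), (4.5, 5.0)])
stalls.extrapolate(dt=0.1)
stalls.update(None)
```

A layer with "no dynamic model" or "no update" simply leaves the corresponding method empty, which is how the static environment layer is written here.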
A set of layers in the EM maintains the video data acquired from multiple cameras surrounding a parking lot. There is one set of layers for each camera. The first layer is responsible for video acquisition and keeps a current frame of video data. A camera calibration layer maintains camera calibration parameters essential for tracking in three dimensions. Separation of dynamic foreground objects from a static background is done by a segmentation layer. The fourth layer does two-dimensional blob tracking. Blobs are connected components of foreground pixels. The layer keeps a list of blobs being tracked in a single camera view. The list varies in length depending on the number of objects visible, but the information for each blob is the same, i.e., two-dimensional position and velocity.
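To make "connected components of foreground pixels" concrete, the sketch below labels blobs in a small binary mask with a breadth-first search and reports the area and centroid of each. It assumes 4-connectivity and a list-of-lists mask. Both are illustrative choices, and `find_blobs` is a hypothetical name, not a function from the original system.

```python
from collections import deque


def find_blobs(mask):
    """Return one dict per 4-connected foreground region in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one blob starting from this unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            n = len(pixels)
            cx = sum(c for _, c in pixels) / n  # image x = column index
            cy = sum(r for r, _ in pixels) / n  # image y = row index
            blobs.append({"area": n, "centroid": (cx, cy)})
    return blobs


# Toy segmentation result: 1 = foreground, 0 = background.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
blobs = find_blobs(mask)
```

The centroid of each blob is the kind of measured position that a 2D tracking layer could consume.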
|Layer||State||Dynamic model||Update method|
|Video Acquisition||current video image||no dynamic model||acquire new image from sensor|
|Camera Calibration||camera calibration parameters||no dynamic model||no update for static cameras|
|Segmentation||binary image separating foreground and background||no dynamic model||segment data provided by
|2D Tracking||list of 1st and 2nd moments of image-coordinate position and velocity of foreground blobs||linear - assume blobs have constant velocity||use measured position as provided by the segmentation layer|
Summary of video data layers. One complete set is required for each camera used.
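The 2D tracking row combines a linear constant-velocity dynamic model with a position-only update, and keeps 1st and 2nd moments (mean and covariance). One common way to realize that combination is a Kalman filter. The sketch below applies one to a single image axis, under the assumption that x and y are tracked independently. The class name, noise values `q` and `r`, and initial variances are all hypothetical, and the source does not state which filter the system actually uses.

```python
class ConstantVelocityTrack:
    """Mean and covariance of (position, velocity) along one image axis."""

    def __init__(self, p, v=0.0, var_p=1.0, var_v=10.0):
        self.p, self.v = p, v                 # 1st moments
        self.ppp, self.ppv, self.pvv = var_p, 0.0, var_v  # 2nd moments

    def extrapolate(self, dt, q=0.01):
        """Dynamic model: the blob keeps moving at constant velocity."""
        self.p += self.v * dt
        # Covariance propagation P <- F P F^T with F = [[1, dt], [0, 1]].
        self.ppp += 2.0 * dt * self.ppv + dt * dt * self.pvv
        self.ppv += dt * self.pvv
        self.pvv += q  # assumed process noise, added to velocity only

    def update(self, z, r=1.0):
        """Update with a measured position z of variance r."""
        s = self.ppp + r                      # innovation variance
        k_p, k_v = self.ppp / s, self.ppv / s  # Kalman gain
        y = z - self.p                        # innovation
        self.p += k_p * y
        self.v += k_v * y
        # Covariance update P <- (I - K H) P with H = [1, 0].
        self.pvv -= k_v * self.ppv
        self.ppv *= 1.0 - k_p
        self.ppp *= 1.0 - k_p


# A blob centroid that moves one pixel per frame along this axis.
track = ConstantVelocityTrack(p=0.0)
for frame in range(1, 21):
    track.extrapolate(dt=1.0)
    track.update(z=float(frame))
```

After a few frames the velocity estimate settles near the true one pixel per frame, even though only positions are ever measured.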
Multiple cameras placed in strategic positions make it possible to track objects through the parking lot in three dimensions by combining the two-dimensional blob information with the camera calibration data. A single three-dimensional blob tracking layer maintains a list of objects with their three-dimensional positions and velocities.
|Layer||State||Dynamic model||Update method|
|3D Tracking||list of 1st and 2nd moments of world-coordinate 3D position and velocity of foreground blobs||linear - assume blobs have constant velocity||use 2D positions from 2D tracking layers and camera calibration layer|
Summary of 3D tracking layer. One required for parking lot surveillance.
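One way to combine 2D positions from two cameras into a 3D position is to triangulate viewing rays. The sketch below assumes that the calibration data have already been used to turn each blob's image position into a ray (camera center plus direction) in world coordinates. The source does not describe that step or the actual 3D update, so `triangulate` and the camera positions are hypothetical. It returns the midpoint of the shortest segment joining the two rays.

```python
def triangulate(origin_a, dir_a, origin_b, dir_b):
    """Midpoint of the shortest segment joining two 3D viewing rays."""

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [p - q for p, q in zip(origin_a, origin_b)]
    a, b, c = dot(dir_a, dir_a), dot(dir_a, dir_b), dot(dir_b, dir_b)
    d, e = dot(dir_a, w), dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray A
    t = (a * e - b * d) / denom  # parameter along ray B
    point_a = [o + s * k for o, k in zip(origin_a, dir_a)]
    point_b = [o + t * k for o, k in zip(origin_b, dir_b)]
    return tuple((p + q) / 2.0 for p, q in zip(point_a, point_b))


# Two made-up cameras mounted 10 m up, both seeing a blob at (5, 5, 0).
position = triangulate(
    (0.0, 0.0, 10.0), (5.0, 5.0, -10.0),
    (10.0, 0.0, 10.0), (-5.0, 5.0, -10.0),
)
```

With noisy measurements the rays rarely intersect exactly, which is why the midpoint of closest approach is used here rather than an exact intersection.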
A static environment layer stores a priori information about static objects in the parking lot. Information in this layer indicates the positions of important static objects in the environment such as parking stalls and curbs.
|Layer||State||Dynamic model||Update method|
|Static Environment||list of positions of static objects||no dynamic model||no update - the static environment is determined a priori|
Summary of static environment layer. One required for parking lot surveillance.
To detect suspicious loitering, we model an activity as a sequence of poses or physical states, observe the state transitions of a person in the parking lot, and determine whether those transitions correspond to loitering. This requires two layers in the EM: one to observe the states, and another to determine whether a model of an activity explains the observations. Loitering is defined by a series of state transitions between moving and stopped by a parking stall. A state observation layer uses the three-dimensional object state and the static environment data to observe the state of a person. An activity recognition layer, based on a hidden Markov model (HMM), takes those observations and estimates the likelihood that the loitering activity explains them.
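The state observation step can be sketched as a simple classifier over the tracked 3D state and the stall positions. Everything specific here is an assumption made for illustration: the `Observation` enumeration (including the extra `STOPPED_ELSEWHERE` value), the function name `observe_state`, and the speed and distance thresholds.

```python
import math
from enum import Enum


class Observation(Enum):
    MOVING = 0
    STOPPED_AT_STALL = 1
    STOPPED_ELSEWHERE = 2  # assumed extra value, not named in the text


def observe_state(position, velocity, stall_positions,
                  speed_eps=0.2, stall_radius=1.5):
    """Map a tracked 3D state to an enumerated observation.

    position, velocity: (x, y, z) in world coordinates (metres, metres/s).
    stall_positions: (x, y) ground positions from the static environment.
    speed_eps, stall_radius: hypothetical thresholds.
    """
    ground_speed = math.hypot(velocity[0], velocity[1])
    if ground_speed > speed_eps:
        return Observation.MOVING
    nearest = min(
        (math.dist(position[:2], stall) for stall in stall_positions),
        default=math.inf,
    )
    if nearest <= stall_radius:
        return Observation.STOPPED_AT_STALL
    return Observation.STOPPED_ELSEWHERE


stalls = [(2.0, 5.0), (4.5, 5.0)]
walking = observe_state((0.0, 0.0, 0.9), (1.2, 0.3, 0.0), stalls)
at_car = observe_state((2.1, 5.2, 0.9), (0.0, 0.05, 0.0), stalls)
idle = observe_state((9.0, 0.0, 0.9), (0.0, 0.0, 0.0), stalls)
```

The sequence of such observations over time is what an activity model would then try to explain.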
|Layer||State||Dynamic model||Update method|
|State Observation||an enumerated observation of a state in an activity/behavior||no dynamic model||make observations of the states using 3D positions from 3D tracking layer and positions of objects in the static environment layer|
|Activity Recognition||discrete probability distribution indicating which state/activity an object is engaged in||defined for activity of interest - in this case loitering in a parking lot||update based on the observations made by the state observation layer|
Summary of activity recognition layers. One set is required for each activity of interest.
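The activity recognition update can be illustrated with the HMM forward algorithm, which maintains a discrete distribution over hidden states and accumulates the likelihood of the observations. In the sketch below every number is made up: the two-state models, the transition and emission probabilities, and the idea of scoring loitering against a baseline "single stop" model are assumptions for illustration, not parameters or methods taken from the original system.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """Forward algorithm with per-step normalisation.

    Returns (log P(obs | model), final belief over hidden states).
    """
    n = len(pi)
    belief = [pi[i] * B[i][obs[0]] for i in range(n)]
    norm = sum(belief)
    log_lik = math.log(norm)
    belief = [b / norm for b in belief]
    for o in obs[1:]:
        # Dynamic model: propagate the belief through the transition matrix.
        predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Update: weight by how well each state explains the observation.
        belief = [predicted[j] * B[j][o] for j in range(n)]
        norm = sum(belief)
        log_lik += math.log(norm)
        belief = [b / norm for b in belief]
    return log_lik, belief


MOVING, STOPPED = 0, 1                    # hidden states and observation symbols
PI = [0.9, 0.1]                           # people usually enter the lot moving
B = [[0.95, 0.05], [0.05, 0.95]]          # observations are mostly correct
A_LOITER = [[0.7, 0.3], [0.3, 0.7]]       # frequent, short stops
A_BASELINE = [[0.9, 0.1], [0.02, 0.98]]   # one stop, then stay there


def loitering_score(obs):
    """Log-likelihood ratio: positive favours the loitering model."""
    ll_loiter, _ = forward_log_likelihood(obs, PI, A_LOITER, B)
    ll_base, _ = forward_log_likelihood(obs, PI, A_BASELINE, B)
    return ll_loiter - ll_base


suspicious_track = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]  # three separate stops
normal_track = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]      # one stop
```

A ratio against a baseline is used here because the raw likelihood under a single model also depends on the length and shape of the sequence; comparing two models on the same observations removes that effect. With these illustrative parameters, `loitering_score` is positive for the track with repeated stops and negative for the track with a single stop.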
Web page design by Jeffrey E. Boyd