MPI-Video Surveillance and Monitoring:
Parking Lot Surveillance Example


The best way to understand what the layers in the environment model (EM) do is to look at an example. The following is a description of an EM used for parking lot surveillance.

The figures below depict a parking lot viewed from the air. Normally, a person walks directly to their car, perhaps puts some baggage in the trunk, gets in, and drives off. In contrast, a potential thief moves through the parking lot, stopping to check several cars (loitering) until one proves a desirable target. The distinction between going to a single car and visiting several cars separates normal from suspicious behavior. The EM has four sets of layers, described below. For assimilation, each layer needs a dynamic model (to extrapolate its state) and an update method; a minimal sketch of this layer interface follows the figures.

Aerial view of a parking lot showing a typical track of a pedestrian exhibiting normal behavior. The pedestrian walks through the parking lot and stops at a single car.

The same aerial view as above, but with a track showing suspicious behavior in the parking lot. The pedestrian walks through the parking lot, stopping at several cars in turn, perhaps looking for valuables to steal or for a vehicle without an anti-theft device.
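To make the assimilation requirement concrete, the sketch below (Python; the names Layer, extrapolate, and update are hypothetical and do not come from the original system) shows the minimal interface each EM layer must supply: a dynamic model that extrapolates the state forward in time, and an update method that assimilates new measurements. The camera calibration layer illustrates the degenerate case of a static layer.

from abc import ABC, abstractmethod

class Layer(ABC):
    """One layer of the environment model (EM).

    Hypothetical interface: each layer carries some state and must support
    extrapolation of that state forward in time (its dynamic model) plus an
    update from new measurements, as described in the text above.
    """

    @abstractmethod
    def extrapolate(self, dt):
        """Predict the layer state dt seconds ahead (dynamic model)."""

    @abstractmethod
    def update(self, measurement):
        """Assimilate a new measurement into the layer state."""


class CameraCalibrationLayer(Layer):
    """Static layer: no dynamic model and no update for fixed cameras."""

    def __init__(self, projection_matrix):
        self.projection_matrix = projection_matrix

    def extrapolate(self, dt):
        pass  # calibration parameters do not change over time

    def update(self, measurement):
        pass  # static cameras are calibrated once, a priori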

Video Data Layers

A set of layers in the EM maintains the video data acquired from the multiple cameras surrounding the parking lot; there is one set of layers for each camera. The first layer is responsible for video acquisition and keeps the current frame of video data. A camera calibration layer maintains the camera calibration parameters essential for tracking in three dimensions. A segmentation layer separates dynamic foreground objects from the static background. The fourth layer performs two-dimensional blob tracking, where blobs are connected components of foreground pixels. This layer keeps a list of the blobs being tracked in a single camera view. The list varies in length with the number of objects visible, but the information for each blob is the same: two-dimensional position and velocity.

Type: Video Acquisition
State: current video image
Dynamic model: none
Update: acquire a new image from the sensor

Type: Camera Calibration
State: camera calibration parameters
Dynamic model: none
Update: no update for static cameras

Type: Segmentation
State: binary image separating foreground from background
Dynamic model: none
Update: segment the data provided by the acquisition layer

Type: 2D Tracking
State: list of first and second moments of the image-coordinate position and velocity of foreground blobs
Dynamic model: linear; blobs are assumed to have constant velocity
Update: use the measured positions provided by the segmentation layer

Summary of video data layers. One complete set is required for each camera used.
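As an illustration of how the segmentation and 2D tracking layers might cooperate, here is a minimal sketch assuming grayscale frames, a fixed background image, and NumPy/SciPy; the threshold, blending gain, and function names are invented for the example rather than taken from the actual system. It thresholds a frame against the background, extracts blob centroids as connected components of foreground pixels, and advances a tracked blob with the constant-velocity model listed in the table.

import numpy as np
from scipy import ndimage

def segment(frame, background, threshold=25):
    """Binary foreground mask: 1 where the frame differs from the background."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

def blob_centroids(mask):
    """Centroids of the connected components of foreground pixels (the blobs)."""
    labels, count = ndimage.label(mask)
    return ndimage.center_of_mass(mask, labels, list(range(1, count + 1)))

class Blob2D:
    """One tracked blob: image-coordinate position and velocity."""
    def __init__(self, position):
        self.position = np.asarray(position, dtype=float)
        self.velocity = np.zeros(2)

    def extrapolate(self, dt):
        # linear dynamic model: assume the blob moves with constant velocity
        self.position = self.position + self.velocity * dt

    def update(self, measured_position, dt=1.0, gain=0.5):
        # blend the prediction with the centroid measured from the segmentation mask
        innovation = np.asarray(measured_position) - self.position
        self.position = self.position + gain * innovation
        self.velocity = self.velocity + (gain / dt) * innovation

In practice the layer would also need data association (matching measured centroids to existing tracks) and bookkeeping of the blobs' second moments; both are omitted here for brevity.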

Three-Dimensional Tracking Layers

Multiple cameras placed in strategic positions allow us to track objects through the parking lot in three dimensions by combining the two-dimensional blob information with the camera calibration data. A single three-dimensional blob tracking layer maintains a list of objects and their three-dimensional positions and velocities.

Type: 3D Tracking
State: list of first and second moments of the world-coordinate 3D position and velocity of foreground blobs
Dynamic model: linear; blobs are assumed to have constant velocity
Update: use 2D positions from the 2D tracking layers together with the camera calibration layers

Summary of the 3D tracking layer. One is required for parking lot surveillance.
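The exact fusion method is not spelled out here, but one standard way to combine 2D blob positions with calibration data is linear (DLT) triangulation, sketched below under the assumption that each calibration layer provides a 3x4 projection matrix; the function name and interface are hypothetical. The resulting world-coordinate point would then be filtered with the same constant-velocity model used in the 2D case, just in three dimensions.

import numpy as np

def triangulate(P1, P2, x1, x2):
    """Least-squares 3D world point from two calibrated views (linear DLT).

    P1, P2: 3x4 projection matrices from the camera calibration layers.
    x1, x2: (u, v) image positions of the same blob from the 2D tracking layers.
    """
    rows = []
    for P, (u, v) in ((np.asarray(P1, dtype=float), x1),
                      (np.asarray(P2, dtype=float), x2)):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    _, _, vt = np.linalg.svd(A)   # homogeneous least-squares solution
    X = vt[-1]
    return X[:3] / X[3]           # world-coordinate (x, y, z)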

Static Environment Layers

A static environment layer stores a priori information about the parking lot, recording the positions of important static objects in the environment such as parking stalls and curbs.

Type: Static Environment
State: list of positions of static objects
Dynamic model: none
Update: none; the static environment is determined a priori

Summary of the static environment layer. One is required for parking lot surveillance.
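The table only states that this layer holds a list of positions of static objects; one possible representation, used purely for illustration, is sketched below. It stores ground-plane stall centres and answers the query the activity recognition layers need, namely whether a tracked object is currently at a parking stall. The class name, radius, and units are assumptions, not part of the original system.

import numpy as np

class StaticEnvironmentLayer:
    """A priori map of static objects; no dynamic model and no update."""
    def __init__(self, stall_centers, stall_radius=1.5):
        # ground-plane (x, y) centres of the parking stalls, in metres (assumed units)
        self.stall_centers = np.asarray(stall_centers, dtype=float)
        self.stall_radius = stall_radius

    def nearest_stall(self, world_position):
        """Index of the stall nearest a tracked object, or None if none is close."""
        distances = np.linalg.norm(
            self.stall_centers - np.asarray(world_position, dtype=float)[:2], axis=1)
        i = int(np.argmin(distances))
        return i if distances[i] <= self.stall_radius else None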

Activity Recognition Layers

To detect the suspicious loitering activity, we model an activity as a sequence of poses or physical states, observe the state transitions of a person in the parking lot, and determine whether those transitions correspond to loitering. This requires two layers in the EM: one to estimate or observe the states, and another to determine whether a model of an activity explains the observations. The loitering activity is then defined by a series of transitions between the states of moving and of being stopped by a parking stall. A pose observation layer uses the three-dimensional object state and the static environment data to observe the state of a person. A hidden Markov model (HMM) estimation layer takes those observations and estimates the likelihood that the loitering activity explains them.

Type: State Observation
State: an enumerated observation of a state in an activity/behavior
Dynamic model: none
Update: observe the states using 3D positions from the 3D tracking layer and the positions of objects in the static environment layer

Type: Activity Recognition
State: discrete probability distribution indicating which state/activity an object is engaged in
Dynamic model: defined for the activity of interest, in this case loitering in a parking lot
Update: update based on the observations made by the state observation layer

Summary of activity recognition layers. One set is required for each activity of interest.
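To show how the activity recognition layer's update could work, the sketch below implements the standard HMM forward (filtering) recursion: the dynamic model predicts the next hidden-state distribution, the new observation from the state observation layer reweights it, and a running log-likelihood measures how well the loitering model explains the observation sequence. The two-state structure, the enumerated observations, and all probabilities are invented for the example; the actual loitering model used by the system is not specified here.

import numpy as np

# hypothetical enumerated observations from the state observation layer
MOVING, STOPPED_AT_STALL = 0, 1

class ActivityHMM:
    """Recursive forward (filtering) update for one activity model, e.g. loitering."""
    def __init__(self, transition, emission, prior):
        self.A = np.asarray(transition, dtype=float)  # A[i, j] = P(next state j | state i)
        self.B = np.asarray(emission, dtype=float)    # B[i, o] = P(observation o | state i)
        self.belief = np.asarray(prior, dtype=float)  # current distribution over hidden states
        self.log_likelihood = 0.0                     # log P(observations so far | this model)

    def update(self, observation):
        predicted = self.A.T @ self.belief             # dynamic model: predict the next state
        weighted = predicted * self.B[:, observation]  # weight by the new observation
        evidence = weighted.sum()
        self.log_likelihood += np.log(evidence)
        self.belief = weighted / evidence
        return self.belief

# toy two-state loitering model (walking vs. checking a car); all numbers are invented
loiter = ActivityHMM(transition=[[0.7, 0.3], [0.4, 0.6]],
                     emission=[[0.9, 0.1], [0.2, 0.8]],
                     prior=[0.5, 0.5])
for obs in [MOVING, STOPPED_AT_STALL, MOVING, STOPPED_AT_STALL, MOVING]:
    loiter.update(obs)

Running a second HMM modelling normal behavior over the same observations and comparing log-likelihoods would give a simple way to flag the suspicious track shown in the second figure.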

