MPI-Video Surveillance and Monitoring:
Overview of System Architecture

Figure 1 shows an overview of an MPI-Video system. A set of sensors acquires information about the real world and feeds it to the environment model (EM), which assimilates it into a single, coherent model. The sensors may be of widely different types, such as infrared cameras, motion detectors, identification card readers, microphones, range sensors, and video cameras. A query system interacts with the EM, treating it as a database that stores information about the environment. Typically, queries access high-level information from the EM to answer questions such as "are person A and person B talking to each other?" or, for video surveillance and monitoring, "is person C engaged in some suspicious activity?". More generally, the EM can be considered an environment information server, providing environment information to clients that request it. A graphical visualization system, such as immersive video or volumetric rendering, can use the data in the EM to render an image for a user.

Figure 1: Schematic overview of information assimilation system based on an environment model.

The EM consists of a set of layers as shown in Figure 2. Each layer is an object (in the object-oriented programming sense) containing both data and methods. The data are the representation of the world maintained by the EM. When the EM is assimilating information, the data in the EM are treated as state vectors in a Kalman filter or hidden Markov model (HMM) estimation. Alternatively, when the EM is acting as an environment information server, the data in the EM are treated as a database. Each layer has a standard set of methods to perform the assimilation and respond to client requests. These include methods to initialize a layer, extrapolate the state of a layer according to some dynamic system model, update the state of a layer based on data from a sensor or from another layer, and access the value of a state for use by another layer or for a client query.
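The four standard methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and method names (`Layer`, `ScalarKalmanLayer`, `initialize`, `extrapolate`, `update`, `access`) are hypothetical, and the layer tracks a single scalar with a constant-state dynamic model rather than the state vectors a real layer would hold.

```python
from abc import ABC, abstractmethod


class Layer(ABC):
    """Hypothetical interface for one EM layer (all names are illustrative)."""

    @abstractmethod
    def initialize(self, state, variance):
        """Set the starting state estimate and its uncertainty."""

    @abstractmethod
    def extrapolate(self, dt):
        """Advance the state by dt according to a dynamic system model."""

    @abstractmethod
    def update(self, measurement, measurement_variance):
        """Fold in data from a sensor or from another layer."""

    @abstractmethod
    def access(self):
        """Return the current state for another layer or a client query."""


class ScalarKalmanLayer(Layer):
    """Tracks one scalar quantity with a constant-state (random walk) model."""

    def __init__(self, process_noise=0.01):
        self.process_noise = process_noise
        self.state = 0.0
        self.variance = 1.0

    def initialize(self, state, variance):
        self.state, self.variance = state, variance

    def extrapolate(self, dt):
        # Constant-state model: the estimate is unchanged, uncertainty grows.
        self.variance += self.process_noise * dt

    def update(self, measurement, measurement_variance):
        # Scalar Kalman update: the gain weights the measurement by certainty.
        gain = self.variance / (self.variance + measurement_variance)
        self.state += gain * (measurement - self.state)
        self.variance *= 1.0 - gain

    def access(self):
        return self.state, self.variance


layer = ScalarKalmanLayer(process_noise=0.0)
layer.initialize(state=0.0, variance=1.0)
layer.extrapolate(dt=1.0)
layer.update(measurement=2.0, measurement_variance=1.0)
estimate, variance = layer.access()
```

With equal prior and measurement variances the gain is one half, so the estimate lands midway between the prior state and the measurement. The same `access` call serves both roles described above: another layer can use it during assimilation, and a client can use it as a database read.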

Figure 2: Schematic of layered organization of a hypothetical environment model. Each layer is an object containing data and methods. Diamonds represent the data, and squares and circles represent the two types of methods, assimilation and state access. Sensors provide data to a single layer only. Assimilation occurs when data is passed up and down between layers. The query system gains access to the database by methods that access the state.

Layers in the EM are organized by abstraction; the bottom layer of the system is the least abstract and the top layer is the most abstract. Typically the least abstract data will be low-level raw sensor data. Examples include range measurements and digitized video. Abstract data consists of high-level information like names of objects and the activities that the objects are engaged in. Data flows both up and down between layers. Although video sensors provide low-level pixel information, other sensors can provide more abstract information. An ID badge reader, for example, gives abstract information directly from the sensor.
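As a rough sketch of this ordering, the snippet below sorts a stack of layers by abstraction and records which layer each sensor feeds. The layer names, the integer abstraction levels, and the sensor bindings are all hypothetical choices made for illustration; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    name: str
    abstraction: int  # 0 = raw sensor data; larger = more abstract


# Hypothetical layer stack, listed in no particular order.
specs = [
    LayerSpec("object_identity", 2),
    LayerSpec("digitized_video", 0),
    LayerSpec("object_tracks", 1),
]

# Bottom of the stack is the least abstract layer, top is the most abstract.
stack = sorted(specs, key=lambda spec: spec.abstraction)

# Hypothetical bindings: each sensor enters at the layer matching its data.
sensor_entry = {
    "video_camera": "digitized_video",
    "id_badge_reader": "object_identity",
}


def entry_level(sensor):
    """Return the abstraction level of the layer that a sensor feeds."""
    by_name = {spec.name: spec for spec in stack}
    return by_name[sensor_entry[sensor]].abstraction
```

In this sketch the badge reader enters the stack above the video camera, which mirrors the point that some sensors deliver abstract information directly.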

A layer, or a group of layers, may carry out specific vision tasks. For example, people tracking and activity recognition can be implemented as a small number of layers. The MPI-Video infrastructure then integrates these methods into a single system.

Assimilation occurs when layers pass information to each other. This sharing of information is modeled after the Kalman filter (see assimilation). In this way the system achieves strongly coupled assimilation. Each sensor interacts with only a single layer that is responsible for bringing information from that sensor into the EM. The sensor data propagates to other layers as part of the assimilation process.
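The flow described here, from a sensor into one layer and then on to the other layers, can be sketched as a toy example. It rests on strong simplifying assumptions: every layer estimates the same scalar quantity, whereas real layers hold different representations and would need a mapping between them, and passing an estimate to a neighbor as if it were an independent measurement reuses information in a way a full Kalman filter would account for. All class and method names are hypothetical.

```python
class Layer:
    """Hypothetical layer holding a scalar estimate and its variance."""

    def __init__(self, name, state=0.0, variance=1.0):
        self.name = name
        self.state = state
        self.variance = variance

    def update(self, value, value_variance):
        # Scalar Kalman-style update: weight the new value by its certainty.
        gain = self.variance / (self.variance + value_variance)
        self.state += gain * (value - self.state)
        self.variance *= 1.0 - gain


class EnvironmentModel:
    """Layers ordered from least abstract (index 0) to most abstract."""

    def __init__(self, layers):
        self.layers = layers
        self.sensor_to_layer = {}

    def attach_sensor(self, sensor_id, layer_index):
        # Each sensor is bound to exactly one layer.
        if sensor_id in self.sensor_to_layer:
            raise ValueError(f"sensor {sensor_id!r} is already attached")
        self.sensor_to_layer[sensor_id] = layer_index

    def assimilate(self, sensor_id, value, value_variance):
        entry = self.sensor_to_layer[sensor_id]
        self.layers[entry].update(value, value_variance)
        # Propagate upward from the entry layer, one neighbor at a time.
        for i in range(entry + 1, len(self.layers)):
            below = self.layers[i - 1]
            self.layers[i].update(below.state, below.variance)
        # Propagate downward from the entry layer, one neighbor at a time.
        for i in range(entry - 1, -1, -1):
            above = self.layers[i + 1]
            self.layers[i].update(above.state, above.variance)


em = EnvironmentModel([Layer("low"), Layer("high")])
em.attach_sensor("range_sensor", 0)
em.assimilate("range_sensor", value=2.0, value_variance=1.0)
```

After the call, the entry layer has moved toward the measurement and the layer above it has moved toward the entry layer's new estimate, so the sensor reading reaches a layer it never touched directly.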

The EM architecture has several useful properties:

Conceptual Simplicity: The system is built entirely out of layers, where each layer has the same fundamental methods. Layers can be designed such that any individual layer performs only simple operations.
Strongly-Coupled Assimilation: The Kalman filter and HMM estimation methods strongly couple the data in the layers.
Modularity: Layers are the fundamental building block of the system. Once a set of layers has been designed, they can easily be swapped in and out of a system as needs change.


Web page design by Jeffrey E. Boyd
