High-Zoom Video Hallucination by Exploiting Spatio-Temporal Regularities
Takeo Kanade and
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
In this paper, we consider the problem of super-resolving a human
face video by a very high (16x) zoom factor.
Inspired by recent literature on hallucination and example-based
learning, we formulate this task using a graphical model
that encodes (1) spatio-temporal consistencies and (2) the image
formation and degradation processes.
A video database of facial expressions is used to learn
a domain-specific prior for high-resolution videos.
The problem is posed as one of
probabilistic inference, in which we seek
the high-resolution video that best satisfies
the constraints expressed through the graphical model.
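To make the example-based idea concrete, the sketch below pairs each high-resolution training exemplar with its degraded low-resolution version and picks the exemplar that best explains an observed patch. This nearest-neighbor lookup (with invented names `hallucinate_nn`, `database`) only illustrates the learning-from-examples component; the paper instead performs probabilistic inference over a graphical model that additionally enforces spatio-temporal consistency.

```python
import numpy as np

def hallucinate_nn(lr_patch, database):
    # database: list of (lr, hr) exemplar pairs learned from
    # high-resolution training videos and their simulated degradations.
    # Return the high-res exemplar whose low-res version best
    # matches the observed patch (sum-of-squared-differences).
    errs = [np.sum((lr - lr_patch) ** 2) for lr, hr in database]
    return database[int(np.argmin(errs))][1]

# Toy usage with random "exemplars" standing in for a face database.
rng = np.random.default_rng(0)
db = [(rng.uniform(size=(2, 2)), rng.uniform(size=(32, 32)))
      for _ in range(10)]
obs = db[3][0] + 0.01          # a slightly perturbed observation
out = hallucinate_nn(obs, db)  # recovers exemplar 3's high-res patch
```

A real system would, as the abstract notes, couple such per-patch evidence with compatibility terms between neighboring patches in space and time rather than choosing each patch independently.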
Traditional approaches to this problem using video data
first estimate the relative
motion between frames and then compensate for it, effectively
resulting in multiple measurements of the scene. Our use of time
is rather direct: We define data structures that span multiple
consecutive frames, enriching our feature vectors with a temporal
signature. We then exploit these signatures
to find consistent solutions over time.
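One way to picture such temporal signatures is to stack each spatial location's values over a short window of consecutive frames into a single feature vector. The function below (`temporal_features` and its window size are illustrative choices, not the paper's exact descriptor) shows the data-structure idea:

```python
import numpy as np

def temporal_features(video_lr, span=3):
    # video_lr: array of shape (t, h, w), a low-resolution video.
    # For each window of `span` consecutive frames, give every
    # spatial location a feature vector of its values over time.
    t, h, w = video_lr.shape
    feats = []
    for i in range(t - span + 1):
        window = video_lr[i:i + span]              # (span, h, w)
        feats.append(window.reshape(span, -1).T)   # (h*w, span)
    return np.stack(feats)                         # (t-span+1, h*w, span)

video_lr = np.arange(4 * 6 * 8, dtype=float).reshape(4, 6, 8)
feats = temporal_features(video_lr)
print(feats.shape)  # (2, 48, 3)
```

Matching on such multi-frame vectors, rather than single-frame patches, is what lets temporally inconsistent candidate solutions be penalized directly.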
In our experiments, an 8x6-pixel face video, subject
to translational jitter and additive noise,
is magnified to a 128x96-pixel video.
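The observation process described above (jitter, 16x downsampling, additive noise) can be simulated as follows; the function name `degrade` and all parameter values are illustrative assumptions, and a box filter stands in for whatever point-spread function the actual imaging model uses:

```python
import numpy as np

def degrade(video_hr, zoom=16, jitter=1, noise_sigma=2.0, seed=0):
    # Simulate the low-resolution observation: per-frame translational
    # jitter, box-filter downsampling by `zoom`, and additive Gaussian
    # noise. video_hr: array of shape (t, h, w) with h, w divisible by zoom.
    rng = np.random.default_rng(seed)
    frames = []
    for frame in video_hr:
        dy, dx = rng.integers(-jitter, jitter + 1, size=2)
        shifted = np.roll(frame, (dy, dx), axis=(0, 1))  # translational jitter
        h, w = shifted.shape
        lr = shifted.reshape(h // zoom, zoom, w // zoom, zoom).mean(axis=(1, 3))
        frames.append(lr + rng.normal(0.0, noise_sigma, lr.shape))
    return np.stack(frames)

# A 128x96 (w x h) input video degrades to 8x6 low-res frames.
video_hr = np.random.default_rng(1).uniform(0, 255, (5, 96, 128))
video_lr = degrade(video_hr)
print(video_lr.shape)  # (5, 6, 8)
```

Running this forward model on the training database is also how the low-res/high-res exemplar pairs used for learning would typically be generated.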
Our results show that by exploiting both space and time, drastic
improvements can be achieved, both in reducing video flicker artifacts and
in overall reconstruction quality.
(appears in the Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, CVPR 2004, Vol. 2, pages 151-158.)
The following set of videos can be downloaded as a single
gzip'ed tar file (5.4MB).