Semantically Supervised Appearance Decomposition for Virtual Staging from a Single Panorama


We describe a novel approach to decompose a single panorama of an empty indoor environment into four appearance components: specular, direct sunlight, diffuse, and diffuse ambient without direct sunlight. Our system is weakly supervised by automatically generated semantic maps (with floor, wall, ceiling, lamp, window, and door labels). The semantic segmentation has shown success on perspective views and is adapted to panoramas using transfer learning, without any further annotations. A GAN-based approach supervised by coarse information obtained from the semantic map extracts specular reflection and direct sunlight regions on the floor and walls. These lighting effects are removed via a similar GAN-based approach and a semantic-aware inpainting step. The appearance decomposition enables multiple applications including sun direction estimation, virtual furniture insertion, floor material replacement, and sun direction change, providing an effective tool for virtual home staging. We demonstrate the effectiveness of our approach on a large, recently released dataset of panoramas of empty homes.


"Semantically Supervised Appearance Decomposition for Virtual Staging from a Single Panorama"
Tiancheng Zhi, Bowei Chen, Ivaylo Boyadzhiev, Sing Bing Kang, Martial Hebert, and Srinivasa G. Narasimhan
ACM Transactions on Graphics (SIGGRAPH), 2022.
[Paper (130 MB)] [Compressed Paper (7 MB)] [Supp] [Dataset] [Code]


Our system consists of three modules: (1) Coarse Effects Localization coarsely localizes lighting effects using automatically generated semantics; (2) Lighting Effects Detection separates specular reflection on the floor and direct sunlight on the floor and the wall (effects are brightened for visualization); (3) Lighting Effects Removal removes the detected specular reflection and direct sunlight, outputting a diffuse image (no specular) and an ambient image (no specular and sunlight). The specular reflection, direct sunlight, diffuse image, and ambient image can be used for various virtual staging applications.

The geometry of lighting effects considered in this work. (a) For specular reflections, when the camera is upright, the light source point (transparent or open window, door, or indoor lamp), the reflection point, and the camera center lie on a vertical plane (shown as gray), corresponding to a column in the panorama. (b) For direct sunlight, the sun direction establishes the mapping between a window (or door) point and a floor point illuminated by direct sunlight. The sunlit floor area can be back-projected to the window according to the sun direction.
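The two constraints above can be illustrated with a small geometric sketch. It is not the authors' code. It assumes an equirectangular panorama from an upright camera at the origin, a z-up coordinate frame, a flat floor at a known camera height, and a single axis-aligned wall plane. The function names (`pixel_to_ray`, `ray_to_pixel`, `floor_point`, `back_project_to_wall`) and all parameters are hypothetical.

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates to a unit ray (z is up)."""
    lon = (u / width) * 2.0 * np.pi - np.pi    # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi   # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.cos(lat) * np.cos(lon),
                     np.sin(lat)])

def ray_to_pixel(point, width, height):
    """Project a 3D point (camera at the origin) back to pixel coordinates."""
    x, y, z = point
    lon = np.arctan2(x, y)
    lat = np.arctan2(z, np.hypot(x, y))
    u = (lon + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - lat) / np.pi * height
    return u, v

def floor_point(u, v, width, height, camera_height):
    """Intersect the viewing ray of a floor pixel with the plane z = -camera_height."""
    ray = pixel_to_ray(u, v, width, height)
    if ray[2] >= 0:
        raise ValueError("pixel is not below the horizon")
    return (camera_height / -ray[2]) * ray

def back_project_to_wall(floor_pt, sun_dir, wall_y):
    """Move from a sunlit floor point toward the sun until the wall plane y = wall_y.

    sun_dir points from the scene toward the sun (assumed convention).
    """
    sun_dir = np.asarray(sun_dir, dtype=float)
    sun_dir = sun_dir / np.linalg.norm(sun_dir)
    if abs(sun_dir[1]) < 1e-9:
        raise ValueError("sun direction is parallel to the wall plane")
    t = (wall_y - floor_pt[1]) / sun_dir[1]
    return floor_pt + t * sun_dir

# Example: a floor pixel 45 degrees below the horizon, camera 1.6 m above the floor.
W, H = 2048, 1024
p = floor_point(1024, 768, W, H, camera_height=1.6)
q = back_project_to_wall(p, sun_dir=[0.0, 1.0, 1.0], wall_y=3.0)
u_wall, v_wall = ray_to_pixel(q, W, H)
```

In this example the sun lies in the same vertical plane as the viewing ray, so the back-projected wall point falls in the same panorama column as the floor pixel. For a general sun direction the column changes, and the mapping is what ties sunlit floor regions to window or door pixels.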

A semantic map with 7 classes is computed automatically from a panorama. The map is used to obtain coarse lighting effect masks exploiting the geometric constraints in the figure above. These coarse masks are used to supervise a GAN-based method to obtain accurate specular and sunlit images.
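As a rough illustration of how a semantic map could yield a coarse specular mask, the sketch below applies the column constraint for specular reflections: a floor pixel is a candidate only if some light source label appears in the same panorama column. This is a simplification under stated assumptions, not the authors' procedure. The label ids and the function `coarse_specular_mask` are hypothetical, and every window or door is treated as a potential light source regardless of whether it is transparent or open.

```python
import numpy as np

# Hypothetical label ids; the actual class ids are not given in the text.
FLOOR, WALL, CEILING, LAMP, WINDOW, DOOR = 0, 1, 2, 3, 4, 5

def coarse_specular_mask(semantic, light_labels=(LAMP, WINDOW, DOOR),
                         floor_label=FLOOR):
    """Mark floor pixels whose column contains at least one light source pixel.

    semantic: (H, W) integer label map of an upright equirectangular panorama.
    Returns an (H, W) boolean mask.
    """
    semantic = np.asarray(semantic)
    # True for each column that contains any light source label.
    has_light = np.isin(semantic, light_labels).any(axis=0)
    # Broadcast the per-column flag over the rows and keep floor pixels only.
    return (semantic == floor_label) & has_light[np.newaxis, :]

# Toy 4x3 map: a window in the middle column, floor in the bottom two rows.
toy = np.array([[CEILING, WINDOW, CEILING],
                [WALL,    WALL,   WALL],
                [FLOOR,   FLOOR,  FLOOR],
                [FLOOR,   FLOOR,  FLOOR]])
mask = coarse_specular_mask(toy)
```

Only the floor pixels under the window column are marked. Such a mask over-covers the true reflection, which is why it is described as coarse and used only as weak supervision.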

Architecture for lighting effects detection. The lighting effects network takes a panorama as input and predicts the specular and sunlight components. If the prediction is good, a local discriminator that tries to locate specular regions should fail on the image with the specular component removed. This is the key supervision signal for training the lighting effects network. The local discriminator is itself supervised by the coarse specular masks obtained from semantics. Similar techniques are applied to sunlight regions and to regions where specular reflection and direct sunlight overlap.
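One possible reading of this supervision signal, written as per-pixel losses, is sketched below. It is an assumption rather than the formulation used in the paper: the discriminator is taken to output a per-pixel probability map, binary cross-entropy is an assumed loss choice, and the names `binary_cross_entropy`, `discriminator_loss`, and `generator_loss` are hypothetical. The networks themselves are omitted, so the functions operate on discriminator output arrays only.

```python
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between probabilities and targets."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(np.mean(-(target * np.log(pred)
                           + (1.0 - target) * np.log(1.0 - pred))))

def discriminator_loss(d_out, coarse_mask):
    """Train the local discriminator to reproduce the coarse specular mask."""
    return binary_cross_entropy(d_out, coarse_mask)

def generator_loss(d_out_removed):
    """Train the lighting effects network so that the discriminator finds
    no specular region in the image with the predicted specular removed."""
    return binary_cross_entropy(d_out_removed, np.zeros_like(d_out_removed))

# Toy 2x2 example: the right column is flagged by the coarse mask.
coarse = np.array([[0.0, 1.0], [0.0, 1.0]])
still_visible = np.array([[0.1, 0.9], [0.1, 0.9]])  # discriminator still finds it
removed_well = np.full((2, 2), 0.05)                # discriminator finds nothing
```

Under this reading, `generator_loss` is low when the discriminator output on the effect-removed image is close to zero everywhere, which corresponds to the discriminator failing to locate specular regions.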

Results and Applications

Scene 1: insert furniture, change material, change sun direction.

Scene 2: insert furniture.

Scene 3: insert furniture.

Scene 4: insert furniture.

Scene 5: change sun direction.


This work was supported by a gift from Zillow Group, USA, and NSF Grants #CNS-2038612, #IIS-1900821.