SAIL: Self-supervised Albedo Estimation from Real Images with a Latent Diffusion Model

1Noah's Ark, Huawei Paris Research Center
2IBISC, Univ. Evry Paris-Saclay
3L Research

We propose a self-supervised albedo estimation method built on a latent diffusion model for real images. From a single image under real-world lighting conditions, SAIL extracts high-fidelity albedo by repurposing and finetuning a pretrained latent diffusion model (left). The estimated albedo enables downstream tasks such as single-image virtual relighting, demonstrated using Blender with different environment maps (right).

Results

Albedo estimation for images under complex real-world lighting conditions.

Qualitative comparisons, left to right: Input | Latent Intrinsics | SAIL (Ours)


Consistency across different illuminations.

SAIL is more robust and produces consistent albedo across varying illuminations of the same scene.




Applications

Unconditioned image relighting

Our method enables unconditioned, realistic relighting from a single image by predicting albedo unaffected by illumination, allowing diverse lighting conditions to be generated without any explicit supervision or conditioning.


Virtual image relighting

We show virtual relighting results using Blender with different environment maps.
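As a rough illustration of how such renders can be produced, the minimal bpy sketch below swaps the world environment map of an already albedo-textured scene and renders it under each lighting; it is not the exact setup used here, and the HDR file names and output paths are placeholders.

```python
import bpy

def set_environment_map(hdr_path: str) -> None:
    """Point the world background at an HDR environment texture."""
    world = bpy.context.scene.world
    world.use_nodes = True
    nodes = world.node_tree.nodes
    links = world.node_tree.links
    env = nodes.new(type="ShaderNodeTexEnvironment")
    env.image = bpy.data.images.load(hdr_path)
    # Linking to the Background color input replaces any previous link.
    links.new(env.outputs["Color"], nodes["Background"].inputs["Color"])

# Render the same albedo-textured scene under several environment maps
# (placeholder HDR file names).
for hdr in ["studio.hdr", "sunset.hdr", "forest.hdr"]:
    set_environment_map(hdr)
    bpy.context.scene.render.filepath = f"//relit_{hdr.split('.')[0]}.png"
    bpy.ops.render.render(write_still=True)
```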




Abstract

Intrinsic image decomposition aims at separating an image into its underlying albedo and shading components, isolating the base color from lighting effects to enable downstream applications such as virtual relighting and scene editing. Despite the rise and success of learning-based approaches, intrinsic image decomposition from real-world images remains a significant challenging task due to the scarcity of labeled ground-truth data. Most existing solutions rely on synthetic data as supervised setups, limiting their ability to generalize to real-world scenes. Self-supervised methods, on the other hand, often produce albedo maps that contain reflections and lack consistency under different lighting conditions. To address this, we propose SAIL, an approach designed to estimate albedo-like representations from single-view real-world images. We repurpose the prior knowledge of a latent diffusion model for unconditioned scene relighting as a surrogate objective for albedo estimation. To extract the albedo, we introduce a novel intrinsic image decomposition fully formulated in the latent space. To guide the training of our latent diffusion model, we introduce regularization terms that constrain both the lighting-dependent and independent components of our latent image decomposition. SAIL predicts stable albedo under varying lighting conditions and generalizes to multiple scenes, using only unlabeled multi-illumination data available online.
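The abstract only describes the latent-space decomposition at a high level; as an illustration of the general idea (not the authors' implementation), the toy PyTorch sketch below splits an image latent into a lighting-independent albedo part and a lighting-dependent part, and applies the kind of regularizers hinted at above: a reconstruction term plus a cross-illumination consistency term on the albedo. The module, the additive latent model, and the exact loss forms are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LatentDecomposer(nn.Module):
    """Hypothetical head splitting a VAE latent into albedo and lighting parts."""
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.albedo_head = nn.Conv2d(latent_channels, latent_channels, 3, padding=1)
        self.light_head = nn.Conv2d(latent_channels, latent_channels, 3, padding=1)

    def forward(self, z: torch.Tensor):
        return self.albedo_head(z), self.light_head(z)

def decomposition_losses(model, z_a, z_b):
    """Illustrative regularizers for two latents of the same scene
    under different illuminations (z_a, z_b)."""
    alb_a, light_a = model(z_a)
    alb_b, light_b = model(z_b)
    # Reconstruction: the two components should explain each input latent.
    recon = ((alb_a + light_a - z_a) ** 2).mean() + ((alb_b + light_b - z_b) ** 2).mean()
    # Lighting-independent term: albedo latents of the same scene should match.
    consistency = ((alb_a - alb_b) ** 2).mean()
    # Lighting-dependent term: the lighting part should account for the
    # difference between the two illuminations of the scene.
    light_diff = (((light_a - light_b) - (z_a - z_b)) ** 2).mean()
    return recon + consistency + light_diff

# Usage with toy tensors standing in for VAE latents of shape [B, 4, H/8, W/8].
model = LatentDecomposer()
z_a, z_b = torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32)
loss = decomposition_losses(model, z_a, z_b)
loss.backward()
```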