GIRAFFE is a method for generating scenes in a controllable, photo-realistic manner while training only from raw, unstructured image collections
Incorporating compositional 3D scene representation with generator → controllable image synthesis
Combining explicit 3D representation with neural rendering pipeline → faster inference and realistic images
2. Related Work
GAN based Image Synthesis
GAN variants allow photo-realistic image synthesis that can be controlled at the object-level
[-] Operate only in 2D, ignoring the 3D structure of scenes
Implicit Functions
NeRFs combine implicit neural model with volume rendering for novel view synthesis
[-] Require multi-view images with camera pose supervision, train a single network per scene, and cannot generate novel scenes
3D aware Image Synthesis
Generative Neural Radiance Fields (GRAF) achieve controllable, high-resolution image synthesis
[-] Restricted to single-object scenes; quality degrades on more complex imagery
3. Method
GOAL : Controllable image synthesis pipeline trained from raw image collections without additional supervision
3-1. Objects as Neural Feature Fields
Neural Radiance Fields
positional encoding introduces an inductive bias that helps the network learn 3D shape representations in a canonical orientation, which would otherwise be arbitrary
3D point ($\bold{x} \in \mathbb{R}^3$) and viewing direction ($\bold{d} \in \mathbb{S}^2$) map to volume density ($\sigma \in \mathbb{R}^+$) and RGB color ($\bold{c} \in \mathbb{R}^3$)
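This mapping can be sketched in numpy. The positional encoding follows the standard NeRF recipe (sin/cos at exponentially spaced frequencies); the MLP itself is a toy stand-in with random weights, purely hypothetical, not a trained model:

```python
import numpy as np

def positional_encoding(v, num_freqs=10):
    """Encode each coordinate as [sin(2^k * pi * v), cos(2^k * pi * v)] for k < num_freqs."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = v[..., None] * freqs                                  # (N, D, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(v.shape[0], -1)                             # (N, D * 2 * num_freqs)

def neural_field(x, d, seed=0):
    """Toy NeRF-style field: (x, d) -> (sigma, c). Random weights stand in for a trained MLP."""
    rng = np.random.default_rng(seed)
    h = np.tanh(positional_encoding(x, 10) @ rng.normal(size=(60, 32)))
    sigma = np.maximum(h @ rng.normal(size=32), 0.0)               # density, constrained to R+
    hd = np.concatenate([h, positional_encoding(d, 4)], axis=-1)   # color depends on view dir
    c = 1.0 / (1.0 + np.exp(-(hd @ rng.normal(size=(56, 3)))))     # RGB in [0, 1]^3
    return sigma, c
```

Note the asymmetry: density depends only on position, while color also conditions on the viewing direction, which lets the model capture view-dependent effects such as specularity.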
Generative Neural Feature Fields
Generative Radiance Fields(GRAF)
Shape and appearance codes $\bold{z}_s, \bold{z}_a$ with dimensions $M_s, M_a$
GRAF’s 3-dimensional color output is replaced with a more generic $M_f$-dimensional feature $\bold{f}$, representing objects as Generative Neural Feature Fields
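A minimal numpy sketch of this conditional field, assuming GRAF-style conditioning (shape code concatenated into the geometry branch, appearance code into the output head) and random placeholder weights in place of a trained network; all layer sizes are illustrative:

```python
import numpy as np

def encode(v, num_freqs):
    """Standard sin/cos positional encoding at exponentially spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    s = v[..., None] * freqs
    return np.concatenate([np.sin(s), np.cos(s)], axis=-1).reshape(v.shape[0], -1)

def generative_feature_field(x, d, z_s, z_a, M_f=128, seed=0):
    """Toy generative feature field: (x, d, z_s, z_a) -> (sigma, f).

    Unlike NeRF's per-scene RGB output, the head emits an M_f-dimensional
    feature f; sampling new codes z_s, z_a yields new objects."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    # shape code z_s conditions the geometry (density) branch
    h_in = np.concatenate([encode(x, 10), np.tile(z_s, (N, 1))], axis=-1)
    h = np.tanh(h_in @ rng.normal(size=(h_in.shape[1], 64)))
    sigma = np.maximum(h @ rng.normal(size=64), 0.0)          # (N,) density in R+
    # appearance code z_a and view direction condition the feature head
    f_in = np.concatenate([h, encode(d, 4), np.tile(z_a, (N, 1))], axis=-1)
    f = f_in @ rng.normal(size=(f_in.shape[1], M_f))          # (N, M_f) feature
    return sigma, f
```

Because the output is a feature rather than a color, a downstream neural rendering module can decode the volume-rendered feature map into the final image, which is what enables the faster, higher-resolution synthesis noted above.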