GIRAFFE is a method for generating scenes in a controllable, photo-realistic manner while training only from raw, unstructured image collections
Incorporating compositional 3D scene representation with generator → controllable image synthesis
Combining explicit 3D representation with neural rendering pipeline → faster inference and realistic images
2. Related Work
GAN based Image Synthesis
GAN variants allow photo-realistic image synthesis that can be controlled at the object-level
[-] Operate only in 2D, ignoring the 3D structure of scenes
Implicit Functions
NeRFs combine implicit neural model with volume rendering for novel view synthesis
[-] Require multi-view images with camera pose supervision, train a single network per scene, and cannot generate novel scenes
3D aware Image Synthesis
Generative Neural Radiance Fields (GRAF) achieve controllable, high-resolution image synthesis
[-] Restricted to single-object scenes; quality degrades on more complex imagery
3. Method
GOAL : Controllable image synthesis pipeline trained from raw image collections without additional supervision
3-1. Objects as Neural Feature Fields
Neural Radiance Fields
positional encoding introduces an inductive bias that helps the network learn 3D shape representations in a canonical orientation, which would otherwise be arbitrary
3D point ($\bold{x} \in \mathbb{R}^3$) and viewing direction ($\bold{d} \in \mathbb{S}^2$) map to volume density ($\sigma \in \mathbb{R}^+$) and RGB color ($\bold{c} \in \mathbb{R}^3$)
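This mapping can be sketched in numpy. The positional encoding follows the standard NeRF recipe (sin/cos at exponentially spaced frequencies); the MLP itself is a toy stand-in with random weights, purely hypothetical, not a trained model:

```python
import numpy as np

def positional_encoding(v, num_freqs=10):
    """Encode each coordinate as [sin(2^k * pi * v), cos(2^k * pi * v)] for k < num_freqs."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = v[..., None] * freqs                                  # (N, D, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(v.shape[0], -1)                             # (N, D * 2 * num_freqs)

def neural_field(x, d, seed=0):
    """Toy NeRF-style field: (x, d) -> (sigma, c). Random weights stand in for a trained MLP."""
    rng = np.random.default_rng(seed)
    h = np.tanh(positional_encoding(x, 10) @ rng.normal(size=(60, 32)))
    sigma = np.maximum(h @ rng.normal(size=32), 0.0)               # density, constrained to R+
    hd = np.concatenate([h, positional_encoding(d, 4)], axis=-1)   # color depends on view dir
    c = 1.0 / (1.0 + np.exp(-(hd @ rng.normal(size=(56, 3)))))     # RGB in [0, 1]^3
    return sigma, c
```

Note the asymmetry: density depends only on position, while color also conditions on the viewing direction, which lets the model capture view-dependent effects such as specularity.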
Generative Neural Feature Fields
Generative Radiance Fields(GRAF)
Shape and appearance codes $\bold{z}_s, \bold{z}_a$ with dimensions $M_s, M_a$
GRAF’s 3-dimensional color output is replaced with a more generic $M_f$-dimensional feature $\bold{f}$, representing objects as Generative Neural Feature Fields
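A minimal numpy sketch of this conditional field, assuming GRAF-style conditioning (shape code concatenated into the geometry branch, appearance code into the output head) and random placeholder weights in place of a trained network; all layer sizes are illustrative:

```python
import numpy as np

def encode(v, num_freqs):
    """Standard sin/cos positional encoding at exponentially spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    s = v[..., None] * freqs
    return np.concatenate([np.sin(s), np.cos(s)], axis=-1).reshape(v.shape[0], -1)

def generative_feature_field(x, d, z_s, z_a, M_f=128, seed=0):
    """Toy generative feature field: (x, d, z_s, z_a) -> (sigma, f).

    Unlike NeRF's per-scene RGB output, the head emits an M_f-dimensional
    feature f; sampling new codes z_s, z_a yields new objects."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    # shape code z_s conditions the geometry (density) branch
    h_in = np.concatenate([encode(x, 10), np.tile(z_s, (N, 1))], axis=-1)
    h = np.tanh(h_in @ rng.normal(size=(h_in.shape[1], 64)))
    sigma = np.maximum(h @ rng.normal(size=64), 0.0)          # (N,) density in R+
    # appearance code z_a and view direction condition the feature head
    f_in = np.concatenate([h, encode(d, 4), np.tile(z_a, (N, 1))], axis=-1)
    f = f_in @ rng.normal(size=(f_in.shape[1], M_f))          # (N, M_f) feature
    return sigma, f
```

Because the output is a feature rather than a color, a downstream neural rendering module can decode the volume-rendered feature map into the final image, which is what enables the faster, higher-resolution synthesis noted above.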