StackGAN

This paper's model architecture has many components, so I thought it would be good to lay out the specifics of the architecture before implementing it.

Model Architecture

  1. Stage-I GAN

  2. Stage-II GAN

Stage-I GAN

Input: Text embedding of the text description, $\varphi_t$

Conditioning Augmentation (CA)

Purpose: Create a vector $\hat{c_0}$ that captures the meaning of $\varphi_t$ with variations.

Process: $\varphi_t$ → FC layer → $\mu_0, \sigma_0$ → $\mathcal{N}(\mu_0(\varphi_t), \sigma_0(\varphi_t))$ → $\hat{c_0}$ sampled from this Gaussian distribution
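
The process above can be sketched in a few lines of plain Python. This is only a rough illustration under assumptions of my own, not the paper's implementation. The names `fc_layer` and `conditioning_augmentation`, the toy dimensions, and the random weights are all hypothetical. It also assumes the FC layer emits the log-variance rather than $\sigma_0$ directly (so $\sigma_0$ stays positive), and it draws the sample as $\mu_0 + \sigma_0 \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$.

```python
import math
import random


def fc_layer(x, weights, bias):
    """Fully connected layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conditioning_augmentation(phi_t, weights, bias, rng):
    """Map phi_t to (mu_0, sigma_0), then sample c_0_hat from that Gaussian."""
    out = fc_layer(phi_t, weights, bias)
    half = len(out) // 2
    mu_0 = out[:half]
    # Assumption: the second half of the FC output is log(sigma_0^2),
    # which keeps sigma_0 strictly positive.
    sigma_0 = [math.exp(0.5 * v) for v in out[half:]]
    # Sample as c_0_hat = mu_0 + sigma_0 * eps, with eps ~ N(0, 1).
    eps = [rng.gauss(0.0, 1.0) for _ in range(half)]
    c_0_hat = [m + s * e for m, s, e in zip(mu_0, sigma_0, eps)]
    return c_0_hat, mu_0, sigma_0


# Toy sizes (hypothetical): a 4-d text embedding, a 3-d conditioning vector.
rng = random.Random(0)
embed_dim, cond_dim = 4, 3
weights = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
           for _ in range(2 * cond_dim)]
bias = [0.0] * (2 * cond_dim)
phi_t = [rng.gauss(0.0, 1.0) for _ in range(embed_dim)]

c_0_hat, mu_0, sigma_0 = conditioning_augmentation(phi_t, weights, bias, rng)
print(len(c_0_hat))
```

Calling `conditioning_augmentation` twice with the same `phi_t` gives the same `mu_0` and `sigma_0` but a different `c_0_hat`, which is the "with variations" part of the purpose stated above.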

Output:
