This paper's model architecture has many components, so I thought it would be good to lay out the specifics of the architecture before implementing it.
Stage-I GAN
Stage-II GAN
Output:
Input: Text embedding of the text description ($\varphi_t$)
Purpose: Create a vector $\hat{c_0}$ that captures the meaning of $\varphi_t$ with variations.
Process: $\varphi_t$ → FC layer → $\mu_0, \sigma_0$ → $\mathcal{N}(\mu_0(\varphi_t), \sigma_0(\varphi_t))$ → $\hat{c_0}$ sampled from this Gaussian distribution
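To make this process concrete before the real implementation, here is a minimal sketch in plain Python (standard library only). It is not the paper's code. The names `linear` and `conditioning_augmentation`, the toy sizes, and the random weights are all made up for illustration. It also assumes two things the notes above do not spell out: the FC layer emits the mean and the log-variance side by side (so that $\sigma_0$ is always positive), and sampling is done as $\hat{c_0} = \mu_0 + \sigma_0 \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conditioning_augmentation(phi_t, weights, bias, rng):
    """Map a text embedding phi_t to (c0_hat, mu_0, sigma_0)."""
    out = linear(phi_t, weights, bias)
    half = len(out) // 2
    mu_0 = out[:half]
    # Assumption: the second half is a log-variance, so sigma_0 > 0.
    sigma_0 = [math.exp(0.5 * log_var) for log_var in out[half:]]
    # Assumption: sample via c0_hat = mu_0 + sigma_0 * eps, eps ~ N(0, 1).
    c0_hat = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_0, sigma_0)]
    return c0_hat, mu_0, sigma_0


# Toy sizes and random weights, chosen only for illustration.
rng = random.Random(0)
embed_dim, cond_dim = 8, 4
phi_t = [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
weights = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
           for _ in range(2 * cond_dim)]
bias = [0.0] * (2 * cond_dim)

# Two samples for the same description: same mean, different c0_hat.
c0_a, mu_a, sigma_a = conditioning_augmentation(phi_t, weights, bias, rng)
c0_b, mu_b, sigma_b = conditioning_augmentation(phi_t, weights, bias, rng)
```

Calling the function twice on the same $\varphi_t$ gives the same $\mu_0$ and $\sigma_0$ but a different $\hat{c_0}$ each time, which is the "with variations" part of the purpose stated above.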