WTF Is the Latent Space?
Latent space is the storyboard. So, it goes like this. You have a script that you want to make into film.
You slice and dice the script into scenes and how you want them filmed. The document that captures this transition is a storyboard. Then, from that storyboard the film crew and the film machinery do the capture that eventually gets edited and produced into a film.
We just went: script -> [encode] -> sctoryboard -> [decode] -> [film]. IRL the encoder is the director and the decoder are the film crew and everything in the postproduction (plus beancounters, studio, etc.).
With stable diffusion the encoder are your samplers (i.e. Euler A, Heun, etc.) and the decoder are your VAE (yes, I know VAE is encode/decode, but fuck it).
Script (Prompt) | Storyboard (Latent) | Film (Clipspace/Pixelspace) |
|
|
|
The reason why we even have latent space is because it is much cheaper and faster to manipulate. The efforts to manipulate the latent space are one-millionth of what the pixelspace costs to manipulate. Exactly the same reason why we have storyboards.
That's how I map "latent space". Would you have a better example?