Generative models · 2023
Diffusion Model
Building DDPM from the noise up, until the math stopped feeling like magic.
Why I built it
Diffusion models had taken over generative imaging, and I could use them — but I couldn't have derived one on a whiteboard. That gap bothered me. Reading the DDPM paper a third time wasn't closing it, so I did the thing that always works for me: I rebuilt it from scratch, with no reference implementation open in another tab, and refused to move on until each piece earned its place.
What I actually built
The project is a small, readable PyTorch codebase with three parts, which I deliberately implemented separately so I could poke at each in isolation:
- The forward process: the closed-form noising schedule that lets you jump to any timestep t in one step, rather than looping. Getting the reparameterisation right is the whole trick.
- The reverse denoiser: a compact U-Net that predicts the noise added at step t, conditioned on t via sinusoidal time embeddings.
- The sampling loop: the iterative denoising that turns pure Gaussian noise into a sample, one small step at a time.
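To make the first two pieces concrete, here is a minimal sketch of the closed-form forward jump and a sinusoidal time embedding. It is an illustration, not the project's code. The function names (`linear_beta_schedule`, `q_sample`, `sinusoidal_embedding`), the linear schedule with its 1e-4 to 0.02 endpoints over 1000 steps, and the 10000 frequency base are all assumptions chosen for the example.

```python
import math
import torch


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: linearly spaced variances beta_1 .. beta_T.
    return torch.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, noise=None):
    """Draw x_t ~ q(x_t | x_0) in one step, without looping over timesteps.

    x0: (B, ...) clean batch; t: (B,) long tensor of timestep indices;
    alpha_bar: (T,) cumulative product of (1 - beta).
    Returns the noised batch and the noise that was used.
    """
    if noise is None:
        noise = torch.randn_like(x0)
    # Reshape the per-sample coefficient to (B, 1, 1, ...) so it broadcasts over x0.
    a = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    # Reparameterisation: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise, noise


def sinusoidal_embedding(t, dim):
    """Map integer timesteps (B,) to (B, dim) sin/cos features; dim assumed even."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([args.sin(), args.cos()], dim=-1)


betas = linear_beta_schedule()
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
```

Calling `q_sample(x0, t, alpha_bar)` with a batch of images and a batch of random timesteps gives the noised inputs for training. The `view` call is where the shape and broadcast bookkeeping lives, so it is a good place to look first when something silently goes wrong.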
I logged intermediate samples at fixed timesteps so I could watch structure emerge from noise across training, which turned out to be the most useful debugging tool I had.
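The sampling loop with snapshot logging can be sketched roughly as below. This is a hedged illustration of standard DDPM ancestral sampling, not the project's implementation. The `sample` function, its `log_at` argument, and the choice of `beta_t` as the reverse-step variance are assumptions, and `model` stands for any callable that takes `(x, t)` and returns predicted noise.

```python
import torch


@torch.no_grad()
def sample(model, shape, betas, log_at=(), generator=None):
    """Ancestral sampling from pure noise, keeping snapshots at chosen timesteps.

    Returns the final sample and a dict mapping each logged t to the state
    right after the update at step t (that is, x_{t-1}).
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, generator=generator)  # start from x_T ~ N(0, I)
    snapshots = {}
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, t_batch)
        # Posterior mean computed from the predicted noise.
        mean = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            # Assumed variance choice: sigma_t^2 = beta_t.
            x = mean + betas[t].sqrt() * torch.randn(shape, generator=generator)
        else:
            x = mean  # no noise is added on the final step
        if t in log_at:
            snapshots[t] = x.clone()
    return x, snapshots
```

Passing the same `log_at` timesteps and a seeded `torch.Generator` at each checkpoint keeps the snapshots comparable across training, so any change in the images comes from the model rather than from the random draw.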
What surprised me
Two things. First, how much of the difficulty is bookkeeping — the variance schedules, the sqrt(alpha_bar) terms, keeping shapes and broadcasts honest. The conceptual leap is small; the place you actually lose hours is a sign error in the posterior. Second, how forgiving the objective is: predicting the noise (rather than the image) makes the loss a plain MSE, and that simplicity is most of why diffusion trains so stably compared to the GANs I'd fought with earlier.
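As a rough illustration of how small the objective is, here is a sketch of one noise-prediction loss computation. The `noise_prediction_loss` name and the uniform sampling of timesteps are assumptions made for the example, and `model` is again any callable taking `(x_t, t)` and returning a tensor shaped like `x_t`.

```python
import torch
import torch.nn.functional as F


def noise_prediction_loss(model, x0, alpha_bar):
    """Plain MSE between the true noise and the model's prediction of it."""
    b = x0.shape[0]
    # Assumed: one uniformly random timestep per sample in the batch.
    t = torch.randint(0, alpha_bar.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)
    # Broadcast sqrt(alpha_bar_t) and sqrt(1 - alpha_bar_t) over the image dims.
    a = alpha_bar[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
```

The regression target is the sampled noise, which has unit variance at every timestep, so the loss scale stays comparable across steps without per-step reweighting.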
What I'd do next
Three concrete extensions I scoped but haven't shipped: a DDIM sampler for far fewer denoising steps at inference, classifier-free guidance for conditional generation, and a move to latent-space diffusion to make higher resolutions tractable on a single GPU. They're the natural next rungs, and each maps to a paper I now feel equipped to implement directly.
What I took away
- Re-deriving the reverse process by hand made the noise-prediction objective intuitive in a way no amount of reading did.
- Most of the engineering pain is in the variance-schedule bookkeeping, not the concepts.
- Visualising intermediate denoising steps was the single best debugging tool.