Understanding Diffusion Models for Generative AI | Generated by AI

2025.10

A diffusion model is a type of generative AI model that creates new data (like images or videos) by gradually refining random noise into a structured output. Here’s how it works in simple terms:


Core Idea: Noise to Data

The core idea is inversion: if you know how to turn an image into noise one small step at a time, you can train a network to undo each of those steps, turning pure noise back into a coherent image.

Key Steps in a Diffusion Model

  1. Forward Process (Adding Noise)
    • A real image is gradually corrupted by adding Gaussian noise over many steps.
    • This creates a sequence of increasingly noisy versions of the original image.
  2. Reverse Process (Generating Data)
    • The model learns to reverse this process: starting from pure noise, it predicts and removes noise step-by-step.
    • At each step, the model uses a neural network (often a U-Net or transformer) to estimate and remove the noise, slowly revealing the final image or video.
  3. Conditioning (Optional)
    • The process can be guided by text prompts, class labels, or other inputs to ensure the output matches the user’s request.
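The forward process above has a convenient closed form: instead of adding noise step by step, you can jump directly to any timestep t. A minimal NumPy sketch (the linear beta schedule values are illustrative, not tuned):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: small noise early, more noise later.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal-retention factor

def forward_noise(x0, t):
    """Sample x_t directly from the closed-form forward process
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

x0 = rng.standard_normal((8, 8))     # stand-in for a real image
xt, eps = forward_noise(x0, t=999)   # near t = T, x_t is almost pure noise
print(xt.shape)
```

During training, the network sees (x_t, t) pairs produced this way and learns to predict the eps that was added; that prediction is exactly what the reverse process removes.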

Why Diffusion Models?

Compared with earlier generative approaches such as GANs, diffusion models tend to train more stably and produce high-quality, diverse samples, which is why they underpin systems like Stable Diffusion, DALL·E, and Sora.

Example: Image Generation

Given a prompt like “a cat wearing a hat,” a text-to-image diffusion model encodes the prompt, starts from a canvas of pure Gaussian noise, and runs many denoising steps, each one nudging the noise toward an image consistent with the prompt.

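The reverse process is a loop over timesteps from T down to 0. A minimal DDPM-style sampling sketch, where `predict_noise` is a placeholder for the trained U-Net or transformer (here it just returns zeros so the code runs standalone):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same illustrative schedule as in the forward process.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(xt, t):
    # Placeholder for a trained neural network; a real model would
    # estimate the noise present in x_t (optionally conditioned on a prompt).
    return np.zeros_like(xt)

def sample(shape):
    """DDPM ancestral sampling: start from pure noise, denoise step by step."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        # Mean of the reverse transition given the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # All but the final step re-inject a small amount of fresh noise.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

img = sample((8, 8))
print(img.shape)
```

With a real trained `predict_noise`, the same loop turns Gaussian noise into an image; conditioning on text simply means passing the prompt embedding into the network at every step.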
Diffusion Models in Sora

In Sora, the diffusion model is paired with a transformer to handle both the high-level structure (transformer) and fine visual details (diffusion), enabling the generation of complex, high-quality videos.

Would you like a deeper explanation of any specific part, such as the math behind the noise scheduling or how conditioning works?



mistralai/mistral-medium-3.1
