Colored Noise Diffusion Sampling

The Hebrew University of Jerusalem
TL;DR

Standard diffusion samplers inject white noise blindly, wasting their finite energy budget on frequencies that are already resolved. CNS dynamically redirects that energy to unresolved frequency bands, steering generation toward the true data manifold. It is a drop-in sampler swap that yields substantial FID reductions.

CNS overview — colored noise reallocation across frequency bands

Abstract

Diffusion models achieve state-of-the-art image synthesis, and their generative trajectories exhibit a fundamental spectral bias: low-frequency global structure is resolved early, and high-frequency fine detail is resolved later. Conventional stochastic differential equation (SDE) solvers do not account for this dynamic. They inject uniform white noise throughout the entire process and thereby misuse the finite energy budget.

In this work, we establish a mathematical framework that reconsiders SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS uses a dynamic, timestep- and frequency-dependent schedule that allocates injected energy more efficiently toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold.

Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time sampler substitution across diverse architectures (SiT, JiT, FLUX). Compared to standard SDE sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, from 32.39 to 26.69 on JiT-B/16, and from 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance.

Temporal Progression

Temporal progression of frequency bands

$\gamma(f,t)$-Matrix. Visualizes how different frequency bands resolve over the diffusion process, demonstrating the model's inherent spectral bias.

Colored Noise PSD

Power Spectral Density of Colored Noises

Power Spectral Density. Spectra of different colored noises. The spectra transition from blue noise, dominated by high frequencies, through standard uniform white noise, to red noise, dominated by low frequencies.
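To make the color terminology concrete, the following is a minimal NumPy sketch that shapes white noise in the Fourier domain so that its PSD follows a power law $|f|^{\alpha}$. The power-law family, the function names, and the 0.25 cutoff used for the check are illustrative assumptions and are not taken from the paper; they only reproduce the qualitative blue/white/red ordering described in the caption.

```python
import numpy as np

def colored_noise_2d(size, alpha, rng):
    """Sample a size x size noise field whose PSD scales roughly as |f|**alpha.

    Assumed convention (not from the paper): alpha > 0 is blue-like,
    alpha = 0 is white, alpha < 0 is red-like.
    """
    white = rng.standard_normal((size, size))
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius[0, 0] = radius[0, 1]           # avoid 0 ** negative at the DC bin
    amplitude = radius ** (alpha / 2.0)   # PSD is the squared amplitude
    shaped = np.fft.ifft2(np.fft.fft2(white) * amplitude).real
    return shaped / shaped.std()          # rescale to unit variance


def high_to_low_power_ratio(noise, cutoff=0.25):
    """Mean spectral power above the radial cutoff divided by the mean below it."""
    size = noise.shape[0]
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    power = np.abs(np.fft.fft2(noise)) ** 2
    return power[radius > cutoff].mean() / power[radius <= cutoff].mean()


rng = np.random.default_rng(0)
blue = colored_noise_2d(64, 1.0, rng)
white = colored_noise_2d(64, 0.0, rng)
red = colored_noise_2d(64, -2.0, rng)
for name, field in [("blue", blue), ("white", white), ("red", red)]:
    print(name, high_to_low_power_ratio(field))
```

The printed ratio should be largest for the blue sample, close to one for the white sample, and smallest for the red sample.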

Method

1. Spectral Bias

Diffusion models resolve frequencies in order: global structure first, fine details last. We track each band's progress with $\gamma(f,t) \in [0,1]$, where a band is fully resolved as $\gamma \to 1$.

2. Fixed Energy Budget

SDE solvers operate with a bounded total injected noise energy. Standard white-noise injection spreads that energy uniformly across frequencies, wasting budget on bands that are already built and creating a spectral gap.

3. Targeted Reallocation

CNS dynamically routes energy away from resolved bands and into lagging ones. The schedule is strictly variance-preserving, so it requires no model retraining and produces no out-of-distribution states.

CNS Noise Scaling Schedule
$$ \beta_{f}(t) = \frac{\sqrt{1-\gamma_{f}(t)}}{\sqrt{\dfrac{1}{D}\sum_{f^{\prime}}(1-\gamma_{f^{\prime}}(t))}} $$

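$\gamma_f(t)$, written $\gamma(f,t)$ above, is the progress index of frequency band $f$ at timestep $t$. Here $D$ is taken to be the number of frequency bands in the sum, which is what the normalization requires. The denominator scales the schedule so that the mean of $\beta_f(t)^2$ over bands equals 1, which conserves the total injected energy. Because $\beta_f(t)^2 \propto 1-\gamma_f(t)$, bands with a larger structural deficit receive proportionally more energy.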
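As a small numerical illustration of the schedule, the sketch below evaluates $\beta_f(t)$ for a hypothetical three-band progress vector and checks that the mean of $\beta_f(t)^2$ is 1. The function name, the example $\gamma$ values, and the small `eps` guard (for the case where every band is fully resolved) are assumptions added for illustration.

```python
import numpy as np

def cns_noise_scale(gamma, eps=1e-12):
    """Per-band noise scale beta_f from progress indices gamma_f in [0, 1].

    eps is an assumed safeguard against division by zero when all bands
    are resolved; it is not part of the formula in the text.
    """
    deficit = 1.0 - np.asarray(gamma, dtype=float)   # 1 - gamma_f(t)
    mean_deficit = deficit.mean()                    # (1/D) * sum over bands
    return np.sqrt(deficit) / np.sqrt(mean_deficit + eps)


# Hypothetical snapshot: low band nearly resolved, high band barely started.
gamma_t = np.array([0.9, 0.5, 0.1])
beta_t = cns_noise_scale(gamma_t)
print(beta_t)                 # approximately [0.447, 1.000, 1.342]
print(np.mean(beta_t ** 2))   # approximately 1.0
```

When all bands share the same progress value, every $\beta_f(t)$ equals 1 and the schedule reduces to standard white-noise injection.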

How Noise Becomes Image Structure

Initial Noise Persistence

Initial noise is not discarded — it maps into the final image. ODEs preserve it strongly; stochastic methods retain a significant portion.

Cumulative Injection Transfer

Injected noise is not a temporary perturbation: it actively shapes the final features. CNS selectively routes this signal into higher frequency bands, where it is needed most.
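One way to picture this routing is to apply the noise scaling schedule given earlier to a white-noise sample in the Fourier domain. The sketch below is an assumption-laden illustration, not the authors' implementation: it groups FFT bins into equal-width radial rings, shares one hypothetical $\gamma$ value per ring, and normalizes over all FFT bins so that the expected per-pixel variance stays at 1. It also assumes at least one ring is not fully resolved.

```python
import numpy as np

def radial_ring_index(size, num_rings):
    """Assign every FFT bin of a size x size grid to a radial ring (assumed banding)."""
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    edges = np.linspace(0.0, radius.max(), num_rings + 1)
    # digitize yields 1..num_rings+1 here; shift and clip to valid ring ids
    return np.clip(np.digitize(radius, edges) - 1, 0, num_rings - 1)


def cns_shaped_noise(white, gamma_per_ring):
    """Rescale white noise per frequency bin using beta = sqrt(deficit / mean deficit)."""
    ring = radial_ring_index(white.shape[0], len(gamma_per_ring))
    deficit = 1.0 - np.asarray(gamma_per_ring, dtype=float)[ring]  # one value per bin
    beta = np.sqrt(deficit / deficit.mean())   # mean of beta**2 over bins is 1
    return np.fft.ifft2(np.fft.fft2(white) * beta).real


rng = np.random.default_rng(0)
white = rng.standard_normal((64, 64))
# Hypothetical progress: low-frequency rings resolved, high-frequency rings lagging.
shaped = cns_shaped_noise(white, [0.95, 0.6, 0.2, 0.0])
print(white.var(), shaped.var())   # both close to 1
```

Because the gain depends only on the radial frequency, it is symmetric under sign flips of the frequency, so the shaped noise remains real-valued.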

The Spectral Gap

The Spectral Gap Across Sampling Methods

Average Power Spectrum. PSDs of generated distributions vs. ground truth ImageNet. Standard ODE sampling over-generates low-frequency structures and under-generates high-frequency details, while standard SDE sampling exhibits an energy deficit across the entire spectrum.

Signed Log Error to GT. Signed log error relative to ground truth. By dynamically reallocating the injected noise budget, CNS tightly aligns the generated spectrum with the true data manifold, mitigating the spectral gap.
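A comparison of this kind can be approximated with a radially averaged power spectrum. The sketch below is a hedged illustration: the binning, the grayscale square-image input, and the definition of the signed log error as a difference of base-10 logarithms are assumptions, since the exact procedure is not specified here.

```python
import numpy as np

def radial_psd(images, num_bins=16):
    """Radially averaged power spectrum of a batch of square grayscale images.

    images: array of shape (batch, size, size).
    """
    images = np.asarray(images, dtype=float)
    size = images.shape[-1]
    power = (np.abs(np.fft.fft2(images)) ** 2).mean(axis=0)   # average over batch
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    radius = np.sqrt(fx**2 + fy**2).ravel()
    edges = np.linspace(0.0, 0.5, num_bins + 1)
    bin_id = np.digitize(radius, edges) - 1
    keep = bin_id < num_bins               # drop bins at or beyond radius 0.5
    sums = np.bincount(bin_id[keep], weights=power.ravel()[keep], minlength=num_bins)
    counts = np.bincount(bin_id[keep], minlength=num_bins)
    return sums / np.maximum(counts, 1)


def signed_log_error(psd_generated, psd_reference, eps=1e-12):
    """Assumed definition: log10(generated) - log10(reference), per frequency bin."""
    return np.log10(psd_generated + eps) - np.log10(psd_reference + eps)


rng = np.random.default_rng(0)
reference = rng.standard_normal((8, 32, 32))   # stand-in for ground-truth images
generated = 0.5 * reference                    # stand-in with a uniform energy deficit
error = signed_log_error(radial_psd(generated), radial_psd(reference))
print(error)   # every bin is close to log10(0.25), about -0.602
```

Under this definition, negative values mark frequency bands where the generated images carry less energy than the reference, and positive values mark bands with excess energy.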

Results

Class-Conditional Generation — ImageNet-256

SiT-XL/2 (250 steps) and JiT-H/16 (50 steps), Euler solver. FID-50K. Best FID per group in bold.

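| Model | Guidance | Sampler | FID ↓ |
| --- | --- | --- | --- |
| SiT-XL/2 | None | ODE | 14.39 |
| SiT-XL/2 | None | SDE | 8.26 |
| SiT-XL/2 | None | CNS | **6.27** |
| SiT-XL/2 | CFG ($w=1.5$) | ODE | 2.15 |
| SiT-XL/2 | CFG ($w=1.5$) | SDE | 2.06 |
| SiT-XL/2 | CFG ($w=1.5$) | CNS | **1.98** |
| JiT-H/16 | None | ODE | 12.41 |
| JiT-H/16 | None | SDE | 11.88 |
| JiT-H/16 | None | CNS | **8.31** |
| JiT-H/16 | CFG ($w=2.2$) | ODE | 3.92 |
| JiT-H/16 | CFG ($w=2.2$) | SDE | 2.08 |
| JiT-H/16 | CFG ($w=2.2$) | CNS | **2.03** |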
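The relative improvements implied by the table above can be checked with a few lines of arithmetic. The sketch below copies the FID values from the table and computes the percentage reduction of CNS relative to the SDE baseline; the dictionary layout and function name are illustrative choices.

```python
# FID-50K values copied from the ImageNet-256 table above.
fid = {
    ("SiT-XL/2", "None"): {"ODE": 14.39, "SDE": 8.26, "CNS": 6.27},
    ("SiT-XL/2", "CFG w=1.5"): {"ODE": 2.15, "SDE": 2.06, "CNS": 1.98},
    ("JiT-H/16", "None"): {"ODE": 12.41, "SDE": 11.88, "CNS": 8.31},
    ("JiT-H/16", "CFG w=2.2"): {"ODE": 3.92, "SDE": 2.08, "CNS": 2.03},
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers `baseline` (lower FID is better)."""
    return 100.0 * (baseline - improved) / baseline


cns_vs_sde = {
    key: relative_reduction(scores["SDE"], scores["CNS"])
    for key, scores in fid.items()
}

for (model, guidance), pct in cns_vs_sde.items():
    print(f"{model:9s} {guidance:10s} CNS vs SDE: {pct:.1f}% lower FID")
```

Relative to SDE sampling, this gives reductions of roughly 24% (SiT-XL/2) and 30% (JiT-H/16) without guidance, and roughly 4% and 2% with CFG.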

Text-to-Image Generation — FLUX (DrawBench)

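50 steps. CFG $w = 3.5$ (FLUX.1-dev) and $w = 4.0$ (FLUX.2-klein). With no retraining, CNS scores highest on human preference (ImageReward) and visual quality (AestheticScore) for both models, and matches or exceeds both baselines on semantic alignment (CLIPScore).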

Metrics: IR = ImageReward, CLIP = CLIPScore, Aes = AestheticScore (higher is better).

| Model | Sampler | IR ↑ | CLIP ↑ | Aes ↑ |
| --- | --- | --- | --- | --- |
| FLUX.1-dev | ODE | 0.965 | 0.681 | 5.787 |
| FLUX.1-dev | SDE | 0.990 | 0.689 | 5.804 |
| FLUX.1-dev | CNS | 1.012 | 0.693 | 5.812 |
| FLUX.2-klein | ODE | 0.984 | 0.735 | 5.233 |
| FLUX.2-klein | SDE | 0.924 | 0.733 | 5.291 |
| FLUX.2-klein | CNS | 1.005 | 0.735 | 5.295 |

Visual Results

SiT-XL/2 on ImageNet-256 (with CFG). For each class: ODE (top), SDE (middle), CNS (Ours) (bottom).

BibTeX

@misc{davidson2026colorednoisediffusionsampling,
      title={Colored Noise Diffusion Sampling}, 
      author={Hadar Davidson and Noam Issachar and Sagie Benaim},
      year={2026},
      eprint={2605.30332},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.30332}, 
}