Colored Noise Diffusion Sampling

Hadar Davidson, Noam Issachar, Sagie Benaim

The Hebrew University of Jerusalem

TL;DR

Standard diffusion samplers inject white noise indiscriminately, wasting their finite energy budget on frequencies that are already resolved. CNS dynamically redirects that energy to unresolved frequency bands, steering generation toward the true data manifold. It is a drop-in sampler swap that yields substantial FID reductions.

CNS overview — colored noise reallocation across frequency bands

Abstract

Diffusion models achieve state-of-the-art image synthesis, with their generative trajectories fundamentally exhibiting a spectral bias, resolving low-frequency global structures early and high-frequency fine details later. Conventional stochastic differential equation (SDE) solvers fail to account for this dynamic, naively injecting uniform white noise throughout the entire process and misusing the finite energy budget.

In this work, we establish a mathematical framework that reconsiders SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS utilizes a dynamic, timestep- and frequency-dependent schedule that more efficiently allocates injected energy toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold.

Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time sampler substitution across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, 32.39 to 26.69 on JiT-B/16, and 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance.

Temporal Progression

$\gamma(f,t)$-Matrix. Visualizes how different frequency bands resolve over the diffusion process, demonstrating the model's inherent spectral bias.

Colored Noise PSD

Power Spectral Density of Colored Noises

Power Spectral Density. Spectra of different colored noises, ranging from high-frequency-dominant blue noise, through uniform white noise, to low-frequency-dominant red noise.
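To make the color terminology concrete, here is a minimal NumPy sketch that shapes white Gaussian noise so its power spectral density follows a power law in radial frequency. The power-law form, the exponent name `alpha`, and the helper names `colored_noise` and `low_freq_fraction` are illustrative assumptions, not the spectra or code used by the authors.

```python
import numpy as np

def colored_noise(shape, alpha, rng):
    """Sample 2D Gaussian noise whose PSD scales roughly as |f|**(-alpha).

    alpha > 0: low-frequency-dominant (red); alpha = 0: white;
    alpha < 0: high-frequency-dominant (blue). Illustrative assumption only.
    """
    h, w = shape
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius[0, 0] = radius[0, 1]  # avoid 0**negative at the DC coefficient
    amplitude = radius ** (-alpha / 2.0)  # PSD is amplitude squared
    amplitude /= np.sqrt(np.mean(amplitude**2))  # unit average power
    # The filter is real and symmetric in frequency, so the result is real.
    return np.fft.ifft2(np.fft.fft2(white) * amplitude).real

def low_freq_fraction(x, cutoff=0.1):
    """Fraction of total power at radial frequencies below `cutoff`."""
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fx**2 + fy**2) < cutoff
    power = np.abs(np.fft.fft2(x)) ** 2
    return power[mask].sum() / power.sum()

rng = np.random.default_rng(0)
red = colored_noise((64, 64), 1.0, rng)
white = colored_noise((64, 64), 0.0, rng)
blue = colored_noise((64, 64), -1.0, rng)
for name, sample in [("red", red), ("white", white), ("blue", blue)]:
    print(name, round(low_freq_fraction(sample), 4), round(sample.var(), 3))
```

Normalizing the filter to unit average power keeps the expected per-pixel variance at 1 for every color, so only the distribution of energy across frequencies changes.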

Method

1. Spectral Bias

Diffusion models resolve frequencies in order — global structure first, fine details last. We track each band's progress with $\gamma(f,t) \in [0,1]$: fully resolved when $\gamma \to 1$.

2. Fixed Energy Budget

SDE solvers operate with a bounded total injected noise energy. Standard white-noise injection spreads it uniformly — wasting budget on bands already built, creating a spectral gap.

3. Targeted Reallocation

CNS dynamically routes energy away from resolved bands into lagging ones. The schedule is strictly variance-preserving — no model retraining, no out-of-distribution states.

CNS Noise Scaling Schedule

$$ \beta_{f}(t) = \frac{\sqrt{1-\gamma_{f}(t)}}{\sqrt{\dfrac{1}{D}\sum_{f^{\prime}}(1-\gamma_{f^{\prime}}(t))}} $$

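$\gamma_f(t)$ (written $\gamma(f,t)$ above) is the progress index of frequency band $f$ at timestep $t$. The denominator is the root of the mean deficit $1-\gamma_{f^{\prime}}(t)$, so, taking $D$ to be the number of terms in the sum, $\frac{1}{D}\sum_{f}\beta_{f}(t)^{2} = 1$ and the total injected energy is conserved. Bands with a larger structural deficit receive proportionally more energy.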
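The schedule can be sketched in a few lines of NumPy. The block below evaluates $\beta_f(t)$ for a toy set of progress values and then applies it to white noise, one scale per Fourier coefficient. The progress values, the linear radial `gamma_map`, and the function names `cns_scale` and `shape_noise` are made up for illustration; how the paper estimates $\gamma_f(t)$ and where the scaled noise enters the solver step are not shown here.

```python
import numpy as np

def cns_scale(gamma):
    """beta_f = sqrt(1 - gamma_f) / sqrt(mean over f' of (1 - gamma_f'))."""
    deficit = 1.0 - np.asarray(gamma, dtype=float)
    mean_deficit = deficit.mean()
    if mean_deficit <= 0:
        raise ValueError("all bands fully resolved; the scale is undefined")
    return np.sqrt(deficit / mean_deficit)

def shape_noise(white, gamma_map):
    """Rescale each Fourier coefficient of `white` by its beta.

    `gamma_map` holds one progress value per coefficient and is assumed to be
    symmetric under f -> -f, which keeps the shaped noise real-valued.
    """
    beta = cns_scale(gamma_map)
    return np.fft.ifft2(np.fft.fft2(white) * beta).real

# Toy progress values for three bands (assumed, not taken from the paper).
gamma = np.array([0.95, 0.60, 0.20])
beta = cns_scale(gamma)
print(beta, np.mean(beta**2))  # mean of beta**2 is 1 by construction

# Toy per-coefficient progress map: low frequencies are nearly resolved.
h = w = 64
fy = np.fft.fftfreq(h)[:, None]
fx = np.fft.fftfreq(w)[None, :]
radius = np.sqrt(fx**2 + fy**2)
gamma_map = 0.9 * (1.0 - radius / radius.max())

rng = np.random.default_rng(0)
white = rng.standard_normal((h, w))
shaped = shape_noise(white, gamma_map)
print(white.var(), shaped.var())  # close to each other
```

Because the mean of $\beta_f^2$ is 1, the shaped noise keeps roughly the same total variance as the white noise it was built from; only its distribution over frequencies changes.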

How Noise Becomes Image Structure

Initial noise is not discarded — it maps into the final image. ODEs preserve it strongly; stochastic methods retain a significant portion.

Injected noise is not temporary perturbation — it actively shapes final features. CNS selectively routes this signal into higher frequency bands, where it is needed most.

The Spectral Gap

PSDs of generated distributions vs. ground truth ImageNet. Standard ODE sampling over-generates low-frequency structures and under-generates high-frequency details, while standard SDE sampling exhibits an energy deficit across the entire spectrum.

Signed log error relative to ground truth. By dynamically reallocating the injected noise budget, CNS tightly aligns the generated spectrum with the true data manifold, mitigating the spectral gap.
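A comparison of this kind can be sketched as follows, assuming the PSD is averaged radially over a batch of single-channel images and the signed error is the base-10 log ratio of generated to reference power. The binning, the log base, and the names `radial_psd` and `signed_log_error` are assumptions; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def radial_psd(images, n_bins=16):
    """Radially averaged power spectrum of a batch shaped (N, H, W)."""
    images = np.asarray(images, dtype=float)
    _, h, w = images.shape
    # Average the per-coefficient power over the batch.
    power = np.mean(np.abs(np.fft.fft2(images, norm="ortho")) ** 2, axis=0)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2).ravel()
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    idx = np.digitize(radius, edges) - 1
    # Coefficients at or beyond radial frequency 0.5 are dropped.
    keep = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    return sums / np.maximum(counts, 1)

def signed_log_error(psd_gen, psd_ref, eps=1e-12):
    """log10(generated / reference): positive is excess energy, negative a deficit."""
    return np.log10((psd_gen + eps) / (psd_ref + eps))

# Synthetic check: halving the amplitude quarters the power in every band.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 32, 32))
err = signed_log_error(radial_psd(0.5 * ref), radial_psd(ref))
print(err)
```

In this convention, a curve below zero across all bands would correspond to the energy deficit described for standard SDE sampling, and a curve that is positive at low frequencies and negative at high ones to the ODE behavior.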

Results

Class-Conditional Generation — ImageNet-256

SiT-XL/2 (250 steps) and JiT-H/16 (50 steps), Euler solver. FID-50K. CNS attains the best FID in every group.

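Model	Guidance	Sampler	FID ↓
SiT-XL/2	None	ODE	14.39
SiT-XL/2	None	SDE	8.26
SiT-XL/2	None	CNS	6.27
SiT-XL/2	CFG ($w=1.5$)	ODE	2.15
SiT-XL/2	CFG ($w=1.5$)	SDE	2.06
SiT-XL/2	CFG ($w=1.5$)	CNS	1.98
JiT-H/16	None	ODE	12.41
JiT-H/16	None	SDE	11.88
JiT-H/16	None	CNS	8.31
JiT-H/16	CFG ($w=2.2$)	ODE	3.92
JiT-H/16	CFG ($w=2.2$)	SDE	2.08
JiT-H/16	CFG ($w=2.2$)	CNS	2.03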
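The relative improvements over SDE sampling follow directly from the table. The short script below copies the FID values above and computes the percentage reduction of CNS relative to the SDE baseline for each group; the dictionary layout and the name `relative_reduction` are only for illustration.

```python
# FID values copied from the table above.
fid = {
    ("SiT-XL/2", "None"): {"ODE": 14.39, "SDE": 8.26, "CNS": 6.27},
    ("SiT-XL/2", "CFG w=1.5"): {"ODE": 2.15, "SDE": 2.06, "CNS": 1.98},
    ("JiT-H/16", "None"): {"ODE": 12.41, "SDE": 11.88, "CNS": 8.31},
    ("JiT-H/16", "CFG w=2.2"): {"ODE": 3.92, "SDE": 2.08, "CNS": 2.03},
}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    key: relative_reduction(row["SDE"], row["CNS"]) for key, row in fid.items()
}
for (model, guidance), pct in reductions.items():
    print(f"{model:9s} {guidance:10s} CNS vs SDE: {pct:.1f}% lower FID")
```

Without guidance the reduction is about 24.1% on SiT-XL/2 and 30.1% on JiT-H/16; with CFG the gap narrows to about 3.9% and 2.4%, respectively.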

Text-to-Image Generation — FLUX (DrawBench)

50 steps. CFG $w = 3.5$ (FLUX.1-dev) and $w = 4.0$ (FLUX.2-klein). With no retraining, CNS matches or exceeds both baselines on human preference, semantic alignment, and visual quality.

Metrics: IR = ImageReward, CLIP = CLIPScore, Aes = AestheticScore (higher is better).

Model	Sampler	IR ↑	CLIP ↑	Aes ↑
FLUX.1-dev	ODE	0.965	0.681	5.787
FLUX.1-dev	SDE	0.990	0.689	5.804
FLUX.1-dev	CNS	1.012	0.693	5.812
FLUX.2-klein	ODE	0.984	0.735	5.233
FLUX.2-klein	SDE	0.924	0.733	5.291
FLUX.2-klein	CNS	1.005	0.735	5.295

Visual Results

SiT-XL/2 on ImageNet-256 (with CFG). For each class: ODE (top), SDE (middle), CNS (Ours) (bottom).

BibTeX

@misc{davidson2026colorednoisediffusionsampling,
      title={Colored Noise Diffusion Sampling}, 
      author={Hadar Davidson and Noam Issachar and Sagie Benaim},
      year={2026},
      eprint={2605.30332},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.30332}, 
}