Standard diffusion samplers inject white noise uniformly, wasting their finite energy budget on frequencies that are already resolved. CNS dynamically redirects that energy to unresolved frequency bands, steering generation toward the true data manifold. It is a drop-in sampler swap that yields substantial FID reductions.
Abstract
Diffusion models achieve state-of-the-art image synthesis, yet their generative trajectories exhibit a spectral bias: low-frequency global structure is resolved early and high-frequency fine detail later. Conventional stochastic differential equation (SDE) solvers do not account for this dynamic; they inject uniform white noise throughout the entire process and thereby misuse the finite energy budget.
In this work, we establish a mathematical framework that recasts SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS uses a dynamic, timestep- and frequency-dependent schedule that allocates injected energy more efficiently toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold.
Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time sampler substitution across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, improving from 8.26 to 6.27 on SiT-XL/2, 32.39 to 26.69 on JiT-B/16, and 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance.
Temporal Progression
$\gamma(f,t)$-Matrix. Visualizes how different frequency bands resolve over the diffusion process, demonstrating the model's inherent spectral bias.
Colored Noise PSD
Power Spectral Density. Spectra of different colored noises, transitioning from high-frequency-dominant blue noise, through uniform white noise, to low-frequency-dominant red noise.
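To make the colored-noise spectra concrete, the following is a minimal NumPy sketch that shapes white Gaussian noise in the Fourier domain so that its PSD follows a power law $f^{\alpha}$. The function name `colored_noise` and the power-law parameterization by `alpha` are assumptions chosen for illustration; they are not taken from the CNS implementation.

```python
import numpy as np

def colored_noise(shape, alpha, rng=None):
    """Sample 2D Gaussian noise whose PSD scales roughly as f**alpha.

    alpha < 0: red noise (low frequencies dominate)
    alpha = 0: white noise (flat spectrum)
    alpha > 0: blue noise (high frequencies dominate)
    Illustrative parameterization only, not the CNS schedule.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius[0, 0] = radius[0, 1]  # avoid 0**negative at the DC bin
    amplitude = radius ** (alpha / 2.0)  # PSD is the squared amplitude
    white = rng.standard_normal(shape)
    # The filter is real and symmetric in frequency, so the result is real.
    shaped = np.fft.ifft2(np.fft.fft2(white) * amplitude).real
    return shaped / shaped.std()  # rescale to unit sample variance

red = colored_noise((64, 64), alpha=-2.0, rng=np.random.default_rng(0))
blue = colored_noise((64, 64), alpha=2.0, rng=np.random.default_rng(0))
```

Both samples have unit variance; only the distribution of that variance across frequencies differs.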
Method
Spectral Bias
Diffusion models resolve frequencies in order — global structure first, fine details last. We track each band's progress with $\gamma(f,t) \in [0,1]$: fully resolved when $\gamma \to 1$.
Fixed Energy Budget
SDE solvers operate with a bounded total injected noise energy. Standard white-noise injection spreads it uniformly — wasting budget on bands already built, creating a spectral gap.
Targeted Reallocation
CNS dynamically routes energy away from resolved bands into lagging ones. The schedule is strictly variance-preserving — no model retraining, no out-of-distribution states.
$\gamma(f,t)$ is the progress index of frequency band $f$ at timestep $t$. The schedule's normalizing denominator ensures that total injected energy is conserved. Bands with a higher structural deficit receive proportionally more energy.
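The reallocation described above can be sketched as follows. This is a hedged illustration, not the paper's schedule: it assumes the structural deficit of a band is one minus its progress index, and that the normalization divides by the mean deficit over all 2D frequencies. The function name `reallocated_noise` and the radial band layout are likewise hypothetical.

```python
import numpy as np

def reallocated_noise(gamma, shape, rng=None, eps=1e-8):
    """Shape white noise so bands with larger deficit get more energy.

    gamma: per-band progress in [0, 1], ordered from low to high frequency.
    Assumption (not from the paper): deficit = 1 - gamma, normalized to
    mean 1 over the frequency grid so expected total energy matches
    unit-variance white noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = np.asarray(gamma, dtype=float)
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    # Assign every 2D frequency to one of len(gamma) radial bands.
    num_bands = gamma.size
    band = np.minimum((radius / radius.max() * num_bands).astype(int),
                      num_bands - 1)
    deficit = np.clip(1.0 - gamma, 0.0, 1.0)[band] + eps
    energy = deficit / deficit.mean()  # mean 1: energy budget conserved
    white = rng.standard_normal(shape)
    colored = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(energy)).real
    return colored, energy

# Low bands resolved, high bands lagging: energy moves to high frequencies.
noise, energy = reallocated_noise([1.0, 1.0, 0.0, 0.0], (64, 64),
                                  rng=np.random.default_rng(0))
```

Because the per-frequency multipliers average to one, the per-pixel variance of the shaped noise stays at one in expectation. With equal progress in every band the multipliers are all one and the sketch reduces to plain white noise.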
How Noise Becomes Image Structure
Initial noise is not discarded — it maps into the final image. ODEs preserve it strongly; stochastic methods retain a significant portion.
Injected noise is not temporary perturbation — it actively shapes final features. CNS selectively routes this signal into higher frequency bands, where it is needed most.
The Spectral Gap
PSDs of generated distributions vs. ground truth ImageNet. Standard ODE sampling over-generates low-frequency structures and under-generates high-frequency details, while standard SDE sampling exhibits an energy deficit across the entire spectrum.
Signed log error relative to ground truth. By dynamically reallocating the injected noise budget, CNS tightly aligns the generated spectrum with the true data manifold, mitigating the spectral gap.
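A comparison of this kind can be approximated with a radially averaged PSD and a signed log ratio. The sketch below is an assumption about how such curves might be computed (grayscale image batches, linear radial bins, base-10 logarithm); the helper names `radial_psd` and `signed_log_error` are illustrative and the paper's exact estimator may differ.

```python
import numpy as np

def radial_psd(images, num_bins=32):
    """Radially averaged power spectral density of a batch (n, h, w)."""
    images = np.asarray(images, dtype=float)
    _, h, w = images.shape
    # Average the 2D power spectrum over the batch.
    power = (np.abs(np.fft.fft2(images)) ** 2).mean(axis=0) / (h * w)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2).ravel()
    edges = np.linspace(0.0, 0.5, num_bins + 1)
    idx = np.digitize(radius, edges) - 1
    keep = idx < num_bins  # drop frequencies at or beyond 0.5 cycles/pixel
    sums = np.bincount(idx[keep], weights=power.ravel()[keep],
                       minlength=num_bins)
    counts = np.bincount(idx[keep], minlength=num_bins)
    return sums / np.maximum(counts, 1)

def signed_log_error(psd_generated, psd_reference, eps=1e-12):
    """Positive where the generated spectrum has excess energy,
    negative where it has a deficit."""
    return np.log10(psd_generated + eps) - np.log10(psd_reference + eps)

rng = np.random.default_rng(0)
reference = rng.standard_normal((8, 32, 32))
generated = 0.5 * reference  # toy stand-in with a uniform energy deficit
err = signed_log_error(radial_psd(generated, 16), radial_psd(reference, 16))
```

In this toy case every band of `err` is negative by the same amount, which mirrors an energy deficit across the entire spectrum.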
Results
Class-Conditional Generation — ImageNet-256
SiT-XL/2 (250 steps) and JiT-H/16 (50 steps), Euler solver. FID-50K. Best FID per group in bold.
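| Model | Guidance | Sampler | FID ↓ |
|---|---|---|---|
| SiT-XL/2 | None | ODE | 14.39 |
| SiT-XL/2 | None | SDE | 8.26 |
| SiT-XL/2 | None | CNS | **6.27** |
| SiT-XL/2 | CFG ($w=1.5$) | ODE | 2.15 |
| SiT-XL/2 | CFG ($w=1.5$) | SDE | 2.06 |
| SiT-XL/2 | CFG ($w=1.5$) | CNS | **1.98** |
| JiT-H/16 | None | ODE | 12.41 |
| JiT-H/16 | None | SDE | 11.88 |
| JiT-H/16 | None | CNS | **8.31** |
| JiT-H/16 | CFG ($w=2.2$) | ODE | 3.92 |
| JiT-H/16 | CFG ($w=2.2$) | SDE | 2.08 |
| JiT-H/16 | CFG ($w=2.2$) | CNS | **2.03** |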
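
For readers who want the relative improvements rather than absolute FID values, the short calculation below derives the percentage reduction of CNS over the SDE baseline from the numbers reported in the table. The dictionary layout and the helper `relative_reduction` are illustrative choices, not part of the released code.

```python
# FID values copied from the ImageNet-256 table (SDE baseline vs. CNS).
fid = {
    ("SiT-XL/2", "None"): {"SDE": 8.26, "CNS": 6.27},
    ("SiT-XL/2", "CFG"): {"SDE": 2.06, "CNS": 1.98},
    ("JiT-H/16", "None"): {"SDE": 11.88, "CNS": 8.31},
    ("JiT-H/16", "CFG"): {"SDE": 2.08, "CNS": 2.03},
}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    key: relative_reduction(row["SDE"], row["CNS"]) for key, row in fid.items()
}
for (model, guidance), pct in reductions.items():
    print(f"{model:9s} {guidance:5s} CNS vs SDE: {pct:.1f}% lower FID")
```

The unguided reductions come out at roughly 24% for SiT-XL/2 and 30% for JiT-H/16, while the guided settings show smaller reductions of roughly 4% and 2%.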
Text-to-Image Generation — FLUX (DrawBench)
50 steps. CFG $w = 3.5$ (FLUX.1-dev) and $w = 4.0$ (FLUX.2-klein). CNS improves human preference, semantic alignment, and visual quality with no retraining.
Metrics: IR = ImageReward, CLIP = CLIPScore, Aes = AestheticScore (higher is better).
| Model | Sampler | IR ↑ | CLIP ↑ | Aes ↑ |
|---|---|---|---|---|
| FLUX.1-dev | ODE | 0.965 | 0.681 | 5.787 |
| FLUX.1-dev | SDE | 0.990 | 0.689 | 5.804 |
| FLUX.1-dev | CNS | 1.012 | 0.693 | 5.812 |
| FLUX.2-klein | ODE | 0.984 | 0.735 | 5.233 |
| FLUX.2-klein | SDE | 0.924 | 0.733 | 5.291 |
| FLUX.2-klein | CNS | 1.005 | 0.735 | 5.295 |
Visual Results
SiT-XL/2 on ImageNet-256 (with CFG). For each class: ODE (top), SDE (middle), CNS (Ours) (bottom).
BibTeX
@misc{davidson2026colorednoisediffusionsampling,
title={Colored Noise Diffusion Sampling},
author={Hadar Davidson and Noam Issachar and Sagie Benaim},
year={2026},
eprint={2605.30332},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.30332},
}