This note introduces the Bluestein algorithm, also known as the Chirp-Z transform, and outlines how it can be implemented efficiently on GPUs.
Algorithm idea
The discrete Fourier transform of a signal can be written as:
$$ X_k=\sum_{n=0}^{N-1}x(n)e^{-\frac{2i\pi}{N}nk} $$
The cross term $nk$ can be expanded through a square:
$$ (n-k)^2=n^2-2nk+k^2 => nk=-\frac{(n-k)^2-n^2-k^2}{2} $$
Substituting this into the original expression gives:
$$ e^{-\frac{2i\pi}{N}nk}=e^{\frac{2i\pi}{N}\frac{(n-k)^2-n^2-k^2}{2}}=e^{\frac{\pi i(n-k)^2}{N}}e^{-\frac{\pi in^2}{N}}e^{-\frac{\pi ik^2}{N}} $$
So the Fourier transform can be rewritten as:
$$ X_k=e^{-\frac{\pi i k^2}{N}}\sum_{n=0}^{N-1}{x(n)}e^{\frac{\pi i(n-k)^2}{N}}e^{-\frac{\pi in^2}{N}} $$
If we define $A_n=x(n)e^{\frac{\pi i n^2}{N}}$ and $B_n=e^{\frac{\pi in^2}{N}}$, then the transform becomes:
$$ X_k=B_n^{*}\sum_{n=0}^{N-1}A_nB_{n-k} $$
The last term is a convolution, which means that the DFT can be evaluated via convolution instead of direct summation.
Convolution theorem
Convolution is a standard operation in signal processing and image processing. The convolution theorem states that convolution in the time domain corresponds to multiplication in the frequency domain.
For two discrete sequences $a_n$ and $b_n$, with $\star$ denoting convolution, $F$ denoting the Fourier transform, and $F^{-1}$ the inverse transform, we have:
$$ (a_n \star b_n)=F^{-1}(F(a_n) \cdot F(b_n)) $$
This observation reduces the complexity from $O(N^2)$ to roughly $O(N\log N)$ when FFTs are used.
Zero padding
FFT-based convolution assumes periodic signals, so zero padding is needed to avoid circular convolution when the target is linear convolution. In ordinary convolution, the padded length is usually $N_1+N_2-1$. For the Bluestein algorithm, if the FFT length is $N$, the required length becomes $2N-1$. In practice we often round this up to the nearest power of two because radix-2 FFTs are more efficient.
Implementation steps
With the derivation above, the implementation becomes straightforward:
- Zero-pad the input sequence $A_n = x_ne^{-\frac{\pi j}{N}n^2}$.
- Zero-pad the chirp sequence $B_n=e^{\frac{\pi j}{N}n^2}$.
- Compute FFTs for both padded sequences.
- Multiply the transformed sequences elementwise.