This note introduces the Bluestein algorithm, also known as the Chirp-Z transform, and outlines how it can be implemented efficiently on GPUs.

Algorithm idea

The discrete Fourier transform of a signal can be written as:

$$ X_k=\sum_{n=0}^{N-1}x(n)e^{-\frac{2i\pi}{N}nk} $$

The cross term $nk$ can be expanded through a square:

$$ (n-k)^2=n^2-2nk+k^2 => nk=-\frac{(n-k)^2-n^2-k^2}{2} $$

Substituting this into the original expression gives:

$$ e^{-\frac{2i\pi}{N}nk}=e^{\frac{2i\pi}{N}\frac{(n-k)^2-n^2-k^2}{2}}=e^{\frac{\pi i(n-k)^2}{N}}e^{-\frac{\pi in^2}{N}}e^{-\frac{\pi ik^2}{N}} $$

So the Fourier transform can be rewritten as:

$$ X_k=e^{-\frac{\pi i k^2}{N}}\sum_{n=0}^{N-1}{x(n)}e^{\frac{\pi i(n-k)^2}{N}}e^{-\frac{\pi in^2}{N}} $$

If we define $A_n=x(n)e^{\frac{\pi i n^2}{N}}$ and $B_n=e^{\frac{\pi in^2}{N}}$, then the transform becomes:

$$ X_k=B_n^{*}\sum_{n=0}^{N-1}A_nB_{n-k} $$

The last term is a convolution, which means that the DFT can be evaluated via convolution instead of direct summation.

Convolution theorem

Convolution is a standard operation in signal processing and image processing. The convolution theorem states that convolution in the time domain corresponds to multiplication in the frequency domain.

For two discrete sequences $a_n$ and $b_n$, with $\star$ denoting convolution, $F$ denoting the Fourier transform, and $F^{-1}$ the inverse transform, we have:

$$ (a_n \star b_n)=F^{-1}(F(a_n) \cdot F(b_n)) $$

This observation reduces the complexity from $O(N^2)$ to roughly $O(N\log N)$ when FFTs are used.

Zero padding

FFT-based convolution assumes periodic signals, so zero padding is needed to avoid circular convolution when the target is linear convolution. In ordinary convolution, the padded length is usually $N_1+N_2-1$. For the Bluestein algorithm, if the FFT length is $N$, the required length becomes $2N-1$. In practice we often round this up to the nearest power of two because radix-2 FFTs are more efficient.

Implementation steps

With the derivation above, the implementation becomes straightforward:

  1. Zero-pad the input sequence $A_n = x_ne^{-\frac{\pi j}{N}n^2}$.
  2. Zero-pad the chirp sequence $B_n=e^{\frac{\pi j}{N}n^2}$.
  3. Compute FFTs for both padded sequences.
  4. Multiply the transformed sequences elementwise.