Fourier Transform Coefficients Of Real Valued Audio Signals

2018-02-10 - By Robert Elder

This article is effectively an appendix to the article The Fast Meme Transform: Convert Audio Into Linux Commands. In this article, we will review various properties of the coefficients that result from applying the Discrete Fourier transform to a purely real signal. This discussion will place a particular emphasis on storing as little information as possible in order to recover the original waveform.

Waveform:

Phase Shift (in radians):

Data Points Per Cycle:

Animation Speed:

Add Random Imaginary Part Noise To Input:

Purely Real Input Waveform Elements 0...N-1:

Imaginary Frequency Components 0...[N/2]	Imaginary Frequency Components [N/2]+1...N-1
Real Frequency Components 0...[N/2]	Real Frequency Components [N/2]+1...n

Low Frequency Imaginary Time Domain Contribution 0...N-1	High Frequency Imaginary Time Domain Contribution 0...N-1
Low Frequency Real Time Domain Contribution 0...N-1	High Frequency Real Time Domain Contribution 0...N-1

Purely Real Output Waveform 0...N-1 (Imaginary Part Discarded):

What Does The Above Visualization Mean?

The above diagrams are intended to provide insights into how the coefficients of the discrete Fourier transform affect the resulting output signal. The graph at the very top, titled 'Purely Real Input Waveform Elements' represents a purely real time domain signal. This could be something like the amplitude of a sound wave, for example from a PCM wave format, or it could be the amplitude of a radio signal over time. In any case, the point here is that we're particularly interested in input signals where there is no imaginary part to the input.

Underneath the input sequence graph, there are four graphs that, together, describe all of the Fourier coefficients that result from taking the Fourier transform of the input sequence. Normally, these four graphs would be displayed joined together: Either into a separate chart for the imaginary and one for the real part, or simply as a single chart that only displays magnitude. In this article, they have been separated in order to examine various meaningful effects that result from ignoring certain subsections of the Fourier transform coefficients. This is what the 'Force To Zero' buttons are for.

More formally, if we consider the general formula for the discrete Fourier transform:

$X_k = \sum_{n=0}^{N-1} x_n \cdot e^{\frac{-i2{\pi}kn}{N}}, \{X_k\} \coloneqq X_0, X_1, ... X_{N-1}$

you can see that the chart titled 'Imaginary Frequency Components 0...[n/2]' consists of the following sequence:

$X_k = \operatorname{\mathbb{I}m}\{\sum_{n=0}^{N-1} x_n \cdot e^{\frac{-i2{\pi}kn}{N}}\}, \{X_k\} \coloneqq X_0, X_1, ... X_{\frac{N}{2}}$

and the chart titled 'Real Frequency Components 0...[n/2]' consists of the following sequence:

$X_k = \operatorname{\mathbb{R}e}\{\sum_{n=0}^{N-1} x_n \cdot e^{\frac{-i2{\pi}kn}{N}}\}, \{X_k\} \coloneqq X_0, X_1, ... X_{\frac{N}{2}}$

The other two charts 'Imaginary Frequency Components [n/2]+1...N-1' and 'Real Frequency Components [n/2]+1...N-1' show basically the same thing, but for the interval

$\{X_k\} \coloneqq X_{\frac{N}{2}+1}, X_{\frac{N}{2}+2}, ... X_{N-1}$

The next four charts clearly isolate the contribution of the low frequency ( $\{X_k\} \coloneqq X_{0}, X_{1}, ... X_{\frac{N}{2}}$ ) and high frequency ( $\{X_k\} \coloneqq X_{\frac{N}{2}+1}, X_{\frac{N}{2}+2}, ... X_{N-1}$ ) contributions to the time domain representation of the original function. Clicking the 'Force To Zero' button of either real or imaginary part of the low frequency coefficients will have effects on both the real and imaginary parts of the reconstructed time domain signal (because of the complex multiply in the inverse Fourier transform). Note that the four charts that describe the high and low frequency Fourier coefficients are of length N/2 +/-1, but the four charts that show the reconstructed time domain signal are of length N.

The chart 'Low Frequency Real Time Domain Contribution 0...N-1' shows the result of calculating the following sequence:

$x_k = \operatorname{\mathbb{R}e}\{\frac{1}{N} \cdot \sum_{n=0}^{\frac{N}{2}} X_n \cdot e^{\frac{i2{\pi}kn}{N}}\}, \{x_k\} \coloneqq x_0, x_1, ... x_{N-1}$

and similarly, the other three charts depict the high frequency real part contribution, the low frequency imaginary part, and the high frequency imaginary part.

The final graph is simply the sum of the previous two graphs that show the real part time domain contributions. When none of the 'Force To Zero' buttons have been pushed, this is simply the result you'd get after taking the inverse Fourier transform of the coefficients, which should recover the original input exactly. Since we indicated that we're only considering signals that are purely real, this inverse Fourier transform should only yield real results, but in cases where we force certain coefficients to zero, this might no longer be the case. In either case, the imaginary part of the output is not displayed here to save space.

One final note: The above visualization will never set to zero the coefficients for the 0 or N/2 frequency even though they appear in the interval depicted above. This was done to make the analysis of high and low frequencies more obvious since when N is an even number, the zero and N/2 frequency does not have a matching pair in the relation $X_{N-k} = X_{-k} = \overline{X_{k}}$ . If more time was available, the above visualization would have been improved to better represent the 0 or N/2 bin, since these are really special cases that don't contribute to the symmetry we wish to discuss in this article. This is explained in more detail later in this article.

The Symmetry Of Real Valued Signals

If a signal is purely real, it has some interesting symmetries. Specifically, the discrete Fourier transform of a purely real valued sequence will result in a sequence where N-2 of the values are simply complex conjugates of each other[1][2][3]:

$X_{N-k} = X_{-k} = \overline{X_{k}}$

See Hermitian function for more details.

This symmetry is interesting because it implies that we don't actually need to remember all of the Fourier coefficients of a real valued signal in order to perfectly re-construct it. In fact, as long as we save the coefficients from N = 0 to N/2, we could just use the above relationship to get the missing coefficients back and complete the inverse Fourier transform.

What If We Just Drop Coefficients From N/2+1 to N?

Let's consider what would happen if we wanted to be even lazier and completely drop the coefficients from N/2+1 to N instead of indirectly calculating them. Let's begin with the formula for the inverse Fourier transform:

$x_k = \frac{1}{N} \sum_{n=0}^{N-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}}$

If we first consider values of N that are even, we can perform a trivial split on the inverse transform summation as follows:

${x_k = \frac{1}{N} [ \sum_{n=0}^{0} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{n=1}^{\frac{N}{2}-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{n=\frac{N}{2}}^{\frac{N}{2}} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{m=\frac{N}{2}+1}^{N-1} X_m \cdot e^{\frac{i2{\pi}km}{N}}]}$

Note that in the above expansion, the sum interval between 1 and N/2 -1 represents the 'low frequencies' and the sum interval between N/2 +1 and N-1 represents the 'high frequencies' (aka negative frequencies). Keep an eye on these terms as we'll draw some conclusions about them later. Then, for clarity's sake, let's simplify the special cases when n=0 or N/2 and stuff them into the variable C:

$C = \sum_{n=0}^{0} X_n \cdot e^{\frac{i2{\pi}kn}{N}}+\sum_{n=\frac{N}{2}}^{\frac{N}{2}} X_n \cdot e^{\frac{i2{\pi}kn}{N}}\\=X_0 \cdot e^0+X_{\frac{N}{2}} \cdot e^{\frac{i2{\pi}kN}{2N}}\\=X_0+X_{\frac{N}{2}} \cdot [cos(2{\pi}k) + isin(2{\pi}k)]\\=X_0+X_{\frac{N}{2}} \cdot cos(2{\pi}k)$

We then have:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{m=\frac{N}{2}+1}^{N-1} X_m \cdot e^{\frac{i2{\pi}km}{N}}]}$

Then consider that we can express the last summation formula as a count down from N-1 instead of a count up to N-1 and perform the substitution m = N - n:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{n=1}^{\frac{N}{2}-1} X_{N-n} \cdot e^{\frac{i2{\pi}k(N-n)}{N}}]}$

Now, remember that $X_{N-k} = X_{-k} = \overline{X_{k}}$ for purely real signals:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{n=1}^{\frac{N}{2}-1} \overline{X_{n}} \cdot e^{\frac{i2{\pi}k(N-n)}{N}}]}$

And perform further simplifications noting that e raised to a power that is an integer multiple of $i2\pi$ will be one:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \sum_{n=1}^{\frac{N}{2}-1} \overline{X_{n}} \cdot e^{i2{\pi}k} \cdot e^{\frac{-i2{\pi}kn}{N}}]}$
${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} [ X_n \cdot e^{\frac{i2{\pi}kn}{N}} + \overline{X_{n}} \cdot e^{\frac{-i2{\pi}kn}{N}} ]]}$

If we then decompose $X_n$ and $\overline{X_n}$ into real and imaginary parts, plus the complex the exponential:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} [ [\mathbb{R}e(X_n) + i\mathbb{I}m(X_n)] \cdot [cos(\frac{2{\pi}kn}{N}) + isin(\frac{2{\pi}kn}{N})] + [\mathbb{R}e(X_n) - i\mathbb{I}m(X_n)] \cdot [cos(-\frac{2{\pi}kn}{N}) + isin(-\frac{2{\pi}kn}{N})]]]}$

If we then note that $cos(-x) = cos(x)$ and $sin(-x) = -sin(x)$ we get:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} [ 2 \cdot [ \mathbb{R}e(X_n) \cdot cos(\frac{2{\pi}kn}{N}) - \mathbb{I}m(X_n) \cdot sin(\frac{2{\pi}kn}{N}) ] + [\mathbb{R}e(X_n) \cdot sin(\frac{2{\pi}kn}{N}) - \mathbb{I}m(X_n) \cdot cos(\frac{2{\pi}kn}{N}) ] {\cdot} i - [\mathbb{R}e(X_n) \cdot sin(\frac{2{\pi}kn}{N}) - \mathbb{I}m(X_n) \cdot cos(\frac{2{\pi}kn}{N}) ] {\cdot} i]]}$

And after cancelling the imaginary parts, this simplifies to:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\frac{N}{2}-1} [ 2 \cdot [ \mathbb{R}e(X_n) \cdot cos(\frac{2{\pi}kn}{N}) - \mathbb{I}m(X_n) \cdot sin(\frac{2{\pi}kn}{N}) ] ]]}$

A similar analysis can be done when N is odd, but in this case the 'middle' frequency around N/2 now has a matching frequency through the $X_{N-k} = X_{-k} = \overline{X_{k}}$ relationship and it can be shown that the term C will be simply $C = X_0$ .

The resulting final simplification for N odd will be:

${x_k = \frac{1}{N} [ C + \sum_{n=1}^{\lfloor\frac{N}{2}\rfloor} [ 2 \cdot [ \mathbb{R}e(X_n) \cdot cos(\frac{2{\pi}kn}{N}) - \mathbb{I}m(X_n) \cdot sin(\frac{2{\pi}kn}{N}) ] ]]}$

Now that we've gone through all of the math, we should be able to review the calculations above and consider how the result would change if we simply didn't even include the 'high' frequencies (aka negative frequencies) in the calculation of the inverse Fourier transform of a purely real signal. It should be possible to make the following observations:

1) The amplitude of the real part will be half of what it would have been if the high (aka negative) frequency contribution had been kept.
2) There will be a lingering imaginary part that would have otherwise been cancelled with the time domain contribution of the high (aka negative) frequencies.

Fact 2) isn't really a problem if we know that we're reconstructing purely real signals since we know that we can safely throw away any imaginary part time domain contribution. Fact 1) isn't really a problem either because we know that the overall shape of the waveform will be preserved and that we can simply multiply by two to achieve the amplitude of the original waveform.

In conclusion, if we want to accurately re-construct the time domain signal from the Fourier coefficients of a purely real signal, we can do so as long as we keep all of the Fourier coefficients up to $\lceil \frac{N}{2} \rceil$ . Due to the symmetry $X_{N-k} = X_{-k} = \overline{X_{k}}$ , the higher (aka negative) frequencies contain redundant information and can be derived from the lower frequencies. Furthermore, we don't even need to include the calculated contribution of the higher (aka negative) frequencies in the inverse Fourier transform because their net contribution to the real part of the output is identical to the contribution from the lower frequencies. $X_0$ is a special case that must always be included, and $\frac{N}{2}$ must always be included when N is even.

Sampled Audio And Band-limiting

If a signal is band-limited then it can be sampled and fully recovered only when observing the Nyquist Criterion[4] This is relevant to our consideration of which coefficients are important when re-constructing audio examples when we consider that some of the most commonly encountered audio sample rates are around 44khz to 48khz. The highest frequencies that humans can hear are around 20 to 24khz[5], and applying the Nyquist sampling Criterion to this would require a sampling rate of around 40khz to 48hkz. Given this sampling rate, analysis of the audio samples would produce frequency bins representing frequencies in the 24khz to 48khz range, but given that these are above the range of human hearing anyway omitting them shouldn't cause any audible difference.

What If We Just Drop Imaginary Parts From Coefficients?

The real part of coefficients quantifies the 'evenness' of the signal, and the imaginary part quantifies the 'oddness' of the signal. In the context of analysing frequencies found in audio, if all of the samples we considered always consisted of even function (cos for example), then discarding the imaginary parts would have no effect, since the original signal could be completely expressed in terms of even functions. However, in reality, whatever audio interval you analyze likely consists of cos functions that can be shifted 90 degrees to produce a sin function which is odd! You can see this in the example diagram above if you set it to play a cos wave with changing phase shift and in addition use the 'Force To Zero' buttons on both imaginary parts. When the cos function is in-phase, you get exactly the same signal. As it shifts toward being 90 degrees out of phase, the re-constructed signal gets attenuated to zero. As you go past 90 degrees to 180 degrees it gets inverted. After 180 degrees it attenuates back again, then after 270 degrees becomes a cos wave again.

Therefore, to address the question again: What if you simply drop the imaginary part of your stored Fourier transform coefficients? In this case, whenever your signal has frequencies that are in-phase with a cosine function, they will be unmodified. Any frequencies that are out of phase with a perfect cosine wave will tend to be attenuated as they approach 90 degrees out of phase, and will become inverted as they approach 180 degrees out of phase. The human ear is not particularly sensitive to phase information in audio, so whenever the phase shift is 0 or 180, there shouldn't be a perceptible difference in the tone. As the phase shift approaches 90 or 270 degrees, there should be a noticeable decrease in volume until it becomes inaudible for that particular frequency. Furthermore, complex sounds are made up of many frequencies, so these effects also create the opportunity to cause complex constructive and destructive interference effects. This will obviously cause distortion and inaccurate re-production of the original signal, but in practice I have found it to produce audio which is actually still quite understandable. In this case, if changes are made in the frequency domain in an ad-hoc manner, further work will be needed to be done in order to prevent popping noises due to endpoint discontinuities. A method such as overlap add or overlap save can address this problem.

What Are Negative Frequencies In The Fourier Transform?

Throughout the literature on the subject and in this document, the frequency bins from N/2 to N-1 are commonly referred to as the 'negative' frequencies. This interpretation of these frequencies as being negative is accurate since you can simply plug them into the sin or cos function with a negative sign and everything will work out mathematically. It is also correct to consider them as though they were positive frequencies too. In reality, the Discrete Fourier transform can't tell the difference. If you take the discrete Fourier transform of a sequence of N values, the contribution of each bin $X_k$ represents the contribution of a signal that oscillates with a frequency of $k + x \cdot N$ times over the interval where x is an integer. Therefore, if you had $N = 5$ , the contributions from frequency bin 4 could indicate the presence of a signal that oscillates with a frequency of 4 times per interval, 9 times per interval, or even -1 times per interval. The significance of a negative frequency would depend on the context, but it could have a physical meaning like rotation of an object where forward frequencies represent clockwise rotation and negative frequencies represent counter-clockwise. You may have watched a video of a car driving at high speed where it looks like the wheels are spinning backwards even if the car is driving forward. In this case, the frame rate of the camera dictates the sample rate of the reference points we see on the wheel. If the frame rate was 24 fps (and the shutter speed was fast enough), it would be impossible to distinguish between pictures of a wheel rotating at 24hz, 240hz, or 24000hz intuitively. Therefore, you brain needs to pick one, and sometimes it guesses wrong. According to Wikipedia, this subject is called Wagon Wheel Effect

Below, is an animation that shows how this effect of ambiguity between frequencies can occur with discrete sampling:

Conclusion

In this article, we've reviewed a number of different aspects of the discrete Fourier transform coefficients and how they affect the reconstructed audio. It was found that for purely real signals, half of the information in the Fourier coefficients is redundant. In addition, it was also found that it is possible to avoid half of the calculation when performing the inverse Fourier transform of real signals. This assumes that you discard the unnecessary un-cancelled imaginary part, and in addition, it must be noted that the resulting signal will have half amplitude. Furthermore, if distortion can be tolerated, it is possible to simply ignore the imaginary parts of the coefficients since the effect on the reconstructed audio will be to cause phase shifts and volume changes.

As a final note, this article represents the limits of my knowledge on this subject, and if you find any mistakes you can email me as info@robertelder.org.

References

[1] http://northstar-www.dartmouth.edu/doc/idl/html_6.2/Displaying_FFT_Results.html

[2] https://www.dsprelated.com/freebooks/sasp/Symmetry_DTFT_Real_Signals.html

[3] https://en.wikipedia.org/wiki/Discrete_Fourier_transform#The_real-input_DFT

[4] https://www.maximintegrated.com/en/app-notes/index.mvp/id/3628

[5] https://en.wikipedia.org/wiki/Hearing_range

Overlap Add, Overlap Save Visual Explanation Published 2018-02-10	Buy Now -> Alan Turing, Ada Lovelace, Kurt Gödel Portaits	Using Fourier Transforms To Multiply Numbers - Interactive Examples Published 2019-01-10	Endpoint Discontinuities and Spectral Leakage Published 2018-02-10
The Regular Expression Visualizer, Simulator & Cross-Compiler Tool Published 2020-07-09	Why Is It so Hard to Detect Keyup Event on Linux? Published 2019-01-10	Myers Diff Algorithm - Code & Interactive Visualization Published 2017-06-07	Interfaces - The Most Important Software Engineering Concept Published 2016-02-01

Why Bother Subscribing?

Free Software/Engineering Content. I publish all of my educational content publicly for free so everybody can make use of it. Why bother signing up for a paid 'course', when you can just sign up for this email list?
Read about cool new products that I'm building. How do I make money? Glad you asked! You'll get some emails with examples of things that I sell. You might even get some business ideas of your own :)
People actually like this email list. I know that sounds crazy, because who actually subscribes to email lists these days, right? Well, some do, and if you end up not liking it, I give you permission to unsubscribe and mark it as spam.