This page describes how to perform some basic sound processing functions in Julia. We’ll be using the WAV.jl module to read a wav file, and the PlotlyJS module for plotting. These modules can be installed as follows:
Pkg.add("WAV")
Pkg.add("PlotlyJS")
once the modules are installed, we import them it with the following statement:
using WAV, PlotlyJS
Next we read in a wav file. You can download it here 440_sine.wav, it contains a complex tone with a 440 Hz fundamental frequency (F0) plus noise.
snd, sampFreq = wavread('440_sine.wav')
The wav file has two channels and 5060 sample points
> size(snd)
(5292, 2)
considering the sampling rate (sampFreq
= 44110) this corresponds to a duration of 120 ms
> 5292 / sampFreq
0.12
we’ll select and work with only one of the channels from now onwards
s1 = snd[:,1]
Unfortunately there is not an immediate way of listening to the sound directly from julia at the time I’m writing this post.
Plotting the Tone
A time representation of the sound can be obtained by plotting the pressure values against the time axis. However, we need to create an array containing the time points first:
timeArray = (0:(5292-1)) / sampFreq
timeArray = timeArray * 1000 #scale to milliseconds
now we can plot the tone
plot(scatter(;x=timeArray, y=s1),
Layout(xaxis_title="Time (ms)",
xaxis_zeroline=false,
xaxis_showline=true,
xaxis_mirror=true,
yaxis_title="Amplitude",
yaxis_zeroline=false,
yaxis_showline=true,
yaxis_mirror=true))
Plotting the Frequency Content
Another useful graphical representation is that of the frequency content, or spectrum of the tone. We can obtain the frequency spectrum of the sound using the fft
function, that implements a Fast Fourier Transform algorithm. We’ll follow closely the technical document available here to obtain the power spectrum of our sound.
n = length(s1)
p = fft(s1)
notice that compared to the technical document, we didn’t specify the number of points on which to take the fft
, the fft
is computed on the number of points of the signal n
. Since we’re not using a power of two the computation will be a bit slower, but for signals of this duration this is negligible.
nUniquePts = ceil(Int, (n+1)/2)
p = p[1:nUniquePts]
p = abs(p)
the fourier transform of the tone returned by the fft
function contains both magnitude and phase information and is given in a complex representation (i.e. returns complex numbers). By taking the absolute value of the fourier transform we get the information about the magnitude of the frequency components.
p = p / n #scale
p = p.^2 # square it
# odd nfft excludes Nyquist point
if n % 2 > 0
p[2:length(p)] = p[2:length(p)]*2 # we've got odd number of points fft
else
p[2: (length(p) -1)] = p[2: (length(p) -1)]*2 # we've got even number of points fft
end
freqArray = (0:(nUniquePts-1)) * (sampFreq / n)
plot(scatter(;x=freqArray/1000, y=10*log10(p)),
Layout(xaxis_title="Frequency (kHz)",
xaxis_zeroline=false,
xaxis_showline=true,
xaxis_mirror=true,
yaxis_title="Power (dB)",
yaxis_zeroline=false,
yaxis_showline=true,
yaxis_mirror=true))
The resulting plot can bee seen below, notice that we’re plotting the power in decibels by taking 10*log10(p)
, we’re also scaling the frequency array to kilohertz by dividing it by 1000.
To confirm that the value we have computed is indeed the power of the signal, we’ll also compute the root mean square (rms) of the signal. Loosely speaking the rms can be seen as a measure of the amplitude of a waveform. If you just took the average amplitude of a sinusoidal signal oscillating around zero, it would be zero since the negative parts would cancel out the positive parts. To get around this problem you can square the amplitude values before averaging, and then take the square root (notice that squaring also gives more weight to the extreme amplitude values):
>>> rms_val = sqrt(mean(s1.^2))
>>> rms_val
0.013828833374606244
since the rms is equal to the square root of the overall power of the signal, summing the power values calculated previously with the fft
over all frequencies and taking the square root of this sum should give a very similar value
>>> sqrt(sum(p))
0.013828833374606244
References
- Hartmann, W. M. (1997), Signals, Sound, and Sensation. New York: AIP Press
- Plack, C.J. (2005). The Sense of Hearing. New Jersey: Lawrence Erlbaum Associates.
- The MathWorks support. Technical notes 1702, available: https://web.archive.org/web/20120615002031/http://www.mathworks.com/support/tech-notes/1700/1702.html
Great little article. It helped me connect the dots between theory and implementation of the dft in julia. Cheers