
Sound Resampling



introduction

We saw in the section called Real-Instrument Synthesis that there are fundamentally three ways to make sounds like a real musical instrument, and the easiest way is resampling. This is the standard synthesis technique in digital synthesizers.

You could easily write your own C code to do resampling. After you read a sound file into the computer's memory the sound will be an array of numbers that each represent the amplitude of the waveform at successive moments in time. If you want to convert the sound to a lower pitch, then you add more samples by interpolating between the array numbers you already have (so that each wave cycle now takes longer to finish). If you want to convert the sound to a higher pitch you skip samples.

You don't have to write your own C code. There are free software programs available to help you do this. I recommend Csound as the best overall way to do synthesis on a computer. Csound already has the necessary tools for resampling, such as reading the wave file into a table, rescaling the pitch, and looping. I will show you a resampling example in Csound. But first we will use some basic programming, such as C code and shell scripts, to study the basic techniques of resampling.



reading the sound file

If you are using Csound, a method for reading the sound file is provided as one of its tools, which we will see later. If you are writing your own code and need to read one of the standard sound file formats such as WAV, there are at least three ways: (1) you can learn the binary file format, (2) you can download a sound file C library such as libsndfile and link it into your C code project, or (3) you can use a sound-processing program such as Sox or Ecasound to convert the sound format into a raw data file, then read the raw data with your C code.
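
If you go with option (2), for example, the following is a minimal sketch of how a libsndfile reader might look. The error handling is kept to a bare minimum and the program only prints what it read; link it with -lsndfile.

    /* Minimal sketch of option (2): read a sound file into memory with
       libsndfile.  Compile with:  gcc readwav.c -o readwav -lsndfile    */
    #include <sndfile.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) { fprintf(stderr, "usage: readwav file.wav\n"); return 1; }

        SF_INFO info = { 0 };
        SNDFILE *sf = sf_open(argv[1], SFM_READ, &info);
        if (sf == NULL) { fprintf(stderr, "cannot open %s\n", argv[1]); return 1; }

        /* one float per sample, all channels interleaved */
        float *samples = malloc((size_t)(info.frames * info.channels) * sizeof(float));
        if (samples == NULL) { sf_close(sf); return 1; }

        sf_count_t frames_read = sf_readf_float(sf, samples, info.frames);
        sf_close(sf);

        printf("%lld frames, %d channels, %d Hz\n",
               (long long)frames_read, info.channels, info.samplerate);
        free(samples);
        return 0;
    }

The samples array then plays the role of the sound array described above.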

In this tutorial, we will use the tools provided by the Scilab program and by Csound to read a sound file.



interpolating between samples

You have to do interpolation or else you can never change the pitch of the sound sample. Interpolation is like guessing: since there are no samples between your sound file samples, there are no real or actual values to find. You are putting in filler so that your brain doesn't notice the difference when you hear it. The best guess, and the most efficient computation, is to imagine a straight line between each pair of successive sound samples: this is called linear interpolation.

Linear interpolation is a procedure in which you find the equation for a line using two sound samples for your input data, that is, the line that connects your two sound samples. This subject is taught in high school algebra or in college at the entry level. In this tutorial, we just need to see the procedure. After you have an equation for the line, you use that equation to find the wave amplitude at the time between samples. With each new sample that you take from the sound array you must calculate a new line equation to the next sample.
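
For reference, the whole procedure collapses into one small formula: the interpolated value at x is y1 plus the slope times the distance from x1. Here is that formula as a tiny C helper (the function name linear_interp is made up for this illustration, not something from a library):

    /* Linear interpolation between two known samples (x1, y1) and (x2, y2).
       Returns the estimated amplitude at x, for x1 <= x <= x2.             */
    double linear_interp(double x1, double y1, double x2, double y2, double x)
    {
        double m = (y2 - y1) / (x2 - x1);   /* slope: rise over run              */
        return y1 + m * (x - x1);           /* same as m*x + b with b = y1 - m*x1 */
    }

The worked example below carries out exactly this calculation by hand.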

We don't need to write interpolation code for this tutorial because we already have an awesome computer program, Csound, that will do it for us. But we will feel better knowing what happens inside the interpolation, and we will understand the Csound documentation better.

Imagine that the sound samples are arranged on a graph like the following example plot, created with the Scilab commands shown below. We take a section from a sound waveform, actually just a sine wave, but fairly representative of an actual sound wave, at 440 cycles per second.

// sample rate
Fs = 44100;

// frequency
f = 440;

// the 3 sample points
N = 22073:22075;

// all the points in the curve
N2 = 22000:22100;

x = 2 * %pi * (f / Fs) * N;

tx = N / Fs;

x2 = 2 * %pi * (f / Fs) * N2;

tx2 = N2 / Fs;

y = sin(x);

// the whole curve
y2 = sin(x2);

//
// save the 3 sample points to a file
//
fprintfMat('wavint2.dat', [tx', y'], '%f');

[JPG image artwork that is a companion to the text]

This is a plot of the amplitude of the wave versus time. Now let's zoom into the waveform so that we can see just three samples. The following picture shows just three points taken from near the top of the wave peak.

[JPG image artwork that is a companion to the text]

The three points are at

x1 = 0.500522, y1 = 0.991699
x2 = 0.500544, y2 = 0.997806
x3 = 0.500567, y3 = 0.999994

Let's say we want to replay this waveform at a lower pitch, say 220 cycles per second. This is exactly half the frequency, and so the samples from this file should be output to the sound card at half the speed. But the sound card expects 44100 samples every second. So you still have to give the sound card a sample in between each of the samples from the file.

With linear interpolation we estimate what the sound sample amplitude should be in between points 1 and 2, and in between points 2 and 3. This is a lesson from algebra. With points 1 and 2 we will create the equation of a line,

f(x) = m * x + b

where

y1 = f(x1)
y2 = f(x2)

We already know where the middle x value is: it is halfway between x1 and x2.

x = x1 + (x2 - x1)/2

Then, after we know what the line equation, f(x), is, we will plug x into the function. In the equation for a line above, m is the slope of the line: the "rise over the run", that is, the change in y over the change in x.

m = (y2 - y1) / (x2 - x1)
= (0.997806 - 0.991699) / (0.500544 - 0.500522)
= 277.59091

Be careful to remember that the values of x here are the actual time in seconds, not the phase angle of the sine function.

Now solve for b in f(x) = mx + b by matching this equation, using the value of m that we found, at the known data point x1 = 0.500522, y1 = 0.991699.

y1 = f(x1)
y1 = m * x1 + b
0.991699 = 277.59091 * 0.500522 + b

Now solving for the value of b, we get

b = - 137.94866

So, now we can estimate any of the points between 1 and 2 using the equation of the line

f(x) = 277.59091 * x - 137.94866

These steps must be repeated between all the data points from the file. The C code in the file resamp.c shows how this is actually done. This is just for illustration. If you really want to do this the easy way, then use the Sox program with the speed option. And for real resampling work I would recommend that you just use the Csound program. The resampling in this tutorial will be done using Csound.
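
The file resamp.c itself is not reproduced here, but a minimal C sketch of the same procedure might look like the following. It assumes the sound has already been read into a float array, and it writes the line equation in the equivalent form y1 + slope * (x - x1). The ratio parameter is the desired pitch divided by the original pitch, so a ratio of 0.5 gives the 220 Hz example above (twice as many output samples).

    /* Minimal resampling-by-linear-interpolation sketch (not the original
       resamp.c).  "in" holds the original samples; ratio = desired pitch /
       original pitch, so ratio = 0.5 plays the sound one octave lower.     */
    #include <stddef.h>

    size_t resample_linear(const float *in, size_t n_in,
                           float *out, size_t max_out, double ratio)
    {
        size_t n_out = 0;
        double pos = 0.0;                       /* fractional position in "in" */

        if (n_in < 2)
            return 0;

        while (pos < (double)(n_in - 1) && n_out < max_out) {
            size_t i    = (size_t)pos;          /* sample to the left of pos   */
            double frac = pos - (double)i;      /* how far we are toward i+1   */
            /* straight line between in[i] and in[i+1], evaluated at pos */
            out[n_out++] = (float)(in[i] + (in[i + 1] - in[i]) * frac);
            pos += ratio;                       /* step by the pitch ratio     */
        }
        return n_out;                           /* output samples produced     */
    }

Playing the output array back at the original 44100 samples per second then gives the pitch-shifted sound.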



looping the middle part

It should be emphasized that looping is not going to work for drum sounds or anything that is not a held note. An electric guitar note with sustain would be better. A note played on a clarinet or trumpet would be better. If there is any time variation of the timbre during the note, then this will be heard during the looping, and it will sound like looping. For notes that contain time variation we must use a frequency-decomposition (additive synthesis) technique, or even physical modeling, as we saw on the Real-Instrument Synthesis page.

Let us take a real sound file downloaded from the University of Iowa Electronic Music Studios Musical Instrument Samples website, using an alto saxophone sample, AltoSax.NoVib.mf.C4B4.aiff. This is a recording of the notes between C4 and B4. The first step is to find a section with the least time variation. I choose the fifth note, between 22 and 24 seconds into the sample. I use the Sox program to cut out the piece that I want:

sox AltoSax.NoVib.mf.C4B4.aiff asn5a.wav trim 22.0 2.7

and create the file asn5a.wav, 2.7 seconds long. Now I will cut this file to get the section with no sound variation. To see what this waveform looks like I will convert the sound file into data, then plot the waveform on a graph. To get the data I use the Sox program:

sox asn5a.wav asn5a.dat

which creates the ASCII file asn5a.dat. This ASCII data file must be edited before it can be plotted. The first two lines must be deleted. Plotting this with the Gnuplot program,

plot 'asn5a.dat' with lines

results in the following graph.

[JPG image artwork that is a companion to the text]

This is not too bad. There is a little bit of time variation: the player gradually decreased the sound amplitude over the duration of the note.

Now let's try zooming in on a section of this graph between 1.14923 seconds and 1.22844 seconds (to zoom in Gnuplot, click and hold the right mouse button and draw a box; to unzoom, press the p key).

[JPG image artwork that is a companion to the text]

This waveform looks fairly uniform, although there is some variation happening with the harmonics. When you cut out a piece of sound, you want to cut it at the beginning and end of a cycle. If you don't, the result will sound like an obvious loop. This rule applies to the harmonics, as well. But you can't see the cycles in the harmonics as clearly as you can with the fundamental wave.

To take the first point on the first rising slope of this graph you can just look at the ASCII file, asn5a.dat, and find the first positive amplitude value (in the second column) at or around the time of 1.15167 seconds. Scroll down the file, and you'll find that this is line number (sample number) 50783 (we are numbering the samples from 1, not from 0). To double-check that the sample number corresponds to the correct time, divide 50783 by the sample rate, 44100:

50783 / 44100 samples per second = 1.15154 seconds

Now, at the other end of the graph, find the last sample on the rising edge but just below the zero line. This is near the time of 1.22709 seconds. Scroll down the data file to line number 54116.

This cut, from sample number 50783 to sample number 54116, will be our first trial in a lengthy effort of trial and error. As I said above, even though we have chosen, as closely as possible, the beginning and end of the natural sinusoidal variation of the waveform, this is not necessarily where the harmonics begin and end.

We can use this cut in several ways. We can create three separate sound files and loop the middle file by manually repeating that section, creating a final sound file that is longer than the original. Or we can use a program such as Csound that uses the original file in one piece and does the looping for us. The second method is the obvious choice since it is less work. The primary method in this tutorial will be to use Csound. However, you should see that it is not very hard to do this just using Sox and the standard tools on your Linux (or other unix-like) PC.

Now split the ASCII file asn5a.dat at those two sample numbers (with a text editor, or with standard tools such as head and tail), so that we have three ASCII files derived from the first cut:

asn5a1.dat (samples 1 to 50782)
asn5a2.dat (samples 50783 to 54116)
asn5a3.dat (samples 54117 to 119068)

These files can be concatenated back into one, long ASCII file at the command prompt with the following command:

cat asn5a1.dat asn5a2.dat asn5a3.dat > asn5a_123.dat

Then you need to add back the two-line header from the Sox program:

; Sample Rate 44100
; Channels 1

Now the conversion back to WAV format can be done using the Sox program by typing the following command:

sox asn5a_123.dat asn5a_123.wav

The above steps contain no looping. To get some simple looping, we can redo the cat command but repeat the middle file. There is one little detail to consider before we do that. The first column in these files is the time in seconds at which the sample occurs. If we simply repeat the middle section the same span of time will be repeated, and the Sox program will not understand it. To make the time come out right, for each extra middle section that is added we must shift the time of each sample for every file that comes after that. When the final file is created there should be no gaps or repeats in the time column. The last time in the middle file, asn5a2.dat, is 1.2270975. So, if we add an extra middle section right after it we must restart counting the time at 1.2270975 seconds. That's not hard to do since we know that each time increment is just the reciprocal of the sample rate,

1 / 44100 = 0.000023 seconds

The major thrust of this tutorial is learning looping and doing it the easy way with Csound. However, it can be done with the simple unix (Linux) command tools we already have. The time shifting and file concatenation explained above can be done with the following commands, without help from Csound.


    #
    #   looping script
    #

    #
    # beginning -- this just copies the first file
    # line-by-line into the target file
    #
    cat asn5a1.dat | awk '{tval=$1; aval=$2; print tval,aval;} \
        END {print tval > "tval.dat"}' > asn5a_loop.dat

    #
    # middle section -- repeat this line as often as needed
    #
    export TVAL=`cat tval.dat`
    cat asn5a2.dat | awk --assign tval=$TVAL 'BEGIN {tinc= 1/44100;} \
        {tval += tinc; aval=$2; print tval,aval;} END {print tval > \
        "tval.dat"}'  >> asn5a_loop.dat

    #
    # end section
    #
    export TVAL=`cat tval.dat`
    cat asn5a3.dat | awk --assign tval=$TVAL 'BEGIN {tinc= 1/44100;} \
        {tval += tinc; aval=$2; print tval,aval;} END {print tval > \
        "tval.dat"}' >> asn5a_loop.dat

        

You may look aghast upon the above lines (you need to learn how to use a computer), but this is really quite simple. Each file is piped to the AWK program, which comes as a standard tool on all unix-like systems such as Linux. There are AWK commands in the

    'BEGIN {     }  {     } END {     }'
            

part. We pass a variable to AWK using the

    --assign tval=$TVAL
            

part. The output is then sent to the same file, asn5a_loop.dat. After you enter the first two commands (the beginning and the middle), the middle command can be repeated as often as you like to make a longer and longer file by using the up-arrow key to recall it. Then enter the last command (the end). The final data file, asn5a_loop.dat, can now be converted into a WAV sound file according to the steps given above.

The result after about ten repetitions of the middle section can be heard in asn5a_loop.wav. Compare it to the original cut, asn5a.wav. This was an easy example because the original sample file was recorded for the purpose of scientific investigation.



csound resampling

The looping program of choice is Csound, and for Csound we don't even need to cut the original file into three pieces: Csound will do that work for us. If we had started with a stereo sound file, the above procedure using the text editor and Sox would have been a lot harder. With a stereo file, the looping points for each channel will not necessarily occur at the same time. But Csound has the capability to handle stereo files. The real reason for using Csound is not just because it is easier to perform resampling, but because Csound is a complete music composition program as well as a signal processing tool.

In the following example we will see how to loop the file asn5a.wav using Csound. The Csound example that I create here is based on the loscil example in the online Csound manual, so you can read there for further help. I have no affiliation with the Csound project: I am just one person who has used Csound. They have never heard of me, so you should not send email to any of them with complaints about this tutorial.

The Csound program creates a sound waveform. It can play the waveform directly to your sound card or it can create a sound file, such as a WAV file. We saw in earlier tutorials, such as creating a sound file, how to create a sine wave and write it to a file. Think of the wave plotted on a graph with the time along the horizontal axis and the wave amplitude on the vertical axis. For each time we created a wave amplitude by simply plugging the time (after converting it to a phase angle) into the sine function. We performed additive synthesis by creating individual sine waves of different frequencies and different amplitudes, then adding them together. But we could have done it a different way: we could have added the waves together point by point instead of creating each one separately and then adding. That is, we could have said, let

f(x) = A1*sin(x1) + A2*sin(x2) + A3*sin(x3)
x = 2 * pi * N / Fs
x1 = x * f1
x2 = x * f2
x3 = x * f3

and now, for each instant in time (meaning each value of x) we calculate

y = f(x)

The entire file is created sample-by-sample this way:

    sample 1:    y1  =  f(x1)
    sample 2:    y2  =  f(x2)
    sample 3:    y3  =  f(x3)
    sample 4:    y4  =  f(x4)
    sample 5:    y5  =  f(x5)
        .        .        .
        .        .        .
        .        .        .
    

This is how sound, and hence music, is created with Csound. With Csound you create your own version of the f(x) with an instrument that you write (you become a programmer) and insert into its orchestra file. You can then create other instrument functions and tell Csound to use them at different times with its score file. In later versions of Csound it is now possible to combine these two files into one, but I will show you the simpler way.
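
Before looking at the Csound files, here is the point-by-point idea written out as a small C program. The three amplitudes and frequencies are arbitrary values chosen just for this example (nothing measured from the saxophone file), and the program prints time and amplitude pairs in the same layout as the Sox .dat files used earlier.

    /* Sample-by-sample additive synthesis:
       y = A1*sin(x*f1) + A2*sin(x*f2) + A3*sin(x*f3)
       Compile with:  gcc additive.c -o additive -lm   */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double PI = 3.14159265358979;
        const double Fs = 44100.0;                       /* sample rate             */
        const double A[3] = { 0.5, 0.25, 0.125 };        /* arbitrary amplitudes    */
        const double f[3] = { 440.0, 880.0, 1320.0 };    /* arbitrary partials (Hz) */

        for (long n = 0; n < 44100; n++) {               /* one second of samples   */
            double x = 2.0 * PI * (double)n / Fs;
            double y = A[0] * sin(x * f[0])
                     + A[1] * sin(x * f[1])
                     + A[2] * sin(x * f[2]);
            printf("%f %f\n", (double)n / Fs, y);        /* time, amplitude         */
        }
        return 0;
    }

If you redirect the output to a file, add the two-line Sox header shown earlier, and run it through Sox, you get a WAV file of the result.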

We need to know what pitch, that is, what note, was played so that we can change the key with Csound. The original sound file contained notes between C4 and B4, and we cut out the fifth note: E4. What frequency is E4? There is a standard numbering system in use today for orchestral pitches and MIDI notes. Middle C is referenced as C5 for MIDI instruments, while I have seen middle C commonly referenced as C4 in traditional music theory sources. However, "middle C" has a frequency of about 261.63 Hz whether it is denoted C3, C4, or C5. Since we already have a data file, asn5a.dat, containing the sound waveform, we can see the pitch for ourselves. If we zoom in on one complete cycle of the waveform we get the following plot:

[JPG image artwork that is a companion to the text]

If we measure the time interval on the plot between the first peak and the second peak we obtain a frequency (the reciprocal of the time interval) of 330.25 Hz. This is the E note just above middle C. So now we know that the University of Iowa musical instrument samples count middle C as C4. I will refer to this note naming as the "concert note" name, as distinct from the "midi note" name.

As a basis for conversion, the following table might be helpful:

Name        Frequency    Concert Note    Midi Note    MIDI Number
middle C    261.63 Hz    C4              C5           60

What frequency should the E note above middle C be? This can be calculated. An octave is a factor of 2, so the C above middle C will have a frequency of

261.63 * 2 = 523.26

or, 523.26 Hz. There are 12 half tones in an octave, so you increase the frequency by one half tone by multiplying by 2 raised to the one-twelfth power:

    2  =  2^(1/12) * 2^(1/12) *  . . .  (12 times) * 2^(1/12)
    

Two raised to the one-twelfth power is 1.059463. If we multiply any pitch frequency by 1.059463 we will raise the pitch by one half tone. If we multiply middle C, 261.63 Hz, four times in a row by 1.059463 we will raise its pitch to E4. The answer is 329.63 Hz. This is the pitch we use in our Csound instrument as the base frequency of the note.
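
The same arithmetic can be checked with a few lines of C (the function name half_tones_up is made up for this example):

    /* Frequency of the note n half tones above a reference pitch in
       equal temperament: f = ref * 2^(n/12).
       Compile with:  gcc notes.c -o notes -lm                        */
    #include <math.h>
    #include <stdio.h>

    static double half_tones_up(double ref_hz, int n)
    {
        return ref_hz * pow(2.0, (double)n / 12.0);
    }

    int main(void)
    {
        printf("E4 = %.2f Hz\n", half_tones_up(261.63, 4));    /* about 329.63 */
        printf("C5 = %.2f Hz\n", half_tones_up(261.63, 12));   /* about 523.26 */
        return 0;
    }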

The Csound instrument file, asn5a.orc, for looping the file asn5a.wav is given below:

    ; asn5a.orc
    ;
    ; derived from the example for loscil in the Csound manual

    sr = 44100
    kr = 4410
    ksmps = 10
    nchnls = 1


    ; Instrument #1.
    instr 1
    kamp = 5000

    ; If you don't know the frequency of your audio file,
    ; set both the kcps and ibas parameters equal to 1.

    kcps = 329.63    ; the resampled pitch
    ifn = 1          ; function table "f1" in the score file
    ibas = 329.63    ; E4

    istart = 50782   ; 1 less than the 50783 we used before
    iend = 54116


    ; the 1 is the loop mode: normal forward looping between istart and iend
    a1 loscil kamp, kcps, ifn, ibas,  1,  istart, iend
    out a1
    endin
    ;
    ; end asn5a.orc
    

This is quite simple. The heart of the instrument function is the function call to loscil. The value a1 is the amplitude of the output waveform at any given time. We needed to calculate the fundamental frequency of the sound recording in order to tell Csound, or rather, tell the loscil function what the base frequency of the note is. This is the parameter ibas. But we can "play" this note back at any pitch we like using the resampling capability of Csound. This is why we use the parameter kcps above. We could play back this note as a concert A, 440 Hz, by setting the kcps parameter to 440.

In order to play this instrument, instr 1, at a certain time and for a certain duration, we need one more file: the score file. The following score file, asn5a.sco, is all we need for this example.

    ; asn5a.sco

    ; wave file input
    f1    0   0   1   "asn5a.wav"  0   0   0

    ; score lines
    i1  0.0  5.0

    e
    ;
    ; end asn5a.sco
    

In the score file listing above, the line that starts with i1 is the score section. It means that instr 1 (i1) will start at time 0.0 seconds and be played for a duration of 5.0 seconds.

The last step is to create the sound file. Use the following command to create a WAV file, asn5a_csound.wav.

csound -d -o asn5a_csound.wav -W asn5a.orc asn5a.sco


looping considerations

We have created two looping files using two different techniques: (1) manually, and (2) using Csound. These files sound good except for one thing. They contain looping noise. If you increase the amplitude to the maximum you will hear a low fluttering in the background. You can try to remove this looping noise with digital filtering, but you will expend much less work if you just try to find a different looping location in the file. We have already used our best guess by visually examining the waveform and picking the most uniform location, and so further visual inspection will not necessarily be productive.

To understand why the noise is there you need to think about the waveform as the sum of multiple sinusoidal waves. This is the subject of frequency analysis, that is, Fourier analysis, whose burning sands we will eventually have to cross, but not yet. In the above example we chose a section of the waveform that started rising from zero and ended returning to zero from below. The strategy was to pick a section that formed a "complete" sinusoid. It didn't really need to start and end at zero, as long as it ended at a place that corresponded to the starting place in terms of the sinusoid's phase angle. In a nutshell, we were trying not to create a skip in the waveform. However, we could not avoid getting skips in the harmonics, that is, the component frequencies. You don't need to understand frequencies to realize that a sound played from an instrument is a living entity that is flowing and subtly changing organically: the volume is changing, the timbre is evolving. The harmonics all combine to produce these effects. To cut out a section of this evolving entity is to necessarily interrupt a natural process.

The least looping noise will result from selecting the part of the waveform that looks the simplest. A more ambitious technique would be to compute the frequency spectrum of each candidate section with an FFT program, and from this calculate the sum of the squared frequency components. The section with the smallest sum-squared number should be the best for looping.



summary

We have seen how to do resampling in its simplest form. There is a beginning, middle and end to the sample. We try to replay the sample with different time durations by looping the middle part. We try to replay the sample with different pitches by sampling faster or slower and interpolating between the original sample points. We were introduced to several sound processing tools which can help us get a resampled sound file:

  • Sox
    • a command-line sound file editor
    • converts between sound file formats
    • can create data files that can be plotted from the sound file
  • Gnuplot
    • a powerful, stand-alone plotting program
  • Csound
    • a complete music composition and signal processing program
    • has built-in looping capability
    • has built-in pitch scaling capability



© Alfred Steffens Jr., 2007