Audio Processor

Overview

Audio processing attempts to enhance certain features in audio recordings and suppress others. An audio processor evaluates formulas that operate on audio recordings and produce new audio recordings. It does this by reading audio recordings, converting them to an internal representation referred to here as an audio signal, applying a formula that combines basic operations to these audio signals to produce a new audio signal, and converting the resulting signal to an audio recording. This document describes the basic operations, formulas, and audio recordings supported by an audio processor.

Audio Recordings and Audio Signals

WAVE files in uncompressed, pulse-code modulated (PCM) format provide the audio recordings that the audio processor handles. Each file specifies the number of samples, number of channels, sample rate, and resolution (bits per sample) of the recording. The number of channels will be either one or two, and the file will have only two chunks: a format chunk and a data chunk.

 

The audio processor may reject recordings with more than two channels or extra chunks. By convention, the first sample in the recording is the sample for time zero, and subsequent samples fall at successive multiples of the sample period, that is, at times 1/s, 2/s, 3/s, ... seconds, where s is the sample rate in samples per second.
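
The layout described above can be checked with a short reader. The following is a minimal sketch in Python (a language-neutral model, not the ACL2 implementation), assuming the canonical byte layout of a two-chunk PCM WAVE file with little-endian fields. The names parse_wave_header and sample_time are hypothetical and chosen only for illustration.

```python
import struct
from fractions import Fraction

def parse_wave_header(data: bytes) -> dict:
    """Read the format chunk of a two-chunk PCM WAVE byte string.

    Raises ValueError for anything outside the constraints in this
    document (non-PCM data, more than two channels, extra chunks).
    """
    if len(data) < 44 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if data[12:16] != b"fmt ":
        raise ValueError("expected the format chunk first")
    (fmt_size,) = struct.unpack_from("<I", data, 16)
    (audio_format, channels, rate, _byte_rate,
     block_align, bits) = struct.unpack_from("<HHIIHH", data, 20)
    if audio_format != 1:
        raise ValueError("not uncompressed PCM")
    if channels not in (1, 2):
        raise ValueError("only one or two channels are supported")
    if block_align == 0:
        raise ValueError("invalid block alignment")
    data_start = 20 + fmt_size
    if data[data_start:data_start + 4] != b"data":
        raise ValueError("expected the data chunk after the format chunk")
    (data_size,) = struct.unpack_from("<I", data, data_start + 4)
    if data_start + 8 + data_size != len(data):
        raise ValueError("extra chunks or truncated data")
    return {"channels": channels, "rate": rate, "bits": bits,
            "samples": data_size // block_align,
            "data_offset": data_start + 8}

def sample_time(k: int, rate: int) -> Fraction:
    """Time in seconds of the k-th sample (the first sample is k = 0)."""
    return Fraction(k, rate)
```

The reader rejects a file as soon as one constraint fails, which matches the option of rejecting recordings with more than two channels or extra chunks.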

The audio processor will include an operation for reading WAVE files to produce an internal representation of the information in a recording. This document refers to the entity delivered by reading a WAVE file as an audio signal. The audio processor will also include an operator to convert audio signals to recordings (that is, to write audio signals in the form of WAVE files). The operators in the binary-io-utilities teachpack may be used to read or write the byte sequence that makes up a file in WAVE format.

The internal structure used to represent audio signals is up to the designers of the audio processor. The only restriction is that the amplitudes in the recordings (that is, the values recorded for the channels in samples) must be represented in the audio signal by rational numbers.
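
One possible representation, sketched in Python for illustration only: each channel is a list of exact rationals. The sketch assumes 16-bit, little-endian, interleaved samples and scales them by 32768 so that amplitudes fall in [-1, 1). The scaling and the name decode_pcm16 are assumptions of this example, not requirements of this document.

```python
import struct
from fractions import Fraction

def decode_pcm16(payload: bytes, channels: int) -> list[list[Fraction]]:
    """Turn interleaved 16-bit PCM bytes into one list of rationals per channel."""
    count = len(payload) // 2
    values = struct.unpack("<%dh" % count, payload[:count * 2])
    # Samples are interleaved: channel 0, channel 1, channel 0, channel 1, ...
    return [[Fraction(v, 32768) for v in values[c::channels]]
            for c in range(channels)]
```

Because the amplitudes are stored as fractions, operators such as boost and overdub introduce no rounding error.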

Basic Operators

In the operator definitions below, x(c, t) denotes the amplitude of audio signal x on channel c at time t, measured in seconds.

delay

Operands

1.      d - a rational number

2.      x  - an audio signal

Result

 r - an audio signal with r(c, t) = x(c, t - d)
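
On a sampled signal, the delay can be sketched as a shift by a whole number of samples. The Python sketch below works on a single channel held as a list of amplitudes. It assumes that d is rounded to the nearest sample, that a positive delay pads the front with silence, and that a negative delay drops leading samples. These choices and the rate parameter are assumptions of the example.

```python
from fractions import Fraction

def delay(d, samples, rate):
    """Return samples shifted so that r(t) = x(t - d), for one channel."""
    shift = round(Fraction(d) * rate)  # delay expressed in whole samples
    if shift >= 0:
        # Silence fills the time before the delayed signal starts.
        return [Fraction(0)] * shift + list(samples)
    # A negative delay advances the signal; the first samples fall before time zero.
    return list(samples)[-shift:]
```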

cut

Operands

1.      d - a positive, rational number

2.      e - a positive, rational number

3.      x  - an audio signal

Result

r - an audio signal like x, but with d seconds cut from the beginning and e seconds cut from the end
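
For a single channel stored as a list, cut amounts to a slice. This Python sketch assumes d and e are rounded to whole samples and that cutting more than the signal contains yields an empty signal. Both are assumptions of the example.

```python
from fractions import Fraction

def cut(d, e, samples, rate):
    """Drop d seconds from the start and e seconds from the end of one channel."""
    start = round(Fraction(d) * rate)
    stop = len(samples) - round(Fraction(e) * rate)
    # max() keeps the slice empty when the cuts overlap.
    return list(samples[start:max(start, stop)])
```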

boost

Operands

1.      b - a positive, rational number

2.      x  - an audio signal

Result

r - an audio signal with  r(c, t) = b*x(c, t)

overdub

Operands

1.    x  - an audio signal

2.    y  - an audio signal

Result

 r - an audio signal with  r(c, t) = x(c, t) + y(c, t)
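
Both boost and overdub act on each amplitude independently. A minimal Python sketch for one channel follows. It assumes that, when the two signals differ in length, the shorter one is treated as silent past its end. That padding rule is an assumption of the example.

```python
from fractions import Fraction
from itertools import zip_longest

def boost(b, samples):
    """Scale every amplitude by b: r(t) = b * x(t)."""
    return [Fraction(b) * a for a in samples]

def overdub(xs, ys):
    """Add two channels sample by sample: r(t) = x(t) + y(t)."""
    return [a + b for a, b in zip_longest(xs, ys, fillvalue=Fraction(0))]
```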

fade-in

Operands

1.      f - a positive rational number

2.      x  - an audio signal

Result

r - an audio signal like x, but with amplitudes scaled by a ramp that goes from 0% to 100% between time zero and time f

fade-out

Operands

1.      f - a positive rational number

2.      x  - an audio signal

Result

r - an audio signal like x, but with amplitudes scaled by a ramp that goes from 100% to 0% between time (e - f) and time e, where e is the time of the last sample in the audio signal.
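
The two fades can be sketched as a gain applied to each sample. The Python sketch below assumes a linear ramp (this document says only "ramp") and works on one channel, where sample n falls at time n/rate.

```python
from fractions import Fraction

def fade_in(f, samples, rate):
    """Scale amplitudes by a linear ramp from 0 at time zero to 1 at time f."""
    f = Fraction(f)
    return [min(Fraction(1), Fraction(n, rate) / f) * a
            for n, a in enumerate(samples)]

def fade_out(f, samples, rate):
    """Scale amplitudes by a linear ramp from 1 at time (e - f) to 0 at time e."""
    f = Fraction(f)
    end = Fraction(len(samples) - 1, rate)  # e: time of the last sample
    return [min(Fraction(1), (end - Fraction(n, rate)) / f) * a
            for n, a in enumerate(samples)]
```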

fuzz

Operands

1.      h - a rational number between zero and one

2.      x  - an audio signal

Result

r - an audio signal with  r(c, t) = max(-f, min(f, x(c, t))), where f = h*g and g is the greatest magnitude of any amplitude in x
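
The clipping formula translates directly into code. This Python sketch handles one channel. For a two-channel signal, g would presumably be taken over both channels, which this example does not show.

```python
from fractions import Fraction

def fuzz(h, samples):
    """Clip amplitudes to [-f, f], where f = h * (greatest magnitude in samples)."""
    if not samples:
        return []
    g = max(abs(a) for a in samples)
    f = Fraction(h) * g
    return [max(-f, min(f, a)) for a in samples]
```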

chipmunk

Operands

1.      p - a positive rational number

2.      x  - an audio signal

Result

r - an audio signal like x, but playing at a rate that is p times faster (or slower if p is less than one)
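
One way to realize a rate change while keeping the sample rate fixed is to resample, so that r(t) = x(p*t). The Python sketch below takes the sample at or just before time p*t with no interpolation. That is an assumption of this example and a deliberately crude one. A designer might instead interpolate between samples or change the sample rate written to the recording.

```python
import math
from fractions import Fraction

def chipmunk(p, samples):
    """Resample one channel so that it plays p times faster."""
    p = Fraction(p)
    if p <= 0:
        raise ValueError("p must be positive")
    # Output sample n is taken from input position n * p, rounded down.
    count = math.ceil(Fraction(len(samples)) / p)
    return [samples[math.floor(n * p)] for n in range(count)]
```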

filter

Operands

1.    w - a sequence of 2k+1 rational numbers, w(-k), w(-k+1), ... w(0), w(1), ... w(k), where k is a positive integer

2.   x - an audio signal

Result

r - an audio signal with  r(c, t) = sum{w(j)*x(c, t+j/s) :  j = -k, -k+1, ... 0, 1, ... k}, where s is the sample rate of x

Note on filtering: x(c, m/s), m = 0, 1, 2, ... is the sequence of amplitudes specified in the wave file for channel c. So, x(c, t+j/s), j = -k, -k+1, ... 0, 1, ... k is a sequence of adjacent amplitudes from that wave file (or zeros for silence if t+j/s is outside the range of times for which the wave file specifies amplitudes of the wave form). The filter w is a window of coefficients that slides across the amplitude sequence to form a new sequence of amplitudes. At each step in the slide, the coefficients in the filter line up, one by one, with a sequence of adjacent amplitudes from the wave file. When the filter is centered over the m-th element in the amplitude sequence, the sum of the products of corresponding amplitudes and filter coefficients determines the filtered wave form's amplitude during the m-th sample period, that is, during the time between m/s seconds and (m+1)/s seconds.
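
The sliding window in the note can be sketched as follows. This Python example handles one channel, stores the filter as a list whose middle element is w(0), and treats amplitudes outside the recording as zero, as the note describes. The name apply_filter is hypothetical.

```python
from fractions import Fraction

def apply_filter(w, samples):
    """Slide the window w (2k+1 coefficients, centered on w[k]) across one channel."""
    k = len(w) // 2
    n = len(samples)

    def x(i):
        # Amplitudes outside the recording count as silence.
        return samples[i] if 0 <= i < n else Fraction(0)

    # Output sample m is the sum of w(j) * x(m + j) for j = -k .. k.
    return [sum((w[j + k] * x(m + j) for j in range(-k, k + 1)), Fraction(0))
            for m in range(n)]
```

For example, the filter 1/4, 1/2, 1/4 spreads a single spike of amplitude 4 into the amplitudes 1, 2, 1.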

equalizer

Operands

1.    b - a sequence of pairs of numbers, each of which specifies the lower and upper bounds for a range of frequencies

2.   w - a sequence of numbers, each of which specifies a boost

3.   x - an audio signal

Result

r - an audio signal like x, but with the frequency components in the j-th range of b scaled by the j-th boost in w

Note: The equalizer will use the Fourier transform to convert the audio signal from the time domain to the frequency domain, apply the appropriate frequency boosts, and then convert the result back to the time domain. Implementing a Fourier transform fast enough for practical audio processing is an engineering challenge. Part of the challenge is to determine the precision that audio processing requires, given the precision of the audio signals, to record the Fourier coefficients at that precision, and to avoid repeated computation of those coefficients. The FFT algorithm is a must, since it reduces the amount of computation from O(n^2) to O(n log n), where n is the number of amplitudes in the audio signal.
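
The transform, boost, and inverse transform steps can be sketched with a textbook recursive radix-2 FFT. This Python example is an illustration under several assumptions: it uses floating-point complex numbers rather than rationals, it requires the number of amplitudes to be a power of two, frequency bounds are in hertz and inclusive, and overlapping ranges apply their boosts cumulatively. It does not address the precision question raised in the note.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out

def equalize(bands, boosts, samples, rate):
    """Scale the frequency components in each (low, high) band by its boost."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of samples")
    spectrum = fft([complex(float(a)) for a in samples])
    for i in range(n):
        # Bins above n/2 hold the mirror-image (negative) frequencies.
        freq = min(i, n - i) * rate / n
        for (low, high), gain in zip(bands, boosts):
            if low <= freq <= high:
                spectrum[i] *= gain
    # The inverse transform needs a final division by n.
    return [v.real / n for v in fft(spectrum, inverse=True)]
```

The results are floats, so an implementation that must deliver rational amplitudes would still need a conversion step at a chosen precision.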

special-effect (or some other name chosen by the designer)

Operands

1.   x - an audio signal

2. designer's choice - There may be other operands, or maybe not.

Result

r - an audio signal with properties chosen by the designer, some kind of special processing of one or more input signals

Signal and Filter Input

The get-signal function, which may be invoked only to specify a top-level argument in an audio formula, constructs an audio signal from an audio recording. The argument of the get-signal function is a string specifying a path to an audio recording (WAVE file). The function delivers the audio signal represented by the recording.

The get-filter function, which may be invoked only to specify a top-level argument in an audio formula, constructs a filter from a filter file. The argument of the get-filter function is a string specifying a path to a filter file. A filter file contains a sequence of comma-separated decimal numerals. The function delivers the sequence of rational numbers represented by the decimal numerals in the filter file. If the file specifies a filter with no center element (that is, a filter with an even number of elements), the function appends a zero so that the filter has a center element.
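
The conversion from filter-file text to a filter can be sketched as below. This Python example takes the file contents as a string, so reading the file is left out. It assumes that whitespace around numerals is insignificant, which this document does not state. The name parse_filter is hypothetical.

```python
from fractions import Fraction

def parse_filter(text):
    """Convert comma-separated decimal numerals to a list of exact rationals."""
    w = [Fraction(tok.strip()) for tok in text.split(",") if tok.strip()]
    if len(w) % 2 == 0:
        # An even-length filter has no center element, so append a zero.
        w.append(Fraction(0))
    return w
```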

Signal Output

The put-signal function has two arguments: (1) a string specifying a path for a new audio recording, (2) an audio signal. The put-signal function, which may be invoked only at the top level in audio commands, writes an audio recording representing the audio signal specified in the second argument at the file path specified in the first argument.

 

The display-signal function has two arguments: (1) a string specifying a path for a CSV file, (2) an audio signal. The display-signal function, which may be invoked only at the top level in audio commands, writes a comma-separated file (CSV file) that spreadsheet software can display as a line graph. The CSV file contains a sequence of decimal numerals representing the sequence of amplitudes in the audio signal supplied as the second argument.
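
The CSV contents might be produced as sketched below. This Python example returns the text instead of writing a file. The layout (one row per sample, one column per channel) and the fixed number of decimal digits are assumptions of the example, since this document fixes neither.

```python
from fractions import Fraction

def signal_to_csv(channels, digits=6):
    """Format per-channel amplitude lists as CSV text, one row per sample."""
    rows = zip(*channels)  # regroup so that each row holds one sample time
    return "".join(
        ",".join(f"{float(a):.{digits}f}" for a in row) + "\n"
        for row in rows)
```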

Audio Formulas

Audio formulas are ACL2 expressions in which the function is a lambda expression and the arguments are audio signals, filters, or numbers that are appropriate for the lambda expression.

Arguments in audio formulas that represent audio signals may be constructed by invocations of the get-signal function. Filter arguments may be constructed by invocations of the get-filter function. Numbers are specified as ACL2 rationals. Functions in lambda expressions must be basic audio operators.

Example audio formula

     ((lambda (w wav1 wav2)
        (fade-in 5 (fade-out 5 (overdub wav1 (filter w wav2)))))
      (get-filter "filters/blur.flt")
      (get-signal "tracks/operator.wav")
      (get-signal "tracks/street-noise.wav"))

Audio Commands

An audio command is an ACL2 expression in which the function is put-signal or display-signal and the arguments are a path and an audio formula.

Example audio command

     (put-signal
       "tracks/mix.wav" ; output path (file name chosen here for illustration)
       ((lambda (w wav1 wav2)
          (fade-in 5 (fade-out 5 (overdub wav1 (filter w wav2)))))
        (get-filter "filters/blur.flt")
        (get-signal "tracks/operator.wav")
        (get-signal "tracks/street-noise.wav")))

Tests and Theorems

At a minimum, every function must be accompanied by tests and theorems stating that, if the function is supplied with arguments of the expected kind, it delivers values of the expected kind.

Test Data and Example Computations

You will need a collection of WAVE files that fit the constraints of the problem (one or two channels, PCM/uncompressed, etc.). You will also need to do some research and experimentation to find interesting computations that demonstrate the capabilities of your audio processor. Do this up front.