Part 10: Achieving Simple PCM with Wavetable Synthesis
The idea behind pulse-code modulated sound (PCM) is remarkably simple. A PCM waveform consists of a set of samples, each describing the relative volume at one point in time, with the samples spaced at a constant time interval. (Note the terminology of "sample" in this context, which has nothing to do with a sample as we know it from MOD/XM, for example.) For playback, each sample is translated into a discrete voltage level, which is then amplified and ultimately sent to an output device, typically a loudspeaker. The samples are read and output sequentially at a constant rate until the end of the waveform has been reached.
When attempting this on a 1-bit device, we face the problem that we obviously can't output variable voltages. Instead we only have the choice between two levels, silence or "full blast". So how can we do it, then?
In order to understand how we can output PCM on a 1-bit device, let's first recap how Pulse Interleaving works. The underlying principle of P.I. is that we can keep the speaker cone in a floating state between full extension and contraction by changing the output state of our 1-bit port at a very fast rate, thanks to the inherent latency of the cone. So we're actually creating multiple volume levels. I'm sure you've realized by now that the same principle can be applied for PCM playback.
So, say we want to output a single PCM waveform at a constant pitch. All we need to do is interpret the volume levels described by the samples as the amount of time we need to keep our 1-bit port switched on. So we just create a loop of constant length, in which we
- read a sample
- switch the 1-bit port on for the amount of time which corresponds to the sample volume
- switch the 1-bit port off for the remaining loop time
- check if we've reached the end of the waveform, and loop if we haven't.
That's all - on we go with the next sample, rinse and repeat until the entire waveform has been played.
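As a rough illustration of the loop just described, here is a small Python model (not Spectrum code). The function name `pcm_to_1bit` and the idea of dividing each pass of the loop into `slots` equal time slots are assumptions made for this sketch, and the samples are assumed to be unsigned 8-bit values.

```python
def pcm_to_1bit(samples, slots, max_value=255):
    """Model one pass of the loop per sample: port on, then port off.

    samples   -- unsigned PCM samples in 0..max_value (assumed format)
    slots     -- equal time slots per pass (hypothetical parameter)
    Returns a flat list of port states, one entry per time slot.
    """
    stream = []
    for s in samples:                                   # read a sample
        on = (s * slots + max_value // 2) // max_value  # rounded on-time
        stream += [1] * on                              # port on
        stream += [0] * (slots - on)                    # port off for the rest
    return stream                                       # end of waveform reached


print(pcm_to_1bit([0, 128, 255], slots=4))
# [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
```

Each group of four entries is one pass of the loop; the louder the sample, the longer the port stays on within that pass.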
Loop duration is a critical parameter here, of course. We can't make our loop too long, or else the "floating speaker state" trick won't work. It seems that a loop time of around 1/15000 seconds is the absolute maximum, but ideally you should do it a bit faster than that.
With common PCM WAVs, we'll run into a problem at this point. An 8-bit PCM WAV has samples which can take 256 different volume levels; take the more popular 16-bit ones and you've already got 65536 levels. How are we supposed to control timing that precisely in our loop? 1/15000 seconds corresponds to around 233 cycles on the ZX Spectrum with its 3.5 MHz clock. The fastest output command - OUT (n),A - takes 11 cycles, which means we can squeeze at most 21 of those into the loop - and that's not taking into account all the tasks we need to perform besides outputting. So how do we output 256 or even 65536 levels? The answer is: We don't. Instead, we'll reduce the sample depth (that is, the number of possible volume levels) to a suitable level. This will obviously degrade sound quality, but hey, it's better than nothing.
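The numbers above can be checked with a few lines of Python. This is only a sketch: the 3.5 MHz clock is the nominal CPU speed of the 48K Spectrum, and `reduce_depth` is a hypothetical helper showing one simple way to cut an unsigned 8-bit sample down to fewer levels.

```python
CPU_HZ = 3_500_000      # nominal Z80 clock of the 48K Spectrum (assumption)
LOOP_RATE_HZ = 15_000   # slowest usable loop rate, per the text
OUT_CYCLES = 11         # OUT (n),A

cycles_per_loop = CPU_HZ // LOOP_RATE_HZ
max_outs = cycles_per_loop // OUT_CYCLES
print(cycles_per_loop, max_outs)   # 233 21


def reduce_depth(sample, levels):
    """Map an unsigned 8-bit sample (0..255) onto 0..levels-1."""
    return sample * levels // 256


print([reduce_depth(s, 5) for s in (0, 64, 128, 192, 255)])
# [0, 1, 2, 3, 4]
```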
As far as the Spectrum is concerned, 10 levels seems to be a convenient choice. You might be able to do more with clever code (or on a faster machine), but for the purpose of this tutorial, let's keep it at 10. That is, if we want to output just a single waveform. But of course we want to mix multiple waveforms at variable pitches - let's say two of them. In this case, each of our source PCM waveforms should have 5 volume levels.
As you might have already guessed, we'll need to develop our own PCM data format to encode these 5 levels. What this format will look like depends on your sound loop code as well as the device you're targeting - anything goes to make things as fast as possible. On the Spectrum, we may take two things into account:
- bit 4 sets the output state (let's ignore the details for now...)
- we have a fast command available for rotating the accumulator.
So, our sample bytes might look like this:
volume    binary      hex
level     76543210
___________________________
  0%      00000000    #00
 25%      00010000    #10
 50%      00011000    #18
 75%      00011100    #1c
100%      00011110    #1e
The reasoning behind this may not be self-evident, but it'll become clear when we look at a possible sound loop.
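To see what these bytes do, here is a small Python model of the two facts listed above: bit 4 of the accumulator drives the beeper, and the accumulator can be rotated left cheaply. It assumes four outputs per sample byte with one left rotation between outputs, which is what the sound loop further down does; the function names are made up for this sketch.

```python
def rlca(a):
    """Z80 RLCA: rotate the 8-bit accumulator left, bit 7 wraps to bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


def output_states(sample_byte, outputs=4):
    """Bit 4 of the accumulator as seen by each successive output."""
    a, states = sample_byte, []
    for _ in range(outputs):
        states.append((a >> 4) & 1)   # what the beeper bit is set to
        a = rlca(a)                   # bring the next bit into position
    return states


for byte in (0x00, 0x10, 0x18, 0x1C, 0x1E):
    print(f"#{byte:02x}", output_states(byte))
# #00 [0, 0, 0, 0]
# #10 [1, 0, 0, 0]
# #18 [1, 1, 0, 0]
# #1c [1, 1, 1, 0]
# #1e [1, 1, 1, 1]
```

The number of 1s in each row matches the volume column of the table: 0, 1, 2, 3 or 4 "on" outputs out of 4.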
Unfortunately, this custom PCM format still won't allow us to create a sound loop that is fast enough, so let's apply another restriction - use waveforms with a fixed length of 256 byte-sized samples. You'll see in a moment why this comes in handy.
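Preparing such a waveform is easy to do offline. The following Python sketch converts 256 unsigned 8-bit samples into the 5-level byte format from the table. The names `LEVEL_BYTES` and `encode_waveform` are hypothetical, and the triangle wave is just an example source.

```python
LEVEL_BYTES = (0x00, 0x10, 0x18, 0x1C, 0x1E)   # 0%, 25%, 50%, 75%, 100%


def encode_waveform(samples_8bit):
    """Quantize 256 unsigned 8-bit samples to 5 levels and encode them."""
    if len(samples_8bit) != 256:
        raise ValueError("waveform must be exactly 256 samples long")
    return bytes(LEVEL_BYTES[s * 5 // 256] for s in samples_8bit)


# One period of a triangle wave, values 0..255 (example input only).
triangle = [min(2 * i, 511 - 2 * i) for i in range(256)]
table = encode_waveform(triangle)

print(len(table), hex(table[0]), hex(table[128]))   # 256 0x0 0x1e
```

The resulting 256 bytes can be pasted into the assembler source as data, for example as a list of byte definitions.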
Our sound loop might look like this:
    ld bc,waveform1              ; set up sample pointer, channel 1
    ld de,noteval1               ; set base frequency, ch1
    ld hl,0                      ; clear add counter, ch1
    exx                          ; switch to the other register set
    ld bc,waveform2              ; set up sample pointer, channel 2
    ld de,noteval2               ; set base frequency, ch2
    ld hl,0                      ; clear add counter, ch2
    ld iy,0                      ; set timer (IY, as used by the countdown below)
loop:
    ; Note: there is one exx per pass, so the two halves swap register
    ; sets on every iteration. Both halves are identical, so that's fine.
    ld a,(bc)                    ; load channel 1 sample byte to accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    add hl,de                    ; add base frequency ch1 to counter ch1
    ld a,c \ adc a,0 \ ld c,a    ; IF counter overflows, advance sample pointer ch1
    exx
    ld a,(bc)                    ; load channel 2 sample byte to accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    rlca                         ; rotate left accumulator
    out (#fe),a                  ; output accu to beeper
    add hl,de                    ; add base frequency ch2 to counter ch2
    ld a,c \ adc a,0 \ ld c,a    ; IF counter overflows, advance sample pointer ch2
    dec iy \ ld a,iyh \ or iyl \ jp nz,loop   ; decrement timer and loop if not 0
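Now you also see why limiting waveforms to 256 bytes is useful - this way, we can loop through them without ever having to reset the sample pointer, which of course saves time. (Since only the low byte of the pointer is ever updated, each waveform needs to start on a 256-byte boundary.)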
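The pointer update can be modelled in a couple of lines of Python. This is a sketch of the intended behaviour only (16-bit add counter, carry moved into an 8-bit pointer that wraps by itself); the function name `advance` is made up for the example.

```python
def advance(hl, c, noteval):
    """One pointer update: add hl,de, then add the carry to c.

    hl      -- 16-bit add counter
    c       -- low byte of the sample pointer (0..255)
    noteval -- 16-bit base frequency value (de)
    """
    total = hl + noteval
    carry = total >> 16                 # 1 if the 16-bit add overflowed
    return total & 0xFFFF, (c + carry) & 0xFF


print(advance(0x1000, 5, 0x1000))     # (8192, 5)  no overflow, pointer stays
print(advance(0xFFFF, 5, 0x0001))     # (0, 6)     overflow, pointer advances
print(advance(0xFFFF, 0xFF, 0x0001))  # (0, 0)     pointer wraps, no reset needed
```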
However, there's a whole array of problems with this code. First of all, it's still quite slow - 218 cycles. Secondly, you can see that the last output from each channel lasts significantly longer than the first 3. A bit of difference in length is actually not a big problem, but in this case, the last frame is more than 3 times longer - that's simply too much. Thirdly and most critically, I/O contention has not been taken care of (this mainly concerns the Speccy, of course).
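For reference, this is how the cycle figures add up. The Python sketch below uses the standard uncontended Z80 timings (contention is ignored here) and treats the three-instruction pointer update as a single 15-cycle entry; a "frame" is measured from one OUT to the next.

```python
CHANNEL = [
    ("ld a,(bc)", 7),
    ("out (#fe),a", 11), ("rlca", 4),
    ("out (#fe),a", 11), ("rlca", 4),
    ("out (#fe),a", 11), ("rlca", 4),
    ("out (#fe),a", 11),
    ("add hl,de", 11),
    ("pointer update (3 instructions)", 15),
]
EXX = 4
TIMER = [("dec iy", 10), ("ld a,iyh", 8), ("or iyl", 8), ("jp nz,loop", 10)]

channel_cycles = sum(t for _, t in CHANNEL)
timer_cycles = sum(t for _, t in TIMER)
loop_cycles = 2 * channel_cycles + EXX + timer_cycles

short_frame = 4 + 11        # rlca + next out
tail = 11 + 15              # add hl,de + pointer update after the last out
head = 7 + 11               # ld a,(bc) + first out of the next half
long_frame_a = tail + EXX + head            # last frame before the exx
long_frame_b = tail + timer_cycles + head   # last frame before the timer code

print(loop_cycles)                              # 218
print(short_frame, long_frame_a, long_frame_b)  # 15 48 80
```

So the last frame in the first half of the loop is roughly 3 times the length of the short ones, and the one that has to wait for the timer code is longer still.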
If you've followed the discussion in this thread, you'll have noticed that I normally don't pay as much attention to I/O contention as other coders, but in this case, aligning the outputs to 8 T-state boundaries does make a huge difference. I'll let you figure this out on your own though. Check my wtfx code if you need further inspiration.
I will tell you one important trick for speeding up the sound loop though. Credits for this one go to sorchard from World of Dragon.
In the above example, we're actually using 24-bit frequency resolution, since we're keeping track of the overflow from adding our 16-bit counters. But 16 bits are quite enough to generate a sufficiently accurate 7-8 octave scale. So instead of the three-instruction sequence that adds the carry to the sample pointer, you could simply do "ld c,h", saving a whopping 22 cycles in total (11 per channel). The high byte of our add counter thus becomes the low byte of our sample pointer. The downside of this is that our waveforms need to be simple - e.g. just one iteration of a basic wave (triangle, saw, square, etc.). It's less of a problem than it sounds though, as you won't be creating really complex waveforms in 256 bytes anyway. And for a kick drum or noise, you can simply use a frequency value <256, which makes sure that you step through every sample in the waveform.
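Here is a Python sketch of the "ld c,h" variant. Two things are assumptions rather than something from the code above: the loop length of 196 cycles (simply 218 minus the 22 saved; a contention-aligned loop will differ), and the `note_value` formula, which is the usual phase accumulator relation for a waveform holding exactly one period.

```python
CPU_HZ = 3_500_000       # nominal 48K Spectrum clock (assumption)
LOOP_CYCLES = 218 - 22   # loop length after the "ld c,h" change (assumption)


def note_value(freq_hz, loop_cycles=LOOP_CYCLES, cpu_hz=CPU_HZ):
    """16-bit add value for a one-period, 256-byte waveform."""
    loop_rate = cpu_hz / loop_cycles          # passes of the loop per second
    return round(freq_hz * 65536 / loop_rate)


def step_indices(noteval, steps):
    """Sample indices visited when the pointer is the counter's high byte."""
    hl, visited = 0, []
    for _ in range(steps):
        hl = (hl + noteval) & 0xFFFF          # add hl,de
        visited.append(hl >> 8)               # ld c,h
    return visited


print(note_value(440))            # 1615
print(step_indices(0x0100, 4))    # [1, 2, 3, 4]
print(step_indices(0x0080, 4))    # [0, 1, 1, 2]  value <256: no sample skipped
print(step_indices(0x8000, 4))    # [128, 0, 128, 0]
```

With a value below 256 the index never jumps by more than one per pass, which is why every sample of a drum or noise waveform gets played.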
And that's all for now, hope you find the information useful, and as always, let me know if you find any errors or have any further suggestions/ideas.