So, here's a test modified to use CARN, for comparison. The test implementation will occasionally produce errors (manifesting themselves as short blips), but other than that it sounds quite alright to my ears as well 
The great thing is that since there's one variable less to track, CARN can be implemented in less cycles, which is always a plus. But both COVN and CARN obviously have their uses considering the differences in sound. So ideally I'd want to come up with an implementation that allows switching between the two at runtime. I'm sure it just needs a nice piece of self-modifying code
Only problem, as mentioned, I'm out of practise as far as assembly coding is concerned, but I definately want to get back to this and produce a proper engine.
In the first test example, there may be some additional problems in addition to the bad RNG. Also, the pulse density is actually fairly high in both examples. The envelope in the COVN one starts with a window size of 9 and goes up to 129. For CARN, the envelope starts with numbers < 8 and goes up to numbers < 256. In any case, the random number sequence is actually not random at all, since I'm using a fixed seed for the PRNG. If you are interested in studying the characteristics, the sequence is calculated as follows:
// for COVN
// initialization
uint_16 seed = 0x2157
uint_16 state = 0
uint_8 window_size = x // x ∈ {0x9, 0x11, 0x21, 0x41, 0x81}
// the actual generator
// as wasteful as it looks, it's actually just 3 bytes/ 19 cycles in Z80 asm - add hl,de; rlc h
uint_16 next = (state + 0x2157)
uint_16 temp = (((next >> 8) & 0xff) << 1)
state = (next & 0xff) + (temp << 8) + (temp & 0x100)
// by coincidence, this prevents two consecutive pulses colliding at the end of the current/beginning of next window
uint_8 next_pulse_delay = ((state >> 8) & (window_size - 2)) + 1
For CARN it's essentially the same, except that the window_size is replaced with a "range" value x where x ∈ {0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff}, and the pulse delay is calculated as
uint_8 next_pulse_delay = ((state >> 8) & range) + 1
The COVN example runs at 17412 Hz and the CARN example runs at 19662 Hz (actually slightly lower, because there is a short delay every 256 samples for updating the length counter, and there can also be some minor fluctuation depending on internal machine state). The window size/range is updated every 256*32 samples. There is an additional quirk: pulses across all channels will never coincide. If more than one pulse is triggered on the given sample, the additional pulse(s) will be delayed until the next sample(s). The tone channels always trigger a "double" pulse. I found this to yield a resonable volume balance. In an actual engine we'd want to make the pulse length variable for the tone channels to fake volume envelopes for those.
There are a number of different seeds that will work reasonably well, I'm just sticking to this one because I'm lazy and and it's the only one I can remember off the top of my head. Might also be worth lifting some ideas from Shiru's noise generator code to get better random numbers.
In any case, thanks for the detailed explanation regarding the spectral characteristics. I have to admit that the point in your paper where you start to talk about Power Spectral Density was about my brain switched off, especially with those scary-looking equations on page 4 