Alright, I needed a place to start exploring what the Arduino and AVR MCUs can do for beeper music, and after some thought I chose the simplest thing possible: recreating the original Octode engine for the Arduino Uno, in C. A very special kind of fun, I should say.
The engine port is nothing special. I decided to stick to 8-bit counters and kept the same sample rate as the original to avoid changes in the note range, so it has the same sound quality. I didn't use the Arduino hardware PWM, although that can be done quite easily, and I think it may give some good improvements. The clicky drums algorithm is not a 100% match to the original, but it sounds more or less along the same lines.
The issues I've had with this small project were mostly related to the Arduino platform itself, not programming. I admit and appreciate its convenience for rapid prototyping, and use it from time to time, but I don't really enjoy it, because somehow it is always full of really weird issues on my side, with no real solution to be found anywhere. For one, I can't just upload a sketch to any of the boards I have (a few different kinds), as it constantly fails with all kinds of error messages. Retrying a few times in a row, maybe even a few tens of times, eventually solves the problem and the sketch uploads successfully. Randomly mashing the Arduino reset button while doing this improves the odds. Another issue is that while the Arduino is connected to the PC, it often starts resetting on its own every ten seconds or so, at random times. So sometimes it is hard to tell whether the sketch code has issues or the Arduino is just feeling funky. Luckily, powering it from a standalone 5V source makes it work reliably.
Code-wise, some other issues came from the fact that C is a higher-level language than assembly. Of course I could just write the sound generation parts in assembly, but I wanted to stick to C to make the thing more accessible for beginners, hoping that someone may eventually get interested in making 1-bit stuff with Arduino and will need simple examples to learn from.
It is difficult (I haven't found an easy way yet) to see the assembly output of the C compiler in the Arduino environment, and it may change after compiler updates anyway, so deliberately cycle-timed code is pretty much out of the question, although something like that is possible with a good deal of guesswork and measurement. A more reliable way to go is to use timer-driven interrupts, which can fire as often as a few tens to a hundred kHz, and when a different pulse width is needed, the hardware PWM unit should be used. This makes Music Box-like pulse interleaving and Fuzz Click-like PWM/PFM engines difficult to implement: something is possible, but inconvenient under these circumstances. That's why I chose Octode, as it only has one sample output per loop.
There is a way to guesstimate how much CPU time a piece of code takes without extra tools or profiling: compile the code, note how many bytes it takes, then delete a line, compile again, and compare the sizes. All but a few AVR opcodes are two bytes long and mostly take 1-2 cycles, sometimes 3. So, assuming an average opcode timing of 1.5 cycles, the size difference in bytes multiplied by about 0.75 should give a rough cycle count. This could be useful for planning how not to exceed the time available in the timer interrupt, which for this engine port is about 16000000/12727=1257 cycles (Arduino clock divided by sample rate).
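The arithmetic above can be written down as a small helper. This is only a sketch of the rule of thumb; the function names are made up for illustration, and the 1.5 cycles per opcode figure is the same rough guess as in the text.

```c
#define CPU_CLOCK_HZ   16000000UL  /* Arduino Uno clock */
#define SAMPLE_RATE_HZ 12727UL     /* sample rate used by this port */

/* Cycles available between two timer interrupts (integer division). */
unsigned long cycles_per_sample(void)
{
    return CPU_CLOCK_HZ / SAMPLE_RATE_HZ;
}

/* Rough cost of a code piece, from the compiled size with and without it.
   Assumes 2 bytes per opcode and about 1.5 cycles per opcode,
   which gives 0.75 cycles per byte. */
unsigned long estimate_cycles(unsigned long size_with, unsigned long size_without)
{
    unsigned long bytes = size_with - size_without;
    return bytes * 3 / 4;
}
```

For example, a line whose removal shrinks the sketch by 40 bytes would be estimated at about 30 cycles, a small fraction of the 1257 available.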
Another inconvenience in porting from assembly code to C is that code in beeper engines very often relies on the carry flag. In assembly you can simply increment or add to a register and execute extra code only if the result exceeded the register capacity, with the result value wrapping around properly by itself. There is no way in C to mimic that with equal efficiency without widening the variable by a byte or adding more conditions. For one, when you add a value to a variable and want to detect the carry, it may look something like this:
n=counter+value;
if(n<counter) { /* carry will happen, new counter value in n */ }
counter=n;
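Spelled out with explicit types, the same idea might look like the sketch below. The function names are hypothetical, and the second variant shows the "widen the variable by a byte" alternative mentioned above.

```c
#include <stdint.h>

/* Adds value to *counter with 8-bit wraparound.
   Returns 1 if the addition overflowed (carry), 0 otherwise. */
uint8_t add_with_carry(uint8_t *counter, uint8_t value)
{
    uint8_t n = (uint8_t)(*counter + value);
    uint8_t carry = (uint8_t)(n < *counter);  /* wrapped result is smaller */
    *counter = n;
    return carry;
}

/* Same result using a 16-bit intermediate: the carry is bit 8 of the sum. */
uint8_t add_with_carry_wide(uint8_t *counter, uint8_t value)
{
    uint16_t sum = (uint16_t)((uint16_t)*counter + value);
    *counter = (uint8_t)sum;
    return (uint8_t)(sum >> 8);
}
```

Which of the two compiles to tighter code depends on the compiler version, so it is worth checking the sketch size both ways.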
Shifting a bit out of a register also takes extra code or rethinking the algorithm, such as remembering the bit before the shift. This makes the already not very efficient compiled C code bloat further. Luckily, the Arduino seems to have enough power to run C ports of any ZX Spectrum beeper engine, even written in C only as efficiently as a straight rewrite gets, but to make more advanced engines in C, some common code tricks and general design ideas will likely have to be developed.
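A minimal sketch of the "remember the bit before the shift" approach, with hypothetical helper names:

```c
#include <stdint.h>

/* Shifts *reg left by one and returns the bit that fell out (old bit 7). */
uint8_t shift_left_out(uint8_t *reg)
{
    uint8_t out = (uint8_t)(*reg >> 7);  /* remember the top bit first */
    *reg = (uint8_t)(*reg << 1);
    return out;
}

/* Shifts *reg right by one and returns the bit that fell out (old bit 0). */
uint8_t shift_right_out(uint8_t *reg)
{
    uint8_t out = (uint8_t)(*reg & 1u);  /* remember the bottom bit first */
    *reg = (uint8_t)(*reg >> 1);
    return out;
}
```

In assembly the shifted-out bit lands in the carry flag for free; here it costs an extra mask or shift per call.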
To convert the music data I modified the Python port of the XM to Octode converter that I wrote recently, specifically for this project. For simplicity, the data format remained the same as in the original Octode, although the Arduino has enough power to allow a much more efficient packed format. I can't recall who gave me the XM file of the song I'm using for testing, or when (~2011). Mister Beep, I guess? I'm not publishing the source module, just in case.
There was an issue with fitting ~16K of music data into memory. By default, Arduino copies all arrays and variables into RAM, even large constant ones. That's because of the AVR's Harvard architecture, with separate memory systems for code (Flash ROM) and data (RAM). The compiler is simply not aware of the possibility of putting data into program memory. There is a way (it differs between Arduino IDE versions) to mark arrays and variables with an attribute so that the compiler puts them into program memory. However, this data can't be accessed through an ordinary pointer; special functions have to be used instead. In AVR assembly code it is similar, but easier and more efficient: just another opcode to load from program memory.
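With avr-libc this is typically done with the PROGMEM attribute and the pgm_read_byte() macro from avr/pgmspace.h. The sketch below is an assumption about how it could look, not the actual player code: the array name, the accessor and the data bytes are placeholders, and the #else branch exists only so the same file also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback (illustration only): plain memory, plain pointer read. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Placeholder bytes, not real Octode song data. */
static const uint8_t music_data[] PROGMEM = { 0x10, 0x20, 0x30, 0x40 };

/* Reads one byte of song data from program memory. */
uint8_t music_read(uint16_t offset)
{
    return pgm_read_byte(&music_data[offset]);
}
```

On the AVR, dereferencing music_data directly would read from RAM at the same address and return garbage, which is why every access has to go through the accessor.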
The overall player architecture is very simple.
There is a timer interrupt that calls its handler at the sample rate (12727 Hz); that is basically the sound synthesis part. First of all, the handler outputs the last calculated output bit to the speaker pin (Port D, bit 7 by default), to avoid jitter. Then it checks whether the synth is in tone mode or clicky drums mode. For tone mode it calculates all eight counters, just like the original Octode does. I used unrolled code there, both for speed and clarity. Constant array indexes seemingly do not affect the code size or speed. For clicky drums mode it calculates a bit for the percussive timbre using a set of counters, which is a bit different from the algorithm used in the original Octode. The handler keeps track of the drums mode with a counter. When the player needs to play a drum, it simply sets the counter, and while the counter ticks down on each handler call, drum mode is active. This removes the need for tempo compensation: drums cause no tempo fluctuation and can have any reasonable duration. The interrupt handler also decrements another, 16-bit counter until it reaches zero. This counter is used to keep sync with the player.
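The structure of such a handler could be sketched roughly as follows. This is a simplified illustration, not the actual port: all names are hypothetical, the tone channels are assumed to be plain down counters that reload and emit a one-sample pulse, the drum timbre is a placeholder, and a loop replaces the unrolled code for brevity. A plain variable stands in for the speaker pin so the logic can be tried on a PC; on the Arduino the body would live in the timer interrupt handler and write to Port D instead.

```c
#include <stdint.h>

#define CHANNELS 8

uint8_t  period[CHANNELS];     /* per-channel reload value, 0 = silent */
uint8_t  counter[CHANNELS];    /* per-channel 8-bit down counter */
uint8_t  output_bit;           /* bit calculated by the previous call */
uint8_t  speaker;              /* stand-in for the speaker pin */
uint16_t drum_counter;         /* nonzero while a drum is playing */
volatile uint16_t sync_counter;/* counted down for the player */

void sample_tick(void)
{
    uint8_t bit = 0;
    uint8_t i;

    /* Output first, so the output timing does not depend on the code below. */
    speaker = output_bit;

    if (drum_counter) {
        /* Drum mode: placeholder timbre, NOT the algorithm from the port. */
        drum_counter--;
        bit = (uint8_t)(drum_counter & 1u);
    } else {
        /* Tone mode: a channel pulses when its counter runs out. */
        for (i = 0; i < CHANNELS; i++) {
            if (period[i] && --counter[i] == 0) {
                counter[i] = period[i];
                bit = 1;
            }
        }
    }

    output_bit = bit;

    /* Sync counter stops at zero; the player polls it. */
    if (sync_counter) sync_counter--;
}
```

Because the drum is just a counter that the handler ticks down, starting one does not stall the sample clock, which is what keeps the tempo steady.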
Song data parsing and playback happen in the main thread. Once a row is parsed, the player sets the sync counter to the speed value, counted in samples. Then it just waits until the interrupt handler has decremented it down to zero, and the next row gets processed. One major weird issue I had was related to this part. Initially I waited for the counter to reach zero with just while(counter);. This caused major tempo fluctuations, both speeding up and slowing down. Once I changed it to while(counter) delay(1); it started to work, although the 1 ms delay may change the tempo a bit. I still haven't figured out why the original approach didn't work as expected.
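One possible explanation, not verified on the hardware: the AVR is an 8-bit CPU, so a 16-bit counter is read and written one byte at a time, and the interrupt can change it between the two bytes. A read torn that way can look like zero when the counter is not. The counter also needs to be declared volatile, or the compiler may keep a stale copy in a register. The sketch below shows one way to guard against both using ATOMIC_BLOCK from avr-libc's util/atomic.h. The names are hypothetical, and the #else branch is only there so the file also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>
#define SYNC_ATOMIC ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
#else
#define SYNC_ATOMIC /* host build: no interrupts to mask */
#endif

volatile uint16_t sync_counter;  /* shared with the interrupt handler */

/* Reads both bytes of the counter with interrupts masked. */
uint16_t sync_read(void)
{
    uint16_t value;
    SYNC_ATOMIC { value = sync_counter; }
    return value;
}

/* Writes both bytes of the counter with interrupts masked. */
void sync_write(uint16_t value)
{
    SYNC_ATOMIC { sync_counter = value; }
}

/* Called from the timer interrupt, where interrupts are already off. */
void sync_tick(void)
{
    if (sync_counter) sync_counter--;
}

/* Player side: start a row and wait until the handler counts it down. */
void wait_row(uint16_t speed)
{
    sync_write(speed);
    while (sync_read()) { /* busy wait */ }
}
```

If this guess is right, wait_row() would replace the delay(1) workaround and remove its small tempo error.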
Sorry, no sound example yet, I need to set up a way to record it directly, with proper quality. At the moment I'm using an old transistor radio as an amplifier with speaker for testing.