Hi, I didn't mean moving LD A,IXH/L to save T-states. I meant moving the instructions that read the current speaker states back into the A register, because checking whether IX is zero changes A and would completely destroy the sound. And even if checking IX didn't change A, the routine wouldn't update the speaker states of channel 1 and channel 2 when the JR NZ jumps are taken, which is most of the time. This is what I meant:

            ld h,0
            ld l,0
soundloop   dec b
            ld a,h                  ; I moved this above JR NZ
            jr nz,skip1
            xor #10
            ld h,a
            ld b,c
skip1       out (#fe),a
            dec d
            ld a,l                  ; I moved this above JR NZ
            jr nz,skip2
            xor #10
            ld l,a
            ld d,e
skip2       out (#fe),a
            dec ix
            ld a,ixh                ; These destroy A register which is holding the speaker state in...
            or ixl                  ; ...case of both JR NZ jumps are taken which is most of the time
            jr nz,soundloop

How exactly did you mean to move LD A, IXH/L to save 8 T-states?

Hi, excellent tutorial!

While I was reading I noticed a few things that might need correcting. In the Pulse Interleaving routine the loop takes both JR NZ,SKIPx jumps most of the time, so at the end A holds IXH OR IXL and that value is then output to port 254. That could be avoided by moving LD A,H and LD A,L above the JR NZ,SKIPx instructions.

In the Variable Pulse Width routine I think the state changes when the counter goes from #7FFF to #8000 and not when it goes from #8000 to #8001, because carry is 1 while H is between #00 and #7F and becomes zero when H reaches #80. The other reason the state can't change between #8000 and #8001 is that we are checking only the high byte, so a change in the low byte alone can't change the state.

In Achieving Simple PCM with Wavetable Synthesis there are multiple RCLA instructions where there should be RLCA.

3

(15 replies, posted in Other Platforms)

garvalf wrote:

@chupo_cro: I've tried your code, and it seems to fix the problem, thanks.

I don't remember how was Shiru's code before he fixed the tempo problem, but now we have both the correct tempo, and the songs won't halt after a while (the "bourrasque" tune lasts 2 minutes, until the end, after it loops, instead of 10 seconds before)

Like for the other engines, we can add a nice LED for lighting during the beats:

if(drum_pulse_length)
    {
        --drum_pulse_length;
        digitalWrite(LED,HIGH);
    }

Hi, I don't know what the code looked like before; I only downloaded it tonight. Since the variable used in the while loop is 8 bits wide, the only change needed was to remove the delay. That is possible because parser_sync_enable is always zero while parser_sync_l and parser_sync_h are being loaded. If that were not the case, cli() would have to be added before loading parser_sync_l and sei() after loading parser_sync_h to prevent an interrupt between loading the low and high bytes.

As for the 'light show', I have this nice little ATmega128 board (a picture) which has two SMD LEDs connected to the lowest two bits of PORTA, so a nice effect can be done by:

if(n>=2&&n<128)  //drum sound
    {
        drum_data=(const unsigned char*)pgm_read_word(&(drum_sample_list[n-2]));

        drum_output=0;
        drum_pulse_length=0;
        drum_sync=0;
        drum_request=1;

        PORTA ^= 0x03;        // turn off one of the LEDs and turn on the other one
                
        ++pattern_ptr;
    }

For that to work I added:

// LEDs
DDRA = 0x03;        // PA0 & PA1 = output
PORTA = 0x01;        // light on one of the LEDs

The same code could be used with Arduino by changing the port (and maybe the pins) and connecting one more LED.

The positions to add 'light show' code are:

Phaser1:

else  //118..127 is a drum
{
    drum_ptr=0;
    PORTA ^= 0x03;        // turn off one of the LEDs and turn on the other one
    drum_sample=1<<(tag-118); //bit mask for the drum sample

    cli();
    parser_sync=MUSIC_FRAME*1;  //drum always take one tempo unit, so it normally followed by the wait command
    sei();
}

Octode:

if(n>=0xf0)
{
    click_drum=7-(n-0xf0);
    click_drum_len=128;
    PORTA ^= 0x03;        // turn off one of the LEDs and turn on the other one

    ++pattern;  //skip to the next byte
}

Qchan:

if(tag>=0x81) //a drum
{
    tag=(tag-128)<<1;
     
    click_drum_cnt_1=tag;
    click_drum_cnt_2=tag;
      
    click_drum_len=0;
    PORTA ^= 0x03;        // turn off one of the LEDs and turn on the other one
 
    ++pattern_ptr; 
}

Fuzzclick:

if(tag>=0x81&&tag<0x81+4)
{
    drum_data=(const unsigned char*)pgm_read_word(&(drum_sample_list[tag-0x81]));
                
    drum_output=0;
    drum_pulse_length=0;
    drum_request=1;       //interrupt handler will get changed in the next tone interrupt

    PORTA ^= 0x03;        // turn off one of the LEDs and turn on the other one
}

It is quite nice to watch the dot jumping on every drum beat, I might record a video.

BTW, I use this ATmega128 board only for development, because the programs fit in its memory even with compiler optimizations disabled (which is sometimes necessary) and because I can use a cheap JTAG debugger. The PCBs which I like to use for final products are this one and this one. These are Dropbox links to picture albums, no need to register.

4

(15 replies, posted in Other Platforms)

Since parser_sync_enable is just one byte, this time you can fix the sync code just by:

//set up parser sync counters

parser_sync_l=song_tempo&255;
parser_sync_h=song_tempo>>8;
parser_sync_enable=1;

//wait for the next row
//delay 1 is important, by some reason song tempo starts to jump a lot without it

//while(parser_sync_enable) delay(1);
while(parser_sync_enable)
    ;

5

(37 replies, posted in Other Platforms)

AtariTufty wrote:

Very interesting discussion guys.

Welcome to the forum chupo_cro, it's great to see new faces here :)

Thank you very much for the welcome! :-) I am looking forward to learning from the works presented by forum members and I hope I can add a bit here and there. Although my 'first' assembler is Z80, I did write some 6502 code back in the 80s as well, so I will be watching for those sound generation routines too.

6

(37 replies, posted in Other Platforms)

utz wrote:

Well, let's just say everybody has their own goals and methods, alright? For my part, I'm quite interested to hear about yours, chupo_cro (even though I'm personally more interested in coding AVR asm than C). Either way, probably best to continue this discussion in a new thread.

Yes, that would be great :-) In the next few days I will start a new thread where I'll answer this post (regarding data format and row transitions) and where we can discuss all kinds of possibilities for 1-bit sound generation using AVRs/Arduinos programmed in C, assembler or C + inline assembler.

7

(37 replies, posted in Other Platforms)

Shiru wrote:

For this particular application I didn't need to know exact cycle count or do assembly optimization. If I was to write AVR assembly code by hand, I sure would know how many cycles it takes - it is common practice to memorize opcode cycles (super easy on AVR), and count them in time critical parts when writing for 8/16 bit systems in assembler.

AVRs are like an 8-bit computer plus peripherals; you might soon meet a situation where you have to know the number of cycles between two events or between two conditions of the system.

I memorized a lot of Z80 opcode cycles and opcodes back in the 80s, I could write 20-30 byte long (or short? :-) ) routines directly in hex - without loading the assembler (Zeus or GENS). Now I still remember only #c9 and a few more opcodes.

Shiru wrote:

What I needed was just to see compiler generated code, to check if particular changes in C code makes resulting code longer or shorter, which is totally enough to estimate efficiency of edits - another common practice when writing for 8/16 bit systems in C. It is super simple task, I don't see why it should be overcomplicated like that with all that porting to other environments, using simulators, debuggers, and stuff like that.

Well... Could you notice the interrupt happening just in between loading or storing the 2-byte variable and ruining the result by just observing the code? ;-)

8

(37 replies, posted in Other Platforms)

Shiru wrote:

Thanks for the explanation of the issue.

As for looking up to the generated code, I have my doubts about these ideas. First, I just need to see what compiler spits out, in order to see how efficiently certain C code compiles with certain C compiler. That's purely software issue, normally you don't even have to do anything to just get the intermediate assembly output. Using simulators, switching platforms, using JTAG or any extra HW is all total overkill for this. Also, I doubt that latest Arduino and AVR Studio 4.18 use the same exact C compiler with same exact options, and if this is not the case, it just won't help at all.

I have no doubts that a lot more can be pulled off with Arduino or plain AVR. Something along the lines of Casio CZ series or Korg Poly 800 (with external filter) should be totally doable. We here not trying to squeeze out all that is possible, we're just playing with concepts of porting over ZX beeper engines, with all their quirks and specifics, and maybe applying some of this experience to do like 'super' versions of those engines. And using Arduino over plain AVR is an important feature, because that's certainly more accessible platform for strangers and newcomers.

Yes, I am aware the primary goal is to port the Spectrum's sound engines to Arduino. I agree JTAG might be overkill, but when I saw $7 Chinese JTAG ICEs that connect to a USB port I just had to buy one to see how/if it works - even though most of the time I use just five wires as a 'programmer'.

While the code generated by the Arduino IDE and the old AVR Studio is not the same, the compiler is: in both cases avr-gcc is used (although in Arduino you can use C++ in addition to C). One of the reasons why I said it is better to use the old AVR Studio than the new Atmel Studio is that the old compiler produces smaller and mostly faster code. Even with Arduino being the target platform, it is much easier to develop and debug the routines using the old AVR Studio (good simulator, disassembler and cheap in-circuit debugger) and then make just a few small modifications to move the code to the Arduino IDE once everything works as expected. When I modified your Phaser1 code to work with bare metal ATmegax, I just had to remove loop() and move its code to main(), insert a call to setup(), get rid of boolean, insert a few #includes and add a cast from pointer to int. Going the other way would be just as easy. Then, if you noticed that Arduino's compiler produced worse code than the old avr-gcc, you could just insert a few lines of inline assembler. But a 16 MHz AVR with just 1-5 cycle instructions is fast enough, so I don't think there would be a need for assembly optimizations.

But still, a fast and precise count of the cycles between two instructions is something that should be available at all times. For that you might use the simulator which is part of AVR Studio, an oscilloscope (toggle a pin upon entering and before leaving the interrupt routine) or Proteus with its virtual oscilloscope.

Although I don't have an Arduino, I do use the Arduino IDE to write code for the ESP8266, so I have examined the Arduino sources. That is how I know Arduino uses Timer0 for delay(), millis(), micros() etc. I might be wrong, but I believe the sound generation routines might work better if the Timer0 interrupt were disabled (unless it is used by the sound routine itself). In that case delay() wouldn't work, but you can #include <util/delay.h> and use _delay_ms(), which does not use interrupts. Although I don't think a delay needs to be used in a sound generation routine.

9

(37 replies, posted in Other Platforms)

utz wrote:

Now that's what I call an introductory post ;) Welcome to the 1-bit forum, and thanks for all the info.

Thank you very much for your kind words!

utz wrote:

Sound example or it didn't happen xD

:-) I will post the source code and a sound example once I translate the comments into the English language. And since the routine could easily be expanded for more channels and/or effects (reverb, vibrato, ...) I might first add some more code to spend some more cycles that are available.

I used a PWM frequency of 8000000/510 = 15686.27 Hz, meaning there are 63.75 µs to spend inside the interrupt routine. 1 channel with portamento took only 8 µs of processing time, but when I added the decay envelope code (which was straightforward and not optimized) the processing time exceeded 63.75 µs. However, after optimizing the code, 1 channel + portamento + decay took only 12 µs. After I added one more channel the time increased to only 19 µs, so there are still plenty of cycles that could be used. I think I spent about 10 more µs after adding some more commands (decay on/off, waveform change, ...), but everything could be optimized further and I believe more than 10 high quality channels could be done even with an ATmega µC running on the 8 MHz internal oscillator. Since Arduino works with a 16 MHz clock (external crystal oscillator), the number of cycles available in the interrupt routine is twice what I had when using the ATmega8 without a crystal.
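The timing budget quoted above can be checked with plain integer arithmetic on a PC. This is only a rough sketch: the helper names are made up for illustration, and the divisor of 510 and the 19 µs figure are simply the numbers from the paragraph above. The 16 MHz case assumes the sample period is kept at 63.75 µs.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one PWM period in nanoseconds, given the CPU clock and the
   number of CPU cycles per period (510 in the post). */
uint32_t period_ns(uint32_t f_cpu_hz, uint32_t cycles_per_period)
{
    assert(f_cpu_hz != 0);
    return (uint32_t)((uint64_t)cycles_per_period * 1000000000u / f_cpu_hz);
}

/* Number of CPU cycles that fit into a time span given in nanoseconds. */
uint32_t ns_to_cycles(uint32_t f_cpu_hz, uint32_t ns)
{
    return (uint32_t)((uint64_t)f_cpu_hz * ns / 1000000000u);
}

/* Same conversion for a time span given in microseconds. */
uint32_t us_to_cycles(uint32_t f_cpu_hz, uint32_t us)
{
    return (uint32_t)((uint64_t)f_cpu_hz * us / 1000000u);
}
```

With these helpers, period_ns(8000000, 510) gives 63750 ns, the 19 µs two-channel routine corresponds to us_to_cycles(8000000, 19) = 152 of the 510 available cycles, and ns_to_cycles(16000000, 63750) gives 1020 cycles for the same period at 16 MHz.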

Something that might be interesting - my song data format is not tracker-like (patterns of tones/commands) but is rather an array of tone[|CMD], duration[, command][, parameter] per channel (an array of uint8_t), so it is quite easy to enter a song by looking at the musical score and typing in the data. Some commands can be specified just by OR-ing the tone with the command, and some need the CMD flag plus an additional byte as the command specifier and maybe one more byte as a parameter. The durations are specified as t1 (a whole note), t2 (a half note), t4 (a quarter note), t8, t16 or as a combination - for example t4+t8 is a dotted quarter note.

However, I am much more excited by Spectrum's sound routines which I think are the state of the art. I still have Konami's/Imagine's sound routine from Ping Pong start screen written on paper from back in the 80s when I spent quite a few months analysing how it works.

Best Regards

10

(37 replies, posted in Other Platforms)

Shiru wrote:

Song data parsing and playing happens in the main thread. Once a row is parsed, player sets the sync counter to the speed value, counted in sample rate units. Then it just waits until the interrupt handler will decrement it down to zero, and next row gets processed. One major weird issue I had was related to this part. Initially I waited for the counter to be zero with just while(counter);. This caused major tempo fluctuations, with both speed increase and decrease. Once I changed it to while(counter) delay(1); it started to work, although the 1 ms delay may change the tempo a bit. Still not figured why the original approach didn't work as expected.

Hi, I have downloaded and examined your Phaser1 engine for Arduino where there is the same issue you described:

// delay 1 is important, by some reason sng tempo starts to jump a lot without it
while(parser_sync>0)
    _delay_ms(1);

The reason why the loop doesn't work without the delay is that reading the parser_sync variable in a while loop might compile to something like:

loop    lds       r24,0x0114
        lds       r25,0x0115
        or        r24,r25
        brne      loop

The problem arises when the interrupt happens in between the two lds instructions, and since the loop is very fast that happens quite often. For example, if parser_sync is 0x0100 and the interrupt happens after the first lds instruction, r24 is going to be 0x00. Inside the interrupt routine the value stored in locations 0x0114 (low byte) and 0x0115 (high byte) will change to 0x00ff, and upon returning from the interrupt routine the second lds will fetch the value 0x00, effectively reading the variable as 0x0000 instead of 0x00ff. Similarly, the variable might 'jump' from 0x0b00 to 0x0a00 instead of to 0x0aff etc., and because there are many interrupts during the while loop the error quickly accumulates. When you added the 1 ms delay you considerably slowed down the while loop and so lowered the frequency of the lds instructions (the loop now spends most of its time inside the delay), so the error happens only sporadically. It is similar to drawing a sprite into the Spectrum's video RAM while the electron beam 'runs' over the location where you are changing the bytes :-)
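The torn read described above can be reproduced on a PC. The following is only a host-side sketch, not AVR code: mem_lo and mem_hi stand in for locations 0x0114 and 0x0115, isr_decrement() stands in for the interrupt routine, and all the names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for SRAM locations 0x0114 (low byte) and 0x0115 (high byte). */
static uint8_t mem_lo, mem_hi;

/* Stand-in for the interrupt routine: one 16-bit decrement of the counter. */
void isr_decrement(void)
{
    uint16_t v = (uint16_t)((mem_hi << 8) | mem_lo);
    v--;
    mem_lo = (uint8_t)(v & 0xff);
    mem_hi = (uint8_t)(v >> 8);
}

/* Read the counter one byte at a time, low byte first, like the two lds
   instructions. If irq_between is nonzero the "interrupt" fires between
   the two loads. Returns the value the main loop believes it has read. */
uint16_t read_counter(uint16_t start, int irq_between)
{
    uint8_t lo, hi;

    mem_lo = (uint8_t)(start & 0xff);
    mem_hi = (uint8_t)(start >> 8);

    lo = mem_lo;                /* lds r24,0x0114 */
    if (irq_between)
        isr_decrement();
    hi = mem_hi;                /* lds r25,0x0115 */

    return (uint16_t)((hi << 8) | lo);
}

/* The two cases from the text: 0x0100 is seen as 0x0000 and 0x0b00 as
   0x0a00 when the interrupt lands between the loads. */
void check_post_examples(void)
{
    assert(read_counter(0x0100, 0) == 0x0100);
    assert(read_counter(0x0100, 1) == 0x0000);
    assert(read_counter(0x0b00, 1) == 0x0a00);
}
```

Calling check_post_examples() from a test program confirms both cases. The glitch only shows up when the low byte is zero at the moment of the interrupt, which is why it looks random on real hardware.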

Similarly, the line:

parser_sync=MUSIC_FRAME*tag;  //wait for given number of tempo units before playing next row

might look something like:

        ldd       r24,y+1
        mov       r11,r24
        lsl       r11
        lsl       r11
        lsl       r11
        clr       r10
        sts       0x0115,r11
        sts       0x0114,r10

where everything goes wrong if the interrupt happens after the first sts. In that case the interrupt routine would decrease the high byte of the variable by one (because the low byte is still zero) and the low byte would become 0xff, but it would be overwritten by the second sts immediately upon returning from the interrupt. This is in fact the actual code generated by the compiler; the high byte is stored first, which is worse.
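The torn store can be simulated on a PC in the same spirit. Again this is just a sketch with made-up names, and it assumes the interrupt routine counts down by one and stops at zero, which may differ from the real handler.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for SRAM locations 0x0114 (low byte) and 0x0115 (high byte). */
static uint8_t mem_lo, mem_hi;

/* Assumed interrupt behaviour: count down by one until zero is reached. */
void isr_tick(void)
{
    uint16_t v = (uint16_t)((mem_hi << 8) | mem_lo);
    if (v != 0)
        v--;
    mem_lo = (uint8_t)(v & 0xff);
    mem_hi = (uint8_t)(v >> 8);
}

/* Store a new value high byte first, as in the listing, starting from a
   counter that has just run down to zero. If irq_between is nonzero the
   "interrupt" fires between the two stores. Returns the counter value
   left in memory afterwards. */
uint16_t store_high_first(uint16_t value, int irq_between)
{
    mem_lo = 0;
    mem_hi = 0;

    mem_hi = (uint8_t)(value >> 8);     /* sts 0x0115,r11 */
    if (irq_between)
        isr_tick();
    mem_lo = (uint8_t)(value & 0xff);   /* sts 0x0114,r10 */

    return (uint16_t)((mem_hi << 8) | mem_lo);
}

/* Storing 0x0800 with an interrupt in between leaves 0x0700 in memory,
   where one correctly counted tick would have left 0x07ff. */
void check_post_example(void)
{
    assert(store_high_first(0x0800, 0) == 0x0800);
    assert(store_high_first(0x0800, 1) == 0x0700);
}
```

In this model a single badly timed interrupt costs 255 ticks, so the row ends noticeably early.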

The solution is - you must not allow the interrupt to happen in between the load or store instructions that access a variable which takes more than one byte. Here is one way that could be done:

unsigned int p_sync;
do {
   cli();
   p_sync = parser_sync;
   sei();
} while (p_sync);

and

cli();
parser_sync=MUSIC_FRAME*tag;  // wait for given number of tempo units before playing next row
sei();

Even though the parser_sync variable is volatile, that isn't enough. Declaring the variable volatile only forces the compiler to read and write it in memory on every access instead of keeping a copy in registers, where the changes made by the interrupt routine would never be seen. It does not stop an interrupt from happening between the two single-byte accesses.

The other solution (which I used more often than cli/sei one) is to use one byte flag to signal from the main program to the interrupt routine and/or from the interrupt routine to the main program when the data is ready to read (or when the while loop should end). In that case you might declare a flag:

volatile uint8_t    sync_over;

which would send a signal from the interrupt routine when parser_sync counter is zero - and then you can use:

while(!sync_over)
    ;

Similarly, in the main program you can load the new counter value into a temporary variable and then use another flag to signal the interrupt routine to read the value and update the real counter.
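A minimal sketch of that two-flag handshake, written so that it can be tried on a PC by calling timer_isr() by hand. The names load_request, new_sync, start_wait() and timer_isr() are hypothetical and not taken from the Phaser1 source. On a real AVR, timer_isr() would be the body of the timer interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

volatile uint8_t  sync_over;     /* ISR -> main: the counter reached zero  */
volatile uint8_t  load_request;  /* main -> ISR: new_sync holds a value    */
volatile uint16_t new_sync;      /* written by main only, before the flag  */
static uint16_t   parser_sync;   /* the real counter, touched by ISR only  */

/* Main program side: hand a new wait length over to the interrupt.
   The two-byte store to new_sync is harmless because the interrupt
   ignores new_sync until the one-byte load_request flag is set. */
void start_wait(uint16_t ticks)
{
    assert(ticks != 0);          /* a zero-length wait would never signal */
    sync_over = 0;
    new_sync = ticks;
    load_request = 1;
}

/* Interrupt side: pick up a pending value, then count down one tick. */
void timer_isr(void)
{
    if (load_request) {
        parser_sync = new_sync;
        load_request = 0;
    }
    if (parser_sync != 0) {
        parser_sync--;
        if (parser_sync == 0)
            sync_over = 1;
    }
}
```

The main program would call start_wait(n) and then spin in while(!sync_over) ; as shown above. Only single-byte flags cross between the main program and the interrupt, so no cli()/sei() pair is needed in this scheme.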

BTW, I have modified your code to run on the ATmega328 and on the ATmega128 because I don't have an Arduino. I can see you are looking for the best way to debug the C code by stepping through the assembler instructions generated by the compiler. In one of my next posts I will describe what I think is the best approach. In short - AVR Studio 4.18 (not Atmel Studio!! and not the latest AVR Studio) + $8 ATmega128 PCB with JTAG connector + $7 USB JTAG ICE. That way you can debug the code both on the real hardware (using hardware breakpoints), almost for free - at least compared to newer and 'better' options :-) - and in the simulator, where you can count cycles and time. Since you didn't use any C++ code it took me just 10 minutes to modify the Arduino code for use with AVR Studio, meaning that after debugging on the ATmega128 board you can change the code back to work on Arduino in minutes.

And one more thing: I did a 2 channel 1-bit sound routine (using PWM and waveform tables) with portamento and decay envelope which runs on an ATmega8 with the 8 MHz internal oscillator, and the results are quite good; some of the sounds could be compared with low end keyboards. Not to mention what could be done with the 8 pin ATtiny85, which has a 64 MHz (yes, megahertz :-) ) PLL clock and can generate 250 kHz PWM output.