Found an optimization for the CARN code which speeds it up quite a bit. Also, the tone channels now have envelopes.
The core loop is currently 185 cycles (18918 Hz), which means I can probably get away with cramming in another tone channel. I'm focussing on CARN at the moment because the optimization will not work for COVN.

Also that low shelf characteristic of COVN got me thinking. What if we were to also apply a lower limit to the random number range? Then the result should become tonal at some point, right?

Edit: uh-oh, the gears in my good ol' brain are rattling pretty hard right now. For one, if the spectrum of COVN can be limited in this way, then I could use that to basically simulate a band-pass filter in 1-bit. That would obviously open up a whole new range of sound possibilities. It's going to be quite tricky though, because this requires a way of generating random numbers within an arbitrary range. Secondly, I think I just understood why velvet noise is useful for reverb. Hmmm, 1-bit reverb... hehehe...

So, here's a test modified to use CARN, for comparison. The test implementation will occasionally produce errors (manifesting themselves as short blips), but other than that it sounds quite alright to my ears as well wink

The great thing is that since there's one less variable to track, CARN can be implemented in fewer cycles, which is always a plus. But both COVN and CARN obviously have their uses, considering the differences in sound. So ideally I'd want to come up with an implementation that allows switching between the two at runtime. I'm sure it just needs a nice piece of self-modifying code big_smile Only problem, as mentioned: I'm out of practice as far as assembly coding is concerned, but I definitely want to get back to this and produce a proper engine.

In the first test example, there may be some other problems besides the bad RNG. Also, the pulse density is actually fairly high in both examples. The envelope in the COVN one starts with a window size of 9 and goes up to 129. For CARN, the envelope starts with numbers < 8 and goes up to numbers < 256. In any case, the random number sequence is fully deterministic, since I'm using a fixed seed for the PRNG. If you are interested in studying the characteristics, the sequence is calculated as follows:

// for COVN
// initialization
uint16_t seed = 0x2157;
uint16_t state = 0;
uint8_t window_size = x;  // x ∈ {0x9, 0x11, 0x21, 0x41, 0x81}

// the actual generator
// as wasteful as it looks, it's actually just 3 bytes / 19 cycles in Z80 asm: add hl,de; rlc h
uint16_t next = state + seed;
uint16_t temp = ((next >> 8) & 0xff) << 1;             // old high byte shifted left; bit 8 holds the old bit 7
state = (next & 0xff) + (temp << 8) + (temp & 0x100);  // net effect: high byte of next rotated left (rlc h), low byte kept

// by coincidence, this prevents two consecutive pulses colliding at the end of the current/beginning of next window
uint8_t next_pulse_delay = ((state >> 8) & (window_size - 2)) + 1;

For CARN it's essentially the same, except that the window_size is replaced with a "range" value x where x ∈ {0x7, 0xf, 0x1f, 0x3f, 0x7f, 0xff}, and the pulse delay is calculated as

uint8_t next_pulse_delay = ((state >> 8) & range) + 1;
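
To tie the pieces together, here is a minimal, self-contained C version of the whole thing (the function names, the COVN setting chosen, and the 1000-sample loop are mine, purely for illustration; the constants and formulas are the ones given above):

#include <stdint.h>
#include <stdio.h>

static uint16_t state = 0;
static const uint16_t seed = 0x2157;

// one step of the PRNG described above (add hl,de; rlc h)
static uint16_t prng_step(void)
{
    uint16_t next = state + seed;
    uint16_t temp = ((next >> 8) & 0xff) << 1;
    state = (next & 0xff) + (temp << 8) + (temp & 0x100);
    return state;
}

int main(void)
{
    uint8_t window_size = 0x21;  // COVN; for CARN, mask with the range value instead
    uint8_t countdown = 1;       // samples until the next pulse
    for (int n = 0; n < 1000; n++) {
        int pulse = (--countdown == 0);
        if (pulse)
            countdown = ((prng_step() >> 8) & (window_size - 2)) + 1;
        putchar(pulse ? '1' : '0');  // the 1-bit output stream
    }
    putchar('\n');
    return 0;
}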

The COVN example runs at 17412 Hz and the CARN example runs at 19662 Hz (actually slightly lower, because there is a short delay every 256 samples for updating the length counter, and there can also be some minor fluctuation depending on internal machine state). The window size/range is updated every 256*32 samples. There is an additional quirk: pulses across all channels will never coincide. If more than one pulse is triggered on a given sample, the additional pulse(s) will be delayed until the next sample(s). The tone channels always trigger a "double" pulse; I found this to yield a reasonable volume balance. In an actual engine we'd want to make the pulse length variable for the tone channels, to fake volume envelopes for those.
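
The collision rule can be read as a simple pending-pulse counter, drained at one pulse per output sample. A sketch of that reading in C (the function name and the counter mechanism are my guess at an implementation, not a description of the actual engine code):

// number of pulses still waiting for a free output sample
static int pending = 0;

// triggers = pulses requested on this sample across all channels
// (a tone channel would request its "double" pulse as two triggers)
static int mix_sample(int triggers)
{
    pending += triggers;
    if (pending > 0) {
        pending--;
        return 1;  // at most one pulse fires per output sample
    }
    return 0;
}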

There are a number of different seeds that will work reasonably well; I'm just sticking to this one because I'm lazy and it's the only one I can remember off the top of my head. Might also be worth lifting some ideas from Shiru's noise generator code to get better random numbers.

In any case, thanks for the detailed explanation regarding the spectral characteristics. I have to admit that the point in your paper where you start to talk about Power Spectral Density was about where my brain switched off, especially with those scary-looking equations on page 4 yikes

Alright, a new test, this time noise is combined with two pulse-frequency modulated tone channels. Terrible code, seems my asm skills have gotten awfully rusty. Well, it works, at least.

A question: it sounds like there is a drop in pitch with increasing window size. Is this expected, or is it an effect of the bad PRNG?

Thanks a lot for the detailed explanation! It's all very cle... wait, so there's no backtracking involved, i.e. the pulse location in window n+1 is independent of the pulse location in window n? If so, what's the difference between COVN and CARN, then?

Anyway, I'm just playing around with some naïve test code at the moment, assuming the above is true. It does sound noisy alright, even using my not-so-uniform el-cheapo XORshift. Computation takes 49 cycles, which is reasonable (though I'm cutting some corners at the moment). We can normally spend up to 224 cycles on the synthesis loop (a bit more when using Pulse Frequency Modulation), which gives us a sample rate of 15625 Hz with the Spectrum's 3.5 MHz clock.
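
For reference, a typical 16-bit xorshift step looks like this in C. The (7, 9, 8) shift triplet shown here is a well-known full-period choice that maps nicely to Z80; it is not necessarily the exact variant used in the test.

#include <stdint.h>

// One step of a 16-bit xorshift PRNG; the state must be seeded non-zero.
static uint16_t xorshift16(uint16_t x)
{
    x ^= x << 7;
    x ^= x >> 9;
    x ^= x << 8;
    return x;
}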

The attached example runs at 48.6 kHz, using a window size of 16 (so a density of around 3000 pulses per second, if I'm not mistaken). In any case it's just a quick&dirty test, more experiments to follow later. (It's a Spectrum tape file, so you'll need an emulator - I recommend Fuse if you aren't sure which one to pick).

kurt.james.werner wrote:

I'm realizing now that perhaps old-school 1-bit trackers don't even have a fixed sample rate the way that modern audio processing (e.g. in a VST) does, but that in your context you deal just with raw clock cycles of the computer (up in the MHz?). Is that correct?

Correct, we normally deal with raw clock cycles, though it's trivial to approximate the sample rate. Assuming the synthesis loop length is fixed (which is true to some extent for most existing 1-bit engines on Spectrum), and ignoring some hardware quirks, it's a matter of simply dividing the CPU clock speed by the number of cycles in the synthesis loop (e.g. 3500000 / 224 = 15625 Hz).

kurt.james.werner wrote:

And if so, does that mean that on the ZX Spectrum you don't think in terms of audio samples in discrete time (e.g. a pulse wave like 000100010001) but rather just the time to flip up to +1 and a time to flip back down to 0, which need to be very close for a pulse train?

I can only speak for myself here, but I tend to think about it in terms of samples in discrete time more often than not. It depends a bit on the platform - for example, for PC speaker the latter approach might be more natural, due to how the hardware works. On Spectrum the common approach is to run a fixed-length synthesis loop, so it makes more sense to think about it in discrete time terms.

kurt.james.werner wrote:

I probably have a lot to learn about the terminology and basic idea of your approaches, so thanks for bearing with me here.

Don't worry too much about our terminology, it's not very formalized and everybody kind of uses their own terms. You'll probably know better how to actually name things than us most of the time. In any case, if you have any questions please feel free to ask, of course!

In any case, I have to thank you again for the great input you have been providing. I haven't been playing so much with 1-bit synthesis lately, mainly because of working on a new, experimental tracker, but also because of a lack of inspiration. Well, this discussion certainly brought back some of the latter wink If you have any more "tricks" to teach, please don't hold back! Generally any sort of signal processing with low computational requirements is of interest (never mind the "1-bit-ness" of things, it's possible to simulate multiple volume levels on 1-bit).

I think I've mostly figured it out, the only missing puzzle piece is the discrete pulse index m resp. impulse location and how it relates to the sample index n.

Regarding a real-time implementation, the main challenge will be to generate the two random number sequences. The Spectrum hardware does not provide a source of random numbers, so computing such sequences is very costly (even more so considering the computation needs to happen in constant time). The question is how much we can deviate from a uniform distribution. But that will be something to explore via experimentation, I guess.

Anyway, please take your time with the response, I won't run away in the meantime wink

Been thinking a bit more about a feasible implementation. Since we have very tight memory constraints on our machines, ideally we'd want to generate the pulse sequences in realtime. However, I'm afraid the computation might be too expensive despite it not requiring multiplication. But since the pulse sequences are sparse, it should be possible to devise a format in which they can be stored efficiently after precalculating them - essentially we could just store the distance between the pulses. However, this would take away quite a bit of flexibility. Hmmm... my main struggle at this point though is still understanding how to generate the actual kCOVN/kCARN/kCTRN sequences.
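
As a sketch of what such a storage format could look like (everything here, names included, is hypothetical): store one byte per pulse holding the distance to the previous pulse, so that playback reduces to a per-sample countdown.

#include <stdint.h>
#include <stddef.h>

// Offline encoder: positions[] holds strictly ascending pulse positions.
// Assumes every gap fits in 1..255 samples, which holds for the window
// sizes/ranges discussed above.
static void encode_gaps(const uint16_t *positions, size_t n, uint8_t *gaps)
{
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        gaps[i] = (uint8_t)(positions[i] - prev);
        prev = positions[i];
    }
}

// Player, called once per sample; gap_count must be primed with the first
// gap before playback. On Z80 this boils down to a dec/jr nz plus a reload.
static const uint8_t *gap_ptr;
static uint8_t gap_count;

static int play_sample(void)
{
    if (--gap_count) return 0;  // still counting down: no pulse
    gap_count = *gap_ptr++;     // reload with distance to the next pulse
    return 1;                   // pulse fires on this sample
}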

This is very, very exciting. As you correctly noted, good-quality noise is pretty much uncharted territory in the realm of practical 1-bit music, even more so the possibility to control its spectrum and volume. We do have some primitive means of controlling the volume (by tweaking a threshold value applied to a simplified XORshift algo), but judging from the demo tune, your methods obviously produce much cleaner and more versatile results.

I have to admit that due to my limited knowledge of both dsp and mathematics, I can only grasp the very rudimentary basics of your paper. However, I'm very interested in implementing this on ZX Spectrum. So, I would be very grateful if you would be willing to help me understand how these techniques work. Could you provide a dumbed down explanation, or ideally some C/Python/Nyquist/pseudo-code? Also, more specifically, what do the terms "discrete-time sample index", "discrete pulse index", and "window width" mean in this context?

By the way, much obliged for mentioning us in the paper. One small note regarding this: I think Shiru deserves some credit as well, as he broke a lot of ground for the "modern" 1-bit scene as it exists today.

Anyway, thanks a lot for bringing this to the attention of our humble little internet hangout.

Neat! I guess your next music album will be an "enhanced 1-bit" one?

Great news, thanks for all your hard work. Looking forward to any hardware video captures smile


Thanks Shiru, that's a neat idea for spreading 1-bit sounds outside the 1-bit world.

Meanwhile I know what's up with that carrier hiss. Basically the issue seems to be caused by things leaking into bit 3 of port #fe. Which is by design in this engine, so can't be trivially fixed. Will try to rewrite the engine some day but have no time for it at the moment.


Another update with good news! I reimplemented the compiler generator. It's still a bit rough around the edges, but it's already much more robust than the previous version, and does not require any hard-coded work-arounds anymore. The latest incarnation of the MDCONF Standard replaces the XML-based format with an s-expression based one, which is much more compact. Needless to say it also allows me to drop all XML related dependencies (though ssax/sxml is a breeze to work with, generally). Also, MDAL now has a built-in multi-target assembler (though it only supports z80 atm), which means there's no need to provide pre-compiled binaries anymore.

Next step is to work some more on Bintracker again, so I get a better idea what kind of things MDAL needs to provide on the tracker API front.

Regarding feedback, welcome to human nature, I guess. I think it's inevitable. Still sucks, of course.

I've briefly looked into VST in the past, and indeed it seemed like a huge mess. Sadly Steinberg's stance and actions don't help to advance open source development in that field either.

In regards to DAW plugins in general, the situation is hardly better on Linux & Co. This is from the LV2 documentation:

What not to do: If you are new to LV2, do not look at the specifications and API references first! These references provide detail, but not necessarily a clear high-level view.

And then they wonder why adoption of this standard is so slow (even though the standard itself is quite good imo). And the API reference actually isn't bad, but it indeed provides no overview whatsoever.

Hehe, don't worry, it was just a stupid joke. I've been using Linux for 15+ years, but I certainly don't think it's the greatest thing in the universe. Imo it has become less attractive in recent years, too, due to certain developments. I do believe though that MS will eventually ditch their own kernel in favour of a Linux based one. 3 things that make me think that:

- MS is now one of the biggest sponsors of the Linux Foundation
- WSL2
- dropping Chakra in favour of Chromium

The latter has no direct connection to Linux, but it shows that the company philosophy is shifting away from doing everything in-house.

Anyway, let me just say I still very much appreciate that you took the time to make 1tracker build smoothly on Linux, even more so considering the story you told.

Very cool. Considering the popularity of bytebeat & co has only been growing over the years, I imagine this could become quite popular.

Also yup, I guess 32-bit is on its way out. Not a big fan of this development, but I don't have 32-bit compatibility on my main machine anymore either, as it mainly just increases the number of packages that need to be updated. Then again, you should just come over to the dark side and develop for *nix wink Guess in a few years Microsoft will switch to a Linux kernel for Windows anyway.


Unexpectedly got some more free time before that other project I mentioned. So here's a little sneak preview:

https://i.ibb.co/CPVzhg9/btng001.png

Hi Bushy, welcome aboard!

Glad to see someone being so enthusiastic about 1-bit sound wink And great to have you doing all these engine ports. My first successful attempt at 1-bit code was a Huby port as well wink I'm sure after a while you'll figure out how to write your own engines as well. If you have any questions, feel free to ask, of course.

I'll see if I can dig up some .1tm files from my archive and send them your way. Unfortunately Gmail has a nasty habit of blocking attachments of unknown file types, so if you don't receive a mail from me please let me know here.


More progress. Got rid of most of the hard-coded compiler parts, so the compiler generator is getting there slowly. (Yes, you read correctly: the new libmdal generates a custom compiler for each target engine, rather than using one big standard compiler with loads of conditionals like in libmdal v1).

Also, I was able to implement a new, much more robust parser using comparse, thanks to its friendly author who gave me an in-depth tutorial last weekend.

Last but not least I did some much needed refactoring, finally breaking up that horrible 3000 lines-of-code core module into various submodules.

Unfortunately in the coming weeks I'll have to work on another, unrelated project, so development will be stalled again until at least the end of the month.


Great news! The new libmdal compiler produced its first correct output today. Still cheating a bit by hard-coding a few parameters, but nevertheless it's a major step forward. Testing with Huby at the moment, which may not sound all that exciting. However, Huby has some not-so-common quirks, so it's good for testing a number of different features (fixed pattern size so MDAL source patterns must be resized, size of order list must be stored in data, which requires multiple compiler passes, and so on).

Still a lot of work to do, but for now I'm very happy smile


I don't think there is a way to fix it. Even changing the pitch will not make the samples play as intended.
Interrupts and beeper sound don't mix well. And iirc Nirvana's interrupts take up almost the entire frame.


Hi,

Since Shiru seems to be busy, I'll try to answer in his place.

I don't think there is any code for the method Shiru describes, because nobody has implemented it. The idea for continuous sound would be not to call the sound generator through an ISR (because 50 Hz is too slow to make anything but the lowest audible tones), but to interleave it with the multicolour code at regular intervals. The usual restrictions with contention apply there, so it would still be quite tricky. Generally, PFM/pin pulse and Zilogat0r's Squeeker method would work best because they both can work with a low sample rate.

For sfx and simple melodies consisting of short blips, doing this on interrupts works fine. Avoid BASIC's BEEP command though, as it has a lot of overhead. You can achieve the same result with just three instructions:

  add hl,de     ; advance the phase accumulator by the frequency divider
  ld a,h        ; take the accumulator's high byte...
  out (#fe),a   ; ...and write it to the ULA port (bit 4 drives the beeper)

Init DE with the desired frequency divider (for tests, something like #0040 will be fine), and run the loop something like 1000 times.

In z88dk, there's also a modified version of the Tritone engine which can be used in combination with game logic. It basically runs all the time and periodically runs your game code on note changes. If your game logic is simple and doesn't need a lot of CPU time, it might be a good option.


Hahaha this made my day:

;да

(For those who don't read Russian: "да" means "yes".) I always fail to find an elegant way of commenting a conditional jump. I think this is the perfect way.

Other than that, I can only reiterate what Shiru said: This code is meant to be called through an IM2 interrupt, or some other form of loop.
This appears to be the relevant z88dk documentation: https://github.com/z88dk/z88dk/wiki/interrupts

Released a new XM parser today. It's provided as an extension for the Chicken 4 implementation of the Scheme language. If you have Chicken 4 installed, you can just run "chicken-install xmkit" to use it. The nice thing about Chicken is that it allows you to run code interpreted (for quick prototyping), as well as compile to fairly efficient C code.

The parser handles pretty much everything, including parsing pattern data, performing file integrity checks, and extracting and converting sample data. If you only need to parse patterns, then it might be overkill; consider Shiru's Python converter instead.

source
documentation

Dictionary Encoding of Song Data

Since the advent of music trackers, chiptune modules have traditionally been encoded using a pattern/sequence approach. Considering the constraints of 1980s consumer computer hardware, this is undoubtedly close to an optimal approach.

However, in the present day, perhaps the capabilities of modern computers and cross development offer new possibilities for the encoding of chiptune data?

Back in the day, the two core requirements aside from providing a reasonably efficient storage format were:

  • Fast encoding from the editor side with limited memory, as musicians like to listen to their work in progress instantly at the push of a button.

  • Fast decoding, since enough CPU time must be left to perform other tasks. In the realm of 1-bit music, this is especially critical, since on most target machines we cannot perform any tasks while synthesizing the output. That means that reading in music data must be as fast as possible, in order to minimize row transition noise. For this reason, we tend to avoid using multi-track sequences, which are very efficient in terms of storage size, and are fast to encode, but are much slower to decode than a combined sequence for all tracks.

Obviously nothing has changed regarding the second requirement. However, the first requirement no longer stands when using editors running on a PC. So, how can we use this to our advantage?


A Dictionary Based Encoding Algorithm

I propose the following alternative to the traditional pattern/sequence encoding:

  1. Starting from an uncompressed module with sequence/pattern data, produce a dictionary of all unique pattern rows in the module. The dictionary may consist of at most 0x7fff bytes, otherwise the module is not a viable candidate.

  2. Construct a sequence as follows:

    A. Replace each step in the original sequence with the rows of the corresponding pattern.

    B. Replace the rows with pointers to the corresponding dictionary entries.

  3. Compress the sequence as follows (a C sketch of the substitution step follows after this list):

    A. Beginning with a window size of 3 and a nesting level of 0, find all consecutive repetitions of the given window in the new sequence and replace them with a control value that holds the window size and the number of repetitions. The control value is distinguished by having bit 15 set. A dictionary pointer will never have bit 15 set. (On ZX Spectrum, this would be reversed, as the dictionary will reside in upper memory.)
      Substitution shall not take place inside compressed blocks. So, assuming the following sequence:
      <ctrl window:4 length:4>1 2 3 4 5 6 7 3 4 5 6 7...
      the packer will not replace the second occurrence of 3 4 5 6 7, because the first occurrence is part of the compressed block <ctrl...>1 2 3 4.

    B. If executing step 3A replaced any chunk, increment the nesting level by 1.

    C. Increment the window size by 1 and repeat from step 3A until reaching a predefined maximum window size, or reaching a predefined maximum nesting level.
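
To make step 3A concrete, here is a C sketch of the substitution for a single window size w. The control word packing (window size in bits 8-14, repeat count in bits 0-7) is an assumption, since the exact layout is left open above; the "inside compressed blocks" rule is simplified to "a window must not contain a control word", and range checks are omitted.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CTRL_FLAG 0x8000u
#define IS_CTRL(x) ((x) & CTRL_FLAG)

// Replace consecutive repetitions of each w-entry window in seq[0..len)
// with <ctrl><window contents>, in place; returns the new length.
// reps counts the repetitions following the literal copy of the window.
static size_t pack_runs(uint16_t *seq, size_t len, size_t w)
{
    size_t out = 0, i = 0;
    while (i < len) {
        size_t reps = 0;
        while (i + (reps + 2) * w <= len) {
            int match = 1;
            for (size_t k = 0; k < w; k++) {
                uint16_t a = seq[i + k];
                if (IS_CTRL(a) || a != seq[i + (reps + 1) * w + k]) {
                    match = 0;
                    break;
                }
            }
            if (!match) break;
            reps++;
        }
        if (reps > 0) {
            memmove(&seq[out + 1], &seq[i], w * sizeof seq[0]);
            seq[out] = (uint16_t)(CTRL_FLAG | (w << 8) | reps);
            out += w + 1;
            i += (reps + 1) * w;
        } else {
            seq[out++] = seq[i++];
        }
    }
    return out;
}

Steps 3B/3C would then call this in a loop over increasing window sizes, tracking whether any pass replaced something in order to maintain the nesting level.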


Testing the Algorithm

It all may sound good on paper, but some reliable data is needed to prove it actually works.
For this purpose, I ran some tests on a large set of nearly 1800 XM files obtained from ftp.modland.com.

From the XMs, input modules were prepared as follows:

  • Only note triggers are considered, any other data is discarded.

  • From the extracted note data, all empty rows are discarded.

  • Each note row is prepended with a control value, which has bits set depending on which channels have a note trigger. Empty channels are then discarded. (A sketch of this row format follows after this list.)

  • The maximum window size is determined as the length of the longest note pattern, divided by 2.
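
A minimal sketch of that row format in C (the channel count limit of 8, the NO_NOTE marker, and the function name are assumptions for illustration):

#include <stdint.h>

#define NO_NOTE 0  // hypothetical marker for "no trigger on this channel"

// Pack one note row: a leading control value with one bit per triggering
// channel (bit c = channel c), followed by the notes of the non-empty
// channels only. Returns the packed length in bytes; channels <= 8.
static int pack_row(const uint8_t *notes, int channels, uint8_t *out)
{
    int len = 1;
    out[0] = 0;
    for (int c = 0; c < channels; c++) {
        if (notes[c] != NO_NOTE) {
            out[0] |= (uint8_t)(1 << c);  // mark channel c in the control value
            out[len++] = notes[c];        // empty channels are discarded
        }
    }
    return len;
}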

Some larger modules were included in the test set, just to get a picture of how the algorithm would perform on larger data sets. In reality, any module that produces a dictionary larger than 0x7fff bytes would of course not be usable.

I then ran the dictionary encoding algorithm on the prepared modules, using 3 different settings:

  1. No recursion applied (step 3 skipped)

  2. Recursion with a maximum depth (nesting level) of 1

  3. Recursion with a maximum depth of 4


Results

modules tested: 1793

Successes
no recursion: 1641
recursion depth = 1: 1676
recursion depth = 4: 1701
An optimization is considered a success if the dictionary based algorithm produced a smaller file than the pattern/sequence approach.

Average savings
no recursion: 29.1%
recursion, depth = 1: 31.6%
recursion, depth = 4: 34.1%
Average savings ratios are obtained by comparing output size against the pattern/sequence encoded input data.

Best result:
no recursion: 72.3%
recursion, depth = 1: 91.5%
recursion, depth = 4: 91.7%

Worst result:
no recursion: -110.4%
recursion, depth = 1: -78.2%
recursion, depth = 4: -59.7%

results graphic
.svg version

The results are quite surprising, in my opinion.

  • Even with the non-recursive algorithm, only 152 modules (8.5% of all files) could not be compressed better than with the pattern/sequence approach. Recursing up to a depth of 4 nesting levels further reduces the failures to 92 files (5.1%). I had expected that the dictionary algorithm would lose to pattern/sequence encoding in at least 1/3 of the cases.

  • I had assumed that dictionary encoding performance would correlate with module size, but the test data does not back up this hypothesis. However, when dictionary encoding performs worse than pattern/sequence encoding, it generally happens at smaller module sizes.

  • Recursive dictionary encoding on average offers only marginal benefits over non-recursive dictionary encoding. However, there were a number of cases where recursive encoding performed significantly better. This includes almost all cases where dictionary encoding performed worse than pattern/sequence encoding.

  • Comparing the recursive dictionary encoding settings with each other, the algorithm performs significantly better at higher nesting levels.


Conclusions

I think it is safe to conclude that plain dictionary based encoding of chiptune song data (without recursion) offers significant benefits over the traditional pattern/sequence approach. Considering that non-recursively dictionary encoded data can also be decoded easily and quickly, it can definitely be considered a viable alternative.

Whether recursive dictionary encoding is worth it remains debatable. I think that the cost in terms of CPU time will be acceptable, as nested recursions will occur infrequently, so it would be comparable to performing a sequence update on pattern/sequence encoded data. However, it will incur a significant cost in terms of the CPU registers or temporary memory required. I would argue that it depends on the use case. You might consider it if you are intending to optimize small modules, and have registers to spare.


Test Code
https://gist.github.com/utz82/7c52da950 … 1c2bad5b6b


Excellent write-up. Glad to see it is generating quite some attention, too.


I finally found a way of consistently reproducing the "Engine not provide any data" bug that I mentioned above.

To reproduce:

1) load the attached .1tm
2) F5 to play, F5 to stop again
3) Move cursor to last column, press 1

I don't know if it is reproducible under Windows, if not then probably some memory is not being initialized to 0.

Some more info: the bug happens regardless of the engine used. As mentioned, it almost always happens when trying to enter something at the beginning of a block. Once it occurs, it is persistent, i.e. no data that would trigger a row play can be entered at this position anymore, regardless of any further actions taken.