Sunday, September 13

Geek

Imaginary Sounds

Previously I worked out a potential DSP loop for wave table synthesis for the Imagine.  It looked something like this:

Instruction     Cycles Comment
BANK 1          1      'Use embedded memory bank 1
LD W, R0        1      'Load the base register for the voice
LD X, R2        1      'Load the current offset
ADD X, R4       1      'Add the step
CMP XH, R7      1      'Compare the high byte with the sample size
IFGE 1          1      'Next instruction only executes if the offset is past the end of the sample
CLR X           1      'Set the offset back to the start
ST X, R2        1      'And save it
ADD W, XH       1      'Add the high byte of the current offset to the address
LD B, (W)       2      'Load the sample into the low byte of AB
MAC AL, B, R8   1+N    'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, R9   1+N    'And the same for the right

That was before working out the full instruction encodings.  Earlier I added in a switchable bank of 10 memory-mapped registers to reduce the number of memory accesses required, but that is between tricky and impossible to map into the opcode space - and there's actually a better way to do it, that we can steal from the Z80.

When working through the CPU programming model I also added in a LEAF instruction - Load Effective Address, Fractional - explicitly for table interpolation.  With this new model our algorithm turns out somewhat different:

Instruction     Cycles  Comment
GRP 1           1       'Switch to register group 1
LEAF X, X+Y     2       'Calculate an effective address in interpolated mode
CMP X, Z        1       'Check if we've reached the end of the sample
IFGE 1          1       'Next instruction only executes if the offset is past the end of the sample
  LD X, W       1       'Reset to the first byte of the sample
LD B, (X)       2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

Oops, that's wrong.  Sorry about your burned-out speakers.  Let's try again.

Instruction     Cycles  Comment
BANK 1          1       'Switch to register group 1
ADD X, Y        1       'Increment the offset by the step size
AND X, Z        1       'Restrict the offset to the sample size
LD B, (W+XH)    2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

The trick we've stolen from the Z80 is just to have multiple sets of registers.  The Z80 had two register sets in 1976; we need five or six in 1983.  That doesn't seem too implausible.

The LEAF instruction replaces the complicated high/low byte address fiddling.  It only saves three cycles but it's a lot easier to read.  If you're reading through some Imagine assembler and you see LEAF, you immediately know it's doing some kind of table interpolation.  (And if you see LEAP, that's a jump table.  It's actually LEA P, but the assembler will accept LEAP.)

Update 2020-09-14

We no longer need to use the LEAF instruction explicitly; if you specify (W+XH) as the index mode in any instruction that can take an indexing postbyte, it calculates the address in interpolated mode.

This version of the code removes the CMP / IF / CLR logic and goes with the method used in the Ensoniq 5503 as found in the Apple IIgs.  Sample banks are a fixed power-of-two length - 64, 128, 256, 512, or 1024 bytes - so we can simply AND the offset with a bit mask to clip it to the appropriate range.
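For anyone who would rather read that loop in a high-level language, here is a rough Python model of one voice for one sample period.  It is a sketch, not the Imagine's actual behaviour: it assumes the offset is a fixed-point number with 8 fractional bits (so "XH" is simply the offset shifted right by 8), that the mask in Z keeps those fractional bits, and that samples and volumes are plain integers.  The names voice_tick, make_mask and FRAC_BITS are made up for the illustration, and the sketch just fetches the nearest sample without modelling whatever the interpolated addressing mode does.

```python
FRAC_BITS = 8  # assumed: low 8 bits of the offset are the fractional part


def make_mask(sample_length):
    """Build the Z mask for a power-of-two sample length (assumed layout)."""
    if sample_length <= 0 or sample_length & (sample_length - 1):
        raise ValueError("sample length must be a power of two")
    return (sample_length << FRAC_BITS) - 1


def voice_tick(memory, base, offset, step, mask, vol_l, vol_r, acc_l, acc_r):
    """Model one pass through the loop for a single voice."""
    offset = offset + step                        # ADD X, Y
    offset = offset & mask                        # AND X, Z  (wraps to start)
    sample = memory[base + (offset >> FRAC_BITS)]  # LD B, (W+XH)
    acc_l += sample * vol_l                       # MAC AL, B, C
    acc_r += sample * vol_r                       # MAC AR, B, D
    return offset, acc_l, acc_r


# A 64-byte "sample" whose value at each position is just its index.
memory = list(range(64))
mask = make_mask(64)

# Start on the last byte and step forward two whole samples: the AND
# wraps position 65 back around to position 1.
offset, left, right = voice_tick(memory, 0, 63 << FRAC_BITS, 2 << FRAC_BITS,
                                 mask, 2, 3, 0, 0)
```

The point of the power-of-two restriction is visible in voice_tick: wrapping costs a single AND with no compare and no branch, which is where the CMP / IFGE / CLR sequence went.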

One other thing: When I was first thinking of the custom chips for the Imagine I called this a fixed-function DSP, even though it was fully programmable.  It just seemed like a good name.  Then I realised exactly why this version of the chip would be described as fixed-function.

It has a small - maybe 256 bytes - mask-programmed ROM containing a bunch of DSP algorithms, like the code above.  You can still write your own custom algorithms, but if you're running code out of the on-board ROM, and you have your settings loaded into the registers, we only need to access main memory three times per sample: Two reads to issue the subroutine call, and one when the subroutine loads the sample data.

That means that for a sample rate of 18.75 kHz - our nominal HSYNC rate and a useful audio sample rate - we'd make 187,500 accesses to main memory per second, 6.25% of available cycles.
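The figures above can be unpacked with a little arithmetic.  This is only a back-of-envelope sketch: the 18.75 kHz rate, the three accesses per ROM routine, the 187,500 total and the 6.25% share are the figures quoted in this post, while the ten accesses per sample period and the three million memory slots per second are derived by working backwards from them.  How those ten accesses are divided between voices is not stated here, so the sketch does not guess.

```python
SAMPLE_RATE_HZ = 18_750               # 18.75 kHz, quoted above
ROM_ACCESSES_PER_SAMPLE = 3           # two reads for the call, one sample load
QUOTED_ACCESSES_PER_SECOND = 187_500  # quoted above
QUOTED_FRACTION = 0.0625              # 6.25%, quoted above


def accesses_per_second(per_sample_period, sample_rate_hz=SAMPLE_RATE_HZ):
    """Main-memory accesses per second for a given count per sample period."""
    return per_sample_period * sample_rate_hz


# Cost of running one ROM routine once per sample period.
one_routine = accesses_per_second(ROM_ACCESSES_PER_SAMPLE)

# Derived, not quoted: accesses per sample period implied by the total.
implied_per_period = QUOTED_ACCESSES_PER_SECOND // SAMPLE_RATE_HZ

# Derived, not quoted: total memory slots per second implied by the 6.25%.
implied_available = QUOTED_ACCESSES_PER_SECOND / QUOTED_FRACTION

print(one_routine, implied_per_period, implied_available)
```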

And now we have a wavetable chip that makes sense.  The only remaining question is, does it have six entire register banks, or do we go back to the 64-byte onboard RAM?

With the latter, the code could look like this (instructions in bold access main memory).

Instruction     Cycles   Comment
LD U, $00000    2        'Point the user stack into on-chip RAM
JSR $41E        2        'This is the nominal location in mask ROM
POP U, WXYZ     5        'Load all the sample bank settings
POP U, CD       3        'Load the volume settings
ADD X, Y        1        'Increment the offset by the step size
AND X, Z        1        'Restrict the offset to the sample size
ST X, (U-8)     3        'Write the offset back to RAM
LD B, (W+XH)    2        'Load the sample
MAC AL, B, C    1+N      'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N      'And the same for the right
RET             2        'Return and process the next request in the queue

That's a lot more cycles than before, but now we don't need to have half a dozen complete register banks, and if we want just one more voice the ROM code spills neatly to main memory without any changes, which is good because you can't change it.

On the downside, we're now at 23+2N cycles, which is right back where we started.  Having multiple register banks is magical for this application.
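To make the comparison concrete, here is a small Python tally of the cycle columns from the two listings.  It is just a checking aid: the cycle counts are copied from the tables above, N is left symbolic exactly as it is there, and the names BANKED_LOOP, RAM_LOOP and total_cycles are invented for the sketch.

```python
# Each entry is (instruction, fixed cycles, multiples of N).
BANKED_LOOP = [
    ("BANK 1", 1, 0),
    ("ADD X, Y", 1, 0),
    ("AND X, Z", 1, 0),
    ("LD B, (W+XH)", 2, 0),
    ("MAC AL, B, C", 1, 1),
    ("MAC AR, B, D", 1, 1),
]

RAM_LOOP = [
    ("LD U, $00000", 2, 0),
    ("JSR $41E", 2, 0),
    ("POP U, WXYZ", 5, 0),
    ("POP U, CD", 3, 0),
    ("ADD X, Y", 1, 0),
    ("AND X, Z", 1, 0),
    ("ST X, (U-8)", 3, 0),
    ("LD B, (W+XH)", 2, 0),
    ("MAC AL, B, C", 1, 1),
    ("MAC AR, B, D", 1, 1),
    ("RET", 2, 0),
]


def total_cycles(listing):
    """Return (fixed, n_count), meaning a total of fixed + n_count * N cycles."""
    fixed = sum(cycles for _, cycles, _ in listing)
    n_count = sum(n for _, _, n in listing)
    return fixed, n_count


for name, listing in (("banked", BANKED_LOOP), ("on-chip RAM", RAM_LOOP)):
    fixed, n_count = total_cycles(listing)
    print(f"{name}: {fixed}+{n_count}N cycles")
```

The register-bank version comes to 7+2N against 23+2N for the on-chip RAM version, so the 16 extra cycles are all stack setup, register loads, the write-back and the call/return overhead.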


Update 2020-09-14 Afternoon

Executive decision: The A100 and A101 segmented processors had two full register banks (all of ABCD, WXYZ, PLUS, and QOFI, only excluding R and T, plus the associated base and size registers for each segment).

The A102 and A103 microcontrollers lacked segmentation but had four full register banks.  This allowed for three different interrupt priorities each with zero-cycle latency.

The A108 DSP and A109 ASP had eight user register banks (just ABCD and WXYZ being switchable).

The A100, A102, and A108 are more expensive parts with Harvard architectures - dual busses separating instructions from data.

The Imagine 1000 uses the A103 and A109.  The subsequent Imagine 1100 uses the A119, an updated A109 with four full register banks and eight additional user register banks, for both its CPU and DSP.

Posted by: Pixy Misa at 04:34 PM