What is that?
It's a duck pond.
Why aren't there any ducks?
I don't know. There's never any ducks.
Then how do you know it's a duck pond?

Friday, September 18

Geek

Daily News Stuff 17 September 2020

Zed Eighty Edition

Tech News

  • Sony ran a gender reveal for its new console, devastating three continents.  (AnandTech)

    $499 for the full version with Blu-Ray drive; $399 for the digital-only version.

    That pushes back fairly hard against both the $299 Sbox and the $499 Xbox.  Smart move by Sony, except for the part where they're probably losing money at that price point.


  • Numbers, how do they work?  (AnandTech)

    Sony also announced the Xperia 5 II, a companion to the Xperia 1 II.

    It's not cheap at $949, but it does have a Snapdragon 865, 2520x1080 120Hz OLED display, 8GB RAM,  128GB or 256GB of flash, microSD slot, headphone jack, wireless charging, and IP65 and IP68 ratings.

    Oops.  Wireless charging is only on the 1 II.


  • Taking the Tiger out for a spin.  (Tom's Hardware)

    A look at Tiger Lake on an Intel reference laptop, with some benchmarks run under Intel's watchful eye, so take it with a grain of salt.  Single-threaded performance - on Geekbench - appears excellent, clearly faster than current Intel laptops and beating a Ryzen 4800U by 40%.  That's a lot, but it is just one benchmark.

    And on the other hand, video encoding with Handbrake ran twice as fast on the 4800U.

    The Intel chip is running at 28W, but for single-threaded tests that is only likely to bump the clock speed up by 2% or so, not a significant factor.

    Intel's Xe graphics more-or-less catch up with AMD too.  Both systems tested used LPDDR4X-4266 RAM, and while AMD is still faster for gaming by 5-20% at 15W, it no longer squishes Intel like a bug.  When the Intel chip is freed up with a 28W TDP it can outpace AMD's 15W part, but then AMD has a 35W part, so you can play that game forever.

    Looking forward to seeing whether that single-threaded performance holds up across a broad range of benchmarks, and to seeing what AMD delivers with Zen 3.

    Update: AnandTech have the same Intel reference unit and confirm the great single-threaded performance across a wider range of benchmarks.  They ran the SPEC 2006 and 2017 suites and posted individual as well as composite scores, so there's a lot more than one Geekbench score to chew on here.

    Short summary: If you run Dwarf Fortress, Intel's 11th gen chips are 50% faster than AMD.  If you run Blender, AMD is well over twice as fast as Intel.  And if you run Civilization 6 on integrated graphics, you're a masochist.


  • An LL(1) expression parser in exactly 100 lines of Python.  (GitHub)

    Thanks, I'll take it.

    The only imports are enum and re - the Python regular expression library - and it only uses re to check if a string of characters is numeric, which you can do with the isdigit() method.  So it should be nearly as simple rewritten in Basic.
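    To show how little the regular expression is needed, here is a minimal recursive-descent LL(1) parser and evaluator for + - * / and parentheses that tokenizes with str.isdigit() instead of re. This isn't the linked project's code; the grammar, Kind, tokenize and Parser are my own illustrative assumptions.

```python
from enum import Enum, auto


class Kind(Enum):
    NUM = auto()
    OP = auto()
    END = auto()


def tokenize(text):
    """Split text into (Kind, value) tokens using str.isdigit() instead of re."""
    # Note: isdigit() also accepts some non-ASCII digits such as '²';
    # a stricter version would test `ch in "0123456789"`.
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append((Kind.NUM, int(text[start:i])))
        elif ch in "+-*/()":
            tokens.append((Kind.OP, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    tokens.append((Kind.END, None))
    return tokens


class Parser:
    # One token of lookahead is enough for this grammar (LL(1)):
    #   expr   -> term (('+' | '-') term)*
    #   term   -> factor (('*' | '/') factor)*
    #   factor -> NUM | '(' expr ')' | '-' factor
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, value):
        kind, val = self.advance()
        if kind is not Kind.OP or val != value:
            raise SyntaxError(f"expected {value!r}, got {val!r}")

    def parse(self):
        value = self.expr()
        if self.peek()[0] is not Kind.END:
            raise SyntaxError("trailing input")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ((Kind.OP, "+"), (Kind.OP, "-")):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ((Kind.OP, "*"), (Kind.OP, "/")):
            op = self.advance()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer (floor) division
        return value

    def factor(self):
        kind, val = self.advance()
        if kind is Kind.NUM:
            return val
        if (kind, val) == (Kind.OP, "("):
            value = self.expr()
            self.expect(")")
            return value
        if (kind, val) == (Kind.OP, "-"):
            return -self.factor()
        raise SyntaxError(f"unexpected token {val!r}")
```

    Each grammar rule is one method, which is why this style translates so directly into Basic subroutines.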


  • That nibble-mode trick I used for the Dream means I can reasonably offer an upgraded version of the Imagine in Imagine-Emu.

    The Imagine 1200 was launched in 1987.  It offered a faster CPU and DSP - 6MHz vs. 3MHz - with 256k system RAM and 256k video RAM, using 100ns nibble-mode chips to deliver 12MB/sec of bandwidth on each bus.  The system also replaced the earlier 500k and 1M double-density floppy drives with a new extended density (ED) drive with a capacity of 4M.

    Which means that all the tricks the original model could do by stealing cycles on the system bus, this model can do just in VRAM.  And then do more tricks by stealing cycles from the system bus again.
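    A rough check of the per-bus figure, assuming the 1200's memory clock is the 3MHz that the Toshiba databook allows for 100ns parts in nibble mode (from the Dream design notes), compared with the original Imagine's 3MHz page-mode bus reading two bytes per cycle:

```python
def bus_rate(memory_clock_hz, transfers_per_cycle):
    """Transfers per second on one bus."""
    return memory_clock_hz * transfers_per_cycle


# Original Imagine (1000): 3MHz memory clock, page mode, two bytes per cycle.
imagine_1000 = bus_rate(3_000_000, 2)
# Imagine 1200: assumed 3MHz memory clock with 100ns parts, nibble mode, four bytes per cycle.
imagine_1200 = bus_rate(3_000_000, 4)

print(imagine_1000, imagine_1200, imagine_1200 // imagine_1000)   # 6000000 12000000 2
```

    Double the bandwidth on each bus is what lets VRAM alone cover the cycles the original had to steal from the system bus.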

    This version will have 256 bytes of cache on the CPU and DSP, which will speed up the cycle-accurate emulation mode but slow down the free-running mode.


  • Whatever happened to the Z800?  It was announced in 1980 but never appeared.  Turns out it did eventually show up, much delayed, renamed, and converted to CMOS, as the Z280.

    And a strange little beast it was, too.  Instructions could still only directly address 64k of RAM at a time, but it had a complete paged memory management unit capable of mapping 16MB of RAM, a supervisor mode, and a 256-byte instruction cache.  It even supported multi-processor configurations, as if someone really, really wanted to build a Z80-based Unix system.



    The Z800 / Z280 was a commercial failure, as was the Z380, a 32-bit version of the Z80 with eight banks of registers.  The Z180, though, based on the Hitachi 64180, is still being made today, as is the eZ80, which for under $10 delivers the performance of a 150MHz Z80.  Meaning that by today's standards it's dead slow.


Disclaimer: In the future, everything will be dead slow by today's standards for fifteen minutes.

Posted by: Pixy Misa at 12:37 AM | Comments (2) | Add Comment | Trackbacks (Suck)
Post contains 795 words, total size 6 kb.

Wednesday, September 16

Geek

Daily News Stuff 16 September 2020

640k Edition

Tech News

  • The numbers are in, and the RTX 3080 is a solid 50% to 60% faster than the RTX 2080.  (Tom's Hardware)

    That means it easily beats the 2080 Ti as well; right now it's the fastest video card there is.

    The 3080 has nearly three times as many CUDA cores as the 2080, and similar clocks, but isn't remotely close to three times the performance.  That's because half the cores in this architecture are the same flexible FP/INT cores as before, while half are simpler FP-only cores.  A 32-bit integer multiplier is actually about twice the size of a 32-bit floating point multiplier, so it makes sense to save space on a chip this big.

    So if the code for a given game uses lots of integer operations, it won't scale nearly as well on this hardware as the raw floating-point numbers would suggest.  But if Nvidia had made all the cores FP/INT, the chip would have been too large to manufacture on Samsung's 8nm node.  Something had to give.

    And there's still the 3090 to come.


  • Apple has announced new iPads.  (AnandTech)

    The iPad Air comes with the new A14 chip, which is the first volume part I know of to come out of TSMC's 5nm process.  The A14 is...  Well, it's slightly faster than the A13.

    The 64GB iPad Air costs A$899, which is exactly as much as my 64GB Retina iPad from 2013.  It does have slightly more pixels, but still no microSD slot.


  • Pure Storage has acquired Portworx in a deal worth $370 million.  (ZDNet)

    I have heard of at least one of those companies.


  • China has immense capacity and expertise for assembling complex equipment.  But without access to technology from the West, it is stuck in 2007.  (New York Times)

    I include Japan, South Korea, and of course Taiwan as part of the West.

    Basically, Huawei is fucked, and the CCP is resolutely determined to make sure it remains fucked.  (Free Beacon)

    Shame.  They made nice tablets.


  • I ran the numbers to work out what sort of hardware the Dream - the 12-bit model in our lineup - would have had, given just two parameters: First, it launched around 1985, and second, it had a 640x360 display.

    The Imagine is a home computer powerful enough to run business apps; the Dream I'm designing as a business computer flexible enough to run decent games.

    So, first, what's the pixel clock for 640x360 @ 60Hz?  I worked out for the Imagine that at 50Hz that resolution needs around 16MHz - and that my existing HSYNC rate of 18.75kHz was within spec for a 720x350 monochrome monitor.  So for 60Hz we just add 20% to both numbers, and we get a 19.2MHz pixel clock and a 22.5kHz HSYNC.

    If we divide the pixel clock by 4 as the base system clock, we get 4.8MHz, and divide that by the 22.5kHz HSYNC rate and we get 213.3 cycles per line.  Round that down to a nice even 212, multiply back up again, and...  We have a 4.77MHz system.  Huh.  This was meant to be.
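    The clock arithmetic above, as a few lines of Python.  It assumes the line count per frame stays the same at 60Hz, so both the pixel clock and HSYNC scale by 60/50 = 1.2.

```python
# Imagine figures at 50Hz, from the earlier design work.
PIXEL_CLOCK_50HZ = 16_000_000   # Hz, "around 16MHz"
HSYNC_50HZ = 18_750             # Hz

# Same lines per frame at 60Hz, so both rates scale by 1.2.
pixel_clock = PIXEL_CLOCK_50HZ * 60 // 50      # 19,200,000 Hz
hsync = HSYNC_50HZ * 60 // 50                  # 22,500 Hz

base_clock = pixel_clock // 4                  # 4,800,000 Hz
cycles_per_line = base_clock / hsync           # 213.33...

# Round down to an even cycle count, then multiply back up.
even_cycles = int(cycles_per_line) // 2 * 2    # 212
system_clock = even_cycles * hsync             # 4,770,000 Hz

print(f"{cycles_per_line:.1f} cycles/line -> {even_cycles} -> {system_clock / 1e6:.2f}MHz")
# 213.3 cycles/line -> 212 -> 4.77MHz
```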

    Now, how do we get the data for a 640x360 display in (say) 64 colours, using commodity 1985 DRAM and a 12-bit bus?  On the Imagine I originally wanted a 5MHz memory clock, looked up the databooks, and realised that wasn't feasible in 1983.  Instead I set the clock to 3MHz but used page mode to read two bytes per cycle.

    On the Dream I'm going to use a different readily-available trick from the early 80s, nibble mode, where a common 256k x1 DRAM chip could stream out four successive bits in a row at much faster rates than regular random access.  Looking through Toshiba's 1984 memory databook, I could hit a 2MHz bus clock with nibble mode on using 150ns RAM, 2.5MHz with 120ns, and 3MHz with 100ns.

    Conveniently, 120ns RAM, not too exotic, lets me pin the memory clock at half the CPU clock: 2.385MHz, just under the 2.5MHz that 120ns parts allow in nibble mode.

    So the video controller has 106 memory cycles per scan line (half the 212 we calculated earlier), each delivering four (12 bit) bytes using nibble mode.  Assuming 80 cycles are in the visible area (it was about 75% of the scan line on a typical monitor, so that's close enough) we need 8 pixels per cycle to get a 640 pixel line, and that gives us 6 bits per pixel for 64 colours.

    Which is not a cosmic coincidence; that's what was supposed to pop out at the end from the numbers I fed in at the start.  It just means I did the maths right.
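    The same chain of numbers in Python, as a sanity check.  The 80 visible memory cycles per line is the assumption stated above; everything else follows from it.

```python
SYSTEM_CLOCK = 4_770_000          # Hz: 212 cycles x 22.5kHz
MEMORY_CLOCK = SYSTEM_CLOCK // 2  # 2,385,000 Hz
NIBBLE_LIMIT_120NS = 2_500_000    # Hz, nibble-mode figure for 120ns parts

memory_cycles_per_line = 212 // 2                             # 106
visible_cycles = 80                                           # assumed visible area
visible_fraction = visible_cycles / memory_cycles_per_line    # about 75%

bytes_per_cycle = 4               # nibble mode: four reads per access
bits_per_byte = 12                # the Dream's 12-bit byte
pixels_per_cycle = 640 // visible_cycles                      # 8
bits_per_pixel = bytes_per_cycle * bits_per_byte // pixels_per_cycle   # 48 // 8 = 6

assert MEMORY_CLOCK <= NIBBLE_LIMIT_120NS
print(memory_cycles_per_line, f"{visible_fraction:.1%}",
      pixels_per_cycle, bits_per_pixel, 2 ** bits_per_pixel)
# 106 75.5% 8 6 64
```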

    The only problem is that it still can't do 80 column text mode.  80 columns of text in graphics mode, sure, no problem at all.  80 column text mode, no.  We'd need 80 random accesses, because the character data won't be sequential, and we only have 80 memory cycles per line, no cycles free to read the text map.

    For that we'd need a separate text RAM and....  Well, I could just shove a separate text RAM into this one.  I ruled it out for the Imagine to keep it cheap and simple for the home market, but this is explicitly a more business-oriented machine.

    The Dream won't have the dual-bus architecture of the Imagine: The video chip can't directly access system RAM, and the CPU can't directly access video RAM.  But that leaves me lots of imaginary pins free on the chips to do other stuff, such as having 64k of text RAM in addition to the 256k of main video RAM.

    Do the numbers work out?

    - 12 pins for the CPU interface
    - 12 + 9 pins for main memory
    - 12 + 8 pins for text memory
    - 12 pins for pixel output

    Total 65, plus a minimum of a dozen more for control signals.  With an 84-pin PLCC we can just about do it.  Okay.  That's what it is.  Three 4464 chips in page mode for the text or tile data, twelve 41256 chips in nibble mode for the character cells or bitmap data.
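    The pin budget and memory widths tallied in Python.  Power and ground pins aren't itemised in the list, so if they aren't part of the control-signal dozen they would have to come out of the remaining margin.

```python
# Signal pins on the Dream's video chip, per the list above.
signals = {
    "CPU interface": 12,
    "main memory data": 12,
    "main memory address": 9,    # 256k deep = 18 address bits, multiplexed row/column
    "text memory data": 12,
    "text memory address": 8,    # 64k deep = 16 address bits, multiplexed
    "pixel output": 12,
}
CONTROL_MIN = 12   # "a minimum of a dozen more" for control signals
PLCC_PINS = 84

signal_pins = sum(signals.values())                 # 65
margin = PLCC_PINS - signal_pins - CONTROL_MIN      # 7

# Memory arrays: a 41256 is 256k x 1, a 4464 is 64k x 4.
main_width = 12 * 1      # twelve 41256s -> 256k x 12 bits
text_width = 3 * 4       # three 4464s   -> 64k x 12 bits

print(signal_pins, margin, main_width, text_width)   # 65 7 12 12
```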

    Hmm.  Does the Dream always run in character mode?  Maybe it does.  Maybe it has no pure bitmap mode, just 4096 user-defined characters.  I was thinking of giving this thing hardware windowing, like the Intel 82786, but the hell with that.  It's going to be a 12-bit Microbee Gamma.

    I wonder if there's a YouTube video of that one?  Those things were rarer than hen's teeth, and I only ever got to touch one for a few minutes at a computer show.



    Looks like someone got to hang onto one for slightly longer than that.  It had a 720x350 display - MDA / Hercules resolution, but in 4096 colours.  So I'm pretty much on target.

    Unlike the Amiga - and very much like the TRS-80 Model 16 - the Microbee Gamma could run Unix.  It had an 8MHz 68000 and two 4MHz Z80s, one to handle display tasks, and the other to handle I/O.  The separate I/O processor allowed the 68000 to properly handle page faults, which would otherwise require a 68010 or later chip.


  • I can probably figure out a way to line up the character cells so that you can draw to them as if they were a bitmap.  I'm not a total sadist.  At least not when I'm the one who's going to be writing the graphics library for this beastie.

    To unpack a bit: The Intel 82786 (IEEE.org) let you define areas of the screen to be drawn from different areas of RAM - hardware windows - though you could only have a limited number of those because there were only so many registers on the chip.

    With the Dream's bus design, hardware windows in 640x360 64-colour mode would have to align to 8-pixel boundaries, because we can only switch hardware windows on a new bus cycle, and we read 8 pixels per bus cycle.  And we read 8 pixels per bus cycle because it's the only way we can make it fast enough.

    Now it just so happens that our character cells are also 8 pixels wide.  If graphics are drawn into character cells, we can do proper bitmapped hardware windows in text mode.  Which means we can move and scroll windows 32 times as fast as moving the actual pixels - two bytes instead of 64 bytes for an 8x16 64-colour character cell.

    And this is precisely what the Microbee Gamma did, and it worked really well.  The original Amiga didn't live-drag window contents when you moved a window, just the outline.  The Gamma smoothly dragged the window contents, even if they were actively updating at the time.  For 1986 - I think I saw it in '86 - that was really neat.

    One minor side-effect of that was that the Gamma had wide window borders - you can see that in the video above.  Yes, that was partly because it was a mid-80s system with a relatively low-resolution monitor, but also because the window borders had to be whole characters.
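    The 32x figure for moving windows through the character map can be checked in a few lines.  The 40x10-cell window in the example is an arbitrary illustration.

```python
CELL_W, CELL_H = 8, 16
BITS_PER_PIXEL = 6           # 64 colours
BITS_PER_BYTE = 12
MAP_ENTRY_BYTES = 2          # character number + attribute

cell_bytes = CELL_W * CELL_H * BITS_PER_PIXEL // BITS_PER_BYTE   # 64
speedup = cell_bytes // MAP_ENTRY_BYTES                          # 32


def move_cost(cols, rows):
    """Bytes copied to move a cols x rows cell window: (pixel copy, map copy)."""
    cells = cols * rows
    return cells * cell_bytes, cells * MAP_ENTRY_BYTES


print(cell_bytes, speedup, move_cost(40, 10))   # 64 32 (25600, 800)
```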

    Anyway, this is the Dream.  The Imagine has a squintillion different graphics modes - different bit depths, pixel packings, switchable palettes, switchable resolutions, text mode, graphics text mode, fill mode, HAM mode, RLL-compressed RGBA overlays with selectable alpha channel arithmetic...

    The Dream does none of that.   Text mode is graphics mode and graphics mode is text mode.

    Update: Yes, we can draw into the character map as if it were a 1024x512 64-colour bitmap, albeit only at the base 2.385 MHz memory clock, not using nibble mode.  We can copy rectangles in nibble mode, though.  Or you can view it as 4096 programmable 8x16 characters with 64 colours.
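    One way a pixel coordinate could map into character data so that the 4096 characters behave as a 1024x512 bitmap: 128 x 32 cells, characters stored consecutively at 64 bytes each, two 6-bit pixels per 12-bit byte with the left pixel in the high bits.  That layout, and the names pixel_address and set_pixel, are my assumptions for illustration, not a settled design.  Conveniently, 4096 x 64 comes to exactly the 256k of main video RAM.

```python
CELL_W, CELL_H = 8, 16
COLS = 1024 // CELL_W        # 128 character cells across
ROWS = 512 // CELL_H         # 32 cells down
BYTES_PER_CHAR = 64          # 8x16 pixels * 6 bits / 12-bit bytes
PIXELS_PER_BYTE = 2          # two 6-bit pixels per 12-bit byte


def pixel_address(x, y):
    """Return (byte address, bit shift) of pixel (x, y) in character RAM."""
    if not (0 <= x < COLS * CELL_W and 0 <= y < ROWS * CELL_H):
        raise ValueError("pixel outside the 1024x512 plane")
    char = (y // CELL_H) * COLS + (x // CELL_W)        # which of the 4096 characters
    row = y % CELL_H                                    # scan line within the character
    byte_in_row = (x % CELL_W) // PIXELS_PER_BYTE      # 4 bytes per 8-pixel row
    address = char * BYTES_PER_CHAR + row * (CELL_W // PIXELS_PER_BYTE) + byte_in_row
    shift = 6 if x % 2 == 0 else 0                      # left pixel in the high 6 bits
    return address, shift


def set_pixel(ram, x, y, colour):
    """Write a 6-bit colour into a list of 12-bit words (read-modify-write)."""
    address, shift = pixel_address(x, y)
    mask = 0x3F << shift
    ram[address] = (ram[address] & ~mask & 0xFFF) | ((colour & 0x3F) << shift)
```

    Each pixel write is a random read-modify-write, which is why drawing runs at the base memory clock rather than in nibble mode.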

    There will also be sprites of some sort.  Let's say 16 of them, 16 pixels wide, eight colours, switchable to eight pixels wide and 64 colours.  That fits nicely into our bus cycle and our horizontal blanking period. 

    Update 2: Or I could go with the previous plan and have a separate sprite chip with its own 64k of RAM.  That works nicely, except that now the system has 704k of RAM.  Oh well, can't have everything.  It would help the Dream compete with the Imagine's clever hardware-compressed overlay system, which allows arbitrary numbers of sprites.

    Update 3: No, I have it!  The text RAM is the sprite RAM.  So you can have this neat accelerated super text/tile mode with 16 sprites, or a regular bitmap and 512 sprites.  And now we have 640k again.  This makes a lot of sense.  Why would a business machine have an overpowered sprite processor?  Because the hardware designers snuck it in as an alternate mode for the 80-column text system.  Just don't ask me why it has an overpowered audio processor.

    There's a trick I might steal from the SNES as well - the SNES was an amazing assemblage of tricks.  It could hardware scroll individual segments of the screen.  In text mode the Dream will use two bytes per character - one to select one of 4096 characters, the other to select from 64 foreground and background colours.

    In tile mode - pseudo-bitmap mode - we don't need to select those colours because the tiles themselves are in 64 colours.  So we can steal 7 bits to allow us to rotate the contents of the cell horizontally and vertically.  And still have 5 bits left to do other stupid stuff with.

    This won't scroll a whole area, but if you want to animate a tiled background, you can do it pixel-smooth just by updating the text map.  With one tweak to the video hardware, to pre-fetch a zeroth byte on each line, we can smooth-scroll an entire screen containing multiple independent smooth-scrolling windows just by updating the text map.
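    A sketch of one possible layout for the tile-mode attribute byte: 3 bits of horizontal rotation (0-7 for an 8-pixel cell), 4 bits of vertical rotation (0-15 for 16 lines), and the 5 spare bits on top.  The bit positions, the rotation direction, and the function names are assumptions for illustration.

```python
H_BITS, V_BITS = 3, 4     # 8-pixel-wide cells need 3 bits, 16-line cells need 4


def pack_attr(h_rot, v_rot, spare=0):
    """Pack rotations into a 12-bit tile-mode attribute byte (hypothetical layout)."""
    if not (0 <= h_rot < 8 and 0 <= v_rot < 16 and 0 <= spare < 32):
        raise ValueError("field out of range")
    return (spare << (H_BITS + V_BITS)) | (v_rot << H_BITS) | h_rot


def unpack_attr(attr):
    """Return (h_rot, v_rot, spare) from a 12-bit attribute byte."""
    return attr & 0b111, (attr >> H_BITS) & 0b1111, (attr >> (H_BITS + V_BITS)) & 0b11111


def render_cell(tile, attr):
    """Rows of a 16x8 tile as displayed: fetch starts v_rot lines and h_rot pixels in, wrapping."""
    h_rot, v_rot, _ = unpack_attr(attr)
    rows = tile[v_rot:] + tile[:v_rot]                    # vertical rotation
    return [row[h_rot:] + row[:h_rot] for row in rows]    # horizontal rotation
```

    With a repeating tile pattern, bumping the horizontal rotation of every cell by one each frame scrolls the whole background a pixel at a time without touching the tile data itself.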


  • Oh, and the other thing.  The Dream will have 256k main memory, 256k graphics memory, 64k text memory, and 64k for the sound chip.  So it runs at 4.77MHz and has 640k of RAM.


  • What if someone wants to run Imagine-Emu on a Raspberry Pi?  Does Nim even work on the Raspberry Pi?  What's that, Lassie?  It not only works, it's available as a standard package?

    Well, okay then.  Transpiling via C has its benefits.


Disclaimer: I still prefer Crystal though.

Posted by: Pixy Misa at 10:58 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 2057 words, total size 14 kb.

Geek

Daily News Stuff 15 September 2020

Gone Rogue Edition

Tech News



Disclaimer: And stay out.

Posted by: Pixy Misa at 03:41 AM | Comments (6) | Add Comment | Trackbacks (Suck)
Post contains 309 words, total size 4 kb.

Tuesday, September 15

Geek

Daily News Stuff 14 September 2020

Oh Yes Edition

Tech News

  • How to write a Basic compiler in Python, Part 3: Code generation.

    I knew I'd blogged about something I could use.  Now, this does compile to C, but it compiles to really dumb C, one trivial statement at a time: s=s+c; b=b+1; and so on.  Basically it's used as a portable assembler.  That's fine.

    And it's self-contained, not using any lexer or parser libraries, so it's a good starting point for something that will eventually be translated into itself.


  • Writing A Compiler In Go might also be a useful source.

    It takes a similar approach - no tools or libraries used - to produce a complete compiler for a simple programming language.  Of course I'm no Go programmer, but I can read Go code.  Mostly.  Had to for work.  Don't ask.


  • Nvidia is buying Arm for $40 billion.  (AnandTech)

    This has made a lot of people very angry and been widely regarded as a bad move.


  • Microsoft announced a major win today.  (Thurrott.com)

    Not buying TikTok is probably the company's smartest move since they didn't buy Yahoo.


  • I always thought that the claims coming from Nikola seemed a bit overblown.  (WCCFTech)

    Of course that also seemed true of Tesla and SpaceX, and yet they delivered the goods.


  • Your government at work.




  • Adjusted the hardware design of the Imagine just a little.  Basically, the idea is that the Imagine's CPU is a microcontroller with multiple register banks for fast interrupt servicing - like the Z80 and 8051 - and the DSP is a variant of that, with eight banks of the user registers but only one bank of system registers.

    The DSP also has a nominal 256 bytes of on-chip mask-programmed ROM containing a set of wavetable synthesis algorithms.  These changes achieve two things: They make it a genuine wavetable synthesis chip, albeit one developers can tinker with; and they drastically reduce activity on the system bus.  The initial version of the design would have used around 70% of the bus for 5 stereo voices; this version is a little under 10%.


  • I might steal a trick from the HP 150 and have text mode on the Imagine double the pixel clock.  It has the bandwidth to do this; text mode basically wastes one byte every time it reads the character data, so it makes no difference if it reads and uses two bytes.  That would allow it to output readable 80-column text on colour screens (960x270 with the new subpixels), and beautiful 80-column text on monochrome screens (now 1280x360).  Need memory for those fonts though.


  • Working on an emulator generator.  Given the processor definitions, it spits out Nim code for the CPU-specific emulator class, an assembler, a disassembler, and a simple machine-language monitor like the old MS-DOS DEBUG.

    The next step would be to have this spit out a code generator back-end for the compiler as well.  That is probably possible.  Certainly possible for the 10, 11, and 12 bit models, which all ended up with the same underlying Super 6809 design just with progressively more and larger registers.
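    A toy version of the table-driven idea, in Python rather than Nim and building functions rather than emitting source: one processor definition drives both an assembler and a disassembler, so the two can't drift apart.  SPEC and its opcodes are made up for illustration.

```python
# Hypothetical processor definition: mnemonic -> (opcode, operand count).
SPEC = {
    "NOP": (0x000, 0),
    "LD":  (0x010, 1),
    "ADD": (0x020, 1),
    "BRA": (0x030, 1),
}


def make_tools(spec):
    """Build an assembler and a disassembler from the same table."""
    by_opcode = {op: (name, n) for name, (op, n) in spec.items()}

    def assemble(line):
        name, *args = line.replace(",", " ").split()
        opcode, count = spec[name.upper()]
        if len(args) != count:
            raise ValueError(f"{name} takes {count} operand(s)")
        return [opcode] + [int(a.lstrip("$"), 16) for a in args]

    def disassemble(words):
        out, i = [], 0
        while i < len(words):
            name, count = by_opcode[words[i]]
            operands = words[i + 1:i + 1 + count]
            out.append(" ".join([name] + [f"${w:03X}" for w in operands]))
            i += 1 + count
        return out

    return assemble, disassemble
```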


Disclaimer: For small values of "work".

Posted by: Pixy Misa at 12:35 AM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 519 words, total size 4 kb.

Monday, September 14

Geek

Daily News Stuff 13 September 2020

Pieberry Edition

Tech News

  • Raspberry Pi overkill for your needs?  The Iconikal SBC is $7.99 on Amazon.  (Tom's Hardware)

    That includes - or included; it's currently sold out for some reason - a Rockchip RK3328 with four 1.5GHz A53 cores, 1GB RAM, a free 16GB microSD that you probably shouldn't trust, a power supply, and a 16x2 character LCD screen.


  • The next SpaceX Starship prototype will have the final nosecone and control surfaces and attempt a 60,000-foot flight.  (TechCrunch)

    Per explodia ad astra.


  • Trying out ASRock's DeskMini X300 with Ryzen 4000G.  (WCCFTech)

    The first time I saw a picture of this thing I thought to myself, what a crappy, cheap case.  Turns out it was more of a crappy, cheap photograph; it's brushed aluminium that they somehow made look like poorly-molded plastic.

    It's a little larger than a NUC - it's a standard mini-STX form factor measuring 6" x 6" x 3" where the typical NUC is 4" x 4" x 2" - but on the other hand it can fit APUs up to 65W, two M.2 NVMe drives, and two 2.5" SATA drives.

    As expected, with a Ryzen 4000 APU it goes vroom.  Not recommended for overclocking though.


  • Windows 10 is getting a new Start Menu.  (Bleeping Computer)

    I only ever see the default start menu for as long as it takes me to install Start10, so I really couldn't comment.  It will suck though, that much is certain.  Couldn't comment apart from that it is certain to suck.


  • I should make the MUL, DIV, and MAC instructions on the Imagine two bytes, shouldn't I?  It won't affect the performance at all, except if you're multiplying by a 10-bit immediate value, and it frees up two whole pages of opcodes.


Retrocomputing Video of the Day



This looks a lot like the HP 200 Model 16, a tiny 68000-based workstation.  It's not, though; it's the HP 150, a tiny 8088-based touchscreen PC.

He's going to do a follow-up video on the disk drive unit, which is the same one as the Model 16.  I'll be interested to see that, since I've heard but can't confirm that these drives ran at 600 RPM, twice as fast as any normal 3.5" drive.

He notes an interesting point on the text mode used on this and other old HP devices.  The 150 has an 80x27 text display made up of 9x14 pixel characters.  But it then says fuck it, I'll do what I want and shifts individual pixels by a half pixel width or widens them by one third as needed to make individual characters more legible.

That's some trick.  Makes me want to reproduce it in my emulator, though you'd need a 4320-pixel-wide display to do that exactly.
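The 4320-pixel figure follows from the 80 columns of 9-pixel cells: half-pixel shifts need every pixel split in two, one-third widening needs it split in three, and doing both exactly needs the least common multiple.

```python
from math import gcd

COLUMNS = 80
CELL_WIDTH = 9                              # 9x14 character cells
native_width = COLUMNS * CELL_WIDTH         # 720 pixels

# A half-pixel shift needs each pixel split in 2; a one-third widening needs 3.
subdivisions = 2 * 3 // gcd(2, 3)           # least common multiple = 6
exact_width = native_width * subdivisions   # 4320

print(native_width, subdivisions, exact_width)   # 720 6 4320
```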


Disclaimer: It's basically a bunch of dinosaurs in an industrial blender.

Posted by: Pixy Misa at 12:23 AM | Comments (5) | Add Comment | Trackbacks (Suck)
Post contains 473 words, total size 4 kb.

Sunday, September 13

Geek

Imaginary Code

Here's the full programming model of the Imagine CPU and DSP.  I'll write a proper manual as I go; this is mostly to give an idea of what the processor will look like, and as a document for myself to make sure I haven't either (a) missed anything critical or worse (b) run out of opcodes.

Let's see if I can get a bunch of preformatted text in place without the Minx editor reducing it to mush...  Yes, on my third try.

Couple of things that fell out of this proper run-through of the full programming model:
  • We now have eight stack pointers!  The bits were there to be used, so why the hell not.  All four index registers, the two reserved stack pointers, the loop counter, and even the program counter can be used as stack pointers.

  • Why the hell would you use the program counter as a stack pointer?  Well, for pushing data it would be very weird.  But for popping data, you can read up to ten registers at once from immediate data:

    POP P, WXYZLT will read the four main index registers and two alternate index registers with a single two-byte instruction; normally that would take six separate instructions.  Great if you're setting up for some complex graphics algorithm.

    And POP P, WXYZSPLURT will set up all the index registers and stack pointers, and branch to a new place in the code.  If you don't trash the stack you might be able to turn it into a subroutine call.

    PUSH P on the other hand will push the specified registers into the current program, in reverse order, and then execute them.  I can't imagine why you would want to do that, but you can.

  • Similarly, I borrowed LEA - Load Effective Address - from the 6809.  But you can also LEAP - that is, LEA to P, the program counter.  Since LEA supports base+offset addressing and indirect addressing simultaneously, that provides us with both fixed and relative jump tables, without needing an instruction specific for that.

    In fact, you can even LEAP into an interpolated jump table.  There's nothing stopping you.

  • The general word-size bit fell by the wayside.  There just aren't enough opcodes on a 10-bit design.  Instead I reserved a code page (32 opcodes) for 20-bit instructions, which gives us 15 available bits for that mode.

    I haven't really started on the details of 20-bit mode.  Having 32 times the opcode space is liberating to the point of paralysis.  Maybe the early Imagine models didn't implement 20-bit mode.

  • One group of registers spells SPLURT.  Another spells out QOFI.  The remainder are ABCD and WXYZ which don't really spell out anything.

  • The .B and .W markers are almost always optional; the assembler can distinguish the mode either from the registers used or from the size of the immediate data.  For example, BRA $F0 means BRA.B and BRA $000F0 means BRA.W.  Of course the two instructions do essentially the same thing.

  • I've done a draft of the Dream programming model as well; it's very similar albeit with another eight registers and with some rough corners smoothed off.  The Imagine needs some specialised opcodes for handling registers outside the two main sets (ABCD and WXYZ).  The Dream has enough bits in its register selection to handle that in an orderly manner.

    This is a mixed blessing.  Want to add accumulator A to the flags register F?  The Imagine very sensibly has no opcode for such a dumb instruction, but the Dream is happy to oblige.
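A sketch of how the assembler might pick .B or .W from an immediate value when no suffix is given, assuming a 10-bit byte and a 20-bit word, and assuming that the number of hex digits written counts (so $000F0 asks for a word even though the value would fit in a byte).  The rule and the name operand_size are hypothetical.

```python
BYTE_BITS, WORD_BITS = 10, 20


def operand_size(literal, suffix=None):
    """Pick 'B' or 'W' for a hex literal such as '$F0' (hypothetical assembler rule)."""
    if suffix is not None:                 # an explicit .B / .W always wins
        return suffix.upper()
    digits = literal.lstrip("$")
    value = int(digits, 16)
    if value >= 1 << WORD_BITS:
        raise ValueError(f"{literal} does not fit in a 20-bit word")
    # Leading zeros count: five digits asks for a word even if the value is small.
    if len(digits) > 3 or value >= 1 << BYTE_BITS:
        return "W"
    return "B"
```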


Update 2020-09-14
  • In the indexing postbyte:
    • A value of 7 now codes for no base, allowing for indirect mode
    • In the offset, 12 through 15 code for WH through ZH for LEAF mode.
    • The loop counter L is no longer available as a base, only as an offset.
    • Timer / alternate index register T is not available at all.

  • The A100 and A101 segmented microprocessors now have two register banks, including all segment base and size registers.
  • The A102 and A103 (non-segmented) microcontrollers now have four register banks, but of course no segment registers.
  • The A108 DSP and A109 ASP now have eight banks of general-purpose registers - accumulators ABCD and WXYZ - but only one bank of the remaining registers.  (Also no segment registers.)


Posted by: Pixy Misa at 04:49 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 1593 words, total size 11 kb.

Geek

Imaginary Sounds

Previously I worked out a potential DSP loop for wave table synthesis for the Imagine.  It looked something like this:

Instruction     Cycles Comment
BANK 1          1      'Use embedded memory bank 1
LD W, R0        1      'Load the base register for the voice
LD X, R2        1      'Load the current offset
ADD X, R4       1      'Add the step
CMP XH, R7      1      'Compare the high byte with the sample size
IFGE 1          1      'Next instruction only executes if the offset is past the end of the sample
CLR X           1      'Set the offset back to the start
ST X, R2        1      'And save it
ADD W, XH       1      'Add the high byte of the current offset to the address
LD B, (W)       2      'Load the sample into the low byte of AB
MAC AL, B, R8   1+N    'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, R9   1+N    'And the same for the right

That was before working out the full instruction encodings.  Earlier I added in a switchable bank of 10 memory-mapped registers to reduce the number of memory accesses required, but that is between tricky and impossible to map into the opcode space - and there's actually a better way to do it, that we can steal from the Z80.

When working through the CPU programming model I also added in a LEAF instruction - Load Effective Address, Fractional - explicitly for table interpolation.  With this new model our algorithm turns out somewhat different:

Instruction     Cycles  Comment
GRP 1           1       'Switch to register group 1
LEAF X, X+Y     2       'Calculate an effective address in interpolated mode
CMP X, Z        1       'Check if we've reached the end of the sample
IFGE 1          1       'Next instruction only executes if the offset is past the end of the sample
  LD X, W       1       'Reset to the first byte of the sample
LD B, (X)       2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

Oops, that's wrong.  Sorry about your burned-out speakers.  Let's try again.

Instruction     Cycles  Comment
BANK 1          1       'Switch to register group 1
ADD X, Y        1       'Increment the offset by the step size
AND X, Z        1       'Restrict the offset to the sample size
LD B, (W+XH)    2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

The trick we've stolen from the Z80 is just to have multiple sets of registers.  The Z80 had two register sets in 1976; we need five or six in 1983.  That doesn't seem too implausible.

The LEAF instruction replaces the complicated high/low byte address fiddling.  It only saves three cycles but it's a lot easier to read.  If you're reading through some Imagine assembler and you see LEAF, you immediately know it's doing some kind of table interpolation.  (And if you see LEAP, that's a jump table.  It's actually LEA P, but the assembler will accept LEAP.)

Update 2020-09-14

We no longer need to use the LEAF instruction explicitly; if you specify (W+XH) as the index mode in any instruction that can take an indexing postbyte, it calculates the address in interpolated mode.

This version of the code removes the CMP / IF / CLR logic and goes with the method used in the Ensoniq 5503 as found in the Apple IIgs.  Sample banks are a fixed power-of-two length - 64, 128, 256, 512, or 1024 bytes - so we can simply AND the offset with a bit mask to clip it to the appropriate range.
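A Python model of the register-bank loop above, mixing several voices into each output frame.  Assumptions: X holds a fixed-point offset with the sample index in the high 10-bit byte (XH) and the fraction in the low byte; the model reads the nearest-lower sample rather than modelling the interpolated addressing mode; Voice and mix are hypothetical names.

```python
from dataclasses import dataclass

FRAC_BITS = 10   # assumed: low 10-bit byte of X is the fraction, XH the sample index


@dataclass
class Voice:
    table: list          # W: sample bank, power-of-two length from 64 to 1024
    step: int            # Y: pitch increment, in 1/1024ths of a sample
    vol_l: int           # C: left volume
    vol_r: int           # D: right volume
    offset: int = 0      # X: current position

    def __post_init__(self):
        n = len(self.table)
        if n & (n - 1) or not 64 <= n <= 1024:
            raise ValueError("sample bank length must be a power of two, 64..1024")
        self.mask = (n << FRAC_BITS) - 1   # Z: AND mask that wraps the offset


def mix(voices, n_samples):
    """Return n_samples stereo frames, each the MAC sum over all voices."""
    frames = []
    for _ in range(n_samples):
        acc_l = acc_r = 0                                  # AL, AR
        for v in voices:
            v.offset = (v.offset + v.step) & v.mask        # ADD X, Y ; AND X, Z
            sample = v.table[v.offset >> FRAC_BITS]        # LD B, (W+XH), no interpolation
            acc_l += sample * v.vol_l                      # MAC AL, B, C
            acc_r += sample * v.vol_r                      # MAC AR, B, D
        frames.append((acc_l, acc_r))
    return frames
```

The AND replaces the compare-and-reset entirely: when the offset passes the end of the bank, the mask simply wraps it back to the start.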

One other thing: When I was first thinking of the custom chips for the Imagine I called this a fixed-function DSP, even though it was fully-programmable.  It just seemed like a good name.  Then I realised exactly why this version of the chip would be described as fixed-function.

It has a small - maybe 256 bytes - mask-programmed ROM containing a bunch of DSP algorithms, like the code above.  You can still write your own custom algorithms, but if you're running code out of the on-board ROM, and you have your settings loaded into the registers, we only need to access main memory three times per sample: Two reads to issue the subroutine call, and one when the subroutine loads the sample data.

That means that for a sample rate of 18.75kHz - our nominal HSYNC rate and a useful audio sample rate - we'd make 187,500 accesses to main memory per second, 6.25% of available cycles.

And now we have a wavetable chip that makes sense.  The only remaining question is, does it have six entire register banks, or do we go back to the 64-byte onboard RAM?

With the latter, the code could look like this (instructions in bold access main memory).

Instruction     Cycles   Comment
LD U, $00000    2        'Point the user stack into on-chip RAM
JSR $41E        2        'This is the nominal location in mask ROM
POP U, WXYZ     5        'Load all the sample bank settings
POP U, CD       3        'Load the volume settings
ADD X, Y        1        'Increment the offset by the step size
AND X, Z        1        'Restrict the offset to the sample size
ST X, (U-8)     3        'Write the offset back to RAM
LD B, (W+XH)    2        'Load the sample
MAC AL, B, C    1+N      'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N      'And the same for the right
RET             2        'Return and process the next request in the queue

That's a lot more cycles than before, but now we don't need to have half a dozen complete register banks, and if we want just one more voice, the ROM code spills neatly over to main memory without any changes, which is good because you can't change it.

On the downside, we're now at 23+2N cycles, which is right back where we started.  Having multiple register banks is magical for this application.
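
The 23+2N figure can be checked by tallying the cycle column of the listing above. A quick Python sketch; the listing leaves N as a parameter (the extra cycles each MAC takes), so the loop just tries a few values.

```python
# (instruction, fixed cycles, multiple of N) for each line of the listing above.
LISTING = [
    ("LD U, $00000", 2, 0),
    ("JSR $41E", 2, 0),
    ("POP U, WXYZ", 5, 0),
    ("POP U, CD", 3, 0),
    ("ADD X, Y", 1, 0),
    ("AND X, Z", 1, 0),
    ("ST X, (U-8)", 3, 0),
    ("LD B, (W+XH)", 2, 0),
    ("MAC AL, B, C", 1, 1),  # 1+N
    ("MAC AR, B, D", 1, 1),  # 1+N
    ("RET", 2, 0),
]


def voice_cycles(n: int) -> int:
    """Cycles to process one voice, where n is the extra cost of each MAC."""
    return sum(fixed + per_n * n for _, fixed, per_n in LISTING)


for n in range(4):
    print(f"N={n}: {voice_cycles(n)} cycles")  # 23, 25, 27, 29
```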


Update 2020-09-14 Afternoon

Executive decision: The A100 and A101 segmented processors had two full register banks (all of ABCD, WXYZ, PLUS, and QOFI, excluding only R and T, plus the associated base and size registers for each segment).

The A102 and A103 microcontrollers lacked segmentation but had four full register banks.  This allowed for three different interrupt priorities each with zero-cycle latency.

The A108 DSP and A109 ASP had eight user register banks (just ABCD and WXYZ being switchable).

The A100, A102, and A108 are more expensive parts with Harvard architectures - dual busses separating instructions from data.

The Imagine 1000 uses the A103 and A109.  The subsequent Imagine 1100 uses the A119, an updated A109 with four full register banks and eight additional user register banks, for both its CPU and DSP.
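
A rough Python model of why a full register bank per priority level gives zero-cycle interrupt entry: taking an interrupt just changes which bank is active, so no registers are pushed or popped. The mapping of bank 0 to normal code and banks 1 to 3 to the three interrupt priorities, along with the class and method names, is an assumption for illustration, loosely modelled on the A103's four full banks.

```python
from dataclasses import dataclass, field

REGS = ("A", "B", "C", "D", "W", "X", "Y", "Z")  # illustrative subset of registers


def _fresh_bank() -> dict:
    return dict.fromkeys(REGS, 0)


@dataclass
class BankedCPU:
    """Hypothetical A103-style core: bank 0 for normal code, banks 1-3 for interrupts."""
    banks: list = field(default_factory=lambda: [_fresh_bank() for _ in range(4)])
    active: int = 0                               # current priority == bank index
    previous: list = field(default_factory=list)  # bank numbers only, not registers

    @property
    def regs(self) -> dict:
        return self.banks[self.active]

    def interrupt(self, priority: int) -> bool:
        """Take an interrupt by switching banks; nothing is pushed to memory."""
        if not 1 <= priority <= 3:
            raise ValueError("priority must be 1, 2, or 3")
        if priority <= self.active:
            return False  # same or lower priority waits its turn
        self.previous.append(self.active)
        self.active = priority
        return True

    def return_from_interrupt(self) -> None:
        self.active = self.previous.pop()


cpu = BankedCPU()
cpu.regs["A"] = 0x155      # main program state lives in bank 0
cpu.interrupt(2)           # the handler starts with its own bank 2
cpu.regs["A"] = 0x3FF      # and can use A without saving anything
cpu.return_from_interrupt()
print(hex(cpu.regs["A"]))  # 0x155
```

A CPU with a single bank would instead spend cycles pushing and popping whatever registers the handler touches.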

Posted by: Pixy Misa at 04:34 PM | No Comments
Post contains 1155 words, total size 10 kb.

Geek

Daily News Stuff 12 September 2020

ABCD Goldfish Edition

Tech News

  • I sat down this evening to put in an hour or two on Imagine-Emu, starting by updating PyCharm and getting some proper Nim support, and fairly quickly realised two things.

    First, while pairing 10-bit registers A, B, C, and D into 20-bit registers AB and CD might be historically accurate and aesthetically pleasing, it turns the emulator code into a bramble of special cases.

    And second, I shouldn't be writing a CPU emulator at all.  I should be writing a CPU emulator generator.

    These five architectures have between them a total of 15,872 opcodes.  Just write a program that loops through all the general cases and the special cases and spits out the exact code required to implement each one.  Each emulator then consists of some register and memory definitions, a few helper functions like the indexed address calculator, and one huge case statement.

    The same applies to the DSPs that exist in four of the five machines.  The graphics hardware will need some specific code.

    After pulling out some misfeatures from my original design, the Imagine and the Dream CPUs are looking very similar.  The main difference is that the Dream has twice as many general-purpose registers - eight each in 12-bit and 24-bit lengths.  The Mirage will likely slot in between the two, with four accumulators and eight general-purpose registers.


  • Or of course you could just download an Amiga.  (Tom's Hardware)

    Workbench 3.1 was kind of ugly.  The original blue 1.x series was nicer.


  • A benchmark for a Radeon RX 6000 has leaked, putting it in line with an Nvidia RTX 2080 Ti.  (WCCFTech)

    Which is, right this minute, the fastest graphics card available, but very shortly will not be.

    However, it's not clear what card actually leaked, whether this is the long-awaited Big Navi or merely the new high-midrange part.  We'll see, at some point, probably.


  • 6000 Euros for a plastic toy car?  (The Guardian)

    Albeit a plastic toy car that seats two slim adults and can hit 45km/h.


  • Just say false.  (NoYAML)

    YAML is readable JSON.  That turned out to be a really bad idea.

    Also, that "If SQL were built on YAML" example?  That's Elasticsearch, only Elasticsearch is much less elegant and changes arbitrarily every few months.


  • Apple continues to shit all over its customers.  (Thurrott.com)

    Their latest bright idea is that game streaming services must submit every streamable game to the App Store as an individual app.  The streaming service app itself then acts as a catalog for all the other apps - which are all absolutely identical because all they do is stream a game.  And for this unmitigated fuckery Apple will skim 30% off the top of every single transaction.


  • Or you could just say fuck you, Apple and get an Xbox Series S for $25 per month including constantly updated libraries of downloadable and streamable games from Microsoft and EA.  No payment up front and you own the device outright after two years.  It actually works out $60 cheaper than buying the device and paying for the service for two years.


  • In Australia the Series S is $499 and the Series X is $749.  With exchange rates and sales tax that makes the S about 10% more expensive here and the X about 1% cheaper.


  • Noted.



Disclaimer: MNO goldfish.


Posted by: Pixy Misa at 12:53 AM | Comments (2)
Post contains 1036 words, total size 8 kb.

Saturday, September 12

Geek

Mirage-11

I haven't worked out many details of this one just yet, except that it's going to be a response to the Imagine.  Everywhere that the Imagine is limited because it only has 10 bits, the Mirage is going to jump in with both feet.  And 11 bits.  Mostly with 11 bits, in fact.

Any feature I kind of want to put into the Imagine but reject because it is unreasonable goes into the Mirage.  It will hypothetically have come out a couple of years later - say, 1986 - so that I have more transistors to play with and can get away with more nonsense.
  • It will use fast external SRAM for colour lookup, with a total of 2048 registers.  You can freely select a palette of 2048 colours from a total of 2048, just because.
  • Standard 256k each of system RAM and VRAM, expandable to 512k, and with later memory chips, to as much as 2M, just because.
  • Fast page mode DRAM that can run for a full page, not just for two-word bursts, for up to double the VRAM bandwidth, just because.
  • Two 1.8M floppy drives, one on either side of the wedge-shaped case, just because.
  • Multi-threaded CPU / DSP.  The Imagine's CPU and DSP chips are effectively merged into one device running twice as fast, with multiple hardware threads and cycle-level scheduling.
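
The Mirage's scheduling policy isn't pinned down yet. One common approach to cycle-level multithreading is fine-grained (interleaved) multithreading, where each cycle's issue slot goes to the next ready thread in rotation. The Python sketch below assumes plain round-robin between two threads standing in for the merged CPU and DSP, with a made-up stall pattern; it is one possible policy, not the Mirage's design.

```python
from collections import deque


def issue_slots(threads, stalls, cycles):
    """Round-robin issue: each cycle, the next ready thread gets the slot.

    threads: thread names in rotation order.
    stalls:  maps a thread name to the set of cycles on which it cannot issue.
    Returns the thread that issued on each cycle, or "-" for an idle cycle.
    """
    order = deque(threads)
    issued = []
    for cycle in range(cycles):
        for _ in range(len(order)):
            candidate = order[0]
            order.rotate(-1)  # whoever was considered moves to the back
            if cycle not in stalls.get(candidate, ()):
                issued.append(candidate)
                break
        else:
            issued.append("-")  # every thread stalled this cycle
    return issued


# Hypothetical: the DSP thread waits on memory during cycles 2 and 3,
# and the CPU thread soaks up the slack.
print(issue_slots(["cpu", "dsp"], {"dsp": {2, 3}}, 6))
# ['cpu', 'dsp', 'cpu', 'cpu', 'dsp', 'cpu']
```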

Posted by: Pixy Misa at 03:37 PM | No Comments
Post contains 226 words, total size 1 kb.

Geek

Phantom Nine

<mode wiki on>

The Phantom Nine is a 9/18 bit home computer introduced in November 1986 to compete with the increasingly popular MSX systems.  The Phantom's 12MHz Y90 CPU is a rare variation on Zilog's Z80: All word sizes are increased from 8 to 9 or from 16 to 18 bits as appropriate, but the instruction set is largely unaltered and Z80 assembler code can be ported to the Y90 with minimal effort.

The Phantom Nine boots from an internal 2.2M 3½" floppy drive, having only 8k of ROM for boot code, a basic font, and a simple machine-language monitor.

This is because the Phantom does not support address extension, segmentation, or bank-switching; its 18-bit address space is fully populated by 256k of standard RAM.  The boot ROM itself is only mapped into address space long enough to copy itself into high memory.

The Phantom Nine features five custom chips:
  • Two video display controllers each with 64k of RAM, displaying 240x270 in 512 colours or 480x270 with two 16-colour palettes.  Each video controller supports any combination of scrolling, scaling, shearing, and reflection of the pixel map.
  • One sprite controller with its own 64k of RAM, supporting up to 600 8-colour 18x18 sprites.
  • One video mapper that combines the palette and priority selectors and pixel data from the video and sprite controller chips, and merges them into a single display.  The video mapper includes 96 colour registers shared by the video and sprite controllers.
  • One sound controller with its own 64k of RAM, supporting 9 voices of FM or PCM sound.
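
The Phantom Nine's memory figures line up neatly if every RAM is counted in 9-bit words, which is an assumption rather than something stated above. Assuming pixels are packed whole into each 9-bit word (one 512-colour pixel, two 16-colour pixels with a bit to spare, or three 8-colour pixels), each video and sprite layout needs exactly 64,800 words, just under 64k, and the 18-bit address space holds exactly 256k words. A quick Python check:

```python
WORD_BITS = 9  # the Phantom Nine's byte is 9 bits wide
K = 1024


def words_needed(width: int, height: int, bits_per_pixel: int) -> int:
    """9-bit words for a bitmap, packing whole pixels per word (none span two words)."""
    per_word = WORD_BITS // bits_per_pixel
    return -(-(width * height) // per_word)  # ceiling division


address_space = 2 ** 18  # 18-bit addresses
layouts = {
    "240x270, 512 colours (9 bpp)": words_needed(240, 270, 9),
    "480x270, 16 colours (4 bpp)": words_needed(480, 270, 4),
    "600 sprites, 18x18, 8 colours (3 bpp)": 600 * words_needed(18, 18, 3),
}

print(f"18-bit address space: {address_space // K}k words")  # 256k
for name, used in layouts.items():
    print(f"{name}: {used} of {64 * K} words")  # each is 64800 of 65536
```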

The video controllers have no hardware acceleration for drawing graphics; instead the Y90's block mode operations are used via the VDC interface registers to move or copy data.

The Phantom Nine received one major hardware update, in 1988.  The Phantom 9/512 has 512k of RAM in total; the extra 256k functions as a RAM disk by default, but it can also be addressed directly with the Y90's I/O instructions, so it can serve as heap space (for example, for Basic program variables), though not for executable code.

The 9/512 also includes an extended-density 4.4M floppy drive.

Unusually for systems of this era, the Phantom Nine never officially ceased production.  The parent company still sold new stock Phantom 9/512 systems on their website as of July 2019, with the choice of either the original 4.4M floppy drive, or an SD card reader and USB adaptor that fits into the floppy drive bay.

This adaptor uses a 48 MHz Arm microcontroller to interface the USB and SD card signals with the onboard floppy controller, meaning that it is significantly more powerful than the Phantom Nine itself.

A much later pin-compatible Y99 CMOS microcontroller running at 24 MHz can be swapped into both Phantom models with no other modifications required; the clock speed is selected by a previously unimplemented instruction sequence.  The Y99 supports numerous additional instructions and also executes many existing instructions in fewer cycles, while drawing approximately one-fifth the power of the original Y90 chip.

<mode wiki off>

Posted by: Pixy Misa at 02:53 PM | Comments (2)
Post contains 507 words, total size 3 kb.
