Say Weeeeeee!
Ahhhhhh!

Sunday, September 20

Geek

Daily News Stuff 20 September 2020

Dundundundundun Edition

Tech News

  • Had the soundtrack to a side-scrolling shooter from the Amiga running through my head this evening.  Could not even remember the names of any of the side-scrolling shooters I played back then.  So I asked YouTube.

    It was Menace.



    That track, the one that plays right at the start, that's what I had stuck in my head.

    If you didn't play this game on the Amiga, though, you would have had completely different memories of it.  Mostly sad ones.



    Menace was created by DMA Design, who also created Lemmings and subsequently a little title named Race'n'Chase.

    Better known today as Grand Theft Auto.


  • Just watching that video I can see the Amiga is using:

      •  Dual-playfield mode (for parallax scrolling).
      •  Hardware sprites.
      •  Blitter objects.
      •  Copper (display list) programming to change the video mode on the fly.
      •  Four-channel PCM audio.  Well, I can't see that one.

    The other three systems have none of those features, and as a result those ports of the game all kind of suck.


  • Went out to lunch today with my brother and sister-in-law, who live locally.  Most things were back to normal, the shopping mall was bustling, and the Asian fusion restaurant that does the amazing gluten-free pad siew was still in business and open.

    The Apple store, on the other hand, had a queue where they took your details, sanitised your hands, and gave you a mask to wear inside.  We did not go into the Apple store.


  • Behind every warning label lies a story.  The same is likely true of this motherboard with 20 USB ports.  (Tom's Hardware)

    I'm not sure I want to hear it though.


  • Spider Man, Spider Man, does whatever a 105GB download can.  (WCCFTech)

    That's only approximately one million Imagine 1000 ROM cartridges.


  • Apple has reportedly booked 100% of TSMC's 5nm production capacity.  (ExtremeTech)

    To be fair, Apple, or rather the fools who buy Apple's overpriced toys (cough ignore the retina iMac to my left cough) paid for TSMC's massive CAPEX that enabled the 5nm process in the first place.


  • If at first you don't succeed, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try, try again.  (Phoronix)

    Intel have submitted patches to include support for their secure enclave in Linux.  For the 38th time.


  • It took me about two minutes to identify Menace on YouTube from just a half-remembered theme tune.  Find a video of the top ten Amiga side-scrolling shooters and flip through them going nope, nope, nope, cool but nope, THERE, THAT'S IT!

    Some people take a bit longer to find the game they're nostalgic for. (Break Into Chat)

    Like, seven years longer.


  • Russia has announced plans to explore Venus.  (EuroNews)

    And to be fair, Russia's track record with its Venus probes is as good as its record with its Mars missions is bad.


  • I've moved the launch of the Mirage (the 11 bit architecture in my emulator) back to 1984 and downgraded the hardware to match.

    It now has specs very similar to the original Imagine from 1983, but instead of 128k of VRAM, has two video controller chips each with 64k.  And the floppy drive capacity will be 900k, since double-sided 3.5" drives came out in 1984.

    It has just two graphics modes, compared to approximately seventeen thousand on the Imagine, but they are really nice graphics modes:

    Mode 1: 480x270 in 32 colours or 240x270 in 1024 colours, switchable on a two-pixel boundary.
    Mode 2: 480x270 in 32 colours, or 960x270 in 4 colours with 4 palettes, switchable on a two-pixel boundary.

    Update: Three modes.  When I figured out Mode 3, I realised there's no way they wouldn't have pushed it in somehow.

    Mode 3: 320x270 in 256 colours with 4 palettes, switchable on an eight-pixel boundary.

    And you can overlay a Mode 2 playfield on a Mode 3 playfield, and modulate the colours on one-third pixel boundaries.

    The pixel data from the two VDCs is XOR'd and fed into a 1024x12 external palette RAM.  (I looked it up and suitable chips were indeed available in 1984, though I'm not sure how much they cost.)

    By default VDC1 would output the low 5 bits and VDC2 the top 5 bits of the final pixel value, so they combine independently to provide one of 1024 colours. And that means you can precisely define transparency, translucency, and shadow effects, or just output 480x270 in 1024 colours, or anything else that is not actually mathematically impossible.

    If you output two 1024-colour low-resolution playfields, though, things will get weird.  I might not bother to fix that.  Sometimes weird is good.

    I think the 11th bit in the instruction encoding will be used as a size bit - byte or word.  That's a really simple update to the 10-bit architecture but a very nice one; it makes the four index registers (WXYZ) true general-purpose registers. On the Imagine they can be used for arithmetic via LEA, and just gained shift operations in the weekend opcode cleanup, but they don't have the usual AND/OR/XOR, or even subtraction.  And while LEA can do A=B+C addition, by design it doesn't set the carry bit and can't be used for extended precision arithmetic.
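
The two-VDC colour mixing on the Mirage described above is easier to see in code.  This is only a rough Python sketch of the idea, not the emulator's actual code: the function names are made up, and it assumes each VDC presents a 10-bit value to the XOR stage.

```python
# Rough sketch of the Mirage's two-VDC pixel mixing.
# All names here are hypothetical; only the XOR, the 5-bit/5-bit default
# split, and the 1024-entry palette come from the description above.

PALETTE_ENTRIES = 1024  # 1024x12 palette RAM: 10-bit index, 12-bit colour


def mix_pixels(vdc1, vdc2):
    """XOR the two VDC outputs to form the palette RAM index."""
    return (vdc1 ^ vdc2) & (PALETTE_ENTRIES - 1)


def default_outputs(low5, high5):
    """Default arrangement: VDC1 drives bits 0-4, VDC2 drives bits 5-9."""
    return low5 & 0x1F, (high5 & 0x1F) << 5


def lookup(palette, vdc1, vdc2):
    """Fetch the final 12-bit colour for one pixel."""
    return palette[mix_pixels(vdc1, vdc2)]


# With the default split the two fields never overlap, so every pair of
# 5-bit values lands on its own palette entry.
indices = {mix_pixels(*default_outputs(a, b))
           for a in range(32) for b in range(32)}
print(len(indices))
```

When both VDCs drive all ten bits (the two 1024-colour playfields case), the XOR scrambles the indices instead of combining them, which is presumably where the weirdness comes from.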


Disclaimer: Me listening to 30-year-old Amiga music: Wow, eight-bit PCM audio was pretty rough.  Me listening to 30-year-old Atari, PC, and C64 music: Oh, right.

Posted by: Pixy Misa at 11:31 PM | Comments (3) | Add Comment | Trackbacks (Suck)
Post contains 927 words, total size 8 kb.

Saturday, September 19

Geek

Daily News Stuff 19 September 2020

Pieces Of Ten Edition

Tech News

  • Now if you want to take some pictures of the fascinating witches who put the scintillating stitches in the britches of the boys who put the powder on the noses of the faces of the ladies of the court of King Caractacus you're too late!  (TechReport)

    Because they've just sold out.

    They being the RTX3080 and both models of the PlayStation 5.


  • The WeChat and TikTok bans kick in on Sunday.  (Tom's Hardware)

    Good.


  • VueJS has hit 3.0.0.  (GitHub)

    I haven't done anything real in it, but it does seem to be the least sucky of the major JavaScript frameworks.


  • Based solely on watching videos of Dragon Spirit I have retroactively determined that the Imagine included a cartridge slot.  It could take ROM cartridges mapping up to 128k of address space (plus optional bank switching) or a RAM cartridge for an additional 128k of system RAM.

    The system assigns a single 10-bit bank register to the cartridge port, and leaves it up to the cartridge hardware to determine how to deal with it, but in theory it could support 1024 128k banks for a total of 128M of ROM (or indeed RAM).


  • That means that the big feature of the Imagine 1100 wasn't expandable RAM, but that it came with 128k of system RAM and 256k of video RAM.  You could use the same 128k RAM cartridge to bring system RAM up to 256k, but video RAM wasn't expandable.

    The 1000 and 1100 used page mode RAM, while the 1200 used faster nibble mode RAM, so RAM expansion cartridges weren't perfectly compatible.  They did work, but ran slower than system-specific models.


  • In other fantasy computer news I needed exactly one and a half bits to add a whole new exciting family of addressing modes to the Imagine and clean up an untidy corner of the instruction set.

    I found them.

    Now PC relative addressing only has a range of -128 to +127* instead of -512 to +511, but every instruction has access to special hardware registers, alternate register banks, and on-chip RAM, if they exist in this particular implementation of the CPU.

    So, for example, the DSP variants define AL and AR as left and right audio accumulators, and multiple banks for ABCD and WXYZ, but there was no standard encoding for accessing those registers outside of the specific MAC and BANK instructions.  The OUT $01E, AR to output a calculated sample to the right audio DAC had to be custom crafted into the DSP.

    This new method also encodes 10-to-20-bit conversions like LD A, WL, which previously were only possible via LD AB, W - and even that was a special instruction distinct from standard LD.

    These methods require a two-byte instruction encoding, but the second byte is similar to the format used for indexed addressing, rather than a big nasty ad hoc mess.

    This is also how supervisor mode (on variants with a supervisor mode) accesses the segment and address extension registers (on variants with segment and address extension registers), and how multi-threaded versions spin up and synchronise threads.  None of that will be present in the initial version, but it's all defined so I won't trip over it later.

    * Unless you use indexed mode with P + immediate offset, which has a full 10-bit or 20-bit range but uses one or two extra bytes.
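
Going back to the cartridge port: the 128M figure is just the 10-bit bank register multiplied by the 128k window.  A minimal Python sketch of the arithmetic, assuming the cartridge maps banks linearly (the system leaves that up to the cartridge hardware, so the linear layout and the function name are assumptions):

```python
# Sketch of cartridge bank addressing on the Imagine.
# The linear bank layout is an assumption; the 10-bit bank register and
# the 128k window are the figures given above.

BANK_BITS = 10             # one 10-bit bank register for the cartridge port
WINDOW_SIZE = 128 * 1024   # 128k of address space visible at a time
BANK_COUNT = 1 << BANK_BITS
TOTAL_SIZE = BANK_COUNT * WINDOW_SIZE   # 1024 banks x 128k = 128M


def cartridge_address(bank, offset):
    """Translate (bank, offset in window) to an address inside the cartridge."""
    if not 0 <= bank < BANK_COUNT:
        raise ValueError("bank register is only 10 bits wide")
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("offset falls outside the 128k window")
    return bank * WINDOW_SIZE + offset


print(TOTAL_SIZE // (1024 * 1024), "M")
```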


Retro Gameplay Video of the Day



Quick precis: MSX1 bad.  MSX2 pretty good actually.  Also, most MSX systems apparently had two cartridge slots.  Hmm.



Bonus Retro Gameplay Video of the Day



This is Dragon Spirit captured from original arcade hardware.  The video isn't as sharp as from an emulator, but it's the real deal.

Listening to the sound, I'm pretty sure it's using three music voices plus two voices for sound effects.  There are a number of places where you'd expect a fourth instrument to come in, and it never does.

The Imagine can reasonably do five voice wavetable synth at 3MHz, and I've assigned eight sets of registers to the DSP so the other three sets can be used for sound effects, which require less complicated processing.  (The hypothetical Imagine 1200 from 1987 doubles everything, so 10 wavetable voices and 16 sets of audio registers.)

I really like the first couple of tracks here.  Super catchy.


Extra Bonus Retro Gameplay Videos of the Day

Last ones, I swear.

The first is Dragon Spirit on an MSX2 system - 3.5MHz Z80A, 256k of system RAM (though I believe it ran in 64k), and 128k of video RAM.

The second is Dragon Spirit on a Sharp X68000 - 10MHz 68000, 1M system RAM, 512k bitmap video RAM, 512k tile video RAM, and 32k sprite RAM.




The X68000 version definitely looks better, but the MSX system isn't bad at all.  I believe this was a disk-based game and not a cartridge, so that's actually running from just 64k of RAM.  There are some odd hitches in the music on the MSX that aren't present on the X68000, possibly because the X68000 had an 8 voice sound chip rather than just 3.


Disclaimer: Bleep bloop.

Posted by: Pixy Misa at 06:13 PM | Comments (4)
Post contains 861 words, total size 7 kb.

Geek

Daily News Stuff 18 September 2020

7-4+2=1 Edition

Tech News


Atari 800XL Gameplay Video of the Day



Some of these games are really impressive for 8-bit hardware dating to 1979.  But there is a reason I'm targeting the Imagine at a notch above most of the real 8-bit systems, a little above even the MSX2 or the FM77.

There are a ton of MSX gameplay videos on YouTube, by the way.  The original MSX was rubbish, but the MSX2 wasn't bad.  People have even got it to run a multi-tasking GUI with TCP/IP.

A lot of the MSX2 games are still pretty bad, but some look quite good, like Dragon Spirit.  Here's the MSX2 version:



This seems to be the original arcade version, emulated via MAME.  It's certainly better looking than the MSX2 could manage.



Disclaimer: Not that I plan to spend the entire weekend watching them.  Not the entire weekend.

Posted by: Pixy Misa at 12:30 AM | Comments (32)
Post contains 445 words, total size 4 kb.

Friday, September 18

Geek

Daily News Stuff 17 September 2020

Zed Eighty Edition

Tech News

  • Sony ran a gender reveal for its new console, devastating three continents.  (AnandTech)

    $499 for the full version with Blu-Ray drive; $399 for the digital-only version.

    That pushes back fairly hard against both the $299 Sbox and the $499 Xbox.  Smart move by Sony, except for the part where they're probably losing money at that price point.


  • Numbers, how do they work?  (AnandTech)

    Sony also announced the Xperia 5 II, a companion to the Xperia 1 II.

    It's not cheap at $949, but it does have a Snapdragon 865, 2520x1080 120Hz OLED display, 8GB RAM, 128GB or 256GB of flash, microSD slot, headphone jack, wireless charging, and IP65 and IP68 ratings.

    Oops.  Wireless charging is only on the 1 II.


  • Taking the Tiger out for a spin.  (Tom's Hardware)

    A look at Tiger Lake on an Intel reference laptop, with some benchmarks run under Intel's watchful eye, so take it with a grain of salt.  Single-threaded performance - on Geekbench - appears excellent, clearly faster than current Intel laptops and beating a Ryzen 4800U by 40%.  That's a lot, but it is just one benchmark.

    And on the other hand, video encoding with Handbrake ran twice as fast on the 4800U.

    The Intel chip is running at 28W, but for single-threaded tests that is only likely to bump the clock speed up by 2% or so, not a significant factor.

    Intel's Xe graphics more-or-less catch up with AMD too.  Both systems tested used LPDDR4X-4266 RAM, and while AMD is still faster for gaming by 5-20% at 15W, it no longer squishes Intel like a bug.  When the Intel chip is freed up with a 28W TDP it can outpace AMD's 15W part, but then AMD has a 35W part, so you can play that game forever.

    Looking forward to seeing whether that single-threaded performance is real across a broad range of benchmarks, and to what AMD delivers with Zen 3.

    Update: AnandTech have the same Intel reference unit and confirm the great single-threaded performance across a wider range of benchmarks.  They ran the SPEC 2006 and 2017 suites and posted individual as well as composite scores, so there's a lot more than one Geekbench score to chew on here.

    Short summary: If you run Dwarf Fortress, Intel's 11th gen chips are 50% faster than AMD.  If you run Blender, AMD is well over twice as fast as Intel.  And if you run Civilization 6 on integrated graphics, you're a masochist.


  • An LL(1) expression parser in exactly 100 lines of Python.  (GitHub)

    Thanks, I'll take it.

    The only imports are enum and re - the Python regular expression library - and it only uses re to check if a string of characters is numeric, which you can do with the isdigit() method.  So it should be nearly as simple rewritten in Basic.


  • That nibble-mode trick I used for the Dream means I can reasonably offer an upgraded version of the Imagine in Imagine-Emu.

    The Imagine 1200 was launched in 1987.  It offered a faster CPU and DSP - 6MHz vs. 3MHz - with 256k system RAM and 256k video RAM, using 100ns nibble-mode chips to deliver 12MB/sec of bandwidth on each bus.  The system also replaced the earlier 500k and 1M double-density floppy drives with a new extended density (ED) drive with a capacity of 4M.

    Which means that all the tricks the original model could do by stealing cycles on the system bus, this model can do just in VRAM.  And then do more tricks by stealing cycles from the system bus again.

    This version will have 256 bytes of cache on the CPU and DSP, which will speed up the cycle-accurate emulation mode but slow down the free-running mode.


  • Whatever happened to the Z800?  It was announced in 1980 but never appeared.  Turns out it did eventually show up, much delayed, renamed, and converted to CMOS, as the Z280.

    And a strange little beast it was too.  Instructions could still only directly address 64k of RAM at a time, but it had a complete paged memory management unit capable of mapping 16MB of RAM, a supervisor mode, and a 256-byte instruction cache.  It even supported multi-processor configurations, as if someone really, really wanted to build a Z80-based Unix system.



    The Z800 / Z280 was a commercial failure, as was the Z380, a 32-bit version of the Z80 with eight banks of registers.  The Z180, though, based on the Hitachi 64180, is still being made today, as is the eZ80, which for under $10 delivers the performance of a 150MHz Z80.  Meaning that by today's standards it's dead slow.
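
On the LL(1) parser item above: swapping the regular expression for isdigit() would look something like this in Python.  I haven't copied the pattern from that repository, so the regex here is an assumed stand-in for "a string of digits", and both function names are made up.

```python
import re

# Assumed stand-in for the parser's numeric check: one or more ASCII digits.
_NUMERIC = re.compile(r"[0-9]+")


def is_number_re(token):
    """Numeric check using the re module."""
    return _NUMERIC.fullmatch(token) is not None


def is_number_plain(token):
    """Same check with no imports.

    str.isdigit() also accepts some non-ASCII digit characters (such as
    superscript two), so isascii() is used to keep the two checks equivalent.
    """
    return token.isascii() and token.isdigit()


for token in ["42", "007", "", "4.2", "-1", "x1"]:
    print(repr(token), is_number_re(token), is_number_plain(token))
```

A Basic version would need a loop over the characters instead, but that's about it.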


Disclaimer: In the future, everything will be dead slow by today's standards for fifteen minutes.

Posted by: Pixy Misa at 12:37 AM | Comments (2)
Post contains 795 words, total size 6 kb.

Wednesday, September 16

Geek

Daily News Stuff 16 September 2020

640k Edition

Tech News

  • The numbers are in, and the RTX 3080 is a solid 50% to 60% faster than the RTX 2080.  (Tom's Hardware)

    That means it easily beats the 2080 Ti as well; right now it's the fastest video card there is.

    The 3080 has nearly three times as many CUDA cores as the 2080, and similar clocks, but isn't remotely close to three times the performance.  That's because half the cores in this architecture are the same flexible FP/INT cores as before, while half are simpler FP-only cores.  A 32-bit integer multiplier is actually about twice the size of a 32-bit floating point multiplier, so it makes sense to save space on a chip this big.

    So if the code for a given game uses lots of integer operations, it won't scale nearly as well on this hardware as the raw floating-point numbers would suggest.  But if Nvidia had made all the cores FP/INT, the chip would have been too large to manufacture on Samsung's 8nm node.  Something had to give.

    And there's still the 3090 to come.


  • Apple has announced new iPads.  (AnandTech)

    The iPad Air comes with the new A14 chip, which is the first volume part I know of to come out of TSMC's 5nm process.  The A14 is...  Well, it's slightly faster than the A13.

    The 64GB iPad Air costs A$899, which is exactly as much as my 64GB Retina iPad from 2013.  It does have slightly more pixels, but still no microSD slot.


  • Pure Storage has acquired Portworx in a deal worth $370 million.  (ZDNet)

    I have heard of at least one of those companies.


  • China has immense capacity and expertise for assembling complex equipment.  But without access to technology from the West, it is stuck in 2007.  (New York Times)

    I include Japan, South Korea, and of course Taiwan as part of the West.

    Basically, Huawei is fucked, and the CCP is resolutely determined to make sure it remains fucked.  (Free Beacon)

    Shame.  They made nice tablets.


  • I ran the numbers to work out what sort of hardware the Dream - the 12-bit model in our lineup - would have had, given just two parameters: First, it launched around 1985, and second, it had a 640x360 display.

    The Imagine is a home computer powerful enough to run business apps; the Dream I'm designing as a business computer flexible enough to run decent games.

    So, first, what's the pixel clock for 640x360 @ 60Hz?  I worked out for the Imagine that at 50Hz that resolution needs around 16MHz - and that my existing HSYNC rate of 18.75kHz was within spec for a 720x350 monochrome monitor.  So for 60Hz we just add 20% to both numbers, and we get a 19.2MHz pixel clock and a 22.5kHz HSYNC.

    If we divide the pixel clock by 4 as the base system clock, we get 4.8MHz, and divide that by 22500 (the 22.5kHz HSYNC) and we get 213.3 cycles per line.  Round that down to a nice even 212, multiply back up again, and...  We have a 4.77MHz system.  Huh.  This was meant to be.

    Now, how do we get the data for a 640x360 display in (say) 64 colours, using commodity 1985 DRAM and a 12-bit bus?  On the Imagine I originally wanted a 5MHz memory clock, looked up the databooks, and realised that wasn't feasible in 1983.  Instead I set the clock to 3MHz but used page mode to read two bytes per cycle.

    On the Dream I'm going to use a different readily-available trick from the early 80s, nibble mode, where a common 256k x1 DRAM chip could stream out four successive bits in a row at much faster rates than regular random access.  Looking through Toshiba's 1984 memory databook, I could hit a 2MHz bus clock in nibble mode using 150ns RAM, 2.5MHz with 120ns, and 3MHz with 100ns.

    Conveniently, 120ns RAM, not too exotic, lets me pin the memory clock at half the CPU clock.

    So the video controller has 106 memory cycles per scan line (half the 212 we calculated earlier), each delivering four (12 bit) bytes using nibble mode.  Assuming 80 cycles are in the visible area (it was about 75% of the scan line on a typical monitor, so that's close enough) we need 8 pixels per cycle to get a 640 pixel line, and that gives us 6 bits per pixel for 64 colours.

    Which is not a cosmic coincidence; that's what was supposed to pop out at the end from the numbers I fed in at the start.  It just means I did the maths right.

    The only problem is that it still can't do 80 column text mode.  80 columns of text in graphics mode, sure, no problem at all.  80 column text mode, no.  We'd need 80 random accesses, because the character data won't be sequential, and we only have 80 memory cycles per line, no cycles free to read the text map.

    For that we'd need a separate text RAM and....  Well, I could just shove a separate text RAM into this one.  I ruled it out for the Imagine to keep it cheap and simple for the home market, but this is explicitly a more business-oriented machine.

    The Dream won't have the dual-bus architecture of the Imagine: The video chip can't directly access system RAM, and the CPU can't directly access video RAM.  But that leaves me lots of imaginary pins free on the chips to do other stuff, such as having 64k of text RAM in addition to the 256k of main video RAM.

    Do the numbers work out?

    - 12 pins for the CPU interface
    - 12 + 9 pins for main memory
    - 12 + 8 pins for text memory
    - 12 pins for pixel output

    Total 65, plus a minimum of a dozen more for control signals.  With an 84-pin PLCC we can just about do it.  Okay.  That's what it is.  Three 4464 chips in page mode for the text or tile data, twelve 41256 chips in nibble mode for the character cells or bitmap data.

    Hmm.  Does the Dream always run in character mode?  Maybe it does.  Maybe it has no pure bitmap mode, just 4096 user-defined characters.  I was thinking of giving this thing hardware windowing, like the Intel 82786, but the hell with that.  It's going to be a 12-bit Microbee Gamma.

    I wonder if there's a YouTube video of that one?  Those things were rarer than hen's teeth, and I only ever got to touch one for a few minutes at a computer show.



    Looks like someone got to hang onto one for slightly longer than that.  It had a 720x350 display - MDA / Hercules resolution, but in 4096 colours.  So I'm pretty much on target.

    Unlike the Amiga - and very much like the TRS-80 Model 16 - the Microbee Gamma could run Unix.  It had an 8MHz 68000 and two 4MHz Z80s, one to handle display tasks, and the other to handle I/O.  The separate I/O processor allowed the 68000 to properly handle page faults, which would otherwise require a 68010 or later chip.


  • I can probably figure out a way to line up the character cells so that you can draw to them as if they were a bitmap.  I'm not a total sadist.  At least not when I'm the one who's going to be writing the graphics library for this beastie.

    To unpack a bit: The Intel 82786 (IEEE.org) let you define areas of the screen to be drawn from different areas of RAM - hardware windows - though you could only have a limited number of those because there were only so many registers on the chip.

    With the Dream's bus design, hardware windows in 640x360 64-colour mode would have to align to 8-pixel boundaries, because we can only switch hardware windows on a new bus cycle, and we read 8 pixels per bus cycle.  And we read 8 pixels per bus cycle because it's the only way we can make it fast enough.

    Now it just so happens that our character cells are also 8 pixels wide.  If graphics are drawn into character cells, we can do proper bitmapped hardware windows in text mode.  Which means we can move and scroll windows 32 times as fast as moving the actual pixels - two bytes instead of 64 bytes for an 8x16 64-colour character cell.

    And this is precisely what the Microbee Gamma did, and it worked really well.  The original Amiga didn't live-drag window contents when you moved a window, just the outline.  The Gamma smoothly dragged the window contents, even if they were actively updating at the time.  For 1986 - I think I saw it in '86 - that was really neat.

    One minor side-effect of that was that the Gamma had wide window borders - you can see that in the video above.  Yes, that was partly because it was a mid-80s system with a relatively low-resolution monitor, but also because the window borders had to be whole characters.

    Anyway, this is the Dream.  The Imagine has a squintillion different graphics modes - different bit depths, pixel packings, switchable palettes, switchable resolutions, text mode, graphics text mode, fill mode, HAM mode, RLL-compressed RGBA overlays with selectable alpha channel arithmetic...

    The Dream does none of that.   Text mode is graphics mode and graphics mode is text mode.

    Update: Yes, we can draw into the character map as if it were a 1024x512 64-colour bitmap, albeit only at the base 2.385 MHz memory clock, not using nibble mode.  We can copy rectangles in nibble mode, though.  Or you can view it as 4096 programmable 8x16 characters with 64 colours.

    There will also be sprites of some sort.  Let's say 16 of them, 16 pixels wide, eight colours, switchable to eight pixels wide and 64 colours.  That fits nicely into our bus cycle and our horizontal blanking period. 

    Update 2: Or I could go with the previous plan and have a separate sprite chip with its own 64k of RAM.  That works nicely, except that now the system has 704k of RAM.  Oh well, can't have everything.  It would help the Dream compete with the Imagine's clever hardware-compressed overlay system, which allows arbitrary numbers of sprites.

    Update 3: No, I have it!  The text RAM is the sprite RAM.  So you can have this neat accelerated super text/tile mode with 16 sprites, or a regular bitmap and 512 sprites.  And now we have 640k again.  This makes a lot of sense.  Why would a business machine have an overpowered sprite processor?  Because the hardware designers snuck it in as an alternate mode for the 80-column text system.  Just don't ask me why it has an overpowered audio processor.

    There's a trick I might steal from the SNES as well - the SNES was an amazing assemblage of tricks.  It could hardware scroll individual segments of the screen.  In text mode the Dream will use two bytes per character - one to select one of 4096 characters, the other to select from 64 foreground and background colours.

    In tile mode - pseudo-bitmap mode - we don't need to select those colours because the tiles themselves are in 64 colours.  So we can steal 7 bits to allow us to rotate the contents of the cell horizontally and vertically.  And still have 5 bits left to do other stupid stuff with.

    This won't scroll a whole area, but if you want to animate a tiled background, you can do it pixel-smooth just by updating the text map.  With one tweak to the video hardware, to pre-fetch a zeroth byte on each line, we can smooth-scroll an entire screen containing multiple independent smooth-scrolling windows just by updating the text map.


  • Oh, and the other thing.  The Dream will have 256k main memory, 256k graphics memory, 64k text memory, and 64k for the sound chip.  So it runs at 4.77MHz and has 640k of RAM.


  • What if someone wants to run Imagine-Emu on a Raspberry Pi?  Does Nim even work on the Raspberry Pi?  What's that, Lassie?  It not only works, it's available as a standard package?

    Well, okay then.  Transpiling via C has its benefits.
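
For anyone who wants to check the Dream's numbers from earlier in this post, here is the whole chain as a short Python sketch.  It uses only the figures given above; the variable names are invented for the sketch, and it is arithmetic, not emulator code.

```python
# Back-of-envelope timing for the Dream, using the figures from the post.

# 640x360 at 50Hz needed about 16MHz with an 18.75kHz HSYNC; 60Hz adds 20%.
pixel_clock_hz = 16_000_000 * 60 // 50        # 19.2MHz
hsync_hz = 18_750 * 60 // 50                  # 22.5kHz

# Base system clock is the pixel clock divided by 4.
base_clock_hz = pixel_clock_hz // 4           # 4.8MHz
exact_cycles_per_line = base_clock_hz / hsync_hz   # about 213.3

# Round down to an even number of cycles (the "nice even 212").
cpu_cycles_per_line = int(exact_cycles_per_line) // 2 * 2
system_clock_hz = cpu_cycles_per_line * hsync_hz   # 4.77MHz

# Memory runs at half the CPU clock, in nibble mode.
memory_clock_hz = system_clock_hz // 2             # 2.385MHz
memory_cycles_per_line = cpu_cycles_per_line // 2  # 106

# Each nibble-mode cycle delivers four 12-bit bytes.
BYTE_BITS = 12
bits_per_cycle = 4 * BYTE_BITS                     # 48
VISIBLE_CYCLES = 80                                # about 75% of the line
pixels_per_cycle = 640 // VISIBLE_CYCLES           # 8
bits_per_pixel = bits_per_cycle // pixels_per_cycle
colours = 2 ** bits_per_pixel

# Moving an 8x16 cell by rewriting its two-byte text map entry
# versus copying the cell's pixel data.
cell_bytes = 8 * 16 * bits_per_pixel // BYTE_BITS
tile_speedup = cell_bytes // 2

# Memory budget in k.
ram_k = {"main": 256, "graphics": 256, "text": 64, "sound": 64}
total_ram_k = sum(ram_k.values())

print(f"{system_clock_hz / 1e6:.2f}MHz, {bits_per_pixel} bits per pixel, "
      f"{colours} colours, {tile_speedup}x tile moves, {total_ram_k}k RAM")
```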


Disclaimer: I still prefer Crystal though.

Posted by: Pixy Misa at 10:58 PM | No Comments
Post contains 2057 words, total size 14 kb.

Geek

Daily News Stuff 15 September 2020

Gone Rogue Edition

Tech News



Disclaimer: And stay out.

Posted by: Pixy Misa at 03:41 AM | Comments (6)
Post contains 309 words, total size 4 kb.

Tuesday, September 15

Geek

Daily News Stuff 14 September 2020

Oh Yes Edition

Tech News

  • How to write a Basic compiler in Python, Part 3: Code generation.

    I knew I'd blogged about something I could use.  Now, this does compile to C, but it compiles to really dumb C, s=s+c; b=b+1; level C.  Basically it's used as a portable assembler.  That's fine.

    And it's self-contained, not using any lexer or parser libraries, so it's a good starting point for something that will eventually be translated into itself.


  • Writing A Compiler In Go might also be a useful source.

    It takes a similar approach - no tools or libraries used - to produce a complete compiler for a simple programming language.  Of course I'm no Go programmer, but I can read Go code.  Mostly.  Had to for work.  Don't ask.


  • Nvidia is buying Arm for $40 billion.  (AnandTech)

    This has made a lot of people very angry and been widely regarded as a bad move.


  • Microsoft announced a major win today.  (Thurrott.com)

    Not buying TikTok is probably the company's smartest move since they didn't buy Yahoo.


  • I always thought that the claims coming from Nikola seemed a bit overblown.  (WCCFTech)

    Of course that also seemed true of Tesla and SpaceX, and yet they delivered the goods.


  • Your government at work.




  • Adjusted the hardware design of the Imagine just a little.  Basically, the idea is that the Imagine's CPU is a microcontroller with multiple register banks for fast interrupt servicing - like the Z80 and 8051 - and the DSP is a variant of that, with eight banks of the user registers but only one bank of system registers.

    The DSP also has a nominal 256 bytes of on-chip mask-programmed ROM containing a set of wavetable synthesis algorithms.  These changes achieve two things: It makes it really a wavetable synthesis chip, albeit one developers can tinker with; and it drastically reduces activity on the system bus.  The initial version of the design would have used around 70% of the bus for 5 stereo voices; this version is a little under 10%.


  • I might steal a trick from the HP 150 and have text mode on the Imagine double the pixel clock.  It has the bandwidth to do this; text mode basically wastes one byte every time it reads the character data, so it makes no difference if it reads and uses two bytes.  That would allow it to output readable 80-column text on colour screens (960x270 with the new subpixels), and beautiful 80-column text on monochrome screens (now 1280x360).  Need memory for those fonts though.


  • Working on an emulator generator.  Given the processor definitions, it spits out Nim code for the CPU-specific emulator class, an assembler, a disassembler, and a simple machine-language monitor like the old MS-DOS DEBUG.

    The next step would be to have this spit out a code generator back-end for the compiler as well.  That is probably possible.  Certainly possible for the 10, 11, and 12 bit models, which all ended up with the same underlying Super 6809 design just with progressively more and larger registers.
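
To give a feel for the emulator generator idea, here is a toy Python sketch in which one processor definition table drives both an assembler and a disassembler.  Everything in it is hypothetical: the three opcodes are invented, and it interprets the table directly rather than emitting Nim source the way the real generator is meant to.

```python
# Toy illustration of "one definition, several tools".
# The instruction set below is invented purely for the example.

# mnemonic -> (opcode, number of operand bytes)
DEFINITION = {
    "NOP": (0x00, 0),
    "LDA": (0x10, 1),
    "JMP": (0x20, 2),
}

# Reverse index built from the same table, used by the disassembler.
BY_OPCODE = {opcode: (name, count) for name, (opcode, count) in DEFINITION.items()}


def assemble(lines):
    """Turn lines like 'JMP 00 10' (hex operands) into a list of byte values."""
    code = []
    for line in lines:
        name, *args = line.split()
        opcode, count = DEFINITION[name]
        operands = [int(arg, 16) for arg in args]
        if len(operands) != count:
            raise ValueError(f"{name} takes {count} operand byte(s)")
        code.append(opcode)
        code.extend(operands)
    return code


def disassemble(code):
    """Turn a list of byte values back into source lines."""
    lines = []
    position = 0
    while position < len(code):
        name, count = BY_OPCODE[code[position]]
        operands = code[position + 1:position + 1 + count]
        lines.append(" ".join([name] + [f"{value:02X}" for value in operands]))
        position += 1 + count
    return lines


program = ["LDA 2A", "JMP 00 10", "NOP"]
print(disassemble(assemble(program)))
```

The point is only that the assembler and disassembler never mention a specific instruction; adding an opcode means adding one line to the table.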


Disclaimer: For small values of "work".

Posted by: Pixy Misa at 12:35 AM | No Comments
Post contains 519 words, total size 4 kb.

Monday, September 14

Geek

Daily News Stuff 13 September 2020

Pieberry Edition

Tech News

  • Raspberry Pi overkill for your needs?  The Iconikal SBC is $7.99 on Amazon.  (Tom's Hardware)

    That includes - or included; it's currently sold out for some reason - a Rockchip RK3328 with four 1.5GHz A53 cores, 1GB RAM, a free 16GB microSD that you probably shouldn't trust, a power supply, and a 16x2 character LCD screen.


  • The next SpaceX Starship prototype will have the final nosecone and control surfaces and attempt a 60,000 foot flight.  (Tech Crunch)

    Per explodia ad astra.


  • Trying out ASRock's DeskMini X300 with Ryzen 4000G.  (WCCFTech)

    The first time I saw a picture of this thing I thought to myself, what a crappy, cheap case.  Turns out it was more of a crappy, cheap photograph; it's brushed aluminium that they somehow made look like poorly-molded plastic.

    It's a little larger than a NUC - it's a standard mini-STX form-factor measuring 6" x 6" x 3" where the typical NUC is 4" x 4" x 2" - but on the other hand it can fit APUs up to 65W, two M.2 NVMe drives, and two 2.5" SATA drives.

    As expected, with a Ryzen 4000 APU it goes vroom.  Not recommended for overclocking though.


  • Windows 10 is getting a new Start Menu.  (Bleeping Computer)

    I only ever see the default start menu for as long as it takes me to install Start10, so I really couldn't comment, apart from noting that it is certain to suck.


  • I should make the MUL, DIV, and MAC instructions on the Imagine two bytes, shouldn't I?  It won't affect the performance at all, except if you're multiplying by a 10-bit immediate value, and it frees up two whole pages of opcodes.


Retrocomputing Video of the Day



This looks a lot like the HP 200 Model 16, a tiny 68000-based workstation.  It's not, though; it's the HP 150, a tiny 8088-based touchscreen PC.

He's going to do a follow-up video on the disk drive unit, which is the same one as the Model 16.  I'll be interested to see that, since I've heard but can't confirm that these drives ran at 600 RPM, twice as fast as any normal 3.5" drive.

He notes an interesting point on the text mode used on this and other old HP devices.  The 150 has an 80x27 text display made up of 9x14 pixel characters.  But it then says fuck it, I'll do what I want and shifts individual pixels by a half pixel width or widens them by one third as needed to make individual characters more legible.

That's some trick.  Makes me want to reproduce it in my emulator, though you'd need a 4320-pixel-wide display to do that exactly.
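The 4320 figure follows from the numbers above, assuming the display has to represent both half-pixel shifts and one-third-pixel widening on a single uniform grid, so that each original pixel is split into lcm(2, 3) = 6 sub-pixels.  A quick check in Python (variable names are just for illustration):

```python
from math import lcm

COLUMNS = 80               # characters per line on the HP 150
CELL_WIDTH = 9             # pixels per character cell
SUBDIVISIONS = lcm(2, 3)   # finest grid covering halves and thirds of a pixel

native_width = COLUMNS * CELL_WIDTH          # width in ordinary pixels
exact_width = native_width * SUBDIVISIONS    # width on the sub-pixel grid
print(native_width, exact_width)
```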


Disclaimer: It's basically a bunch of dinosaurs in an industrial blender.

Posted by: Pixy Misa at 12:23 AM | Comments (5)

Sunday, September 13

Geek

Imaginary Code

Here's the full programming model of the Imagine CPU and DSP.  I'll write a proper manual as I go; this is mostly to give an idea of what the processor will look like, and as a document for myself to make sure I haven't either (a) missed anything critical or worse (b) run out of opcodes.

Let's see if I can get a bunch of preformatted text in place without the Minx editor reducing it to mush...  Yes, on my third try.

Couple of things that fell out of this proper run-through of the full programming model:
  • We now have eight stack pointers!  The bits were there to be used, so why the hell not.  All four index registers, the two reserved stack pointers, the loop counter, and even the program counter can be used as stack pointers.

  • Why the hell would you use the program counter as a stack pointer?  Well, for pushing data it would be very weird.  But for popping data, you can read up to ten registers at once from immediate data:

    POP P, WXYZLT will read the four main index registers and two alternate index registers with a single two-byte instruction; normally that would take six separate instructions.  Great if you're setting up for some complex graphics algorithm.

    And POP P, WXYZSPLURT will set up all the index registers and stack pointers, and branch to a new place in the code.  If you don't trash the stack you might be able to turn it into a subroutine call.

    PUSH P on the other hand will push the specified registers into the current program, in reverse order, and then execute them.  I can't imagine why you would want to do that, but you can.

  • Similarly, I borrowed LEA - Load Effective Address - from the 6809.  But you can also LEAP - that is, LEA to P, the program counter.  Since LEA supports base+offset addressing and indirect addressing simultaneously, that provides us with both fixed and relative jump tables, without needing an instruction specific for that.

    In fact, you can even LEAP into an interpolated jump table.  There's nothing stopping you.

  • The general word-size bit fell by the wayside.  There's just not enough opcodes on a 10-bit design.  Instead I reserved a code page (32 opcodes) for 20-bit instructions, which gives us 15 available bits for that mode.

    I haven't really started on the details of 20-bit mode.  Having 32 times the opcode space is liberating to the point of paralysis.  Maybe the early Imagine models didn't implement 20-bit mode.

  • One group of registers spells SPLURT.  Another spells out QOFI.  The remainder are ABCD and WXYZ which don't really spell out anything.

  • The .B and .W markers are almost always optional; the assembler can distinguish the mode either from the registers used or from the size of the immediate data.  For example, BRA $F0 means BRA.B and BRA $000F0 means BRA.W.  Of course the two instructions do essentially the same thing.

  • I've done a draft of the Dream programming model as well; it's very similar albeit with another eight registers and with some rough corners smoothed off.  The Imagine needs some specialised opcodes for handling registers outside the two main sets (ABCD and WXYZ).  The Dream has enough bits in its register selection to handle that in an orderly manner.

    This is a mixed blessing.  Want to add accumulator A to the flags register F?  The Imagine very sensibly has no opcode for such a dumb instruction, but the Dream is happy to oblige.
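The opcode arithmetic behind the 20-bit mode in the list above can be checked with a short Python sketch.  It assumes, as the text implies, that a 10-bit word gives 1024 opcodes split into pages of 32, and that a 20-bit instruction is one opcode from the reserved page followed by a second 10-bit word.

```python
WORD_BITS = 10
PAGE_SIZE = 32                      # opcodes per code page (from the text)

base_opcodes = 2 ** WORD_BITS       # single-word opcodes
pages = base_opcodes // PAGE_SIZE   # number of code pages

# The reserved page contributes log2(32) bits; the second word adds 10 more.
page_bits = PAGE_SIZE.bit_length() - 1
extended_bits = page_bits + WORD_BITS
extended_opcodes = 2 ** extended_bits
ratio = extended_opcodes // base_opcodes
print(pages, extended_bits, ratio)
```

Under those assumptions the reserved page yields the 15 available bits mentioned above, and 2^15 is 32 times the 1024 single-word opcodes.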


Update 2020-09-14
  • In the indexing postbyte:
    • A value of 7 now codes for no base, allowing for indirect mode
    • In the offset, 12 through 15 code for WH through ZH for LEAF mode.
    • The loop counter L is no longer available as a base, only as an offset.
    • Timer / alternate index register T is not available at all.

  • The A100 and A101 segmented microprocessors now have two register banks, including all segment base and size registers.
  • The A102 and A103 (non-segmented) microcontrollers now have four register banks, but of course no segment registers.
  • The A108 DSP and A109 ASP now have eight banks of general-purpose registers - accumulators ABCD and WXYZ - but only one bank of the remaining registers.  (Also no segment registers.)


Posted by: Pixy Misa at 04:49 PM

Geek

Imaginary Sounds

Previously I worked out a potential DSP loop for wave table synthesis for the Imagine.  It looked something like this:

Instruction     Cycles Comment
BANK 1          1      'Use embedded memory bank 1
LD W, R0        1      'Load the base register for the voice
LD X, R2        1      'Load the current offset
ADD X, R4       1      'Add the step
CMP XH, R7      1      'Compare the high byte with the sample size
IFGE 1          1      'Next instruction only executes if the offset is past the end of the sample
CLR X           1      'Set the offset back to the start
ST X, R2        1      'And save it
ADD W, XH       1      'Add the high byte of the current offset to the address
LD B, (W)       2      'Load the sample into the low byte of AB
MAC AL, B, R8   1+N    'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, R9   1+N    'And the same for the right

That was before working out the full instruction encodings.  Earlier I added in a switchable bank of 10 memory-mapped registers to reduce the number of memory accesses required, but that is between tricky and impossible to map into the opcode space - and there's actually a better way to do it, that we can steal from the Z80.

When working through the CPU programming model I also added in a LEAF instruction - Load Effective Address, Fractional - explicitly for table interpolation.  With this new model our algorithm turns out somewhat different:

Instruction     Cycles  Comment
GRP 1           1       'Switch to register group 1
LEAF X, X+Y     2       'Calculate an effective address in interpolated mode
CMP X, Z        1       'Check if we've reached the end of the sample
IFGE 1          1       'Next instruction only executes if the offset is past the end of the sample
  LD X, W       1       'Reset to the first byte of the sample
LD B, (X)       2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

Oops, that's wrong.  Sorry about your burned-out speakers.  Let's try again.

Instruction     Cycles  Comment
BANK 1          1       'Switch to register group 1
ADD X, Y        1       'Increment the offset by the step size
AND X, Z        1       'Restrict the offset to the sample size
LD B, (W+XH)    2       'Load the sample
MAC AL, B, C    1+N     'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N     'And the same for the right

The trick we've stolen from the Z80 is just to have multiple sets of registers.  The Z80 had two register sets in 1976, we need five or six in 1983.  That doesn't seem too implausible.

The LEAF instruction replaces the complicated high/low byte address fiddling.  It only saves three cycles but it's a lot easier to read.  If you're reading through some Imagine assembler and you see LEAF, you immediately know it's doing some kind of table interpolation.  (And if you see LEAP, that's a jump table.  It's actually LEA P, but the assembler will accept LEAP.)

Update 2020-09-14

We no longer need to use the LEAF instruction explicitly; if you specify (W+XH) as the index mode in any instruction that can take an indexing postbyte, it calculates the address in interpolated mode.

This version of the code removes the CMP / IF / CLR logic and goes with the method used in the Ensoniq 5503 as found in the Apple IIgs.  Sample banks are a fixed power-of-two length - 64, 128, 256, 512, or 1024 bytes - so we can simply AND the offset with a bit mask to clip it to the appropriate range.
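Here is a minimal Python sketch of the masking approach, assuming the offset is a fixed-point value whose high part indexes the sample (the role XH plays above) and whose low part holds the fractional step.  The 10-bit fraction width and the function names are assumptions for illustration.

```python
FRAC_BITS = 10  # assumed width of the fractional (low) part of the offset


def make_mask(length):
    """Mask that keeps the offset inside a power-of-two sample bank."""
    if length <= 0 or length & (length - 1):
        raise ValueError("sample length must be a power of two")
    return (length << FRAC_BITS) - 1


def advance(offset, step, mask):
    """ADD X, Y followed by AND X, Z: step the offset and wrap it."""
    return (offset + step) & mask


def sample_index(offset):
    """High part of the offset, used to index the sample data."""
    return offset >> FRAC_BITS


mask = make_mask(64)
offset = 63 << FRAC_BITS            # last byte of a 64-byte sample
step = 3 << (FRAC_BITS - 1)         # 1.5 sample bytes per output sample
offset = advance(offset, step, mask)
print(sample_index(offset), offset & ((1 << FRAC_BITS) - 1))
```

In this sketch the mask keeps the fractional overshoot when the offset wraps, whereas a compare-and-clear would throw it away.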

One other thing: When I was first thinking of the custom chips for the Imagine I called this a fixed-function DSP, even though it was fully-programmable.  It just seemed like a good name.  Then I realised exactly why this version of the chip would be described as fixed-function.

It has a small - maybe 256 bytes - mask-programmed ROM containing a bunch of DSP algorithms, like the code above.  You can still write your own custom algorithms, but if you're running code out of the on-board ROM, and you have your settings loaded into the registers, we only need to access main memory three times per sample: Two reads to issue the subroutine call, and one when the subroutine loads the sample data.

That means that for a sample rate of 18.75kHz - our nominal HSYNC rate and a useful audio sample rate - we'd make 187,500 accesses to main memory per second, 6.25% of available cycles.
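Those figures can be cross-checked with a little arithmetic.  The Python sketch below only rearranges the numbers quoted above; the accesses per sample period and the bus rate it prints are derived values implied by those numbers, not figures stated in the post.

```python
SAMPLE_RATE_HZ = 18_750          # nominal HSYNC rate, used as the sample rate
ACCESSES_PER_SECOND = 187_500    # main-memory accesses quoted above
BUS_SHARE = 0.0625               # 6.25% of available cycles

# Derived values (implied by the quoted figures, not stated in the text).
accesses_per_sample_period = ACCESSES_PER_SECOND // SAMPLE_RATE_HZ
implied_bus_cycles_per_second = ACCESSES_PER_SECOND / BUS_SHARE
print(accesses_per_sample_period, implied_bus_cycles_per_second)
```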

And now we have a wavetable chip that makes sense.  The only remaining question is, does it have six entire register banks, or do we go back to the 64-byte onboard RAM?

With the latter, the code could look like this (instructions in bold access main memory).

Instruction     Cycles   Comment
LD U, $00000    2        'Point the user stack into on-chip RAM
JSR $41E        2        'This is the nominal location in mask ROM
POP U, WXYZ     5        'Load all the sample bank settings
POP U, CD       3        'Load the volume settings
ADD X, Y        1        'Increment the offset by the step size
AND X, Z        1        'Restrict the offset to the sample size
ST X, (U-8)     3        'Write the offset back to RAM
LD B, (W+XH)    2        'Load the sample
MAC AL, B, C    1+N      'Multiply the sample by the volume and add to the left accumulator
MAC AR, B, D    1+N      'And the same for the right
RET             2        'Return and process the next request in the queue

That's a lot more cycles than before, but now we don't need to have half a dozen complete register banks, and if we want just one more voice the ROM code spills neatly to main memory without any changes, which is good because you can't change it.

On the downside, we're now at 23+2N cycles, which is right back where we started.  Having multiple register banks is magical for this application.
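Tallying the cycle columns makes the comparison concrete.  This Python sketch copies the cycle counts from the register-bank listing and the on-chip RAM listing above, leaving N symbolic as the listings do; the variable and function names are just for illustration.

```python
# Each entry is (fixed_cycles, n_terms); a "1+N" instruction is (1, 1).
BANKED = [(1, 0), (1, 0), (1, 0), (2, 0), (1, 1), (1, 1)]
ONCHIP_RAM = [(2, 0), (2, 0), (5, 0), (3, 0), (1, 0), (1, 0),
              (3, 0), (2, 0), (1, 1), (1, 1), (2, 0)]


def total(listing):
    """Return (fixed cycles, multiples of N) for a whole listing."""
    fixed = sum(cycles for cycles, _ in listing)
    n_terms = sum(n for _, n in listing)
    return fixed, n_terms


def cycles(listing, n):
    """Total cycles for a given value of N."""
    fixed, n_terms = total(listing)
    return fixed + n_terms * n


print(total(BANKED), total(ONCHIP_RAM))
```

That gives 7+2N for the register-bank version against 23+2N for the on-chip RAM version, a difference of 16 cycles per voice whatever N turns out to be.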


Update 2020-09-14 Afternoon

Executive decision: The A100 and A101 segmented processors had two full register banks (all of ABCD, WXYZ, PLUS, and QOFI, only excluding R and T, plus the associated base and size registers for each segment).

The A102 and A103 microcontrollers lacked segmentation but had four full register banks.  This allowed for three different interrupt priorities each with zero-cycle latency.

The A108 DSP and A109 ASP had eight user register banks (just ABCD and WXYZ being switchable).

The A100, A102, and A108 are more expensive parts with Harvard architectures - dual busses separating instructions from data.

The Imagine 1000 uses the A103 and A109.  The subsequent Imagine 1100 uses the A119, an updated A109 with four full register banks and eight additional user register banks, for both its CPU and DSP.
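To illustrate why four register banks allow three interrupt priorities with zero-cycle latency on the A102 and A103, here is a toy Python model.  It assumes bank 0 is the normal level and banks 1 to 3 belong to the three priorities; the class, its method names, and the abbreviated register list are all hypothetical, not part of the design above.

```python
class BankedRegisters:
    """Toy model: one register set per priority level, selected by an index."""

    NAMES = ("A", "B", "C", "D", "W", "X", "Y", "Z")  # abbreviated register list

    def __init__(self, banks=4):
        self.banks = [dict.fromkeys(self.NAMES, 0) for _ in range(banks)]
        self.active = 0          # bank 0 is the normal (non-interrupt) level
        self._return_to = []     # banks to resume when handlers finish

    @property
    def regs(self):
        return self.banks[self.active]

    def enter_interrupt(self, priority):
        """Switch to the bank for this priority; nothing is copied or pushed."""
        if not 1 <= priority < len(self.banks):
            raise ValueError("no bank for that priority")
        if priority <= self.active:
            return False         # equal or lower priority: not taken now
        self._return_to.append(self.active)
        self.active = priority
        return True

    def leave_interrupt(self):
        """Resume the interrupted level; its registers are untouched."""
        self.active = self._return_to.pop()
```

Entering a handler only changes which bank is selected, so there is no register save or restore to pay for; that is the sense in which the latency is zero in this model.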

Posted by: Pixy Misa at 04:34 PM
