Saturday, 29 May 2021

Scrambling for time atm!

Stars are still WIP. I've been coding the algorithm to produce the tile data and also the data for optimal palette cycling. However, work has gotten very busy (plus the Giro d'Italia is currently running) so I haven't done a lot lately. Work will still be busy for another week or two, but then it should quieten down and I'll knock off the last piece of the puzzle.

I haven't put it aside and am keen to start on Galaxian next...

Wednesday, 19 May 2021

A star is... coded.

So I have a plan for the stars.

There are 124 stars, and using 16x16 tiles would require 103 tiles to contain them. Tiles have no more than 3 stars in them. Tiles will also have their own palette (16 colour entries).

There are 4 blink cycles. Rather than use 4 layers of sprites, I plan to use just one layer, with all stars rendered, and use palette cycling to turn stars on and off. At most, the states of 61 stars change in a blink cycle, though some of them may be on the same tile. So worst case, I need to update 61x3=183 words in the palette map. That should still be pretty insignificant as far as processing time for a frame.

In theory, I can replicate the exact star pattern, colours and behaviour - at least as implemented in MAME.

Just need to finish off writing the code to generate the tile and palette data, and some extra data to assist in optimising the updates.

UPDATE: After revisiting the star generator in MAME, it seems worst case is only 47 stars changing state, which is 47x3=141 words written. So a bit quicker again. Tile generation code next...
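To make the palette-cycling idea concrete, here's a minimal sketch of how a blink cycle could be applied, assuming the generator spits out a precomputed list of palette writes per cycle. The struct, the function name and the layout of the list are all hypothetical; the palette RAM pointer is passed in so the same code can be tried on a PC against a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* One palette write: which word in palette RAM to change, and what to store.
 * (Hypothetical layout; the real generator output may look quite different.) */
typedef struct {
    uint16_t offset;   /* word offset into palette RAM */
    uint16_t colour;   /* star colour to show it, or the backdrop colour to hide it */
} star_pal_update_t;

/* Apply the precomputed palette writes for one blink cycle. */
static void apply_blink_cycle(volatile uint16_t *palram,
                              const star_pal_update_t *updates,
                              size_t count)
{
    for (size_t i = 0; i < count; i++)
        palram[updates[i].offset] = updates[i].colour;
}
```

With a worst case of 47 stars at 3 words each, count never exceeds 141, so the loop stays small.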

Tuesday, 18 May 2021

Star gazing

Have to admit I had a few days off doing much on Scramble. I did a little with non-tate mode on the Neo Geo but not enough to have anything worth blogging about.

Curiously there is some sort of 'glitch' in the rendering but I don't even understand the nature of the glitch at this point so I can't explain it. It manifests itself as an occasional dark flicker in a bottom tile of the landscape, which only happens whilst the draw_landscape() function is executing. But the landscape is only ever drawn off the visible area to the right, and once scrolled into the screen, is not updated. Furthermore, if I remove the code that clears the columns before rendering, the flickering seems to disappear. So I'm more inclined to conclude at this stage that it's specifically related to the Neo Geo implementation rather than the race conditions I saw earlier.

Moving on from there, I've been looking at the implementation of the stars in MAME. I should be able to mimic the distribution and the colour set; the precise blinking pattern may be another matter. It does cycle through 4 patterns of different stars being enabled/disabled. I think a close approximation will suffice.

It'll be a matter of generating a set of 16x16 tiles with the star patterns and using palette cycling to control them, perhaps even to enable/disable them as well. Like the tilemap, the stars will require a set of chained sprites in a fixed location. And being at the back in the priority order, I'll have to shuffle the existing sprites up a bit.

That leaves tweaking sprite positions, fixing Neo Geo-related implementation issues (controllers and coin-up) and maybe looking at sound, before I revisit non-tate mode.

I'm still undecided about attempting an Amiga port just now. I did just see reference to a video setting up gcc for Amiga development. I'm no Amiga expert so not sure how difficult scrolling a portion of the display is... but maybe that's a good excuse to learn.

UPDATE: I've fixed sprite and bullet position relative to the tilemap. Still have to tweak tilemap position on the Neo Geo though, so will likely require more tweaking. However I think I've got a bug in the bullet/rocket collision detection routine; occasionally the bullets appear to pass straight through rockets.

UPDATE #2: Had a quick play this morning and can see there's definitely an issue with bullet collision-detection at least, and it's not limited to rockets. Occasionally the bullet will pass straight through objects. A quick look at the code doesn't give any hints.

UPDATE #3: Had a look at the star generation for Galaxian in MAME. Scramble is slightly simpler as it doesn't scroll, but mapping the hardware implementation to the software framework of MAME convolutes the algorithm somewhat. In short, an LFSR generates the star positions and colour index (from a palette of 64), whilst further combinatorial logic controls the 4 cycles of blinking. There are a total of (just) 124 stars on the screen, not all visible at the same time due to the blinking effect.

All 124 stars generated for Scramble

I have actually implemented the same LFSR in the code to generate the star tiles for the Neo Geo, but there are a few unsolved issues that remain. Sprites are limited to 16 colours, whilst stars have 64. There are a number of approaches to that issue, including (effectively) assigning each star its own palette. Then there's the blinking; do I use palette cycling, or have 4 layers of sprites? Food for thought...
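For anyone unfamiliar with LFSRs, here's a generic sketch of a 17-bit one in C. The taps (x^17 + x^3 + 1) are purely illustrative and chosen because they give a maximal-length sequence; they are not necessarily the taps used by the star generator, and the decoding of the state into star position, colour and blink group is left out entirely.

```c
#include <stdint.h>

#define LFSR_BITS 17u
#define LFSR_MASK ((1u << LFSR_BITS) - 1u)

/* Advance a 17-bit Fibonacci LFSR by one step.
 * Illustrative taps only (x^17 + x^3 + 1), NOT taken from the hardware. */
static uint32_t lfsr_step(uint32_t state)
{
    uint32_t newbit = (state ^ (state >> 3)) & 1u;   /* XOR of bits 0 and 3 */
    return ((state >> 1) | (newbit << (LFSR_BITS - 1u))) & LFSR_MASK;
}

/* Count how many steps it takes for the state to return to the seed. */
static uint32_t lfsr_period(uint32_t seed)
{
    uint32_t state = seed;
    uint32_t steps = 0;

    do {
        state = lfsr_step(state);
        steps++;
    } while (state != seed);

    return steps;
}
```

Any non-zero seed walks through all 2^17 - 1 non-zero states before repeating. A star generator built on this would test each state against some bit pattern to decide whether a star exists at that point of the sweep.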

Saturday, 15 May 2021

Running at full tilt!

Great result! By updating 96 entries in the (memory-mapped) palette map rather than the palette index in 1024 (effectively port-mapped) sprite registers, the entire ISR now runs for significantly less than 1/4 of a frame and the landscape drawing is finished well before the next VBLANK interrupt.

Palette update finished before the visible area of the frame is rendered

This gives plenty of head-room for the code to run and any potential race conditions should now be alleviated. The game should run at 100% now with no glitches.

One thing I did (also) change is move the scrolling to the start of the ISR (before palette change) to eliminate the screen tearing I noticed on MAME. I'll do the same for the sprites.

And the background colour; I was thinking I'd need another layer of sprites but of course you can control the background colour directly on the Neo Geo (as I have been doing to profile the ISR) so that was a one-line implementation!

Blue background. All that is missing are the stars

Next up: stars (which will need their own layer - just another 32 sprites and some tiles created for stars), maybe using the palette to blink them.

Finally, I need to tweak some Neo Geo-specific features, such as coining up and playing a game. At the moment I can only read the SELECT button running as an AES. Not sure how to handle coin-up on MVS just yet as it's handled by the BIOS depending on what mode the game is running in. If I get time today, I'd like to copy the ROM to my NeoSD and try it on my real AES console!

UPDATE: Scramble running on my Neo Geo AES!

Washed out colour due to the phone camera/flash

Profiling and palette manipulation?

I was thinking more about the glitches I saw and race conditions in the code which also reared their heads in the prototype PC port. There are two areas of the code which could, under certain conditions, cause issues. The one relevant here occurs when the mainline code is (still) drawing the landscape and gets interrupted by the next VBLANK. Note that this should never happen on the original machine.

So a few profiling experiments tonight to see how much time is spent doing what.

I thought of using the background colour to indicate which area of code was being executed. I was shocked to see how much time was spent in the ISR - not actually running Scramble code but updating the Neo Geo display hardware or, more specifically, the tilemap palette (there's 1024 tiles to update every frame).

The background is set to RED in the ISR during the video hardware update, YELLOW when the Scramble ISR code is running, and BLUE when the mainline code is updating the landscape (drawing the right-most 2 columns).

Everything enabled

As you can see, it spends about two-thirds of the time in the ISR, and more than half of that is just updating the video hardware (red). When the ISR queues a few commands to be executed by the mainline code, the landscape draw function (blue) extends to the right-hand edge of the screen. This is where you'd see glitches before I disabled interrupts around the VRAM accesses.
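The colour-coded profiling trick boils down to something like the sketch below. The helper names are made up, the backdrop register is passed in as a pointer rather than hard-coded, and the colour words are placeholders that assume 4 bits per channel in the low 12 bits.

```c
#include <stdint.h>

/* Placeholder colour words (assumed layout: 4 bits each of R, G, B). */
enum {
    PROFILE_RED    = 0x0F00,   /* video hardware update in the ISR */
    PROFILE_YELLOW = 0x0FF0,   /* Scramble ISR code */
    PROFILE_BLUE   = 0x000F    /* mainline landscape drawing */
};

/* Mark the start of a profiled section; returns the colour to restore later. */
static uint16_t profile_begin(volatile uint16_t *backdrop, uint16_t colour)
{
    uint16_t previous = *backdrop;
    *backdrop = colour;
    return previous;
}

/* Mark the end of a profiled section by restoring the earlier colour. */
static void profile_end(volatile uint16_t *backdrop, uint16_t previous)
{
    *backdrop = previous;
}
```

Because the beam is sweeping the screen while the code runs, the size of each coloured band is a direct readout of how much of the frame that section consumed.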

I halved the number of tilemap palette updates and saw a correspondingly shorter red area. And after eliminating all tilemap palette updates, the red isn't even visible which means the (Scramble) sprite updates are taking very little time at all.

Updating only half the tilemap palette

Although the blue area sometimes moves a little towards the right-hand edge, I never saw it actually get there. What this tells me is that there's sufficient time to execute all the Scramble code on the 68K; it's just the VRAM accesses that are causing issues. So I don't think there's much point trying to optimise the generated ASM code any further.

So what can I do from here? Ideally the landscape drawing (blue) should be finished before the next VBLANK interrupt, otherwise it's possible some object can be drawn in the wrong column - even with the interrupt protection. That's because the ISR changes some variables used by the landscape drawing function.

One option is to update the palette only when the palette actually changes, and only for those tiles that have changed, rather than each and every tile every frame. But that doesn't actually change the worst-case scenario, and will only reduce the problem, not eliminate it. And the extra code could push the landscape update further into the next VBLANK.

The landscape draw function is also surprisingly slow, though it requires quite a few VRAM accesses as well. I could take a copy of the variables updated in the ISR at the start of the function, which means it would only be critical that the function start before the next VBLANK...
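The snapshot idea would look roughly like this. The structure and its members are invented for illustration (the real variables live in the Scramble work RAM and have different names); the point is only that the draw function works from its own copy.

```c
#include <stdint.h>

/* Hypothetical subset of the state that the ISR modifies and the
 * landscape drawing reads. */
typedef struct {
    uint16_t scroll_x;
    uint8_t  draw_column;
} landscape_state_t;

/* Written by the ISR, read by the mainline code. */
static volatile landscape_state_t shared_state;

/* Take a private copy at the start of the draw function so a later
 * VBLANK can't change the values halfway through drawing. */
static landscape_state_t snapshot_landscape_state(void)
{
    landscape_state_t copy;

    copy.scroll_x    = shared_state.scroll_x;
    copy.draw_column = shared_state.draw_column;
    return copy;
}
```

The copy itself isn't atomic, so on the real hardware it would probably still need interrupts masked for the couple of instructions it takes.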

Something I need to ponder further... feel free to provide suggestions.

Actually I just had a thought - how about updating the palette entries instead of updating all the tilemap entries!?! This might have merit... stay tuned!

UPDATE: I'm yet to try it, but I think this will work! On Scramble there's a colour (palette index) for each column (32 tiles) - so 32 index entries. I can simply assign each of the columns to its own Neo Geo palette, and then update the three words for each column every frame - a total of 32x3=96 writes and no VRAM address register slowing it down even more!
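A rough sketch of what that per-column update might look like follows. The function name, the first_palette parameter and the colour table are assumptions for illustration; palette RAM is passed in as a pointer so it can be exercised against an ordinary array.

```c
#include <stdint.h>

#define NUM_COLUMNS        32   /* Scramble tilemap columns */
#define WORDS_PER_PALETTE  16   /* a Neo Geo palette holds 16 colour words */
#define COLOURS_PER_COLUMN 3    /* entries 1..3; entry 0 is transparent */

/* Copy the 3 colours selected by each column's palette index into that
 * column's dedicated Neo Geo palette: 32 x 3 = 96 plain memory writes. */
static void update_column_palettes(volatile uint16_t *palram,
                                   unsigned first_palette,
                                   const uint8_t column_colour[NUM_COLUMNS],
                                   const uint16_t colour_table[][COLOURS_PER_COLUMN])
{
    for (unsigned col = 0; col < NUM_COLUMNS; col++) {
        volatile uint16_t *pal = palram + (first_palette + col) * WORDS_PER_PALETTE;
        const uint16_t *src = colour_table[column_colour[col]];

        pal[1] = src[0];
        pal[2] = src[1];
        pal[3] = src[2];
    }
}
```

The tiles in each column keep the same palette number forever; only the contents of that palette change, which is why no sprite registers need touching.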

Friday, 14 May 2021

So it's down to looking at gcc code generation...

The good news is that I've sorted all the issues with race conditions. I learned how to generate extended inline ASM in gcc, refreshed my memory on Neo Geo interrupts, took a look at the ngdevkit crt0.S code, rediscovered how to generate assembler and symbol listings in gcc, and confirmed the findings in the MAME debugger.

In a nutshell, all I had to do is disable the VBLANK interrupt in the function that updates the tilemap, where it sets REG_VRAMADDR and then writes the tile number to REG_VRAMRW. This eliminates all the glitches where tiles would occasionally be written to the wrong location on the tilemap.

Note that this function is (also) called from the VBLANK interrupt, so I had to save/restore the 68K status register rather than simply re-enable interrupts.
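For the curious, the save/restore pattern looks something like the sketch below. This is not my exact code: the helper names are made up, the two VRAM registers are passed in as pointers, and this version masks all interrupt levels rather than just VBLANK. The non-68K branch is only a stand-in so the logic can be tried on a PC.

```c
#include <stdint.h>

#if defined(__m68k__)
/* Read the status register, then raise the interrupt mask to level 7. */
static inline uint16_t irq_save_and_disable(void)
{
    uint16_t sr;
    __asm__ volatile ("move.w %%sr,%0\n\t"
                      "ori.w #0x0700,%%sr"
                      : "=d" (sr) : : "cc", "memory");
    return sr;
}

/* Put the status register back exactly as it was. */
static inline void irq_restore(uint16_t sr)
{
    __asm__ volatile ("move.w %0,%%sr" : : "d" (sr) : "cc", "memory");
}
#else
/* Host stand-ins that model the status register with a plain variable. */
static uint16_t fake_sr = 0x2000;

static inline uint16_t irq_save_and_disable(void)
{
    uint16_t sr = fake_sr;
    fake_sr |= 0x0700;
    return sr;
}

static inline void irq_restore(uint16_t sr)
{
    fake_sr = sr;
}
#endif

/* Set the VRAM address and write one word without being interrupted
 * between the two register accesses. */
static void vram_write(volatile uint16_t *reg_vramaddr,
                       volatile uint16_t *reg_vramrw,
                       uint16_t addr, uint16_t data)
{
    uint16_t sr = irq_save_and_disable();

    *reg_vramaddr = addr;
    *reg_vramrw   = data;
    irq_restore(sr);
}
```

Restoring the saved value, rather than unconditionally re-enabling, is what makes it safe to call from inside the ISR: if interrupts were already masked on entry, they stay masked on exit.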

The only other critical function that accesses the Neo Geo VRAM is called from the ngdevkit VBLANK rom callback routine, after the interrupt has been acknowledged (so I don't have to do it after all) and in the context of an ISR (interrupts are already disabled). So nothing to do there.

I also did a preliminary optimisation of the code that updates the tilemap attribute (colour) and scroll registers, and all the sprite registers. On the original arcade hardware, this is done in the NMI and is a simple 128-byte LDIR. Not so simple on the Neo Geo, and quite slow with the REG_VRAMADDR mechanism.

So onto the bad news.

The NMI (VBLANK interrupt) code is taking too much CPU time and although it does finish before the next interrupt, the mainline code is starved of CPU. So you can coin up and play the game but tasks like updating the text and so-called head-up display, and drawing some of the landscape, aren't being done.

Taking a look at some of the code being generated was an eye-opener to say the least. There are snippets of code that are inexplicably inefficient and complex. It seems that any code involving a structure member variable is just ridiculously inefficient, and that's a static structure so there's no member offset calculations involved at all.

Check out this snippet that contrasts a simple member update...

1604:src/scramble/scramble.c **** wram.score_table_text_ptr = score_table_text;
4554 .loc 1 1604 0
4555 1d86 203C 0000 move.l #score_table_text,%d0
4555 0000
4556 1d8c 2200 move.l %d0,%d1
4557 1d8e 4241 clr.w %d1
4558 1d90 4841 swap %d1
4559 1d92 E049 lsr.w #8,%d1
4560 1d94 1439 0000 move.b wram+1113,%d2
4560 0000
4561 1d9a 0202 0000 and.b #0,%d2
4562 1d9e 8202 or.b %d2,%d1
4563 1da0 13C1 0000 move.b %d1,wram+1113
4563 0000
4564 1da6 2200 move.l %d0,%d1
4565 1da8 4241 clr.w %d1
4566 1daa 4841 swap %d1
4567 1dac 7400 moveq #0,%d2
4568 1dae 4602 not.b %d2
4569 1db0 C481 and.l %d1,%d2
4570 1db2 2F42 000C move.l %d2,12(%sp)
4571 1db6 1239 0000 move.b wram+1114,%d1
4571 0000
4572 1dbc 0201 0000 and.b #0,%d1
4573 1dc0 822F 000F or.b 15(%sp),%d1
4574 1dc4 13C1 0000 move.b %d1,wram+1114
4574 0000
4575 1dca 2200 move.l %d0,%d1
4576 1dcc E089 lsr.l #8,%d1
4577 1dce 7400 moveq #0,%d2
4578 1dd0 4602 not.b %d2
4579 1dd2 C481 and.l %d1,%d2
4580 1dd4 2F42 0008 move.l %d2,8(%sp)
4581 1dd8 1239 0000 move.b wram+1115,%d1
4581 0000
4582 1dde 0201 0000 and.b #0,%d1
4583 1de2 822F 000B or.b 11(%sp),%d1
4584 1de6 13C1 0000 move.b %d1,wram+1115
4584 0000
4585 1dec 7200 moveq #0,%d1
4586 1dee 4601 not.b %d1
4587 1df0 C280 and.l %d0,%d1
4588 1df2 2F41 0004 move.l %d1,4(%sp)
4589 1df6 1039 0000 move.b wram+1116,%d0
4589 0000
4590 1dfc 0200 0000 and.b #0,%d0
4591 1e00 802F 0007 or.b 7(%sp),%d0
4592 1e04 13C0 0000 move.b %d0,wram+1116
4592 0000

...with that of a discrete static variable:

1605:src/scramble/scramble.c **** bananas = score_table_text;
4593 .loc 1 1605 0
4594 1e0a 23FC 0000 move.l #score_table_text,bananas

That's quite a difference! There are other instances throughout the code, even for simple uint16_t variables. I'm sure it all adds up. So one option is to move all the variables out of the structure and see what sort of savings I get.

The not so bad news is that it seems to not be too far from the tipping point as-is. If I remove the code that updates the tilemap attributes for example, which comprises a mere 32 writes to VRAM, then the game appears to run just fine.

So a bit of experimentation in optimisation still to be done.

UPDATE: The structure issue could rather be an alignment issue... more to come...

UPDATE #2: Removing #pragma pack(1) from the wram structure produced code akin to the 2nd snippet above. It also decreased the code size from 0xB28C to 0x98F0. However it didn't seem to have any effect on the NMI execution time... hmm...
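My best guess at why packing hurts so much: with pack(1) a pointer member can land on an odd address, and the 68000 can't do word or long accesses at odd addresses, so gcc falls back to byte-at-a-time stores (note the move.b to wram+1113 through wram+1116 in the first listing). The two structures below are made-up stand-ins for the real wram structure, just to show the difference in member offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed: no padding, so the pointer sits at offset 1 (an odd address). */
#pragma pack(push, 1)
typedef struct {
    uint8_t     flag;
    const char *text_ptr;
} wram_packed_t;
#pragma pack(pop)

/* Natural layout: the compiler pads after flag so the pointer is aligned. */
typedef struct {
    uint8_t     flag;
    const char *text_ptr;
} wram_natural_t;

enum {
    PACKED_PTR_OFFSET  = offsetof(wram_packed_t, text_ptr),
    NATURAL_PTR_OFFSET = offsetof(wram_natural_t, text_ptr)
};
```

Even when a packed member happens to fall on an even address, the compiler has to assume the worst for the type as a whole, which would explain why the inefficiency showed up all over the code.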

UPDATE #3: Almost there! Some more optimisation of the tilemap update code together with eliminating most of the structure packing and the game is almost glitch-free. Occasionally on the ufo stage the landscape isn't drawn correctly (more on this later).

Thursday, 13 May 2021

About to bite the bullet...

It's time to book the fat lady who sings. After consulting my Donkey Kong code, I copied the "magic" algorithm for transforming the sprite Y coordinate and voilà - sprites appeared! Still perhaps not pixel-perfect due to the placement of the underlying tilemap, but close enough for now.

Score table with sprites

It runs slow, there are glitches, but most of the implementation is done. Bullets are missing but everything else is in place. It's actually playable now.

Scramble, on the Neo Geo in tate mode

The game really slowed down when I got the sprite update code working. Not sure why because there wasn't a lot I added but maybe just enough to tip the NMI execution time over the 16ms mark. It could also be because of the brute-force hack I did to minimise glitches (still) caused by a race condition with the Neo Geo VRAM registers. So there's still a bit of optimisation that can be done. I have no doubt I can ultimately get it running at 60fps.

The scrolling was trivial to implement on the Neo Geo. Initially I defined the tilemap as a single 32x32 tile (scaled to 50%) chained sprite, with a single origin. But today I divided it into 3 chained sprites, the middle sprite being the scrolling portion of the display. So scrolling the playfield now requires an update to just a single sprite Y coordinate register - too easy.

I'll need to go back and create a tile for the bullet sprite; on the arcade game the bullets are not rendered from ROM data like other sprites but rather a dedicated circuit on the PCB. Then I can add the code to assign another 4 Neo Geo sprites to the bullets and the functionality should then be complete.

From there it's a matter of handling the race conditions (I think I just need to disable interrupts briefly) and optimising the VRAM access code and that should be it. I might look into adding sound if I can find some samples as there seem to be resources around to get a sound driver up and running.

I'm also thinking of options and strategies for rotating the display. I already have rotated tiles and sprites in the CROM.

UPDATE: added bullets.