Thursday 5 January 2023

Starting on the Neo Geo optimisations

I started on the Neo Geo optimisations over the last few days, beginning with the low-hanging fruit.

I noticed that the foreground tilemap used 40 Neo Geo sprites for 36 visible rows; the first four (4) rows are not visible. So I unchained all the sprites and made the first 4 inactive, which means they no longer count towards the 96 sprites-per-scanline limit. Although those rows are only written when clearing the entire layer, retaining those sprites (even if inactive) means one fewer compare each time the layer is accessed.

I should be able to remove another handful of sprites from the foreground layer for rows that are never written - I just need to work out which rows they are. One option would be to deactivate all foreground layer sprites until a non-blank tile is written to that row... a few cycles of overhead is not critical on the foreground layer, as it is never written during VBLANK (IIRC).
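A minimal sketch of that lazy-activation idea, written in C for readability rather than the 68000 code the Neo Geo actually runs. It keeps a shadow copy of each foreground sprite's SCB3 word and only gives a sprite its height the first time a non-blank tile lands in a visible row. The SCB3 layout is an assumption taken from the Neo Geo hardware documentation (bits 15-7 Y position, bit 6 sticky/chain bit, bits 5-0 height in tiles, with a height of 0 meaning the sprite is not displayed); `fg_init`, `fg_write_tile`, `BLANK_TILE`, the 32-column row width and the sprite height used are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FG_ROWS        40     /* one Neo Geo sprite per foreground row */
#define FG_HIDDEN_ROWS 4      /* rows 0-3 are never visible */
#define FG_COLS        32     /* hypothetical row width in tiles */
#define BLANK_TILE     0x10   /* hypothetical blank tile code */

/* Assumed SCB3 layout: bits 15-7 Y position, bit 6 sticky, bits 5-0 height. */
#define SCB3_STICKY    0x0040u
#define SCB3_SIZE_MASK 0x003Fu

static uint16_t scb3[FG_ROWS];              /* shadow of the SCB3 words */
static uint8_t  fg_tiles[FG_ROWS][FG_COLS]; /* shadow of the tile codes */
static bool     fg_row_active[FG_ROWS];
static uint16_t fg_height;                  /* height restored on activation */

/* Every foreground sprite starts unchained (sticky clear) and inactive (height 0). */
static void fg_init(uint16_t y_field, uint16_t height)
{
    fg_height = (uint16_t)(height & SCB3_SIZE_MASK);
    for (int row = 0; row < FG_ROWS; row++) {
        scb3[row] = (uint16_t)(y_field & ~(SCB3_STICKY | SCB3_SIZE_MASK));
        fg_row_active[row] = false;
        for (int col = 0; col < FG_COLS; col++)
            fg_tiles[row][col] = BLANK_TILE;
    }
}

/* Store the tile; activate the row's sprite on the first non-blank write
   to a visible row. Hidden rows are written but never activated. */
static void fg_write_tile(int row, int col, uint8_t tile)
{
    assert(row >= 0 && row < FG_ROWS && col >= 0 && col < FG_COLS);
    fg_tiles[row][col] = tile;
    if (!fg_row_active[row] && tile != BLANK_TILE && row >= FG_HIDDEN_ROWS) {
        scb3[row] = (uint16_t)((scb3[row] & ~SCB3_SIZE_MASK) | fg_height);
        fg_row_active[row] = true;
    }
}
```

Once a row is active, the test short-circuits on the first flag check, so the steady-state cost is a single extra compare per write.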

Next I looked at the colorram (attribute) routines for the foreground and background layers, as they're quite lengthy. The sections of code that shuffle bits around between the arcade hardware and the Neo Geo hardware looked like they would benefit from a table look-up. So I set about coding those.

It is far preferable (IMHO) to generate the look-up tables on-the-fly in the game's platform initialisation routine than to have an external tool generate .DB statements for inclusion in the source. I would have thought this decision was a no-brainer, until I had a 'debate' a few years back with another author who did the exact opposite! In fact the author (who I won't shame) couldn't even wrap his head around the concept of generating the tables on-the-fly, claiming "it couldn't be done". Hmm...

The foreground colorram routine requires a 256-word look-up table to translate an 8-bit attribute byte to a 16-bit Neo Geo attribute word. According to a handy site I discovered tonight, the execution time went from 156 cycles to 32 cycles (~20% of the original). Not too bad at all!
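To illustrate generating such a table at start-up rather than from .DB statements, here is a sketch in C. The original bit-shuffling logic stays as an ordinary function, and the platform initialisation simply runs it over all 256 possible attribute bytes. The arcade attribute layout (bits 4-0 colour, bit 5 tile bank, bit 6 H flip, bit 7 V flip) and `FG_PAL_BASE` are hypothetical; the target word assumes the Neo Geo SCB1 attribute layout (bits 15-8 palette, bits 7-4 upper tile-number bits, bits 3-2 auto-animation, bit 1 V flip, bit 0 H flip).

```c
#include <stdint.h>

#define FG_PAL_BASE     0x10u   /* hypothetical first palette used by the foreground */

/* Assumed SCB1 attribute-word bits */
#define SCB1_HFLIP      0x0001u
#define SCB1_VFLIP      0x0002u
#define SCB1_TILE_BIT16 0x0010u /* tile-number bit 16 (attribute bit 4) */

/* The original, branchy bit-shuffle: hypothetical arcade byte -> Neo Geo word. */
static uint16_t shuffle_fg_attr(uint8_t attr)
{
    uint16_t word = (uint16_t)((FG_PAL_BASE + (attr & 0x1Fu)) << 8); /* colour -> palette */
    if (attr & 0x20u) word |= SCB1_TILE_BIT16;  /* tile bank */
    if (attr & 0x40u) word |= SCB1_HFLIP;
    if (attr & 0x80u) word |= SCB1_VFLIP;
    return word;
}

static uint16_t fg_attr_lut[256];   /* 256 words */

/* Called once from the platform initialisation routine. */
static void build_fg_attr_lut(void)
{
    for (unsigned i = 0; i < 256; i++)
        fg_attr_lut[i] = shuffle_fg_attr((uint8_t)i);
}

/* What the colorram handler does at run time: one indexed fetch. */
static uint16_t fg_attr(uint8_t attr)
{
    return fg_attr_lut[attr];
}
```

A side benefit is that the shuffle function remains the single source of truth: change the mapping and the table follows on the next boot, with no external tool to re-run.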

The background colorram routine requires a 512-word look-up table to translate a 9-bit attribute (combined from 2 bytes) to a 16-bit Neo Geo attribute word. The execution time in this case went from 180 cycles to 84 cycles (~47% of the original). Not quite as good, because I had to shift and combine the bytes, but nothing to sneeze at either. The background is only updated 32 tiles at a time every 16 VBLANKs (IIRC), and never during VBLANK, so it's not super-critical.
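The same pattern extends to the 9-bit background case. In this sketch the shift-and-combine of the two source bytes still runs on every access and only the shuffle is replaced by the table, which is consistent with the smaller saving. The 9-bit layout (bits 3-0 colour, bit 4 H flip, bit 5 V flip, bits 8-6 tile bank), `BG_PAL_BASE`, and the function names are hypothetical; the output again assumes the SCB1 attribute layout.

```c
#include <stdint.h>

#define BG_PAL_BASE 0x20u   /* hypothetical first palette used by the background */

/* Hypothetical 9-bit arcade attribute -> Neo Geo SCB1 attribute word. */
static uint16_t shuffle_bg_attr(uint16_t attr9)
{
    uint16_t word = (uint16_t)((BG_PAL_BASE + (attr9 & 0x0Fu)) << 8); /* palette, bits 15-8 */
    word |= (uint16_t)(((attr9 >> 6) & 0x07u) << 4);  /* bank -> upper tile bits 6-4 */
    if (attr9 & 0x10u) word |= 0x0001u;               /* H flip */
    if (attr9 & 0x20u) word |= 0x0002u;               /* V flip */
    return word;
}

static uint16_t bg_attr_lut[512];   /* 512 words */

/* Called once from the platform initialisation routine. */
static void build_bg_attr_lut(void)
{
    for (unsigned i = 0; i < 512; i++)
        bg_attr_lut[i] = shuffle_bg_attr((uint16_t)i);
}

/* Combine the low byte with bit 0 of the second byte to form the 9-bit index. */
static uint16_t bg_attr(uint8_t lo, uint8_t hi)
{
    unsigned index = (unsigned)lo | ((unsigned)(hi & 0x01u) << 8);
    return bg_attr_lut[index];
}
```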

Another area where I could use a look-up table is the mapping from video address to Neo Geo sprite and tile offset, which would apply to both videoram and colorram accesses for both the foreground and background layers. Those sections of code are around 200 cycles each atm, so it's probably worth implementing. The table would be sizeable (some $800 words), but on the Neo Geo that's not an issue.
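A rough sketch of such an address look-up, assuming a 32x32 layer ($400 cells) and an entry of two words (sprite number, word offset of the tile within that sprite's SCB1 block), which comes to the $800 words mentioned above under that assumption. It also assumes one sprite per arcade row and two SCB1 words per tile. `FIRST_SPRITE`, `VIDEORAM_BASE`, `COLORRAM_BASE` and the row/column mapping are hypothetical; the real, more expensive mapping would go in `map_offset_slow`.

```c
#include <assert.h>
#include <stdint.h>

#define LAYER_COLS          32
#define LAYER_ROWS          32
#define LAYER_CELLS         (LAYER_COLS * LAYER_ROWS) /* $400 cells */
#define FIRST_SPRITE        1        /* hypothetical first sprite of the layer */
#define SCB1_WORDS_PER_TILE 2        /* tile word + attribute word */
#define VIDEORAM_BASE       0x4800u  /* hypothetical arcade addresses */
#define COLORRAM_BASE       0x4C00u

typedef struct {
    uint16_t sprite;       /* Neo Geo sprite number */
    uint16_t tile_offset;  /* word offset of the tile within the sprite's SCB1 block */
} SpriteSlot;

static SpriteSlot addr_lut[LAYER_CELLS];   /* 2 words x $400 entries = $800 words */

/* Hypothetical slow mapping: one sprite per arcade row, one tile slot per column. */
static SpriteSlot map_offset_slow(uint16_t offset)
{
    uint16_t row = (uint16_t)(offset / LAYER_COLS);
    uint16_t col = (uint16_t)(offset % LAYER_COLS);
    SpriteSlot slot = { (uint16_t)(FIRST_SPRITE + row),
                        (uint16_t)(col * SCB1_WORDS_PER_TILE) };
    return slot;
}

/* Called once from the platform initialisation routine. */
static void build_addr_lut(void)
{
    for (uint16_t offset = 0; offset < LAYER_CELLS; offset++)
        addr_lut[offset] = map_offset_slow(offset);
}

/* Shared by the videoram and colorram handlers: only the base address differs. */
static SpriteSlot slot_for(uint16_t addr, uint16_t base)
{
    assert(addr >= base && addr - base < LAYER_CELLS);
    return addr_lut[addr - base];
}
```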

But the big gains will come in the scroll routine; right now it's particularly brain-dead. In my very early attempts to implement scrolling I actually had a much more efficient (albeit ultimately inadequate) routine. But I will be able to adapt that same implementation now that I have other aspects sorted.

One thing I would like to do is work out how much time is spent in the VBLANK ISR, and how much time the MAIN and SUB programs spend idling while waiting for the next VBLANK. On other platforms there are tricks such as changing the border colour on-the-fly to give a visual gauge... and I seem to recall I did something similar for Scramble on the Neo Geo?!? I'll have to look into it...

Oh and I did find one (more) bug in the transcode; achieving a new high(est) score doesn't copy the new score or clear the old name on the entry screen. That's four (4) that I know of thus far (I should write them down in my project notebook).
