Not only are there 3 levels of table/data look-up involved in rendering a single character, there are also more than a few instances of self-modifying code in the routine, and the character itself is rendered into a temporary buffer (in the zero page) before being copied to the screen. I suspect that's because of the above-mentioned sprite-masking required in some instances - which technically would have been possible directly within the rendering routine, but would've been a mammoth task to wrap one's head around!
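Here is a rough illustration of why rendering into a scratch buffer first can make masking easier. This minimal Python sketch shows only the second step: copying a pre-rendered block to screen memory through a mask. It is not the game's routine. It assumes a linear frame buffer with a fixed pitch, which the Apple II's interleaved hi-res memory is not. The `blit` function, its parameters and the mask convention (1 = keep the existing screen bit) are all hypothetical.

```python
# Hypothetical sketch, not the game's code: the character has already been
# composed into `buf`; this copies it to a linear "screen" through a mask.
# Assumes a linear frame buffer with a fixed pitch (bytes per scan-line).

def blit(screen, pitch, x_byte, y, buf, mask, width_bytes, height):
    """Copy a width_bytes x height block from buf to screen.

    Mask bit 1 = keep the existing screen pixel, 0 = take the pixel from buf.
    """
    for row in range(height):
        for col in range(width_bytes):
            i = row * width_bytes + col
            addr = (y + row) * pitch + x_byte + col
            screen[addr] = (screen[addr] & mask[i]) | (buf[i] & ~mask[i] & 0xFF)


screen = bytearray([0xAA] * 8)          # 2 scan-lines x 4 bytes, pitch 4
blit(screen, 4, 1, 0, bytes([0x0F, 0xF0]), bytes([0xF0, 0x0F]), 2, 1)
print(screen.hex())                     # aaaffaaaaaaaaaaa
```

Keeping the masking in a separate copy pass means the rendering code that builds `buf` never has to know about what is already on screen.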
I won't go into the gory details, but the character/tile graphic data is effectively 'tokenized', which requires a separate look-up table for every byte on every scan-line. And there's a set of these tables for each of the 7 possible pixel offsets. The end result is that there's a lot more look-up table data than actual graphic (pixel) data!
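The actual token format isn't described here, so the following is only a hypothetical sketch of the general idea. Each tile is stored once as a grid of tokens. A separate table for every (shift, scan-line, byte) position then maps a token to the final shifted screen byte. The `decode_tile` name, the nested-list table layout and the table contents are assumptions made for illustration.

```python
# Hypothetical sketch of table-driven tile decoding (not the game's real format).
# tokens[row][col] is a small token; tables[shift][row][col][token] is the
# screen byte that token produces at that shift, scan-line and byte position.

def decode_tile(tokens, tables, shift):
    """Expand a tokenized tile into raw screen bytes for one shift value."""
    out = []
    for row, row_tokens in enumerate(tokens):
        for col, tok in enumerate(row_tokens):
            # One extra indexed read per output byte: this is the slow part.
            out.append(tables[shift][row][col][tok])
    return bytes(out)


tokens = [[0, 1]]                        # one scan-line, two bytes
tables = [
    [[[0x00, 0x7F], [0x11, 0x22]]],      # shift 0: tables for byte 0 and byte 1
    [[[0x40, 0x3F], [0x08, 0x51]]],      # shift 1 (arbitrary toy values)
]
print(decode_tile(tokens, tables, 0).hex())  # 0022
print(decode_tile(tokens, tables, 1).hex())  # 4051
```

In a scheme shaped like this, the same token stream serves every shift. The tables, however, are multiplied by the number of shifts, which is one way table data can end up outweighing the pixel data itself.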
The reason for all this is that, in raw form, the pixel data for all 104 tiles would require at least 18,304 bytes - a good chunk of the Apple II's available RAM. With this clever encoding scheme, tile look-up is slow, but it requires only 4,992 bytes of RAM.
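For a sense of scale, the two figures above work out as follows. That is roughly 176 bytes per tile raw versus 48 bytes per tile encoded, averaged over the 104 tiles.

```latex
\frac{4{,}992}{18{,}304} = \frac{3}{11} \approx 27.3\%,
\qquad
18{,}304 - 4{,}992 = 13{,}312 \text{ bytes} = 13 \text{ KiB saved}
```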
Once I'd reverse-engineered the code, I modified my PC-based utility to render each and every tile for all possible shift values from the binary dump of the game.
The complete set of 104 tiles (shifted by 4 pixels)
Given the available tile-set, I now strongly suspect that the entire game is coded to render graphics purely via tiles, and that absolutely no independent bit-mapped graphic routines are used. That bodes very well for a Neo Geo port - or any other tile- and sprite-based system for that matter!
EDIT: Before this, I had already encountered one routine that doesn't use the tiles - the straight line across the bottom of the screen - DOH! Regardless, it would be relatively simple to emulate on a system as powerful as the Neo Geo.
The next task is to convert the graphics to a format more suitable for 8-bit video systems, then code the Z80 and 6809 routines. I'll need fewer copies of the data - only 4 shifts (or even as few as 2 on the TRS-80) as opposed to 7 - and every tile will require 22 bytes per shift (as opposed to some tiles requiring 33 bytes on the Apple II). The rendering routines will also be a lot simpler!
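The conversion step could be sketched in Python as below. It generates one pre-shifted copy of each tile per shift value, so the 8-bit rendering routine only has to copy bytes. The `preshift_tile` function is hypothetical. The pixel format is also an assumption, chosen so that each copy comes out at 22 bytes: 1 bit per pixel with the most significant bit leftmost, 11 scan-lines per tile, rows 10 pixels wide, and 4 shifts of 2 pixels each within a 2-byte row.

```python
# Hypothetical sketch: build pre-shifted tile copies for an 8-bit target.
# Assumed format (illustration only): 1 bit per pixel, MSB = leftmost pixel,
# each row `row_bits` wide, shifted right by `bits_per_step` per shift value
# inside a fixed field of `bytes_per_row` bytes.

def preshift_tile(rows, row_bits, num_shifts, bits_per_step, bytes_per_row):
    """Return num_shifts byte strings, one pre-shifted copy of the tile per shift."""
    field_bits = bytes_per_row * 8
    if row_bits + (num_shifts - 1) * bits_per_step > field_bits:
        raise ValueError("largest shift would push pixels out of the row field")
    copies = []
    for s in range(num_shifts):
        buf = bytearray()
        for row in rows:
            if row >> row_bits:
                raise ValueError("row is wider than row_bits")
            # Left-align the row in the field, then move it right by s steps.
            value = (row << (field_bits - row_bits)) >> (s * bits_per_step)
            buf += value.to_bytes(bytes_per_row, "big")
        copies.append(bytes(buf))
    return copies


solid = [0b1111111111] * 11              # an all-set 11-row, 10-pixel tile
copies = preshift_tile(solid, row_bits=10, num_shifts=4,
                       bits_per_step=2, bytes_per_row=2)
print([c[:2].hex() for c in copies])     # ['ffc0', '3ff0', '0ffc', '03ff']
print(104 * 4 * len(copies[0]))          # 9152
```

With 4 copies of 22 bytes each, the 104 tiles need 104 × 4 × 22 = 9,152 bytes. Pre-shifting trades memory for speed: the renderer never shifts pixels at run time, which suits CPUs like the Z80 and 6809 that shift only one bit per instruction.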
EDIT: 8-bit tile data (4 shifts) is done - 9,152 bytes.