Still haven't determined the main bottleneck but have made more progress.
After tweaking a few routines to eliminate unnecessary branches and some direct-page & register juggling I returned my focus to the sprite rendering routine. I was recently reminded that the original Z80 code used a separate execution path for byte-aligned sprites, so I decided to optimise that section.
After adding code to test for and then render specifically byte-aligned sprites, I then unrolled the loop for this case. To make matters worse (or better, depending on how you look at it), I had unnecessarily transferred a loop counter to & from DP memory and register B when I need only decrement the in-memory value. So worst case - the widest sprite is 5 bytes - it was taking 65 cycles per sprite line (the tallest sprite is 64 lines) just to test and loop. Contrast that with just 26 cycles set-up per sprite, and no per-line test and loop!
Empty screens are now too fast to control (turning accurately is problematic) and my troublesome 'moving block' screen, which started out at 5fps, is now 9-10fps - and I'm yet to optimise the non-byte-aligned (shifted) sprite rendering logic! Having said that, patching the code to render all sprites as byte-aligned did little to improve the frame rate on this particular screen, though it does improve on other screens.
The above-mentioned 'moving block' screen appears to be puzzlingly slow, considering there's only a pair of blocks to animate. Patching the code again to remove the blocks did increase the frame rate by around 2fps, but I can see little opportunity to speed up this particular sprite handler.
However during that process, I noticed that a dozen or so routines branched (near/relative) to another routine that simply jumped (far/absolute) to a third routine. Whilst this could conceivably have been done to reduce code size (slightly) it is rather detrimental to performance, so I of course remedied the situation by 'removing the middleman' so-to-speak.
I've not compared the performance with the original Z80 code since starting on the optimisations, but I would guess that it's starting to approach it now. It would be ideal if I could exceed it slightly, then tweak the frame rate smoothing logic to bring it back on par with the ZX Spectrum.
Watch this space!
No comments:
Post a Comment