Thursday 25 August 2016

Unrollin' unrollin' unrollin'

Still haven't determined the main bottleneck but have made more progress.

After tweaking a few routines to eliminate unnecessary branches and some direct-page & register juggling I returned my focus to the sprite rendering routine. I was recently reminded that the original Z80 code used a separate execution path for byte-aligned sprites, so I decided to optimise that section.

After adding code to test for and then render specifically byte-aligned sprites, I then unrolled the loop for this case. To make matters worse (or better, depending on how you look at it), I had unnecessarily transferred a loop counter to & from DP memory and register B when I need only decrement the in-memory value. So worst case - the widest sprite is 5 bytes - it was taking 65 cycles per sprite line (the tallest sprite is 64 lines) just to test and loop. Contrast that with just 26 cycles set-up per sprite, and no per-line test and loop!

Empty screens are now too fast to control (turning accurately is problematic) and my troublesome 'moving block' screen, which started out at 5fps, is now 9-10fps - and I'm yet to optimise the non-byte-aligned (shifted) sprite rendering logic! Having said that, patching the code to render all sprites as byte-aligned did little to improve the frame rate on this particular screen, though it does improve on other screens.

The above-mentioned 'moving block' screen appears to be puzzlingly slow, considering there's only a pair of blocks to animate. Patching the code again to remove the blocks did increase the frame rate by around 2fps, but I can see little opportunity to speed up this particular sprite handler.

However during that process, I noticed that a dozen or so routines branched (near/relative) to another routine that simply jumped (far/absolute) to a third routine. Whilst this could conceivably have been done to reduce code size (slightly) it is rather detrimental to performance, so I of course remedied the situation by 'removing the middleman' so-to-speak.

I've not compared the performance with the original Z80 code since starting on the optimisations, but I would guess that it's starting to approach it now. It would be ideal if I could exceed it slightly, then tweak the frame rate smoothing logic to bring it back on par with the ZX Spectrum.

Watch this space!

No comments:

Post a Comment