Friday 26 August 2016

Keep them doggies unrollin'

More incremental optimisations...

The most significant, in terms of effort at least, was unrolling the shifted (non-byte-aligned) sprite rendering routine, which took a while to get right. Previously it was also reading and writing each video byte twice; that's now remedied too. FWIW it makes less than 1fps difference on the 'moving block' screen.

It's worth noting that it requires 75 cycles to render a single (shifted) byte. That entails reading 4 bytes from data memory and performing 4 table lookups (across 2 different tables), followed by a read-modify-write of a single byte in video memory.
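To make that per-byte cost concrete, here is a rough C sketch of what one shifted byte might involve. It is not the actual 6809 routine: the table names (`shl_tab`, `shr_tab`), the split of the 4 reads into two mask bytes and two data bytes, and the "mask bit set means sprite pixel" convention are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup tables, one row per pixel shift (0..7). */
static uint8_t shr_tab[8][256]; /* bits that stay in this video byte     */
static uint8_t shl_tab[8][256]; /* bits spilled over from the byte before */

/* Build both tables once at start-up. */
void init_shift_tables(void)
{
    for (int s = 0; s < 8; s++) {
        for (int b = 0; b < 256; b++) {
            shr_tab[s][b] = (uint8_t)(b >> s);
            shl_tab[s][b] = (uint8_t)((b << (8 - s)) & 0xFF);
        }
    }
}

/*
 * Render one shifted byte.
 * mask[0]/data[0] are the sprite bytes to the left (assumed layout),
 * mask[1]/data[1] are the bytes that line up with this video byte.
 */
void render_shifted_byte(uint8_t *video, const uint8_t *mask,
                         const uint8_t *data, int shift)
{
    assert(shift >= 0 && shift < 8);

    /* 4 reads from sprite memory, 4 lookups across 2 tables. */
    uint8_t m = (uint8_t)(shl_tab[shift][mask[0]] | shr_tab[shift][mask[1]]);
    uint8_t d = (uint8_t)(shl_tab[shift][data[0]] | shr_tab[shift][data[1]]);

    /* Single read-modify-write of the video byte. */
    *video = (uint8_t)((*video & (uint8_t)~m) | (d & m));
}
```

Call `init_shift_tables()` once, then `render_shifted_byte()` for each byte across the sprite row. Unrolling amounts to writing that call out once per byte instead of looping.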

I also opted to duplicate the (small) routine that calculates the video buffer address from the X,Y position. It originally returned the result in U; however, the calling code always then transferred it to either X or Y, depending on whether it was used as a source or destination pointer. 16-bit register transfers are actually surprisingly expensive (6 cycles), and one case required two transfers to preserve U as well. I also optimised the calculation itself to save a few cycles.
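As a loose illustration of the address calculation and the duplication, here is a C sketch. The buffer geometry (linear, 32 bytes per row, 192 rows, origin top-left) and all names are assumptions rather than the real routine, and C cannot express the choice of 6809 register, so the two entry points only stand in for "return the result where the caller needs it".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 256x192 pixels, 1 bit per pixel, linear layout. */
enum { BYTES_PER_ROW = 32, ROWS = 192 };

/* Byte offset of pixel (x, y): y * 32 + x / 8, done with shifts. */
static uint16_t vidbuf_offset(uint8_t x, uint8_t y)
{
    assert(y < ROWS);
    return (uint16_t)(((uint16_t)y << 5) | (x >> 3));
}

/* Entry point for callers that want a source pointer. */
const uint8_t *vidbuf_src(const uint8_t *base, uint8_t x, uint8_t y)
{
    return base + vidbuf_offset(x, y);
}

/* Entry point for callers that want a destination pointer. */
uint8_t *vidbuf_dst(uint8_t *base, uint8_t x, uint8_t y)
{
    return base + vidbuf_offset(x, y);
}
```

At 6 cycles per transfer, handing the result back in the right register saves 6 cycles per call, or 12 in the case that needed two transfers.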

I really need to profile the code properly to identify the bottlenecks. Chipping away at the more obvious optimisations isn't having much of an effect on fps.

And just because I haven't posted any pictures for a while...

Showing the fps counter lower right (49fps)

There's also definitely some subtle graphics corruption, or rather, garbage. It appears to be limited to human->wulf transformations, and only when in certain orientations. I'll do more experimenting to nail down the exact conditions, then see if it can be reproduced on the ZX Spectrum...

EDIT: Another upside of unrolling the sprite rendering loops is that it should be easier to add support for CPC (4-colour) graphics!

4 comments:

  1. I'm now really interested to see how you will keep the Rawhide lyrics relevant to the posts. :)

  2. Clearly the difficulty of matching Rawhide lyrics was just too much.

    Remember that Spectrum versions of games run quicker when they stick to uncontended RAM and use the stack pointer for copying. In other words - keep your memory access high and wide. ;)

    Replies
    1. I'm trying hard not to waste a line in response to your comment... :P

      In seriousness, I'm looking at adding profiling to VCC to reveal the areas of Knight Lore that I need to focus on. Right now I'm trying to avoid having to install VS on my machine, and attempting to build it under MINGW/GCC. But getting the DLLs building is doing my head in... :(
