Tonight, at the suggestion of the author of the Atari 800 Pentagram port, I thought I'd look at the masking logic. It certainly seemed like a good candidate, given the amount of time spent in the rendering routines and the fact that the masking requires two table look-ups per byte rendered.
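For readers who haven't met sprite masking before, here is a minimal sketch in C (not the port's actual code) of why it costs per-byte work. It assumes, purely for illustration, that the two look-ups are a mask byte and a graphic byte each passed through a bit-reversal table, as one might do to draw a horizontally flipped sprite. The names `flip_table`, `init_flip_table` and `draw_masked_row_flipped` are hypothetical, and the real routine's tables may serve a different purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-up table: flip_table[b] is b with its bit order reversed. */
static uint8_t flip_table[256];

/* Fill the bit-reversal table once at start-up. */
void init_flip_table(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t r = 0;
        for (int b = 0; b < 8; b++) {
            if (i & (1 << b))
                r |= (uint8_t)(0x80 >> b);
        }
        flip_table[i] = r;
    }
}

/* Draw one row of a horizontally flipped, masked sprite.
 * Mask convention assumed here: 1 bits keep the background, 0 bits cut a
 * hole for the sprite. Source bytes are read right-to-left because the
 * whole row is mirrored, not just the bits within each byte. */
void draw_masked_row_flipped(uint8_t *screen, const uint8_t *mask,
                             const uint8_t *gfx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t m = flip_table[mask[n - 1 - i]]; /* table look-up 1 */
        uint8_t g = flip_table[gfx[n - 1 - i]];  /* table look-up 2 */
        screen[i] = (uint8_t)((screen[i] & m) | g);
    }
}
```

Skipping the mask amounts to dropping the `& m` step and one of the look-ups, which gives a feel for how much work per byte is at stake.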
Rather than try to optimise it, I thought I'd disable it altogether and see what effect that had on performance. I was surprised, and a little disappointed, to find that it had very little effect. Well, perhaps not completely insignificant at 4% in the 'busy' room, taking the frame rate from 8-9fps (see screenshot below) to a solid 9fps.
The infamous 'busy' screen, with fps counter
And because I now had a metric, I disabled the Z-ordering again. I clearly underestimated the effect last time, because the performance jumped to 13fps, and the time spent in the corresponding routine (calc_display_order_and_render) dropped from 25% all the way down to 2%.
I was now a little bummed that disabling masking and Z-order completely still resulted in 13fps, a few frames short of my target 15fps. Curious as to how this compared to the original, I fired up the ZX Spectrum and Coco3 emulators side-by-side and entered the 'busy' room on each.
To my surprise, the Coco3 (with no masking or Z-order) was significantly faster. So I re-enabled Z-order. It was still faster. So I re-enabled masking - back to the complete code - and it was still faster at 8-9fps!!! For this screen at least, I had been trying to optimise something that was already too fast!
So where does that leave the project? What I need to do now is visit a significant number of screens and compare them, side-by-side, with the ZX Spectrum original. If the Coco3 is running faster in every case, then my work here is done! In fact, I'll need to (re-enable and) tweak the throttling function so that the screens are all roughly the same speed.
EDIT: I think I'm going to back-port my fps ISR from the Coco3 to the ZX Spectrum!
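As a rough idea of what an fps counter and a throttle can look like when both hang off a periodic timer interrupt, here is a minimal sketch in C rather than the port's own code. It assumes a 60Hz tick, and the names `frame_timer`, `timer_tick`, `frame_rendered` and `throttle_ticks_remaining` are all hypothetical. In a real ISR setting the shared fields would also need to be `volatile` and read atomically.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 60u /* assumption: the ISR fires 60 times a second */

typedef struct {
    uint8_t  ticks;       /* ticks elapsed in the current one-second window */
    uint8_t  frames;      /* frames rendered so far in this window */
    uint8_t  fps;         /* frame count of the last completed window */
    uint16_t total_ticks; /* free-running tick count, used for throttling */
} frame_timer;

/* Called from the timer ISR on every tick. */
void timer_tick(frame_timer *t)
{
    t->total_ticks++;
    if (++t->ticks >= TICKS_PER_SECOND) {
        t->fps = t->frames; /* latch the count for display */
        t->frames = 0;
        t->ticks = 0;
    }
}

/* Called by the main loop each time a frame has been rendered. */
void frame_rendered(frame_timer *t)
{
    t->frames++;
}

/* Ticks still to wait so that a frame which began at frame_start lasts at
 * least min_ticks. The 16-bit unsigned subtraction copes with the counter
 * wrapping around. */
uint16_t throttle_ticks_remaining(const frame_timer *t, uint16_t frame_start,
                                  uint16_t min_ticks)
{
    uint16_t elapsed = (uint16_t)(t->total_ticks - frame_start);
    return elapsed >= min_ticks ? 0 : (uint16_t)(min_ticks - elapsed);
}
```

With a 60Hz tick, a minimum of 6 ticks per frame would cap the game at 10fps. The main loop would note `total_ticks` at the start of a frame, render, then idle until `throttle_ticks_remaining` returns 0.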
Then there's the matter of beefing up the Z80 R register emulation, fixing a graphical glitch that is simply a result of not being able to emulate the Spectrum's attribute bytes, and we're done!
Livin' high and wide, one might even say!