Saturday, 17 September 2016

My heart's calculatin'

Just a quick update since it's late...

My profiler appears to be working for the most part, although any delusions I had about writing a generic 6809 profiler are pretty much dashed. I'll go into more detail next post, but the nature of assembler makes it difficult - nay, impossible - to identify the context (subroutine) of the executing code without some comprehensive code analysis (smells like the halting problem to me).

Regardless, with some inside knowledge of Knight Lore, I've got a pretty good handle on what's taking most of the CPU time now.

Addr   Routine           Count   Cycles
----   ----------------  -----   ------
0xE97A calc_pixel_XY_     1626 15175159 ( 58%)
0xE19E calc_display_o      249  2924962 ( 11%)
0xE8AD blit_to_screen      501  1081830 (  4%)
0xD858 fill_window         521   845652 (  3%)
0xE98F print_sprite        129   820025 (  3%)
0xDB20 upd_16_to_21_2      235   561370 (  2%)
0xE02E set_draw_objs_      235   501364 (  2%)
0xE12B save_2d_info      10210   459450 (  2%)
0xC852 toggle_audio_h      461   411865 (  2%)
0xE610 get_ptr_object    12960   401760 (  2%)
0xE967 flip_sprite        2239   380273 (  1%)
0xE144 list_objects_t      249   337709 (  1%)
0xE799 update_screen         2   291910 (  1%)
0xE7BC render_dynamic      249   249443 (  1%)
0xE790 clear_scrn_buf        2   172068 (  1%)
0xD6CF print_sun_moon      248   145080 (  1%)
0xDA55 upd_2_4             498   131802 (  1%)
0xFEF7 _IRQ_               904    38264 (  0%)
0xFEF4 _FIRQ_               58     1334 (  0%)
----------------------        ---------
Total Cycles                   26085292

That top routine is actually calc_pixel_XY_and_render(), which does some trivial calcs and then calls into print_sprite, which is obviously where all the time is spent!
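As a rough illustration of why the call stack matters when a routine spends most of its time in a callee, here is a minimal sketch in Python of attributing cycles with a shadow call stack. It is not the profiler described in this post: the event format, the attribute_cycles name and the cycle figures are all made up for the example, and it assumes every call is matched by a return, which hand-written 6809 code is free not to do.

```python
from collections import defaultdict

def attribute_cycles(trace):
    """Attribute cycles to subroutines using a shadow call stack.

    trace is a list of hypothetical (op, arg) events:
      ("call", name)  - a subroutine was entered
      ("ret", None)   - the current subroutine returned
      ("exec", n)     - n cycles were executed in the current subroutine
    Returns (self_cycles, inclusive_cycles) as plain dicts.
    """
    self_cycles = defaultdict(int)       # cycles spent in the routine's own code
    inclusive_cycles = defaultdict(int)  # own code plus everything it called
    stack = ["<top>"]                    # placeholder for code outside any call

    for op, arg in trace:
        if op == "call":
            stack.append(arg)
        elif op == "ret":
            if len(stack) > 1:           # ignore an unmatched return
                stack.pop()
        elif op == "exec":
            self_cycles[stack[-1]] += arg
            # set() avoids double counting a routine that appears twice
            # on the stack (recursion).
            for name in set(stack):
                inclusive_cycles[name] += arg
    return dict(self_cycles), dict(inclusive_cycles)

# Made-up trace: a thin wrapper that calls an expensive routine.
trace = [
    ("call", "calc_pixel_XY_and_render"),
    ("exec", 20),
    ("call", "print_sprite"),
    ("exec", 500),
    ("ret", None),
    ("exec", 5),
    ("ret", None),
]

self_c, incl_c = attribute_cycles(trace)
for name in ("calc_pixel_XY_and_render", "print_sprite"):
    print(f"{name:26s} self={self_c[name]:4d} inclusive={incl_c[name]:4d}")
```

With this made-up trace the wrapper has a small self count but a large inclusive count, which is the kind of distinction that only falls out if the profiler knows which routine it is in at every instruction.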

More on this topic next post...


  1. In practice, glaring bottlenecks are rare; most of the time you just have to work little by little, eking out tiny optimisations.

    It can be difficult to understand the context of a function, but it may not always be necessary. So - don't try to understand 'em!

  2. I've been in communication with the author of the Atari 800 port of Pentagram, and he has identified one or two optimisations that have made significant improvements (albeit by modifying the way the rendering logic works), so it may well be that I'm lucky enough to get away with only a few more changes! But yes, in general, once you eliminate the glaring bottlenecks - usually early on in the process - the rest is a matter of scrimping and scraping cycles here and there.

    As for context (and I'm not sure if this comment was primarily made to sneak in some Rawhide lyrics): by "context" I mean the call stack, and for the purposes of profiling you definitely do need to know the context so that the cycles are attributed to the correct function. There'll be more explanation of this in the next post, when I get time to write it.

    1. Or when I work out how to work in another line from Rawhide... ;)

    2. I would always sneak a bit of Rawhide in, even though they're disapprovin'.