Just a quick update since it's late...
My profiler appears to be working for the most part, although any delusions I had about writing a generic 6809 profiler are pretty much dashed. I'll go into more details next post, but the nature of assembler makes it difficult - nay impossible - to identify the context (subroutine) of the executing code without some comprehensive code analysis (smells like a halting problem to me).
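To make the context problem concrete, here is a minimal sketch of the usual approach, not the profiler described in this post: keep a shadow call stack and charge each instruction's cycles to whichever routine is on top. The trace format, the function name `attribute_cycles` and the routine names `main` and `render` are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict

def attribute_cycles(trace, entry="main"):
    """Charge cycles to the routine on top of a shadow call stack.

    `trace` is a hypothetical list of (kind, name, cycles) tuples, where
    kind is "call" (e.g. JSR/BSR), "ret" (e.g. RTS) or "exec" (anything else).
    """
    stack = [entry]              # shadow call stack of routine names
    cycles = defaultdict(int)    # cycles charged to each routine
    calls = defaultdict(int)     # number of times each routine was called
    for kind, name, cost in trace:
        # The instruction itself is charged to the routine that executed it.
        cycles[stack[-1]] += cost
        if kind == "call":
            stack.append(name)
            calls[name] += 1
        elif kind == "ret" and len(stack) > 1:
            stack.pop()
    return dict(cycles), dict(calls)

# Hypothetical trace: "render" jumps (rather than calls) into another routine,
# so no "call" event is seen and the 100 cycles stay charged to "render".
trace = [
    ("exec", None, 5),
    ("call", "render", 8),
    ("exec", None, 10),
    ("exec", None, 100),   # work done after a plain jump into another routine
    ("ret", None, 5),
    ("exec", None, 3),
]
cycles, calls = attribute_cycles(trace)
print(cycles, calls)
```

The sketch only stays correct while every entry is a call and every exit is a matching return. On the 6809 a routine can also be entered with a plain JMP or branch, or left with `PULS ...,PC` or by adjusting S directly, and any of those will quietly charge cycles to the wrong routine unless the profiler is told about them.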
Regardless, with some inside knowledge of Knight Lore, I've got a pretty good handle on what's taking most of the CPU time now.
Addr    Routine           Count    Cycles
------  ----------------  -----  --------
0xE97A  calc_pixel_XY_     1626  15175159 ( 58%)
0xE19E  calc_display_o      249   2924962 ( 11%)
0xE8AD  blit_to_screen      501   1081830 (  4%)
0xD858  fill_window         521    845652 (  3%)
0xE98F  print_sprite        129    820025 (  3%)
0xDB20  upd_16_to_21_2      235    561370 (  2%)
0xE02E  set_draw_objs_      235    501364 (  2%)
0xE12B  save_2d_info      10210    459450 (  2%)
0xC852  toggle_audio_h      461    411865 (  2%)
0xE610  get_ptr_object    12960    401760 (  2%)
0xE967  flip_sprite        2239    380273 (  1%)
0xE144  list_objects_t      249    337709 (  1%)
0xE799  update_screen         2    291910 (  1%)
0xE7BC  render_dynamic      249    249443 (  1%)
0xE790  clear_scrn_buf        2    172068 (  1%)
0xD6CF  print_sun_moon      248    145080 (  1%)
0xDA55  upd_2_4             498    131802 (  1%)
(snip)
0xFEF7  _IRQ_               904     38264 (  0%)
(snip)
0xFEF4  _FIRQ_               58      1334 (  0%)
(snip)
----------------------           --------
Total Cycles                     26085292
That top routine is actually calc_pixel_XY_and_render(), which does some trivial calcs and then calls into print_sprite, which is obviously where all the time is spent!
More on this topic next post...
In practice, glaring bottlenecks are rare; most of the time you just have to work little by little, eking out tiny optimisations.
It can be difficult to understand the context of a function, but it may not always be necessary. So - don't try to understand 'em!
I've been in communication with the author of the Atari 800 port of Pentagram, and he has identified one or two optimisations that have made significant improvements (albeit by modifying the way the rendering logic works), so it may well be the case that I am lucky enough to get away with only a few more changes! But yes, in general, once you eliminate the glaring bottlenecks - usually early on in the process - the rest is a matter of scrimping and scraping cycles here and there.
As for context (and I'm not sure if this comment was primarily made to sneak in some Rawhide lyrics): by "context" I mean the call stack, and for the purposes of profiling you definitely do need to know the context so that the cycles are attributed to the correct function. There'll be more explanation of this in the next post when I get time to write it.
Or when I work out how to work in another line from Rawhide... ;)
I would always sneak a bit of Rawhide in, even though they're disapprovin'.