Tuesday 31 January 2023

Transcode Audit Session #1

Tonight I started on the "audit" of the transcode. What I mean by this is a line-by-line review of the transcode, checking for:

  • consistency between Z80 and 68K code
  • consistency between Z80 and 68K annotations
  • missing logic/code from 68K transcode
  • micro-optimisations of 68K code

I'm only a very short way into the MAIN CPU program, but I've already been reminded of a few short-cuts I took in the transcode. First, bulk initialisation (zeroing) of memory variables in a few places has not been transcoded; this is most likely to cause issues in subsequent games. Second, support for 2 players is completely untested, and possibly not completely implemented. For example, the code was not reading P2 inputs at all.

None of this is particularly problematic, but it will take some time to implement and test in the process of completing the audit. Hopefully at the end of it I'll end up with fewer bugs than I have now rather than more!

Whilst I'm completing the audit, I'll probably set up the Amiga build toolchain and WinUAE and whatever else I need to be able to build and play the Amiga version - jotd is making good progress!

UPDATE: Session #2 - reorganised all the memory variables to be consistent with the original. This is necessary in order for the 'block' initialisation of memory areas to match up. A few isolated 'hack' initialisations I had made have now been removed. I also added (more) support for 2 player games - still WIP. Also fixed an erroneous annotation - makes more sense now.
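To illustrate why the variable order matters: the original code zeroes whole ranges of memory in one pass, so the 68K variables have to sit in the same order for the same range to cover the same variables. A minimal Python sketch, with made-up variable names and offsets (none of these are the real Xevious addresses):

```python
# Hypothetical RAM layout: names and offsets are illustrative only, but the
# order mirrors the idea of matching the original Z80 memory map.
RAM_SIZE = 0x20
VAR_OFFSETS = {
    "score_p1": 0x00,  # 3 bytes (BCD)
    "score_p2": 0x03,  # 3 bytes (BCD)
    "lives_p1": 0x06,
    "lives_p2": 0x07,
    "area": 0x08,
    "hiscore": 0x10,   # deliberately outside the per-game block
}
PER_GAME_START = VAR_OFFSETS["score_p1"]
PER_GAME_LEN = VAR_OFFSETS["area"] + 1 - PER_GAME_START


def block_clear(ram: bytearray, start: int, length: int) -> None:
    """Zero a contiguous range in one go, like a single fill loop."""
    ram[start:start + length] = bytes(length)


ram = bytearray([0xFF]) * RAM_SIZE   # pretend every variable holds stale data
block_clear(ram, PER_GAME_START, PER_GAME_LEN)
```

If a per-game variable were moved outside the cleared range, it would silently keep its value from the previous game, which is exactly the sort of bug that only shows up on subsequent games.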

UPDATE: Session #5 complete. Around 21% of the way through the MAIN CPU program, but there's a lot of data near the end so maybe actually ~25% of the way through the code? Most missing code has been for 2-player support, which I suspect is complete now (won't know for sure until the audit is done). I've also found & fixed a few subtle bugs along the way.

UPDATE: Session #6 complete. Verified pseudo random number routine (properly) - produces the same sequence as the Z80 routine. More annotations from Z80 RE and micro-optimisations.

Finished the first device (xvi_1.3p) of four (4) - so at least 25% of the way through the MAIN ROM.
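One way to do this kind of verification is a differential test: step a model of the original 8-bit routine and the ported routine from the same seed and compare every output. The Xevious routine itself isn't reproduced here; the sketch below uses a 16-bit Galois LFSR (taps 0xB400) purely as a stand-in, written once with byte-wise shift/rotate-through-carry steps (as a Z80 would do it) and once with whole-word operations (as a 68K port might):

```python
def step_word(x: int) -> int:
    """Reference: one 16-bit Galois LFSR step (taps 0xB400) on a whole word."""
    lsb = x & 1
    x >>= 1
    if lsb:
        x ^= 0xB400
    return x


def step_bytewise(h: int, l: int) -> tuple[int, int]:
    """Same step written the 8-bit way: SRL H, then RR L via the carry."""
    carry = h & 1                   # SRL H: bit 0 of H falls into carry
    h >>= 1
    out = l & 1                     # RR L: bit 0 of L becomes the new carry...
    l = (l >> 1) | (carry << 7)     # ...and the old carry enters bit 7
    if out:
        h ^= 0xB4                   # the taps only touch the high byte
    return h, l


def sequences_match(seed: int, steps: int) -> bool:
    """Run both implementations side by side and compare every value."""
    x, (h, l) = seed, divmod(seed, 256)
    for _ in range(steps):
        x = step_word(x)
        h, l = step_bytewise(h, l)
        if x != (h << 8) | l:
            return False
    return True
```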

Saturday 28 January 2023

Neo Geo implementation complete (transcode bugs aside)!!!

Sprite optimisation is - to the best of my knowledge - complete!

I had some time tonight to look at 2x2 tile sprite flipping, noticed that in my previous "shortcut" attempt I had made an error, fixed it... and it worked! I also verified that 1x2 sprites are never Y-flipped, so I don't need to handle this case (a similar strategy would have worked anyway).

Without going into detail, multi-level 'switch' statements have been avoided, as have look-up tables, with some convenient bit-wise operations. Quite neat in fact (and probably how the arcade hardware actually works). And the logic can likely be used in the Amiga implementation as well!
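The actual bit-wise operations aren't given here, but the general idea can be sketched. Assuming (hypothetically) that the four tiles of a 2x2 sprite are consecutive codes with bit 0 selecting the column and bit 1 the row, a flip just XORs the selector bits, and each tile is additionally mirrored via the Neo Geo tile attribute flip bits. This ignores the 90-degree screen rotation:

```python
def tiles_2x2(base: int, flip_x: bool, flip_y: bool) -> dict[tuple[int, int], int]:
    """Map each screen quadrant (col, row) of a 2x2 sprite to a tile code.

    Hypothetical layout: bit 0 of the code selects the column and bit 1 the
    row, so the sprite uses tiles base+0..base+3.
    """
    base &= ~0x3
    fx, fy = int(flip_x), int(flip_y)
    # Flipping swaps which source tile lands in each quadrant; XOR-ing the
    # selector bits does that without a switch statement or a look-up table.
    return {(col, row): base | (col ^ fx) | ((row ^ fy) << 1)
            for row in (0, 1) for col in (0, 1)}


def neo_flip_bits(flip_x: bool, flip_y: bool) -> int:
    """Each tile is also mirrored: SCB1 attribute bit 0 = H flip, bit 1 = V flip."""
    return int(flip_x) | (int(flip_y) << 1)
```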

The Solvalou explosion is finally correct!

So this means that (sound aside) the Neo Geo implementation is now complete!

I will give it a final run, with a very close look at video quality, on the AES when I next get a chance.

Next task is a full audit of the RE & transcode - Z80 vs 68K - and fix the remaining few gameplay bugs in the process. This might take me a few weeks. Then I can finally dive into the Amiga implementation and try to learn a thing or two while I give jotd a hand to finish it off, before we work on the sound together.

Friday 27 January 2023

This is taking a flipping long time!

Scant time again but have managed to do some more sprite optimisation.

Technically, there's plenty of time to update the shadow registers before the next VBLANK, but I don't like unnecessarily inefficient code. I've done some optimisation as far as the current functionality is concerned, and I'm happy with that code now. Aside from the code itself, all unused Neo Geo sprites should now be deactivated. I'm yet to see how the Andor Genesis looks now with the 96 sprites-per-scanline limit... there still may be some tweaking I can do in this regard...

The only functionality left now is to handle flipping of 1x2 and 2x2 sprites. I've tried a few shortcuts but none were successful. I might have to resort to look-up tables, which I'm not averse to. Again, it's not critical that it's as efficient as possible - just decent code.

I'm estimating that I can knock this over in just one more good long session... maybe early next week.

Then it's the relatively lengthy and somewhat tedious transcode audit and bug-fixes.

jotd has been making more progress on the Amiga; hopefully he'll be able to share a video of the background scrolling properly soon!

Monday 23 January 2023

When less sprites is more... good!

Just time for a quick 'n' dirty fix to the sprites, and already looking a lot better!

Each Xevious sprite has 2 Neo Geo sprites allocated, because the Xevious hardware can select double-width and double-height sprites. I thought I had coded the logic to deactivate the 2nd sprite for the (most common) 1x1 tile sprite case, but it turns out a bug meant it wasn't actually being deactivated. That explains why there were so many sprites hidden behind the display masks at the bottom of the screen!

To make matters worse, I wasn't directly deactivating unused sprites, but rather setting the coordinates in the object table and letting the subsequent logic handle the deactivation... which in hindsight probably wasn't sufficient. Regardless I now explicitly deactivate sprites in these cases.
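A minimal sketch of explicit deactivation, assuming the shadow registers hold Neo Geo SCB3 words (Y in bits 15-7, sticky/chain bit 6, height in tiles in bits 5-0, where a height of 0 means the sprite isn't drawn). The pairing of Neo Geo sprites `first` and `first + 1` and the `update_pair` helper are hypothetical:

```python
SCB3_STICKY = 0x0040     # bit 6: chain to the previous sprite
SCB3_SIZE_MASK = 0x003F  # bits 5-0: height in tiles; 0 means not drawn


def scb3_word(y: int, size: int, sticky: bool = False) -> int:
    """Pack Y (bits 15-7, already in hardware encoding), sticky bit and size."""
    return ((y & 0x1FF) << 7) | (SCB3_STICKY if sticky else 0) | (size & SCB3_SIZE_MASK)


def update_pair(shadow_scb3: list[int], first: int, y: int, height: int,
                needs_second: bool) -> None:
    """Hypothetical: each Xevious sprite owns Neo Geo sprites first and first+1."""
    shadow_scb3[first] = scb3_word(y, height)
    if needs_second:
        shadow_scb3[first + 1] = scb3_word(y, height)
    else:
        # Explicitly deactivate (size 0) instead of relying on coordinates,
        # so the unused sprite no longer counts towards the scanline limit.
        shadow_scb3[first + 1] = scb3_word(0, 0)
```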

As a result, the bottom of the screen is completely devoid of any unused sprites now.

This of course means there's more headroom before the 96 sprites-per-scanline limitation causes any issues. And I still have a trick up my sleeve if I need it...

When I do find the time, I'll go through the sprite shadow register update routine more closely and make some proper optimisations, and also add support for flipping 1x2 and 2x1 sprites.

[An aside: I think I just realised why the Bacura hit-box is off!]

Scrolling complete; only sprites to optimise now!

I'm happy enough with the scrolling on the Neo Geo now so I can move on. I could probably squeeze a few more cycles from the case where the scroll register is updated with a completely arbitrary value (as opposed to scrolling the screen by 1 pixel), but there's no point at all. In fact I had a bug in there which was benign because of the limited scenarios in which this case is executed; I almost didn't fix it (and maybe it's still not right?)

When testing on the AES it is difficult to ascertain whether or not the optimisations have made any difference to the video quality - as the sprites still need work - but it seems to me that there are actually fewer "sparklies" during the game and the video looks more stable. That could just be my imagination, but it's looking pretty good as-is.

The final piece of optimisation to do is the implementation of the Xevious sprites on the Neo Geo. Unused sprites are not properly deactivated and as a result affect the 96 sprites-per-scanline limit and I suspect also contribute to the "sparklies". And while I'm working on the sprite optimisations I also need to handle flipped double-width sprites properly.

I should note that the optimisations around the sprites are not specifically for execution speed, as the core is now idle for between 67% and 75% of the frame, and the code I will be optimising updates the shadow copies of the sprite h/w registers in this period of the frame. So there are bucketloads of headroom on the Neo Geo; it's more about cleaning up the display and minimising the number of active sprites.

Once the sprites are done, that's it for the OSD (Neo Geo) layer (except sound).

I have a list of 8 "gameplay" bugs now; bugs that I would attribute to errors in the transcode of the core rather than the OSD (Neo Geo) layer. A few of them I should be able to fix fairly quickly, a few others have me scratching my head. But they'll come after the above is complete (and before sound).

jotd has sent me a video of the core running on the Amiga with foreground, background and sprite layers at least partially implemented. It's pretty cool to see it running, and I have no doubt he'll be able to get it going pretty soon! He's also going to help out with the sound.

Wednesday 18 January 2023

Scrolling optimisation WIP

Quick update. Didn't really have the time to spare to work on Xevious tonight, but I did anyway.

I have optimised the scrolling in the 2 cases where it scrolls down by 1 pixel (map row boundaries being the 2nd case) - i.e. 99.99% of the time. I didn't get to test the final version on the AES yet, but I have tested an earlier version which wasn't quite right. It looks right on MAME, and I'm hoping that it is.

The remaining case, where the scroll register changes by an arbitrary amount (used when changing attract screens and starting areas), probably doesn't require any optimisation, but while I'm working on scrolling I will improve it a little anyway.

That will just leave the sprite optimisation before I go back and audit the transcode and fix the remaining gameplay bugs.

UPDATE: I've just checked the latest build on the AES - looks like I've fixed the scrolling and it's about as smooth as it's going to get. Still have a lot of sparklies on the screen, but that could easily be caused by my sprites which are yet to be optimised. I just need to round out the 3rd case for arbitrary positioning of the scroll register first...

UPDATE: An optimisation that I think will finally get rid of the 'sparkles' on the title screen when running on the AES... checking that the scroll register has actually changed before doing any scroll updates! A few minor optimisations in the calcs and branches, but I've actually broken it a little more. Hopefully next session I can sort it and scrolling will be done!
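The three cases, plus the "has it changed at all?" check, can be sketched as a simple classification. This assumes (hypothetically) that the scroll register is 9 bits wide, wraps, and increases by 1 for each pixel of scrolling; the real width and direction may differ:

```python
ROW_HEIGHT = 8        # background tiles are 8 pixels high
SCROLL_MASK = 0x1FF   # assumption: 9-bit wrapping scroll register


def classify_scroll(old: int, new: int) -> str:
    """Choose which (hypothetical) scroll update path to take this frame."""
    if new == old:
        return "none"              # register unchanged: skip all scroll work
    delta = (new - old) & SCROLL_MASK
    if delta == 1:
        # A 1-pixel step that lands on a tile-row boundary also exposes a new row.
        return "row_boundary" if new % ROW_HEIGHT == 0 else "one_pixel"
    return "arbitrary"             # attract-screen changes, area starts, etc.
```

The cheap "none" test comes first because it is the only check needed on frames where nothing moves.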

Saturday 14 January 2023

Time is an illusion, Xevious time doubly so!

Lack of progress is not due to lack of motivation, but lack of time. From holidays back to work with a new boss and new tasks, kids needing to be entertained during the holidays, and training for a bike ride that is coming up all too soon... leaving absolutely no time for Xevious atm.

Good news is that jotd is starting to see some good results on the Amiga!

Foreground and background tilemap layers!

Hopefully I can find some time this coming week...

UPDATE: Did a bit of work on the optimised background scroll routine!

UPDATE: Barely got any time today but did fix one bug in the scroll...

Sunday 8 January 2023

Looking good on the AES!

I've fixed the black line (missing tiles) issue on the background. Turns out you need a rather lengthy delay after setting the VRAM address register before you attempt to read from VRAM. Much better to avoid having to read altogether, so I re-wrote the background color/video RAM access routines from scratch. In a way it's turned out even simpler than before; there's effectively a single routine to do both now.
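The idea of never reading VRAM can be sketched as follows: keep the arcade video and color RAM in main memory, and on a write to either one, rebuild the complete Neo Geo tile code/attribute pair from the two shadow bytes. Class and method names are hypothetical, and the conversion is a placeholder:

```python
class BgShadow:
    """Hypothetical main-RAM mirror of the arcade background video/color RAM."""

    def __init__(self, size: int) -> None:
        self.videoram = bytearray(size)
        self.colorram = bytearray(size)
        self.vram_writes: list[tuple[int, int, int]] = []  # stand-in for the VRAM port

    def write(self, ram: bytearray, offset: int, value: int) -> None:
        """Single path for both videoram and colorram writes."""
        ram[offset] = value
        # Both bytes are already in main RAM, so the SCB1 code/attribute pair
        # can be rebuilt and written without ever reading VRAM back.
        code, attr = self.to_neo(self.videoram[offset], self.colorram[offset])
        self.vram_writes.append((offset, code, attr))

    @staticmethod
    def to_neo(tile: int, colour: int) -> tuple[int, int]:
        # Placeholder conversion: the real tile/palette mapping is game-specific.
        return tile, colour << 8
```

Usage: `bg.write(bg.videoram, 3, 0x21)` followed by `bg.write(bg.colorram, 3, 0x05)` emits `(3, 0x21, 0)` and then `(3, 0x21, 0x500)`, with no read-back in between.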

Running on my AES with only a few minor glitches now

Whilst play-testing on the AES, I found one or two new gameplay issues. One is interesting - at one stage a bunch of Jara appeared on the screen, and as they changed direction they morphed into Toroids! Quite apt because they both have very similar, if not identical, flight patterns.

Another I have seen before, I just haven't had time to chase it down. At a certain point, every hit earns you extra Solvalou, which is reportedly the same bug that occurs when you reach 9,999,990 points!?!

And another I've just noticed now after coming back from leaving the game running (with invincibility enabled) for an hour or so; looks like some bullets aren't being destroyed when they leave the screen.

UPDATE: I've just realised those bullets are OK; they're from the Zoshi circling the invincible Solvalou. However there are a couple of other bullets just hanging stationary in mid-air - I'm not sure where those came from...

I'll leave the gameplay bugs until last whilst I'm focused on getting the game running glitch-free on real hardware. Next I'll tackle either the scrolling or optimising the sprite usage/lifecycle. I had to back out one change that reduced the number of Neo Geo sprites per line by 4, so I also need to work out why that was causing the last line on the foreground tile layer to disappear.

But overall, there's no show-stoppers and I think I'll have it running quite nicely within a few weeks. I've even started thinking about sound...

UPDATE: I've deactivated the invisible 4 rows (sprites) on the foreground layer - properly this time!

UPDATE: Enabled invincibility and left the game running for about 10 hrs today - it was running just as well at the end as it was at the start, so no gradual degradation or glitches or memory leaks it seems.

Friday 6 January 2023

Running on an AES again... good news... and bad news.

I ran the latest build of Xevious on my AES... and some very unexpected results!!!

First and foremost, the VBLANK ISR is running to completion during the vertical blanking period (no red line on the display). I have confirmed with a delay loop that extending the ISR does result in a red line. So that's good news.

However, the first 2 routines in the SUB CPU should be running in VBLANK as well... I need to reorganise the code slightly to make that happen. The 2nd of those routines is the as-yet unoptimised scroll routine.

Much to my amazement, the game is running at 100%! With the ISR running to completion, and both the MAIN and SUB programs running well within one frame, it seems there's plenty of headroom left on the Neo Geo when running the game. I guess the 12MHz 68K is up to the task of running code for two 3MHz Z80s - especially considering it's running around half the number of instructions!

One last unexpected result, and certainly an unwelcome result, black lines in the background layer. They start out once every screen or so, then seemingly increase in frequency to every few rows. At this early point I don't have much of a theory at all; I can't see how it could be caused by back-to-back VRAM accesses... so at a bit of a loss.

Still, the fact that the Neo Geo performance is more than adequate is by far the most significant result to come out of the experiment! Very happy to see it running at 100% and I haven't even got to the most important optimisation.

UPDATE: It seems the black lines are caused by reading from VRAM too soon after setting the VRAM address register. I confirmed this by adding a delay before the read in one of two places where this is a potential issue, and it improved the situation quite a bit. This has only started to happen after I've optimised the routines... I've done too good a job? 😜

I'm not seeing red, but that's not a good thing!

Some more optimisations today; look-up tables for foreground and background video sprite tile (SCB1) addresses per memory address.

Looking at the online profiler again, I've decided I can't make much sense of the numbers that it is reporting. It may have something to do with the RED numbers on half the lines, which are devoid of any cycle, read or write statistics. I'm guessing it doesn't like the GNU AS syntax...

So I can only guess as to whether or not the code is actually more efficient. Given the relative number of lines and the elimination of multiple shifts, I think it's a safe assumption that it is.

I also did a few optimisations in some of the SUB CPU ROM code, mainly optimising the number of registers pushed onto the stack, and changing some MOVEA.L instructions to LEA. More so things that I noticed in passing, than a concerted effort to optimise that code.

Then I had a look at what I did in Scramble for measuring how much time was spent in various routines, including the VBLANK ISR - I changed the backdrop colour. In order to see it on Xevious, I had to temporarily disable the display masking sprites top and bottom of screen. That in itself reminded me of some other optimisations that I need to make - the (Xevious) sprite management.

The MAIN and SUB programs each execute a table of routines once per VBLANK, and then spin waiting for the next VBLANK. So I hooked into 3 areas of the code to see how much (if any) idle time there was. 1st up was the VBLANK ISR in RED, 2nd the SUB program in BLUE, and 3rd the MAIN program in GREEN.
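The loop structure and the backdrop-colour trick can be modelled like this; `set_backdrop` and `wait_vblank` are hypothetical hooks standing in for the palette write and the VBLANK wait:

```python
from typing import Callable


def run_frames(routines: list[Callable[[], None]], frames: int,
               wait_vblank: Callable[[], None],
               set_backdrop: Callable[[str], None]) -> None:
    """Model of a 'table of routines once per VBLANK, then spin' main loop.

    The backdrop colour is switched on while the routines run and back off
    before idling, so the height of the coloured band on screen shows how
    much of the frame was spent busy.
    """
    for _ in range(frames):
        set_backdrop("GREEN")
        for routine in routines:
            routine()
        set_backdrop("BLACK")
        wait_vblank()
```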

Execution times for SUB (blue) and MAIN (green)

As can be seen above, the results were both confounding and encouraging... confounding because there is no visible ISR (RED). Ordinarily that would be expected because it is running within VBLANK, but in this case I know that's not the case - at least on real hardware - and I would have expected to see quite a bit of red here given that it runs at 50% on my AES.

The BLUE and GREEN bars are very encouraging though, because it shows that they're executing well within one frame, and there's plenty of headroom even if the ISR takes longer than the VBLANK period. However I do need to split the SUB CPU into code that should run during the VBLANK, and code that can run during a frame... not sure yet exactly how I will achieve that...

Next step is to run this on my AES, and see what happens to the RED line... I suspect that MAME is not emulating the VRAM access timings for the Neo Geo sprite hardware. I guess I'll find out soon enough!

Thursday 5 January 2023

Starting on the Neo Geo optimisations

I've started on the Neo Geo optimisation in the last few days. Starting with the low-hanging fruit.

I did notice that the foreground tilemap used 40 Neo Geo sprites for 36 visible rows; the first four (4) rows are not visible. So I unchained all the sprites, and made the first 4 inactive, which means they no longer factor in the 96 sprites-per-scanline limit. Although those rows are only written when clearing the entire layer, retaining those sprites (even if inactive) means one fewer compare each time the layer is accessed.

I should be able to remove another handful of sprites from the foreground layer for rows that are never written - I just need to work out which rows they are. One option would be to deactivate all foreground layer sprites until a non-blank tile is written to that row... a few cycles overhead is not critical on the foreground layer as it is never written during VBLANK (IIRC).
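A sketch of the "inactive until a non-blank tile is written" idea, assuming one Neo Geo sprite per foreground row and a hypothetical blank tile code and row width. It also keeps the property described above that the hidden rows need no extra compare:

```python
FG_STRIPS = 40      # one Neo Geo sprite per foreground row (36 visible + 4 hidden)
BLANK_TILE = 0x24   # hypothetical code for a blank foreground tile


class FgLayer:
    """Sketch: rows start inactive and switch on at their first non-blank tile."""

    def __init__(self, cols: int = 28) -> None:   # row width is an assumption
        self.tiles = [[BLANK_TILE] * cols for _ in range(FG_STRIPS)]
        self.active = [False] * FG_STRIPS          # all strips unchained and inactive

    def write_tile(self, row: int, col: int, code: int) -> None:
        self.tiles[row][col] = code
        # No special test for the 4 hidden rows: they are only ever written by
        # a full clear (blank tiles), so they never become active.
        if code != BLANK_TILE and not self.active[row]:
            self.active[row] = True

    def active_count(self) -> int:
        return sum(self.active)
```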

Next I looked at the colorram (attribute) routines for the foreground and background layers, as they're quite lengthy. The sections of code that shuffle bits around between the arcade hardware and the Neo Geo hardware looked like they would benefit from a table look-up. So I set about coding those.

It is far preferable (IMHO) that the look-up tables are generated on-the-fly in the game's platform initialisation routine, rather than by an external tool that generates .DB statements for inclusion in the source. I would have thought this decision was a no-brainer - until I had a 'debate' with another author a few years back who did the exact opposite! In fact the author (who I won't shame) couldn't even wrap his head around the concept of generating the tables on-the-fly, claiming "it couldn't be done". Hmm...

The foreground colorram routine requires a 256-word look-up table to translate an 8-bit attribute byte to a 16-bit Neo Geo attribute word. According to this handy site I discovered tonight, the execution time went from 156 cycles to 32 cycles (~20%). Not too bad at all!
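A sketch of generating the table at initialisation, rather than shipping .DB data: the existing (slow) bit-shuffle is simply run once for every possible input byte. The arcade attribute bit layout below is hypothetical; only the Neo Geo side (palette in bits 15-8, vertical/horizontal flip in bits 1-0 of the SCB1 attribute word) follows the hardware:

```python
def fg_attr_slow(attr: int) -> int:
    """Hypothetical bit-shuffle from an 8-bit arcade attribute to an SCB1 attribute word."""
    palette = attr & 0x3F          # assumed: bits 5-0 select the colour
    flip_x = (attr >> 6) & 1       # assumed: bit 6 = X flip
    flip_y = (attr >> 7) & 1       # assumed: bit 7 = Y flip
    return (palette << 8) | (flip_y << 1) | flip_x


def build_fg_lut() -> list[int]:
    # Built once during platform initialisation from the same conversion code.
    return [fg_attr_slow(a) for a in range(256)]


FG_ATTR_LUT = build_fg_lut()
```

At run time the conversion becomes a single indexed read, and the table can never drift out of sync with the conversion logic because it is derived from it.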

The background colorram routine requires a 512-word look-up to translate a 9-bit attribute (combined from 2 bytes) to a 16-bit Neo Geo attribute word. The execution time in this case went from 180 cycles to 84 cycles (~47%). Not quite as good because I had to shift and combine bytes, but nothing to sneeze at either. The background is only updated 32 tiles at a time every (IIRC) 16 VBLANKS, but not during VBLANK, so not super-critical.
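For the background, the remaining cost is forming the 9-bit index before the look-up. A sketch, assuming (hypothetically) that the 9th bit comes from bit 7 of a second byte, and using an invented bit mapping for the table contents:

```python
def bg_attr_index(colorram_byte: int, other_byte: int) -> int:
    """Combine 8 colorram bits with a 9th bit taken from bit 7 of another byte."""
    return ((other_byte & 0x80) << 1) | (colorram_byte & 0xFF)


def bg_attr_slow(index: int) -> int:
    """Hypothetical mapping: 7-bit palette, index bits 6-7 as H/V flip."""
    palette = ((index >> 8) << 6) | (index & 0x3F)
    flips = (index >> 6) & 0x3
    return (palette << 8) | flips


BG_ATTR_LUT = [bg_attr_slow(i) for i in range(512)]


def bg_attr(colorram_byte: int, other_byte: int) -> int:
    # The shift-and-combine is the part the table cannot remove.
    return BG_ATTR_LUT[bg_attr_index(colorram_byte, other_byte)]
```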

Another area I can look at utilising a lookup table is the mapping from video address to Neo Geo sprite and tile offset, and that would apply to both videoram and colorram accesses for both foreground and background layers. Those sections of code are around 200 cycles each atm, so probably worth implementing. The table would be sizeable (some $800 words) but on the Neo Geo, not an issue.
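A sketch of that address table, built at init like the attribute tables. The Neo Geo side is fixed: each sprite's SCB1 entry is 64 words (32 tiles, each a tile-code word followed by an attribute word). The arcade side is an assumption: a row-major layout of 32 columns per row with one Neo Geo sprite per row, so $800 entries span 64 sprites:

```python
SCB1_WORDS_PER_SPRITE = 64   # 32 tiles x (tile-code word + attribute word)


def build_addr_lut(first_sprite: int, cols: int = 32, entries: int = 0x800) -> list[int]:
    """Map each (hypothetical) arcade tilemap offset to its SCB1 tile-code word address."""
    lut = []
    for addr in range(entries):
        row, col = divmod(addr, cols)   # assumed row-major arcade layout
        lut.append((first_sprite + row) * SCB1_WORDS_PER_SPRITE + col * 2)
    return lut
```

The attribute word for the same tile is then always at the table entry plus 1, so one table serves both videoram and colorram accesses.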

But the big gains will come in the scroll routine; right now it's particularly brain-dead. In my very early attempts to implement scrolling I actually had a much more efficient (albeit ultimately inadequate) routine. But I will be able to adapt that same implementation now that I have other aspects sorted.

One thing I would like to do is work out how much time is spent in the VBLANK ISR, and how much time the MAIN and SUB programs are idling waiting for the next VBLANK. On other platforms there are tricks such as changing the border colour on-the-fly to give a visual gauge... and I seem to recall I did something similar on Scramble on the Neo Geo?!? I'll have to look into it...

Oh and I did find one (more) bug in the transcode; achieving a new high(est) score doesn't copy the new score or clear the old name on the entry screen. That's four (4) that I know of thus far (I should write them down in my project notebook).

Monday 2 January 2023

Reminiscing, removing dependencies and preparing to optimise.

Interesting to go back and read my blog posts from the beginning! So much that I'd forgotten about, and quite obviously a scary amount of hours put into retro ports thus far. Doesn't seem like I have a lot to show for it though... 😒

I didn't read them all today, but I will have to go back and read all my entries for Donkey Kong when the time comes. If the Lode Runner and Knight Lore posts are anything to go by, it will go a long way towards reminding me of what I've done and how I've done it. No doubt I get a lot more out of my own blogs than anyone else reading them...

So back to Xevious today and I finally removed the dependencies on the Neo Geo layer in the main code. All the hit-box calculations are done using the sprite shadow register values which are - for efficiency reasons - in OSD (Neo Geo) format. In order to finish the transcode ASAP, I simply left it that way, with a note to revisit it when it was time to optimise.

All the hit-box calculations use the 8 most significant bits of the X/Y register. On the original hardware, the most significant bit of the X register had to be rotated in via the carry to produce X[8:1] every time it was referenced in a hit-box calculation. On the Neo Geo, X had to be shifted down by 8 bits, while Y had to be shifted down by 7 bits. Neither solution is optimal, so I opted to create another (platform-agnostic) copy of the X/Y register values exclusively for hit-box calculations (I already had to create a 2nd copy of X a while back to handle 2x2-tile sprites anyway).
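The three ways of getting X[8:1] can be compared directly. This sketch models the original rotate-through-carry (RR) approach, the Neo Geo approach (X held in bits 15-7 of SCB4, shifted down by 8), and a plain pre-shifted copy; the function names are hypothetical:

```python
def z80_x_8_1(x_lo: int, x_msb: int) -> int:
    """Original style: load the 9th bit into carry, then RR to get X[8:1]."""
    carry = x_msb & 1
    return ((x_lo >> 1) | (carry << 7)) & 0xFF   # bit 0 drops out, carry enters bit 7


def neo_x_8_1(scb4: int) -> int:
    """Neo Geo style: SCB4 holds X in bits 15-7, so shift down by 8."""
    return (scb4 >> 8) & 0xFF


def x_8_1(x9: int) -> int:
    """Platform-agnostic copy kept just for hit-box maths: the top 8 of 9 X bits."""
    return (x9 >> 1) & 0xFF
```

All three agree for every 9-bit X; the pre-shifted copy just moves the work out of every hit-box test and into one place per frame.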

Since this is done as the last operation in the SUB CPU main loop before spinning waiting on VBLANK, timing isn't critical. This also had the added bonus of simplifying the 68K code whenever they were referenced, as I combined the previously separate X&Y values into a single array, and eliminated the need to do word operations or any pre-shifting.

It should also allow other platforms (Amiga) to start implementing sprites now.

Now I can focus solely on getting the Neo Geo port to run at 100% with no disappearing sprites. The few remaining bugs will sort themselves eventually. I haven't actually profiled any of the code yet, but I do know the critical sections will be scrolling and sprite h/w updates, as they're both done during VBLANK. Scrolling is currently horrendously inefficient, so I'm hoping for big gains there.

Foreground and background access routines are pretty horrid as well - and they disable interrupts whilst executing by necessity - but there's not a lot of that happening during the game. So I'm not sure how much can be done there.

For disappearing sprites (96 per scanline limitation) I've thought a little about how I can mitigate that; by ensuring that unused sprites are inactive (in the Neo Geo h/w) and also turning off some foreground sprites that aren't used during gameplay.

The fun is over. Now let the fun begin!

New Year check-in. Holiday is over and now it's time to recover from it! Will take a few days to get back into a routine. I also have to sort out my dead hard disk and set up my desktop computer - yet again.

No more work on Xevious since the last update but instead I did a bit of work on Donkey Kong. I updated the tools to convert the graphics and changed the rotation of all the tiles to match Xevious. I also updated the project structure in preparation for separating the core and Neo Geo code in the same way I have done for Xevious. I'm not sure how much work this will entail as I haven't looked at my Donkey Kong code for exactly 9 years now.

As it stands, Donkey Kong is running as I left it all those years ago, with the new rotation and project structure. And here it will probably remain while I return my focus to Xevious in the coming weeks.

Quick 'n' dirty fixes to get it running again

As I mentioned in earlier posts, time to fix a couple of minor bugs in the transcode and then onto wholesale optimisation of the Neo Geo code. And while that is happening, hopefully the Amiga port will start to take shape...

The core running on the Amiga, showing the foreground tile layer
Stay tuned!