Tuesday 31 August 2021

Galaxian Part 5 - Goals, Aims & Notes on the process

Before we dive into the actual transcode, I wanted to document a few things about the goals and aims of the project, along with some notes about the process.

Goals and Aims

First and foremost, the aim of the project is to document and preserve both the behaviour and the implementation of the original code which, not surprisingly, tend to go hand-in-hand.

Preserving the behaviour means that the transcode will look, sound and play identically to the original. This is pretty much implicit in a transcode, as opposed to a port, since you're translating the code from one language to another. It is in fact the next closest you can get to an emulation.

Documenting & preserving the implementation is also somewhat implicit in a transcode, but not necessarily 100% guaranteed. It is possible to take some liberties with the code, taking advantage of features and/or efficiencies in a higher-level language and more powerful host platform. A transcode that literally mimics Z80 register operations, for example, may be inherently accurate but is not particularly useful; it does little more than the original ASM code in terms of documenting the implementation and also obscures the underlying program design with a tediously verbose reimplementation.

It makes much more sense to make full use of aptly named C variables and data structures, C constructs such as if/else, switch statements and while/for loops, and passing parameters to C functions.

It's important though to avoid any code enhancements that modify the underlying algorithms, even if there are underlying inefficiencies or even bugs! If the ASM code happens to inadvertantly check the same condition twice within some code paths - replicate that in the C code. If the ASM code uses a clumsy sequence of instructions to calculate various conditions - replicate that in the C code. If there's a bug that causes stray pixels on the screen in some cases (Knight Lore on the ZX Spectrum) - replicate that in the C code! We want the authentic experience, not a sanitised one!

The real trick is finding the balance; implementing the original intention as faithfully as possible whilst taking advantage of the higher level language, optimising out the (generally few) insignificant details and preserving the important ones. And this is of course all subjective!

Secondly, another major goal of the project is to produce a high level implementation that is as easily ported to as many (generally more capable) systems as possible. Keep the C code as simple and as platform-independent as possible, avoiding potential data size and endianity issues. As a rule I use integer data types as defined in <stdint.h> and avoid bitfields altogether.

As an aside, the caveats I've encountered tend to be overflow and signed/unsigned nuances of the original CPU and the host platform C implementations. Certain 6502 condition codes for example are particularly mind-bending.

The process

It's worth documenting a few notes about the process that I've found to be useful along the way.

Any relatively complete reverse-engineering (RE) will have labels for all the memory locations in use, and names for all the functions. These labels and names we want to use verbatim in the C transcode.

In the past, and admittedly a legacy of the 6502 zero-page register space, I've created a C structure to contain all the memory variables in a program. This is perhaps a little verbose and can be terribly inefficient on some host platforms if you attempt to use alignment and/or packing to facilitate certain nuances of general assembly programming (eg clearing or copying blocks of memory). Still in two minds about this, but thus far it hasn't caused any insurmountable problems. Regardless, there's a C variable for each and every memory location referenced by the program, named exactly the same as the RE, generally of the same type & size (though sometimes optimised), and in ascending memory address order. This may be an important implementation detail.

Likewise, functions mirror those in the RE and have identical names, and are defined in the same order as in the original code. I even preclude each with a comment containing the ASM address for reference. The main difference is that functions may have arguments - typically but not limited to pointers and data values passed in CPU registers in the ASM code - and sometimes even a return value.

These practices make it easier to reference functionality back to the ASM source, either during development or perhaps at a later date by someone studying the implementation of the original code.

Over the course of the last few projects I've refined the process incrementally. Significantly, I'd start with the RE source file and prepend each and every line with a C++ comment. Effectively, the file would then compile to nothing. I'd then start the transcode by adding C functions at the corresponding lines in the file, just below the ASM routine, referencing the code as I wrote the C equivalent. Once each function was completed and debugged, I'd actually delete the ASM routine from the file. End result is pure C code.

At some point during the Scramble transcode, I found it useful to include snippets of the commented ASM code to document what I'd done and/or why I'd done it that way. Or to preserve unused routines or data for future reference. By the end of the transcode, I was considering that perhaps it would be advantageous to leave all the original ASM code in the file, even interleaved within blocks of code in the functions, to provide a definitive reference. And to possibly assist in identification of bugs in the transcode should they become apparent only some time after the fact.

And so for Galaxian, I think I'll start out on this route. By prepending with a C++ style comment and some obscure but visually subtle sequence of characters, the possibility to completely strip the ASM code from the C code with just a few keystrokes would be retained.

Hopefully what I've described here will become more clear when we get into the transcode and I can provide a few concrete examples of transcoded data variables and functions.

Next up, in what will likely be the last part before we get to the transcode proper, I'll detail the process of testing the framework in readiness to start on the real work.

No comments:

Post a Comment