Thursday 15 June 2017

Assemblers, undocumented instructions, and assumed addressing modes.

First order of the day: a helpful fellow developer pointed me towards c2d, a command-line executable that creates a 'quick booting' Apple II .DSK file from a .BIN. So now simply typing 'make' assembles all my source and then, in less than a second, produces a disk image I can boot in MAME.

Next: getting the arcade Asteroids source listing assembling in CA65. Not surprisingly, IDAPro doesn't have direct support for the CA65 assembler. I briefly investigated adding support via the IDAPro SDK, but it requires modifying and rebuilding the processor support module, and I haven't had much success doing so in the past.

Fortunately, one of the assemblers IDAPro does support, the SVENSON ELECTRONICS 6502/65C02 ASSEMBLER - V.1.0 - MAY, 1988, turns out to be a pretty close match; in fact, a single search-and-replace is ultimately sufficient to fix the pure syntax issues. [This is important since I will need to re-generate the source from IDAPro at some point in the future, when I complete the reverse-engineering.] Once I had explicitly defined the ZEROPAGE segment, only one syntax error remained.

The assembler had barfed on a DCP instruction. That didn't sound familiar to me, so I consulted my trusty ZAKS 6502 bible. No mention of it. Perhaps it has an alternate mnemonic? Google quickly revealed the problem - it's an undocumented opcode! After some further reading of the CA65 manual, I discovered a command-line switch to enable (some of) these opcodes. With relatively little effort, I now had the arcade Asteroids source code assembling under CA65!
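For anyone else meeting DCP for the first time: in the usual unofficial-opcode references it behaves like DEC on a memory operand followed by CMP of the accumulator against the decremented value. The post doesn't name the CA65 switch; my assumption is that it's the `6502X` CPU type (`--cpu 6502X`), which the ca65 manual describes as the 6502 plus illegal opcodes. The sketch below only models the instruction's documented-elsewhere semantics; the function name and the dict-of-flags return shape are illustrative, not taken from any emulator.

```python
# A minimal model of the undocumented 6502 DCP instruction (DEC then CMP).
# Function and table names are hypothetical, for illustration only.

# Opcodes commonly listed for DCP (a.k.a. DCM), keyed by opcode byte:
# addressing mode and total instruction length in bytes.
DCP_OPCODES = {
    0xC7: ("zp", 2), 0xD7: ("zp,X", 2), 0xCF: ("abs", 3), 0xDF: ("abs,X", 3),
    0xDB: ("abs,Y", 3), 0xC3: ("(zp,X)", 2), 0xD3: ("(zp),Y", 2),
}


def dcp(memory: bytearray, addr: int, a: int) -> dict:
    """Decrement memory[addr] in place, then compare A with the result.

    Returns the C, Z and N flags exactly as CMP would set them.
    """
    result = (memory[addr] - 1) & 0xFF  # DEC: $00 wraps to $FF
    memory[addr] = result
    diff = (a - result) & 0xFF          # CMP computes A - M but discards it
    return {
        "C": a >= result,               # carry set when no borrow occurs
        "Z": a == result,
        "N": bool(diff & 0x80),         # bit 7 of the subtraction result
    }
```

For example, with memory holding $05 and A = $04, the memory byte becomes $04 and the comparison sets both Z and C. That combined "decrement and test" is presumably why it looks plausible inside real code.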

I noticed, however, that the assembly was not producing the same number of bytes as the original, as evident from the address of the last (IDAPro auto-generated) label in the assembler output listing. Somewhat fortuitously, as it turns out, IDAPro by default auto-generates labels that contain their own address, which makes it easy to spot a mismatch against the assembled address.
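Because IDAPro's default names embed the original address (names like `loc_6A3F`), finding the first point of drift can be automated. The sketch below assumes the listing has already been reduced to `(assembled_address, label)` pairs; that pre-parsed form, the function name, and the exact set of prefixes are hypothetical, since CA65's real listing format would need its own parser.

```python
import re

# IDAPro-style auto-generated names embed the address, e.g. "loc_6A3F".
# The prefix list here is an assumption covering the common cases.
IDA_LABEL = re.compile(r"^(?:loc|sub|byte|word|off|unk)_([0-9A-Fa-f]{4})$")


def first_label_mismatch(listing):
    """Return (label, expected, actual) for the first drifted label, or None.

    `listing` is an iterable of (assembled_address, label) pairs, a
    hypothetical pre-parsed form of the assembler output listing.
    """
    for actual, label in listing:
        match = IDA_LABEL.match(label)
        if match is None:
            continue  # hand-named label: no embedded address to check
        expected = int(match.group(1), 16)
        if expected != actual:
            return label, expected, actual
    return None


# Hypothetical listing in which one byte has gone missing before loc_6A3F.
listing = [(0x6800, "RESET"), (0x6803, "loc_6803"), (0x6A3E, "loc_6A3F")]
```

Here `first_label_mismatch(listing)` reports `loc_6A3F` expected at $6A3F but assembled at $6A3E; everything before that label still lines up, so the culprit lies between the two.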

Tracing back through it, I found the first mismatch: the code was referencing a zero-page variable via absolute (16-bit) addressing. Since CA65 uses the same operand syntax for both, it assumed zero-page addressing and generated the shorter 2-byte zero-page instruction instead of the original 3-byte absolute one. As it turns out, this happens in no fewer than 7 places throughout the code (most in the same subroutine). I suspect the original assembler did make a distinction, and the programmer simply used the wrong addressing mode a few times, or possibly moved a variable from RAM to the zero page at a later stage of development.

Either way, it makes no difference to the outcome, but I do (first) want to verify that I am able to produce an exact binary using CA65. And for authenticity, I would prefer that it runs the exact same code as far as possible.

After some further Googling I found the solution, forcing absolute addressing for an individual instruction, buried in a post on the NESDEV forums.
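To make the size problem concrete, here is a minimal sketch of how a size-optimising assembler picks an encoding for `LDA`/`STA` with a direct address. The opcodes are the real 6502 encodings; the `force_absolute` flag is a hypothetical stand-in for an explicit override. In ca65 that override is, as far as I know, the `a:` address-size prefix (e.g. `LDA a:$12`), though I'm assuming that's what the NESDEV post describes.

```python
# Real 6502 opcodes for two instructions in zero-page and absolute modes.
OPCODES = {
    ("LDA", "zp"): 0xA5, ("LDA", "abs"): 0xAD,
    ("STA", "zp"): 0x85, ("STA", "abs"): 0x8D,
}


def encode(mnemonic: str, addr: int, force_absolute: bool = False) -> bytes:
    """Encode LDA/STA with a direct address, preferring zero page.

    If the operand fits in the zero page and no override is given, the
    2-byte zero-page form is emitted; otherwise the 3-byte absolute form.
    `force_absolute` is a hypothetical stand-in for an override such as
    ca65's `a:` prefix.
    """
    if addr < 0x100 and not force_absolute:
        return bytes([OPCODES[(mnemonic, "zp")], addr])
    # Absolute operands are stored little-endian: low byte, then high byte.
    return bytes([OPCODES[(mnemonic, "abs")], addr & 0xFF, addr >> 8])
```

`encode("LDA", 0x12)` yields `A5 12` while `encode("LDA", 0x12, force_absolute=True)` yields `AD 12 00`, one byte longer. Seven such instructions therefore shift everything after them by 7 bytes, which is exactly the kind of drift the embedded-address labels expose.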

One last mismatch was another undocumented instruction, SKW, this time unsupported by both IDAPro and CA65. IDAPro disassembled the 3 bytes as a single NOP, which CA65 of course assembled to the single byte $EA. No choice in this case but to define three constant bytes in place of the instruction.
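The "three constant bytes" workaround amounts to emitting the raw bytes with a `.byte` directive instead of a mnemonic, so the assembler reproduces them verbatim. A small helper like the hypothetical one below does that; the example bytes use $0C, the opcode usually listed for the 3-byte "skip word" NOP, with a made-up operand.

```python
def as_byte_directive(raw: bytes) -> str:
    """Render raw bytes as a ca65 `.byte` line in $-prefixed hex.

    Emitting the bytes directly stops the assembler from re-encoding an
    unsupported instruction as something shorter (such as NOP = $EA).
    """
    return ".byte " + ",".join(f"${b:02X}" for b in raw)


skw = bytes([0x0C, 0x34, 0x12])  # hypothetical SKW with operand $1234
nop = bytes([0xEA])              # what a plain NOP assembles to
```

`as_byte_directive(skw)` gives `.byte $0C,$34,$12`, keeping all three bytes, whereas reassembling the NOP loses two of them.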

Finally, CA65 appears to produce the same number of bytes as the Asteroids ROM. Indeed, after some further munging I have been able to confirm, via a binary file compare, that the output is identical.
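Any binary compare tool will do for that final check. For reference, a minimal Python equivalent that reports where two images first diverge might look like this; the function names are mine, and any file paths passed in are placeholders.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical.

    If one image is a prefix of the other, the difference is reported at
    the end of the shorter one.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Compare two binary files, e.g. a ROM dump and the assembler output."""
    return first_difference(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

Adding the ROM's base address to a non-`None` result turns the offset back into a CPU address, which can then be matched against the listing.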

The issue now is getting the segments and .ORG statements in order so that the code loads at the correct address in Apple DOS (right now it produces a contiguous binary that loads at $0000). For that I need to do some more reading, and experimenting. But decent progress thus far.
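For context on what "loads at $0000" means at the file level: an Apple DOS 3.3 binary ("B") file starts with a 4-byte header holding the load address and the length, both 16-bit little-endian. On the CA65 side the address is normally set through the ld65 linker configuration (the start of a MEMORY area) rather than in the assembler itself. I haven't checked whether c2d reads such a header or takes the address some other way, so treat the sketch below, with its hypothetical function name, as an illustration of the format only.

```python
import struct


def dos33_binary(code: bytes, load_address: int) -> bytes:
    """Prefix `code` with a DOS 3.3 B-file header.

    Header layout: load address (16-bit little-endian), then length
    (16-bit little-endian), followed by the code bytes themselves.
    """
    if not 0 <= load_address <= 0xFFFF or len(code) > 0xFFFF:
        raise ValueError("load address and length must fit in 16 bits")
    return struct.pack("<HH", load_address, len(code)) + code
```

`dos33_binary(b"\xEA", 0x6800)` produces `00 68 01 00 EA`: load at $6800, one byte long, then the NOP.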

UPDATE: The binary produced by CA65 now contains only the Asteroids (ROM) code and loads at $6800 in the Apple IIe emulation under MAME. The initialisation code runs, and it loops waiting for the 'VBLANK' (NMI x4) interrupt - as you would expect on non-Asteroids hardware!

2 comments:

  1. Based on our emulators I don't think Asteroids uses any undocumented instructions. And I don't see the ones you mention in http://www.computerarcheology.com/Arcade/Asteroids/Code.html

    Seems likely that you have some data masquerading as code.

  2. You are indeed correct. My fault for not finishing the RE, but I have a deadline for at least a demo version of the IIGS port so I'm getting a little ahead of myself.

    I should have noticed that both 'undocumented' instructions were near one another. Turns out there's a single (AFAIK unused) data byte wedged into the code right after a BEQ. I keep forgetting that pretty much every instruction on the 6502 affects the Z flag, and in this case the BEQ always branches. Another tell-tale sign was a JSR into the same area whose target landed on what had been disassembled as an operand byte.

    Marking the offending byte as data and re-analysing that section of the code removed both undocumented instructions and also makes a little more sense!

    Thanks for keeping a keen eye out George!

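The failure mode described in the comments, a stray data byte after an always-taken branch, is easy to reproduce with a toy linear-sweep disassembler. The bytes below are hypothetical, not taken from the Asteroids ROM: `BEQ +1` skips a data byte `$C7`, followed by `LDA #$00` and `RTS`. Decoded blindly, the `$C7` is read as a DCP opcode and swallows the LDA's opcode byte as its operand, desynchronising everything after it. The opcode table is deliberately tiny and the function name is illustrative.

```python
# Tiny opcode table: mnemonic, total length in bytes, operand prefix.
OPCODES = {
    0xF0: ("BEQ", 2, "+"),   # relative branch; raw offset shown
    0xA9: ("LDA", 2, "#$"),  # immediate
    0xC7: ("DCP", 2, "$"),   # undocumented: DEC zp, then CMP
    0x60: ("RTS", 1, ""),
    0x00: ("BRK", 1, ""),    # shown as 1 byte, as most disassemblers do
}


def linear_sweep(code: bytes, data_offsets=frozenset()) -> list:
    """Decode `code` front to back, one instruction after another.

    Offsets in `data_offsets` are emitted as `.byte`, the equivalent of
    marking them as data in a disassembler. Unknown or truncated opcodes
    are also emitted as `.byte`.
    """
    out, pc = [], 0
    while pc < len(code):
        entry = OPCODES.get(code[pc])
        if pc in data_offsets or entry is None or pc + entry[1] > len(code):
            out.append(f".byte ${code[pc]:02X}")
            pc += 1
            continue
        mnemonic, length, prefix = entry
        if length == 1:
            out.append(mnemonic)
        else:
            out.append(f"{mnemonic} {prefix}{code[pc + 1]:02X}")
        pc += length
    return out


code = bytes([0xF0, 0x01, 0xC7, 0xA9, 0x00, 0x60])
```

A blind sweep gives `BEQ +01`, `DCP $A9`, `BRK`, `RTS`: a bogus undocumented instruction plus a BRK that was really an operand. Passing `data_offsets={2}` restores `BEQ +01`, `.byte $C7`, `LDA #$00`, `RTS`, matching the fix of marking the offending byte as data and re-analysing.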