This blog chronicles my progress porting various retro games to other retro platforms. The goal in each project - at least when targeting a new CPU - is to effectively replicate the original graphics and the original code line-by-line, to produce a 100% accurate port of the original game.
Tuesday 17 May 2016
Some lunchtime progress: the 'simple' sprite draw and erase routines. These draw and erase byte-aligned sprites, respectively.
Byte-aligned sprites are rendered/erased
Rotating is all the more complicated (=slow) without an 8-bit rotation on the 6809.
Would a table help? Might be overkill, but you should have plenty of space.
Great minds think alike (or is it, fools never differ?)
I tried the lookup table yesterday, I think it's a little slower, plus uses an extra index register. The table's still there, so once it's all working I'll go back and do a proper cycle count to see which is more efficient.
But for now I came up with:
ldb ,y
rorb
pshs cc
rola
puls cc
ror ,y
I need an 8-bit rotate because the original video data needs to be preserved after the rotation is complete. One other option is to copy 8 bytes to an intermediate buffer and rotate there.
But for now, I want the 'easiest' code option just to get it all working, and then I'll look at optimisation. I must admit it's all looking like more cycles than I bargained for.
I still haven't ruled out going back to plan A.
It looks like you need:
ROR8 [Y]
ROL8 A
Your routine takes 26 cycles.
Could this not work?
LDB ,Y ; 4
RORB ; 2
ROR ,Y ; 6
ROLA ; 2
(14 cycles)
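To check that the shorter sequence really does the same job as the PSHS/PULS version, here is a minimal Python sketch that models the carry flag by hand. The helper names (`ror8`, `rol8`, `rotate_with_stack`, `rotate_short`) are made up for illustration, and the model only tracks the carry bit, not the rest of CC.

```python
def ror8(value, carry):
    """Model ROR: rotate right through carry. Returns (result, carry_out)."""
    carry_out = value & 1
    return ((value >> 1) | (carry << 7)) & 0xFF, carry_out


def rol8(value, carry):
    """Model ROL: rotate left through carry. Returns (result, carry_out)."""
    carry_out = (value >> 7) & 1
    return ((value << 1) | carry) & 0xFF, carry_out


def rotate_with_stack(mem, a, carry=0):
    """The 26-cycle version: ldb / rorb / pshs cc / rola / puls cc / ror ,y."""
    b = mem                          # ldb ,y
    b, carry = ror8(b, carry)        # rorb    (carry = bit 0 of the byte)
    saved = carry                    # pshs cc
    a, carry = rol8(a, carry)        # rola    (clobbers carry)
    carry = saved                    # puls cc
    mem, carry = ror8(mem, carry)    # ror ,y  (bit 0 wraps round to bit 7)
    return mem, a


def rotate_short(mem, a, carry=0):
    """The 14-cycle version: ldb / rorb / ror ,y / rola."""
    b = mem                          # ldb ,y
    b, carry = ror8(b, carry)        # rorb    (carry = bit 0 of the byte)
    mem, carry = ror8(mem, carry)    # ror ,y  (carry out is bit 0 again)
    a, carry = rol8(a, carry)        # rola    (collects that bit in A)
    return mem, a
```

The reordering works because ROR ,Y leaves the same bit in carry that RORB produced, so there is nothing left to save on the stack.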
Awesome suggestion thanks! Just tried it and it works a treat - and the ISR is not taking a whole frame anymore. I'm sure there's a few more optimisations I can make too. Clearly I need to lift my game. :(
Once Space Invaders is working properly and released, I need to go back and optimise Knight Lore... hopefully I'll do a better job than I have thus far on Space Invaders.
There are a few tricks that can be used. PSHr, PULr and (especially) TFR can be pretty slow.
PSH/PUL are best when moving more than one register as they take "5+number_of_bytes" cycles.
Sometimes it is better to push a loop counter onto the stack and DEC ,S instead of having something like PSHS A/routine/PULS A/DECA.
Remember that PC is one of the registers! So you can replace:
PULS A ; 6
RTS ; 5
with:
PULS A,PC ; 8
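The "5+number_of_bytes" rule is easy to tabulate. This is a rough Python sketch, not a full timing model: `REG_BYTES` and `stack_op_cycles` are names invented here, and the RTS figure of 5 cycles is taken from the listing above.

```python
# Size in bytes of each register that PSHS/PULS can move.
REG_BYTES = {
    "CC": 1, "A": 1, "B": 1, "DP": 1,
    "X": 2, "Y": 2, "U": 2, "S": 2, "PC": 2,
}

RTS_CYCLES = 5  # as quoted in the listing above


def stack_op_cycles(registers):
    """Cycles for one PSH/PUL: 5 plus one per byte moved."""
    return 5 + sum(REG_BYTES[name] for name in registers)


separate = stack_op_cycles(["A"]) + RTS_CYCLES   # PULS A then RTS
combined = stack_op_cycles(["A", "PC"])          # PULS A,PC
```

With these figures the combined pull comes out 3 cycles cheaper than the separate PULS and RTS.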
Some TFR instructions can be replaced with LEA to save two cycles.
TFR Y,U ; 6 (2)
LEAU ,Y ; 4 (2)
The fastest I can think of is the X-flip I use in my graphic routines.
XFLIP is a table of 256 bytes, containing the flipped byte for each index. But as LDAr r,X is using r as a signed-offset, the table is ordered from $80-$FF, then $00-$7F instead of the usual $00-$FF.
Then I would do something like:
LDX #XFLIP+$80 ; 3
LDB ,Y ; 4
LDA B,X ; 5
Which is 12 cycles for one, or 9 cycles if you can keep X set up from earlier on.
You could just use the A register with this method and have B free for other things.
You did say that using up one of the index registers might be a problem, but I thought I would mention it anyway!
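For anyone wanting to generate such a table, here is a small Python sketch of the signed-offset ordering. It assumes "flipped" means bit-reversed (a horizontal mirror of the 8 pixels); `flip_byte`, `build_xflip_signed` and `lookup` are illustrative names, not anything from the original routines.

```python
def flip_byte(value):
    """Reverse the bit order of an 8-bit value (assumed meaning of 'flipped')."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def build_xflip_signed():
    """Table laid out for indices $80-$FF first, then $00-$7F."""
    order = list(range(0x80, 0x100)) + list(range(0x00, 0x80))
    return [flip_byte(index) for index in order]


def lookup(table, b):
    """Model LDA B,X with X = XFLIP+$80, where B is a signed 8-bit offset."""
    signed = b - 256 if b >= 0x80 else b
    return table[0x80 + signed]


xflip = build_xflip_signed()
```

Pointing X at the middle of the table is what lets every value of B, negative or positive, land inside the 256 bytes.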
I take it you need to do something like take 8 bytes of column-oriented bits and turn them into 8 bytes of row-oriented bits?
My 6809 skillz are rusty, but looking over "The 6809 Companion" I think the fastest would involve 8 tables of 256 bytes each. tableN[K] = 1 if bit N of K is clear, 0 otherwise. Take your source bytes and store them as the 8 bit offsets in this loop:
lda #8
ldx #table0
loop:
ldb B0,X
lslb
orb B1,X
lslb
orb B2,X
lslb
orb B3,X
lslb
orb B4,X
lslb
orb B5,X
lslb
orb B6,X
lslb
orb B7,X
stb ,U+
leax 256,X
deca
bne loop
Those 8 bit offsets are signed so that table definition is a little trickier and you'll want to start with "ldx #table0+128". Some other details to work out for sure, but I think the idea is sound.
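To see what the loop would produce, here is a rough Python model of the idea. It follows the table definition exactly as written (1 if the bit is clear), ignores the signed-offset detail, and uses the hypothetical names `build_tables` and `transpose_with_tables`.

```python
def build_tables():
    """tables[n][k] = 1 if bit n of k is clear, 0 otherwise (as stated above)."""
    return [[0 if (k >> n) & 1 else 1 for k in range(256)] for n in range(8)]


def transpose_with_tables(src, tables):
    """Model the loop: one output byte per table, built by shift-and-OR.

    src is the list of 8 source bytes B0..B7 that the loop uses as offsets.
    """
    out = []
    for n in range(8):                    # leax 256,X steps to the next table
        b = 0
        for byte in src:                  # ldb/orb Bi,X with lslb in between
            b = ((b << 1) | tables[n][byte]) & 0xFF
        out.append(b)                     # stb ,U+
    return out


tables = build_tables()
```

Output byte N collects bit N of each source byte, with B0 ending up in bit 7. With the "clear" polarity the result is the complement of a plain bit transpose, so whether that or a "set" table is wanted depends on how the video data is stored.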