Tuesday, 17 May 2016

Some lunchtime progress: 'simple' draw and erase routines for byte-aligned sprites.

Byte-aligned sprites are rendered/erased

Rotating is all the more complicated (and therefore slow) without an 8-bit rotate on the 6809. ROR and ROL rotate through the carry flag, so they behave as 9-bit rotations.
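
As an illustration, here is a small Python model (not 6809 code) of ROR as the 6809 implements it, rotating through carry, alongside the plain 8-bit rotate the CPU lacks. The function names are made up for this sketch.

```python
def ror_through_carry(value, carry):
    """Model of the 6809 ROR: a 9-bit rotate right through the carry flag.

    The old carry enters bit 7 and the old bit 0 becomes the new carry.
    """
    new_value = ((carry & 1) << 7) | ((value & 0xFF) >> 1)
    new_carry = value & 1
    return new_value, new_carry


def ror_step(state):
    """One ROR applied to a (value, carry) pair."""
    value, carry = state
    return ror_through_carry(value, carry)


def rotate_right_8(value):
    """The 8-bit rotate right the 6809 lacks: bit 0 wraps straight into bit 7."""
    value &= 0xFF
    return ((value & 1) << 7) | (value >> 1)


def repeat(fn, state, times):
    """Apply fn to state the given number of times."""
    for _ in range(times):
        state = fn(state)
    return state


# Eight 8-bit rotates always give back the original byte.
print(all(repeat(rotate_right_8, v, 8) == v for v in range(256)))  # True

# With ROR the carry is a ninth bit in the loop, so eight rotates are not enough.
print(repeat(ror_step, (0xB1, 0), 8))  # (98, 1): $62 with carry set
print(repeat(ror_step, (0xB1, 0), 9))  # (177, 0): back to $B1, carry clear
```

ROR therefore changes a byte in place unless the carry going in is that byte's own bit 0. The sequences discussed in the comments arrange exactly that.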

7 comments:

  1. Would a table help? Might be overkill, but you should have plenty of space.

  2. Great minds think alike (or is it, fools never differ?)

    I tried the lookup table yesterday; I think it's a little slower, and it uses an extra index register. The table's still there, so once it's all working I'll go back and do a proper cycle count to see which is more efficient.

    But for now I came up with:
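    ldb ,y      ; copy the video byte into B
    rorb        ; bit 0 of the byte -> carry
    pshs cc     ; save that carry
    rola        ; carry -> bit 0 of A (ROLA overwrites the carry)
    puls cc     ; restore the saved carry
    ror ,y      ; carry -> bit 7 of the byte, i.e. an 8-bit rotate right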

    I need an 8-bit rotate because the original video data needs to be preserved after the rotation is complete. One other option is to copy 8 bytes to an intermediate buffer and rotate there.

    But for now, I want the 'easiest' code option just to get it all working, and then I'll look at optimisation. I must admit it's all looking like more cycles than I bargained for.

    I still haven't ruled out going back to plan A.
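
The Python model below checks that the six-instruction sequence above preserves the video data. It tracks only the carry flag and treats [Y] as a single byte. The rotate_column helper and its layout are assumptions for illustration only: it takes eight source bytes and pulls one bit from each per output byte, loosely following the column-to-row idea raised in the comments. It is not the actual ISR.

```python
def rotate_step(mem_y, a):
    """Model of: ldb ,y / rorb / pshs cc / rola / puls cc / ror ,y.

    Returns (new byte at [Y], new A). Only the carry flag is modelled.
    """
    carry = mem_y & 1                    # ldb ,y + rorb: bit 0 of the byte -> carry
    saved = carry                        # pshs cc
    a = ((a << 1) & 0xFF) | carry        # rola: carry -> bit 0 of A (clobbers carry)
    carry = saved                        # puls cc: get the byte's bit 0 back
    mem_y = (carry << 7) | (mem_y >> 1)  # ror ,y: carry -> bit 7 (an 8-bit rotate)
    return mem_y, a


def rotate_column(src):
    """Hypothetical use: build 8 output bytes, taking one bit from each of 8 source bytes."""
    mem = list(src)
    out = []
    for _ in range(8):
        a = 0
        for i in range(8):               # Y steps through the eight source bytes
            mem[i], a = rotate_step(mem[i], a)
        out.append(a)
    return mem, out


src = [0x81, 0x42, 0x24, 0x18, 0x18, 0x24, 0x42, 0x81]
mem, out = rotate_column(src)
print(mem == src)  # True: each byte has been rotated eight times
```

Eight 8-bit rotates bring every byte back to where it started. That property is what makes an in-place rotate acceptable here, and it is why the 9-bit ROR alone is not enough.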

    1. It looks like you need:
      ROR8 [Y]
      ROL8 A

      Your routine takes 26 cycles.

      Could this not work?
      LDB ,Y ; 4
      RORB ; 2
      ROR ,Y ; 6
      ROLA ; 2
      (14 cycles)
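
The two sequences in this reply can be checked against each other exhaustively. In the Python model below (not 6809 code), each function returns the byte at [Y], the new A and the final carry, and only the carry flag is modelled. The per-instruction cycle counts are the standard 6809 figures, and they add up to the 26 and 14 quoted above.

```python
# Standard 6809 cycle counts (indexed modes with no offset; PSHS/PULS of one byte).
CYCLES = {"LDB ,Y": 4, "RORB": 2, "PSHS CC": 6, "ROLA": 2, "PULS CC": 6, "ROR ,Y": 6}

ORIGINAL = ["LDB ,Y", "RORB", "PSHS CC", "ROLA", "PULS CC", "ROR ,Y"]
SUGGESTED = ["LDB ,Y", "RORB", "ROR ,Y", "ROLA"]


def original(y, a):
    bit0 = y & 1                       # RORB: bit 0 of the byte -> carry, saved by PSHS CC
    a_out = ((a << 1) & 0xFF) | bit0   # ROLA: carry -> bit 0 of A
    y_out = (bit0 << 7) | (y >> 1)     # PULS CC restores the carry, ROR ,Y puts it in bit 7
    return y_out, a_out, bit0          # ROR ,Y leaves the old bit 0 in carry


def suggested(y, a):
    bit0 = y & 1                       # RORB: bit 0 of the byte -> carry
    y_out = (bit0 << 7) | (y >> 1)     # ROR ,Y: carry -> bit 7, carry <- bit 0 (same bit)
    a_out = ((a << 1) & 0xFF) | bit0   # ROLA: carry -> bit 0 of A
    return y_out, a_out, a >> 7        # ROLA leaves the old bit 7 of A in carry


same = all(original(y, a)[:2] == suggested(y, a)[:2] for y in range(256) for a in range(256))
print(same)                                 # True: [Y] and A always match
print(sum(CYCLES[op] for op in ORIGINAL))   # 26
print(sum(CYCLES[op] for op in SUGGESTED))  # 14
```

The two versions differ only in the carry left behind at the end. That difference matters only if the code that follows tests the carry.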

    2. Awesome suggestion, thanks! Just tried it and it works a treat, and the ISR is no longer taking a whole frame. I'm sure there are a few more optimisations I can make too. Clearly I need to lift my game. :(

      Once Space Invaders is working properly and released, I need to go back and optimise Knight Lore... hopefully I'll do a better job than I have thus far on Space Invaders.

    3. There are a few tricks that can be used. PSHr, PULr and (especially) TFR can be pretty slow.

      PSH/PUL are best when moving more than one register as they take "5+number_of_bytes" cycles.

      Sometimes it is better to push a loop counter onto the stack and DEC ,S instead of having something like PSHS A/routine/PULS A/DECA.

      Remember that PC is one of the registers! So you can replace:

      PULS A ; 6
      RTS ; 5

      with:

      PULS A,PC ; 8


      Some TFR instructions can be replaced with LEA to save two cycles (cycle count first, instruction size in brackets):

      TFR Y,U ; 6 cycles (2 bytes)
      LEAU ,Y ; 4 cycles (2 bytes)
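
The arithmetic behind these stack tips fits in a few lines of Python. The sketch below encodes the PSH/PUL rule from this reply (5 cycles plus one per byte moved) using the usual 6809 register sizes. The other constants are standard 6809 cycle counts for the instructions mentioned, and the names are invented for this example.

```python
# Bytes each register occupies on the stack.
REG_BYTES = {"CC": 1, "A": 1, "B": 1, "DP": 1, "X": 2, "Y": 2, "U": 2, "S": 2, "PC": 2}

# Standard 6809 cycle counts for the other instructions mentioned.
RTS = 5
DECA = 2
DEC_S = 6          # DEC ,S (indexed, no offset)


def stack_cycles(*regs):
    """Cycles for one PSHS/PULS (or PSHU/PULU): 5 plus one per byte moved."""
    return 5 + sum(REG_BYTES[r] for r in regs)


# Two registers in one instruction vs. two separate instructions.
print(stack_cycles("A", "B"), 2 * stack_cycles("A"))     # 7 12

# Folding the return into the pull: PULS A / RTS vs. PULS A,PC.
print(stack_cycles("A") + RTS, stack_cycles("A", "PC"))  # 11 8

# Loop counter, per iteration: PSHS A / PULS A / DECA vs. DEC ,S.
print(2 * stack_cycles("A") + DECA, DEC_S)               # 14 6
```

The stack-based counter still needs a one-off push before the loop and a pop afterwards. That cost is paid once, rather than on every pass.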

    4. The fastest I can think of is the X-flip I use in my graphic routines.

      XFLIP is a table of 256 bytes, containing the flipped byte for each index. But as LDA B,X treats B as a signed offset, the table is ordered from $80-$FF, then $00-$7F, instead of the usual $00-$FF.

      Then I would do something like:
      LDX #XFLIP+$80 ; 3
      LDB ,Y ; 4
      LDA B,X ; 5

      Which is 12 cycles for one, or 9 cycles if you can keep X set up from earlier on.

      You could just use the A register with this method and have B free for other things.

      You did say that using up one of the index registers might be a problem, but I thought I would mention it anyway!
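
The reordered table is easy to get wrong, so here is a Python sketch that models it. It assumes "flipped" means the bit order is reversed, i.e. a horizontal mirror of eight 1-bit pixels. It models LDA B,X with X = XFLIP+$80, treating B as a signed 8-bit offset and using table positions in place of addresses. The helper names are hypothetical.

```python
def flip_byte(value):
    """Reverse the bit order of a byte: bit 7 <-> bit 0, bit 6 <-> bit 1, and so on."""
    result = 0
    for bit in range(8):
        if value & (1 << bit):
            result |= 1 << (7 - bit)
    return result


def build_xflip():
    """The 256-byte XFLIP table, stored as entries $80-$FF followed by $00-$7F."""
    order = list(range(0x80, 0x100)) + list(range(0x00, 0x80))
    return [flip_byte(i) for i in order]


def signed8(value):
    """Interpret a byte the way the 6809 treats B in B,X: as -128..127."""
    return value - 0x100 if value & 0x80 else value


def lda_b_x(table, b):
    """Model of LDX #XFLIP+$80 / LDA B,X."""
    x = 0x80                       # X points $80 bytes into the table
    return table[x + signed8(b)]


XFLIP = build_xflip()
print(hex(lda_b_x(XFLIP, 0x01)))   # 0x80
print(hex(lda_b_x(XFLIP, 0xB1)))   # 0x8d
```

Pointing X at the middle of the table means the negative offsets $80-$FF land in the first half and the positive offsets $00-$7F land in the second, so a single LDA B,X covers all 256 values.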

  3. I take it you need to do something like take 8 bytes of column-oriented bits and turn them into 8 bytes of row-oriented bits?

    My 6809 skillz are rusty, but looking over "The 6809 Companion", I think the fastest would involve 8 tables of 256 bytes each. tableN[K] = 1 if bit N of K is clear, 0 otherwise. Take your source bytes and store them as the 8-bit offsets in this loop:

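    lda #8          ; one pass per table, i.e. per output byte
    ldx #table0
    loop:
    ldb B0,X        ; B0-B7: the source bytes, patched in as 8-bit offsets
    lslb
    orb B1,X
    lslb
    orb B2,X
    lslb
    orb B3,X
    lslb
    orb B4,X
    lslb
    orb B5,X
    lslb
    orb B6,X
    lslb
    orb B7,X
    stb ,U+         ; store one output byte
    leax 256,X      ; move on to the next table
    deca
    bne loop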

    Those 8-bit offsets are signed, so the table definition is a little trickier and you'll want to start with "ldx #table0+128". There are some other details to work out for sure, but I think the idea is sound.
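
Here is a Python model of that loop, useful for checking the table definition before writing the 6809 version. It keeps the comment's definition (tableN[K] = 1 when bit N of K is clear). It uses the source bytes as plain unsigned indices, so it ignores the signed-offset table layout. The function names are hypothetical.

```python
def build_tables():
    """Eight 256-entry tables: tables[n][k] is 1 if bit n of k is clear, else 0."""
    return [[0 if (k >> n) & 1 else 1 for k in range(256)] for n in range(8)]


def transpose(src, tables):
    """Model of the loop: output byte n gathers bit n from each of the 8 source bytes."""
    out = []
    for n in range(8):                                   # lda #8 ... deca / bne loop
        b = tables[n][src[0]]                            # ldb B0,X
        for k in range(1, 8):
            b = ((b << 1) & 0xFF) | tables[n][src[k]]    # lslb / orb Bk,X
        out.append(b)                                    # stb ,U+ then leax 256,X
    return out


TABLES = build_tables()
src = [0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
print([hex(b) for b in transpose(src, TABLES)])
# ['0x7f', '0xff', '0xff', '0xff', '0xff', '0xff', '0xff', '0xff']
```

Because of the "bit clear" definition, the output is the inverse of a straight transpose. Defining the tables as 1 for a set bit would give the non-inverted result.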
