6502

This section of my website is about coding for the 6502 family of processors (650x). There are several members of this family. In the 21st century, they are mainly used as embedded controllers. In the past, they were also used as controllers, but more often as the primary CPU of products from Apple, Atari, Commodore, and Nintendo (personal computers or game consoles).

Links

Here are some links to 6502 coding that you may find useful:

Multiply

The 650x processors do not have a generic multiply (or divide) machine instruction; it must be done in software! (Multiplying or dividing by two is possible with a simple shift or rotate instruction.) A simple routine has frequently been used and published which is essentially the same as the long multiplication (or division) we all learned (I hope) in elementary school. Several alternatives have been attempted. For example, using logarithms, which is analogous to using a slide rule; it can be fast but suffers from accuracy errors. Another method (I'm not sure who developed it) uses a neat algebraic identity: a*b = f(a+b) - f(a-b), where f(x) is simply x*x/4. This is perfectly accurate and much faster than "long multiply" routines! As with most programming designs, there is a cost in memory usage to gain that speed.
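The identity can be checked quickly on a modern machine. The Python sketch below is only an illustration (the function names are my own, not part of any 6502 code). It truncates x*x/4 to an integer, as a byte table would, and takes the absolute value of a-b so the argument is never negative. Truncation does no harm because a+b and a-b are always both even or both odd, so both squares lose the same fraction.

```python
def f(x):
    """Quarter square with the fraction dropped, as a lookup table would store it."""
    return (x * x) // 4

def quarter_square_multiply(a, b):
    """Multiply two unsigned values using only f(), addition and subtraction."""
    return f(a + b) - f(abs(a - b))

# Exhaustive check over every pair of unsigned 8-bit values.
assert all(quarter_square_multiply(a, b) == a * b
           for a in range(256) for b in range(256))
```

Note that a+b can reach 510, which is why a table of f() needs more than 256 entries.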

Below is the fastest code that I have developed (or have read about) to perform an 8-bit by 8-bit multiply (the product is 16 bits). It is for unsigned values. Signed numbers can be used if several changes are made (basically when a+b and a-b are calculated, the overflow flag must be tested and adjustments made when that flag is set).

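MultAX:
;Inputs:
;   A = multiplicand
;   X = multiplier
;   both unsigned
;uses self-modification (not suitable for ROM)
;uses all registers (A,X,Y)
;Results:
;   A = product (low byte)
;   X = product (high byte)
   STA getLow+1   ;patch operand low byte: table base + a
   STA getHigh+1
   SEC
   SBC idTab,X    ;A = a - b
   BCS doMult     ;no borrow: difference is not negative
   SBC #0         ;borrow: negate (carry is clear, so this subtracts 1)
   EOR #255
doMult:
   TAY            ;Y = |a - b|
getLow:
   LDA sqrLow,X   ;f(a+b) low byte (operand was patched above)
   SBC sqrLow,Y   ;minus f(|a-b|) low byte (carry is set here)
   PHA
getHigh:
   LDA sqrHigh,X  ;f(a+b) high byte
   SBC sqrHigh,Y  ;minus f(|a-b|) high byte, with borrow
   TAX
   PLA
   RTS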

   .ALIGN 256 ;tables should start on page boundary

sqrLow:
   .BYTE {0*0/4, 1*1/4, 2*2/4, 3*3/4, 4*4/4, 5*5/4, ... 511*511/4} & 255

sqrHigh:
   .BYTE {0*0/4, 1*1/4, 2*2/4, 3*3/4, 4*4/4, 5*5/4, ... 511*511/4} / 256

idTab:
   .BYTE 0, 1, 2, 3, 4, 5, 6, ... 255
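To see how the routine and its tables fit together, here is a rough Python model of it. This is a sketch under my own naming (SQR_LOW, SQR_HIGH and mult_ax are illustrative stand-ins), and it models the arithmetic only, not the self-modification or the cycle timing.

```python
# 512-entry quarter-square table, split into low and high bytes like sqrLow/sqrHigh.
SQR = [(n * n) // 4 for n in range(512)]
SQR_LOW = [v & 0xFF for v in SQR]
SQR_HIGH = [v >> 8 for v in SQR]

def mult_ax(a, x):
    """Return (low, high) bytes of a * x for unsigned 8-bit a and x."""
    total = a + x                 # formed by the patched operand plus the X index
    diff = a - x
    if diff < 0:                  # a borrow occurred, so negate to get |a - x|
        diff = -diff
    # 16-bit subtraction done one byte at a time, carrying the borrow across.
    low = SQR_LOW[total] - SQR_LOW[diff]
    borrow = 1 if low < 0 else 0
    low &= 0xFF
    high = (SQR_HIGH[total] - SQR_HIGH[diff] - borrow) & 0xFF
    return low, high

low, high = mult_ax(200, 100)
print(low + 256 * high)
```

The largest entry, 511*511/4, is 65280 ($FF00), so every entry fits in two bytes.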

The code above requires (on average) 46 CPU cycles. The "long multiplication" method requires well over 100 cycles! So it is really a trade-off: do you want speed (at a cost of 1280 = 256*5 bytes of tables) or small size (no tables, but more than 2x slower)? Only you can decide!
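For comparison, the "long multiplication" approach is usually some form of shift-and-add. The Python sketch below shows the general idea only; it is not a translation of any particular published 6502 routine, and the name long_multiply is my own.

```python
def long_multiply(a, b):
    """Shift-and-add multiply of two unsigned 8-bit values (16-bit product)."""
    product = 0
    for _ in range(8):        # one pass per bit of the multiplier
        if b & 1:             # low bit set: add the shifted multiplicand
            product += a
        a <<= 1               # multiplicand moves one binary place left
        b >>= 1               # expose the next multiplier bit
    return product & 0xFFFF

print(long_multiply(200, 100))
```

Eight passes, each with a test, a conditional add and shifts, are what the table-driven version avoids.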

An alternate version which needs 256 fewer bytes for tables (only 1024 instead of 1280, or 20% fewer bytes) can easily be accomplished by slowing down the code by 2 cycles (so 48 instead of 46, or 4% slower) by changing this line:

SBC idTab,X

into this:

STX diff+1
diff:
SBC #0 ;operand is modified!

Some experienced 650x programmers may be wondering where the addition (ADC) is being performed, since I already said it is based on f(a+b) - f(a-b). Well, the answer is the CPU's own addressing mode: indexed by X. In other words, instructions like LDA sqrLow,X are implicitly doing the addition when the CPU adds the X register to the base address (whose low byte was overwritten with A by the STA instructions at the start of the routine).
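The address arithmetic behind that trick can be sketched in a few lines of Python. The addresses $C000 and $C080 are made up for illustration, and patched_address is a hypothetical helper, not part of the routine. The sketch also suggests why the tables should start on a page boundary: overwriting the low byte only adds A to the base if that byte was zero to begin with.

```python
def patched_address(base, a, x):
    """Address read by 'LDA base,X' after the operand's low byte is replaced by a."""
    operand = (base & 0xFF00) | a   # the STA into the operand changes only its low byte
    return operand + x              # the CPU then adds the X index

aligned = 0xC000      # hypothetical page-aligned table address
unaligned = 0xC080    # hypothetical table address that is not page-aligned

print(patched_address(aligned, 200, 100) == aligned + 200 + 100)
print(patched_address(unaligned, 200, 100) == unaligned + 200 + 100)
```

With the aligned base the patched instruction reads entry a+x of the table; with the unaligned base it reads the wrong location.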

You can reduce the size of tables further (only 256 entries instead of 512) if you use real ADC (not the sneaky index by X trick just described) and apply fix-up code at the end. That version is a bit messy and about 50% slower so I won't show it here.

Mirror Byte

There are several ways to accomplish this task (reverse the bits within a byte). A lengthy discussion can be found here. To spare you all the details, below is a quick summary of the results from several contributors:

Name	Size (bytes)	Delay (cycles)	Speed (65536/cycles)	Efficiency (speed/bytes)	Power (speed²/bytes)
Generic	10	100	655	66	42.9k
Mafiosino	9	84	780	87	67.6k
H2Obsession	37	32	2048	55	113k
Mega-Table	260	6	10923	42	459k

Mafiosino has the best efficiency: about 3x slower than my code, but he uses 4x fewer bytes! Once again, it really depends on your priority of speed versus size. In summary (0% bias), you should use Mafiosino if size is your primary concern, or a Mega-Table if speed is your primary concern... but I still believe my idea is great if you want to compromise. (I hate to muddy the waters, but there are other factors like RAM and register use to consider too.)

Anyway, here is the code for each...
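The derived columns of the table follow directly from the size and cycle counts. As a sketch (the dictionary and function names are my own), they can be recomputed like this:

```python
# name: (size in bytes, delay in cycles), copied from the table
ROUTINES = {
    "Generic": (10, 100),
    "Mafiosino": (9, 84),
    "H2Obsession": (37, 32),
    "Mega-Table": (260, 6),
}

def metrics(size, cycles):
    """Return (speed, efficiency, power) as defined by the table headers."""
    speed = 65536 / cycles
    return speed, speed / size, speed * speed / size

for name, (size, cycles) in ROUTINES.items():
    speed, efficiency, power = metrics(size, cycles)
    print(f"{name:12} {speed:8.0f} {efficiency:6.0f} {power / 1000:8.1f}k")
```

Because the sketch does not round the speed before squaring it, the last digit of a power figure may differ slightly from one computed from the rounded speed.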

Generic:
   LDX #7
Loop:
   ASL            ;top bit of A goes into carry
   ROR temp       ;carry goes into the top of temp
   DEX
   BPL Loop
   LDA temp

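Mafiosino:
   STA temp
   LDA #1         ;sentinel bit
Loop:
   LSR temp       ;low bit of temp goes into carry
   ROL            ;carry goes into the bottom of A
   BCC Loop       ;done when the sentinel bit falls out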
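The two loop-based routines can be modelled in Python to check that they really reverse the bits. This is only a sketch: the function names are mine, and the carry flag is tracked as an ordinary variable.

```python
def reverse_generic(value):
    """Counted loop: each bit leaves the top of A and enters the top of temp."""
    a, temp, carry = value, 0, 0
    for _ in range(8):                                       # LDX #7 ... DEX / BPL
        carry, a = a >> 7, (a << 1) & 0xFF                   # ASL
        carry, temp = temp & 1, (carry << 7) | (temp >> 1)   # ROR temp
    return temp

def reverse_sentinel(value):
    """Sentinel loop: the 1 preloaded into A falls out of bit 7 after 8 rotates."""
    temp, a = value, 1                                   # STA temp / LDA #1
    while True:
        carry, temp = temp & 1, temp >> 1                # LSR temp
        carry, a = a >> 7, ((a << 1) | carry) & 0xFF     # ROL
        if carry:                                        # BCC falls through
            return a

print(f"{reverse_generic(0b00010110):08b}", f"{reverse_sentinel(0b00010110):08b}")
```

The sentinel version needs no loop counter, so it never touches the X register.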

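H2Obsession:
   TAY
   AND #$0f
   TAX            ;X = low nibble
   TYA
   LSR            ;four shifts move the high nibble into bits 0-3
   LSR
   LSR
   LSR
   TAY            ;Y = high nibble
   LDA revTab,y   ;reversed high nibble (in both halves)
   EOR revTab,x
   AND #$0f       ;keep only the low half
   EOR revTab,x   ;reversed low nibble lands in the high half
;....
revTab:
.byte $00, $88, $44, $cc, $22, $aa, $66, $ee
.byte $11, $99, $55, $dd, $33, $bb, $77, $ff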
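The EOR / AND / EOR sequence is easier to follow in a model. The Python sketch below uses the same 16-entry table, where each entry holds a reversed nibble duplicated in both halves of the byte; the function name is my own.

```python
REV_TAB = [0x00, 0x88, 0x44, 0xCC, 0x22, 0xAA, 0x66, 0xEE,
           0x11, 0x99, 0x55, 0xDD, 0x33, 0xBB, 0x77, 0xFF]

def reverse_nibble_table(value):
    """Reverse the bits of a byte using two lookups in a 16-entry table."""
    x = value & 0x0F        # low nibble
    y = value >> 4          # high nibble
    a = REV_TAB[y]          # rev(high) in both halves
    a ^= REV_TAB[x]         # rev(high) ^ rev(low) in both halves
    a &= 0x0F               # keep that mix only in the low half
    a ^= REV_TAB[x]         # high half becomes rev(low), low half rev(high)
    return a

print(f"{reverse_nibble_table(0b00010110):08b}")
```

The second EOR cancels the low nibble's contribution in the low half, which leaves the two reversed nibbles swapped into place without any shifting of the result.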

Mega_Table:
   TAX
   LDA revTab,x
revTab:
.byte $00, $80, $40, $c0, $20, $a0, $60, $e0 ...
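Nobody wants to type 256 table entries by hand. One option is to generate them with a small script; the Python sketch below is one way to do it (the function names and the eight-per-line layout are just my choices), and its first line of output matches the start of the table above.

```python
def reverse_bits(value):
    """Reverse the 8 bits of a byte via its binary string."""
    return int(f"{value:08b}"[::-1], 2)

def mega_table_lines(per_line=8):
    """Return the 256-entry reversal table as .byte lines of assembler source."""
    table = [reverse_bits(n) for n in range(256)]
    lines = []
    for start in range(0, 256, per_line):
        row = table[start:start + per_line]
        lines.append(".byte " + ", ".join(f"${b:02x}" for b in row))
    return lines

print("\n".join(mega_table_lines()))
```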