Both stacks in the 6809 build downwards and pop upwards. At any time the stack pointer points at the last byte pushed, that is, at the location just above the one into which the next byte will be pushed. So the push sequence is: decrement the stack pointer, then store the byte where the pointer now points. The pull (or pop) sequence is the direct opposite: the byte is read from the location pointed at by the pointer, and then the pointer is incremented.
For example, the following pairs are equivalent in function (apart from the condition codes, which STA and LDA alter but PSHS A and PULS A leave alone):
STA ,-S => PSHS A
LDA ,S+ => PULS A
Any information on the stack may be accessed using the indexed addressing
mode. The top of the stack may always be looked at using the operand S
(or U). Other bytes (or words) on the stack may be accessed using an
offset from S or U. This offset may be a constant or be held in a
register: A, B or D (a sixteen bit register pair). For example:
LDA 4,S ; 5th byte down on stack
LDA B,U ; U offset by B
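The stack pointers themselves may be manipulated using the LEAS and LEAU
instructions, which take only the indexed addressing mode. Either stack
pointer may be moved up or down by giving a constant offset. (The
post-increment forms such as LEAS ,S+ should not be used for this: the
effective address is taken before the increment and then loaded back
into the register, so the pointer is left unchanged.)
LEAS 1,S ; Increments S
LEAU 4,U ; Adds 4 to U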
Many operations may be done directly on data stored on the stack, using
the indexed mode, with U or S as the base register (after the comma). To
write re-entrant code which uses locations in memory to store temporary
variables, it is necessary to allocate this space on the stack to ensure
that each invocation of the routine gets a new area of memory for its
variables. Variables whose space is allocated this way are called
"automatic" variables because the actual locations used are automatically
allocated each time the routine is called.
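As an illustration of automatic variables, here is a minimal sketch of a routine that reserves three bytes on the S stack on entry and releases them before returning. The routine name and the use made of the variables are invented for the example. It assumes nothing else is pushed in between, since any further push would shift the offsets.

```asm
; Hypothetical routine with two automatic variables:
; a one-byte count at 0,S and a two-byte total at 1,S.
autodemo: leas  -3,s       ; allocate 3 bytes of locals below the return address
          clr   ,s         ; count = 0
          clr   1,s        ; total (high byte) = 0
          clr   2,s        ; total (low byte) = 0
          ; ... body of the routine works on 0,s and 1,s ...
          inc   ,s         ; e.g. count = count + 1
          ldd   1,s        ; fetch total into D
          addd  #5         ; e.g. total = total + 5
          std   1,s        ; and store it back
          leas  3,s        ; release the locals
          rts              ; S now points at the return address again
```

Because the locals live at offsets from S, a second invocation (recursive, or from an interrupt) gets its own three bytes further down the stack. D is destroyed by this sketch.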
The stack may also be used to pass parameters to a routine and to receive the results. The calling routine must allocate space on the stack for the parameters and then call the subroutine. The subroutine then picks up the parameters from the stack, processes them and returns. The calling routine must then adjust the stack pointer, using LEAS, to restore the stack to its original state. If results are required, the calling routine must also allocate space on the stack for them before the call, and deallocate the space when the subroutine has returned, as for the parameters. Inside the subroutine, the locations for the parameters and results are then reached via a constant offset from S.
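The calling convention just described might be sketched as follows. The routine names (caller, addpar) and the values passed are made up for the example, and the subroutine simply adds two byte parameters.

```asm
; Caller: pass two bytes on the stack and collect a one-byte result.
caller:  leas  -1,s        ; allocate one byte for the result
         lda   #7          ; first parameter
         ldb   #5          ; second parameter
         pshs  a,b         ; push both: A ends up at 0,S and B at 1,S
         jsr   addpar
         leas  2,s         ; discard the two parameters
         puls  a           ; collect the result and free its slot
         rts

; Subroutine: on entry the return address is at 0,S and 1,S,
; the parameters are at 2,S and 3,S, and the result slot is at 4,S.
addpar:  lda   2,s         ; first parameter
         adda  3,s         ; add the second parameter
         sta   4,s         ; leave the sum where the caller expects it
         rts
```

Note that the offsets seen by the subroutine are two greater than those seen by the caller, because JSR has pushed the two-byte return address on top of the parameters. A is destroyed by addpar.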
For example, converting $3F5A to decimal by repeated division by ten:
Number to be converted:       $3F5A
Divide by ten:                $655  remainder 8
Divide by ten again:          $A2   remainder 1
Divide by ten yet again:      $10   remainder 2
Divide by ten one more time:  $1    remainder 6
Positively the last time:     $0    remainder 1
So $3F5A, when converted to decimal, is 16218 (the remainders read in reverse order).
The major drawback to this method is that the digits are
not generated in a convenient order. We would like them to
be generated most significant digit first, so that we can
print them as we produce them, but this method yields the
least significant digit first. So we need to reverse the
order of the digits before printing them. Needless to say
there are many different ways of achieving this end; we
could store the digits as a list in memory and then print
them out in reverse order, for example.
Another way would be to push the digits onto the stack as we generate them and then pop them off for printing when all the digits have been produced. Both these methods require that we maintain a digit counter to keep track of the number of digits we have generated, and the first method also requires a reserved area of memory for the use of the conversion routine. A third method, which uses recursion, will be explained below.
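The push-and-count method might look something like the sketch below. It relies on the 'divten' and 'outdigit' routines that the recursive listing further on also assumes, and it additionally assumes that both of them preserve B, which is used here as the digit counter. The labels bindec2, nextdig and prtdig are invented for the example.

```asm
; Iterative conversion: stack the digits, counting them in B,
; then pop and print them most significant digit first.
bindec2: clrb              ; digit count = 0
nextdig: jsr   divten      ; X = X / 10, A = remainder
         pshs  a           ; stack this digit
         incb              ; and count it
         cmpx  #0          ; anything left to convert?
         bne   nextdig     ; yes, produce the next digit
prtdig:  puls  a           ; last digit stacked is the most significant
         jsr   outdigit    ; print it
         decb              ; one fewer digit to print
         bne   prtdig
         rts
```

At least one digit is always stacked, so a value of zero prints as a single 0. If 'outdigit' cannot be trusted to preserve B, the counter would have to be kept somewhere safer, for example on the stack beneath the digits.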
The routine first divides the number by ten and stacks the resulting digit. It then calls itself to print the more significant digits before unstacking and printing its own digit. For instance, let us print the decimal number 145 (in binary within the 6809 of course). The routine first divides by ten to yield 14 and a remainder of 5, which it stacks. It then calls itself, passing 14 as the number. The called routine divides 14 by ten to yield 1 and a remainder of 4, which it stacks. The third calling of the routine divides 1 by ten to get 0 and a remainder of 1. As the number is now zero, it prints the 1 and returns to its caller. The caller pops 4, which it stacked previously, and prints it before returning to its caller, which pops and prints 5. The first called routine then returns to the routine which originally called it, the task having been accomplished.
; Recursive Binary to Decimal Conversion routine.
; The number to be converted is passed in X.
; X returns as zero and A is destroyed.
; The existence of 'divten' and 'outdigit' is assumed
; but the routines are not given here. 'divten'
; divides X by ten, returning the result in X
; and the remainder in A. 'outdigit' outputs the digit
; in A to the terminal or whatever.
bintodec: jsr   divten    ; divide X by 10
          pshs  a         ; save remainder
          cmpx  #0        ; finished?
          beq   print     ; if so, print digits
          jsr   bintodec  ; not done, get next digit
print:    puls  a         ; retrieve digit
          jsr   outdigit  ; print it
          rts             ; and return
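The listing leaves 'divten' and 'outdigit' to the reader. One possible sketch of each is given here, purely as an assumption about how they might be written: 'divten' uses ordinary shift-and-subtract division over 16 bits, and 'outdigit' hands the ASCII digit to a hypothetical character output routine called 'putchar', which depends entirely on the hardware in use.

```asm
; divten: X = X / 10, A = remainder (0 to 9). B is preserved.
divten:  pshs  b,x         ; save B; X on the stack becomes the working value
         clra              ; remainder = 0
         ldb   #16         ; 16 bits to process
dtloop:  asl   2,s         ; shift working value left (low byte first)
         rol   1,s         ; top bit of the high byte falls into carry
         rola              ; and is shifted into the remainder
         cmpa  #10         ; does ten go into the remainder?
         blo   dtnext      ; no: this quotient bit stays 0
         suba  #10         ; yes: take ten off
         inc   2,s         ; and set the quotient bit just shifted in
dtnext:  decb
         bne   dtloop
         puls  b,x,pc      ; restore B, X = quotient, and return

; outdigit: print the digit held in A (0 to 9).
outdigit: adda #$30        ; convert binary 0..9 to ASCII '0'..'9'
          jmp  putchar     ; hypothetical output routine; its RTS returns to our caller
```

The quotient is built up in the same two stack bytes that hold the dividend, so no fixed memory locations are needed and the routine remains re-entrant.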
Some features of this program are worth noting:
When you start writing real programs, particularly in Machine/Assembly Code, you soon discover the need for subroutines. These are sections of code which you want to use over and over again, in much the same way that you use procedures in high level languages.
For example, in the lab task to write software for the Pelican Crossing you need a delay routine like the one written in lab session 1:
Set traffic lights to AMBER
Delay for 10 secs
Set traffic lights to RED
Delay for 2 secs
Set CROSS lights on and beeper on
Delay for 30 secs
etc.
Obviously it would be wasteful in terms of programming time and
memory to write out the delay routine each time it is required -
ideally we want to write it out once and call it up as needed.
This is exactly what we mean by a subroutine.
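In 6809 terms the idea might be sketched as below. The routines 'amber' and 'red' and the loop count 50000 are hypothetical. The count would have to be calibrated against the processor clock, so the "units" of delay here are not claimed to be seconds.

```asm
; Calling sequence: the delay code exists once and is called as needed.
lights:  jsr   amber       ; hypothetical: set traffic lights to AMBER
         lda   #10
         jsr   delay       ; wait 10 units
         jsr   red         ; hypothetical: set traffic lights to RED
         lda   #2
         jsr   delay       ; wait 2 units
         ; ... and so on ...

; delay: wait for the number of units given in A (A = 0 means 256).
; A and X are destroyed.
delay:   ldx   #50000      ; hypothetical inner count, to be calibrated
dloop:   leax  -1,x        ; count X down; LEAX sets Z when X reaches zero
         bne   dloop
         deca              ; one unit done
         bne   delay       ; repeat for the remaining units
         rts               ; back to the instruction after the JSR
```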
Jumping to the start of a subroutine is no problem at all - we have seen how branch and jump instructions can be implemented. The problem comes at the end of the subroutine when control has to return to the section of program from which it came.
Set traffic lights to AMBER
Call DELAY subroutine
Set traffic lights to RED
Call DELAY subroutine
etc.
This means that when a subroutine is called the processor must
somehow store the address to which control should return - in
other words the address of the instruction which follows the call
subroutine instruction. There are many ways in which this problem
has been tackled.
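One approach is to hold the return addresses in a small stack of registers inside the processor, selected by a pointer. The listing below shows the pointer and the first three registers at each stage:
Address  Instruction     Pointer  Return address registers
1000     JSR $1200       000      0000 0000 0000
1003     Next Inst
 :
1200     Start of SR1    001      1003 0000 0000
 :
122A     JSR $13A0
122D     Next Inst       001      1003 0000 0000
 :
1235     RTS
13A0     Start of SR2    002      1003 122D 0000
 :
13D8     RTS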
The disadvantage of this approach is that no matter how
many registers are provided people will always want more, and
it adds to the processor complexity. The advantage is that
it is fast, since all transfers are internal to the
processor. (A number of Intel processors such as the 8048 have
used this approach.)
The alternative is to keep the return addresses on a stack in external memory, addressed by a stack pointer (SP). Since addresses are 16 bits and memory locations in the 6809 can only store 8 bits, two locations are needed to store each return address. The exact sequence of events when a JSR is executed is shown below for the JSR at $1000 above (return address $1003), with SP initialised to $2000:
(SP)-1 -> SP      $2000-1 = $1FFF -> SP
(PCL)  -> (SP)    $03 -> $1FFF
(SP)-1 -> SP      $1FFF-1 = $1FFE -> SP
(PCH)  -> (SP)    $10 -> $1FFE
When an RTS is executed the reverse process takes place:
((SP)) -> PCH     ($1FFE) = $10 -> PCH
(SP)+1 -> SP      $1FFE+1 = $1FFF -> SP
((SP)) -> PCL     ($1FFF) = $03 -> PCL
(SP)+1 -> SP      $1FFF+1 = $2000 -> SP
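Because the return address is left at the top of the stack with its high byte at the lower address, a subroutine can read it like any other 16-bit value. A minimal sketch (the label 'whoami' is made up for illustration):

```asm
whoami:  ldx   ,s          ; X = return address: PCH from 0,S, PCL from 1,S
         rts               ; S was not moved, so RTS works as usual
```

With the figures used above, a JSR at $1000 to this routine with SP initialised to $2000 would bring X back holding $1003.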
Using this technique the user can allocate as much space as
is necessary for the return address stack. Notice, though,
that there are no set boundaries, so that if you had:
$1000 JSR $1000
the whole of memory would quickly be used up by the stack.
This is an obvious situation which can be easily avoided but
there are more subtle ones which can have the same effect.
There is no standardisation between processors as to the order in which the H and L bytes of the PC are stored. Nor is the pre-decrement, post increment for JSR and RTS fixed.
Nevertheless this is the method of handling return addresses which is used in all 8 & 16 bit processors these days. Its only disadvantage is the length of time it takes to store the address bytes in external memory.
A subroutine will usually need to use registers whose contents the calling program still wants. One solution is for the subroutine to store away the register contents in set memory locations before using them and to retrieve these values before returning. The problem is where these values should be stored. Ideally we are writing subroutines which can be used in many different programs, or we may even be using other people's or commercially available routines. If set memory locations are used they may clash with program or data areas in another application.
A better solution is to use the stack, which the user always sets to be in an unwanted area of memory. At the start of the routine the contents of any registers which will be changed are stored on the stack, and they are retrieved before the RTS is executed. Putting data onto the stack is known as Pushing data and retrieving it is known as Pulling. The 6809 has instructions for doing this known as Push (PSH) and Pull (PUL); the opcodes for these are 34 and 35 (hexadecimal). Each of these has an additional instruction byte in which each bit indicates one register to be stored or retrieved, as follows:
LSBit 0 - Condition Code Register
      1 - Accumulator A
      2 - Accumulator B
      3 - Direct Page Register
      4 - Index Register X
      5 - Index Register Y
      6 - Other Stack Pointer (U when pushing on S, S when pushing on U)
MSBit 7 - Program Counter
Notice that the first four are 8 bit registers requiring only one
byte to be stored and that the last four are 16 bit registers
requiring two bytes to be stored.
It is rarely necessary to store the PC or a stack pointer on the stack, and obviously the more registers stored the longer the instruction takes (5 cycles plus 1 for each byte stored, 17 at most).
34 FF    Pushes all registers
34 12    Pushes A and X
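In assembly language the register list is written out and the assembler builds the second byte. As a sketch, a subroutine that uses A and X internally but hands them back unchanged might look like this. The routine 'clrbuf' and the 16-byte area 'buffer' are hypothetical.

```asm
; clrbuf: clear a 16-byte buffer, preserving A and X for the caller.
clrbuf:  pshs  a,x         ; assembles to 34 12: save A and X (3 bytes)
         ldx   #buffer     ; 'buffer' is a hypothetical 16-byte data area
         lda   #16         ; number of bytes to clear
cloop:   clr   ,x+         ; clear one byte and step X on
         deca
         bne   cloop
         puls  a,x         ; assembles to 35 12: restore A and X
         rts
```

The last two instructions can be combined as PULS A,X,PC (35 92), which restores the registers and pulls the return address into the program counter in one go.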
Clearly the number of bytes pushed at the start of the subroutine
must equal the number of bytes pulled at the end otherwise the
return address will not be correct and the program will crash. To
avoid this happening the 6809 provides two stack pointers U and
S, the User and System stack pointers. The JSR and RTS always use
the S pointer but the Push and Pull instructions can use either.
If the User stack is always used then there is little chance of
the return address being corrupted. The codes given are for the
system stack but the user stack works in the same way.
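A sketch of this arrangement is given below. The label 'ustack' (the top of an area set aside for the user stack) and the routine 'scale' are assumptions made for the example. The user stack versions of the instructions are PSHU and PULU, with opcodes 36 and 37.

```asm
start:   ldu   #ustack     ; 'ustack' is a hypothetical label just above a free data area
         ; ... rest of initialisation ...

; scale: hypothetical routine that needs A and B as workspace.
scale:   pshu  a,b         ; save A and B on the U stack
         ; ... body free to use A and B ...
         pulu  a,b         ; restore them from the U stack
         rts               ; return address is taken from the S stack
```

Even if the pushes and pulls on U were mismatched by mistake, the return address on the S stack would be untouched, although the data on the U stack would then be out of step.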