If we can get the computer to do a simple translation such as hex to binary, then we can get it to do a slightly more useful translation: mnemonics to machine code. The mnemonics are easily remembered shorthand versions of the instructions we have already met: LDA, LSR, ADDA, etc. A program which converts mnemonics to machine code is called an Assembler, and the language based on these mnemonics is known as Assembly Code.
The Assembler uses a look-up table to convert each mnemonic into the appropriate code. An instruction like DECA is simple: the code is 4A. With an instruction like LDA, however, there are many possible codes depending upon the addressing mode. The program must therefore specify which mode should be used.

    LDA #$FF      - Immediate addressing      86
    LDA <$56      - Direct Page addressing    96
    LDA $FE00     - Extended addressing       B6
    LDA $0F,X     - Indexed addressing        A6
    LDA [$2020]   - Indirect addressing       A6

Thus there is a precise format for the 'address' part of the instruction which indicates which mode to use. In the last two cases the instruction code is the same but the 'post-byte' or second byte of the instruction is different.
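The look-up described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `LDA_OPCODES`, `classify_operand` and `lda_opcode` are hypothetical, the opcode values are the five listed above, and the mode is guessed purely from the written form of the operand.

```python
# Opcode table for LDA, keyed by addressing mode (values from the list above).
LDA_OPCODES = {
    "immediate": 0x86,
    "direct": 0x96,
    "extended": 0xB6,
    "indexed": 0xA6,
    "indirect": 0xA6,  # same opcode as indexed; only the post-byte differs
}

def classify_operand(operand):
    """Guess the addressing mode from the way the operand is written."""
    if operand.startswith("#"):
        return "immediate"
    if operand.startswith("<"):
        return "direct"
    if operand.startswith("[") and operand.endswith("]"):
        return "indirect"
    if "," in operand:
        return "indexed"
    return "extended"

def lda_opcode(operand):
    """Look up the LDA instruction code for a given operand string."""
    return LDA_OPCODES[classify_operand(operand)]

print(f"{lda_opcode('#$FF'):02X}")   # 86
print(f"{lda_opcode('$FE00'):02X}")  # B6
```

A full assembler would hold one such table per mnemonic and would also generate the post-byte; that part is left out here.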
So the first thing we can get the assembler to do is to look up the instruction codes for us - but in fact it can do a great deal more.
We have all had problems with the relative addressing used in Branch instructions even though it is really a fairly simple calculation. The assembler can do this calculation for us.

    Assembler Input            Assembler Output
    (Source Code)              (Object Code)

    $0030   LDX   #$C646       8E C6 46
    $0033   LEAX  -1,X         30 1F
    $0035   BNE   $0033        26 FC

By now this will be a very familiar loop. The assembler works out the difference between the destination address and the address of the next instruction and expresses it in two's complement. Here the destination is $0033 and the next instruction is at $0037, so the offset is -4, which is FC. Moreover, the assembler will use the normal (short) branch instruction with an 8-bit offset, or the long branch if the offset is outside the range -128 to +127.
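The offset calculation for a short branch can be checked with a small Python sketch. The function name `branch_offset` and its parameters are hypothetical; the only facts used are those in the listing above, namely that a short branch occupies two bytes and that the offset is measured from the address of the next instruction.

```python
def branch_offset(branch_addr, target_addr, instr_len=2):
    """Return the 8-bit two's-complement offset byte for a short branch."""
    next_addr = branch_addr + instr_len   # address of the following instruction
    offset = target_addr - next_addr      # signed distance to the destination
    if not -128 <= offset <= 127:
        raise ValueError("offset needs more than 8 bits; use a long branch")
    return offset & 0xFF                  # two's-complement encoding in one byte

# The BNE at $0035 branching back to $0033:
print(f"{branch_offset(0x0035, 0x0033):02X}")  # FC
```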
The next thing we can get the assembler to do is to keep track of all the addresses in the program. Rather than specifying the destination of a branch explicitly as an address we can give the appropriate location a name and simply refer to the name from then on. The assembler works out what actual address the name corresponds to and works out the offset accordingly.

            ORG   $0030      ;Program starts at $0030
            LDX   #$C646
    LOOP    LEAX  -1,X
            BNE   LOOP

The Assembler works out that the address of the instruction labelled LOOP is $0033 and uses this number in the calculation of the offset. This makes editing the program a great deal easier since we can add lines and the assembler will work out all the new addresses. We can have as many labels like this as we like and they can refer to instruction locations as shown here, or any other locations in memory.
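How the assembler arrives at $0033 for LOOP can be sketched as a running location counter. In this Python illustration the `SIZES` table and the function `build_symbols` are hypothetical; the instruction lengths are simply read off the object code shown earlier, whereas a real assembler derives them from the opcode and addressing mode.

```python
# Instruction lengths in bytes, read off the object code shown earlier.
# (Hypothetical table for illustration only.)
SIZES = {"LDX #$C646": 3, "LEAX -1,X": 2, "BNE LOOP": 2}

def build_symbols(origin, lines):
    """lines is a list of (label_or_None, instruction_text) pairs."""
    symbols = {}
    addr = origin                  # location counter, set by ORG
    for label, text in lines:
        if label is not None:
            symbols[label] = addr  # a label takes the current address
        addr += SIZES[text]        # advance past this instruction
    return symbols

program = [(None, "LDX #$C646"), ("LOOP", "LEAX -1,X"), (None, "BNE LOOP")]
symbols = build_symbols(0x0030, program)
print(f"LOOP = ${symbols['LOOP']:04X}")  # LOOP = $0033
```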

    ORB     EQU   $FE00
    INA     EQU   $FE01
    DDRB    EQU   $FE02
    DDRA    EQU   $FE03
            ORG   $0020
            LDA   #$FF
            STA   DDRB
    LOOP1   LDA   INA
            STA   ORB
            BRA   LOOP1

The first four statements simply equate each label to a value, so that whenever that label is encountered again the appropriate value is substituted. This means that labels need only be defined once, at the beginning of the program. If we wish to change the actual addresses then we only have to change the definition statements and re-assemble the program for it all to work correctly.
The assembler works by reading the source code program and building up a symbol table which lists all the labels and their equivalent values. This causes no problems with the programs listed so far, but we often want to refer to locations by label before they have been encountered.

            ADDA  0,X
            BCC   NOCARRY
            INC   MSBYTE
    NOCARRY DECB
            etc.

Here the label NOCARRY is encountered by the assembler before it has been given a value. In addition the assembler does not know whether the offset will fit into 8 bits or 16 bits which in turn affects the addresses of all subsequent instructions and indeed the address of NOCARRY itself. To overcome this the assembler will read through the program a number of times. The first time it builds up the symbol table assuming a 16 bit offset will be required in a situation like that above. It then reads through again knowing the actual addresses and correcting the symbol table as necessary to take account of the fact that only an 8 bit offset is needed. It is of course possible that this shortening of the program brings offsets into 8 bit range that were not previously. So the assembler must keep on reading through until no further shortening is possible. It then does a final read through translating all the codes using the final version of the symbol table.
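The repeated read-through can be sketched roughly as follows. This Python illustration is not the actual assembler's algorithm in detail: the names `layout` and `relax` are hypothetical, the branch lengths (2 bytes short, 4 bytes long) are assumed values for conditional branches, and the lengths given to ADDA, INC and DECB are assumptions made only so the example has numbers to work with.

```python
SHORT, LONG = 2, 4  # assumed byte lengths of short and long conditional branches

def layout(items, origin):
    """Assign an address to every item and collect the label addresses."""
    addr, addrs, symbols = origin, [], {}
    for item in items:
        if item.get("label"):
            symbols[item["label"]] = addr
        addrs.append(addr)
        addr += item["size"]
    return addrs, symbols

def relax(items, origin=0):
    """Assume long branches first, then shorten until nothing changes."""
    for item in items:
        if "target" in item:
            item["size"] = LONG
    passes = 0
    while True:
        passes += 1
        addrs, symbols = layout(items, origin)
        changed = False
        for addr, item in zip(addrs, items):
            if item.get("target") and item["size"] == LONG:
                offset = symbols[item["target"]] - (addr + LONG)
                if -128 <= offset <= 127:
                    item["size"] = SHORT   # 8-bit offset is enough
                    changed = True
        if not changed:
            return symbols, passes

items = [
    {"size": 2},                      # ADDA 0,X   (assumed 2 bytes)
    {"target": "NOCARRY"},            # BCC NOCARRY
    {"size": 3},                      # INC MSBYTE (assumed extended, 3 bytes)
    {"label": "NOCARRY", "size": 1},  # DECB       (assumed 1 byte)
]
symbols, passes = relax(items)
print(symbols["NOCARRY"], items[1]["size"], passes)  # 7 2 2
```

Because shortening a branch can only bring other addresses closer together, a branch that has been shortened never needs to be lengthened again, so the loop always terminates.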
The assembler is therefore a fairly sophisticated software package. In some cases it is possible for the assembler to run on the computer system it is assembling programs for; the BBC microcomputer, for example, had a built-in assembler. In the case of the simple lab kits there is not nearly enough memory to run such a large program, and so if we want to run an assembler we have to do it on a different computer system. In this case it is referred to as a Cross-Assembler. The assembler we have runs on the PCs in 15A and is such a cross-assembler. It is one produced by Motorola, and the format of the source code conforms to the standard Motorola format, which you may find in a number of books on the subject.
ORG $xxxx | Tells the assembler to enter the next instruction (or data byte) at the location xxxx. |
END | Marks end of source code. |
EQU | Equates a symbol to a value. |
FCB $0F | Form a Constant Byte - Allocates a memory location and enters the data specified. Can be used to create tables of values in memory. |
FDB $1234 | Form a Double Byte - Allocates a pair of memory locations and enters the data specified - MSByte first. |
FCC 'Message' | Form Constant Characters - Allocates sufficient memory bytes to hold the string enclosed in quotation marks. The ASCII values of the characters are entered into the following locations. |
RMB $20 | Reserve Memory Bytes - Reserves the specified number of memory locations without entering any data into them. |
; | Whatever follows on the line is a comment and should be ignored. |
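The effect of the data directives in the table can be illustrated with a short Python sketch. The function `emit` is hypothetical and only shows which bytes each directive would place in memory; for RMB it returns zero bytes as a stand-in, since the directive itself leaves the reserved locations untouched.

```python
def emit(directive, operand):
    """Return the bytes a data directive would occupy in memory."""
    if directive == "FCB":
        return bytes([operand & 0xFF])        # one byte
    if directive == "FDB":
        return operand.to_bytes(2, "big")     # two bytes, MSByte first
    if directive == "FCC":
        return operand.encode("ascii")        # one ASCII byte per character
    if directive == "RMB":
        return bytes(operand)                 # placeholder zeros; nothing is really written
    raise ValueError(f"unknown directive {directive}")

print(emit("FCB", 0x0F).hex())       # 0f
print(emit("FDB", 0x1234).hex())     # 1234
print(emit("FCC", "Message").hex())  # 4d657373616765
print(len(emit("RMB", 0x20)))        # 32
```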
In addition most assemblers will allow simple arithmetic to be carried out using the symbols and basic operators.

    VIA     EQU   $FE00
    ORB     EQU   VIA
    INA     EQU   VIA+1
    DDRB    EQU   VIA+2
    DDRA    EQU   VIA+3
            etc.

Here we are defining the address at which the VIA is located and then defining the addresses of all the registers in the VIA with reference to that starting address. This means that if we want the software to run on a different system with the VIA at a different address, all we have to do is change one line in the program and re-assemble it.

    VIA     EQU   $8008

Or, if we have a table of values and wish to have a location which stores the number of entries in the table:

    TABLE   FCB   $0A
            FCB   $15
    ENDTAB  FCB   $7B
    NUMBER  FCB   ENDTAB-TABLE+1   ;ENDTAB labels the last entry, so add 1
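The kind of symbol arithmetic used in the VIA definitions can be sketched in Python as follows. This is a minimal illustration that assumes only `+` and `-`, `$`-prefixed hex numbers, decimal numbers and previously defined symbols; the function name `evaluate` is hypothetical and real assemblers usually accept more operators.

```python
import re

def evaluate(expr, symbols):
    """Evaluate symbols, $hex or decimal numbers joined by + and -."""
    total, sign = 0, 1
    for token in re.findall(r"\$[0-9A-Fa-f]+|\w+|[+-]", expr):
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        else:
            if token.startswith("$"):
                value = int(token[1:], 16)   # hexadecimal constant
            elif token.isdigit():
                value = int(token)           # decimal constant
            else:
                value = symbols[token]       # previously defined symbol
            total += sign * value
            sign = 1
    return total

symbols = {}
for name, expr in [("VIA", "$FE00"), ("ORB", "VIA"), ("INA", "VIA+1"),
                   ("DDRB", "VIA+2"), ("DDRA", "VIA+3")]:
    symbols[name] = evaluate(expr, symbols)

print(f"DDRA = ${symbols['DDRA']:04X}")  # DDRA = $FE03
```

Changing the first definition to `$8008` and re-running moves every register address with it, which is the point of defining them relative to VIA.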