Processor architectures (1)

The chosen architecture affects how efficiently data is processed, but the optimum architecture is dependent on the process being performed.

Fast throughput is generally the aim and there are many approaches to this end.

Pipelining

This involves fetching data or instructions in advance.

It also implies processing operations in parallel.

    Instruction fetch

  1. Pipelining
    When an instruction is fetched, the microprocessor often requires an internal cycle for execution, during which the bus is not used. These 'unused' bus cycles can be used to prefetch the next instruction.
  2. DRAM page modes
    Standard DRAM read cycle:
    e.g. for a 100ns access time, the cycle time is 190ns.

    Page-mode read cycle:
    for the same DRAM, the page cycle time is 70ns.
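As a rough illustration, the cycle times quoted above translate into effective read bandwidths (a minimal sketch using only the figures from the notes):

```python
# Illustrative sketch: effective read bandwidth of a DRAM with a 190ns
# standard cycle versus a 70ns page-mode cycle (figures from the notes).
STANDARD_CYCLE_NS = 190   # full RAS/CAS read cycle
PAGE_CYCLE_NS = 70        # CAS-only cycle within an already-open row

def bandwidth_mb_per_s(cycle_ns, bytes_per_access=1):
    """Accesses per second times bytes per access, in Mbytes/s."""
    return (1e9 / cycle_ns) * bytes_per_access / 1e6

print(f"standard:  {bandwidth_mb_per_s(STANDARD_CYCLE_NS):.1f} Mbytes/s")
print(f"page mode: {bandwidth_mb_per_s(PAGE_CYCLE_NS):.1f} Mbytes/s")
```

Page mode roughly triples the throughput of sequential reads within a row, which is why processors that fetch consecutive instruction bytes benefit from it.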

Bus bandwidth

For a 1MHz M6809 the bus bandwidth is 1Mbyte/second. This might be increased to 2Mbytes/second if DMA is used (but this bandwidth is not available to the processor).
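The bandwidth figure is simply clock rate times bytes moved per bus cycle. A minimal sketch (the doubling assumes a DMA controller interleaving transfers on the opposite clock phase, as the notes describe):

```python
# Sketch: bus bandwidth = clock rate * bytes transferred per bus cycle.
# The M6809 moves one byte per bus cycle at 1 MHz, giving 1 Mbyte/s.
def bus_bandwidth_mb(clock_mhz, bytes_per_cycle=1):
    return clock_mhz * bytes_per_cycle  # Mbytes/s

cpu_bw = bus_bandwidth_mb(1.0)   # processor alone: 1 Mbyte/s
total_bw = 2 * cpu_bw            # with interleaved DMA (not usable by the CPU)
```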

MIPS (Million Instructions Per Second) is a term which is not very meaningful when comparing processors - for example, an instruction may be very simple or highly complex, depending on the processor.

Early 8-bit processors normally had a performance of 1 MIPS or less. Today, well over 25 MIPS is readily achieved.

Take a 1MHz M6809 as an example. This cannot perform at 1 MIPS - even a NOP instruction takes two bus cycles, so NOPs execute at only 0.5 MIPS!

The reason is clear in the following example of a 6809 instruction.

	e.g. LDA $3456

Only 20% of bus cycles move data,
20% of bus cycles are unused in this example, and
60% of bus cycles are for fetching the instruction.
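The breakdown above can be checked by simple arithmetic. This sketch assumes the 5-cycle, 3-byte extended-addressing form of LDA, which is consistent with the percentages quoted:

```python
# Cycle breakdown for the 6809 "LDA $3456" example: 3 cycles fetch the
# 3-byte instruction, 1 cycle is unused internally, 1 cycle reads the data
# byte -- 5 bus cycles in all (assumed from the percentages in the notes).
fetch_cycles, unused_cycles, data_cycles = 3, 1, 1
total_cycles = fetch_cycles + unused_cycles + data_cycles

fetch_pct = 100 * fetch_cycles / total_cycles   # instruction fetch share
data_pct = 100 * data_cycles / total_cycles     # useful data movement share
mips = 1.0 / total_cycles                       # at a 1 MHz clock
```

So this instruction alone runs at only 0.2 MIPS, and only one cycle in five moves data the program actually wanted.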

To improve the efficiency of using the bus:

  1. Eliminate unused bus cycles
  2. Free data bus of instructions

Harvard architecture

Four main buses are used instead of two: separate address and data buses for instructions, and separate address and data buses for data. These buses are maintained internally and pipelines are used, where the instruction and data pipelines can be optimised independently.

RISC architecture

Reduced Instruction Set Computers (RISC) make better use of available bus bandwidth as instructions are kept short (one word).

This implies a reduced set of simple instructions. The reduction in the instruction decode and execute logic makes RISC processors smaller, cheaper and faster than their complex (CISC) counterparts.

RISC processors were developed from the observation that (in a typical system),
10 instructions accounted for 80% of compiled code and
30 instructions accounted for 99% of compiled code.

These figures arise from statistics derived from the frequency of use of machine-code instructions which were compiled from a high-level language (HLL). Since most code on 16/32 bit processors comes from HLLs, these observations are valid.

CISC processors (e.g. the MC68020, 80386 and later processors in those families) had developed a large set of instructions and addressing modes, and resemble a small "language" in themselves. This makes for easy compiler construction, but efficiency of use of the microprocessor's instructions can be quite low, especially as compilers are often not sensitive to nuances in the code.

The cost of the extra instructions in a CISC processor, over and above the most frequently used set, is quite high. Each extra instruction makes the instruction logic larger and slower. Making the chip larger has an impact on its cost which rises exponentially with chip area (because of wafer defects).

As an example, comparable RISC and CISC processors may have 25,000 transistors occupying 5 sq.mm and 200,000 transistors occupying 16 sq.mm respectively.
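The steep rise of cost with area can be illustrated with a defect-limited yield model. The Poisson yield formula and the defect density below are assumptions for illustration, not figures from the notes; only the two die areas come from the example above:

```python
import math

# Sketch: a simple Poisson defect model (an assumption -- the notes only
# state that cost rises steeply with area because of wafer defects).
# yield = exp(-defect_density * area); cost per good die ~ area / yield.
def cost_per_good_die(area_mm2, defects_per_mm2=0.05):
    yield_fraction = math.exp(-defects_per_mm2 * area_mm2)
    return area_mm2 / yield_fraction  # arbitrary cost units

risc_cost = cost_per_good_die(5.0)    # ~5 sq.mm RISC die
cisc_cost = cost_per_good_die(16.0)   # ~16 sq.mm CISC die
```

Under this model the 16 sq.mm die costs considerably more than 16/5 times the 5 sq.mm die, because yield falls as well as raw silicon cost rising.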

Early RISC processors did no more than reduce the instruction set and simplify the support chips required.

Later RISC processors introduced more pipelining and aimed to achieve instruction execution in a single cycle.

Other architectures

DSP chips are optimised for digital signal processing functions - for example multiplication, with the addition of a single-cycle hardware multiplier. A multiplication can take over 50 cycles on a conventional processor.
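The operation a DSP accelerates is typically a multiply-accumulate (MAC) loop; an FIR filter tap sum is sketched here purely as an illustration:

```python
# Sketch: the multiply-accumulate loop at the heart of most DSP work,
# here as an FIR filter tap sum. A DSP's single-cycle multiplier performs
# each product in one cycle; a conventional processor of the era could
# need over 50 cycles per multiplication.
def fir_output(samples, coefficients):
    acc = 0
    for s, c in zip(samples, coefficients):
        acc += s * c   # one multiply-accumulate per filter tap
    return acc

y = fir_output([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6 = 32
```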

Graphics chips are optimised for block data transfers and for performing Boolean operations on data.

Transputers are essentially RISC processors which are optimised for parallel processing. Special hardware is provided for inter-processor communications, and a scheduler is provided to handle data dependencies.

Custom architectures can be constructed using VLSI or FPGAs which are now available with over 100,000 gates.
