Cache Memory

Over short periods, a program spends most of its time executing instructions drawn from a small, localised area of memory.

Programs also access a limited set of data operands within a limited time-frame.

This property of temporal locality can be exploited to implement an efficient memory cache.

Microprocessor bus cycles

Microprocessors today have internal clock speeds of around 200MHz, so internal cycle times of around 5ns are possible. However, these processors have to operate with longer cycle times when accessing external memory: DRAM cycle times can be 70ns or so. This implies that a processor must insert one or more WAIT states when accessing DRAM.
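The number of WAIT states needed follows from the ratio of the memory cycle time to the processor cycle time. A minimal sketch, using the illustrative figures above (a 5ns processor cycle and 70ns DRAM); `wait_states` is a hypothetical helper, and real bus timing also involves address and data setup times:

```python
import math

def wait_states(mem_cycle_ns: float, cpu_cycle_ns: float) -> int:
    """Extra idle cycles the CPU inserts so the total bus cycle
    is at least as long as the memory's cycle time."""
    return max(0, math.ceil(mem_cycle_ns / cpu_cycle_ns) - 1)

# A 70 ns DRAM accessed by a CPU with a 5 ns cycle:
print(wait_states(70, 5))   # 13 wait states
# SRAM fast enough to match the CPU cycle needs none:
print(wait_states(5, 5))    # 0
```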

However, SRAM is available with much faster cycle times than DRAM. The disadvantages of SRAM are its higher cost and higher power consumption for the same memory size. It is therefore neither economical nor desirable to replace DRAM with SRAM.

By taking advantage of the property of temporal locality, a small SRAM array can be used as a cache which contains the most recently used instructions or data. When these instructions or data are next used (as they are likely to be, within a short space of time), they can be fetched from the cache rather than DRAM - so avoiding WAIT states.
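The benefit can be quantified with the standard average-access-time calculation. A sketch with illustrative figures (10ns cache access, 70ns extra penalty for going to DRAM); the hit rate of 90% is an assumption, not a measured value:

```python
def average_access_time(hit_time_ns, miss_penalty_ns, hit_rate):
    # Every access pays the cache lookup time; only misses
    # additionally pay the main-memory penalty.
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns

# With a 90% hit rate, most accesses run at cache speed:
print(average_access_time(10, 70, 0.90))  # 17.0 ns on average
```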

Cache operation

When the CPU requests an operand, and the data is present in the cache (a cache hit), it is fetched directly from the cache at high speed.

If the operand is not found in the cache (a cache miss), it is fetched from main memory and sent to the processor. A copy of the operand is saved in the cache at the same time, so that future accesses to this operand can be served from the cache at high speed.
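The read path above can be sketched as a tiny simulation; the `main_memory` dictionary and the unbounded dictionary cache are illustrative stand-ins for the real hardware:

```python
main_memory = {0x100: 42, 0x104: 7}   # address -> data (illustrative)
cache = {}                             # address -> cached copy

def read(addr):
    if addr in cache:                  # cache hit: fast path
        return cache[addr], "hit"
    data = main_memory[addr]           # cache miss: fetch from DRAM...
    cache[addr] = data                 # ...and keep a copy for next time
    return data, "miss"

print(read(0x100))  # (42, 'miss') - first access goes to main memory
print(read(0x100))  # (42, 'hit')  - repeat access served from the cache
```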

The cache controller stores a TAG alongside each data item. The TAG comprises the access rights and the address of the data. When the cache is full, a replacement algorithm (least recently used, or LRU, for example) is used to determine which item is overwritten.

Reading from the cache requires no special conditions; writing to memory requires more care.

When data is written, whether the write goes to the cache or to main memory, the two copies can differ. This potential inconsistency is the coherency problem.

A write-through cache sends all WRITE cycles to main memory and marks the cached copy of the data invalid. If this data is subsequently read by the processor, it is fetched from main memory and the cache is updated.
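The write-through-with-invalidate policy described here can be sketched as follows; the dictionaries and function names are illustrative:

```python
main_memory = {}
cache = {}                      # address -> data, valid entries only

def write(addr, data):
    main_memory[addr] = data    # every WRITE cycle goes to main memory
    cache.pop(addr, None)       # any cached copy is marked invalid

def read(addr):
    if addr in cache:
        return cache[addr]
    data = main_memory[addr]    # not cached: fetch from main memory...
    cache[addr] = data          # ...and update the cache
    return data

write(0x200, 1)
print(read(0x200))   # 1 - fetched from main memory, cache updated
write(0x200, 2)      # cached copy invalidated
print(read(0x200))   # 2 - re-fetched, so cache and memory agree
```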

Intelligent cache controllers also allow writes to the cache itself (a write-back policy), for best performance. However, if the cached item is later replaced, a separate bus cycle is required to update main memory with the revised data. Such cache controllers can perform these memory updates in the background.
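A write-back controller of the kind described can be sketched with a dirty flag per entry; the deferred main-memory update happens only when a dirty item is replaced. The names are illustrative:

```python
main_memory = {0x300: 0}
cache = {}                            # address -> (data, dirty flag)

def write(addr, data):
    cache[addr] = (data, True)        # write goes to the cache only

def evict(addr):
    data, dirty = cache.pop(addr)
    if dirty:                         # separate bus cycle updates DRAM
        main_memory[addr] = data

write(0x300, 99)
print(main_memory[0x300])   # 0  - main memory not yet updated
evict(0x300)
print(main_memory[0x300])   # 99 - written back on replacement
```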

When DMA (direct memory access) is used, it is not the CPU that changes the data, but some other device writing directly to main memory. In this case, the cache and main memory differ. To detect this, cache controllers adopt "bus snoop" protocols, watching bus traffic and marking cached items invalid when the cache and main memory are no longer coherent.
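Bus snooping can be sketched as the cache watching writes by other bus masters and invalidating any matching entry; the class and the bus model below are illustrative:

```python
class SnoopingCache:
    def __init__(self):
        self.entries = {}             # address -> cached data

    def snoop_write(self, addr):
        # Another bus master wrote this address: our copy is stale.
        self.entries.pop(addr, None)

main_memory = {0x400: 5}
cache = SnoopingCache()
cache.entries[0x400] = 5          # CPU has previously cached this value

def dma_write(addr, data):
    main_memory[addr] = data      # DMA device updates DRAM directly...
    cache.snoop_write(addr)       # ...and the cache snoops the bus cycle

dma_write(0x400, 6)
print(0x400 in cache.entries)     # False - stale entry was invalidated
print(main_memory[0x400])         # 6
```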

The cache can be disabled to allow debugging; it can also be frozen, cleared, and burst-filled.

