The processor performs all memory operations through the cache.

  • Miss - If the requested word is not in the cache, the block of words containing it is brought into the cache, and then the processor's request is completed.
  • Hit - If the requested word is in the cache, the read or write operation is performed directly in the cache, without accessing main memory.
  • Block - The minimum amount of data transferred between the cache and main memory.

Temporal & Spatial Locality

There are two types of locality:

Temporal Locality (locality in time): if an item is referenced, it will likely be referenced again soon. Data is reused.

Spatial Locality (locality in space): if an item is referenced, items at neighboring addresses will likely be referenced soon.

CPU execution time must factor in stalls from memory accesses.

Assume the on-chip cache responds within the time allotted to it by the load/store or instruction-fetch unit (e.g., 1 clock cycle).

CPU time = IC * CPI * Clock cycle time

IC × CPI must include all stalls, so we add memory stall cycles:

IC × CPI = CPU execution clock cycles + Memory stall cycles

where IC is the instruction count. We can view memory stall cycles as a number of stalls per instruction or per memory access (loads and stores have 2 memory accesses per instruction; all others have 1 memory access per instruction).

Memory stall cycles = IC × (misses / instruction) × miss penalty

Memory stall cycles = IC × (memory accesses / instruction) × miss rate × miss penalty
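
To make the arithmetic concrete, here is a minimal Python sketch of these formulas; the instruction count, CPI, miss rate, miss penalty, and clock cycle time are made-up values chosen only for illustration.

```python
# Minimal sketch: CPU time including memory stalls, using the formulas above.
# All parameter values are illustrative assumptions, not figures from the text.

IC = 1_000_000              # instruction count
base_CPI = 1.2              # CPI assuming no memory stalls
accesses_per_instr = 1.3    # 1 instruction fetch + 0.3 data accesses per instruction
miss_rate = 0.02            # 2% of memory accesses miss in the cache
miss_penalty = 50           # clock cycles to service a miss
clock_cycle_time = 0.5e-9   # 0.5 ns clock cycle

# Memory stall cycles = IC x (memory accesses / instruction) x miss rate x miss penalty
memory_stall_cycles = IC * accesses_per_instr * miss_rate * miss_penalty

# CPU time = (CPU execution clock cycles + memory stall cycles) x clock cycle time
cpu_clock_cycles = IC * base_CPI
cpu_time = (cpu_clock_cycles + memory_stall_cycles) * clock_cycle_time

print(f"Memory stall cycles: {memory_stall_cycles:,.0f}")
print(f"CPU time: {cpu_time * 1e3:.3f} ms")
```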

Hit Ratio:
 The hit ratio is the number of hits divided by the total number of CPU references to memory (hits plus misses).

Hit ratio = (no. of cache hits) / (Total no. of memory references)
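
To tie together hits, misses, locality, and the hit ratio, here is a minimal sketch of a direct-mapped cache simulation; the block size, number of lines, and access trace are assumptions chosen to exhibit spatial and temporal locality, not parameters from the text.

```python
# Minimal sketch of a direct-mapped cache simulation (assumed parameters:
# 4-byte blocks, 8 cache lines) that counts hits and misses and computes the hit ratio.

BLOCK_SIZE = 4     # bytes per block
NUM_LINES = 8      # cache lines (direct-mapped: one block per line)

# Each line holds the tag of the block currently stored there (None = empty).
lines = [None] * NUM_LINES

def access(address):
    """Return True on a hit, False on a miss (loading the block on a miss)."""
    block_number = address // BLOCK_SIZE
    line = block_number % NUM_LINES        # i = j modulo m
    tag = block_number // NUM_LINES
    if lines[line] == tag:
        return True
    lines[line] = tag                      # bring the block into the cache
    return False

# Sequential byte addresses show spatial locality: one miss per 4-byte block,
# then hits for the rest of the block. Re-reading addresses 0-3 shows temporal locality.
trace = list(range(0, 32)) + [0, 1, 2, 3]
hits = sum(access(a) for a in trace)
misses = len(trace) - hits

print(f"hits={hits}, misses={misses}, hit ratio={hits / len(trace):.2f}")
```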

Mapping Functions

  1. Direct Mapping :
    In this technique each block of main memory can go to only a single fixed position in the cache; a single cache line therefore serves a fixed set of main memory blocks.
    This mapping is expressed as:
    i = j modulo m, where
    i = cache block number
    j = main memory block number
    m = number of blocks in cache
    This mapping can be implemented by dividing the (s + w)-bit address (24 bits in the example configuration below) into three parts. The least significant w bits identify a word within a block. The remaining s bits identify a main memory block among the 2^s blocks of main memory. This s-bit field is further divided for the cache into an r-bit line field (least significant) and an (s – r)-bit tag field (most significant). The line field identifies one cache line among the 2^r lines of the cache (a sketch of this address decomposition appears after this list).
  2. Set Associative Mapping :
  • Address length = (s + w) bits
  • Number of addressable units = 2^(s + w) words or bytes
  • Block size = line size = 2^w words or bytes
  • Number of blocks in main memory = 2^s
  • Number of lines in set = k
  • Number of sets v = 2^d
  • Number of lines in cache = kv = k × 2^d
  • Size of tag = (s – d) bits
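
The address decomposition described above can be sketched as plain bit-field extraction. The helper functions and the example bit widths below are illustrative assumptions; only the (tag, line/set, word) split itself comes from the mapping descriptions.

```python
# Minimal sketch of address decomposition for direct and set-associative mapping.
# w = word (offset) bits, r = line bits, d = set bits; all example values assumed.

def split_direct(address, w, r):
    """Split an address into (tag, line, word) fields for direct mapping."""
    word = address & ((1 << w) - 1)
    line = (address >> w) & ((1 << r) - 1)
    tag = address >> (w + r)
    return tag, line, word

def split_set_associative(address, w, d):
    """Split an address into (tag, set, word) fields for set-associative mapping."""
    word = address & ((1 << w) - 1)
    set_index = (address >> w) & ((1 << d) - 1)
    tag = address >> (w + d)
    return tag, set_index, word

# Example: 24-bit address, 4-byte blocks (w = 2), 2^14 cache lines (r = 14).
print(split_direct(0x123456, w=2, r=14))
# The same cache organized 2-way set-associative has 2^13 sets (d = 13).
print(split_set_associative(0x123456, w=2, d=13))
```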

For all cases we consider the following configuration:

Cache block size = 4 bytes, and data can be transferred in multiples of blocks.

Cache size = 64 KB, hence the cache contains 2^14 blocks (called “cache lines”)

Main memory block size = 4 bytes

Main memory size = 16 MB, hence main memory contains 2^22 blocks

Each byte of main memory is directly addressable by a 24-bit address (as 2^24 = 16M)

For direct mapping, the parameters are:

  • Address length = (s + w) bits
  • Number of addressable units = 2^(s + w) words or bytes
  • Block size = line size = 2^w words or bytes
  • Number of blocks in main memory = 2^s
  • Number of lines in cache = m = 2^r
  • Size of tag = (s – r) bits
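
As a short worked example, the field widths follow directly from the example configuration above (24-bit addresses, 4-byte blocks, 64 KB cache); the use of Python's log2 is just one way to do the arithmetic.

```python
# Minimal sketch: deriving the direct-mapping field widths for the example
# configuration above (24-bit addresses, 4-byte blocks, 64 KB cache, 16 MB memory).
from math import log2

address_bits = 24          # 2^24 bytes = 16 MB main memory
block_size = 4             # bytes per block
cache_size = 64 * 1024     # 64 KB

w = int(log2(block_size))              # word field: 2 bits
num_lines = cache_size // block_size   # 2^14 = 16384 cache lines
r = int(log2(num_lines))               # line field: 14 bits
tag_bits = address_bits - r - w        # tag field: 24 - 14 - 2 = 8 bits

print(f"word bits w = {w}, line bits r = {r}, tag bits = {tag_bits}")
```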

Cache size equation

Simple equation for the size of a cache:

(Cache size) = (Block size) × (Number of sets) × (Set associativity)

  • Can relate to the size of various address fields (see the sketch below):
    (Block size) = 2^(# of offset bits)
    (Number of sets) = 2^(# of index bits)
    (# of tag bits) = (# of memory address bits) – (# of index bits) – (# of offset bits)
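
A minimal sketch of the cache size equation and the address-field relations, using an assumed 32-bit address and illustrative block size, set count, and associativity.

```python
# Minimal sketch of the cache size equation and address-field relations above.
# All sizes are assumed values chosen only to illustrate the formulas.
from math import log2

address_bits = 32
block_size = 64        # bytes  -> offset bits
num_sets = 128         # -> index bits
associativity = 4      # ways

cache_size = block_size * num_sets * associativity   # 64 * 128 * 4 = 32 KB

offset_bits = int(log2(block_size))                  # 6
index_bits = int(log2(num_sets))                     # 7
tag_bits = address_bits - index_bits - offset_bits   # 32 - 7 - 6 = 19

print(f"cache size = {cache_size // 1024} KB")
print(f"offset = {offset_bits}, index = {index_bits}, tag = {tag_bits} bits")
```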
Cache Hierarchy

Average access time = T1 + (1 – h1) [ T2 + (1 – h2) Tm ]


  • T1 = L1 cache access time (smallest)
  • T2 = L2 cache access time (small)
  • Tm = memory access time (large)
  • h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)

Adding a cache level reduces the average access time.
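
A minimal sketch of the two-level average access time formula; the latencies and hit rates are assumed values chosen only to show how each level contributes.

```python
# Minimal sketch of the two-level average access time formula above.
# Latencies and hit rates are illustrative assumptions.

def avg_access_time(T1, T2, Tm, h1, h2):
    # Average access time = T1 + (1 - h1) * (T2 + (1 - h2) * Tm)
    return T1 + (1 - h1) * (T2 + (1 - h2) * Tm)

T1, T2, Tm = 1, 10, 100        # cycles for L1, L2, and main memory
h1, h2 = 0.95, 0.80            # hit rates of L1 and L2

print(f"Average access time = {avg_access_time(T1, T2, Tm, h1, h2):.2f} cycles")
# Without the L2 cache the average would be T1 + (1 - h1) * Tm:
print(f"Without L2          = {T1 + (1 - h1) * Tm:.2f} cycles")
```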

Expressed in clock cycles (CC), CPU time is:

CPU time = (CC_CPU execution + CC_Memory stalls) × t_CC

where t_CC is the clock cycle time.

Cache Performance

The memory-stall clock cycles come from cache misses. They can be defined as the sum of the stall cycles coming from reads and those coming from writes:

Memory-stall CC = Read-stall cycles + Write-stall cycles, where

Read-stall cycles = (Reads / Program) × (Read miss rate) × (Read miss penalty)

Write-stall cycles = (Writes / Program) × (Write miss rate) × (Write miss penalty) + (Write buffer stalls), for a write-through cache with a write buffer
Cache Performance Formulas

Useful formulas for analyzing ISA/cache interactions:

(CPU time) = [(CPU cycles) + (Memory stall cycles)] × (Clock cycle time)

(Memory stall cycles) = (Instruction count) × (Accesses per instruction) × (Miss rate) × (Miss penalty)

You can split memory access time into instruction and data accesses:

Avg. mem. acc. time = (% instruction accesses) × (inst. mem. access time) + (% data accesses) × (data mem. access time)
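
A small sketch of this split, with assumed access percentages and per-access times purely for illustration.

```python
# Minimal sketch of the split-access-time formula above; all values are assumptions.

pct_instruction = 0.75        # 75% of memory accesses are instruction fetches
pct_data = 0.25               # 25% are data accesses
t_instruction = 2.0           # average cycles per instruction access
t_data = 3.5                  # average cycles per data access

avg_mem_access_time = pct_instruction * t_instruction + pct_data * t_data
print(f"Average memory access time = {avg_mem_access_time:.2f} cycles")
```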

Another simple formula:

CPU time = (CPU execution clock cycles + Memory stall clock cycles) × cycle time

Factoring out Instruction Count

Factoring the instruction count out of the memory-stall term gives an effective CPI that includes memory stalls:

CPU time = IC × [ CPI_execution + (Memory accesses / instruction) × (Miss rate) × (Miss penalty) ] × (Clock cycle time)
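
A minimal sketch of this factored form, reusing the same assumed parameter values as the earlier stall-cycle example; the effective CPI folds the memory stalls into CPI.

```python
# Minimal sketch of the factored form above: CPU time = IC x effective CPI x cycle time.
# All parameter values are illustrative assumptions.

IC = 1_000_000
CPI_execution = 1.2
accesses_per_instr = 1.3
miss_rate = 0.02
miss_penalty = 50
clock_cycle_time = 0.5e-9   # seconds

effective_CPI = CPI_execution + accesses_per_instr * miss_rate * miss_penalty
cpu_time = IC * effective_CPI * clock_cycle_time

print(f"Effective CPI = {effective_CPI:.2f}")        # base CPI plus stall cycles per instruction
print(f"CPU time      = {cpu_time * 1e3:.3f} ms")    # matches (IC*CPI + stalls) * t_CC
```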