The base machine has a 2.4 GHz clock rate and two levels of cache, L1 and L2. The L1 cache is 256K, direct mapped, write through. It has a 90% read hit rate with no penalty on a hit; the miss penalty is 4 cycles (the cost of reading L2). All writes take 1 cycle. The L2 cache is 2MB, 4-way set associative, write back, with a 95% hit rate and a 60-cycle miss penalty (the cost of reading memory). 30% of all instructions are reads and 10% are writes. All instructions take 1 cycle, except reads, which take 1 cycle if the data is in L1 but otherwise incur a miss penalty.
a) Calculate CPI for this machine.
b) Suppose we increase the L1 cache to 512K. This improves the L1 hit rate to 95% but slows all reads to 2 cycles. What is the CPI with this change?
c) Changing the L1 cache to 4-way set associative increases the L1 hit rate to 94%, but the added hardware complexity means we must reduce the clock rate to 2.3 GHz. What is the CPI in this case?
d) Which of the 3 machines is fastest for the given instruction mix?
Step 1 :- Given data
Base machine clock rate = 2.4 GHz
L1 cache: 256 KB, direct mapped, write through
L1 read hit rate = 90%
L1 miss penalty = 4 cycles (cost of reading L2)
Every write takes 1 cycle
L2 cache: 2 MB, 4-way set associative, write back
L2 hit rate = 95%
L2 miss penalty = 60 cycles (cost of reading memory)
Instruction mix: 30% reads, 10% writes, 60% other instructions
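Step 2 :- (a)
Each term is (fraction of instructions) * (probability of that case) * (cycles for that case). In this accounting a read that misses L1 but hits L2 is charged the 4-cycle penalty, and a read that misses both caches is charged the 60-cycle penalty.
CPI = 0.30 * 0.90 * 1 (reads that hit L1)
+ 0.30 * 0.10 * 0.95 * 4 (reads that miss L1 and hit L2)
+ 0.30 * 0.10 * 0.05 * 60 (reads that miss L1 and L2)
+ 0.10 * 1 (writes)
+ 0.60 * 1 (other instructions)
= 0.27 + 0.114 + 0.09 + 0.10 + 0.60
= 1.174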
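The sum above can be checked with a short script. This is a minimal sketch and not part of the original answer: the function name `cpi` and its parameter names are illustrative assumptions. It follows the same accounting as the worked solution, where a read that misses L1 is charged the L2 penalty and a read that also misses L2 is charged the memory penalty, with no hit cycle added on top.

```python
def cpi(l1_hit_rate, l1_hit_cycles=1, l2_hit_rate=0.95,
        l2_penalty=4, mem_penalty=60,
        read_frac=0.30, write_frac=0.10):
    """CPI under the accounting used in the worked solution (an assumption)."""
    other_frac = 1.0 - read_frac - write_frac   # everything that is not a read or write
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    # Average cycles for one read instruction, split by where the data is found.
    read_cycles = (l1_hit_rate * l1_hit_cycles
                   + l1_miss * l2_hit_rate * l2_penalty
                   + l1_miss * l2_miss * mem_penalty)
    # Writes and other instructions each take 1 cycle.
    return read_frac * read_cycles + write_frac * 1 + other_frac * 1

base = cpi(l1_hit_rate=0.90)
print(f"Base machine CPI: {base:.4f}")
```

Changing `l1_hit_rate` and `l1_hit_cycles` lets the same function be reused for the other two configurations in the question.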
Step 3 :- (b)
The L1 read hit rate becomes 95%, and a read that hits L1 now takes 2 cycles.
CPI = 0.30 * 0.95 * 2 (reads that hit L1)
+ 0.30 * 0.05 * 0.95 * 4 (reads that miss L1 and hit L2)
+ 0.30 * 0.05 * 0.05 * 60 (reads that miss L1 and L2)
+ 0.10 * 1 (writes)
+ 0.60 * 1 (other instructions)
= 0.57 + 0.057 + 0.045 + 0.10 + 0.60
= 1.372
Step 4 :- (c)
The L1 read hit rate becomes 94%. The lower clock rate does not change the CPI itself; it affects only the time per cycle.
CPI = 0.30 * 0.94 * 1 (reads that hit L1)
+ 0.30 * 0.06 * 0.95 * 4 (reads that miss L1 and hit L2)
+ 0.30 * 0.06 * 0.05 * 60 (reads that miss L1 and L2)
+ 0.10 * 1 (writes)
+ 0.60 * 1 (other instructions)
= 0.282 + 0.0684 + 0.054 + 0.10 + 0.60
= 1.1044 (about 1.104)
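Step 5 :- (d)
CPI alone does not decide this, because machine (c) runs at a lower clock rate. Compare the time per instruction, which is CPI / clock rate:
Machine (a): 1.174 / 2.4 GHz = 0.489 ns
Machine (b): 1.372 / 2.4 GHz = 0.572 ns
Machine (c): 1.104 / 2.3 GHz = 0.480 ns
Machine (c) is the fastest for this instruction mix, about 2% faster than the base machine, and machine (b) is the slowest.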
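Because the third machine runs at 2.3 GHz rather than 2.4 GHz, the ranking depends on time per instruction (CPI divided by clock rate) and not on CPI alone. The following is a minimal sketch of that comparison; the dictionary labels and the helper name `ns_per_instruction` are hypothetical, and the CPI values are the ones computed in parts (a) to (c).

```python
# (CPI, clock rate in GHz) per machine; CPI values come from parts (a) to (c).
machines = {
    "a (base)": (1.174, 2.4),
    "b (512K L1)": (1.372, 2.4),
    "c (4-way L1)": (1.1044, 2.3),
}

def ns_per_instruction(cpi, clock_ghz):
    # Cycles per instruction divided by cycles per nanosecond.
    return cpi / clock_ghz

times = {name: ns_per_instruction(c, f) for name, (c, f) in machines.items()}
for name, t in times.items():
    print(f"{name}: {t:.4f} ns per instruction")

# The machine with the smallest time per instruction is the fastest.
fastest = min(times, key=times.get)
print("Fastest:", fastest)
```

Under these numbers the 4-way machine still comes out ahead, but only by about 2% over the base machine once the slower clock is taken into account.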