When processor designers consider a possible improvement to the processor data path, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a data path from Figure 4.2, where I Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 500ps, 150ps, 30ps, 110ps, 240ps, 350ps, and 100ps, respectively, and costs of 1100, 40, 10, 90, 220, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.
5.1 What is the clock cycle time with and without this improvement?
5.2 What is the speedup achieved by adding this improvement?
5.3 Compare the cost/performance ratio with and without this improvement.
Block      Latency (ps)   Cost
I-Mem      500            1100
Add        150            40
Mux        30             10
ALU        110            90
Regs       240            220
D-Mem      350            2000
Control    100            500
Adding the multiplier increases the ALU latency by 300 ps and the ALU cost by 600.
5.1.
The load instruction has the longest path through the data path, so it determines the clock cycle time.
Without multiplier -
Path = PC -> I-Mem -> Reg -> Mux -> ALU -> D-Mem -> Mux -> Reg
Clock cycle = 500 + 240 + 30 + 110 + 350 + 30 + 240
= 1500 ps
With multiplier -
Path = PC -> I-Mem -> Reg -> Mux -> ALU -> Mul -> D-Mem -> Mux -> Reg
Clock cycle = 500 + 240 + 30 + 110 + 300 + 350 + 30 + 240
= 1800 ps
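The two sums above can be checked with a short script. This is a minimal sketch, assuming the load path listed above is the critical path; the names `LATENCY_PS`, `LOAD_PATH`, and `cycle_time_ps` are made up for illustration.

```python
# Block latencies in picoseconds, copied from the problem statement.
LATENCY_PS = {
    "I-Mem": 500, "Add": 150, "Mux": 30, "ALU": 110,
    "Regs": 240, "D-Mem": 350, "Control": 100,
}
MULTIPLIER_EXTRA_PS = 300  # extra ALU latency when the multiplier is added

# Blocks on the load instruction's path (assumed to be the critical path).
LOAD_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux", "Regs"]


def cycle_time_ps(path, alu_extra_ps=0):
    """Sum the latencies along a path, adding alu_extra_ps once per ALU on it."""
    return sum(LATENCY_PS[block] for block in path) + alu_extra_ps * path.count("ALU")


base_cycle = cycle_time_ps(LOAD_PATH)
mul_cycle = cycle_time_ps(LOAD_PATH, MULTIPLIER_EXTRA_PS)
print(base_cycle, mul_cycle)  # 1500 1800
```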
5.2.
The improved processor executes 5% fewer instructions, but every clock cycle is longer.
Speedup = (1500 * 100) / (1800 * 95)
= 0.88
The speedup is less than 1, so the improvement actually makes the processor slower.
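The speedup calculation can be sketched as follows, assuming one clock cycle per instruction so that execution time is proportional to instruction count times cycle time. The function name `speedup` and its parameters are illustrative only.

```python
def speedup(old_cycle_ps, new_cycle_ps, instruction_ratio):
    """Old execution time divided by new execution time.

    instruction_ratio is new instruction count / old instruction count
    (0.95 here, since 5% fewer instructions are executed).
    """
    old_time = old_cycle_ps * 1.0
    new_time = new_cycle_ps * instruction_ratio
    return old_time / new_time


s = speedup(1500, 1800, 0.95)
print(round(s, 2))  # 0.88
```

A value below 1 means the modified design takes longer to run the same program.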
5.3.
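Total cost without multiplier = 1100 + 220 + 90 + 2000 + 40 + 40 + 10 + 10 + 10 + 500 = 4020
Total cost with multiplier = 4020 + 600 = 4620
Relative cost = 4620 / 4020 = 1.15
Relative performance = (1500 * 100) / (1800 * 95) = 0.88
Relative cost/performance = 1.15 / 0.88 = 1.31
The improved processor costs about 15% more and delivers about 12% less performance, so its cost/performance ratio is about 31% worse than that of the original.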
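One common way to compare cost/performance is to take performance as the inverse of execution time and divide relative cost by relative performance. The sketch below works under that assumption, with two adders and three muxes as in the cost sum above; `COST`, `COUNT`, and the other names are illustrative.

```python
# Block costs from the problem statement and how many of each block are counted.
COST = {"I-Mem": 1100, "Add": 40, "Mux": 10, "ALU": 90,
        "Regs": 220, "D-Mem": 2000, "Control": 500}
COUNT = {"I-Mem": 1, "Add": 2, "Mux": 3, "ALU": 1,
         "Regs": 1, "D-Mem": 1, "Control": 1}
MULTIPLIER_COST = 600

base_cost = sum(COST[b] * COUNT[b] for b in COST)  # 4020
mul_cost = base_cost + MULTIPLIER_COST             # 4620

# Relative performance = old execution time / new execution time.
relative_perf = (1500 * 1.0) / (1800 * 0.95)
relative_cost = mul_cost / base_cost

# Values above 1 mean the new design pays more per unit of performance.
relative_cost_perf = relative_cost / relative_perf
print(base_cost, mul_cost, round(relative_cost_perf, 2))  # 4020 4620 1.31
```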