Section 13.6
Let us assume the following worst-case latencies for the blocks in our datapath; their sum gives the execution latency of the lw instruction:
Instruction access: 2 ns
Register read: 1 ns
ALU operation: 2 ns
Data cache access: 2 ns
Register write back: 1 ns
Total: 8 ns
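Because the clock period has to cover the slowest instruction, the lw latency is simply the sum of its block latencies. A minimal Python sketch, assuming the latencies are kept in a dictionary (the names `LW_LATENCIES_NS` and `path_latency_ns` are hypothetical):

```python
# Worst-case block latencies (ns) along the lw path, from Section 13.6.
LW_LATENCIES_NS = {
    "instruction access": 2.0,
    "register read": 1.0,
    "ALU operation": 2.0,
    "data cache access": 2.0,
    "register write back": 1.0,
}

def path_latency_ns(latencies: dict[str, float]) -> float:
    """Sum the block latencies on a path; for lw this sets the clock period."""
    return sum(latencies.values())

print(path_latency_ns(LW_LATENCIES_NS))  # 8.0
```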
Solution:
Taking all the given details into account, we work through one illustration per question. In each case the clock cycle time equals the total lw latency (the slowest path), and the CPI is 1.
Illustration 1 (based on the details given in Section 13.6):
Instruction Access = 2 ns
Register Read = 1 ns
ALU Operation = 2 ns
Data Cache Access = 2 ns
Register Write Back = 1 ns
Total = 8 ns = Clock Cycle Time
CPI for this Illustration is 1.
Execution Time = CPI * Clock cycle time = 1 * 8 ns = 8 ns
Illustration 2 (based on the details given in Section 13.8, part a):
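The execution-time formula used in every illustration is a single product. A small Python sketch of it (the function and constant names are illustrative, not from the source):

```python
def execution_time_ns(cpi: float, clock_cycle_ns: float) -> float:
    """Time for one instruction: cycles per instruction times the clock period."""
    return cpi * clock_cycle_ns

# Every instruction completes in exactly one (long) clock cycle here.
CPI_SINGLE_CYCLE = 1
CLOCK_CYCLE_NS = 8.0  # worst-case lw path from Illustration 1

print(execution_time_ns(CPI_SINGLE_CYCLE, CLOCK_CYCLE_NS))  # 8.0
```

With CPI fixed at 1, any change to the clock period carries over one-for-one into execution time, which is why the illustrations below only need to recompute the total latency.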
Instruction Access = 2 ns
Register Read = 0.5 ns
ALU Operation = 2 ns
Data Cache Access = 2 ns
Register Write Back = 1 ns
Total = 7.5 ns = Clock Cycle Time
CPI for this Illustration is 1.
Execution Time = CPI * Clock cycle time = 1 * 7.5 ns = 7.5 ns
Conclusion: Performance relative to Illustration 1 = Execution Time of Illustration 1 / Execution Time of Illustration 2
= 8 / 7.5 ≈ 1.06667
Thus, Illustration 2 is about 1.067 times as fast as Illustration 1, i.e., roughly 6.7% faster.
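The performance comparison is just a ratio of execution times. A minimal Python sketch (the helper name `speedup` is our own, not from the source):

```python
def speedup(old_time_ns: float, new_time_ns: float) -> float:
    """Ratio > 1 means the new design is faster; < 1 means it is slower."""
    return old_time_ns / new_time_ns

s = speedup(8.0, 7.5)
print(round(s, 5))              # 1.06667
print(f"{(s - 1) * 100:.1f}%")  # 6.7%
```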
Illustration 3 (based on the details given in Section 13.8, part b):
Instruction Access = 2 ns
Register Read = 1 ns
ALU Operation = 1.5 ns
Data Cache Access = 2 ns
Register Write Back = 1 ns
Total = 7.5 ns = Clock Cycle Time
CPI for this Illustration is 1.
Execution Time = CPI * Clock cycle time = 1 * 7.5 ns = 7.5 ns
Conclusion: Performance relative to Illustration 1 = Execution Time of Illustration 1 / Execution Time of Illustration 3
= 8 / 7.5 ≈ 1.06667
Thus, Illustration 3 is also about 1.067 times as fast as Illustration 1, i.e., roughly 6.7% faster.
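Illustrations 2 and 3 come out identical because each trims 0.5 ns from a different block on the lw path, and in this model the clock period is simply the sum of those blocks. The sketch below recomputes the clock period after overriding selected block latencies, assuming lw remains the slowest instruction so that it still sets the clock; `BASELINE_NS` and `cycle_with_changes` are hypothetical names:

```python
# Baseline worst-case block latencies (ns) from Section 13.6.
BASELINE_NS = {
    "instruction access": 2.0,
    "register read": 1.0,
    "ALU operation": 2.0,
    "data cache access": 2.0,
    "register write back": 1.0,
}

def cycle_with_changes(changes: dict[str, float]) -> float:
    """Return the clock period after overriding some block latencies."""
    updated = {**BASELINE_NS, **changes}  # entries in `changes` replace baseline values
    return sum(updated.values())

alu_faster = cycle_with_changes({"ALU operation": 1.5})  # Section 13.8, part b
print(alu_faster)                   # 7.5
print(round(8.0 / alu_faster, 5))   # 1.06667
```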
Illustration 4 (based on the details given in Section 13.8, part c):
Instruction Access = 2 ns
Register Read = 1 ns
ALU Operation = 2 ns
Data Cache Access = 3 ns
Register Write Back = 1 ns
Total = 9 ns = Clock Cycle Time
CPI for this Illustration is 1.
Execution Time = CPI * Clock cycle time = 1 * 9 ns = 9 ns
Conclusion: Performance relative to Illustration 1 = Execution Time of Illustration 1 / Execution Time of Illustration 4
= 8 / 9 ≈ 0.88889
Thus, Illustration 4 delivers only about 0.889 of the performance of Illustration 1; equivalently, it is 9 / 8 = 1.125 times slower.
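A performance ratio below 1 reads more naturally as a slowdown, which is just the inverse ratio. A short Python check of Illustration 4 (variable names are illustrative):

```python
baseline_ns = 8.0
slower_cache_ns = 9.0  # Illustration 4: data cache access 2 ns -> 3 ns

ratio = baseline_ns / slower_cache_ns     # performance relative to Illustration 1
slowdown = slower_cache_ns / baseline_ns  # how many times longer each instruction takes

print(round(ratio, 5))  # 0.88889
print(slowdown)         # 1.125
```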
Illustration 5 (based on the details given in Section 13.8, parts a, b, and c combined):
Instruction Access = 2 ns
Register Read = 0.5 ns
ALU Operation = 1.5 ns
Data Cache Access = 3 ns
Register Write Back = 1 ns
Total = 8 ns = Clock Cycle Time
CPI for this Illustration is 1.
Execution Time = CPI * Clock cycle time = 1 * 8 ns = 8 ns
Conclusion: Performance relative to Illustration 1 = Execution Time of Illustration 1 / Execution Time of Illustration 5
= 8 / 8 = 1
Thus, Illustration 5 and Illustration 1 have the same performance: the 1 ns saved in register read and the ALU (0.5 ns each) is exactly offset by the 1 ns added to data cache access.
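All five illustrations can be reproduced in one pass by applying each set of latency changes to the Section 13.6 baseline. A sketch under the same assumptions as above (CPI = 1, clock period = sum of the lw block latencies); `BASELINE_NS`, `SCENARIOS`, and `cycle_ns` are hypothetical names:

```python
# Baseline worst-case block latencies (ns) from Section 13.6.
BASELINE_NS = {
    "instruction access": 2.0,
    "register read": 1.0,
    "ALU operation": 2.0,
    "data cache access": 2.0,
    "register write back": 1.0,
}

# Per-illustration overrides, taken from the latencies listed above.
SCENARIOS = {
    "Illustration 1": {},
    "Illustration 2": {"register read": 0.5},
    "Illustration 3": {"ALU operation": 1.5},
    "Illustration 4": {"data cache access": 3.0},
    "Illustration 5": {
        "register read": 0.5,
        "ALU operation": 1.5,
        "data cache access": 3.0,
    },
}

def cycle_ns(changes: dict[str, float]) -> float:
    """Clock period (ns) with some block latencies overridden."""
    return sum({**BASELINE_NS, **changes}.values())

base = cycle_ns({})
results = {name: cycle_ns(changes) for name, changes in SCENARIOS.items()}
for name, t in results.items():
    # CPI = 1, so execution time per instruction equals the clock period.
    print(f"{name}: {t} ns, performance vs. Illustration 1 = {base / t:.5f}")
```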