For gcc, the frequency of all loads and stores is 36%. The instruction cache miss rate is 5% and the data cache miss rate is 10%. If a machine has a CPI of 2 without memory stalls and the miss penalty is 40 cycles for all misses, how much faster is a machine with a perfect cache? Now suppose we increase the performance by doubling the clock rate. Since the main memory speed is unlikely to change, assume that the absolute time to handle a cache miss does not change. How much faster will the machine be with the faster clock?
Determine how much faster a machine would run with a perfect cache that never missed.
(I is the number of instructions.)
Instruction miss cycles = I x 0.05 x 40 = 2 I
Data miss cycles = I x 0.36 x 0.1 x 40 = 1.44 I
Total memory stall cycles = 2 I + 1.44 I = 3.44 I
CPIstall = 2 + 3.44 = 5.44
Speedup with a perfect cache = CPIstall / CPIperfect = 5.44 / 2 = 2.72
So a machine with a perfect cache that never missed would run 2.72 times faster.
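The arithmetic above can be checked with a short script. This is only a sketch: the function name `cpi_with_stalls` and its parameter names are made up for illustration, and the numbers are the ones given in the question.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, mem_frac, miss_penalty):
    """Effective CPI including memory stalls (hypothetical helper, not from the question)."""
    # Every instruction is fetched, so instruction misses apply to all instructions.
    instr_stalls = i_miss_rate * miss_penalty
    # Only loads and stores access the data cache.
    data_stalls = mem_frac * d_miss_rate * miss_penalty
    return base_cpi + instr_stalls + data_stalls


base_cpi = 2.0
cpi_stall = cpi_with_stalls(base_cpi, i_miss_rate=0.05, d_miss_rate=0.10,
                            mem_frac=0.36, miss_penalty=40)

# Same clock rate and instruction count, so the speedup is the ratio of CPIs.
speedup_perfect = cpi_stall / base_cpi

print(f"CPI with stalls: {cpi_stall:.2f}")
print(f"Speedup with a perfect cache: {speedup_perfect:.2f}")
```

Because the instruction count and clock rate are the same on both machines, they cancel and only the CPI ratio matters.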
Now suppose the clock rate of the machine in the previous example is doubled, while the memory speed and the cache miss rates stay the same. How much faster will the machine be with the faster clock?
The absolute time to handle a miss is unchanged, but each clock cycle is now half as long, so the new miss penalty is 2 x 40 = 80 clock cycles.
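Total memory stall cycles per instruction = (0.05 x 80) + 0.36 x (0.10 x 80) = 4 + 2.88 = 6.88
CPIfast clock = 2 + 6.88 = 8.88
Speedup = (CPIstall x clock cycle) / (CPIfast clock x clock cycle / 2) = (5.44 x 2) / 8.88 = 1.23
So the machine with the doubled clock rate is about 1.23 times faster, not 2 times faster, because memory stalls now take a larger share of the execution time.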
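The faster-clock case can be checked the same way. Again this is only a sketch with made-up names (`cpi_with_stalls`, `cycle_slow`, `cycle_fast`). The cycle time of the original machine is set to 1 in arbitrary units, since only the ratio matters.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, mem_frac, miss_penalty):
    """Effective CPI including memory stalls (hypothetical helper, not from the question)."""
    return base_cpi + i_miss_rate * miss_penalty + mem_frac * d_miss_rate * miss_penalty


# Original machine: 40-cycle miss penalty, cycle time of 1 (arbitrary unit).
cpi_slow = cpi_with_stalls(2.0, 0.05, 0.10, 0.36, miss_penalty=40)
cycle_slow = 1.0

# Doubled clock rate: cycle time halves, so the same miss time costs 80 cycles.
cpi_fast = cpi_with_stalls(2.0, 0.05, 0.10, 0.36, miss_penalty=80)
cycle_fast = cycle_slow / 2

# Execution time per instruction = CPI x cycle time; instruction count cancels.
speedup = (cpi_slow * cycle_slow) / (cpi_fast * cycle_fast)

print(f"CPI with the faster clock: {cpi_fast:.2f}")
print(f"Speedup from doubling the clock rate: {speedup:.2f}")
```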
Assume the miss rate of an instruction cache is 3% and the miss rate of the data cache is 5%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 120 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%. *The size of the tag field-64- (n + m 2). ** The total...
Consider a processor with a CPI of 1.5, excluding memory stalls. The instruction cache has a miss rate of 1.5%, whereas the miss rate of the data cache is 3.5%. The miss penalty of the data cache is 80 cycles. The percentage of load/store instructions within the running programs is 25%. If the CPI of the whole system, including memory stalls, is 2.5, calculate the miss penalty of the instruction cache. Miss penalty of the instruction cache- Cycles.
Base machine has a 2.4GHz clock rate. There is L1 and L2 cache. L1 cache is 256K, direct mapped write through. 90% (read) hit rate without penalty, miss penalty is 4 cycles. (cost of reading L2) All writes take 1 cycle. L2 cache is 2MB, 4 way set associative write back. 95% hit rate, 60 cycle miss penalty (cost of reading memory). 30% of all instructions are reads, 10% writes. All instructions take 1 cycle - except reads which take...
Exercise 8.16 You are building a computer with a hierarchical memory system that consists of separate instruction and data caches followed by main memory. You are using the ARM multicycle processor from Figure 7.30 running at 1 GHz. (a) Suppose the instruction cache is perfect (i.e., always hits) but the data cache has a 5% miss rate. On a cache miss, the processor stalls for 60 ns to access main memory, then resumes normal operation. Taking cache misses into account,...
Assume a cache with a write-through policy, non-write allocate. Your cache has a miss rate of 5%. There is a 150 cycle miss penalty. Additionally, it takes an extra 30 cycles to do a write. Your program has a base CPI of 1, is 20% loads, and 5% stores. What will its CPI be?
Question 4 (10 pt). One difference between a write-through cache and a write-back cache can be in the time it takes to write. During the first cycle, we detect whether a hit will occur, and during the second (assuming a hit) we actually write the data. Let’s assume that 50% of the blocks are dirty for a write-back cache. For this question, assume that the write buffer for the write through will never stall the CPU (no penalty). Assume a...
Assume a memory hierarchy with unified data and instruction memories, miss rate equal to 15%, miss penalty equal to 90 cycles, 25% Load/Store instructions, TLB miss ratio per TLB access equal to 6% and TLB miss penalty equal to 80 cycles. What is the realistic CPI of this system if the ideal CPI is 1.5? What is the speedup compared to not having a TLB? What would be the speedup if the TLB could hold every entry?
1. (10 points) Suppose you have a load-store computer with the following instruction mix Operation Frequency Number of clock cycles ALU ops Loads Stores Branches 40 % 20 % 18% 22 % 4 4 The ALU ops (arithmetic logic unit ops) typically use operands in CPU registers and hence they take fewer clock cycles to execute. However, if you want to add a memory operand to a CPU register, then you would have to explicitly load it into a CPU...
Consider two different implementations, M1 and M2, of the same
instruction set. There are three classes of instructions (A, B, and
C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has
a clock rate of 100 MHz. The average number of cycles for each
instruction class and their frequencies (for a typical program) are
as follows:
(a) Calculate the average CPI for each machine, M1, and M2.
(b) Calculate the average MIPS ratings for...
A particular (fictional) CPU has the following internal units and timings: 1. IFD: Instruction fetch + decode : 160 ps 2. RR: Register read 80 ps 3. ALU: 240 ps 4. MA : memory access: 160 ps (assuming cache) 5. RW : register write : 80 ps There are 5 basic instruction types: 1. LOAD : IFD+RR+ALU+MA+RW 720 ps 2. STORE: IFD+RR+ALU+MA : 640 ps 3. ARITHMETIC: IFD+RR+ALU+RW : 560 4. BRANCH: IFD+RR+ALU : 480 ps 5. MEMOP: IFD+RR+MA+ALU+MA :...