Question

Which of the caches/TLBs in a processor are in the critical path of instruction execution? How does this affect the design of these structures?

Answer #1

Modern desktop PC and server CPUs have three independent caches:

An instruction cache, which speeds up the fetching of executable instructions,

A data cache, which speeds up data loads and stores, and

A translation lookaside buffer (TLB), which speeds up virtual-to-physical address translation for both executable instructions and data.

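Of these three, the data cache and the TLB are in the critical path of instruction execution: every load or store must translate its address and access the data cache before the instructions that depend on it can proceed. The instruction cache is also accessed on every fetch, so its hit latency matters as well.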
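
To make the point above concrete, here is a minimal sketch of a load that goes through the TLB and then the data cache. Everything in it is an assumption for illustration: the names TinyTLB and load_latency, the 4 KiB page size, the cycle counts, and the choice to model translation and cache access one after the other (a physically indexed cache). It is not a model of any particular processor.

```python
from collections import OrderedDict

# All parameters below are assumed values chosen for illustration.
PAGE_BITS = 12           # 4 KiB pages
TLB_HIT_CYCLES = 1       # cost of a TLB lookup
WALK_CYCLES = 30         # extra cost of a page-table walk on a TLB miss
CACHE_HIT_CYCLES = 3     # data cache hit latency
CACHE_MISS_CYCLES = 100  # latency when the data cache misses


class TinyTLB:
    """Hypothetical fully associative TLB with LRU replacement."""

    def __init__(self, entries=4):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def translate(self, vaddr, page_table):
        """Return (physical address, cycles spent on translation)."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.map:
            self.map.move_to_end(vpn)          # refresh LRU position
            cycles = TLB_HIT_CYCLES
        else:
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)   # evict least recently used entry
            self.map[vpn] = page_table[vpn]    # stands in for a page-table walk
            cycles = TLB_HIT_CYCLES + WALK_CYCLES
        return (self.map[vpn] << PAGE_BITS) | offset, cycles


def load_latency(tlb, vaddr, page_table, cache_hit):
    """Latency of one load: translation first, then the data cache access."""
    _, translate_cycles = tlb.translate(vaddr, page_table)
    cache_cycles = CACHE_HIT_CYCLES if cache_hit else CACHE_MISS_CYCLES
    return translate_cycles + cache_cycles
```

With these assumed numbers, a load that hits in both structures costs 1 + 3 = 4 cycles, while the same load with a cold TLB costs 31 + 3 = 34 cycles. Because both lookups sit in front of every dependent instruction, this kind of model suggests why such structures are usually kept small and fast.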

Critical-path instructions and their dependents tend to stall in the instruction queue, and often become the oldest (bottom) instruction in the queue at some point. The columns labeled “IQ latency” and “Oldest in IQ” in Figure 1 show, for each instruction, the average number of cycles spent in the instruction queue and at the bottom of the instruction queue, respectively. These numbers correlate well with the critical path.

The caches modeled are more modest, to compensate for the relatively small memory footprint of most of our benchmarks. The fetch unit can fetch up to 16 instructions per cycle from up to three basic blocks. This simulates the behavior of an effective trace cache.
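
The fetch limits described above can be sketched as a small counting model. The function name fetch_cycles and its parameters are hypothetical, and the sketch assumes that a basic block left partly fetched in one cycle counts as one of the three blocks in the next cycle; the original text does not specify this.

```python
def fetch_cycles(block_sizes, width=16, max_blocks=3):
    """Count the cycles needed to fetch a sequence of basic blocks.

    block_sizes: instruction count of each basic block, in fetch order.
    width:       maximum instructions fetched per cycle.
    max_blocks:  maximum basic blocks touched per cycle.
    """
    remaining = list(block_sizes)
    cycles = 0
    i = 0
    while i < len(remaining):
        slots, blocks = width, 0
        # Fill this cycle until either limit is reached.
        while i < len(remaining) and slots > 0 and blocks < max_blocks:
            take = min(slots, remaining[i])
            remaining[i] -= take
            slots -= take
            blocks += 1
            if remaining[i] == 0:
                i += 1  # block fully fetched, move to the next one
        cycles += 1
    return cycles
```

For example, four blocks of 4 instructions need two cycles under these assumptions (the three-block limit is hit first), and a single block of 20 instructions also needs two cycles (the 16-instruction limit is hit first).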

Because this affects the performance of the processor, RAM has to increase in order to provide more space for the instructions to be executed. As processors increase their ability to exploit ILP in the instruction stream, application performance becomes more tied to the execution of the critical dependence path. Optimizations that accelerate critical-path execution will have an increasingly large advantage.
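
One way to picture the critical dependence path is as the longest latency-weighted chain in the graph of instruction dependences. The sketch below assumes an acyclic, non-empty dependence graph and unlimited execution resources; the function name critical_path and the example latencies are hypothetical.

```python
def critical_path(latency, deps):
    """Return (length, path) of the longest dependence chain.

    latency: {instruction: cycles to execute}
    deps:    {instruction: [instructions it depends on]}
    Assumes the dependence graph is acyclic and latency is non-empty.
    """
    finish = {}  # earliest finish time of each instruction
    pred = {}    # predecessor that determines that finish time

    def visit(node):
        if node in finish:
            return finish[node]
        best, best_pred = 0, None
        for producer in deps.get(node, []):
            t = visit(producer)
            if t > best:
                best, best_pred = t, producer
        finish[node] = best + latency[node]
        pred[node] = best_pred
        return finish[node]

    for node in latency:
        visit(node)

    # Walk back from the instruction that finishes last.
    node = max(finish, key=finish.get)
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = pred[node]
    return length, path[::-1]


# Assumed example: a load feeding an add, a multiply and a store,
# plus an independent increment that is off the critical path.
latency = {"load": 4, "add": 1, "mul": 3, "store": 1, "inc": 1}
deps = {"add": ["load"], "mul": ["add"], "store": ["mul"]}
print(critical_path(latency, deps))
```

In this example the chain load, add, mul, store is 9 cycles long. Cutting the load latency from 4 to 2 cycles shortens the whole program to 7 cycles, while speeding up inc changes nothing, which is the sense in which critical-path optimizations pay off most.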
