The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

Question

Question

The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all memory accesses take 1 clock cycle.

a) What are the three types of pipeline hazards?

b) What is the control (branch) hazard of an instruction pipeline? Provide three branch prediction alternatives to reduce branch hazards.

c) What is the data forwarding scheme used to reduce the data hazard?

d) Give all the forwarding paths of the five-stage DLX pipeline, including sources, destinations, and information transferred.

engineering Computer-Science

Add a comment Improve this question Transcribed image text

Answer 1

Answer #1

There are situations, called hazards, that prevent the next instruction in the instruction stream from being executing during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining.

There are three classes of hazards:

Structural Hazards. They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
Data Hazards. They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control Hazards.They arise from the pipelining of branches and other instructions that change the PC.

Hazards in pipelines can make it necessary to stall the pipeline. The processor can stall on different events:

A cache miss. A cache miss stalls all the instructions on pipeline both before and after the instruction causing the miss.
A hazard in pipeline. Eliminating a hazard often requires that some instructions in the pipeline to be allowed to proceed while others are delayed. When the instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard will never clear.

In case of structural hazards:

Clock cycle number
Instr	1	2	3	4	5	6	7	8	9	10
Instr i	IF	ID	EX	MEM	WB
Instr i+1		IF	ID	EX	MEM	WB
Instr i+2			IF	ID	EX	MEM	WB
Stall				bubble	bubble	bubble	bubble	bubble
Instr i+3					IF	ID	EX	MEM	WB
Instr i+4						IF	ID	EX	MEM	WB

A stall causes the pipeline performance to degrade the ideal performance.

Average instruction time unpipelined

Speedup from pipelining = ----------------------------------------

Average instruction time pipelined

CPI _unpipelined * Clock Cycle Time _unpipelined

= -------------------------------------

CPI _pipelined * Clock Cycle Time _pipelined

The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is

CPI_pipelined = Ideal CPI + Pipeline stall clock cycles per instruction

= 1 + Pipeline stall clock cycles per instruction

If we ignore the cycle time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle time of the two machines are equal and

CPI _unpipelined

Speedup = ----------------------------

1+ Pipeline stall cycles per instruction

If all instructions take the same number of cycles, which must also equal the number of pipeline stages ( the depth of the pipeline) then unpipelined CPI is equal to the depth of the pipeline, leading to

Pipeline depth

Speedup = --------------------------

1 + Pipeline stall cycles per instruction

If there are no pipeline stalls, this leads to the intuitive result that pipelining can improve performance by the depth of pipeline.

Structural Hazards

When a machine is pipelined, the overlapped execution of instructions requires pipelining of functional units and duplication of resources to allow all posible combinations of instructions in the pipeline.
If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard.

Common instances of structural hazards arise when

Some functional unit is not fully pipelined. Then a sequence of instructions using that unpipelined unit cannot proceed at the rate of one per clock cycle
Some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.

Example1:
a machine may have only one register-file write port, but in some cases the pipeline might want to perform two writes in a clock cycle.

Example2:
a machine has shared a single-memory pipeline for data and instructions. As a result, when an instruction contains a data-memory reference(load), it will conflict with the instruction reference for a later instruction (instr 3):

Clock cycle number
Instr	1	2	3	4	5	6	7	8
Load	IF	ID	EX	MEM	WB
Instr 1		IF	ID	EX	MEM	WB
Instr 2			IF	ID	EX	MEM	WB
Instr 3				IF	ID	EX	MEM	WB

To resolve this, we stall the pipeline for one clock cycle when a data-memory access occurs. The effect of the stall is actually to occupy the resources for that instruction slot. The following table shows how the stalls are actually implemented.

Clock cycle number
Instr	1	2	3	4	5	6	7	8	9
Load	IF	ID	EX	MEM	WB
Instr 1		IF	ID	EX	MEM	WB
Instr 2			IF	ID	EX	MEM	WB
Stall				bubble	bubble	bubble	bubble	bubble
Instr 3					IF	ID	EX	MEM	WB

Instruction 1 assumed not to be data-memory reference (load or store), otherwise Instruction 3 cannot start execution for the same reason as above.

To simplify the picture it is also commonly shown like this:

Clock cycle number
Instr	1	2	3	4	5	6	7	8	9
Load	IF	ID	EX	MEM	WB
Instr 1		IF	ID	EX	MEM	WB
Instr 2			IF	ID	EX	MEM	WB
Instr 3				stall	IF	ID	EX	MEM	WB

Introducing stalls degrades performance as we saw before. Why, then, would the designer allow structural hazards? There are two reasons:

To reduce cost. For example, machines that support both an instruction and a cache access every cycle (to prevent the structural hazard of the above example) require at least twice as much total memory.
To reduce the latency of the unit. The shorter latency comes from the lack of pipeline registers that introduce overhead

Data Hazards

A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.

Consider the pipelined execution of these instructions:

		1	2	3	4	5	6	7	8	9
ADD	R1, R2, R3	IF	ID	EX	MEM	WB
SUB	R4, R5, R1		IF	ID_sub	EX	MEM	WB
AND	R6, R1, R7			IF	ID_and	EX	MEM	WB
OR	R8, R1, R9				IF	ID_or	EX	MEM	WB
XOR	R10,R1,R11					IF	ID_xor	EX	MEM	WB

All the instructions after the ADD use the result of the ADD instruction (in R1). The ADD instruction writes the value of R1 in the WB stage (shown black), and the SUB instruction reads the value during ID stage (ID_sub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.

The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of cycle 5 (shown black). Thus, the AND instruction that reads the registers during cycle 4 (ID_and) will receive the wrong result.

The OR instruction can be made to operate without incurring a hazard by a simple implementation technique. The technique is to perform register file reads in the second half of the cycle, and writes in the first half. Because both WB for ADD and ID_or for OR are performed in one cycle 5, the write to register file by ADD will perform in the first half of the cycle, and the read of registers by OR will perform in the second half of the cycle.

The XOR instruction operates properly, because its register read occur in cycle 6 after the register write by ADD.

The next page discusses forwarding, a technique to eliminate the stalls for the hazard involving the SUB and AND instructions.

We will also classify the data hazards and consider the cases when stalls can not be eliminated. We will see what compiler can do to schedule the pipeline to avoid stalls

Forwarding

The problem with data hazards, introduced by this sequence of instructions can be solved with a simple hardware technique called forwarding.

		1	2	3	4	5	6	7
ADD	R1, R2, R3	IF	ID	EX	MEM	WB
SUB	R4, R5, R1		IF	ID_sub	EX	MEM	WB
AND	R6, R1, R7			IF	ID_and	EX	MEM	WB

The key insight in forwarding is that the result is not really needed by SUB until after the ADD actually produces it. The only problem is to make it available for SUB when it needs it.

If the result can be moved from where the ADD produces it (EX/MEM register), to where the SUB needs it (ALU input latch), then the need for a stall can be avoided.
Using this observation , forwarding works as follows:

The ALU result from the EX/MEM register is always fed back to the ALU input latches.
If the forwarding hardware detects that the previous ALU operation has written the register corresponding to the source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.

Forwarding of results to the ALU requires the additional of three extra inputs on each ALU multiplexer and the addtion of three paths to the new inputs.

The paths correspond to a forwarding of:
(a) the ALU output at the end of EX,
(b) the ALU output at the end of MEM, and
(c) the memory output at the end of MEM.

Without forwarding our example will execute correctly with stalls:

		1	2	3	4	5	6	7	8	9
ADD	R1, R2, R3	IF	ID	EX	MEM	WB
SUB	R4, R5, R1		IF	stall	stall	ID_sub	EX	MEM	WB
AND	R6, R1, R7			stall	stall	IF	ID_and	EX	MEM	WB

As our example shows, we need to forward results not only from the immediately previous instruction, but possibly from an instruction that started three cycles earlier. Forwarding can be arranged from MEM/WB latch to ALU input also. Using those forwarding paths the code sequence can be executed without stalls:

		1	2	3	4	5	6	7
ADD	R1, R2, R3	IF	ID	EX_add	MEM_add	WB
SUB	R4, R5, R1		IF	ID	EX_sub	MEM	WB
AND	R6, R1, R7			IF	ID	EX_and	MEM	WB

The first forwarding is for value of R1 from EX_add to EX_sub .
The second forwarding is also for value of R1 from MEM_add to EX_and.
This code now can be executed without stalls.

Forwarding can be generalized to include passing the result directly to the functional unit that requires it: a result is forwarded from the output of one unit to the input of another, rather than just from the result of a unit to the input of the same unit.

One more Example
To prevent a stall in this example, we would need to forward the values of R1 and R4 from the pipeline registers to the ALU and data memory inputs.

		1	2	3	4	5	6	7
ADD	R1, R2, R3	IF	ID	EX_add	MEM_add	WB
LW	R4, d (R1)		IF	ID	EX_lw	MEM_lw	WB
SW	R4,12(R1)			IF	ID	EX_sw	MEM_sw	WB

Stores require an operand during MEM, and forwarding of that operand is shown here.
The first forwarding is for value of R1 from EX_add to EX_lw .
The second forwarding is also for value of R1 from MEM_add to EX_sw.
The third forwarding is for value of R4 from MEM_lw to MEM_sw.
Observe that the SW instruction is storing the value of R4 into a memory location computed by adding the displacement 12 to the value contained in register R1. This effective address computation is done in the ALU during the EX stage of the SW instruction. The value to be stored (R4 in this case) is needed only in the MEM stage as an input to Data Memory. Thus the value of R1 is forwarded to the EX stage for effective address computation and is needed earlier in time than the value of R4 which is forwarded to the input of Data Memory in the MEM stage.
So forwarding takes place from "left-to-right" in time, but operands are not ALWAYS forwarded to the EX stage - it depends on the instruction and the point in the Datapath where the operand is needed. Of course, hardware support is necessary to support data forwarding.

Add a comment

Answer 2

The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

Homework Answers

Add Answer to:
The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

Post as a guest

Earn Coins

I need help plz 6. 4-stage MIPS pipeline: It has "IF, ID-EX, MEM, WB" stages. Specify...

Consider a standard 5-stage MIPS pipeline of the type discussed during the class sessions: IF- ID-EX-M-WB....

c. The classic five-stage pipeline MIPS architecture is used to execute the code fragments. Assume the...

A 5-Stage pipeline is composed of the following stages Instruction Fetch (IF), Decode (DE), Execute (EX), Memory Access...

We found that the instruction fetch and memory stages are the critical path of our 5-stage...

Using a 5 stage pipeline architecture (IF-OF-EX-MEM-WB), use a drawing to indicate the stages used by...

The latencies of individual stages in five-stage MIPS (Microprocessor without Interlocked Pipeline Stages) Architecture are given...

The datapath for 5-stage MIPS pipelined architecture is given below. IFAID ID/EX EX/MEM MEM/WB > Add...

The datapath for 5-stage MIPS pipelined architecture is given below. IF/D ID/EX EX/MEM MEM/WB >Add Add...

1. Assume that individual stages of the datapath have the following latencies: IF ID EX MEM...

The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

Homework Answers

Add Answer to: The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...

Post as a guest

Earn Coins

Add Answer to:
The basic pipeline for DLX has five stages: IF, ID, EX, MEM, and WB. Assuming all...