SOLUTION: The arithmetic intensity of the kernel is 1. The given code reads 4 floats and writes 2 floats for every 6 FLOPs, so the intensity is 6 FLOPs / 6 floats accessed = 1.
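The VMIPS assembly code for the loop using strip mining is shown below. The vector length is first set to 44 (300 mod 64) for the odd-sized strip, and the remaining strips use the full 64 elements:

       li      $VL, 44            # first strip: 300 mod 64 = 44 elements
       li      $r1, 0             # initialize index
again: lv      $v1, a_re+$r1      # load a_re
       lv      $v3, b_re+$r1      # load b_re
       mulvv.s $v5, $v1, $v3      # a_re * b_re
       lv      $v2, a_im+$r1      # load a_im
       lv      $v4, b_im+$r1      # load b_im
       mulvv.s $v6, $v2, $v4      # a_im * b_im
       subvv.s $v5, $v5, $v6      # a_re * b_re - a_im * b_im
       sv      $v5, c_re+$r1      # store c_re
       mulvv.s $v5, $v1, $v4      # a_re * b_im
       mulvv.s $v6, $v2, $v3      # a_im * b_re
       addvv.s $v5, $v5, $v6      # a_re * b_im + a_im * b_re
       sv      $v5, c_im+$r1      # store c_im
       bne     $r1, 0, else       # check whether this was the first iteration
       addi    $r1, $r1, #44      # first iteration: advance by 44
       li      $VL, 64            # remaining strips use the full vector length
       j       again              # a next iteration is guaranteed
else:  addi    $r1, $r1, #64      # later iterations: advance by 64
skip:  blt     $r1, 300, again    # more elements to process?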
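The control flow of the strip-mined loop can be sketched in scalar Python. This is only an illustration, assuming the kernel is an element-wise complex multiply over 300 elements with a maximum vector length of 64; the function names `strip_lengths` and `complex_multiply_strip_mined` are hypothetical and do not come from the exercise.

```python
MVL = 64  # maximum vector length (assumed, matching the 64-element strips)
N = 300   # number of complex elements (assumed from the loop bound)


def strip_lengths(n, mvl):
    """Return the vector length of each strip, odd-sized strip first."""
    first = n % mvl
    lengths = [first] if first else []
    lengths += [mvl] * (n // mvl)
    return lengths


def complex_multiply_strip_mined(a_re, a_im, b_re, b_im, mvl=MVL):
    """Element-wise complex multiply, processed one strip at a time."""
    n = len(a_re)
    c_re = [0.0] * n
    c_im = [0.0] * n
    start = 0
    for vl in strip_lengths(n, mvl):
        # The inner loop stands in for one pass of vector instructions with VL = vl.
        for i in range(start, start + vl):
            c_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i]
            c_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i]
        start += vl
    return c_re, c_im
```

For 300 elements, `strip_lengths(N, MVL)` yields one strip of 44 followed by four strips of 64, which corresponds to the five passes through the assembly loop.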
With chaining, the instructions group into the following convoys:

1. mulvv.s  lv
2. lv       mulvv.s
3. subvv.s  sv
4. mulvv.s  lv
5. mulvv.s  lv
6. addvv.s  sv

Therefore, 6 chimes are required.
The start-up overheads of the functional units are:

Load/store unit: 15 cycles
Multiply unit: 8 cycles
Add/subtract unit: 5 cycles

Each iteration issues 6 load/store, 4 multiply, and 2 add/subtract instructions, so:

Total cycles per iteration = 6 chimes × 64 elements + 15 × 6 + 8 × 4 + 5 × 2 = 516

Cycles per result = 516 / 64 ≈ 8

Thus, with the vector sequence chained, about 8 clock cycles are required per complex result value, including overhead.
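The arithmetic above can be checked with a short Python sketch. The helper name `cycles_per_iteration` is hypothetical; the chime count, vector length, and start-up overheads are the values stated in the solution.

```python
def cycles_per_iteration(chimes, vector_length, startup_counts):
    """Chimes times vector length, plus the start-up overhead of every vector instruction.

    startup_counts is a list of (start-up cycles, number of instructions) pairs.
    """
    overhead = sum(latency * count for latency, count in startup_counts)
    return chimes * vector_length + overhead


total = cycles_per_iteration(
    chimes=6,
    vector_length=64,
    startup_counts=[
        (15, 6),  # load/store unit: 4 loads + 2 stores
        (8, 4),   # multiply unit: 4 multiplies
        (5, 2),   # add/subtract unit: 1 subtract + 1 add
    ],
)
per_result = total / 64  # 64 complex results per full-length iteration

print(total, round(per_result, 2))  # 516 8.06
```

The 384 cycles from the chimes dominate; the 132 cycles of start-up overhead add roughly 2 cycles per complex result.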