Question

Consider the C program shown below int i int a [1024 1024] int x = 0; for (iso; i 1024, i++) { x = x + a[i] + a[1024 * i]; Suppose the code above is executed on a system with a direct mapped 16 KB data cache with block size 4. Assume that the address of matrix starts at o and I and x are in registers and initially the cache is empty. a) b) What is the miss rate of the cache? Suggest method to reduce the miss rate for the cache? Recalculate the miss rate based on your suggestion

0 0
Add a comment Improve this question Transcribed image text
Answer #1

Solution:

a is solved, please repost b.

Number of blocks in the cache = 16 KB/4 B = 4K

means we have 4*1024 = 4096 blocks in the cache

a)

Array size is 1024*1024 = 2^20

Initially, there will be a miss and when particular accessed element is not in the cache but after the access it will be placed in one of the 4096 blocks in the cache

the cache will start accessing elements 0, 1, 2, 3, .....1024

so these elements start to get stored in the cache along with a[1024 * i] will save the elements 1024, 2048, 4096, 8192, .... in the cache

at i = 0 one miss and one hit will be there

x + a[i] + a[1024* i]

as soon as data for a[0] is accessed for one the second one will get a hit

after that

from 1 to 1023 all cache misses

Total number of cache miss = 1023 + 1024 = 2047

Miss rate= 2047/4096

I hope this helps if you find any problem. Please comment below. Don't forget to give a thumbs up if you liked it. :)

Add a comment
Know the answer?
Add Answer to:
Consider the C program shown below int i int a [1024 1024] int x = 0;...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Cache performance The starting code would have: struct position { int x; int y; } int...

    Cache performance The starting code would have: struct position { int x; int y; } int N; struct position grid[N][N]; int totalX=0; int totalY=0; int i, j; //For X loop for( i = 0; i < N; i ++){ for( j = 0; j < N; j++){ totalX += grid[i][j].x; } } //For Y loop for( j = 0; j < N; j++){ for( i = 0; i < N; i++){ totalY += grid[i][j].y; } } Part I: This part...

  • 14 Points Bonus Question: Consider the following transpose routine: typedef int array (212): void transposel (array...

    14 Points Bonus Question: Consider the following transpose routine: typedef int array (212): void transposel (array dst, array sre) int i, j; for(i=0; i<2; i++) { for (-0; j 2; j++){ dst sre[BG); Assume that this code runs on a machine with the following properties: • sizeof(int) = 4 • The src array starts at address and the dst array starts at address 16 There is a single L1 cache that is direct-mapped, write through, write allocate, with a block...

  • Arrays A and B contain 1 K (1024) elements each. Each element is 4 bytes. The first element of A (A[O]) is stored at physical address 0x0000 4000. The first element of B (B[O]) is stored at physical...

    Arrays A and B contain 1 K (1024) elements each. Each element is 4 bytes. The first element of A (A[O]) is stored at physical address 0x0000 4000. The first element of B (B[O]) is stored at physical address 0x0001 8000. A physical address is 32 bits. Assume that only arrays A and B will be cached in the following fragment of code (i.e., the index i will be in a register): for i 1023; i 0; i-) Alil-i The...

  • Consider the following matrix transpose routines: typedef int array[4][4]; void transpose (array dst, array src) {...

    Consider the following matrix transpose routines: typedef int array[4][4]; void transpose (array dst, array src) {     int i, j;     for (i=0; i<4; i++) {    for (j=0; j<4; j++) {           dst[i][j] = src[i][j];    }     } } void transpose2 (array dst, array src) {     int i, j;     for (i=0; i<4; i++) {    for (j=0; j<4; j++) {           dst[j][i] = src[j][i];    }     } } Assume this code runs on...

  • If I have a problem set like so: Below is a list of 64-bit memory address...

    If I have a problem set like so: Below is a list of 64-bit memory address references, given as word addresses. 3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253 5.2.1 BLOCK SIZE: 1 word CACHE SIZE: 16 1-word blocks a) For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache...

  • Consider a 64-bit computer with a simplified memory hierarchy. This hierarchy contains a single cache and...

    Consider a 64-bit computer with a simplified memory hierarchy. This hierarchy contains a single cache and an unbounded backing memory. The cache has the following characteristics: • Direct-Mapped, Write-through, Write allocate. • Cache blocks are 4 words each. • The cache has 256 sets. (a) Calculate the cache’s size in bytes. (b) Consider the following code fragment in the C programming language to be run on the described computer. Assume that: program instructions are not stored in cache, arrays are...

  • Consider a memory hierarchy using one of the three organization for main memory shown in a...

    Consider a memory hierarchy using one of the three organization for main memory shown in a figure below. Assume that the cache block size is 32 words, That the width of organization b is 4 words, and that the number of banks in organization c is 2. If the main memory latency for a new access is 10 cycles, sending address time is 1 cycle and the transfer time is 1 cycle, What are the miss penalties for each of...

  • Given the following code sequence calculating a matrix norm. double c[96], a[96][96]; for (i=0; i<95; i=++...

    Given the following code sequence calculating a matrix norm. double c[96], a[96][96]; for (i=0; i<95; i=++ ) { c[i] = c[i] + a[i][i]*a[i][i+1]; } with a and c being arrays of double precision floating point numbers with each element being 8 bytes. No element of a or c is in the cache before executing this code. Please assume row-major ordering. Assuming that the cache is large enough to hold both a and c, and has a cache block size of...

  • A short program loop goes through a 16 kB array one word at a time, reads...

    A short program loop goes through a 16 kB array one word at a time, reads a number from the array, adds a random number, and stores the result in the corresponding entry in another array that is located in the memory immediately following the first array. An outer loop repeats the above operation 100 times. The 64-bit processor, operating at a clock frequency of 4 GHz, is pipelined, has 48 address lines, three levels of caches with a 64...

  • Assume you have: 32-bit addresses, 4KB Page size, 4MB Physical Memory Space, 4KB Cache with 4-way...

    Assume you have: 32-bit addresses, 4KB Page size, 4MB Physical Memory Space, 4KB Cache with 4-way set associative and LRU replacement, 32 Byte Cache block size, 4-entry fully associative TLB. A program to be run on this machine begins as follows:   double A[1024]; int i, j; double sum = 0; for( i = 0; i < 1024; i++ )       // first loop      A[i] = i; for( j = 0; j < 1024; j += 16 )   // second loop     ...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT