Question

A short program loop goes through a 16 kB array one word at a time, reads a number from the array, adds a random number, and stores the result in the corresponding entry of another array that is located in memory immediately after the first array. An outer loop repeats this operation 100 times.

The 64-bit processor operates at a clock frequency of 4 GHz, is pipelined, and has 48 address lines and three levels of caches with a 64 B block size. Each of the L1 caches has 512 sets, 2-way set-associativity, and an alternate cache replacement policy. The L2 cache is a 4-way set-associative 512 kB structure, whereas the L3 cache is 4 MB and 8-way set-associative; the L2 and L3 caches employ a pseudo-LRU replacement policy. Write-back and write-allocate strategies are used in the L2 and L3 caches, but simpler write-hit and write-miss policies are used in the L1 cache. The virtual address contains 52 bits plus 12 bits for security and PID/ASN, the page size is 64 kB, and each of the page table caches contains 40 entries. Miss penalties for the L1, L2 and L3 caches are 10, 20 and 50 cc, respectively.

a. What is the size (in bytes) of each TLB?
b. Compute the number of index, tag and block offset bits in each cache.
c. Write the MIPS-64 assembly code to implement the problem described in the first sentence of this question. Assume that R30 supplies a new random number every time it is read. Also assume that register R1 holds the address of the first byte of the source array.
d. Explain the steps required for the processor to fetch and execute the first load instruction that reads the first element of the source array in the very first iteration of this program (note: this is not necessarily the first instruction in the program). Remember that some of the required information may not be available, so misses might result. Also remember that the size of the displacement field in the instruction is limited. Assume that main memory always contains the needed information, whether or not any level of cache also holds it.
e. Calculate the number of accesses to the TLBs, every cache, and main memory when this program executes. Calculate the number of misses in each of the storage structures when this program is executed.
f. Calculate the time taken to execute this program in milliseconds.
g. Every processor, regardless of its word size, has a byte-addressable memory. Why?
h. BONUS: How will the miss rates change if the size of each array is 128 kB?
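As a starting point for part b, the address breakdown can be sketched in Python. This is a minimal sketch, not a definitive answer: it assumes the caches are physically addressed, that the 48 address lines give a 48-bit physical address, and that 1 kB = 1024 B. The function names `log2_exact`, `sets_from_capacity` and `cache_address_bits` are illustrative and do not come from the question.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def sets_from_capacity(capacity_bytes, block_bytes, ways):
    """Number of sets = capacity / (block size * associativity)."""
    return capacity_bytes // (block_bytes * ways)


def cache_address_bits(address_bits, block_bytes, num_sets):
    """Split an address into (tag, index, block offset) bit counts."""
    offset = log2_exact(block_bytes)
    index = log2_exact(num_sets)
    tag = address_bits - index - offset
    return tag, index, offset


ADDRESS_BITS = 48   # assumption: 48 address lines = 48-bit physical address
BLOCK = 64          # block size in bytes, as stated in the question
KB, MB = 1024, 1024 * 1024

# L1: the question gives the number of sets directly (512 sets, 2-way).
l1 = cache_address_bits(ADDRESS_BITS, BLOCK, 512)
# L2: 512 kB, 4-way; L3: 4 MB, 8-way. Sets are derived from the capacity.
l2 = cache_address_bits(ADDRESS_BITS, BLOCK, sets_from_capacity(512 * KB, BLOCK, 4))
l3 = cache_address_bits(ADDRESS_BITS, BLOCK, sets_from_capacity(4 * MB, BLOCK, 8))

for name, (tag, index, offset) in (("L1", l1), ("L2", l2), ("L3", l3)):
    print(f"{name}: tag={tag} index={index} offset={offset}")
```

If the course treats any of the caches as virtually indexed or virtually tagged, the `ADDRESS_BITS` value passed for that cache would need to change accordingly.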

Similar Homework Help Questions
  • 1. Consider a program that can execute with no stalls and a CPI of 1 if...

    1. Consider a program that can execute with no stalls and a CPI of 1 if the underlying processor can somehow magically service every load instruction with a 1-cycle L1 cache hit. In practice, 5% of all load instructions suffer from an L1 cache miss, 2% of all load instructions suffer from an L2 cache miss, and 1% of all load instructions suffer from an L3 cache miss (and are serviced by the memory system). An L1 cache miss stalls...

  • I require an unplagiarised solution for the question below A. Consider the usage of critical word...

    I require an unplagiarised solution for the question below A. Consider the usage of critical word first and early restart on L2 cache misses. Assume a 1 MB L2 cache with 64 byte blocks and a refill path that is 16 bytes wide. Assume that the L2 can be written with 16 bytes every 4 processor cycles, the time to receive the first 16 byte block from the memory controller is 120 cycles, each additional 16 byte block from main...

  • 2. Cache hierarchy You are building a computer system with in-order execution that runs at 1...

    2. Cache hierarchy You are building a computer system with in-order execution that runs at 1 GHz and has a CPI of 1, with no memory accesses. The memory system is a split L1 cache. Both the I-cache and the D-cache are direct mapped and hold 32 KB each, with a block size of 64 bytes. The memory system is split L1 cache. Both the I-cache and the D-cache are direct mapped and hold 32 KB each, with a block...

  • Question 6 For the following figure shows a hypothetical memory hierarchy going from a virtual address to L2 cache acce...

    Question 6 For the following figure shows a hypothetical memory hierarchy going from a virtual address to L2 cache access. The page size is 8KB, the TLB is direct mapped with 128 entries. The L1 cache is a direct mapped 8 KB, and the L2 cache is 2MB and direct mapped. Both use 64 byte blocks. The virtual address is 64 bits and the physical address is 41 bits. For each block in the figure below, fill in the number...

  • please solve e & f for this question. 1. (40 points) The Corei7-6700K microprocessor has 3...

    please solve e & f for this question. 1. (40 points) The Corei7-6700K microprocessor has 3 cache levels about the L2 cache per core and the main memory: (more detailed information can be found at: http://www.cpu-world.com/CPUs/Core i7/Intel-Core%2017-6700.html) The following is the information: Memory is byte addressable. Maximum memory capacity is 64 GB (G = 2^30). Cache capacity is 256 KB (K = 2^10). 4-way set associative is used. Block offset size is 6 bits. If CPU has generated the physical address (411234)_10, what...

  • Consider a memory hierarchy using one of the three organization for main memory shown in a...

    Consider a memory hierarchy using one of the three organization for main memory shown in a figure below. Assume that the cache block size is 32 words, That the width of organization b is 4 words, and that the number of banks in organization c is 2. If the main memory latency for a new access is 10 cycles, sending address time is 1 cycle and the transfer time is 1 cycle, What are the miss penalties for each of...

  • Consider a 64-bit computer with a simplified memory hierarchy. This hierarchy contains a single cache and...

    Consider a 64-bit computer with a simplified memory hierarchy. This hierarchy contains a single cache and an unbounded backing memory. The cache has the following characteristics: • Direct-Mapped, Write-through, Write allocate. • Cache blocks are 4 words each. • The cache has 256 sets. (a) Calculate the cache’s size in bytes. (b) Consider the following code fragment in the C programming language to be run on the described computer. Assume that: program instructions are not stored in cache, arrays are...

  • Assume the cache can hold 64 kB. Data are transferred between main memory and the cache...

    Assume the cache can hold 64 kB. Data are transferred between main memory and the cache in blocks of 4 bytes each. This means that the cache is organized as 16K=2^14 lines of 4 bytes each. The main memory consists of 16 MB, with each byte directly addressable by a 24-bit address (2^24 =16M). Thus, for mapping purposes, we can consider main memory to consist of 4M blocks of 4 bytes each. Please show illustrations too for all work. Part...

  • A 2-way set associative cache consists of four sets 0, 1, 2, 3. The main memory...

    A 2-way set associative cache consists of four sets 0, 1, 2, 3. The main memory is word addressable (i.e. treat the memory as an array of words indexed by the address). It contains 2048 blocks 0 through 2047, and each block has eight words. (a) How many bits are needed to address the main memory? (b) Show how a main memory address will be translated into a tag, a set number, and an offset within a block. Illustrate this...

  • 1 Overview The goal of this assignment is to help you understand caches better. You are...

    1 Overview The goal of this assignment is to help you understand caches better. You are required to write a cache simulator using the C programming language. The programs have to run on iLab machines. We are providing real program memory traces as input to your cache simulator. The format and structure of the memory traces are described below. We will not give you improperly formatted files. You can assume all your input files will be in proper format as...
