Problem

Exercise 15.9 Consider the following scenario:Emp(eid: integer, sal: integer, age: real, d...

Exercise 15.9 Consider the following scenario:

Emp(eid: integer, sal: integer, age: real, did: integer)
Dept(did: integer, projid: integer, budget: real, status: char(10))
Proj(projid: integer, code: integer, report: varchar)

Assume that each Emp record is 20 bytes long, each Dept record is 40 bytes long, and each Proj record is 2000 bytes long on average. There are 20,000 tuples in Emp, 5000 tuples in Dept (note that did is not a key), and 1000 tuples in Proj. Each department, identified by did, has 10 projects on average. The file system supports 4000 byte pages, and 12 buffer pages are available. All following questions are based on this information. You can assume uniform distribution of values. State any additional assumptions. The cost metric to use is the number of page I/Os. Ignore the cost of writing out the final result.

  1. Consider the following two queries: “Find all employees with age = 30” and “Find all projects with code = 20.” Assume that the number of qualifying tuples is the same in each case. If you are building indexes on the selected attributes to speed up these queries, for which query is a clustered index (in comparison to an unclustered index) more important?
  2. Consider the following query: “Find all employees with age > 30.” Assume that there is an unclustered index on age. Let the number of qualifying tuples be N. For what values of N is a sequential scan cheaper than using the index?
  3. Consider the following query:
    SELECT *FROM Emp E, Dept DWHERE E.did=D.did
    1. Suppose that there is a clustered hash index on did on Emp. List all the plans that are considered and identify the plan with the lowest estimated cost.
    2. Assume that both relations are sorted on the join column. List all the plans that are considered and show the plan with the lowest estimated cost.
    3. Suppose that there is a clustered B+ tree index on did on Emp and Dept is sorted on did. List all the plans that are considered and identify the plan with the lowest estimated cost.
  4. Consider the following query:
    SELECT D.did, COUNT(*)FROM Dept D, Proj PWHERE D.projid=P.projidGROUP BY D.did
    1. Suppose that no indexes are available. Show the plan with the lowest estimated cost.
    2. If there is a hash index on P.projid what is the plan with lowest estimated cost?
    3. If there is a hash index on D.projid what is the plan with lowest estimated cost?
    4. If there is a hash index on D.projid and P.projid what is the plan with lowest estimated cost?
    5. Suppose that there is a clustered B+ tree index on D.did and a hash index on P.projid. Show the plan with the lowest estimated cost.
    6. Suppose that there is a clustered B+ tree index on D.did, a hash index on D.projid, and a hash index on P.projid. Show the plan with the lowest estimated cost.
    7. Suppose that there is a clustered B+ tree index on (D.did, D.projid) and a hash index on P.projid. Show the plan with the lowest estimated cost.
    8. Suppose that there is a clustered B+ tree index on (D.projid, D.did) and a hash index on P.projid. Show the plan with the lowest estimated cost.
  5. Consider the following query:
    SELECT D.did, COUNT(*)FROM Dept D, Proj PWHERE D.projid=P.projid AND D.budget>99000GROUP BY D.did

    Assume that department budgets are uniformly distributed in the range 0 to 100,000.
    1. Show the plan with lowest estimated cost if no indexes are available.
    2. If there is a hash index on P.projid show the plan with lowest estimated cost.
    3. If there is a hash index on D.budget show the plan with lowest estimated cost.
    4. If there is a hash index on D.projid and D.budget show the plan with lowest estimated cost.
    5. Suppose that there is a clustered B+ tree index on 〈D.did,D.budget〉 and a hash index on P.projid. Show the plan with the lowest estimated cost.
    6. Suppose there is a clustered B+ tree index on D.did, a hash index on D. budget, and a hash index on P.projid. Show the plan with the lowest estimated cost.
    7. Suppose there is a clustered B+ tree index on 〈D.did, D.budget, D.projid〉 and a hash index on P.projid. Show the plan with the lowest estimated cost.
    8. Suppose there is a clustered B+ tree index on 〈D.did, D.projid, D.budget〉 and a hash index on P.projid. Show the plan with the lowest estimated cost.
  6. Consider the following query:
    SELECT E.eid, D.did, P.projidFROM Emp E, Dept D, Proj PWHERE E.sal=50,000 AND D.budget>20,000      E.did=D.did AND D.projid=P.projid

    Assume that employee salaries are uniformly distributed in the range 10,009 to 110,008 and that project budgets are uniformly distributed in the range 10,000 to 30,000. There is a clustered index on sal for Emp, a clustered index on did for Dept, and a clustered index on projid for Proj.
    1. List all the one-relation, two-relation, and three-relation subplans considered in optimizing this query.
    2. Show the plan with the lowest estimated cost for this query.
    3. If the index on Proj were unclustered, would the cost of the preceding plan change substantially? What if the index on Emp or on Dept were unclustered?

Step-by-Step Solution

Request Professional Solution

Request Solution!

We need at least 10 more requests to produce the solution.

0 / 10 have requested this problem solution

The more requests, the faster the answer.

Request! (Login Required)


All students who have requested the solution will be notified once they are available.
Add your Solution
Textbook Solutions and Answers Search
Solutions For Problems in Chapter 15