Question

CS 614 Data Warehousing

Q 1. Explain the steps to be performed in decision tree classification. Also explain the conditions under which it needs to stop. (Marks: 5)

Q 2. Briefly explain the properties of the k-means algorithm. (Marks: 5)

Q 3. How could a Self-Organizing Map (SOM) be used to analyze data in a data warehouse? Explain briefly. (Marks: 5)

Answer #1

Q1.  

A decision tree classifier is built in two phases:

A growth phase

A prune phase

The tree-growing phase is an iterative process that involves splitting the data into progressively smaller subsets. Each iteration considers the data in only one node. The first iteration considers the root node, which contains all the data. Subsequent iterations work on derivative nodes that contain subsets of the data.

The process of pruning the initial tree (built in the growth phase) consists of removing small, deep nodes of the tree that result from noise in the training data. This reduces the risk of overfitting and results in a more accurate classification of unknown data.

Tree-building algorithms usually have several stopping rules. These rules are usually based on factors such as the maximum tree depth, the minimum number of elements in a node considered for splitting, or its near equivalent, the minimum number of elements that must be in a new node. The effect of such rules is to stop splitting before the nodes are pure; without them, growth continues until every node is pure or cannot be split further.
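The growth phase and its stopping rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a particular textbook algorithm: it assumes numeric features, uses Gini impurity as the split criterion, and the names grow, best_split, max_depth and min_split are hypothetical. Pruning is left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3, min_split=2):
    """Growth phase: each call considers the data in one node only."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules (assumed set): pure node, maximum depth reached,
    # or too few elements in the node to consider a split.
    if len(set(labels)) == 1 or depth >= max_depth or len(rows) < min_split:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # no candidate split lowers the impurity
        return {"leaf": majority}
    f, t = split
    node = {"feature": f, "threshold": t}
    for side, keep in (("left", lambda v: v <= t), ("right", lambda v: v > t)):
        sub = [(r, y) for r, y in zip(rows, labels) if keep(r[f])]
        node[side] = grow([r for r, _ in sub], [y for _, y in sub],
                          depth + 1, max_depth, min_split)
    return node


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]
```

Calling grow(rows, labels) with a list of numeric feature rows and a parallel list of class labels returns a nested dictionary; lowering max_depth or raising min_split makes the tree stop earlier, which is the trade-off the stopping rules control.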

Q2.  

The k-means algorithm has the following important properties:

(a) In its basic form, it works only on numeric values.

(b) It uses the Euclidean metric, and hence the centroid of a cluster C is the mean of the points in C.

(c) It is efficient in processing large data sets. The computational complexity of the algorithm is O(nkt), where n is the total number of objects, k is the number of clusters and t is the number of iterations. In clustering large data sets, the k-means algorithm is much faster than the hierarchical clustering algorithms, whose general computational complexity is O(n^2).

(d) It terminates at a local optimum, so the result depends on the initial choice of centroids.
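The properties above can be seen in a minimal sketch of the basic algorithm. It assumes the points are tuples of numbers and picks the initial centroids at random; the function name kmeans and the parameters max_iter and seed are hypothetical choices for this illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on numeric tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):  # t iterations at most
        # Assignment step: n * k Euclidean distance computations.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: a local optimum
            break
        centroids = new_centroids
    return centroids, clusters
```

The two nested loops in the assignment step, repeated once per iteration, are where the O(nkt) cost comes from. Running the function with different seed values on the same data may give different clusterings, which is the practical consequence of property (d).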

Q3.

The SOM technique creates a two-dimensional map from n-dimensional input data. This map resembles a landscape in which it is possible to identify borders that define different clusters. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. Since Self-Organizing Maps (SOM) provide a compact representation of the data distribution, efficient process monitoring can be performed in the two-dimensional projection of the process variables.

SOM has the ability to do drill-down processing. In drill-down processing, data is arranged in layers so that access to and analysis of one layer can lead to another layer.

SOM quickly relates documents. Once the analyst has examined the SOM, direct access to any document of interest is allowed.
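A rough sketch of the idea follows: train a small two-dimensional map on n-dimensional records, then group the records by the map cell they fall on so that an analyst can move from a cell to the underlying records. It assumes the input values are scaled to the range 0 to 1; the grid size, learning rate, neighbourhood width and all function names are hypothetical choices, and the linear decay schedule is just one simple option.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to record x."""
    return min(weights, key=lambda cell: math.dist(weights[cell], x))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a rows x cols map to the records in data (values assumed in 0..1)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.3)   # neighbourhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                # Cells near the winner on the 2-D grid move more.
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def map_records(weights, data):
    """Group record indices by map cell, as a starting point for drill-down."""
    cells = {}
    for i, x in enumerate(data):
        cells.setdefault(best_matching_unit(weights, x), []).append(i)
    return cells
```

After weights = train_som(data), the dictionary returned by map_records(weights, data) lists which records sit on each cell of the map. Records that land on the same or neighbouring cells are similar in the original n-dimensional space, and the stored indices give direct access back to the source rows.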


answered by: computer_science