Question

Explain the Knowledge Discovery in Databases (KDD) process, with steps, using the BIRCH algorithm and the SEER dataset.

Answer #1

Data mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases.

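Steps Involved in KDD Process: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.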

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

  • It is a scalable clustering method.
  • Designed for very large data sets
  • Only one scan of the data is needed to build the CF tree; further passes are optional and only refine the clusters.
  • It is based on the notion of a CF (Clustering Feature) and a CF tree.
  • A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering.
  • A cluster of data points is represented by a triple of numbers (N, LS, SS), where

    N = number of items in the subcluster

    LS = linear sum of the points

    SS = sum of the squares of the points
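A minimal sketch of the (N, LS, SS) triple in Python may make the idea concrete. The class and method names here are illustrative, and SS is assumed to be a single scalar (the sum of squared norms). It shows why the triple is enough: two subclusters can be merged by adding their components, and the centroid and radius can be recovered without keeping the original points.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CF:
    """Clustering feature (N, LS, SS) of one subcluster."""
    n: int      # N: number of points
    ls: list    # LS: linear sum, one entry per dimension
    ss: float   # SS: sum of squared norms of the points (assumed scalar)

    @classmethod
    def of_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # CF additivity: the CF of a union is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of the points from the centroid.
        c2 = sum(c * c for c in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


cf = CF.of_point((1.0, 2.0)).merge(CF.of_point((3.0, 4.0)))
print(cf.n, cf.ls, cf.ss)          # 2 [4.0, 6.0] 30.0
print(cf.centroid(), cf.radius())
```

For the two points (1, 2) and (3, 4) the centroid is (2, 3) and the radius is the square root of 2, both computed from the triple alone.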

A CF Tree structure is given as below:

  • Each non-leaf node has at most B entries.
  • Each leaf node has at most L CF entries, each of which must satisfy the threshold T, a maximum diameter or radius.
  • P (page size in bytes) is the maximum size of a node.
  • Compact: each leaf entry is a subcluster, not a single data point.
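The role of the threshold T can be sketched as follows. This is a simplified illustration, not the full algorithm: it keeps a single flat list of leaf entries and ignores B, L, P, and node splitting. The function names are hypothetical, and T is assumed to bound the radius of a subcluster. A new point is absorbed into the closest entry only if the merged entry still satisfies T; otherwise it starts a new entry.

```python
from math import sqrt


def cf_of(p):
    """CF triple (N, LS, SS) of a single point."""
    return (1, list(p), sum(x * x for x in p))


def cf_add(a, b):
    """Merge two CF triples by component-wise addition."""
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2])


def cf_radius(cf):
    n, ls, ss = cf
    return sqrt(max(ss / n - sum((x / n) ** 2 for x in ls), 0.0))


def centroid_dist(a, b):
    return sqrt(sum((x / a[0] - y / b[0]) ** 2 for x, y in zip(a[1], b[1])))


def insert(leaf, point, T):
    """Absorb the point into the closest entry if the merged radius
    stays within T; otherwise open a new entry."""
    new = cf_of(point)
    if leaf:
        i = min(range(len(leaf)), key=lambda k: centroid_dist(leaf[k], new))
        merged = cf_add(leaf[i], new)
        if cf_radius(merged) <= T:
            leaf[i] = merged
            return
    leaf.append(new)


leaf = []
for p in [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9), (1.1, 1.0)]:
    insert(leaf, p, T=0.5)
print([cf[0] for cf in leaf])  # [3, 2]: two subclusters, five points
```

In a real CF tree, a leaf that grows beyond L entries would be split, and the split could propagate up through the non-leaf nodes, which hold at most B entries each.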

KDD Process of SEER Database using BIRCH algorithm

The SEER (Surveillance, Epidemiology, and End Results) dataset records cancer incidence and mortality data collected from cancer registries across the United States.

We can create the clusters using the BIRCH algorithm, because it scales to very large databases.

Data Cleaning:

Because the data set combines records from several sources whose fields differ, we first apply the cleaning and integration process to it.
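A small sketch of cleaning and integration is given below. The records and field names are made up for illustration and are not the real SEER schema. It assumes two sources that name the same fields differently, maps them onto one schema, and then drops incomplete and duplicate rows.

```python
# Hypothetical raw records from two sources; field names are illustrative
# only and do not reflect the actual SEER file layout.
source_a = [
    {"age": "63", "tumor_size_mm": "21", "site": "Breast"},
    {"age": "", "tumor_size_mm": "15", "site": "Lung"},       # missing age
    {"age": "63", "tumor_size_mm": "21", "site": "Breast"},   # duplicate
]
source_b = [
    {"AGE_DX": "71", "SIZE_MM": "34", "SITE": "lung "},
]

# Assumed mapping from source B's field names to the common schema.
RENAME_B = {"AGE_DX": "age", "SIZE_MM": "tumor_size_mm", "SITE": "site"}


def integrate(a, b):
    """Bring both sources onto one schema and concatenate them."""
    return a + [{RENAME_B[k]: v for k, v in rec.items()} for rec in b]


def clean(records):
    """Drop incomplete rows and duplicates, normalise text, parse numbers."""
    seen, out = set(), []
    for rec in records:
        if any(not str(v).strip() for v in rec.values()):
            continue  # skip rows with an empty field
        row = (int(rec["age"]), float(rec["tumor_size_mm"]),
               rec["site"].strip().lower())
        if row not in seen:
            seen.add(row)
            out.append({"age": row[0], "tumor_size_mm": row[1], "site": row[2]})
    return out


cleaned = clean(integrate(source_a, source_b))
print(cleaned)
```

Dropping incomplete rows is only one possible policy; imputing missing values is another, and the right choice depends on the analysis.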

Data Selection and Transformation:

Once the cleaning process is over, apply data selection: keep the relevant fields and remove the ones that are not useful for the analysis.

Once selection is over, the transformation process begins. Because the data comes from several sources, the fields and their encodings may not be the same for every source, so the data must be transformed into a consistent form.
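Selection and transformation might look like the sketch below. The records, the field names, and the choice of features are assumptions made for illustration, not part of the SEER schema. Min-max scaling is used as one common transformation: BIRCH relies on Euclidean distances, so a field with a large numeric range would otherwise dominate the clustering.

```python
# Hypothetical cleaned records; field names are illustrative only.
records = [
    {"patient_id": "a1", "age": 63, "tumor_size_mm": 21.0, "site": "breast"},
    {"patient_id": "b7", "age": 71, "tumor_size_mm": 34.0, "site": "lung"},
    {"patient_id": "c3", "age": 55, "tumor_size_mm": 8.0, "site": "breast"},
]

# Assumed to be the fields worth clustering on.
FEATURES = ["age", "tumor_size_mm"]


def select(rows, fields):
    """Keep only the chosen numeric fields; identifiers and the rest are dropped."""
    return [[float(r[f]) for f in fields] for r in rows]


def min_max_scale(matrix):
    """Rescale every column to [0, 1] so that no field dominates the distance."""
    cols = list(zip(*matrix))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in matrix]


X = min_max_scale(select(records, FEATURES))
print(X)  # [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
```

The rows of X are the numeric vectors that a clustering algorithm would receive as input.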

After the transformation, we apply the BIRCH algorithm to the data to create the clusters.
