Explain the Knowledge Discovery in Databases (KDD) process with steps, using the BIRCH algorithm and the SEER dataset

Question

Explain the Knowledge Discovery in Databases (KDD) process with steps, using the BIRCH algorithm and the SEER dataset.

Answer 1

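Data mining, a term often used interchangeably with Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases. Strictly speaking, data mining is one step within the broader KDD process.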

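Steps Involved in the KDD Process:

Data cleaning: remove noise and inconsistent data.
Data integration: combine multiple data sources.
Data selection: retrieve the data relevant to the analysis.
Data transformation: consolidate the data into a form suitable for mining.
Data mining: apply an algorithm (here, BIRCH clustering) to extract patterns.
Pattern evaluation: identify the patterns that are actually interesting.
Knowledge presentation: present the mined knowledge to the user.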

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

It is a scalable clustering method designed for very large data sets.
A single scan of the data is enough to produce a good clustering; additional scans are optional and can improve quality.
It is based on the notion of a CF (Clustering Feature) and a CF tree.
A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering.
A cluster of data points is represented by a triple (N, LS, SS), where

N = number of items in the subcluster

LS = linear sum of the points

SS = sum of the squares of the points
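
The CF triple can be illustrated with a small sketch. This is not taken from the original answer: the function names are hypothetical, and SS is stored here as a single scalar (the sum of squared norms), which is one common formulation. The sketch shows that CFs are additive, so two subclusters can be merged without revisiting the raw points.

```python
from typing import List, Sequence, Tuple

# A clustering feature is the triple (N, LS, SS).
CF = Tuple[int, List[float], float]


def cf_from_points(points: Sequence[Sequence[float]]) -> CF:
    """Summarize a non-empty set of d-dimensional points as (N, LS, SS)."""
    dim = len(points[0])
    n = len(points)
    ls = [sum(p[i] for p in points) for i in range(dim)]  # per-dimension sum
    ss = sum(x * x for p in points for x in p)            # sum of squared norms
    return n, ls, ss


def cf_merge(a: CF, b: CF) -> CF:
    """CFs are additive: merging two subclusters adds their triples."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2]


def cf_centroid(cf: CF) -> List[float]:
    """The centroid is recoverable from the summary alone: LS / N."""
    n, ls, _ = cf
    return [x / n for x in ls]


left = cf_from_points([(1.0, 2.0), (3.0, 4.0)])
right = cf_from_points([(5.0, 6.0)])
merged = cf_merge(left, right)
```

Merging `left` and `right` gives the same triple as summarizing all three points at once, which is what lets BIRCH keep only the summaries in memory.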

A CF tree has the following structure:

Each non-leaf node has at most B entries.
Each leaf node has at most L CF entries, each of which must satisfy the threshold T, the maximum diameter (or radius) of a subcluster.
P (page size in bytes) is the maximum size of a node; B and L are determined by P.
Compact: each leaf entry is a subcluster, not a single data point.
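
The threshold test can be computed from the CF triple alone. The sketch below is an illustration under assumptions, not part of the original answer: the function names are hypothetical, SS is a scalar sum of squared norms, and the threshold T is applied to the radius (it could equally be applied to the diameter).

```python
import math
from typing import List, Sequence, Tuple

CF = Tuple[int, List[float], float]  # (N, LS, SS)


def cf_radius(cf: CF) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n, ls, ss = cf
    centroid_sq = sum((x / n) ** 2 for x in ls)
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(ss / n - centroid_sq, 0.0))


def cf_diameter(cf: CF) -> float:
    """Root-mean-square pairwise distance between the points."""
    n, ls, ss = cf
    if n < 2:
        return 0.0
    ls_sq = sum(x * x for x in ls)
    return math.sqrt(max((2 * n * ss - 2 * ls_sq) / (n * (n - 1)), 0.0))


def fits_threshold(cf: CF, point: Sequence[float], threshold: float) -> bool:
    """Would the subcluster still satisfy T after absorbing the point?"""
    n, ls, ss = cf
    candidate = (
        n + 1,
        [a + b for a, b in zip(ls, point)],
        ss + sum(x * x for x in point),
    )
    return cf_radius(candidate) <= threshold


# Subcluster holding the points (0, 0) and (2, 0).
cf = (2, [2.0, 0.0], 4.0)
near_ok = fits_threshold(cf, (1.0, 0.0), 1.0)   # point between the two
far_ok = fits_threshold(cf, (10.0, 0.0), 1.0)   # distant point
```

When a point does not fit any existing entry, it starts a new entry; if the leaf then holds more than L entries, the leaf is split.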

KDD Process on the SEER Database using the BIRCH Algorithm

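The SEER (Surveillance, Epidemiology, and End Results) dataset is maintained by the U.S. National Cancer Institute. It contains cancer incidence and survival data collected from population-based cancer registries in the United States.

We can cluster this data with the BIRCH algorithm because BIRCH is designed for very large databases.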

Data Cleaning and Integration:

Because the data comes from several registries and contains many different fields, we first clean it (handle missing, unknown, and duplicate values) and integrate the sources into one consistent data set.
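
A minimal sketch of this step follows. Everything in it is an assumption for illustration: the field names, the alias table, and the missing-value markers are hypothetical and do not reflect the real SEER schema.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical mapping from source-specific column names to one shared schema.
FIELD_ALIASES = {
    "id": "patient_id", "ID": "patient_id",
    "age_dx": "age", "AGE": "age",
    "tumor_size_mm": "tumor_size", "SIZE": "tumor_size",
}

# Hypothetical markers (lowercased) that stand for a missing value.
MISSING_MARKERS = {"", "na", "unknown"}


def integrate(record: Record) -> Record:
    """Rename source-specific fields so all sources share one schema."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def clean(records: List[Record]) -> List[Record]:
    """Drop records with missing values and duplicate patient IDs."""
    seen = set()
    cleaned = []
    for raw in records:
        rec = integrate(raw)
        if any(v is None or v.strip().lower() in MISSING_MARKERS
               for v in rec.values()):
            continue  # incomplete record
        if rec.get("patient_id") in seen:
            continue  # duplicate of a record already kept
        seen.add(rec.get("patient_id"))
        cleaned.append(rec)
    return cleaned


source_a = [
    {"id": "1", "age_dx": "54", "tumor_size_mm": "20"},
    {"id": "2", "age_dx": "NA", "tumor_size_mm": "15"},
]
source_b = [
    {"ID": "1", "AGE": "54", "SIZE": "20"},
    {"ID": "3", "AGE": "61", "SIZE": "32"},
]
cleaned = clean(source_a + source_b)
```

Dropping incomplete records is only one possible policy; imputing missing values is another option, depending on how much data would otherwise be lost.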

Data Selection and Transformation:

Once cleaning is complete, select the fields relevant to the analysis and remove the useless fields from the data set.

After selection, transform the data. Because the records come from different registries, field encodings and value ranges may not be the same across sources, so the selected fields should be converted to a common, comparable form (for example, numeric values on a common scale).
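
Selection and transformation could look roughly like the sketch below. The field names are hypothetical, and min-max scaling is just one reasonable choice of transformation; the original answer does not name a specific one.

```python
from typing import Dict, List, Sequence


def select_fields(records: Sequence[Dict[str, str]],
                  fields: Sequence[str]) -> List[List[float]]:
    """Keep only the chosen fields and convert them to numbers."""
    return [[float(rec[f]) for f in fields] for rec in records]


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1] so no field dominates distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


records = [
    {"patient_id": "1", "age": "40", "tumor_size": "10", "registry": "A"},
    {"patient_id": "2", "age": "60", "tumor_size": "30", "registry": "B"},
    {"patient_id": "3", "age": "80", "tumor_size": "20", "registry": "A"},
]
features = select_fields(records, ["age", "tumor_size"])
scaled = min_max_scale(features)
```

Scaling matters for BIRCH in particular because the threshold T is a distance, so fields with large numeric ranges would otherwise dominate the clustering.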

After the transformation, we apply the BIRCH algorithm to the prepared data to create the clusters.
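
The clustering step can be sketched as a single pass over the transformed records. This is a deliberately simplified, flat version of BIRCH's leaf-level behavior, not the full algorithm: it keeps a plain list of subclusters instead of a CF tree, omits node splitting and the later global clustering phase, and applies the threshold to the radius. The class and function names are hypothetical.

```python
import math
from typing import List, Sequence


class Subcluster:
    """One CF entry: N, LS, and SS for the points absorbed so far."""

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius_if_added(self, point: Sequence[float]) -> float:
        """Radius the subcluster would have after absorbing the point."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        return math.sqrt(max(ss / n - sum((x / n) ** 2 for x in ls), 0.0))

    def add(self, point: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_pass_cluster(points: Sequence[Sequence[float]],
                     threshold: float) -> List[Subcluster]:
    """Absorb each point into its nearest subcluster if T allows it."""
    subclusters: List[Subcluster] = []
    for p in points:
        if subclusters:
            nearest = min(subclusters, key=lambda s: distance(s.centroid(), p))
            if nearest.radius_if_added(p) <= threshold:
                nearest.add(p)
                continue
        subclusters.append(Subcluster(p))  # start a new subcluster
    return subclusters


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
result = one_pass_cluster(points, threshold=0.2)
```

With these four points and a threshold of 0.2, the pass produces two subclusters of two points each. In the full algorithm the resulting leaf entries would then be grouped by a global clustering step, and the clusters evaluated and presented as the final KDD stages.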
