Hi, I have problem related to data mining, as follow: Best Regards Assume a small database...

Question

Hi, I have a problem related to data mining, as follows:

Assume a small database contains eight transactions as shown in Table 1. Let min_support=30% and min_conf=60%. (a) Find all f

Best Regards

engineering Computer-Science

Answer 1

For frequent itemset mining we can use either the Apriori algorithm or the FP-growth algorithm. Here I use the Apriori algorithm to solve this problem.

The table below lists the transactions and their itemsets:

TID	date	items
T100	10/15/2019	A,B,C,D,G
T200	10/16/2019	D,A,C,E,B
T300	10/18/2019	C,A,B,E,D
T400	10/19/2019	B,A,D
T500	10/20/2019	G,A,C,D
T600	10/21/2019	A,C,G
T700	10/22/2019	A,G
T800	10/24/2019	D,E

Before going through the steps, we have to find the minimum support count and note the minimum confidence.

min_support=30% and min_conf=60% (given in the question)

Convert the support percentage to a count: divide it by 100 and multiply by the number of transactions, i.e.

$\frac{30}{100} \times 8 = 2.4$

So the minimum support count is 2.4. Since support counts are whole numbers, an itemset has to appear in at least 3 transactions to be frequent.
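As a quick check of the conversion above, here is a minimal sketch in Python. The function name `min_support_count` is only an illustrative choice, not something from the question.

```python
import math

def min_support_count(min_support_percent: int, n_transactions: int) -> int:
    """Smallest whole number of transactions that reaches the percentage threshold."""
    # 30% of 8 transactions is 2.4; counts are integers, so round up.
    return math.ceil(min_support_percent * n_transactions / 100)

threshold = min_support_count(30, 8)
print(threshold)  # 3
```

Comparing a count against 2.4 or against 3 gives the same result, so the tables in the steps keep using 2.4.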

Step-1: K=1

(I) Create a table listing each item in the database together with its support count (the number of transactions that contain the item). This is called the candidate set C1.

item	support
{A}	7
{B}	4
{C}	5
{D}	6
{E}	3
{G}	4

(II) Compare each item's support count with the minimum support count (here 2.4) and remove the items that do not meet it. The resulting table is called L1. In our example every item meets the minimum support count, therefore L1 is

item	support
{A}	7
{B}	4
{C}	5
{D}	6
{E}	3
{G}	4
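Step 1 can be reproduced with a few lines of Python. This is only a sketch: the transactions are typed in by hand from the table above, and the names `TRANSACTIONS`, `MIN_COUNT`, `c1` and `l1` are illustrative.

```python
from collections import Counter

# Each string holds the items of one transaction, in table order (dates omitted).
TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]
MIN_COUNT = 3  # 30% of 8 transactions = 2.4, rounded up to a whole transaction

# C1: number of transactions that contain each single item.
c1 = Counter(item for t in TRANSACTIONS for item in t)

# L1: keep only the items whose count reaches the threshold.
l1 = {item: n for item, n in sorted(c1.items()) if n >= MIN_COUNT}
print(l1)  # {'A': 7, 'B': 4, 'C': 5, 'D': 6, 'E': 3, 'G': 4}
```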

Step-2: K=2

(I) Create the candidate set C2 from L1. It contains every possible 2-item combination of the items in L1.
Then check whether all subsets of each candidate are frequent. If any subset is not frequent, remove that candidate.
Then find the support count of the remaining itemsets.

item	support
{A,B}	4
{A,C}	5
{A,D}	5
{A,E}	2
{A,G}	4
{B,C}	3
{B,D}	4
{B,E}	2
{B,G}	1
{C,D}	4
{C,E}	2
{C,G}	3
{D,E}	3
{D,G}	2
{E,G}	0

(II) Compare each candidate in C2 with the minimum support count and remove the itemsets that do not meet it. The resulting table is called L2. So our L2 is

item	support
{A,B}	4
{A,C}	5
{A,D}	5
{A,G}	4
{B,C}	3
{B,D}	4
{C,D}	4
{C,G}	3
{D,E}	3
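The pair counts in C2 and the filtered table L2 can be checked with the sketch below. It assumes the same hand-typed transactions as in the table at the top; `support`, `c2` and `l2` are illustrative names.

```python
from itertools import combinations

TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]
MIN_COUNT = 3          # 2.4 rounded up
L1_ITEMS = "ABCDEG"    # every single item was frequent in step 1

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)

# C2: all 2-item combinations of the frequent items, with their counts.
c2 = {pair: support(pair) for pair in combinations(L1_ITEMS, 2)}

# L2: the pairs that reach the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_COUNT}

for pair, n in l2.items():
    print("{" + ",".join(pair) + "}", n)
```

With six frequent items there are 15 candidate pairs, and 9 of them survive the threshold.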

Step-3: K=3

(I) Create the candidate set C3 from L2. It contains the 3-item combinations that can be built from the itemsets in L2.
Then check whether all subsets of each candidate are frequent. If any subset is not frequent, remove that candidate.
Then find the support count of the remaining itemsets.

item	support
{A,B,C}	3
{A,B,D}	4
{A,C,D}	4
{A,C,G}	3
{A,D,G}	2
{B,C,D}	3
{C,D,G}	2

(II) Compare each candidate in C3 with the minimum support count and remove the itemsets that do not meet it. {A,D,G} and {C,D,G} are dropped here (strictly, the subset check already rules them out, because {D,G} is not in L2). The resulting table is called L3. So our L3 is

item	support
{A,B,C}	3
{A,B,D}	4
{A,C,D}	4
{A,C,G}	3
{B,C,D}	3
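The subset check used in this step can be sketched as follows. Note that this is a brute-force variant, not the classic Apriori join: it enumerates every k-item combination and keeps those whose (k-1)-item subsets are all frequent. The helper name `apriori_gen` and the constants are assumptions made for illustration.

```python
from itertools import combinations

TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]
MIN_COUNT = 3  # 2.4 rounded up
L2 = {frozenset(p) for p in ["AB", "AC", "AD", "AG", "BC", "BD", "CD", "CG", "DE"]}

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def apriori_gen(prev_frequent, k):
    """k-item candidates whose (k-1)-item subsets are all in prev_frequent."""
    items = sorted(set().union(*prev_frequent))
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))]

c3 = apriori_gen(L2, 3)
l3 = {c: support(c) for c in c3 if support(c) >= MIN_COUNT}
for c, n in l3.items():
    print(sorted(c), n)
```

With strict pruning, {A,D,G} and {C,D,G} never reach the counting stage, so this sketch produces five candidates and all five end up in L3.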

Step-4: K=4

(I) Create the candidate set C4 from L3. It contains the 4-item combinations that can be built from the itemsets in L3.
Then check whether all subsets of each candidate are frequent. If any subset is not frequent, remove that candidate.
Then find the support count of the remaining itemsets.

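Many 4-item combinations are possible, for example

{A,B,C,D}, {A,B,C,E}, {A,B,C,G}, {A,B,D,E}, {A,B,D,G}, {A,C,D,E}, {A,C,D,G}, {B,C,D,E}, {B,C,D,G}, {C,D,E,G}

but only {A,B,C,D} has all four of its 3-item subsets ({A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}) in L3. Every other combination contains at least one infrequent subset (for instance, {A,B,C,G} contains {A,B,G}) and is removed. {A,B,C,D} appears in the first three transactions, so its support count is 3, which meets the minimum support count. Therefore L4 contains the single itemset {A,B,C,D}, and we stop here because one 4-itemset cannot produce a 5-item candidate.

So the frequent itemsets are all the itemsets in L1, L2, L3 and L4. Those with three or more items are

{A,B,C}

{A,B,D}

{A,C,D}

{A,C,G}

{B,C,D}

{A,B,C,D}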
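To double-check where the level-wise search stops, the whole procedure can be run as one loop. The sketch below is a compact brute-force version of Apriori written for this small table only. The function name `apriori` and its structure are illustrative assumptions, not a reference implementation.

```python
from itertools import combinations

TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def apriori(min_count):
    """Return {frequent itemset: support count} for all sizes."""
    items = sorted(set().union(*TRANSACTIONS))
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_count}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        pool = sorted(set().union(*level))
        # Keep a k-item candidate only if all its (k-1)-item subsets are frequent.
        candidates = [frozenset(c) for c in combinations(pool, k)
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
        level = {c for c in candidates if support(c) >= min_count}
    return frequent

frequent = apriori(min_count=3)
for s in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), frequent[s])
```

On this data the loop ends after k=4: it finds 6 single items, 9 pairs, 5 triples and one 4-itemset.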

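(ii) Association rules (generated here from the frequent 3-itemsets)

The confidence of a rule X=>Y is the support count of the whole itemset (X and Y together) divided by the support count of X, i.e. confidence = support(X and Y) / support(X).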
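As a small worked example of this definition, the sketch below computes the confidence of single rules as exact fractions. The helper names `support` and `confidence` are illustrative, and the transactions are typed in from the table at the top.

```python
from fractions import Fraction

TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    whole = set(antecedent) | set(consequent)
    return Fraction(support(whole), support(antecedent))

print(confidence("AB", "C"))         # 3/4
print(float(confidence("A", "BC")))  # 3/7, about 0.43
```

Using `Fraction` keeps values such as 3/5 exact, which avoids rounding surprises when a confidence sits right on the 60% threshold.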

Association rule	support count	confidence	confidence %
A^B=>C	3	3/4	75
A^C=>B	3	3/5	60
B^C=>A	3	3/3	100
A^B=>D	4	4/4	100
A^D=>B	4	4/5	80
B^D=>A	4	4/4	100
A^C=>D	4	4/5	80
A^D=>C	4	4/5	80
C^D=>A	4	4/4	100
A^C=>G	3	3/5	60
A^G=>C	3	3/4	75
C^G=>A	3	3/3	100
B^C=>D	3	3/3	100
B^D=>C	3	3/4	75
C^D=>B	3	3/4	75
A=>B^C	3	3/7	42.9
B=>A^C	3	3/4	75
C=>A^B	3	3/5	60
A=>B^D	4	4/7	57.1
B=>A^D	4	4/4	100
D=>A^B	4	4/6	66.7
A=>C^D	4	4/7	57.1
C=>A^D	4	4/5	80
D=>A^C	4	4/6	66.7
A=>C^G	3	3/7	42.9
C=>A^G	3	3/5	60
G=>A^C	3	3/4	75
B=>C^D	3	3/4	75
C=>B^D	3	3/5	60
D=>B^C	3	3/6	50

In the given question the minimum confidence is 60%.

So we keep every rule in the table above whose confidence is 60% or higher. Five rules fall below the threshold and are rejected: A=>B^C, A=>B^D, A=>C^D, A=>C^G and D=>B^C.
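The rule table and the 60% filter can be reproduced with the sketch below. It enumerates every split of each frequent 3-itemset into a left and right side and compares confidences using integer arithmetic, so that 3/5 counts as exactly 60%. The names `rules_from`, `L3` and `MIN_CONF_PERCENT` are illustrative assumptions.

```python
from itertools import combinations

TRANSACTIONS = [set(t) for t in
                ["ABCDG", "ABCDE", "ABCDE", "ABD", "ACDG", "ACG", "AG", "DE"]]
MIN_CONF_PERCENT = 60
L3 = ["ABC", "ABD", "ACD", "ACG", "BCD"]  # frequent 3-itemsets from step 3

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)

def rules_from(itemset):
    """Yield (lhs, rhs, support of itemset, support of lhs) for every split."""
    items = frozenset(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(sorted(items), size):
            rhs = items - set(lhs)
            yield "".join(lhs), "".join(sorted(rhs)), support(items), support(lhs)

all_rules = [r for s in L3 for r in rules_from(s)]
# Integer comparison: sup / sup_lhs >= 60 / 100 without floating point.
strong = [r for r in all_rules if 100 * r[2] >= MIN_CONF_PERCENT * r[3]]

for lhs, rhs, sup, sup_lhs in strong:
    print(f"{'^'.join(lhs)}=>{'^'.join(rhs)}  {sup}/{sup_lhs}  {100 * sup / sup_lhs:.1f}%")
```

The same `rules_from` helper can be applied to any other frequent itemset, such as the frequent pairs, if rules from those are wanted as well.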
