Hi, I have a problem related to data mining, as follows:
For frequent itemset mining we can use either the Apriori algorithm or the FP-growth algorithm. Here I am using the Apriori algorithm to solve this problem.
The table below shows the transactions and their itemsets:
TID | date | items |
T100 | 10/15/2019 | A,B,C,D,G |
T200 | 10/16/2019 | D,A,C,E,B |
T300 | 10/18/2019 | C,A,B,E,D |
T400 | 10/19/2019 | B,A,D |
T500 | 10/20/2019 | G,A,C,D |
T600 | 10/21/2019 | A,C,G |
T700 | 10/22/2019 | A,G |
T800 | 10/24/2019 | D,E |
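Before going through the steps, we have to find the minimum support count and the minimum confidence.
min_support=30% and min_conf=60% (given in the question)
Convert the support percentage to a count. For this, divide the support by 100 and multiply it by the number of transactions, i.e. (30/100) x 8 = 2.4.
So min_support count=2.4. Since a support count is always a whole number, an itemset needs a count of at least 3 to be frequent.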
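The conversion above can be sketched in a few lines of Python. The variable names are my own, and taking the ceiling is an assumption about how the fractional threshold 2.4 is applied to whole-number counts.

```python
import math

n_transactions = 8      # T100 .. T800 in the table above
min_support_pct = 30    # given in the question

# Fractional threshold, as computed in the text: 30/100 * 8
threshold = min_support_pct * n_transactions / 100

# Support counts are integers, so the smallest count that can pass
# the threshold is its ceiling.
min_count = math.ceil(threshold)

print(threshold, min_count)
```

Comparing against 2.4 or against 3 gives the same result for every itemset in this problem.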
Step-1: K=1
(I) Create a table listing each item in the transactions together with its support count (the number of transactions in which the item occurs). This is called the candidate set C1.
item | support |
{A} | 7 |
{B} | 4 |
{C} | 5 |
{D} | 6 |
{E} | 3 |
{G} | 4 |
(II) Compare each item's support count with the minimum support count (here 2.4). Remove the items that do not satisfy the minimum support count. This gives us the table of frequent 1-itemsets, called L1. In our example all items satisfy the minimum support count. Therefore L1 is
item | support |
{A} | 7 |
{B} | 4 |
{C} | 5 |
{D} | 6 |
{E} | 3 |
{G} | 4 |
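As a cross-check of Step 1, here is a minimal sketch that counts C1 and filters it into L1. The transactions are copied from the table at the top; the names `transactions`, `c1` and `l1` are just labels I chose.

```python
from collections import Counter

# Transactions T100 .. T800 from the table (dates are not needed here).
transactions = [
    {"A", "B", "C", "D", "G"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E", "D"},
    {"B", "A", "D"},
    {"G", "A", "C", "D"},
    {"A", "C", "G"},
    {"A", "G"},
    {"D", "E"},
]
MIN_COUNT = 2.4  # 30% of 8 transactions

# C1: how many transactions contain each single item.
c1 = Counter(item for t in transactions for item in t)

# L1: keep only the items that reach the minimum support count.
l1 = {item: n for item, n in sorted(c1.items()) if n >= MIN_COUNT}

for item, n in l1.items():
    print(f"{{{item}}} | {n} |")
```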
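Step-2: K=2
(I) Join L1 with itself to generate the candidate set C2 of 2-itemsets, and count the support of each candidate.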
item | support |
{A,B} | 4 |
{A,C} | 5 |
{A,D} | 5 |
{A,E} | 2 |
{A,G} | 4 |
{B,C} | 3 |
{B,D} | 4 |
{B,E} | 2 |
{B,G} | 1 |
{C,D} | 4 |
{C,E} | 2 |
{C,G} | 3 |
{D,E} | 3 |
{D,G} | 2 |
{E,G} | 0 |
(II) Check the candidate set C2 against the minimum support count. Remove the itemsets that do not satisfy the minimum support count. This gives us the table of frequent 2-itemsets, called L2. So our L2 is
item | support |
{A,B} | 4 |
{A,C} | 5 |
{A,D} | 5 |
{A,G} | 4 |
{B,C} | 3 |
{B,D} | 4 |
{C,D} | 4 |
{C,G} | 3 |
{D,E} | 3 |
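Step 2 can be checked the same way. This sketch builds every pair of L1 items, counts it against the transactions, and keeps the pairs that reach the threshold. The helper name `support` is my own.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C", "D", "G"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E", "D"},
    {"B", "A", "D"},
    {"G", "A", "C", "D"},
    {"A", "C", "G"},
    {"A", "G"},
    {"D", "E"},
]
MIN_COUNT = 2.4
l1_items = ["A", "B", "C", "D", "E", "G"]  # every item survived Step 1


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# C2: all 2-item combinations of the frequent items, with their counts.
c2 = {pair: support(pair) for pair in combinations(l1_items, 2)}

# L2: the pairs that satisfy the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_COUNT}

for pair, n in l2.items():
    print("{" + ",".join(pair) + "}", "|", n, "|")
```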
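Step-3: K=3
(I) Join L2 with itself to generate the candidate set C3 of 3-itemsets, and count the support of each candidate. ({A,D,G} and {C,D,G} contain {D,G}, which is not in L2, so the Apriori property already rules them out; their counts are shown for completeness.)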
item | support |
{A,B,C} | 3 |
{A,B,D} | 4 |
{A,C,D} | 4 |
{A,C,G} | 3 |
{A,D,G} | 2 |
{B,C,D} | 3 |
{C,D,G} | 2 |
(II) Check the candidate set C3 against the minimum support count. Remove the itemsets that do not satisfy the minimum support count. This gives us the table of frequent 3-itemsets, called L3. So our L3 is
item | support |
{A,B,C} | 3 |
{A,B,D} | 4 |
{A,C,D} | 4 |
{A,C,G} | 3 |
{B,C,D} | 3 |
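The join and prune behind Step 3 can be sketched as follows. This is one common way to write Apriori candidate generation, not necessarily the one your lecture notes use, and `generate_candidates` is a name I made up. Because of the subset prune, {A,D,G} and {C,D,G} never reach the counting stage in this sketch.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C", "D", "G"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E", "D"},
    {"B", "A", "D"},
    {"G", "A", "C", "D"},
    {"A", "C", "G"},
    {"A", "G"},
    {"D", "E"},
]
MIN_COUNT = 2.4

# L2 from the previous step, each itemset as a sorted tuple.
l2 = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "G"), ("B", "C"),
      ("B", "D"), ("C", "D"), ("C", "G"), ("D", "E")]


def generate_candidates(prev_level):
    """Join itemsets that share all but their last item, then drop any
    candidate that has a subset missing from the previous level."""
    prev = set(prev_level)
    k = len(prev_level[0]) + 1
    candidates = []
    for a, b in combinations(sorted(prev), 2):
        if a[:-1] == b[:-1]:                      # join step
            cand = a + (b[-1],)
            if all(s in prev for s in combinations(cand, k - 1)):
                candidates.append(cand)           # survived the prune step
    return candidates


def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)


c3 = generate_candidates(l2)
l3 = {c: support(c) for c in c3 if support(c) >= MIN_COUNT}
print(l3)
```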
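Step-4: K=4
(I) Join L3 with itself to generate the candidate set C4. Many 4-itemsets are possible, for example
{A,B,C,D},{A,B,C,E},{A,B,C,G},{A,B,D,E},{A,B,D,G},{A,C,D,E},{A,C,D,G},{B,C,D,E},{B,C,D,G},{C,D,E,G}
but only {A,B,C,D} has all four of its 3-item subsets ({A,B,C},{A,B,D},{A,C,D},{B,C,D}) in L3. Each of the others has at least one subset that is not frequent, so it cannot be frequent either.
item | support |
{A,B,C,D} | 3 |
(II) {A,B,C,D} occurs in three transactions, which satisfies the min support count, so L4 contains {A,B,C,D}. A single itemset cannot be joined into a 5-itemset, so we stop here.
So the frequent itemsets with three or more items are
{A,B,C} |
{A,B,D} |
{A,C,D} |
{A,C,G} |
{B,C,D} |
{A,B,C,D} |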
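To double-check where the algorithm stops, the whole level-wise loop can be run end to end. The following is only a compact sketch under the same assumptions as the steps above (threshold 2.4, itemsets stored as sorted tuples); the function name `apriori` and the `levels` dictionary are my own choices.

```python
from collections import Counter
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "C", "D", "G"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E", "D"},
    {"B", "A", "D"},
    {"G", "A", "C", "D"},
    {"A", "C", "G"},
    {"A", "G"},
    {"D", "E"},
]
MIN_COUNT = 2.4


def support(itemset):
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def apriori():
    """Return {k: {itemset: support}} for every non-empty level Lk."""
    counts = Counter(item for t in TRANSACTIONS for item in t)
    level = {(i,): n for i, n in counts.items() if n >= MIN_COUNT}
    levels, k = {}, 1
    while level:
        levels[k] = level
        prev = sorted(level)
        k += 1
        # Join: merge itemsets that agree on everything but the last item.
        candidates = [a + (b[-1],) for a, b in combinations(prev, 2)
                      if a[:-1] == b[:-1]]
        level = {}
        for cand in candidates:
            # Prune: every (k-1)-subset must already be frequent.
            if all(s in levels[k - 1] for s in combinations(cand, k - 1)):
                n = support(cand)
                if n >= MIN_COUNT:
                    level[cand] = n
    return levels


levels = apriori()
for k, itemsets in levels.items():
    print(f"L{k}:", itemsets)
```

The loop ends on its own once a level comes back empty, so no maximum K has to be fixed in advance.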
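ii) Association rules
The confidence of a rule A=>B is the support count of the whole itemset divided by the support count of the left-hand side, i.e. confidence(A=>B) = support(A,B) / support(A). The table below lists the rules generated from the frequent 3-itemsets.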
Association rule | support count | confidence | confidence % |
A^B=>C | 3 | 3/4 | 75 |
A^C=>B | 3 | 3/5 | 60 |
B^C=>A | 3 | 3/3 | 100 |
A^B=>D | 4 | 4/4 | 100 |
A^D=>B | 4 | 4/5 | 80 |
B^D=>A | 4 | 4/4 | 100 |
A^C=>D | 4 | 4/5 | 80 |
A^D=>C | 4 | 4/5 | 80 |
D^C=>A | 4 | 4/4 | 100 |
A^C=>G | 3 | 3/5 | 60 |
A^G=>C | 3 | 3/4 | 75 |
C^G=>A | 3 | 3/3 | 100 |
B^C=>D | 3 | 3/3 | 100 |
B^D=>C | 3 | 3/4 | 75 |
D^C=>B | 3 | 3/4 | 75 |
A=>B^C | 3 | 3/7 | 42.9 |
B=>A^C | 3 | 3/4 | 75 |
C=>A^B | 3 | 3/5 | 60 |
A=>B^D | 4 | 4/7 | 57.1 |
B=>A^D | 4 | 4/4 | 100 |
D=>A^B | 4 | 4/6 | 66.7 |
A=>C^D | 4 | 4/7 | 57.1 |
C=>A^D | 4 | 4/5 | 80 |
D=>A^C | 4 | 4/6 | 66.7 |
A=>C^G | 3 | 3/7 | 42.9 |
C=>A^G | 3 | 3/5 | 60 |
G=>A^C | 3 | 3/4 | 75 |
B=>C^D | 3 | 3/4 | 75 |
C=>B^D | 3 | 3/5 | 60 |
D=>B^C | 3 | 3/6 | 50 |
In the given question, the minimum confidence is 60%.
So we keep every rule in the table whose confidence is 60% or above. Only A=>B^C, A=>B^D, A=>C^D, A=>C^G and D=>B^C fall below it and are rejected.
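The rule table can be reproduced with a short script. This is a sketch that assumes the same confidence definition as above (support of the whole itemset divided by support of the left-hand side); `rules_from` is a helper name I invented, and it accepts an itemset of any size. Fractions are used so that a confidence of exactly 3/5 is not lost to floating-point rounding when compared with 60%.

```python
from fractions import Fraction
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "C", "D", "G"},
    {"D", "A", "C", "E", "B"},
    {"C", "A", "B", "E", "D"},
    {"B", "A", "D"},
    {"G", "A", "C", "D"},
    {"A", "C", "G"},
    {"A", "G"},
    {"D", "E"},
]
L3 = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
      ("A", "C", "G"), ("B", "C", "D")]
MIN_CONF = Fraction(60, 100)


def support(itemset):
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for every way of
    splitting `itemset` into two non-empty parts."""
    full = support(itemset)
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            consequent = tuple(i for i in itemset if i not in antecedent)
            yield antecedent, consequent, Fraction(full, support(antecedent))


all_rules = [rule for itemset in L3 for rule in rules_from(itemset)]
strong = [rule for rule in all_rules if rule[2] >= MIN_CONF]

for antecedent, consequent, conf in strong:
    print("^".join(antecedent), "=>", "^".join(consequent),
          f"{float(conf) * 100:.1f}%")
```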