Consider the transactional database shown in the following
table.
Transaction ID | Items Bought
T100 | Plum, Apple, Peach, Orange, Pear, Banana
T200 | Cherry, Apple, Peach, Orange, Pear, Banana
T300 | Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 | Plum, Watermelon, Avocado, Orange, Banana
T500 | Avocado, Apple, Orange, Lemon, Pear
CONDITION: The minimum support is 60% and minimum confidence is
70%.
Based on the CONDITION above, answer the following five
questions.
(1) Find all frequent itemsets using the Apriori algorithm. Show
how the algorithm works in a
step by step manner.
(2) List all the association rules found by the Apriori
algorithm.
(3) Find all frequent itemsets using the FP-tree algorithm. Show
the final FP-tree you
constructed. Note that the FP-tree algorithm must have a
pre-processing step, which sorts items
in a transaction based on the support values of the items. If
several items have the same support
value, the pre-processing step sorts these items based on the
alphabetical (lexicographical) order
of the items.
(4) List all the association rules found by the FP-tree
algorithm.
(5) In this example, indicate whether the association rules
produced by the Apriori algorithm are
the same as those produced by the FP-tree algorithm.
Remove the CONDITION above, and then answer the following ten
questions.
(6) What is the maximum number of association rules that can be
extracted from this database
(including rules that have zero support)?
(7) What is the maximum size of frequent itemsets that can be
extracted (assuming minsup > 0)?
(8) Write an expression for the maximum number of size-3 itemsets
that can be derived from this
database (including size-3 itemsets that have zero support).
(9) Write an expression for the maximum number of size-2 itemsets
that can be derived from this
database (including size-2 itemsets that have zero support).
(10) Write an expression for the maximum number of size-5 itemsets
that can be derived from
this database (including size-5 itemsets that have zero
support).
(11) Find an itemset (of size 2 or larger) that has the largest
support.
(12) Find an itemset (of any size) that has the largest
support.
(13) Find a pair of items, X and Y, such that the rules {X} —>
{Y} and {Y} —> {X} have the
same confidence value (if there are no such items X and Y, indicate
NONE).
(14) Find a pair of items, X and Y, such that the rule {X} —>
{Y} has a larger confidence value
than the rule {Y} —> {X} (if there are no such items X and Y,
indicate NONE).
(15) Find a pair of items, X and Y, such that the rule {X} —>
{Y} has a smaller confidence
value than the rule {Y} —> {X} (if there are no such items X and
Y, indicate NONE).
Answer:
The Apriori algorithm mines frequent itemsets from a transactional database.
It exploits prior knowledge of frequent-itemset properties: it is a level-wise, iterative search in which the frequent k-itemsets found at one level are used to explore the candidate (k+1)-itemsets at the next.
Two properties reduce the algorithm's search space:
1) Downward closure property: every subset of a frequent itemset must itself be frequent.
2) Apriori pruning property: if any subset of an itemset is infrequent, none of its supersets need be tested (antimonotonicity).
Support: how popular an itemset is, measured as the proportion of transactions in which the itemset appears.
Confidence: how likely item Y is to be purchased when item X is purchased, written conf(X -> Y).
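As a minimal sketch of these two measures (assuming Python, with the five transactions from the table above hard-coded as sets):

```python
# The five market-basket transactions from the table above.
transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """conf(X -> Y) = support(X union Y) / support(X)."""
    return support(set(x) | set(y)) / support(x)
```

For example, `support({"Orange"})` is 1.0 because Orange appears in all five transactions, and `support({"Plum"})` is 3/5 = 0.6.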
Part 1:

Transaction_id | List_of_items
T100 | Plum, Apple, Peach, Orange, Pear, Banana
T200 | Cherry, Apple, Peach, Orange, Pear, Banana
T300 | Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 | Plum, Watermelon, Avocado, Orange, Banana
T500 | Avocado, Apple, Orange, Lemon, Pear
ITERATION 1: Generate the candidate table C1 (1-itemsets):

Itemset | Frequency (no. of transactions)
{Plum} | 3
{Apple} | 3
{Peach} | 2
{Orange} | 5
{Pear} | 4
{Banana} | 3
{Cherry} | 1
{Mango} | 1
{Kiwi} | 1
{Strawberry} | 1
{Watermelon} | 1
{Avocado} | 2
{Lemon} | 1

The minimum support is 60%, which over 5 transactions means a minimum support count of 0.6 x 5 = 3. For example, Plum appears in 3 of the 5 transactions, so its support is 3/5 = 0.6 and it meets the threshold.
L1: the itemsets that satisfy the minimum support count.

Itemset | Support_count
{Plum} | 3
{Apple} | 3
{Orange} | 5
{Pear} | 4
{Banana} | 3
Iteration 2:
Generate candidate set C2 (2-itemsets) by combining each frequent 1-itemset with the others, then scan the database again and count the support of each combination.

Itemset | Support_count
{Plum, Apple} | 1
{Plum, Orange} | 3
{Plum, Pear} | 2
{Plum, Banana} | 2
{Apple, Orange} | 3
{Apple, Pear} | 3
{Apple, Banana} | 2
{Orange, Pear} | 4
{Orange, Banana} | 3
{Pear, Banana} | 2
From the above iteration, build level L2 by eliminating every itemset that does not satisfy the minimum support count.

Itemset | Support_count
{Plum, Orange} | 3
{Apple, Orange} | 3
{Apple, Pear} | 3
{Orange, Pear} | 4
{Orange, Banana} | 3
Iteration 3:
Generate candidate set C3 (3-itemsets) by joining pairs of L2 itemsets, then scan the database and count the support of each surviving candidate.
* Eliminate duplicate candidates.
* Apply the pruning property at this step: check whether every 2-subset of each candidate appears in L2. This avoids counting candidates that cannot be frequent.
Below, NO means some 2-subset of the candidate is missing from L2 (so the candidate is pruned without counting), and YES means all of its 2-subsets are in L2.

Candidate | Support_count
{Plum, Orange, Apple} (NO: {Plum, Apple} not in L2) |
{Plum, Orange, Pear} (NO: {Plum, Pear} not in L2) |
{Plum, Orange, Banana} (NO: {Plum, Banana} not in L2) |
{Apple, Orange, Pear} (YES) | 3
{Apple, Orange, Banana} (NO: {Apple, Banana} not in L2) |
{Orange, Pear, Banana} (NO: {Pear, Banana} not in L2) |

L3:
{Apple, Orange, Pear} | 3
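The join and prune steps of this iteration can be sketched as follows (a minimal illustration, assuming the L2 itemsets listed above):

```python
from itertools import combinations

# The frequent 2-itemsets from L2 above.
L2 = {frozenset(s) for s in (
    {"Plum", "Orange"}, {"Apple", "Orange"}, {"Apple", "Pear"},
    {"Orange", "Pear"}, {"Orange", "Banana"},
)}

# Join step: unions of two L2 itemsets that have exactly 3 items.
C3 = {a | b for a in L2 for b in L2 if len(a | b) == 3}

# Prune step (Apriori property): keep a candidate only if every
# 2-subset of it is itself frequent, i.e. present in L2.
C3_pruned = {c for c in C3
             if all(frozenset(s) in L2 for s in combinations(c, 2))}
```

This yields the six candidates in the table above, of which only {Apple, Orange, Pear} survives pruning.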
ITERATION 4:
L3 contains only one itemset, so no size-4 candidate can be generated (joining {Apple, Orange, Pear} with itself yields nothing new), and the algorithm terminates.
The frequent itemsets found by the algorithm are those in L1, L2, and L3; the largest is {Apple, Orange, Pear}.
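The whole level-wise search above can be condensed into a short loop (a sketch, not an optimized implementation; counting rescans the database at every level):

```python
from itertools import combinations

transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},
]
minsup_count = 3  # 60% of 5 transactions

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

all_items = set().union(*transactions)
# L1: frequent single items.
levels = [{frozenset([i]) for i in all_items if count({i}) >= minsup_count}]
while levels[-1]:
    prev, k = levels[-1], len(next(iter(levels[-1]))) + 1
    # Join: unions of two frequent (k-1)-itemsets of size exactly k.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune: every (k-1)-subset of a candidate must be frequent.
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    levels.append({c for c in candidates if count(c) >= minsup_count})
frequent = set().union(*levels)
```

Running this reproduces the hand computation: five frequent 1-itemsets, five frequent 2-itemsets, and the single frequent 3-itemset {Apple, Orange, Pear}.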
Part 2:
Association rules are evaluated with the confidence measure:
conf(X -> Y) = support(X ∪ Y) / support(X)

Rules generated from the frequent 3-itemset {Apple, Orange, Pear} found by the Apriori algorithm:
{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Orange} -> {Apple, Pear}: 3/5 = 60% (rejected: below the minimum confidence)
{Pear} -> {Apple, Orange}: 3/4 = 75%

The minimum confidence in this question is 70%, so the strong rules from this itemset are:
{Apple, Orange} -> {Pear}: 100%
{Apple, Pear} -> {Orange}: 100%
{Orange, Pear} -> {Apple}: 75%
{Apple} -> {Orange, Pear}: 100%
{Pear} -> {Apple, Orange}: 75%

The frequent 2-itemsets in L2 also yield strong rules:
{Plum} -> {Orange}: 3/3 = 100%
{Apple} -> {Orange}: 3/3 = 100%
{Apple} -> {Pear}: 3/3 = 100%
{Orange} -> {Pear}: 4/5 = 80%
{Pear} -> {Orange}: 4/4 = 100%
{Pear} -> {Apple}: 3/4 = 75%
{Banana} -> {Orange}: 3/3 = 100%
(The rules {Orange} -> {Plum}, {Orange} -> {Apple}, and {Orange} -> {Banana} each have confidence 3/5 = 60% and are rejected.)
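Enumerating the rules from {Apple, Orange, Pear} and filtering by confidence can be sketched like this (same hard-coded transactions as before; the variable names are illustrative):

```python
from itertools import combinations

transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

itemset = frozenset({"Apple", "Orange", "Pear"})
min_conf = 0.7

strong_rules = []
for r in (1, 2):  # every non-empty proper subset as left-hand side
    for lhs in combinations(sorted(itemset), r):
        # conf(LHS -> RHS) = count(whole itemset) / count(LHS)
        conf = count(itemset) / count(lhs)
        if conf >= min_conf:
            strong_rules.append((lhs, tuple(sorted(itemset - set(lhs))), conf))
```

Of the six candidate rules, five pass the 70% threshold; only {Orange} -> {Apple, Pear} (confidence 60%) is rejected, matching the calculation above.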