Question


Consider the transactional database shown in the following table.
Transaction ID Items Bought
T100 Plum, Apple, Peach, Orange, Pear, Banana
T200 Cherry, Apple, Peach, Orange, Pear, Banana
T300 Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 Plum, Watermelon, Avocado, Orange, Banana
T500 Avocado, Apple, Orange, Lemon, Pear
CONDITION: The minimum support is 60% and minimum confidence is 70%.
Based on the CONDITION above, answer the following five questions.
(1) Find all frequent itemsets using the Apriori algorithm. Show how the algorithm works in a
step by step manner.
(2) List all the association rules found by the Apriori algorithm.
(3) Find all frequent itemsets using the FP-tree algorithm. Show the final FP-tree you
constructed. Note that the FP-tree algorithm must have a pre-processing step, which sorts items
in a transaction based on the support values of the items. If several items have the same support
value, the pre-processing step sorts these items based on the alphabetical (lexicographical) order
of the items.
(4) List all the association rules found by the FP-tree algorithm.
(5) In this example, indicate whether the association rules produced by the Apriori algorithm are
the same as those produced by the FP-tree algorithm.

Remove the CONDITION above, and then answer the following ten questions.
(6) What is the maximum number of association rules that can be extracted from this database
(including rules that have zero support)?
(7) What is the maximum size of frequent itemsets that can be extracted (assuming minsup > 0)?
(8) Write an expression for the maximum number of size-3 itemsets that can be derived from this
database (including size-3 itemsets that have zero support).
(9) Write an expression for the maximum number of size-2 itemsets that can be derived from this
database (including size-2 itemsets that have zero support).
(10) Write an expression for the maximum number of size-5 itemsets that can be derived from
this database (including size-5 itemsets that have zero support).
(11) Find an itemset (of size 2 or larger) that has the largest support.
(12) Find an itemset (of any size) that has the largest support.
(13) Find a pair of items, X and Y, such that the rules {X} —> {Y} and {Y} —> {X} have the
same confidence value (if there are no such items X and Y, indicate NONE).
(14) Find a pair of items, X and Y, such that the rule {X} —> {Y} has a larger confidence value
than the rule {Y} —> {X} (if there are no such items X and Y, indicate NONE).
(15) Find a pair of items, X and Y, such that the rule {X} —> {Y} has a smaller confidence
value than the rule {Y} —> {X} (if there are no such items X and Y, indicate NONE).

Answer:

The Apriori algorithm mines frequent itemsets from a transactional database.

It is based on prior knowledge of frequent-itemset properties: it performs a level-wise, iterative search in which the frequent k-itemsets are used to explore the candidate (k+1)-itemsets.

Two properties of the Apriori algorithm reduce its search space:

1) Downward closure property: every subset of a frequent itemset must itself be frequent.

2) Apriori pruning property (anti-monotonicity): if any subset of an itemset is infrequent, none of its supersets need to be tested.

Support: how popular an itemset is, measured as the proportion of transactions in which the itemset appears.

Confidence: how likely Y is purchased when X is purchased; for a rule X -> Y it equals support(X U Y) / support(X).
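As a minimal sketch, both measures can be computed directly for this database (Python; the transaction sets below are transcribed from the table in the question):

```python
# Transactions T100..T500 transcribed from the table in the question.
transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},     # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},   # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},  # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},      # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},            # T500
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Proportion of transactions containing the itemset."""
    return support_count(itemset) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X -> Y: support(X U Y) / support(X)."""
    return support_count(x | y) / support_count(x)

print(support({"Orange"}))                        # 1.0 (Orange is in all 5)
print(confidence({"Orange", "Pear"}, {"Apple"}))  # 0.75 (count 3 / count 4)
```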

Part 1:

Transaction_id | List_of_items
T100 | Plum, Apple, Peach, Orange, Pear, Banana
T200 | Cherry, Apple, Peach, Orange, Pear, Banana
T300 | Plum, Mango, Orange, Pear, Kiwi, Strawberry
T400 | Plum, Watermelon, Avocado, Orange, Banana
T500 | Avocado, Apple, Orange, Lemon, Pear

Iteration 1: generate the candidate table C1 (1-itemsets):

Itemset | Frequency (no. of transactions)
{Plum} | 3
{Apple} | 3
{Peach} | 2
{Orange} | 5
{Pear} | 4
{Banana} | 3
{Cherry} | 1
{Mango} | 1
{Kiwi} | 1
{Strawberry} | 1
{Watermelon} | 1
{Avocado} | 2
{Lemon} | 1

In this problem the minimum support is 60%, which means a minimum support count of 0.6 x 5 = 3 transactions. For example, Plum appears in 3 of the 5 transactions, so 3/5 = 0.6 meets the threshold.
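This filtering step can be sketched as follows (Python; the transaction sets are transcribed from the table in the question):

```python
from collections import Counter

transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},     # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},   # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},  # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},      # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},            # T500
]
min_support_count = 3  # 60% of the 5 transactions

# C1: count every individual item, then keep only those meeting the threshold.
c1 = Counter(item for t in transactions for item in t)
l1 = sorted(item for item, n in c1.items() if n >= min_support_count)
print(l1)  # ['Apple', 'Banana', 'Orange', 'Pear', 'Plum']
```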

L1: the itemsets that satisfy the minimum support count.

Itemset | Support_count
{Plum} | 3
{Apple} | 3
{Orange} | 5
{Pear} | 4
{Banana} | 3

Iteration 2:

Generate the candidate set C2 (2-itemsets) by joining L1 with itself. Scan the database again and count the support of each combination.

Itemset | Support_count
{Plum, Apple} | 1
{Plum, Orange} | 3
{Plum, Pear} | 2
{Plum, Banana} | 2
{Apple, Orange} | 3
{Apple, Pear} | 3
{Apple, Banana} | 2
{Orange, Pear} | 4
{Orange, Banana} | 3
{Pear, Banana} | 2

From C2, form L2 by eliminating all itemsets that do not satisfy the minimum support count.

Itemset | Support_count
{Plum, Orange} | 3
{Apple, Orange} | 3
{Apple, Pear} | 3
{Orange, Pear} | 4
{Orange, Banana} | 3
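The C2-to-L2 step above can be sketched in Python (assuming the transaction sets from the question and the L1 items derived earlier):

```python
from itertools import combinations

transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},     # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},   # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},  # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},      # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},            # T500
]
min_support_count = 3
l1 = ["Apple", "Banana", "Orange", "Pear", "Plum"]  # frequent 1-items

# C2: every pair of frequent items, with support counted by rescanning.
c2 = {frozenset(p): sum(1 for t in transactions if set(p) <= t)
      for p in combinations(l1, 2)}
# L2: keep only the pairs meeting the minimum support count.
l2 = {s: n for s, n in c2.items() if n >= min_support_count}
for s, n in sorted(l2.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(s), n)
```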

Iteration 3:

Generate the candidate set C3 (3-itemsets) by joining L2 with itself, eliminating duplicates. Before scanning the database, apply the pruning property: a candidate is kept only if every size-2 subset of it appears in L2. This reduces the work of the database scan.

Below, NO means some 2-subset of the candidate is missing from L2 (the candidate is pruned), and YES means all of its 2-subsets are in L2.

{Plum, Orange, Apple} (NO: {Plum, Apple} not in L2)
{Plum, Orange, Pear} (NO: {Plum, Pear} not in L2)
{Plum, Orange, Banana} (NO: {Plum, Banana} not in L2)
{Apple, Orange, Pear} (YES), support count 3
{Apple, Orange, Banana} (NO: {Apple, Banana} not in L2)
{Orange, Pear, Banana} (NO: {Pear, Banana} not in L2)

L3:

{Apple, Orange, Pear} | 3

Iteration 4 (C4): L3 contains only one itemset, so no size-4 candidate can be generated (a size-4 candidate would require several of its size-3 subsets to appear in L3). C4 is empty and the algorithm terminates.

The frequent itemsets found by the algorithm are those in L1, L2, and L3; the maximal one is {Apple, Orange, Pear}.
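The join-and-prune step used in Iteration 3 (and the empty C4 that terminates the algorithm) can be sketched generically. Here `apriori_gen` is a hypothetical helper name for illustration, not something given in the question:

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Candidates of size k from the frequent (k-1)-itemsets:
    join step (union pairs of itemsets), then prune step (every
    (k-1)-subset of a surviving candidate must itself be frequent)."""
    prev = {frozenset(s) for s in prev_frequent}
    cands = set()
    for a in prev:
        for b in prev:
            u = a | b
            if len(u) == k and all(frozenset(sub) in prev
                                   for sub in combinations(u, k - 1)):
                cands.add(u)
    return cands

l2 = [{"Plum", "Orange"}, {"Apple", "Orange"}, {"Apple", "Pear"},
      {"Orange", "Pear"}, {"Orange", "Banana"}]
print(apriori_gen(l2, 3))  # only {Apple, Orange, Pear} survives pruning
print(apriori_gen([{"Apple", "Orange", "Pear"}], 4))  # empty: terminate
```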

Part 2:

To find the association rules, the confidence measure is used:

confidence(X -> Y) = support(X U Y) / support(X)

Generate the rules for the itemset {Apple, Orange, Pear} found by the Apriori algorithm: each nonempty proper subset becomes an antecedent X, and the remaining items form the consequent Y. Calculate the confidence of each rule; in this question the minimum confidence is 70%.

{Apple, Orange} -> {Pear}: 3/3 = 100%
{Apple, Pear} -> {Orange}: 3/3 = 100%
{Orange, Pear} -> {Apple}: 3/4 = 75%
{Apple} -> {Orange, Pear}: 3/3 = 100%
{Orange} -> {Apple, Pear}: 3/5 = 60% (rejected: below the minimum confidence)
{Pear} -> {Apple, Orange}: 3/4 = 75%

Therefore, from the confidence calculations above, the strong rules are:

{Apple, Orange} -> {Pear} (100%)
{Apple, Pear} -> {Orange} (100%)
{Orange, Pear} -> {Apple} (75%)
{Apple} -> {Orange, Pear} (100%)
{Pear} -> {Apple, Orange} (75%)
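The rule-generation step can be sketched as follows (Python, assuming the transaction sets from the question and the maximal frequent itemset found above):

```python
from itertools import combinations

transactions = [
    {"Plum", "Apple", "Peach", "Orange", "Pear", "Banana"},     # T100
    {"Cherry", "Apple", "Peach", "Orange", "Pear", "Banana"},   # T200
    {"Plum", "Mango", "Orange", "Pear", "Kiwi", "Strawberry"},  # T300
    {"Plum", "Watermelon", "Avocado", "Orange", "Banana"},      # T400
    {"Avocado", "Apple", "Orange", "Lemon", "Pear"},            # T500
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

freq = {"Apple", "Orange", "Pear"}  # maximal frequent itemset
min_conf = 0.7

# Every nonempty proper subset X of the itemset yields a rule X -> (freq - X);
# keep the rule only if its confidence meets the minimum.
strong = []
for r in range(1, len(freq)):
    for ante in combinations(sorted(freq), r):
        x = set(ante)
        conf = support_count(freq) / support_count(x)
        if conf >= min_conf:
            strong.append((sorted(x), sorted(freq - x), conf))

for x, y, conf in strong:
    print(x, "->", y, f"{conf:.0%}")
```

Running this reproduces the five strong rules listed above; the rule {Orange} -> {Apple, Pear} is dropped because its confidence (60%) is below 70%.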
