ISSN: Online 2320-9801, Print 2320-9798


AN EFFICIENT APPROACH FOR THE IMPLEMENTATION OF FP TREE

SADHANA KODALI1, KAMALAKAR M2, CH. GAYATRI3, K. PRAVALLIKA4
  1. Assistant Professor, Department of IT, Lendi Institute of Engineering and Technology, Vizianagaram, Andhra Pradesh, India
  2. Assistant Professor, Department of CSE, Raghu Engineering College, Visakhapatnam, Andhra Pradesh, India
  3. 10KD1A1212, IV/IV B.Tech, Department of IT, Lendi Institute of Engineering and Technology, Andhra Pradesh, India
  4. 10KD1A1239, IV/IV B.Tech, Department of IT, Lendi Institute of Engineering and Technology, Andhra Pradesh, India


Abstract

Frequent pattern mining is one of the most common mining techniques for identifying frequent patterns in large data sets. The Apriori algorithm is an efficient algorithm for mining frequent patterns, but its drawback is that it generates a large number of candidate item sets. The FP-tree is a frequent pattern mining technique that avoids candidate item set generation. We propose an approach for improving the performance of the FP-tree using a parameter called the average support count.

Keywords

Frequent pattern mining, Apriori algorithm, FP-tree, candidate item sets, average support count.

INTRODUCTION

Mining frequent patterns in transactional databases is a popular field of study in data mining. The Apriori algorithm [1] is a traditional algorithm for finding frequent patterns using statistical measures such as support count and confidence. Its drawback is the repeated scanning of the database and the pruning of infrequent candidate item sets; generating the candidate item sets and pruning the infrequent items is costly when large data sets are considered. An approach was proposed for mining frequent patterns without candidate generation [5], and an efficient FP-tree based mining method, FP-growth, was developed for mining the complete set of frequent patterns by pattern fragment growth [2]. In this paper we improve the efficiency of this algorithm by proposing an approach that further reduces infrequent candidate generation and improves the space and time complexity of the algorithm. The efficient FP-tree algorithm has the following major steps:
- Scan the database in the first pass and prune the infrequent items using a threshold value, avgsup [3].
- The large database is thereby condensed into a smaller data structure, the FP-tree.
- A partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space.
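As a concrete illustration, the first two steps can be written down in a few lines of Python. This is only a sketch, assuming the database is held in memory as a list of transactions; the function name condense_database is illustrative and not taken from any implementation by the authors, and the tree construction of the third step is sketched in a later section.

from math import floor
from collections import Counter

def condense_database(transactions):
    """Steps 1 and 2 of the proposed approach: one scan to count supports,
    then pruning of the items that fall below the avgsup threshold."""
    # Step 1: single scan over the database to get individual support counts.
    counts = Counter(item for t in transactions for item in t)
    # avgsup: average of the individual support counts (floor value used).
    threshold = floor(sum(counts.values()) / len(counts))
    # Step 2: drop infrequent items, condensing the database before
    # the FP-tree (step 3) is built from what remains.
    pruned = [[item for item in t if counts[item] >= threshold]
              for t in transactions]
    return pruned, counts, threshold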

PROCESSING THE DATABASE

The first step is to preprocess the database before applying the algorithm. We take an example dataset to which we apply the formula derived in [3]. We compare this with the normal way of generating the FP-tree and show the difference between the two approaches. We propose a measure called avgsup, obtained by summing the support counts of the items and taking their average. This measure works well for pruning the infrequent items, because all infrequent items fall below this threshold value, avgsup. If there are items {x1, x2, x3, ..., xk}, avgsup is calculated as [3]:
avgsup = (sup(x1) + sup(x2) + ... + sup(xk)) / k
The example dataset is given in Table 1.
The individual support counts of the items in the transactions are given in Table 2.
Step 1: Calculation of average support:
avgsup = (4 + 6 + 6 + 2 + 4 + 1) / 6
That is, the sum of the individual support counts divided by the number of items. The value of avgsup for our example is approximately 3.83, of which we take the floor value. The floor value of avgsup, denoted ⌊avgsup⌋, is 3.
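The same arithmetic can be checked directly with a few lines of Python; the six counts below are the ones listed in the calculation above.

from math import floor

support_counts = [4, 6, 6, 2, 4, 1]   # individual support counts from Table 2

avgsup = sum(support_counts) / len(support_counts)
print(avgsup)         # 3.8333...
print(floor(avgsup))  # floor value used as the pruning threshold: 3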
Step 2: Prune from the transactional database the items whose support count is less than the threshold of 3. The items that remain are considered the most frequent items.
After removing the items that do not satisfy the threshold avgsup, we obtain the table above.
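A short sketch of this pruning pass is given below. The pairing of the support counts with the item labels a-f, and the transactions themselves, are assumptions made for illustration (consistent with d and f being the infrequent items, as noted in the next section); the actual transactions are those of Table 1.

def prune_items(transactions, counts, threshold):
    """Remove from every transaction the items whose support count
    falls below the avgsup threshold."""
    return [[item for item in t if counts[item] >= threshold]
            for t in transactions]

# Assumed item -> support count mapping for the running example.
counts = {'a': 4, 'b': 6, 'c': 6, 'd': 2, 'e': 4, 'f': 1}

# Hypothetical transactions; the real ones are listed in Table 1.
transactions = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f'], ['a', 'b', 'e']]
print(prune_items(transactions, counts, threshold=3))
# d (count 2) and f (count 1) are dropped from every transaction.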

CONSTRUCTION OF FP-TREE

An FP-tree is constructed for the items in Table 3 using the standard procedure for FP-tree construction.
The same data is also used to generate the FP-tree in the normal way; Figure 2 represents the normal construction of the FP-tree [6].
Observing the tree in Figure 1, we can see that it is a more economical implementation of the FP-tree: it clearly reduces the space requirement when compared with the normally generated tree.
From this we can understand that items d and f are infrequent, that item sets containing these infrequent items are themselves less frequent, and that eliminating such items from the transactional database before constructing the FP-tree does not affect the frequent patterns.
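For reference, the insertion procedure used to build such a tree can be sketched as follows. This is written from the general description of the FP-tree in [2], not from the authors' code, and it assumes the transactions have already been pruned with the avgsup threshold.

class FPNode:
    """A node of the FP-tree: an item label, a count, parent and child links."""
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, counts):
    """Insert each transaction with its items ordered by descending support,
    sharing common prefixes so the database is compressed into one tree."""
    root = FPNode()
    header = {}   # item -> list of nodes carrying that item (used by FP-growth)
    for t in transactions:
        items = sorted(t, key=lambda i: counts[i], reverse=True)
        node = root
        for item in items:
            if item not in node.children:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
    return root, header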

EFFICIENCY OF THE PROPOSED APPROACH

Using the same transactional database, we apply the Apriori algorithm and compare the results of our efficient FP-tree with those of the existing Apriori algorithm. The Apriori algorithm follows the steps below to generate frequent item sets.
Step 1: Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k-1.
Step 2: It then prunes the candidates that have an infrequent sub-pattern. According to the downward closure lemma, the candidate set contains all frequent k-length item sets [4].
Step 3: After that, it scans the transaction database to determine which of the candidates are actually frequent.
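A compact level-wise sketch of these three steps is given below for comparison. The use of Python frozensets (rather than the hash tree mentioned in Step 1) is a simplification of this sketch and not part of the original algorithm [1].

from collections import defaultdict
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent item-set mining with candidate generation:
    each level k requires another full scan of the transaction database."""
    transactions = [set(t) for t in transactions]
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {c for c, n in counts.items() if n >= minsup}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Join: build length-k candidates from the frequent (k-1)-item sets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: discard candidates that contain an infrequent (k-1)-sub-pattern.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Scan: count the surviving candidates against the whole database.
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c for c, n in counts.items() if n >= minsup}
        all_frequent |= frequent
        k += 1
    return all_frequent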
Comparing Figure 1 and Figure 3, we obtain the same frequent 3-item sets. With the proposed efficient FP-tree approach the frequent 3-item sets are obtained with only one scan, whereas the existing Apriori algorithm needs to scan the database four to five times and apply the pruning step to eliminate the infrequent item sets.
CONCLUSION
From the above we can observe that the proposed approach is more efficient than the existing techniques. It improves both the time and space complexity: it compresses the dataset and requires only one scan and two passes for the identification of frequent item sets.

Tables at a glance

Table 1, Table 2, Table 3

Figures at a glance

Figure 1, Figure 2, Figure 3, Figure 4, Figure 5

References






