Citation: WANG Wei, CHU Zenan, HAN Yi, WU Zhaoxia, JIAO Qingju. Improved Algorithm for Association Rules of Apriori Before and After Items Based on MapReduce[J]. Journal of Xinyang Normal University (Natural Science Edition), 2020, 33(3): 448-453. DOI: 10.3969/j.issn.1003-0972.2020.03.019

Improved Algorithm for Association Rules of Apriori Before and After Items Based on MapReduce

Abstract: The classic Apriori algorithm depends on main memory, so it is suitable only for small datasets and cannot cope with massive ones; it also takes no account of what the user actually wants to find. To address these problems, an improved Apriori association rule algorithm with antecedent and consequent item constraints, based on MapReduce, is proposed. The method first modifies the mining process of the classic Apriori algorithm by adding user-specified constraints on rule antecedents and consequents, which prunes more candidates during mining and yields more precise rules. The MapReduce programming model from cloud computing is then used to parallelize each step of the improved algorithm. Experimental results show that the improved algorithm has some advantage on different datasets and that, once parallelized with MapReduce, it processes massive data with greater capacity and efficiency and scales well.
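The abstract gives no pseudocode, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It assumes that the user constraints are two sets of permitted items (`antecedent_items` and `consequent_items`, both hypothetical names), that items outside both sets are dropped before mining, and that support counting is split into a map function and a reduce function that a real MapReduce job would run across nodes. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def map_count(transaction, candidates):
    """Map step: emit (candidate, 1) for each candidate contained in one transaction."""
    return [(c, 1) for c in candidates if c <= transaction]


def reduce_count(pairs, min_count):
    """Reduce step: sum the counts per candidate and keep the frequent ones."""
    totals = Counter()
    for candidate, n in pairs:
        totals[candidate] += n
    return {c: n for c, n in totals.items() if n >= min_count}


def constrained_apriori(transactions, min_count, min_conf,
                        antecedent_items, consequent_items):
    allowed = antecedent_items | consequent_items
    # Assumed pruning: an item outside both constraint sets cannot occur in
    # any qualifying rule, so it is removed before candidate generation.
    data = [frozenset(t) & allowed for t in transactions]

    candidates = {frozenset([i]) for t in data for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Here the map and reduce steps run in one process; in a cluster the
        # map calls would be distributed over transaction splits.
        pairs = [p for t in data for p in map_count(t, candidates)]
        level = reduce_count(pairs, min_count)
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then keep only candidates whose
        # (k-1)-subsets are all frequent (the classic Apriori prune).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}

    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Keep a rule only if both sides satisfy the user constraints.
                if lhs <= antecedent_items and rhs <= consequent_items:
                    conf = count / frequent[lhs]
                    if conf >= min_conf:
                        rules.append((lhs, rhs, conf))
    return rules
```

Calling `constrained_apriori` with a list of item sets, a minimum support count, a minimum confidence, and the two constraint sets returns `(antecedent, consequent, confidence)` triples. Filtering transactions up front is one simple way to make the constraints reduce the candidate space; the paper's actual pruning strategy may differ.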

     
