Multilevel Structure Parallel IO Algorithm Based on HDF5
Abstract: For applications with large-scale data input and output, a multi-level parallel IO (Input/Output) scheme based on the hierarchical storage format HDF5 (Hierarchical Data Format 5) is proposed. The scheme has two layers, inter-node and intra-node: between nodes, data is read and written with the node as the unit, while the processes inside a node may work either cooperatively or independently. According to the working mode inside the node, two algorithms are proposed, a multi-level parallel IO algorithm and a multi-level sentinel parallel IO algorithm, which effectively improve IO efficiency and avoid redundant output files. Two typical application scenarios, heterogeneous computing and pure-CPU computing, were considered, and multiple groups of experiments with up to 4096 cores and up to 256G of data were carried out on the Shuguang platform and the Intel platform, respectively. The results show that the multi-level parallel IO algorithm increases IO efficiency by 1.97 to 25.87 times and the multi-level sentinel parallel IO algorithm increases it by 6.53 to 9.36 times, and that the number of output files is reduced to 1/4 and 1/32 of the number produced by the multi-zone parallel IO algorithm.
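The node-as-unit idea can be illustrated with a small planning sketch. It is not the paper's implementation and makes no HDF5 or MPI calls. It only shows how grouping process ranks by node reduces the file count compared with a baseline that writes one file per process. Everything here is an assumption for illustration: block placement of ranks on nodes, the one-file-per-process baseline, the function names, the file-name pattern, the choice of the lowest rank as the node's writer, and the values of 4 and 32 ranks per node, which were picked only because they reproduce the 1/4 and 1/32 ratios reported above.

```python
from collections import defaultdict
from fractions import Fraction


def group_ranks_by_node(num_ranks, ranks_per_node):
    """Map node index -> list of ranks on that node (block placement assumed)."""
    nodes = defaultdict(list)
    for rank in range(num_ranks):
        nodes[rank // ranks_per_node].append(rank)
    return dict(nodes)


def plan_node_level_io(num_ranks, ranks_per_node):
    """Build a hypothetical I/O plan with exactly one output file per node."""
    plan = {}
    for node, members in group_ranks_by_node(num_ranks, ranks_per_node).items():
        plan[node] = {
            # Assumption: the lowest rank on the node performs the file I/O.
            "writer": members[0],
            "members": members,
            # Hypothetical file-name pattern, one file per node.
            "file": f"node_{node:04d}.h5",
        }
    return plan


def file_count_ratio(num_ranks, ranks_per_node):
    """Files in the node-level plan divided by files in a one-file-per-rank baseline."""
    plan = plan_node_level_io(num_ranks, ranks_per_node)
    return Fraction(len(plan), num_ranks)


if __name__ == "__main__":
    for ranks_per_node in (4, 32):
        ratio = file_count_ratio(4096, ranks_per_node)
        print(f"{ranks_per_node} ranks per node: file-count ratio = {ratio}")
```

Under these assumptions the file count shrinks by exactly the number of ranks per node. Whether the ranks on a node write into the shared file cooperatively or hand their data to a single process is the intra-node choice that the abstract describes as distinguishing the two proposed algorithms.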