Research Objective
To anonymize transactional data with little information loss, and to reduce the computational complexity of the anonymization process, while still guaranteeing k-anonymity.
Research Findings
The proposed PTA system anonymizes transactional data with low information loss and reduced computational complexity. It outperforms existing algorithms in both runtime and information loss, which makes it a viable option for privacy-preserving data mining.
Limitations
The anonymization process still introduces some information loss, and the choice of center points directly influences how much. The system's performance also depends on the number of segments and the value of k, both of which may need tuning to obtain the best results.
1: Experimental Design and Method Selection:
The PTA system comprises three modules: Pre-processing, TSP, and Anonymity. The Pre-processing module encodes each transaction as a bitmap and sorts the bitmaps in Gray order. The TSP module treats each segment of the sorted data as a Traveling Salesman Problem instance and solves it to obtain a cyclic tour through that segment's transactions. The Anonymity module groups similar transactions and replaces every member of a group with the group's center point, which yields k-anonymity.
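The Pre-processing step can be illustrated with a minimal Python sketch. It assumes that "Gray order" means ordering bitmaps by their position in the reflected binary Gray code sequence; the function names `encode_bitmap`, `gray_rank`, and `gray_sort` and the item-to-bit mapping are hypothetical and are not taken from the PTA paper.

```python
def encode_bitmap(transaction, item_index):
    """Encode a transaction (an iterable of items) as an integer bitmap.

    item_index maps each item to a bit position (an assumed layout).
    """
    bits = 0
    for item in transaction:
        bits |= 1 << item_index[item]
    return bits


def gray_rank(codeword):
    """Return the position of `codeword` in the reflected binary Gray sequence.

    This is the inverse Gray code: XOR of the codeword with all of its
    right shifts.
    """
    rank = 0
    while codeword:
        rank ^= codeword
        codeword >>= 1
    return rank


def gray_sort(transactions, items):
    """Encode transactions as bitmaps and sort them by Gray rank."""
    item_index = {item: i for i, item in enumerate(items)}
    bitmaps = [encode_bitmap(t, item_index) for t in transactions]
    return sorted(bitmaps, key=gray_rank)
```

Sorting by Gray rank tends to place bitmaps that differ in few bits near each other, which is presumably why the sorted list is a useful starting point for the later grouping step.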
2: Sample Selection and Data Sources:
Experiments were conducted on five real-world datasets (chess, mushroom, pumsb, connect, accidents) and a synthetic dataset (T10I4D100K).
3: List of Experimental Equipment and Materials:
A personal computer equipped with an Intel Core i3-4160 dual-core processor and 4 GB of RAM, running the 32-bit Microsoft Windows 7 operating system.
4: Experimental Procedures and Operational Workflow:
The PTA system processes each dataset by first encoding and sorting transactions, then partitioning them into segments, finding the shortest cyclical path for each segment, and finally anonymizing the data by grouping and replacing transactions with their center points.
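The workflow above can be sketched end to end in Python. This is only an illustration under several assumptions that the summary does not confirm: transactions are already integer bitmaps in sorted order, distance is Hamming distance, the tour is built with a greedy nearest-neighbor heuristic (a stand-in, not necessarily the TSP solver PTA uses), groups are formed from k consecutive tour positions, and the center point is a per-bit majority vote. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer bitmaps."""
    return bin(a ^ b).count("1")


def split_segments(bitmaps, num_segments):
    """Partition the sorted bitmaps into contiguous segments."""
    size = -(-len(bitmaps) // num_segments)  # ceiling division
    return [bitmaps[i:i + size] for i in range(0, len(bitmaps), size)]


def nearest_neighbor_tour(segment):
    """Greedy tour: repeatedly visit the closest unvisited transaction."""
    if not segment:
        return []
    tour = [segment[0]]
    remaining = list(segment[1:])
    while remaining:
        last = tour[-1]
        idx = min(range(len(remaining)),
                  key=lambda i: hamming(last, remaining[i]))
        tour.append(remaining.pop(idx))
    return tour


def center_point(group, num_items):
    """Per-bit strict majority vote over the group (assumed center rule)."""
    center = 0
    for bit in range(num_items):
        ones = sum((t >> bit) & 1 for t in group)
        if 2 * ones > len(group):
            center |= 1 << bit
    return center


def anonymize_tour(tour, k, num_items):
    """Group k consecutive tour entries and replace each by its center."""
    groups = [tour[i:i + k] for i in range(0, len(tour), k)]
    # Merge a short trailing group into the previous one so that every
    # group has at least k members (impossible if the tour itself has < k).
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    anonymized = []
    for group in groups:
        anonymized.extend([center_point(group, num_items)] * len(group))
    return anonymized


def pta_sketch(bitmaps, num_segments, k, num_items):
    """Segment, tour, and anonymize; output follows tour order."""
    result = []
    for segment in split_segments(bitmaps, num_segments):
        tour = nearest_neighbor_tour(segment)
        result.extend(anonymize_tour(tour, k, num_items))
    return result
```

Solving each segment separately keeps every tour short, so the cost of the tour step grows with segment size rather than with the whole dataset. In this sketch a segment with fewer than k transactions cannot form a valid group, so the number of segments has to be chosen with k in mind.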
5: Data Analysis Methods:
The performance of the PTA system was compared with state-of-the-art algorithms (Gray-TSP and GSC) in terms of runtime and information loss for various values of k and number of segments.
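A comparison of this kind needs a correctness check, a loss measure, and a timer. The Python sketch below shows one way to set these up. The k-anonymity check follows the usual definition (every released record appears at least k times). The loss measure, a count of flipped bits between each original bitmap and its anonymized counterpart, is only an assumed proxy; the summary does not state the metric used in the paper. The helper names are hypothetical.

```python
from collections import Counter
import time


def is_k_anonymous(anonymized, k):
    """True if every distinct anonymized record occurs at least k times."""
    return all(count >= k for count in Counter(anonymized).values())


def bit_flip_loss(original, anonymized):
    """Total number of bits changed by anonymization (assumed loss proxy).

    The two lists must be aligned: anonymized[i] replaces original[i].
    """
    return sum(bin(o ^ a).count("1") for o, a in zip(original, anonymized))


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Repeating these measurements over a grid of k values and segment counts would give the kind of runtime and loss curves the comparison describes.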