Research Objective
To detect inadvertent leaks of sensitive data in network traffic, such as those caused by human mistakes, without revealing the sensitive data to the detection provider.
Research Findings
The proposed privacy-preserving data-leak detection method identifies inadvertent data leaks with high accuracy and a low false-positive rate. It allows detection operations to be delegated safely to semi-honest providers without revealing the sensitive data to them. Future work includes designing host-assisted mechanisms for large-scale organizations.
Research Limitations
The approach has limited power to capture leaks of heavily modified data, may produce false positives due to fingerprint collisions, and requires continuous digest updates when the sensitive data changes dynamically. It also has difficulty detecting leaks of selected fragments and leaks carried in encrypted traffic.
1:Experimental Design and Method Selection:
The methodology involves a privacy-preserving data-leak detection (DLD) solution using a special set of sensitive data digests. The approach includes preprocessing sensitive data to generate digests, releasing a subset of these digests to a DLD provider, monitoring network traffic, detecting potential leaks, reporting alerts, and post-processing to confirm real leaks.
2:Sample Selection and Data Sources:
The experiments use 20,000 personal financial records as sensitive data, network traffic from 20 users, and the Enron dataset (2.6 GB) to simulate various data-leak scenarios.
3:List of Experimental Equipment and Materials:
The implementation is in Python, using VirtualBox for networking environment setup, with Windows 7 hosts and a Fedora gateway. The DLD server runs on Linux.
4:Experimental Procedures and Operational Workflow:
The process includes generating Rabin fingerprints from sensitive data, fuzzifying these fingerprints, monitoring network traffic, comparing traffic fingerprints with fuzzy fingerprints, and post-processing to identify real leaks.
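The workflow above can be illustrated with a minimal sketch. It is not the authors' implementation: the shingle width, fingerprint width, number of fuzzified bits, and all function names are assumptions chosen for illustration, and fuzzification is modeled here simply as randomizing the low-order bits of each fingerprint so that the provider can match only on the remaining high-order bits.

```python
import secrets

# Assumed parameters (not given in the summary above).
SHINGLE_BYTES = 16   # width of each sliding window over the data
FP_BITS = 64         # fingerprint width in bits
FUZZY_BITS = 8       # low-order bits hidden from the provider
# Low-order terms of x^64 + x^4 + x^3 + x + 1, irreducible over GF(2).
POLY_LOW = 0x1B
FP_MASK = (1 << FP_BITS) - 1


def rabin_fingerprint(data: bytes) -> int:
    """Reduce the bit string of `data` modulo the polynomial, one bit at a time."""
    fp = 0
    for byte in data:
        for shift in range(7, -1, -1):
            carry = fp >> (FP_BITS - 1)          # coefficient about to overflow
            fp = ((fp << 1) & FP_MASK) | ((byte >> shift) & 1)
            if carry:
                fp ^= POLY_LOW                   # x^64 is congruent to the low terms
    return fp


def shingle_fingerprints(data: bytes) -> set[int]:
    """Fingerprint every SHINGLE_BYTES-wide window of `data`."""
    return {
        rabin_fingerprint(data[i:i + SHINGLE_BYTES])
        for i in range(len(data) - SHINGLE_BYTES + 1)
    }


def coarse_key(fp: int) -> int:
    """High-order bits that survive fuzzification."""
    return fp >> FUZZY_BITS


def fuzzify(fp: int) -> int:
    """Randomize the low FUZZY_BITS bits (assumed fuzzification scheme)."""
    return fp ^ secrets.randbits(FUZZY_BITS)


def provider_candidates(fuzzy_fps: set[int], traffic: bytes) -> set[int]:
    """Provider side: report traffic fingerprints whose high bits match."""
    keys = {coarse_key(f) for f in fuzzy_fps}
    return {fp for fp in shingle_fingerprints(traffic) if coarse_key(fp) in keys}


def owner_confirm(candidates: set[int], real_fps: set[int]) -> set[int]:
    """Data owner side: keep only candidates that match exactly."""
    return candidates & real_fps
```

In this sketch the data owner would call `shingle_fingerprints` on the sensitive data, send `{fuzzify(f) for f in real_fps}` to the provider, and run `owner_confirm` on whatever `provider_candidates` returns. Because the provider sees only the high-order bits, its candidate set can contain noise that the owner filters out during post-processing.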
5:Data Analysis Methods:
The detection accuracy is evaluated based on the sensitivity of packets, with thresholds set to minimize false positives and maximize true positives.
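A threshold-based evaluation of this kind can be sketched as follows. The definition of packet sensitivity used here (the fraction of a packet's fingerprints that match the sensitive set) and the helper names are assumptions for illustration; the summary above does not give the exact formula.

```python
def packet_sensitivity(packet_fps: set[int], sensitive_fps: set[int]) -> float:
    """Assumed score: share of the packet's fingerprints found in the sensitive set."""
    if not packet_fps:
        return 0.0
    return len(packet_fps & sensitive_fps) / len(packet_fps)


def rates(scores: list[float], labels: list[bool], threshold: float) -> tuple[float, float]:
    """Return (true-positive rate, false-positive rate) when flagging scores >= threshold."""
    tp = sum(1 for s, leak in zip(scores, labels) if leak and s >= threshold)
    fp = sum(1 for s, leak in zip(scores, labels) if not leak and s >= threshold)
    positives = sum(1 for leak in labels if leak)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr
```

Sweeping `threshold` over a range of values and recording the resulting pairs shows the trade-off: a lower threshold catches more real leaks but flags more benign packets.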