Privacy-Preserving Data Mining Techniques in Big Data: Balancing Security and Usability
Abstract
The exponential growth of big data across industries presents both opportunities and challenges, particularly in protecting sensitive information while maintaining data utility. The core problem is balancing privacy preservation against the ability to extract meaningful insights from large datasets, which are often vulnerable to re-identification, breaches, and misuse. Current privacy-preserving data mining (PPDM) techniques, such as anonymization, differential privacy, and cryptographic methods, provide important solutions but introduce trade-offs in data utility, computational performance, and compliance with privacy regulations. The objective of this study is to evaluate these PPDM methods, focusing on how effectively they safeguard privacy while minimizing the impact on data accuracy and system performance. The study also assesses how well these methods comply with legal frameworks such as GDPR and HIPAA, which impose strict data protection requirements. Through an analysis of privacy-utility trade-offs, computation times, and communication complexity, this work outlines the respective strengths and weaknesses of each method. The results indicate that anonymization techniques retain more data utility while reducing the risk of re-identification, whereas differential privacy provides strong privacy guarantees at the cost of accuracy, because it adds noise to the data in an amount governed by a privacy budget, epsilon. Cryptographic techniques, such as homomorphic encryption and secure multiparty computation, offer strong security but are computationally expensive and hard to scale. This work concludes that these techniques protect privacy effectively; however, each requires trade-offs among privacy, data usability, and performance.
Future research should focus on improving the scalability and efficiency of these methods so that they can meet the needs of real-time big data analytics applications without loss of privacy.
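The accuracy cost of differential privacy mentioned above can be illustrated with a minimal sketch of the Laplace mechanism applied to a counting query. This is an illustration under stated assumptions, not the implementation evaluated in the study: the function names `laplace_noise` and `private_count` and the sample data are hypothetical, and the sensitivity of 1 holds only for simple counts where adding or removing one record changes the result by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of records matching `predicate`.

    Hypothetical helper: assumes a counting query with sensitivity 1,
    so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Illustrative data only: ages of 1,000 fictional individuals.
    data_rng = random.Random(0)
    ages = [data_rng.randint(18, 90) for _ in range(1000)]

    def is_senior(age):
        return age >= 65

    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(ages, is_senior, eps, random.Random(42))
        print(f"epsilon={eps}: noisy count = {noisy:.2f}")
```

A smaller epsilon means a larger noise scale, so the reported count drifts further from the true count on average; a larger epsilon gives more accurate answers but a weaker privacy guarantee.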

This work is licensed under a Creative Commons Attribution 4.0 International License.