Feature Based Data Anonymization for High Dimensional Data
Abstract
Information surges and advances in machine learning tools have enabled the collection and storage of large amounts of data. These data are high dimensional. Individuals are deeply concerned about the consequences of sharing and publishing these data, as they may contain personal information and compromise their privacy. Anonymization techniques have been widely used to protect sensitive information in published datasets. However, anonymizing high dimensional data while balancing privacy and utility is a challenge. In this paper we use feature selection with information gain and ranking to demonstrate that the challenge of high dimensionality can be addressed by applying more anonymization to the less relevant features. We conduct experiments with real-life datasets and build classifiers on the anonymized datasets. Our results show that by combining feature selection with slicing and reducing the amount of data distortion for highly relevant features, the utility of the anonymized dataset can be enhanced.
Keywords: High Dimension, Privacy, Anonymization, Feature Selection, Classifier, Utility
DOI: 10.7176/JIEA/9-2-03
Publication date: April 30th 2019