Feature Based Data Anonymization for High Dimensional Data

Esther Gachanga, Michael Kimwele, Lawrence Nderu


Information surges and advances in machine learning tools have enabled the collection and storage of large amounts of data. These data are high-dimensional. Individuals are deeply concerned about the consequences of sharing and publishing these data, as they may contain personal information and compromise their privacy. Anonymization techniques have been widely used to protect sensitive information in published datasets. However, anonymizing high-dimensional data while balancing privacy and utility remains a challenge. In this paper, we use feature selection with information gain and ranking to demonstrate that the challenge of high dimensionality can be addressed by concentrating anonymization on the less relevant attributes. We conduct experiments with real-life datasets and build classifiers on the anonymized datasets. Our results show that combining feature selection with slicing, and reducing the amount of data distortion applied to highly relevant features, enhances the utility of the anonymized dataset.
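
The ranking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the attribute names (`occupation`, `zip_prefix`, `label`), and the helper names are hypothetical, and all attributes are assumed to be categorical. It computes the information gain of each attribute with respect to the class label and sorts the attributes from most to least relevant.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the expected label entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, feature_names, label_name):
    """Return (feature, gain) pairs sorted from highest to lowest gain."""
    labels = [row[label_name] for row in rows]
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in feature_names}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records, used only to illustrate the ranking.
rows = [
    {"occupation": "nurse", "zip_prefix": "001", "label": "high"},
    {"occupation": "nurse", "zip_prefix": "002", "label": "high"},
    {"occupation": "clerk", "zip_prefix": "001", "label": "low"},
    {"occupation": "clerk", "zip_prefix": "002", "label": "low"},
]
ranking = rank_features(rows, ["zip_prefix", "occupation"], "label")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy example `occupation` fully determines the label while `zip_prefix` carries no information about it, so under the approach the abstract describes `zip_prefix` would be the natural candidate for heavier distortion. How the ranking feeds into slicing is not specified in the abstract and is not modeled here.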

Keywords: High Dimension, Privacy, Anonymization, Feature Selection, Classifier, Utility

DOI: 10.7176/JIEA/9-2-03

Publication date: April 30th 2019

ISSN (Paper): 2224-5782; ISSN (Online): 2225-0506
This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.
Copyright © www.iiste.org