Protecting Privacy in Large Datasets-First We Assess the Risk; Then We Fuzzy the Data.
https://arctichealth.org/en/permalink/ahliterature297961
Source
Cancer Epidemiol Biomarkers Prev. 2017 08 01; 26(8):1219-1224
Publication Type
Journal Article
Date
08-01-2017
Author
Giske Ursin
Sagar Sen
Jean-Marie Mottu
Mari Nygård
Author Affiliation
Cancer Registry of Norway, Oslo, Norway. giske.ursin@kreftregisteret.no.
Language
English
Keywords
Confidentiality
Data Anonymization
Electronic Health Records - standards
Female
Humans
Norway
Registries
Risk Assessment - methods
Abstract
Background: Privacy of information is an increasing concern as large amounts of data from many individuals become available. Even when access to data is heavily controlled and the data shared with researchers contain no personally identifying information, individuals may still be reidentified. Several anonymization protocols are available to prevent reidentification. These include k-anonymization, which groups variables into broader categories so that each category contains more than one individual, and protocols that add noise to the data. However, data custodians rarely assess reidentification risks.

Methods: We assessed the reidentification risk of a large, realistic dataset based on over 5 million screening records on 0.9 million women in the Norwegian Cervical Cancer Screening Program, before and after applying older and newer techniques for adding noise to the data (fuzzification).

Results: Categorizing date variables (applying k-anonymization) substantially reduced the possibility of reidentifying individuals. Adding a random factor, such as the fuzzy factor used here, makes reidentifying specific individuals even more difficult.

Conclusions: Our results show that simple techniques can substantially reduce the risk of reidentification.

Impact: Registry owners and large-scale data custodians should consider estimating and, if necessary, reducing reidentification risks before sharing large datasets. Cancer Epidemiol Biomarkers Prev; 26(8); 1-6. ©2017 AACR.
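The abstract names three ideas: measuring reidentification risk, coarsening date variables (k-anonymization), and adding a random "fuzzy factor" to dates. The sketch below illustrates them on hypothetical toy data. It is not the authors' protocol: the abstract does not state their risk measure, how finely dates were categorized, or how the fuzzy factor was drawn. The per-record risk of 1 / (size of the record's equivalence class), the (year, month) generalization, the uniform ±15-day shift, and all function names (`risk_summary`, `generalize_date`, `fuzz_date`) are illustrative assumptions.

```python
import random
from collections import Counter
from datetime import date, timedelta


def risk_summary(records):
    """Return (k, share of unique records, maximum per-record risk).

    ASSUMPTION: risk of a record = 1 / number of records sharing exactly the
    same quasi-identifier values (its equivalence class). k is the smallest
    class size, so the dataset is k-anonymous for that k.
    """
    sizes = Counter(records)
    k = min(sizes.values())
    unique_share = sum(1 for r in records if sizes[r] == 1) / len(records)
    max_risk = 1 / k
    return k, unique_share, max_risk


def generalize_date(d):
    """Coarsen a date to (year, month): a simple k-anonymization step (assumed granularity)."""
    return (d.year, d.month)


def fuzz_date(d, max_days=15, rng=random):
    """Shift a date by a uniform random number of days in [-max_days, max_days] (assumed fuzzy factor)."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


# Hypothetical toy records: (birth date, screening date) per woman.
records = [
    (date(1970, 1, 5), date(2010, 3, 1)),
    (date(1970, 1, 20), date(2010, 3, 15)),
    (date(1971, 6, 2), date(2011, 7, 9)),
    (date(1971, 6, 28), date(2011, 7, 30)),
]

raw = risk_summary(records)
generalized = risk_summary([tuple(generalize_date(d) for d in rec) for rec in records])
print("raw:", raw)                  # (1, 1.0, 1.0): every record is unique
print("generalized:", generalized)  # (2, 0.0, 0.5): each class holds two women

# Fuzzing: a seeded generator makes the random shifts reproducible.
rng = random.Random(42)
fuzzed = [tuple(fuzz_date(d, 15, rng) for d in rec) for rec in records]
```

In this toy example, coarsening the exact dates to months turns every unique record into a class of two, which halves the worst-case risk. Random shifts are applied on top so that even an adversary who knows a woman's exact dates cannot match them directly against the released values.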
PubMed ID
28754793