Sunday, August 2, 2015

2015-033: Data anonymization and Valuation, Privacy, and Ethical medical research

Katherine Carpenter is a privacy consultant who has worked all over the world, helping to develop guidelines for ethical medical research and the sharing of anonymized data, and helping companies understand the privacy issues associated with storing and sharing medical data.


This week, we discuss how companies should assign value to their data, the difficulties of doing research with anonymized data, and the ramifications of research organizations that share data irresponsibly.


email contact:



Katherine’s notes, comments, and links:

It is good to be thinking about de-identification (especially regarding health care data).


I think a better question to ask is how easy it is to re-identify information that has been de-identified. The HIPAA rule lists 18 identifiers that count as Personally Identifiable Information (PII) or Protected Health Information (PHI), including birth date, zip code, and IP address. When data is collected in non-health contexts, these identifiers are not considered PII/PHI (for example, this kind of information can be used for marketing purposes or financial/credit-related purposes).


A brief history on the topic:

In 1997, a precocious grad student identified the Governor of Massachusetts by using purchased voter records to re-identify de-identified health information that had been released. (This study was one motivator for the HIPAA Privacy Rule.) Further research along the same lines can be summed up with a simple and scary statistic: in 2000, an estimated 87% of Americans could be uniquely identified by combining zip code, birth date, and sex (gender).
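To make the uniqueness statistic concrete, here is a minimal Python sketch that measures what share of records in a dataset are unique on zip code, birth date, and sex. The records and the function name `unique_fraction` are invented for illustration; they are not from the study mentioned above.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birth_date, sex). All values are made up.
records = [
    ("02138", "1945-07-31", "M"),
    ("02138", "1945-07-31", "F"),
    ("02139", "1980-01-15", "F"),
    ("02139", "1980-01-15", "F"),
    ("02140", "1972-03-02", "M"),
]


def unique_fraction(rows):
    """Return the share of rows whose (zip, birth date, sex) combination appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of the toy records are unique on zip, birth date, and sex")
```

A row that is unique on these three fields can be singled out by anyone who knows those three facts about a person, even though the row carries no name.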


For this reason, health information is threatened not only by de-identification and re-identification, but also by its combination with other types of information that are publicly available or available for purchase and that could reveal things about an individual, contributing to re-identification of that individual’s health information.
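The kind of combination described here can be sketched as a simple join. The Python example below links a de-identified health table to a separate, named list (such as purchased voter records) on the shared fields. Every record, field name, and the function `link_records` are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical de-identified health rows: no names, but quasi-identifiers remain.
health_rows = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "condition X"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "F", "diagnosis": "condition Y"},
]

# Hypothetical named records, standing in for a purchased or public list.
voter_rows = [
    {"name": "Voter A", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "Voter B", "zip": "02139", "birth_date": "1980-01-15", "sex": "F"},
    {"name": "Voter C", "zip": "02139", "birth_date": "1980-01-15", "sex": "F"},
]


def link_records(health, voters):
    """Return (name, diagnosis) pairs where exactly one named record shares the key."""
    names_by_key = defaultdict(list)
    for voter in voters:
        key = (voter["zip"], voter["birth_date"], voter["sex"])
        names_by_key[key].append(voter["name"])

    matches = []
    for row in health:
        key = (row["zip"], row["birth_date"], row["sex"])
        names = names_by_key.get(key, [])
        if len(names) == 1:  # an unambiguous match re-identifies the health row
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link_records(health_rows, voter_rows)
print(reidentified)
```

In this toy data the first health row matches a single named record and is re-identified, while the second matches two people and stays ambiguous. That is why the number of people sharing each combination of fields matters so much.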


Here are a bunch of articles that discuss the topic from different angles.


Dwork, C. and Yekhanin, S. (2008), “New Efficient Attacks on Statistical Disclosure Control Mechanisms,” Advances in Cryptology—CRYPTO 2008, to appear, also at


Is Deidentification Sufficient to Protect Health Privacy in Research?

Mark A. Rothstein

Here is a new episode of Brakeing Down Security!
