3 data privacy solutions for public health research
How can public health researchers leverage patient data without compromising privacy?
When it comes to patient data privacy, public health leaders are caught between a rock and a hard place. On one hand, they need high-quality, comprehensive data to fuel their research and develop effective interventions. On the other, they must protect privacy, as patient health data is a frequent target of data breaches. In fact, between 2015 and 2022, 32% of all recorded data breaches occurred in the healthcare sector, almost double the share recorded in the financial and manufacturing sectors.
Complicating matters is the need to connect and combine patient data sets to derive insights. For example, if a patient’s name and social security number are the link between their HIV test results and their medical risk factors, a data breach could expose the patient’s HIV status, name, social security number, and more. Likewise, research on smaller groups, such as historically underrepresented communities and rare disease patients, is harder to advance because the risk of re-identifying individuals grows as the sample size shrinks.
“Despite the challenges, data sharing is essential. The public and health care sectors need to share data to prevent and control infectious disease outbreaks, chronic diseases, and other risks to the public. We saw the importance of such sharing during the COVID-19 pandemic when policymakers and the public wanted the most accurate assessment of risk. But such sharing must be done with the utmost caution and privacy protections, or other serious problems will result—including loss of trust in the health system.”
How can agencies protect highly sensitive, personally identifiable information (PII) and protected health information (PHI) without restricting it so much that it can’t be used at all?
Promising health data privacy solutions
Here are three techniques we’ve been exploring and researching for our federal health clients:
Homomorphic encryption is a specific cryptographic technique that allows analysts to perform analytics and data processing with patient-level data—without needing to decrypt it first. Because the data is fully encrypted and never exposed, it remains unreadable even by those doing the computations, protecting patient privacy while offering the full research value of the data to agencies.
Using homomorphic encryption, our data scientists have successfully carried out analytics and trained classification models on data while it was fully encrypted—in other words, the data was not only encrypted both at rest and in motion/transport, but also while in use. We conducted analytics as both single-party and multi-party computations for a leading U.S. public health agency, helping them assess the limitations and opportunities homomorphic encryption presents for public health research.
Homomorphic encryption is best suited for simple computations on small to moderately sized quantitative datasets. However, advancements in techniques and hardware acceleration are gradually improving its performance.
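To illustrate the core idea of computing on data that stays encrypted, here is a minimal sketch using the open-source python-paillier (phe) library. Paillier is a partially homomorphic scheme, so it supports only addition of ciphertexts and multiplication by plaintext constants; fully homomorphic schemes extend this to richer computations, but the workflow is the same: encrypt, compute, and only then decrypt. The values and key size below are illustrative.

```python
# pip install phe
from phe import paillier

# The data owner generates a keypair and encrypts patient-level values
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
lab_values = [12.5, 9.8, 14.1]                       # illustrative measurements
encrypted = [public_key.encrypt(v) for v in lab_values]

# An analyst who holds only the public key computes on the ciphertexts:
# ciphertexts can be added together and scaled by plaintext constants
encrypted_total = sum(encrypted[1:], encrypted[0])
encrypted_mean = encrypted_total * (1.0 / len(lab_values))

# Only the data owner, holding the private key, can read the result
print(private_key.decrypt(encrypted_mean))           # ~12.13
```

At no point does the analyst see the plaintext values, which is what allows a third party to run the computation without becoming a privacy risk.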
Confidential computing is an infrastructure technology that protects data while it is in use by processing it in a secure area of the processor, preventing unauthorized access or manipulation. It works by establishing a security boundary, a secure enclave called a trusted execution environment (TEE), that isolates the computation from the rest of the system. Data is decrypted only within the TEE; once the computation is complete, the data is re-encrypted before it leaves the enclave.
Our data scientists have developed a proof-of-concept that demonstrates single- and multi-party computational analytics in a TEE in the cloud. While there are many intricacies to the confidential computing architecture, this technique is suitable for complex workloads and large datasets, and often requires collaboration with a cloud provider or an enterprise partner.
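The sketch below illustrates only the data flow a TEE enforces: records arrive encrypted, are decrypted solely inside the protected boundary, and leave encrypted again. It is not enclave code; the run_inside_tee function is a stand-in for the workload, and in a real deployment the hardware (for example, Intel SGX or AMD SEV-based confidential VMs) enforces the boundary and releases keys only after remote attestation.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# The data owner encrypts records before they leave their environment
data_key = Fernet.generate_key()
owner = Fernet(data_key)
ciphertext = owner.encrypt(b"patient_id=123,result=negative,age=47")

def run_inside_tee(encrypted_record: bytes, key: bytes) -> bytes:
    """Stand-in for the workload running inside the trusted execution environment."""
    enclave = Fernet(key)
    record = enclave.decrypt(encrypted_record)   # plaintext exists only inside the TEE
    result = b"records_processed=1"              # analytics on the decrypted record
    return enclave.encrypt(result)               # re-encrypt before anything leaves the TEE

encrypted_result = run_inside_tee(ciphertext, data_key)
print(owner.decrypt(encrypted_result))           # the data owner reads the result
```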
Privacy-preserved datasets use a hybrid of masking and other privacy techniques to create variant data sets that can be shared and protected at different levels for different purposes and population sizes, with varying levels of granularity. By mixing synthetic data in with the real data, or by masking certain fields, this approach sidesteps the limitations that typically constrain sophisticated analysis without losing the analytic value of the data set. The original data can then be repurposed in a variety of ways. We are exploring public health use cases with Anonos and their patented implementation of privacy-preserved datasets, Variant Twins.
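As a rough illustration of the general idea, not of Anonos’s patented Variant Twins approach, the sketch below uses pandas to derive two variants of a hypothetical patient extract: one with direct identifiers pseudonymized for record linkage, and a coarser one with quasi-identifiers generalized for broader sharing. The column names, salt, and generalization bands are assumptions for the example.

```python
# pip install pandas
import hashlib
import pandas as pd

# Hypothetical patient-level extract; columns and values are illustrative
df = pd.DataFrame({
    "ssn":         ["123-45-6789", "987-65-4321"],
    "zip_code":    ["30309", "30312"],
    "age":         [34, 67],
    "test_result": ["negative", "positive"],
})

def pseudonymize(value: str, salt: str = "per-project-secret") -> str:
    """Replace a direct identifier with a salted hash so records can still be linked."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

# Variant for internal linkage: direct identifiers pseudonymized, other fields intact
linkage_variant = df.assign(patient_key=df["ssn"].map(pseudonymize)).drop(columns=["ssn"])

# Coarser variant for broader sharing: quasi-identifiers generalized
research_variant = linkage_variant.assign(
    zip3=linkage_variant["zip_code"].str[:3],
    age_band=pd.cut(linkage_variant["age"], bins=[0, 18, 45, 65, 120],
                    labels=["0-17", "18-44", "45-64", "65+"]),
).drop(columns=["zip_code", "age"])

print(research_variant)
```

Each variant trades a different amount of detail for a different level of protection, which is what lets the same source data serve multiple purposes.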
These three techniques (homomorphic encryption, confidential computing, and privacy-preserved datasets) make it easier for risk-averse data owners to share their data, and privacy-preserving technologies are likely to play a prominent role in shaping the legal and regulatory landscape surrounding public health data management and sharing.
Making a choice
These are just three of many data privacy technologies now available. Some can be combined at scale, but since no single technique is the best solution across the full range of public health data privacy challenges, it can be hard to know what to look for, especially with new techniques frequently coming online.
Our initial R&D work has helped our public health agency clients understand the fundamental differences in the use cases that these techniques apply to—when it’s prudent to use one versus another—and will help them make informed decisions moving forward.
A trusted partner with experience not only in data privacy research and development in general, but in the public health sector as well, is vital to applying the right technology to your unique challenge. Explore our health IT and data and analytics capabilities.