Anonymous Database Data is Nothing without Encryption

In recent years, the world has witnessed a major change in the way personal data is managed and protected.

The EU General Data Protection Regulation (GDPR) and other regulatory initiatives have granted consumers powerful new rights to determine how organizations collect and use personally identifiable information (PII). Companies that hold on to personal data without consent, or that fail to employ adequate measures to protect it, may face stringent penalties.

Yet, there is one important exception. Anonymized data – information held without key details to prevent identification – is exempt from the rules.

Recent research, however, has revealed how individuals can be recognized by cross-referencing anonymized data against related data sets in the public domain.

Encryption technology, commonly used by enterprise virtual private networking (VPN) software, is the only reliable way to protect anonymized database data from those intent on piecing things together for unscrupulous ends.

Why Anonymize Data?

Anonymized datasets are outside the purview of the GDPR and similar regulations around the world. They let organizations collect, analyze and store information unimpeded by individuals submitting data subject access requests (DSARs) to view or delete their details.

Anonymizing data is meant to limit the damage caused by a breach or loss, because the data cannot be used to identify specific individuals. Received wisdom holds that with no threat to personal privacy there is no risk of punitive fines.

Anonymized data is ideal for medical trials or market research. A regional health authority, for example, may decide to take patient names, addresses and dates of birth out of digitally stored medical records. Files missing this information can be used for research purposes without risk of disclosing individual identities.
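As a rough illustration of the step described above, the sketch below removes direct identifiers from a single record. The field names and the sample patient are hypothetical and are not taken from any real health authority's schema.

```python
# Hypothetical field names; a real medical records schema will differ.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def strip_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the listed direct identifiers."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Made-up sample record, for illustration only.
patient = {
    "name": "Jane Example",
    "address": "1 Sample Street",
    "date_of_birth": "1980-02-14",
    "postcode_area": "N1",
    "birth_year": 1980,
    "sex": "F",
    "diagnosis": "asthma",
}

research_record = strip_direct_identifiers(patient)
print(research_record)
```

The original record is left untouched and the research copy keeps only the remaining fields. Note that fields such as postcode area, birth year and sex survive this step.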

It’s not just medical research that benefits from anonymized data. In one recent example, Transport for London mined anonymized mobile phone data of passengers to gather information that enabled it to create more accurate journey times and arrival estimates.

Yet, while anonymized data undoubtedly has its uses, it is far from perfect.

Putting the Pieces Together

On its own, anonymized data cannot be tied to a specific individual.

Until, that is, someone starts to cross-reference it against related, publicly available data sets such as an electoral roll or a national census.
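A minimal sketch of how such cross-referencing might work is shown here. It assumes both datasets share a few attributes (postcode area, birth year and sex are hypothetical choices), and every record is invented for illustration.

```python
from collections import defaultdict

# Hypothetical attributes assumed to appear in both datasets.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Pair anonymized rows with public rows that share the same key values.

    Only unambiguous matches (exactly one public candidate) are returned.
    """
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person)

    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], row))
    return matches


# Invented data: names have been removed from the "anonymized" rows.
anonymized_records = [
    {"postcode_area": "N1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode_area": "E2", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Jane Example", "postcode_area": "N1", "birth_year": 1980, "sex": "F"},
    {"name": "John Sample", "postcode_area": "E2", "birth_year": 1975, "sex": "M"},
    {"name": "Jim Sample", "postcode_area": "E2", "birth_year": 1975, "sex": "M"},
]

reidentified = link_records(anonymized_records, public_register)
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

In this toy data the first record matches exactly one person in the public register and is re-identified, while the second matches two people and stays ambiguous.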

Researchers at Belgium’s Université Catholique de Louvain (UCLouvain) and Imperial College London found that such re-identification can be achieved with alarming accuracy.

One example from the study found that an anonymized dataset containing 15 demographic attributes could be used to identify individuals in the state of Massachusetts with 99.98 percent accuracy. That is remarkable when you consider that the state population is close to seven million people.
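To see why adding attributes makes individuals easier to single out, the following sketch counts how many rows in a small, invented table are unique for a given set of attributes. It is a simple counting exercise under assumed field names, not the statistical method used in the study.

```python
from collections import Counter


def unique_fraction(rows, attributes):
    """Fraction of rows whose combination of attribute values is unique."""
    if not rows:
        return 0.0
    combos = Counter(tuple(row[a] for a in attributes) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[a] for a in attributes)] == 1)
    return unique / len(rows)


# Invented rows with hypothetical attribute names.
people = [
    {"sex": "F", "birth_year": 1980, "postcode_area": "N1"},
    {"sex": "F", "birth_year": 1980, "postcode_area": "E2"},
    {"sex": "F", "birth_year": 1975, "postcode_area": "N1"},
    {"sex": "M", "birth_year": 1975, "postcode_area": "N1"},
]

one_attr = unique_fraction(people, ["sex"])
two_attrs = unique_fraction(people, ["sex", "birth_year"])
three_attrs = unique_fraction(people, ["sex", "birth_year", "postcode_area"])
print(one_attr, two_attrs, three_attrs)  # 0.25 0.5 1.0
```

With one attribute only a quarter of the rows stand out, and with three attributes every row is unique. Each extra attribute splits the population into smaller groups.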

In another prominent example, researchers found that publicly available anonymous data about journeys taken by New York City cab drivers could be used to reveal their home addresses.

In 2017, a study by the University of Melbourne found anonymized public medical records could be used to identify close to 10 percent of all Australians.

A guiding principle is that the smaller the dataset, the more accurate the de-anonymizing process, especially when it is cross-referenced against the right database.

Double Down with Encryption

European regulators have shown they are ready to issue stiff penalties to organizations that do not take proper precautions with anonymized data. Denmark’s data protection agency saw fit to fine a taxi company approximately $180,000 in March this year for failing to anonymize data properly.

Moreover, companies cannot afford to stand still. They must advance their protection techniques in step with advances in tools and technology. A process deemed sufficient to anonymize data one year might be out of date the next.

Clearly, organizations cannot expect anonymized database data alone to protect sensitive customer information. Firms must take additional steps to ensure privacy is given adequate safeguards.

Encryption is one of the most reliable strategies for protecting the privacy of digital assets, especially if the organization needs to send or share them over the public Internet. Encrypted data is encoded so that it can only be read with the correct key, usually using symmetric-key or public-key encryption. Without that key, data treated this way is practically impossible to decipher, rendering it unintelligible to outside observers.

Encryption is essential to protect database data not only in storage but also in transit. A professional, enterprise-quality VPN is an extremely effective way to secure digital communications.
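As one possible illustration of symmetric-key encryption, the sketch below encrypts a record with the Fernet recipe from the third-party Python `cryptography` package. The choice of library, the JSON serialization and the sample record are assumptions made for this example, and key storage is deliberately left out.

```python
import json

from cryptography.fernet import Fernet, InvalidToken


def encrypt_record(record, key):
    """Serialize a record to JSON and encrypt it with a symmetric key."""
    return Fernet(key).encrypt(json.dumps(record).encode("utf-8"))


def decrypt_record(token, key):
    """Decrypt a token with the same key and parse the JSON payload."""
    return json.loads(Fernet(key).decrypt(token).decode("utf-8"))


# In practice the key would live in a key management system (assumption);
# here it is generated in memory for demonstration only.
key = Fernet.generate_key()
record = {"postcode_area": "N1", "birth_year": 1980, "diagnosis": "asthma"}

token = encrypt_record(record, key)
restored = decrypt_record(token, key)

# A different key cannot decrypt the token.
try:
    decrypt_record(token, Fernet.generate_key())
    wrong_key_rejected = False
except InvalidToken:
    wrong_key_rejected = True

print(restored == record, wrong_key_rejected)
```

The same key both encrypts and decrypts, which is what makes this a symmetric scheme. Anyone holding the token without the key sees only opaque bytes.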

In summary, database anonymization is a good way to store personal information collected in the course of research. However, with even the largest sets at risk of being re-identified through cross-referencing, researchers cannot trust anonymization alone to keep the sensitive personal data they hold private. Firms must go the extra mile by implementing a robust, enterprise-standard VPN to guarantee customers’ personal privacy remains fully protected at all times.