GDPR and Data Management

Introduction

Mon Dec 4 13:00:20 2017

The GDPR, the General Data Protection Regulation, is enforceable from 25th May 2018. The law was passed in 2016 and then entered a transitional phase to allow businesses and organisations to adapt. I will be discussing some aspects of the GDPR as I help the businesses and organisations I am involved with change to reflect the new legislation.

In this article I am going to focus on some of the challenges we face when holding personal data, and on how to manage, store and erase it. Much of this article is based on an excellent talk I attended at the London Perl Workshop by JJ Allen [1], titled ‘To Delete or not to Delete: A Practical Guide to the Erasure of Data’.

The 5 Step Plan

There are five stages for us to consider and you might want to think about them as a five point plan. They are: analyse, consolidate, identify, anonymise and match. We are going to be looking at them in that order.

An important fact to note is that, although the GDPR will cover the UK only until Brexit is complete, we will still need to have the procedures in place if we store or process data on any EU citizen. The UK is currently producing its own Data Protection Bill, which is going through the House of Lords at the moment.

1. Analyse

The first thing we are going to need to do is to analyse the whole of your system. Where is your data? Who has access to that data? Where do you share that data?

Some of the places to consider are an ERP (Enterprise Resource Planning) system, your accounting software, any CRM (Customer Relationship Management) system and your website. What about email, where people appear both as users and as subjects mentioned in other people’s emails?

Then you have to think about all the 3rd party suppliers you might use as an organisation. Do you use any SaaS (Software as a Service), HR software or payroll services? What about cloud suppliers, and the real hidden depths that are backups, logfiles and developer environments?

What about the devices where data may have been stored or exchanged, such as memory sticks, file shares, phones and tablets, or anywhere an Excel spreadsheet may have gone? Speaking of which, there are what JJ calls the ‘Feral Systems’. How many random spreadsheets are floating around in emails and directories that may not have been audited? What about the whole notion of Shadow IT, with mobile services and online providers like Trello, Jira etc. that people sign up to and use without ever telling anyone else? If your organisation is large enough there will be people in departments using systems where only one or two people know what is used and by whom.

The whole area of individually managed SaaS providers is a nightmare in the auditing process. Let’s not even talk about Social Media and the random shares from your employees. In a later post I am going to start to talk about how we have to take a step towards asking for permission up front when we use Social Media and similar, even when it is our own employees we are considering.

At some point you need to create an Organisational Information Schema - the best way is to visualise it as a Data Map. You should track all the places where your data may be, audit them and then place them on a diagram so you have a way of representing how the data is collected, moved and stored. This visual map will make it easier to examine what you need to do in the later steps, especially as we look to find a single point of validity and consolidate the data.
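Before drawing the diagram it can help to hold the audit results as plain data. The following is only a minimal sketch of that idea: the `DataStore` fields, the store names and the categories are all invented for illustration, and a real audit would record far more (retention periods, legal basis, access lists and so on).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    """One place where personal data may live (all values are illustrative)."""
    name: str
    owner: str                 # team or person responsible for the store
    categories: frozenset      # kinds of personal data held there
    shared_with: tuple = ()    # 3rd parties or downstream systems


def build_data_map(stores):
    """Index the stores by data category, listing every location per category."""
    data_map = defaultdict(list)
    for store in stores:
        for category in sorted(store.categories):
            data_map[category].append(store.name)
    return dict(data_map)


# Hypothetical audit results.
stores = [
    DataStore("crm", "sales", frozenset({"name", "email"}), ("mailing-provider",)),
    DataStore("payroll", "hr", frozenset({"name", "bank_details"})),
    DataStore("web-logs", "ops", frozenset({"ip_address"})),
]

data_map = build_data_map(stores)
for category, locations in sorted(data_map.items()):
    print(f"{category}: {', '.join(locations)}")
```

Any category that turns up in several stores is a candidate for the consolidation step.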

2. Consolidate

Your next task is to work out how you can consolidate your data. This would be a sort of canonical store (obviously easier to secure, and a single verification point for validity) [2]. This would be our single customer view. It will make the obligations the GDPR places on us, such as Transparency, Rectification, Deletion/Erasure and Data Portability, a lot easier to meet.

The consolidation means that you will have to think about how you store and process people’s information. There has to be a reduction in the duplication of data, especially sensitive data. The right to Rectification gives a data subject the ability to request that any one piece of information that is wrong about them be changed, and in fact it -must- be changed. If you store the data in multiple areas it will be a nightmare to alter; if there is a single source it will be much easier.
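To make the benefit concrete, here is a toy sketch of a single store that exposes one operation for each of the rights mentioned above. The class name `SingleCustomerView` and its methods are hypothetical, and an in-memory dictionary stands in for whatever database you would really use.

```python
import json


class SingleCustomerView:
    """Toy canonical store: one record per data subject (hypothetical design)."""

    def __init__(self):
        self._records = {}

    def add(self, subject_id, record):
        self._records[subject_id] = dict(record)

    def view(self, subject_id):
        # Transparency: return a copy so callers cannot edit the store directly.
        return dict(self._records[subject_id])

    def rectify(self, subject_id, field, value):
        # Rectification: a single write corrects the value for every reader.
        self._records[subject_id][field] = value

    def export(self, subject_id):
        # Portability: hand the data back in a common machine-readable format.
        return json.dumps(self._records[subject_id], sort_keys=True)

    def erase(self, subject_id):
        # Erasure: remove the record; True means something was deleted.
        return self._records.pop(subject_id, None) is not None


store = SingleCustomerView()
store.add("subject-1", {"name": "A. Example", "city": "Lancaster"})
store.rectify("subject-1", "city", "Preston")
print(store.export("subject-1"))  # {"city": "Preston", "name": "A. Example"}
print(store.erase("subject-1"))   # True
```

With duplicated data, each of these methods would have to be repeated, and kept in step, for every copy.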

3. Identify

The next stage is to consider which of the data you hold is personally identifiable. The GDPR states that you must treat as sensitive, and in fact consider not storing or using, any data that can personally identify someone. But what can we consider as personally identifiable?

The obvious elements are name and address, but what about IP address? Unless you are spoofing both a MAC address and an IP address you are pretty much identifiable. An IP address can be tied down to a time and a location of usage.

How about combinations of data? These can become personally identifiable, so postcode plus date of birth will often narrow a search down to one or two individuals (we have to consider twins and the vagaries of chance); add gender, however, and you almost certainly have the exact person. So we have to consider what data we return and to whom, as multiple pieces of data will build into an identifiable person.

So try to think about what data you keep, and about what you actually need to keep. Maybe you should only keep the accounting data and, if needed, the transaction data, as long as it is not identifiable. So don’t keep who made the transaction, or store that detail away in some tokenised form, which makes us think about data as an anonymous object.
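One rough way to test this on your own data is to count how many records share each combination of fields; a group of size one means that combination singles out an individual. The sketch below does exactly that. The function name `smallest_group` is my own and the records are invented purely for illustration.

```python
from collections import Counter


def smallest_group(records, fields):
    """Size of the smallest set of records sharing the same values for `fields`."""
    groups = Counter(tuple(record[f] for f in fields) for record in records)
    return min(groups.values())


# Invented records; "ZZ1 1ZZ" is a made-up postcode.
records = [
    {"postcode": "ZZ1 1ZZ", "dob": "1980-01-01", "gender": "F"},
    {"postcode": "ZZ1 1ZZ", "dob": "1980-01-01", "gender": "M"},
    {"postcode": "ZZ1 1ZZ", "dob": "1975-06-30", "gender": "F"},
    {"postcode": "ZZ1 1ZZ", "dob": "1975-06-30", "gender": "F"},
]

print(smallest_group(records, ["postcode"]))                   # 4
print(smallest_group(records, ["postcode", "dob"]))            # 2
print(smallest_group(records, ["postcode", "dob", "gender"]))  # 1
```

Here postcode and date of birth together still leave two candidates (the twins case), while adding gender isolates a single person.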

4. Anonymise

Discussing tokenisation takes us neatly into the notion of storing as much data as possible in an anonymised manner. What we are doing is separating the data that has personally identifiable components, which is of course any biological data along with transactional data that has geographical, chronological or other location based components, and storing it in a non-relational manner.
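As a rough illustration of that separation, the sketch below keeps identities in a small vault and lets the transaction records carry only a random token. `TokenVault` and `record_sale` are hypothetical names, and the in-memory dictionaries stand in for what would really be a separately secured store.

```python
import secrets


class TokenVault:
    """Maps data subjects to random tokens; the only place identities live."""

    def __init__(self):
        self._token_by_subject = {}
        self._subject_by_token = {}

    def token_for(self, subject):
        # Issue a random token on first sight; reuse it afterwards.
        if subject not in self._token_by_subject:
            token = secrets.token_hex(16)
            self._token_by_subject[subject] = token
            self._subject_by_token[token] = subject
        return self._token_by_subject[subject]

    def resolve(self, token):
        # Only done on specific request; None if the subject was forgotten.
        return self._subject_by_token.get(token)

    def forget(self, subject):
        # Dropping the mapping leaves the transactions behind, but unlinkable.
        token = self._token_by_subject.pop(subject, None)
        if token is not None:
            del self._subject_by_token[token]


vault = TokenVault()
transactions = []


def record_sale(email, amount):
    # The transaction log never sees the email address, only the token.
    transactions.append({"token": vault.token_for(email), "amount": amount})


record_sale("alice@example.com", 20)
record_sale("alice@example.com", 5)
record_sale("bob@example.com", 12)

total = sum(t["amount"] for t in transactions)  # statistics need no identities
vault.forget("alice@example.com")
```

Because the token is random rather than derived from the email address, nothing in the transaction log can be reversed to recover the person once the vault entry is gone.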

We should also be building systems in which, wherever possible, revealing the data is not needed. For instance, we might use encrypted keys to lock personal data, where the key is related to the data subject and stored next to their name but not revealed to the data controller. If the system needs to compare data it can use the encrypted information as a verification without revealing the information or directly relating it to an individual. This can be used to study statistical information in the abstract.

It is also almost mandatory that we move all data into an encrypted or anonymised form. We should strive for a situation where only absolutely essential information is stored. This is not just about compliance; it is efficient and correct.

Wherever possible, encryption should be difficult to undo. I would love to say impossible, but that would be unlikely.

5. Match

We have stored our minimal data and we have anonymised it as much as possible. There is now only a token, which the system will recover and match data to only on specific request. All that is good. But we are still required to be able to correct data (if it is incorrect), to be transparent and to delete data.

There was this conundrum presented at a technical meeting JJ attended:

‘Delete everything you know about me and never contact me again?’

So, if we delete everything about you, how do we know not to contact you if we come across you again in some other way? It is a little knotty, but there are ways to manage this. JJ suggested you might want to create an anonymised block list and use tokenisation and automatic rejection, which is an excellent solution to the problem [3].

So the system knows to reject the person by matching against its list before the data is ever accepted as a source to be held. Since the identifiable component might be anonymised (to data controllers) and encrypted (to prevent use by others), this is as close as it is possible to get to obeying that request.
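A block list of this kind might be sketched as follows. This is my own illustration and not the implementation from JJ’s talk: the choice of PBKDF2 followed by HMAC-SHA256, the iteration count, the fixed salt label and the names `blocklist_token` and `BlockList` are all assumptions, made to show a slow hash combined with a keyed hash.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def blocklist_token(email, key):
    """Derive an opaque token from an email address with a slow, keyed hash."""
    normalised = email.strip().lower().encode("utf-8")
    # Slow step: makes guessing addresses expensive even if the key leaks.
    stretched = hashlib.pbkdf2_hmac("sha256", normalised, b"blocklist-v1", ITERATIONS)
    # Keyed step: without the secret key the token cannot be recomputed at all.
    return hmac.new(key, stretched, hashlib.sha256).hexdigest()


class BlockList:
    """Stores only tokens, never the addresses themselves."""

    def __init__(self, key):
        self._key = key
        self._tokens = set()

    def add(self, email):
        self._tokens.add(blocklist_token(email, self._key))

    def is_blocked(self, email):
        return blocklist_token(email, self._key) in self._tokens


# In practice the key would be generated once and kept in a secrets store.
key = secrets.token_bytes(32)
blocklist = BlockList(key)
blocklist.add("Alice@Example.com ")

print(blocklist.is_blocked("alice@example.com"))  # True
print(blocklist.is_blocked("bob@example.com"))    # False
```

Incoming data is checked with `is_blocked` before it is stored anywhere, so a rejected person never becomes a held record.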

We should, however, note that in this particular situation the organisation has not breached the GDPR. It did not contact the individual; the contact came the other way. If the data arrived from a 3rd party and not from the data subject, it is the 3rd party who is in error for sharing that data without informing the data subject where it was going and who it was being shared with. If, however, it was the data subject themselves, they have chosen to contact you, and that at least gives consent for you to respond to this new request, even if that response is only to state that you have no record of them and that you need their consent before any further communication.

In this regard you are always seeking to gain appropriate consent for your data processing, in line with the GDPR’s conditions for consent (Article 7; Article 9 covers the special categories of personal data).

However, it would logically follow that if the user contacts you and presents the information themselves a second time, you are not contacting them but vice versa, and that implies a certain level of consent. But make sure to always check, and have a clearly written consent process they can agree to.

In the next article I am going to be following on with some thinking about encryption and anonymisation and what data we apply these to.

[Don't forget that you can join in this conversation by using the comments form at the bottom of the page or by tweeting at @shadowcat_mdk]


  1. This piece is strongly based on, and influenced by, JJ’s excellent presentation. However, it is my interpretation of what he said along with my own understandings and opinions of the GDPR. Any errors, omissions or misrepresentations are therefore mine. Please try to watch his excellent piece, and I will accept any contribution that highlights issues or offers corrections or suggestions. Please use the comment area to contact me or any of the usual methods.

  2. There are concerns about a single point of failure, and about making sure that you archive and back up the data well, but this should be a part of your process anyway. A single verification point gives you one location on which to build a request and push system. Elements are not written to but requested from, making the over-writing and changing of sensitive data more difficult.

  3. With regard to the security of such lists and encryption, you might want to use a slow hashing function for extra security, and always use keyed hashes.