Data Literacy and GDPR (Know Your Risk)

JUNE, 2020 by Carlos Schults
In today’s post, we’ll discuss data literacy and its relevance in the context of GDPR. We start by defining data literacy and giving a brief overview of GDPR. Then we proceed to explain some of the challenges organizations might face when trying to find risk spots in their data. Finally, we give advice on how to locate and fix those dangerous needles in your gigantic haystacks of data. Let’s get started.  

What Is Data Literacy?

Let’s start by defining data literacy. Here’s Wikipedia’s definition:

“Data literacy is the ability to read, work with, analyze, and argue with data. Much like literacy as a general concept, data literacy focuses on the competencies involved in working with data.”

So in short, data literacy simply means being able to understand data. The analogy with “regular” literacy isn’t perfect, however. The Wikipedia article goes on to say:

“It is, however, not similar to the ability to read text since it requires certain skills involving reading and understanding data.”

Taking the two parts of the definition together, we can conclude that data literacy means being able to analyze and understand data, which goes well beyond simply reading it. It definitely involves further abilities (probability and statistics come to mind, for instance).

Why Is Data Literacy so Valuable?

To understand the motivations behind data literacy and why your organization should care about it, you just have to take a good look around. I mean, literally. Take a good look around the room you are in right now, and answer this question: how many internet-connected devices do you see? The minimum is obviously one. If you’re reading this on a computer, then the likely minimum is two, because I bet you have a smartphone on you. From there, the number can only go up: tablets, additional devices (an e-reader, perhaps?), a Google Home or Amazon Echo device, a smart thermostat, and so on.

Why am I asking this? Simple: we live in a hyper-connected, data-driven era. In 2019, roughly 188 million emails were sent every minute, on average. One 2019 article claims that, by some point in 2020, the amount of digital data produced by humankind is expected to reach 44 zettabytes. That’s a number larger than what any of us could easily wrap our heads around, but that’s beside the point. The point is we produce and consume an amazing amount of data, and the trend shows no signs of slowing down. Quite the opposite. Making sense of all that data is and will remain an essential skill for the twenty-first century.

Welcome to the Brave New World of GDPR

The next piece in this puzzle is GDPR. Your knowledge of GDPR might range from “I’m mildly aware of it” to “I know quite a lot about it” (especially if you live in Europe). If you belong to the former group, we’ll now fill you in. GDPR stands for General Data Protection Regulation. It is an EU regulation on data protection and user privacy. You can easily read its full text online, but we’ll give you the gist of it. Businesses that handle personal data (called controllers and processors) must design their processes in a way that provides safeguards to protect the data. That could include techniques such as pseudonymization or anonymization. Controllers must always use the strictest privacy settings by default. Personal data can’t be processed except under one of the six lawful bases that GDPR stipulates (consent, contract, public task, vital interests, legitimate interests, or legal obligation). When processing is based on consent, the user has the right to withdraw that consent at any time.

GDPR Challenges

Now let’s put data literacy and GDPR together. What are some of the hardest challenges organizations face when it comes to GDPR compliance? How can data literacy help with those challenges? That’s what we’re going to see now.

Locating Risk Points

The first challenge companies might face is locating the risky spots in their data. That might not be too hard if you only have a single, small database containing a few tables with not a lot of columns. But as we’ve just mentioned, we live in a data-driven world. Gone are the days when companies only gathered data through forms on their web pages. Nowadays, data is increasingly generated via mobile devices. And let’s not forget IoT devices, whose ranks grow larger by the day. All of that is to say that the simple-database-with-few-tables scenario we’ve described above is getting rarer and rarer. If you’re reading this post, you probably belong to a demographic that’s likely to have more data rather than less. Now imagine having to locate all the columns with personal data in a gigantic database containing hundreds of tables, each one with dozens of columns. Suddenly, that doesn’t look so easy anymore.
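To get a feel for what a first pass at locating risky columns might look like, here is a minimal sketch in Python. It flags columns by name only, using a hypothetical list of keywords. The pattern, the function name `find_risky_columns`, and the example schema are all assumptions made for illustration, not part of any particular tool.

```python
import re

# Hypothetical name fragments that often signal personal data; tune for your own schemas.
PII_NAME_PATTERN = re.compile(
    r"(name|email|phone|address|birth|dob|ssn|passport|iban)", re.IGNORECASE
)


def find_risky_columns(schema):
    """Return (table, column) pairs whose column name looks like personal data.

    `schema` maps each table name to a list of its column names.
    """
    return [
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if PII_NAME_PATTERN.search(column)
    ]


# Made-up schema, for illustration only.
example_schema = {
    "customers": ["id", "full_name", "email", "created_at"],
    "orders": ["id", "customer_id", "total", "shipping_address"],
    "products": ["id", "sku", "price"],
}

risky = find_risky_columns(example_schema)
for table, column in risky:
    print(f"{table}.{column}")
```

A name-based scan like this is cheap, but it misses personal data hiding in vaguely named columns (a free-text `notes` field, say) and can flag harmless ones, such as a `product_name` column. With hundreds of tables, you would most likely also need to sample the values themselves.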

Fixing Them (Remediation)

Locating the potential trouble spots in your data is hard enough, but it’s just the first step. After that, you’ll have to do something about them. This step is often called remediation. What does remediation look like when it comes to GDPR compliance? Put simply, the answer is data masking. Sensitive information, such as personally identifiable information (PII), needs to be properly obfuscated. Such masking can be done mainly in two ways: pseudonymization and anonymization. Pseudonymization is a process that replaces values in a sensitive field with a pseudonym, which is an artificial identifier. The main point of pseudonymization is that it allows the original information to be restored by whoever holds the separately kept mapping. Anonymization, on the other hand, refers to irreversibly de-identifying the data.

The main challenge when it comes to remediation is that it’s a time-consuming and costly process. Engineers can spend months coding scripts to mask data, which represents a huge opportunity cost. Then you have the possibility of these scripts containing bugs, which is pretty much a given. Even if the scripts are perfect, they still have to be maintained, causing you to incur more opportunity costs. In short: creating, validating, and maintaining data-masking scripts can represent an enormous waste for your organization.
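To make the difference between the two masking approaches concrete, here is a small Python sketch. The class `Pseudonymizer`, the function `anonymize`, and the `user_` token prefix are hypothetical names chosen for this example. Plain redaction stands in for anonymization here, as the simplest irreversible option rather than the only one.

```python
import secrets


class Pseudonymizer:
    """Swap values for random tokens and keep the mapping so they can be restored."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def mask(self, value):
        # Reuse the token for a repeated value so records still line up across tables.
        if value not in self._token_by_value:
            token = "user_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def restore(self, token):
        # Reversal is only possible for whoever holds the mapping.
        return self._value_by_token[token]


def anonymize(value):
    """Discard the value entirely; nothing is kept that could bring it back."""
    return "REDACTED"


pseudonymizer = Pseudonymizer()
token = pseudonymizer.mask("jane.doe@example.com")
print(token)                              # a random token such as user_3f9a...
print(pseudonymizer.restore(token))       # the original value comes back
print(anonymize("jane.doe@example.com"))  # the original value is gone for good
```

Note that the mapping inside `Pseudonymizer` is itself sensitive: anyone who obtains it can undo the masking, so in practice it has to be stored separately from the masked data and protected accordingly.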

How Enov8 Can Help You

As you’ve just seen, the challenges organizations face when it comes to GDPR are hard to overcome. Fortunately, there are tools that can help you. One example is Enov8’s Data Compliance Suite (DCS), a DataOps platform that has data compliance as one of its main focuses. One of the platform’s main capabilities is its data profiling feature, which employs AI to automatically find the risk areas we’ve mentioned in the previous section and can save your organization a huge amount of time and money. However, as we’ve mentioned before, locating the risk points is just the first step. The next and necessary phase is remediation, and Enov8 DCS can help there as well by providing powerful data-masking capabilities. Finally, DCS also provides a data-validation feature that allows you to check whether a given piece of data is properly obfuscated.

Data Literacy & GDPR: Take Measures Today

Historically, digital security has always been a priority for companies that collect and handle consumer data. Failing to protect such data could lead to a financial or reputational catastrophe. But in this day and age, data security is even more important due to laws and regulations on consumer data, of which GDPR is certainly the most prominent example. Failure to comply will not only taint the reputation of your company; it will also cost you huge amounts of money in fines, and you could face legal consequences. Want your organization to stay alive and kicking? Then make digital security a top priority and give Enov8’s Data Compliance Suite a try today.  

This post was written by Carlos Schults. Carlos is a .NET software developer with experience in both desktop and web development, and he’s now trying his hand at mobile. He has a passion for writing clean and concise code, and he’s interested in practices that help you improve app health, such as code review, automated testing, and continuous build.
