What is Data Masking? And Best Practice!

20 May 2022, by Jane Temov

 

Most organizations employ strong security measures to keep production data secure while it is made available for day-to-day business activity.

However, data may also be used for less secure activities like testing and training, or by third-party vendors not affiliated with the organization. Using data for this “secondary purpose” can put it at risk and result in regulatory breaches.

Masking data is a way of producing a replica of data that appears to be structurally identical to the original while concealing sensitive information like personal data. The version with the obfuscated information, the secure data, may then be utilized for a variety of applications, such as user training and software testing.

The primary goal of masking data is to develop a functional replacement that is usable and production-like while still protecting sensitive data.

The ability to mask data is important in many situations, such as when you want to access information or functionality while still keeping sensitive data private. Simply put, data masking techniques keep the same data format as the original data while altering the sensitive values.

In this Enov8 article, we will discuss:

  • Data Masking Methods
  • What Data Requires Data Masking?
  • What Regulations Require Data to be Masked?
  • The Types of Data Masking
  • The Challenges of Data Masking
  • Data Masking Best Practices

Data Masking Methods

There are many methods for manipulating data, including character shuffling, word or character replacement, and encryption. Each technique has its own set of benefits.

However, when masking sensitive data, the values should be altered in a way that makes reverse engineering impractical, so that the masked copy does not itself become a route to a data breach.

Here are some examples of supporting data security via data masking:

  • Lookup Replacement. Replacing personally identifiable information like names with alternative “lookup” values. For example, replacing “James” with “William”.
  • Data Values Deleting or “nulling out”. Deleting or nulling sensitive values, for example, the contents of a “Comments” field.

Note: This is a useful method when the data isn’t considered to be of significant value for reuse, and/or when the classified information is too “random” to be replaced with anything meaningful.

  • Data Values Ciphering. Ciphering the sensitive data so that it cannot easily be read without knowledge of the cipher method(s) employed.

Note: Ciphering sensitive data sources has its place, particularly when you want to keep the shape of the data and/or want to ensure uniqueness. However, be warned: it can be reversed, so use it carefully.

  • Data Encryption. Encrypting the sensitive data so that it cannot be read without a decryption key. For example, using symmetric key encryption like AES.

Note: Although “hard to crack”, encrypting sensitive data has its downsides. Most notably, it changes the shape of the data. For example, an AES string does not look like a real name:

Mary → 8cb2237d0679ca88db6464eac60da963

  • Data Shuffling. Protecting sensitive data by scrambling, or shuffling, the values so that the new data no longer lines up with the original data. For example, shuffling Account Numbers so that the customer names are now “misaligned” (or no longer reflect production reality).

And/Or

  • A Combination of Obfuscation Methods. Depending on the sensitivity of the data and your security protocol, some data is better suited to certain masking methods than others, so several methods are often combined.
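
As an illustration of how three of the methods above (lookup replacement, nulling out, and shuffling) might be combined, here is a minimal Python sketch. It is not how any particular product works: the lookup list, the sample customers, and the function names `lookup_replace`, `null_out`, and `shuffle_column` are all hypothetical.

```python
import hashlib
import random

# Hypothetical lookup list of replacement first names (an assumption for this sketch).
LOOKUP_NAMES = ["William", "Olivia", "Noah", "Emma", "Liam"]


def lookup_replace(name: str) -> str:
    """Pick a replacement name from the lookup list, driven by a hash of the input."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return LOOKUP_NAMES[digest[0] % len(LOOKUP_NAMES)]


def null_out(rows, column):
    """Return copies of the rows with the given column set to None."""
    return [{**row, column: None} for row in rows]


def shuffle_column(rows, column, seed=None):
    """Return copies of the rows with one column's values shuffled across rows."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


# Made-up sample data standing in for a production extract.
customers = [
    {"name": "James", "account": "1001", "comments": "Called about a refund"},
    {"name": "Mary", "account": "1002", "comments": "Prefers email"},
    {"name": "Robert", "account": "1003", "comments": "VIP"},
]

# Apply the three methods one after another; the original rows are left untouched.
masked = [{**row, "name": lookup_replace(row["name"])} for row in customers]
masked = null_out(masked, "comments")
masked = shuffle_column(masked, "account", seed=7)
```

Note that a shuffle can leave some values in their original position, and a small lookup list can hand a person their own name back, so a real implementation would need to check for both.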

 

 

What Data Requires Data Masking?

How sensitive the data is, and therefore whether it needs to be masked, may depend on the end function, the type of data, and its vulnerability to a data breach. Here are the most common data types that require data masking:

  • Personally identifiable information (PII). The data that can be used to help identify individuals, for example, customers & employees. Good examples of PII include information like the first name, last name, address, passport number, driver’s license number, and social security number.
  • Payment card information (PCI). Businesses that handle credit and debit card transactions must protect consumer information securely under the Payment Card Industry Data Security Standard (PCI DSS).
  • Protected health information (PHI). The data that is collected by healthcare service providers to identify appropriate care. Examples include medical histories, health conditions, insurance information, demographic information, and test and laboratory results.
  • Intellectual property (IP). Data related to creations of the mind, including inventions, business plans, designs, and specifications. Unsurprisingly, organizations consider these assets high-value, and they must be protected from unauthorized access and theft.

 

 

What Regulations Require Data to be Masked?

Three notable examples of regulations that have been put in place by governments and by industry to protect personal data are HIPAA, GDPR, and PCI-DSS.

HIPAA. HIPAA is the Health Insurance Portability and Accountability Act, a law in the United States that sets national standards for the privacy and security of health information.

GDPR. The General Data Protection Regulation (GDPR) is a European Union regulation in the area of data protection. The GDPR was adopted on April 14, 2016, and became enforceable on May 25, 2018. It regulates the handling of personal data by controllers and processors, including organizations outside the EU that offer goods or services to individuals in the EU.

PCI-DSS. The Payment Card Industry Data Security Standard (PCI-DSS) is a set of security standards designed to protect cardholder data. PCI-DSS applies to all organizations that process, store or transmit credit card information. The standard is managed by the Payment Card Industry Security Standards Council (PCI SSC), an organization founded by major credit card companies.

 

 

The Types of Data Masking

Fundamentally, there are two data masking processes:

  • Static data masking. Static Data Masking involves creating a duplicated database (or data set), with all or part of the data masked. This new “secured data” would be maintained separately from the production data, probably within a non-production environment.

Note: Static Data masking is the most common form of data masking and is particularly useful in the Software Development Lifecycle, where developers and testers need production-like fake data but don’t need to see sensitive personal data.

Note: In the modern world, where a lot of IT services are outsourced, it is very important to ensure the data in your test environments is fully desensitized. According to Gartner, 70% of data breaches are committed internally.

  • Dynamic data masking. Dynamic data masking changes information on the fly, in real time, as end users access it. This technique is applied directly to production datasets. It prevents unauthorized users from viewing sensitive data by masking, for example redacting, encrypting, or hashing, the values that are displayed.

Note: This method is typically more useful in the production world, where different roles and responsibilities have different security ratings and access privileges. For example, a doctor would have more access to a patient’s PII data than a medical receptionist.
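
The doctor and receptionist example can be sketched in Python as a role-aware read. This is a simplified illustration, assuming masking happens in the read path and the stored record is left unchanged; the policy table `UNMASKED_FIELDS`, the field names, and the functions `mask_value` and `read_record` are hypothetical.

```python
# Hypothetical policy: the fields each role may see unmasked (an assumption for this sketch).
UNMASKED_FIELDS = {
    "doctor": {"name", "date_of_birth", "diagnosis"},
    "receptionist": {"name"},
}


def mask_value(value) -> str:
    """Redact a value, keeping only its length."""
    return "*" * len(str(value))


def read_record(record: dict, role: str) -> dict:
    """Return a view of the record masked for the given role.

    The stored record is not modified; unknown roles see every field masked.
    """
    allowed = UNMASKED_FIELDS.get(role, set())
    return {
        field: value if field in allowed else mask_value(value)
        for field, value in record.items()
    }


# Made-up patient record standing in for a production row.
patient = {"name": "Mary Smith", "date_of_birth": "1980-04-12", "diagnosis": "Asthma"}

doctor_view = read_record(patient, "doctor")
receptionist_view = read_record(patient, "receptionist")
```

In this sketch the receptionist sees the patient's name but only asterisks for the date of birth and diagnosis, while the doctor sees the full record.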

 

 

What Are the Challenges of Data Masking?

Here are some of the key challenges involved in data masking:

  • Data Discovery. Data is inherently complex, with potentially thousands of tables & billions of data points on a single platform alone. The task of data discovery should not be underestimated & usually cannot be done manually.
  • Format preservation. When the masking system replaces the original data with fake data, it should preserve the original format and avoid data loss. That means the data masking solution must understand the data it is replacing. If the original format is not maintained, business logic could fail and the application could stop working.
  • Retain Referential integrity. Data inside an application is connected by relationships and dependencies, for example, Primary & Foreign keys. When the masking solution changes these values, it must change them consistently across the dataset*.

Note*: This applies to cross-platform/multiple data sets also. For example, if Application 1 talks to Application 2 & 3, then ensure the masking rules are consistent across each.

  • Value Range Integrity. Database constraints are often designed to restrict the range of values that may be entered, for example, the range of salaries. To preserve the data’s semantics, any masked data must fall within the specified range.
  • Preserve Uniqueness. The masking system should apply distinct values to every sensitive data element when masking “unique data”. For example, if the table in question stores a Customer Account ID, then each customer should receive a unique Account ID after masking. There should be no duplicates.

Note: By doing this, the shape/distribution of the masked data should be retained and you avoid the potential for data contention & application breakage.

  • Gender preservation. When replacing a person’s name in the database, the masking mechanism should be gender-aware and able to distinguish between male and female names.

Note: If the masking system changes names at random, the gender balance in a data set will be disrupted.
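
Three of these challenges (format preservation, referential integrity, and uniqueness) can be illustrated together with a short Python sketch. It assumes the sensitive key is a string of digits and uses a keyed hash (HMAC) plus a mapping table; `SECRET_KEY`, `mask_digits`, `ConsistentMasker`, and the sample tables are hypothetical and not taken from any specific masking product.

```python
import hashlib
import hmac

# Hypothetical key for this sketch only; a real key would be stored securely.
SECRET_KEY = b"example-only-key"


def mask_digits(value: str, key: bytes = SECRET_KEY, attempt: int = 0) -> str:
    """Map a digit string to another digit string of the same length."""
    message = f"{attempt}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    number = int(digest, 16) % (10 ** len(value))
    return str(number).zfill(len(value))  # keep leading zeros so the length is preserved


class ConsistentMasker:
    """Remembers each mapping so one input always yields the same, unique output."""

    def __init__(self):
        self._forward = {}   # original value -> masked value
        self._used = set()   # masked values already handed out

    def mask(self, value: str) -> str:
        if value in self._forward:
            return self._forward[value]
        attempt = 0
        candidate = mask_digits(value, attempt=attempt)
        while candidate in self._used:  # collision: retry so no two inputs share an output
            attempt += 1
            candidate = mask_digits(value, attempt=attempt)
        self._forward[value] = candidate
        self._used.add(candidate)
        return candidate


# Made-up parent and child tables linked by account_id.
customers = [
    {"account_id": "100234", "name": "James"},
    {"account_id": "100877", "name": "Mary"},
]
orders = [
    {"order_id": 1, "account_id": "100877"},
    {"order_id": 2, "account_id": "100234"},
    {"order_id": 3, "account_id": "100877"},
]

# One masker is shared by both tables so the foreign keys still line up.
masker = ConsistentMasker()
masked_customers = [{**c, "account_id": masker.mask(c["account_id"])} for c in customers]
masked_orders = [{**o, "account_id": masker.mask(o["account_id"])} for o in orders]
```

Because both tables go through the same masker, every masked order still points at a masked customer. One caveat of this design: when a collision forces a retry, the result depends on processing order, so the mapping table would need to be persisted to stay consistent across runs or platforms.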

 

 

Data Masking Best Practices

Data masking best practices include:

  • Data & Risk Discovery. Before you can safeguard your data, you must first understand what data you are dealing with and separate it into categories based on its sensitivity. A group of Subject Matter Experts, such as a Data Protection Officer, Test Data Manager, Security, and Platform experts, will typically work together to compile a comprehensive record of all data risks in an organization. However, this can be time-consuming.

Tip: The Enov8 Test Data Management “Profiling Module” uses AI to do this automatically.

  • Data Obfuscation. Refer to “Data Masking Methods” above. This is the act of securing data so that sensitive content, like customer information, is addressed without impacting the usability of the data.

Tip: The Enov8 Test Data Management “Masking Module” allows you to use the Profiling Information to generate “consistent” methods on the fly, thus reducing months of engineering effort & the potential for cross-platform mistakes.

  • Data Compliance Validation. The ability to test/check that your obfuscation methods have been successful. This is an essential post-securitization function that ensures mistakes were not made.

Tip: The Enov8 Test Data Management “Data Validation” module helps you scan areas of concern, looking for production smells, i.e. indications that something has been missed.
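
To make the validation step more concrete, here is a minimal Python sketch of two simple checks for “production smells”: masked values that still exist in production, and email-like text left behind in a free-text column. It is only an illustration and does not describe how the Enov8 module works; the functions `find_leaks` and `find_pattern_hits`, the email pattern, and the sample rows are hypothetical.

```python
import re

# Simple email-like pattern, used here as one example of a "production smell".
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leaks(original_rows, masked_rows, sensitive_columns):
    """List (row_index, column) for masked values that also exist in production."""
    findings = []
    for column in sensitive_columns:
        production_values = {
            row[column] for row in original_rows if row[column] is not None
        }
        for index, row in enumerate(masked_rows):
            if row[column] is not None and row[column] in production_values:
                findings.append((index, column))
    return findings


def find_pattern_hits(rows, columns, pattern=EMAIL_PATTERN):
    """List (row_index, column) for string values that contain the pattern."""
    hits = []
    for index, row in enumerate(rows):
        for column in columns:
            value = row[column]
            if isinstance(value, str) and pattern.search(value):
                hits.append((index, column))
    return hits


# Made-up production rows and a deliberately imperfect masked copy.
original = [
    {"name": "James", "email": "james@example.com", "comments": "Call james@example.com"},
    {"name": "Mary", "email": "mary@example.com", "comments": "None needed"},
]
masked = [
    {"name": "William", "email": "user1@masked.invalid", "comments": "Call james@example.com"},
    {"name": "Mary", "email": "user2@masked.invalid", "comments": None},
]

leaks = find_leaks(original, masked, ["name", "email"])
pattern_hits = find_pattern_hits(masked, ["comments"])
```

Here `leaks` flags the unmasked name in the second row and `pattern_hits` flags the email address left in the first row's comments. A flagged value is a prompt for review rather than proof of a leak, since a lookup replacement can legitimately coincide with a real production value.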

 

 

Data Masking Conclusion

Data masking is a technique used to protect sensitive data. Obfuscating the original data, or replacing it with fake data, preserves the format and structure of the original while hiding the sensitive information. There are several challenges involved in data masking, but following best practices can help you overcome them. Data masking is an essential step in protecting your data and ensuring compliance with data privacy regulations.

Data Masking Next Steps

  • Interested in identifying sensitive information inside real data?
  • Interested in protecting sensitive data, your business information or trade secrets?
  • Interested in helping testing teams by rapidly delivering test database(s) to your training or testing environment?
  • Interested in implementing an Enterprise Data Masking Capability?

Why not ask for a demo of Enov8 Test Data Manager (aka Data Compliance Suite)?

 

 

Enov8 Test Data Manager (aka Data Compliance Suite)

A Data Securitization and Test Data Management platform that helps you DevSecOps your Test Data & Privacy Risks. A platform that helps you identify where data sensitivity resides within a production database, rapidly remediate these risks to remove data sensitivity, avoid unauthorized disclosure and/or data breaches, and centrally validate your compliance success. The solution also comes with IT delivery accelerators to support Data DevOps (DataOps), data analysis, data mining, test data bookings, and ultimately accelerate your software delivery process.

Post Author

This post was written by Jane Temov. Jane is an Environment, Release & DataOps Evangelist working at Enov8 in Sydney, Australia.

 
