Encryption vs. tokenization vs. whatever: What you need to know


Terminology in the data protection world is often surprisingly imprecise. “Encryption” is something we all understand, or at least think we do: “A reversible algorithm that uses a key [some secret material] to make data unusable.”

But folks also throw around terms such as “tokenization” and “masking” and “obfuscation.” Sometimes these refer to specific technologies; sometimes they are used generically, to mean “some form of data protection.” For example, people often say things such as, “We [tokenize, encrypt, mask, obfuscate, protect] the data”—without specifying how, and meaning different things at different times.

Even when the terms are used precisely, people often misunderstand the differences between them. And those differences matter. Here’s my take on this mess.


By Phil Smith III.

Obfuscate and protect

These are generic terms, meaning “The data is hidden somehow.” That might mean replacing it with other data through encryption or another method, or simply not displaying it, perhaps showing just ***-**-**** for a US Social Security number.

Actually, when you’re not intending to call out a specific technology, these are probably the best terms precisely because of their, well, imprecision.

Mask

“Masking” is perhaps the most abused term. Sometimes people use it to mean “we keep the real data hidden” when they are actually just “obfuscating” or “protecting.”

But masking can be more specific, referring to dynamic data masking (DDM) or static data masking (SDM). These technologies, conceptually similar, mean changing sensitive information in production data to other, similar values, at specific points in the data flow.

DDM occurs as the data is used:

When a data feed is sent from one application to another, less trusted environment
When data is displayed—for example, showing only the last four digits of a credit card number on screen in a call center application, so it can be used for account verification

While DDM has its place and is obviously better than nothing, using it as a primary means of data protection tends to give security folks agita: the stored data stays in cleartext and is masked only at the point of use, so a database breach is guaranteed to reveal cleartext.
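As a rough illustration of the display case, here is a minimal Python sketch. The function name mask_card_number and the sample card number are hypothetical, not taken from any particular DDM product. It masks a copy of the value at display time and leaves the stored value untouched, which is exactly why a breach of the underlying store would still expose cleartext.

```python
def mask_card_number(pan: str, visible: int = 4, fill: str = "*") -> str:
    """Return a display-only copy of a card number.

    Only the last `visible` digits are shown; separators such as
    spaces or hyphens are kept so the format stays recognizable.
    """
    total_digits = sum(1 for c in pan if c.isdigit())
    keep_from = max(total_digits - visible, 0)  # index of first digit to show

    out = []
    seen = 0  # digits processed so far
    for c in pan:
        if c.isdigit():
            out.append(c if seen >= keep_from else fill)
            seen += 1
        else:
            out.append(c)
    return "".join(out)


# Hypothetical record: the stored value is still cleartext.
stored = "4111 1111 1111 1111"
shown = mask_card_number(stored)  # what the call center screen would display
```

The masking here happens only on the way to the screen; nothing about the value in the database changes.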

SDM means copying production data (in toto or a subset) and replacing sensitive information with other values. Typically this is done to enable realistic test conditions based on production volume, variability, etc., without risking exposure.

Some SDM tools preserve referential integrity; that is, all occurrences of “Bob” (and only “Bob”) might be changed to “Thomas.” Others change values without consistency. Sometimes “Bob” becomes “Thomas,” and other times it becomes “Frank”; perhaps both “Bob” and “Thomas” become “Frank,” consistently or otherwise.

In either case, the resulting data can be used without regard for security, since it no longer contains actual sensitive values.
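The difference between consistent and inconsistent replacement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ConsistentMasker, inconsistent_mask, and REPLACEMENTS are hypothetical, and the replacement pool is far smaller than anything a real SDM tool would use.

```python
import random

# Hypothetical pool of replacement values (assumed not to overlap with real data).
REPLACEMENTS = ["Thomas", "Frank", "Maria", "Priya", "Chen", "Olga"]


class ConsistentMasker:
    """Preserves referential integrity: each distinct input gets its own
    replacement, and the same input always gets the same replacement."""

    def __init__(self, pool, seed=None):
        self._unused = list(pool)
        random.Random(seed).shuffle(self._unused)
        self._mapping = {}

    def mask(self, value):
        if value not in self._mapping:
            if not self._unused:
                raise ValueError("replacement pool exhausted")
            # Take a replacement no other input has been given.
            self._mapping[value] = self._unused.pop()
        return self._mapping[value]


def inconsistent_mask(value, rng):
    """No consistency: every call picks a replacement independently."""
    return rng.choice(REPLACEMENTS)


# Hypothetical production column copied into a test data set.
production = ["Bob", "Alice", "Bob", "Carol"]

masker = ConsistentMasker(REPLACEMENTS, seed=1)
consistent_copy = [masker.mask(name) for name in production]

rng = random.Random(1)
inconsistent_copy = [inconsistent_mask(name, rng) for name in production]
```

In consistent_copy both occurrences of "Bob" share one replacement that no other name receives, so joins across tables would still line up. In inconsistent_copy there is no such guarantee.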

