Pseudonymization, GDPR, Technical

Practical Pseudonymization Techniques for GDPR Compliance

A technical guide to pseudonymization techniques that reduce data protection risk while preserving data utility.

GlobalDataShield Team | 6 min read

What Is Pseudonymization?

Pseudonymization is the processing of personal data so that it can no longer be attributed to a specific individual without the use of additional information. That additional information must be kept separately and protected by technical and organizational measures.

GDPR explicitly encourages pseudonymization as a data protection safeguard (Article 25 and Recital 28). Pseudonymized data is still personal data under GDPR -- unlike truly anonymized data -- but it benefits from a reduced risk profile and more flexible processing options.

Pseudonymization vs. Anonymization

Feature | Pseudonymization | Anonymization
Reversible | Yes, with additional information | No (irreversible)
Still personal data under GDPR | Yes | No
GDPR applies | Yes, but with benefits | No
Data utility | High (can be re-linked) | Variable (may lose granularity)
Suitable for analytics | Yes | Yes, but limited
Suitable for individual-level processing | Yes (when re-linked) | No

The key distinction: pseudonymized data can be re-identified using the separately stored mapping, while truly anonymized data cannot be re-identified by any reasonably available means.

Benefits of Pseudonymization Under GDPR

  • Risk reduction: Pseudonymized data poses less risk if breached, as the attacker cannot immediately identify individuals
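  • Further processing: Article 6(4)(e) lists pseudonymization as a safeguard to weigh when assessing whether processing for a new purpose is compatible with the original one, and Recital 29 encourages its use for general analysis within the same controller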
  • DPIA mitigation: Pseudonymization is recognized as a risk mitigation measure in Data Protection Impact Assessments
  • Breach notification: A breach involving pseudonymized data may not require notifying affected individuals if the data remains unintelligible to unauthorized parties, for example because the mapping or key was not compromised (see Article 34(3)(a))
  • Data minimization: Pseudonymization supports the data minimization principle by reducing the identifiability of data in systems that do not need to identify individuals

Pseudonymization Techniques

1. Token Replacement (Tokenization)

Replace identifying values with randomly generated tokens. Maintain a separate lookup table that maps tokens back to original values.

How it works:

  • Original: john.smith@email.com becomes TKN-8f3a2b1c
  • The mapping TKN-8f3a2b1c -> john.smith@email.com is stored in a secured, separate system

Best for: Structured data in databases where you need to maintain referential integrity across tables

Considerations:

  • The token mapping table is the critical asset -- it must be secured with the highest controls
  • Tokens should be random, not derived from the original data
  • One-to-one mapping preserves uniqueness for joins and analytics
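As a rough illustration of the points above, here is a minimal in-memory token vault in Python. The class name `TokenVault` and the `TKN-` prefix are hypothetical, and a plain dict stands in for what would, in a real deployment, be a separate, access-controlled, encrypted store.

```python
import secrets


class TokenVault:
    """Minimal in-memory token vault (illustration only, not production code)."""

    def __init__(self, prefix: str = "TKN-") -> None:
        self._prefix = prefix
        self._to_token: dict[str, str] = {}  # original value -> token
        self._to_value: dict[str, str] = {}  # token -> original value (the sensitive mapping)

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so each value maps to exactly one token,
        # which keeps joins across tables working.
        if value in self._to_token:
            return self._to_token[value]
        while True:
            # Token is random, not derived from the value itself.
            token = self._prefix + secrets.token_hex(4)
            if token not in self._to_value:  # retry on the rare collision
                break
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reverse lookup; in a real system this call would be access-controlled and logged.
        return self._to_value[token]
```

Calling `tokenize("john.smith@email.com")` twice returns the same token of the form `TKN-` plus eight hex characters, and `detokenize` recovers the original only for whoever holds the vault.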

2. Hashing

Apply a cryptographic hash function to identifying values to produce a fixed-length output.

How it works:

  • Original: john.smith@email.com
  • SHA-256 hash: a 64-character hexadecimal digest that is identical every time this exact input is hashed

Best for: Scenarios where you need consistent pseudonymization (same input always produces same output) without needing to reverse it

Considerations:

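  • Unkeyed hashing is vulnerable to dictionary and rainbow-table attacks, especially for low-entropy or guessable inputs like email addresses
  • Always incorporate a secret, ideally via a keyed hash (HMAC), so an attacker cannot simply hash candidate values and compare them; a random per-record salt would defeat the consistency that makes hashing useful here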
  • Hashing is deterministic, which enables linking records across datasets -- this can be a benefit or a risk depending on context
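To make the brute-force risk concrete, the following sketch uses Python's standard `hashlib` to show that plain SHA-256 is deterministic and that anyone holding a list of plausible email addresses can reverse it without any secret. The helper names `sha256_pseudonym` and `dictionary_attack` are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_pseudonym(value: str) -> str:
    # Plain (unkeyed) SHA-256: deterministic, fixed-length 64-character hex output.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: list[str]) -> Optional[str]:
    # No secret is needed: hash each guess and compare it with the target.
    for guess in candidates:
        if sha256_pseudonym(guess) == target_hash:
            return guess
    return None


pseudonym = sha256_pseudonym("john.smith@email.com")
recovered = dictionary_attack(pseudonym, ["jane.doe@email.com", "john.smith@email.com"])
# recovered is "john.smith@email.com": the "pseudonym" offered no protection
```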

3. Keyed Hashing (HMAC)

A more secure variant of hashing that incorporates a secret key.

How it works:

  • HMAC-SHA256 with a secret key produces a pseudonym that cannot be reversed without the key
  • Different keys produce different pseudonyms for the same input

Best for: Cross-dataset linkage where you control the key, research scenarios, analytics pipelines

Considerations:

  • The secret key must be managed with the same rigor as an encryption key
  • Rotating the key requires re-pseudonymizing all affected data
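A minimal HMAC-SHA256 pseudonymization sketch using Python's standard `hmac` module follows. The function name `hmac_pseudonym` is hypothetical, and the keys are generated locally only for illustration; in practice they would come from a key management system.

```python
import hashlib
import hmac
import secrets


def hmac_pseudonym(value: str, key: bytes) -> str:
    # HMAC-SHA256: deterministic for a given key, infeasible to recompute without it.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustration only: real keys would be fetched from a KMS, never generated ad hoc.
key_a = secrets.token_bytes(32)
key_b = secrets.token_bytes(32)

p1 = hmac_pseudonym("john.smith@email.com", key_a)
p2 = hmac_pseudonym("john.smith@email.com", key_a)  # same key: same pseudonym, records stay linkable
p3 = hmac_pseudonym("john.smith@email.com", key_b)  # different key: unrelated pseudonym
```

Because the output depends on the exact bytes, inputs such as email addresses are usually normalized (for example trimmed and lowercased) before hashing so that trivially different spellings map to the same pseudonym.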

4. Format-Preserving Encryption (FPE)

Encrypt data while preserving the format and length of the original value.

How it works:

  • Original credit card: 4532-1234-5678-9012
  • FPE output: 8271-6543-2109-3847

Best for: Legacy systems that validate data formats (credit card numbers, phone numbers, postal codes)

Considerations:

  • Use NIST-approved modes such as FF1 or FF3-1 (NIST SP 800-38G Rev. 1)
  • Reversible with the encryption key
  • Preserves format constraints, which is valuable for system compatibility
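To show the underlying idea, here is a toy format-preserving cipher: a balanced Feistel network over decimal digits with HMAC-SHA256 as the round function, which keeps separators such as "-" in place. This is explicitly not FF1 or FF3-1, has not been analyzed for security, and should not be used for real data; production systems should rely on a vetted implementation of the SP 800-38G algorithms. All names and the round count are hypothetical.

```python
import hashlib
import hmac
import string

ROUNDS = 10  # illustrative round count, not taken from any standard


def _round_value(key: bytes, round_no: int, half: str, width: int) -> int:
    # Pseudorandom round function: HMAC-SHA256 over (round number, half),
    # reduced to a number with at most `width` decimal digits.
    mac = hmac.new(key, bytes([round_no]) + half.encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % (10 ** width)


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    if not digits or len(digits) % 2:
        raise ValueError("this sketch needs a non-empty, even number of digits")
    w = len(digits) // 2
    mod = 10 ** w
    left, right = digits[:w], digits[w:]
    if not decrypt:
        for r in range(ROUNDS):
            # Forward round: (L, R) -> (R, (L + F(R)) mod 10^w)
            left, right = right, f"{(int(left) + _round_value(key, r, right, w)) % mod:0{w}d}"
    else:
        for r in reversed(range(ROUNDS)):
            # Inverse round: (A, B) -> ((B - F(A)) mod 10^w, A)
            left, right = f"{(int(right) - _round_value(key, r, left, w)) % mod:0{w}d}", left
    return left + right


def _apply(value: str, key: bytes, decrypt: bool) -> str:
    digits = "".join(c for c in value if c in string.digits)
    transformed = iter(_feistel(digits, key, decrypt))
    # Put the transformed digits back at the original digit positions.
    return "".join(next(transformed) if c in string.digits else c for c in value)


def toy_fpe_encrypt(value: str, key: bytes) -> str:
    return _apply(value, key, decrypt=False)


def toy_fpe_decrypt(value: str, key: bytes) -> str:
    return _apply(value, key, decrypt=True)
```

Encrypting "4532-1234-5678-9012" yields another 16-digit value with hyphens in the same positions, and decrypting with the same key returns the original.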

5. Data Masking

Replace parts of a data value with placeholder characters while retaining some original information.

How it works:

  • Original email: john.smith@email.com becomes j***@email.com
  • Original phone: +44 7700 900123 becomes +44 7700 ***123

Best for: Display purposes, customer service screens, reports where partial identification is sufficient

Considerations:

  • Static masking permanently replaces data; dynamic masking applies at query time
  • Partial masking may not be sufficient pseudonymization if the remaining visible data allows re-identification
  • Dynamic masking requires database or application-level support
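The two masking examples above can be reproduced with a couple of small static-masking helpers. This is a sketch: the function names and the "keep the last three digits of the final group" rule for phone numbers are assumptions chosen to match the examples, not a standard.

```python
def mask_email(email: str) -> str:
    # "john.smith@email.com" -> "j***@email.com": keep first character and domain.
    local, sep, domain = email.partition("@")
    if not sep or not local:
        raise ValueError("not an email address")
    return f"{local[0]}***@{domain}"


def mask_phone(phone: str, keep_last: int = 3) -> str:
    # "+44 7700 900123" -> "+44 7700 ***123": mask all but the last digits of the final group.
    groups = phone.split(" ")
    last = groups[-1]
    groups[-1] = "*" * max(len(last) - keep_last, 0) + last[-keep_last:]
    return " ".join(groups)
```

Dynamic masking would apply the same kind of rule at query or render time, typically depending on the viewer's role, rather than rewriting stored data.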

6. Generalization

Replace specific values with broader categories.

How it works:

  • Age 34 becomes age range 30-39
  • Postal code EC2A 4NE becomes EC2A
  • Date of birth 1991-03-15 becomes 1991

Best for: Analytics and statistical processing where exact values are not needed

Considerations:

  • Reduces data utility with each level of generalization
  • May not qualify as pseudonymization on its own if the generalized data is still identifying in context
  • Often used in combination with other techniques
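The generalization examples above map onto a few one-line transformations. In this sketch the helper names are hypothetical, and `postcode_outward` assumes UK postcodes, whose inward code is always three characters.

```python
from datetime import date


def age_band(age: int, width: int = 10) -> str:
    # 34 -> "30-39" with the default 10-year bands
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def postcode_outward(postcode: str) -> str:
    # UK-specific assumption: drop the three-character inward code.
    # "EC2A 4NE" -> "EC2A"
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def birth_year(dob_iso: str) -> str:
    # "1991-03-15" -> "1991"
    return str(date.fromisoformat(dob_iso).year)
```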

Implementation Architecture

Separation of Mapping Data

The mapping between pseudonyms and original identifiers is the most sensitive component. Protect it with:

  • Storage in a separate, access-controlled system
  • Encryption at rest with customer-managed keys
  • Strict access controls (minimal number of authorized users)
  • Comprehensive audit logging of all access
  • Geographic separation from the pseudonymized data where practical

Pseudonymization Service Pattern

Build a centralized pseudonymization service that:

  • Accepts original identifiers and returns pseudonyms
  • Maintains the mapping securely
  • Supports reverse lookup for authorized re-identification
  • Enforces access policies based on the requester's role and purpose
  • Logs all pseudonymization and re-identification operations
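A compressed sketch of such a service in Python follows. The role-to-purpose policy, the class name, and the use of HMAC plus a reverse-lookup dict are all illustrative assumptions; a real service would keep the key in a KMS, the mapping in a separate secured store, and the audit log in tamper-evident storage.

```python
import hashlib
import hmac
import logging
import secrets

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("pseudonymization.audit")

# Hypothetical policy: which roles may re-identify, and for which purposes.
REIDENTIFY_POLICY = {
    "fraud_investigator": {"fraud_investigation"},
    "dpo": {"subject_access_request", "fraud_investigation"},
}


class PseudonymizationService:
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # would come from a KMS in practice
        self._mapping: dict[str, str] = {}   # pseudonym -> original; a separate secured store in practice

    def pseudonymize(self, value: str, requester: str) -> str:
        pseudonym = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        self._mapping[pseudonym] = value
        audit_log.info("pseudonymize requester=%s pseudonym=%s", requester, pseudonym[:12])
        return pseudonym

    def reidentify(self, pseudonym: str, requester: str, role: str, purpose: str) -> str:
        # Enforce the role/purpose policy and log every attempt, allowed or not.
        allowed = purpose in REIDENTIFY_POLICY.get(role, set())
        audit_log.info(
            "reidentify requester=%s role=%s purpose=%s allowed=%s",
            requester, role, purpose, allowed,
        )
        if not allowed:
            raise PermissionError(f"role {role!r} may not re-identify for purpose {purpose!r}")
        return self._mapping[pseudonym]
```

Denied attempts are logged before the exception is raised, so the audit trail also captures refused re-identification requests.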

Key Rotation and Re-Pseudonymization

For techniques that use keys (HMAC, FPE), establish a key rotation schedule:

  • Rotate keys annually or upon suspected compromise
  • Re-pseudonymize affected data with the new key
  • Securely destroy old keys after re-pseudonymization is complete
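For HMAC-based pseudonyms, rotation cannot start from the pseudonyms themselves, since HMAC is one-way. One possible approach, sketched below under that assumption, is to build an old-to-new translation table from the original identifiers (for example read from the secured mapping store) and apply it to each pseudonymized dataset. The field name `subject_id` and all helper names are hypothetical.

```python
import hashlib
import hmac
import secrets


def hmac_pseudonym(value: str, key: bytes) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_rotation_table(originals: list[str], old_key: bytes, new_key: bytes) -> dict[str, str]:
    # Built from the original identifiers, because HMAC pseudonyms cannot be inverted.
    return {hmac_pseudonym(v, old_key): hmac_pseudonym(v, new_key) for v in originals}


def re_pseudonymize(records: list[dict], table: dict[str, str], field: str = "subject_id") -> list[dict]:
    # Swap each old pseudonym for its new one; other fields are left untouched.
    return [{**r, field: table[r[field]]} for r in records]


old_key, new_key = secrets.token_bytes(32), secrets.token_bytes(32)
originals = ["john.smith@email.com", "jane.doe@email.com"]
records = [{"subject_id": hmac_pseudonym(v, old_key), "purchases": n} for v, n in zip(originals, [3, 5])]

table = build_rotation_table(originals, old_key, new_key)
rotated = re_pseudonymize(records, table)
# Once every dataset is migrated, destroy old_key and the table, which links old and new pseudonyms.
```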

Choosing the Right Technique

Scenario | Recommended Technique
Database records with cross-table references | Tokenization
Analytics pipeline without need for re-identification | Keyed hashing (HMAC)
Legacy systems requiring format compatibility | Format-preserving encryption
Customer service screens | Dynamic data masking
Statistical reporting | Generalization
Research datasets | Combination of generalization + keyed hashing

Pseudonymization and Data Residency

Pseudonymization becomes especially powerful when combined with data residency controls. By pseudonymizing personal data before it leaves a jurisdiction, you can process it in other regions while the re-identification mapping stays within the original jurisdiction.
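A minimal sketch of the "pseudonymize before it leaves the jurisdiction" step: direct identifiers are swapped for random tokens, the token mapping is written only to a store that stays in the origin region, and only the returned record is exported. The field list and function name are hypothetical, and the dict passed in stands for that region-local store.

```python
import secrets

# Hypothetical list of direct identifiers that must not leave the origin region.
DIRECT_IDENTIFIERS = ("email", "name", "phone")


def prepare_for_export(record: dict, local_mapping: dict[str, str]) -> dict:
    """Return a copy of `record` with direct identifiers replaced by random tokens.

    `local_mapping` (token -> original value) stays in the origin jurisdiction;
    only the returned record is sent to other regions.
    """
    exported = dict(record)
    for field in DIRECT_IDENTIFIERS:
        if field in exported:
            token = "TKN-" + secrets.token_hex(8)
            local_mapping[token] = exported[field]
            exported[field] = token
    return exported
```

This version issues a fresh token per call; if exported records must remain linkable across batches, a token vault that reuses existing tokens would be used instead.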

GlobalDataShield's region-specific hosting complements pseudonymization strategies by ensuring that the sensitive mapping data -- the keys to re-identification -- remains within the geographic boundaries you define, while pseudonymized data can be used more flexibly for analytics and processing across your infrastructure.
