
Using Database Sharding to Enforce Data Residency

How to use database sharding strategies to keep personal data within required geographic boundaries.

GlobalDataShield Team · 6 min read

What Is Geographic Database Sharding?

Database sharding is the practice of splitting a large database into smaller, independent pieces called shards. Geographic sharding takes this further by partitioning data based on the geographic region it belongs to, with each shard deployed in the corresponding jurisdiction.

This approach enforces data residency at the database level: EU customer data lives in an EU shard, Australian customer data lives in an Australian shard, and so on. The database architecture itself becomes a compliance control.

Why Sharding for Residency?

Advantages Over Application-Level Controls

| Approach | Enforcement Level | Risk of Violation |
| --- | --- | --- |
| Application logic only | Application code | High (bugs, misconfigurations) |
| Network policies | Infrastructure | Medium (can be overridden) |
| Database sharding | Data storage | Low (data physically cannot leave the shard) |
| Combined approach | Multiple layers | Lowest |

When data residency is enforced at the database level, personal data physically resides in the correct jurisdiction. No application bug can accidentally serve German customer data from a US server because the data simply is not there.

Sharding Strategies for Residency

Strategy 1: Region-Based Sharding

Assign each record to a shard based on the data subject's region:

Shard key: Country code or region identifier of the data subject

Shard map:

  • Shard EU-1 (Frankfurt): Records where region = DE, AT, CH
  • Shard EU-2 (Dublin): Records where region = IE, GB, NL, BE
  • Shard EU-3 (Paris): Records where region = FR, ES, IT, PT
  • Shard US-1 (Virginia): Records where region = US
  • Shard APAC-1 (Sydney): Records where region = AU, NZ

Best for: Organizations with clear geographic segmentation of customers or users.
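The shard map above can be implemented as a simple lookup. The sketch below uses the shard names from the example; the table itself and the `resolve_shard` helper are illustrative, not a specific product's API. Note that it deliberately raises on unknown region codes rather than falling back to a default shard, since a silent fallback could place personal data in the wrong jurisdiction.

```python
# Region-to-shard map mirroring the example above. A real deployment would
# load this from configuration rather than hard-coding it.
SHARD_MAP = {
    "EU-1": {"DE", "AT", "CH"},        # Frankfurt
    "EU-2": {"IE", "GB", "NL", "BE"},  # Dublin
    "EU-3": {"FR", "ES", "IT", "PT"},  # Paris
    "US-1": {"US"},                    # Virginia
    "APAC-1": {"AU", "NZ"},            # Sydney
}

# Invert the map once so each lookup is O(1) by country code.
REGION_TO_SHARD = {
    country: shard
    for shard, countries in SHARD_MAP.items()
    for country in countries
}

def resolve_shard(country_code: str) -> str:
    """Return the shard that must hold records for this country code.

    Raising on an unmapped code is deliberate: defaulting to some
    "global" shard would silently violate residency.
    """
    try:
        return REGION_TO_SHARD[country_code]
    except KeyError:
        raise ValueError(f"No residency shard configured for {country_code!r}")
```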

Strategy 2: Jurisdiction-Based Sharding

Shard based on the legal jurisdiction that applies, rather than physical location:

Shard key: Applicable jurisdiction

Shard map:

  • Shard GDPR (eu-central-1): All data subject to GDPR regardless of specific country
  • Shard LGPD (sa-east-1): All data subject to Brazil's LGPD
  • Shard CCPA (us-west-2): All data subject to California's CCPA
  • Shard DEFAULT (us-east-1): All other data

Best for: Organizations that need to comply with specific regulatory frameworks rather than country-level residency laws.

Strategy 3: Tenant-Based Sharding

For multi-tenant applications, shard by tenant with each tenant assigned to a region:

Shard key: Tenant ID

Tenant-to-shard mapping:

  • Tenant A (German enterprise) -> Shard EU-Frankfurt
  • Tenant B (US startup) -> Shard US-Virginia
  • Tenant C (Australian company) -> Shard APAC-Sydney

Best for: B2B SaaS applications where each customer has specific residency requirements.
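Unlike the region strategies, tenant-to-shard assignment is not derived from a key: it is explicit data, recorded when the tenant is onboarded. A hypothetical registry sketch (class and method names are illustrative):

```python
class TenantShardRegistry:
    """Records each tenant's residency shard at onboarding.

    The assignment is explicit rather than computed, so a tenant's
    contractual residency choice is captured and auditable.
    """

    def __init__(self) -> None:
        self._assignments: dict[str, str] = {}

    def assign(self, tenant_id: str, shard: str) -> None:
        # Reassignment should go through a migration process, not a
        # simple overwrite, so reject duplicate assignments here.
        if tenant_id in self._assignments:
            raise ValueError(f"tenant {tenant_id!r} is already assigned")
        self._assignments[tenant_id] = shard

    def shard_for(self, tenant_id: str) -> str:
        try:
            return self._assignments[tenant_id]
        except KeyError:
            # Refuse to route an unassigned tenant to a default shard.
            raise ValueError(f"tenant {tenant_id!r} has no shard assignment")
```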

Implementation Guide

Step 1: Define Your Shard Topology

Map your residency requirements to physical database deployments:

  • List all jurisdictions where data must remain
  • Identify available database regions for your chosen database technology
  • Design the shard map linking jurisdictions to database instances
  • Plan for future jurisdictions (how will you add a new shard?)

Step 2: Choose Your Sharding Technology

Different databases offer different sharding capabilities:

| Database | Native Sharding | Geographic Control |
| --- | --- | --- |
| PostgreSQL (Citus) | Yes | Manual shard placement by node location |
| MongoDB | Yes (zone sharding) | Tag-based zone assignment to specific regions |
| CockroachDB | Yes | Partition-by-region with regional survival goals |
| YugabyteDB | Yes | Tablespace-level geographic placement |
| MySQL (Vitess) | Yes | Shard-to-region mapping via topology |
| Cloud Spanner | Yes | Instance-level regional configuration |

Step 3: Implement the Shard Router

Build a routing layer that directs queries to the correct shard:

Key components:

  • Shard resolver: Determines which shard holds the requested data based on the shard key
  • Connection pool: Maintains connections to all shard instances
  • Query router: Sends queries to the appropriate shard
  • Cross-shard query handler: Manages queries that span multiple shards (with careful attention to residency implications)

Important: Cross-shard queries that combine personal data from multiple jurisdictions may create compliance issues. Limit cross-shard operations to non-personal data or aggregated/anonymized results.
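The routing components above can be sketched in one small class. This is a toy, not a production router: SQLite in-memory databases stand in for the per-region instances, and the connection "pool" is a plain dict. It does illustrate the key compliance property from earlier, though: a query routed to the wrong region finds nothing, because the data simply is not there.

```python
import sqlite3

class ShardRouter:
    """Routes each query to the shard that owns the record's region.

    sqlite3 in-memory databases stand in for per-region instances;
    a real router would hold connection pools to remote databases.
    """

    def __init__(self, region_to_shard: dict[str, str]) -> None:
        self._region_to_shard = region_to_shard
        # One connection per distinct shard (the "connection pool").
        self._conns = {
            shard: sqlite3.connect(":memory:")
            for shard in set(region_to_shard.values())
        }
        for conn in self._conns.values():
            conn.execute("CREATE TABLE customers (id TEXT, region TEXT, name TEXT)")

    def _resolve(self, region: str) -> sqlite3.Connection:
        # Shard resolver: region code -> shard name -> connection.
        try:
            return self._conns[self._region_to_shard[region]]
        except KeyError:
            raise ValueError(f"no shard configured for region {region!r}")

    def insert(self, cid: str, region: str, name: str) -> None:
        conn = self._resolve(region)
        conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (cid, region, name))
        conn.commit()

    def fetch(self, cid: str, region: str):
        conn = self._resolve(region)
        return conn.execute(
            "SELECT name FROM customers WHERE id = ?", (cid,)
        ).fetchone()
```

A cross-shard query handler is intentionally omitted here; as noted above, it should be restricted to non-personal or aggregated data.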

Step 4: Handle Data Migration Between Shards

When a data subject's residency status changes (for example, a customer moves from Germany to Australia), you need a migration process:

  • Identify all records belonging to the data subject in the source shard
  • Transfer records to the destination shard
  • Verify completeness of the transfer
  • Delete records from the source shard
  • Update the shard routing map
  • Log the migration for audit purposes
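The steps above can be sketched as a single migration routine. Two SQLite in-memory databases stand in for the source and destination shards; cross-shard transactional guarantees, retry handling, and the audit log format are all simplified assumptions. The one property the sketch does preserve is ordering: nothing is deleted from the source until the transfer is verified complete.

```python
import sqlite3

def migrate_subject(subject_id, source, dest, routing_map, new_shard, audit_log):
    """Move all of one data subject's rows to a new shard.

    `source`/`dest` are DB connections, `routing_map` is the in-memory
    shard map, and `audit_log` is a list standing in for an audit trail.
    """
    # Step 1: identify all records for the subject in the source shard.
    rows = source.execute(
        "SELECT id, region, name FROM customers WHERE id = ?", (subject_id,)
    ).fetchall()
    # Step 2: transfer the records to the destination shard.
    dest.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    dest.commit()
    # Step 3: verify completeness BEFORE deleting anything.
    moved = dest.execute(
        "SELECT COUNT(*) FROM customers WHERE id = ?", (subject_id,)
    ).fetchone()[0]
    if moved != len(rows):
        raise RuntimeError("migration incomplete; source shard left untouched")
    # Step 4: only now delete from the source shard.
    source.execute("DELETE FROM customers WHERE id = ?", (subject_id,))
    source.commit()
    # Step 5: update the shard routing map.
    routing_map[subject_id] = new_shard
    # Step 6: log the migration for audit purposes.
    audit_log.append((subject_id, new_shard, len(rows)))
```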

Step 5: Manage Global Reference Data

Some data is needed across all regions but is not personal data:

  • Product catalogs
  • Configuration settings
  • Currency and exchange rates
  • Generic templates

Replicate this non-personal reference data to all shards without residency concerns. Keep a clear separation between personal data (shard-local) and reference data (globally replicated).

Step 6: Address Backup and Replication

Ensure that shard-level backups and replication respect residency:

  • Configure backups for each shard within the same region
  • If using read replicas, place them within the same jurisdiction as the primary shard
  • Disable cross-region replication for personal data shards
  • Test backup restoration to confirm data stays in the correct region

Operational Considerations

Monitoring

Monitor each shard independently and as a fleet:

  • Shard-level performance metrics (latency, throughput, storage)
  • Data distribution across shards (balance)
  • Cross-shard query frequency and performance
  • Replication lag within each shard

Scaling

Geographic shards may have uneven load:

  • The EU shard may handle 60% of total traffic
  • The APAC shard may handle 5%
  • Scale each shard independently based on its workload
  • Do not balance load by routing data to less-busy shards in other regions

Schema Changes

Rolling out schema changes across geographic shards requires coordination:

  • Plan for per-shard migrations that may execute at different times
  • Ensure application compatibility during the migration window
  • Test migrations against each shard's data characteristics

Common Sharding Mistakes

  • Choosing the wrong shard key: A shard key that does not map cleanly to jurisdictions leaves records that cannot be assigned to a single region
  • Cross-shard joins on personal data: Joining data across shards to produce reports that combine personal data from multiple jurisdictions
  • Ignoring the routing layer: A routing misconfiguration can send data to the wrong shard, violating residency
  • Uneven shard distribution: All data ending up in one shard defeats the purpose and creates performance issues
  • Neglecting shard-level backups: Backing up all shards to a central location undermines geographic isolation

When Sharding Is Not the Right Answer

Geographic sharding is powerful but adds significant complexity. It may not be the right approach if:

  • Your data volumes are small and a single-region database suffices
  • You only need to comply with one jurisdiction's residency requirements
  • Your application does not require the performance benefits of sharding
  • Your team lacks the operational expertise to manage a sharded database

For document hosting specifically, using a purpose-built platform like GlobalDataShield can be simpler than implementing geographic sharding yourself. GlobalDataShield handles the geographic data placement, replication controls, and residency enforcement at the infrastructure level, letting you focus on your application logic rather than database topology.

Ready to Solve Data Residency?

Get started with GlobalDataShield - compliant document hosting, ready when you are.