What

We are designing a system where clients are routed to regional servers to reduce latency, while maintaining session consistency during multi-step API calls.

Key components:

  • Regional servers in different geographies
  • Central discovery server tracking server heartbeat and load
  • JWT tokens for authentication:
    • Primary (access) token expiry: 1–5 minutes
    • Refresh token expiry: 7–30 days
  • Client-server locking: ensures multi-step requests stay on the same server
  • Load balancing and failover via Route 53 geo-proximity with health checks

Why

  • Simple geo-proximity DNS (Route 53) is insufficient for multi-step API workflows
  • Multi-step POST requests can fail if a client jumps servers due to geo routing
    • Even with active-active database replication, writes take time to propagate between regions
    • If consecutive steps of a workflow land on different servers, the later server may not yet have the data written by the earlier one
    • The request then fails because it is processed against stale data
  • Need to lock clients to a server during critical operations
  • Need flexibility to load balance or move clients across regions safely when not performing critical tasks
  • Health checks in Route 53 ensure traffic isn’t routed to servers that are down

How

1. Central Discovery Server

  • Each regional server sends heartbeat with its unique ID, region code, and public URL
  • Optionally collect telemetry/load data from each server (either directly from regional servers or via a central telemetry system)
  • Discovery server maintains the active server list, their public URLs, and load information
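
The registry described above can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the class and field names, the 15-second heartbeat timeout, and the 0.0 to 1.0 load scale are hypothetical and not part of the design.

```python
import time
from dataclasses import dataclass


@dataclass
class ServerRecord:
    server_id: str
    region: str
    public_url: str
    load: float = 0.0       # optional telemetry; assumed scale 0.0 (idle) to 1.0 (saturated)
    last_seen: float = 0.0  # timestamp of the most recent heartbeat


class DiscoveryRegistry:
    """In-memory registry of regional servers, keyed by server ID."""

    def __init__(self, heartbeat_timeout=15.0):
        # Assumed value: a server counts as down after 15 s without a heartbeat.
        self.heartbeat_timeout = heartbeat_timeout
        self._servers = {}

    def heartbeat(self, server_id, region, public_url, load=0.0, now=None):
        """Record a heartbeat, creating or refreshing the server's entry."""
        now = time.time() if now is None else now
        self._servers[server_id] = ServerRecord(server_id, region, public_url, load, now)

    def active_servers(self, now=None):
        """Return the servers whose last heartbeat is within the timeout."""
        now = time.time() if now is None else now
        return [
            s for s in self._servers.values()
            if now - s.last_seen <= self.heartbeat_timeout
        ]
```

Passing `now` explicitly keeps the sketch easy to test; a real discovery server would more likely persist this state and expose it over an API.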

2. DNS Setup

  • Each regional server gets its own URL
  • Central URL uses Route 53 geo-proximity with health checks to route clients to nearest healthy server

3. Client Login & Locking

  • Client hits the central geo-proximity URL
  • Login request routed to nearest server
  • Server returns:
    • JWT token
    • Its own server URL, which the client stores as its lock
  • All further requests from this client use the locked server URL
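
On the client side, the lock amounts to remembering the server URL from the login response and using it as the base URL afterwards. A minimal sketch, assuming a hypothetical central URL and hypothetical response field names (`token`, `serverUrl`):

```python
from dataclasses import dataclass
from typing import Optional

CENTRAL_URL = "https://api.example.com"  # hypothetical central geo-proximity URL


@dataclass
class ClientSession:
    access_token: Optional[str] = None
    locked_server_url: Optional[str] = None

    def apply_login_response(self, response: dict) -> None:
        """Store the token and lock onto the server that handled the login."""
        self.access_token = response["token"]          # assumed field name
        self.locked_server_url = response["serverUrl"]  # assumed field name

    def base_url(self) -> str:
        """Locked server URL once logged in, otherwise the central URL."""
        return self.locked_server_url or CENTRAL_URL

    def url_for(self, path: str) -> str:
        """Build the full URL for an API path against the current base URL."""
        return self.base_url().rstrip("/") + "/" + path.lstrip("/")
```

Before login, `url_for("/login")` resolves against the central URL; after `apply_login_response(...)`, every path resolves against the locked server.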

4. Server Discovery Sync

  • Regional servers periodically pull the active server list (with public URLs) and load information from the discovery server (load data can originate from a central telemetry system or directly from servers)
  • Enables load balancing within regions and global awareness

5. Refresh Token API & Closest Server

  • Before sending a refresh token request, client calls /closestToMe on the central geo URL
  • /closestToMe returns the identifier of the server closest to the client
  • Client then sends the refresh token request with a payload that includes:
    • closestServerId
    • criticalTaskInProgress (boolean)

6. Refresh Token Handling

  • If criticalTaskInProgress = true:
    • Do not switch servers
    • Refresh token and maintain lock with current server
  • If criticalTaskInProgress = false:
    • Check closest server and region:
      • Same region → pick server with lowest load, update token with new server URL
      • Different region → switch client to that server and update token
    • Clients move across servers or regions only when no critical task is in progress, so active tasks are never interrupted
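
The decision rules above, driven by the payload fields closestServerId and criticalTaskInProgress, can be sketched as one function. The `Server` type and `choose_server` name are hypothetical. The sketch also assumes the current server is still present in the server list, and that an unknown closest server means "stay put", which the design does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    server_id: str
    region: str
    public_url: str
    load: float


def choose_server(current_id, closest_id, critical_task_in_progress, servers):
    """Return the server the refreshed token should point at.

    servers maps server ID to Server, as pulled from the discovery server.
    """
    current = servers[current_id]
    if critical_task_in_progress:
        # Never move a client in the middle of a multi-step operation.
        return current
    closest = servers.get(closest_id)
    if closest is None:
        # Assumption: if the closest server is unknown, keep the current lock.
        return current
    if closest.region == current.region:
        # Same region: rebalance to the least-loaded server in that region.
        same_region = [s for s in servers.values() if s.region == current.region]
        return min(same_region, key=lambda s: s.load)
    # Different region: follow the client to its new closest server.
    return closest
```

The refresh handler would then place the chosen server's `public_url` in the token response, which is what the client treats as its new lock.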

7. Load Balancing

  • Regional servers use closest server identifier + server load to redistribute clients
  • Maintains even load distribution while keeping active sessions safe

8. Frontend Considerations

  • Detect if primary server URL changed in token response
  • Show user-friendly message:
    • “Your primary server has changed. Any missing data will be synced within 5 minutes.”
  • Keeps users informed without alarming them about temporary replication delays
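
The detection itself is a comparison between the stored server URL and the one in the token response. A minimal sketch of that logic, written in Python for consistency even though a real frontend would likely use another language; the `serverUrl` field name and function name are assumptions:

```python
from typing import Optional

SERVER_CHANGED_MESSAGE = (
    "Your primary server has changed. "
    "Any missing data will be synced within 5 minutes."
)


def server_change_notice(previous_url: Optional[str], response: dict) -> Optional[str]:
    """Return the user-facing message if the server URL changed, else None."""
    new_url = response.get("serverUrl")  # assumed field name
    if not previous_url or not new_url:
        # First login or missing field: nothing to compare against.
        return None
    # Ignore trailing slashes so equivalent URLs are not reported as a change.
    if new_url.rstrip("/") != previous_url.rstrip("/"):
        return SERVER_CHANGED_MESSAGE
    return None
```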

9. Handling Server Failures

  • If a server goes down, client will receive a 500-series error or a connection failure (timeout or refused connection)
  • Client should wait 30 seconds while showing a “Reconnecting…” timer
  • During this time, discovery server confirms the server stopped sending heartbeats and updates its registry of available servers and their public URLs
  • After the wait, client attempts the refresh token request again, this time through the central geo URL
  • Route 53 health checks route that request to the closest healthy server
  • Refresh token response will include the new server URL
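
The failure flow above can be sketched as a wrapper around a request. The function names, the `ServerUnavailable` exception, and the injected `send_request` / `refresh_token` callables are hypothetical; only the 30-second wait comes from the steps above.

```python
import time

RECONNECT_WAIT_SECONDS = 30  # wait period from the steps above


class ServerUnavailable(Exception):
    """Raised by the transport on a 5xx response or a connection failure."""


def call_with_failover(send_request, refresh_token, locked_url, sleep=time.sleep):
    """Try the locked server once; on failure wait, refresh, and retry once.

    send_request(url) performs the API call against the given server URL.
    refresh_token() performs the refresh via the central geo URL and
    returns the new server URL from the response.
    Returns (result, server_url_used).
    """
    try:
        return send_request(locked_url), locked_url
    except ServerUnavailable:
        # The UI would show the "Reconnecting" timer during this wait.
        sleep(RECONNECT_WAIT_SECONDS)
        new_url = refresh_token()
        return send_request(new_url), new_url
```

Injecting `sleep` keeps the sketch testable without real delays. Whether the original request can simply be replayed on the new server depends on the request being safe to repeat, which this sketch does not address.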

Thoughts / Caveats

  • Client lock is critical for multi-step operations
  • Discovery server is the single source of truth for server status, public URLs, and load (whether collected directly or via a central telemetry system)
  • Token expiry strategy (short-lived JWT, long-lived refresh token) balances security vs availability
  • Cross-region movement and load balancing happen only when safe (no critical tasks)
  • Frontend intelligence improves user experience during server switches
  • Route 53 health checks ensure no traffic is sent to unhealthy servers
  • Automatic refresh/reconnect handles server failures without breaking client workflows