What
We are designing a system where clients are routed to regional servers to reduce latency, while maintaining session consistency during multi-step API calls.
Key components:
- Regional servers in different geographies
- Central discovery server tracking server heartbeats and load
- JWT tokens for authentication:
  - Primary (access) token expiry: 1–5 minutes
  - Refresh token expiry: 7–30 days
- Client-server locking: ensures multi-step requests stay on the same server
- Load balancing and failover via Route 53 geo-proximity with health checks
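The two-token split above can be sketched as claim sets. The claim names (serverId, serverUrl) and the exact TTL values are illustrative assumptions, not part of the design:

```python
import time

# Upper bounds of the ranges in the design notes (assumed values).
ACCESS_TTL = 5 * 60            # primary token: 1-5 minutes
REFRESH_TTL = 30 * 24 * 3600   # refresh token: 7-30 days

def make_claims(user_id, server_id, server_url, now=None):
    """Build access/refresh token claim sets; the server-lock claims
    (serverId, serverUrl) are hypothetical names for illustration."""
    now = int(now if now is not None else time.time())
    access = {
        "sub": user_id,
        "iat": now,
        "exp": now + ACCESS_TTL,
        "serverId": server_id,    # server the client is locked to
        "serverUrl": server_url,  # clients send follow-up requests here
    }
    refresh = {"sub": user_id, "iat": now, "exp": now + REFRESH_TTL}
    return access, refresh
```

The short-lived access token is what forces clients back through the refresh path often enough for re-locking and rebalancing to happen.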
Why
- Simple geo-proximity DNS (Route 53) is insufficient for multi-step API workflows
- Multi-step POST requests can fail if geo routing moves a client between servers mid-workflow
- Even with active-active database replication, replication lag means the newly hit server may not yet have the data written in earlier steps
- A server processing a request against that stale data causes the request to fail
- Need to lock clients to a server during critical operations
- Need flexibility to load balance or move clients across regions safely when not performing critical tasks
- Health checks in Route 53 ensure traffic isn’t routed to servers that are down
How
1. Central Discovery Server
- Each regional server sends heartbeat with its unique ID, region code, and public URL
- Optionally collect telemetry/load data from each server (either directly from regional servers or via a central telemetry system)
- Discovery server maintains the active server list, their public URLs, and load information
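A minimal sketch of the registry the discovery server might keep. The record fields and the 15-second heartbeat timeout are assumptions:

```python
import time

HEARTBEAT_TIMEOUT = 15  # seconds without a heartbeat before a server is dropped (assumed)

class DiscoveryRegistry:
    """Tracks active regional servers from their heartbeats."""

    def __init__(self):
        self._servers = {}  # server_id -> {region, url, load, last_seen}

    def heartbeat(self, server_id, region, public_url, load=None, now=None):
        """Record a heartbeat carrying the server's ID, region code, and public URL."""
        now = now if now is not None else time.time()
        self._servers[server_id] = {
            "region": region,
            "url": public_url,
            "load": load,       # optional telemetry, direct or via a telemetry system
            "last_seen": now,
        }

    def active_servers(self, now=None):
        """Return only servers heard from within the timeout window."""
        now = now if now is not None else time.time()
        return {sid: rec for sid, rec in self._servers.items()
                if now - rec["last_seen"] <= HEARTBEAT_TIMEOUT}
```

Expiring stale entries here is what later lets failed servers drop out of the list that regional servers and refresh handlers consult.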
2. DNS Setup
- Each regional server gets its own URL
- Central URL uses Route 53 geo-proximity with health checks to route clients to nearest healthy server
3. Client Login & Locking
- Client hits the central geo-proximity URL
- Login request routed to nearest server
- Server returns:
  - JWT token
  - Its own server URL, which becomes the client lock
- All further requests from this client use the locked server URL
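The login-and-lock flow might look like this on the client side, assuming the login response carries jwt and serverUrl fields (field names are assumptions):

```python
class LockedClient:
    """Client that pins all requests to the server returned at login."""

    def __init__(self, central_url, transport):
        self.central_url = central_url
        self.transport = transport  # callable(url, path, payload) -> dict; injected for testing
        self.jwt = None
        self.locked_url = None

    def login(self, credentials):
        # Login goes through the geo-proximity central URL...
        resp = self.transport(self.central_url, "/login", credentials)
        # ...but every later call uses the server URL in the response (the lock).
        self.jwt = resp["jwt"]
        self.locked_url = resp["serverUrl"]

    def request(self, path, payload):
        assert self.locked_url, "login first"
        return self.transport(self.locked_url, path, payload)
```

Injecting the transport keeps the sketch independent of any particular HTTP library.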
4. Server Discovery Sync
- Regional servers periodically pull the active server list (with public URLs) + load information from discovery server (load data can originate from a central telemetry system or directly from servers)
- Enables load balancing within regions and global awareness
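The periodic pull could be sketched as a background loop; the interval and both callables (fetch_active_servers, apply) are assumptions:

```python
import threading

SYNC_INTERVAL = 10  # seconds between pulls from the discovery server (assumed)

def start_discovery_sync(fetch_active_servers, apply, stop_event, interval=SYNC_INTERVAL):
    """Periodically pull the active server list (with URLs and load) and
    hand it to `apply`, which updates the regional server's local view."""
    def loop():
        # Event.wait doubles as the sleep and the shutdown signal.
        while not stop_event.wait(interval):
            apply(fetch_active_servers())
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Because each regional server keeps its own copy of the list, refresh handling and rebalancing never block on the discovery server.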
5. Refresh Token API & Closest Server
- Before sending a refresh token request, the client calls /closestToMe on the central geo URL
  - Returns the closest server identifier
- The refresh request payload includes:
  - closestServerId
  - criticalTaskInProgress (boolean)
6. Refresh Token Handling
- If criticalTaskInProgress = true:
  - Do not switch servers
  - Refresh the token and maintain the lock with the current server
- If criticalTaskInProgress = false, check the closest server and region:
  - Same region → pick the server with the lowest load and update the token with the new server URL
  - Different region → switch the client to that server and update the token
- Ensures safe cross-region movement without disrupting active tasks
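The refresh decision can be condensed into a single function. The record shapes and field names are assumptions:

```python
def handle_refresh(current, closest, critical_task_in_progress, servers_by_region):
    """Decide which server a client should be locked to after a token refresh.

    `current` / `closest` are server records: {"id", "region", "url", "load"}.
    `servers_by_region` maps region -> list of such records (from discovery sync).
    """
    if critical_task_in_progress:
        return current  # never move a client mid multi-step operation
    if closest["region"] == current["region"]:
        # Same region: rebalance onto the least-loaded regional server.
        return min(servers_by_region[closest["region"]], key=lambda s: s["load"])
    # Different region: the client has moved; switch to the closest server.
    return closest
```

The returned record's URL would be embedded in the refreshed token as the new lock.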
7. Load Balancing
- Regional servers use closest server identifier + server load to redistribute clients
- Maintains even load distribution while keeping active sessions safe
8. Frontend Considerations
- Detect if primary server URL changed in token response
- Show user-friendly message:
- “Your primary server has changed. Any missing data will be synced within 5 minutes.”
- Ensures users are aware but do not panic over temporary replication delays
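The frontend check could be sketched as follows, assuming the refresh response echoes the (possibly new) server URL alongside the token:

```python
SYNC_MESSAGE = ("Your primary server has changed. "
                "Any missing data will be synced within 5 minutes.")

def on_refresh_response(state, resp):
    """Update client state from a refresh response; return a user-facing
    message only when the locked server changed (replication may briefly lag)."""
    message = None
    if resp["serverUrl"] != state["serverUrl"]:
        message = SYNC_MESSAGE
    state["serverUrl"] = resp["serverUrl"]
    state["jwt"] = resp["jwt"]
    return message
```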
9. Handling Server Failures
- If a server goes down, client will receive a 500-series error
- Client should wait 30 seconds with a timer: “Reconnecting…”
- During this time, discovery server confirms the server stopped sending heartbeats and updates its registry of available servers and their public URLs
- After the wait, attempt a refresh token request again
- Request will now hit the closest healthy server
- Refresh token response will include the new server URL
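The reconnect flow above could be sketched as a retry wrapper; the ServerError type, the attempt cap, and the injected sleep are assumptions:

```python
import time

class ServerError(Exception):
    """Raised when the locked server returns a 500-series error."""

RECONNECT_WAIT = 30  # seconds; lets the discovery server expire the dead server

def refresh_with_failover(send_refresh, sleep=time.sleep, max_attempts=3):
    """Retry the refresh-token call after a server failure.
    `send_refresh()` goes via the central geo URL and returns the refresh
    response (including the new server URL) or raises ServerError."""
    for attempt in range(max_attempts):
        try:
            # Route 53 health checks mean a retry lands on a healthy server.
            return send_refresh()
        except ServerError:
            if attempt == max_attempts - 1:
                raise
            sleep(RECONNECT_WAIT)  # UI shows "Reconnecting..." during this wait
```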
Thoughts / Caveats
- Client lock is critical for multi-step operations
- Discovery server is the single source of truth for server status, public URLs, and load (whether collected directly or via a central telemetry system)
- Token expiry strategy (short-lived JWT, long-lived refresh token) balances security vs availability
- Cross-region movement and load balancing happen only when safe (no critical tasks)
- Frontend intelligence improves user experience during server switches
- Route 53 health checks ensure no traffic is sent to unhealthy servers
- Automatic refresh/reconnect handles server failures without breaking client workflows