What
We are designing a system where clients are routed to regional servers to reduce latency, while maintaining session consistency during multi-step API calls.
Key components:
- Regional servers in different geographies
- Central discovery server tracking server heartbeats and load
- JWT tokens for authentication:
  - Primary (access) token expiry: 1–5 minutes
  - Refresh token expiry: 7–30 days
- Client-server locking: ensures multi-step requests stay on the same server
- Load balancing and failover via Route 53 geo-proximity with health checks
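The two-token split above can be sketched as claim sets. The claim names (serverId, serverUrl) and the exact TTL values are illustrative assumptions, not part of the design:

```python
import time

# Upper bounds of the ranges in the design notes (assumed values).
ACCESS_TTL = 5 * 60            # primary token: 1-5 minutes
REFRESH_TTL = 30 * 24 * 3600   # refresh token: 7-30 days

def make_claims(user_id, server_id, server_url, now=None):
    """Build access/refresh token claim sets; the server-lock claims
    (serverId, serverUrl) are hypothetical names for illustration."""
    now = int(now if now is not None else time.time())
    access = {
        "sub": user_id,
        "iat": now,
        "exp": now + ACCESS_TTL,
        "serverId": server_id,    # server the client is locked to
        "serverUrl": server_url,  # clients send follow-up requests here
    }
    refresh = {"sub": user_id, "iat": now, "exp": now + REFRESH_TTL}
    return access, refresh
```

The short-lived access token is what forces clients back through the refresh path often enough for re-locking and rebalancing to happen.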
Why
- Simple geo-proximity DNS (Route 53) is insufficient for multi-step API workflows
- Multi-step POST requests can fail if geo routing moves a client between servers mid-workflow
- Even with active-active database replication, replication lag means the newly hit server may not yet have the data written in earlier steps
- A server processing a request against that stale data causes the request to fail
- Need to lock clients to a server during critical operations
- Need flexibility to load balance or move clients across regions safely when not performing critical tasks
- Health checks in Route 53 ensure traffic isn’t routed to servers that are down
How
1. Central Discovery Server
- Each regional server sends heartbeat with its unique ID, region code, and public URL
- Optionally collect telemetry/load data from each server (either directly from regional servers or via a central telemetry system)
- Discovery server maintains the active server list, their public URLs, and load information
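A minimal sketch of the registry the discovery server might keep. The record fields and the 15-second heartbeat timeout are assumptions:

```python
import time

HEARTBEAT_TIMEOUT = 15  # seconds without a heartbeat before a server is dropped (assumed)

class DiscoveryRegistry:
    """Tracks active regional servers from their heartbeats."""

    def __init__(self):
        self._servers = {}  # server_id -> {region, url, load, last_seen}

    def heartbeat(self, server_id, region, public_url, load=None, now=None):
        """Record a heartbeat carrying the server's ID, region code, and public URL."""
        now = now if now is not None else time.time()
        self._servers[server_id] = {
            "region": region,
            "url": public_url,
            "load": load,       # optional telemetry, direct or via a telemetry system
            "last_seen": now,
        }

    def active_servers(self, now=None):
        """Return only servers heard from within the timeout window."""
        now = now if now is not None else time.time()
        return {sid: rec for sid, rec in self._servers.items()
                if now - rec["last_seen"] <= HEARTBEAT_TIMEOUT}
```

Expiring stale entries here is what later lets failed servers drop out of the list that regional servers and refresh handlers consult.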
2. DNS Setup
- Each regional server gets its own URL
- Central URL uses Route 53 geo-proximity with health checks to route clients to nearest healthy server
3. Client Login & Locking
- Client hits the central geo-proximity URL
- Login request routed to nearest server
- Server returns:
  - JWT token
  - Its own server URL, which becomes the client lock
- All further requests from this client use the locked server URL
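The login-and-lock flow might look like this on the client side, assuming the login response carries jwt and serverUrl fields (field names are assumptions):

```python
class LockedClient:
    """Client that pins all requests to the server returned at login."""

    def __init__(self, central_url, transport):
        self.central_url = central_url
        self.transport = transport  # callable(url, path, payload) -> dict; injected for testing
        self.jwt = None
        self.locked_url = None

    def login(self, credentials):
        # Login goes through the geo-proximity central URL...
        resp = self.transport(self.central_url, "/login", credentials)
        # ...but every later call uses the server URL in the response (the lock).
        self.jwt = resp["jwt"]
        self.locked_url = resp["serverUrl"]

    def request(self, path, payload):
        assert self.locked_url, "login first"
        return self.transport(self.locked_url, path, payload)
```

Injecting the transport keeps the sketch independent of any particular HTTP library.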
4. Server Discovery Sync
- Regional servers periodically pull the active server list (with public URLs) + load information from discovery server (load data can originate from a central telemetry system or directly from servers)
- Enables load balancing within regions and global awareness
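The periodic pull could be sketched as a background loop; the interval and both callables (fetch_active_servers, apply) are assumptions:

```python
import threading

SYNC_INTERVAL = 10  # seconds between pulls from the discovery server (assumed)

def start_discovery_sync(fetch_active_servers, apply, stop_event, interval=SYNC_INTERVAL):
    """Periodically pull the active server list (with URLs and load) and
    hand it to `apply`, which updates the regional server's local view."""
    def loop():
        # Event.wait doubles as the sleep and the shutdown signal.
        while not stop_event.wait(interval):
            apply(fetch_active_servers())
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Because each regional server keeps its own copy of the list, refresh handling and rebalancing never block on the discovery server.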
5. Refresh Token API & Closest Server
- Before sending a refresh token request, the client calls /closestToMe on the central geo URL
  - Returns the closest server identifier
- The refresh request payload includes:
  - closestServerId
  - criticalTaskInProgress (boolean)
6. Refresh Token Handling
- If criticalTaskInProgress = true:
  - Do not switch servers
  - Refresh the token and maintain the lock with the current server
- If criticalTaskInProgress = false, check the closest server and region:
  - Same region → pick the server with the lowest load and update the token with the new server URL
  - Different region → switch the client to that server and update the token
- Ensures safe cross-region movement without disrupting active tasks
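The refresh decision can be condensed into a single function. The record shapes and field names are assumptions:

```python
def handle_refresh(current, closest, critical_task_in_progress, servers_by_region):
    """Decide which server a client should be locked to after a token refresh.

    `current` / `closest` are server records: {"id", "region", "url", "load"}.
    `servers_by_region` maps region -> list of such records (from discovery sync).
    """
    if critical_task_in_progress:
        return current  # never move a client mid multi-step operation
    if closest["region"] == current["region"]:
        # Same region: rebalance onto the least-loaded regional server.
        return min(servers_by_region[closest["region"]], key=lambda s: s["load"])
    # Different region: the client has moved; switch to the closest server.
    return closest
```

The returned record's URL would be embedded in the refreshed token as the new lock.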
7. Load Balancing
- Regional servers use closest server identifier + server load to redistribute clients
- Maintains even load distribution while keeping active sessions safe
8. Frontend Considerations
- Detect if primary server URL changed in token response
- Show user-friendly message:
- “Your primary server has changed. Any missing data will be synced within 5 minutes.”
- Ensures users are aware but do not panic over temporary replication delays
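The frontend check could be sketched as follows, assuming the refresh response echoes the (possibly new) server URL alongside the token:

```python
SYNC_MESSAGE = ("Your primary server has changed. "
                "Any missing data will be synced within 5 minutes.")

def on_refresh_response(state, resp):
    """Update client state from a refresh response; return a user-facing
    message only when the locked server changed (replication may briefly lag)."""
    message = None
    if resp["serverUrl"] != state["serverUrl"]:
        message = SYNC_MESSAGE
    state["serverUrl"] = resp["serverUrl"]
    state["jwt"] = resp["jwt"]
    return message
```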
9. Handling Server Failures
- If a server goes down, client will receive a 500-series error
- Client should wait 30 seconds with a timer: “Reconnecting…”
- During this time, discovery server confirms the server stopped sending heartbeats and updates its registry of available servers and their public URLs
- After the wait, attempt a refresh token request again
- Request will now hit the closest healthy server
- Refresh token response will include the new server URL
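The reconnect flow above could be sketched as a retry wrapper; the ServerError type, the attempt cap, and the injected sleep are assumptions:

```python
import time

class ServerError(Exception):
    """Raised when the locked server returns a 500-series error."""

RECONNECT_WAIT = 30  # seconds; lets the discovery server expire the dead server

def refresh_with_failover(send_refresh, sleep=time.sleep, max_attempts=3):
    """Retry the refresh-token call after a server failure.
    `send_refresh()` goes via the central geo URL and returns the refresh
    response (including the new server URL) or raises ServerError."""
    for attempt in range(max_attempts):
        try:
            # Route 53 health checks mean a retry lands on a healthy server.
            return send_refresh()
        except ServerError:
            if attempt == max_attempts - 1:
                raise
            sleep(RECONNECT_WAIT)  # UI shows "Reconnecting..." during this wait
```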
Thoughts / Caveats
- Client lock is critical for multi-step operations
- Discovery server is the single source of truth for server status, public URLs, and load (whether collected directly or via a central telemetry system)
- Token expiry strategy (short-lived JWT, long-lived refresh token) balances security vs availability
- Cross-region movement and load balancing happen only when safe (no critical tasks)
- Frontend intelligence improves user experience during server switches
- Route 53 health checks ensure no traffic is sent to unhealthy servers
- Automatic refresh/reconnect handles server failures without breaking client workflows