Part of the MediaBridge series.
Why Not a Normal HTTP Search
S3 has no search API. To find a file, you have to list folders. A bucket with a deep directory tree could require hundreds of ListObjectsV2 calls to walk completely, each taking 100-500ms. A synchronous HTTP endpoint would either time out or make the user wait minutes for a response.
The alternative is to stream results as they arrive. Start the traversal, send each matching file to the browser the moment it is found, let the user see results accumulating in real time. WebSocket is the natural transport: a single persistent connection that the server writes to continuously.
Connection and Auth
The search endpoint is a WebSocket at GET /buckets/:id/search. HTTP headers are not a reliable place for auth in WebSocket - the spec does not support custom headers in the browser’s native WebSocket API.
Instead, auth happens via the first message the client sends after the connection opens:
ws.onopen = () => {
ws.send(JSON.stringify({ token, q, prefix, scope }));
};
The server’s once('message') handler reads this first message, verifies the JWT, checks the force_logout table, validates the query (minimum 3 characters), and confirms the user has access to the bucket - all before starting any traversal. If any check fails, the socket closes immediately.
The AbortController is wired to the socket’s close event. When the socket closes - whether from a normal completion, a browser tab closed, or a network drop - the signal fires and the traversal engine stops processing new folders.
Message Protocol
The server sends a stream of typed JSON messages:
type WsMessage =
| { type: 'traversing'; folder: string }
| { type: 'cache_hit'; folder: string; count: number }
| { type: 'result'; item: SearchResultItem }
| { type: 'folder_done'; folder: string }
| { type: 'complete' }
| { type: 'error'; code: string; message: string };
traversing signals that a new folder is being entered (useful for showing “Searching projects/marketing/…” in the UI). cache_hit signals that a folder was served from the PostgreSQL cache without an S3 call. result delivers a matching file or folder. complete signals the traversal is finished.
The Engine: LIFO Stack, 10 Workers
The traversal engine is a TraversalEngine class that manages a shared stack of prefixes to process and a pool of concurrent workers.
class TraversalEngine {
private stack: string[] = []; // LIFO
private activeWorkers = 0;
async run(): Promise<void> {
this.stack.push(this.opts.startPrefix);
await new Promise<void>((resolve) => {
const spawnWorkers = () => {
while (this.activeWorkers < MAX_WORKERS && this.stack.length > 0) {
const prefix = this.stack.pop()!;
this.activeWorkers++;
this.processFolder(prefix)
.catch(() => {})
.finally(() => {
this.activeWorkers--;
spawnWorkers();
tryResolve();
});
}
tryResolve();
};
spawnWorkers();
});
}
}
MAX_WORKERS is 10. Workers run concurrently - up to 10 folders can be traversed simultaneously. When a worker finishes a folder, it calls spawnWorkers() which immediately picks up the next prefix from the stack. The engine resolves when the stack is empty and no workers are active.
The stack is LIFO (last-in, first-out). Sub-folders discovered during traversal are pushed in reverse order so the first discovered sub-folder stays at the top of the stack. This produces a depth-first traversal: the engine dives deep into one branch before exploring siblings. For file systems that mirror how people organize work, this tends to surface relevant results faster than breadth-first - you are more likely to find a file near where you started looking.
Workers do not block waiting for their children. A worker lists a folder, pushes the sub-folders onto the shared stack, and exits. The children are picked up by the pool as capacity becomes available. No worker is ever waiting for a subtree to complete - it just hands off and moves on.
Three States per Folder
Each folder goes through one of three paths in processFolder:
State 1: is_complete in folder_index. The entire subtree under this prefix has been fully traversed and cached. The engine queries resource_cache directly with a LIKE prefix match:
if (indexEntry?.is_complete) {
const results = await sql`
SELECT resource_name, prefix, type, file_size, uploaded_at
FROM resource_cache
WHERE bucket_id = ${bucketId}
AND prefix LIKE ${prefix + '%'}
AND resource_name ILIKE ${'%' + q + '%'}
AND resource_name NOT LIKE 't-%'
`;
this.opts.send({ type: 'cache_hit', folder: prefix, count: results.length });
for (const row of results) { this.opts.send({ type: 'result', item: ... }); }
await this.cascadeComplete(prefix);
return;
}
No S3 call. No sub-folder traversal. A single SQL query returns everything matching the search term in the entire subtree. The engine then cascades the completion upward.
State 2: is_listed in folder_index. The direct contents of this prefix are in resource_cache (populated by a previous browse), but the subtree is not fully indexed. The engine reads the cached rows, emits matching results, discovers sub-folders, and pushes them onto the stack - all without hitting S3:
if (indexEntry?.is_listed) {
const cachedRows = await sql`
SELECT resource_name, type, file_size, uploaded_at
FROM resource_cache
WHERE bucket_id = ${bucketId} AND prefix = ${prefix}
`;
// ... emit results, push sub-folders
}
State 3: No cache. The engine calls ListObjectsV2, streams results as they arrive, writes each entry to resource_cache as a fire-and-forget insert, and pushes sub-folders onto the stack.
Reference Counting and Completion Cascade
The is_complete flag does not get set immediately when a folder finishes. A folder is complete only when every folder in its subtree is also complete. This is tracked with pending_children.
When a folder is processed, pending_children is set to the number of sub-folders it contains:
await sql`
INSERT INTO folder_index (bucket_id, prefix, is_listed, is_complete, pending_children)
VALUES (${bucketId}, ${prefix}, true, false, ${subFolders.length})
ON CONFLICT (bucket_id, prefix) DO UPDATE SET
pending_children = ${subFolders.length}
`;
When a child folder completes, it decrements its parent’s counter:
private async cascadeToParent(prefix: string): Promise<void> {
const parent = getParentPrefix(prefix);
if (parent === null) return;
const updated = await sql`
UPDATE folder_index
SET pending_children = GREATEST(pending_children - 1, 0)
WHERE bucket_id = ${this.opts.bucketId} AND prefix = ${parent}
RETURNING pending_children
`;
if (updated[0]?.pending_children === 0) {
await this.markComplete(parent);
}
}
When pending_children reaches zero, markComplete is called on the parent. markComplete sets is_complete = true and cascades upward to the grandparent. This bubbles all the way up to the root prefix - when the root’s counter hits zero, the entire bucket is marked complete.
A leaf folder (no sub-folders) is marked complete immediately. Its parent’s counter drops by one. If the parent was waiting on only that one child, the parent completes too. The cascade propagates upward until it hits a folder that still has pending children.
Cache-hit folders (state 1) also participate in the cascade via cascadeComplete. Even though they did not traverse S3, they still decrement their parent’s counter, which is correct: their subtree was already complete before this search started.
Eviction Resets the Index
When a file is uploaded or deleted, markFolderIncomplete walks the ancestor chain and resets is_listed = false, is_complete = false on every affected prefix. The next search re-traverses those folders. The completeness cascade only fires again once all children of a given folder complete again.
This keeps the search index consistent with actual bucket contents without requiring a full re-index. Only the affected branch of the tree is invalidated.