Part of the MediaBridge series.

Why Cache S3 at All

S3 ListObjectsV2 is not free. Each call costs money, takes time, and returns at most 1,000 objects per page. A bucket with 10,000 files in a single prefix requires 10 paginated S3 calls just to render one folder. Do that on every page load and you burn money, slow the UI, and hit S3 rate limits under concurrent users.
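The page arithmetic above can be sketched as a small helper. This is an illustrative calculation only; the function names and the page-load figure in the usage note are assumptions, not part of MediaBridge.

```typescript
// S3 ListObjectsV2 returns at most 1,000 entries per response page.
const MAX_KEYS_PER_PAGE = 1000;

// Number of ListObjectsV2 calls needed to list `objectCount` entries under one prefix.
// An empty prefix still costs one call to discover that it is empty.
function listCallsNeeded(objectCount: number): number {
  if (!Number.isInteger(objectCount) || objectCount < 0) {
    throw new RangeError("objectCount must be a non-negative integer");
  }
  return Math.max(1, Math.ceil(objectCount / MAX_KEYS_PER_PAGE));
}

// Total list calls if every page load re-lists the folder (no cache).
function uncachedListCalls(objectCount: number, pageLoads: number): number {
  return listCallsNeeded(objectCount) * pageLoads;
}
```

For the 10,000-file folder, `listCallsNeeded(10000)` is 10, so a hypothetical 600 page loads would issue 6,000 list calls without a cache.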

Presigned URL generation is also not free. It is CPU work on the server, because every URL is signed locally. A folder with 50 files requires at least 50 presigned GET URL generations per load if nothing is cached, and twice that when thumbnail URLs are signed as well.

MediaBridge uses PostgreSQL to cache both: file metadata in resource_cache, URLs in url_cache. A third table, folder_index, tracks the state of the DFS traversal engine used by search. These three tables work together.

resource_cache

resource_cache stores the result of S3 folder listings. Each row represents one entry in one folder - either a file or a subfolder:

bucket_id | prefix              | resource_name | type   | uploaded_at | file_size
----------+---------------------+---------------+--------+-------------+----------
abc123    | projects/marketing/ | campaign.pdf  | file   | 2026-04-01  | 4194304
abc123    | projects/marketing/ | assets/       | folder | 2026-04-01  | NULL

The browse endpoint checks this table before touching S3:

const cached = await sql`
  SELECT id, resource_name, type, uploaded_at, file_size
  FROM resource_cache
  WHERE bucket_id = ${id} AND prefix = ${fullPrefix}
    AND resource_name NOT LIKE 't-%'
  ORDER BY type DESC, uploaded_at DESC
  LIMIT ${limit} OFFSET ${offset}
`;

if (cached.length > 0) {
  return { items: cached, total: ..., fromCache: true };
}

On a cache hit, the S3 listing is skipped entirely. On a miss, the endpoint paginates S3 to completion, writes everything into resource_cache, and returns the results. The cache is populated once and reused until an upload, delete, or Lambda webhook evicts it.
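The hit/miss flow can be sketched without a database. This is a minimal read-through sketch under stated assumptions: an in-memory `Map` stands in for resource_cache, `ListPage` is a hypothetical synchronous stand-in for one ListObjectsV2 call, and `browse` is an illustrative name rather than the actual endpoint.

```typescript
type Entry = { resource_name: string; type: "file" | "folder" };
type Page = { entries: Entry[]; nextToken?: string };

// Hypothetical stand-in for one ListObjectsV2 call (synchronous for brevity).
type ListPage = (prefix: string, token?: string) => Page;

// In-memory stand-in for resource_cache, keyed by "bucketId|prefix".
const resourceCache = new Map<string, Entry[]>();

function browse(
  bucketId: string,
  prefix: string,
  listPage: ListPage,
): { items: Entry[]; fromCache: boolean } {
  const cacheKey = `${bucketId}|${prefix}`;
  const cached = resourceCache.get(cacheKey);
  // Same rule as the endpoint: only a non-empty result counts as a hit.
  if (cached && cached.length > 0) {
    return { items: cached, fromCache: true };
  }
  // Miss: follow continuation tokens until no further page is reported.
  const all: Entry[] = [];
  let token: string | undefined;
  do {
    const page = listPage(prefix, token);
    all.push(...page.entries);
    token = page.nextToken;
  } while (token !== undefined);
  resourceCache.set(cacheKey, all);
  return { items: all, fromCache: false };
}
```

One consequence of treating only non-empty results as hits is that an empty folder is listed again on every load in this sketch.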

Thumbnail entries (t- prefixed filenames) are filtered out at the query level. They exist in S3 but are invisible to users.

url_cache

url_cache stores the resolved access URLs for each S3 object. Each row represents one file, with its display URL (CloudFront or presigned GET) and its thumbnail URL:

bucket_id | s3_key                              | file_url | thumbnail_url | expires_at
----------+-------------------------------------+----------+---------------+-----------
abc123    | projects/marketing/campaign.pdf     | https:// | https://      | 2026-05-03

The batch URL fetch endpoint resolves up to 100 keys per request. It checks url_cache first, then generates URLs only for keys that missed:

const cached = await sql`
  SELECT s3_key, file_url, thumbnail_url
  FROM url_cache
  WHERE bucket_id = ${id}
    AND s3_key = ANY(${keys})
    AND (expires_at IS NULL OR expires_at > NOW())
`;

For CloudFront buckets, expires_at is null - the CloudFront URL never expires. For private buckets, the presigned GET URL is valid for 7 days, so expires_at is set accordingly. Expired rows are skipped and regenerated on the next request.
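The expiry rule and the hit/miss split can be sketched as pure functions. The helper names (`expiresAtFor`, `isFresh`, `missedKeys`) and the row type are assumptions for illustration; only the 7-day lifetime and the null-means-never-expires rule come from the text above.

```typescript
type UrlRow = {
  s3_key: string;
  file_url: string;
  thumbnail_url: string;
  expires_at: Date | null;
};

const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60; // 604,800 seconds

// expires_at for a new row: null for CloudFront URLs, now + TTL for presigned URLs.
function expiresAtFor(
  isPrivate: boolean,
  now: Date,
  ttlSeconds: number = SEVEN_DAYS_SECONDS,
): Date | null {
  return isPrivate ? new Date(now.getTime() + ttlSeconds * 1000) : null;
}

// Mirrors the SQL predicate: expires_at IS NULL OR expires_at > NOW().
function isFresh(row: UrlRow, now: Date): boolean {
  return row.expires_at === null || row.expires_at.getTime() > now.getTime();
}

// Keys that need a newly generated URL: requested, but with no fresh cached row.
function missedKeys(requested: string[], rows: UrlRow[], now: Date): string[] {
  const hit = new Set(rows.filter((r) => isFresh(r, now)).map((r) => r.s3_key));
  return requested.filter((k) => !hit.has(k));
}
```

In practice the query already filters out expired rows, so `isFresh` matters only if rows are checked again in application code.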

Cache misses are resolved in parallel:

const newEntries = await Promise.all(missedKeys.map(async (key) => {
  if (isPrivate) {
    const [fileUrl, thumbnailUrl] = await Promise.all([
      getSignedUrl(s3Client!, new GetObjectCommand({ ... }), { expiresIn: CONFIG.presignedGet.expiresInSeconds }),
      getSignedUrl(s3Client!, new GetObjectCommand({ ... }), { expiresIn: CONFIG.presignedGet.expiresInSeconds }),
    ]);
    return { s3_key: key, file_url: fileUrl, thumbnail_url: thumbnailUrl, expires_at: ... };
  } else {
    return { s3_key: key, file_url: `${bucket.cloudfront_base_url}/${key}`, thumbnail_url: `${bucket.cloudfront_base_url}/t-${filename}`, expires_at: null };
  }
}));

New entries are inserted with ON CONFLICT DO NOTHING to handle concurrent requests safely - if two requests race to generate the same URL, the second insert is silently discarded.
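The first-writer-wins behaviour can be modelled in memory. This is a sketch only: a `Map` stands in for url_cache with a unique key on (bucket_id, s3_key), and `insertIfAbsent` is a hypothetical name.

```typescript
type UrlEntry = { file_url: string; thumbnail_url: string };

// In-memory stand-in for url_cache, keyed by "bucketId|s3Key".
const urlCache = new Map<string, UrlEntry>();

// Returns true if the row was inserted, false if an existing row won
// (the ON CONFLICT DO NOTHING case).
function insertIfAbsent(bucketId: string, s3Key: string, entry: UrlEntry): boolean {
  const cacheKey = `${bucketId}|${s3Key}`;
  if (urlCache.has(cacheKey)) return false;
  urlCache.set(cacheKey, entry);
  return true;
}
```

Under this model any existing row for the key wins, including a stale one, so an expired row would need to be removed (or the conflict clause changed to an update) before its replacement can land. The source does not say which approach MediaBridge takes.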

folder_index

folder_index is never read by the browse endpoint; it is the traversal state for the DFS search engine. Each row represents one prefix and tracks how much of its subtree has been indexed:

bucket_id | prefix              | is_listed | is_complete | pending_children | indexed_at
----------+---------------------+-----------+-------------+------------------+-----------
abc123    | projects/           | true      | false       | 2                | NULL
abc123    | projects/marketing/ | true      | true        | 0                | 2026-04-01
abc123    | projects/assets/    | false     | false       | 0                | NULL

is_listed means the direct contents of this prefix have been read (either from S3 or from the browse cache). is_complete means the entire subtree - this prefix and all descendants - has been fully traversed and cached. pending_children is a reference counter tracking how many child folders have not yet completed.
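The reference counter can be sketched as follows. This is one plausible reading of how completion propagates, not the engine's actual code: `FolderIndex` is an in-memory stand-in for folder_index rows of a single bucket, and `parentPrefix` and `markComplete` are hypothetical helpers.

```typescript
type FolderState = { isListed: boolean; isComplete: boolean; pendingChildren: number };

// In-memory stand-in for the folder_index rows of one bucket, keyed by prefix.
type FolderIndex = Map<string, FolderState>;

// "projects/marketing/" -> "projects/"; "projects/" -> "" (bucket root); "" -> null.
function parentPrefix(prefix: string): string | null {
  if (prefix === "") return null;
  const trimmed = prefix.slice(0, -1); // drop the trailing "/"
  const cut = trimmed.lastIndexOf("/");
  return cut === -1 ? "" : trimmed.slice(0, cut + 1);
}

// Mark `prefix` complete if it is listed and has no pending children, then
// decrement the parent's counter. A parent that reaches zero is completed in
// turn, and the walk continues upward.
function markComplete(index: FolderIndex, prefix: string): void {
  let current = prefix;
  while (true) {
    const row = index.get(current);
    if (!row || row.isComplete || !row.isListed || row.pendingChildren > 0) return;
    row.isComplete = true;
    const parent = parentPrefix(current);
    if (parent === null) return;
    const parentRow = index.get(parent);
    if (!parentRow) return;
    parentRow.pendingChildren = Math.max(0, parentRow.pendingChildren - 1);
    current = parent;
  }
}
```

The early return on `isComplete` keeps a repeated call for the same prefix from decrementing the parent twice.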

When browsing writes to resource_cache, it also sets is_listed = true on folder_index for that prefix. This tells the search engine it can use resource_cache for that folder without hitting S3 again.

When the search engine finishes a folder and all its children, it marks is_complete = true. Future searches that encounter a complete prefix can return all matching results from resource_cache via a SQL LIKE query instead of traversing S3 at all.
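A LIKE-based subtree lookup needs its patterns built carefully, because `%` and `_` are wildcards and may legitimately appear in S3 key names. The helpers below are a sketch; their names are assumptions, and they rely on backslash being the escape character, which is PostgreSQL's default for LIKE.

```typescript
// Escape LIKE wildcards (and the escape character itself) so that a value such
// as "my_files/100%/" is matched literally.
function escapeLike(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Pattern matching every cached prefix at or below a completed prefix.
function subtreePattern(prefix: string): string {
  return `${escapeLike(prefix)}%`;
}

// Pattern for a substring match on resource_name.
function namePattern(term: string): string {
  return `%${escapeLike(term)}%`;
}

// Hypothetical usage with the tagged-template client shown earlier:
//   WHERE bucket_id = ${id}
//     AND prefix LIKE ${subtreePattern(prefix)}
//     AND resource_name LIKE ${namePattern(term)}
```

Passing the patterns as bound parameters, as the tagged template does, keeps the escaping concerned with LIKE semantics only.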

Eviction

Three things evict cached data: uploads, deletes, and the Lambda webhook.

Upload eviction (on presign): The resource_cache for the upload prefix and all ancestor prefixes is deleted, the url_cache entry for the specific key is evicted, and the folder_index for the prefix and ancestors is marked incomplete:

for (const p of [...new Set(prefixesToEvict)]) {
  await sql`DELETE FROM resource_cache WHERE bucket_id = ${bucketId} AND prefix = ${p}`;
}
await evictUrlCache(bucketId, s3Key);
await markFolderIncomplete(bucketId, fullPath);

markFolderIncomplete walks the ancestor chain and sets is_listed = false, is_complete = false on each prefix in folder_index. The next browse or search will re-fetch from S3.
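The ancestor chain can be computed with plain string handling. This is a sketch with a hypothetical helper name; whether the bucket root (the empty prefix) is part of the chain is an assumption.

```typescript
// "projects/marketing/assets/" ->
//   ["projects/marketing/assets/", "projects/marketing/", "projects/", ""]
function prefixAndAncestors(prefix: string): string[] {
  const parts = prefix.split("/").filter((p) => p.length > 0);
  const result: string[] = [];
  for (let depth = parts.length; depth > 0; depth--) {
    result.push(parts.slice(0, depth).join("/") + "/");
  }
  result.push(""); // bucket root; including it is an assumption
  return result;
}
```

The returned list could serve both as the set of resource_cache prefixes to delete and as the set of folder_index rows to mark incomplete.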

Delete eviction: Same pattern - resource_cache for the folder, url_cache for the key, folder_index ancestors marked incomplete.

Lambda webhook eviction: When a file changes in S3 via any means (rclone, CLI, direct AWS console), the Lambda fires a POST /cache/evict webhook. The backend resolves the bucket, evicts the resource_cache rows for the affected prefix, evicts the url_cache entries for the key, and marks folder_index ancestors incomplete. This keeps the browser view consistent with the actual bucket state even when changes happen outside MediaBridge.
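Before anything can be evicted, the webhook has to turn an object key into the affected prefix. The sketch below shows that step. Both helper names are assumptions. S3 event notifications deliver object keys URL-encoded, with spaces as `+`, so the decoding step applies only if the Lambda forwards the key as received, which the source does not specify.

```typescript
// Decode an object key as it appears in an S3 event notification.
// Assumption: the key reaches the backend still URL-encoded.
function decodeEventKey(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Split an object key into the folder prefix and the entry name.
// "projects/marketing/campaign.pdf" ->
//   { prefix: "projects/marketing/", name: "campaign.pdf" }
function splitKey(s3Key: string): { prefix: string; name: string } {
  const cut = s3Key.lastIndexOf("/");
  return { prefix: s3Key.slice(0, cut + 1), name: s3Key.slice(cut + 1) };
}
```

A key at the bucket root, such as `top.txt`, yields the empty prefix.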

Browsing Links folder_index to Search

When a user browses a folder for the first time, the backend calls ListObjectsV2 and populates resource_cache. It also writes to folder_index:

const subFolderCount = rows.filter(r => r.type === 'folder').length;
sql`
  INSERT INTO folder_index (bucket_id, prefix, is_listed, is_complete, pending_children)
  VALUES (${id}, ${fullPrefix}, true, false, ${subFolderCount})
  ON CONFLICT (bucket_id, prefix) DO UPDATE SET
    is_listed = true,
    pending_children = ${subFolderCount}
`.catch(() => {});

This is the link between browsing and searching. Every folder a user browses gets pre-indexed for free. When a search later reaches that folder, the traversal engine sees is_listed = true and skips the S3 call - it already has the data in resource_cache. Browsing progressively warms the search index without any extra work.
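The decision the traversal engine makes at each folder can be sketched as a small function over a folder_index row. The plan names and `planFor` are illustrative assumptions; only the meaning of is_listed and is_complete comes from the text.

```typescript
type FolderRow = { is_listed: boolean; is_complete: boolean };

type Plan =
  | "query-subtree-from-cache" // whole subtree is cached: answer with SQL only
  | "read-listing-from-cache"  // direct contents are cached: skip the S3 call
  | "list-from-s3";            // nothing usable cached: call ListObjectsV2

function planFor(row: FolderRow | undefined): Plan {
  if (!row) return "list-from-s3"; // prefix never browsed or searched
  if (row.is_complete) return "query-subtree-from-cache";
  if (row.is_listed) return "read-listing-from-cache";
  return "list-from-s3";
}
```

A folder that a user has browsed but whose children are still unexplored lands in the middle case: its listing is free, and only its subfolders may still require S3 calls.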

Next: Streaming DFS Search over WebSocket
