Part of the Sanchayam series.

The Problem

When an Indian equity undergoes a 2-for-1 stock split, a holding that showed 100 shares should now show 200 shares at half the price. The historical lot prices need to be adjusted retroactively - otherwise the cost basis and XIRR calculations are wrong.

Similarly, when a company changes its name or trading symbol (Infosys was previously INFOSYSTCH, Hindustan Unilever was Hindustan Lever), historical price data retrieved by the current symbol will not cover periods before the name change.

Sanchayam resolves both problems through a backfill pipeline that runs when a new asset is added.

Three-Stage Queue

Each asset goes through three stages in sequence:

  1. symbols - resolve corporate actions (splits, bonuses, name changes, mergers) and build the asset alias table
  2. splits - fetch split and bonus events from Yahoo Finance and store them in corporate_actions
  3. prices - fetch full weekly price history from Yahoo Finance (or mfapi.in for mutual funds) back to the earliest lot date
CREATE TABLE backfill_queue (
  id           UUID PRIMARY KEY,
  asset_id     UUID NOT NULL REFERENCES assets(id),
  stage        VARCHAR NOT NULL,  -- 'symbols' | 'splits' | 'prices'
  status       VARCHAR NOT NULL DEFAULT 'pending',
  retry_count  INT NOT NULL DEFAULT 0,
  priority     INT NOT NULL DEFAULT 2,
  ...
);

Stages run in order. A worker picks the next pending item from the queue, runs the stage, marks it done, and moves on. The same FOR UPDATE SKIP LOCKED pattern used by the price feed worker prevents duplicate processing.

Rate limits between stages are enforced via collector_call_log. NSE API calls get a 5-second gap. Yahoo Finance gets 2 seconds.

Stage 1: Splits and Bonuses from Yahoo Finance

Yahoo Finance’s chart API returns adjustment factors for historical splits and bonuses. For each split event, a row is inserted into corporate_actions:

CREATE TABLE corporate_actions (
  asset_id      UUID NOT NULL REFERENCES assets(id),
  action_type   VARCHAR NOT NULL,  -- 'split' | 'bonus' | 'merger'
  action_date   DATE NOT NULL,
  ratio_from    NUMERIC,
  ratio_to      NUMERIC,
  notes         TEXT,
  CONSTRAINT corporate_actions_unique UNIQUE (asset_id, action_type, action_date, ratio_from, ratio_to)
);

A 2-for-1 split is stored as ratio_from = 1, ratio_to = 2. A 1-for-3 bonus (1 free share for every 3 held) is stored as ratio_from = 3, ratio_to = 4. The unique constraint prevents duplicates from repeated backfill runs.

Stage 2: NSE Announcements via DeepSeek

NSE publishes corporate announcements as free-text: “The name of the Company has been changed from XYZ Limited to ABC Limited w.e.f. 15-Jun-2021.” Yahoo Finance does not surface name changes or mergers consistently. Sanchayam fetches the announcement feed directly from the NSE API and uses DeepSeek to extract structured events.

The NSE API requires specific headers to avoid 403 responses:

const NSE_HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Referer': 'https://www.nseindia.com/',
  'Accept': 'application/json',
}

The announcement feed returns hundreds of entries per company. Most are irrelevant - quarterly results, board meetings, dividend announcements. DeepSeek filters for the ones that matter and extracts:

  • Name changes: old name, new name, effective date
  • Symbol changes: old symbol, new symbol, effective date
  • Mergers: merger date, counterparty (if determinable)

The extracted data feeds the asset_aliases table:

CREATE TABLE asset_aliases (
  asset_id   UUID NOT NULL REFERENCES assets(id),
  symbol     VARCHAR NOT NULL,
  name       VARCHAR NOT NULL,
  from_date  DATE NOT NULL,
  to_date    DATE,
  CONSTRAINT asset_aliases_unique UNIQUE (asset_id, symbol, from_date)
);

An alias row covers the period when the asset traded under that symbol and name. to_date = NULL means it is the current name. When fetching historical prices, the backfill worker checks the alias table to know which symbol to use for each date range.

Pre-2003 Gap

NSE’s announcement feed does not cover events before approximately 2003. If a holding has lots dated before 2003, the alias history may be incomplete. When this is detected, a CORPORATE_ACTION_PRE2003_GAP notification is emitted:

if (asset.earliest_date < '2003-01-01') {
  result.pre2003_gap = true
  emit('CORPORATE_ACTION_PRE2003_GAP', {
    symbol: asset.symbol,
    earliestDate: asset.earliest_date,
  })
}

The admin receives an email flagging that the historical record for this asset may be incomplete before 2003 and manual verification may be needed.

Stage 3: Historical Price Backfill

Once corporate actions and aliases are resolved, the prices stage fetches weekly closing prices from the earliest lot date to the present. For Indian equities and US equities, Yahoo Finance’s chart API is used with the alias-corrected symbol for each date range. For Indian mutual funds, mfapi.in provides the full NAV history.

Prices are stored in asset_price_history as one row per date. The backfill worker writes one week’s data at a time with a delay between calls to respect rate limits.

Triggering Snapshot Backfill

When the prices stage for an asset completes, the worker checks whether any other prices stages for the same asset are still pending:

if (item.stage === 'prices') {
  const [remaining] = await sql`
    SELECT COUNT(*)::int AS cnt FROM backfill_queue
    WHERE asset_id = ${item.asset_id} AND stage = 'prices'
      AND status IN ('pending', 'in_progress')
  `
  if (remaining.cnt === 0) {
    runHistoricalSnapshotBackfill().catch(...)
  }
}

When prices are fully loaded for an asset, the historical snapshot backfill is triggered. The Saturday snapshot backfill now has data to work with for the weeks before this asset was added to the system, and it will generate historical portfolio snapshots covering this asset’s full history.


Next: Deploying Sanchayam