How We Verify Polymarket Data Quality: Cross-Checking Against Ground Truth

Indexing Polygon blockchain data is not a simple task. Between dealing with chain reorganizations and various edge cases, we don’t guarantee 100% data quality.

But we do verify our data quality. Here’s how.

The Challenge: Why Polygon Indexing Is Hard

When indexing blockchain data from Polygon, several issues make perfect accuracy difficult:

Chain Reorganizations (Forks)

Polygon, like most blockchains, experiences occasional chain reorganizations. A block you indexed might suddenly become invalid as the chain reorganizes. Events appear, then disappear. Transaction ordering changes.

Edge Cases

The blockchain is messy:

  • Complex smart contract interactions
  • Failed transactions with partial state
  • Events that are hard to parse correctly
  • RPC node inconsistencies

Reality: With millions of events to process, some data will slip through the cracks.

Our Verification Approach

We verify our data quality by comparing against a second source of proposed truth: Polymarket’s own API.

Polymarket maintains their official data through their infrastructure. By comparing our indexed data against their API, we can detect discrepancies and verify our accuracy.

The Verification Method: PnL Comparison

One way to verify data is by comparing profit and loss calculations. If our indexed trades are correct, we should calculate the same PnL as Polymarket’s API for any given wallet.

Calculate PnL from Our Database

Here’s the SQL query we use to calculate a wallet’s positions and PnL from our indexed data:

WITH actions AS
(
    SELECT DISTINCT
        rmm.slug                              AS slug,
        rmm.outcome                           AS outcome,
        rmm.outcome_price                     AS outcome_price,
        pofv.side                             AS side,              -- 'B' or 'S'
        pofv.amount_token                     AS amount_token_raw,  -- micro (1e6)
        pofv.amount_usdc                      AS amount_usdc_raw,   -- micro (1e6)
        (pofv.amount_token / 1e6)             AS amount_token,
        (pofv.amount_usdc / 1e6)              AS amount_usdc,
        (pofv.amount_usdc / NULLIF(pofv.amount_token, 0)) AS price_usdc_per_token_raw,
        pofv.transaction_hash                 AS transaction_hash,
        pofv.log_index                        AS log_index
    FROM polymarket.polymarket_order_filled_v2 AS pofv
    INNER JOIN polymarket.raw_market_meta AS rmm
        ON pofv.asset = rmm.clob_token_id
    WHERE pofv.wallet = '0xc919a85c2e70855016553944eabeeb533729c851'
),

pos AS
(
    SELECT
        slug,
        outcome,
        outcome_price,

        -- Net position in tokens (positive = long)
        SUM(IF(side = 'B', amount_token, -amount_token)) AS net_tokens,

        -- Net cash in USDC (positive = received; negative = spent)
        SUM(IF(side = 'S', amount_usdc, -amount_usdc))   AS net_cash_usdc,

        -- Optional helpful stats
        sumIf(amount_token, side = 'B') AS bought_tokens,
        sumIf(amount_usdc,  side = 'B') AS spent_usdc,
        sumIf(amount_token, side = 'S') AS sold_tokens,
        sumIf(amount_usdc,  side = 'S') AS received_usdc,

        (spent_usdc    / NULLIF(bought_tokens, 0)) AS avg_buy_price,
        (received_usdc / NULLIF(sold_tokens,  0)) AS avg_sell_price
    FROM actions
    GROUP BY
        slug,
        outcome,
        outcome_price
)
SELECT
    p.slug,
    p.outcome,
    p.net_tokens,
    p.net_cash_usdc,
    p.avg_buy_price,
    p.avg_sell_price,

    -- Standard unrealized PnL identity:
    -- PnL = (position * mark_price) + net_cash
    (p.net_tokens * p.outcome_price + p.net_cash_usdc) AS pnl
FROM pos AS p
ORDER BY
    p.slug,
    p.outcome;

Understanding the PnL Formula

The PnL calculation uses the position accounting identity:

PnL = (position × current_price) + net_cash_flow

Example:

  • Buy 1,000 tokens at $0.60 → Spend $600
  • Sell 400 tokens at $0.70 → Receive $280
  • Current price: $0.75

Position:

  • Net tokens: 1,000 - 400 = 600 tokens
  • Net cash: -$600 + $280 = -$320

PnL:

PnL = (600 × 0.75) + (-320)
    = $450 - $320
    = $130 profit

The query calculates this by:

  • Summing buys (+) and sells (-) to get net position
  • Summing cash received (+) and spent (-) to get net cash flow
  • Multiplying position by current price and adding net cash

More details on our verification process coming soon.


About Onchain Divers: We index and analyze blockchain data for DeFi research. Visit us at onchaindivers.com or join our Discord.

Follow us: @parrotexplore on X.