# ClearTrace — Open Dataset

Neutral, reproducible measurement of DEX **execution quality**, **revert rates**,
and **quote accuracy** — sliced by aggregator *and* by the **originating frontend**
that actually sent the trade. The attribution slice is the part no conflicted
incumbent can publish about itself.

Produced by [ClearTrace](https://cleartracedata.com) (a Rantum asset). Regenerate
with `python scripts/export_dataset.py`.

## Files

| File | What | Source / volume |
|---|---|---|
| `execution_by_frontend.csv` | Median realized slippage (bps) by **originating frontend**, with an execution-quality score (100 − toxicity). | On-chain (`dex.trades` + oracle prices), ≥300 trades/cell. |
| `revert_rates.csv` | Router **revert rate** (%) per project, over *all* routing txs. | On-chain, tens of thousands of txs/cell. |
| `execution_slippage.csv` | Median realized slippage by aggregator project. | On-chain, thousands of trades/cell. |
| `leaderboard.json` | The joined board: on-chain execution + revert + quote accuracy per aggregator, plus the by-frontend execution view. | Join of the above + the quote sample. |

## Methodology (in one screen)

**Each metric comes from where it is actually credible.**

- **Execution slippage** — per fill, the absolute relative gap between the
  minute-level oracle USD values of the bought leg and the sold leg, in bps; we
  report the per-cohort **median** (robust to outliers). Computed over *all*
  on-chain trades, not a sample.

- **Revert rate** — reverted routing txs ÷ total routing txs, counted from raw
  transactions (a reverted tx produces no trade row), deduped so a shared router
  isn't double-counted across projects. This is the failure rate observed in production.

- **Quote accuracy** — the only *sampled* metric. We capture live aggregator
  quotes for a small basket (major pairs × size cohorts), then **fork-simulate
  the aggregator's own swap calldata** against current chain state and read the
  realized output. `quote_gap = (quoted − realized) / quoted`. This is a
  forward-captured **characterization sample** (each costs compute), flagged
  `preliminary` until enough samples accrue — never presented as a precise rating
  from a handful of points. Trades routed through RFQ/PMM market-maker legs can't
  be fork-simulated (they need off-chain signed orders) and are excluded, not
  counted as failures.
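
As a rough illustration of the three definitions above, the arithmetic can be sketched in a few lines of Python. This is not the code in `scripts/export_dataset.py`: the function names are hypothetical, and the choice to divide the slippage gap by the sold leg's USD value is an assumption, since the text does not say which leg is the denominator.

```python
from statistics import median


def slippage_bps(bought_usd: float, sold_usd: float) -> float:
    """Absolute relative gap between the two legs' oracle USD values, in bps.

    Assumption: the gap is measured relative to the sold leg's value.
    """
    if sold_usd <= 0:
        raise ValueError("sold_usd must be positive")
    return abs(bought_usd - sold_usd) / sold_usd * 10_000


def cohort_median_slippage(fills) -> float:
    """Median slippage over (bought_usd, sold_usd) pairs in one cohort."""
    return median(slippage_bps(bought, sold) for bought, sold in fills)


def revert_rate_pct(reverted: int, total: int) -> float:
    """Reverted routing txs divided by total routing txs, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * reverted / total


def quote_gap(quoted: float, realized: float) -> float:
    """quote_gap = (quoted - realized) / quoted, as defined above."""
    if quoted <= 0:
        raise ValueError("quoted must be positive")
    return (quoted - realized) / quoted


# Made-up numbers, only to exercise the functions.
fills = [(999.0, 1000.0), (995.0, 1000.0), (900.0, 1000.0)]
print(cohort_median_slippage(fills))  # median is not dragged up by the 1000 bps outlier
print(revert_rate_pct(250, 10_000))
print(quote_gap(1000.0, 990.0))
```

The median is used rather than the mean so that a single badly priced fill (the third pair above) does not dominate the cohort figure.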

**Originating-frontend attribution.** Each swap tx is mapped to one originating
frontend via four vectors — calldata-suffix fingerprint, fee-recipient skim,
aggregator-router match, and router naming — with a confidence tier
(suffix = high, router/fee = medium, inferred = low). A raw pool swap with no UI
signal is **`unattributed`**, never force-labeled. Under-claiming beats
over-claiming: that's the neutrality the whole asset rests on.
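
The tiering rule can be pictured as a priority lookup over whichever signals fired for a transaction. The sketch below is illustrative only: the vector keys, the priority order, and the decision to treat router naming as the `inferred` (low) tier are assumptions, and the frontend names are placeholders.

```python
from typing import Optional

# Vector -> confidence tier. The tier names come from the text above; the
# vector keys and the mapping of router naming to "low" are assumptions.
TIER_BY_VECTOR = {
    "calldata_suffix": "high",
    "fee_recipient": "medium",
    "router_match": "medium",
    "router_naming": "low",
}

# Checked in order, strongest evidence first (assumed ordering).
PRIORITY = ["calldata_suffix", "fee_recipient", "router_match", "router_naming"]


def attribute(signals: dict[str, Optional[str]]) -> tuple[str, Optional[str]]:
    """Return (frontend, confidence) from the strongest signal that fired.

    `signals` maps a vector name to the frontend it points at, or None.
    With no signal at all, the tx stays "unattributed" instead of being guessed.
    """
    for vector in PRIORITY:
        frontend = signals.get(vector)
        if frontend:
            return frontend, TIER_BY_VECTOR[vector]
    return "unattributed", None


print(attribute({"calldata_suffix": "frontend_a", "router_match": "frontend_b"}))
print(attribute({"router_naming": "frontend_c"}))
print(attribute({}))
```

Returning `unattributed` as the fall-through case, not a best guess, mirrors the under-claiming rule stated above.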

## Honest limitations

- Frontend attribution is **Ethereum-first**; other chains currently fall to
  `unattributed`.
- Quote-accuracy cells are **preliminary** until sample counts grow.
- `confidence` columns carry the attribution strength — filter to `high`/`medium`
  for a stricter view, or to `high` alone for the strictest.
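
A minimal sketch of that confidence filter, using only the standard library. The column names (`frontend`, `median_slippage_bps`, `confidence`) and the inline sample rows are assumptions made for illustration; check the header of the exported CSV before relying on them.

```python
import csv
import io

# Stand-in for execution_by_frontend.csv; the real header may differ.
SAMPLE = """frontend,median_slippage_bps,confidence
frontend_a,4.2,high
frontend_b,6.1,medium
frontend_c,9.8,low
unattributed,7.5,
"""


def filter_by_confidence(rows, keep=("high", "medium")):
    """Keep only rows whose confidence tier is in `keep`."""
    return [row for row in rows if row.get("confidence") in keep]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
strict = filter_by_confidence(rows)
print([row["frontend"] for row in strict])  # ['frontend_a', 'frontend_b']
```

To read the real file, swap `io.StringIO(SAMPLE)` for `open("execution_by_frontend.csv", newline="")`.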

## License

Data: CC0 / public domain. Use it, cite ClearTrace.
