Build a Competitor Ad Database: Schema, Pipeline, and Queries
A screenshots folder forgets what your competitors ran. Four tables and a nightly cron remember it, and answer in SQL.
Sections
Most competitor research dies in a screenshots folder. You spot a strong hook, grab it, paste it into a doc, and three months later nobody can tell you whether that brand doubled its video output or quietly killed its UGC angle. A competitor ad database fixes this. Pull ads through an API on a schedule, land them in tables you own, and every creative question becomes a SQL query instead of an afternoon of scrolling.
TL;DR: Build a competitor ad database with four tables: advertisers, ads, snapshots, and enrichments. Feed it from the AdLibrary API on a nightly cron, dedup on
ad_key, and snapshot engagement daily. SQLite is fine to start, DuckDB wins for analysis, Postgres wins for teams. The eight queries below cover creative velocity, format mix, kill rate, hook reuse, and platform expansion.
You do not need a data engineering background for any of this. If you can write a SELECT, you can run the whole stack. The schema fits on one screen, the pipeline is sixty lines of Python, and the payoff compounds every single night the cron fires.
Why a competitor ad database beats a folder of screenshots
Three arguments, in descending order of how often people ignore them.
History outlives platform retention. Public ad libraries are windows, not archives. Meta's Ad Library API keeps ads about social issues, elections, and politics for seven years, but ordinary commercial ads drop out of public view once they stop running (EU and UK ads stay queryable for roughly a year). Google's Ads Transparency Center behaves the same way: a useful lens on what is in-market right now, a poor record of what ran last quarter. The moment a competitor kills a campaign, the public evidence starts evaporating. Your database keeps it. We covered the broader version of this problem in why your historical ad data sits unused, and the same logic applies to competitor data with extra force, because you cannot re-export what a platform no longer shows. Manual export workflows hit the same wall.
Trends need time series. A single pull tells you what is live today. The questions that actually change your media plan are deltas. Did their launch rate jump after the funding round? Is video share rising quarter over quarter? Is that carousel concept gaining impressions or bleeding out? None of those are answerable from one snapshot. They require the same ads observed repeatedly, with each observation stored.
Owned data joins. Competitor launch dates sitting next to your own CPM and CTR history is where competitive intelligence stops being a mood board and starts being analysis. A screenshot cannot join. A row can.
The feed: what the API gives you to store
Meta's free Ad Library API is the original transparency feed and deserves credit for existing at all. For a Meta-only, politics-focused question it works. The catch for a database builder is coverage: outside the EU and UK it returns political and social-issue ads, and it covers exactly one company's platforms. Meta's free API is fine for one platform. The moment you want TikTok, YouTube, or LinkedIn creatives in the same table, you need a different feed.
The AdLibrary API is that feed: one adl_ key, commercial ads across Facebook, Instagram, TikTok, YouTube, Google, LinkedIn, Twitter, Pinterest and more, returned as flat JSON through one search endpoint. No app review, no token that expires every 60 days. Each result carries the fields a schema actually wants:
ad_key— the stable creative identifier across calls. This is your primary key and your dedup key.advertiser_name,platform,ads_type(1 image, 2 video, 3 carousel, 4 collection)- Ad creative and copy:
title,body,button_text,preview_img_url,video_url,landing_page_url - Performance signals:
impression,heat(a 0–1000 momentum score),like_count,share_count,comment_count - Time and place:
first_seen,last_seen(Unix timestamps),days_count(runtime),geo
Two honesty notes before you design columns around these. Impressions come back as bucketed ranges on Meta, not exact counts, so treat them as bands. And spend figures are always estimates, never advertiser-reported numbers. Store both, label both accordingly.
A first call looks like this:
curl -X POST "https://adlibrary.com/api/search" \
-H "Authorization: Bearer adl_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"keyword": "protein powder",
"appType": "3",
"platform": ["facebook", "instagram", "tiktok"],
"geo": ["USA"],
"daysBack": 30,
"sortField": "-last_seen",
"page": 1
}'
The response includes results[], a total count for the whole match set, page, and a _credits object with your remaining balance. That total is also your free-ish market-sizing instrument: one search tells you how many ads exist for a keyword before you commit to paging through all of them. The multi-platform coverage is the entire reason this works as a single schema rather than eleven scrapers, an approach that breaks the way scraping tools always eventually break.
Competitor ad database schema: four tables that earn their keep
Resist the one-big-table instinct. Ads have immutable facts (the creative, the copy, the launch date) and mutable facts (impressions, heat, engagement counts). Mixing them means every refresh overwrites history, which defeats the point of owning the data. Four tables separate them cleanly:
CREATE TABLE advertisers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
domain TEXT,
saved_advertiser_id TEXT, -- uuid from POST /api/advertisers
first_tracked TEXT DEFAULT (date('now'))
);
CREATE TABLE ads (
ad_key TEXT PRIMARY KEY, -- stable id from the API
advertiser_id INTEGER REFERENCES advertisers(id),
advertiser_name TEXT,
platform TEXT NOT NULL,
ads_type INTEGER, -- 1 image, 2 video, 3 carousel, 4 collection
title TEXT,
body TEXT,
button_text TEXT,
landing_page_url TEXT,
preview_img_url TEXT,
video_url TEXT,
geo TEXT, -- JSON array stored as text
first_seen INTEGER NOT NULL, -- unix ts
last_seen INTEGER NOT NULL,
is_active INTEGER DEFAULT 1,
inserted_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX idx_ads_advertiser ON ads(advertiser_id, first_seen);
CREATE TABLE snapshots (
ad_key TEXT REFERENCES ads(ad_key),
captured_at TEXT NOT NULL, -- date of the pull
impression INTEGER, -- bucketed band, not exact
heat INTEGER, -- 0-1000 momentum score
like_count INTEGER,
share_count INTEGER,
comment_count INTEGER,
days_count INTEGER,
PRIMARY KEY (ad_key, captured_at)
);
CREATE TABLE enrichments (
ad_key TEXT PRIMARY KEY REFERENCES ads(ad_key),
summary TEXT,
analysis TEXT,
transcription TEXT,
enriched_at TEXT DEFAULT (datetime('now'))
);
Design decisions worth naming:
ad_key as the primary key. The API guarantees it is stable across calls, which makes dedup a one-line ON CONFLICT clause instead of a fuzzy-matching project. Same creative seen in tonight's pull and last week's pull lands on the same row.
Snapshots are append-only. Each nightly run inserts one row per live ad. That turns mutable counters into a time series you can diff. The composite primary key (ad_key, captured_at) makes accidental double-runs harmless.
Enrichments live apart. AI teardowns from the enrichment endpoint cost a credit each, so you will only ever enrich a minority of ads. A sparse separate table beats nullable columns on ads.
Advertisers carry the API's saved-advertiser uuid. Save a brand once via POST /api/advertisers with its platform IDs and the curate endpoint pulls all of its accounts, merged and deduped, in one call. The local table maps that uuid to your own foreign key.
SQLite, Postgres, or DuckDB: an honest comparison
Every tutorial pretends this choice is hard. For a competitor ad database under a few million rows, it mostly is not, and the honest answer is unfashionable: start with SQLite.
| SQLite | Postgres | DuckDB | |
|---|---|---|---|
| Setup | Zero. One file. | A server to run and patch | Zero. One file. |
| Writes | One writer at a time | Many concurrent writers | Single-process |
| Analytical speed | Fine at this scale | Good with indexes | Exceptional (columnar) |
| Team access | Awkward | Native | Awkward |
| Best fit | Solo builder, nightly cron | Team, dashboards, app on top | Heavy ad-hoc analysis |
SQLite's own guidance says it outright: for a single-application data store with modest write concurrency, it is the right default, and a nightly ingestion job with one writer is exactly that. Five competitors tracked for two years lands you in the tens of thousands of ad rows and the low millions of snapshot rows. SQLite shrugs at that.
Move to Postgres when a second writer or a dashboard shows up. Concurrent access, JSONB for the raw API payloads you were too lazy to normalize, and every BI tool speaks to it. If the database feeds a Metabase instance for the whole team, Postgres is the grown-up answer.
DuckDB is the analyst's cheat code rather than a primary store. It is columnar and built for OLAP, so aggregations over millions of snapshot rows return in milliseconds, and it can query your SQLite file or Parquet exports directly. A pattern that works well in practice: ingest into SQLite or Postgres, then point DuckDB at the data for the heavy exploratory queries later in this guide.
What you should not do is reach for BigQuery on day one. If your stack already lives there, fine, the warehouse route is well-trodden and managed connectors like Airbyte can carry your own performance data in alongside. But a cloud warehouse for five competitors is a yak shave.
The ingestion pipeline: fetch, dedup, upsert, snapshot
The pipeline has four jobs: call the API, deduplicate on ad_key, upsert immutable facts, append a snapshot. Here it is in Python with the standard library plus requests:
import requests, sqlite3, json, time
API = "https://adlibrary.com/api"
HEADERS = {"Authorization": "Bearer adl_your_api_key"}
db = sqlite3.connect("competitor_ads.db")
def pull_keyword(keyword, max_pages=2):
for page in range(1, max_pages + 1):
r = requests.post(f"{API}/search", headers=HEADERS, json={
"keyword": keyword,
"appType": "3",
"platform": ["facebook", "instagram", "tiktok"],
"geo": ["USA"],
"daysBack": 30,
"sortField": "-last_seen",
"page": page,
})
r.raise_for_status()
data = r.json()
upsert(data["results"])
if page * data["pageSize"] >= data["total"]:
break
time.sleep(7) # stay under the 10 requests/min limit
def upsert(results):
today = time.strftime("%Y-%m-%d")
for ad in results:
db.execute("""
INSERT INTO ads (ad_key, advertiser_name, platform, ads_type,
title, body, button_text, landing_page_url,
preview_img_url, video_url, geo,
first_seen, last_seen)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)
ON CONFLICT(ad_key) DO UPDATE SET
last_seen = excluded.last_seen, is_active = 1
""", (ad["ad_key"], ad.get("advertiser_name"), ad.get("platform"),
ad.get("ads_type"), ad.get("title"), ad.get("body"),
ad.get("button_text"), ad.get("landing_page_url"),
ad.get("preview_img_url"), ad.get("video_url"),
json.dumps(ad.get("geo", [])),
ad.get("first_seen"), ad.get("last_seen")))
db.execute("""
INSERT OR IGNORE INTO snapshots
(ad_key, captured_at, impression, heat, like_count,
share_count, comment_count, days_count)
VALUES (?,?,?,?,?,?,?,?)
""", (ad["ad_key"], today, ad.get("impression"), ad.get("heat"),
ad.get("like_count"), ad.get("share_count"),
ad.get("comment_count"), ad.get("days_count")))
db.commit()
The ON CONFLICT clause is the entire dedup strategy. New creative, new row. Known creative, refreshed last_seen and a fresh snapshot. Nothing else to maintain.
For the brands you track persistently, keyword search is the wrong primitive. Resolve the brand once with the free GET /api/advertisers/search lookup, save it with POST /api/advertisers, and have the nightly job hit POST /api/advertisers/{id}/curate instead. Curate fans out across the brand's saved accounts on every platform and costs one credit per 30-minute session, with pagination inside that session free. One brand is rarely one account, so this also saves you from the Nike-versus-Nike-Football identity mess.
Schedule it however you already schedule things. A cron line on any box, a GitHub Action, an n8n workflow if you want retries and alerting without writing them, or a Claude Code agent if the pipeline is part of a larger research loop. Teams running automated competitor monitoring usually settle on nightly pulls at 6am, so the morning's analysis sees yesterday's launches.

Refresh and retention: keeping the database alive
A schedule without a retention policy produces a swamp. Four rules keep the database trustworthy:
Mark dead, never delete. When an ad stops appearing in pulls, flip is_active to 0 after seven consecutive missing days rather than deleting the row. Dead ads are half your analytical value. Kill rate, creative fatigue curves, and survivorship comparisons all depend on the losers staying in the table.
Snapshot live ads daily, dead ads never. Snapshots of an inactive ad are noise. Gate the snapshot insert on the ad having appeared in tonight's pull.
Thin old snapshots, keep all ads. Ad rows are small and forever. Snapshot rows grow linearly with tracking scope, so after 18 months downsample them to weekly granularity. You lose nothing you will ever query at daily resolution that far back.
Enrich selectively, on a threshold. Run AI enrichment only when an ad proves it deserves analysis, say a runtime past 21 days or a heat score above your watchlist median. Each call returns the full teardown (transcript, strategic read, persuasion analysis) for one credit on text and short video, and the enrichments table caches it forever. Enriching everything on ingest is how people burn a month of credits in a week.
Twenty minutes of monitoring design up front decides whether this database is an asset or a liability in a year.
Eight queries that pay for the whole setup
This is the part screenshots can never do. All eight run on the schema above in SQLite syntax.
1. Creative velocity by week. Who is shipping, who is coasting. Velocity spikes usually precede offers and product launches.
SELECT a.name,
strftime('%Y-%W', ads.first_seen, 'unixepoch') AS week,
COUNT(*) AS new_ads
FROM ads JOIN advertisers a ON a.id = ads.advertiser_id
WHERE ads.first_seen >= strftime('%s', 'now', '-90 days')
GROUP BY a.name, week
ORDER BY week DESC, new_ads DESC;
2. Format mix, trailing 90 days. Catches strategic shifts like a quiet pivot from static to video.
SELECT a.name,
CASE ads.ads_type WHEN 1 THEN 'image' WHEN 2 THEN 'video'
WHEN 3 THEN 'carousel' ELSE 'collection' END AS format,
COUNT(*) AS n,
ROUND(100.0 * COUNT(*) /
SUM(COUNT(*)) OVER (PARTITION BY a.name), 1) AS pct
FROM ads JOIN advertisers a ON a.id = ads.advertiser_id
WHERE ads.first_seen >= strftime('%s', 'now', '-90 days')
GROUP BY a.name, format;
3. The survivors. Runtime is the strongest public signal an ad converts, because losers get killed early and winners get budget.
SELECT advertiser_name, title,
(last_seen - first_seen) / 86400 AS runtime_days
FROM ads
ORDER BY runtime_days DESC
LIMIT 25;
4. Kill rate. What share of each competitor's ads die inside 14 days. A high kill rate means aggressive testing, and their survivors deserve your closest attention.
SELECT a.name,
ROUND(100.0 * AVG(CASE WHEN (ads.last_seen - ads.first_seen)
< 14 * 86400 THEN 1.0 ELSE 0 END), 1) AS pct_dead_14d
FROM ads JOIN advertisers a ON a.id = ads.advertiser_id
WHERE ads.is_active = 0
GROUP BY a.name;
5. Hook reuse. Group by the opening of the body copy to find the hooks a competitor keeps going back to. Repetition is conviction.
SELECT substr(body, 1, 60) AS hook_start,
COUNT(*) AS uses,
COUNT(DISTINCT advertiser_id) AS advertisers
FROM ads
WHERE body IS NOT NULL
GROUP BY hook_start
HAVING uses >= 3
ORDER BY uses DESC;
6. Momentum, week over week. The snapshots table earns its storage here: which live ads gained the most impressions in seven days.
SELECT s1.ad_key,
s1.impression - s0.impression AS impression_gain
FROM snapshots s1
JOIN snapshots s0
ON s0.ad_key = s1.ad_key
AND s0.captured_at = date(s1.captured_at, '-7 days')
WHERE s1.captured_at = date('now')
ORDER BY impression_gain DESC
LIMIT 20;
7. Landing page concentration. Many creatives pointing at one URL means a funnel they believe in. New URLs flag new offers before any newsletter announces them.
SELECT landing_page_url,
COUNT(*) AS creatives,
date(MIN(first_seen), 'unixepoch') AS first_launch
FROM ads
WHERE advertiser_id = 3
GROUP BY landing_page_url
ORDER BY creatives DESC;
8. Platform expansion. A competitor's first-ever ad on a new platform is a strategy document published for free.
SELECT a.name, ads.platform,
date(MIN(ads.first_seen), 'unixepoch') AS first_ad_on_platform
FROM ads JOIN advertisers a ON a.id = ads.advertiser_id
GROUP BY a.name, ads.platform
HAVING first_ad_on_platform >= date('now', '-30 days')
ORDER BY first_ad_on_platform DESC;
Each query is a standing question. Wire the interesting ones to a weekly digest email and the database starts talking to you, the diff-and-alert pattern from our competitor ad research framework with SQL doing the diffing.
Join it with your own performance data
Owning competitor data pays double once it sits beside your own first-party numbers. Export your campaign metrics daily, load them into a my_performance table keyed by date, and a new class of question opens up.
Did your CPM rise in the weeks a competitor's velocity spiked? Join query one against your CPM history and look at the correlation. Auction pressure becomes visible instead of vibes, something you can sanity-check against the ad spend estimator when you want a rough read on what a competitor's push cost them. Did your CTR sag while a rival's lookalike creative gained momentum? Query six against your CTR by week answers whether their scaling correlates with your ad fatigue symptoms. Channel-level planning gets sharper too: platform expansion data (query eight) feeding a media mix model tells you whether you would be following competitors onto a channel or arriving early.
This join is the moat. Plenty of tools show you competitor ads. Almost nobody can overlay them on your own performance curve, because that requires both datasets in one place you control. The same property makes the database a natural substrate for AI agents doing ad research: an agent that can query your SQL answers questions a swipe file never could. If you would rather expose it to an agent over a protocol, the MCP server build sits naturally on top of this exact schema.
What it costs to run
Concrete math for a realistic setup, five tracked competitors plus two keyword sweeps:
- Nightly curate for 5 saved advertisers: 5 credits/night, about 150/month
- Weekly keyword sweeps, 2 keywords at 2 pages each: 16 searches, 16 credits/month
- Selective enrichment of ~40 threshold-crossing ads: roughly 40 credits/month
Call it ~210 credits a month, comfortably inside the Business plan at €329/mo with 1000+ credits, which is also the tier that includes API access and free integration help if the pipeline fights you. Rate limits are 10 requests per minute and 10,000 a day per key, generous for any nightly job, and failed searches refund their credit automatically. The headroom above your baseline ingest is what funds curiosity: deeper keyword sweeps, a new market, a burst of enrichments when a competitor launches something interesting.
Frequently Asked Questions
How do I build a competitor ad database?
Create four tables (advertisers, ads, snapshots, enrichments), pull ads from the AdLibrary API on a nightly cron, dedup on the stable ad_key identifier with an upsert, and append a daily snapshot of engagement metrics per live ad. SQLite plus sixty lines of Python is enough for a production-quality start.
Should I use SQLite, Postgres, or DuckDB for ad data?
Start with SQLite: one file, zero ops, and it handles millions of rows from a single nightly writer. Move to Postgres when teammates or dashboards need concurrent access. Use DuckDB as an analytical layer on top, since its columnar engine makes large aggregations dramatically faster.
How much competitor ad history do public ad libraries keep?
Meta archives ads about social issues, elections, and politics for seven years, but ordinary commercial ads leave public view once they stop running, with EU and UK ads queryable for about a year. That retention gap is the core argument for storing competitor ads in a database you own.
How often should I refresh a competitor ad database?
Nightly for tracked brands and weekly for keyword sweeps. Daily pulls catch new creatives within 24 hours of launch and give snapshots enough resolution to measure momentum. Mark ads inactive after seven consecutive days missing from pulls rather than deleting them, because dead ads power kill-rate and fatigue analysis.
Can I get exact competitor ad spend and impressions in my database?
No. Impressions arrive as bucketed ranges rather than exact counts, and spend figures are always estimates rather than advertiser-reported numbers. Store them labeled as bands and estimates. They are directionally reliable for trend and momentum analysis, which is what a competitor ad database is for.
The feed is rented, the schema is yours
APIs change and platforms tighten retention. The tables survive all of it. Every night your competitor ad database runs, it gets harder to compete against you on memory: you know their kill rate, their hook patterns, their platform moves, and what all of that did to your auctions, while they know whatever their screenshots folder remembers.
The build is one evening. Schema, pipeline, cron, done. Grab an adl_ key on the Business plan (€329/mo, 1000+ credits, API access included), point the script at your top five competitors, and let the first week of snapshots accumulate. The first time you answer a strategy question with a query instead of a scroll session, the database has already paid for itself.
Related Articles

Claude Code + adlibrary API: End-to-End Competitor Intelligence Workflows
Run five Claude Code workflows against the adlibrary API for automated competitor monitoring: Slack alerts, bulk teardowns, hook extraction across 500 ads, monthly landscape reports, and new entrant detection.

Full adlibrary API Documentation and Implementation Guide
Complete API documentation for AdLibrary. Extract Meta Ads, Google Ads, TikTok Ads and more via REST API. Code examples, endpoints, authentication, and rate limits.

adlibrary MCP server: build your own in 60 lines of Python
Build a custom adlibrary MCP server with fastmcp in 60 lines of Python. Expose ad search, timeline, and enrichment as Claude tools—pair with Meta Ads MCP.

How to Monitor Competitor Ads: The Ongoing Playbook for Diff-Detection and Alerts
Stop refreshing ad libraries manually. Build a competitor ad monitoring system with scheduled API pulls, diff-detection, and Slack alerts that tells you what changed since last week.

How to Pull Meta Ads Data Into BigQuery (2026 Guide)
Step-by-step guide to exporting Meta Ads data into BigQuery using Google's Data Transfer Service, the Marketing API, and ETL connectors. Updated for 2026.

n8n Meta Ads Automation Recipes 2026: 8 Production-Ready Workflows
8 production-ready n8n Meta Ads automation recipes: budget alerts, creative rotation, competitor monitoring via AdLibrary API, Slack reports, spend anomaly detection, and more.

How to Export Meta Ad Library Data in 2026
Meta Ad Library has no native export. This guide covers 3 real methods to get ad data into Notion, Airtable or BI tools — and which one actually works. Start free.