Airbyte Meta Ads Pipeline Setup: Full Connector Guide for 2026
Step-by-step guide to configuring the Airbyte Meta Ads source connector, syncing campaign data to BigQuery or Snowflake, handling incremental replication, and enriching with creative intelligence.

TL;DR: Setting up an Airbyte Meta Ads pipeline takes under an hour. The connector pulls campaigns, ad sets, ads, and performance insights into BigQuery or Snowflake using a System User token and incremental date-based sync. The main failure modes (expiring tokens, insight reattribution windows, and first-sync rate limits) are predictable and fixable. This guide walks through every step.
If your ad spend data lives in Meta Ads Manager and nowhere else, you are making reporting decisions in a silo. The moment you need to join ad performance to CRM conversions, LTV cohorts, or multi-channel attribution models, you need the data in a warehouse. Airbyte is the standard open-source ELT tool for that move.
The Meta Ads connector in Airbyte — labeled "Facebook Marketing" in the source catalog — syncs campaign structure and insight data from the Meta Marketing API to any supported destination. BigQuery and Snowflake are the most common targets; Postgres, Redshift, and DuckDB work too.
This guide covers the full setup from token creation through validated first sync, then addresses the schema and query patterns that make the pipeline actually useful. See also the Hightouch Meta Ads audiences sync guide for the reverse-ETL complement to this pipeline.
What the Connector Syncs
Before configuring anything, be clear on what the connector does and does not return.
What it syncs:
- campaigns — campaign name, objective, status, budget type, created/updated timestamps
- ad_sets — targeting parameters (country, age, gender, placements), optimization goal, budget, bid strategy, schedule
- ads — ad name, status, creative ID, tracking specs
- ads_insights — impressions, reach, clicks, spend, CPC, CPM, CTR, conversions, ROAS — aggregated by day and by ad, ad set, or campaign depending on the breakdown
- ad_creatives — creative ID, name, thumbnail URL, object_type
What it does not sync:
The connector does not return ad copy (headline, body text) as structured fields in most sync modes, competitor ad data, ad intelligence signals, or cross-platform data from TikTok, YouTube, or Google.
For teams building a comprehensive ad intelligence layer — not just their own performance data but structured creative data and competitor signals — see AdLibrary's API access feature, which returns richer creative metadata and multi-platform coverage than the Meta connector alone. Meta's free API is adequate for your own account data. When you add creative intelligence or multi-platform scope to the same data model, you need something else.
Step 1: Create a Meta System User and Generate a Token
The most common cause of pipeline failures in production is a token that expires. User access tokens expire in 60 days. System User tokens do not expire — they are the correct credential for any automated data pipeline.
Create the System User:
- In Meta Business Manager, navigate to Business Settings > System Users.
- Click Add and create a new system user. Name it something descriptive: airbyte-pipeline-user.
- Set the role to Employee (read-only is sufficient; Advertiser is needed only if you intend to write back to Meta).
Assign the System User to your Ad Account:
- Go to Business Settings > Accounts > Ad Accounts.
- Select your ad account, click Add People, and add the system user with the Analyst role. Analyst is sufficient for read operations.
Generate the token:
- Return to System Users, select your user, and click Generate New Token.
- Grant: ads_read, ads_management, read_insights. The ads_management permission is required even for read-only pipelines.
- Copy and store the token securely. It will only be shown once.
If your ad account is under a Business Manager you do not administer, ask the admin to create the system user. Without BM-level access, you cannot create non-expiring tokens — a personal user token will cause the pipeline to break every 60 days without warning. Also note: ad relevance diagnostics data is not returned through this token scope regardless of role.
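Before pasting the token into Airbyte, it can help to confirm that it carries the scopes listed above and has no expiry. The sketch below checks the `data` object returned by Meta's Graph API `debug_token` endpoint. It assumes you have already fetched and parsed that response; the `check_token` helper and the sample payload are hypothetical, and treating `expires_at == 0` as "never expires" is an assumption worth confirming against Meta's documentation.

```python
# Scopes named in the token-generation step above.
REQUIRED_SCOPES = {"ads_read", "ads_management", "read_insights"}


def check_token(debug_data: dict) -> list[str]:
    """Return a list of problems found in a debug_token `data` payload."""
    problems = []
    if not debug_data.get("is_valid", False):
        problems.append("token is not valid")
    missing = REQUIRED_SCOPES - set(debug_data.get("scopes", []))
    if missing:
        problems.append("missing scopes: " + ", ".join(sorted(missing)))
    # Assumption: a non-expiring token is reported with expires_at == 0.
    if debug_data.get("expires_at", 0) != 0:
        problems.append("token has an expiry date")
    return problems


# Hypothetical payload resembling a personal user token.
sample = {
    "is_valid": True,
    "scopes": ["ads_read", "read_insights"],
    "expires_at": 1767225600,
}
print(check_token(sample))
```

An empty list means the token looks suitable for an unattended pipeline; anything else points back to the permission or token-type issues described in this step.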
Step 2: Add the Meta Ads Source in Airbyte
Airbyte's connector catalog labels this connector Facebook Marketing — the name mismatch with "Meta Ads" is a legacy artifact.
In Airbyte Cloud:
- Navigate to Sources and click New Source.
- Search for Facebook Marketing and select it.
- Enter your Access Token from Step 1.
- Enter your Account ID (the numeric ID in your Ads Manager URL). Some connector versions want act_123456789, others want 123456789; the UI field label will specify.
- Set the Start Date for historical data. 90 days is a reasonable starting point.
- Leave Insights Lookback Window at the default (28 days). This controls how far back incremental syncs re-pull insight data to capture Meta's retroactive attribution.
In Airbyte OSS (self-hosted):
The configuration is identical. Use environment variable injection or Airbyte's secrets management rather than hardcoding the token in the connection config.
Test the connection before saving. If the test fails, the most common causes are: wrong account ID format (try with and without act_ prefix), insufficient token permissions, or the system user not assigned to the ad account.
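Because the act_ prefix is the most common source of a failed connection test, a small helper that produces both forms of the ID removes the guesswork. This is a minimal sketch; `account_id_variants` is a hypothetical name, not part of Airbyte.

```python
def account_id_variants(raw: str) -> tuple[str, str]:
    """Return (bare_id, prefixed_id) for a Meta ad account ID."""
    digits = raw.strip()
    if digits.startswith("act_"):
        digits = digits[len("act_"):]
    if not digits.isdigit():
        raise ValueError(f"not a numeric ad account ID: {raw!r}")
    return digits, f"act_{digits}"


bare, prefixed = account_id_variants("act_123456789")
print(bare, prefixed)
```

Try whichever form the connector's field label asks for first, then the other if the test still fails.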
Step 3: Select Streams and Configure Sync Modes
Recommended stream selection for most use cases:
| Stream | Sync Mode | Notes |
|---|---|---|
| campaigns | Incremental Append | Low volume, rarely needs full refresh |
| ad_sets | Incremental Append | Targeting params change; incremental is sufficient |
| ads | Incremental Append | Name/status changes captured incrementally |
| ads_insights | Incremental Append | Core metrics; use date lookback for reattribution |
| ad_creatives | Full Refresh | Small table; full refresh keeps it clean |
For the ads_insights stream, choose Incremental Append with cursor field date_start. Meta re-attributes conversion modeling data retroactively for up to 28 days after an event — the connector's built-in lookback window re-pulls the last N days on every sync. The default 28 days is appropriate for accounts using 28-day click attribution windows.
The connector also offers breakdown-level streams: ads_insights_age_and_gender, ads_insights_region, ads_insights_platform_and_device. These are useful for geo-targeting analysis but increase API call volume significantly. Enable them selectively.
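To see why the lookback window produces repeated rows, it helps to compute the date range an incremental run covers. The function below is a simplified model of the behavior described above, not the connector's actual implementation; `insights_pull_range` and its parameters are hypothetical.

```python
from datetime import date, timedelta


def insights_pull_range(cursor: date, today: date, lookback_days: int = 28) -> tuple[date, date]:
    """Model the date range an incremental insights sync would request.

    `cursor` is the last date_start already synced; the run starts
    `lookback_days` earlier to pick up retroactive attribution changes.
    """
    start = cursor - timedelta(days=lookback_days)
    return start, today


start, end = insights_pull_range(cursor=date(2026, 3, 10), today=date(2026, 3, 11))
days_requested = (end - start).days + 1
print(start, end, days_requested)
```

In this model a daily sync advances the cursor by one day but requests about 30 days of data, which is why every recent (ad_id, date_start) pair shows up in the destination many times under Incremental Append.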
Step 4: Configure Your Destination
BigQuery: Enter your Project ID, Dataset ID, and a Service Account JSON with bigquery.dataEditor and bigquery.jobUser roles. Set Loading Method to GCS Staging for large initial syncs.
Snowflake: Enter Account Name (e.g., xy12345.us-east-1), Warehouse, Database, Schema, and credentials. The Snowflake user needs CREATE TABLE, INSERT, SELECT, and UPDATE privileges. Set Loading Method to Internal Staging for best performance.
Normalization: For production pipelines feeding a BI tool directly, enable normalization. For pipelines feeding a dbt transformation layer, load raw and transform downstream — normalization in Airbyte and dbt both running on the same data creates maintenance overhead.
Step 5: Create the Connection and Set Sync Schedule
- In Airbyte, navigate to Connections > New Connection.
- Select your Facebook Marketing source and your destination.
- Confirm sync mode settings from Step 3.
- Set the Sync Schedule: 0 6 * * * (6am UTC) gives Meta time to finalize previous-day data. Meta's own guidance recommends waiting until the day after for stable daily aggregates, which matters for spend pacing reports that feed into budget decisions. Airbyte's cron scheduler expects Quartz syntax, so enter this schedule as 0 0 6 * * ? in the cron field.
- Enable Detect and Propagate Schema Changes in Airbyte Cloud to handle connector schema updates automatically.
For most ad analytics use cases, daily sync is sufficient. Hourly sync is useful only if you are monitoring intraday pacing — and Meta's API returns estimates for intraday data rather than final figures.
Step 6: Run and Validate the First Sync
Trigger a manual sync from the connection page. Do not set it live on a schedule until the first sync validates correctly.
Monitoring the sync: In Airbyte Cloud, the Sync History tab shows real-time progress per stream with row counts and errors. For large accounts with years of history, expect the insights stream to run for 2-6 hours on first sync.
Common first-sync errors:
OAuthException: (#200) Requires ads_read permission — Re-generate the token with the correct permission scope. The system user's role on the ad account does not override token-level permission grants.
User request limit reached — You hit Meta's API rate limit. Airbyte retries with backoff, but for accounts with very long historical ranges, reduce the Start Date to 90 days and perform historical backfills in separate batches. Meta's Marketing API rate limiting documentation covers the scoring system.
Invalid account ID — Try toggling the act_ prefix on the Account ID field.
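For the rate-limit case, splitting a long history into fixed-size windows is straightforward to script. The sketch below generates contiguous 90-day batches; `backfill_batches` is a hypothetical helper, and how you feed each window to Airbyte (for example by adjusting the Start Date between runs) depends on your setup.

```python
from datetime import date, timedelta


def backfill_batches(start: date, end: date, batch_days: int = 90) -> list[tuple[date, date]]:
    """Split [start, end] into contiguous, inclusive windows of at most batch_days."""
    batches = []
    current = start
    while current <= end:
        batch_end = min(current + timedelta(days=batch_days - 1), end)
        batches.append((current, batch_end))
        current = batch_end + timedelta(days=1)
    return batches


for window_start, window_end in backfill_batches(date(2025, 1, 1), date(2025, 12, 31)):
    print(window_start, "to", window_end)
```

Running the most recent window first gets dashboards populated quickly; older windows can then be loaded one at a time.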
Validating data: After the first sync, compare spend totals in your destination to the last 7 days in Meta Ads Manager. Discrepancies under 2% are normal. Over 5% indicates a Start Date issue, currency conversion problem, or duplicate rows from the lookback window.
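The thresholds above translate into a simple check. In this sketch the 2% and 5% cut-offs come from the text; the "watch" label for the band in between is an assumption, since the guide does not say how to treat it, and `classify_spend_gap` is a hypothetical name.

```python
def classify_spend_gap(warehouse_spend: float, ads_manager_spend: float) -> tuple[float, str]:
    """Compare warehouse spend to Ads Manager spend for the same date range."""
    if ads_manager_spend <= 0:
        raise ValueError("Ads Manager spend must be positive to compute a percentage gap")
    gap_pct = abs(warehouse_spend - ads_manager_spend) / ads_manager_spend * 100
    if gap_pct < 2:
        verdict = "normal"
    elif gap_pct <= 5:
        verdict = "watch"  # assumption: the source does not classify 2-5%
    else:
        verdict = "investigate"  # Start Date, currency, or duplicate-row issue
    return round(gap_pct, 2), verdict


print(classify_spend_gap(10150.0, 10000.0))
print(classify_spend_gap(10800.0, 10000.0))
```

A warehouse total that is noticeably higher than Ads Manager usually points to duplicate rows from the lookback window.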
The Schema You Actually Need to Know
Airbyte loads each stream into a separate destination table. Join keys:
- ads_insights.ad_id → ads.id
- ads.adset_id → ad_sets.id
- ad_sets.campaign_id → campaigns.id
The ads_insights table is the core metrics table. Every other stream is a dimension table joined to it.
Deduplication layer: Because the lookback window re-pulls recent data, your ads_insights table will contain multiple rows for the same (ad_id, date_start) combination. Standard deduplication pattern in dbt:
```sql
-- Keep only the most recently extracted row per (ad_id, date_start)
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY ad_id, date_start
            ORDER BY _airbyte_extracted_at DESC
        ) AS row_num
    FROM {{ source('facebook_ads', 'ads_insights') }}
) AS ranked
WHERE row_num = 1
```
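The same keep-the-latest logic can be illustrated outside the warehouse. The Python sketch below mirrors the query's behavior on a few made-up rows so the effect on spend totals is visible; the sample data and the `latest_rows` helper are hypothetical, and timestamps are assumed to be ISO 8601 strings, which sort chronologically.

```python
def latest_rows(rows: list[dict]) -> list[dict]:
    """Keep the most recently extracted row per (ad_id, date_start)."""
    latest = {}
    for row in rows:
        key = (row["ad_id"], row["date_start"])
        current = latest.get(key)
        if current is None or row["_airbyte_extracted_at"] > current["_airbyte_extracted_at"]:
            latest[key] = row
    return list(latest.values())


raw = [
    # Same ad and day, pulled twice: the later pull has restated spend.
    {"ad_id": "a1", "date_start": "2026-03-01", "spend": 100.0, "_airbyte_extracted_at": "2026-03-02T06:00:00Z"},
    {"ad_id": "a1", "date_start": "2026-03-01", "spend": 104.0, "_airbyte_extracted_at": "2026-03-03T06:00:00Z"},
    {"ad_id": "a2", "date_start": "2026-03-01", "spend": 50.0, "_airbyte_extracted_at": "2026-03-03T06:00:00Z"},
]

raw_total = sum(r["spend"] for r in raw)
deduped_total = sum(r["spend"] for r in latest_rows(raw))
print(raw_total, deduped_total)
```

Summing the raw rows double-counts ad a1, which is the inflated-spend symptom that deduplication removes.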
Partitioning and clustering: For BigQuery, partition ads_insights by date_start and cluster by campaign_id. For Snowflake, cluster on (date_start, campaign_id). This reduces query costs significantly for date-range queries.
Incremental Sync and the Attribution Window Problem
Meta's aggregated event measurement system re-attributes conversion events as new signals arrive. Your incremental sync on Day 15 will return different numbers for Day 1 than your sync on Day 7 did. Both are correct given the data available at the time.
Three standard approaches:
1. Lookback window + deduplication (recommended): Use Airbyte's built-in lookback window combined with the dbt deduplication pattern above. Every sync re-pulls recent rows; deduplication keeps only the latest.
2. Snapshot table: Write to a snapshot table with the sync timestamp. Your BI layer queries the snapshot with the most recent _airbyte_extracted_at per (ad_id, date_start). More storage overhead, but keeps full history — useful for incrementality analysis.
3. Overwrite window: Each sync overwrites the last 28 days of data completely. Simple to implement, but loses any manual corrections you've applied.
For most marketing analytics stacks, approach 1 is the right default.
What the Pipeline Gives You — and What It Doesn't
After a working pipeline, your warehouse has a clean, queryable version of your own Meta Ads performance data. That unlocks cross-channel spend reporting (join Meta ad spend to Google Ads and TikTok spend tables), LTV-to-CAC analysis, campaign structure audits querying naming conventions and ad fatigue patterns at scale, and automated alerting on CTR drops or spend anomalies.
What it does not give you: your competitors' data, creative intelligence signals, or cross-platform ad intelligence. The Meta connector returns only your own account data through Meta's API.
For the intelligence layer — understanding what creative formats competitors are scaling, which hooks are performing across the category — AdLibrary's API access feature is the complement. Meta's free API is the right tool for your own performance data. AdLibrary's paid API adds richer creative metadata, competitor signals, and multi-platform coverage (Facebook, Instagram, TikTok, YouTube, Snapchat, Pinterest, LinkedIn, Google) in a single query interface. The Business plan at €329/mo includes API access. See /features/multi-platform-ads for coverage details.

Connecting to dbt and Monitoring Production
Airbyte handles extraction and loading. Transformation belongs in dbt. A minimal transformation layer looks like three models:
- stg_meta_ads__insights.sql: deduplicated insights with correct types
- stg_meta_ads__ads.sql: ad dimension with campaign and ad set context joined in
- fct_meta_ads_daily.sql: final fact table ready for BI tooling
The staging models apply the ROW_NUMBER() deduplication pattern and cast date_start from string to DATE. For ad performance dashboards in Looker, Metabase, or Superset, the fact model is the primary source — analysts do not need to understand the raw Airbyte schema.
Airbyte's official dbt integration documentation covers triggering dbt runs after successful Airbyte syncs.
In production, three things break the pipeline: token expiry, Meta API changes, and schema evolution.
Token management: System User tokens do not expire, but they can be revoked manually, and they lose access if someone removes the system user from the ad account. Set up sync failure alerts in Airbyte (email or Slack) so failures surface within hours. A missed daily sync means a gap in your ad spend data that is difficult to backfill cleanly.
Meta API changes: Meta deprecates Marketing API versions approximately every 12 months. Subscribe to Meta's Developer Platform changelog and to Airbyte's connector release notes.
Schema evolution: new fields that appear mid-history create nullable columns with nulls for all historical rows — handle these with COALESCE defaults in your dbt models.
Common production mistakes:
- Using a personal user token instead of a System User token is the number one cause of outages. Personal tokens expire in 60 days, fail silently, and leave your dashboard data stale.
- Treating Airbyte-loaded data as source-of-truth without the deduplication model produces inflated spend totals for recent periods. Always route BI queries through the dbt fact model, never directly to raw Airbyte tables.
- Setting overly broad historical ranges on first sync (2+ years for a large account) can run for 12-24 hours and exhaust rate limits. Do initial historical loads in 90-day batches.
- If you consolidate multiple accounts with different currencies (USD, EUR, GBP), add a currency conversion layer. Raw Airbyte data is in the ad account's reporting currency, and mixing currencies without conversion produces incorrect aggregates.
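The currency point is easy to get wrong silently, so here is a minimal sketch of a conversion step applied before aggregation. The exchange rates are placeholders, not real market rates, and the `currency` key, the rate table, and `total_spend_usd` are hypothetical; in practice the rates would come from a dated rates table joined on the insight date.

```python
from decimal import Decimal

# Placeholder rates for illustration only; use a dated rates table in production.
RATES_TO_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.25"),
}


def total_spend_usd(rows: list[dict]) -> Decimal:
    """Sum spend across accounts after converting each row to USD."""
    total = Decimal("0")
    for row in rows:
        currency = row["currency"]
        if currency not in RATES_TO_USD:
            raise ValueError(f"no exchange rate configured for {currency}")
        total += Decimal(str(row["spend"])) * RATES_TO_USD[currency]
    return total


rows = [
    {"account": "us", "currency": "USD", "spend": 100},
    {"account": "de", "currency": "EUR", "spend": 100},
    {"account": "uk", "currency": "GBP", "spend": 100},
]
print(total_spend_usd(rows))
```

Summing the raw spend column here would report 300 in no particular currency; failing loudly on an unknown currency is a deliberate choice so a newly added account cannot slip through unconverted.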
Extending the Pipeline: What Comes Next
A working Airbyte Meta Ads pipeline is infrastructure, not a destination. Three common extensions after initial setup:
Multi-platform consolidation: Add Airbyte connectors for TikTok Ads, Google Ads, and LinkedIn Ads to the same destination schema. Build a fct_paid_ads_daily model that unions all platform fact tables on a normalized schema (date, platform, campaign_id, spend, impressions, clicks, conversions). This is the foundational model for true cross-channel ROAS analysis. See multi-platform ads feature for AdLibrary's coverage across these same platforms.
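The normalized schema above amounts to a column mapping per platform followed by a union. The sketch below shows that shape in Python; the source column names in `COLUMN_MAPS` (for example `segments_date` and `cost`) are illustrative guesses rather than the actual destination schemas, so check each connector's tables before relying on them. In a dbt project the same idea would be expressed as one select per platform joined with UNION ALL.

```python
NORMALIZED_COLUMNS = ("date", "campaign_id", "spend", "impressions", "clicks", "conversions")

# Hypothetical source column names per platform; verify against your tables.
COLUMN_MAPS = {
    "meta": {"date": "date_start", "campaign_id": "campaign_id", "spend": "spend",
             "impressions": "impressions", "clicks": "clicks", "conversions": "conversions"},
    "google": {"date": "segments_date", "campaign_id": "campaign_id", "spend": "cost",
               "impressions": "impressions", "clicks": "clicks", "conversions": "conversions"},
}


def normalize(platform: str, row: dict) -> dict:
    """Rename one platform row into the shared schema and tag its platform."""
    mapping = COLUMN_MAPS[platform]
    out = {"platform": platform}
    for column in NORMALIZED_COLUMNS:
        out[column] = row[mapping[column]]
    return out


def union_platforms(tables: dict[str, list[dict]]) -> list[dict]:
    """Union all platform tables into one list of normalized rows."""
    return [normalize(platform, row) for platform, rows in tables.items() for row in rows]


tables = {
    "meta": [{"date_start": "2026-03-01", "campaign_id": "m1", "spend": 120.0,
              "impressions": 9000, "clicks": 150, "conversions": 6}],
    "google": [{"segments_date": "2026-03-01", "campaign_id": "g1", "cost": 80.0,
                "impressions": 4000, "clicks": 90, "conversions": 4}],
}
for row in union_platforms(tables):
    print(row)
```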
Audience data layer: Use a reverse-ETL tool like Hightouch (covered in the Hightouch Meta Ads audiences sync guide) to push warehouse-computed audiences back to Meta as Custom Audiences. The Airbyte pipeline brings data in; Hightouch pushes computed segments out. Together they close the loop between warehouse analytics and ad targeting — a pattern covered in detail in the cold audience ramp use case.
Creative intelligence enrichment: Your pipeline knows what your ads spent and how they performed. It does not know what the creative actually contained, or how it compares to what competitors are running. Enriching your warehouse with creative metadata — either by parsing the ad_creatives stream or pulling from AdLibrary's API — makes it possible to build models that correlate creative features to performance outcomes. That is the analysis that actually improves future creative decisions.
Use ad timeline analysis to track when competitor ads start and stop running — a useful signal for identifying what is working in your category. See machine learning Facebook ads platforms and Facebook ads analytics platform for how this data model integrates with broader automated bidding and reporting systems. For teams also tracking server-side signal quality alongside pipeline throughput, see the conversions API implementation guide — the two pipelines (Airbyte ELT + CAPI server events) are complementary layers of the same data infrastructure. And for an architectural overview of how AI automation layers sit above your warehouse data, see Facebook ads AI platforms: four layers explained.
Once your own performance data is in the warehouse, contextualize it against the market. Your CPM and CTR numbers are only meaningful relative to benchmarks. For a closer read on what competitors are spending and scaling, AdLibrary's unified ad search and AI ad enrichment surface structural signals that no benchmark report captures — which platform a competitor added, which formats have been running profitably for 30+ days, how their creative output volume changed. That is the market context your warehouse data needs to be interpreted correctly.
Also see AI-powered Meta marketing for how warehouse-grounded data models connect to Meta's own ML-driven optimization layer, and lookalike audience model 2026 for how clean first-party data improves the seed audiences you build for prospecting. For attribution tracking on top of the warehouse, see northbeam review 2026.
Use the ad budget planner to model spend allocation across platforms before you build the multi-platform consolidation layer. The break-even ROAS calculator helps validate whether cross-channel expansion is justified at your current margins.
Frequently Asked Questions
What access token type does the Airbyte Meta Ads connector require?
The connector requires a long-lived System User access token with ads_read, ads_management, and read_insights permissions. User tokens expire in 60 days and are not suitable for production pipelines. A System User token tied to a Meta Business Manager system user does not expire and is the correct credential for automated data pipelines.
How does incremental sync work for the Meta Ads insights stream?
The insights stream uses date_start as the incremental cursor. On each run, Airbyte requests data from the last synced date forward plus a configurable lookback window (default 28 days). Meta re-attributes conversion data retroactively, so the lookback window ensures recently updated conversion counts are captured. Use deduplication in your warehouse layer to handle re-ingested rows.
What Meta Ads data does Airbyte not sync?
The connector syncs your own campaign structure and performance metrics. It does not return ad creative copy as structured fields, competitor ad data, or cross-platform data from TikTok, Google, or LinkedIn. For creative intelligence and multi-platform coverage, a separate data source is required.
How do you handle Meta Ads API rate limits in Airbyte?
Meta's Marketing API uses score-based throttling. Airbyte retries automatically with exponential backoff when rate limits are hit. For large accounts with long historical ranges, break the initial sync into shorter date batches (30-90 days at a time) to avoid extended rate limit cycles. Production daily syncs rarely hit rate limits because incremental data volumes are small.
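For readers unfamiliar with the retry pattern mentioned here, the following is a generic sketch of exponential backoff. It is not Airbyte's retry code; `RateLimitError`, `call_with_backoff`, and the delay values are hypothetical and only illustrate the doubling wait between attempts.

```python
import time


class RateLimitError(Exception):
    """Stand-in for a 'User request limit reached' style API error."""


def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, waiting base_delay * 2**attempt seconds after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: a fake API call that is throttled twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimitError("User request limit reached")
    return "ok"


result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

The demo records the waits instead of sleeping, which shows the delays doubling between attempts. Long historical ranges make these waits stack up, which is why shorter date batches finish sooner.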
What is the best destination schema for querying Meta Ads pipeline data?
Partition the ads_insights table by date_start and cluster by campaign_id in BigQuery (or cluster on (date_start, campaign_id) in Snowflake). Join to campaigns, ad_sets, and ads tables on their respective IDs for dimension context. Apply a deduplication step using ROW_NUMBER() OVER (PARTITION BY ad_id, date_start ORDER BY _airbyte_extracted_at DESC) to handle the attribution lookback window.
The bottom line: An Airbyte Meta Ads pipeline is not a complex infrastructure project. The connector is mature, the configuration surface is small, and the failure modes are well-documented. The under-an-hour setup estimate is realistic for teams with basic Airbyte familiarity and a Meta Business Manager account in place.
The real complexity is downstream: deduplication for attribution restatements, multi-platform consolidation, and the analytical models that turn raw event data into decisions. That work is worth doing because it permanently removes the reporting bottleneck of manually exporting from Ads Manager.
For teams that also need market intelligence alongside their performance data — creative signals, competitor patterns, multi-platform ad coverage — AdLibrary's Business plan at €329/mo includes API access for programmatic data retrieval. Meta's free API gives you your own account data. AdLibrary's paid API covers the rest: richer fields per ad, competitor visibility, and cross-platform coverage in one interface.
Set up the pipeline. Build the deduplication model. Then decide what analytical questions you actually want to answer — the infrastructure will be there when you're ready.