HomeMarketing Analytics Glossarydbt for Marketing Analytics: A Practitioner's Guide

dbt for Marketing Analytics: A Practitioner’s Guide

dbt for Marketing Analytics: A Practitioner’s Guide

TL;DR

  • dbt (data build tool) is the SQL-first transformation layer of the modern data stack — it turns raw ad/CRM/web data already loaded into your warehouse (Snowflake, BigQuery, Redshift, Databricks) into clean, tested, documented, reusable tables. For marketing analysts, it’s the place to encode “what is an MQL,” “how do we attribute revenue,” and “what does blended CAC mean” once, in version-controlled SQL, so every dashboard agrees. Globally, more than 80,000 data teams use dbt (dbt Labs/Fivetran press release names users including Siemens, Roche, and Condé Nast).
  • A mid-career marketing analyst with solid SQL can be productive in dbt without a data engineer. You write SELECT statements; dbt handles dependencies, builds, tests, and docs. Pre-built Fivetran dbt packages (Google Ads, Facebook Ads, LinkedIn, HubSpot, Salesforce, plus the cross-channel ad_reporting rollup that unifies 11 ad platforms) give you analytics-ready campaign tables out of the box.
  • The landscape shifted in late 2025: Fivetran and dbt Labs signed an all-stock merger (Oct 13, 2025), dbt Labs open-sourced MetricFlow (the Semantic Layer engine) under Apache 2.0, and the new Rust-based Fusion engine entered preview. dbt is a genuinely valuable, in-demand skill — but it is NOT an ingestion tool, a BI tool, or a reverse-ETL tool. It only transforms data already in your warehouse.

Key Findings

  1. dbt is the “T” in ELT. In the modern stack, tools like Fivetran or Airbyte Extract and Load raw data into a cloud warehouse; dbt Transforms it in place using SQL. It does not move data and does not store data itself — it issues CREATE TABLE/CREATE VIEW commands against your warehouse.
  2. You work in SQL SELECT statements (“models”). dbt adds software-engineering discipline — version control (Git), automated testing, documentation, and a dependency graph (DAG) — on top of plain SQL.
  3. dbt Core is free and open-source (Apache 2.0); dbt Cloud/Platform is the paid managed service. The free Developer plan is one seat; the Starter plan is $100/developer/month (up to 5 seats).
  4. Pre-built marketing packages are the fastest on-ramp. Fivetran maintains free, open-source dbt packages that transform raw connector data into reporting tables, plus a cross-channel ad_reporting package that unifies 11 ad platforms.
  5. The dbt Semantic Layer (MetricFlow) is the answer to “why does every dashboard show a different number?” It centralizes metric definitions so revenue, CAC, and MQLs mean the same thing everywhere.
  6. dbt is a top-requested data skill, but rarely the headline requirement for pure marketing-analyst roles. It is heavily requested in analytics engineering, marketing data engineer, and GTM/growth analytics roles.

Details

What is dbt (plain-English for marketing analysts)

dbt (data build tool) is an open-source framework for transforming data that is already sitting in your cloud data warehouse. If you have ever maintained a sprawling chain of SQL queries — “this query feeds that dashboard, but first you have to run the UTM-cleaning query, then the sessions query, then the join to Salesforce” — dbt is the tool that formalizes that chain.

You write a transformation as a SELECT statement saved in a .sql file. That file is called a model. dbt wraps your SELECT in the appropriate CREATE TABLE AS or CREATE VIEW AS boilerplate and runs it against your warehouse (Snowflake, BigQuery, Redshift, Databricks, Postgres). You never write DDL.

Crucially, dbt fits the ELT pattern (Extract, Load, Transform), which has largely replaced traditional ETL for cloud analytics. In ETL, data was transformed before loading, often in expensive proprietary tools. In ELT, you load raw data first (cheap cloud storage and compute) and transform it inside the warehouse with SQL — which is exactly what dbt does. dbt does not extract or load data; you need an ingestion tool (Fivetran, Airbyte, Stitch) for that.

dbt’s philosophy is “analytics engineering” — applying software-engineering best practices (version control, modularity, testing, documentation, CI/CD) to analytics SQL. The O’Reilly text Analytics Engineering with SQL and dbt frames dbt as the tool that defines what analytics engineers do.

Core dbt concepts every marketing analyst needs to know

  • Models — a .sql file containing one SELECTdbt run builds it in the warehouse. Models are organized in layers: staging (one model per source table, light cleaning/renaming), intermediate (joins and business logic), and marts (final business-conformed tables like fct_campaign_performance or dim_customers).
  • ref() and the DAG — instead of hardcoding table names, you write {{ ref('stg_google_ads__campaigns') }}. This ref() function is the heart of dbt: it tells dbt that one model depends on another, automatically building a DAG (directed acyclic graph) so models run in the correct order. Rename or re-point a model and every downstream reference updates.
  • Sources — declared in YAML, these point dbt at the raw tables your ingestion tool landed (e.g., the raw google_ads schema). You reference them with {{ source('google_ads', 'campaign_history') }}. Sources also support freshness tests to alert you when a connector stops syncing.
  • Seeds — small CSV files version-controlled in your project and loaded with dbt seed. Perfect for marketing lookup data: UTM-source-to-channel mappings, campaign naming conventions, country/region groupings, cost-center mappings. dbt’s docs explicitly recommend seeds for “business-specific logic, for example a list of country codes” — not for large raw data.
  • Tests — assertions about your data. dbt ships four built-in generic (schema) testsuniquenot_nullaccepted_values, and relationships (referential integrity). Singular tests are one-off SQL files in the tests/ directory that fail if they return any rows (e.g., “no campaign should have spend but zero impressions”). Tests run with dbt test.
  • Documentation — descriptions written in YAML are compiled into a searchable docs site with an auto-generated lineage graph (dbt docs generate). This is how a marketing analyst documents “what does qualified_lead mean” in code.
  • Macros — reusable SQL snippets written in Jinja (a templating language). Macros let you DRY up repeated logic — e.g., a channel_grouping() macro that classifies UTM parameters into channels consistently across every model.
  • Packages — libraries of pre-built models and macros installed via packages.yml and dbt deps. This is where the Fivetran marketing packages and utility packages live.
  • Materializations — how dbt builds a model: view (fast to build, recomputed on query), table (stored physically, fast to query), incremental (only process new/changed rows — important for large event/clickstream tables), and ephemeral (inlined as a CTE).

Marketing analytics use cases with dbt (specific and concrete)

  • Transforming raw ad-platform data. Raw Google Ads / Meta / LinkedIn tables from a connector are messy, nested, and schema-specific. Staging models normalize column names, cast types, and de-duplicate. The Fivetran packages do most of this for you (see below).
  • UTM parameter normalization. A canonical marketing dbt job: strip casing/whitespace, coalesce nulls, map raw utm_source/utm_medium into a clean channel dimension via a seed file or a channel_grouping() macro. dbt Labs’ own best-practice guidance warns this is “often the hardest part” of attribution because “UTM campaigns are often poorly maintained.” Notably, Fivetran’s google_ads__url_report already breaks out utm_sourceutm_mediumutm_campaignutm_content, and utm_termgithub
  • Unified cross-channel campaign performance tables. Combine spend, clicks, impressions, and conversions from every ad platform into one table keyed by date/channel/campaign. The Fivetran ad_reporting package builds exactly this — “a unified table in your destination that shows your ad spend, clicks, and impressions for each advertising platform.”
  • Marketing attribution models. dbt is well-suited to first-touch, last-touch, linear, position-based, and time-decay attribution because the logic is transparent SQL (“glassbox,” no black-box ML). dbt Labs publishes a marketing-attribution playbook; community/vendor packages include Snowplow’s attribution package (first/last/linear/position-based) and the open-source rittmananalytics/ra_attribution multi-touch revenue + cost attribution model that sources ad spend via the Fivetran Google/Facebook/Snapchat packages. Clearbit built its attribution model on dbt + Redshift + Segment.
  • Customer journey / touchpoint models. Build an “every interaction” table stitched by a user/account identifier (for B2B, often account_id). RudderStack documents a first-touch-to-multi-touch journey workflow built on dbt.
  • Funnel analysis models. Encode staged funnels (Visit → Sign Up → MQL → SQL → Customer) as SQL, correctly following the same cohort through stages rather than independently counting each step (a common error that inflates conversion rates).
  • Lead scoring / pipeline analytics. Build trailing-engagement or fit/behavior scores in dbt models, then push them to Salesforce/HubSpot via a reverse-ETL tool (Census, Hightouch). dbt builds the score; it does not activate it.
  • Spend pacing models. Model daily/month-to-date spend vs. budget (from a seed or finance source) to flag over/under-pacing campaigns.
  • A single source of truth for marketing metrics. The overarching value: define CAC, ROAS, MQL, and pipeline once in dbt so marketing, sales, and finance stop arguing about whose number is right.

dbt Core vs dbt Cloud

dbt Coredbt Cloud / dbt Platform
What it isOpen-source CLI (Apache 2.0), installed via pipManaged SaaS: browser IDE, scheduler, CI/CD, hosting
CostFreeDeveloper: free (1 seat); Starter: $100/seat/mo (up to 5 seats); Enterprise/Enterprise+: custom
SchedulingNone — bring your own (Airflow, Dagster, Prefect, GitHub Actions, cron)Built-in job scheduler
Best forTechnical teams with DevOps/orchestration already in place; tight budgetsTeams wanting to move fast, SQL-fluent analysts, less infrastructure overhead

Current dbt Cloud pricing (from getdbt.com/pricing):

  • Developer — free, 1 developer seat, 1 project, up to 3,000 successful models built/month.
  • Starter (formerly “Team”) — $100 per user/month, 5 developer seats, 15,000 successful models built/month, 5,000 queried metrics/month, 1 project; includes basic Catalog, basic Semantic Layer, dbt Copilot, and API access. The overage formula is ($100 × developer seats) + ((models built − 15,000) × $0.01).
  • Enterprise — custom pricing; raises entitlements to 100,000 models/month, 20,000 queried metrics/month, 30 projects; adds advanced Semantic Layer, dbt Mesh, dbt Insights, Copilot, SSO/RBAC, and cost-management features.
  • Enterprise+ — custom; adds PrivateLink, IP restrictions, rollback, hybrid projects.

Two important billing nuances: (1) dbt Cloud bills on seats + usage, where usage = “successful models built” in deployment runs (development IDE runs, tests, seeds, snapshots, and ephemeral models do not count) plus queried metrics; (2) your warehouse compute is billed separately by Snowflake/BigQuery/etc. — dbt pushes all transformation SQL down to your warehouse, so heavy dbt usage raises your warehouse bill.

dbt in the marketing data stack

  • Ingestion (E + L): Fivetran or Airbyte pull from Google Ads, Meta, LinkedIn, HubSpot, Salesforce, GA4, etc., into the warehouse.
  • Warehouse: Snowflake, BigQuery, Redshift, or Databricks — where dbt runs.
  • Transformation (T): dbt.
  • BI / consumption: Looker, Tableau, Power BI, Mode, and Lightdash query dbt’s output tables (or its Semantic Layer). The dbt Semantic Layer has GA integration with Tableau, and supports Power BI (in preview) and Google Sheets/Excel.
  • Activation (reverse ETL): Census or Hightouch push dbt-built models (scores, segments) back into Salesforce, HubSpot, and ad platforms.

Fivetran / dbt packages for marketing data sources

These are free, open-source, maintained by a small Fivetran analytics-engineering team, and installed via packages.yml + dbt deps. They turn raw connector data into analytics-ready tables. (Note: the older *_source companion packages are now deprecated — sources and staging are bundled into the main package.)

  • fivetran/ad_reporting (cross-channel rollup, v2.4.0 latest) — combines exactly 11 platforms: Amazon Ads, Apple Search Ads, Facebook Ads, Google Ads, LinkedIn Ad Analytics, Microsoft Advertising, Pinterest Ads, Reddit Ads, Snapchat Ads, TikTok Ads, and Twitter Ads — into unified ad_reporting__* models (account, campaign, ad group, ad, keyword, URL levels). You need at least two ad connectors to benefit from the rollup. Works on BigQuery, Snowflake, Redshift, Postgres, Databricks.
  • fivetran/google_ads (v1.3.x) — produces 9 final tables including google_ads__account_reportgoogle_ads__campaign_reportgoogle_ads__ad_group_reportgoogle_ads__ad_reportgoogle_ads__keyword_reportgoogle_ads__search_term_report, and google_ads__url_report (the URL report breaks out base URL and the five UTM parameters). The Adwords API path is deprecated; use the Google Ads API.
  • fivetran/facebook_ads — models clicks, impressions, cost, conversions, and conversion value (with attribution windows) at account/campaign/ad-set/ad and URL grains; optional country/region demographics. Conversion handling was vetted with Seer Interactive.
  • fivetran/linkedin_ads (LinkedIn Ad Analytics) — part of the ad_reporting rollup; supports geo/country/region reporting.
  • fivetran/hubspot (v1.7.x) — transforms HubSpot into enriched contact, company, deal, email, and engagement models (materializes ~147 models when fully enabled); answers questions like “Are one-time email campaigns or automated flows generating more revenue?”
  • fivetran/salesforce (v2.2.x) — final models include salesforce__manager_performancesalesforce__owner_performancesalesforce__sales_snapshotsalesforce__opportunity_enhancedsalesforce__contact_enhancedsalesforce__daily_activity, and salesforce__opportunity_line_item_enhanced; answers pipeline, bookings, and win-rate questions (e.g., “Which industries or account types generate the highest-value opportunities and shortest sales cycles?”). github
  • fivetran/klaviyo — email/SMS last-touch and revenue attribution models (klaviyo__campaignsklaviyo__flowsklaviyo__persons).

Utility packages every marketing analyst should know:

  • dbt-labs/dbt_utils (current version 1.3.3) — the essential toolbox. Marketing-relevant macros: star() (select all columns except some), get_column_values() + pivot() (pivot channels/statuses into columns), union_relations() (stack regional or multi-account tables, even with mismatched columns), date_spine() (build a continuous date axis for spend pacing/time series), generate_surrogate_key() (build hashed keys for joins; replaces the deprecated surrogate_key()), and safe_divide() (returns null instead of erroring on divide-by-zero — handy for CTR/CVR/ROAS). It also adds generic tests: unique_combination_of_columnsaccepted_rangenot_null_proportionequal_rowcount, and expression_is_true. (Note: dbt Core ≥1.10.6 changed multi-argument test syntax — arguments now nest under an arguments: key.)
  • dbt-labs/audit_helper — compares two tables row-by-row (added/removed/identical/modified). Invaluable when migrating a legacy spreadsheet/Tableau/SQL-Server marketing report into dbt and proving the numbers match.
  • calogica/dbt_expectations — Great Expectations-style data-quality tests.

dbt Semantic Layer and why it matters for marketing

The dbt Semantic Layer, powered by MetricFlow, lets you define a metric once — in version-controlled YAML, in your dbt project — and query it consistently from any connected BI tool, spreadsheet, notebook, or AI agent. The point is to move metric definitions out of the BI layer (where every tool/analyst re-derives “revenue” slightly differently) and into the governed modeling layer. Change the definition once, and it updates everywhere it’s invoked.

For marketing analysts this directly attacks “metric drift”: MQL, SQL, CAC, ROAS, pipeline, and blended-vs-paid CAC each get one canonical definition. MetricFlow uses three building blocks — entities (join keys), dimensions (slice-by attributes like channel, campaign, region), and measures/metrics (the aggregations) — and supports simple, ratio, cumulative, and derived metrics. Best practice is to point semantic models at normalized staging/mart tables and let MetricFlow assemble joins on the fly, rather than pre-building one giant denormalized table.

Important honesty for practitioners: dbt’s original “dbt Metrics” (launched 2022) was deprecated and replaced by MetricFlow (acquired from Transform in 2023). At Coalesce 2025, dbt Labs open-sourced MetricFlow under Apache 2.0, and joined the Open Semantic Interchange (OSI) standards effort alongside Snowflake, Salesforce, and others. The engine is now free and open; the hosted Semantic Layer service in dbt Cloud is a paid feature (Starter tier and above), metered by queried metrics. MetricFlow is still pre-1.0 (e.g., v0.x), evolving quickly, and not yet supported on every warehouse (e.g., Microsoft Fabric is not supported).

Recent industry shifts (late 2025) you should know

  • Fivetran–dbt Labs merger: On October 13, 2025, Fivetran and dbt Labs signed a definitive all-stock merger agreement. Per dbt Labs’ announcement, “following close of the transaction, the combined company will be approaching $600M in annual recurring revenue (ARR)”; CEO Tristan Handy described it as launching with “~$600 million in ARR with well north of 10,000 customers.” Fivetran CEO George Fraser becomes CEO; Handy becomes President. As of early 2026 the deal is pending regulatory approval and the companies operate independently. Both committed to keeping dbt Core open-source under Apache 2.0. Independent analysis (Peliqan) notes the firms share “more than 1,500 joint customers” and that “Fivetran estimates that 80-90% of its existing customers already use dbt in some form.” dbt Labs
  • dbt Fusion engine: A new engine rewritten in Rust (from the SDF Labs acquisition), in preview as of Coalesce 2025. It understands SQL natively (catches errors before running, shows downstream impact of a column rename) and enables state-aware orchestration (only rebuild models whose inputs changed) for warehouse-cost savings. Per dbt Labs, “EQT Group, one of the world’s largest private equity investors, reported about 60% faster runtimes and 45% lower warehouse cost after adopting Fusion,” and a dbt internal project saw 64% total cost savings — these are vendor-reported figures and should be treated as such. The Information Lab + 2

What you need to know to get started

  • SQL is the core prerequisite. Intermediate SQL (joins, CTEs, window functions, aggregations) is what you need day one. The dbt Analytics Engineering certification recommends 6+ months of hands-on dbt and intermediate SQL.
  • What you can do solo (no data engineer): write staging/intermediate/mart models, install and configure Fivetran packages, write tests, document models, build attribution/funnel/pacing logic, and define metrics. If your data is already loading via Fivetran into the warehouse and you use dbt Cloud’s browser IDE, you can do all of this without engineering support.
  • What still needs engineering (or platform) help: initial warehouse/Git/connector setup, production orchestration on dbt Core, complex incremental models, performance/cost tuning at scale, security/governance (SSO, network controls), and reverse-ETL activation pipelines.
  • A small amount of Jinja and Git rounds out the toolkit — enough to use ref(), simple macros, and branch/commit/PR workflows.

Limitations and what dbt is NOT

  • dbt does not extract or load data. No connectors, no API pulls, no file ingestion. You need Fivetran/Airbyte/Stitch upstream. dbt only transforms data already in the warehouse.
  • dbt is not a BI/visualization tool. It produces tables; Looker/Tableau/Power BI/Mode visualize them.
  • dbt is not a reverse-ETL/activation tool. It builds the lead score or segment; Census/Hightouch sync it to Salesforce/HubSpot/ad platforms.
  • dbt is not an orchestrator (in Core). dbt Core has no scheduler; you bring Airflow/Dagster/Prefect or use dbt Cloud’s built-in scheduler.
  • dbt Core is not a no-code tool. It requires SQL and software-engineering practices (Git, testing). Teams lacking SQL skills should consider visual/no-code alternatives.
  • The Semantic Layer is not free-of-cost in production (the hosted service is paid) and is still maturing.
  • Common marketing misconceptions: dbt does not “do attribution for you” (you define the logic); it is not real-time (it runs in scheduled batches); and seeds are not a way to load big raw datasets (they’re for small reference data only).
  • Cost reality: because dbt pushes compute to your warehouse, poorly written models or over-frequent runs can drive surprising Snowflake/BigQuery bills — independent of dbt Cloud’s own seat/usage charges.

Salary / career relevance of dbt skills

dbt is one of the most in-demand modern-data-stack skills, but with important nuance for marketing analysts:

  • It is a near-default requirement for “analytics engineer” roles and increasingly listed for marketing/GTM data engineer, growth analytics, and senior marketing analytics roles, frequently paired with SQL + Snowflake/BigQuery + Git + Fivetran. Job boards (Indeed, ZipRecruiter, Glassdoor) show hundreds to thousands of postings naming dbt.
  • For pure, junior marketing-analyst roles, dbt is more often “preferred/nice-to-have” than mandatory — but it is a clear differentiator and a path into higher-paid analytics-engineering-adjacent work.
  • Compensation context: per Glassdoor (May 2026, 992 salaries), average analytics engineer total pay is $154,610/year, with a typical range between $127,766 (25th percentile) and $189,710 (75th percentile), and top earners to ~$226,956. dbt-skilled job postings commonly fall in the ~$120k–$180k+ band. These are analytics-engineering figures, which run higher than pure marketing-analyst pay.
  • The dbt Analytics Engineering certification (paid, proctored) is a recognized credential. Per dbt Labs, the company “has helped mint more than 12,000 analytics engineers” via dbt Learn and “certified more than 400 dbt developers” through the certification program.

Learning resources (current URLs)

  • dbt Learn (free, self-paced official courses incl. dbt Fundamentals): getdbt.com/dbt-learn
  • Official documentation: docs.getdbt.com
  • dbt Community Slack (very active; #dbt-certification and topic channels): join via getdbt.com community
  • dbt Community Forum (Discourse): discourse.getdbt.com
  • dbt Package Hub (find packages incl. Fivetran’s): hub.getdbt.com
  • dbt Analytics Engineering Certification study guide: on getdbt.com
  • Third-party: DataCamp (Introduction to dbt, dbt Fundamentals track), DataTalks.Club Data Engineering Zoomcamp (free, includes a dbt module), and the O’Reilly book Analytics Engineering with SQL and dbt.

Recommendations

Stage 1 — Learn and prototype (week 1–4). Complete the free dbt Fundamentals course on dbt Learn using a free dbt Cloud Developer account. If your company already loads marketing data via Fivetran, install the relevant Fivetran package (google_adsfacebook_adshubspot, or salesforce) plus dbt_utils in a sandbox and explore the pre-built *_report models. Benchmark: you can build and dbt run a 3-layer project (staging → intermediate → one mart) and write unique/not_null tests.

Stage 2 — Ship one source of truth (month 2–3). Pick your single most-contested metric (likely blended CAC, MQL count, or pipeline-by-channel) and build the staging → mart chain that defines it canonically, with UTM normalization via a seed + macro, and tests on the grain. Validate against the legacy report using audit_helperBenchmark: one dashboard in your BI tool now reads from a dbt mart instead of ad-hoc SQL.

Stage 3 — Scale and govern (month 4+). Expand to the cross-channel ad_reporting rollup, add attribution/funnel models, and — if you’re on dbt Starter/Enterprise — pilot the Semantic Layer for your top 5–10 metrics so BI tools, spreadsheets, and AI agents share definitions.

Decision thresholds that change the plan:

  • Core vs. Cloud: Choose dbt Core only if you already run an orchestrator (Airflow/Dagster) and have engineering support; otherwise choose dbt Cloud for the browser IDE + scheduler. For a ~5-person analytics team, Cloud’s saved DevOps time typically justifies the ~$500/mo; for 1–2 people with existing orchestration, Core is the better value.
  • Watch warehouse spend, not just license cost — if dbt-driven Snowflake/BigQuery compute climbs, tune models (incremental materializations, state-based selection, fewer full-refreshes).
  • Re-evaluate the stack post-merger: if you run Fivetran + dbt + a separate orchestrator, watch for tighter native integration (and potential pricing/packaging changes) as the merger closes through 2026.

Caveats

  • Pricing and product details are moving fast. dbt Cloud pricing has changed materially more than once; the merger, the rename of “dbt Cloud” toward “dbt Platform,” and Fusion are all in flux. Confirm current pricing at getdbt.com/pricing and package versions at hub.getdbt.com at publication time.
  • Vendor-reported performance figures (e.g., Fusion’s ~60% faster / ~45% cheaper at EQT, 64% internal savings) come from dbt Labs keynotes/blog and should be presented as vendor claims, not independent benchmarks.
  • Semantic Layer maturity: MetricFlow is open-source and production-used but still pre-1.0 and evolving; the spec and warehouse support are changing.
  • Salary figures cited are for analytics-engineering roles (Glassdoor/Coursera) and are higher than typical pure marketing-analyst pay; treat them as the upper band of the dbt-skilled marketing-analytics range, not the median.
  • Package model counts and names were verified against current GitHub READMEs, but Fivetran maintains only the latest version and adds models over time — verify exact outputs against the package’s README for the version you install. (For example, the Google Ads package now ships 9 final models and Salesforce ~10, more than the canonical handful most guides list.)
RELATED ARTICLES
- Advertisment -
Google search engine

Most Popular

Recent Comments