Overview

The Engineering Agent (DBT Agent) builds data transformation pipelines by generating dbt models from your natural language descriptions. It profiles your raw data, creates properly structured SQL transformations, validates them against your warehouse, and submits everything as a GitHub pull request for your review — nothing gets deployed without your approval.

Demo: Engineering Agent in Action

The demo starts at 16:00 and shows the Engineering Agent building data pipelines and transforming data.

What You Can Ask

  • “Create a customer segmentation model based on purchase frequency and total spend”
  • “Build a transformation that joins orders with customer demographics”
  • “Generate a dbt model for monthly revenue aggregation by region”
  • “Transform the raw events table into a session-level summary”
  • “Create a model that calculates customer lifetime value”
  • “Build a staging model that cleans and deduplicates the leads data”

How It Works

  1. Describe what you need — Tell the agent what transformation you want in plain English.
  2. Agent profiles your data — Loads table schemas, sample rows, and available reference models to understand your data landscape.
  3. Generates SQL — Creates an optimized SQL transformation based on your source data structure and requirements.
  4. Code review loop — An automated review validates the SQL against your schema, data types, and query requirements. If issues are found, the agent edits and resubmits (up to 2 iterations).
  5. Converts to dbt — Validated SQL is transformed into a properly structured dbt model with {{ source() }} references and configurations.
  6. Validates against your warehouse — The model is executed against your actual warehouse to confirm that it runs and to verify its output. If validation fails, the agent analyzes the errors and retries automatically.
  7. Submits a GitHub PR — All generated code is submitted as a pull request for your team to review before deployment.
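The bounded review loop in step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: `review_loop`, `ReviewResult`, `toy_review`, and `toy_edit` are hypothetical names, and the sketch reads "up to 2 iterations" as at most two edit-and-resubmit rounds after the first review.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_REVIEW_ITERATIONS = 2  # assumed reading of "up to 2 iterations" in step 4


@dataclass
class ReviewResult:
    approved: bool
    issues: List[str] = field(default_factory=list)


def review_loop(
    sql: str,
    review: Callable[[str], ReviewResult],
    edit: Callable[[str, List[str]], str],
    max_iterations: int = MAX_REVIEW_ITERATIONS,
) -> Tuple[str, ReviewResult, int]:
    """Review the SQL; while it is rejected, edit and resubmit, up to the limit."""
    result = review(sql)
    rounds = 0
    while not result.approved and rounds < max_iterations:
        sql = edit(sql, result.issues)
        result = review(sql)
        rounds += 1
    return sql, result, rounds


# Hypothetical stand-ins for the automated reviewer and editor.
def toy_review(sql: str) -> ReviewResult:
    issues = []
    if "select *" in sql.lower():
        issues.append("list columns explicitly instead of SELECT *")
    return ReviewResult(approved=not issues, issues=issues)


def toy_edit(sql: str, issues: List[str]) -> str:
    return sql.replace("SELECT *", "SELECT order_id, amount")


final_sql, result, rounds = review_loop("SELECT * FROM orders", toy_review, toy_edit)
print(final_sql, result.approved, rounds)
```

Capping the number of rounds keeps a query that the reviewer never accepts from looping forever. The caller receives the last review result either way and can decide how to proceed.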

Key Capabilities

Schema-Aware SQL Generation

Generates SQL using actual column names, data types, and table structures from your warehouse — not guesses. References existing dbt sources and models from your GitHub repo.
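A rough sketch of what collecting column names, data types, and sample rows might look like. It assumes an in-memory SQLite database as a stand-in for the warehouse. `profile_table` and the `orders` table are hypothetical, and the 10-row sample size follows the Data Profiling step described in the pipeline.

```python
import sqlite3

SAMPLE_ROWS = 10  # sample size named in the Data Profiling step


def profile_table(conn: sqlite3.Connection, table: str) -> dict:
    """Collect column names, declared types, and a few sample rows for one table.

    The table name is interpolated directly, so it must come from a trusted source.
    """
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if not columns:
        raise ValueError(f"unknown table: {table}")
    sample = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (SAMPLE_ROWS,)).fetchall()
    return {"table": table, "columns": columns, "sample_rows": sample}


# Stand-in warehouse with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 3, 10.0 * i) for i in range(25)],
)

profile = profile_table(conn, "orders")
print(profile["columns"])
print(len(profile["sample_rows"]))
```

A profile like this lets generated SQL be checked against real column names and types before it is ever run.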

Automated Code Review

Every generated query goes through automated review that checks schema compatibility, data type alignment, and correctness before proceeding.

Warehouse Validation

Models are executed against your warehouse to verify they produce correct results. Failed validations trigger automatic error analysis and fixes.
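The validate-then-fix cycle might look roughly like the sketch below. It again uses SQLite as a stand-in warehouse. `validate_with_retry` and `toy_fix` are hypothetical, and the limit of three attempts is an assumption, since the retry budget is not stated here.

```python
import sqlite3
from typing import Callable


def validate_with_retry(
    conn: sqlite3.Connection,
    sql: str,
    fix: Callable[[str, str], str],
    max_attempts: int = 3,  # assumed limit, not documented
) -> dict:
    """Run the query; on a database error, pass the message to `fix` and try again."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            rows = conn.execute(sql).fetchall()
            return {"ok": True, "sql": sql, "rows": rows, "attempts": attempt}
        except sqlite3.Error as exc:
            last_error = str(exc)
            sql = fix(sql, last_error)
    return {"ok": False, "sql": sql, "error": last_error, "attempts": max_attempts}


# Hypothetical fixer that repairs one known typo.
def toy_fix(sql: str, error: str) -> str:
    return sql.replace("amout", "amount")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

outcome = validate_with_retry(conn, "SELECT SUM(amout) FROM orders", toy_fix)
print(outcome["ok"], outcome["attempts"], outcome["rows"])
```

Feeding the database error message back into the fix step is what makes the retry targeted rather than blind.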

Human Review via GitHub

Every generated model is submitted as a GitHub PR with full SQL, configurations, and a business-friendly summary. Nothing is deployed without your approval.

Multi-Step Pipeline

The Engineering Agent uses an 8-step pipeline that ensures quality at every stage:

| Step | What Happens |
|------|--------------|
| Data Profiling | Loads table schema, sample rows (10 rows), and available reference models from your GitHub repo |
| SQL Drafting | Generates optimized SQL based on your request and source data structure |
| Code Review | Validates SQL correctness against schema, data types, and query requirements |
| Code Editing | Refines SQL based on review feedback: applies fixes, ensures compatibility |
| dbt Conversion | Transforms SQL into proper dbt model syntax with {{ source() }} references |
| dbt Validation | Executes the model against your warehouse and verifies output |
| Error Review | If validation fails, analyzes root cause and recommends specific fixes |
| Finalization | Packages the model with a summary and submits as a GitHub pull request |
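To make the dbt Conversion step concrete, here is a small sketch that rewrites raw table references into `{{ source() }}` calls and prepends a `{{ config() }}` header. `to_dbt_model`, the `raw.orders` and `raw.customers` tables, and the default `view` materialization are illustrative assumptions. The regex approach is a simplification that would also rewrite matching names inside string literals or comments.

```python
import re
from typing import Dict, Tuple


def to_dbt_model(
    sql: str,
    sources: Dict[str, Tuple[str, str]],
    materialized: str = "view",
) -> str:
    """Wrap plain SQL as a dbt model, replacing raw table names with source() refs.

    `sources` maps a raw table name (e.g. "raw.orders") to (source_name, table_name).
    """
    body = sql.strip().rstrip(";")  # dbt models should not end with a semicolon
    if sources:
        lookup = {raw.lower(): ref for raw, ref in sources.items()}
        # Longest names first, so overlapping names resolve to the most specific one.
        names = sorted(lookup, key=len, reverse=True)
        pattern = re.compile(
            "|".join(r"\b" + re.escape(name) + r"\b" for name in names),
            re.IGNORECASE,
        )

        def replace(match: "re.Match[str]") -> str:
            source_name, table_name = lookup[match.group(0).lower()]
            return "{{ source('%s', '%s') }}" % (source_name, table_name)

        body = pattern.sub(replace, body)
    return "{{ config(materialized='%s') }}\n\n%s\n" % (materialized, body)


raw_sql = """
SELECT c.customer_id, SUM(o.amount) AS total_spend
FROM raw.orders AS o
JOIN raw.customers AS c ON c.customer_id = o.customer_id
GROUP BY c.customer_id;
"""

model = to_dbt_model(
    raw_sql,
    {"raw.orders": ("raw", "orders"), "raw.customers": ("raw", "customers")},
)
print(model)
```

A production converter would more likely work from a parsed SQL tree than from regular expressions. The sketch only shows the shape of the output.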

GitHub Integration

The agent integrates with your GitHub repositories to:
  • Read existing sources — Fetches sources.yml from your dbt project to understand available data sources and reference models
  • Browse repo structure — Navigates branches and directories to understand your project layout
  • Submit pull requests — Creates PRs with generated dbt models, configurations, and descriptions for your team to review
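As an illustration of the pull request step, the sketch below assembles the request body for GitHub's REST endpoint `POST /repos/{owner}/{repo}/pulls`. Nothing is sent over the network. `build_pull_request`, the repository and branch names, and the title format are hypothetical, and how the agent actually talks to GitHub is not specified here.

```python
import json
from typing import Dict, Tuple


def build_pull_request(
    owner: str,
    repo: str,
    model_path: str,
    summary: str,
    head_branch: str,
    base_branch: str = "main",
) -> Tuple[str, Dict[str, str]]:
    """Return the REST path and JSON payload for opening a pull request."""
    endpoint = f"/repos/{owner}/{repo}/pulls"
    payload = {
        "title": f"Add dbt model: {model_path}",  # hypothetical title format
        "head": head_branch,  # branch that contains the generated model
        "base": base_branch,  # branch the team reviews and merges into
        "body": summary,  # business-friendly summary of the change
    }
    return endpoint, payload


endpoint, payload = build_pull_request(
    owner="example-org",
    repo="analytics-dbt",
    model_path="models/marts/customer_ltv.sql",
    summary="Adds a model that calculates customer lifetime value.",
    head_branch="agent/customer-ltv",
)
print(endpoint)
print(json.dumps(payload, indent=2))
```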

Works With Other Agents

  • Retrieval Agent provides raw data context, schema information, and identifies which data assets to transform.
  • Analyst Agent receives clean, transformed data for analysis after models are deployed.
  • Governance Agent tracks transformation lineage in Neo4j and ensures compliance with data policies.

The Engineering Agent is part of the BrightAgent architecture. See capabilities for the full list of what BrightAgent can do.