
Overview

The Retrieval Agent serves as the data discovery and query layer for BrightAgent. It finds relevant data assets using vector search across your metadata catalog, generates optimized SQL, and executes queries against your warehouse — so you can ask for data without knowing which table, schema, or source it lives in.

Demo: Retrieval Agent in Action

This demo starts at 10:00 and shows data source connection, extraction, and preparation.

What You Can Ask

  • “Show me customer demographics”
  • “What tables do we have for sales data?”
  • “Find the students dataset from my warehouse”
  • “Retrieve the CRM dataset”
  • “How many orders were placed last quarter?”
  • “Get revenue by region for Q4”

How It Works

  1. You ask a question — Any question that references data, whether you know the exact table name or not.
  2. Discovers data assets — The Retrieval Agent runs a vector search across your metadata catalog to find semantically matching data assets by name, description, and schema.
  3. Assesses confidence — Each match is scored by similarity. Strong matches proceed automatically; for weaker matches, the agent asks you to confirm the dataset or clarify your request.
  4. Generates SQL — The Retrieval Agent creates an optimized SQL query using your data asset’s schema, columns, and any workspace policies.
  5. Executes the query — Runs the SQL against your Redshift Serverless warehouse (or Snowflake, if configured) and returns the results.
  6. Creates an artifact — Saves the full dataset, metadata, and query details as a reusable artifact that other agents can reference.

Confidence-Based Discovery

The Retrieval Agent uses similarity scoring to ensure it finds the right data before querying. This prevents bad queries and wasted compute:

| Confidence | Threshold | What Happens |
| --- | --- | --- |
| Strong Match | > 60% similarity | Proceeds directly to SQL generation; the agent is confident it found the right data |
| Possible Match | 40–60% similarity | Presents options and asks you to confirm which dataset you mean |
| Uncertain | < 40% similarity | Asks you to clarify or refine your request before proceeding |

Discovery searches across vector embeddings (semantic meaning of names and descriptions) and your platform metadata catalog (structured metadata, schemas, and relationships) to find assets that match your intent, even if you don’t know the exact table name.
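
The three confidence tiers can be sketched as a small routing function. This is an illustrative sketch, not the agent's actual implementation: it assumes similarity scores are normalized to the range 0 to 1 and that both 40% and 60% fall into the Possible Match tier, as the thresholds above suggest. The names `Confidence`, `classify_match`, and `next_action` are hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    STRONG = "strong_match"
    POSSIBLE = "possible_match"
    UNCERTAIN = "uncertain"


def classify_match(similarity):
    """Map a similarity score in [0, 1] to one of the three tiers."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if similarity > 0.60:
        return Confidence.STRONG
    if similarity >= 0.40:
        return Confidence.POSSIBLE
    return Confidence.UNCERTAIN


def next_action(similarity):
    """Pick the next step for a match (action names are assumptions)."""
    tier = classify_match(similarity)
    if tier is Confidence.STRONG:
        return "generate_sql"
    if tier is Confidence.POSSIBLE:
        return "ask_user_to_confirm"
    return "ask_user_to_clarify"


for score in (0.82, 0.55, 0.21):
    print(score, classify_match(score).value, next_action(score))
```

Keeping the thresholds in one function makes it straightforward to see which requests would run without confirmation.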

SQL Generation & Execution

Once the right data asset is identified, the Retrieval Agent handles the full query lifecycle:

Schema-Aware SQL

Generates SQL using the actual column names, data types, and table structure from your metadata catalog — not guesses.
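
As a rough illustration of what schema-aware generation means, the sketch below builds a query only from columns that exist in a catalog entry and rejects anything else. The catalog layout (a dictionary of table name to column types), the table `sales.orders`, and the function `build_select` are assumptions made for this example; the agent's real generation logic is not documented here.

```python
import re

# Conservative pattern for unquoted SQL identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    """Reject names that are not plain, optionally schema-qualified identifiers."""
    for part in name.split("."):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsafe identifier: {name!r}")


def build_select(catalog_schema, table, columns):
    """Build a SELECT that references only columns defined in the catalog."""
    if table not in catalog_schema:
        raise KeyError(f"table {table!r} is not in the catalog")
    if not columns:
        raise ValueError("at least one column is required")
    _check_identifier(table)
    known_columns = catalog_schema[table]
    for column in columns:
        _check_identifier(column)
        if column not in known_columns:
            raise ValueError(f"column {column!r} is not defined on {table!r}")
    return f"SELECT {', '.join(columns)} FROM {table}"


# Hypothetical catalog entry: column name -> data type.
catalog_schema = {
    "sales.orders": {
        "order_id": "bigint",
        "region": "varchar",
        "amount": "decimal(12,2)",
    },
}

print(build_select(catalog_schema, "sales.orders", ["region", "amount"]))
```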

Policy Compliance

Respects workspace policies during query generation. If your workspace has data access restrictions, the SQL enforces them.
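
The policy model itself is not described on this page, so the following is only a sketch under assumptions: a hypothetical `WorkspacePolicy` with a column deny list and an administrator-authored row filter, applied while the query text is assembled.

```python
from dataclasses import dataclass, field


@dataclass
class WorkspacePolicy:
    # Both fields are hypothetical; real workspace policies may look different.
    denied_columns: set = field(default_factory=set)
    row_filter: str = ""  # trusted predicate written by an administrator


def build_policy_query(table, requested_columns, policy):
    """Drop denied columns and append the policy's row filter."""
    allowed = [c for c in requested_columns if c not in policy.denied_columns]
    if not allowed:
        raise PermissionError("policy denies every requested column")
    sql = f"SELECT {', '.join(allowed)} FROM {table}"
    if policy.row_filter:
        sql += f" WHERE {policy.row_filter}"
    return sql


policy = WorkspacePolicy(denied_columns={"email"}, row_filter="org_id = 42")
print(build_policy_query("crm.customers", ["customer_id", "email", "region"], policy))
```

Applying the restriction at generation time means a disallowed column never reaches the warehouse, rather than being filtered out of the results afterwards.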

Safe Execution

Queries execute via cross-account IAM roles against your dedicated Redshift Serverless warehouse. Results are capped at 10,000 rows by default.
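
How the row cap is enforced is not stated here. One plausible approach, shown purely as an assumption, is to wrap the generated query in an outer `SELECT` with a `LIMIT`. The helper `cap_rows` and the constant `DEFAULT_ROW_CAP` are hypothetical names.

```python
DEFAULT_ROW_CAP = 10_000  # default cap quoted in the text above


def cap_rows(sql, row_cap=DEFAULT_ROW_CAP):
    """Wrap a query so that it returns at most row_cap rows."""
    if row_cap <= 0:
        raise ValueError("row_cap must be positive")
    # Remove surrounding whitespace and a trailing semicolon before wrapping.
    inner = sql.strip().rstrip(";").rstrip()
    return f"SELECT * FROM ({inner}) AS capped LIMIT {row_cap}"


print(cap_rows("SELECT region, SUM(amount) AS revenue FROM sales.orders GROUP BY region;"))
```

Wrapping the query instead of editing it avoids having to parse a `LIMIT` clause the generated SQL may already contain.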

Reusable Artifacts

Every query result is saved as an artifact with full metadata — SQL used, tables queried, column definitions, and a searchable summary.
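
To make the artifact contents concrete, here is a minimal sketch of a record holding the four pieces of metadata listed above. The class name `QueryArtifact`, its field names, and the JSON serialization are assumptions; the actual storage format is not documented on this page.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class QueryArtifact:
    sql: str        # SQL used
    tables: list    # tables queried
    columns: dict   # column definitions: name -> data type
    summary: str    # searchable summary

    def to_json(self):
        """Serialize the metadata so other agents could load it later."""
        return json.dumps(asdict(self), sort_keys=True)


artifact = QueryArtifact(
    sql="SELECT region, SUM(amount) AS revenue FROM sales.orders GROUP BY region",
    tables=["sales.orders"],
    columns={"region": "varchar", "revenue": "decimal(12,2)"},
    summary="Revenue by region from the orders table",
)
print(artifact.to_json())
```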

What It Connects To

Neo4j Metadata Catalog

The primary search target — all data asset metadata, schemas, lineage, and relationships live here. Embeddings enable semantic search.
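
Semantic search over embeddings can be illustrated with an in-memory ranking. The sketch below does not use Neo4j and is not the platform's search code: the similarity metric is not named on this page, so cosine similarity is an assumption, and the three-dimensional vectors and asset names are toy values chosen for readability.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_assets(query_embedding, catalog, top_k=3):
    """Return the top_k (asset name, similarity) pairs, best match first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy catalog: asset name -> embedding of its name and description.
catalog = {
    "sales.orders": [0.9, 0.1, 0.0],
    "crm.customers": [0.1, 0.9, 0.2],
    "edu.students": [0.0, 0.2, 0.9],
}

results = rank_assets([1.0, 0.0, 0.0], catalog, top_k=2)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

In practice the query embedding would come from the same model that embedded the catalog entries, so that the two are comparable.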

Redshift Serverless

Your workspace data warehouse where analytical queries execute. Auto-scaling, 3-AZ deployment, schema-per-organization isolation.

S3 Data Lake

Organization-level raw data storage. Redshift Spectrum queries S3 data directly without loading it into the warehouse.

Glue Data Catalog

Schema metadata auto-discovered by Glue crawlers when new data lands in S3. Feeds into Neo4j for unified search.

Works With Other Agents

The Retrieval Agent is typically the first agent invoked in any data workflow:

  • Analyst Agent uses retrieved data assets for statistical analysis and exploration.
  • Visualization Agent receives query results to create interactive charts.
  • Engineering Agent uses schema context to generate appropriate dbt transformation models.
  • Governance Agent reports on data lineage and tracks access patterns.

The Retrieval Agent is part of the BrightAgent architecture. See the evaluation framework for how retrieval quality is measured.