# Retrieval Agent

> The Retrieval Agent finds and fetches data from across your data stack — so you never need to know where things live.

## Overview

The Retrieval Agent serves as the data discovery and query layer for BrightAgent. It finds relevant data assets using **vector search** across your metadata catalog, generates optimized SQL, and executes queries against your warehouse — so you can ask for data without knowing which table, schema, or source it lives in.

## Demo: Retrieval Agent in Action

<iframe className="w-full aspect-video rounded-xl" src="https://www.loom.com/embed/3c52b03b08434967bae5850d053911bd?sid=946bb8e2-ae73-461e-bb36-cd667dee3721&t=600" title="Brighthive E2E Demo - Retrieval Agent" frameBorder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />

*This demo starts at 10:00 and shows data source connection, extraction, and preparation.*

## What You Can Ask

* *"Show me customer demographics"*
* *"What tables do we have for sales data?"*
* *"Find the students dataset from my warehouse"*
* *"Retrieve the CRM dataset"*
* *"How many orders were placed last quarter?"*
* *"Get revenue by region for Q4"*

## How It Works

```mermaid
graph TD
    A[Your Question] --> B[BrightAgent]
    B -->|"Discover assets"| C[Vector Search + Metadata Catalog]
    C --> D{Confidence Assessment}
    D -->|"Strong Match > 60%"| E[Retrieval Agent]
    D -->|"Possible Match 40-60%"| F[Ask You to Confirm]
    D -->|"Uncertain < 40%"| G[Ask You to Clarify]
    F --> E
    E --> H[Generate SQL]
    H --> I[Execute Against Warehouse]
    I --> J[Results + Artifact Created]
```

1. **You ask a question** — Any question that references data, whether you know the exact table name or not.
2. **Discovers data assets** — BrightAgent runs a **vector search** across your metadata catalog to find data assets whose names, descriptions, and schemas semantically match your question.
3. **Assesses confidence** — Each match is scored by similarity. Strong matches proceed automatically, possible matches are shown to you for confirmation, and uncertain matches prompt you to clarify your request.
4. **Generates SQL** — The Retrieval Agent creates an optimized SQL query using your data asset's schema, columns, and any workspace policies.
5. **Executes the query** — Runs the SQL against your **Redshift Serverless** warehouse (or Snowflake, if configured) and returns the results.
6. **Creates an artifact** — Saves the full dataset, metadata, and query details as a reusable artifact that other agents can reference.
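
To make the discovery step concrete, here is a minimal sketch of similarity-based search over a tiny in-memory catalog. It assumes cosine similarity over embedding vectors, which is a common choice but is not confirmed as the metric BrightAgent uses. The asset names, the toy three-dimensional vectors, and the `search_catalog` helper are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_catalog(
    query_vec: list[float],
    catalog: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k (asset_name, similarity) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog: real embeddings have hundreds of dimensions
# and would come from an embedding model, not be written by hand.
CATALOG = {
    "sales.orders": [0.9, 0.1, 0.0],
    "crm.customers": [0.1, 0.9, 0.1],
    "edu.students": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # Pretend this vector is the embedding of "How many orders were placed?"
    for name, score in search_catalog([1.0, 0.0, 0.0], CATALOG):
        print(f"{name}: {score:.2f}")
```

The ranked scores produced here are what the confidence assessment in step 3 would then evaluate.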

## Confidence-Based Discovery

The Retrieval Agent uses **similarity scoring** to check that it has found the right data before querying. This helps avoid incorrect queries and wasted compute:

| Confidence         | Threshold         | What Happens                                                                         |
| ------------------ | ----------------- | ------------------------------------------------------------------------------------ |
| **Strong Match**   | > 60% similarity  | Proceeds directly to SQL generation — the agent is confident it found the right data |
| **Possible Match** | 40–60% similarity | Presents options and asks you to confirm which dataset you mean                      |
| **Uncertain**      | \< 40% similarity | Asks you to clarify or refine your request before proceeding                         |

Discovery searches across **vector embeddings** (semantic meaning of names and descriptions) and your **platform metadata catalog** (structured metadata, schemas, and relationships) to find assets that match your intent — even if you don't know the exact table name.
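
The three tiers in the table can be sketched as a small routing function. This is an illustration only: the function name and the returned labels are hypothetical, and the treatment of scores at exactly 40% or 60% is an assumption read off the stated ranges (`> 60%`, `40–60%`, `< 40%`).

```python
def route_by_confidence(similarity: float) -> str:
    """Map a similarity score in [0.0, 1.0] to one of the three documented tiers."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must be between 0 and 1, got {similarity}")
    if similarity > 0.60:
        return "strong_match"    # proceed directly to SQL generation
    if similarity >= 0.40:
        return "possible_match"  # present options and ask the user to confirm
    return "uncertain"           # ask the user to clarify or refine the request


if __name__ == "__main__":
    for score in (0.82, 0.55, 0.21):
        print(score, "->", route_by_confidence(score))
```

Under this reading, a score of exactly 60% falls into the confirmation tier rather than proceeding automatically, which is the more cautious interpretation.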

## SQL Generation & Execution

Once the right data asset is identified, the Retrieval Agent handles the full query lifecycle:

<CardGroup cols={2}>
  <Card title="Schema-Aware SQL" icon="code">
    Generates SQL using the actual column names, data types, and table structure from your metadata catalog — not guesses.
  </Card>

  <Card title="Policy Compliance" icon="shield">
    Respects **workspace policies** during query generation. If your workspace has data access restrictions, the SQL enforces them.
  </Card>

  <Card title="Safe Execution" icon="lock">
    Queries run under **cross-account IAM roles** against your dedicated Redshift Serverless warehouse. Results are capped at 10,000 rows by default.
  </Card>

  <Card title="Reusable Artifacts" icon="box-archive">
    Every query result is saved as an **artifact** with full metadata — SQL used, tables queried, column definitions, and a searchable summary.
  </Card>
</CardGroup>
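
As a rough sketch of the row cap and artifact ideas above, the example below wraps a generated query in an outer `LIMIT` and stores the result alongside its metadata. It uses Python's built-in `sqlite3` as a stand-in for the warehouse so that it runs anywhere. The `Artifact` fields, `apply_row_cap`, and `run_and_save` are hypothetical names; how the Retrieval Agent actually enforces the cap or structures an artifact is not specified here.

```python
import sqlite3
from dataclasses import dataclass

DEFAULT_ROW_CAP = 10_000  # default cap stated in the docs


@dataclass
class Artifact:
    """Hypothetical artifact shape: the query plus enough metadata to reuse it."""
    sql: str
    tables: list[str]
    columns: list[str]
    rows: list[tuple]
    summary: str


def apply_row_cap(sql: str, cap: int = DEFAULT_ROW_CAP) -> str:
    """Wrap a SELECT in an outer query so the cap holds even if the inner SQL has no LIMIT."""
    inner = sql.strip().rstrip(";").strip()
    return f"SELECT * FROM ({inner}) AS capped LIMIT {cap}"


def run_and_save(conn: sqlite3.Connection, sql: str, tables: list[str],
                 cap: int = DEFAULT_ROW_CAP) -> Artifact:
    """Execute the capped query and package the results as an artifact."""
    capped_sql = apply_row_cap(sql, cap)
    cursor = conn.execute(capped_sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    summary = f"{len(rows)} rows from {', '.join(tables)}"
    return Artifact(capped_sql, tables, columns, rows, summary)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("east", 10.0), ("west", 5.0), ("north", 7.5)])
    artifact = run_and_save(
        conn,
        "SELECT region, SUM(amount) AS revenue FROM orders GROUP BY region;",
        tables=["orders"],
    )
    print(artifact.summary)
    print(artifact.columns)
```

Wrapping the query, rather than appending `LIMIT` to it, is one simple way to keep the cap in place when the generated SQL already ends with its own `ORDER BY` or `LIMIT` clause.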

## What It Connects To

<CardGroup cols={2}>
  <Card title="Neo4j Metadata Catalog" icon="share-nodes">
    The primary search target — all data asset metadata, schemas, lineage, and relationships live here. Embeddings enable semantic search.
  </Card>

  <Card title="Redshift Serverless" icon="warehouse">
    Your workspace data warehouse where analytical queries execute. Auto-scaling, 3-AZ deployment, schema-per-organization isolation.
  </Card>

  <Card title="S3 Data Lake" icon="hard-drive">
    Organization-level raw data storage. Redshift Spectrum queries S3 data directly without loading it into the warehouse.
  </Card>

  <Card title="Glue Data Catalog" icon="book-open">
    Schema metadata auto-discovered by Glue crawlers when new data lands in S3. Feeds into Neo4j for unified search.
  </Card>
</CardGroup>

## Works With Other Agents

The Retrieval Agent is typically the **first agent invoked** in a data workflow:

* **Analyst Agent** uses retrieved data assets for statistical analysis and exploration.
* **Visualization Agent** receives query results to create interactive charts.
* **Engineering Agent** uses schema context to generate appropriate dbt transformation models.
* **Governance Agent** reports on data lineage and tracks access patterns.

<Callout type="info">
  The Retrieval Agent is part of the [BrightAgent architecture](/brightagent/architecture). See the [evaluation framework](/brightagent/evaluation) for how retrieval quality is measured.
</Callout>
