How BrightAgent Connects
BrightAgent integrates with the services that power your data stack through the Datapiary library, providing a uniform interface across all service types. Every interaction is authenticated, scoped to your workspace, and tracked in Neo4j for full lineage and auditability.

Core Integrations
Neo4j
Metadata & Knowledge Graph — Single source of truth for all metadata, lineage, relationships, and data asset information. Every agent queries Neo4j for context via GraphRAG. Stores user, workspace, organization relationships; data asset schemas and locations; transformation lineage; and access control metadata.
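The lineage lookups described above can be sketched as a parameterized Cypher query. The node label (`:DataAsset`) and relationship type (`:DERIVED_FROM`) below are illustrative assumptions, not BrightAgent's actual graph schema:

```python
# Sketch of the kind of upstream-lineage query an agent might issue
# against Neo4j. Labels and relationship names are hypothetical.

def upstream_lineage_query(max_depth: int = 5) -> str:
    """Build a Cypher query that walks upstream lineage for a named asset."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    return (
        "MATCH (a:DataAsset {name: $asset_name})"
        f"-[:DERIVED_FROM*1..{max_depth}]->(source:DataAsset) "
        "RETURN DISTINCT source.name AS upstream"
    )

query = upstream_lineage_query(max_depth=3)
```

A real caller would pass `query` and an `asset_name` parameter to the Neo4j driver's `session.run`.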
Redshift Serverless
Data Warehouse — Auto-scaling analytical warehouse deployed across 3 availability zones with schema-per-organization isolation. The Analyst Agent generates and executes SQL here. Queries access organization data via cross-account IAM and Redshift Spectrum — reading S3 in place without copying.
Snowflake
Data Warehouse (via Datapiary) — Available for organizations that need Snowflake alongside Redshift. Data syncs from organization S3 to Snowflake via cross-account IAM. DBT Cloud transformations can run against either warehouse.
AWS Glue
Schema Discovery — Glue crawlers automatically detect schemas when data lands in S3 — inferring column names, data types, partitions, and formats. Metadata is synced to Neo4j and made available to all agents immediately.
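Crawler-style type inference can be illustrated with a simplified stand-in. This is not Glue's actual inference logic, just a sketch of the idea of guessing column types from sample values:

```python
# Toy schema inference: guess a column type from sample string values,
# loosely mimicking how a crawler infers types from data files in S3.

def infer_type(values):
    """Infer a simple column type from sample values."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_int(v) for v in values):
        return "bigint"
    if all(is_float(v) for v in values):
        return "double"
    return "string"

def infer_schema(rows, columns):
    """Map each column name to a type inferred from the sample rows."""
    return {
        col: infer_type([row[i] for row in rows])
        for i, col in enumerate(columns)
    }

schema = infer_schema(
    rows=[["1", "9.5", "ok"], ["2", "3.0", "fail"]],
    columns=["id", "score", "status"],
)
# schema == {"id": "bigint", "score": "double", "status": "string"}
```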
DBT Cloud
Data Transformation (via Datapiary) — The Engineering Agent generates dbt models that run on DBT Cloud. All generated code goes through GitHub PRs for human review. Neo4j tracks transformation lineage — which models depend on which sources.
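The shape of a generated model can be sketched as a simple template. The model and source names below are hypothetical examples, not real BrightAgent output:

```python
# Sketch of rendering a minimal dbt model before it is opened as a
# GitHub PR for review. Names are illustrative placeholders.

def render_dbt_model(model_name: str, source_model: str, columns: list[str]) -> str:
    """Render a dbt model that selects columns from an upstream ref()."""
    select_list = ",\n    ".join(columns)
    return (
        f"-- models/{model_name}.sql\n"
        "select\n"
        f"    {select_list}\n"
        f"from {{{{ ref('{source_model}') }}}}\n"
    )

sql = render_dbt_model("stg_orders", "raw_orders", ["order_id", "amount"])
```

Because the model uses `ref()` rather than a hard-coded table name, dbt can resolve the dependency and Neo4j can record the model-to-source lineage edge.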
Amazon S3
Data Lake Storage — Each organization gets dedicated S3 buckets (raw, staged, shared) in their own AWS account. File uploads trigger automatic schema discovery via Glue and metadata registration in Neo4j.
Airbyte
Data Ingestion (Optional) — Self-hosted Airbyte instance with 300+ connectors for ingesting data from external sources such as Shopify, HubSpot, Salesforce, and PostgreSQL. Runs within the organization’s dedicated AWS account.
OpenMetadata
Metadata Catalog — Unified metadata catalog integration for comprehensive data asset discovery, documentation, and lineage tracking. Connected via MCP for direct agent access.
MCP Integrations (Model Context Protocol)
BrightAgent uses MCP for validated access to external tools and services. MCP ensures that every tool call is well-formed, authorized, and auditable before execution.

Jira
Create tickets, update statuses, and manage sprints directly from BrightAgent or Slack. The Slack Router Agent routes Jira-related requests to the Jira MCP server.
Notion
Search pages, query databases, and retrieve documentation from Notion workspaces. Integrated as an MCP server for structured access.
Google Drive
Search and retrieve documents from Google Drive. Available through the Slack Router Agent for quick access from Slack conversations.
OpenMetadata
Direct MCP connection to OpenMetadata for metadata discovery, data quality information, and catalog operations beyond what’s stored in Neo4j.
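The "well-formed, authorized, auditable" checks that gate tool calls can be sketched as follows. The field names and the tool registry here are illustrative assumptions, not the actual MCP server implementation:

```python
# Toy pre-execution validation for MCP-style tool calls: reject calls
# that name an unknown tool, miss required arguments, or come from a
# user without the matching service scope.

ALLOWED_TOOLS = {
    "jira.create_ticket": {"project", "summary"},
    "notion.search_pages": {"query"},
}

def validate_tool_call(call: dict, user_scopes: set) -> bool:
    """Return True only if the call is well-formed and authorized."""
    tool = call.get("tool")
    if tool not in ALLOWED_TOOLS:
        return False
    required = ALLOWED_TOOLS[tool]
    if not required.issubset(call.get("args", {})):
        return False
    service = tool.split(".")[0]
    return service in user_scopes

ok = validate_tool_call(
    {"tool": "jira.create_ticket",
     "args": {"project": "DATA", "summary": "Fix sync"}},
    user_scopes={"jira"},
)
```

In a real deployment the same checkpoint would also write an audit record before the call is forwarded to the downstream MCP server.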
Observability & Tracing
LangSmith
Full distributed tracing for every agent interaction — from initial user query through intent classification, tool calls, and response synthesis. Traces include latency breakdowns per agent, token usage by model, and error attribution.
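The per-span latency capture described above can be sketched with a decorator. Real tracing uses the LangSmith SDK; this stand-in just records a span name and duration into a local list:

```python
import time
from functools import wraps

# Illustrative tracing decorator: records call name and latency for each
# invocation, loosely mirroring per-agent latency breakdowns in a trace.

TRACES = []

def traced(name):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                TRACES.append({
                    "span": name,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator

@traced("intent_classification")
def classify(query: str) -> str:
    # Hypothetical intent classifier used only for this example.
    return "analysis" if "sql" in query.lower() else "general"

label = classify("Run a SQL query over orders")
```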
OpenTelemetry
Evaluation metrics, agent invocation counts, latency percentiles (p50/p95/p99), and error rates are recorded via OpenTelemetry for operational dashboards and alerting.
Integration Architecture
BrightAgent doesn’t connect to data services directly from the AI layer. Instead, all access flows through the platform’s secure infrastructure. This architecture means:
- All access is authenticated via Cognito JWT tokens — every request is verified before reaching any backend service
- All queries respect workspace boundaries — agents can only access data the user’s workspace is authorized for
- All interactions are logged in Neo4j for lineage and audit — you can trace exactly what data was accessed and why
- No credentials are shared — cross-account access uses IAM role assumption (AWS STS), not stored passwords or API keys
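The credential-free cross-account pattern can be sketched as building the parameters for an STS AssumeRole call. The role ARN and session-name convention below are placeholders, not BrightAgent's actual naming scheme:

```python
# Sketch of cross-account access via IAM role assumption. A real caller
# would pass these parameters to boto3's sts.assume_role, which returns
# temporary credentials -- no stored passwords or API keys involved.

def assume_role_params(account_id: str, role_name: str, workspace_id: str) -> dict:
    """Build the parameters for an STS AssumeRole call scoped to a workspace."""
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": f"brightagent-{workspace_id}",
        "DurationSeconds": 3600,  # temporary credentials expire automatically
    }

params = assume_role_params("123456789012", "OrgDataAccess", "ws-42")
# Real call: boto3.client("sts").assume_role(**params)
```

Tagging the session name with the workspace also makes CloudTrail entries attributable to a specific workspace.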
Service Categories
The Datapiary library organizes integrations into service types, providing a consistent interface regardless of the underlying technology:

| Category | Services | What Agents Use Them For |
|---|---|---|
| Warehouse | Redshift Serverless, Snowflake | Executing SQL queries, running analysis, aggregating data |
| Catalog | Neo4j, Glue Data Catalog, OpenMetadata | Discovering data assets, understanding schemas, tracking lineage |
| Transformation | DBT Cloud | Generating and running data transformation models |
| Ingestion | S3 direct upload, Airbyte | Bringing data into the platform from files and external sources |
| Notebook | Jupyter (E2B sandbox) | Generating and executing analysis notebooks in isolated environments |
| Collaboration | Stream.io | Real-time team chat and collaboration within the platform |
| External Tools | Jira, Notion, Google Drive, MS Teams | Task management, documentation, and file access via MCP |
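The category abstraction in the table above can be sketched as one protocol per service type with concrete adapters per service. The class and method names here are illustrative, not Datapiary's actual API:

```python
from typing import Protocol

# Sketch of a uniform warehouse interface: agent code depends only on
# the protocol, so a Redshift or Snowflake adapter is interchangeable.

class Warehouse(Protocol):
    def execute(self, sql: str) -> list[dict]: ...

class InMemoryWarehouse:
    """Toy adapter standing in for a Redshift or Snowflake client."""
    def __init__(self, tables: dict[str, list[dict]]):
        self.tables = tables

    def execute(self, sql: str) -> list[dict]:
        # Only supports "select * from <table>" for illustration.
        table = sql.strip().lower().removeprefix("select * from ").strip()
        return self.tables.get(table, [])

def row_count(wh: Warehouse, table: str) -> int:
    """Agent-side helper that depends only on the Warehouse protocol."""
    return len(wh.execute(f"select * from {table}"))

wh = InMemoryWarehouse({"orders": [{"id": 1}, {"id": 2}]})
n = row_count(wh, "orders")
```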
Learn about the platform infrastructure that powers these integrations, or see the security model for how data isolation and access control work.

