Neo4j is the backbone of the Brighthive platform — every data asset, user, workspace, organization, and transformation is represented as a connected graph node.
The Single Source of Truth
Neo4j stores and connects all platform metadata. When BrightAgent searches for data, when the webapp displays your catalog, or when lineage is traced from source to report, the answer comes from Neo4j.
What Neo4j Tracks
- Users — Platform users, their roles, and workspace memberships.
- Workspaces — Customer workspaces with their AWS account IDs, Redshift API URLs, and configurations.
- Organizations — Data-providing organizations, their AWS accounts, and S3 bucket locations.
- Data Assets — Every table, file, and dataset with schema, row counts, and location metadata.
- Lineage — How data flows from source through transformation to consumption.
- Relationships — Which organizations belong to which workspaces, which users can access what.
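These entities and relationships can be pictured as a small property graph. The sketch below is a minimal in-memory stand-in, not the platform's schema: the labels (`User`, `Workspace`, `Organization`, `DataAsset`), the relationship types (`MEMBER_OF`, `BELONGS_TO`, `PROVIDES`) and all property names are hypothetical. It answers one "which users can access what" question by walking the path a Cypher pattern would match.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal in-memory property graph (illustrative stand-in for Neo4j)."""
    nodes: dict = field(default_factory=dict)  # node id -> {"label": ..., **properties}
    rels: list = field(default_factory=list)   # (source id, REL_TYPE, target id)

    def add(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def link(self, src, rel_type, dst):
        self.rels.append((src, rel_type, dst))

    def targets(self, src, rel_type):
        """Nodes reached by following rel_type outward from src."""
        return {d for s, r, d in self.rels if s == src and r == rel_type}

    def sources(self, rel_type, dst):
        """Nodes that point at dst via rel_type."""
        return {s for s, r, d in self.rels if r == rel_type and d == dst}


def assets_visible_to(g, user_id):
    # Hypothetical pattern, in Cypher terms:
    # (:User)-[:MEMBER_OF]->(:Workspace)<-[:BELONGS_TO]-(:Organization)-[:PROVIDES]->(:DataAsset)
    visible = set()
    for ws in g.targets(user_id, "MEMBER_OF"):
        for org in g.sources("BELONGS_TO", ws):
            visible |= g.targets(org, "PROVIDES")
    return visible


g = Graph()
g.add("u1", "User", role="analyst")
g.add("ws1", "Workspace", aws_account_id="123456789012")
g.add("org1", "Organization", s3_bucket="example-org-bucket")
g.add("t1", "DataAsset", name="fact_orders", row_count=1200)
g.add("t2", "DataAsset", name="hr_salaries")  # not provided by any org in ws1
g.link("u1", "MEMBER_OF", "ws1")
g.link("org1", "BELONGS_TO", "ws1")
g.link("org1", "PROVIDES", "t1")
print(assets_visible_to(g, "u1"))  # {'t1'}
```

In Neo4j itself this would be a single `MATCH` over the pattern in the comment, with the database handling the lookups instead of Python loops.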
Why Neo4j?
GraphRAG Foundation
Powers Graph Retrieval-Augmented Generation with native vector search and graph traversal — giving BrightAgent rich context for every query.
Fast Relational Queries
Cypher query language enables fast, intuitive graph pattern matching — find connections across your data estate in milliseconds.
Lineage Tracking
Native graph structure is ideal for tracking data lineage — from raw source through transformations to final reports.
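Tracing lineage is a graph walk. As a minimal sketch, assuming a hypothetical `DERIVED_FROM` relationship from each asset to its inputs and made-up asset names, the breadth-first search below walks from a report back to its raw sources. In Cypher the same idea is usually written as a variable-length pattern such as `(r)-[:DERIVED_FROM*]->(src)`.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is derived from.
DERIVED_FROM = {
    "revenue_dashboard": ["sales_summary"],
    "sales_summary": ["fact_orders", "dim_customers"],
    "fact_orders": ["raw_orders"],
    "dim_customers": ["raw_customers"],
}


def upstream(asset):
    """Breadth-first walk from an asset to everything it depends on.

    Returns (ancestors, raw_sources); raw sources are ancestors with no inputs.
    """
    seen, queue = set(), deque([asset])
    while queue:
        current = queue.popleft()
        for parent in DERIVED_FROM.get(current, []):
            if parent not in seen:  # skip assets already reached via another path
                seen.add(parent)
                queue.append(parent)
    raw = {a for a in seen if not DERIVED_FROM.get(a)}
    return seen, raw


ancestors, raw = upstream("revenue_dashboard")
print(sorted(raw))  # ['raw_customers', 'raw_orders']
```

The `seen` set matters because real lineage graphs often share ancestors; without it, a table feeding several downstream models would be visited repeatedly.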
Vector + Graph Search
Combines vector similarity search with graph relationships for semantic data discovery.
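One simple way to combine the two signals is a weighted blend of embedding similarity and graph proximity. The sketch below illustrates the idea under stated assumptions and is not BrightAgent's actual scoring: the embeddings, the hop distances from the user's workspace, and the weight `alpha` are all hypothetical. It shows how a graph relationship can reorder results that pure vector similarity would rank differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings; real ones would come from an embedding model.
EMBEDDINGS = {
    "fact_orders": [0.8, 0.2],
    "orders_backup": [0.85, 0.15],
    "hr_salaries": [0.1, 0.9],
}
# Hypothetical hop distance in the graph from the user's workspace to each asset.
HOPS_FROM_WORKSPACE = {"fact_orders": 1, "orders_backup": 4, "hr_salaries": 1}


def hybrid_rank(query_vec, alpha=0.7):
    """Rank assets by alpha * similarity + (1 - alpha) * graph proximity."""
    def score(asset):
        # Closer in the graph -> higher proximity; unknown distance -> 0.
        proximity = 1.0 / (1 + HOPS_FROM_WORKSPACE.get(asset, math.inf))
        return alpha * cosine(query_vec, EMBEDDINGS[asset]) + (1 - alpha) * proximity
    return sorted(EMBEDDINGS, key=score, reverse=True)


print(hybrid_rank([1.0, 0.0]))  # ['fact_orders', 'orders_backup', 'hr_salaries']
```

With `alpha=1.0` (vector similarity only), `orders_backup` ranks first; the graph term lifts `fact_orders`, which sits closer to the user's workspace.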
GraphRAG
GraphRAG enhances traditional RAG by leveraging knowledge graphs to provide richer context and more accurate responses. BrightAgent uses GraphRAG to:
- Find semantically similar data assets while considering graph relationships.
- Traverse multiple relationship levels to gather comprehensive context for complex questions.
- Link mentions across queries through graph connections; for example, understanding that “revenue” might refer to data in `fact_orders` or `sales_summary`.
- Adapt retrieval strategy based on data lineage and relationship patterns.
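The linking and multi-hop steps can be sketched as: map terms in the question to assets through graph edges, then expand outward a bounded number of hops to collect context. The glossary edges, node names, and the `max_hops` default below are hypothetical; this is a simplified illustration of the idea, not BrightAgent's retrieval code.

```python
import re
from collections import deque

# Hypothetical glossary edges: business term -> assets that hold that data.
TERM_REFERS_TO = {"revenue": {"fact_orders", "sales_summary"}}

# Hypothetical relationships, followed in the direction listed.
EDGES = {
    "fact_orders": {"raw_orders", "org_acme"},
    "sales_summary": {"fact_orders", "revenue_dashboard"},
    "org_acme": {"ws_finance"},
}


def gather_context(question, max_hops=2):
    """Link question terms to assets, then collect nodes within max_hops of them.

    Returns {node: hop distance from the nearest linked asset}.
    """
    words = set(re.findall(r"[a-z_]+", question.lower()))
    seeds = set()
    for term, assets in TERM_REFERS_TO.items():
        if term in words:
            seeds |= assets
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first, so each node keeps its shortest distance
        node = queue.popleft()
        if depth[node] == max_hops:
            continue
        for nxt in EDGES.get(node, set()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


ctx = gather_context("What was revenue last quarter?")
```

Raising `max_hops` widens the context at the cost of pulling in less relevant nodes; with `max_hops=1`, the workspace node `ws_finance` drops out.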
How the Platform Uses Neo4j
Query Routing
When a user queries data through the webapp or BrightAgent:
- The GraphQL API queries Neo4j for the workspace’s Redshift API URL and account ID.
- Neo4j returns the metadata needed to route the query to the correct workspace infrastructure.
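The routing step amounts to a metadata lookup keyed by workspace. A minimal sketch follows, assuming a hypothetical in-memory record in place of the Workspace node and hypothetical field names (`aws_account_id`, `redshift_api_url`); the commented Cypher shows the kind of query involved, not the platform's real one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteTarget:
    aws_account_id: str
    redshift_api_url: str


# Stand-in for Workspace metadata held in Neo4j (hypothetical values).
# A lookup along the lines of this hypothetical Cypher would fetch it:
#   MATCH (w:Workspace {id: $workspaceId}) RETURN w.awsAccountId, w.redshiftApiUrl
WORKSPACES = {
    "ws_finance": {
        "aws_account_id": "123456789012",
        "redshift_api_url": "https://redshift-api.example.com/ws_finance",
    },
}


def resolve_route(workspace_id):
    """Return where a workspace's queries should be sent; fail loudly if unknown."""
    record = WORKSPACES.get(workspace_id)
    if record is None:
        raise LookupError(f"unknown workspace: {workspace_id}")
    return RouteTarget(record["aws_account_id"], record["redshift_api_url"])


target = resolve_route("ws_finance")
```

Raising on an unknown workspace, rather than falling back to a default, avoids sending a query to the wrong account.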
Data Catalog
The webapp’s data catalog view is powered entirely by Neo4j:
- Browse data assets with schema details, tags, and quality scores.
- See relationships between tables, organizations, and workspaces.
- Trace lineage from raw source to final output.
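Browsing the catalog comes down to filtering asset metadata and ordering it. The sketch below assumes hypothetical fields (a `columns` map for schema, a `tags` set, and a `quality_score` on an assumed 0 to 1 scale) and is illustrative only; in the platform these values would come from Neo4j rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    columns: dict                              # column name -> type
    tags: set = field(default_factory=set)
    quality_score: float = 0.0                 # hypothetical 0-1 scale
    organization: str = ""
    workspace: str = ""


def browse(entries, tag=None, min_quality=0.0):
    """Filter entries by tag and minimum quality, best-scoring first."""
    hits = [e for e in entries
            if (tag is None or tag in e.tags) and e.quality_score >= min_quality]
    return sorted(hits, key=lambda e: e.quality_score, reverse=True)


catalog = [
    CatalogEntry("fact_orders", {"order_id": "bigint", "amount": "numeric"},
                 {"finance"}, 0.92, "org_acme", "ws_finance"),
    CatalogEntry("sales_summary", {"month": "date", "revenue": "numeric"},
                 {"finance", "reporting"}, 0.85, "org_acme", "ws_finance"),
    CatalogEntry("hr_salaries", {"employee_id": "bigint"},
                 {"hr"}, 0.99, "org_hr", "ws_people"),
]
print([e.name for e in browse(catalog, tag="finance")])  # ['fact_orders', 'sales_summary']
```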
Agent Context
BrightAgent’s Retrieval Agent queries Neo4j to find relevant data assets for any user question, using a combination of keyword matching, vector similarity, and graph traversal.
Technical Details
- Deployment: EC2 instance in the shared platform account.
- Access: Cypher queries via GraphQL OGM (Object-Graph Mapping) at `api.{env}.brighthive.net/ogm`.
- Security: VPC isolation with encrypted connections. Access is controlled through platform API authentication.
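For illustration, a client call to the OGM endpoint might look like the request built below. Several details are assumptions: the `https` scheme, the `dev` environment name, bearer-token authentication, and the `workspaces` query with `awsAccountId` and `redshiftApiUrl` fields (modeled loosely on the query style the Neo4j GraphQL Library generates, but hypothetical here). The request is constructed, not sent.

```python
import json
import urllib.request


def build_ogm_request(env, token, workspace_id):
    """Build (but do not send) a GraphQL POST to the OGM endpoint."""
    url = f"https://api.{env}.brighthive.net/ogm"  # https scheme assumed
    # Hypothetical type and field names; the real schema may differ.
    query = """
    query Workspace($id: ID!) {
      workspaces(where: { id: $id }) { id awsAccountId redshiftApiUrl }
    }"""
    body = json.dumps({"query": query, "variables": {"id": workspace_id}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme assumed
        },
    )


req = build_ogm_request("dev", "TOKEN", "ws_finance")
# urllib.request.urlopen(req) would send it; omitted here.
```

Passing the workspace ID through GraphQL `variables` instead of string interpolation keeps the query text fixed and avoids injection through user-supplied IDs.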

