Neo4j is the backbone of the Brighthive platform — every data asset, user, workspace, organization, and transformation is represented as a connected graph node.

The Single Source of Truth

Neo4j stores and connects all platform metadata. When BrightAgent searches for data, when the webapp displays your catalog, or when lineage is traced from source to report — it all comes from Neo4j.

What Neo4j Tracks

  • Users — Platform users, their roles, and workspace memberships.
  • Workspaces — Customer workspaces with their AWS account IDs, Redshift API URLs, and configurations.
  • Organizations — Data-providing organizations, their AWS accounts, and S3 bucket locations.
  • Data Assets — Every table, file, and dataset with schema, row counts, and location metadata.
  • Lineage — How data flows from source through transformation to consumption.
  • Relationships — Which organizations belong to which workspaces, which users can access what.

Why Neo4j?

GraphRAG Foundation

Powers Graph Retrieval-Augmented Generation with native vector search and graph traversal — giving BrightAgent rich context for every query.

Fast Relationship Queries

Cypher query language enables fast, intuitive graph pattern matching — find connections across your data estate in milliseconds.
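
As a rough illustration of Cypher pattern matching, the sketch below keeps a query as a Python string and pairs it with its parameters. The labels (`User`, `Workspace`, `Organization`), relationship types (`MEMBER_OF`, `BELONGS_TO`), and property names are assumptions made for the example, not the platform's actual schema.

```python
# Hypothetical schema: labels, relationship types, and properties are assumed.
ORGS_FOR_USER = """
MATCH (u:User {email: $email})-[:MEMBER_OF]->(w:Workspace)<-[:BELONGS_TO]-(o:Organization)
RETURN w.name AS workspace, collect(o.name) AS organizations
"""

def orgs_for_user(email: str) -> tuple[str, dict]:
    """Return the query text and its parameters, ready for a Neo4j driver call."""
    # Values travel as parameters rather than being formatted into the string,
    # which avoids Cypher injection and lets the server reuse the query plan.
    return ORGS_FOR_USER.strip(), {"email": email}

query, params = orgs_for_user("analyst@example.com")
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`.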

Lineage Tracking

Native graph structure is ideal for tracking data lineage — from raw source through transformations to final reports.
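
To make the idea of lineage traversal concrete, here is a minimal in-memory sketch using breadth-first search. The asset names and edges are invented for illustration; in Neo4j the same walk would typically be expressed as a variable-length path pattern rather than Python code.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "raw_orders_file": ["staging_orders"],
    "staging_orders": ["fact_orders"],
    "fact_orders": ["sales_summary"],
    "sales_summary": ["revenue_report"],
}

def downstream(start, edges):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:  # guard against cycles and repeated visits
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

affected = downstream("staging_orders", LINEAGE)
```

Reversing the edge direction gives the upstream view, which answers "where did this report's data come from?".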

Vector + Graph Search

Combines vector similarity search with graph relationships for semantic data discovery.

GraphRAG

GraphRAG enhances traditional RAG by leveraging knowledge graphs to provide richer context and more accurate responses. BrightAgent uses GraphRAG to:
  • Find semantically similar data assets while considering graph relationships.
  • Traverse multiple relationship levels to gather comprehensive context for complex questions.
  • Link mentions across queries through graph connections — e.g., understanding that “revenue” might refer to data in fact_orders or sales_summary.
  • Adapt retrieval strategy based on data lineage and relationship patterns.
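
The first two points above can be sketched as a two-stage retrieval: pick seed assets by vector similarity, then widen the context by following graph relationships. This is a toy, in-memory illustration under stated assumptions; the embeddings, asset names, and edges are made up, and it is not a description of how BrightAgent is implemented.

```python
import math

# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
EMBEDDINGS = {
    "fact_orders": [0.9, 0.1, 0.0],
    "sales_summary": [0.8, 0.2, 0.1],
    "user_profiles": [0.0, 0.1, 0.9],
}
# Hypothetical relationships between assets.
NEIGHBORS = {
    "fact_orders": ["sales_summary", "dim_customers"],
    "sales_summary": ["fact_orders"],
    "user_profiles": [],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, top_k=1, hops=1):
    """Pick the top_k most similar assets, then add neighbors up to `hops` away."""
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]), reverse=True)
    context = ranked[:top_k]
    frontier = list(context)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for nb in NEIGHBORS.get(node, []):
                if nb not in context:
                    context.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return context

context = retrieve([1.0, 0.0, 0.0])
```

Here the graph step surfaces `dim_customers`, an asset with no embedding at all, which is the kind of context plain vector search would miss.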

How the Platform Uses Neo4j

Query Routing

When a user queries data through the webapp or BrightAgent:
  1. The GraphQL API queries Neo4j for the workspace’s Redshift API URL and AWS account ID.
  2. Neo4j returns the metadata needed to route the query to the correct workspace infrastructure.
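
The two steps above could look roughly like the following sketch. The Cypher text, the property names (`redshiftApiUrl`, `awsAccountId`), the `Route` type, and the `fake_run_query` stand-in are all hypothetical; a real deployment would go through the GraphQL API rather than a direct lookup function.

```python
from dataclasses import dataclass

# Hypothetical Cypher lookup; the label and property names are assumptions.
ROUTING_QUERY = """
MATCH (w:Workspace {id: $workspace_id})
RETURN w.redshiftApiUrl AS redshift_api_url, w.awsAccountId AS aws_account_id
"""

@dataclass(frozen=True)
class Route:
    redshift_api_url: str
    aws_account_id: str

def resolve_route(workspace_id, run_query):
    """Step 1: ask the graph for routing metadata. Step 2: return it as a Route."""
    rows = run_query(ROUTING_QUERY, {"workspace_id": workspace_id})
    if not rows:
        raise LookupError(f"No workspace found for id {workspace_id!r}")
    row = rows[0]
    return Route(row["redshift_api_url"], row["aws_account_id"])

def fake_run_query(query, params):
    """Stand-in for a real Neo4j call so the sketch runs without a database."""
    data = {
        "ws-1": {
            "redshift_api_url": "https://redshift.example.invalid",
            "aws_account_id": "000000000000",
        }
    }
    hit = data.get(params["workspace_id"])
    return [hit] if hit else []

route = resolve_route("ws-1", fake_run_query)
```

Passing the query runner in as an argument keeps the routing logic testable without a live graph.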

Data Catalog

The webapp’s data catalog view is powered entirely by Neo4j:
  • Browse data assets with schema details, tags, and quality scores.
  • See relationships between tables, organizations, and workspaces.
  • Trace lineage from raw source to final output.

Agent Context

BrightAgent’s Retrieval Agent queries Neo4j to find relevant data assets for any user question — using a combination of keyword matching, vector similarity, and graph traversal.
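
One simple way to combine the three signals is a weighted sum, sketched below. The source does not say how the Retrieval Agent blends them, so the weights, the scoring function, and the sample scores are purely illustrative assumptions.

```python
def hybrid_score(keyword, vector, graph, weights=(0.3, 0.5, 0.2)):
    """Blend three signals, each expected in [0, 1], into one ranking score."""
    wk, wv, wg = weights  # assumed weights, not taken from the platform
    return wk * keyword + wv * vector + wg * graph

# Hypothetical per-asset signals: (keyword match, vector similarity, graph proximity).
CANDIDATES = {
    "fact_orders": (1.0, 0.8, 0.5),
    "sales_summary": (0.0, 0.9, 1.0),
    "user_profiles": (0.0, 0.1, 0.0),
}

def rank(candidates):
    """Return asset names ordered from highest to lowest blended score."""
    return sorted(candidates, key=lambda name: hybrid_score(*candidates[name]), reverse=True)

ranking = rank(CANDIDATES)
```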

Technical Details

  • Deployment: EC2 instance in the shared platform account.
  • Access: Cypher queries via GraphQL OGM (Object-Graph Mapping) at api.{env}.brighthive.net/ogm.
  • Security: VPC isolation with encrypted connections. Access controlled via platform API authentication.
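
As a sketch of what a call to the OGM endpoint might look like, the snippet below builds a request without sending it. The `https` scheme, bearer-token header, environment name, and GraphQL field names are assumptions; only the `api.{env}.brighthive.net/ogm` host and path pattern comes from the details above.

```python
import json
import urllib.request

def build_ogm_request(env: str, token: str, graphql_query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the OGM endpoint."""
    url = f"https://api.{env}.brighthive.net/ogm"  # https scheme is assumed
    body = json.dumps({"query": graphql_query}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # bearer-token auth is an assumption
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Hypothetical environment name, token, and GraphQL field names.
request = build_ogm_request("staging", "example-token", "{ workspaces { name } }")
```

Sending it would be a matter of `urllib.request.urlopen(request)` from a network location that the VPC rules allow.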