
High-Level Architecture

Brighthive uses a three-tier architecture that pairs a shared platform layer with dedicated, isolated AWS accounts for every customer:

  • Tier 1: Shared Platform. The webapp, GraphQL API, BrightAgent AI system, and Neo4j metadata graph run on shared infrastructure.
  • Tier 2: Workspace Accounts. Each customer workspace gets a dedicated AWS account with Redshift Serverless (3-AZ), Snowflake (via Datapiary), and dbt Cloud for transformations.
  • Tier 3: Organization Accounts. Each data-providing organization gets a dedicated AWS account with S3 storage, Glue Data Catalog, and cross-account IAM roles for secure data sharing.

Key Differentiators

  • Dedicated Infrastructure: Every customer gets isolated AWS accounts — not shared tenancy. Your data warehouse, storage, and compute are yours alone.
  • AI-Native: BrightAgent’s multi-agent system (LangGraph) is built into the platform, not bolted on. Ask questions in plain English and get analysis, visualizations, and dbt models.
  • Graph-Powered Metadata: Neo4j serves as the single source of truth for all metadata, lineage, and relationships across your entire data estate.
  • Cross-Account Security: Organization data is shared to workspaces through IAM-based cross-account roles — no data copying, no shared credentials.
  • Automated Schema Discovery: Glue crawlers auto-detect schemas when data lands in S3, register metadata in Neo4j, and make it immediately queryable.
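
The cross-account sharing described above rests on an IAM trust policy: a role in the organization account names a workspace role as a principal allowed to call `sts:AssumeRole`. The sketch below builds such a policy document. The account ID and role name are placeholders invented for illustration, and the actual Brighthive roles and any extra conditions they carry are not documented here.

```python
import json


def build_trust_policy(workspace_account_id: str, workspace_role_name: str) -> dict:
    """Build a trust policy that lets one workspace role assume an org-account role.

    Both arguments are hypothetical; real account IDs and role names will differ.
    """
    if not (workspace_account_id.isdigit() and len(workspace_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    principal_arn = f"arn:aws:iam::{workspace_account_id}:role/{workspace_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("111111111111", "workspace-redshift-role")
print(json.dumps(policy, indent=2))
```

Because access is granted to a role rather than through copied keys, revoking a workspace's access amounts to removing its principal from the policy.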

Platform Components

Redshift Serverless

Auto-scaling data warehouse across 3 availability zones with schema-per-organization isolation
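
As a rough illustration of schema-per-organization isolation, the sketch below derives one schema name per organization and emits the matching `CREATE SCHEMA` statement. The `org_` prefix and the slug normalization are assumptions made for this example, not Brighthive's documented naming scheme.

```python
import re


def org_schema_name(org_slug: str) -> str:
    """Map an organization slug to a schema name (hypothetical convention)."""
    cleaned = re.sub(r"[^a-z0-9_]", "_", org_slug.strip().lower())
    if not cleaned:
        raise ValueError("organization slug must not be empty")
    return f"org_{cleaned}"


def create_schema_sql(org_slug: str) -> str:
    """Return idempotent DDL that creates the organization's schema."""
    return f'CREATE SCHEMA IF NOT EXISTS "{org_schema_name(org_slug)}";'


print(create_schema_sql("Acme Health"))
```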

Neo4j Graph Database

Single source of truth for all metadata, lineage, data asset relationships, and GraphRAG capabilities

Glue Data Catalog

Automatic schema discovery via crawlers when data lands in S3 — no manual cataloging required

LangGraph AI Agents

Multi-agent orchestration system with specialized agents for retrieval, analysis, visualization, engineering, and governance

React + Apollo GraphQL

Modern webapp with real-time collaboration via Stream.io, data catalog browsing, and BrightAgent interface

dbt Cloud + Datapiary

Agent-generated dbt models submitted as GitHub PRs, with transformation lineage tracked in Neo4j

Key Data Flows

User Query

User → WebApp → GraphQL API → Neo4j (metadata lookup) → Workspace Redshift → Org S3/Glue (cross-account) → Results
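
The ordering in this flow (resolve metadata first, then query the warehouse) can be sketched as a small function with injected dependencies. Everything here is an assumption for illustration: `TableRef`, `answer_query`, and the in-memory stand-ins for Neo4j and Redshift are hypothetical names, and the cross-account hop to S3/Glue happens inside the warehouse, so it is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TableRef:
    schema: str
    table: str


def answer_query(
    dataset: str,
    limit: int,
    lookup: Callable[[str], Optional[TableRef]],
    execute: Callable[[str], List[dict]],
) -> List[dict]:
    """Resolve a dataset name via the metadata store, then run SQL against it."""
    ref = lookup(dataset)  # stands in for the Neo4j metadata lookup
    if ref is None:
        raise LookupError(f"no metadata registered for dataset {dataset!r}")
    sql = f'SELECT * FROM "{ref.schema}"."{ref.table}" LIMIT {int(limit)}'
    return execute(sql)  # stands in for the workspace Redshift call


# In-memory stand-ins so the sketch runs without any services.
catalog = {"enrollment": TableRef("org_acme", "enrollment")}
executed: List[str] = []


def fake_execute(sql: str) -> List[dict]:
    executed.append(sql)
    return [{"student_id": 1}]


rows = answer_query("enrollment", 10, catalog.get, fake_execute)
print(executed[0], rows)
```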

Data Upload

File → Org S3 → EventBridge → Glue Crawler → Schema Discovery → Neo4j Metadata Sync → Available for Querying
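
The first two hops of this flow can be sketched as an EventBridge event pattern that matches S3 "Object Created" events, plus a handler that starts a Glue crawler. The bucket name, crawler name, and `FakeGlue` stub are placeholders for illustration. In a real deployment the client would presumably be `boto3.client("glue")`, whose `start_crawler(Name=...)` call the stub imitates; how Brighthive actually wires these pieces is not specified here.

```python
def s3_object_created_pattern(bucket: str) -> dict:
    """EventBridge pattern matching new objects in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }


def handle_upload(event: dict, glue_client, crawler_name: str) -> str:
    """Start the crawler for an uploaded object and return the object key."""
    key = event["detail"]["object"]["key"]
    glue_client.start_crawler(Name=crawler_name)
    return key


class FakeGlue:
    """Test stub that records which crawlers were started."""

    def __init__(self) -> None:
        self.started = []

    def start_crawler(self, Name: str) -> dict:
        self.started.append(Name)
        return {}


# Hypothetical bucket, key, and crawler names.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "org-acme-landing"},
        "object": {"key": "uploads/enrollment.csv"},
    },
}
glue = FakeGlue()
uploaded_key = handle_upload(sample_event, glue, "org-acme-crawler")
print(uploaded_key, glue.started)
```

A production handler would also need to deal with a crawler that is already running when a second file arrives.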

AI Assistant

User Question → BrightAgent → Specialized Agents (Retrieval, Analyst, Visualization) → Neo4j + Redshift → Response
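
To make the hand-off between agents concrete, here is a plain-Python routing sketch. It deliberately does not use the LangGraph API, and the keyword rules are invented for illustration; BrightAgent's actual routing logic is not described in this document.

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]


def route(question: str) -> List[str]:
    """Pick an ordered list of agents for a question (toy keyword rules)."""
    q = question.lower()
    plan = ["retrieval"]  # always gather context first
    if any(word in q for word in ("chart", "plot", "graph")):
        plan += ["analyst", "visualization"]
    elif any(word in q for word in ("average", "trend", "compare", "how many")):
        plan.append("analyst")
    return plan


def run(question: str, agents: Dict[str, Agent]) -> str:
    """Pass the running context through each selected agent in turn."""
    context = question
    for name in route(question):
        context = agents[name](context)
    return context


# Stub agents that only tag the context, so the call order is visible.
stub_agents: Dict[str, Agent] = {
    name: (lambda ctx, n=name: f"{ctx} -> {n}")
    for name in ("retrieval", "analyst", "visualization")
}
print(run("Plot monthly enrollment", stub_agents))
```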

Provisioning

Admin Request → brighthive-admin Step Functions → Create AWS Account → Deploy CDK → Neo4j + DynamoDB → Ready
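
A linear workflow like this maps naturally onto an Amazon States Language definition with chained `Task` states. The sketch below builds one as a Python dict and walks it to confirm the step order. The state names and Lambda ARNs are placeholders, and the real brighthive-admin state machine may include retries, error handling, or additional steps.

```python
import json
from typing import List


def provisioning_state_machine(
    create_account_arn: str, deploy_cdk_arn: str, register_arn: str
) -> dict:
    """Three chained Task states mirroring the provisioning flow."""
    return {
        "Comment": "Illustrative provisioning workflow (not the real definition)",
        "StartAt": "CreateAwsAccount",
        "States": {
            "CreateAwsAccount": {
                "Type": "Task",
                "Resource": create_account_arn,
                "Next": "DeployCdk",
            },
            "DeployCdk": {
                "Type": "Task",
                "Resource": deploy_cdk_arn,
                "Next": "RegisterMetadata",
            },
            "RegisterMetadata": {
                "Type": "Task",
                "Resource": register_arn,
                "End": True,
            },
        },
    }


def execution_order(definition: dict) -> List[str]:
    """Follow Next pointers from StartAt until a state marked End."""
    order = []
    name = definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


# Placeholder ARNs for illustration only.
definition = provisioning_state_machine(
    "arn:aws:lambda:us-east-1:111111111111:function:create-account",
    "arn:aws:lambda:us-east-1:111111111111:function:deploy-cdk",
    "arn:aws:lambda:us-east-1:111111111111:function:register-metadata",
)
print(json.dumps(definition, indent=2))
print(execution_order(definition))
```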