
Overview

Every Brighthive workspace gets a dedicated data warehouse deployed in its own AWS account. Redshift Serverless is the primary warehouse, with Snowflake available via Datapiary for organizations that need it.

Redshift Serverless

Auto-Scaling

Serverless compute scales automatically based on query workload — no capacity planning or cluster management required.

3 Availability Zones

Deployed across 3 AZs for high availability and fault tolerance within each workspace’s dedicated VPC.

Schema-per-Organization

Each organization’s data lives in its own Redshift schema, providing logical isolation within the shared workspace warehouse.
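
As a minimal sketch of schema-per-organization isolation, the Python below generates the SQL that could set up one schema per organization. The `org_` prefix, the `org_schema_name` and `schema_setup_sql` helpers, and the reader group name are assumptions made for illustration; the naming convention Brighthive actually uses is not documented here.

```python
import re


def org_schema_name(org_id: str) -> str:
    """Derive a schema name from an organization ID (hypothetical convention)."""
    # Lowercase the ID and replace anything outside [a-z0-9_] with an underscore.
    slug = re.sub(r"[^a-z0-9_]", "_", org_id.lower())
    return f"org_{slug}"


def schema_setup_sql(org_id: str, reader_group: str) -> list[str]:
    """Return statements that create an organization's schema and grant read access."""
    schema = org_schema_name(org_id)
    return [
        f'CREATE SCHEMA IF NOT EXISTS "{schema}";',
        f'GRANT USAGE ON SCHEMA "{schema}" TO GROUP "{reader_group}";',
        f'GRANT SELECT ON ALL TABLES IN SCHEMA "{schema}" TO GROUP "{reader_group}";',
    ]


if __name__ == "__main__":
    for statement in schema_setup_sql("Acme-Health", "acme_readers"):
        print(statement)
```

Granting access per schema keeps each organization's readers scoped to that organization's tables, even though all schemas live in the same warehouse.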

REST API Access

A Lambda-backed REST API lets the platform and BrightAgent execute queries programmatically against your warehouse.
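
The following sketch shows what a programmatic query request might look like. The endpoint path (`/queries`), the payload fields (`schema`, `sql`), the bearer-token header, and the example host are all hypothetical; the real API contract is not described on this page. The code only builds the request object and does not send it.

```python
import json
import urllib.request


def build_query_request(
    base_url: str, schema: str, sql: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a hypothetical query endpoint."""
    # Assumed payload shape: the target schema plus the SQL text to run.
    payload = {"schema": schema, "sql": sql}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/queries",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_query_request(
        "https://warehouse.example.invalid",  # placeholder host
        "org_acme_health",
        "SELECT COUNT(*) FROM visits",
        "example-token",
    )
    print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it once a real endpoint and credentials are substituted.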

Cross-Account Data Access

Redshift in your workspace account queries organization data stored in separate AWS accounts using cross-account IAM roles:
Workspace Redshift → Assumes OrgDataCatalogRole → Reads Org S3 + Glue Catalog
  • OrgDataCatalogRole is an IAM role in each organization’s account that trusts the workspace’s Redshift role.
  • Redshift Spectrum queries S3 data directly via external tables — no data copying required.
  • Glue Data Catalog provides schema metadata for these external tables.
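
The trust relationship described above can be illustrated with a standard IAM trust policy. This is a sketch only: the account IDs and the `WorkspaceRedshiftRole` name are placeholders, and the actual policy attached to OrgDataCatalogRole may include additional conditions that are not documented here.

```python
import json


def org_catalog_trust_policy(workspace_redshift_role_arn: str) -> dict:
    """Build a trust policy letting the workspace's Redshift role assume the org role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is the role in the workspace account (placeholder ARN).
                "Principal": {"AWS": workspace_redshift_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    policy = org_catalog_trust_policy(
        "arn:aws:iam::111111111111:role/WorkspaceRedshiftRole"  # hypothetical
    )
    print(json.dumps(policy, indent=2))
```

The permissions side (read access to the organization's S3 data and Glue catalog) would be granted by a separate policy attached to OrgDataCatalogRole.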

Redshift Spectrum

Redshift Spectrum enables querying data directly in S3 without loading it into Redshift tables. This is used for:
  • Querying large datasets that don’t need to be materialized in the warehouse.
  • Accessing the latest organization data immediately after upload (via Glue catalog references).
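
To make the Spectrum setup concrete, the sketch below generates a `CREATE EXTERNAL SCHEMA` statement that references a Glue database and a simple preview query against it. It assumes Redshift's role-chaining form of `IAM_ROLE` (comma-separated ARNs) is how the workspace role reaches OrgDataCatalogRole; the schema, database, table, and role names are placeholders, and how the Glue catalog is resolved across accounts is not specified on this page.

```python
def external_schema_sql(
    schema: str,
    glue_database: str,
    workspace_role_arn: str,
    org_role_arn: str,
    region: str,
) -> str:
    """Return a CREATE EXTERNAL SCHEMA statement backed by a Glue database."""
    # Chained roles: Redshift assumes the first role, which then assumes the second.
    chained_roles = f"{workspace_role_arn},{org_role_arn}"
    return (
        f'CREATE EXTERNAL SCHEMA IF NOT EXISTS "{schema}"\n'
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"REGION '{region}'\n"
        f"IAM_ROLE '{chained_roles}';"
    )


def spectrum_preview_sql(schema: str, table: str, limit: int = 10) -> str:
    """Return a query that reads rows straight from S3 through the external table."""
    return f'SELECT * FROM "{schema}"."{table}" LIMIT {int(limit)};'


if __name__ == "__main__":
    print(
        external_schema_sql(
            "org_acme_health_ext",  # hypothetical schema name
            "acme_health_lake",  # hypothetical Glue database
            "arn:aws:iam::111111111111:role/WorkspaceRedshiftRole",
            "arn:aws:iam::222222222222:role/OrgDataCatalogRole",
            "us-east-1",
        )
    )
    print(spectrum_preview_sql("org_acme_health_ext", "visits"))
```

Because the external tables only reference files in S3, new files become queryable as soon as the Glue catalog reflects them, with no load step into Redshift.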

Snowflake (via Datapiary)

For organizations that need Snowflake alongside Redshift, Brighthive provides Snowflake integration through Datapiary:
  • Organizations can sync data from their S3 data lake to Snowflake.
  • Snowflake assumes the OrgDataCatalogRole to access organization S3 data via cross-account IAM.
  • dbt Cloud transformations can run against Snowflake in addition to Redshift.
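
One common way for Snowflake to read S3 through an IAM role is a storage integration. The sketch below generates such a statement, assuming that mechanism applies here; how Datapiary actually configures the connection is not documented on this page, and the integration name, role ARN, and bucket path are placeholders.

```python
def storage_integration_sql(name: str, org_role_arn: str, s3_prefix: str) -> str:
    """Return a Snowflake CREATE STORAGE INTEGRATION statement for an S3 prefix."""
    return (
        f"CREATE STORAGE INTEGRATION {name}\n"
        "  TYPE = EXTERNAL_STAGE\n"
        "  STORAGE_PROVIDER = 'S3'\n"
        "  ENABLED = TRUE\n"
        # The role Snowflake assumes to read the organization's data.
        f"  STORAGE_AWS_ROLE_ARN = '{org_role_arn}'\n"
        # Restrict the integration to the organization's data lake prefix.
        f"  STORAGE_ALLOWED_LOCATIONS = ('{s3_prefix}');"
    )


if __name__ == "__main__":
    print(
        storage_integration_sql(
            "acme_health_s3",  # hypothetical integration name
            "arn:aws:iam::222222222222:role/OrgDataCatalogRole",
            "s3://acme-health-data-lake/",  # hypothetical bucket
        )
    )
```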

How Data Gets Into Your Warehouse

  1. Organization uploads data to their S3 data lake.
  2. Glue crawlers auto-detect the schema and update the Glue Data Catalog.
  3. Redshift Spectrum creates external tables pointing to the organization’s S3 data and Glue catalog.
  4. Metadata is synced to Neo4j, making the data discoverable by BrightAgent and the webapp.
  5. Optionally, data is synced to Snowflake for organizations that use it.
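
The steps above can be summarized as an ordered list in which only the Snowflake sync is conditional. The sketch below models that ordering; the `Step` type, the step names, and the `ingestion_steps` function are illustrative and do not correspond to any documented Brighthive API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    description: str


def ingestion_steps(uses_snowflake: bool) -> list[Step]:
    """Return the ingestion steps in order; the Snowflake sync is optional."""
    steps = [
        Step("upload", "Organization uploads data to its S3 data lake"),
        Step("crawl", "Glue crawlers detect the schema and update the Glue Data Catalog"),
        Step("external_tables", "Redshift Spectrum exposes the data as external tables"),
        Step("metadata_sync", "Metadata is synced to Neo4j for discovery"),
    ]
    if uses_snowflake:
        steps.append(Step("snowflake_sync", "Data is synced to Snowflake"))
    return steps


if __name__ == "__main__":
    for number, step in enumerate(ingestion_steps(uses_snowflake=True), start=1):
        print(f"{number}. {step.name}: {step.description}")
```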