# Warehouse

> Redshift Serverless provides auto-scaling, isolated data warehousing for every workspace, with Snowflake available for organizations that need it.

## Overview

Every Brighthive workspace gets a dedicated data warehouse deployed in its own AWS account. Redshift Serverless is the primary warehouse, with Snowflake available via Datapiary for organizations that need it.

## Redshift Serverless

<CardGroup cols={2}>
  <Card title="Auto-Scaling" icon="arrows-up-down">
    Serverless compute scales automatically based on query workload — no capacity planning or cluster management required.
  </Card>

  <Card title="3 Availability Zones" icon="server">
    Deployed across 3 AZs for high availability and fault tolerance within each workspace's dedicated VPC.
  </Card>

  <Card title="Schema-per-Organization" icon="table-columns">
    Each organization's data lives in its own Redshift schema, providing logical isolation within the shared workspace warehouse.
  </Card>

  <Card title="REST API Access" icon="code">
    Lambda-backed REST API enables the platform and BrightAgent to execute queries programmatically against your warehouse.
  </Card>
</CardGroup>
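
A minimal sketch of how a Lambda-backed query endpoint could run SQL against a Redshift Serverless workgroup. It assumes the endpoint uses the Redshift Data API (`execute_statement`, `describe_statement`, `get_statement_result` on the boto3 `redshift-data` client). The environment variable names `WORKGROUP_NAME` and `DATABASE_NAME` and the request body shape `{"sql": ...}` are hypothetical, and authentication and authorization are omitted.

```python
import json
import os
import time
from typing import Any, List, Optional, Protocol


class RedshiftDataClient(Protocol):
    """Subset of the boto3 "redshift-data" client used by this sketch."""

    def execute_statement(self, **kwargs: Any) -> dict: ...
    def describe_statement(self, **kwargs: Any) -> dict: ...
    def get_statement_result(self, **kwargs: Any) -> dict: ...


def run_query(client: RedshiftDataClient, workgroup: str, database: str, sql: str,
              poll_seconds: float = 0.5, timeout_seconds: float = 30.0) -> List[List[Any]]:
    """Run SQL on a Redshift Serverless workgroup and return rows as plain lists."""
    statement_id = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql)["Id"]

    # The Data API is asynchronous: poll until the statement reaches a final state.
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} {status}: {desc.get('Error', '')}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)

    if not desc.get("HasResultSet"):
        return []  # e.g. DDL or INSERT statements

    # Results are paginated; each field is a dict with one typed key,
    # e.g. {"stringValue": "a"}, {"longValue": 1}, or {"isNull": True}.
    rows: List[List[Any]] = []
    kwargs = {"Id": statement_id}
    while True:
        page = client.get_statement_result(**kwargs)
        for record in page.get("Records", []):
            rows.append([None if f.get("isNull") else next(iter(f.values())) for f in record])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token


def lambda_handler(event: dict, context: Any,
                   client: Optional[RedshiftDataClient] = None) -> dict:
    """API Gateway proxy handler; `client` is injectable for testing."""
    if client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        client = boto3.client("redshift-data")
    body = json.loads(event.get("body") or "{}")
    rows = run_query(client, os.environ["WORKGROUP_NAME"],
                     os.environ["DATABASE_NAME"], body["sql"])
    return {"statusCode": 200, "body": json.dumps({"rows": rows})}
```

Injecting the client keeps `run_query` testable with a stub, while the deployed Lambda falls back to a real boto3 client.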

### Cross-Account Data Access

Redshift in your workspace account queries organization data stored in separate AWS accounts using cross-account IAM roles:

```
Workspace Redshift → Assumes OrgDataCatalogRole → Reads Org S3 + Glue Catalog
```

* **OrgDataCatalogRole** is an IAM role in each organization's account that trusts the workspace's Redshift role.
* Redshift Spectrum queries S3 data directly via external tables — no data copying required.
* Glue Data Catalog provides schema metadata for these external tables.
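
As a hedged sketch of the IAM side, assuming standard STS role assumption: the trust policy is what "trusts the workspace's Redshift role" typically looks like, and the permissions policy grants read access to the organization's bucket and Glue catalog. The account IDs, bucket name, and exact set of Glue actions are illustrative assumptions, not Brighthive's actual policies.

```python
from typing import Any, Dict


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN (account IDs in examples are placeholders)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def org_catalog_trust_policy(workspace_redshift_role_arn: str) -> Dict[str, Any]:
    """Trust policy on OrgDataCatalogRole: only the workspace's Redshift role may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": workspace_redshift_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def org_catalog_permissions(bucket: str) -> Dict[str, Any]:
    """Read-only access to the org's S3 data and Glue Data Catalog metadata."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables",
                           "glue:GetPartition", "glue:GetPartitions"],
                "Resource": "*",  # scope to specific catalog ARNs in practice
            },
        ],
    }
```

Keeping the trust policy and the permissions policy separate mirrors how IAM itself splits "who may assume the role" from "what the role may do".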

### Redshift Spectrum

Redshift Spectrum enables querying data directly in S3 without loading it into Redshift tables. This is used for:

* Querying large datasets that don't need to be materialized in the warehouse.
* Accessing the latest organization data immediately after upload (via Glue catalog references).
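
A minimal sketch of the DDL that could expose an organization's Glue database to Redshift through Spectrum. It assumes the workspace role reaches OrgDataCatalogRole through Redshift IAM role chaining, where `IAM_ROLE` takes a comma-separated list of role ARNs (no spaces) assumed left to right. The schema, database, and table names are hypothetical; the helpers only build SQL strings and validate identifiers before interpolating them.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")
# Commas are excluded from role names because they separate chained roles.
_ROLE_ARN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=.@/-]+")


def _check_ident(name: str) -> None:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")


def external_schema_ddl(local_schema: str, glue_database: str, *role_arns: str) -> str:
    """CREATE EXTERNAL SCHEMA over a Glue database, chaining the given IAM roles."""
    if not role_arns:
        raise ValueError("at least one IAM role ARN is required")
    _check_ident(local_schema)
    _check_ident(glue_database)
    for arn in role_arns:
        if not _ROLE_ARN.fullmatch(arn):
            raise ValueError(f"not an IAM role ARN: {arn!r}")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{','.join(role_arns)}';"
    )


def sample_query(local_schema: str, table: str, limit: int = 10) -> str:
    """External tables are queried like ordinary tables: schema.table."""
    _check_ident(local_schema)
    _check_ident(table)
    return f"SELECT * FROM {local_schema}.{table} LIMIT {int(limit)};"
```

Because the external schema only references the Glue database, tables the crawler adds later become queryable without re-running this DDL.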

## Snowflake (via Datapiary)

For organizations that need Snowflake alongside Redshift, Brighthive provides Snowflake integration through Datapiary:

* Organizations can sync data from their S3 data lake to Snowflake.
* Snowflake assumes the OrgDataCatalogRole to access organization S3 data via cross-account IAM.
* dbt Cloud transformations can run against Snowflake in addition to Redshift.
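
In Snowflake, reading S3 through an IAM role is typically configured with a storage integration and an external stage. The sketch below builds that DDL plus the extra trust-policy statement Snowflake requires, assuming OrgDataCatalogRole is the role referenced. The integration, stage, bucket, and prefix names are hypothetical. Snowflake's IAM user ARN and external ID are read from `DESC INTEGRATION <name>` (`STORAGE_AWS_IAM_USER_ARN`, `STORAGE_AWS_EXTERNAL_ID`) after the integration is created.

```python
from typing import Any, Dict

# Inputs are interpolated without escaping; they are assumed to be trusted config values.


def storage_integration_sql(name: str, role_arn: str, s3_url: str) -> str:
    """Snowflake storage integration that assumes the given AWS role for S3 access."""
    return (
        f"CREATE STORAGE INTEGRATION {name}\n"
        "  TYPE = EXTERNAL_STAGE\n"
        "  STORAGE_PROVIDER = 'S3'\n"
        "  ENABLED = TRUE\n"
        f"  STORAGE_AWS_ROLE_ARN = '{role_arn}'\n"
        f"  STORAGE_ALLOWED_LOCATIONS = ('{s3_url}');"
    )


def external_stage_sql(stage: str, integration: str, s3_url: str) -> str:
    """External stage bound to the integration; COPY INTO can then load from it."""
    return (
        f"CREATE STAGE {stage}\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  URL = '{s3_url}';"
    )


def snowflake_trust_statement(iam_user_arn: str, external_id: str) -> Dict[str, Any]:
    """Statement to add to the role's trust policy so Snowflake's IAM user can assume it."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": iam_user_arn},
        "Action": "sts:AssumeRole",
        # The external ID ties the grant to this specific Snowflake integration.
        "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
    }
```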

## How Data Gets Into Your Warehouse

1. **Organization uploads data** to their S3 data lake.
2. **Glue crawlers** auto-detect the schema and update the Glue Data Catalog.
3. **Redshift Spectrum external tables** are defined over the organization's S3 data, using the schemas from the Glue Data Catalog.
4. **Metadata is synced** to Neo4j, making the data discoverable by BrightAgent and the webapp.
5. **Optionally**, data is synced to Snowflake for organizations that use it.
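
Steps 2 and 4 can be sketched as follows, assuming a boto3 Glue client for the crawler and a Neo4j graph for metadata. `crawler_params` returns keyword arguments for Glue's `create_crawler`, and `table_metadata` reshapes a Glue `get_table` response into parameters for a Cypher upsert. The crawler and database naming, the `Table`/`Column` node labels, and the `HAS_COLUMN` relationship are hypothetical; the Neo4j model Brighthive actually uses is not described here.

```python
from typing import Any, Dict


def crawler_params(org: str, bucket: str, prefix: str, role_arn: str) -> Dict[str, Any]:
    """Keyword arguments for boto3 glue.create_crawler(**params)."""
    return {
        "Name": f"{org}-lake-crawler",   # hypothetical naming scheme
        "Role": role_arn,
        "DatabaseName": f"{org}_lake",   # Glue database the tables land in
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/{prefix}"}]},
        # Apply schema changes in place; log, rather than delete, tables whose data disappears.
        "SchemaChangePolicy": {"UpdateBehavior": "UPDATE_IN_DATABASE",
                               "DeleteBehavior": "LOG"},
    }


def table_metadata(get_table_response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten glue.get_table(...) output into parameters for UPSERT_TABLE."""
    table = get_table_response["Table"]
    sd = table.get("StorageDescriptor", {})
    # Glue stores partition keys separately from regular columns.
    cols = sd.get("Columns", []) + table.get("PartitionKeys", [])
    return {
        "database": table["DatabaseName"],
        "name": table["Name"],
        "location": sd.get("Location"),
        "columns": [{"name": c["Name"], "type": c.get("Type")} for c in cols],
    }


# MERGE makes the sync idempotent: re-running it after a re-crawl updates nodes
# instead of creating duplicates.
UPSERT_TABLE = """
MERGE (t:Table {database: $database, name: $name})
SET t.location = $location
WITH t
UNWIND $columns AS col
MERGE (c:Column {database: $database, table: $name, name: col.name})
SET c.type = col.type
MERGE (t)-[:HAS_COLUMN]->(c)
"""
```

With the official `neo4j` Python driver, `UPSERT_TABLE` and the dict returned by `table_metadata` would be sent together as a query and its parameters.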
