Overview

Brighthive makes it easy to get your data into the platform. Upload files to your organization’s dedicated S3 data lake, and the platform automatically detects the schema, catalogs the data in Neo4j, and makes it available for querying.

How It Works

  1. Upload your file — Through the webapp or directly to your organization’s S3 bucket.
  2. S3 stores the file — Each organization gets dedicated S3 buckets (brighthive-raw/, brighthive-staged/, brighthive-shared/) in its own AWS account.
  3. Glue crawlers detect the schema — Automatically infer column names, data types, partitions, and file format.
  4. Metadata registered in Neo4j — Schema, row counts, location, and relationships are stored in the knowledge graph.
  5. Ready for querying — Data is immediately available via Redshift Spectrum, BrightAgent, and the webapp data catalog.
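Step 3 is handled by AWS Glue crawlers, whose built-in classifiers are far more thorough than anything shown here. To illustrate the idea only, the sketch below infers a schema from a sample of CSV rows by picking the narrowest type that fits every value in each column. The function names and sampling size are hypothetical, and the type names (`bigint`, `double`, `boolean`, `string`) simply follow the Hive-style names used in the Glue Data Catalog.

```python
import csv
import io


def _infer_type(values):
    """Return the narrowest type that parses every non-empty value in a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try integer first, then floating point.
    for type_name, cast in (("bigint", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    return "string"


def infer_csv_schema(text, sample_rows=100):
    """Hypothetical helper: return ([(column, type), ...], rows_sampled)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Only look at the first `sample_rows` data rows, as a crawler samples data.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows)) if rows else [() for _ in header]
    schema = [(name, _infer_type(list(col))) for name, col in zip(header, columns)]
    return schema, len(rows)
```

For example, `infer_csv_schema("id,price,active,name\n1,9.99,true,Widget\n2,12.5,false,Gadget\n")` yields `bigint`, `double`, `boolean`, and `string` columns from two sampled rows. Sampling keeps inference cheap on large files, at the cost of occasionally mis-typing a column whose unusual values appear only later.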

Supported Formats

  • Tabular: CSV, Parquet, JSON, Avro, ORC, Excel
  • Documents: PDF
  • Media: Images, Videos
Tabular files are automatically schema-detected and made queryable. Documents and media are stored and cataloged for reference.
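The split between queryable tabular files and cataloged-only documents and media can be pictured as a simple routing step keyed on file extension. This is a hypothetical sketch: the extension lists (for example `.xlsx`/`.xls` for Excel, or `.png` and `.mp4` for media) are assumptions, and the platform may detect formats differently.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-category map; the platform's real list may differ.
CATEGORIES = {
    "tabular": {".csv", ".parquet", ".json", ".avro", ".orc", ".xlsx", ".xls"},
    "document": {".pdf"},
    "media": {".png", ".jpg", ".jpeg", ".gif", ".mp4", ".mov"},
}


def classify(key):
    """Return the category for an S3 object key, based on its final extension."""
    suffix = PurePosixPath(key).suffix.lower()
    for category, suffixes in CATEGORIES.items():
        if suffix in suffixes:
            return category
    return "unsupported"


def is_queryable(key):
    """Only tabular files get schema detection and become queryable."""
    return classify(key) == "tabular"
```

With this mapping, `classify("sales/2024/orders.CSV")` returns `"tabular"` and `classify("contracts/msa.pdf")` returns `"document"`.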

Data Catalog

Every uploaded file is represented as a node in the Neo4j knowledge graph, which captures:
  • Schema — Column names, data types, and partition structure.
  • Relationships — Which organization owns it, which workspaces can access it.
  • Lineage — How the data was uploaded and any transformations applied to it.
  • Metadata — File size, row count, last updated timestamp, and custom tags.
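One way to picture a catalog entry is to separate what lives on the node itself (schema, lineage, metadata) from what is modeled as edges (ownership and workspace access). The dataclass below is a hypothetical in-memory model, not Brighthive's actual graph schema; the relationship names `OWNS` and `CAN_ACCESS` are invented for illustration. It flattens the schema into parallel lists because Neo4j property values must be primitives or lists of primitives, not nested maps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    """Hypothetical view of one catalog node and its edges."""
    path: str
    org: str
    workspaces: list[str]
    columns: list[tuple[str, str]]                       # [(name, type), ...]
    partitions: list[str] = field(default_factory=list)
    lineage: list[str] = field(default_factory=list)     # e.g. ["uploaded via webapp"]
    size_bytes: int = 0
    row_count: int = 0
    last_updated: str = ""                               # ISO 8601 timestamp
    tags: list[str] = field(default_factory=list)

    def node_properties(self):
        """Flatten to primitives and lists of primitives, as Neo4j properties require."""
        return {
            "path": self.path,
            "column_names": [name for name, _ in self.columns],
            "column_types": [typ for _, typ in self.columns],
            "partitions": list(self.partitions),
            "lineage": list(self.lineage),
            "size_bytes": self.size_bytes,
            "row_count": self.row_count,
            "last_updated": self.last_updated,
            "tags": list(self.tags),
        }

    def relationships(self):
        """Ownership and access are edges (source, type, target), not properties."""
        edges = [(self.org, "OWNS", self.path)]
        edges += [(ws, "CAN_ACCESS", self.path) for ws in self.workspaces]
        return edges
```

Keeping ownership and access as edges lets a single query answer questions such as "which workspaces can read this file?" without duplicating that information on every node.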
The data catalog is accessible through both the webapp UI and BrightAgent. In BrightAgent, ask “What data do we have about customers?” and the Retrieval Agent searches Neo4j to find matching assets.
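How the Retrieval Agent actually queries Neo4j is not described here. As a rough stand-in, the sketch below ranks catalog entries by how many keywords from a question appear in their path, column names, or tags. It is a naive keyword match, not the agent's method; the function name, stop-word list, and entry format are all hypothetical.

```python
import re

# Hypothetical stop words for questions like "What data do we have about ...?"
STOP_WORDS = {"what", "data", "do", "we", "have", "about", "the", "a", "is", "on"}


def _words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def search_catalog(question, assets):
    """Return asset paths ordered by keyword hits (desc), then path (asc)."""
    terms = {w for w in _words(question) if w not in STOP_WORDS}
    scored = []
    for asset in assets:
        words = set(_words(" ".join([asset["path"], *asset["columns"], *asset["tags"]])))
        # Match a term as-is, or with one trailing "s" removed (crude plural handling).
        hits = sum(
            1 for t in terms
            if t in words or (t.endswith("s") and t[:-1] in words)
        )
        if hits:
            scored.append((hits, asset["path"]))
    return [path for _, path in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Against a small catalog, the customers question matches both a CRM file named `customers.csv` and an invoices table with a `customer_id` column, while an unrelated page-view table is skipped. A graph-backed search can go further by following relationships, for example from a matching column to the workspaces that can access its table.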

Storage Architecture

Each organization gets isolated S3 storage in its dedicated AWS account:
  • brighthive-raw/ — Original uploaded files.
  • brighthive-staged/ — Processed and cleaned data.
  • brighthive-shared/ — Data shared with workspace services.
Workspace Redshift reaches this data across accounts through the OrgDataCatalogRole IAM role, so no credentials are shared between accounts.
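The sketch below shows the general shape of this setup. It makes several assumptions. S3 bucket names are globally unique, so the three zones are treated as prefixes inside a hypothetical per-organization bucket (the real layout may differ). The trust policy is a standard cross-account IAM pattern that lets a workspace account assume `OrgDataCatalogRole` in the organization's account. The account IDs are placeholders, and the actual policy Brighthive uses is not documented here.

```python
import json

ZONES = ("brighthive-raw", "brighthive-staged", "brighthive-shared")
ROLE_NAME = "OrgDataCatalogRole"


def zone_uri(bucket, zone, key):
    """Build an S3 URI for a key in one storage zone (zone-as-prefix is an assumption)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"s3://{bucket}/{zone}/{key.lstrip('/')}"


def role_arn(org_account_id):
    """ARN of the org-side role that workspace Redshift uses for cross-account access."""
    return f"arn:aws:iam::{org_account_id}:role/{ROLE_NAME}"


def trust_policy(workspace_account_id):
    """Standard IAM trust policy allowing the workspace account to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{workspace_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(zone_uri("example-org-data-lake", "brighthive-raw", "sales/orders.csv"))
    print(role_arn("444455556666"))
    print(json.dumps(trust_policy("111122223333"), indent=2))
```

Because the workspace side assumes a role instead of holding keys for the organization's account, access is granted through temporary STS credentials and can be revoked by editing the trust policy. In practice the principal is often narrowed from the account root to a specific role ARN.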