Technical Overview
A high-level overview of the tech stack powering our platform.
How the Stack Works Together
Our platform leverages a modular, cloud-native architecture designed to help data-driven organizations ingest, govern, transform, and explore their data through a seamless UI and a powerful conversational interface.
Unified Orchestration via GraphQL API
At the heart of the system is a GraphQL API, deployed serverlessly on AWS Lambda, acting as the central orchestrator.
- Exposes a unified API to the frontend.
- Coordinates ingestion with Airbyte.
- Integrates with Snowflake / Redshift.
- Integrates with OpenMetadata.
- Integrates with LangChain (BrightBot).
- Handles global authentication, command routing, and workflow abstraction.
- Provides endpoints supporting different clients.
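The "authenticate once, then route" role described above can be illustrated with a minimal sketch. This is not the platform's code: the real orchestrator is a GraphQL API, and the command names, handler functions, and payload fields here are hypothetical, chosen only to show the dispatch pattern.

```python
from typing import Any, Callable, Dict

# Hypothetical handlers standing in for the integrations listed above.
def trigger_ingestion(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"service": "airbyte", "action": "sync", "connection": payload["connection"]}

def run_warehouse_query(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"service": "warehouse", "action": "query", "sql": payload["sql"]}

def ask_brightbot(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"service": "brightbot", "action": "chat", "message": payload["message"]}

# Command names are assumptions made for this example.
ROUTES: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "ingest": trigger_ingestion,
    "query": run_warehouse_query,
    "chat": ask_brightbot,
}

def route(command: str, payload: Dict[str, Any], authenticated: bool) -> Dict[str, Any]:
    """Check authentication once, then dispatch to the matching handler."""
    if not authenticated:
        raise PermissionError("authentication required")
    try:
        handler = ROUTES[command]
    except KeyError:
        raise ValueError(f"unknown command: {command}") from None
    return handler(payload)
```

Keeping authentication and routing in one place means each integration handler only deals with its own service, which is the abstraction the orchestrator is described as providing.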
Metadata & Governance with Neo4j
Our Neo4j knowledge graph acts as the control plane, representing all platform entities: data sources, files, pipelines, warehouses, and transformations.
- Powers backend operations as well as other platform services.
- Supports Cypher queries for consumption by the backend / BrightBot.
- Enables semantic search and natural language queries via BrightBot.
- Serves as the metadata registry for:
  - Data Catalog.
  - Ingestion pipelines (Airbyte).
  - Warehousing (Redshift, Snowflake).
  - BrightBot agents.
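As an illustration of the Cypher queries mentioned above, the sketch below builds a parameterized lookup by entity name. The node labels and the `name` property are assumptions, since the actual graph schema is not documented here. The label is checked against an allow-list before being placed in the query text, while user-supplied values travel as query parameters.

```python
from typing import Any, Dict, Tuple

# Hypothetical node labels; the real graph schema is not documented here.
ALLOWED_LABELS = {"DataSource", "File", "Pipeline", "Warehouse", "Transformation"}

def find_entities_by_name(
    label: str, name_fragment: str, limit: int = 10
) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher query for entities whose name contains a fragment."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unsupported label: {label}")
    query = (
        f"MATCH (n:{label}) "
        "WHERE toLower(n.name) CONTAINS toLower($fragment) "
        "RETURN n.name AS name "
        "ORDER BY name LIMIT $limit"
    )
    # Values are passed separately so they are never spliced into the query text.
    return query, {"fragment": name_fragment, "limit": limit}
```

The returned pair can be handed to whichever Neo4j client the backend uses; the function itself does not open a connection.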
Seamless Ingestion
Ingestion is powered by a self-hosted Airbyte instance (on EC2).
- Coordinates with the backend (Apollo) for metadata storage.
- Configures sources/connections via UI.
- Schedules and monitors syncs to S3.
- Automatically maps ingested assets into Neo4j.
This integration ensures ingestion is scalable, trackable, and governed from the start.
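One way to picture the mapping of synced data into the graph is the sketch below. It is an assumption-laden illustration: `SyncResult` and its fields are hypothetical and do not mirror Airbyte's API, and the labels and relationship names (`FEEDS`, `PRODUCES`) are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class SyncResult:
    # Hypothetical fields; not Airbyte's actual response shape.
    connection_id: str
    source_name: str
    streams: Tuple[str, ...]
    s3_prefix: str

Edge = Tuple[str, str, str]  # (from_id, relationship, to_id)

def to_graph_records(sync: SyncResult) -> Tuple[List[Dict[str, str]], List[Edge]]:
    """Turn one sync into node dicts and edges ready to be written to a graph."""
    source_id = f"source:{sync.source_name}"
    pipeline_id = f"pipeline:{sync.connection_id}"
    nodes = [
        {"id": source_id, "label": "DataSource"},
        {"id": pipeline_id, "label": "Pipeline"},
    ]
    edges: List[Edge] = [(source_id, "FEEDS", pipeline_id)]
    for stream in sync.streams:
        # One asset node per synced stream, keyed by its S3 location.
        asset_id = f"asset:{sync.s3_prefix.rstrip('/')}/{stream}"
        nodes.append({"id": asset_id, "label": "Asset"})
        edges.append((pipeline_id, "PRODUCES", asset_id))
    return nodes, edges
```

Recording the source, the pipeline, and each landed asset as connected nodes is what makes a sync traceable after the fact.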
Transformations with DBT Cloud
Once data lands in Redshift or Snowflake, it's transformed via DBT Cloud.
- Analysts use DBT's transformation-as-code model.
- Models and jobs are registered in Neo4j.
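How models are registered is not specified here. As a rough sketch of one possible input to that step, the snippet below extracts upstream dependencies from a model's SQL by scanning for `{{ ref('...') }}` calls, which is the standard way DBT models reference each other. The function name is hypothetical, and only the single-argument form of `ref` is handled.

```python
import re
from typing import List

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

def upstream_models(model_sql: str) -> List[str]:
    """Return the distinct model names referenced via ref(), in order of appearance."""
    seen: List[str] = []
    for name in REF_PATTERN.findall(model_sql):
        if name not in seen:
            seen.append(name)
    return seen
```

Each returned name could become a lineage edge from the upstream model to the model being registered.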
Warehousing
We support Amazon Redshift and Snowflake as analytical destinations.
- Warehouse configurations are UI-driven.
- Schemas are indexed in Neo4j.
- Enables:
  - High-performance analytics.
  - Cross-source joins.
  - Governed, traceable storage.
File Uploads & Data Catalog
Users can upload structured and unstructured files (CSV, PDF, images, videos) directly via the platform.
- Enabled through the backend (Apollo).
- Accessible from the UI.
- Files are stored in Amazon S3.
- Automatically indexed in Neo4j.
- Metadata includes file type, schema, and source relationships.
These files become searchable, queryable, and tightly integrated into the data discovery experience.
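The kind of metadata captured for an upload can be sketched with the standard library alone. This is an illustrative assumption, not the platform's indexing code: the function name and the metadata keys are hypothetical, and "schema" is reduced here to the header row of a UTF-8 CSV file.

```python
import csv
import io
import mimetypes
from typing import Any, Dict

def describe_upload(filename: str, content: bytes) -> Dict[str, Any]:
    """Collect basic catalog metadata for an uploaded file."""
    mime_type, _ = mimetypes.guess_type(filename)
    metadata: Dict[str, Any] = {
        "filename": filename,
        "file_type": mime_type or "application/octet-stream",
        "size_bytes": len(content),
    }
    if filename.lower().endswith(".csv"):
        # Treat the first row as the column list; assumes UTF-8 with a header.
        reader = csv.reader(io.StringIO(content.decode("utf-8")))
        metadata["columns"] = next(reader, [])
    return metadata
```

Unstructured files such as PDFs or images get only the generic fields, while structured files gain a column list that can be attached to the corresponding graph node.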
Conversational Intelligence with BrightBot
BrightBot is our AI-powered multi-agent system that interacts primarily with the backend (Apollo):
- Stores and retrieves data to / from Neo4j.
- Available in the UI.
- The agents below interact directly with Apollo endpoints:
  - Supervisor Agent: Parses user intent and delegates tasks.
  - Retrieval Agent: Uses RAG to fetch relevant metadata.
  - Engineering Agent: Generates DBT code.
  - Analytics Agent: Builds Jupyter notebooks.
  - Visualization Agent: Creates charts and dashboards.
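The supervisor's delegation step can be illustrated with a deliberately simple stand-in. In BrightBot the intent parsing is presumably model-driven; the keyword lists and the `pick_agent` function below are hypothetical and only show the shape of "parse intent, then delegate".

```python
from typing import Dict, Tuple

# Keyword lists are illustrative stand-ins for real intent parsing.
AGENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "visualization": ("chart", "plot", "dashboard"),
    "engineering": ("dbt", "model", "transform"),
    "analytics": ("notebook", "analysis", "analyze"),
}

def pick_agent(message: str) -> str:
    """Return the first agent whose keywords appear in the message, else retrieval."""
    text = message.lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    # Fall back to metadata retrieval when no specialist matches.
    return "retrieval"
```

For example, `pick_agent("Create a chart of monthly revenue")` selects the visualization agent, while a general question about which tables exist falls through to retrieval.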
Modular and Extensible by Design
Each system component is decoupled and modular, including:
- Ingestion: Airbyte.
- Transformation: DBT.
- Metadata: Neo4j.
- Storage: S3, Redshift, Snowflake.
- Access: GraphQL API, BrightBot.
This design ensures the platform is:
- Extensible to new tools and sources.
- Cloud-agnostic and flexibly deployable.
- Developer-friendly, with schema-based APIs and integrations.
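The decoupling described above can be sketched as components registered behind a small interface. The names `IngestionTool`, `StubIngestion`, and `Platform` are hypothetical and exist only for this example; they show how a new tool could be added without changing the code that calls it.

```python
from typing import Dict, List, Protocol

class IngestionTool(Protocol):
    """Hypothetical contract an ingestion component would satisfy."""
    def sync(self, source: str) -> List[str]:
        ...

class StubIngestion:
    """Stand-in implementation; a real one would call the ingestion service."""
    def sync(self, source: str) -> List[str]:
        return [f"{source}/orders", f"{source}/customers"]

class Platform:
    """Holds registered tools and calls them only through the shared interface."""
    def __init__(self) -> None:
        self._ingestion: Dict[str, IngestionTool] = {}

    def register_ingestion(self, name: str, tool: IngestionTool) -> None:
        self._ingestion[name] = tool

    def ingest(self, tool_name: str, source: str) -> List[str]:
        return self._ingestion[tool_name].sync(source)
```

Because `Platform` depends only on the `sync` method, swapping or adding an ingestion tool is a registration call rather than a change to the orchestration logic.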