Overview

The Metadata Agent is responsible for keeping your data catalog current, understandable, and well-documented. It connects to OpenMetadata (OMD) via MCP to read and enrich metadata across your entire data estate — generating business-friendly descriptions, managing tags, understanding schema definitions, and tracing data lineage.

What You Can Ask

  • “Describe the customers table” — Generates a business-focused description based on schema and sample data
  • “What columns are in the orders dataset?” — Returns full schema with data types and existing documentation
  • “Add a description to the revenue column” — Updates column-level metadata in OpenMetadata
  • “Tag the email column as PII” — Applies sensitivity classifications to columns
  • “Show me the lineage for the sales_summary table” — Traces upstream sources and downstream dependencies
  • “What tables are related to inventory?” — Semantic search across your catalog to find relevant assets

How It Works

OpenMetadata Integration

The Metadata Agent connects to OpenMetadata through the Model Context Protocol (MCP), providing structured, validated access to your full data catalog.

Schema Exploration

Browse databases, schemas, and tables. View column names, data types, constraints, and existing documentation — all through the OpenMetadata catalog.

Description Generation

The agent uses an LLM to analyze schema structure, column patterns, and sample data, producing business-focused descriptions that explain what the data means, not just what it contains.

Metadata Enrichment

Update table and column descriptions, apply PII sensitivity tags, and add documentation — all written back to OpenMetadata via JSON Patch operations.

Lineage Tracking

Trace data relationships and dependencies across your estate — see which tables feed into which, understand upstream sources, and follow transformations through the pipeline.

Description Generation

When you ask the Metadata Agent to describe a data asset, it goes beyond reading existing documentation. It uses an LLM to generate business-focused descriptions by:
  1. Retrieving metadata — Fetches the full schema from OpenMetadata (columns, types, constraints)
  2. Assessing context — Determines whether the schema alone provides enough context, or if sample data is needed
  3. Fetching sample data (when needed) — Queries your Redshift warehouse for a representative sample to understand actual data patterns
  4. Generating descriptions — Produces 2-3 sentence descriptions focused on the distinctive business characteristics of the data — what it represents, how it’s used, and what makes it unique
  5. Saving to catalog — Writes the generated description back to OpenMetadata and Neo4j so it’s available across the platform
Descriptions are written for business users — they explain what the data means in context, not how it was collected or stored.
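
The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the agent's actual implementation: the `Column` type, the `needs_sample_data` heuristic, and every callable passed to `describe_asset` are hypothetical stand-ins for the OpenMetadata, Redshift, LLM, and Neo4j integrations.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Column:
    """Hypothetical, simplified view of a column returned by the catalog."""
    name: str
    data_type: str
    description: str = ""


def needs_sample_data(columns: Sequence[Column]) -> bool:
    """Assumed heuristic: ask for sample rows when most columns are undocumented."""
    undocumented = sum(1 for c in columns if not c.description)
    return undocumented > len(columns) / 2


def describe_asset(
    table: str,
    fetch_schema: Callable[[str], Sequence[Column]],
    fetch_sample: Callable[[str], Sequence[dict]],
    generate: Callable[[str, Sequence[Column], Optional[Sequence[dict]]], str],
    writers: Sequence[Callable[[str, str], None]],
) -> str:
    columns = fetch_schema(table)                   # 1. retrieve metadata
    sample = None
    if needs_sample_data(columns):                  # 2. assess context
        sample = fetch_sample(table)                # 3. fetch sample data when needed
    description = generate(table, columns, sample)  # 4. generate the description
    for write in writers:                           # 5. save to each catalog
        write(table, description)
    return description
```

Passing the integrations in as callables keeps the flow easy to test with stubs, and the `writers` list models the save step writing to more than one system.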

Schema Operations

The Metadata Agent can explore your full OpenMetadata catalog hierarchy:
Level | Operations
Database | List databases, view database details, browse schemas within a database
Schema | List schemas, view schema details, browse tables within a schema
Table | List tables, get full table metadata, view column definitions and types
Column | View data types, constraints, existing descriptions, sensitivity tags
Lineage | Trace upstream sources and downstream consumers for any table
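
To make the lineage operation concrete, here is a minimal sketch of tracing upstream sources or downstream consumers with a breadth-first traversal. It assumes lineage is available as a plain list of `(source, target)` table pairs. That edge-list format and the `trace` function are illustrative assumptions, not the OpenMetadata lineage API.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (source_table, target_table); assumed representation


def trace(table: str, edges: Iterable[Edge], direction: str = "upstream") -> List[str]:
    """Return every table reachable from `table`, nearest first."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")

    # Build an adjacency map that points the way we want to walk.
    neighbors: Dict[str, List[str]] = {}
    for source, target in edges:
        if direction == "upstream":
            neighbors.setdefault(target, []).append(source)
        else:
            neighbors.setdefault(source, []).append(target)

    seen = {table}
    queue = deque([table])
    order: List[str] = []
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:  # guards against cycles and duplicate paths
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For example, with edges `raw_orders -> orders`, `orders -> sales_summary`, and `sales_summary -> revenue_dashboard`, tracing `sales_summary` upstream visits `orders` and then `raw_orders`, while tracing it downstream visits `revenue_dashboard`.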

Metadata Enrichment

The agent can update metadata directly in OpenMetadata using structured patch operations:

Table Descriptions

Add or update table-level descriptions that explain the business purpose and context of each data asset.

Column Descriptions

Document individual columns with business-friendly explanations — what each field represents and how it should be interpreted.

PII Tags

Apply sensitivity classifications to columns containing personal information — emails, SSNs, phone numbers — using OpenMetadata’s PII tagging system.
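
As a rough illustration of the patch operations behind these updates, the sketch below builds JSON Patch (RFC 6902) documents for a table description, a column description, and a PII tag. The helper names, the `/columns/<index>/...` paths, and the `PII.Sensitive` tag name are assumptions made for the example. Check them against your own OpenMetadata entities before relying on them.

```python
import json
from typing import Any, Dict, List, Optional

PatchOp = Dict[str, Any]


def description_patch(text: str, column_index: Optional[int] = None) -> List[PatchOp]:
    """Patch a table description, or a column description when an index is given."""
    if column_index is None:
        path = "/description"
    else:
        path = f"/columns/{column_index}/description"  # assumed entity layout
    # In RFC 6902, "add" on an existing object member replaces its value.
    return [{"op": "add", "path": path, "value": text}]


def pii_tag_patch(column_index: int, tag_fqn: str = "PII.Sensitive") -> List[PatchOp]:
    """Append a tag to a column's tag list; "-" targets the end of an array."""
    return [{
        "op": "add",
        "path": f"/columns/{column_index}/tags/-",  # assumes the tags array exists
        "value": {"tagFQN": tag_fqn},               # assumed tag shape and name
    }]


if __name__ == "__main__":
    print(json.dumps(description_patch("Monthly recognized revenue in USD.", 3), indent=2))
    print(json.dumps(pii_tag_patch(1), indent=2))
```

A patch document of this kind is normally sent with the `application/json-patch+json` content type. The request itself is left out here because the endpoint and authentication depend on your deployment.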

Search & Discovery

Semantic search across your entire catalog using vector embeddings — find tables by what they contain, not just what they’re named.

Data Asset Discovery

Finding the right data asset is the first step in any metadata operation. The Metadata Agent uses confidence-based discovery to surface the most relevant assets:
Confidence | Threshold | Behavior
Strong Match | > 60% similarity | Proceeds automatically with the best match
Possible Match | 40–60% similarity | Presents options and asks you to confirm
Uncertain | < 40% similarity | Asks you to clarify or refine your request
Discovery searches both vector embeddings (semantic meaning) and the OpenMetadata catalog (structured metadata) to find assets that match your intent, even if you don't know the exact table name.
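
The thresholds above translate directly into a small decision function. The sketch below assumes similarity scores are already expressed as fractions between 0 and 1. The `classify_match` and `resolve` names and the candidate dictionary are hypothetical, and how the real agent ranks or merges results is not specified here.

```python
from typing import Dict, List, Tuple

STRONG_THRESHOLD = 0.60    # above this: proceed automatically
POSSIBLE_THRESHOLD = 0.40  # from here up to 0.60: ask for confirmation


def classify_match(similarity: float) -> str:
    """Map a similarity score to the confidence level described above."""
    if similarity > STRONG_THRESHOLD:
        return "strong"
    if similarity >= POSSIBLE_THRESHOLD:
        return "possible"
    return "uncertain"


def resolve(candidates: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return the confidence level of the best candidate and the assets to act on."""
    if not candidates:
        return "uncertain", []
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    level = classify_match(best_score)
    if level == "strong":
        return level, [best_name]  # proceed with the best match
    if level == "possible":
        # Offer every plausible candidate for the user to confirm.
        return level, [name for name, score in ranked if score >= POSSIBLE_THRESHOLD]
    return level, []               # nothing usable: ask the user to clarify
```

Note that a score of exactly 60% falls into the "possible" band, because the strong-match rule is a strict greater-than.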

Dual Catalog Architecture

The Metadata Agent works across two complementary systems:
  • OpenMetadata stores the detailed catalog metadata — schemas, column definitions, descriptions, PII tags, lineage, and quality metrics
  • Neo4j stores the relationship graph — how data assets connect to workspaces, organizations, and each other, enabling GraphRAG-powered discovery
When the agent updates a description, it writes to both systems — keeping the catalog and the knowledge graph in sync.
The Metadata Agent works alongside the Governance Agent for policy compliance and the Quality Agent for data quality checks. Together they keep your data estate documented, governed, and healthy.