Attaching metadata to assets can help make your pipelines easier for you and other team members to understand. Data about your data assets can be attached to both asset definitions and materializations.
By the end of this guide, you'll understand how to attach metadata to assets and view that metadata in the Dagster UI.
Definition metadata is information that's fixed or doesn't frequently change. For example, definition metadata could be the storage location of a table, a link to the asset's definition in GitHub, or who owns the asset.
Runtime, or materialization, metadata is information that changes each time a materialization occurs. This could be how many records were processed or how long an asset took to materialize.
How metadata is attached to an asset depends on the type of metadata being attached. Refer to the following sections for more details.
Arbitrary metadata is attached to an asset definition using the metadata argument, which takes a dictionary of key/value pairs. Keys must be strings, but values can be either:
Any of the MetadataValue classes provided by Dagster
Primitive Python types, which Dagster will convert to the appropriate MetadataValue
For example, to attach the name of the table we expect to store the asset in, we'll add a "dataset_name" entry to the metadata argument:
from dagster_duckdb import DuckDBResource

from dagster import asset

# ... other assets


@asset(
    deps=[iris_dataset],
    metadata={"dataset_name": "iris.small_petals"},
)
def small_petals(duckdb: DuckDBResource) -> None:
    with duckdb.get_connection() as conn:
        # Column names are left unquoted: single quotes would turn them
        # into string literals instead of column references.
        conn.execute(
            "CREATE TABLE iris.small_petals AS SELECT * FROM iris.iris_dataset WHERE"
            " petal_length_cm < 1 AND petal_width_cm < 1"
        )
Dagster provides a standard set of metadata keys that can be used for common types of metadata, such as an asset's URI or column schema. Note: These entries are intended to be a starting point, and we encourage you to create your own metadata keys that make sense within the context of your data platform.
Did you know? If using Dagster+ Pro, you can create asset-based alerts that will automatically notify an asset's owners when triggered. Refer to the Dagster+ alert documentation for more information.
An asset can have multiple owners, defined using the owners argument on the @asset decorator. This argument accepts a list of owners, where each entry is either an individual email address or a team. Teams must include a team: prefix; for example: team:data-eng.
The asset in the following example has two owners: richard.hendricks@hooli.com and the data-eng team.
from dagster import asset


@asset(owners=["richard.hendricks@hooli.com", "team:data-eng"])
def leads(): ...
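If owner lists are assembled from configuration, it can help to check their shape before they reach the decorator. The following is a rough sketch of such a pre-check; validate_owners is a hypothetical helper, not a Dagster API, and Dagster applies its own validation whose exact rules may differ.

```python
def validate_owners(owners):
    """Return owners unchanged if each entry looks like an email or a team.

    Hypothetical pre-check for illustration only; it encodes just the two
    shapes described above (an email address or a `team:` prefixed name).
    """
    for owner in owners:
        if owner.startswith("team:"):
            # A team entry needs a non-empty name after the prefix.
            if not owner[len("team:"):]:
                raise ValueError(f"Team owner has no name: {owner!r}")
        else:
            # Very loose email check: something on both sides of an "@".
            local, sep, domain = owner.partition("@")
            if not (sep and local and domain):
                raise ValueError(
                    f"Owner must be an email or start with 'team:': {owner!r}"
                )
    return owners


OWNERS = validate_owners(["richard.hendricks@hooli.com", "team:data-eng"])
```

The validated list could then be passed as the owners argument, as in the example above.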
Attaching code references to an asset definition allows you to easily navigate to the asset's source code, either locally in your editor or in your source control repository. For more information, refer to the Code references guide.
Materialization metadata is attached to an asset by returning a MaterializeResult object with its metadata parameter set. This parameter accepts a dictionary of key/value pairs, where keys must be strings.
When specifying values, use the MetadataValue utility class to wrap the data, ensuring it displays correctly in the UI. Values can also be primitive Python types, which Dagster will convert to the appropriate MetadataValue.
In the following example, we add a row count and a preview to a topstories asset:
import json

import pandas as pd
import requests

from dagster import AssetExecutionContext, MaterializeResult, MetadataValue, asset


@asset(deps=[topstory_ids])
def topstories(context: AssetExecutionContext) -> MaterializeResult:
    with open("data/topstory_ids.json") as f:
        topstory_ids = json.load(f)

    results = []
    for item_id in topstory_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

        if len(results) % 20 == 0:
            context.log.info(f"Got {len(results)} items so far.")

    df = pd.DataFrame(results)
    df.to_csv("data/topstories.csv")

    return MaterializeResult(
        metadata={
            "num_records": len(df),  # Metadata can be any key-value pair
            "preview": MetadataValue.md(df.head().to_markdown()),
            # The `MetadataValue` class has useful static methods to build Metadata
        }
    )
Dagster provides a standard set of metadata keys that can be used for common types of metadata, such as an asset's URI or column schema. Note: These entries are intended to be a starting point, and we encourage you to create your own metadata keys that make sense within the context of your data platform.
For assets which produce database tables, you can attach table metadata to provide additional context about the asset. Table metadata can include information such as the schema, row count, or column lineage. Refer to the Table metadata documentation for more information, or the Column-level lineage documentation for specific details on column-level lineage.
A Dagster+ Pro plan is required to use this feature.
Dagster+ users can view and add numeric asset materialization metrics to Insights, allowing you to track user-provided metrics alongside Dagster+ metrics.
The following is a set of standard asset metadata entries that can be included in the dictionaries passed to metadata attributes of @asset, MaterializeResult, etc. Many of these receive special treatment in Dagster's UI, such as dagster/column_schema resulting in a Columns section on the Overview tab of the Asset details page.
The dagster prefix indicates that the Dagster package takes responsibility for defining the meaning of these metadata entries.
Key | Details
--- | ---
dagster/uri | Value: str. Description: The URI for the asset, e.g. "s3://my_bucket/my_object"
dagster/column_lineage | Value: TableColumnLineage. Description: For an asset that's a table, the lineage of column inputs to column outputs for the table. Refer to the Column lineage documentation for details.
dagster/row_count | Value: int. Description: For an asset that's a table, the number of rows in the table. Refer to the Table metadata documentation for details.
dagster/partition_row_count | Value: int. Description: For a partition of an asset that's a table, the number of rows in the partition.
dagster/table_name | Value: str. Description: A unique identifier for the table/view, typically fully qualified. For example, my_database.my_schema.my_table
dagster/code_references | Value: CodeReferencesMetadataValue. Description: A list of code references for the asset, such as file locations or references to GitHub URLs. Refer to the Code references documentation for details. Should only be provided in definition-level metadata, not materialization metadata.
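The standard keys can be combined in an ordinary dictionary. The sketch below builds definition-level metadata from the dagster/table_name and dagster/uri entries; table_definition_metadata is a hypothetical helper written for this example, and the database, schema, table, and bucket names are the placeholder values used in the descriptions above.

```python
from typing import Optional


def table_definition_metadata(
    database: str, schema: str, table: str, uri: Optional[str] = None
) -> dict:
    """Build a metadata dictionary that uses standard Dagster keys."""
    # Fully qualified name, as suggested for dagster/table_name.
    metadata = {"dagster/table_name": f"{database}.{schema}.{table}"}
    if uri is not None:
        metadata["dagster/uri"] = uri
    return metadata


metadata = table_definition_metadata(
    "my_database", "my_schema", "my_table", uri="s3://my_bucket/my_object"
)
```

The resulting dictionary could then be passed to the metadata argument of @asset alongside any custom keys.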