Skip to main content

YAML specification

Reference documentation for writing semantic resources in Hex.

info

Authoring semantic models within Hex is in beta.
Minor changes may be made to the following formats during the beta period.

Resources

Each YAML document represents a single semantic resource.

  • Models declare reusable definitions for tables, measures, dimensions, and relationships in a data warehouse.
  • Views are optionally declared on top of models and provide fit-for-purpose entrypoints for users to conduct self-serve analysis.

Model specification

Models declare a reusable definition for a data table alongside its measures, dimensions, and relationships. Models act as the core building blocks for all analytical logic within the semantic project. Data models typically correspond closely to tables in your database.

id: users
type: model
base_sql_table: public.users
dimensions:
- id: email
type: string
unique: true
measures:
- id: total_users
func: count
relations:
- id: organizations
type: many_to_one
join_sql: org_id = ${organizations}.id
*id:ID

The unique identifier for this model.

This identifier is used as its reference across all models and must be unique across the semantic project. Changing this identifier may invalidate existing references.

type:"model"

The string "model" to discern this resource as a data model.

One of base_sql_table or base_sql_query is required to specify the data for this model:

*base_sql_table:string

A table or SQL view in the data connection.

id: customers
base_sql_table: public.customers
*base_sql_query:string

A SQL query that produces a table.

id: subscribers
base_sql_query: |
SELECT * FROM public.customers
WHERE tier IN ('PREMIUM', 'Plus');
name:string

The user-facing display name for this model.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this model.

visibility:Visibility

The visibility of this model.

Defining a model as internal or private will prevent it from appearing within the data browser as an explore entrypoint for users, which can be helpful to hide models which are more clearly exposed with a view.

Defaults to public.
measures:Measure[]

List of measures for this model.

dimensions:Dimension[]

List of dimensions for this model.

relations:Relation[]

List of relations for this model.

Dimensions

A dimension is a field backed by a physical column, a SQL expression, or a formula based expression. Dimensions are used to filter, group by, and inspect the details of analysis.

*id:ID

The unique identifier for this dimension.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

One of expr_sql or expr_calc can be used to specify the logic for selecting this dimension.

If both expr_sql and expr_calc are omitted, then expr_sql is defaulted to the value of id.

expr_sql:string

A SQL column expression that produces a value for each row.

Often this is simply a column name.

id: sku
type: string
expr_sql: stock_keeping_unit

Or a more complex SQL expression:

id: subscription_attribution_quarter
type: datetime_tz
expr_sql: DATE_TRUNC(subscribed_at, 'quarter')
expr_calc:string

A Hex calculation formula that produces a value for each row.

id: full_name
type: string
expr_calc: CONCAT(first_name, ' ', last_name)
*type:DataType

The data type of this dimension.

unique:boolean

If true, this dimension is unique across all rows in the model, such as for a primary key column.

At least one unique: true dimension is required on each model. This is used to properly aggregate measures across relationships without double counting.

name:string

The user-facing display name for this dimension.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this dimension.

visibility:Visibility

The visibility of this dimension.

Defaults to public.

Measures

A measure defines an aggregation which derives a single value from a group of records. Measures are used to quantify large amounts of data into meaningful values.

*id:ID

The unique identifier for this measure.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

One of func, func_sql or func_calc is required to specify the aggregation logic for a measure.

*func:string

A standard aggregation function over a provided dimension.

  • count: Count of records. If of is specified (optional in this case), then only non-null values for that dimension will be calculated.
  • count_distinct: Count of unique values over the dimension.
  • sum, avg, median, min, max, stddev, stddev_pop, variance, and variance_pop:
    Standard mathematical functions over a number dimension.
of:string

The dimension over which the aggregation is applied.

A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.

filters:Dimension[]

List of boolean dimensions which must be true for a row to be included in the measure’s aggregation.

A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.

id: total_sales
func: count
id: revenue
func: sum
of: price
id: revenue_from_us_deliveries
func: sum
of: price
filters:
- is_delivery
- expr_sql: region = 'US'
*func_sql:string

An aggregating SQL select expression that produces a single value over a set of rows.

id: adjusted_revenue
func_sql: SUM(price) * 0.9
*func_calc:string

An aggregating Hex calculation formula that produces a single value over a set of rows.

id: profits
func_calc: revenue - costs

The data type of this measure.

If omitted, defaults to number.

name:string

The user-facing display name for this measure.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this measure.

visibility:Visibility

The visibility of this measure.

Defaults to public.

semi_additive:object

The semi-additive behavior of this measure when aggregated over specific dimensions.

*over:object[]

A list of dimensions that determine row selection for aggregation. Rows are selected using a hierarchical approach where dimensions are evaluated for the representative value (minimum or maximum) in the order they are specified.

Currently limited to a single dimension.

*dimension:Dimension

The dimension to use for row selection.

A dimension id can be used to reference an existing dimension.

pick:min | max

Whether to select rows with the minimum or maximum dimension value.

If omitted, defaults to max.

groupings:Dimension[]

A list of dimensions to group by when determining minimum or maximum values. A representative value will be selected from each group.

Semi-additive

A measure is semi-additive when it cannot be added together when aggregating over some dimension(s). Instead, a representative value should be determined and all rows matching that value should be aggregated.

Semi-additive measures are useful on snapshot tables where values should not be added together when aggregating over time. In the following example, the total of all account balances is the sum of balances only on the latest date.

id: total
func_sql: SUM(balance)
semi_additive:
over:
- dimension: date

If the data is such that the accounts do not have a row for every date, then groupings defines how to slice the data to pick the latest balance for each account. (This requires that the snapshot reflects when an account balance goes to 0, otherwise the last valid balance for an account will be included, which could include closed accounts.)

id: total
func_sql: SUM(balance)
semi_additive:
over:
- dimension: date
groupings:
- account_id

Relations

A relation defines how two models connect to each other, allowing you to analyze data across multiple tables. For example, you might connect a sales model to a customers model to analyze sales by customer characteristics.

id: owner
target: users
type: many_to_one
join_sql: owner_id = ${owner}.id
*id:ID

The unique identifier for this relation. Typically, this is the id of the model being joined to.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

target:ID

The target model this relation joins to.

If omitted, defaults to the value of id.

*join_sql:string

The SQL condition to join the base model to a target.

Join conditions are typically of the form foreign_key = ${relation_id}.primary_key.

*type:string

The cardinality of the join, from the base model to the target.

One of the following values:

  • many_to_one
  • one_to_many
  • one_to_one
visibility:Visibility

The visibility of this relation.

Defaults to public.

View specification

Views are declared on top of models and provide fit-for-purpose entrypoints for users to conduct self-serve analysis.

Views can rename and reorganize the dimensions, measures, and relations of models for clarity and readability. Using this feature can help simplify complex join trees or flatten relations to simpler concepts for your users.

Views are an optional feature. Models and views share all analytical capabilities in Hex. Views can be incrementally introduced to build a cleaner facade over your data models as it becomes more complex.

id: revenue
type: view
base: order_items
contents:
- measures:
- revenue
dimensions:
- timestamp
- orders.is_delivery
- relation: orders.customers
dimensions: ...
*id:ID

The unique identifier for this view.

This identifier is used as its reference across all resources and must be unique across the semantic project. Changing this identifier may invalidate existing references.

*type:"view"

The string "view" to discern this resource as a view.

*base:string

The data model to use as the grain for this view.

*contents:Group[]

The contents for this view.

Groups

A group is a set of dimensions and measures to display to the end user in the UI. Groups are displayed as folders in Hex UIs. The layout of groups only affects the visual organization of dimensions and measures, and has no effects on query logic.

The most common practice is to create one group per related entity you want to expose as part of the view.

relation:string

Dot-separated relation path for this group, relative to the parent (base for top-level groups). Sets the default model for declarations within this group.

If omitted, defaults to the base model.

name:string

The user-facing display name for this group.

If omitted, defaults to the name of the group's relation.

description:string

The user-facing description for this group.

If omitted, defaults to the description of the group's relation.

dimensions:string[]

List of dimensions to include in this group, relative to the group's relation.

measures:string[]

List of measures to include in this group, relative to the group's relation.

contents:Group[]

Nested groups to be displayed as an inner tree.

Using nested groups can be useful to explicitly surface a hierarchical relationships between entities. However, for the simplest experience for end-users, consider hoisting up nested relations to a single flat list of uniquely named groups.

Wildcard Inclusions

To include all dimensions or measures from the group's relation, you may use the ... shorthand in place of a list of items:

- relation: orders.customers
dimensions: ...

To omit some items from the ... list, you may specify it as part of a list with excluded items prefixed with ~:

- relation: orders.customers
dimensions:
- ...
- ~phone_number # not relevant for revenue analytics

Overriding names and descriptions

To include an item but under a different name or with a different description, specify an object instead of a string, and provide name or description to override the item's fields:

- relation: orders.customers
dimensions:
- dimension: is_subscriber
name: Ordered by subscriber
measures:
- measure: count
description: Unique count of users contributing to revenue

Shared types

ID

An ID is a string that conforms to the following rules:

  • Begins with a lowercase letter or an underscore
  • Only contains lowercase letters, underscores, and numbers
  • Between 2 and 128 characters long (inclusive)

The following IDs are reserved by the system and cannot be used:

this, self, dataset, model, view, metric, env, _hex*

DataType

Data types are abstract. You do not need to specify the exact subtype of number (int64, unsigned int, double, float, etc), instead you can just specify "number".

  • number: Includes INT, BIGINT, FLOAT, DECIMAL, DOUBLE, REAL, etc.
  • string: Includes CHAR, VARCHAR, TEXT, etc.
  • timestamp_tz: Timezone-aware.
  • timestamp_naive: Without timezone information.
  • date: The date portion of a timestamp; always without a timezone.
  • boolean: True or false.
  • other: Any other type without special support in Hex.

Visibility

The visibility key controls where a resource can be used and who can see it. Hex supports three built-in values:

  • public: Can be viewed and used by everyone.
  • internal: Can be referenced by other resources in the semantic project, but will be hidden from users.
  • private: Can only be used within the current resource. Hidden from users.

The visibility key should not be relied on as a security control and is only used to visually hide content in the UI. For the strongest security guarantees, configure OAuth or role-based access within your database.

SQL interpolation

Any key containing sql will be treated as SQL to execute in your database. To reference semantic entities within these SQL snippets, you may use interpolation within ${}.

Interpolations can reference all the dimensions, measures, and relations within the current model by their id.

Specifying a dimension id inside of a ${} will interpolate a column expression which produces the value for the referenced dimension. This can be used to create derived dimensions, aggregates on top of dimensions, or for the join condition of relations.

dimensions:
- id: customer_tier
type: string
- id: is_premium_tier
func_sql: ${customer_tier} IN ('Premium', 'Plus')

Specifying a measure id inside of a ${} will interpolate an aggregating expression which produces the value for the referenced measure. This is only valid inside of aggregating contexts, such as another measure.

measures:
- id: revenue
func: sum
of: price
- id: costs
func: sum
of: cost_of_goods
- id: profit
func_sql: ${revenue} - ${costs}

Specifying a relation id inside of a ${} will interpolate the alias name for the joined table, which may differ from the id of the relation. Using dot-notation inside of the interpolation will allow accessing the measures and dimensions inside of the joined model.

relations:
- id: customers
type: many_to_one
# reference the join alias
join_sql: customer_id = ${customers}.id

measures:
- id: adjusted_revenue
# reference a dimension on `customers`
# let's assume premium tier users get a 10% discount
func_sql: SUM(IF(${customers.is_premium_tier}, price * 0.9, price))

Note the distinction between referencing a dimension on a relation versus referencing the relation's table and then accessing a SQL column:

  • ${customers.full_name} accesses the full_name dimension on the customers model
  • ${customers}.first_name accesses the underlying first_name column on the customers model's underlying table.

Calculation formulas

Any key containing calc will be treated as a Hex calculation formula.

Calculation formulas can reference all the dimensions, measures, and relations within the current model by their id. SQL columns which are not included as dimensions are not in scope and cannot be referenced.

File and directories

The Hex semantic specification currently infers no behavior from the names or locations of YAML files. A file can be named anything so long as it ends with a valid YAML extension (.yaml or .yml).

Multi-document YAML is fully supported. Two or more semantic resources can be specified in the same file when separated by a --- divider.

Examples

This example builds a simple AAR measure between a users and organizations model, without defining anything else. It demonstrates the basic approach when authoring models.

id: organizations
type: model
base_sql_table: public.organizations

measures:
- id: arr
name: Annual reoccurring revenue
func: sum
of: users.annual_seat_price

dimensions:
- id: organization_id
type: string
unique: true

relations:
- id: users
type: one_to_many
join_sql: id = ${users}.org_id
id: users
type: model
base_sql_table: public.users

dimensions:
- id: annual_seat_price
type: number
expr_sql: IF(seat_type = 'VIEW_ONLY', 20, 60)