Skip to main content

YAML specification

Reference documentation for writing semantic models in Hex.

info

Authoring semantic models within Hex is in beta.
The following formats may change during the beta period.

Model specification

Each YAML file represents a single semantic data model. A model defines a table of data and its dimensions, measures, and relations.

id: users
base_sql_table: public.users
dimensions:
- id: email
type: string
measures:
- id: total_users
func: count
relations:
- id: organizations
type: many_to_one
join_sql: org_id = ${organizations}.id
*id:ID

The unique identifier for this model.

This identifier is used as its reference across all models and must be unique across the semantic project. Changing this identifier may invalidate existing references.

One of base_sql_table or base_sql_query is required to specify the data for this model:

*base_sql_table:string

A table or view in the data connection.

id: customers
base_sql_table: public.customers
*base_sql_query:string

A SQL query that produces a table.

id: subscribers
base_sql_query: |
SELECT * FROM public.customers
WHERE tier IN ('PREMIUM', 'Plus');
name:string

The user-facing display name for this model.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this model.

measures:Measure[]

List of measures for this model.

dimensions:Dimension[]

List of dimensions for this model.

relations:Relation[]

List of relations for this model.

Dimensions

A dimension is a field backed by a physical column, a SQL expression, or a formula based expression. Dimensions are used to filter, group by, and inspect the details of analysis.

*id:ID

The unique identifier for this dimension.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

One of expr_sql or expr_calc can be used to specify the logic for selecting this dimension.

If both expr_sql and expr_calc are omitted, then expr_sql is defaulted to the value of id.

expr_sql:string

A SQL column expression that produces a value for each row.

Often this is simply a column name.

id: sku
type: string
expr_sql: stock_keeping_unit

Or a more complex SQL expression:

id: subscription_attribution_quarter
type: datetime_tz
expr_sql: DATE_TRUNC(subscribed_at, 'quarter')
expr_calc:string

A Hex calculation formula that produces a value for each row.

id: full_name
type: string
expr_calc: CONCAT(first_name, ' ', last_name)
*type:DataType

The data type of this dimension.

unique:boolean

If true, this dimension is unique for all rows in the model. This dimension may be used in the construction of primary keys.

name:string

The user-facing display name for this dimension.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this dimension.

visibility:Visibility

The visibility of this dimension.

Defaults to public.

Measures

A measure defines an aggregation which derives a single value from a group of records. Measures are used to quantify large amounts of data into meaningful values.

*id:ID

The unique identifier for this measure.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

One of func, func_sql or func_calc is required to specify the aggregation logic for a measure.

*func:string

A standard aggregation function over a provided dimension.

  • count: Count of records. If of is specified (optional in this case), then only non-null values for that dimension will be calculated.
  • count_distinct: Count of unique values over the dimension.
  • sum, avg, median, min, max, stddev, stddev_pop, variance, and variance_pop:
    Standard mathematical functions over a number dimension.
of:string

The dimension over which the aggregation is applied.

A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.

filters:Dimension[]

List of boolean dimensions which must be true for a row to be included in the measure’s aggregation.

A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.

id: total_sales
func: count
id: revenue
func: sum
of: price
id: revenue_from_us_deliveries
func: sum
of: price
filters:
- is_delivery
- expr_sql: region = 'US'
*func_sql:string

An aggregating SQL select expression that produces a single value over a set of rows.

id: adjusted_revenue
func_sql: SUM(price) * 0.9
*func_calc:string

An aggregating Hex calculation formula that produces a single value over a set of rows.

id: profits
func_calc: revenue - costs

The data type of this measure.

If omitted, defaults to number.

name:string

The user-facing display name for this measure.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this measure.

visibility:Visibility

The visibility of this measure.

Defaults to public.

Relations

A relation defines how two models connect to each other, allowing you to analyze data across multiple tables. For example, you might connect a sales model to a customers model to analyze sales by customer characteristics.

id: owner
target: users
type: many_to_one
join_sql: owner_id = ${owner}.id
*id:ID

The unique identifier for this relation. Typically, this is the id of the model being joined to.

This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.

target:ID

The target model this relation joins to.

If omitted, defaults to the value of id.

*join_sql:string

The SQL condition to join the base model to a target.

Join conditions are typically of the form foreign_key = ${relation_id}.primary_key.

*type:string

The cardinality of the join, from the base model to the target.

One of the following values:

  • many_to_one
  • one_to_many
  • one_to_one
name:string

The user-facing display name for this relation.

If omitted, defaults to the sentence-case value of id.

description:string

The user-facing description for this relation.

visibility:Visibility

The visibility of this dimension.

Defaults to public.

Shared types

ID

An ID is a string that conforms to the following rules:

  • Begins with a lowercase letter or an underscore
  • Only contains lowercase letters, underscores, and numbers
  • Between 2 and 128 characters long (inclusive)

The following IDs are reserved by the system and cannot be used:

this, self, dataset, model, env, _hex*

DataType

Data types are abstract. You do not need to specify the exact subtype of number (int64, unsigned int, double, float, etc), instead you can just specify "number".

  • number: Includes INT, BIGINT, FLOAT, DECIMAL, DOUBLE, REAL, etc.
  • string: Includes CHAR, VARCHAR, TEXT, etc.
  • timestamp_tz: Timezone-aware.
  • timestamp_naive: Without timezone information.
  • date: The date portion of a timestamp; always without a timezone.
  • boolean: True or false.
  • other: Any other type without special support in Hex.

Visibility

visibility controls where a semantic object can be used and who can see it.visibility supports 3 built-in values:

  • public: Can be viewed and used by everyone.
  • internal: Can be referenced by other models in the semantic project, but will be hidden from end-users.
  • private: Can only be used within the current model. Hidden from end-users.

visibility should not be relied on as a security control and is only used to visually hide content in the UI. For the strongest security guarantees, configure OAuth or role-based access within your database.

SQL interpolation

Any key containing sql will be treated as SQL to execute in your database. To reference semantic entities within these SQL snippets, you may use interpolation within ${}.

Interpolations can reference all the dimensions, measures, and relations within the current model by their id.

Specifying a dimension id inside of a ${} will interpolate a column expression which produces the value for the referenced dimension. This can be used to create derived dimensions, aggregates on top of dimensions, or for the join condition of relations.

dimensions:
- id: customer_tier
type: string
- id: is_premium_tier
func_sql: ${customer_tier} IN ('Premium', 'Plus')

Specifying a measure id inside of a ${} will interpolate an aggregating expression which produces the value for the referenced measure. This is only valid inside of aggregating contexts, such as another measure.

measures:
- id: revenue
func: sum
of: price
- id: costs
func: sum
of: cost_of_goods
- id: profit
func_sql: ${revenue} - ${costs}

Specifying a relation id inside of a ${} will interpolate the alias name for the joined table, which may differ from the id of the relation. Using dot-notation inside of the interpolation will allow accessing the measures and dimensions inside of the joined model.

relations:
- id: customers
type: many_to_one
# reference the join alias
join_sql: customer_id = ${customers}.id

measures:
- id: adjusted_revenue
# reference a dimension on `customers`
# let's assume premium tier users get a 10% discount
func_sql: SUM(IF(${customers.is_premium_tier}, price * 0.9, price))

Note the distinction between referencing a dimension on a relation versus referencing the relation's table and then accessing a SQL column:

  • ${customers.full_name} accesses the full_name dimension on the customers model
  • ${customers}.first_name accesses the underlying first_name column on the customers model's underlying table.

Calculation formulas

Any key containing calc will be treated as a Hex calculation formula.

Calculation formulas can reference all the dimensions, measures, and relations within the current model by their id. SQL columns which are not included as dimensions are not in scope and cannot be referenced.

Examples

This example builds a simple AAR measure between a users and organizations model, without defining anything else. It demonstrates the basic approach when authoring models.

id: organizations
base_sql_table: public.organizations

measures:
- id: arr
name: Annual reoccurring revenue
func: sum
of: users.annual_seat_price

relations:
- id: users
type: one_to_many
join_sql: id = ${users}.org_id
id: users
base_sql_table: public.users

dimensions:
- id: annual_seat_price
type: number
expr_sql: IF(seat_type = 'VIEW_ONLY', 20, 60)