YAML specification
Reference documentation for writing semantic models in Hex.
Authoring semantic models within Hex is in beta.
The following formats may change during the beta period.
Model specification
Each YAML file represents a single semantic data model. A model defines a table of data and its dimensions, measures, and relations.
id: users
base_sql_table: public.users
dimensions:
- id: email
type: string
measures:
- id: total_users
func: count
relations:
- id: organizations
type: many_to_one
join_sql: org_id = ${organizations}.id
The unique identifier for this model.
This identifier is used as its reference across all models and must be unique across the semantic project. Changing this identifier may invalidate existing references.
One of base_sql_table or base_sql_query is required to specify the data for this model:
A table or view in the data connection.
id: customers
base_sql_table: public.customers
A SQL query that produces a table.
id: subscribers
base_sql_query: |
SELECT * FROM public.customers
WHERE tier IN ('PREMIUM', 'Plus');
The user-facing display name for this model.
If omitted, defaults to the sentence-case value of id.
The user-facing description for this model.
List of measures for this model.
List of dimensions for this model.
List of relations for this model.
Dimensions
A dimension is a field backed by a physical column, a SQL expression, or a formula based expression. Dimensions are used to filter, group by, and inspect the details of analysis.
The unique identifier for this dimension.
This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.
One of expr_sql or expr_calc can be used to specify the logic for selecting this dimension.
If both expr_sql and expr_calc are omitted, then expr_sql is defaulted to the value of id.
A SQL column expression that produces a value for each row.
Often this is simply a column name.
id: sku
type: string
expr_sql: stock_keeping_unit
Or a more complex SQL expression:
id: subscription_attribution_quarter
type: datetime_tz
expr_sql: DATE_TRUNC(subscribed_at, 'quarter')
A Hex calculation formula that produces a value for each row.
id: full_name
type: string
expr_calc: CONCAT(first_name, ' ', last_name)
The data type of this dimension.
If true, this dimension is unique for all rows in the model. This dimension may be used in the construction of primary keys.
The user-facing display name for this dimension.
If omitted, defaults to the sentence-case value of id.
The user-facing description for this dimension.
The visibility of this dimension.
Measures
A measure defines an aggregation which derives a single value from a group of records. Measures are used to quantify large amounts of data into meaningful values.
The unique identifier for this measure.
This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.
One of func, func_sql or func_calc is required to specify the aggregation logic for a measure.
A standard aggregation function over a provided dimension.
- count: Count of records. If of is specified (optional in this case), then only non-null values for that dimension will be calculated.
- count_distinct: Count of unique values over the dimension.
- sum, avg, median, min, max, stddev, stddev_pop, variance, and variance_pop:
Standard mathematical functions over a number dimension.
The dimension over which the aggregation is applied.
A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.
List of boolean dimensions which must be true for a row to be included in the measure’s aggregation.
A dimension id can be used to reference an existing dimension, or a new dimension can be specified inline.
id: total_sales
func: count
id: revenue
func: sum
of: price
id: revenue_from_us_deliveries
func: sum
of: price
filters:
- is_delivery
- expr_sql: region = 'US'
An aggregating SQL select expression that produces a single value over a set of rows.
id: adjusted_revenue
func_sql: SUM(price) * 0.9
An aggregating Hex calculation formula that produces a single value over a set of rows.
id: profits
func_calc: revenue - costs
The data type of this measure.
If omitted, defaults to number.
The user-facing display name for this measure.
If omitted, defaults to the sentence-case value of id.
The user-facing description for this measure.
The visibility of this measure.
Relations
A relation defines how two models connect to each other, allowing you to analyze data across multiple tables. For example, you might connect a sales model to a customers model to analyze sales by customer characteristics.
id: owner
target: users
type: many_to_one
join_sql: owner_id = ${owner}.id
The unique identifier for this relation. Typically, this is the id of the model being joined to.
This identifier is used as its reference and must be unique across all dimensions, measures and relations in this model. Changing this identifier may invalidate existing references.
The target model this relation joins to.
If omitted, defaults to the value of id.
The SQL condition to join the base model to a target.
Join conditions are typically of the form foreign_key = ${relation_id}.primary_key
.
The cardinality of the join, from the base model to the target.
One of the following values:
- many_to_one
- one_to_many
- one_to_one
The user-facing display name for this relation.
If omitted, defaults to the sentence-case value of id.
The user-facing description for this relation.
The visibility of this dimension.
Shared types
ID
An ID is a string that conforms to the following rules:
- Begins with a lowercase letter or an underscore
- Only contains lowercase letters, underscores, and numbers
- Between 2 and 128 characters long (inclusive)
The following IDs are reserved by the system and cannot be used:
this, self, dataset, model, env, _hex*
DataType
Data types are abstract. You do not need to specify the exact subtype of number (int64, unsigned int, double, float, etc), instead you can just specify "number".
- number: Includes INT, BIGINT, FLOAT, DECIMAL, DOUBLE, REAL, etc.
- string: Includes CHAR, VARCHAR, TEXT, etc.
- timestamp_tz: Timezone-aware.
- timestamp_naive: Without timezone information.
- date: The date portion of a timestamp; always without a timezone.
- boolean: True or false.
- other: Any other type without special support in Hex.
Visibility
visibility controls where a semantic object can be used and who can see it.visibility supports 3 built-in values:
- public: Can be viewed and used by everyone.
- internal: Can be referenced by other models in the semantic project, but will be hidden from end-users.
- private: Can only be used within the current model. Hidden from end-users.
visibility should not be relied on as a security control and is only used to visually hide content in the UI. For the strongest security guarantees, configure OAuth or role-based access within your database.
SQL interpolation
Any key containing sql will be treated as SQL to execute in your database. To reference semantic entities within these SQL snippets, you may use interpolation within ${}
.
Interpolations can reference all the dimensions, measures, and relations within the current model by their id.
Specifying a dimension id inside of a ${}
will interpolate a column expression which produces the value for the referenced dimension. This can be used to create derived dimensions, aggregates on top of dimensions, or for the join condition of relations.
dimensions:
- id: customer_tier
type: string
- id: is_premium_tier
func_sql: ${customer_tier} IN ('Premium', 'Plus')
Specifying a measure id inside of a ${}
will interpolate an aggregating expression which produces the value for the referenced measure. This is only valid inside of aggregating contexts, such as another measure.
measures:
- id: revenue
func: sum
of: price
- id: costs
func: sum
of: cost_of_goods
- id: profit
func_sql: ${revenue} - ${costs}
Specifying a relation id inside of a ${}
will interpolate the alias name for the joined table, which may differ from the id of the relation.
Using dot-notation inside of the interpolation will allow accessing the measures and dimensions inside of the joined model.
relations:
- id: customers
type: many_to_one
# reference the join alias
join_sql: customer_id = ${customers}.id
measures:
- id: adjusted_revenue
# reference a dimension on `customers`
# let's assume premium tier users get a 10% discount
func_sql: SUM(IF(${customers.is_premium_tier}, price * 0.9, price))
Note the distinction between referencing a dimension on a relation versus referencing the relation's table and then accessing a SQL column:
${customers.full_name}
accesses the full_name dimension on the customers model${customers}.first_name
accesses the underlying first_name column on the customers model's underlying table.
Calculation formulas
Any key containing calc will be treated as a Hex calculation formula.
Calculation formulas can reference all the dimensions, measures, and relations within the current model by their id. SQL columns which are not included as dimensions are not in scope and cannot be referenced.
Examples
- Basic
- Sales
This example builds a simple AAR measure between a users and organizations model, without defining anything else. It demonstrates the basic approach when authoring models.
id: organizations
base_sql_table: public.organizations
measures:
- id: arr
name: Annual reoccurring revenue
func: sum
of: users.annual_seat_price
relations:
- id: users
type: one_to_many
join_sql: id = ${users}.org_id
id: users
base_sql_table: public.users
dimensions:
- id: annual_seat_price
type: number
expr_sql: IF(seat_type = 'VIEW_ONLY', 20, 60)
This example models data for a hypothetical pizza shop which utilizes more features, joins between fact and dimensional tables, and includes more dimensions and measures that end-users could slice and dice around.
id: sales
base_sql_table: public.sales
description: Fact table of sales information.
measures:
- id: revenue
func: sum
of: value
description: |
The total sales of pizza.
synonyms: sales, top line revenue
- id: number_of_customers
func: count_distinct
of: customer_id
- id: number_of_orders
func: count_distinct
of: order_id
- id: revenue_per_customer
func_calc: revenue / number_of_customers
- id: revenue_per_order
func_calc: revenue / number_of_orders
- id: avg_unit_price
func_calc: SUM(value) / SUM(quantity)
- id: orders_per_customer
func_calc: number_of_orders / number_of_customers
- id: revenue_adjusted_for_delivery_cost
func: sum
of:
expr_sql: ${value} * IF(${order.is_delivery}, 0.8, 1)
- id: revenue_from_custom_delivery_pizza
func: sum
of: value
filters:
- order.is_delivery
- expr_sql: ${product.pizza_type} = 'Custom'
dimensions:
# facts
- id: value
type: number
expr_sql: item_price * quantity
description: The total price paid for the sale.
- id: item_price
type: number
- id: quantity
type: number
- id: timestamp
type: timestamp_tz
# keys
- id: id
type: string
unique: true
visibility: internal
- id: item_id
type: number
visibility: internal
- id: customer_id
type: string
visibility: internal
- id: order_id
type: number
visibility: internal
- id: product_id
type: string
visibility: internal
relations:
- id: customer
target_dataset: customers
join_sql: ${customer_id} = ${customer.id}
type: many_to_one
- id: order
target_dataset: orders
join_sql: ${order_id} = ${order.id}
type: many_to_one
- id: product
target_dataset: products
join_sql: ${product_id} = ${product.id}
type: many_to_one
id: orders
base_sql_table: public.orders
description: Fact table of order information.
measures:
- id: count
name: Number of orders
func: count
- id: total_order_value
func: sum
of: order_value
dimensions:
# facts
- id: timestamp
type: timestamp_naive
- id: payment_method
type: string
- id: is_delivery
type: boolean
expr_sql: delivery = 'Yes'
- id: type
type: string
- id: order_value
type: number
- id: discount_code
type: string
- id: feedback_rating
type: number
- id: delivery_time
type: number
- id: preparation_time
type: number
- id: special_request
type: string
- id: referral_source
type: string
# ids
- id: id
type: number
visibility: internal
unique: true
- id: customer_id
type: string
visibility: internal
- id: cook_id
type: string
visibility: internal
- id: location_id
type: string
visibility: internal
relations:
- id: customer
target_dataset: customers
join_sql: ${customer_id} = ${customer.id}
type: many_to_one
- id: sales
join_sql: ${id} = ${sales.order_id}
type: one_to_many
id: products
base_sql_table: public.products
description: Dimensional table of product information.
dimensions:
- id: id
type: string
visibility: internal
unique: true
- id: name
type: string
- id: pizza_size
type: string
- id: pizza_shape
type: string
- id: pizza_type
type: string
- id: price
type: number
id: customers
name: Valued customers
base_sql_table: public.users
description: Dimensional table of customer information.
dimensions:
- id: id
type: string
visibility: internal
unique: true
- id: name
type: string
- id: address
type: string
- id: phone_number
type: string