Data connections
Data connections are secure configurations to supported data sources.
Hex makes it easy to securely connect to your data warehouse or cloud storage provider.
Supported data sources
Project and workspace data connections
- Users will need the Admin workspace role to create workspace data connections. Workspace data connections are available on the Professional, Team and Enterprise plans.
- Users will need Can Edit or higher project permissions to create project data connections.
Hex has two types of data connections:
- Workspace data connections are created and managed by Admins, from the Data Sources page of Settings. Workspace data connections can be used across multiple projects.
- Project data connections are created by Editors, from the the Data browser sidebar of the Notebook view. Project data connections are scoped to the project they are created in, and cannot be used across projects.
In general, workspace data connections are recommended over project data connections as they can be used across projects rather than configured for each new project. Since they are centrally managed by admins, they also remove the need for Editors to have access to data warehouse credentials. Further, workspace data connections offer additional configuration features including automated schema refreshes and data connection permissions.
Add a workspace data connection
To create a workspace data connection, an Admin should head to Settings → Data sources, and select the + Connection button.
When you add a new connection, you will be prompted to select a data source. Fill in the relevant connection details and make sure that Hex is allowed to connect to your data source.
Workspace data connections can also be created from within a project (as described below) by enabling the Share with workspace setting — this option will be enabled by default if an Admin is creating the connection.
Add a project data connection
Navigate to the Data sources tab within a project in the left sidebar, and select "Add".
Follow the same steps as above to create the data connection.
Example data connection form
Required connection details
The data connection form fields differ depending on the data connection type. For most data connections (PostgreSQL, MySQL, Redshift, MS SQL Server, ClickHouse, AlloyDB, MariaDB, Materialize & Cloud SQL) you'll need the following connection details to complete the data connection form:
- Database host url
- Database port
- Database name
- Username & password
BigQuery
To connect to BigQuery, you'll need:
- Google Cloud Project ID (instructions to locate)
- Entire contents of a JSON Service Account Key (instructions to generate)
Note, by default the Use BigQuery Storage API option is enabled when creating a BQ connection. This connection method significantly improves BigQuery query performance in Hex. However it requires two additional permissions be granted to the service account: bigquery.readsessions.create
and bigquery.readsessions.getData
. Additionally, there may be some BigQuery costs associated when using this method. Pricing details from Google is available here.
Snowflake
To connect to Snowflake, you'll need:
- Account name
- Warehouse
- Database
- Key Pair or OAuth
If connecting to Snowflake using Key Pair authentication, paste the entire multi-line key, including "-----BEGIN PRIVATE KEY-----" and "-----END PRIVATE KEY-----", in the Private Key box.
As of October 2024, Snowflake is transitioning away from password-only authentication.
Existing data connections authenticated using username & password authentication will continue to function, but the username & password fields will be uneditable. Connections will fail if the Snowflake password is changed or if MFA is enabled for the account used to authenticate the data connection. It is highly recommended to switch to Key Pair authentication to avoid issues going forward.
Databricks
For Databricks, you'll need:
- JDBC URL (instructions to locate)
- Hex currently only supports the legacy JDBC driver where the JDBC URL start with
jdbc:spark
- Hex currently only supports the legacy JDBC driver where the JDBC URL start with
- Access token (instructions to generate)
Dremio
For Dremio, you'll need:
- JDBC URL (retrieve this from the Project Settings page under General information in your Dremio account)
- Access token (instructions to generate)
Trino & Starburst
For Trino & Starburst, you'll need:
- Database host url
- Database port
- Username & password
Presto
For Presto, you'll need:
- Database host url
- Database port
- Catalog name
- Username & password
Athena
For Athena, you'll need:
- AWS access key ID
- AWS secret access key
- S3 bucket to write query results to
- The default port is 443 and the host is
athena.[region].amazonaws.com
(if you're using a VPC to connect then the host address is[vpc-specific-url].athena.[region].amazonaws.com
).
Note: only AWS users who do not use MFA authentication are supported.
AWS uses the AWSQuicksightAthenaAccess policy as an example policy for JDBC connections. The IAM user will also need permissions to utilize prepared statements.
Motherduck
For MotherDuck, you'll need:
- MotherDuck Token (instructions to generate)
Using data connections in projects
Any available project or workspace data connections are listed in the Data sources tab. When a workspace connection is first used in a project, it will be imported into the project.
Removing data connections from projects
To remove a data connection from a project, use the three-dot menu next to the data connection's name to remove it. You also have the option of replacing any existing references to the connection before removing it.
Manage data connection schemas
When a new data connection is added, Hex retrieves information about the objects that can be queried with these credentials, populating the data browser with information about the schemas, tables, columns and columns, collectively referred to as a "schema".
This information can be refreshed any time from the refresh button of the data browser (see below for information on restricting this action), or by scheduling a refresh.
Schema filtering
Admins can filter databases, schemas and/or tables from a data connection to display specific data in the Data browser. On refresh, only the non-filtered databases, schemas and/or tables will be synced, allowing Admins to curate what is viewable when exploring a connection.
Below are some scenarios where you'd want to consider filtering your connection:
- You have a large warehouse by default but expect that most folks will only use a small subset of the warehouse. By filtering, you will greatly improve the viewer experience of data browsing for your end users by providing a focused view of relevant data.
- What's helpful for end viewers is also helpful for Hex Magic. By filtering, you will reduce the scope of tables Magic has access to which can result in better Magic suggestions.
Filtering your connection does not affect the underlying permissions set in your data connection. Even if a database, schema, and/or table is hidden from the Data browser, it can still be queried from if the user has proper permissions to do so.
Schedule schema refreshes
Scheduled refreshes are only available for workspace data connections
Workspace data connections can be configured to refresh the data connection's schema on a schedule. On refresh, Hex updates the data browser with your latest data connection schema, making any new schema data available as well as removing anything that was deleted.
Automated refreshes query your data connection's entire schema. Consider query costs and your data warehouse load when deciding on a cadence schedule.
Workspace data connection permissions
Workspace data connections offer additional permissions that can be used to restrict access to sensitive information.
Can query
This permission determines who can use the connection to author queries. By default, all members of a workspace are granted Can query access to a data connection.
This permission can be set to a list of groups to explicitly define which users can use this connection to author queries.
If this connection is imported in a project where a user with Can edit project access does not have Can query access to the data connection, the user will be prevented from authoring queries by being downgraded to Can view project access.
Admins always have Can query access to data connections.
Can view results
Can view results data permissions are only available on the Enterprise plan.
This permission determines who can view the results of projects where this connection is imported, and can effectively be used to lock users out from projects.
By default, all users are granted Can view results access to a data connection, including any anonymous users if the project is shared with the web.
This permission can be set to a list of groups to explicitly define which users can view results when this data connection is used.
If this connection is imported in a project where a user with access to the project does not have Can view results access to the data connection, the user will be prevented from accessing the project.
Admins, and any groups that are granted Can query access always have Can view results access to a data connection.
Data connection permission rules are determined based on which connections are imported into a project — a connection may not be actively used by any cells, but might still be imported. Follow these steps to remove a connection from a project completely.
Can refresh schema
By default, all users that have can query access can manually refresh the data connection schema via the Refresh button in the data browser. To restrict this ability and only allow Admins to manually refresh the data browser, update the "Can refresh schema" setting in your data connection configuration to "Admins only".
Use data connections to query your database
Once you have configured a data connection, you can query it using a SQL cell. See the SQL cells section for more details.
Some Hex-native functionality, including Query Mode, Pivot cells, and Table display cells, rely on modern CTE functionality when operating on data directly in your warehouse.
If you are using an older version of some data sources (such as PostgreSQL<8.4
, MySQL<8.0
, or MS Sql Server<2005
), you will need to query your data into Hex and utilize these features locally, rather than pointing them directly to your warehouse.
Database security
For each data connection, it's possible to configure SSH or SSL/TLS for a more secure connection and encryption in transit.
Using Secure Socket Layer (SSL) or Transport Layer Security (TLS) provides an additional layer of security by encrypting data in transit that moves from Hex to your database instance. Some data connection types have SSL/TLS configured by default.
In the table below, Enabled by default
indicates that creating a data connection with that database will have encryption in transit enabled by default, and no further action is required. If Configurable
, please reference the documentation linked in the first column for instructions on how to configure SSL/TLS for that specific database.
Data Connection | SSL/TLS |
---|---|
Athena | Enabled by default |
BigQuery | Enabled by default |
Databricks | Enabled by default |
Materialize | Enabled by default |
Presto | Enabled by default |
Redshift | Enabled by default |
Snowflake | Enabled by default |
MotherDuck | Enabled by default |
AlloyDB | Configurable |
Clickhouse | Configurable |
Cloud SQL (MySQL) | Configurable |
Cloud SQL (PostgreSQL) | Configurable |
Cloud SQL (SQL Server) | Configurable |
Dremio | Configurable |
MariaDB | Configurable |
MS SQL Server | Configurable |
MySQL | Configurable |
Postgres | Configurable |
Trino | Configurable |
Starburst | Configurable |