Snowpark integration
Hex lets Snowflake users take advantage of Snowpark directly from their Hex projects. Unlike standard Pandas DataFrames, Snowpark DataFrames are not loaded into the memory of a Hex project; operations on them are executed on Snowflake’s infrastructure. This enables Hex users to operate on large datasets without requiring additional memory in Hex.
To learn more about Snowpark, visit the Snowpark Developer Guide.
Enable Snowpark
Snowpark can be enabled on both workspace and project-level data connections. To enable Snowpark on a workspace data connection, a Hex Admin will need to update the connection via the Data sources page of the Admin settings. To enable Snowpark in a project data connection, navigate to the Data sources tab in the left sidebar of the project, locate the data connection for which you'd like to enable Snowpark, and select "Edit" from its 3-dot menu.
In the Snowflake data connection's settings, scroll to the Integrations section, and toggle on Snowpark.
Use Snowpark in a Hex project
Using Snowpark DataFrames can be an extremely effective way to manage memory usage in your Hex project. Operations on Snowpark DataFrames are executed within your Snowflake infrastructure, and Hex won't pull the contents of your DataFrames into memory unless you choose to do so. Under the hood, this is possible because Snowpark writes your DataFrames to a Snowflake temp table directly in your warehouse.
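As a sketch of what this enables, suppose an upstream SQL cell returned a Snowpark DataFrame named orders_df (a hypothetical name). The operations below are lazy: they build a query plan that runs in Snowflake, and nothing is pulled into Hex memory unless you explicitly materialize a result.
import snowflake.snowpark.functions as F

# `orders_df` is assumed to be a Snowpark DataFrame from an upstream SQL cell.
# These transformations are lazy and execute in Snowflake, not in the Hex kernel.
large_orders = orders_df.filter(F.col("ORDER_TOTAL") > 100)
daily_totals = large_orders.group_by("ORDER_DATE").agg(F.sum("ORDER_TOTAL").alias("TOTAL"))

# Only an explicit call like this pulls rows into project memory:
daily_totals_pdf = daily_totals.to_pandas()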
Currently, Snowpark DataFrames can be referenced in SQL, Python, filter, pivot, single-value and chart cells. See below for more details on how to leverage Snowpark DataFrames in your Hex projects.
Return a Snowpark DataFrame from a SQL cell
For SQL cells using a Snowflake warehouse as their source, you can return query results as a Snowpark DataFrame by setting the return type to Dataframe -> Snowpark in the dropdown at the top right of a SQL cell.
Note that if this DataFrame is referenced downstream, changing the output type may break downstream cells, since Snowpark DataFrames are returned to Hex as a Python object and not a full Pandas DataFrame. Often, an effective way to continue using the DataFrame is to call the .to_pandas() method, which brings the full DataFrame into memory.
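As a hedged illustration, suppose the SQL cell's output variable is result_df (a hypothetical name). A downstream Python cell can keep the work in Snowflake, or materialize the data when Pandas semantics are needed:
# `result_df` is assumed to be the Snowpark DataFrame returned by the SQL cell above.
result_df.count()  # executes in Snowflake and returns only a row count

# Pandas-style operations require bringing the data into project memory first:
result_pdf = result_df.to_pandas()
result_pdf.head()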
Snowpark does not support query parameterization, which is required for Jinja templating. Because of this, Jinja syntax is not supported in Snowpark queries.
Dataframe SQL & Snowpark DataFrames
With Dataframe SQL you can use existing Snowpark DataFrames as the source for your SQL queries. Hex brings the Snowpark DataFrame into memory as a Pandas DataFrame via the .to_pandas() method, so that DuckDB, the engine that powers Dataframe SQL cells, can operate against this data.
If your source Snowpark DataFrame is very large, calling .to_pandas() may overload your project memory. In this case you may need to further reduce the size of your output DataFrame in Snowflake before bringing it into memory, or increase your project's compute profile.
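One sketch of that pattern, assuming a large Snowpark DataFrame named events_df (a hypothetical name): filter and aggregate in Snowflake first, then use the much smaller result as the Dataframe SQL source.
import snowflake.snowpark.functions as F

# `events_df` is assumed to be a large Snowpark DataFrame.
# Reduce the data in Snowflake before it is materialized for DuckDB:
event_summary = (
    events_df
    .filter(F.col("EVENT_DATE") >= '2024-01-01')
    .group_by("EVENT_TYPE")
    .count()
)
A Dataframe SQL cell that queries event_summary will then only materialize the aggregated rows.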
Accessing your Snowpark session from a Python cell
Hex creates a Snowpark session on kernel startup, and automatically closes that session when a kernel shuts down.
To access this Snowpark session from a Python cell, create a cell with the following code, replacing the argument of the get_data_connection method with the name of your connection.
import hextoolkit

# Look up your Snowflake data connection by its name
hex_snowflake_conn = hextoolkit.get_data_connection('My data connection')

# Open a Snowpark session backed by that connection
hex_snowpark_session = hex_snowflake_conn.get_snowpark_session()
You can also generate a Python cell with this code from the Data browser menu.
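Once created, the session behaves like any other Snowpark session. As a brief sketch (the table and column names are placeholders):
# Table and column names below are placeholders.
orders = hex_snowpark_session.table("MY_DB.MY_SCHEMA.ORDERS")
orders.select("ORDER_ID", "ORDER_TOTAL").limit(10).show()  # preview without materializing the table

# The session can also run SQL directly against your warehouse:
row_count = hex_snowpark_session.sql("SELECT COUNT(*) FROM MY_DB.MY_SCHEMA.ORDERS").collect()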
By default, Hex will attempt to parallelize the execution of code when possible. Manually closing your session with session.close() may cause this parallelization to end your session prematurely.
If you manually close the default Snowpark session initiated on kernel startup, you will need to restart your kernel to restore your session or manually create a new session via the Snowpark API.
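If you do need a fresh session, a minimal sketch using the public Snowpark API (every connection value below is a placeholder):
from snowflake.snowpark import Session

# All values are placeholders; supply your own connection details.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<username>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
new_session = Session.builder.configs(connection_parameters).create()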
Snowpark DataFrames in transformation and visualization cells
To use a Snowpark DataFrame in a filter, pivot, or chart cell you need only indicate your source DataFrame by selecting it in the Source dropdown for your cell.
The compute actions associated with filter, pivot, and chart cells are pushed to Snowflake. For example, any aggregation or filtering in a Chart cell is executed via a call to your Snowflake database and does not happen in local project memory.
Known limitations
- Pivot cells which use a Snowpark DataFrame as their source return a Pandas DataFrame, not a Snowpark DataFrame. Any downstream references to the output of a pivot cell will use Pandas DataFrames.
- The Display tables integrated directly into SQL cells are limited to returning 10,000 rows of a Snowpark DataFrame. Any rows beyond this limit will not be retrieved from Snowflake.
- Filter cells using Snowpark DataFrames as a source cannot auto-populate distinct values for filtering a given column.
- Chart cells which use a "Group by" field have some limitations when styling by distinct column values. For example, category colors may not be stable if upstream changes are made to the DataFrame, and grouping by boolean columns can result in un-filterable data.
- Snowpark DataFrames cannot be written back to Snowflake with a Hex Writeback cell.
- Snowpark DataFrames cannot be used as the source data for Input parameters.