Snowpark integration

Hex lets Snowflake users take advantage of Snowpark directly from their Hex projects. Unlike standard Pandas DataFrames, Snowpark DataFrames are not loaded into a Hex project's memory; operations on them are executed on Snowflake's infrastructure. This enables Hex users to operate on large datasets without requiring additional memory in Hex.

To learn more about Snowpark, visit the Snowpark Developer Guide.

Enable Snowpark

Snowpark can be enabled on both workspace- and project-level data connections. To enable Snowpark on a workspace data connection, a Hex Admin will need to update the connection via the Data sources page of the Admin settings. To enable Snowpark on a project data connection, navigate to the Data sources tab in the left sidebar of the project, locate the data connection for which you'd like to enable Snowpark, and select "Edit" from its 3-dot menu.

In the Snowflake data connection's settings, scroll to the Integrations section, and toggle on Snowpark.

Use Snowpark in a Hex project

Using Snowpark DataFrames can be an extremely effective way to manage memory usage in your Hex project, because Hex doesn't pull the contents of your DataFrames into project memory. Under the hood, Hex writes your Snowpark results to a temporary table in your Snowflake warehouse.

Currently, Snowpark DataFrames can be referenced in SQL, filter, pivot, and chart cells. See below for more details on how to leverage Snowpark DataFrames in your Hex projects.

Return a Snowpark DataFrame from a SQL cell

For SQL cells using a Snowflake warehouse as their source, you can return query results as a Snowpark DataFrame by setting the return type to Dataframe -> Snowpark in the dropdown at the top right of a SQL cell.

Note that if this DataFrame is referenced downstream, changing the output type may cause breaking changes, as Snowpark DataFrames are returned to Hex as objects rather than full Pandas DataFrames. Often, an effective way to continue using the DataFrame is to call the .to_pandas() method, though this may have memory implications for your project.
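For example, assuming the SQL cell's output variable is named result (substitute your own cell's variable name), a downstream Python cell could convert it like so:

# "result" is the Snowpark DataFrame returned by the SQL cell above.
# .to_pandas() pulls the full result set into Hex project memory.
result_pd = result.to_pandas()
result_pd.head()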

caution

Snowpark does not support query parameterization, which is required for Jinja templating. Because of this, Jinja syntax is not supported in Snowpark queries.
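If you need dynamic values in a Snowpark query, one possible workaround is to apply the condition with Snowpark's Python API in a downstream Python cell instead of with Jinja. A minimal sketch, assuming a Snowpark DataFrame named orders and a Hex input variable min_total (both hypothetical names):

from snowflake.snowpark.functions import col

# The condition is applied in Python rather than via Jinja templating;
# the resulting filter is still pushed down and executed in Snowflake.
filtered_orders = orders.filter(col("TOTAL") >= min_total)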

Dataframe SQL & Snowpark DataFrames

With Dataframe SQL, you can use existing Snowpark DataFrames as the source for your SQL queries. Hex brings the Snowpark DataFrame into memory as a Pandas DataFrame via the .to_pandas() function, so that DuckDB, the engine that powers Dataframe SQL cells, can operate on this data like any other Pandas DataFrame. Note: if your source Snowpark DataFrame is very large, the .to_pandas() call can overload your project memory. In that case, you may need to further reduce the size of your output DataFrame in Snowflake or increase your project's compute profile.
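If memory is a concern, one option is to shrink the result in Snowflake before using it as a Dataframe SQL source. A minimal sketch, assuming a Snowpark DataFrame named events with an EVENT_DATE column (hypothetical names):

from snowflake.snowpark.functions import col

# Filter and aggregate in Snowflake so only the reduced result
# is pulled into project memory by downstream cells.
daily_counts = (
    events
    .filter(col("EVENT_DATE") >= "2024-01-01")
    .group_by("EVENT_DATE")
    .count()
)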

Create a Snowpark session from a Python cell

To create a Snowpark session from a Python cell, add a cell with the following code, replacing the argument of the get_data_connection method with the name of your connection.

import hextoolkit

# Look up the Snowflake connection by the name it has in your workspace
hex_snowflake_conn = hextoolkit.get_data_connection('My data connection')

# Open a Snowpark session that executes queries on Snowflake's infrastructure
hex_snowpark_session = hex_snowflake_conn.get_snowpark_session()

You can also generate a Python cell with this code from the Data browser menu:

Creating a Snowpark session from the Data browser
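Once you have a session, DataFrames you build with it stay in Snowflake until you explicitly collect them. A minimal sketch, assuming a table your connection can access (replace the name with your own):

# Nothing is pulled into Hex memory until a collect-style call
# such as .show() or .to_pandas() is made.
orders = hex_snowpark_session.table("MY_DB.MY_SCHEMA.ORDERS")
orders.show(5)  # prints a small preview; the query runs in Snowflake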

Snowpark DataFrames in transformation and visualization cells

To use a Snowpark DataFrame in a filter, pivot, or chart cell, simply select it as the source DataFrame in the cell's Source dropdown.

The compute associated with filter, pivot, and chart cells is pushed down to Snowflake. For example, any aggregations or filtering in a Chart cell are executed via a call to your Snowflake database and do not happen in local project memory.
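The same kind of pushdown can be inspected from a Python cell, since Snowpark can print the SQL it plans to run. A minimal sketch, reusing the hypothetical orders DataFrame from above:

# .explain() prints the query plan Snowflake will execute,
# confirming the work happens in the warehouse, not in project memory.
orders.group_by("STATUS").count().explain()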

Known limitations

  • Pivot cells which use a Snowpark DataFrame as their source return a Pandas DataFrame and not a Snowpark DataFrame. Any downstream references to the output of a pivot cell will use Pandas DataFrames.

  • The Display tables integrated directly into SQL cells are limited to returning 10,000 rows of a Snowpark DataFrame. Any rows beyond this limit will not be retrieved from Snowflake.

  • Filter cells using Snowpark DataFrames as a source cannot auto-populate distinct values for filtering a given column.

  • Chart cells which use a "Group by" field have some limitations in styling by distinct column values. For example, category colors may not be stable if upstream changes are made to the DataFrame, and grouping by boolean columns can result in unfilterable data.

  • Snowpark DataFrames cannot be written back to Snowflake with Hex's Writeback cell.

  • Snowpark DataFrames cannot be used as the source data for Single value cells or Input parameters.