
BigQuery DataFrame integration - BETA

BigQuery users can now take advantage of BigQuery DataFrames within Hex. Unlike standard Pandas DataFrames, these are not loaded into the memory of a Hex project. Operations on BigQuery DataFrames are executed on BigQuery’s infrastructure, enabling Hex users to work with large datasets without requiring additional memory in Hex.

To learn more about BigQuery DataFrames, visit the BigQuery DataFrames docs.

Create a BigQuery DataFrame session from a Python cell

To create a BigQuery DataFrame session from a Python cell, create a cell with the following code, replacing the argument of the get_data_connection method with the name of your connection.

import hextoolkit

# Look up the Hex data connection by name, then open a BigQuery DataFrames session on it
hex_bigquery_conn = hextoolkit.get_data_connection('Demo Bigquery')
session = hex_bigquery_conn.get_bigquery_session()

You can also generate a Python cell containing this code with the “Get BigQuery session” button in the Data browser menu.

Once your connection is established, you can work with BigQuery DataFrames in Python cells: use “session” wherever you would otherwise reference the bigframes.pandas package. For example, to query data, you can write:

df1 = session.read_gbq("select * from demo_data.cc_cards")
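In bigframes, read_gbq accepts a table ID as well as a SQL string. The helper below is a minimal sketch of reading only part of a table. It assumes the columns and max_results keyword arguments of the bigframes read_gbq method, and the helper name read_preview is hypothetical rather than part of Hex or bigframes.

```python
def read_preview(session, table, columns, max_results=1000):
    """Read selected columns of a table as a BigQuery DataFrame.

    `session` is assumed to be the object returned by get_bigquery_session().
    The return value is still a BigQuery DataFrame, not a Pandas DataFrame.
    """
    return session.read_gbq(
        table,
        columns=list(columns),    # restrict the result to these columns
        max_results=max_results,  # cap on the number of rows fetched
    )
```

For example, read_preview(session, "demo_data.cc_cards", ["card_id", "card_type"]) would read two columns of the demo table. The column names here are hypothetical, and depending on your connection the table ID may need to be fully qualified with a project.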

Working with BigQuery DataFrames

BigQuery DataFrames are not brought into Hex memory by default. To load one into Hex memory, call its “to_pandas” method, which converts it into an in-memory Pandas DataFrame.
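Since converting to Pandas downloads the data into Hex memory, it can help to reduce the DataFrame first. The following is a minimal sketch that assumes the head and to_pandas methods of a BigQuery DataFrame. The helper name to_local_pandas and the default row limit are hypothetical choices.

```python
def to_local_pandas(bq_df, max_rows=10_000):
    """Convert at most `max_rows` rows of a BigQuery DataFrame to Pandas.

    head() is applied before to_pandas(), so the row limit is applied
    to the BigQuery DataFrame and only the limited result is downloaded.
    """
    return bq_df.head(max_rows).to_pandas()
```

For example, local_df = to_local_pandas(df1, max_rows=500) would bring at most 500 rows of df1 into Hex memory as a Pandas DataFrame.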

Note that BigQuery DataFrames uses a BigQuery session, which is tied to a BigQuery location. To learn more about locations and how to set them appropriately, check out Google’s docs on BigQuery DataFrames.