Check out the companion Hex project for this tutorial here!
While working in Hex, you may want to pull data from or save data to your Google Cloud Storage (GCS) buckets. This tutorial shows you how to do both!
Set up Google Service Account key
As a first step, you'll need to create a key for a Google Cloud Service Account that has access to the GCS bucket you want to use in your project. To create the key:
- Navigate to Google IAM & Admin Service Accounts.
- Select a project or create a new one.
- Select a Service Account or select + Create Service Account.
- Click the Keys tab.
- Click the Add key drop-down menu, then select Create new key.
- Select JSON as the Key type and click Create. The JSON file will download automatically.
For more detailed instructions on how to set up your Service Account, you can check out this guide.
Create a Secret for your Service Account key
Once you have the JSON file with the Service Account credentials, add the JSON file as a Secret in your project.
Navigate to the Variables tab, and use the + Add button to create a Secret where the Secret’s value is the entire service account key JSON. In this example, the Secret is named sa_json.
Connect to GCS
Start by importing the packages required to establish a connection to your GCS bucket.
import json
from google.cloud import storage
import pandas as pd
Use the credentials from the Secret you created (in this case,
sa_json) to create your connection. Use
json.loads to process the JSON and store it in the variable,
service_account_info. Next, use the name of your GCS bucket in place of our bucket,
hex-demo-test. The bucket must already exist: get_bucket raises a NotFound error if no bucket by this name is found.
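bucket_name = 'hex-demo-test'
# Parse the Secret's JSON text into a dict of credentials
service_account_info = json.loads(sa_json)
# Create a GCS client authenticated as the service account
client2 = storage.Client.from_service_account_info(service_account_info)
# Look up the existing bucket
bucket = client2.get_bucket(bucket_name)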
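If the Secret was pasted incorrectly, the connection step can fail with an error that is hard to trace back to the key. The following is a minimal sketch of a check you could run first. The helper name parse_service_account_secret and the REQUIRED_KEYS tuple are hypothetical, and the list of fields is an assumption based on what a downloaded service account key file typically contains.

```python
import json

# Assumption: fields a downloaded service account key file is expected to contain.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email", "token_uri")


def parse_service_account_secret(secret_value):
    """Parse the Secret's JSON text and check that it looks like a service account key."""
    try:
        info = json.loads(secret_value)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Secret is not valid JSON: {exc}") from exc

    if not isinstance(info, dict):
        raise ValueError("Secret JSON must be an object")

    # Report every missing field at once so the key can be fixed in one pass
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"Secret is missing fields: {', '.join(missing)}")

    if info["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {info['type']!r}")

    return info
```

With this in place, service_account_info could be set with parse_service_account_secret(sa_json) instead of calling json.loads directly.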
Upload blob to GCS
In this example, we’ll upload a dataset of restaurant orders we have in our database. The dataframe we have is called
Convert the dataframe to CSV to be uploaded to GCS.
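As a rough sketch, the conversion can be done with the pandas to_csv method. The dataframe name orders_df and its columns are hypothetical stand-ins for your own data; only the file name orders_to_upload.csv comes from this tutorial.

```python
import pandas as pd

# Hypothetical stand-in for the orders dataframe; the columns are invented for illustration.
orders_df = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "item": ["soup", "salad", "pasta"],
        "total": [7.5, 9.0, 12.25],
    }
)

# index=False keeps the pandas row index out of the CSV file
orders_df.to_csv("orders_to_upload.csv", index=False)
```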
Create a variable,
source_file_name, to hold the name of the CSV file we just created,
orders_to_upload.csv. Next, create a variable,
destination_blob_name, to hold the name of the blob you are creating in the GCS bucket defined above.
source_file_name = 'orders_to_upload.csv'
destination_blob_name = 'uploaded_orders.csv'
Upload the orders_to_upload.csv file to a blob named
uploaded_orders.csv in the bucket.
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_name)
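If you would rather not write an intermediate file, one alternative to the file-based upload described above is to send the CSV text directly with the blob's upload_from_string method. This is a sketch of that variation, not the approach this tutorial uses, and the helper name upload_csv_text is hypothetical.

```python
def upload_csv_text(bucket, blob_name, csv_text):
    """Upload CSV text straight to a blob, without writing a local file first."""
    blob = bucket.blob(blob_name)
    # content_type marks the object as CSV rather than the default plain text
    blob.upload_from_string(csv_text, content_type="text/csv")
    return blob
```

For example, calling upload_csv_text(bucket, destination_blob_name, df.to_csv(index=False)) would upload a dataframe named df, since to_csv returns the CSV as a string when no path is given.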
Download blob from GCS
Now let's download the data that was just uploaded. Start by defining a variable,
source_blob_name, to hold the name of the blob we'll be accessing,
uploaded_orders.csv. Then, define a variable for the name of the downloaded file,
destination_file_name, set to orders_downloaded.csv.
source_blob_name = 'uploaded_orders.csv'
destination_file_name = 'orders_downloaded.csv'
Pull the data we uploaded in
uploaded_orders.csv back into our project.
blob = bucket.blob(source_blob_name)
blob.download_to_filename(destination_file_name)
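When the data is only needed as a dataframe, the local file can be skipped by reading the blob's bytes into pandas directly. The sketch below assumes a reasonably recent google-cloud-storage release that provides download_as_bytes; the helper name read_csv_blob is hypothetical.

```python
import io

import pandas as pd


def read_csv_blob(bucket, blob_name):
    """Download a CSV blob into memory and parse it into a dataframe."""
    blob = bucket.blob(blob_name)
    data = blob.download_as_bytes()
    # Wrap the raw bytes in a file-like object so pandas can read them
    return pd.read_csv(io.BytesIO(data))
```

For example, read_csv_blob(bucket, source_blob_name) would return the uploaded orders as a dataframe without creating orders_downloaded.csv.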
And lastly, let's look at the downloaded data!
df = pd.read_csv(destination_file_name)
df.head()