There is a finite amount of memory available in Hex projects. On rare occasions, you may find that you are running out of memory in your environment. These errors commonly appear as Python `MemoryError` exceptions, which indicate that the code in a cell caused the kernel to run out of memory.
Although there is an upper bound on how much memory you will be able to use in any project, there are a few strategies described below to help you reduce the amount of memory your project needs.
Every variable created in a Hex project is stored in memory until it is deleted. When you no longer need a variable in a project, you can save memory by deleting the variable in a Python cell. This example deletes a variable named 'example'.
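A minimal sketch, assuming `example` is a variable holding data you no longer need:

```python
# Create a large throwaway list (standing in for any large variable)
example = list(range(1_000_000))

# Remove the name so the object can be garbage-collected and its memory reclaimed
del example
```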
The variables that take up the most memory are typically the dataframe outputs of SQL cells.
To identify the variables taking up the most memory, you can use this Python code:
```python
import sys
import pandas as pd

# These are the usual ipython objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Format size in bytes to a human-readable string
def sizeof_fmt(num, suffix='B'):
    for unit in ['', 'K', 'M', 'G']:
        if abs(num) < 1000.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1000.0
    return "%.1f %s" % (num, suffix)

# Get a list of objects and their sizes, sorted by size in bytes (largest first)
variables = sorted(
    [
        (x, sizeof_fmt(sys.getsizeof(globals().get(x))), sys.getsizeof(globals().get(x)))
        for x in dir()
        if not x.startswith('_') and x not in sys.modules and x not in ipython_vars
    ],
    key=lambda x: x[2],
    reverse=True,
)
variables_df = pd.DataFrame(variables, columns=['ITEM', 'SIZE', 'SIZE_IN_BYTES'])
variables_df
```
If you have a variable you want to delete but will need to reference later, you can save it as a file in your project. Any files you write to the working directory in your environment will be saved as part of your project. This allows you to store your data without having to keep it in memory.
For dataframes, a common way to do this is to write them as a CSV in your Python code. This example writes a dataframe to a file called "saved_df.csv" in the project.
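A minimal sketch, assuming your dataframe is named `df` (here, a made-up dataframe stands in for a SQL cell output):

```python
import pandas as pd

# Hypothetical dataframe standing in for the output of a SQL cell
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

# Write the dataframe to the project's working directory, then free the memory
df.to_csv('saved_df.csv')
del df
```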
You can use `read_csv` from pandas to read the saved CSV back into memory later in your project.
```python
import pandas as pd

df = pd.read_csv('saved_df.csv', header=0, index_col=0)
```
You may be able to shrink the memory usage of a dataframe by changing the data type of some of the columns. You can check the data types in a Python cell like this:
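For instance, with a hypothetical dataframe `df` (yours would typically come from a SQL cell or a CSV):

```python
import pandas as pd

# Hypothetical dataframe with a mix of column types
df = pd.DataFrame({'name': ['a', 'b'], 'value': [1.5, 2.5], 'count': [1, 2]})

# Show the data type of each column
print(df.dtypes)
```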
Based on these types there are a few common conversions you can make to save memory.
- `object` -> `category`: Convert object columns to category columns if the column has relatively few unique values compared to the number of rows.
- `float64` -> `float32`: Convert float64 columns to float32 columns unless you need more than about 7 significant digits of precision (float64 provides about 15).
- `int64` -> `int32`: Convert int64 columns to int32 columns unless your data falls outside the range [-2147483648, 2147483647].
You can convert data types in a Python cell. This example converts a column called "example_column" to the category data type.
```python
df['example_column'] = df['example_column'].astype('category')
```
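To see the effect of a conversion, you can compare the column's memory usage before and after. A sketch with made-up data (a repetitive string column, a good candidate for `category`):

```python
import pandas as pd

# Hypothetical column with few unique values relative to the row count
df = pd.DataFrame({'example_column': ['red', 'green', 'blue'] * 100_000})

before = df['example_column'].memory_usage(deep=True)
df['example_column'] = df['example_column'].astype('category')
after = df['example_column'].memory_usage(deep=True)

print(f"before: {before} bytes, after: {after} bytes")
```

For columns like this one, the category dtype stores each unique string once plus a small integer code per row, so the savings can be substantial.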