Did you know that GoFigr can automatically track and version your inputs? It’s as easy as calling gf.read_csv instead of pd.read_csv.
First, make sure you have the latest Python client. Data tracking was added in version 1.2.0:
$ pip install --upgrade gofigr && pip freeze | grep gofigr
gofigr==1.2.0
Then, in your Jupyter notebook, load the GoFigr extension and replace calls to pd.read_[format] with gf.read_[format]:
%load_ext gofigr
df = gf.read_csv("bivariate_dist.csv")  # or read_excel, or any other pandas file reader
That’s it! bivariate_dist.csv will now be synced with GoFigr. Here is what that means:
- The dataset will become available as a downloadable “Asset” in the GoFigr portal. You can see all assets by navigating to the Workspace. Jupyter will also show you a direct link.
- We will automatically create new versions if the file changes.
- All figures you create in the notebook will be automatically linked to this asset, and vice versa.
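To make "new versions if the file changes" concrete, here is a minimal sketch of one common way to detect a change: hash the file contents and compare against the last recorded hash. This is an illustration only, not GoFigr's actual implementation; file_digest, record_version, and the in-memory versions list are hypothetical names used for this example.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_version(path, versions):
    """Append a new version entry only if the file contents changed.

    `versions` is a hypothetical in-memory list of digests, oldest first.
    Returns True if a new version was recorded, False otherwise.
    """
    digest = file_digest(path)
    if versions and versions[-1] == digest:
        return False  # contents unchanged since the last sync
    versions.append(digest)
    return True
```

Under this scheme, re-reading an untouched file records nothing, while editing even a single cell produces a new digest and therefore a new version.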
Other ways to track files
In addition to drop-in replacements for pandas’ readers, you can also call gf.open. This is particularly useful for binary files:
with gf.open("my_binary_file.bin", "rb") as f:  # "rb" reads raw bytes
    print(len(f.read()))
You can also sync a path without opening it:
_ = gf.sync.sync("bivariate_dist.csv")
De-duplication
We only store each file once. You can sync and re-sync the same file without worrying about duplication.
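If you are curious how "store each file once" can work in principle, the sketch below shows a toy content-addressed store in which identical bytes map to the same key. It assumes de-duplication by content hash, which is a common technique but not necessarily what GoFigr does internally; ContentStore and its attributes are hypothetical names.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # digest -> file contents
        self.names = {}  # file name -> digest of its latest contents

    def sync(self, name, data):
        """Register `data` under `name` and return its content digest."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault stores the bytes only if this digest has not been seen
        self.blobs.setdefault(digest, data)
        self.names[name] = digest
        return digest
```

Syncing the same bytes twice, even under different file names, leaves a single stored blob; only changed contents add a new one.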