Using GoFigr to track input data in Python

Did you know that GoFigr can automatically track and version your inputs? It’s as easy as calling gf.read_csv instead of pd.read_csv.

First, make sure you have the latest Python client. Data tracking was added in version 1.2.0:

$ pip install --upgrade gofigr && pip freeze | grep gofigr
gofigr==1.2.0
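
If you would rather confirm the version from inside Python, for example at the top of a notebook, a minimal standard-library sketch might look like the following. The helper names `meets_minimum` and `gofigr_supports_data_tracking` are hypothetical and not part of GoFigr, and the comparison assumes plain numeric versions such as 1.2.0.

```python
from importlib import metadata


def meets_minimum(installed: str, minimum: str = "1.2.0") -> bool:
    """Compare dotted numeric versions, so that "1.10.0" counts as newer than "1.2.0".

    Hypothetical helper: assumes purely numeric components (no "rc1"-style suffixes).
    """
    def parse(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))

    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    # Pad the shorter version with zeros so "1.2" is treated like "1.2.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def gofigr_supports_data_tracking() -> bool:
    """Return True if gofigr is installed and is at least version 1.2.0."""
    try:
        return meets_minimum(metadata.version("gofigr"))
    except metadata.PackageNotFoundError:
        return False  # gofigr is not installed in this environment
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking "1.10.0" below "1.2.0".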

Then, in your Jupyter notebook, load the GoFigr extension and replace calls to pd.read_[format] with gf.read_[format]:

%load_ext gofigr

df = gf.read_csv("bivariate_dist.csv")  # or read_excel, or any other pandas file reader

That’s it! bivariate_dist.csv will now be synced with GoFigr. Here is what that means:

The dataset will become available as a downloadable “Asset” in the GoFigr portal. You can see all assets by navigating to the Workspace. Jupyter will also show you a direct link.
GoFigr will automatically create a new version whenever the file changes.
All figures you create in the notebook will be automatically linked to this asset, and vice versa.

In addition to drop-in replacements for pandas’ readers, you can also call gf.open. This is particularly useful for binary files:

with gf.open("my_binary_file.bin", "rb") as f:  # "rb" reads the file as raw bytes
    print(len(f.read()))

You can also sync a path without opening it:

_ = gf.sync.sync("bivariate_dist.csv")

We only store each file once. You can sync and re-sync the same file without worrying about duplication.
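
To illustrate what storing each file once can mean in practice, here is a minimal sketch of content-based deduplication. It is not GoFigr's implementation: the `AssetStore` class and its in-memory dictionaries are hypothetical, and using a SHA-256 digest as the file fingerprint is an assumption made purely for illustration.

```python
import hashlib
from pathlib import Path


class AssetStore:
    """Toy in-memory store that keeps one copy per distinct file content."""

    def __init__(self):
        self.blobs = {}     # digest -> file bytes (each distinct content stored once)
        self.versions = {}  # file path -> list of digests, oldest first

    def sync(self, path) -> str:
        """Record the file at `path` and return the digest of its content."""
        data = Path(path).read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        # setdefault leaves the store unchanged if this content was seen before.
        self.blobs.setdefault(digest, data)
        history = self.versions.setdefault(str(path), [])
        if not history or history[-1] != digest:
            history.append(digest)  # new version only when the content changed
        return digest
```

In this sketch, calling `sync` twice on an unchanged file leaves the store exactly as it was, while editing the file and syncing again appends a second entry to that file's version list.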