Using local data paths with zea¶
Most zea examples use Hugging Face links for convenience, but you can also work with local datasets by configuring a users.yaml file that points to your data root. This notebook shows how to set up local paths and load data from your own storage.
[1]:
%%capture
%pip install zea
[2]:
config_picmus_rf = "hf://zeahub/configs/config_picmus_rf.yaml"
Setting up your users.yaml¶
Many codebases and projects are littered with hardcoded absolute paths, which can make it difficult to share code or run it on different machines. To avoid this, zea makes use of a users.yaml file to define local data paths. The idea is that users can specify a local data root, and zea will use this to resolve paths dynamically, relative to the user’s data root.
Create a users.yaml file in your project directory. This file tells zea where your local data is stored. Example content:
data_root: /home/your_username/data
Replace /home/your_username/data with the actual path to your data directory.
Tip: You can auto-generate this file by running:
python -m zea.datapaths
and following the prompts.
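If you would rather create the file from a script than by hand, a minimal sketch using only the standard library could look like the following. The helper name `write_minimal_users_yaml` is an illustrative assumption and not part of zea; it writes only the single `data_root` key shown in the example above.

```python
from pathlib import Path


def write_minimal_users_yaml(data_root, target="users.yaml"):
    """Write a users.yaml containing a single data_root key.

    Hypothetical helper for illustration; it is not part of zea.
    """
    target = Path(target)
    # as_posix() keeps forward slashes, which YAML reads without escaping.
    # Paths containing YAML-special sequences (": " or " #") would need quoting.
    content = f"data_root: {Path(data_root).as_posix()}\n"
    target.write_text(content, encoding="utf-8")
    return target
```

Calling `write_minimal_users_yaml("/home/your_username/data")` produces the same one-line file as the hand-written example.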
Using Local Data Paths¶
Once your users.yaml is set up, you can load data from your local data root. Here’s a minimal example:
[3]:
from zea import set_data_paths
user = set_data_paths("users.yaml")
data_root = user.data_root
username = user.username
print(f"🔔 Hi {username}! You are using data from {data_root}")
zea: Using backend 'jax'
zea: WARNING Could not create user profile for root on e567b8caf7d0, using default.
zea: WARNING data_root path `/mnt/z/data` does not exist, please update your users.yaml file.
zea: WARNING output path `/mnt/z/data/output` does not exist, please update your users.yaml file.
🔔 Hi root! You are using data from /mnt/z/data
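Because the data root is resolved per machine, dataset locations are best written relative to it. The sketch below assumes `data_root` is a plain path string, as printed above. `resolve_dataset` is a hypothetical helper, not a zea function, and it fails early with a clear message when the path is missing (the situation the warnings above describe).

```python
from pathlib import Path


def resolve_dataset(data_root, relative_path, must_exist=True):
    """Join a dataset path onto the data root (hypothetical helper, not part of zea)."""
    relative = Path(relative_path)
    if relative.is_absolute():
        # An absolute path would silently discard data_root in the join below.
        raise ValueError(f"expected a path relative to data_root, got {relative}")
    full_path = Path(data_root) / relative
    if must_exist and not full_path.exists():
        raise FileNotFoundError(f"{full_path} not found; check data_root in users.yaml")
    return full_path
```

Keeping only relative paths in scripts means the same code runs unchanged on any machine whose `users.yaml` points at a valid data root.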
Advanced Data Path Configuration¶
In the example above, we used the simplest configuration in users.yaml, with just a data_root key. However, users.yaml supports more advanced options. For example, you can specify multiple data roots for different projects, users, and machines. You can also define separate paths for local and remote data (for instance, if you use remote storage). Let’s have a look at a more advanced example.
Example: Complex users.yaml Layout¶
For collaborative projects or when working across multiple machines and operating systems, you can use a more structured users.yaml file. Here is an example:
alice:
  workstation1:
    system: linux
    data_root:
      local: /mnt/data/alice
      remote: /mnt/remote/alice
    output: /mnt/data/alice/output
  laptop:
    system: windows
    data_root: D:/data/alice
    output: D:/data/alice/output
bob:
  server:
    system: linux
    data_root:
      local: /mnt/data/bob
      remote: /mnt/remote/bob
  system: linux
  data_root: /mnt/data/bob
  output: /mnt/data/bob/output
# Default fallback if no user/machine matches
data_root: /mnt/shared/data
output: /mnt/shared/output
- Each user can have different machines, each with their own system and data_root.
- data_root can be a string or a dictionary with local and remote keys.
- If no user or machine matches, the default data_root at the bottom is used.
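The lookup order described above can be sketched in plain Python. This illustrates the idea only and is not zea's actual implementation: the function name `resolve_data_root`, the machine-then-user-then-default order, and the use of `getpass` and `socket` to detect the current user and hostname are all assumptions.

```python
import getpass
import socket


def resolve_data_root(users, username=None, hostname=None, local=True):
    """Pick a data root from a users.yaml-style dict.

    Illustrative only: zea's real lookup may differ in details.
    """
    username = username or getpass.getuser()
    hostname = hostname or socket.gethostname()

    user_entry = users.get(username)
    if not isinstance(user_entry, dict):
        user_entry = {}
    machine_entry = user_entry.get(hostname)
    if not isinstance(machine_entry, dict):
        machine_entry = {}

    # Most specific match wins: machine, then user, then top-level default.
    for entry in (machine_entry, user_entry, users):
        data_root = entry.get("data_root")
        if data_root is None:
            continue
        if isinstance(data_root, dict):
            # data_root may hold separate local and remote locations.
            return data_root["local" if local else "remote"]
        return data_root
    raise KeyError("no data_root found in users config")


# A trimmed-down version of the layout above, as a Python dict.
users = {
    "alice": {
        "workstation1": {
            "system": "linux",
            "data_root": {"local": "/mnt/data/alice", "remote": "/mnt/remote/alice"},
        },
        "laptop": {"system": "windows", "data_root": "D:/data/alice"},
    },
    "data_root": "/mnt/shared/data",
}

print(resolve_data_root(users, "alice", "workstation1", local=False))  # /mnt/remote/alice
print(resolve_data_root(users, "alice", "laptop"))  # D:/data/alice
print(resolve_data_root(users, "carol", "anywhere"))  # /mnt/shared/data
```

The last call shows the fallback: the hypothetical user "carol" has no entry, so the top-level data_root is returned.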
[4]:
# Example: Select remote data root (if defined in users.yaml)
user_remote = set_data_paths("users.yaml", local=False)
print("Remote data root:", user_remote.data_root)
user_local = set_data_paths("users.yaml", local=True)
print("Local data root:", user_local.data_root)
zea: WARNING data_root path `/mnt/z/data` does not exist, please update your users.yaml file.
zea: WARNING output path `/mnt/z/data/output` does not exist, please update your users.yaml file.
Remote data root: /mnt/z/data
zea: WARNING data_root path `/mnt/z/data` does not exist, please update your users.yaml file.
zea: WARNING output path `/mnt/z/data/output` does not exist, please update your users.yaml file.
Local data root: /mnt/z/data
Full Environment Setup with setup¶
For convenience, zea provides a setup function that configures everything in one step: config, data paths, and device (GPU/CPU).
This will prompt for missing user profiles if needed, set up data paths, and initialize the device.
Use this in your main scripts for reproducible and portable setups.
[5]:
from zea.internal.setup_zea import setup
# config_path: path to your config YAML file
# user_config: path to your users.yaml file
config = setup(config_path=config_picmus_rf, user_config="users.yaml")
data_root = config.data.user.data_root
device = config.device
zea: Using config file: hf://zeahub/configs/config_picmus_rf.yaml
zea: Git branch and commit: feature/notebook-precommit=9443c0bf1cffc5bad87bdfc7b0c92da367b91e86
zea: WARNING data_root path `/mnt/z/data` does not exist, please update your users.yaml file.
zea: WARNING output path `/mnt/z/data/output` does not exist, please update your users.yaml file.
-------------------GPU settings-------------------
memory
GPU
0 10893
1 10899
2 45469
3 45469
4 45469
Selecting 1 GPU based on available memory.
Selected GPU 2 with Free Memory: 45469.00 MiB
Hiding GPUs [0, 1, 3, 4] from the system.
--------------------------------------------------
Summary¶
- Use users.yaml to manage local/remote data roots for different users and systems.
- Use set_data_paths to resolve your data root dynamically.
- For advanced setups, structure users.yaml with users, hostnames, and local/remote keys.
- Use setup for a one-liner to initialize config, data paths, and device.