A filesystem in userspace for mounting the Unity Catalog from Databricks.
This is not an official Databricks package. I, the author of this package, am not affiliated with Databricks. My capacity to support this package is very limited or nonexistent. I may review issues and pull requests, but I won't commit to timelines or features.
The filesystem is read only.
This filesystem uses the public Databricks API to retrieve files, directories, and access permissions from the Unity Catalog.
To mitigate latency and improve performance, file metadata is cached in-memory. Data is cached
to a local cache directory (--disk-cache-dir) and partially to RAM as well. Options to control
the sizes of those caches are available.
Credentials are stored in RAM while the filesystem is mounted.
The environment variables DATABRICKS_TOKEN, DATABRICKS_CONFIG_PROFILE and DATABRICKS_CONFIG_FILE are used to locate the Databricks configuration file and obtain a valid access token. Alternatively, credentials can be passed by writing a
personal access token to a virtual file:
echo "dapi0000000-2" > /Volumes/.auth/personal_access_token
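Before mounting, it can help to see which of these variables are actually exported in the current shell. The sketch below only reports whether each one is set. The function name `report_databricks_env` is hypothetical and not part of fuse4dbricks, and the sketch makes no claim about the order in which the driver consults these variables.

```shell
# Hypothetical helper: report which Databricks-related variables are exported.
# It only checks for presence and never prints the token value.
report_databricks_env() {
    for name in DATABRICKS_TOKEN DATABRICKS_CONFIG_PROFILE DATABRICKS_CONFIG_FILE; do
        # printenv exits with status 0 only if the variable is in the environment
        if printenv "$name" >/dev/null; then
            echo "$name: set"
        else
            echo "$name: not set"
        fi
    done
}

report_databricks_env
```

If none of the variables is reported as set and no configuration file is found, writing the token to the virtual file as shown above remains available.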
If FUSE (/etc/fuse.conf) has user_allow_other enabled, this driver supports the --allow-other
option so that multiple users can access the mount. In this case, the fuse4dbricks process should run as root, and root should have exclusive access to --disk-cache-dir. To find the access credentials, fuse4dbricks inspects the requesting user and the environment variables of the requesting process. This may require reading the requesting user's .databrickscfg file. The cache is shared among all users in this scenario.
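Whether user_allow_other is active can be checked before passing --allow-other. The following is a minimal sketch that looks for an uncommented user_allow_other line. The helper name `fuse_allows_other` is an assumption made for illustration and is not provided by fuse4dbricks.

```shell
# Hypothetical helper: succeed if the given fuse.conf contains an active
# (uncommented) user_allow_other line. Defaults to /etc/fuse.conf.
fuse_allows_other() {
    conf="${1:-/etc/fuse.conf}"
    [ -r "$conf" ] && grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

if fuse_allows_other; then
    echo "user_allow_other is enabled; --allow-other can be used"
else
    echo "user_allow_other is not enabled in /etc/fuse.conf"
fi
```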
When an access token is missing, revoked or expired, the Unity Catalog is no longer accessible and only
a virtual /Volumes/README.txt file appears, with instructions on how to add the access token manually.
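A script that depends on the mount can use this behavior to detect a missing or expired token. The sketch below treats a mount point whose only visible entry is README.txt as unauthenticated. The helper name `mount_needs_token` is hypothetical, and the check assumes that no catalog is itself named README.txt.

```shell
# Hypothetical helper: succeed if the mount point shows README.txt and no
# other visible entry. Hidden entries (names starting with a dot) are ignored.
mount_needs_token() {
    mnt="${1:-/Volumes}"
    [ -f "$mnt/README.txt" ] || return 1
    [ "$(ls "$mnt" | wc -l | tr -d ' ')" -eq 1 ]
}

if mount_needs_token "$HOME/Volumes"; then
    # Show the instructions provided by the filesystem
    cat "$HOME/Volumes/README.txt"
fi
```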
This package depends on pyfuse3, which requires the libfuse3-dev or fuse3-devel system package
to be installed. The package name depends on your Linux distribution.
- Debian / Ubuntu:
sudo apt install libfuse3-dev
- RedHat / Fedora:
sudo dnf install fuse3-devel
- SUSE:
sudo zypper install fuse3-devel
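If you are not sure which of these applies, the sketch below probes for the package managers listed above and prints the matching command without running it. The function name `fuse3_install_command` is made up for this example.

```shell
# Hypothetical helper: print the install command from the list above
# for a given package manager name.
fuse3_install_command() {
    case "$1" in
        apt)    echo "sudo apt install libfuse3-dev" ;;
        dnf)    echo "sudo dnf install fuse3-devel" ;;
        zypper) echo "sudo zypper install fuse3-devel" ;;
        *)      echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

# Print the command for the first package manager found on PATH.
for pm in apt dnf zypper; do
    if command -v "$pm" >/dev/null 2>&1; then
        fuse3_install_command "$pm"
        break
    fi
done
```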
If you want to see Databricks volumes as if you were on Databricks (in /Volumes), you will need admin privileges to create the /Volumes directory on your computer. Otherwise, you can use $HOME/Volumes or any other directory you have access to.
pip install "fuse4dbricks"
mkdir "$HOME/Volumes" # or any other directory
fuse4dbricks --workspace "https://adb-xxxx.azuredatabricks.net" $HOME/Volumes
ls $HOME/Volumes
# Your catalogs will appear
This uses Databricks unified authentication (e.g. it searches for $HOME/.databrickscfg). If no unified auth file
is found, you will find a README file with instructions on how to provide a personal access token.
If you are the only user (e.g. on your laptop) and you have admin permissions, feel free to use /Volumes so that paths match those on Databricks:
pip install "fuse4dbricks"
sudo mkdir /Volumes
sudo chown $USER /Volumes
fuse4dbricks --workspace "https://adb-xxxx.azuredatabricks.net" /Volumes
ls /Volumes
# Your catalogs will appear on /Volumes
If you are on a shared computer (e.g. an HPC cluster), fuse4dbricks can run as a root service and use the access token of whichever user accesses the Unity Catalog volumes.
The Databricks token used depends on the user who requests the file.
The cache is shared, so if many users read the same files, access is faster. Users can't access other users' files, because each user has to provide their own access token.
- Create a virtual environment and install fuse4dbricks there:
# Note that fuse4dbricks requires python>=3.11
sudo bash
mkdir /opt/fuse4dbricks
chmod 755 /opt/fuse4dbricks
python3.11 -m venv /opt/fuse4dbricks/venv
source /opt/fuse4dbricks/venv/bin/activate
python3 -m pip install fuse4dbricks
deactivate
exit # exit from sudo bash
- Create the mount directory:
sudo mkdir /Volumes
sudo chmod 0755 /Volumes
- Create the cache directory:
sudo mkdir /var/cache/fuse4dbricks
sudo chmod 0700 /var/cache/fuse4dbricks
- Enable user_allow_other support in /etc/fuse.conf:
Edit /etc/fuse.conf and uncomment the user_allow_other line.
- Create a start script and make it executable:
Replace the workspace URL and the cache settings with your own values.
cat << EOF | sudo tee /opt/fuse4dbricks/fuse4dbricks_start.sh
#!/bin/bash
source /opt/fuse4dbricks/venv/bin/activate
fuse4dbricks \
  --workspace "https://adb-xxxx.azuredatabricks.net" \
  --disk-cache-dir /var/cache/fuse4dbricks \
  --allow-other \
  --ram-cache-mb 512 \
  --disk-cache-gb 1024 \
  --disk-cache-max-days 30 \
  /Volumes
EOF
sudo chmod +x /opt/fuse4dbricks/fuse4dbricks_start.sh
- Create a systemd unit:
cat << EOF | sudo tee /etc/systemd/system/fuse4dbricks.service
[Unit]
Description=fuse4dbricks
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/fuse4dbricks
ExecStart=/opt/fuse4dbricks/fuse4dbricks_start.sh
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
- Reload the systemd unit files:
sudo systemctl daemon-reload
- Enable and start the service:
sudo systemctl enable fuse4dbricks
sudo systemctl start fuse4dbricks
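After completing the steps above, the directory permissions can be double-checked. The sketch below compares the mode of the installation and cache directories with the values used in this guide. The helper `check_mode` is hypothetical, and it assumes GNU stat (`stat -c`), which is standard on Linux. The mount point itself is left out because, once mounted, it reports the mode of the filesystem rather than that of the underlying directory.

```shell
# Hypothetical helper: compare a directory's octal mode with an expected value.
# Assumes GNU stat (stat -c), as found on Linux distributions.
check_mode() {
    dir="$1"
    expected="$2"
    actual=$(stat -c '%a' "$dir" 2>/dev/null) || { echo "$dir: missing"; return 1; }
    if [ "$actual" = "$expected" ]; then
        echo "$dir: ok ($actual)"
    else
        echo "$dir: mode $actual, expected $expected"
        return 1
    fi
}

check_mode /opt/fuse4dbricks 755 || echo "review the virtual environment step"
check_mode /var/cache/fuse4dbricks 700 || echo "review the cache directory step"
```

The state of the service itself can be inspected with `systemctl status fuse4dbricks`.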