iBridges#
This notebook shows how to download files and directories from the UNLOCK FDP using iBridges. It focuses on the Python API, which is also used in the notebooks of the Tutorials section.
Full Python API documentation:
https://ibridges.readthedocs.io/en/stable/quickstart.html
GitHub repository:
UtrechtUniversity/iBridges
If you are interested in the command-line interface, see:
https://ibridges.readthedocs.io/en/stable/ibridges_cli.html
For the graphical user interface, visit:
https://ibridges-for-irods.github.io/iBridges-GUI/docs/info.html
GitHub repository:
iBridges-for-iRODS/iBridges-GUI
Installation#
Make sure Python 3.8 or later is installed. If you are unfamiliar with Python, install Anaconda.
You can install iBridges using pip. Run the line below in a notebook cell, or drop the leading % to run it in a terminal:
%pip install ibridges
Authentication#
iBridges (like all iRODS clients) requires a configuration file, which ensures that your connection is encrypted and that you do not need to provide all access information each time you connect to the UNLOCK FDP. This file is typically stored in your home directory, in a folder called .irods:

OS | Path to .irods |
---|---|
Windows | |
macOS | |
Linux | |
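The exact location depends on your operating system and user name, so a quick way to find the folder on your own machine is to ask Python for your home directory. This is a minimal sketch using only the standard library; it assumes the default location described above (a .irods folder directly under your home directory).

```python
from pathlib import Path

# Default iRODS configuration folder: ".irods" directly under the home directory
irods_dir = Path.home() / ".irods"

print("Expected iRODS configuration folder:", irods_dir)
print("Folder already exists:", irods_dir.is_dir())
```

The printed path is where the environment file is expected to be stored.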
Create a file named irods_environment.json in this folder. Make sure that the file extension is .json.
First, set the location of the configuration file. You will then be prompted to enter your SRAM username:
# Set iRODS environment directory and environment file
irods_env_dir = "~/.irods"
irods_env_file = "irods_environment.json"
# Provide your SRAM username
if 'username' not in locals():
    username = input("Enter your SRAM username: ")
If you already have an environment file in place, you can skip the next cell and go to “Start a session”.
Some text editors (especially on Windows) may automatically save files with a .txt extension. The code snippet below avoids this by creating your personal UNLOCK iRODS configuration file (~/.irods/irods_environment.json) with your username and connection settings in the correct location:
from pathlib import Path
import json
# Create the iRODS environment directory defined above if it does not exist yet
irods_env_dir = Path(irods_env_dir).expanduser()
irods_env_dir.mkdir(parents=True, exist_ok=True)
# Connection settings to save as irods_environment.json in the .irods folder
env = {
    "irods_host": "data.m-unlock.nl",
    "irods_port": 1247,
    "irods_user_name": username,
    "irods_zone_name": "unlock",
    "irods_home": "/",
    "irods_authentication_scheme": "pam_password",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_client_server_negotiation": "request_server_negotiation"
}
# Write the settings to the environment file
env_file = irods_env_dir / irods_env_file
with open(env_file, 'w') as write_json:
    json.dump(env, write_json, indent=2)
# Confirm that the file now exists
if env_file.is_file():
    print("Created environment file at", env_file)
else:
    print("Failed to create environment file at", env_file)
Start a session!#
You will be prompted to enter your SRAM token, which is passed to iBridges as the password:
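from pathlib import Path
from ibridges import Session
from getpass import getpass
# Full path to the environment file
env_file = (Path(irods_env_dir) / irods_env_file).expanduser()
# Ask for the password only if it is not already defined
if 'password' not in locals():
    password = getpass("Enter your SRAM password: ")
# Connect; on failure the error is printed and no session is created
try:
    session = Session(env_file, password=password)
except Exception as e:
    print("Failed to establish session:", e)
# Do not keep the password in memory longer than needed
del password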
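A mistyped password leaves you without a session, and the cells below then fail. One way to handle this is a small retry loop, sketched here under stated assumptions: `connect_with_retries` is a hypothetical helper (not part of iBridges) that takes a function which creates the session, so the sketch itself does not depend on any iBridges call beyond the `Session(env_file, password=...)` used above.

```python
from getpass import getpass


def connect_with_retries(make_session, ask_secret=getpass, attempts=3):
    """Try to open a session up to `attempts` times, asking for the secret each time.

    make_session: callable that takes the secret and returns a session,
                  raising an exception if the connection fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        secret = ask_secret(f"Enter your SRAM password (attempt {attempt}/{attempts}): ")
        try:
            return make_session(secret)
        except Exception as err:  # broad on purpose: no specific exception types are assumed
            last_error = err
            print("Failed to establish session:", err)
    raise RuntimeError(f"No session after {attempts} attempts") from last_error


# In this notebook (assumes Session and env_file from the cell above):
# session = connect_with_retries(lambda secret: Session(env_file, password=secret))
```

Passing the session factory as an argument keeps the secret local to the helper, so it is not left behind as a notebook variable.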
Downloading files#
To show all the home folders you have access to in the UNLOCK zone:
from ibridges import IrodsPath
print("Home folders in the UNLOCK zone:")
irods_path = IrodsPath(session, "/" + session.zone + "/home")
for path in irods_path.collection.subcollections:
    # Only list home folders whose name starts with "wur."
    if "home/wur." in path.path:
        print(path.path)
Define your investigation and study variables:
investigation = "<your_investigation>"
study = "<your_study>"
Create a local download directory (relative to the location where the code snippet is run from):
from pathlib import Path
# Define where to download files locally
download_path = "./unlock_downloads/" + investigation + "/" + study
# Create the directory if it doesn't exist yet
download_dir = Path(download_path).expanduser()
download_dir.mkdir(parents=True, exist_ok=True)
Download a single file or directory#
Use the full iRODS path to specify what you want to download. Use the web interface, a file transfer client, or a mounted network drive to see the full folder structure.
You will receive a dictionary with the changes made. You can preview this beforehand using the option dry_run=True.
Note: Existing local files will not be overwritten unless you specifically set overwrite=True.
from ibridges import IrodsPath, download
# Download a single file to the local download directory by specifying the file path on iRODS.
# IrodsPath is used instead of pathlib.Path so that the path keeps forward slashes on every OS.
irods_file = IrodsPath(session, f"/unlock/home/wur.{investigation}/stu_{study}/obs_observation_identifier/sam_sample_identifier/metagenomic_wgs_illumina/asy_assay_identifier/data/sample_name.fastq.gz")
download(session, irods_file, download_dir, overwrite=False)
# Download all files in a folder to the local download directory by specifying the folder path on iRODS
irods_dir = IrodsPath(session, f"/unlock/home/wur.{investigation}/stu_{study}/obs_observation_identifier/sam_sample_identifier/metagenomic_wgs_illumina/asy_assay_identifier/data")
download(session, irods_dir, download_dir, overwrite=False)
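The long paths above follow a fixed layout (zone, home, investigation, study, then the levels below the study). As a sketch, the path string can be assembled from its parts with `PurePosixPath`, which always uses forward slashes as iRODS does, regardless of the local operating system. The helper name `unlock_irods_path` and the example identifiers are hypothetical; the layout is taken from the paths shown above.

```python
from pathlib import PurePosixPath


def unlock_irods_path(investigation, study, *parts, zone="unlock"):
    """Build an absolute iRODS path string for a study in the given zone."""
    base = PurePosixPath("/", zone, "home", f"wur.{investigation}", f"stu_{study}")
    # Extra parts (observation, sample, assay, ...) are appended in order;
    # they should not start with "/" or they would replace the base path
    return str(base.joinpath(*parts))


# Hypothetical identifiers, for illustration only
print(unlock_irods_path("demo", "pilot", "obs_x", "sam_y", "data"))
# → /unlock/home/wur.demo/stu_pilot/obs_x/sam_y/data
```

The resulting string can be passed to `IrodsPath(session, ...)` in the same way as the f-strings above.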
Download multiple files and directories with a search#
You can also search for multiple files or directories (iRODS “collections”) and download them recursively, based on the hits of your path_pattern search. The % character acts as a wildcard in your search strings. The code below skips existing, non-empty local directories unless overwrite=True:
from ibridges import IrodsPath, search_data, download
from pathlib import Path
# Define specified folder pattern (e.g. all humann3 output from the study)
folder_pattern = "humann3"
# Define search path
search = f"/unlock/home/wur.{investigation}/stu_{study}/"
data = search_data(session, path=IrodsPath(session, search), path_pattern=folder_pattern)
# Set options and counters
overwrite = False
downloaded, skipped = 0, 0
unique_folders = []
# Go through the search results and download all matching folders
for item in data:
    if not item.collection_exists():  # Only process directories (collections)
        continue
    folder = str(item.absolute())  # Full iRODS path as a string
    if folder in unique_folders:  # Avoid processing the same collection twice
        continue
    unique_folders.append(folder)
    run = item.parent.name  # Name of the parent folder
    local_destination = Path(download_path) / run  # Local destination path
    local_destination.mkdir(parents=True, exist_ok=True)
    # Download if overwriting is allowed or the local directory is still empty
    if overwrite or not any(local_destination.iterdir()):
        download(session, item, local_destination, overwrite=overwrite)
        downloaded += 1
    else:
        skipped += 1
# Print download summary
print("\nDownloaded: ", downloaded)
print("Skipped: ", skipped)
print("Total unique folders processed:", len(unique_folders))
For files:
from ibridges import IrodsPath, search_data, download
from pathlib import Path
# Define specified file pattern (e.g. ttl files)
pattern = "%.ttl"
# Define search path
search = f"/unlock/home/wur.{investigation}/stu_{study}/"
data = search_data(session, path=IrodsPath(session, search), path_pattern=pattern)
# Set options and counters
overwrite = False
downloaded, skipped = 0, 0
# Base local directory
local_destination = Path(download_path)
local_destination.mkdir(parents=True, exist_ok=True)
# Iterate over search results and download files
for item in data:
    if not item.dataobject_exists():  # Only process files (data objects)
        continue
    file_destination = local_destination / item.name  # Local file path
    if not file_destination.exists() or overwrite:  # Download only if not present
        download(session, item, file_destination, overwrite=overwrite)
        downloaded += 1
    else:
        skipped += 1
# Print download summary
print("\nDownloaded: ", downloaded)
print("Skipped: ", skipped)
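To get a feel for how a % pattern such as "%.ttl" selects names, the matching can be imitated locally with a regular expression. This sketch only illustrates the wildcard idea on plain strings: it is not how iBridges or the iRODS server implements the search, it treats % as the only wildcard, and it assumes case-sensitive matching. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern with % wildcards into a compiled regular expression."""
    # Escape everything literally, then let each % stand for "any run of characters"
    pieces = (re.escape(piece) for piece in pattern.split("%"))
    return re.compile(".*".join(pieces))


def matches(pattern, name):
    """Return True if the whole name matches the % pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


print(matches("%.ttl", "sample.ttl"))          # → True
print(matches("%.ttl", "sample.ttl.bak"))      # → False
print(matches("humann3", "run_humann3_out"))   # → False
print(matches("%humann3%", "run_humann3_out")) # → True
```

In this illustration a pattern without % only matches the exact name, which is why "humann3" does not match "run_humann3_out" while "%humann3%" does.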