{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# iBridges\n", "\n", "This notebook shows how to download files and directories from the UNLOCK FDP using iBridges. It focuses on usage from Python, which is also how iBridges is applied in the tutorials in the Tutorials section.\n", "\n", "Full Python API documentation:\\\n", "\n", "\n", "GitHub repository:\\\n", "\n", "\n", "If you are interested in the command-line interface, see:\\\n", "\n", "\n", "For the graphical user interface, visit:\\\n", "\n", "\n", "GitHub repository:\\\n", "\n", "\n", "## Installation\n", "\n", "Make sure Python version 3.8+ is installed. If you are unfamiliar with Python, install [Anaconda](https://www.anaconda.com/download/success).\n", "\n", "You can install iBridges using pip, either in a terminal or directly from this notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install ibridges" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Authentication\n", "\n", "iBridges (like all iRODS clients) requires a configuration file, which ensures that your connection is encrypted and that you do not need to provide all access information each time you connect to the UNLOCK FDP. This file is typically stored in your home directory under a folder called `.irods`:\n", "\n", "| OS | Path to `.irods` folder |\n", "|----------|------------------------------------------|\n", "| Windows | `C:\\Users\\<username>\\.irods` |\n", "| macOS | `/Users/<username>/.irods` |\n", "| Linux | `/home/<username>/.irods` |\n", "\n", "Create a file named `irods_environment.json` in this folder. 
Make sure that the file extension is `.json`.\\\n", "First, set the location of the configuration file. You will then be prompted to enter your SRAM username:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Set iRODS environment directory and environment file\n", "irods_env_dir = \"~/.irods\"\n", "irods_env_file = \"irods_environment.json\"\n", "\n", "# Provide your SRAM username (only asked once per kernel session)\n", "if 'username' not in locals():\n", "    username = input(\"Enter your SRAM username: \")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**If you already have an environment file in place, you can skip the next cell and go to \"Start a session\"**.\\\n", "Some text editors (especially on Windows) automatically save files with a `.txt` extension. The code snippet below avoids this by creating your personal UNLOCK iRODS configuration file (`~/.irods/irods_environment.json`), containing your username and connection settings, in the correct location:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import json\n", "\n", "# Create the iRODS environment directory defined above if it does not exist yet\n", "irods_env_dir = Path(irods_env_dir).expanduser()\n", "irods_env_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "# Settings to save as irods_environment.json in the .irods folder\n", "env = {\n", "    \"irods_host\": \"data.m-unlock.nl\",\n", "    \"irods_port\": 1247,\n", "    \"irods_user_name\": username,\n", "    \"irods_zone_name\": \"unlock\",\n", "    \"irods_home\": \"/\",\n", "    \"irods_authentication_scheme\": \"pam_password\",\n", "    \"irods_encryption_algorithm\": \"AES-256-CBC\",\n", "    \"irods_encryption_key_size\": 32,\n", "    \"irods_encryption_num_hash_rounds\": 16,\n", "    \"irods_encryption_salt_size\": 8,\n", "    \"irods_client_server_policy\": \"CS_NEG_REQUIRE\",\n", "    \"irods_client_server_negotiation\": \"request_server_negotiation\"\n", 
"}\n", "\n", "env_file = Path.expanduser(Path(irods_env_dir)).joinpath(\"irods_environment.json\")\n", "with open(env_file, 'w') as write_json:\n", " json.dump(env, write_json,indent=2)\n", "\n", "if Path.is_file(env_file):\n", " print(\"Created environment file at\", env_file)\n", "else:\n", " print(\"Failed to created environment file at\", env_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start a session!\n", "You will be prompted to enter your SRAM token:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from ibridges import Session\n", "from getpass import getpass\n", "\n", "env_loc = Path(irods_env_dir) / irods_env_file\n", "env_file = Path.expanduser(Path(env_loc))\n", "\n", "if not 'password' in locals():\n", " password = getpass(\"Enter your SRAM password: \")\n", "\n", "try:\n", " session = Session(env_file, password=password)\n", "except Exception as e:\n", " print(\"Failed to establish session:\", e)\n", " del password" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading files\n", "To show all the home folders you have access to in the UNLOCK zone:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import IrodsPath\n", "\n", "print(\"Home folders in the UNLOCK zone:\")\n", "irods_path = IrodsPath(session, \"/\" + session.zone +\"/home\")\n", "for path in irods_path.collection.subcollections:\n", " if \"home/wur.\" in path.path:\n", " print(path.path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define your investigation and study variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "investigation = \"\"\n", "study = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a local download directory (relative to the location where the code snippet is run from):" ] }, { "cell_type": 
"code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "# Define where to download files locally\n", "download_path = \"./unlock_downloads/\"+investigation+\"/\"+study \n", "\n", "# Create the directory if it doesn't exist yet\n", "download_dir = Path.expanduser(Path(download_path))\n", "download_dir.mkdir(parents=True, exist_ok=True )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download a single file or directory\n", "Use the full iRODS path to specify what you want to download. Use the web interface, a file transfer client, or a mounted network drive to see the full folder structure.\n", "\n", "You will receive a dictionary with the changes made. You can preview this beforehand using the option `dry_run=True`.\n", "> Note: Existing local files will not be overwritten unless you specifically set `overwrite=True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import download\n", "\n", "# Download a single file to local download directory by specifying the file path on iRODS\n", "irods_file = Path(f\"/unlock/home/wur.{investigation}/stu_{study}/obs_observation_identifier/sam_sample_identifier/metagenomic_wgs_illumina/asy_assay_identifier/data/sample_name.fastq.gz\")\n", "download(session, irods_file, download_dir, overwrite=False)\n", "\n", "# Download all files in a folder to local download directory by specifying the folder path on iRODS\n", "irods_dir = Path(f\"/unlock/home/wur.{investigation}/stu_{study}/obs_observation_identifier/sam_sample_identifier/metagenomic_wgs_illumina/asy_assay_identifier/data\")\n", "download(session, irods_dir, download_dir, overwrite=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download multiple files and directories with a search\n", "You can also search and recursively download multiple files or directories (iRODS \"collections\") based on a hits with your 
\"path_pattern\" search.\n", "\n", "- `%` acts as a wildcard in your search strings.\n", "- The code below skips existing, non-empty local directories unless `overwrite=True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import IrodsPath, search_data, download\n", "from pathlib import Path\n", "import os\n", "\n", "# Define specified folder pattern (e.g. all humann3 output from the study)\n", "folder_pattern = \"humann3\"\n", "# Define search path\n", "search = f\"/unlock/home/wur.{investigation}/stu_{study}/\"\n", "data = search_data(session, path=IrodsPath(session, search), path_pattern=folder_pattern)\n", "\n", "# Set options and counters\n", "overwrite = False\n", "downloaded, skipped = 0, 0\n", "unique_folders = []\n", "\n", "# Go through the search results and download all specified folders\n", "for item in data:\n", "    irods_path = IrodsPath(session, item)  # Create an IrodsPath object for the item\n", "    run = irods_path.parent.name  # Extract the name of the parent folder\n", "    local_destination = Path(download_path) / run  # Construct the local destination path\n", "    if not item.collection_exists() or item in unique_folders:\n", "        continue  # Only process directories (collections), and each of them only once\n", "    unique_folders.append(item)\n", "    if not local_destination.exists() or overwrite:\n", "        local_destination.mkdir(parents=True, exist_ok=True)\n", "        download(session, item, local_destination, overwrite=overwrite)\n", "        downloaded += 1\n", "    elif len(os.listdir(local_destination)) == 0:  # Destination exists but is empty\n", "        download(session, item, local_destination, overwrite=overwrite)\n", "        downloaded += 1\n", "    else:  # Destination exists and already contains files\n", "        skipped += 1\n", "\n", "# Print download summary\n", "print(\"\\nDownloaded: \", downloaded)\n", "print(\"Skipped: \", skipped)\n", "print(\"Total unique folders processed:\", len(unique_folders))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**For files:**" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import IrodsPath, search_data, download\n", "from pathlib import Path\n", "import os\n", "\n", "# Define specified file pattern (e.g. ttl files)\n", "pattern=\"%.ttl\"\n", "# Define search path\n", "search = f\"/unlock/home/wur.{investigation}/stu_{study}/\"\n", "data = search_data(session, path=IrodsPath(session, search), path_pattern=pattern)\n", "\n", "# Set options and counters\n", "overwrite = False\n", "downloaded, skipped = 0,0\n", "unique_folders = []\n", "\n", "# Iterate over search results and download files\n", "for item in data:\n", " path = str(item.absolute()) # Convert path to string\n", " local_destination = Path(download_path) # Base local directory\n", " file_destination = local_destination / item.name # Construct local file path\n", "\n", " if item.dataobject_exists(): # Check if file\n", " local_destination.mkdir(parents=True, exist_ok=True)\n", " if not file_destination.exists() or overwrite: # Download only if not present\n", " download(session, item, file_destination, overwrite=overwrite)\n", " downloaded += 1\n", " else:\n", " skipped += 1\n", "\n", "# Print download summary\n", "print(\"\\nDownloaded: \", downloaded)\n", "print(\"Skipped: \", skipped)" ] } ], "metadata": { "kernelspec": { "display_name": "Testing_nbs", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }