{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Download files and directories using iBridges\n", "\n", "This section shows how to download data from iRODS using iBridges. These are just a few examples of how you can use it from Python.\n", "\n", "For the full documentation and usage go to:\\\n", "https://ibridges.readthedocs.io/en/stable/\n", "\n", "Git repository:\\\n", "https://github.com/UtrechtUniversity/iBridges/\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Authentication\n", "\n", "All iRODS clients (icommands and APIs) expect your connection parameters to be stored in a file called irods_environment.json inside a special folder. This folder is called .irods and it lies in your home directory:\n", "\n", "    Mac: /Users/<username>/.irods\n", "    Linux: /home/<username>/.irods\n", "    Windows: C:\\Users\\<username>\\.irods\n", "\n", "Store irods_environment.json in that folder and make sure that its extension is .json.\n", "\n", "On Windows, text editors usually save files with the .txt extension, so please watch out for this. Below we provide a code snippet which saves your personal UNLOCK iRODS information in the right place." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Set the iRODS environment location and your username\n", "\n", "irods_env_dir = \"~/.irods\"\n", "irods_env_file = \"irods_environment.json\"\n", "username = \"\"  # placeholder: fill in your own username" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**If you already have an environment file in place, you can skip the next cell and go to \"Start a session\"**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import json\n", "\n", "# Create the iRODS environment directory defined above if it does not exist yet\n", "irods_env_dir = Path(irods_env_dir).expanduser()\n", "irods_env_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "# Connection settings that will be saved as irods_environment.json in the .irods folder\n", "env = {\n", "    \"irods_host\": \"unlock-icat.irods.surfsara.nl\",\n", "    \"irods_port\": 1247,\n", "    \"irods_user_name\": username,\n", "    \"irods_zone_name\": \"unlock\",\n", "    \"irods_authentication_scheme\": \"pam\",\n", "    \"irods_encryption_algorithm\": \"AES-256-CBC\",\n", "    \"irods_encryption_key_size\": 32,\n", "    \"irods_encryption_num_hash_rounds\": 16,\n", "    \"irods_encryption_salt_size\": 8,\n", "    \"irods_client_server_policy\": \"CS_NEG_REQUIRE\",\n", "    \"irods_client_server_negotiation\": \"request_server_negotiation\"\n", "}\n", "\n", "# Write the settings to the environment file\n", "env_file = irods_env_dir / irods_env_file\n", "with open(env_file, 'w') as write_json:\n", "    json.dump(env, write_json, indent=2)\n", "\n", "if env_file.is_file():\n", "    print(\"Created environment file at\", env_file)\n", "else:\n", "    print(\"Failed to create environment file at\", env_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start a session! 
It will ask you for your SRAM token." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from ibridges import Session\n", "from getpass import getpass\n", "\n", "# Build the path to the environment file; this works whether\n", "# irods_env_dir is still a string or already a Path object\n", "env_file = Path(irods_env_dir).expanduser() / irods_env_file\n", "\n", "password = getpass(\"SRAM token: \")\n", "session = Session(env_file, password=password)\n", "\n", "if session:\n", "    print(\"Session successfully established\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading files" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "investigation = \"\"  # placeholder: fill in your investigation identifier\n", "study = \"\"  # placeholder: fill in your study identifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a local download directory" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from ibridges import IrodsPath\n", "\n", "# Define where to download files locally\n", "download_path = \"./unlock_downloads/\"+investigation+\"/\"+study\n", "\n", "# Create the directory if it doesn't exist yet\n", "download_dir = Path(download_path).expanduser()\n", "download_dir.mkdir(parents=True, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download a single file or directory\n", "Use the full iRODS path.\n", "\n", "The download returns a dictionary describing the changes; you can retrieve it beforehand, without transferring anything, with the option **dry_run=True**.\\\n", "Existing local data will not be overwritten. 
Please use the option **overwrite=True** if you want to overwrite your local data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import IrodsPath, download\n", "\n", "# Download a single file (data object)\n", "irods_file = IrodsPath(session, \"/unlock/home/wur.fdp/stu_bmock12_prjna496047/obs_bmock12_mocktest_cwl/sam_bmock12_synthetic_metagenome/metagenomic_other_illumina/asy_illumina_srr8073716/data/SRR8073716_1.fastq.gz\")\n", "download(session, irods_file, download_dir)\n", "\n", "# Download a whole directory (collection)\n", "irods_dir = IrodsPath(session, \"/unlock/home/wur.fdp/stu_bmock12_prjna496047/obs_bmock12_mocktest_cwl/sam_bmock12_synthetic_metagenome/metagenomic_other_illumina/asy_illumina_srr8073716\")\n", "download(session, irods_dir, download_dir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download multiple files and directories with a search\n", "You will often want to download multiple files or directories (collections in iRODS) at once.\n", "\n", "**For directories:**\n", "\n", "This downloads every directory that matches your \"search\".\\\n", "The download is skipped when the local directory exists **AND** is not empty.\\\n", "Set the variable **overwrite** to **True** to change this behaviour.\n", "\n", "\"%\" denotes a wildcard in your search string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from pathlib import Path\n", "from ibridges import IrodsPath\n", "from ibridges import search_data\n", "from ibridges import download\n", "\n", "search = \"/unlock/home/wur.\"+investigation+\"/stu_\"+study+\"%3_PICRUSt2\"\n", "data = search_data(session, path=IrodsPath(session, search))\n", "\n", "overwrite = False\n", "\n", "# Set counters\n", "downloaded, skipped = 0, 0\n", "unique_folders = []\n", "\n", "for item in data:\n", "    # Name the local folder after the parent collection of the hit\n", "    run = item[\"COLL_NAME\"].split(\"/\")[-2]\n", "    local_destination = Path(download_path).expanduser() / run\n", "\n", "    if not local_destination.exists() or overwrite:\n", "        # exist_ok avoids an error when overwriting an existing directory\n", "        local_destination.mkdir(parents=True, exist_ok=True)\n", "        download(session, item[\"COLL_NAME\"], local_destination, overwrite=overwrite)\n", "        downloaded += 1\n", "    elif len(os.listdir(local_destination)) == 0:\n", "        # The directory exists but is empty, so download into it\n", "        download(session, item[\"COLL_NAME\"], local_destination, overwrite=overwrite)\n", "        downloaded += 1\n", "    else:\n", "        if item[\"COLL_NAME\"] not in unique_folders:\n", "            skipped += 1\n", "\n", "    if item[\"COLL_NAME\"] not in unique_folders:\n", "        unique_folders.append(item[\"COLL_NAME\"])\n", "\n", "print(\"\\nDownloaded: \", downloaded)\n", "print(\"Skipped: \", skipped)\n", "print(\"Total: \", len(unique_folders))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**For files:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ibridges import IrodsPath\n", "from ibridges import search_data\n", "from ibridges import download\n", "from pathlib import Path\n", "import re\n", "\n", "search = \"/unlock/home/wur.\"+investigation+\"/stu_\"+study+\"%2_Classification/%\"\n", "data = search_data(session, path=IrodsPath(session, search))\n", "\n", "# Download the desired files,\n", "# in this case only files that have \".ttl\" in the file name (the pattern is a regex)\n", "overwrite = False\n", "pattern = r\"\\.ttl\"  # escape the dot so it matches a literal \".\"\n", "\n", "downloaded, skipped = 0, 0\n", "for item in data:\n", "    if re.search(pattern, item[\"DATA_NAME\"]):\n", "        data_name = item[\"DATA_NAME\"]\n", "        irods_path = IrodsPath(session, item[\"COLL_NAME\"], item[\"DATA_NAME\"])\n", "        local_destination = Path(download_path).expanduser() / data_name\n", "        if not local_destination.is_file() or overwrite:\n", "            print(\"Downloading \", data_name)\n", "            download(session, irods_path, local_destination, overwrite=overwrite)\n", "            downloaded += 1\n", "        else:\n", "            print(\"Skipped \", data_name)\n", "            skipped += 1\n", "\n", "print(\"\\nDownloaded: \", downloaded)\n", "print(\"Skipped: \", skipped)\n", "print(\"Total: \", downloaded+skipped)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }