Workflow overview

The UNLOCK FAIR Data Platform (FDP) provides several bioinformatics workflows designed to automate data analysis. These workflows are written in the Common Workflow Language (CWL) and containerized using Docker, which reduces the need to install complex dependencies and lets us combine many different tools. We prioritize publicly available, community-maintained images, but host our own when necessary.
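As a rough illustration of what running a CWL workflow involves, the sketch below assembles a command line for `cwltool`, the CWL reference runner. It assumes `cwltool` and Docker are installed; the platform may use a different runner. The file names `workflow.cwl` and `job.yaml` and the helper `build_cwltool_command` are hypothetical placeholders, not part of the UNLOCK workflows.

```python
import shlex


def build_cwltool_command(workflow, job, outdir="results"):
    """Return the argument list for a single cwltool run.

    workflow: path to a CWL workflow description (hypothetical name below)
    job:      path to the YAML/JSON file holding the workflow inputs
    outdir:   directory where cwltool should place the output files
    """
    return ["cwltool", "--outdir", str(outdir), str(workflow), str(job)]


# Hypothetical file names, for illustration only.
command = build_cwltool_command("workflow.cwl", "job.yaml")

# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the run. Tools whose CWL description declares a Docker requirement are then executed in their container by the runner.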

Access and usage

Below is a summary of key workflows available at the FDP. For a full, detailed list of published workflows (including inputs, steps, and outputs), visit the UNLOCK WorkflowHub.

For guidance on how to set up and run these workflows, see the Setup section.

Workflow: Metagenomics Assembly

View on Workflowhub

This workflow assembles genomes from Illumina reads, long reads, or both. To some extent you can choose which steps to run, and the workflow can also be used for isolates.

Main steps involved:

  • Illumina Quality Workflow

  • Long Read Quality Workflow

  • Assembly: SPAdes / Flye

  • Short read polishing (Pilon)

  • ONT read polishing (Medaka)

  • QUAST (Assembly quality report)

  • Metagenomics Binning workflow

  • Metagenomics GEM workflow


Workflow: Illumina Quality

View on Workflowhub

This workflow ensures high-quality Illumina read data before further analysis.

Steps included:

  • FastQC quality plots (before and after filtering)

  • fastp quality filtering

  • BBDuk PhiX removal and rRNA filtering

  • BBMap reference/contamination filtering (keeping either mapped or unmapped reads)

  • Kraken2 taxonomic read classification (before and after filtering)


Workflow: Long Reads Quality

View on Workflowhub

This workflow ensures high-quality Nanopore/long-read data before further analysis.

Steps included:

  • NanoPlot quality plots and reports (before and after filtering)

  • Filtlong long-read quality filtering

  • Minimap2 reference/contamination filtering (keeping either mapped or unmapped reads)

  • Kraken2 taxonomic read classification (before and after filtering)


Workflow: Metagenomics Binning

View on Workflowhub

This workflow bins assembled metagenomic contigs into putative individual genomes.

Steps included:

  • MetaBAT2 / MaxBin2 / SemiBin binning

  • DAS Tool bin refinement

  • EukRep (eukaryotic classification)

  • CheckM bin quality

  • BUSCO bin quality

  • GTDB-Tk bin taxonomic classification
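To give a sense of how the bin quality estimates might be interpreted, the sketch below sorts a bin into a quality tier from its completeness and contamination percentages, the two figures CheckM reports. The thresholds follow the commonly cited MIMAG draft-genome tiers in simplified form (the rRNA and tRNA criteria for high-quality drafts are ignored). This is an assumption made for illustration: the workflow is not stated to apply these cut-offs, and the function name `quality_tier` is hypothetical.

```python
def quality_tier(completeness, contamination):
    """Classify a bin by completeness and contamination (both in percent).

    Simplified MIMAG-style tiers, used here as an illustrative assumption:
      high:   completeness > 90 and contamination < 5
      medium: completeness >= 50 and contamination < 10
      low:    everything else
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Made-up example values, not results from any real dataset.
example_bins = {
    "bin.1": (97.2, 1.3),
    "bin.2": (71.5, 6.0),
    "bin.3": (42.0, 2.1),
}

for name, (completeness, contamination) in example_bins.items():
    print(name, quality_tier(completeness, contamination))
```

In practice the completeness and contamination values would be read from the quality report produced by the workflow rather than typed in by hand.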


Workflow: Metagenomics GEM

View on Workflowhub

Important caveat: The CarveMe, MEMOTE and SMETANA Docker container images used in this workflow include the licensed CPLEX Optimizer, so we cannot make these images public. As a result, the workflow will not work out of the box. However, we have made the Docker build files available here.

Steps included:

  • Prodigal protein prediction

  • CarveMe GEnome-scale Metabolic model reconstruction

  • MEMOTE for metabolic model testing

  • SMETANA Species METabolic interaction ANAlysis