Examples

Clustering algorithms (clustbench)

We have ported the clustbench clustering benchmark (Gagolewski, 2022) to evaluate 105 datasets with a known ground truth and 27 methods using six partition metrics (see table below).

If present, we included noisy points during the clustering process, but ignored them when calculating the performance metrics.

We grouped datasets and methods according to their generator and/or software environment, hence writing modules able to run multiple methods on demand. The Clustering_conda.yml manifest makes use of this parametrization to specify the dataset/method/metric to be run from a benchmarking module.

Beyond Conda, we also designed EasyBuild and Apptainer execution environments to evaluate the impact of the software backend on benchmarking results, both in terms of algorithmic outcomes (e.g., clustering metrics) and computational performance.

Components

Stage	Module	Components	Count
Data	fcps	atom, chainlink, engytime, hepta, lsun, target, tetra, twodiamonds, wingnut	9
	graves	graves1--graves12	12
	other	aggregation, aniso, blobs, circles, complex9v1--complex9v55	59
	sipu	a1, a2, a3, a4, dim032, dim064, dim128, dim256, g2--g6, s1--s4, unbalance, triangle1, triangle2	20
	uci	iris, wine, yeast	3
	wut	spiral, zigzag_outliers	2
Clustering	fastcluster	complete, ward, average, weighted, median, centroid	6
	sklearn	birch, kmeans, spectral, gm	4
	agglomerative	average, complete, ward	3
	genieclust	genie, gic, ica	3
	fcps	Minimax, MinEnergy, HDBSCAN_2/4/8, Diana, Fanny, Hardcl, Softcl, Clara, PAM	11
Metrics	partition_metrics	normalized_clustering_accuracy, adjusted_fm_score, adjusted_mi_score, adjusted_rand_score, fm_score, mi_score	6

Running clustbench with Omnibenchmark

To run the benchmark using Conda as a software backend:

git clone git@github.com:omnibenchmark/clustering_example.git
cd clustering_example

ob run Clustering_conda.yml

Community Benchmarks

These are benchmarks organized by the community using Omnibenchmark.

Git repository	Description	API Version	License	DOI
OB_GSEA	Comparison of 17 single-sample gene set enrichment analysis (GSEA) methods across diverse datasets	0.3.2	MIT	—
scrna-bench	End-to-end scRNA-seq analysis pipelines, from HDF5 inputs to PCA, clustering, and other outputs	0.4.0	MIT	10.5281/zenodo.19886347, 10.64898/2026.05.01.722166
CyTOF pipelines	Comparison of CyTOF preprocessing and clustering pipelines	0.3.2	MIT	10.64898/2026.06.02.729500
VarCallBench	Comparison of variant callers for long read RNA-seq data	0.5.1	MIT	10.5281/zenodo.19857089, 10.64898/2026.04.29.721619
OB_LONGREAD_GERMLINE_SV_CALLERS	Benchmarking long-read germline structural variant callers for research and clinic using both linear and pangenome-graph references with a novel dataset	0.3.2	MIT	—
OB_SV_BENCHMARK_GERMLINE_SHORTREADS	An Omnibenchmark workflow running multiple structural-variant (SV) callers on short-reads data and benchmarking their results against a single truth set with Truvari	0.3.2	MIT	—

Have a benchmark to share? Open a PR (or an issue) to add it to this section.