Predict Modality
Predicting the profiles of one modality (e.g. protein abundance) from another (e.g. mRNA expression).
6 datasets · 4 methods · 4 control methods · 8 metrics
Info
Repository
Issues
build_main
MIT
Experimental techniques to measure multiple modalities within the same single cell are becoming increasingly available. The demand for these measurements is driven by the promise of deeper insight into the state of a cell. Yet the modalities are also intrinsically linked. We know that DNA must be accessible (ATAC data) to produce mRNA (expression data), and mRNA in turn is used as a template to produce protein (protein abundance). These processes are often regulated by the same molecules that they produce: for example, a protein may bind DNA to prevent the production of more mRNA. Understanding these regulatory processes would be transformative for synthetic biology and drug target discovery. Any method that can predict one modality from another must have accounted for these regulatory processes, but the demand for multi-modal data shows that this is not trivial.
Results
Results table of the scores per method, dataset and metric (after scaling). Use the filters to make a custom subselection of methods and datasets. The “Overall mean” dataset is the mean value across all datasets.
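As a rough illustration of how an "Overall mean" entry could be derived, the sketch below averages scaled scores across datasets for each method and metric. The record layout (`method_id`, `dataset_id`, `scaled_scores`), the dataset names and the numbers are all made up for the example and are not taken from the benchmark.

```python
from collections import defaultdict

# Toy, made-up scaled scores; the record layout is an assumption for illustration.
results = [
    {"method_id": "knnr_py", "dataset_id": "ds_a", "scaled_scores": {"rmse": 0.50, "mae": 0.25}},
    {"method_id": "knnr_py", "dataset_id": "ds_b", "scaled_scores": {"rmse": 0.75, "mae": 0.75}},
    {"method_id": "lm", "dataset_id": "ds_a", "scaled_scores": {"rmse": 0.25, "mae": 0.50}},
]


def overall_mean(records):
    """Average each (method, metric) score across all datasets it appears in."""
    buckets = defaultdict(list)
    for rec in records:
        for metric_id, score in rec["scaled_scores"].items():
            buckets[(rec["method_id"], metric_id)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


means = overall_mean(results)
```

This sketch simply skips combinations that have no score; how the actual results page treats missing values is not stated here.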
Dataset info
NeurIPS2021 CITE-Seq (GEX2ADT)
Data source · 25-11-2024 · 688.47 KiB
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
NeurIPS2021 Multiome (GEX2ATAC)
Data source · 25-11-2024 · 29.64 MiB
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
NeurIPS2021 Multiome (ATAC2GEX)
Data source · 25-11-2024 · 7.52 MiB
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
OpenProblems NeurIPS2022 CITE-Seq (GEX2ADT)
Data source · 25-11-2024 · 578.01 KiB
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
OpenProblems NeurIPS2022 CITE-Seq (ADT2GEX)
Data source · 25-11-2024 · 31.04 MiB
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
NeurIPS2021 CITE-Seq (ADT2GEX)
Data source · 25-11-2024 · 12.84 MiB
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and others at a single site.
Method info
KNNR (Py)
Documentation · Repository · Source Code · Container · build_main
K-nearest neighbor regression in Python (Fix and Hodges 1989)
KNNR (R)
Documentation · Repository · Source Code · Container · build_main
K-nearest neighbor regression in R (Fix and Hodges 1989)
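A minimal sketch of the idea behind K-nearest neighbor regression, written with NumPy only. It is not the benchmark's implementation: the Euclidean distance, the value of `k` and the absence of any preprocessing are assumptions made for illustration.

```python
import numpy as np


def knn_regress(train_x, train_y, test_x, k=3):
    """Predict each test profile as the mean target profile of its k nearest training cells."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training cells for every test cell.
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Average the modality-2 profiles of those neighbours.
    return train_y[nearest].mean(axis=1)


# Toy data: one modality-1 feature, two modality-2 features.
train_x = np.array([[0.0], [1.0], [10.0], [11.0]])
train_y = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 5.0], [0.0, 7.0]])
test_x = np.array([[0.5], [10.5]])
pred = knn_regress(train_x, train_y, test_x, k=2)
```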
Linear Model
Documentation · Repository · Source Code · Container · build_main
Linear model regression (Wilkinson and Rogers 1973)
A linear model regression method.
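For orientation, a linear model baseline can be sketched as an ordinary least-squares fit from modality-1 features to modality-2 features. The intercept column and the use of `numpy.linalg.lstsq` are choices made for this example; the benchmark component may differ, for instance by reducing dimensionality first.

```python
import numpy as np


def fit_linear(train_x, train_y):
    """Least-squares fit of train_y on [1, train_x]; returns the coefficient matrix."""
    design = np.hstack([np.ones((train_x.shape[0], 1)), train_x])
    coef, *_ = np.linalg.lstsq(design, train_y, rcond=None)
    return coef


def predict_linear(coef, test_x):
    """Apply the fitted coefficients to new modality-1 profiles."""
    design = np.hstack([np.ones((test_x.shape[0], 1)), test_x])
    return design @ coef


# Toy data: two target features that are exact linear functions of the input.
train_x = np.array([[0.0], [1.0], [2.0], [3.0]])
train_y = np.hstack([2.0 * train_x + 1.0, -train_x])
coef = fit_linear(train_x, train_y)
pred = predict_linear(coef, np.array([[4.0]]))
```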
Guanlab-dengkw
Documentation · Repository · Source Code · Container · build_main
A kernel ridge regression method with RBF kernel (Lance et al. 2022)
This is a solution developed by Team Guanlab-dengkw in the NeurIPS 2021 competition to predict one modality from another using kernel ridge regression (KRR) with an RBF kernel. Truncated SVD is applied to the combined training and test data from modality 1, followed by row-wise z-score normalization of the reduced matrix. The truncated SVD of modality 2 is predicted by training a KRR model on the normalized training matrix of modality 1. Predictions on the normalized test matrix are then re-mapped to the modality 2 feature space via the right singular vectors.
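The steps described above can be sketched in NumPy as follows. This is a hand-written illustration under stated assumptions, not the team's code: the number of components, the ridge penalty `alpha`, the kernel width `gamma`, and the helper names are all hypothetical, and the kernel ridge regression is solved directly rather than through any particular library.

```python
import numpy as np


def truncated_svd(x, n_comp):
    """Return (embedding, right singular vectors) for the top n_comp components."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_comp] * s[:n_comp], vt[:n_comp]


def zscore_rows(x, eps=1e-8):
    """Standardise every row to zero mean and (approximately) unit variance."""
    return (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + eps)


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def predict_modality(mod1_train, mod2_train, mod1_test,
                     n_comp1=4, n_comp2=3, alpha=0.1, gamma=0.1):
    n_train = mod1_train.shape[0]
    # 1. Truncated SVD on the combined train+test modality-1 matrix.
    emb1, _ = truncated_svd(np.vstack([mod1_train, mod1_test]), n_comp1)
    # 2. Row-wise z-score normalisation of the reduced matrix.
    emb1 = zscore_rows(emb1)
    z_train, z_test = emb1[:n_train], emb1[n_train:]
    # 3. Truncated SVD of the modality-2 training matrix.
    emb2, vt2 = truncated_svd(mod2_train, n_comp2)
    # 4. Kernel ridge regression: solve (K + alpha * I) dual = emb2.
    k_train = rbf_kernel(z_train, z_train, gamma)
    dual = np.linalg.solve(k_train + alpha * np.eye(n_train), emb2)
    # 5. Predict the test embedding and map it back via the right singular vectors.
    return rbf_kernel(z_test, z_train, gamma) @ dual @ vt2


# Random toy matrices, only to exercise the shapes.
rng = np.random.default_rng(0)
mod1 = rng.normal(size=(30, 8))
mod2 = rng.normal(size=(30, 5))
pred = predict_modality(mod1[:20], mod2[:20], mod1[20:])
```

Fitting the SVD on training and test cells together means the test profiles influence the embedding, which is possible here because the test inputs (though not the test targets) are available at prediction time.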
Control method info
Mean per gene
Documentation · Repository · Source Code · Container · build_main
Returns the mean expression value per gene
Random predictions
Documentation · Repository · Source Code · Container · build_main
Returns random training profiles
Zeros
Documentation · Repository · Source Code · Container · build_main
Returns a prediction consisting of all zeros
Solution
Documentation · Repository · Source Code · Container · build_main
Returns the ground-truth solution
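The four control methods are simple enough to sketch directly. The function names and signatures below are hypothetical, and sampling with replacement in the random control is an assumption; only the behaviour described above is taken from the text.

```python
import numpy as np


def mean_per_gene(train_y, n_test):
    """Repeat the per-feature training mean for every test cell."""
    return np.tile(train_y.mean(axis=0), (n_test, 1))


def random_predictions(train_y, n_test, seed=0):
    """Return training profiles drawn at random (with replacement, an assumption)."""
    rng = np.random.default_rng(seed)
    return train_y[rng.integers(0, train_y.shape[0], size=n_test)]


def zeros(train_y, n_test):
    """Return an all-zero prediction with the target feature dimension."""
    return np.zeros((n_test, train_y.shape[1]))


def solution(test_y):
    """Return a copy of the ground truth, the best achievable prediction."""
    return test_y.copy()


train_y = np.array([[1.0, 2.0], [3.0, 4.0]])
test_y = np.array([[5.0, 6.0]])
```

Controls like these bracket the expected score range: trivial predictions at the low end and the ground truth at the high end.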
Metric info
Mean pearson per cell
The mean of the Pearson correlation coefficients of per-cell expression value vectors (Pearson 1895).
Mean spearman per cell
The mean of the Spearman correlation coefficients of per-cell expression value vectors (Kendall 1938).
Mean pearson per gene
The mean of the Pearson correlation coefficients of per-gene expression value vectors (Pearson 1895).
Mean spearman per gene
The mean of the Spearman correlation coefficients of per-gene expression value vectors (Kendall 1938).
Overall pearson
The mean of the Pearson correlation coefficients of vectorized expression matrices (Pearson 1895).
Overall spearman
The mean of the Spearman correlation coefficients of vectorized expression matrices (Kendall 1938).
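The six correlation metrics above differ only in which vectors are compared (rows, columns, or the flattened matrix) and in whether raw values or ranks are used. The sketch below shows one way to compute them with NumPy. It is an illustration, not the benchmark's metric code: the helper names are hypothetical, the overall variant is computed as a single correlation over the flattened matrices (an assumption about what "vectorized" means), and constant vectors, which make the correlation undefined, are not handled.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two 1-D vectors (undefined for constant vectors)."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    start = 0
    while start < len(x):
        stop = start
        while stop + 1 < len(x) and sorted_x[stop + 1] == sorted_x[start]:
            stop += 1
        ranks[order[start:stop + 1]] = (start + stop) / 2 + 1
        start = stop + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def mean_corr(truth, pred, corr, axis):
    """axis=1: one correlation per cell (row); axis=0: one per gene (column)."""
    if axis == 0:
        truth, pred = truth.T, pred.T
    return float(np.mean([corr(t, p) for t, p in zip(truth, pred)]))


def overall_corr(truth, pred, corr):
    """Single correlation over the flattened (vectorized) matrices."""
    return corr(truth.ravel(), pred.ravel())


# Toy matrices (cells x genes); pred is a monotone but non-linear function of truth.
truth = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [0.0, 1.0, 5.0]])
pred = truth ** 3
per_cell_spearman = mean_corr(truth, pred, spearman, axis=1)
per_gene_pearson = mean_corr(truth, pred, pearson, axis=0)
```

Because `pred` preserves the ordering of `truth` without being linear in it, the rank-based scores are perfect while the Pearson-based scores fall short of 1, which is the practical difference between the two families.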
RMSE
The root mean squared error (Chai and Draxler 2014).
The square root of the mean of the squared errors.
MAE
The mean absolute error (Chai and Draxler 2014).
The average absolute difference between the true and predicted expression values.
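Both error metrics reduce to a few lines of NumPy. The sketch below assumes the errors are pooled over all cells and features of a dataset, which the descriptions above suggest but do not state explicitly.

```python
import numpy as np


def rmse(truth, pred):
    """Root mean squared error over all matrix entries."""
    return float(np.sqrt(np.mean((truth - pred) ** 2)))


def mae(truth, pred):
    """Mean absolute error over all matrix entries."""
    return float(np.mean(np.abs(truth - pred)))


# Toy example: the entry-wise errors are 0, 2, 0 and 4.
truth = np.array([[0.0, 2.0], [4.0, 6.0]])
pred = np.array([[0.0, 0.0], [4.0, 2.0]])
```

For these toy values the MAE is 1.5 and the RMSE is the square root of 5 (about 2.24); RMSE is larger because squaring gives the single error of 4 more weight.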
Quality control results
Category | Name | Value | Condition | Severity |
---|---|---|---|---|
Raw results | Dataset 'openproblems_neurips2022/pbmc_multiome/swap' %missing | 0.3611111 | pct_missing <= .1 | ✗✗✗ |
Dataset info | Pct 'task_id' missing | 1.0000000 | percent_missing(dataset_info, field) | ✗✗ |
Method info | Pct 'paper_reference' missing | 0.5555556 | percent_missing(method_info, field) | ✗✗ |
Metric info | Pct 'paper_reference' missing | 1.0000000 | percent_missing(metric_info, field) | ✗✗ |
Raw results | Method 'guanlab_dengkw_pm' %missing | 0.2500000 | pct_missing <= .1 | ✗✗ |
Raw results | Method 'zeros' %missing | 0.2500000 | pct_missing <= .1 | ✗✗ |
Scaling | Worst score lmds_irlba_rf overall_pearson | -2.4102000 | worst_score >= -1 | ✗✗ |
Raw results | Metric 'overall_pearson' %missing | 0.1666667 | pct_missing <= .1 | ✗ |
Raw results | Metric 'overall_spearman' %missing | 0.1666667 | pct_missing <= .1 | ✗ |
Raw results | Dataset 'openproblems_neurips2022/pbmc_multiome/normal' %missing | 0.1388889 | pct_missing <= .1 | ✗ |
Raw results | Method 'knnr_py' %missing | 0.1250000 | pct_missing <= .1 | ✗ |
Raw results | Method 'lm' %missing | 0.1250000 | pct_missing <= .1 | ✗ |
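The `pct_missing <= .1` conditions in the table can be read as a check on the fraction of expected scores that are absent for a given method, dataset or metric. The sketch below illustrates that reading; the function name, the record layout and the toy identifiers are hypothetical, and the mapping from a failed check to a severity level is not reproduced because it is not described here.

```python
from itertools import product


def pct_missing_by(results, methods, datasets, metrics, group="method_id"):
    """Fraction of expected (method, dataset, metric) scores that are absent, per group value."""
    observed = {
        (r["method_id"], r["dataset_id"], metric_id)
        for r in results
        for metric_id, score in r["scaled_scores"].items()
        if score is not None
    }
    pos = {"method_id": 0, "dataset_id": 1, "metric_id": 2}[group]
    expected, missing = {}, {}
    for combo in product(methods, datasets, metrics):
        key = combo[pos]
        expected[key] = expected.get(key, 0) + 1
        missing[key] = missing.get(key, 0) + (combo not in observed)
    return {key: missing[key] / expected[key] for key in expected}


# Toy results: method "b" lacks one metric on d1 and all of d2.
results = [
    {"method_id": "a", "dataset_id": "d1", "scaled_scores": {"m1": 0.9, "m2": 0.8}},
    {"method_id": "a", "dataset_id": "d2", "scaled_scores": {"m1": 0.7, "m2": 0.6}},
    {"method_id": "b", "dataset_id": "d1", "scaled_scores": {"m1": 0.5, "m2": None}},
]
by_method = pct_missing_by(results, ["a", "b"], ["d1", "d2"], ["m1", "m2"])
failing = {key for key, pct in by_method.items() if pct > 0.1}
```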
References
Pearson, Karl. 1895. “Note on Regression and Inheritance in the Case of Two Parents.” Proceedings of the Royal Society of London 58 (347–352): 240–42. https://doi.org/10.1098/rspl.1895.0041.
... 2024. “Predicting Cellular Profiles Across Modalities in Longitudinal Single-Cell Data: An Open Problems Competition.” In Preparation.
Chai, T., and R. R. Draxler. 2014. “Root Mean Square Error (RMSE) or Mean Absolute Error (MAE)?” February. https://doi.org/10.5194/gmdd-7-1525-2014.
Fix, Evelyn, and J. L. Hodges. 1989. “Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties.” International Statistical Review / Revue Internationale de Statistique 57 (3): 238. https://doi.org/10.2307/1403797.
Kendall, M. G. 1938. “A New Measure of Rank Correlation.” Biometrika 30 (1–2): 81–93. https://doi.org/10.1093/biomet/30.1-2.81.
Lance, Christopher, Malte D. Luecken, Daniel B. Burkhardt, Robrecht Cannoodt, Pia Rautenstrauch, Anna Laddach, Aidyn Ubingazhibov, et al. 2022. “Multimodal Single Cell Data Integration Challenge: Results and Lessons Learned,” April. https://doi.org/10.1101/2022.04.11.487796.
Luecken, Malte, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, et al. 2021. “A Sandbox for Prediction and Integration of DNA, RNA, and Proteins in Single Cells.” In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, edited by J. Vanschoren and S. Yeung. Vol. 1. Curran. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf.
Wilkinson, G. N., and C. E. Rogers. 1973. “Symbolic Description of Factorial Models for Analysis of Variance.” Applied Statistics 22 (3): 392. https://doi.org/10.2307/2346786.