Single-cell ATAC sequencing (scATAC-seq) is a powerful and increasingly popular technique to explore the regulatory landscape of heterogeneous cellular populations. However, the high noise levels, degree of sparsity, and scale of the generated data make its analysis challenging. Here, we present PeakVI, a probabilistic framework that leverages deep neural networks to analyze scATAC-seq data. PeakVI fits an informative latent space that preserves biological heterogeneity while correcting batch effects and accounting for technical effects, such as library size and region-specific biases. In addition, PeakVI provides a technique for identifying differential accessibility at a single-region resolution, which can be used for cell-type annotation as well as identification of key cis-regulatory elements. We use public datasets to demonstrate that PeakVI is scalable, stable, and robust to low-quality data, and that it outperforms current analysis methods on a range of critical analysis tasks. PeakVI is publicly available and implemented in the scvi-tools framework.
deep learning; single-cell ATAC-seq; single-cell chromatin accessibility; single-cell genomics
Details
Title
PeakVI: A deep generative model for single-cell chromatin accessibility analysis
Creators
Tal Ashuach - University of California, Berkeley
Daniel A. Reidenbach - University of California, Berkeley
Adam Gayoso - University of California, Berkeley
Nir Yosef (Corresponding Author) - University of California, Berkeley
We thank Florian Wimmers for many helpful discussions and insightful feedback. We thank Christina Usher for assistance with visualizations. This work was funded by Chan Zuckerberg Foundation Network grant no. 2019-02452 and NIH–NIAID grant U19 AI090023.
Author contributions - T.A. and N.Y. conceived of the model and designed the analyses. T.A. implemented the model with input from A.G. T.A. and D.A.R. performed the analyses. N.Y. supervised the work. T.A. and N.Y. wrote the manuscript.