Published (Version of record), CC BY V4.0, Open Access
Abstract
Non-coding genetic variants/mutations can play functional roles in the cell by disrupting regulatory interactions between transcription factors (TFs) and their genomic target sites. For most human TFs, a myriad of DNA-binding models are available and could be used to predict the effects of DNA mutations on TF binding. However, information on the quality of these models is scarce, making it hard to evaluate the statistical significance of predicted binding changes. Here, we present QBiC-Pred, a web server for predicting quantitative TF binding changes due to nucleotide variants. QBiC-Pred uses regression models of TF binding specificity trained on high-throughput in vitro data. The training is done using ordinary least squares (OLS), and we leverage distributional results associated with OLS estimation to compute, for each predicted change in TF binding, a P-value reflecting our confidence in the predicted effect. We show that OLS models are accurate in predicting the effects of mutations on TF binding in vitro and in vivo, outperforming widely used position weight matrix (PWM) models as well as recently developed deep learning models of specificity. QBiC-Pred takes as input mutation datasets in several formats, and it allows post-processing of the results through a user-friendly web interface. QBiC-Pred is freely available at http://qbic.genome.duke.edu.
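To make the OLS idea concrete, the following is a minimal sketch of how a predicted binding change and an associated P-value could be derived from a k-mer regression model. It is not the QBiC-Pred implementation. The k-mer length (2), the simulated training data, the noise level, and all function names are assumptions chosen for illustration, and the P-value uses a normal approximation to the OLS test statistic rather than an exact t distribution.

```python
import math
import random
from itertools import product

K = 2  # toy k-mer length; an assumption made to keep the example small
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq):
    """Count every overlapping k-mer in seq (one feature per k-mer)."""
    x = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        x[INDEX[seq[i:i + K]]] += 1.0
    return x


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ols(X, y):
    """Return OLS coefficients, (X^T X)^-1, and the residual variance estimate."""
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, xtx_inv, rss / (n - p)


def binding_change(wild, mutant, beta, xtx_inv, sigma2):
    """Predicted change (mutant minus wild type) and a two-sided P-value.

    The change is a linear combination c^T beta of the coefficients, so its
    variance is sigma^2 * c^T (X^T X)^-1 c. The P-value uses a normal
    approximation, which is an assumption of this sketch.
    """
    c = [m - w for m, w in zip(kmer_counts(mutant), kmer_counts(wild))]
    diff = sum(ci * bi for ci, bi in zip(c, beta))
    var = sigma2 * sum(c[i] * xtx_inv[i][j] * c[j]
                       for i in range(len(c)) for j in range(len(c)))
    if var <= 0.0:  # identical sequences: no change to test
        return diff, 1.0
    z = diff / math.sqrt(var)
    return diff, math.erfc(abs(z) / math.sqrt(2.0))


# Simulated training data (hypothetical): only the "CG" 2-mer affects binding.
rng = random.Random(0)
true_beta = [0.0] * len(KMERS)
true_beta[INDEX["CG"]] = 2.0
seqs = ["".join(rng.choice("ACGT") for _ in range(10)) for _ in range(400)]
X = [kmer_counts(s) for s in seqs]
y = [sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0.0, 0.1) for row in X]

beta, xtx_inv, sigma2 = fit_ols(X, y)

# A C>T substitution that destroys the "CG" 2-mer in the toy wild-type sequence.
diff, p_value = binding_change("AAACGAAA", "AAATGAAA", beta, xtx_inv, sigma2)
print(f"predicted change = {diff:.3f}, P-value = {p_value:.3g}")
```

In this toy setting the substitution removes the only k-mer that contributes to the simulated signal, so the predicted change should be close to -2 with a very small P-value. A substitution between k-mers with no simulated effect would be expected to give a change near zero and a P-value that is not significant.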
Acknowledgements
The authors would like to thank Brendan Frey, Leo J. Lee and Alice Gao (University of Toronto) for help with training DeepBind PBM models on the same datasets used for our OLS models. High-performance computing was partially supported by the Duke Center for Genomic and Computational Biology.
Funding
National Institutes of Health [R01-GM117106 to R.G.]; National Science Foundation [MCB-1715589 to R.G.]. Funding for open access charge: NIH Grant [R01-GM117106].