Gene Regulatory Network Inference as Relaxed Graph Matching.
Deborah Weighill, Marouen Ben Guebila, Camila Lopes-Ramos, Kimberly Glass, John Quackenbush, John Platig, Rebekka Burkholz.
Gene regulatory network inference is instrumental to the discovery of genetic mechanisms driving diverse diseases, including cancer. Here, we present a theoretical framework for PANDA, an established method for gene regulatory network inference. PANDA is based on iterative message passing updates that resemble the gradient descent of an optimization problem, OTTER, which can be interpreted as relaxed inexact graph matching between a gene-gene co-expression matrix and a protein-protein interaction matrix. The solutions of OTTER can be derived explicitly and inspire an alternative spectral algorithm, for which we can provide network recovery guarantees. We compare different solution approaches of OTTER to other inference methods using three biological data sets, which we make publicly available to offer a new application venue for relaxed graph matching in gene regulatory network inference. We find that solving OTTER with modern gradient descent methods that have superior convergence properties outperforms state-of-the-art gene regulatory network inference methods in predicting the binding of transcription factors to regulatory regions.
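As a rough illustration of the gradient descent approach mentioned in the abstract, the sketch below minimizes a relaxed graph matching objective with NumPy. The objective form, (1 - lam)/4 * ||W W^T - P||_F^2 + lam/4 * ||W^T W - C||_F^2 + gamma/2 * ||W||_F^2, is an assumption that is not stated on this page. The function names, the default values of `lam`, `gamma` and `step`, and the use of plain fixed-step gradient descent are likewise illustrative choices, not the authors' implementation. W is a TF-by-gene matrix, and P and C are assumed to be symmetric.

```python
import numpy as np


def otter_objective(W, P, C, lam=0.5, gamma=0.0):
    """Assumed objective: match W W^T to P (TF-TF) and W^T W to C (gene-gene)."""
    tf_term = np.linalg.norm(W @ W.T - P, "fro") ** 2
    gene_term = np.linalg.norm(W.T @ W - C, "fro") ** 2
    ridge = np.linalg.norm(W, "fro") ** 2
    return (1 - lam) / 4 * tf_term + lam / 4 * gene_term + gamma / 2 * ridge


def otter_gradient(W, P, C, lam=0.5, gamma=0.0):
    """Gradient of the assumed objective with respect to W (P, C symmetric)."""
    return (
        (1 - lam) * (W @ W.T - P) @ W
        + lam * W @ (W.T @ W - C)
        + gamma * W
    )


def gradient_descent(W0, P, C, lam=0.5, gamma=0.0, step=1e-3, n_iter=500):
    """Plain fixed-step gradient descent starting from the initial guess W0."""
    W = W0.copy()
    for _ in range(n_iter):
        W -= step * otter_gradient(W, P, C, lam, gamma)
    return W
```

In this sketch, `W0` plays the role of the motif-based initial guess, and the step size would need tuning to the scale of P and C. An adaptive optimizer could replace the fixed-step loop without changing the objective or gradient functions.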
Several OTTER networks are available in the GRAND database. In addition, the raw data for reconstructing and benchmarking the networks are provided below.
| File | Description | Tissue / contents |
|---|---|---|
| expressed_genes_tissue.txt | Column (gene) names of the gene regulatory matrix W or the initial guess W0. They are also the node names of the correlation matrix C. | breast, cervix, liver, liver_tcga_gtex |
| expressed_tf_names_tissue.txt | Row (TF) names of the gene regulatory matrix W or the initial guess W0. They are also the node names of the protein-protein interaction matrix P. | breast, cervix, liver, liver_tcga_gtex |
| motif_prior_matrix_tissue_otter.txt | Initial gene regulatory network W0, which was constructed based on TF binding motifs in the human reference genome. Row names: TF names in the order of file expressed_tf_names_tissue.txt. Column names: gene names in the order of file expressed_genes_tissue.txt. | breast, cervix, liver, liver_tcga_gtex |
| PPI_matrix_tissue.txt | Protein-protein interaction matrix P. The node names are provided in the file expressed_tf_names_tissue.txt. | breast, cervix, liver, liver_tcga_gtex |
| tcga_tissue_TPM_otter.txt | Gene expression data. Columns refer to samples (i.e. people) and rows to genes. The gene names are defined in expressed_genes_tissue.txt. This data is used to compute the correlation matrix C. | breast, cervix, tcga_liver |
| corTissue.csv | Correlation matrix C. Rows and columns refer to genes with names defined in expressed_genes_tissue.txt. | breast, cervix, liver |
| chipseq_postive_edges_tissue.txt | Validation set. Existing edges between TFs and genes. All TFs in the first column are tested with all genes. Thus, if an edge between a TF in the first column and a gene is not listed, it was not measured and counts as non-existent in the validation. | breast, cervix, liver |
| otterTissue.txt | Gene regulatory network inferred by optimizing OTTER with gradient descent. | breast, cervix, liver |
| otterLiverTCGA.csv | Gene regulatory network inferred by optimizing OTTER with gradient descent. Liver cancer tissue. For comparison with the normal tissue network (otterLiverGTEX.csv), it has the same set of nodes. | liver_tcga |
| otterLiverGTEX.csv | Gene regulatory network inferred by optimizing OTTER with gradient descent. Normal liver tissue. For comparison with the liver cancer tissue network (otterLiverTCGA.csv), it has the same set of nodes. | liver_gtex |
| Supplementary_Figures.zip | Result figures of the GO term enrichment analysis comparing TCGA and GTEx OTTER networks. | Supplementary figures |
| Supplementary_Tables.zip | Result tables of the GO term enrichment analysis comparing TCGA and GTEx OTTER networks. | Supplementary tables |
| Otter_AAAI2021-12.pdf | Supplementary material file. | Supplementary material |
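The file descriptions above imply two small computations: deriving the correlation matrix C from the expression data, and turning the ChIP-seq positive edges into validation labels. The sketch below illustrates both on in-memory arrays. The file delimiters and exact layouts are not specified on this page, so loading is left out. The function names are hypothetical, and the use of Pearson correlation is an assumption.

```python
import numpy as np


def correlation_matrix(expression):
    """Gene-gene correlation from an expression matrix (rows: genes, columns: samples).

    Assumes Pearson correlation; np.corrcoef treats each row as one variable.
    """
    return np.corrcoef(expression)


def validation_labels(gene_names, positive_edges):
    """Build 0/1 labels for the tested TFs against all genes.

    positive_edges is a list of (tf, gene) pairs. Only TFs that appear in the
    first column are tested; any unlisted edge of a tested TF counts as 0.
    """
    tested_tfs = sorted({tf for tf, _ in positive_edges})
    tf_index = {tf: i for i, tf in enumerate(tested_tfs)}
    gene_index = {gene: j for j, gene in enumerate(gene_names)}
    labels = np.zeros((len(tested_tfs), len(gene_names)), dtype=int)
    for tf, gene in positive_edges:
        if gene in gene_index:  # ignore genes outside the expressed-gene list
            labels[tf_index[tf], gene_index[gene]] = 1
    return tested_tfs, labels
```

To score an inferred network against these labels, the rows of W belonging to `tested_tfs` would be compared with `labels`; TFs that never appear in the first column are left out of the comparison rather than treated as negatives.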
The following netbooks use OTTER: