Last updated: 2023-08-16
Checks: 6 1
Knit directory: m6A_in_disease_genetics/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230331)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
~/projects/m6A_in_disease_genetics/code/ctwas/ctwas_config_b37.R | code/ctwas/ctwas_config_b37.R |
~/projects/m6A_in_disease_genetics/code/ctwas/qiansheng/locus_plot.R | code/ctwas/qiansheng/locus_plot.R |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a1e2443. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .ipynb_checkpoints/
Ignored: analysis/m6A_switch_to_disease_h2g.nb.html
Ignored: data/plots/
Untracked files:
Untracked: HMGCR_locus_gene_tracks.pdf
Untracked: Rplots.pdf
Untracked: analysis/.ipynb_checkpoints/
Untracked: analysis/IBD_E_S_m6A.Rmd
Untracked: analysis/IBD_E_S_m6A_output.Rmd
Untracked: analysis/LDL_E_S_m6A.Rmd
Untracked: analysis/LDL_m6A_output.Rmd
Untracked: analysis/RA_m6A_output.Rmd
Untracked: analysis/WhiteBlood_WholeBlood_E_M.Rmd
Untracked: analysis/identify_m6A_mechanisms_with_finemapping.Rmd
Untracked: analysis/lymph_m6A_output.Rmd
Untracked: analysis/pre_weights_m6AQTL.txt
Untracked: analysis/rbc_E_S_m6A_output.Rmd
Untracked: analysis/rbc_m6A_output.Rmd
Untracked: analysis/wbc_E_S_m6A_output.Rmd
Untracked: code/.ipynb_checkpoints/
Untracked: code/all_m6a_sites_with_paired_cisNATs_summary.csv
Untracked: code/annotating_fine-mapped_m6A_QTLs.Rmd
Untracked: code/check_double_strand.ipynb
Untracked: code/check_double_strand_v2.ipynb
Untracked: code/ctwas/
Untracked: code/figure/
Untracked: code/learn_gviz.Rmd
Untracked: code/learn_gviz.html
Untracked: code/learn_gviz.nb.html
Untracked: code/m6AQTL_finemapping.Rmd
Untracked: code/summary_TWAS_coloc_m6A_2023.Rmd
Untracked: code/test_gviz.ipynb
Untracked: code/twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Untracked: data/.ipynb_checkpoints/
Untracked: data/ADCY7_gwas_input.tsv
Untracked: data/ADCY7_qtl_input.tsv
Untracked: data/Allergy_full_coloc.txt
Untracked: data/Asthma_full_coloc.txt
Untracked: data/CAD_full_coloc.txt
Untracked: data/Eosinophil_count_full_coloc.txt
Untracked: data/GSE125377_jointPeakReadCount.txt
Untracked: data/HMGCR_ctwas_dat.Rd
Untracked: data/IBD_full_coloc.txt
Untracked: data/JointPeaks.bed
Untracked: data/Li2022_dsRNAs.xlsx
Untracked: data/Lupus_full_coloc.txt
Untracked: data/RA_full_coloc.txt
Untracked: data/TABLE1_hg19.txt
Untracked: data/TABLE1_hg19.txt.zip
Untracked: data/__MACOSX/
Untracked: data/coloc_blood_traits.csv
Untracked: data/crohns_disease_full_coloc.txt
Untracked: data/edit_sites_and_GE_neg_correlated.txt
Untracked: data/edit_sites_and_GE_pos_correlated.txt
Untracked: data/features
Untracked: data/human_EERs.csv
Untracked: data/human_EERs.txt
Untracked: data/lymph_full_coloc.txt
Untracked: data/m6A_TWAS_results.csv
Untracked: data/m6a_TWAS_genes.txt
Untracked: data/m6a_joint_calling_peaks.csv
Untracked: data/nasser_2021_ABC_IBD_genes.txt
Untracked: data/nat_sense_pairs.csv
Untracked: data/plt_full_coloc.txt
Untracked: data/rbc_full_coloc.txt
Untracked: data/rdw_full_coloc.txt
Untracked: data/reported_AS_targets_S1.txt
Untracked: data/reported_AS_wanowska.txt
Untracked: data/sig_coloc_results/
Untracked: data/test_locuscomparer.pdf
Untracked: data/ulcerative_colitis_full_coloc.txt
Untracked: data/wbc_full_coloc.txt
Untracked: data/zhao_silver_genes.csv
Untracked: output/.ipynb_checkpoints/
Untracked: output/HMGCR_gene_track_plot.pdf
Untracked: output/HMGCR_locus_plot.pdf
Untracked: output/all_m6a_sites_with_cisNATs.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary_PP40.3.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary_PP40.5.csv
Untracked: output/all_m6a_sites_with_paired_cis_NATs.csv
Untracked: output/fine_mapped_m6AQTLs_TWAS_genes_highPP4.rds
Untracked: output/gene_summary.csv
Untracked: output/immune_related_m6A_targets.csv
Untracked: output/m6aQTL_dsRNAs_PPP2R3C_PRORP.pdf
Untracked: output/m6a_peaks_nearby_dsRNAs.csv
Untracked: output/m6a_sites_near_all_dsRNAs_twas.csv
Untracked: output/m6a_sites_near_dsRNAs_coloc.csv
Untracked: output/m6a_sites_near_dsRNAs_twas.csv
Untracked: output/m6a_sites_near_dsRNAs_twas_summary.csv
Untracked: output/m6a_sites_overlapping_NAT_twas.csv
Untracked: output/m6a_sites_overlapping_dsRNAs_coloc.csv
Untracked: output/m6a_sites_overlapping_dsRNAs_twas.csv
Untracked: output/m6a_sites_overlapping_dsRegions.csv
Untracked: output/m6a_sites_overlapping_dsRegions_coloc.csv
Untracked: output/negatively_correlated_genes.txt
Untracked: output/postively_correlated_genes.txt
Untracked: output/rs1806261_RABEP1-NUP88_focused_locusview.pdf
Untracked: output/rs1806261_RABEP1-NUP88_locusview.pdf
Untracked: output/rs3177647_MAPKAPK5-AS1-MAPKAPK5_locusview.pdf
Untracked: output/rs3204541_DDX55-EIF2B1_locusview.pdf
Untracked: output/rs7184802_ADCY7-BRD7_locusview.pdf
Untracked: output/rs7184802_ADCY7_locuscompare.pdf
Untracked: output/twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Untracked: output/twas_genes_PP4_0.5_blood_traits_trackplots.pdf
Untracked: output/twas_m6a_sites_with_all_cisNATs.RDS
Untracked: output/twas_m6a_sites_with_cisNATs_range.RDS
Untracked: output/twas_m6a_sites_with_the_nearest_cisNAT.RDS
Untracked: twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Unstaged changes:
Modified: analysis/m6A_switch_to_disease_h2g.Rmd
Modified: analysis/wbc_m6A_output.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/rbc_m6A_output_hg19.Rmd
)
and HTML (docs/rbc_m6A_output_hg19.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a1e2443 | Jing Gu | 2023-08-16 | added rbc |
# top 1 method
res <- impute_expr_z(z_snp, weight = weight, ld_R_dir = ld_R_dir,
method = NULL, outputdir = outputdir, outname = outname.e,
harmonize_z = T, harmonize_wgt = T, scale_by_ld_variance=F,
strand_ambig_action_z = "recover",
recover_strand_ambig_wgt = T
# lasso/elastic-net method
res <- impute_expr_z(z_snp, weight = weight, ld_R_dir = ld_R_dir,
method = NULL, outputdir = outputdir, outname = outname.e,
harmonize_z = T, harmonize_wgt = T, scale_by_ld_variance=F,
strand_ambig_action_z = "none",
recover_strand_ambig_wgt = F
GWAS: UK Biobank GWAS summary statistics - European individuals
Weights: FUSION weights using top1, lasso, or elastic-net models were converted into PredictDB format and were not needed to do scaling when running ctwas.
cTWAS analysis on m6A alone
[1] "Check convergence for the top1 model:"
[1] "Table of group size:"
SNP gene
8713250 888
SNP gene
estimated_group_prior 2.452e-04 2.534e-08
estimated_group_prior_var 2.551e+01 1.357e+01
estimated_group_pve 1.555e-01 8.711e-10
attributable_group_pve 1.000e+00 5.601e-09
[1] "Check convergence for the lasso model:"
[1] "Table of group size:"
SNP gene
8713250 912
SNP gene
estimated_group_prior 2.389e-04 1.809e-06
estimated_group_prior_var 2.384e+01 2.900e+01
estimated_group_pve 1.416e-01 1.365e-07
attributable_group_pve 1.000e+00 9.643e-07
$top1
$lasso
Joint analysis of expression, splicing and m6A
[1] "Check convergence for the top1 model when jointly analyzing expression, splicing and m6A:"
[1] "Table of group size before/after matching with UKBB SNPs:"
SNP eQTL sQTL m6AQTL
prior_group_size 9.324e+06 2005.0000 2191.000 918.0000
group_size 8.713e+06 1928.0000 2123.000 888.0000
percent_of_overlaps 9.345e-01 0.9616 0.969 0.9673
SNP eQTL sQTL m6AQTL
estimated_group_prior 2.364e-04 0.019161 8.127e-03 3.872e-09
estimated_group_prior_var 2.553e+01 28.495736 1.868e+01 8.508e+00
estimated_group_pve 1.500e-01 0.003004 9.196e-04 8.348e-11
attributable_group_pve 9.745e-01 0.019509 5.973e-03 5.422e-10
[1] "Check convergence for the lasso model when jointly analyzing expression, splicing and m6A:"
[1] "Table of group size before/after matching with UKBB SNPs:"
SNP eQTL sQTL m6AQTL
prior_group_size 9.324e+06 2005.0000 2191.000 918.0000
group_size 8.713e+06 1998.0000 2180.000 912.0000
percent_of_overlaps 9.345e-01 0.9965 0.995 0.9935
SNP eQTL sQTL m6AQTL
estimated_group_prior 2.143e-04 0.013424 2.429e-04 9.197e-11
estimated_group_prior_var 2.336e+01 62.619390 1.820e+01 2.083e+01
estimated_group_pve 1.245e-01 0.004792 2.749e-05 4.986e-12
attributable_group_pve 9.627e-01 0.037062 2.126e-04 3.856e-11
$top1
$lasso
Lasso model
[1] genename region_tag susie_pip z
<0 rows> (or 0-length row.names)
Summing up PIPs for m6A peaks located in the same gene
Top m6A PIPs by genes
# A tibble: 0 × 2
# ℹ 2 variables: genename <chr>, total_susie_pip <dbl>
For m6A or splicing QTLs, they are assigned to the nearest genes (m6A needs to be confirmed with Kevin).
Top SNPs or genes with PIP > 0.6
$eQTL
genename susie_pip group region_tag
1914 ADAM15 0.9934 eQTL 1_79
1921 STK36 0.9925 eQTL 2_129
1995 ENSG00000205559 0.9801 eQTL 22_24
1924 SMIM19 0.9462 eQTL 8_37
1929 ENSG00000180139 0.8333 eQTL 10_57
1933 RBMS2 0.7782 eQTL 12_36
1527 SKAP1 0.7562 eQTL 17_28
1979 ELL 0.7499 eQTL 19_16
749 TSC22D4 0.6867 eQTL 7_61
1922 NSUN2 0.6741 eQTL 5_6
1202 GPN3 0.6468 eQTL 12_67
1368 CSNK1G1 0.6460 eQTL 15_29
912 PTPA 0.6099 eQTL 9_66
$m6AQTL
[1] genename susie_pip group region_tag
<0 rows> (or 0-length row.names)
$sQTL
[1] genename susie_pip group region_tag
<0 rows> (or 0-length row.names)
genename region_tag susie_pip z
1 TAP2 6_27 3.170e-08 -6.439
2 C2CD2L 11_71 1.818e-08 4.267
3 PSKH1 16_37 1.489e-08 -11.068
4 FERMT3 11_36 1.443e-08 -3.254
5 DHX38 16_38 1.261e-08 3.456
6 ZNF282 7_92 1.259e-08 3.636
7 ISG20 15_41 1.254e-08 -3.563
8 CD2BP2 16_24 9.926e-09 11.194
9 PARVB 22_19 7.998e-09 2.964
10 SMG9 19_30 6.096e-09 6.864
Summing up PIPs for m6A peaks located in the same gene
Top 10 m6A PIPs by genes
# A tibble: 819 × 2
genename total_susie_pip
<chr> <dbl>
1 TAP2 0.0000000317
2 C2CD2L 0.0000000182
3 PSKH1 0.0000000149
4 FERMT3 0.0000000144
5 ISG20 0.0000000138
6 DHX38 0.0000000126
7 ZNF282 0.0000000126
8 CD2BP2 0.00000000993
9 PARVB 0.00000000800
10 SMG9 0.00000000610
# ℹ 809 more rows
peak_id genename pos region_tag susie_pip z
1 chr20:25204793-25205798 ENTPD6 25206654 20_19 0.31803 5.780
2 chr1:155679615-155686797 DAP3 155666961 1_79 0.03517 -1.429
3 chr11:66053043-66053172 YIF1A 65998757 11_37 0.03010 -2.807
4 chr15:75197572-75198619 FAM219B 75125645 15_35 0.02974 8.047
5 chr16:31503661-31504299 RUSF1 31470540 16_25 0.02597 -3.560
6 chr19:5717627-5719715 LONP1 5655792 19_5 0.02525 3.858
7 chr19:5714282-5719715 LONP1 5655792 19_5 0.02253 -3.822
8 chr8:38852924-38853732 TM2D2 38762056 8_35 0.02052 4.076
9 chr1:155658965-155664582 DAP3 155625602 1_79 0.01965 -1.188
10 chr1:93815558-93819465 DR1 93816400 1_57 0.01954 -6.505
Summing up PIPs for spliced introns located in the same gene
Top 10 splicing PIPs by genes
# A tibble: 10 × 2
genename total_susie_pip
<chr> <dbl>
1 ENTPD6 0.318
2 DAP3 0.0553
3 LONP1 0.0478
4 DR1 0.0384
5 CIAO2A 0.0361
6 ENOSF1 0.0311
7 ATXN2L 0.0306
8 YIF1A 0.0304
9 FAM219B 0.0297
10 TRAF3IP3 0.0272
genename combined_pip expression_pip splicing_pip m6A_pip
81 ADAM15 0.993 0.9934 0.0000000 0.000e+00
2862 STK36 0.993 0.9925 0.0000000 0.000e+00
900 ENSG00000205559 0.980 0.9801 0.0000000 0.000e+00
2760 SMIM19 0.946 0.9462 0.0000000 0.000e+00
874 ENSG00000180139 0.833 0.8333 0.0000000 0.000e+00
2459 RBMS2 0.778 0.7782 0.0000000 0.000e+00
2687 SKAP1 0.760 0.7562 0.0037724 0.000e+00
826 ELL 0.750 0.7499 0.0000000 0.000e+00
3120 TSC22D4 0.687 0.6867 0.0000000 0.000e+00
2070 NSUN2 0.676 0.6741 0.0024136 0.000e+00
1420 GPN3 0.647 0.6468 0.0000000 0.000e+00
624 CSNK1G1 0.646 0.6460 0.0000000 4.822e-09
2386 PTPA 0.610 0.6099 0.0000000 0.000e+00
829 ELOA 0.579 0.5790 0.0000000 0.000e+00
2758 SMG9 0.572 0.5724 0.0000000 6.096e-09
460 CD22 0.569 0.5691 0.0001497 5.656e-11
306 BIK 0.552 0.5519 0.0000000 0.000e+00
1874 MPI 0.544 0.5259 0.0176415 0.000e+00
945 ENSG00000226526 0.519 0.5187 0.0000000 0.000e+00
665 DAP 0.500 0.4977 0.0018242 5.655e-10
region_tag
81 1_79
2862 2_129
900 22_24
2760 8_37
874 10_57
2459 12_36
2687 17_28
826 19_16
3120 7_61
2070 5_6
1420 12_67
624 15_29
2386 9_66
829 1_17
2758 19_30
460 19_25
306 22_18
1874 15_35
945 1_12
665 5_9
Loading required package: grid
Warning: replacing previous import 'utils::download.file' by
'restfulr::download.file' when loading 'rtracklayer'
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] biomaRt_2.52.0 Gviz_1.40.1 cowplot_1.1.1
[4] ggplot2_3.4.3 GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
[7] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
[10] ctwas_0.1.38 dplyr_1.1.2 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 deldir_1.0-6
[3] rjson_0.2.21 rprojroot_2.0.3
[5] biovizBase_1.44.0 htmlTable_2.4.0
[7] XVector_0.36.0 base64enc_0.1-3
[9] fs_1.6.3 dichromat_2.0-0.1
[11] rstudioapi_0.15.0 farver_2.1.1
[13] bit64_4.0.5 AnnotationDbi_1.58.0
[15] fansi_1.0.4 xml2_1.3.3
[17] codetools_0.2-18 logging_0.10-108
[19] cachem_1.0.8 knitr_1.39
[21] Formula_1.2-4 jsonlite_1.8.7
[23] Rsamtools_2.12.0 cluster_2.1.3
[25] dbplyr_2.3.3 png_0.1-7
[27] compiler_4.2.0 httr_1.4.6
[29] backports_1.4.1 lazyeval_0.2.2
[31] Matrix_1.6-1 fastmap_1.1.1
[33] cli_3.6.1 later_1.3.0
[35] htmltools_0.5.2 prettyunits_1.1.1
[37] tools_4.2.0 gtable_0.3.3
[39] glue_1.6.2 GenomeInfoDbData_1.2.8
[41] rappdirs_0.3.3 Rcpp_1.0.11
[43] Biobase_2.56.0 jquerylib_0.1.4
[45] vctrs_0.6.3 Biostrings_2.64.0
[47] rtracklayer_1.56.0 iterators_1.0.14
[49] xfun_0.30 stringr_1.5.0
[51] ps_1.7.0 lifecycle_1.0.3
[53] ensembldb_2.20.2 restfulr_0.0.14
[55] XML_3.99-0.14 getPass_0.2-2
[57] zlibbioc_1.42.0 scales_1.2.1
[59] BSgenome_1.64.0 VariantAnnotation_1.42.1
[61] ProtGenerics_1.28.0 hms_1.1.3
[63] promises_1.2.0.1 MatrixGenerics_1.8.0
[65] parallel_4.2.0 SummarizedExperiment_1.26.1
[67] AnnotationFilter_1.20.0 RColorBrewer_1.1-3
[69] yaml_2.3.5 curl_5.0.2
[71] memoise_2.0.1 gridExtra_2.3
[73] sass_0.4.1 rpart_4.1.16
[75] latticeExtra_0.6-30 stringi_1.7.12
[77] RSQLite_2.3.1 highr_0.9
[79] BiocIO_1.6.0 foreach_1.5.2
[81] checkmate_2.1.0 GenomicFeatures_1.48.4
[83] filelock_1.0.2 BiocParallel_1.30.3
[85] rlang_1.1.1 pkgconfig_2.0.3
[87] matrixStats_0.62.0 bitops_1.0-7
[89] evaluate_0.15 lattice_0.20-45
[91] htmlwidgets_1.5.4 GenomicAlignments_1.32.0
[93] labeling_0.4.2 bit_4.0.5
[95] processx_3.8.0 tidyselect_1.2.0
[97] magrittr_2.0.3 R6_2.5.1
[99] generics_0.1.3 Hmisc_5.1-0
[101] DelayedArray_0.22.0 DBI_1.1.3
[103] pgenlibr_0.3.6 pillar_1.9.0
[105] whisker_0.4 foreign_0.8-82
[107] withr_2.5.0 KEGGREST_1.36.2
[109] RCurl_1.98-1.7 nnet_7.3-17
[111] tibble_3.2.1 crayon_1.5.2
[113] interp_1.1-4 utf8_1.2.3
[115] BiocFileCache_2.4.0 rmarkdown_2.14
[117] jpeg_0.1-10 progress_1.2.2
[119] data.table_1.14.8 blob_1.2.4
[121] callr_3.7.3 git2r_0.30.1
[123] digest_0.6.33 httpuv_1.6.5
[125] munsell_0.5.0 bslib_0.3.1