Last updated: 2023-08-17
Checks: 6 1
Knit directory: m6A_in_disease_genetics/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230331)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
~/projects/m6A_in_disease_genetics/code/ctwas/ctwas_config_b37.R | code/ctwas/ctwas_config_b37.R |
~/projects/m6A_in_disease_genetics/code/ctwas/qiansheng/locus_plot.R | code/ctwas/qiansheng/locus_plot.R |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 06e2427. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .ipynb_checkpoints/
Ignored: analysis/m6A_switch_to_disease_h2g.nb.html
Ignored: data/plots/
Untracked files:
Untracked: HMGCR_locus_gene_tracks.pdf
Untracked: Rplots.pdf
Untracked: analysis/.ipynb_checkpoints/
Untracked: analysis/IBD_E_S_m6A.Rmd
Untracked: analysis/IBD_E_S_m6A_output.Rmd
Untracked: analysis/LDL_E_S_m6A.Rmd
Untracked: analysis/LDL_m6A_output.Rmd
Untracked: analysis/RA_m6A_output.Rmd
Untracked: analysis/WhiteBlood_WholeBlood_E_M.Rmd
Untracked: analysis/identify_m6A_mechanisms_with_finemapping.Rmd
Untracked: analysis/lymph_m6A_output.Rmd
Untracked: analysis/pre_weights_m6AQTL.txt
Untracked: analysis/rbc_E_S_m6A_output.Rmd
Untracked: analysis/rbc_m6A_output.Rmd
Untracked: analysis/wbc_E_S_m6A_output.Rmd
Untracked: code/.ipynb_checkpoints/
Untracked: code/all_m6a_sites_with_paired_cisNATs_summary.csv
Untracked: code/annotating_fine-mapped_m6A_QTLs.Rmd
Untracked: code/check_double_strand.ipynb
Untracked: code/check_double_strand_v2.ipynb
Untracked: code/ctwas/
Untracked: code/figure/
Untracked: code/learn_gviz.Rmd
Untracked: code/learn_gviz.html
Untracked: code/learn_gviz.nb.html
Untracked: code/m6AQTL_finemapping.Rmd
Untracked: code/summary_TWAS_coloc_m6A_2023.Rmd
Untracked: code/test_gviz.ipynb
Untracked: code/twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Untracked: data/.ipynb_checkpoints/
Untracked: data/ADCY7_gwas_input.tsv
Untracked: data/ADCY7_qtl_input.tsv
Untracked: data/Allergy_full_coloc.txt
Untracked: data/Asthma_full_coloc.txt
Untracked: data/CAD_full_coloc.txt
Untracked: data/Eosinophil_count_full_coloc.txt
Untracked: data/GSE125377_jointPeakReadCount.txt
Untracked: data/HMGCR_ctwas_dat.Rd
Untracked: data/IBD_full_coloc.txt
Untracked: data/JointPeaks.bed
Untracked: data/Li2022_dsRNAs.xlsx
Untracked: data/Lupus_full_coloc.txt
Untracked: data/RA_full_coloc.txt
Untracked: data/TABLE1_hg19.txt
Untracked: data/TABLE1_hg19.txt.zip
Untracked: data/__MACOSX/
Untracked: data/coloc_blood_traits.csv
Untracked: data/crohns_disease_full_coloc.txt
Untracked: data/edit_sites_and_GE_neg_correlated.txt
Untracked: data/edit_sites_and_GE_pos_correlated.txt
Untracked: data/features
Untracked: data/human_EERs.csv
Untracked: data/human_EERs.txt
Untracked: data/lymph_full_coloc.txt
Untracked: data/m6A_TWAS_results.csv
Untracked: data/m6a_TWAS_genes.txt
Untracked: data/m6a_joint_calling_peaks.csv
Untracked: data/nasser_2021_ABC_IBD_genes.txt
Untracked: data/nat_sense_pairs.csv
Untracked: data/plt_full_coloc.txt
Untracked: data/rbc_full_coloc.txt
Untracked: data/rdw_full_coloc.txt
Untracked: data/reported_AS_targets_S1.txt
Untracked: data/reported_AS_wanowska.txt
Untracked: data/sig_coloc_results/
Untracked: data/test_locuscomparer.pdf
Untracked: data/ulcerative_colitis_full_coloc.txt
Untracked: data/wbc_full_coloc.txt
Untracked: data/zhao_silver_genes.csv
Untracked: output/.ipynb_checkpoints/
Untracked: output/HMGCR_gene_track_plot.pdf
Untracked: output/HMGCR_locus_plot.pdf
Untracked: output/all_m6a_sites_with_cisNATs.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary_PP40.3.csv
Untracked: output/all_m6a_sites_with_paired_cisNATs_summary_PP40.5.csv
Untracked: output/all_m6a_sites_with_paired_cis_NATs.csv
Untracked: output/fine_mapped_m6AQTLs_TWAS_genes_highPP4.rds
Untracked: output/gene_summary.csv
Untracked: output/immune_related_m6A_targets.csv
Untracked: output/m6aQTL_dsRNAs_PPP2R3C_PRORP.pdf
Untracked: output/m6a_peaks_nearby_dsRNAs.csv
Untracked: output/m6a_sites_near_all_dsRNAs_twas.csv
Untracked: output/m6a_sites_near_dsRNAs_coloc.csv
Untracked: output/m6a_sites_near_dsRNAs_twas.csv
Untracked: output/m6a_sites_near_dsRNAs_twas_summary.csv
Untracked: output/m6a_sites_overlapping_NAT_twas.csv
Untracked: output/m6a_sites_overlapping_dsRNAs_coloc.csv
Untracked: output/m6a_sites_overlapping_dsRNAs_twas.csv
Untracked: output/m6a_sites_overlapping_dsRegions.csv
Untracked: output/m6a_sites_overlapping_dsRegions_coloc.csv
Untracked: output/negatively_correlated_genes.txt
Untracked: output/postively_correlated_genes.txt
Untracked: output/rs1806261_RABEP1-NUP88_focused_locusview.pdf
Untracked: output/rs1806261_RABEP1-NUP88_locusview.pdf
Untracked: output/rs3177647_MAPKAPK5-AS1-MAPKAPK5_locusview.pdf
Untracked: output/rs3204541_DDX55-EIF2B1_locusview.pdf
Untracked: output/rs7184802_ADCY7-BRD7_locusview.pdf
Untracked: output/rs7184802_ADCY7_locuscompare.pdf
Untracked: output/twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Untracked: output/twas_genes_PP4_0.5_blood_traits_trackplots.pdf
Untracked: output/twas_m6a_sites_with_all_cisNATs.RDS
Untracked: output/twas_m6a_sites_with_cisNATs_range.RDS
Untracked: output/twas_m6a_sites_with_the_nearest_cisNAT.RDS
Untracked: twas_genes_PP4_0.3_immune_traits_trackplots.pdf
Unstaged changes:
Modified: analysis/lymph_m6A_output_hg19.Rmd
Modified: analysis/m6A_switch_to_disease_h2g.Rmd
Modified: analysis/wbc_m6A_output.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/lupus_m6A_output_hg19.Rmd
)
and HTML (docs/lupus_m6A_output_hg19.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 06e2427 | Jing Gu | 2023-08-17 | added lupus |
# top 1 method
res <- impute_expr_z(z_snp, weight = weight, ld_R_dir = ld_R_dir,
method = NULL, outputdir = outputdir, outname = outname.e,
harmonize_z = T, harmonize_wgt = T, scale_by_ld_variance=F,
strand_ambig_action_z = "recover",
recover_strand_ambig_wgt = T
# lasso/elastic-net method
res <- impute_expr_z(z_snp, weight = weight, ld_R_dir = ld_R_dir,
method = NULL, outputdir = outputdir, outname = outname.e,
harmonize_z = T, harmonize_wgt = T, scale_by_ld_variance=F,
strand_ambig_action_z = "none",
recover_strand_ambig_wgt = F
GWAS: UK Biobank GWAS summary statistics - European individuals
Weights: FUSION weights using top1, lasso, or elastic-net models were converted into PredictDB format and were not needed to do scaling when running ctwas.
cTWAS analysis on m6A alone
[1] "Check convergence for the top1 model:"
[1] "Table of group size:"
SNP gene
6731160 801
SNP gene
estimated_group_prior 0.0002642 0.0010266
estimated_group_prior_var 5.2751345 1.8792046
estimated_group_pve 0.6574782 0.0001083
attributable_group_pve 0.9998353 0.0001647
$top1
Joint analysis of expression, splicing and m6A
[1] "Check convergence for the lasso model when jointly analyzing expression, splicing and m6A:"
[1] "Table of group size before/after matching with UKBB SNPs:"
SNP eQTL sQTL m6AQTL
prior_group_size 9.324e+06 2005.0000 2191.0000 918.0000
group_size 6.731e+06 1984.0000 2149.0000 902.0000
percent_of_overlaps 7.219e-01 0.9895 0.9808 0.9826
SNP eQTL sQTL m6AQTL
estimated_group_prior 0.0002261 0.017777 0.002642 0.004184
estimated_group_prior_var 6.3614278 4.000501 3.878062 22.909135
estimated_group_pve 0.6787153 0.009889 0.001544 0.006061
attributable_group_pve 0.9748732 0.014205 0.002217 0.008705
$lasso
top1 model
[1] genename region_tag susie_pip z
<0 rows> (or 0-length row.names)
Summing up PIPs for m6A peaks located in the same gene
Top m6A PIPs by genes
# A tibble: 0 × 2
# ℹ 2 variables: genename <chr>, total_susie_pip <dbl>
For m6A or splicing QTLs, they are assigned to the nearest genes (m6A needs to be confirmed with Kevin).
Top SNPs or genes with PIP > 0.6
$eQTL
genename susie_pip group region_tag
1970 JADE2 0.6801 eQTL 5_80
$m6AQTL
genename susie_pip group region_tag
5032 MIR210HG 0.906 m6AQTL 11_1
$sQTL
[1] genename susie_pip group region_tag
<0 rows> (or 0-length row.names)
genename region_tag susie_pip z
1 MIR210HG 11_1 0.90599 6.8092
2 IRF5 7_80 0.16925 0.6668
3 ICOSLG 21_23 0.09542 -3.1780
4 SH2D3C 9_66 0.09282 3.0549
5 IRF3 19_34 0.07407 -3.6731
6 SCIMP 17_5 0.07083 2.9299
7 PSMB1 6_112 0.06819 -2.8428
8 MKRN2 3_9 0.06440 2.8379
9 SNRNP25 16_1 0.06333 -2.8226
10 DENND3 8_92 0.06004 2.5839
Summing up PIPs for m6A peaks located in the same gene
Top 10 m6A PIPs by genes
# A tibble: 809 × 2
genename total_susie_pip
<chr> <dbl>
1 MIR210HG 0.906
2 IRF5 0.169
3 SH2D3C 0.131
4 ICOSLG 0.101
5 IRF3 0.0741
6 SCIMP 0.0708
7 PSMB1 0.0682
8 MKRN2 0.0644
9 SNRNP25 0.0633
10 DENND3 0.0600
# ℹ 799 more rows
peak_id genename pos region_tag susie_pip z
1 chr22:42000157-42001876 DESI1 41914593 22_17 0.21438 3.351
2 chr20:44750537-44750690 CD40 44674743 20_28 0.15765 -3.713
3 chr14:24902238-24903306 KHNYN 24810413 14_4 0.14932 3.663
4 chr1:150240527-150241098 APH1A 150210120 1_75 0.14692 -3.263
5 chr19:50166515-50167931 IRF3 50093572 19_34 0.12050 -4.125
6 chr6:6589103-6625159 LY86 6521970 6_6 0.11113 3.131
7 chr12:6878840-6879010 PTMS 6870549 12_7 0.10886 -2.963
8 chr19:50168103-50168929 IRF3 50168871 19_34 0.09584 4.026
9 chr3:16305736-16306276 DPH3 16395668 3_12 0.09063 -2.866
10 chr20:5090091-5092142 TMEM230 5055138 20_4 0.08408 -3.286
Summing up PIPs for spliced introns located in the same gene
Top 10 splicing PIPs by genes
# A tibble: 10 × 2
genename total_susie_pip
<chr> <dbl>
1 CD40 0.309
2 IFI44L 0.307
3 WARS1 0.278
4 IRF3 0.223
5 DESI1 0.214
6 TMEM230 0.206
7 NADSYN1 0.177
8 LBP 0.163
9 PDCD2 0.163
10 MGST3 0.163
genename combined_pip expression_pip splicing_pip m6A_pip
1838 MIR210HG 0.906 0.00000 0.00000 0.905991
1602 JADE2 0.704 0.68005 0.02365 0.000000
1098 ENSG00000257073 0.488 0.48833 0.00000 0.000000
304 BIK 0.481 0.48108 0.00000 0.000000
3215 WARS1 0.468 0.19047 0.27791 0.000000
1315 FCRLA 0.419 0.37282 0.04602 0.000000
269 AZI2 0.378 0.37755 0.00000 0.000000
2960 TMA16 0.371 0.37074 0.00000 0.000000
760 DRAM2 0.360 0.24891 0.11092 0.000000
2763 SNX11 0.351 0.34579 0.00000 0.004929
1030 ENSG00000244625 0.348 0.34797 0.00000 0.000000
1954 NADSYN1 0.342 0.16519 0.17699 0.000000
694 DEF6 0.332 0.33198 0.00000 0.000000
1542 IFI44L 0.320 0.00000 0.30674 0.013162
467 CD40 0.309 0.00000 0.30888 0.000000
640 CTTNBP2NL 0.305 0.30490 0.00000 0.000000
2215 PITPNC1 0.303 0.30273 0.00000 0.000000
181 AP5B1 0.298 0.29767 0.00000 0.000000
2262 POLR1E 0.298 0.25222 0.01010 0.035767
1584 IRF3 0.297 0.00000 0.22294 0.074067
2554 RPS19BP1 0.297 0.29678 0.00000 0.000000
2965 TMCO4 0.296 0.29551 0.00000 0.000000
2006 NFRKB 0.288 0.28830 0.00000 0.000000
705 DESI1 0.285 0.07062 0.21438 0.000000
491 CDK11A 0.279 0.27856 0.00000 0.000000
151 ANKDD1A 0.265 0.26522 0.00000 0.000000
187 APIP 0.265 0.22475 0.04049 0.000000
1585 IRF5 0.265 0.01898 0.07662 0.169246
1830 MIB2 0.262 0.26245 0.00000 0.000000
719 DHTKD1 0.261 0.26086 0.00000 0.000000
2999 TMEM44 0.261 0.26106 0.00000 0.000000
2863 SWI5 0.259 0.25876 0.00000 0.000000
1428 GSDMD 0.255 0.19767 0.00000 0.057483
1827 MGST3 0.255 0.09258 0.16291 0.000000
824 ELL2 0.253 0.25318 0.00000 0.000000
3204 VPS16 0.253 0.25326 0.00000 0.000000
1682 LHPP 0.248 0.24766 0.00000 0.000000
210 ARL5B 0.247 0.24684 0.00000 0.000000
903 ENSG00000211891 0.247 0.24720 0.00000 0.000000
2370 PTPRA 0.242 0.20415 0.02904 0.008818
region_tag
1838 11_1
1602 5_80
1098 17_42
304 22_18
3215 14_52
1315 1_81
269 3_21
2960 4_105
760 1_69
2763 17_28
1030 22_8
1954 11_40
694 6_28
1542 1_48
467 20_28
640 1_69
2215 17_39
181 11_36
2262 9_28
1584 19_34
2554 22_16
2965 1_13
2006 11_80
705 22_17
491 1_1
151 15_30
187 11_23
1585 7_80
1830 1_1
719 10_10
2999 3_119
2863 9_66
1428 8_94
1827 1_83
824 5_56
3204 20_3
1682 10_78
210 10_15
903 14_55
2370 20_3
Loading required package: grid
Warning: replacing previous import 'utils::download.file' by
'restfulr::download.file' when loading 'rtracklayer'
Some literature that relates MIR210HG to m6A modification
genename combined_pip expression_pip splicing_pip m6A_pip region_tag
1838 MIR210HG 0.906 0 0 0.906 11_1
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] biomaRt_2.52.0 Gviz_1.40.1 cowplot_1.1.1
[4] ggplot2_3.4.3 GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
[7] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
[10] ctwas_0.1.38 dplyr_1.1.2 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 deldir_1.0-6
[3] rjson_0.2.21 rprojroot_2.0.3
[5] biovizBase_1.44.0 htmlTable_2.4.0
[7] XVector_0.36.0 base64enc_0.1-3
[9] fs_1.6.3 dichromat_2.0-0.1
[11] rstudioapi_0.15.0 farver_2.1.1
[13] bit64_4.0.5 AnnotationDbi_1.58.0
[15] fansi_1.0.4 xml2_1.3.3
[17] codetools_0.2-18 logging_0.10-108
[19] cachem_1.0.8 knitr_1.39
[21] Formula_1.2-4 jsonlite_1.8.7
[23] Rsamtools_2.12.0 cluster_2.1.3
[25] dbplyr_2.3.3 png_0.1-7
[27] compiler_4.2.0 httr_1.4.6
[29] backports_1.4.1 lazyeval_0.2.2
[31] Matrix_1.6-1 fastmap_1.1.1
[33] cli_3.6.1 later_1.3.0
[35] htmltools_0.5.2 prettyunits_1.1.1
[37] tools_4.2.0 gtable_0.3.3
[39] glue_1.6.2 GenomeInfoDbData_1.2.8
[41] rappdirs_0.3.3 Rcpp_1.0.11
[43] Biobase_2.56.0 jquerylib_0.1.4
[45] vctrs_0.6.3 Biostrings_2.64.0
[47] rtracklayer_1.56.0 iterators_1.0.14
[49] xfun_0.30 stringr_1.5.0
[51] ps_1.7.0 lifecycle_1.0.3
[53] ensembldb_2.20.2 restfulr_0.0.14
[55] XML_3.99-0.14 getPass_0.2-2
[57] zlibbioc_1.42.0 scales_1.2.1
[59] BSgenome_1.64.0 VariantAnnotation_1.42.1
[61] ProtGenerics_1.28.0 hms_1.1.3
[63] promises_1.2.0.1 MatrixGenerics_1.8.0
[65] parallel_4.2.0 SummarizedExperiment_1.26.1
[67] AnnotationFilter_1.20.0 RColorBrewer_1.1-3
[69] yaml_2.3.5 curl_5.0.2
[71] memoise_2.0.1 gridExtra_2.3
[73] sass_0.4.1 rpart_4.1.16
[75] latticeExtra_0.6-30 stringi_1.7.12
[77] RSQLite_2.3.1 highr_0.9
[79] BiocIO_1.6.0 foreach_1.5.2
[81] checkmate_2.1.0 GenomicFeatures_1.48.4
[83] filelock_1.0.2 BiocParallel_1.30.3
[85] rlang_1.1.1 pkgconfig_2.0.3
[87] matrixStats_0.62.0 bitops_1.0-7
[89] evaluate_0.15 lattice_0.20-45
[91] htmlwidgets_1.5.4 GenomicAlignments_1.32.0
[93] labeling_0.4.2 bit_4.0.5
[95] processx_3.8.0 tidyselect_1.2.0
[97] magrittr_2.0.3 R6_2.5.1
[99] generics_0.1.3 Hmisc_5.1-0
[101] DelayedArray_0.22.0 DBI_1.1.3
[103] pgenlibr_0.3.6 pillar_1.9.0
[105] whisker_0.4 foreign_0.8-82
[107] withr_2.5.0 KEGGREST_1.36.2
[109] RCurl_1.98-1.7 nnet_7.3-17
[111] tibble_3.2.1 crayon_1.5.2
[113] interp_1.1-4 utf8_1.2.3
[115] BiocFileCache_2.4.0 rmarkdown_2.14
[117] jpeg_0.1-10 progress_1.2.2
[119] data.table_1.14.8 blob_1.2.4
[121] callr_3.7.3 git2r_0.30.1
[123] digest_0.6.33 httpuv_1.6.5
[125] munsell_0.5.0 bslib_0.3.1