Identifying transcripts that can form double strand RNAs¶

Three scenarios that meet our interest in the formation of dsRNAs in cytosol:

  • antisense transcripts paired with the exons of a gene in the opposite strand
  • anti-sense transcripts paired with exons of nearby genes
  • paired genes with overlapped exons

Review of antisense transcription: https://www.nature.com/articles/nrm2738

  • The most prominent form of antisense transcription is AS-noncoding paired with protein coding transcripts.

Reich et al. 2019 mapping the dsRNA world The observation of ADAR editing sites in an endogenous RNA is proof that the RNA is double-stranded in vivo, a gold standard used not only in determining dsRNAomes, but also as proof for specific dsRNA structures (Sijen and Plasterk 2003).

Procedure¶

Given a list of query genes and a GRange object for GENCODE genomic annotations,

  1. partition the GRange object into two subsets
    • a subset with transcripts of query genes
    • a subset with exons and UTR of all genes
  2. identify any overlapping pair out of the two subset GRange objects
    • 50 base pairs as the the minimum length of base pairing
  3. filter out any pairs located on the same strand

Validate with labeled antisense transcripts from previous GENCODE release¶

523 out of 814 (64%) antisense genes found in the current GENCODE release were identified to span exons or UTR regions of any genes on the opposite strand.

A data.frame: 1 × 3
matchedASknownASprop
<int><int><dbl>
5238140.64

Validate with the three pairs of cis-NATs tested in Qin Li's nature paper¶

Three pairs of cis-NATs in the paper:

  • CTSA:PLTP
  • TNFRSF14:TNFRSF14-AS1
  • HBP1:COG5

igv_panel_CTSA

Validate with reported antisense transcripts and their targets in the review paper¶

A grouped_df: 10 × 3
queryTranscript.gene_nameref.gene_nametargets.gene_name
<chr><chr><chr>
PINK1PINK1-AS naPINK1
SIX3 SIX3-AS1 Six3OS
VAX2 ENSG00000258881Vax2OS
VAX2 ATP6V1B1-AS1 Vax2OS
ZEB2 ZEB2-AS1 Zeb2 NAT
MKRN2MKRN2OS RAF1
MKRN2RAF1 RAF1
SMAD5SMAD5-AS1 DAMS
CDYL CDYL-AS1 CDYL-AS
EMX2 EMX2OS EMX2OS

The list of reported antisense-target pairs have different gene names, so I don't have a good way to compare the results yet.

'19 out of 32 genes in GENCODE database were found to form dsRNAs.'

Testing on TWAS prioritized genes with two approaches¶

  1. Base-pairing of two transcripts on opposite strands (32 TWAS genes nearby)
  2. A single transcript predicted to fold into double stranded structure (9 TWAS genes nearby)

Regardless of traits, 82 of all the prioritized genes from TWAS were identified to potentially form dsRNAs. The top TWAS gene MAPKAPK5-AS1 was found in the list.

Examine whether dsRNA transcripts have m6A peaks nearby?¶

In [9]:
#m6A peak length distribution
quantile(as.numeric(peaks$end) - as.numeric(peaks$start))
0%
49
25%
248
50%
491
75%
1777
100%
85731
[1] "82 TWAS genes nearby dsRNAs predicted with antisense approach:"
  1. 'EIF2B2'
  2. 'ERCC1'
  3. 'FDPS'
  4. 'TRAF3IP3'
  5. 'ADCY7'
  6. 'AKAP8'
  7. 'AP4B1'
  8. 'TXNDC11'
  9. 'THAP3'
  10. 'GNL3'
  11. 'NEK4'
  12. 'NDUFA2'
  13. 'ATG13'
  14. 'PSKH1'
  15. 'ACADVL'
  16. 'RAI1'
  17. 'IRF3'
  18. 'DIS3L2'
  19. 'NCAPG'
  20. 'TRAM2'
  21. 'GNA12'
  22. 'TES'
  23. 'HNRNPK'
  24. 'DIABLO'
  25. 'MAX'
  26. 'ZFYVE26'
  27. 'ZNF839'
  28. 'AKTIP'
  29. 'DHX38'
  30. 'SPG7'
  31. 'CEP250'
  32. 'MADD'
  33. 'MAPKAPK5-AS1'
  34. 'TRAF4'
  35. 'ERAL1'
  36. 'LDLR'
  37. 'VGLL4'
  38. 'PCSK7'
  39. 'DENND3'
  40. 'WAC-AS1'
  41. 'DDX55'
  42. 'RABEP1'
  43. 'SLC9A3R1'
  44. 'FIGNL1'
  45. 'SENP1'
  46. 'MDM2'
  47. 'CYB5D1'
  48. 'PFAS'
  49. 'CALR'
  50. 'HM13'
  51. 'BET1L'
  52. 'MVK'
  53. 'RBM23'
  54. 'SLC7A5'
  55. 'TAOK1'
  56. 'GTPBP2'
  57. 'LAMTOR4'
  58. 'MEPCE'
  59. 'WDR5'
  60. 'FERMT3'
  61. 'C2CD2L'
  62. 'DDHD1'
  63. 'SGSM2'
  64. 'GIT1'
  65. 'CPSF4'
  66. 'BCL2L2'
  67. 'ARIH2'
  68. 'ZER1'
  69. 'SMUG1'
  70. 'PPP2R3C'
  71. 'FBXO34'
  72. 'CRTC3'
  73. 'RNF40'
  74. 'AKAP13'
  75. 'CCDC51'
  76. 'CD151'
  77. 'FANCA'
  78. 'ITGB2'
  79. 'PRPF38A'
  80. 'TRIP12'
  81. 'ASCC1'
  82. 'TUG1'
[1] "9 TWAS genes nearby dsRNA predicted with editing clusters:"
  1. 'KAT8'
  2. 'ICOSLG'
  3. 'GMIP'
  4. 'WAC-AS1'
  5. 'RBM23'
  6. 'ARIH2'
  7. 'FANCA'
  8. 'LAMTOR4'
  9. 'NSUN4'

Directional effects of m6A QTL on targeted genes that have dsRNAs nearby¶

By only requiring overlapping of m6A sites with any transcript that has antisense genes, more TWAS genes were identified, but the biased effects of m6A QTLs are gone across immune traits.

In [62]:
twas_dsRNA[, "trait_type"] <- "blood"
twas_dsRNA$trait_type[twas_dsRNA$trait %in% other_traits]<- "non-blood"
ggplot(data.frame(twas_dsRNA), aes(x = m6aQTL_Z, y = gwas_Z)) +

geom_point() + facet_grid(.~trait_type) +
theme(axis.text=element_text(size=16), 
                axis.title=element_text(size=16),
                strip.text.x=element_text(size = 18, face="bold"),
                panel.background = element_rect(fill="white"), 
                panel.spacing.x = unit(2, "lines"),
                panel.spacing.y = unit(1, "lines"),
                panel.grid.major.y=element_line(colour="grey", linetype = "dotted"),
                panel.grid.major.x=element_line(colour="grey", linetype = "dotted"),
                panel.border = element_rect(fill = NA, 
                                            colour = "black"))
In [552]:
twas_sub<-twas_dsRNA[twas_dsRNA$trait_type == "immune", ]
model <-lm(twas_sub$gwas_Z ~ twas_sub$m6aQTL_Z)
summary(model)
Call:
lm(formula = twas_sub$gwas_Z ~ twas_sub$m6aQTL_Z)

Residuals:
   Min     1Q Median     3Q    Max 
-9.491 -4.359  2.479  4.657 12.854 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.7041     1.7989   0.391    0.702
twas_sub$m6aQTL_Z   0.3802     0.4104   0.926    0.371

Residual standard error: 6.957 on 13 degrees of freedom
Multiple R-squared:  0.06193,	Adjusted R-squared:  -0.01023 
F-statistic: 0.8582 on 1 and 13 DF,  p-value: 0.3711

Repeat the analysis with genes that show evidence of colocalization¶

Overlapped distances between m6A peaks and dsRNAs

We consider the flanking regions of m6A peaks, which is +/-250 bp from the two ends of the peaks.

Enrichment for dsRNA genes among TWAS prioritized genes¶

[1] "The total number prioritized TWAS genes potentially form dsRNAs:"
82
[1] "Proportion of all the genes that can form dsRNAs: "
0.32
A data.frame: 6 × 3
num_dsRNAnum_TWAS_genesenrichment
<dbl><dbl><dbl>
Allergy0 70.00000
Allergy Eczema5200.78125
Alzheimers0 10.00000
Asthma0 20.00000
Autoimmune Traits (Sure)0 10.00000
Bipolar Disorder2 23.12500

Length distribution for dsRNA genes¶

In [307]:
quantile(resultsTWAS$queryL)
hist(resultsTWAS$queryL, main = "", breaks=20, 
     xlab = "Gene lengths for all genes with overlapped targets",
     cex = 1.5, cex.axis = 1.5, cex.lab = 1.5)
0%
1922
25%
14182
50%
34798
75%
67562
100%
383106

An alternative approach for dsRNA predictions¶

Using genes associated with predicted dsRNAs from the review paper¶

Mapping the dsRNA world

Reich et al. 2019

The review paper defines enriched editing regions (EER) as the window that contains clusters of editing sites. This indicates the presence of long, highly base-paired dsRNA. With their EER-detecting pipelines, around 3400 EER transcripts were identified using RNA-seq data from human blood monocytes. They also validated those regions with RNA secondary structure prediction algorithms to confirm whether they are more structured than random controls. When filtered by predicted folding free energy per nucleotide to be negative, 2759 genes remain to be associated with EERs.

Distribution for the predicted folding free energy per nucleotide¶

In [12]:
head(overlaps)
DataFrame with 6 rows and 9 columns
                         gr        gene   m6aQTL_ID  m6aQTL_Z
                  <GRanges> <character> <character> <numeric>
1 chr16:31128534-31138618:+        KAT8   rs4527034      5.72
2 chr16:31128534-31138618:+        KAT8   rs4527034      5.72
3 chr21:45642428-45647025:-      ICOSLG   rs2070554     -3.48
4 chr21:45642428-45647025:-      ICOSLG   rs2070554     -3.48
5 chr19:19739832-19745177:-        GMIP    rs873870     -4.58
6 chr19:19739832-19745177:-        GMIP    rs873870     -4.58
                   trait     gwas_ID    gwas_Z                 gr.double
             <character> <character> <numeric>                 <GRanges>
1        Body Mass Index    rs749767    -6.079 chr16:31132570-31135107:+
2        Body Mass Index    rs749767    -6.079 chr16:31127726-31128688:-
3 Inflammatory Bowel D..   rs2838520   -10.110 chr21:45643735-45644567:-
4   Rheumatoid Arthritis   rs4819388    -5.050 chr21:45643735-45644567:-
5 Low Density Lipoprot..  rs16996148   -14.150 chr19:19741468-19744826:-
6      Total Cholesterol  rs16996148   -16.690 chr19:19741468-19744826:-
    gene_name
  <character>
1        KAT8
2           .
3      ICOSLG
4      ICOSLG
5       LPAR2
6       LPAR2

Enrichment of TWAS genes predicted to form dsRNA¶

A summary table for TWAS prioritized genes with at least 2 fold of enrichment

A data.frame: 46 × 2
traitdsRNA_genes
<chr><chr>
1Body Mass Index TRUB2
2Body Mass Index KAT8
5Rheumatoid Arthritis RUFY3
6Rheumatoid Arthritis TXNDC11
7Rheumatoid Arthritis ICOSLG
15Crohn's Disease GON4L
16Crohn's Disease ICOSLG
17High Density LipoproteinTGOLN2
18High Density LipoproteinBAZ1B
20Triglycerides BAZ1B
21Triglycerides PCSK7
22Triglycerides KAT8
23White Blood Cell Count THEMIS2
24White Blood Cell Count GNA12
25White Blood Cell Count DENND3
26White Blood Cell Count WAC-AS1
27White Blood Cell Count DDX55
28White Blood Cell Count PHF11
29White Blood Cell Count RABEP1
30White Blood Cell Count SLC9A3R1
35Hemoglobin ConcentrationGTPBP2
36Hemoglobin ConcentrationZNF839
37Hemoglobin ConcentrationSLC7A5
38Hemoglobin ConcentrationSMG9
39Platelet Count THEMIS2
40Platelet Count GNA12
41Platelet Count OGDH
42Platelet Count BAZ1B
43Platelet Count LAMTOR4
44Platelet Count MEPCE
45Platelet Count FERMT3
46Platelet Count NEAT1
47Platelet Count ZNF839
48Platelet Count HM13
57Granulocyte Count TGOLN2
58Granulocyte Count DENND3
59Granulocyte Count WAC-AS1
60Granulocyte Count RABEP1
61Granulocyte Count SLC9A3R1
62Neutrophil Count TGOLN2
63Neutrophil Count DENND3
64Neutrophil Count WAC-AS1
65Neutrophil Count RABEP1
76Menopause Age NSUN4
77Menopause Age MEPCE
78Menopause Age RABEP1

Directional effects of m6A QTL on genes associated with dsRNA¶

No clear patterns of directional effects were found with m6A QTLs when comparing dsRNA-associate TWAS genes and the rest.

In [358]:
colnames(twas)
  1. 'Trait'
  2. 'GENE'
  3. 'm6A.PEAK.ID'
  4. 'CHR'
  5. 'HSQ'
  6. 'BEST.GWAS.ID'
  7. 'BEST.GWAS.Z'
  8. 'm6AQTL.ID'
  9. 'm6AQTL.R2'
  10. 'm6AQTL.Z'
  11. 'm6AQTL.GWAS.Z'
  12. 'N.SNP'
  13. 'MODEL'
  14. 'MODELCV.R2'
  15. 'MODELCV.PV'
  16. 'TWAS.Z'
  17. 'TWAS.P'
  18. 'TWAS.P.Bonferroni'
  19. 'dsRNA'