2025-06-07 03:44:16 Author: 羿妍玫Ivan

ENSG00000187634 chr1 [ 4807614, 4807614] + | ENSG00000187634
ENSG00000187961 chr1 [ 4855568, 4855568] + | ENSG00000187961
ENSG00000188290 chr1 [ 4899965, 4899965] + | ENSG00000188290
ENSG00000187608 chr1 [ 4955148, 4955148] + | ENSG00000187608
ENSG00000188157 chr1 [ 5016563, 5016563] + | ENSG00000188157
... ... ... ... . ...
ENSG00000131591 chr1 [24894508, 24894508] - | ENSG00000131591
ENSG00000177700 chr1 [24924056, 24924056] - | ENSG00000177700
ENSG00000131584 chr1 [24925042, 24925042] - | ENSG00000131584
ENSG00000177757 chr1 [24925345, 24925345] - | ENSG00000177757
ENSG00000131586 chr1 [24925405, 24925405] - | ENSG00000131586
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


We normalize chromatin states in ESC and lung to the TSS of genes with
bivalent states in ESC.


mat_states_esc = normalizeToMatrix(states, tss_biv, value_column = "states_simplified")
mat_states_lung = normalizeToMatrix(states_lung, tss_biv, value_column = "states_simplified")

We also normalize methylation in ESC and lung to the same TSS.

mat_meth_esc = normalizeToMatrix(meth, tss_biv, value_column = "E003", mean_mode = "absolute",
	smooth = TRUE)
mat_meth_lung = normalizeToMatrix(meth, tss_biv, value_column = "E096", mean_mode = "absolute",
	smooth = TRUE)

We apply k-means clustering on the chromatin states in ESC (1kb upstream and downstream of TSS) to separate genes with bivalent TSS into two groups.

split = kmeans(mat_states_esc[, 40:60], centers = 2)$cluster
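The partial clustering above can be sketched on simulated data with base R only. The toy matrix `toy_mat`, its numeric state codes, and the column range `tss_cols` are assumptions for illustration; in the real analysis the input is `mat_states_esc`. As in the original call, k-means treats the state codes as numbers. Because k-means starts from random centers, the cluster numbering can change between runs unless a seed is set.

```r
set.seed(123)  # k-means uses random starting centers; fix the seed for reproducible labels

n_genes = 40
n_col = 100
tss_cols = 40:60  # columns around the TSS, as in the text

# Toy stand-in for a normalized state matrix: state code 1 everywhere by default.
toy_mat = matrix(1, nrow = n_genes, ncol = n_col)

# Around the TSS, genes 1-20 mostly carry state 2 and genes 21-40 mostly state 1.
toy_mat[1:20, tss_cols] = sample(c(1, 2), 20 * length(tss_cols),
    replace = TRUE, prob = c(0.1, 0.9))
toy_mat[21:40, tss_cols] = sample(c(1, 2), 20 * length(tss_cols),
    replace = TRUE, prob = c(0.9, 0.1))

# Cluster on the TSS-proximal columns only.
toy_split = kmeans(toy_mat[, tss_cols], centers = 2)$cluster

# Number of genes assigned to each cluster.
table(toy_split)
```

The result is one integer label per row, which is the form expected by the `row_split` argument used later.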

Now we make the heatmap list. The order of the heatmaps is: chromatin states in ESC, chromatin states in lung, methylation in ESC, and methylation in lung. Expression in ESC and lung is added on the right side of the heatmap list.

expr_esc = expr[names(tss_biv), "E003"]
expr_lung = expr[names(tss_biv), "E096"]
ht_list = EnrichedHeatmap(mat_states_esc, name = "states_esc", col = states_col, 
	row_split = split, cluster_rows = TRUE,
	top_annotation = HeatmapAnnotation(enrich = anno_enriched(gp = gpar(lty = 1:2)))) +
EnrichedHeatmap(mat_states_lung, name = "states_lung", col = states_col, 
	top_annotation = HeatmapAnnotation(enrich = anno_enriched(gp = gpar(lty = 1:2)))) +
EnrichedHeatmap(mat_meth_esc, name = "meth_esc", col = meth_col_fun,
	top_annotation = HeatmapAnnotation(enrich = anno_enriched(gp = gpar(lty = 1:2)))) +
EnrichedHeatmap(mat_meth_lung, name = "meth_lung", col = meth_col_fun,
	top_annotation = HeatmapAnnotation(enrich = anno_enriched(gp = gpar(lty = 1:2)))) +
Heatmap(log2(expr_esc + 1), name = "expr_esc", show_row_names = FALSE, width = unit(5, "mm"),
	col = colorRamp2(c(0, 5), c("white", "red"))) +
Heatmap(log2(expr_lung + 1), name = "expr_lung", show_row_names = FALSE, width = unit(5, "mm"),
	col = colorRamp2(c(0, 5), c("white", "red")))
draw(ht_list, ht_gap = unit(8, "mm"))

[Figure: heatmap list for genes with bivalent TSS (chunk tssbiv_heatmap)]

From the heatmap, we can see that in cluster 1 the bivalent TSS in ESC transition to active states in lung, while in cluster 2 they transition to repressive states. In cluster 1, methylation is low in both ESC and lung, while in cluster 2 methylation is high in lung. In lung, expression is higher in cluster 1 than in cluster 2.
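A visual comparison of expression between clusters can be backed by a per-cluster summary. The sketch below uses made-up numbers; `toy_expr` and `toy_split` are hypothetical stand-ins for `expr_lung` and `split`, and the values are not taken from the real data.

```r
# Hypothetical cluster labels and raw expression values for 10 genes.
toy_split = rep(c(1, 2), each = 5)
toy_expr = c(30, 12, 45, 8, 20, 0, 1, 3, 0.5, 2)

# Same transformation as in the heatmap: log2(x + 1).
toy_log_expr = log2(toy_expr + 1)

# Mean and median log2 expression per cluster.
cluster_mean = tapply(toy_log_expr, toy_split, mean)
cluster_median = tapply(toy_log_expr, toy_split, median)

cluster_mean
cluster_median
```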

Session info

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
## [10] base     
## 
## other attached packages:
##  [1] circlize_0.4.1            EnrichedHeatmap_1.7.3     data.table_1.10.4        
##  [4] GenomicFeatures_1.28.3    AnnotationDbi_1.38.1      Biobase_2.36.2           
##  [7] GenomicRanges_1.28.3      GenomeInfoDb_1.12.2       IRanges_2.10.2           
## [10] S4Vectors_0.14.3          BiocGenerics_0.22.0       knitr_1.16               
## [13] markdown_0.8              evaluate_0.10             stringr_1.2.0            
## [16] rtracklayer_1.36.4        GenomicAlignments_1.12.1  Rsamtools_1.28.0         
## [19] Biostrings_2.44.1         XVector_0.16.0            SummarizedExperiment_1.6.3
## [22] DelayedArray_0.2.7        matrixStats_0.52.2        BiocParallel_1.10.1      
## [25] BSgenome_1.44.0           rmarkdown_1.6             TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [28] GenomicInfoDb_1.12.1     
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.3-2            rprojroot_1.2               htmlTable_1.9              
##  [4] base64enc_0.1-3             dichromat_2.0-0             rstudioapi_0.6             
##  [7] bit64_0.9-7                 splines_3.4.0               R.methodsS3_1.7.1          
## [10] doParallel_1.0.10           geneplotter_1.54.0          annotate_1.54.0            
## [13] cluster_2.0.6               R.oo_1.21.0                 shiny_1.0.3                
## [16] compiler_3.4.0              httr_1.2.1                  backports_1.1.0            
## [19] assertthat_0.2.0            Matrix_1.2-10               lazyeval_0.2.0             
## [22] htmltools_0.3.6             tools_3.4.0                 gtable_0.2.0               
## [25] glue_1.1.1                  GenomeInfoDbData_0.99.0     reshape2_1.4.2             
## [28] dplyr_0.5.0                 Rcpp_0.12.11                BiocInstaller_1.26.0       
## [31] iterators_1.0.8             xfun_0.1                    XML_3.98-1.7               
## [34] zlibbioc_1.22.0             scales_0.4.1                VariantAnnotation_1.22.1   
## [37] hms_0.3                     yaml_2.1.14                 memoise_1.1.0              
## [40] gridExtra_2.2.1             ggplot2_2.2.1               rpart_4.1-11               
## [43] latticeExtra_0.6-28         stringi_1.1.5               RSQLite_1.1-2              
## [46] highr_0.6                   foreach_1.4.3               checkmate_1.8.2            
## [49] caTools_1.17.1              BiocStyle_2.4.0             rlang_0.1.1                
## [52] pkgconfig_2.0.1             bitops_1.0-6                matrixcalc_1.0-3           
## [55] lattice_0.20-35             purrr_0.2.2.2               htmlwidgets_0.8            
## [58] bit_1.1-12                  tidyselect_0.2.0            plyr_1.8.4                 
## [61] magrittr_1.5                R6_2.2.1                    DBI_0.6-1                  
## [64] pillar_1.0.1                foreign_0.8-68              survival_2.41-3            
## [67] RCurl_1.95-4.8              tibble_1.3.3                rjson_0.2.15               
## [70] GetoptLong_0.1.6            digest_0.6.12               xtable_1.8-2               
## [73] tidyr_0.6.3                 R.utils_2.5.0               munsell_0.4.3              
## [76] viridisLite_0.2.0

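Introduction

In genomics research, we often need to analyze how different types of genomic signals are enriched around specific genomic features, such as transcription start sites (TSS) or gene bodies. The EnrichedHeatmap package provides a way to visualize these enrichment patterns. This article focuses on using EnrichedHeatmap to process and analyze categorical genomic signals, chromatin state data in particular.

About chromatin states

Chromatin states classify genomic regions by integrating multiple epigenetic marks, such as histone modifications. Tools such as ChromHMM partition the genome into functional states, for example:

  • Active transcription start site (TssActive)
  • Transcribed region (Transcript)
  • Enhancer region (Enhancer)
  • Heterochromatin (Heterochromatin)
  • Bivalent state (TssBivalent)
  • Repressed state (Repressive)
  • Quiescent state (Quiescent)

These categorical annotations provide important clues to genome function.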

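Data preparation

First, load the required R packages and prepare the data:

library(GenomicRanges)
library(data.table)
library(EnrichedHeatmap)
library(circlize)

After obtaining chromatin state data from the Roadmap project, we can convert it to a GRanges object:

states_bed = fread("path/to/chromatin_state_file")
# BED start positions are 0-based, so add 1 to get 1-based GRanges coordinates
states = GRanges(seqnames = states_bed[[1]], 
                ranges = IRanges(states_bed[[2]] + 1, states_bed[[3]]), 
                states = states_bed[[4]])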
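The `+ 1` applied to the start column reflects that BED files use 0-based, half-open intervals while GRanges uses 1-based, closed intervals. A minimal base R sketch with made-up rows (the coordinates and the name `toy_bed` are illustrative only):

```r
# Three made-up BED rows: 0-based start, exclusive end.
toy_bed = data.frame(
    chr   = c("chr1", "chr1", "chr1"),
    start = c(0, 200, 1000),
    end   = c(200, 1000, 1400),
    state = c("1_TssA", "2_TssAFlnk", "1_TssA")
)

# Convert to 1-based, closed coordinates: shift the start, keep the end.
start_1based = toy_bed$start + 1
end_1based = toy_bed$end

# The interval length is unchanged by the conversion.
width_bed = toy_bed$end - toy_bed$start
width_1based = end_1based - start_1based + 1
stopifnot(all(width_bed == width_1based))

# Adjacent BED intervals stay adjacent and do not overlap after conversion.
stopifnot(all(start_1based[-1] == end_1based[-length(end_1based)] + 1))
```

Skipping the shift would make every region one base too long and make neighboring states overlap by one base.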

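To simplify the analysis, we can merge similar chromatin states:

state_mapping = c(
    "1_TssA" = "TssActive",
    "2_TssAFlnk" = "TssActive"
    # other state mappings... (separate entries with commas; no comma after the last one)
)
states$simplified_states = state_mapping[states$states]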
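A named character vector works as a lookup table here: indexing it by the original state names returns the simplified names. The base R sketch below reuses the two mappings shown above; `"99_Unknown"` is a made-up label, included to show that any state missing from the mapping comes back as `NA`, which is worth checking before plotting.

```r
# The two mappings shown in the article.
toy_mapping = c(
    "1_TssA" = "TssActive",
    "2_TssAFlnk" = "TssActive"
)

# "99_Unknown" is a made-up state that is absent from the mapping.
toy_states = c("1_TssA", "2_TssAFlnk", "99_Unknown", "1_TssA")

# unname() drops the names carried over from the lookup vector.
simplified = unname(toy_mapping[toy_states])
simplified

# States without a mapping come back as NA; list them before going further.
unmapped = unique(toy_states[is.na(simplified)])
unmapped
```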

Basic visualization

Transcription start site analysis

First, extract the TSS of each gene:

library(GenomicFeatures)
txdb = loadDb("path/to/txdb_file")
genes = genes(txdb)
tss = promoters(genes, upstream = 0, downstream = 1)

Then normalize the chromatin state signal to the regions around the TSS:

mat_states = normalizeToMatrix(states, tss, value_column = "simplified_states")

Visualize with EnrichedHeatmap:

state_colors = c(
    TssActive = "red",
    Transcript = "green"
    # other state colors... (separate entries with commas; no comma after the last one)
)
EnrichedHeatmap(mat_states, name = "states", col = state_colors, cluster_rows = TRUE)

Gene body analysis

We can also look at how chromatin states are distributed over gene bodies:

mat_gene_body = normalizeToMatrix(states, genes, value_column = "simplified_states")
EnrichedHeatmap(mat_gene_body, name = "states", col = state_colors) +
rowAnnotation(gene_len = anno_points(log10(width(genes) + 1)))

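Advanced analysis: integrating multi-omics data

Combining DNA methylation and gene expression

To understand the functional meaning of chromatin states more fully, we can integrate DNA methylation and gene expression data:

# Normalize the methylation data to the TSS
mat_meth = normalizeToMatrix(meth_data, tss, value_column = "sample1")

# Build the heatmap list
ht_list = EnrichedHeatmap(mat_states, name = "states", col = state_colors) +
          EnrichedHeatmap(mat_meth, name = "methylation") +
          Heatmap(log2(expr_data + 1), name = "expression")
draw(ht_list)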
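Heatmaps joined with `+` share one row order, so every matrix has to describe the same genes in the same order. A minimal base R sketch of aligning an expression table to the TSS names before plotting; `toy_tss_names` and `toy_expr` are hypothetical stand-ins for `names(tss)` and `expr_data`.

```r
# Hypothetical gene IDs, in the order of the TSS object.
toy_tss_names = c("geneC", "geneA", "geneB")

# Hypothetical expression matrix: rows in a different order, plus one extra gene.
toy_expr = matrix(
    c(5, 0, 12, 7,
      3, 1, 20, 9),
    ncol = 2,
    dimnames = list(c("geneA", "geneB", "geneC", "geneD"), c("sample1", "sample2"))
)

# Keep genes present in both inputs, in the order of the TSS object.
common = intersect(toy_tss_names, rownames(toy_expr))

# Index by name so the expression values follow the TSS order.
expr_aligned = toy_expr[common, "sample1"]
expr_aligned

stopifnot(identical(names(expr_aligned), common))
```

If some TSS have no expression value, the TSS object and the normalized matrices would need to be subset by `common` as well, so that all heatmaps keep the same number of rows.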

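Bivalent TSS analysis

In embryonic stem cells, bivalent TSS states, which carry both active and repressive marks, are an important feature. We can analyze how these states change during differentiation:

# Identify TSS that carry a bivalent state
mat_bivalent = normalizeToMatrix(states[states$simplified_states == "TssBivalent"], tss)
bivalent_tss = tss[rowSums(mat_bivalent[, 40:60]) > 0]  # bivalent state within the 1 kb window

# Compare cell types
# (esc_states and diff_states are assumed to carry the same simplified_states column as states)
mat_states_esc = normalizeToMatrix(esc_states, bivalent_tss, value_column = "simplified_states")
mat_states_diff = normalizeToMatrix(diff_states, bivalent_tss, value_column = "simplified_states")

# Visualize the comparison
ht_list = EnrichedHeatmap(mat_states_esc, name = "ESC states", col = state_colors) +
          EnrichedHeatmap(mat_states_diff, name = "Differentiated states", col = state_colors)
draw(ht_list)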
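The column range `40:60` depends on how the matrix is binned. Assuming the `normalizeToMatrix()` defaults of `extend = 5000` and a window size of `w = 100` (worth checking against the installed version), a single-base TSS target gives 50 upstream and 50 downstream windows. The base R sketch below maps column indices to distances from the TSS; `col_start` and `col_end` are names introduced for illustration.

```r
extend = 5000  # assumed default: bases covered on each side of the TSS
w = 100        # assumed default window size (extend / 50)

n_side = extend / w  # windows on each side of the TSS
n_col = 2 * n_side   # total columns for a single-base target

# Offset of each window relative to the TSS (negative = upstream).
col_start = (seq_len(n_col) - n_side - 1) * w
col_end = col_start + w

# Columns whose windows lie entirely within 1 kb of the TSS.
within_1kb = which(col_start >= -1000 & col_end <= 1000)
range(within_1kb)

# Genomic span covered by the columns used in the text.
c(col_start[40], col_end[60])
```

Under these assumptions, columns `40:60` cover the 1 kb on each side of the TSS plus one extra upstream window. With a different `extend` or `w`, the column range has to be recomputed.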

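Practical tips

  1. Row ordering: converting a categorical variable to a factor with an explicit level order controls how rows are ordered in the heatmap.

  2. Partial clustering: cluster on a specific region only (for example, within 1 kb of the TSS) to bring out the key patterns.

  3. State transition analysis: use a chord diagram to visualize how chromatin states change between conditions.

  4. Multi-omics integration: combine chromatin states, DNA methylation, gene expression, and other data types for a fuller biological picture.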
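Tips 1 and 3 can be combined in a small base R sketch: fixing the factor levels controls the order of states, and a cross-tabulation of the states at the same TSS in two conditions gives the transition counts that a chord diagram would display. The state calls below are made up for illustration; with real data, a matrix of this shape is the kind of input that `circlize::chordDiagram()` takes.

```r
# Desired display order of the simplified states.
state_levels = c("TssActive", "TssBivalent", "Repressive")

# Made-up state calls at the same six TSS in two cell types.
esc_state  = c("TssBivalent", "TssBivalent", "TssBivalent",
               "TssActive", "TssBivalent", "Repressive")
diff_state = c("TssActive", "Repressive", "TssActive",
               "TssActive", "Repressive", "Repressive")

# Factors with explicit levels keep the chosen order, including empty categories.
esc_f = factor(esc_state, levels = state_levels)
diff_f = factor(diff_state, levels = state_levels)

# Rows: state in ESC; columns: state in the differentiated cell type.
transitions = table(ESC = esc_f, Differentiated = diff_f)
transitions

# Ordering genes by state follows the factor levels, not the alphabet.
row_order = order(esc_f)
row_order
```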

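Conclusion

EnrichedHeatmap provides flexible tools for analyzing categorical genomic signals. With the methods described here, researchers can:

  1. Visualize how chromatin states are distributed around genomic features
  2. Identify groups of genes with different functional states
  3. Analyze dynamic changes in chromatin states during development or disease
  4. Integrate multi-omics data to reveal more complex gene regulatory mechanisms

These analyses help in understanding how genome function is regulated and how epigenetic mechanisms operate.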
