A Guide to Customizing the Sample Identifier (orig.ident) in Seurat Objects

2025-07-02 00:51:58 Author: 瞿蔚英Wynne

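Background

In single-cell RNA sequencing analysis, Seurat is a widely used R package. Each Seurat object carries a metadata column named orig.ident that records the sample each cell came from; it is filled in when the object is created and is carried along when several samples are merged. The default sample names are not always intuitive, though, and sometimes need adjusting to suit the analysis.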

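Problem Scenario

In practice, we may run into the following situation:

Sample names contain compound prefixes (such as "4_D_MI2_S2" and "4_MI1_S5")
The default orig.ident keeps only the numeric prefix (such as "4")
A more detailed identifier is needed to tell the samples apart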
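To see why the default can collapse to "4", the sketch below mimics in base R how a name is split on a delimiter and a single field is kept. It assumes cell names of the hypothetical form sample name plus barcode, and it imitates the `names.field = 1`, `names.delim = "_"` defaults of `CreateSeuratObject`; the barcodes and the helper `take_field` are made up for illustration.

```r
# Hypothetical cell names: sample name followed by a made-up barcode
cell_names <- c("4_D_MI2_S2_AAACCTGA", "4_MI1_S5_TTTGGTCA")

# Keep one delimiter-separated field from each name
# (field = 1 imitates names.field = 1, delim = "_" imitates names.delim = "_")
take_field <- function(x, field = 1, delim = "_") {
  vapply(strsplit(x, delim, fixed = TRUE), function(p) p[field], character(1))
}

first_field <- take_field(cell_names)
print(first_field)  # both cells collapse to the same value
```

Because both sample names begin with "4_", keeping only the first field cannot tell them apart.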

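Solution

Method 1: Create a new metadata column

The recommended approach is to leave the original orig.ident untouched and store the custom sample identifiers in a new metadata column: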

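# Create the new metadata column
seurat.obj$sample_label <- NA

# Set labels from the original identifier. This assumes orig.ident still holds the
# full sample name; if it was truncated to "4", match on colnames(seurat.obj)
# instead, provided the cell names carry the sample prefix.
# "^" anchors each pattern so that "1_" cannot match inside "4_MI1_S5".
ident <- as.character(seurat.obj$orig.ident)
seurat.obj$sample_label[grepl("^1_", ident)] <- "1"
seurat.obj$sample_label[grepl("^2_", ident)] <- "2"
seurat.obj$sample_label[grepl("^3_C_", ident)] <- "3C"
seurat.obj$sample_label[grepl("^4_", ident)] <- "4"
seurat.obj$sample_label[grepl("^4_D_", ident)] <- "4D"  # later assignments overwrite earlier ones, so the more specific pattern goes last
seurat.obj$sample_label[grepl("^5_", ident)] <- "5"
seurat.obj$sample_label[grepl("^7_", ident)] <- "7"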
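When the full set of sample names is known in advance, an explicit lookup table is one alternative to pattern matching, since it avoids any dependence on pattern order. The following is a minimal sketch on a plain character vector standing in for `seurat.obj$orig.ident`; the names `orig_ident` and `sample_map` are hypothetical, and it assumes the identifiers hold the full sample names.

```r
# Stand-in for seurat.obj$orig.ident (assumed to hold full sample names)
orig_ident <- c("4_D_MI2_S2", "4_MI1_S5", "4_D_MI2_S2")

# Hypothetical mapping from full sample name to short label
sample_map <- c("4_D_MI2_S2" = "4D", "4_MI1_S5" = "4")

# Look up each cell's label by its sample name
sample_label <- unname(sample_map[as.character(orig_ident)])

# Fail early if any sample is missing from the mapping
stopifnot(!anyNA(sample_label))

# With a real object, the result would be stored as:
# seurat.obj$sample_label <- sample_label
```

A sample that is missing from the table shows up as NA and triggers the check, rather than silently receiving the wrong label.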

Method 2: Modify orig.ident directly

Although not recommended, the orig.ident column can also be overwritten directly:

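# In nested ifelse() calls the first matching test wins, so the more specific pattern comes first
seurat.obj$orig.ident <- ifelse(grepl("^4_D_", seurat.obj$orig.ident), "4D",
                               ifelse(grepl("^4_", seurat.obj$orig.ident), "4",
                                      as.character(seurat.obj$orig.ident)))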
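If orig.ident is overwritten anyway, keeping a copy of the old values first keeps the change reversible. The sketch below uses a plain data frame as a stand-in for the object's metadata; the column name `orig.ident.raw` and the third sample name are made up for illustration.

```r
# Stand-in for the cell metadata of a Seurat object; sample names are illustrative
meta <- data.frame(orig.ident = c("4_D_MI2_S2", "4_MI1_S5", "7_X_S9"),
                   stringsAsFactors = FALSE)

# Keep an untouched copy (hypothetical column name) before overwriting
meta$orig.ident.raw <- meta$orig.ident

# Same nested ifelse() logic: specific pattern first, everything else unchanged
meta$orig.ident <- ifelse(grepl("^4_D_", meta$orig.ident), "4D",
                          ifelse(grepl("^4_", meta$orig.ident), "4",
                                 meta$orig.ident))
print(meta)
```

On a real object the same idea would be a single extra assignment, such as `seurat.obj$orig.ident.raw <- seurat.obj$orig.ident`, made before the overwrite.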

Visualization

Once the custom labels exist, the new sample identifiers can be used for plotting:

DimPlot(seurat.obj, reduction = "umap", group.by = "sample_label", label = TRUE)
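Before plotting, it can help to check that every cell received a label and to fix the order in which the labels appear. The sketch below works on a plain vector standing in for `seurat.obj$sample_label`; the values are illustrative, and it assumes the plot follows factor level order, as ggplot2-based plots generally do.

```r
# Stand-in for seurat.obj$sample_label; values are illustrative
sample_label <- c("4D", "4", "1", "4D", "3C")

# Count cells per label; useNA = "ifany" exposes cells that were never labeled
label_counts <- table(sample_label, useNA = "ifany")
print(label_counts)

# Fix the label order explicitly instead of relying on alphabetical sorting
sample_label <- factor(sample_label, levels = c("1", "3C", "4", "4D"))
print(levels(sample_label))
```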

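Best Practices

Keep the original data: always keep the original orig.ident column for reference
Clear naming: new sample labels should be intuitive and easy to understand
Order matters: in nested ifelse() calls the first match wins, so put the more specific pattern first; with sequential assignments the last match wins, so put it last
Document the mapping: record how every sample was renamed so later analyses can trace it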
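One way to document the renaming is to derive the mapping table from the data itself and save it alongside the analysis. The following is a minimal sketch using plain vectors as stand-ins for the two metadata columns; the file name and the temporary output directory are assumptions.

```r
# Stand-ins for seurat.obj$orig.ident and seurat.obj$sample_label
orig_ident   <- c("4_D_MI2_S2", "4_MI1_S5", "4_D_MI2_S2")
sample_label <- c("4D", "4", "4D")

# One row per distinct (original identifier, new label) pair
mapping <- unique(data.frame(orig.ident = orig_ident,
                             sample_label = sample_label,
                             stringsAsFactors = FALSE))
rownames(mapping) <- NULL

# Each original identifier should map to exactly one label
stopifnot(!anyDuplicated(mapping$orig.ident))

# Save the table; the path here is a placeholder in a temporary directory
out_file <- file.path(tempdir(), "sample_label_mapping.csv")
write.csv(mapping, out_file, row.names = FALSE)
print(mapping)
```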

Advanced Tips

For more complex naming schemes, a regular expression can extract a specific part of the name:

# Extract part of the sample name as the new label (here, the leading digits)
seurat.obj$sample_label <- gsub("^(\\d+).*", "\\1", seurat.obj$orig.ident)
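The one-liner above keeps only the leading digits, so both example samples would end up as "4". A variant that also keeps an optional single-letter group code might look like the sketch below. It assumes names follow the pattern number, optional one-letter code, then the rest, separated by underscores; the third sample name is made up.

```r
# Stand-in for seurat.obj$orig.ident; the third name is illustrative
orig_ident <- c("4_D_MI2_S2", "4_MI1_S5", "3_C_X1_S3")

# Group 1: leading digits. Group 3: an optional single capital letter that must be
# followed by "_" so that the "M" in "MI1" is not mistaken for a group code.
sample_label <- sub("^([0-9]+)(_([A-Z]))?_.*$", "\\1\\3", orig_ident)
print(sample_label)
```

Names that do not fit the assumed pattern are returned unchanged by `sub()`, so it is worth inspecting the unique labels before using them.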

In this way, sample identifiers can be adjusted flexibly to suit the analysis while the data stay intact and traceable.

