OmniGen Tutorial

2026-01-30 04:41:06 Author: 鲍丁臣Ursa

1. Project Overview

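OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. OmniGen needs no additional network modules (such as ControlNet, IP-Adapter, or Reference-Net) and no preprocessing steps (such as face detection, pose estimation, or cropping) to produce satisfactory images. Its goal is a simple and flexible image generation paradigm: generating diverse images directly from arbitrary multi-modal instructions, with no extra plugins or operations.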

2. Quick Start

First, clone the project repository to your local machine and install it:

git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .

You can also create a new environment to avoid dependency conflicts:

# Create a conda environment with Python 3.10.13 (virtualenv works as well)
conda create -n omnigen python=3.10.13
conda activate omnigen

# Install PyTorch for your CUDA version, for example:
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .
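
Before loading the model, it can help to confirm that the installation steps above actually made the packages importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of OmniGen. It only checks that the top-level import names used in this tutorial (`torch` and `OmniGen`) can be found, not that CUDA is working.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "torch" and "OmniGen" are the import names used in this tutorial.
    missing = missing_packages(["torch", "OmniGen"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All packages are importable.")
```

If a package is reported as not importable, the most likely cause is that the commands were run outside the activated environment.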

Then use the following code to get started with OmniGen:

from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Text to image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0
)
images[0].save("example_t2i.png")

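# Multi-modal to image
# In the prompt, placeholders stand for images. An image placeholder must have the form <img><|image_*|></img>.
# You can pass several images in input_images. Make sure each image has its own placeholder. For example, for input_images = [img1_path, img2_path], the prompt needs two placeholders: <img><|image_1|></img> and <img><|image_2|></img>.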
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the one on the right in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0
)
images[0].save("example_ti2i.png")
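
The placeholder rule in the comments above is easy to get wrong when a prompt references several images. The following is a minimal sketch of a pre-flight check, assuming placeholders are numbered from 1 in the order of `input_images`, as in the example. The helper `check_placeholders` is hypothetical and is not part of the OmniGen API; it only inspects the prompt string.

```python
import re

# Placeholder format used in this tutorial: <img><|image_N|></img>, numbered from 1.
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def check_placeholders(prompt, input_images):
    """Return (missing, extra) placeholder indices for a prompt and image list.

    missing: images in input_images that have no placeholder in the prompt.
    extra: placeholders in the prompt that have no matching image.
    """
    found = {int(n) for n in PLACEHOLDER.findall(prompt)}
    expected = set(range(1, len(input_images) + 1))
    return sorted(expected - found), sorted(found - expected)


if __name__ == "__main__":
    prompt = "The man is the one on the right in <img><|image_1|></img>."
    # Two images are passed but only one placeholder appears in the prompt.
    print(check_placeholders(prompt, ["a.jpg", "b.jpg"]))  # ([2], [])
```

Calling the check before `pipe(...)` and stopping when either list is non-empty keeps the mismatch from surfacing later as a confusing generation result.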

3. Use Cases and Best Practices

Text-to-Image Generation

Enter a descriptive text prompt, and OmniGen generates an image that matches the description.

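prompt = "A beautiful beach with a romantic atmosphere at sunset, high definition, impressionist style."
# The pipeline returns a list of images, so save the first one.
images = pipe(prompt=prompt, height=768, width=1024)
images[0].save("beach_sunset.png")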
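
The quick-start example indexes the result as `images[0]`, which indicates that the pipeline returns a list of images rather than a single image. The following is a minimal sketch of a helper that writes every returned image to disk. The name `save_all` is hypothetical, and the only assumption about the image objects is that they expose a PIL-style `save(path)` method.

```python
from pathlib import Path


def save_all(images, stem, out_dir="."):
    """Save each image as <stem>_<index>.png and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, image in enumerate(images):
        path = out / f"{stem}_{i}.png"
        image.save(path)  # assumes a PIL-style save(path) method
        paths.append(path)
    return paths
```

With the pipeline from the quick start, a call such as `save_all(pipe(prompt=prompt, height=768, width=1024), "beach_sunset")` would write `beach_sunset_0.png` into the current directory.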

Image Editing

You can also use OmniGen to edit an existing image, for example to change the pose or expression of a person in it.

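# Suppose you have a photo of a person and want to change their pose.
# The prompt must reference the input image through its placeholder.
prompt = "The young man in <img><|image_1|></img> is surfing on a beach, full of energy."
edited_images = pipe(prompt=prompt, input_images=["path/to/young_man.jpg"])
# The pipeline returns a list of images, so save the first one.
edited_images[0].save("surfing_young_man.png")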