LAMP: Best Practices for Few-Shot Video Generation

2025-05-20 23:58:13 Author: 田桥桑Industrious

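1. Project Overview

LAMP (Learn A Motion Pattern) is a few-shot method for text-to-video generation. It learns a motion pattern from a small set of videos and then generates new videos that follow the same pattern. Training needs only 8 to 16 videos and a single GPU with more than 15 GB of VRAM. The official implementation provides the code for the paper published at CVPR 2024 and also supports video editing.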

2. Quick Start

Environment Setup

Operating system: Ubuntu 18.04 or later
CUDA version: 11.3
Python version: 3.8

Clone the Repository

git clone https://github.com/RQ-Wu/LAMP.git
cd LAMP

Create a Virtual Environment

conda create -n LAMP python=3.8
conda activate LAMP

Install Dependencies

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install xformers==0.0.13
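Before moving on, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch, not part of the LAMP repository: the function name check_environment and the report fields are made up for illustration, and the 15 GB threshold is simply the figure quoted in the project overview. If torch is not installed, the function reports that instead of failing.

```python
import sys


def check_environment(min_vram_gb=15.0):
    """Summarise whether this machine matches the stated requirements.

    Hypothetical helper for illustration; it is not shipped with LAMP.
    """
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_is_3_8": sys.version_info[:2] == (3, 8),
        "torch": None,            # filled in only if torch imports
        "cuda_available": False,
        "vram_gb": None,          # total memory of GPU 0, in GiB
        "vram_ok": False,
    }
    try:
        import torch
    except ImportError:
        return report  # torch is missing; the remaining fields keep defaults

    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        vram_gb = total_bytes / 1024 ** 3
        report["vram_gb"] = round(vram_gb, 1)
        report["vram_ok"] = vram_gb > min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A report with torch set to None or cuda_available set to False usually means the pip commands above did not complete in the active conda environment.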

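Get the Pretrained Weights and Data

Download a pretrained T2I (text-to-image) diffusion model from Hugging Face and place it under ./checkpoints (the inference command in this guide uses ./checkpoints/stable-diffusion-v1-4). Collect the training videos; suggested sources include pexels and frozen-in-time. Put the video files under ./training_videos/[motion_name]/.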
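Since the method is designed around 8 to 16 videos per motion, a quick count of the files in the motion folder can catch an incomplete dataset before training starts. The sketch below is an illustration only: the helper names and the list of video extensions are assumptions, and which formats LAMP's data loader actually reads is not stated here.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to whatever your files use.
VIDEO_SUFFIXES = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def count_training_videos(root, motion_name):
    """Count video files directly inside <root>/<motion_name>/."""
    folder = Path(root) / motion_name
    if not folder.is_dir():
        raise FileNotFoundError(f"missing folder: {folder}")
    return sum(
        1
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES
    )


def is_recommended_size(count, low=8, high=16):
    """True when the count falls in the 8 to 16 range quoted for LAMP."""
    return low <= count <= high
```

For example, count_training_videos("./training_videos", "horse-run") returns the number of videos for a motion named horse-run, where the folder name is a guess based on the config file name used in the training command.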

Train the Model

Replace X with the index of the GPU to use.

CUDA_VISIBLE_DEVICES=X accelerate launch train_lamp.py config="configs/horse-run.yaml"
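When training is started from a Python script rather than a shell, the GPU selection has to be passed through the process environment. The following sketch shows one way to do that with the standard library; build_training_job and run_training are hypothetical names, and the command itself is the same one shown above.

```python
import os
import subprocess


def build_training_job(config_path, gpu_id):
    """Return (argv, env) equivalent to the shell command above.

    Hypothetical helper; CUDA_VISIBLE_DEVICES is set only for the child
    process, so the parent environment is left untouched.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    argv = ["accelerate", "launch", "train_lamp.py", f"config={config_path}"]
    return argv, env


def run_training(config_path, gpu_id=0):
    """Launch training and raise CalledProcessError if it exits non-zero."""
    argv, env = build_training_job(config_path, gpu_id)
    subprocess.run(argv, env=env, check=True)
```

Calling run_training("configs/horse-run.yaml", gpu_id=0) from the repository root would start the same job as the shell command with X set to 0.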

Run Inference to Generate a Video

python inference_script.py --weight ./my_weight/turn_to_smile/unet --pretrain_weight ./checkpoints/stable-diffusion-v1-4 --first_frame_path ./benchmark/turn_to_smile/head_photo_of_a_cute_girl,_comic_style.png --prompt "head photo of a cute girl, comic style, turns to smile"
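The first-frame path above contains a comma and the prompt contains spaces, so quoting mistakes are easy to make when the command is assembled by hand. One option, sketched below with a hypothetical helper named build_inference_command, is to keep the arguments in a list and let shlex produce a correctly quoted command line. Only the four flags shown in the command above are used.

```python
import shlex


def build_inference_command(weight, pretrain_weight, first_frame_path, prompt):
    """Return the inference command as an argument list (no shell quoting needed)."""
    return [
        "python", "inference_script.py",
        "--weight", weight,
        "--pretrain_weight", pretrain_weight,
        "--first_frame_path", first_frame_path,
        "--prompt", prompt,
    ]


cmd = build_inference_command(
    weight="./my_weight/turn_to_smile/unet",
    pretrain_weight="./checkpoints/stable-diffusion-v1-4",
    first_frame_path="./benchmark/turn_to_smile/head_photo_of_a_cute_girl,_comic_style.png",
    prompt="head photo of a cute girl, comic style, turns to smile",
)

# shlex.join (Python 3.8+) quotes each argument for a POSIX shell.
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True), which avoids the shell entirely.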