From Paper to Project: A Complete Guide to Academic Citation for the Mushroom Book EasyRL

2026-02-04 05:21:12 Author: 董灵辛Dennis

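Why Standardize Citations?

Academic citation is the foundation on which knowledge is passed on and extended, yet reinforcement learning researchers often run into citation inconsistencies: the same algorithm goes by several names (for example, DQN and Deep Q-Network), papers and code versions do not match, and open-source projects are cited in inconsistent formats. These problems make results harder to trace and can lead to duplicated work and misread methods.

Take DQN, a milestone algorithm in deep reinforcement learning. Citations of its original paper, "Playing Atari with Deep Reinforcement Learning", often omit the arXiv version number or fail to state which version of the code implementation was used. A standardized citation points unambiguously to the [DQN paper walkthrough](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/DQN/Playing Atari with Deep Reinforcement Learning.md?utm_source=gitcode_repo_files) and to the official implementation, which keeps the research reproducible.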

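Paper Citation Conventions

Basic Citation Format

A citation of a reinforcement learning paper needs three core elements: authors, year, and source (journal, conference, or preprint). For papers with a code implementation, add a link to the code as well. Taking the PPO algorithm as an example:

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
Code implementation: [PPO.ipynb](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/notebooks/PPO.ipynb?utm_source=gitcode_repo_files)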
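
The three elements plus the optional code link can be assembled mechanically. The following is a minimal sketch, assuming a hypothetical `PaperCitation` record and `format_citation` helper (neither is part of EasyRL), that reproduces the PPO entry above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperCitation:
    """Hypothetical record holding the fields of one paper citation."""
    authors: str                     # already formatted author list
    year: int
    title: str
    source: str                      # journal, conference, or preprint identifier
    code_link: Optional[str] = None  # optional pointer to an implementation


def format_citation(c: PaperCitation) -> str:
    """Render authors, year, title, and source, plus an optional code line."""
    text = f"{c.authors} ({c.year}). {c.title}. {c.source}."
    if c.code_link:
        text += f"\nCode implementation: {c.code_link}"
    return text


ppo = PaperCitation(
    authors="Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O.",
    year=2017,
    title="Proximal Policy Optimization Algorithms",
    source="arXiv preprint arXiv:1707.06347",
    code_link="notebooks/PPO.ipynb",
)
print(format_citation(ppo))
```

Keeping the fields separate until the last step makes it easy to render the same record in a different style later.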

How Citations Differ by Paper Type

| Paper type | Key points | Example |
| --- | --- | --- |
| Journal article | Include volume, issue, and page numbers | Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533. |
| Conference paper | Give the full conference name and location | Lillicrap, T. P., et al. (2016). Continuous control with deep reinforcement learning. In International Conference on Learning Representations (ICLR), San Juan, Puerto Rico. |
| Preprint | Always give the arXiv identifier and version | Haarnoja, T., et al. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290v2. |

The Mushroom Book's Enhanced Citation Format

For paper walkthroughs collected in the Mushroom Book, we recommend an enhanced format of "paper title + walkthrough link + code link":

Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
Chinese-language walkthrough: [Double DQN](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/DQN/Deep Reinforcement Learning with Double Q-learning.md?utm_source=gitcode_repo_files)
Code implementation: DoubleDQN.ipynb

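Citing Project Code

Citing Algorithm Implementations

The Mushroom Book ships a large set of Jupyter Notebook implementations. When citing one, include the file path, the algorithm version, and the key parameters. Taking Q-Learning as an example: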

# Adapted from the Mushroom Book Q-Learning implementation
# [QLearning.ipynb](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/notebooks/Q-learning/QLearning.ipynb?utm_source=gitcode_repo_files)
# Parameters: ε=0.1, α=0.5, γ=0.9
agent = QLearningAgent(actions=action_space, epsilon=0.1, alpha=0.5, gamma=0.9)
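
To make clear what the three cited parameters control, here is a minimal tabular Q-learning sketch. This `QLearningAgent` class is an illustrative stand-in written for this guide, not the implementation in QLearning.ipynb, and `action_space` is a hypothetical list of discrete actions.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning with an epsilon-greedy policy (illustrative only)."""

    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.epsilon = epsilon  # exploration rate
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        # Q-table: state -> {action: value}; every value starts at 0.0
        self.q = defaultdict(lambda: {a: 0.0 for a in self.actions})

    def choose_action(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        values = self.q[state]
        return max(self.actions, key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done=False):
        # TD target: r + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
        best_next = 0.0 if done else max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])


action_space = [0, 1, 2, 3]  # hypothetical discrete actions
agent = QLearningAgent(actions=action_space, epsilon=0.1, alpha=0.5, gamma=0.9)
agent.update(state=0, action=1, reward=1.0, next_state=1)
print(agent.q[0][1])
```

Because each of ε, α, and γ changes the learned values, a citation that omits them does not pin down which experiment was run.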

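Citing Environments and Tools

For custom environments in the project, such as the racetrack environment, use the following format:

Custom environment cited from: [racetrack.py](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/notebooks/envs/racetrack.py?utm_source=gitcode_repo_files)
Environment parameters: track size 10x10, maximum speed 3, action space {accelerate, decelerate, turn left, turn right}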

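Citing Datasets and Models

If you use a pretrained model or dataset from the project, state its source path and any conditions of use:

Experimental dataset cited from: [track.txt](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/notebooks/envs/track.txt?utm_source=gitcode_repo_files)
The dataset contains rasterized representations of 5 race tracks, each 50x50 pixels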

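Common Citation Scenarios

Citing in an Academic Paper

When citing the PPO algorithm in the Methods section:

We use the Proximal Policy Optimization (PPO) algorithm for policy optimization (Schulman et al., 2017). The implementation is based on the Mushroom Book's PPO.ipynb, with the clipping parameter set to ε=0.2 and the discount factor to γ=0.99. The network is a 3-layer fully connected network with hidden layers of size [64,64] and ReLU activations.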
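
For readers unsure what the clipping parameter ε refers to, the sketch below computes PPO's clipped surrogate objective (Schulman et al., 2017) in plain Python. It is a standalone illustration, not code from PPO.ipynb; the function name `ppo_clip_objective` and its argument names are made up for this example.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)


# With a positive advantage, a ratio of 1.5 is capped at 1 + clip_eps.
print(ppo_clip_objective([1.5], [1.0]))
```

Reporting ε alongside the citation matters because it sets how far a single update may move the policy away from the one that collected the data.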

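Citing in a Technical Report

Describing Q-Learning applied to the Cliff Walking environment:

2.3 Experimental Setup
- Algorithm: Q-Learning (Watkins, 1989)
- Implementation: [RL_example.py](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/docs/chapter1/RL_example.py?utm_source=gitcode_repo_files)
- Environment: CliffWalking-v0 (grid size 12x4, that is, 12 columns by 4 rows)
- Training parameters: episodes=500, max_steps=1000
- Evaluation metric: average return (mean ± standard deviation over 100 independent runs)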
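
The "mean ± standard deviation" figure in the last item can be computed as below. This is a minimal sketch: `summarize_returns` is a hypothetical helper, and the four numbers are made-up placeholders rather than real experimental results.

```python
import statistics


def summarize_returns(returns):
    """Return (mean, sample standard deviation) of per-run average returns."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample std (n - 1 in the denominator)
    return mean, std


# Hypothetical average returns from four independent runs (not real results).
run_returns = [-20.0, -18.0, -22.0, -20.0]
mean, std = summarize_returns(run_returns)
print(f"average return: {mean:.2f} ± {std:.2f}")
```

It is worth stating in the report whether the sample or the population standard deviation was used, since the two differ noticeably when the number of runs is small.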

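Citing in Teaching Materials

When explaining basic reinforcement learning concepts:

A reinforcement learning agent obtains delayed rewards by interacting with the environment (Reinforcement Learning Basics). As Figure 1 shows, at each time step t the agent executes action a_t, and the environment returns observation o_{t+1} and reward r_{t+1}.

Figure 1: Interaction between a reinforcement learning agent and its environment (source)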
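
The interaction described in the example can also be shown as a loop. The following is a minimal sketch using a hypothetical toy environment, `CoinFlipEnv`, invented for this illustration; it is not one of the EasyRL environments, and its rewards are immediate rather than delayed.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip, reward 1 if correct."""

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation o_0

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0  # r_{t+1}
        self.t += 1
        done = self.t >= self.horizon
        return coin, reward, done                # o_{t+1}, r_{t+1}, done


def run_episode(env, policy):
    """Run one episode and return the list of (o_t, a_t, r_{t+1}) tuples."""
    trajectory = []
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)                       # agent picks a_t from o_t
        next_obs, reward, done = env.step(action)  # environment responds
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory


# Policy used here: repeat the last observed coin face as the next guess.
trajectory = run_episode(CoinFlipEnv(horizon=5), policy=lambda obs: obs)
print(len(trajectory), sum(r for _, _, r in trajectory))
```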

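Recommended Citation Management Tools

Citation Format Generators

Citation File Format on GitHub: add a CITATION.cff file to the project so that citation formats can be generated automatically
Zotero plugin: use the Better BibTeX plugin and set a custom citation key of the form "EasyRL-<algorithm name>-<year>"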
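
The citation key pattern above can be illustrated in a few lines of Python. This sketch does not use Better BibTeX itself, which has its own citation key settings; `make_citation_key` and `to_bibtex` are hypothetical helpers that only show what the resulting keys and entries could look like.

```python
import re


def make_citation_key(algorithm: str, year: int) -> str:
    """Build a key of the form EasyRL-<algorithm>-<year>."""
    # Keep letters, digits, and underscores; drop spaces and punctuation.
    algo = re.sub(r"[^\w]", "", algorithm)
    return f"EasyRL-{algo}-{year}"


def to_bibtex(key: str, fields: dict) -> str:
    """Render a minimal @article entry from a dict of fields."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


key = make_citation_key("PPO", 2017)
print(to_bibtex(key, {
    "author": "Schulman, J. and Wolski, F. and Dhariwal, P. "
              "and Radford, A. and Klimov, O.",
    "title": "Proximal Policy Optimization Algorithms",
    "journal": "arXiv preprint arXiv:1707.06347",
    "year": "2017",
}))
```

A predictable key scheme means two collaborators who import the same paper end up with the same key in their manuscripts.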

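Version Control and Citation Tracking

Mark important versions with Git tags: git tag -a v1.0 -m "Mushroom Book first edition official release"
Maintain a changelog of the algorithm implementations in notebooks/README.md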
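
Beyond tags, the links in this guide pin every cited file to a specific commit. The sketch below builds such a commit-pinned link; the base URL and commit hash are copied from the links used in this guide, while `build_permalink` is a hypothetical helper, and percent-encoding the path is an assumption about how the hosting site resolves file names that contain spaces.

```python
from urllib.parse import quote

# Base URL and commit hash copied from the links used in this guide.
REPO_BLOB_URL = "https://gitcode.com/gh_mirrors/ea/easy-rl/blob"
PINNED_COMMIT = "fc4ece6ee54966f7f293f5b071a61a47dda4cb30"


def build_permalink(path: str, commit: str = PINNED_COMMIT) -> str:
    """Return a commit-pinned link to a file in the repository."""
    # Percent-encode spaces and other unsafe characters; "/" is left as-is.
    return f"{REPO_BLOB_URL}/{commit}/{quote(path)}"


print(build_permalink("notebooks/PPO.ipynb"))
print(build_permalink("papers/DQN/Playing Atari with Deep Reinforcement Learning.md"))
```

A link pinned to a commit keeps pointing at the exact file contents that were cited, even after the default branch moves on.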

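Citation FAQ

Q: How do I tell different implementations of the same algorithm apart?

A: By file path and parameter configuration. For example:

DQN.ipynb (original DQN implementation)
DoubleDQN.ipynb (double-network variant)
DuelingDQN.ipynb (dueling network architecture)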

Q: What should I watch for when citing a paper walkthrough that has not been formally published?

A: Give the document version and the access date, for example:

[Rainbow paper walkthrough](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/DQN/Rainbow_Combining Improvements in Deep Reinforcement Learning.md?utm_source=gitcode_repo_files) (version: 2023-05-10, accessed: 2025-10-27)

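Q: How do I cite image resources from the project?

A: An image citation needs the path, the caption, and the page number, for example:

Figure 2: Iterative update flow of the Q-Learning algorithm (source, p.23)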

Citation Resource Summary

Core Algorithm Paper List

| Algorithm | Paper citation | Chinese-language walkthrough | Code implementation |
| --- | --- | --- | --- |
| DQN | Mnih et al., 2013 | [Playing Atari with Deep RL](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/DQN/Playing Atari with Deep Reinforcement Learning.md?utm_source=gitcode_repo_files) | DQN.ipynb |
| PPO | Schulman et al., 2017 | [Proximal Policy Optimization](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/Policy_gradient/Proximal Policy Optimization Algorithms.md?utm_source=gitcode_repo_files) | PPO.ipynb |
| SAC | Haarnoja et al., 2018 | [Soft Actor-Critic](https://gitcode.com/gh_mirrors/ea/easy-rl/blob/fc4ece6ee54966f7f293f5b071a61a47dda4cb30/papers/Policy_gradient/Soft Actor-Critic_Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.md?utm_source=gitcode_repo_files) | SAC.ipynb |