Changedetection.io Database Corruption: Analysis and Repair Guide

2025-05-08 21:24:48 Author: 卓艾滢Kingsley

Changedetection.io is a free, open source service for website change detection, restock monitoring, price change notification, and website defacement monitoring. It is designed for simplicity: it tells you which websites had a text change.

Project repository: https://gitcode.com/GitHub_Trending/ch/changedetection.io

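Symptoms

While using Changedetection.io to monitor web pages for changes, a user found the web interface unreachable: the browser returned a 500 Internal Server Error. The log shows the key error ValueError: invalid literal for int() with base 10, which means the system tried to convert an invalid string made up of a long run of null characters (\x00) into an integer and failed.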
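The error itself is easy to reproduce outside the application. The snippet below is only a minimal sketch: the variable names are made up for illustration, and the eight-character run of NUL characters stands in for the much longer string seen in the log.

```python
# Hypothetical corrupted value: NUL characters where a timestamp was expected.
corrupted_value = "\x00" * 8

try:
    int(corrupted_value)
except ValueError as exc:
    # Python refuses the conversion with the same message that appears in the log.
    error_message = str(exc)
    print(f"ValueError: {error_message}")

# A well-formed timestamp string converts without trouble.
valid_timestamp = int("1234567890")
print(valid_timestamp)
```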

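Root Cause Analysis

This problem typically occurs in the following situations:

An abnormal system crash leaves a data write incomplete
The storage device fails or reports I/O errors
The container is shut down uncleanly and data is corrupted
The database file is unexpectedly modified by another process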

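In terms of how Changedetection.io is implemented, the system stores its watch configuration and history records in a JSON file (url-watches.json). When the __newest_history_key field of a watch entry in that file is written as a run of null characters instead of a valid timestamp, the type conversion exception above is raised.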

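Impact

The web management interface is completely unavailable (HTTP 500 error)
Background monitoring tasks may still be running (the user reported still receiving Discord notifications)
Only the existing datastore is affected; newly created instances are not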

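Solution

Manual Repair Steps

Stop the service. Make sure the Changedetection.io container or process is fully stopped to avoid conflicting writes.

Back up the data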

cp url-watches.json url-watches.json.bak

Locate the corrupted data. Use a text editor or a command-line tool to find the abnormal field:
```
grep -n '__newest_history_key' url-watches.json
```
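Repair the corrupted field. Take an abnormal value like the following (inside the JSON file the null characters may appear as \u0000 escapes rather than \x00):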
```
"__newest_history_key": "\x00\x00\x00\x00..."
```
and change it to a valid timestamp, or delete the field entirely:
```
"__newest_history_key": "1234567890"
```

Validate the JSON. Use a JSON validation tool to confirm the file is well-formed:

python -m json.tool url-watches.json > temp.json && mv temp.json url-watches.json
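When the file holds many watches, grep prints every occurrence of the field, valid or not. The locate step can instead be sketched in Python as below. This is an illustration under stated assumptions, not part of Changedetection.io: the helper names `is_valid_timestamp` and `find_corrupted` are hypothetical, the rule "a valid value is a non-negative integer or an all-digit string" is an assumption, and the `watching` layout in the sample is made up for the demonstration. The walk is recursive, so it does not depend on the exact layout of the file.

```python
import json
from typing import Any, Iterator, Tuple

FIELD = "__newest_history_key"

def is_valid_timestamp(value: Any) -> bool:
    """Assumed rule: a non-negative integer, or a string of ASCII digits."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value >= 0
    return isinstance(value, str) and value.isascii() and value.isdigit()

def find_corrupted(node: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every FIELD whose value fails the check."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == FIELD and not is_valid_timestamp(value):
                yield child, value
            else:
                yield from find_corrupted(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_corrupted(item, f"{path}[{index}]")

# Hypothetical sample standing in for the parsed contents of url-watches.json.
sample = json.loads(
    '{"watching": {"uuid-1": {"__newest_history_key": "1234567890"},'
    ' "uuid-2": {"__newest_history_key": "\\u0000\\u0000\\u0000"}}}'
)

for location, bad_value in find_corrupted(sample):
    print(f"{location}: {bad_value!r}")
```

To check the real file, load it with `json.load` from the backup copy and pass the result to `find_corrupted` in place of `sample`.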

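Preventive Measures

Back up the datastore directory regularly
Use a reliable storage device
Avoid editing the data files directly by hand
Set up monitoring that checks data integrity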
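The backup and integrity-check points can be combined in a small scheduled job. The following is a rough sketch, not a feature of Changedetection.io: the function name `backup_if_valid` and the timestamped file naming are assumptions made for illustration. It copies the file only after it parses as JSON, so a syntactically damaged file is never added to the backups.

```python
import json
import shutil
import time
from pathlib import Path

def backup_if_valid(datastore_file: Path, backup_dir: Path) -> Path:
    """Copy datastore_file into backup_dir, but only if it parses as JSON.

    Raises json.JSONDecodeError when the file is not well-formed JSON.
    """
    with datastore_file.open(encoding="utf-8") as handle:
        json.load(handle)  # integrity check only; the parsed result is discarded

    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{datastore_file.stem}-{stamp}{datastore_file.suffix}"
    shutil.copy2(datastore_file, target)  # copy2 also preserves file timestamps
    return target
```

Note that a file whose __newest_history_key holds escaped null characters is still well-formed JSON, so this check catches truncated or garbled files but not that particular kind of damage; a field-level check would be needed for that.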

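Technical Deep Dive

Changedetection.io uses a lightweight, file-system-based storage scheme in which:

Each monitored target (Watch) corresponds to one object in the JSON file
__newest_history_key stores the Unix timestamp of the most recent change
The sorting feature relies on this field to order watches by time

When the field is corrupted, the int() conversion fails, which in turn aborts template rendering. This design keeps things fast, but it also requires the data to stay strictly consistent.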
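Why a single bad record takes down the whole page can be shown with a toy sort. This is a simplified sketch, not the project's actual code: the records, `strict_key`, and `tolerant_key` are hypothetical, and the tolerant variant is included only for contrast.

```python
# Hypothetical watch records; real entries carry many more fields.
watches = [
    {"url": "https://example.com/a", "__newest_history_key": "1700000300"},
    {"url": "https://example.com/b", "__newest_history_key": "\x00\x00\x00\x00"},
    {"url": "https://example.com/c", "__newest_history_key": "1700000100"},
]

def strict_key(watch: dict) -> int:
    """Convert without any guard, as the failing code path effectively does."""
    return int(watch["__newest_history_key"])

def tolerant_key(watch: dict) -> int:
    """For contrast: treat a missing or unreadable timestamp as 0."""
    try:
        return int(watch["__newest_history_key"])
    except (KeyError, TypeError, ValueError):
        return 0

# One corrupted record is enough to make the whole sort raise.
try:
    sorted(watches, key=strict_key)
    strict_failed = False
except ValueError:
    strict_failed = True

# With the guard, the damaged record simply sorts last.
ordered = [w["url"] for w in sorted(watches, key=tolerant_key, reverse=True)]
print(strict_failed, ordered)
```

Because the sort runs over every watch before anything is rendered, one damaged entry is sufficient to produce the HTTP 500 for the whole list.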

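Advanced Repair (for Large Datasets)

For a large datastore containing tens of thousands of records:

Process the file in bulk with jq. The filter below converts every __newest_history_key that parses as a number and is intended to drop the ones that do not; write the result to a new file and inspect it before replacing the original: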

jq 'walk(if type == "object" and has("__newest_history_key") 
     then .__newest_history_key |= (tonumber? // empty) 
     else . end)' url-watches.json > repaired.json
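Where jq is not available, a similar batch repair can be sketched in Python. This is an approximation under stated assumptions rather than an exact port of the filter: the helper names are hypothetical, the validity rule (a non-negative integer or an all-digit string) is an assumption, and unlike the jq version it leaves valid values untouched instead of converting them to numbers.

```python
import json
from typing import Any

FIELD = "__newest_history_key"

def _is_valid(value: Any) -> bool:
    """Assumed rule: a non-negative integer, or a string of ASCII digits."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value >= 0
    return isinstance(value, str) and value.isascii() and value.isdigit()

def repair(node: Any) -> int:
    """Delete every invalid FIELD in place; return how many were removed."""
    removed = 0
    if isinstance(node, dict):
        if FIELD in node and not _is_valid(node[FIELD]):
            del node[FIELD]
            removed += 1
        for value in node.values():
            removed += repair(value)
    elif isinstance(node, list):
        for item in node:
            removed += repair(item)
    return removed

# Hypothetical input standing in for the contents of url-watches.json.
raw = (
    '{"watching": {"a": {"__newest_history_key": "\\u0000\\u0000"},'
    ' "b": {"__newest_history_key": "1234567890"}}}'
)
data = json.loads(raw)
removed_count = repair(data)
repaired_text = json.dumps(data, indent=4)
print(f"removed {removed_count} corrupted field(s)")
```

For a real run, read the backup copy with `json.load`, call `repair`, and write `repaired_text` to a new file such as repaired.json so the original stays untouched until the result has been checked.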