Analysis of Wazuh Agent Log Collection Failure After a Windows System Reboot

2025-05-19 18:52:51 Author: 曹令琨Iris

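Symptoms

A Wazuh Agent (version 4.11.1) deployed on Windows showed a characteristic problem after a system reboot: the Agent service started normally and connected to the server, but log collection did not work. Specifically:

After the reboot, the Agent service starts automatically and reports a connected status
Key log collection features (such as Windows Defender event monitoring) do not work
Manually restarting the Agent service restores log collection immediately
Setting the service to delayed start also resolves the problem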

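Troubleshooting Process

Initial Findings

The technical team first noticed that, after the reboot, the Agent logged "Wazuh Agent will be reconnected because of force reconnect interval" every 10 minutes. This pointed to the forced reconnect interval setting, but it turned out not to be the root cause.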

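In-Depth Analysis

Analysis of the debug logs revealed the following key points:

Connection status is normal: the Agent does establish and maintain a connection to the server
Log collector is faulty: the logcollector module fails to collect event logs after a system reboot
Events are lost: events that occur during the reboot cannot be captured afterwards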

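Root Cause

After further analysis, the technical team identified two core problems:

Log collector startup timing: when the Windows service starts, some system components it depends on may not yet be fully initialized, so logcollector cannot collect events
Historical event handling: with the default configuration (only-future-events defaults to yes), the Agent does not collect events that occurred while the service was stopped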

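Solutions

For the log collector startup issue

Delayed service start: set the service to delayed start so that the Agent starts only after the system components are fully initialized
Debug mode analysis: enable windows.debug=1 in local_internal_options.conf to get detailed debug output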
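
The delayed-start setting can be applied from a script. The following is a minimal sketch, not the team's actual procedure: it assumes the Agent is registered under the service name WazuhSvc (verify this on your host) and that the script runs from an elevated prompt on Windows. The helper names are hypothetical.

```python
import subprocess


def delayed_start_command(service_name="WazuhSvc"):
    """Build the sc.exe arguments for Automatic (Delayed Start).

    "WazuhSvc" is an assumed service name; check it on the target host.
    """
    # sc.exe expects "start=" and its value as two separate tokens.
    return ["sc.exe", "config", service_name, "start=", "delayed-auto"]


def set_delayed_start(service_name="WazuhSvc", dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = delayed_start_command(service_name)
    if not dry_run:
        # Requires Windows and administrator rights.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command instead of changing the service.
    print(" ".join(set_delayed_start(dry_run=True)))
```

Keeping the default as a dry run lets you review the command before it changes the service configuration.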

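For the loss of historical events

Add the following to the ossec.conf configuration file so that the Agent collects events that occurred while the service was stopped: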

<localfile>
  <location>Security</location>
  <log_format>eventchannel</log_format>
  <only-future-events>no</only-future-events>
  <query>Event/System[EventID != 5145 and EventID != 5156 and EventID != 5447 and
    EventID != 4656 and EventID != 4658 and EventID != 4663 and EventID != 4660 and
    EventID != 4670 and EventID != 4690 and EventID != 4703 and EventID != 4907 and
    EventID != 5152 and EventID != 5157]</query>
</localfile>
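
To check which event channels in an existing configuration would still skip events from the downtime, a small script can help. The sketch below is an illustration under assumptions: the function name is hypothetical, the configuration text is assumed to parse as XML once wrapped in a dummy root (ossec.conf may contain several ossec_config blocks), and a missing only-future-events element is treated as the default behaviour of collecting future events only.

```python
import xml.etree.ElementTree as ET


def channels_missing_backlog(ossec_conf_text):
    """Return eventchannel locations whose only-future-events is not "no"."""
    # Wrap the text so that multiple top-level blocks parse as one document.
    root = ET.fromstring("<wrapper>" + ossec_conf_text + "</wrapper>")
    flagged = []
    for localfile in root.iter("localfile"):
        log_format = (localfile.findtext("log_format") or "").strip()
        if log_format != "eventchannel":
            continue
        value = (localfile.findtext("only-future-events") or "").strip().lower()
        if value != "no":
            flagged.append((localfile.findtext("location") or "").strip())
    return flagged


if __name__ == "__main__":
    sample = """
    <ossec_config>
      <localfile>
        <location>Security</location>
        <log_format>eventchannel</log_format>
        <only-future-events>no</only-future-events>
      </localfile>
      <localfile>
        <location>System</location>
        <log_format>eventchannel</log_format>
      </localfile>
    </ossec_config>
    """
    print(channels_missing_backlog(sample))
```

Channels in the returned list would need the only-future-events setting shown above if events from the downtime matter for them.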

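Other Recommendations

Disable forced reconnection: although unrelated to the core problem, disabling forced reconnection is recommended to avoid unnecessary connection interruptions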
```
<client>
  <force_reconnect_interval>0</force_reconnect_interval>
</client>
```

Complete event channel configuration: make sure every event channel that needs monitoring is configured correctly

<localfile>
  <location>Application</location>
  <log_format>eventchannel</log_format>
</localfile>

<localfile>
  <location>System</location>
  <log_format>eventchannel</log_format>
</localfile>
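
When many channels need the same settings, the localfile blocks can be generated rather than written by hand. This is a minimal sketch with a hypothetical helper; the channel list is only an example, and the Windows Defender channel name is an assumption to verify in Event Viewer on the target host.

```python
from xml.sax.saxutils import escape


def localfile_block(channel, only_future_events=None):
    """Render one eventchannel <localfile> block as text.

    only_future_events=None omits the element, which leaves the Agent default.
    """
    lines = [
        "<localfile>",
        f"  <location>{escape(channel)}</location>",
        "  <log_format>eventchannel</log_format>",
    ]
    if only_future_events is not None:
        value = "yes" if only_future_events else "no"
        lines.append(f"  <only-future-events>{value}</only-future-events>")
    lines.append("</localfile>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example channels; adjust to the ones you actually monitor.
    channels = [
        "Application",
        "System",
        "Microsoft-Windows-Windows Defender/Operational",
    ]
    print("\n\n".join(localfile_block(c, only_future_events=False) for c in channels))
```

The output is meant to be pasted inside the ossec_config section of ossec.conf, followed by a restart of the Agent service.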