EOS任务重复：定时任务执行与异常处理

2026-02-04 04:11:15作者：董灵辛Dennis

This repository features an Energy Optimization System (EOS) that optimizes energy distribution, usage for batteries, heat pumps& household devices. It includes predictive models for electricity prices (planned), load forecasting& dynamic optimization to maximize energy efficiency & minimize costs. Founder Dr. Andreas Schmitz (YouTube @akkudoktor)

项目地址：https://gitcode.com/GitHub_Trending/eos5/EOS

引言

在能源优化系统（Energy Optimization System，EOS）中，定时任务的可靠执行是系统稳定运行的核心保障。EOS作为一个复杂的分布式能源管理系统，需要定期执行缓存清理、能源管理、预测更新等关键任务。然而，在实际部署中，定时任务可能面临网络异常、资源竞争、外部服务不可用等多种挑战，如何确保任务的可靠执行和异常处理成为系统设计的关键问题。

本文将深入探讨EOS中定时任务的执行机制、异常处理策略以及最佳实践，帮助开发者构建更加健壮的能源管理系统。

EOS定时任务架构

核心组件概述

EOS采用基于FastAPI的异步任务调度框架，通过repeat_every装饰器实现周期性任务的执行。系统主要包含以下核心定时任务：

flowchart TD
    A[EOS Server Startup] --> B[Initialize Tasks]
    B --> C[Cache Cleanup Task]
    B --> D[Energy Management Task]
    
    C --> E[Clear Expired Cache]
    E --> F[Log Exception if any]
    
    D --> G[Manage Energy System]
    G --> H[Log Exception if any]
    
    F --> I[Continue Next Cycle]
    H --> I

任务执行机制

EOS使用repeat_every装饰器来定义周期性任务，该装饰器提供丰富的配置选项：

@repeat_every(
    seconds=float(config_eos.cache.cleanup_interval),
    on_exception=cache_cleanup_on_exception,
)
def cache_cleanup_task() -> None:
    """Repeating task to clear cache from expired cache files."""
    logger.debug("Clear cache")
    cache_clear()

异常处理策略

多层级异常捕获

EOS采用三层异常处理机制，确保系统在遇到异常时能够优雅降级：

sequenceDiagram
    participant T as Task
    participant EH as Exception Handler
    participant L as Logger
    participant S as System
    
    T->>T: Execute Task Logic
    T->>EH: Exception Occurred
    EH->>L: Log Error Details
    EH->>S: Continue Normal Operation
    Note right of S: System remains stable

异常处理实现

在src/akkudoktoreos/server/rest/tasks.py中，EOS实现了完善的异常处理机制：

async def _handle_exc(exc: Exception, on_exception: ExcArgNoReturnAnyFuncT | None) -> None:
    """处理任务执行过程中的异常"""
    if on_exception:
        if asyncio.iscoroutinefunction(on_exception):
            await on_exception(exc)
        else:
            await run_in_threadpool(on_exception, exc)

def cache_cleanup_on_exception(e: Exception) -> None:
    """缓存清理任务的异常处理函数"""
    logger.error("Cache cleanup task caught an exception: {}", e, exc_info=True)

def energy_management_on_exception(e: Exception) -> None:
    """能源管理任务的异常处理函数"""
    logger.error("Energy management task caught an exception: {}", e, exc_info=True)

任务配置与管理

配置参数详解

EOS通过配置文件管理定时任务的执行参数：

配置项	默认值	说明	影响范围
`cache.cleanup_interval`	3600秒	缓存清理间隔	系统性能
`ems.startup_delay`	10秒	能源管理启动延迟	系统初始化
`server.startup_eosdash`	True	是否启动仪表板	用户界面

任务生命周期管理

EOS使用FastAPI的lifespan管理器来协调任务的启动和关闭：

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """应用生命周期管理器"""
    # 启动时初始化任务
    cache_load()
    if config_eos.cache.cleanup_interval is None:
        logger.warning("Cache file cleanup disabled. Set cache.cleanup_interval.")
    else:
        await cache_cleanup_task()
    await energy_management_task()
    
    yield  # 应用正常运行
    
    # 关闭时保存状态
    cache_save()

常见问题与解决方案

1. 任务执行超时

问题现象：任务执行时间超过预期，影响后续任务调度

解决方案：

@repeat_every(
    seconds=300,
    on_exception=handle_timeout,
    max_repetitions=10  # 限制最大重试次数
)
def critical_task() -> None:
    """关键任务带超时保护"""
    try:
        # 设置执行超时
        result = asyncio.wait_for(perform_operation(), timeout=60)
    except asyncio.TimeoutError:
        logger.warning("Task timeout, will retry in next cycle")

2. 资源竞争问题

问题现象：多个任务同时访问共享资源导致冲突

解决方案：

from asyncio import Lock

resource_lock = Lock()

@repeat_every(seconds=60)
async def resource_intensive_task() -> None:
    """资源密集型任务带锁保护"""
    async with resource_lock:
        await access_shared_resource()
        # 确保同一时间只有一个任务访问资源

3. 外部服务依赖

问题现象：任务依赖的外部服务不可用

解决方案：

@repeat_every(seconds=300)
async def external_service_task() -> None:
    """外部服务依赖任务"""
    try:
        if await check_service_availability():
            await call_external_service()
        else:
            logger.warning("External service unavailable, skipping task")
    except Exception as e:
        logger.error(f"External service error: {e}")
        # 可以在这里实现降级策略

监控与日志

任务执行监控

EOS提供完善的日志记录机制，便于监控任务执行状态：

# 在任务装饰器中添加详细日志
@repeat_every(
    seconds=3600,
    logger=logger,  # 使用系统日志器
    on_exception=detailed_exception_handler
)
def monitored_task() -> None:
    """带详细监控的任务"""
    start_time = time.time()
    logger.info("Task started")
    
    # 任务逻辑
    perform_operation()
    
    duration = time.time() - start_time
    logger.info(f"Task completed in {duration:.2f} seconds")

性能指标收集

通过装饰器模式收集任务执行指标：

def metrics_collector(func):
    """任务指标收集装饰器"""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        start_time = time.time()
        success = False
        
        try:
            result = await func(*args, **kwargs)
            success = True
            return result
        finally:
            duration = time.time() - start_time
            # 记录到监控系统
            record_task_metrics(func.__name__, duration, success)
    
    return wrapper

@metrics_collector
@repeat_every(seconds=300)
async def measured_task() -> None:
    """带性能指标的任务"""
    await perform_operation()

最佳实践

1. 任务幂等性设计

确保任务可以安全地重复执行，即使前一次执行未完成：

@repeat_every(seconds=1800)
def idempotent_task() -> None:
    """
    幂等性任务设计
    - 使用事务确保操作原子性
    - 检查执行状态避免重复操作
    - 记录操作标识用于恢复
    """
    if not is_task_already_done():
        with transaction.atomic():
            perform_operation()
            mark_task_done()

2. 优雅降级策略

当系统资源紧张或外部服务不可用时，实现优雅降级：

@repeat_every(seconds=300)
async def graceful_task() -> None:
    """带优雅降级的任务"""
    system_load = get_system_load()
    
    if system_load > CRITICAL_THRESHOLD:
        # 系统负载高，执行简化版本
        await perform_lightweight_operation()
        logger.info("Executed lightweight version due to high load")
    else:
        # 正常执行
        await perform_full_operation()

3. 任务依赖管理

处理任务之间的依赖关系，确保执行顺序：

# 任务依赖管理器
class TaskDependencyManager:
    def __init__(self):
        self.completed_tasks = set()
    
    def mark_completed(self, task_name: str):
        self.completed_tasks.add(task_name)
    
    def is_ready(self, task_name: str, dependencies: List[str]) -> bool:
        return all(dep in self.completed_tasks for dep in dependencies)

# 使用示例
dependency_manager = TaskDependencyManager()

@repeat_every(seconds=3600)
def dependent_task() -> None:
    """有依赖关系的任务"""
    if dependency_manager.is_ready("data_processing", ["data_collection"]):
        process_data()
        dependency_manager.mark_completed("data_processing")