Troubleshooting Guide for Distributed Apache OpenWhisk Deployments

2025-06-01 14:21:42 Author: 申梦珏Efrain

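Background

While deploying a distributed Apache OpenWhisk environment, a user was unable to invoke the hello action. The system ran on two Linux hosts: a master node (192.168.35.5) and an invoker node (192.168.35.8). Every deployment step completed, yet invoking the action failed.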

Key Error Symptoms

Invoking the hello action returned: error: Unable to invoke action 'hello': There was an internal server error. (code 716aef8948ee83567c132939f405d2fc)
The invoker log contained the key error: cannot create test action for invoker health because runtime manifest is not valid
The controller log showed an Elasticsearch connection problem: org.apache.http.ConnectionClosedException: Connection closed

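Root Cause Analysis

Investigation traced the failure to two factors:

Incomplete runtime manifest: by default, the system uses a Node.js runtime to run the invoker health-check action, but the user's runtime manifest (runtimes.json) configured only the Python runtime and had no Node.js runtime entry.
Elasticsearch connection problem: although the Elasticsearch container started successfully, the system's connection to it was unstable, so activation records could not be stored.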
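The first cause can be caught before deployment with a quick check of the manifest. The following is a minimal sketch, not part of OpenWhisk itself: the function name missing_runtime_families and the inline python_only sample are hypothetical. The only assumption is that the manifest keeps runtime families under a top-level "runtimes" key.

```python
import json


def missing_runtime_families(manifest_text, required=("nodejs",)):
    """Return the required runtime families that are absent or empty.

    Hypothetical helper: "nodejs" is required here only because the
    invoker health-check action runs on a Node.js runtime by default.
    """
    manifest = json.loads(manifest_text)
    runtimes = manifest.get("runtimes", {})
    # A family counts as missing if its key is absent or its list is empty.
    return [family for family in required if not runtimes.get(family)]


# A manifest that declares only Python, as in the failing deployment.
python_only = '{"runtimes": {"python": [{"kind": "python:3.10", "default": true}]}}'
print(missing_runtime_families(python_only))  # ['nodejs']
```

An empty list means every required family is present. A non-empty list names the families to add before starting the invoker.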

Solution

Correcting the Runtime Manifest

A correct runtime manifest must include a Node.js runtime entry, for example:

{
  "runtimes": {
    "nodejs": [
      {
        "kind": "nodejs:20",
        "default": true,
        "image": {
          "prefix": "openwhisk",
          "name": "action-nodejs-v20",
          "tag": "nightly"
        },
        "deprecated": false,
        "attached": {
          "attachmentName": "codefile",
          "attachmentType": "text/plain"
        }
      }
    ],
    "python": [
      {
        "kind": "python:3.10",
        "default": true,
        "image": {
          "prefix": "openwhisk",
          "name": "action-python-v3.10",
          "tag": "nightly"
        },
        "deprecated": false,
        "attached": {
          "attachmentName": "codefile",
          "attachmentType": "text/plain"
        }
      }
    ]
  },
  "blackboxes": [
    {
      "prefix": "openwhisk",
      "name": "dockerskeleton",
      "tag": "nightly"
    }
  ]
}
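To confirm which container images the invoker node will need, the manifest can be flattened into a list of image references. This is a rough sketch under one assumption: each image is addressed as prefix/name:tag, with no private registry host in front. The helper names image_ref and image_references are hypothetical, and sample_manifest is a trimmed copy of the manifest above.

```python
import json


def image_ref(image):
    # Assumption: references take the form prefix/name:tag.
    return f"{image['prefix']}/{image['name']}:{image['tag']}"


def image_references(manifest):
    """Collect the image references of all runtimes and blackbox images."""
    refs = []
    for entries in manifest.get("runtimes", {}).values():
        for entry in entries:
            refs.append(image_ref(entry["image"]))
    for image in manifest.get("blackboxes", []):
        refs.append(image_ref(image))
    return refs


# Trimmed copy of the manifest above, kept inline so the sketch is self-contained.
sample_manifest = json.loads("""
{
  "runtimes": {
    "nodejs": [
      {"kind": "nodejs:20", "default": true,
       "image": {"prefix": "openwhisk", "name": "action-nodejs-v20", "tag": "nightly"}}
    ]
  },
  "blackboxes": [
    {"prefix": "openwhisk", "name": "dockerskeleton", "tag": "nightly"}
  ]
}
""")

for ref in image_references(sample_manifest):
    print(ref)
```

Each printed reference can then be checked against the images actually available on the invoker host.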