Integrating Apache Druid with Pure Storage S3 Storage

2025-05-16 16:59:04 Author: 冯爽妲Honey

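Background

Apache Druid is a high-performance real-time analytics database that usually relies on an object storage service to persist its data. In production, users may choose among different object storage solutions. Pure Storage FlashBlade is one of them: a high-performance storage system that exposes an S3-compatible interface. When Druid is integrated with Pure Storage S3, however, writes may fail.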

Symptoms

When Druid is configured to use Pure Storage as its backend storage, the following error appears:

java.lang.RuntimeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: 
A header you provided implies functionality that is not implemented.
(Service: Amazon S3; Status Code: 501; Error Code: NotImplemented)

Specifically:

Segments can be read from Pure Storage S3 (get operations)
Data can be saved to a local directory
Segments cannot be written to Pure Storage S3 (push operations)

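Root Cause Analysis

The underlying issue is that Pure Storage's S3 implementation differs in functionality from the standard AWS S3 service. The "501 Not Implemented" status code in the error message indicates that Druid is trying to use an S3 API feature that Pure Storage's implementation does not yet support.

Further analysis shows that the main culprit is Druid's default behavior of using S3 access control lists (ACLs), which Pure Storage's S3 implementation may not fully support.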
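To check whether a failed push matches the signature analyzed here, the status code and error code can be pulled out of the exception message. The following is a minimal sketch, assuming the message keeps the "(Service: ...; Status Code: ...; Error Code: ...)" layout quoted above; the function names are hypothetical and not part of Druid or the AWS SDK.

```python
import re

# Matches the "Status Code: 501; Error Code: NotImplemented" part of the
# exception message quoted in the article. The layout is an assumption
# based on that single sample.
_DETAILS = re.compile(r"Status Code:\s*(\d+);\s*Error Code:\s*(\w+)")


def parse_s3_error(message):
    """Return (status_code, error_code) from the message, or None if absent."""
    match = _DETAILS.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def is_unsupported_feature(message):
    """True when the store answered 501 NotImplemented."""
    return parse_s3_error(message) == (501, "NotImplemented")


sample = (
    "com.amazonaws.services.s3.model.AmazonS3Exception: "
    "A header you provided implies functionality that is not implemented. "
    "(Service: Amazon S3; Status Code: 501; Error Code: NotImplemented)"
)
print(parse_s3_error(sample))
print(is_unsupported_feature(sample))
```

A 501 NotImplemented result points at an unsupported feature rather than at credentials or networking, which is consistent with reads succeeding while pushes fail.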

Solution

To resolve the problem, explicitly disable ACLs in the Druid configuration. The required settings are as follows:

For the primary storage:

druid_storage_disableAcl: "true"

For indexer log storage:

druid_indexer_logs_disableAcl: "true"
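The two keys above are written in environment-variable style. As a rough sketch, and assuming the deployment follows the convention used by the official Druid Docker image (variables starting with druid_ become runtime properties with underscores replaced by dots), the mapping can be illustrated like this. The helper name is hypothetical.

```python
def env_to_property(name):
    """Map an environment-style key to a dotted Druid property name.

    Assumes the druid_ prefix convention; every underscore becomes a dot.
    """
    if not name.startswith("druid_"):
        raise ValueError(f"not a druid_ key: {name}")
    return name.replace("_", ".")


settings = {
    "druid_storage_disableAcl": "true",
    "druid_indexer_logs_disableAcl": "true",
}

# Print the equivalent runtime.properties lines.
for key, value in settings.items():
    print(f"{env_to_property(key)}={value}")
```

For deployments that edit runtime.properties directly instead of passing environment variables, the dotted form is the one to set.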

Recommended Full Configuration

Beyond disabling ACLs, the full recommended configuration for Pure Storage S3 is:

druid_storage_type: s3
druid_storage_baseKey: warehouse
druid_storage_bucket: druid
druid_storage_storageDirectory: s3a://druid/warehouse/
druid_indexer_logs_type: s3
druid_indexer_logs_directory: s3a://druid/logs/
druid_indexer_logs_s3Bucket: druid
druid_indexer_logs_s3Prefix: logs
druid_storage_useS3aSchema: "true"
druid_s3_disableChunkedEncoding: "true"
druid_s3_credential: "your-credential"
druid_s3_secret: "your-secret"
druid_s3_protocol: http
druid_s3_enablePathStyleAccess: "true"
druid_s3_endpoint_signingRegion: us-east-1
druid_s3_endpoint_url: http://your-pure-storage-endpoint
druid_s3_forceGlobalBucketAccessEnabled: "true"
druid_storage_disableAcl: "true"
druid_indexer_logs_disableAcl: "true"
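Before rolling out a configuration like the one above, a small pre-flight check can catch the most likely mistakes. The sketch below is only illustrative: the rules it applies (both ACL flags set to "true", a full endpoint URL, and a protocol that matches the URL scheme) are assumptions drawn from this article, and check_config is a hypothetical helper, not a Druid tool.

```python
from urllib.parse import urlparse

# Keys the article says must be "true" for Pure Storage S3.
ACL_FLAGS = ("druid_storage_disableAcl", "druid_indexer_logs_disableAcl")


def check_config(cfg):
    """Return a list of problems found in cfg; an empty list means none were flagged."""
    problems = []
    for key in ACL_FLAGS:
        if str(cfg.get(key, "")).lower() != "true":
            problems.append(f'{key} should be "true"')

    endpoint = urlparse(cfg.get("druid_s3_endpoint_url", ""))
    if not endpoint.scheme or not endpoint.netloc:
        problems.append("druid_s3_endpoint_url is missing or not a full URL")
    else:
        protocol = cfg.get("druid_s3_protocol")
        if protocol and protocol != endpoint.scheme:
            problems.append("druid_s3_protocol does not match the endpoint URL scheme")
    return problems


example = {
    "druid_storage_disableAcl": "true",
    "druid_indexer_logs_disableAcl": "true",
    "druid_s3_protocol": "http",
    "druid_s3_endpoint_url": "http://your-pure-storage-endpoint",
}
print(check_config(example))
```

The check only inspects the settings locally; it does not contact the storage endpoint, so a clean result still needs to be followed by a real write test.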

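How It Works

With ACLs disabled, Druid no longer tries to set object-level permissions and relies on bucket-level access control instead. This mode is a better fit for many non-AWS, S3-compatible storage services, especially those that do not fully implement S3 ACLs.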

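Verification and Testing

When validating this solution in a real environment, keep the following in mind:

Make sure the Pure Storage S3 service is running normally
Verify that the endpoint URL and credentials in the configuration are correct
Check that the bucket permissions allow Druid to read and write
Monitor the initial data writes to confirm that no other compatibility issues appear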

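Summary

With S3 ACLs disabled, Druid can integrate successfully with the Pure Storage FlashBlade S3 service. The same approach is a useful reference for other object storage services that are not fully compatible with the standard S3 API. For real deployments, validate the configuration in a small-scale environment first, and roll it out to production only after confirming that everything works.

For enterprise users, understanding the API compatibility differences between storage services matters: it supports sound technology choices at the architecture design stage and helps avoid similar integration problems later.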
