How to Handle Blob Storage Directories in the Azure SDK for Python

2025-06-10 12:05:59 Author: 凤尚柏Louis

Project URL: https://gitcode.com/GitHub_Trending/az/azure-sdk-for-python

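How directories are handled is a common technical question for the Azure Blob storage service. This article looks at how to correctly identify and work with directory structures in Blob storage using the Azure SDK for Python.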

What a Directory Really Is

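First, it helps to be clear that Azure Blob storage uses a flat namespace by default. A "directory" there is a virtual concept: the "/" character in blob names is used to simulate a hierarchy. This behaves differently in standard Blob storage accounts (Flat Namespace, FNS) and in accounts with a hierarchical namespace enabled (Hierarchical Namespace, HNS).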
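To make the idea concrete, here is a minimal sketch, using only the standard library, of how a directory tree can be inferred from a flat list of blob names. The function name `virtual_directories` and the sample names are hypothetical and exist only for illustration; no Azure API is called.

```python
def virtual_directories(blob_names, delimiter="/"):
    """Return the sorted directory prefixes implied by flat blob names."""
    directories = set()
    for name in blob_names:
        parts = name.split(delimiter)
        # Every leading group of segments is an implied directory; the last
        # segment is the blob's own "file name" (empty for a marker blob).
        for depth in range(1, len(parts)):
            directories.add(delimiter.join(parts[:depth]) + delimiter)
    return sorted(directories)


# Hypothetical blob names stored side by side in one container.
names = ["logs/2025/06/app.log", "logs/readme.txt", "images/", "root.txt"]
print(virtual_directories(names))
# prints ['images/', 'logs/', 'logs/2025/', 'logs/2025/06/']
```

Nothing in the container records these directories; they exist only because the names share prefixes, which is why `root.txt` contributes none.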

Differences Between the Two Account Types

For standard FNS accounts:

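A directory is just a shared blob-name prefix ending in "/"; some tools also create a zero-length placeholder blob for it
There is no real directory object, only a simulated hierarchy
Specific APIs are needed to identify these "virtual directories"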

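For HNS accounts:

Directories are real objects in the namespace rather than a naming convention
The azure-storage-file-datalake SDK works with them directly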

Handling Directories in the Python SDK

Using the azure-storage-blob SDK

For standard FNS accounts, directories can be identified in the following ways:

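Check whether the size property of BlobProperties is 0
Check whether the name ends with "/"
Use the hierarchical listing API (walk_blobs), which returns virtual directories as BlobPrefix items during traversal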

Example code:

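from azure.storage.blob import BlobServiceClient

# Placeholders: supply your own connection string and container name
conn_str = "<connection-string>"
container_name = "<container-name>"

# Connect to the Blob service
blob_service_client = BlobServiceClient.from_connection_string(conn_str)

# Get the container client
container_client = blob_service_client.get_container_client(container_name)

# Walk the blobs and identify directories
# (this matches only explicit zero-length marker blobs whose names end with "/")
for blob in container_client.list_blobs():
    if blob.name.endswith('/') and blob.size == 0:
        print(f"Found directory: {blob.name}")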
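The check above only finds explicit zero-length marker blobs. As a complementary sketch, the hierarchical listing mentioned earlier can be driven through `ContainerClient.walk_blobs`, which yields `BlobPrefix` items for virtual directories. The helper name `iter_virtual_directories` is hypothetical, and the SDK import is deferred into the function purely so that the snippet loads without the package installed; a top-level import would be the more usual choice.

```python
def iter_virtual_directories(container_client, prefix=""):
    """Recursively yield every virtual directory prefix found under `prefix`.

    `container_client` is assumed to be an azure.storage.blob.ContainerClient.
    """
    # Deferred import (an illustration choice): keeps this module importable
    # even where azure-storage-blob is not installed.
    from azure.storage.blob import BlobPrefix

    # With a delimiter, the service groups names by their next "/" segment.
    for item in container_client.walk_blobs(name_starts_with=prefix, delimiter="/"):
        if isinstance(item, BlobPrefix):
            yield item.name  # a prefix such as "logs/"
            # Descend one level into the prefix that was just reported.
            yield from iter_virtual_directories(container_client, prefix=item.name)
```

Called as `list(iter_virtual_directories(container_client))`, this should report directories that exist only as name prefixes, which the size-and-suffix check cannot see. Each level costs one listing request, so deep trees mean many round trips.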

Using the azure-storage-file-datalake SDK

For HNS accounts, the dedicated DataLake SDK should be used:

The PathProperties object has an is_directory attribute
The get_paths API is designed for working with the directory structure
Full directory operations are supported

Example code:

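from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders: supply your own connection string and file system name
conn_str = "<connection-string>"
file_system_name = "<file-system-name>"

# Connect to the DataLake service
datalake_service_client = DataLakeServiceClient.from_connection_string(conn_str)

# Get the file system client
file_system_client = datalake_service_client.get_file_system_client(file_system_name)

# List the paths and identify directories
paths = file_system_client.get_paths()
for path in paths:
    if path.is_directory:
        print(f"Found directory: {path.name}")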
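As a small extension of the example above, the sketch below separates directories from files under an optional starting path. It assumes the `path` and `recursive` keyword arguments of `FileSystemClient.get_paths`; the helper name `split_paths` is hypothetical. Because it only reads `name` and `is_directory` from each entry, it needs no SDK import of its own.

```python
def split_paths(file_system_client, root=None, recursive=True):
    """Return (directories, files) as two lists of path names.

    `file_system_client` is assumed to be an
    azure.storage.filedatalake.FileSystemClient for an HNS account.
    """
    directories, files = [], []
    # root=None lists from the top of the file system.
    for path in file_system_client.get_paths(path=root, recursive=recursive):
        # is_directory is reported by the service, so no name heuristics are needed.
        if path.is_directory:
            directories.append(path.name)
        else:
            files.append(path.name)
    return directories, files
```

With the `file_system_client` from the example above, a call such as `split_paths(file_system_client, root="logs")` (where `logs` is a made-up directory name) would list only what sits under that directory.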