BanditPAM Project Tutorial

2024-09-18 11:17:05 Author: 彭桢灵Jeremy


1. Project Directory Structure and Overview

The BanditPAM project is organized as follows:

BanditPAM/
├── CMakeLists.txt
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── requirements.txt
├── setup.py
├── banditpam/
│   ├── __init__.py
│   ├── banditpam.py
│   └── ...
├── data/
│   ├── MNIST_1k.csv
│   └── ...
├── docs/
│   ├── conf.py
│   ├── index.rst
│   └── ...
├── scripts/
│   ├── docker/
│   │   ├── env_setup.sh
│   │   └── run_docker.sh
│   └── ...
├── src/
│   ├── banditpam.cpp
│   └── ...
├── tests/
│   ├── test_smaller.py
│   ├── test_larger.py
│   └── ...
└── ...

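Directory overview

CMakeLists.txt: CMake build file used to compile the C++ code.
LICENSE: The project's license file.
MANIFEST.in: Python package manifest that specifies which files are included in the distribution.
README.md: Project introduction and usage instructions.
pyproject.toml: Python project configuration file.
requirements.txt: List of Python dependencies.
setup.py: Python package installation script.
banditpam/: Python package directory containing the main Python code.
data/: Data directory containing sample datasets.
docs/: Documentation directory containing the Sphinx configuration and source files.
scripts/: Scripts directory, including the Docker-related scripts.
src/: C++ source code directory.
tests/: Test directory containing the unit test scripts.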
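
To confirm that a local checkout matches the layout described above, a small check along these lines can help. This is only a sketch: the helper name missing_entries and the EXPECTED_ENTRIES list are hypothetical, the list covers only some of the top-level entries named in this tutorial, and the demo runs against a throwaway directory rather than the real repository.

```python
import tempfile
from pathlib import Path

# Top-level entries named in this tutorial (assumed list, not exhaustive)
EXPECTED_ENTRIES = [
    "CMakeLists.txt",
    "README.md",
    "pyproject.toml",
    "setup.py",
    "banditpam",
    "src",
    "tests",
]


def missing_entries(repo_root, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


# Demo: a temporary directory that mimics a partial checkout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").touch()
    (root / "src").mkdir()
    demo_missing = missing_entries(root)

print(demo_missing)
```

Against a real checkout, the same function would be called with the path to the cloned repository; an empty list means every listed entry was found.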

2. Project Entry File

Entry file

banditpam/banditpam.py: The main Python entry file for BanditPAM. It contains the implementation of the KMedoids class, which performs k-medoids clustering.

Usage example

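from banditpam import KMedoids
import numpy as np

# Generate sample data: 100 random points in 2 dimensions
data = np.random.rand(100, 2)

# Initialize the KMedoids object with 3 medoids
kmed = KMedoids(n_medoids=3, algorithm="BanditPAM")

# Run the clustering with the L2 (Euclidean) distance
kmed.fit(data, 'L2')

# Print the cluster assignment of each point
print(kmed.labels)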
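
To make the output of the example easier to interpret, the sketch below shows what a k-medoids assignment means once the medoids are fixed: each point is assigned to its nearest medoid, and the loss is the mean distance to that medoid. It does not use BanditPAM and is not its algorithm; the function assign_to_medoids and the chosen medoid indices are hypothetical. It is also an assumption that labels here are positions in the medoid list, so check the library documentation for the exact meaning of kmed.labels.

```python
import math


def assign_to_medoids(points, medoid_indices):
    """Assign each point to its nearest medoid under L2 distance.

    Returns (labels, average_loss), where labels[i] is the position in
    medoid_indices of the medoid closest to points[i].
    """
    labels = []
    total_distance = 0.0
    for point in points:
        # Distance from this point to every candidate medoid
        distances = [math.dist(point, points[m]) for m in medoid_indices]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        labels.append(nearest)
        total_distance += distances[nearest]
    return labels, total_distance / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
labels, avg_loss = assign_to_medoids(points, medoid_indices=[0, 2, 4])
print(labels)    # [0, 0, 1, 1, 2]
print(avg_loss)  # 0.4
```

The hard part of k-medoids is choosing the medoids in the first place, which is the step BanditPAM is designed to speed up; the assignment step shown here is the simple part.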

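3. Project Configuration File

Configuration file

pyproject.toml: This file defines the build system and dependencies of the Python project. It typically contains the following: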

[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "banditpam"
version = "1.0.0"
description = "A high-performance implementation of BanditPAM for k-medoids clustering."
authors = [
    { name="Mo Tiwari", email="mo.tiwari@example.com" },
    { name="Martin Jinye Zhang", email="martin.zhang@example.com" },
]
dependencies = [
    "numpy>=1.18.0",
    "scikit-learn>=0.22.0",
]