Compressed Bitset in C++: Technical Documentation

2024-12-23 06:18:55  Author: 宣利权Counsellor

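1. Installation Guide

1.1 Requirements

Supported operating systems: macOS, Windows, Linux
Supported compilers: clang++, g++, the Intel compiler, Microsoft Visual Studio
Processor architectures: x64, 32-bit ARM
Requires a compiler with C++11 support

1.2 Installation Steps

1.2.1 Linux and Linux-like Systems

Run the following commands to configure, build, and test the project: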

cmake -B build
cmake --build build
cd build
ctest

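1.2.2 Windows (Visual Studio)

Clone the repository with GitHub Desktop or another tool.
When installing Visual Studio, select the Visual C++ tools for CMake component.
In Visual Studio, choose File > Open > Folder... and open the project folder.
Right-click CMakeLists.txt and choose Build.
To run the tests, choose Select Startup Item... in the Standard toolbar, pick a test, and run it.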

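2. Usage

2.1 Overview

EWAHBoolArray is a compressed bitset data structure whose word size (16, 32, or 64 bits) is selected through a template parameter. A 64-bit word size usually gives better performance at the cost of higher memory usage; a 32-bit word size may compress somewhat better but runs a little slower.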

2.2 Basic Usage

The following short example shows how to use EWAHBoolArray:

#include <cstdint>
#include <iostream>

#include "ewah.h"
using namespace ewah;

typedef EWAHBoolArray<uint32_t> bitmap;

int main() {
  // The first argument of bitmapOf is the number of positions that follow.
  bitmap bitset1 = bitmap::bitmapOf(9, 1, 2, 1000, 1001, 1002, 1003, 1007, 1009, 100000);
  std::cout << "first bitset : " << bitset1 << std::endl;
  bitmap bitset2 = bitmap::bitmapOf(5, 1, 3, 1000, 1007, 100000);
  std::cout << "second bitset : " << bitset2 << std::endl;
  bitmap bitset3 = bitmap::bitmapOf(3, 10, 11, 12);
  std::cout << "third  bitset : " << bitset3 << std::endl;
  bitmap orbitset = bitset1 | bitset2;      // union
  bitmap andbitset = bitset1 & bitset2;     // intersection
  bitmap xorbitset = bitset1 ^ bitset2;     // symmetric difference
  bitmap andnotbitset = bitset1 - bitset2;  // difference (AND NOT)
  return 0;
}

2.3 Example Code

See examples/example.cpp for more examples.
For an example that works with tabular data, see example2.cpp.

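3. API Reference

3.1 Basic Operations

bitmap::bitmapOf(size, ...): creates a bitmap with the listed bit positions set; size is the number of positions that follow. Here bitmap is the typedef from section 2.2.
|: bitwise OR; computes the union of two bitmaps.
&: bitwise AND; computes the intersection of two bitmaps.
^: bitwise XOR; computes the symmetric difference of two bitmaps.
-: bitwise AND NOT; computes the difference, that is, the bits set in the left operand but not in the right.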

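3.2 Other Operations

EWAHBoolArray::size(): returns the size of the bitmap.
EWAHBoolArray::get(index): returns the value of the bit at the given position.
EWAHBoolArray::set(index, value): sets the bit at the given position.

These names may not match every release. In the upstream ewah.h the size is reported by sizeInBits() and sizeInBytes(), and set(i) takes only the index and sets that bit to true, so check the header you build against.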

4. Installation Methods

4.1 Building with CMake

On Linux and Linux-like systems, build with CMake:
```
cmake -B build
cmake --build build
```
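On Windows, build with Visual Studio as described in section 1.2.2.

4.2 Dependencies

The project has no external dependencies and runs on multiple operating systems.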

5. Further Reading

For more detail on how compressed bitmaps are implemented and optimized, see the following papers:
- Daniel Lemire et al., arXiv:1709.07821
- Owen Kaser and Daniel Lemire, arXiv:0901.3751
- Owen Kaser and Daniel Lemire, arXiv:1402.4466