Skip to content

[Store](refactor) Extract 3FS logic into DistributedStorageBackend#2234

Open
LujhCoconut wants to merge 8 commits into
kvcache-ai:mainfrom
LujhCoconut:refact/dfs-backend
Open

[Store](refactor) Extract 3FS logic into DistributedStorageBackend#2234
LujhCoconut wants to merge 8 commits into
kvcache-ai:mainfrom
LujhCoconut:refact/dfs-backend

Conversation

@LujhCoconut
Copy link
Copy Markdown
Collaborator

@LujhCoconut LujhCoconut commented May 26, 2026

Description

This PR refactors the 3FS integration in mooncake-store from a parasitic "file implementation switch" inside StorageBackend into a first-class standalone backend.

Problem: StorageBackend was a concrete class that handled both local SSD (file-per-key) and 3FS distributed storage via #ifdef USE_3FS and is_3fs_dir_ branching. This caused:

  • 3FS was not a peer to BucketStorageBackend / OffsetAllocatorStorageBackend
  • Eviction logic was polluted with 3FS special cases (IsEvictionEnabled() checked is_3fs_dir_)
  • Extending to other DFS (CephFS, JuiceFS) would require more #ifdef branches
  • No unit test coverage for 3FS paths because StorageBackend could not be mocked

Architecture Overview

Before:
  StorageBackend (concrete)
    ├── local: PosixFile / UringFile
    └── #ifdef USE_3FS: ThreeFSFile (via is_3fs_dir_ + 3fs-virt probe)

After:
  StorageBackend (pure local)
    └── PosixFile / UringFile

  DistributedStorageBackend (new)
    └── FileSystemAdapter (abstract)
            └── Hf3fsAdapter (3FS via USRBIO)
            └── CephFsAdapter (future)
            └── JuiceFsAdapter (future)

Solution:

  1. Add DistributedStorageBackend as a new StorageBackendInterface implementation, peer to existing backends
  2. Introduce FileSystemAdapter abstract interface inside DistributedStorageBackend to encapsulate DFS differences
  3. Implement Hf3fsAdapter as the first FileSystemAdapter (reuses existing USRBIO logic)
  4. Strip all 3FS logic from StorageBackend (removes is_3fs_dir_, USRBIOResourceManager, ThreeFSFile creation, 3fs-virt path probing)
  5. Add StorageBackendType::kDistributed and wire it through CreateStorageBackend() factory
  6. Add DFS-specific error codes (-1600 ~ -1605)
  7. Replace fragile path heuristic (fs::exists("3fs-virt")) with explicit configuration-driven backend selection

Closes: (link to issue if applicable)

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Other

Breaking change details:

  • The automatic 3fs-virt path detection in StorageBackend::Create() is removed. Users relying on this heuristic must now explicitly set MOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=distributed_storage_backend and MOONCAKE_DISTRIBUTED_FS_TYPE=hf3fs.
  • StorageBackend constructor no longer accepts is_3fs_dir parameter.

How Has This Been Tested?

  • Existing mooncake-store unit tests pass without regression (bucket / file-per-key / offset-allocator backends unchanged)
  • Integration test with real 3FS cluster (pending)
  • New mock-based unit tests for DistributedStorageBackend using mock FileSystemAdapter (to be added in follow-up)

Build verification:

cmake .. -DUSE_3FS=ON
make -j$(nproc)

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation. (docs update in follow-up PR)
  • I have added tests to prove my changes are effective. (mock tests in follow-up PR)

Files Changed

# File Change
1 include/types.h Add DFS error codes (DFS_NETWORK_TIMEOUT ~ DFS_PARTIAL_WRITE)
2 include/storage_backend.h Add kDistributed; remove 3FS members from StorageBackend
3 src/storage_backend.cpp Add factory branch for kDistributed; remove 3FS logic
4 src/file_storage.cpp Parse distributed_storage_backend descriptor
5 src/CMakeLists.txt Add distributed_storage_backend.cpp; conditionally add hf3fs_adapter.cpp
6 include/storage/distributed/fs_adapter.h New: FileSystemAdapter abstract interface
7 include/storage/distributed/distributed_storage_backend.h New: DistributedStorageBackend + DistributedStorageConfig
8 include/storage/distributed/hf3fs_adapter.h New: Hf3fsAdapter declaration
9 src/storage/distributed/distributed_storage_backend.cpp New: Implementation (key→path mapping, BatchOffload/Load/ScanMeta)
10 src/storage/distributed/hf3fs_adapter.cpp New: USRBIO read/write/delete implementation

- Add FileSystemAdapter abstraction for distributed filesystems
- Add DistributedStorageBackend as a first-class StorageBackendInterface
- Add Hf3fsAdapter implementing FileSystemAdapter for 3FS USRBIO
- Remove 3FS logic from StorageBackend (is_3fs_dir_, ThreeFSFile, etc.)
- Add kDistributed to StorageBackendType enum and factory
- Add DFS error codes (-1600 ~ -1605)
- Use MOONCAKE_DISTRIBUTED_* env vars for configuration
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a distributed storage backend (DistributedStorageBackend) along with a filesystem adapter interface (FileSystemAdapter) and an implementation for HF3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from the 3FS-specific implementation. The review identified several issues, including a critical buffer overflow risk in VectorReadFile due to an unchecked error code conversion, direct use of POSIX ::stat and std::filesystem::create_directories which bypasses the filesystem abstraction and can cause crashes, silent error suppression in ScanMeta, unsafe string unescaping in UnescapeFilename, and a configuration validation mismatch where "posix" is accepted but causes a factory crash.

Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
Comment thread mooncake-store/src/storage/distributed/distributed_storage_backend.cpp Outdated
Comment thread mooncake-store/src/storage/distributed/distributed_storage_backend.cpp Outdated
Comment thread mooncake-store/include/storage/distributed/fs_adapter.h Outdated
- Add GetFileSize() virtual method to FileSystemAdapter, refactor
  ListFilesWithInfo() to use it instead of inline ::stat
- Fix ReadFile() signature formatting
- Apply all correctness/safety/compatibility fixes from final review
@LujhCoconut
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new DistributedStorageBackend and a FileSystemAdapter interface to support distributed filesystem storage backends (such as HF3FS), while refactoring the existing StorageBackend to decouple eviction logic from hardcoded 3FS checks. The review feedback highlights several critical issues: uninitialized fields in StorageObjectMetadata during offloading and scanning, non-deterministic behavior from using std::hash for path mapping across restarts/nodes, potential race conditions in the static health check probe path, silent error suppression in ListFiles and FileExists within the HF3FS adapter, and the unfriendly use of LOG(FATAL) in library initialization code.

Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp Outdated
Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
Comment thread mooncake-store/src/storage_backend.cpp
- fs_adapter.h: Distinguish ENOENT from other errno in GetFileSize;
  propagate non-ENOENT errors in ListFilesWithInfo
- distributed_storage_backend: Use XXH64 for deterministic cross-node
  hashing; initialize StorageObjectMetadata fully; add UUID to health
  check probe path; handle health-check probe deletion errors
- hf3fs_adapter: Move null-buffer check before open() to avoid fd leak;
  check unlink() return value on partial write failures; improve errno
  handling in FileExists and ListFiles; add POSIX API assumption note
- storage_backend: Replace LOG(FATAL) with LOG(ERROR) and return
  INVALID_PARAMS in factory for unsupported backends
@LujhCoconut
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new distributed storage backend (DistributedStorageBackend) along with a file system adapter interface (FileSystemAdapter) and a concrete implementation for 3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from 3FS-specific code and integrates the new distributed backend into the factory creation logic. The review feedback highlights several critical robustness and performance improvements, such as validating the initialization status of the backend before performing operations, checking for null pointers in fs_adapter_ and iovec elements, optimizing sequential directory creation during initialization, and improving error handling during metadata scanning and directory reading.

Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
- distributed_storage_backend: Add !initialized_ guard to BatchOffload,
  BatchLoad, IsExist, and ScanMeta
- distributed_storage_backend: Skip missing bucket directories (FILE_NOT_FOUND)
  during ScanMeta instead of failing entirely
- hf3fs_adapter: Add null iov_base validation in VectorWriteFile
Add explicit check that ReadFile returned bytes match slice.size,
not relying solely on the adapter internal validation.
@LujhCoconut
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a distributed filesystem storage backend (DistributedStorageBackend) along with a generic filesystem adapter interface (FileSystemAdapter) and a concrete implementation for 3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from specific storage types, introduces new DFS-specific error codes, and updates the build configuration and factory methods to support the new distributed backend. The review comments suggest several robustness and error-handling improvements, such as validating that fsdir is an absolute path, preventing double initialization, checking the return value of DeleteFile in the health check success path, ensuring batch_object is not empty in BatchOffload, optimizing ScanMeta by accumulating keys across buckets to reduce RPCs, and mapping standard POSIX errnos (like ENOENT, EACCES, ENOSPC) to specific DFS error codes in file I/O operations.

Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp Outdated
Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp Outdated
Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
Comment thread mooncake-store/src/storage/distributed/hf3fs_adapter.cpp
DistributedStorageBackend now uses XXH64 for key hashing (introduced
in prior commit). Go CGO builds require explicit linkage against
libxxhash, which was missing from both CI workflow and local build.sh.
- Validate fsdir is an absolute path in DistributedStorageConfig
- Add idempotency guard to Init() (skip if already initialized)
- Hoist batch_keys/batch_metas out of per-bucket loop in ScanMeta to
  reduce handler callback frequency across all hash buckets
- Distinguish ENOENT from other open() errors in ReadFile and
  VectorReadFile for more precise error reporting
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0.48544% with 205 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...torage/distributed/distributed_storage_backend.cpp 0.00% 188 Missing ⚠️
mooncake-store/src/storage_backend.cpp 6.25% 15 Missing ⚠️
mooncake-store/src/file_storage.cpp 0.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

- Remove Capabilities/GetCapabilities() dead code from FileSystemAdapter
  and Hf3fsAdapter
- Remove unused mount_path_ and initialized_ members from Hf3fsAdapter
- Auto-convert relative fsdir to absolute path in FromEnvironment()
- Use std::filesystem::path for robust path construction in GetObjectPath()
- Filter out directories in ListFiles() to prevent ScanMeta from
  incorrectly treating subdirectories as object keys
@LujhCoconut LujhCoconut marked this pull request as ready for review May 27, 2026 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants