[Store](refactor) Extract 3FS logic into DistributedStorageBackend#2234
[Store](refactor) Extract 3FS logic into DistributedStorageBackend#2234LujhCoconut wants to merge 8 commits into
Conversation
- Add FileSystemAdapter abstraction for distributed filesystems - Add DistributedStorageBackend as a first-class StorageBackendInterface - Add Hf3fsAdapter implementing FileSystemAdapter for 3FS USRBIO - Remove 3FS logic from StorageBackend (is_3fs_dir_, ThreeFSFile, etc.) - Add kDistributed to StorageBackendType enum and factory - Add DFS error codes (-1600 ~ -1605) - Use MOONCAKE_DISTRIBUTED_* env vars for configuration
There was a problem hiding this comment.
Code Review
This pull request introduces a distributed storage backend (DistributedStorageBackend) along with a filesystem adapter interface (FileSystemAdapter) and an implementation for HF3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from the 3FS-specific implementation. The review identified several issues, including a critical buffer overflow risk in VectorReadFile due to an unchecked error code conversion, direct use of POSIX ::stat and std::filesystem::create_directories which bypasses the filesystem abstraction and can cause crashes, silent error suppression in ScanMeta, unsafe string unescaping in UnescapeFilename, and a configuration validation mismatch where "posix" is accepted but causes a factory crash.
- Add GetFileSize() virtual method to FileSystemAdapter, refactor ListFilesWithInfo() to use it instead of inline ::stat - Fix ReadFile() signature formatting - Apply all correctness/safety/compatibility fixes from final review
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new DistributedStorageBackend and a FileSystemAdapter interface to support distributed filesystem storage backends (such as HF3FS), while refactoring the existing StorageBackend to decouple eviction logic from hardcoded 3FS checks. The review feedback highlights several critical issues: uninitialized fields in StorageObjectMetadata during offloading and scanning, non-deterministic behavior from using std::hash for path mapping across restarts/nodes, potential race conditions in the static health check probe path, silent error suppression in ListFiles and FileExists within the HF3FS adapter, and the unfriendly use of LOG(FATAL) in library initialization code.
- fs_adapter.h: Distinguish ENOENT from other errno in GetFileSize; propagate non-ENOENT errors in ListFilesWithInfo - distributed_storage_backend: Use XXH64 for deterministic cross-node hashing; initialize StorageObjectMetadata fully; add UUID to health check probe path; handle health-check probe deletion errors - hf3fs_adapter: Move null-buffer check before open() to avoid fd leak; check unlink() return value on partial write failures; improve errno handling in FileExists and ListFiles; add POSIX API assumption note - storage_backend: Replace LOG(FATAL) with LOG(ERROR) and return INVALID_PARAMS in factory for unsupported backends
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new distributed storage backend (DistributedStorageBackend) along with a file system adapter interface (FileSystemAdapter) and a concrete implementation for 3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from 3FS-specific code and integrates the new distributed backend into the factory creation logic. The review feedback highlights several critical robustness and performance improvements, such as validating the initialization status of the backend before performing operations, checking for null pointers in fs_adapter_ and iovec elements, optimizing sequential directory creation during initialization, and improving error handling during metadata scanning and directory reading.
- distributed_storage_backend: Add !initialized_ guard to BatchOffload, BatchLoad, IsExist, and ScanMeta - distributed_storage_backend: Skip missing bucket directories (FILE_NOT_FOUND) during ScanMeta instead of failing entirely - hf3fs_adapter: Add null iov_base validation in VectorWriteFile
Add explicit check that ReadFile returned bytes match slice.size, not relying solely on the adapter internal validation.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a distributed filesystem storage backend (DistributedStorageBackend) along with a generic filesystem adapter interface (FileSystemAdapter) and a concrete implementation for 3FS (Hf3fsAdapter). It also refactors the existing StorageBackend to decouple eviction logic from specific storage types, introduces new DFS-specific error codes, and updates the build configuration and factory methods to support the new distributed backend. The review comments suggest several robustness and error-handling improvements, such as validating that fsdir is an absolute path, preventing double initialization, checking the return value of DeleteFile in the health check success path, ensuring batch_object is not empty in BatchOffload, optimizing ScanMeta by accumulating keys across buckets to reduce RPCs, and mapping standard POSIX errnos (like ENOENT, EACCES, ENOSPC) to specific DFS error codes in file I/O operations.
DistributedStorageBackend now uses XXH64 for key hashing (introduced in prior commit). Go CGO builds require explicit linkage against libxxhash, which was missing from both CI workflow and local build.sh.
- Validate fsdir is an absolute path in DistributedStorageConfig - Add idempotency guard to Init() (skip if already initialized) - Hoist batch_keys/batch_metas out of per-bucket loop in ScanMeta to reduce handler callback frequency across all hash buckets - Distinguish ENOENT from other open() errors in ReadFile and VectorReadFile for more precise error reporting
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
- Remove Capabilities/GetCapabilities() dead code from FileSystemAdapter and Hf3fsAdapter - Remove unused mount_path_ and initialized_ members from Hf3fsAdapter - Auto-convert relative fsdir to absolute path in FromEnvironment() - Use std::filesystem::path for robust path construction in GetObjectPath() - Filter out directories in ListFiles() to prevent ScanMeta from incorrectly treating subdirectories as object keys
Description
This PR refactors the 3FS integration in
mooncake-storefrom a parasitic "file implementation switch" insideStorageBackendinto a first-class standalone backend.Problem:
StorageBackendwas a concrete class that handled both local SSD (file-per-key) and 3FS distributed storage via#ifdef USE_3FSandis_3fs_dir_branching. This caused:BucketStorageBackend/OffsetAllocatorStorageBackendIsEvictionEnabled()checkedis_3fs_dir_)#ifdefbranchesStorageBackendcould not be mockedArchitecture Overview
Solution:
DistributedStorageBackendas a newStorageBackendInterfaceimplementation, peer to existing backendsFileSystemAdapterabstract interface insideDistributedStorageBackendto encapsulate DFS differencesHf3fsAdapteras the firstFileSystemAdapter(reuses existing USRBIO logic)StorageBackend(removesis_3fs_dir_,USRBIOResourceManager,ThreeFSFilecreation,3fs-virtpath probing)StorageBackendType::kDistributedand wire it throughCreateStorageBackend()factory-1600 ~ -1605)fs::exists("3fs-virt")) with explicit configuration-driven backend selectionCloses: (link to issue if applicable)
Module
mooncake-transfer-engine)mooncake-store)mooncake-ep)mooncake-integration)mooncake-p2p-store)mooncake-wheel)mooncake-pg)mooncake-rl)Type of Change
Breaking change details:
3fs-virtpath detection inStorageBackend::Create()is removed. Users relying on this heuristic must now explicitly setMOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=distributed_storage_backendandMOONCAKE_DISTRIBUTED_FS_TYPE=hf3fs.StorageBackendconstructor no longer acceptsis_3fs_dirparameter.How Has This Been Tested?
mooncake-storeunit tests pass without regression (bucket / file-per-key / offset-allocator backends unchanged)DistributedStorageBackendusing mockFileSystemAdapter(to be added in follow-up)Build verification:
cmake .. -DUSE_3FS=ON make -j$(nproc)Checklist
./scripts/code_format.shbefore submitting.Files Changed
include/types.hDFS_NETWORK_TIMEOUT~DFS_PARTIAL_WRITE)include/storage_backend.hkDistributed; remove 3FS members fromStorageBackendsrc/storage_backend.cppkDistributed; remove 3FS logicsrc/file_storage.cppdistributed_storage_backenddescriptorsrc/CMakeLists.txtdistributed_storage_backend.cpp; conditionally addhf3fs_adapter.cppinclude/storage/distributed/fs_adapter.hFileSystemAdapterabstract interfaceinclude/storage/distributed/distributed_storage_backend.hDistributedStorageBackend+DistributedStorageConfiginclude/storage/distributed/hf3fs_adapter.hHf3fsAdapterdeclarationsrc/storage/distributed/distributed_storage_backend.cppsrc/storage/distributed/hf3fs_adapter.cpp