feat(adastra): port ChargE3Net fine-tuning to AMD MI250X on CINES Adastra by speckhard · Pull Request #1 · speckhard/LeMat-Rho

speckhard · 2026-05-19T16:01:25Z

Summary

Stacked on PR LeMaterial#8. Adds an Adastra-side variant of the ChargE3Net fine-tuning pipeline (NVIDIA A100 on Jean Zay → AMD MI250X on Adastra/CINES) without touching charge3net_ft/. Same training code, same dataset layout; only the submit script + setup runbook differ.

What's in this PR

| File | What |
| --- | --- |
| `submit_charge3net_adastra.sh` | MI250 SLURM headers, ROCm `HIP_VISIBLE_DEVICES` alignment, `batch_size=8` (vs A100's 4, since MI250X has 64 GB HBM2e per GCD), `val_probes=1000`, online W&B (Adastra proxy gives live internet), auto-resume from `latest.pt`. Submit dir defaults to `cad16353` scratch, account billed to `c1816212`. |
| `ADASTRA.md` | Step-by-step setup (proxy, venv, dataset transfer) + a gotchas table covering the seven port blockers. |
| `tests/test_data.py` | New `test_ignores_extra_columns` regression test for the Bader-analysis columns that `Entalpic/lemat-rho-v1` added (`bader_charges`, `bader_volumes`, `material_id`). |

Port blockers solved

| # | Symptom | Cause | Fix |
| --- | --- | --- | --- |
| 1 | `pip install` returns HTTP 000 | Adastra doesn't auto-set `HTTP_PROXY` | Export `HTTP_PROXY=http://proxy-l-adastra.cines.fr:3128` (+ HTTPS, lowercase); now in `~/.bashrc` on Adastra |
| 2 | April setup vanished | CINES 30-day scratch purge | Setup tree under `$LEMATRHO_ADASTRA_SETUP` is now rebuildable from sources |
| 3 | `pip install boto3` times out | Adastra's pip prefers `gorgone.cines.fr` (missing boto3) | `pip install --index-url https://pypi.org/simple ...` for non-torch deps |
| 4 | `snapshot_download` reports 100% but cache is empty | HF Xet backend silently no-ops on Adastra | Raw `curl` with `Authorization: Bearer` per file (3.5 GB in 16 s with `xargs -P 8`) |
| 5 | `sbatch: You are not allowed to ask for a qos` | `--qos=debug` not granted on team accounts | Omit `--qos`; default works with 6 h MaxWall |
| 6 | Exit code `0:53` (signal 53 = prolog failure), no log files | `c1816212` group inode quota at hard cap (Ali owns ~85% of 1.1M files) | Cross-account setup: submit dir on cad16353 scratch (390 k headroom), `--account=c1816212` (active window). Account and scratch dir are independent in SLURM. |
| 7 | sbatch `.out` lands in `$HOME` | sbatch over SSH without `cd` defaults `WorkDir=$HOME` | `cd $WORK_DIR && sbatch ...` in the submit script |

Reference smoke run

Job 4969516 on g1342, 2026-05-19. Loaded 65,239 / 68,549 valid materials from 69 parquet chunks. 1,150 training steps in 12 min wall, train L1 down from 29.95 (step 50) → 5.67 (step 1,000). Hit TIMEOUT before completing the epoch (expected: one epoch ≈ 150 min at the debug-run knobs); no val/test metrics yet. A 6 h job under the production knobs in this script is the next step.

Test plan

`pytest tests/test_data.py -v` — 11/11 pass including the new `test_ignores_extra_columns`
`ruff format` + `ruff check` — clean
Manual smoke run on Adastra (job 4969516) — pipeline trains end-to-end on AMD MI250X
Real-data 6 h run with production knobs (`batch=8`, `val_probes=1000`, online W&B) — follow-up, not in this PR

…stra

Adds an Adastra-side variant of submit_charge3net.sh and a runbook covering the seven blockers encountered during the port:

- HTTP proxy must be set explicitly (Adastra doesn't auto-export it),
- 30-day scratch purge wipes setup, so $LEMATRHO_ADASTRA_SETUP is rebuildable from sources,
- pip on Adastra defaults to gorgone.cines.fr (missing boto3 etc); --index-url https://pypi.org/simple is required,
- huggingface_hub Xet backend silently no-ops the payload fetch, so raw curl with Authorization: Bearer is used for the dataset,
- --qos=debug is not granted on the team accounts,
- group inode quota on /lus/scratch/CT10/c1816212/ is at the hard cap, so the submit dir lives on cad16353 scratch while the job is billed to c1816212 (account and scratch dir are independent dimensions),
- sbatch over SSH defaults WorkDir to $HOME unless cd'd first.

submit_charge3net_adastra.sh mirrors the Jean Zay script (auto-resume from latest.pt, 50-epoch budget) but with MI250 SLURM headers, ROCm HIP_VISIBLE_DEVICES alignment, batch_size=8 (HBM2e has 64 GB per GCD vs A100's 40-80), val_probes=1000, and online W&B (the Adastra proxy gives us live internet, so the Jean Zay offline-then-sync dance is unnecessary).

Adds a regression test test_ignores_extra_columns for the dataset loader: Entalpic/lemat-rho-v1 added Bader analysis columns (bader_charges, bader_volumes, material_id) which would have broken _build_parquet_index if it didn't honor the four-column _COLUMNS allowlist. The test confirms the allowlist still holds.

Reference smoke run: job 4969516 on g1342, May 19 2026. 65,239 of 68,549 valid materials loaded from 69 parquet chunks. 1,150 training steps in 12 min wall, train L1 down from 29.95 at step 50 to 5.67 at step 1,000. Hit TIMEOUT before completing the epoch (expected: one epoch needs ~150 min at batch=4), no val/test metrics yet; a follow-up 6h job under the production knobs will produce those.

…uards

Adds tests/test_equivariance.py with 7 structural tests that pin down the architectural properties needed for ChargE3Net's rotational equivariance guarantee:

- Production model has 1.9M params (catches drift that would break loading charge3net_mp.pt).
- atom_irreps_sequence reaches lmax >= 4 (the "higher-order" in the paper title; a silent drop to lmax=0 would degenerate the model to a much weaker scalar-only baseline).
- Atom representation includes both even and odd parity components.
- get_irreps(500, lmax=4) returns 10 entries with no zero-multiplicity irreps (catches a regression that would silently delete some irreps).
- atom_irreps_sequence length matches num_interactions.
- Atom-model cutoff matches the 4.0 A baked into KdTreeGraphConstructor in LeMatRhoDataset.
- Final irreps are an e3nn o3.Irreps instance (replacing this with a plain list would silently break equivariance while still producing output).

A runtime equivariance check (rotate inputs, predict, compare) is the gold standard but requires a real forward pass at production hyperparameters that is too slow for a CPU unit test. The structural tests cover the same property at the architecture level.

Tests autoskip when the sibling AIforGreatGood/charge3net repo is absent.
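The "rotate inputs, predict, compare" check mentioned above can be illustrated on a toy model. This is a minimal sketch, assuming a hypothetical `toy_density` function as a stand-in for the real network (it is not ChargE3Net). Because charge density is a scalar field, the property under test is that rotating atoms and probe points together leaves the predicted values unchanged.

```python
import numpy as np


def toy_density(atoms: np.ndarray, probes: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in model: sum of Gaussians centred on each atom."""
    # Pairwise probe-atom distances, shape (n_probes, n_atoms).
    dist = np.linalg.norm(probes[:, None, :] - atoms[None, :, :], axis=-1)
    return np.exp(-(dist**2)).sum(axis=1)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # remove the sign ambiguity of QR
    if np.linalg.det(q) < 0:  # flip one axis so the result is a rotation
        q[:, 0] = -q[:, 0]
    return q


def check_equivariance(model, atoms, probes, rot, atol=1e-10) -> bool:
    """Rotate atoms and probes together, predict, compare with the original."""
    before = model(atoms, probes)
    after = model(atoms @ rot.T, probes @ rot.T)
    return bool(np.allclose(before, after, atol=atol))


rng = np.random.default_rng(0)
atoms = rng.normal(size=(4, 3))
probes = rng.normal(size=(16, 3))
rot = random_rotation(rng)
equivariant_ok = check_equivariance(toy_density, atoms, probes, rot)
```

A model that reads raw Cartesian components (for example one returning `probes[:, 0]`) fails the same check, which is the kind of regression the structural tests are meant to catch cheaply.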

… training

Two changes motivated by job 4969727 (FAILED after 1h47m on the previous single-GPU submit):

1. Multi-GPU via torch DistributedDataParallel. The paper uses per-GPU batch=16 across 4 GPUs (effective batch=64). Our previous Adastra submit was single-GPU batch=8, an 8x smaller effective batch. With the half-node submit (4 GCDs, 64 CPUs, 128 GB RAM, batch=16 per GCD) the effective batch now matches the paper. Implementation:

   - New _setup_ddp / _is_ddp / _is_main helpers in train.py read WORLD_SIZE / RANK / LOCAL_RANK / MASTER_ADDR / MASTER_PORT from the env (set in the submit script via srun + scontrol show hostname).
   - Backend is nccl which routes through RCCL on AMD ROCm builds.
   - Model wrapped in DistributedDataParallel after .to(device).
   - DistributedSampler injected into the train loader via a new distributed=True flag on build_dataloaders. Val/test stay non-distributed; cheap enough at 5% of 65k.
   - DistributedSampler.set_epoch called each epoch for proper shuffling.
   - All prints and wandb logs gated on is_main (rank 0 only).
   - Save and load go through a new _unwrap helper so checkpoints are interchangeable between single-GPU and DDP runs.
   - dist.barrier at end of each epoch to keep ranks in lockstep during checkpoint saves.
   - dist.destroy_process_group at the very end.

2. Wandb soft-fail. wandb.init now sits inside try/except: if the compute node can't reach api.wandb.ai through the proxy (which is what killed job 4969727 after 5min of timeouts and 1h47m elapsed total), the script logs a warning and sets use_wandb=False so training proceeds with stdout + checkpoints only.

Submit script (submit_charge3net_adastra.sh) updated for half-node:

    --nodes=1 --ntasks-per-node=4 --gpus-per-node=4 --cpus-per-task=16 --mem=125000M --time=06:00:00

plus srun-based DDP launcher that exports RANK/LOCAL_RANK per task, batch_size=16 per GPU, val_probes=1000, wandb-mode=offline.

Test plan

- pytest tests/ ... 34 passed, 1 failure pre-existing (test_metrics collection error from src.charge3net path shadowing in pytest; unrelated, same on main).
- ruff format + check clean on the touched files.
- DDP path not yet exercised end-to-end on Adastra; the immediate next step is a 6h submission. If the DDP init fails, the single-GPU code path is still reachable by running without srun.
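The environment-driven helpers and the soft-fail pattern described in this commit might look roughly like the sketch below. It deliberately leaves out torch and wandb so it runs anywhere; the function names `read_ddp_env`, `unwrap` and `init_wandb_soft` are illustrative assumptions, not the actual contents of train.py. The only framework fact relied on is that DistributedDataParallel exposes the wrapped model as `.module`.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class DdpEnv:
    world_size: int
    rank: int
    local_rank: int


def read_ddp_env(env: Mapping[str, str] = os.environ) -> DdpEnv:
    """Read launcher-provided variables; defaults describe a single process."""
    return DdpEnv(
        world_size=int(env.get("WORLD_SIZE", "1")),
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
    )


def is_ddp(cfg: DdpEnv) -> bool:
    return cfg.world_size > 1


def is_main(cfg: DdpEnv) -> bool:
    # Only rank 0 prints, logs, and writes checkpoints.
    return cfg.rank == 0


def unwrap(model):
    """Return the inner module when wrapped, so checkpoints stay interchangeable."""
    return getattr(model, "module", model)


def init_wandb_soft(init_fn: Callable, **kwargs):
    """Call init_fn; on failure, warn and return None so training continues."""
    try:
        return init_fn(**kwargs)
    except Exception as exc:  # network/proxy errors should not kill the job
        print(f"warning: wandb init failed ({exc}); continuing without wandb")
        return None
```

Running the same script without srun leaves `WORLD_SIZE` unset, so `is_ddp` is false and the single-GPU path is taken, which matches the fallback noted in the test plan.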

…om-scratch (TDD)

The submit script now reads LEMATRHO_TRAINING_MODE to switch between two runs that share all infrastructure (same DDP, same hyperparams, same dataset, same node layout) but differ in init:

    pretrained (default)
        --ckpt-path charge3net_mp.pt
        save-dir charge3net_checkpoints/
        WANDB_NAME=pretrained_mp
    from_scratch
        no --ckpt-path (random init)
        save-dir charge3net_checkpoints_fromscratch/
        WANDB_NAME=from_scratch

Auto-resume from latest.pt is per-mode (the two save-dirs don't collide), so each arm can be relaunched independently via sbatch ... submit_charge3net_adastra.sh until val NMAPE plateaus.

Also adds a LEMATRHO_DRY_RUN=1 escape hatch that prints the resolved train command and exits 0 without sourcing the venv or invoking srun. Used by the 9 new pytest tests in tests/test_submit_script.py:

- dry-run prints train command
- default mode is pretrained, uses MP checkpoint
- pretrained writes to charge3net_checkpoints (not fromscratch dir)
- from_scratch drops --ckpt-path completely and never references charge3net_mp.pt
- from_scratch uses a separate save dir
- WANDB_NAME differs between modes
- invalid mode exits non-zero with a clear error
- batch-size 16, val-probes 1000 (paper-matching)
- wandb-mode is offline

TDD: 9 tests RED before the refactor, all GREEN after. Full suite still 33 passed (data + model + equivariance + submit). ruff format + check clean.

Submission examples in the script header and in ADASTRA.md.
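The mode switch lives in a shell script, but its decision logic can be mirrored in a few lines of Python for illustration. This is a sketch under assumptions: the checkpoint name, save directories, and WANDB_NAME values come from the commit message, while the flag spellings other than `--ckpt-path` and the `python -m charge3net_ft.train` invocation are guesses, and `resolve_train_command` is a hypothetical name.

```python
from typing import Mapping

# Per-mode settings as listed in the commit message.
_MODES = {
    "pretrained": {
        "ckpt_path": "charge3net_mp.pt",
        "save_dir": "charge3net_checkpoints/",
        "wandb_name": "pretrained_mp",
    },
    "from_scratch": {
        "ckpt_path": None,  # random init: no checkpoint flag at all
        "save_dir": "charge3net_checkpoints_fromscratch/",
        "wandb_name": "from_scratch",
    },
}


def resolve_train_command(env: Mapping[str, str]) -> tuple[list[str], str]:
    """Return (argv, wandb_name) for the requested training mode."""
    mode = env.get("LEMATRHO_TRAINING_MODE", "pretrained")
    if mode not in _MODES:
        raise ValueError(
            f"LEMATRHO_TRAINING_MODE={mode!r} is invalid; expected one of {sorted(_MODES)}"
        )
    cfg = _MODES[mode]
    # Flag names below are assumed for illustration.
    cmd = [
        "python", "-m", "charge3net_ft.train",
        "--save-dir", cfg["save_dir"],
        "--batch-size", "16",
        "--val-probes", "1000",
        "--wandb-mode", "offline",
    ]
    if cfg["ckpt_path"] is not None:
        cmd += ["--ckpt-path", cfg["ckpt_path"]]
    return cmd, cfg["wandb_name"]


default_cmd, default_name = resolve_train_command({})
scratch_cmd, scratch_name = resolve_train_command({"LEMATRHO_TRAINING_MODE": "from_scratch"})
```

Keeping the two save directories distinct is what makes auto-resume safe per arm: each mode only ever sees its own `latest.pt`.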

PR 1 of a 2-PR stack to land DeepDFT as a baseline for the ChargE3Net VASP-speedup experiment. This PR adds only the data adapter; PR 2 will add the training submission (DDP-patched).

What's here:

    deepdft_ft/__init__.py         empty package marker
    deepdft_ft/data.py             LeMatRhoDeepDFTDataset adapter
    tests/test_deepdft_data.py     11 TDD tests pinning the contract

The adapter reuses charge3net_ft.data's _row_to_atoms_and_density and _build_parquet_index, then re-shapes the per-sample output into the dict that DeepDFT's CollateFuncRandomSample expects:

    {
      "density": np.ndarray (Nx, Ny, Nz),
      "atoms": ase.Atoms,
      "origin": np.ndarray (3,),
      "grid_position": np.ndarray (Nx, Ny, Nz, 3),
      "metadata": {"filename": str},
    }

_calculate_grid_pos is inlined from upstream DeepDFT/dataset.py so this adapter has no runtime dependency on the DeepDFT sibling repo (which keeps the test suite hermetic).

Tests pinned (RED then GREEN):

- dataset length matches the count of valid parquet rows
- sample dict has all 5 required keys
- density is a 3D numpy array
- atoms is ase.Atoms with PBC True/True/True
- origin is zeros (matches LeMat-Rho convention)
- grid_position has shape (Nx, Ny, Nz, 3)
- grid_position[0,0,0] = (0,0,0)
- grid_position[1,0,0] = (a_lattice / Nx, 0, 0)
- metadata.filename present and unique per sample
- extra columns (bader_charges, material_id) ignored
- empty parquet dir raises FileNotFoundError

Caching is keyed by absolute parquet path (not file index) so multiple LeMatRhoDeepDFTDataset instances pointing at different directories don't collide on fi=0 (which bit me writing the metadata test).

Full LeMat-Rho test suite: 44 passed. Ruff format + check clean.

Next: PR 2 will add deepdft_ft/runner.py (vendored from upstream DeepDFT + DDP patches) and submit_deepdft_adastra.sh (4-GCD half-node DDP, PaiNN model variant for equivariance parity with ChargE3Net).
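The grid-position contract pinned by the tests above (origin at zero, one voxel step along x equal to a_lattice / Nx) can be reproduced with a short numpy sketch. This is an illustration consistent with those tests, not a copy of upstream DeepDFT's `_calculate_grid_pos`; the function name and the convention that cell rows are lattice vectors are assumptions.

```python
import numpy as np


def calculate_grid_pos(grid_shape, cell, origin=(0.0, 0.0, 0.0)) -> np.ndarray:
    """Cartesian position of every voxel, shape (Nx, Ny, Nz, 3)."""
    cell = np.asarray(cell, dtype=float)  # rows are lattice vectors (assumed)
    # Fractional coordinates i / N per axis; the periodic endpoint 1.0 is excluded.
    fracs = [np.arange(n) / n for n in grid_shape]
    fx, fy, fz = np.meshgrid(*fracs, indexing="ij")
    frac = np.stack([fx, fy, fz], axis=-1)  # (Nx, Ny, Nz, 3)
    # Fractional -> Cartesian, then shift by the grid origin.
    return frac @ cell + np.asarray(origin, dtype=float)


cell = np.diag([4.0, 5.0, 6.0])  # orthorhombic toy cell
pos = calculate_grid_pos((10, 10, 10), cell)
```

For a non-orthogonal cell the same code applies unchanged, since the matrix product mixes all three lattice vectors.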

PR 2 of the DeepDFT-on-LeMat-Rho stack (PR 1 was the data adapter). Closes the gap from "we have a DeepDFT-compatible Dataset" to "we can sbatch a 4-GCD DDP DeepDFT training run on Adastra".

What's here:

    deepdft_ft/runner.py           vendored from peterbjorgensen/DeepDFT@main + DDP patches + LeMat-Rho parquet auto-detect + asap3 stub (no C++ headers on Adastra)
    submit_deepdft_adastra.sh      half-node 4-GCD DDP submission, PaiNN default, LEMATRHO_DEEPDFT_VARIANT={painn,schnet} env var, LEMATRHO_DRY_RUN=1 supported

DDP patches mirror what we did in charge3net_ft/train.py:

- _setup_ddp + _is_main + _unwrap helpers
- DistributedSampler when WORLD_SIZE>1, RandomSampler otherwise
- DistributedDataParallel wrap of the PaiNN/SchNet model
- All logging.info and checkpoint saves gated on rank 0
- Device pinned to cuda:LOCAL_RANK via torch.cuda.set_device

LeMat-Rho parquet auto-detect: if --dataset points at a directory containing chunk_*.parquet, the runner uses LeMatRhoDeepDFTDataset (PR 1). Other dataset paths (.tar, .txt, dir of cube/CHGCAR) still work unchanged; upstream's dataset.DensityData path is preserved.

asap3 stub: upstream DeepDFT imports asap3 at module load. asap3 needs Python.h to build from source which isn't on Adastra (and would need admin). The stub at the top of runner.py registers a fake asap3 module with a FullNeighborList class that delegates to ASE's NewPrimitiveNeighborList. Slower than real asap3 but functionally identical for DeepDFT's call sites. Skipped when real asap3 is installed.

Submit script defaults:

- PaiNN model (matches equivariance of ChargE3Net for the comparison)
- batch=2 (DeepDFT's upstream default; they iterate on probes, not materials, so per-batch counts work differently from ChargE3Net)
- cutoff=4.0, num_interactions=3, node_size=128
- max_steps=1e8 (effectively unbounded; SLURM walltime is the limiter)
- WANDB_NAME=deepdft_painn (or deepdft_schnet)

Verified on Adastra: runner module imports cleanly under the venv311, asap3 stub kicks in without error, parquet directory detection works. The actual training run will be submitted next.
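The stub mechanism described above (register a fake module only when the real one cannot be imported) can be sketched with the standard library alone. The helper name `install_stub` is hypothetical, and the `FullNeighborList` body here is a bare placeholder: the delegation to ASE that the commit describes is omitted, and the constructor arguments shown are an assumption.

```python
import importlib
import sys
import types


def install_stub(name: str, attrs: dict) -> bool:
    """Register a stand-in module under `name` unless an import already succeeds."""
    try:
        importlib.import_module(name)
        return False  # real (or previously stubbed) module present: leave it alone
    except ImportError:
        stub = types.ModuleType(name)
        stub.__dict__.update(attrs)
        sys.modules[name] = stub  # later `import name` statements resolve to the stub
        return True


class FullNeighborList:
    """Placeholder class; the real stub would delegate neighbor search to ASE."""

    def __init__(self, rCut, atoms=None):  # argument names assumed for illustration
        self.rCut = rCut
        self.atoms = atoms


installed = install_stub("asap3", {"FullNeighborList": FullNeighborList})
import asap3  # noqa: E402  (resolves to the stub when the real package is absent)
```

Because the check happens before registration, an environment that does have the compiled package keeps using it, matching the "skipped when real asap3 is installed" behaviour.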

Root-causes job 4971720's OOM-kill at startup and aligns the DeepDFT training to the upstream paper's submission settings. Two changes:

1. submit_deepdft_adastra.sh: switch from half-node DDP (4 GCDs) to paper-faithful single-GPU (1 GCD on mi250-shared, HIP_VISIBLE_DEVICES=0, WORLD_SIZE unset). Upstream DeepDFT was trained on 1x RTX 3090 per pretrained_models/*/submit_script.sh. Single-GPU keeps gradient-step semantics identical to the paper's batch=2; no LR sweep needed.

   Effective hyperparameters are now exactly the upstream PaiNN settings from pretrained_models/{nmc,qm9,ethylenecarbonate}_painn/commandline_args.txt:

       --cutoff 4 --num_interactions 3 --node_size 128
       --max_steps 10000000 --use_painn_model
       batch_size=2 materials (hardcoded in runner.py)
       train_probes=1000 per material (hardcoded)
       val_probes=5000 per material (hardcoded)

   DDP code paths in runner.py stay in place but only fire when WORLD_SIZE>1, so a future DDP variant of DeepDFT is one env flip away.

2. deepdft_ft/runner.py: replace upstream's eager validation preload `val_loader = [b for b in val_loader]` with a comment explaining why we left it as a streaming DataLoader. Upstream's val sets are ~100 materials (NMC, QM9 ethylenecarbonate subsets) so the preload is cheap. Our val set is 3,261 materials at 5000 probes each, x4 ranks under DDP, which materialised ~150 GB and OOM-killed job 4971720 at startup before a single training step. Streaming the val loader is a data-loading detail, not a hyperparameter; the model math is unchanged.

Test plan:

- 44/44 local tests still pass (no behavioural changes to the data adapter or submit-script env contract; only the runner internals and the SLURM headers move).
- New job to be submitted as the next step; will confirm DeepDFT trains and produces step-level loss in the .out log.

Observation from jobs 4971293 and 4971343: SLURM bumped both to EXCLUSIVE mode despite us requesting half-node resources. The --mem=125000M line was exactly half the 256 GB node's memory, which crosses SLURM's auto-exclusive threshold.

Dropping --mem entirely lets SLURM allocate memory proportional to our CPU share (64 of 128 logical CPUs -> ~128 GB out of 256 GB). The other half of the node stays schedulable for other users / jobs.

The currently running jobs 4971293 and 4971343 keep their exclusive allocations; only future submissions are affected.

Test plan

- 9/9 tests in tests/test_submit_script.py still pass (no memory assertion).
- Will confirm on next sbatch by inspecting AllocTRES.

Root-causes the OOM that killed jobs 4971293 and 4971343 at MaxRSS=35 GB per rank (140 GB cumulative across 4 DDP ranks, exceeding our 125 GB --mem budget). Two changes, both small, plus regression tests:

1. charge3net_ft/data.py: bound _TABLE_CACHE with an LRU eviction policy capped at _TABLE_CACHE_MAX_CHUNKS=5. OrderedDict gives O(1) move-to-end on hit and popitem(last=False) on miss-with-eviction. The previous dict was unbounded, so each DataLoader worker accumulated every chunk it had ever seen. With ~2 GB per pyarrow-decompressed chunk (compressed_charge_density JSON strings inflate 6x) and 32 worker processes (8 per rank x 4 ranks), the cache alone grew to ~140 GB over 6 h.

2. submit_charge3net_adastra.sh: drop --num-workers from 8 to 2. Defense in depth on top of the LRU. At LeMat-Rho's 10x10x10 grid size the DataLoader's data-loading throughput isn't the bottleneck; 2 workers per rank x 4 ranks = 8 total workers is plenty, and per-rank cache pressure now drops by 4x.

3. tests/test_data.py: TestTableCacheLRU adds three regression tests (cache size bounded, LRU eviction order is correct, default cap is within a sensible range). TDD: RED before changes 1+2, GREEN after.

Combined effect: cache pressure on a half-node DDP run drops from ~140 GB to roughly 4 ranks x 2 workers x 5 chunks x 2 GB = 80 GB worst case, and in practice much less because workers tend to revisit chunks. Comfortably under the ~128 GB shared-mode default mem.

Full suite: 47 passed (test_metrics.py pre-existing src-shadow failure unrelated, same on main).

…ack)

PR alpha of 4 for the SALTED-arm basis-expansion benchmark. This PR lands only the BasisSpec dataclass and its tests. PRs beta/gamma/delta land the projection layer, the rholearn model wrapper, and the VASP CHGCAR I/O respectively.

What's here

    salted_ft/__init__.py          exports BasisSpec, documents the stack
    salted_ft/basis.py             frozen dataclass with the locked-in hyperparameters from Phase A4 of the investigation memo
    tests/test_salted_basis.py     19 TDD tests across 5 categories

Design decisions captured by the tests

- BasisSpec is frozen, hashable, equality-by-value so it can key caches and identify metric runs without ambiguity. Mutation raises FrozenInstanceError.
- Validation happens in __post_init__ so a malformed spec raises at construction time, not deep in a tensor op three PRs from now. Negative max_l, zero n_radial, nonpositive sigma, nonpositive cutoff all rejected with clear messages.
- Default values match the Phase A4 lockdown verbatim: max_l=4, n_radial=4, sigma=(0.5,1.0,2.0,4.0), cutoff=4.0. n_coeffs_per_atom == 100 from the formula n_radial * (max_l+1)**2. These numbers picked to match ChargE3Net's cutoff + lmax for a clean side-by-side comparison.
- Shape helpers used by downstream PRs for tensor allocation:

      n_angular_components          -> (max_l + 1)**2
      n_coeffs_per_atom             -> n_radial * n_angular_components
      total_coeffs_shape(n_atoms)   -> (n_atoms, n_coeffs_per_atom)

Why locking these numbers matters

Every downstream PR (projection, model, I/O) depends on the coefficient shape. Changing max_l or n_radial later requires retraining and re-running validation. Pin once, build around it.

Test plan

19/19 tests pass. Ruff format + check clean. No interaction with Adastra; pure-Python dataclass.

Next: PR beta = salted_ft/projection.py with project_chgcar_to_basis and reconstruct_grid_from_basis + their tests.
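A frozen dataclass with the defaults and shape helpers listed above could look like the following. This is a sketch reconstructed from the commit message, not the actual salted_ft/basis.py: whether the helpers are properties or methods, and the exact error messages, are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class BasisSpec:
    max_l: int = 4
    n_radial: int = 4
    sigma: tuple[float, ...] = (0.5, 1.0, 2.0, 4.0)
    cutoff: float = 4.0

    def __post_init__(self) -> None:
        # Fail at construction time rather than deep inside a tensor op.
        if self.max_l < 0:
            raise ValueError(f"max_l must be >= 0, got {self.max_l}")
        if self.n_radial <= 0:
            raise ValueError(f"n_radial must be > 0, got {self.n_radial}")
        if any(s <= 0 for s in self.sigma):
            raise ValueError(f"all sigma values must be > 0, got {self.sigma}")
        if self.cutoff <= 0:
            raise ValueError(f"cutoff must be > 0, got {self.cutoff}")

    @property
    def n_angular_components(self) -> int:
        # Sum of (2l + 1) for l = 0..max_l.
        return (self.max_l + 1) ** 2

    @property
    def n_coeffs_per_atom(self) -> int:
        return self.n_radial * self.n_angular_components

    def total_coeffs_shape(self, n_atoms: int) -> tuple[int, int]:
        return (n_atoms, self.n_coeffs_per_atom)


spec = BasisSpec()
```

Storing sigma as a tuple rather than a list is what keeps the frozen instance hashable, so it can serve as a cache key.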

PR beta of 4. The DIY bridge between VASP plane-wave CHGCAR data and the rholearn/SALTED localized-basis world.

Both libraries (SALTED, rholearn, also Graph2Mat) target localized-basis DFT codes (FHI-aims, CP2K, PySCF, SIESTA); VASP is plane-wave. So we have to build this projection layer ourselves regardless of which upstream we wrap. See the Phase A memo for the analysis.

What's here

salted_ft/projection.py

- _grid_positions(grid_shape, cell) -> (n_grid, 3) Cartesian
- _real_sph_harm(rhat, lmax) -> (..., (lmax+1)^2) real Y_lm values, hand-rolled for lmax <= 4 (covers the locked default). Standard SOAP / SALTED component ordering [Y_00, Y_1{-1}, Y_10, Y_11, Y_2{-2}, ..., Y_44].
- _eval_basis_at_grid(atom, grid, cell, spec) -> (n_grid, n_coeffs_per_atom) basis-function values with minimum-image PBC.
- project_chgcar_to_basis(density, atoms, basis_spec). Orthonormal-approx projection: c_k = <B_k, rho> / <B_k, B_k>. v1 stand-in for proper overlap-matrix LSQR which lands in PR gamma. Linear in the input density.
- reconstruct_grid_from_basis(coefficients, atoms, grid_shape, basis_spec). Literal expansion sum. Linear in the input coefficients.

tests/test_salted_projection.py

- TestProjectChgcarToBasis (6 tests): shape, zero->zero, dtype, linearity, additivity, finite.
- TestReconstructGridFromBasis (6 tests): shape, zero->zero, dtype, linearity, single-atom-l0-peak-at-atom-position, finite.
- TestProjectionReconstructionRoundtrip (2 tests): zero-density and zero-coefficient roundtrips. Tight roundtrip accuracy is intentionally NOT pinned; that lands in PR gamma when we swap in proper LSQR.

Design notes

PBC: minimum-image via cell inverse. Adequate when 2*cutoff fits inside the smallest cell vector. For very small cells we'd want full supercell expansion; out of scope for PR beta.

Numpy-only on purpose. e3nn / torch were tempting for spherical harmonics but adding them to a projection module mixes concerns: projection should be a clean reference implementation that runs on any laptop with numpy.

Test plan

33/33 tests pass (19 from PR alpha + 14 new). Ruff format + check clean. No Adastra interaction; pure numpy.

Next: PR gamma wraps rholearn's training/inference loop as a SALTEDModel class, pinned against our LeMat-Rho parquet input pipeline and reusing charge3net_ft.train's NMAPE/RMSE/NRMSE metrics.

PR gamma of 4. Adds the model wrapper that pairs with the projection + reconstruction layer from PR beta. The wrapper has a stub mode so the surrounding pipeline (predict -> reconstruct -> metric) can be exercised end-to-end without a trained rholearn checkpoint.

What's here

salted_ft/model.py

SALTEDModel(basis_spec, ckpt_path=None)

* __call__(atoms) -> (n_atoms, n_coeffs_per_atom) float64 coefficients.
* reconstruct_density(atoms, grid_shape) convenience that runs predict + reconstruct_grid_from_basis in one call.
* Stub mode (ckpt_path=None): deterministic, position-dependent coefficients seeded by a hash of the positions / numbers / basis spec. Different atoms in -> different coefficients out; same atoms in -> same coefficients out (verified by tests).
* Real-rholearn path raises NotImplementedError for now; lands in a follow-up PR once rholearn is configured on Adastra.

Sibling-repo discovery for rholearn follows the existing charge3net_ft / deepdft_ft pattern (lazy; only insists when ckpt_path is set).

salted_ft/projection.py

Wrapped two more matmul sites in np.errstate to silence the same benign divide/invalid/overflow noise we already suppressed in _eval_basis_at_grid and _grid_positions.

tests/test_salted_model.py

15 TDD tests across 5 categories:

* Construct: basis_spec stored, default ckpt_path is None.
* Output shape: single-atom, multi-atom, float64 dtype, finite.
* Determinism: same input -> same output; position changes produce different output (rules out a zero-returning stub).
* Reconstruct density: shape, dtype, finite, equals the explicit (predict, then reconstruct_grid_from_basis) path.
* Metric integration with charge3net_ft.train's compute_nmape / compute_rmse / compute_nrmse: finite scalars, self-similarity gives NMAPE=0 sanity check. Pinned per the brief: keep metric calculations identical to the ChargE3Net pipeline.

Test plan

48/48 tests across the salted suite pass (19 basis + 14 projection + 15 model). Ruff format + check clean. No Adastra interaction; pure local Python.

Next: PR delta wraps the CHGCAR I/O via pymatgen so reconstructed grids can be written to disk for VASP ICHARG=1 single-points. End-to-end VASP integration test will be gated on the entalsim StructureVASPSinglePoint maker (separate stack).

PR delta of 4, closes the SALTED scaffold. Adds the boundary between the predicted-density-tensor world and the VASP-input-file world so a trained SALTED-arm model can be evaluated end-to-end via paired SCF runs.

What's here

salted_ft/io.py

- write_chgcar(density, atoms, path, n_electrons=None). Writes a pymatgen Chgcar-compatible file. The n_electrons argument rescales the density so its integrated value equals the requested electron count; that is what VASP reads as the total electron count when starting with ICHARG=1. Without rescaling VASP silently fixes the count for us at startup, which would mask part of the speedup we are trying to measure. Rejects non-3D densities and nonpositive n_electrons with clear messages.
- read_chgcar(path) -> (density, atoms). The inverse. Converts pymatgen's "density times volume" storage convention back to plain rho on the grid.

Uses pymatgen.io.ase.AseAtomsAdaptor for the ase.Atoms <-> pymatgen.Structure conversion.

tests/test_salted_io.py

9 TDD tests + 1 placeholder (skipped):

- Write: file exists and is nonempty, electron-count rescaling within 1e-4 relative, non-3D rejected, negative N rejected.
- Read: shape preserved, atom species preserved (multiset), cell preserved within 1e-6.
- Roundtrip: density write->read within VASP scientific-notation precision (rtol 1e-3, atol 1e-5).
- End-to-end: SALTEDModel.reconstruct_density piped into write_chgcar produces a readable file.
- VASP hook gate: pytest.importorskip on entalsim.dft.tasks.single_point, which auto-activates once Entalpic/entalsim PR #56 lands its PR 2 (StructureVASPSinglePoint maker).

Test plan

9 passed + 1 skipped (entalsim gate). Full salted suite now 57 passed + 1 skipped across 4 stacked PRs:

    PR alpha    19 tests on BasisSpec
    PR beta     14 tests on projection / reconstruction
    PR gamma    15 tests on SALTEDModel + metric integration
    PR delta    10 tests (9+1) on CHGCAR I/O + VASP hook gate

Ruff format + check clean across all 8 source/test files.

The SALTED scaffold is now ready to consume a trained rholearn checkpoint and produce VASP-ready CHGCARs end-to-end. Next steps (separate stack): wire rholearn training on Adastra using the LeMat-Rho parquet adapter; flip the entalsim hook gate to live when PR 2 of the r2SCAN single-point stack lands.
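The electron-count rescaling that write_chgcar performs can be shown independently of pymatgen. This is a minimal sketch, assuming the density is given as plain rho (electrons per unit volume) on a uniform grid and that the integral is approximated by a Riemann sum; the function name is hypothetical and no file is written.

```python
import numpy as np


def rescale_to_electron_count(density, cell, n_electrons: float) -> np.ndarray:
    """Scale rho so that its integral over the cell equals n_electrons."""
    density = np.asarray(density, dtype=float)
    if density.ndim != 3:
        raise ValueError(f"density must be 3D, got shape {density.shape}")
    if n_electrons <= 0:
        raise ValueError(f"n_electrons must be positive, got {n_electrons}")
    volume = abs(np.linalg.det(np.asarray(cell, dtype=float)))
    voxel_volume = volume / density.size
    integrated = density.sum() * voxel_volume  # Riemann-sum estimate of the integral
    return density * (n_electrons / integrated)


rng = np.random.default_rng(0)
density = rng.random((6, 6, 6)) + 0.1  # strictly positive toy density
cell = np.diag([3.0, 4.0, 5.0])
scaled = rescale_to_electron_count(density, cell, n_electrons=8.0)
integral = scaled.sum() * abs(np.linalg.det(cell)) / scaled.size
```

The rescaling is a single global factor, so the shape of the predicted density is untouched; only its normalisation changes.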

Phase D1 (projection sanity check on 10 real LeMat-Rho rows) caught a catastrophic failure mode: the orthonormal-approximation projection landed in PR beta produced 1068% NMAPE on the basis-set roundtrip because the Gaussian basis functions overlap heavily (sigma ~= cutoff) and the per-channel c_k = <B_k, rho> / <B_k, B_k> overcounts contributions from neighboring basis functions.

Fix: build the full per-structure design matrix B_global of shape (n_grid, n_atoms * n_coeffs_per_atom) and solve one least-squares system for all atom coefficients simultaneously. The system is overdetermined for our 10x10x10 grids (1000 > 4 atoms * 100 coeffs in the typical LeMat-Rho cell) so lstsq returns the unique minimum-residual fit.

After: basis-set ceiling on 10 random LeMat-Rho rows is

    NMAPE: 8.19% +/- 6.60% (min 2.00%, max 22.67%)

vs

    NMAPE: 1068.81% +/- 109.42% (orthonormal-approx)

Well within the 'proceed' band from the plan. Full per-sample numbers are in the offline CSV at salted_basis_sanity_check.csv (outside the repo).

Test plan

57/57 tests in tests/test_salted_basis.py + test_salted_projection.py + test_salted_model.py + test_salted_io.py still pass with no changes to test contracts. Linearity, zero-in-zero-out, shape, dtype, single-atom peak position, all unaffected. LSQR is linear in rho so the linearity tests hold by construction. Ruff format + check clean.

The previous orthonormal-approx was documented in PR beta's commit as a 'v1 stand-in' for proper LSQR; this lands the proper version. No API change.
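The overcounting effect is easy to reproduce in one dimension. The sketch below is a toy illustration, not the repo's projection code: two overlapping Gaussians stand in for the atom-centred basis, the grid and centres are made up, and NMAPE is assumed to mean 100 * sum|pred - ref| / sum|ref|.

```python
import numpy as np

x = np.linspace(-4.0, 8.0, 600)  # 1D grid standing in for the 3D voxel grid
centers = np.array([1.5, 2.5])  # two hypothetical basis centres
sigma = 1.0  # width comparable to the spacing: heavy overlap
B = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))  # (n_grid, n_basis)

c_true = np.array([1.0, 2.0])
rho = B @ c_true  # density lying exactly in the span of the basis

# Orthonormal approximation: project each channel independently.
c_approx = (B.T @ rho) / np.einsum("gk,gk->k", B, B)

# Joint least squares: solve for all channels at once.
c_lstsq, *_ = np.linalg.lstsq(B, rho, rcond=None)


def nmape(pred: np.ndarray, ref: np.ndarray) -> float:
    """Normalised mean absolute percentage error (assumed definition)."""
    return float(100.0 * np.abs(pred - ref).sum() / np.abs(ref).sum())


nmape_approx = nmape(B @ c_approx, rho)
nmape_lstsq = nmape(B @ c_lstsq, rho)
```

Each per-channel coefficient absorbs part of its neighbour's contribution, so both come out too large, whereas the joint solve recovers the true coefficients because it accounts for the overlap between columns of the design matrix.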

…irectory)

Phase D2 of the Adastra comparison plan. One-time job to project every LeMat-Rho parquet row onto the locked SALTED basis, producing a parallel parquet directory of basis coefficients that downstream training loops (rholearn, Graph2Mat) consume.

What's here

salted_ft/project_dataset.py

- project_chunk(in_path, out_path, basis_spec). Reads one LeMat-Rho format chunk, runs project_chgcar_to_basis on every valid row, writes a parallel chunk with this schema:

      row_index, material_id, n_atoms, atomic_numbers, lattice_vectors, n_electrons, grid_shape, coefficients, basis_set_NMAPE

  basis_set_NMAPE column is the per-row reconstruction error from project + reconstruct roundtrip; lets downstream training know the basis ceiling per sample.
- project_directory(input_dir, output_dir, basis_spec). Driver that loops over chunk_*.parquet files. Idempotent: existing nonempty output files are left untouched so an interrupted run can resume cheaply.
- CLI entry point so the Adastra job runs as

      uv run python -m salted_ft.project_dataset \
          --input-dir ... --output-dir ...

tests/test_salted_project_dataset.py

9 TDD tests across 2 classes covering the contract:

* file written, row count, all required columns present
* per-row coefficient shape is (n_atoms, n_coeffs_per_atom)
* basis_set_NMAPE finite + nonneg per row
* material_id preserved if source has it
* NULL charge_density rows in source are skipped (real LeMat-Rho has some failed extractions)
* project_directory processes every chunk
* second invocation is a no-op (idempotent resume)

The script uses the LSQR projection landed in commit 22809b9; D1 sanity check (10 random LeMat-Rho rows) showed basis ceiling 8.19% +/- 6.60% NMAPE, well within the proceed band.

Test plan

9/9 tests pass on the new file; full salted suite still 66 passed + 1 skipped after this. Ruff format + check clean on touched files.

Next: scp + run on Adastra against $SETUP/charge3net_data, expected ~30 min wall on a Genoa CPU node for 65k rows.
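The skip-existing behaviour that makes the driver resumable can be sketched with pathlib alone. This version takes a generic per-chunk callback instead of a basis_spec, so the name `project_directory_sketch` and its signature are illustrative rather than the actual API; the demo writes placeholder bytes, not parquet.

```python
import tempfile
from pathlib import Path
from typing import Callable


def project_directory_sketch(
    input_dir, output_dir, project_chunk: Callable[[Path, Path], None]
) -> list[Path]:
    """Process every chunk_*.parquet, skipping outputs that already exist and are nonempty."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in sorted(input_dir.glob("chunk_*.parquet")):
        out_path = output_dir / in_path.name
        if out_path.exists() and out_path.stat().st_size > 0:
            continue  # finished in an earlier run: leave untouched
        project_chunk(in_path, out_path)
        written.append(out_path)
    return written


def _fake_project(in_path: Path, out_path: Path) -> None:
    # Stand-in for the real projection: just write some bytes.
    out_path.write_bytes(b"projected:" + in_path.name.encode())


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in", Path(tmp) / "out"
    src.mkdir()
    for i in range(3):
        (src / f"chunk_{i:03d}.parquet").write_bytes(b"x")
    n_first = len(project_directory_sketch(src, dst, _fake_project))
    n_second = len(project_directory_sketch(src, dst, _fake_project))
```

Checking for a nonempty file rather than mere existence means a zero-byte output left by a killed job is redone on the next run.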

Genoa CPU partition, single node, 16 CPUs, 2 h wall (Adastra smoke test of 1 chunk = 71 s, 69 chunks extrapolate to ~80 min). Caps OMP_NUM_THREADS / OPENBLAS_NUM_THREADS / MKL_NUM_THREADS to SLURM_CPUS_ON_NODE so numpy's BLAS-backed lstsq does not oversubscribe the node (default behavior would spawn one thread per hardware core regardless of allocation). Idempotent via project_directory's skip-existing logic, so the job can be requeued without paying the LSQR cost for chunks already written.

Job 4977567 (LRU OOM fix in place) ran 2h41m and died from a NEW failure mode: NCCL TCPStore "Broken pipe" on the DDP heartbeat channel. Trace from .err:

    Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
    ...
    srun: error: g1132: tasks 1-3: Terminated

MaxRSS was 14 GB/task -- memory budget healthy, so the LRU fix is solid. The new bug is inter-rank communication, not memory.

Adds four NCCL env vars to the submit script:

    NCCL_TIMEOUT=3600                        per-collective timeout
    NCCL_ASYNC_ERROR_HANDLING=1              clean shutdown on rank failure, no cascading hangs
    TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC=1800    half-hour heartbeat tolerance (was the default ~600 sec)
    TORCH_NCCL_TRACE_BUFFER_SIZE=1000        larger trace buffer for the next crash post-mortem

Test plan

9/9 tests in tests/test_submit_script.py still pass. Resubmit to validate end-to-end. If this still crashes from NCCL, fallback options are gloo backend or single-GPU runs.

Phase D3 of the Adastra comparison plan. Bridges our SALTED-arm dense coefficient layout with rholearn's metatensor TensorMap layout so the training loop in rholearn can consume LeMat-Rho data. Layout mismatch resolved by this adapter Our layout (from project_chgcar_to_basis): atom -> n (radial) -> lambda -> mu rholearn's layout (from rholearn/utils/convert.py:_get_flat_index): atom -> lambda -> n (radial) -> mu The reordering is a single per-atom permutation, independent of species because our BasisSpec is uniform across all species in v1. What's here salted_ft/rholearn_adapter.py build_lmax_nmax(basis_spec, species) Expand uniform BasisSpec into rholearn's per-species lmax / nmax dicts (the form expected by convert.coeff_vector_ndarray_to_tensormap). dense_to_rholearn_flat(coeffs, basis_spec, symbols) rholearn_flat_to_dense(flat, basis_spec, symbols) The exact permutation between the two layouts. Roundtrip is the identity; pinned by tests. dense_to_tensormap(coeffs, basis_spec, symbols, positions, cell, structure_idx) Full path that calls rholearn's converter. Lazy-imports rholearn and metatensor so this module is importable without those deps. tests/test_salted_rholearn_adapter.py 12 TDD tests across 4 classes: Build lmax/nmax dicts (species coverage, value match, key form, total coefficient count matches) dense_to_rholearn_flat (output length, zero-in-zero-out, dtype, per-atom block ordering) Roundtrip (single-atom, multi-atom, permutation-is-nontrivial) Full TensorMap (key names; skipped locally when sibling rholearn missing -- auto-activates on Adastra) Test plan 77 passed + 2 skipped across the salted suite (78 = previous 66 + 12 new). The 2 skips are forward-looking gates: one on the entalsim VASP single-point maker, one on the rholearn sibling repo. Both auto-activate as soon as their deps are reachable. Ruff format + check clean. 
Next: D4 (rholearn training submit script that reads our projected coefficients via this adapter, runs the metatensor-based training, saves checkpoints). Will need a real Adastra job once D2's projected-coefficient dataset is on disk.
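
The layout swap described in this commit (n-major to lambda-major within each atom) can be sketched with a single index permutation. This is a minimal illustration, not the adapter's code: the helper names are hypothetical, and it assumes a uniform basis and the same mu ordering inside each (n, lambda) block in both layouts.

```python
import numpy as np


def layout_permutation(max_l: int, n_radial: int) -> np.ndarray:
    """Indices into the n-major per-atom vector, listed in lambda-major order."""
    per_radial = (max_l + 1) ** 2  # sum over lambda of (2 * lambda + 1)
    perm = [
        n * per_radial + lam * lam + mu  # offset of (n, lambda, mu) in n-major order
        for lam in range(max_l + 1)
        for n in range(n_radial)
        for mu in range(2 * lam + 1)
    ]
    return np.asarray(perm, dtype=np.int64)


def to_lambda_major(coeffs: np.ndarray, max_l: int, n_radial: int) -> np.ndarray:
    """(n_atoms, n_coeffs) in n-major order -> flat vector in lambda-major order."""
    return coeffs[:, layout_permutation(max_l, n_radial)].reshape(-1)


def to_radial_major(flat: np.ndarray, n_atoms: int, max_l: int, n_radial: int) -> np.ndarray:
    """Inverse of to_lambda_major: scatter each atom's block back to n-major order."""
    perm = layout_permutation(max_l, n_radial)
    dense = np.empty((n_atoms, perm.size), dtype=flat.dtype)
    dense[:, perm] = flat.reshape(n_atoms, perm.size)
    return dense
```

Because the same permutation is applied to every atom, the round trip is the identity by construction, which is the property the adapter's tests pin.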

Needed by tests/test_salted_rholearn_adapter.py (the metatensor TensorMap conversion path uses both). Without them the TestDenseToTensorMap class skips locally, which masks integration breaks until they're caught at runtime on Adastra. Pure-Python binary wheels exist on PyPI, no compilation needed.

Mirrors salted_ft's basis module for the Graph2Mat arm of the r2SCAN density-model comparison. point_basis_for_species and basis_table_for_species expand our uniform BasisSpec(max_l=4, n_radial=4, cutoff=4.0) into Graph2Mat PointBasis objects with basis=[4]*5 and basis_convention='spherical'. PointBasis.basis_size is asserted equal to BasisSpec.n_coeffs_per_atom (100) so projected coefficients stay loadable into Graph2Mat density matrices. 10 TDD tests pinning: type/R/basis_size/convention contracts, one entry per l, species independence, and dedup behaviour of the batch table builder.
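
The size check in this commit is simple arithmetic, sketched below. It assumes that entry l of the basis list is the number of radial shells for angular momentum l, each contributing 2l + 1 functions; the helper name is hypothetical and nothing here touches Graph2Mat itself.

```python
from typing import Sequence


def basis_size(shells_per_l: Sequence[int]) -> int:
    """Total number of basis functions when shells_per_l[l] shells exist for each l."""
    return sum(n_shells * (2 * l + 1) for l, n_shells in enumerate(shells_per_l))


max_l, n_radial = 4, 4
shells = [n_radial] * (max_l + 1)  # [4, 4, 4, 4, 4], i.e. [4]*5
size = basis_size(shells)  # 4 * (1 + 3 + 5 + 7 + 9) = 100
```

With max_l=4 and n_radial=4 this gives 100, matching the n_coeffs_per_atom value quoted above.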

CINES policy rejects explicit --partition= asks on the Genoa nodes, so SLURM auto-routes based on the resource size. 16 CPUs/task lands in the exclusive queue (long wait); 4 CPUs/task lands in shared and starts almost immediately. The projection is BLAS-LSQR bound and saturates 4 cores per chunk already, so the smaller ask costs no wall time.

Path A of the Graph2Mat plan: keep the same regression target as SALTED (per-atom basis coefficient vectors from salted_ft) and use Graph2Mat as a different backbone over the same target.

graph2mat_ft.projection exposes:

* pack_coeffs_to_point_labels(coeffs, basis_spec, symbols) flattens (N_atoms, n_coeffs_per_atom) into atom-major point_labels.
* unpack_point_labels_to_coeffs is the inverse.
* make_basis_configuration bundles a structure into a graph2mat.BasisConfiguration so the training driver does not have to reach into graph2mat internals.

14 TDD tests pinning shape, dtype preservation, atom-major ordering, within-atom channel order, length-mismatch ValueError guards, and BasisConfiguration point_types indexing into the species basis list.
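
Atom-major packing amounts to a C-order reshape with shape guards. The sketch below uses hypothetical helper names and takes the per-atom width as a plain integer rather than a BasisSpec; it illustrates the ordering contract only and is not the module's implementation.

```python
import numpy as np


def pack_atom_major(coeffs, n_coeffs_per_atom: int) -> np.ndarray:
    """(n_atoms, n_coeffs_per_atom) -> 1D vector: atom 0's channels, then atom 1's, ..."""
    coeffs = np.asarray(coeffs)
    if coeffs.ndim != 2 or coeffs.shape[1] != n_coeffs_per_atom:
        raise ValueError(f"expected (n_atoms, {n_coeffs_per_atom}), got {coeffs.shape}")
    return coeffs.reshape(-1)  # C order keeps within-atom channel order intact


def unpack_atom_major(labels, n_atoms: int, n_coeffs_per_atom: int) -> np.ndarray:
    """Inverse of pack_atom_major, with a length-mismatch guard."""
    labels = np.asarray(labels)
    if labels.shape != (n_atoms * n_coeffs_per_atom,):
        raise ValueError(f"expected length {n_atoms * n_coeffs_per_atom}, got {labels.shape}")
    return labels.reshape(n_atoms, n_coeffs_per_atom)
```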

…mma)

Mirrors salted_ft.model.SALTEDModel. Stub mode (ckpt_path=None) returns deterministic per-atom coefficients seeded off positions + numbers + basis_spec via blake2b, so same structure in -> same coefficients out and small perturbations to any atom change the output.

ckpt_path != None raises NotImplementedError until D6 wires in the real Graph2Mat backbone, so the failure mode is loud rather than silently returning stub output during benchmarking.

reconstruct_density(atoms, grid_shape) is the convenience entry point for the VASP comparison pipeline.

Note: salted_ft.model uses int.from_bytes(seed_bytes[:16], ...) which only seeds off atom 0 -- different bug, same shape, but left alone here per the surgical-changes rule. Worth fixing in its own patch.

10 TDD tests pinning shape, dtype, finiteness, determinism, position-dependence, species-dependence, output magnitude, the NotImplementedError gate for ckpt_path, and the reconstruct_density shape contract.

graph2mat_ft.io re-exports read_chgcar / write_chgcar from salted_ft.io so the two arms share a single implementation (including the n_electrons rescaling that VASP ICHARG=1 needs). Tests pin the identity of the re-exports so a future fix in salted_ft.io automatically propagates.

Graph2Mat's native target is D_ab in an atom-centered basis. VASP does not output that; we would have to invent a CHGCAR -> D_ab projection (10^6 x 10^6 dense LSQR per structure, needs matrix-free + neighbor-cutoff and its own quality validation). Multi-week effort, no clear win for the SCF-speedup goal vs the three arms already in flight. The PointBasis adapter, projection helpers, model wrapper and shared IO surface stay in tree as green-tested scaffolding so the arm can be revived (with SIESTA training data, a matrix-free projection, or a vector-output hijack) without rewriting from zero.

scripts/density_model_eval.py loops over a LeMat-Rho-shaped test parquet, runs the selected arm to predict the density on the ground-truth grid, and writes per-row NMAPE / RMSE / NRMSE into an output parquet. Importable for D8 (the comparison-table builder) via evaluate_dataset(...).

Arm coverage in this alpha:

* salted: fully wired through SALTEDModel.reconstruct_density. Stub mode (no ckpt) works; real mode lights up when D6 (SALTED training driver) lands.
* charge3net, deepdft: dispatcher raises NotImplementedError with a TODO pointing at D7-beta (probe batching). Catches a future user feeding a real-arm name and silently getting stub metrics.
* unknown name: ValueError at the boundary.

Metrics are numpy-only on flat or 3D arrays (no probe-padding mask needed because grid eval has no padding).

14 TDD tests pin metric values, dispatcher contract, parquet schema (model, ckpt, material_id, n_atoms, nmape, rmse, nrmse), finiteness, and the --limit smoke-test path.
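
The commit message names the three metrics but does not define them. The sketch below uses definitions that are common for density models and should be read as assumptions: NMAPE as total absolute error over total absolute density (in percent), and NRMSE as RMSE divided by the standard deviation of the reference. The normalisation used by the script may differ.

```python
import numpy as np


def _flat(pred, true):
    """Flatten both arrays (flat or 3D input) and check they match in size."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    true = np.asarray(true, dtype=np.float64).ravel()
    if pred.shape != true.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {true.shape}")
    return pred, true


def nmape(pred, true) -> float:
    """Assumed definition: 100 * sum|pred - true| / sum|true|."""
    pred, true = _flat(pred, true)
    return float(100.0 * np.abs(pred - true).sum() / np.abs(true).sum())


def rmse(pred, true) -> float:
    """Root-mean-square error over all grid points."""
    pred, true = _flat(pred, true)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def nrmse(pred, true) -> float:
    """Assumed normalisation: RMSE divided by the std of the reference density."""
    _, true_flat = _flat(pred, true)
    return rmse(pred, true) / float(np.std(true_flat))
```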

scripts/density_model_comparison_table.py concatenates one or more D7 per-row eval parquets, groups by the model column, and emits a per-arm summary (n, mean +/- std, median for NMAPE / RMSE / NRMSE). Writes both a CSV (machine-readable) and a GitHub-flavour markdown table (paste-into-PR).

build_comparison_table(inputs, csv_path, markdown_path) is importable so a Lightning callback / pipeline step can call it directly without spawning a subprocess. CLI driver provided for ad-hoc use.

10 TDD tests pin: per-arm grouping, mean / std / median values, n_structures count, multi-file-per-arm aggregation (sharded eval), markdown content and header structure, and the CSV + markdown write paths.
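
The grouping step can be illustrated with the standard library alone. This is a sketch under assumptions, not the script: rows are plain dicts standing in for the concatenated parquet rows, the function names are hypothetical, and the standard deviation is the sample one (the source does not say which convention the script uses).

```python
import statistics
from collections import defaultdict


def summarise(rows, metrics=("nmape", "rmse", "nrmse")):
    """Group per-row records by their 'model' field and summarise each metric."""
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row)
    summary = {}
    for model, group in sorted(by_model.items()):
        entry = {"n_structures": len(group)}
        for metric in metrics:
            values = [r[metric] for r in group]
            entry[metric] = {
                "mean": statistics.fmean(values),
                # sample std is undefined for one value; report 0.0 in that case
                "std": statistics.stdev(values) if len(values) > 1 else 0.0,
                "median": statistics.median(values),
            }
        summary[model] = entry
    return summary


def markdown_table(summary, metric):
    """Render one metric as a GitHub-flavoured markdown table."""
    lines = ["| model | n | mean | std | median |", "| --- | --- | --- | --- | --- |"]
    for model, entry in summary.items():
        s = entry[metric]
        lines.append(
            f"| {model} | {entry['n_structures']} | {s['mean']:.3f} | {s['std']:.3f} | {s['median']:.3f} |"
        )
    return "\n".join(lines)
```

Rows from several shards of the same arm simply land in the same group, which is how sharded evals aggregate.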

The old int.from_bytes(seed_bytes[:16], ...) only consumed the first 16 bytes of positions + numbers + spec, which is two-thirds of atom 0's xyz and nothing else. Perturbing any atom past index 0 produced identical stub coefficients, silently collapsing distinct structures into the same seed. Switch to a blake2b(digest_size=16) hash over the full buffer so every atom contributes. Same fix already in graph2mat_ft.model. Regression test pins the multi-atom case: nudging atom 1 in a two-atom Fe cell must change the predicted coefficients.
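
A small sketch of the failure and the fix. The buffer layout (float64 positions, then int64 atomic numbers, then a spec tag) and the little-endian byte order are assumptions made for illustration; only the 16-byte truncation and the blake2b(digest_size=16) replacement come from the commit message.

```python
import hashlib

import numpy as np


def seed_buffer(positions, numbers, spec_tag: bytes) -> bytes:
    """Concatenate positions + numbers + spec into one byte buffer (assumed layout)."""
    return (
        np.ascontiguousarray(positions, dtype=np.float64).tobytes()
        + np.ascontiguousarray(numbers, dtype=np.int64).tobytes()
        + spec_tag
    )


def truncated_seed(buf: bytes) -> int:
    """Old behaviour: only the first 16 bytes (x and y of atom 0) reach the seed."""
    return int.from_bytes(buf[:16], "little")


def hashed_seed(buf: bytes) -> int:
    """Fixed behaviour: a 16-byte blake2b digest over the whole buffer."""
    return int.from_bytes(hashlib.blake2b(buf, digest_size=16).digest(), "little")


def stub_coeffs(seed: int, n_atoms: int, n_coeffs: int) -> np.ndarray:
    """Deterministic placeholder coefficients drawn from the seed."""
    return np.random.default_rng(seed).standard_normal((n_atoms, n_coeffs))
```

Nudging atom 1 leaves truncated_seed unchanged but changes hashed_seed, which is the multi-atom case the regression test pins.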

Wires the charge3net arm in scripts/density_model_eval.py. Builds the full-grid input dict via charge3net's own KdTreeGraphConstructor (so atom + probe edges match training), batches probes through src.utils.predictions.split_batch, and reshapes the concatenated forward output to (Nx, Ny, Nz).

predict_density now accepts an optional pre-loaded model so tests inject a mock without going through ChargE3NetWrapper + a real ckpt. The charge3net_ft.model import is forced for its sys.path side effect (adds ../charge3net) so the data utilities resolve even when the caller supplies the model directly.

Tests skip cleanly when the charge3net sibling repo is absent (integration-only). Two new mock-model tests pin: full-grid shape contract, value reshape order (constant predictions reproduce a constant grid), and that lowering max_probe_batch increases the forward-pass count.

DeepDFT branch still gated behind NotImplementedError (separate forward signature, lands in D7-beta2).
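
The batching-and-reshape contract can be sketched independently of charge3net. In the sketch below, forward is a hypothetical callable standing in for the model, probes are plain coordinate arrays rather than graph dicts, and the probes are assumed to be enumerated in C order so that the final reshape matches the grid.

```python
import numpy as np


def predict_grid(forward, probe_xyz, grid_shape, max_probe_batch):
    """Run `forward` over probes in chunks and reshape the result to the grid."""
    n_probes = int(np.prod(grid_shape))
    if probe_xyz.shape != (n_probes, 3):
        raise ValueError(f"expected ({n_probes}, 3) probes, got {probe_xyz.shape}")
    chunks = []
    for start in range(0, n_probes, max_probe_batch):
        batch = probe_xyz[start:start + max_probe_batch]
        chunks.append(np.asarray(forward(batch)))  # one forward pass per chunk
    return np.concatenate(chunks).reshape(grid_shape)


class CountingConstantModel:
    """Mock model: returns a constant density and counts forward passes."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        return np.full(len(batch), self.value)
```

A constant mock makes the reshape-order test trivial to state, and the call counter exposes how max_probe_batch drives the number of forward passes.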

DeepDFT is the upstream code charge3net forked, so the model input-dict format is identical: probe_xyz, num_probes, probe_edges, etc. _deepdft_predict_grid reuses charge3net's data utilities to build the graph and split_batch to batch probes; the DeepDFT-specific bits are:

* sys.path side effect from deepdft_ft.runner (adds ../DeepDFT and stubs asap3 when its C extension is unbuildable, as on Adastra).
* densitymodel.PainnDensityModel(num_interactions=3, node_size=128, cutoff=4.0) by default; toggle use_painn=False for SchNet.
* ckpt loading via torch.load with the "model" key wrapper.

Optional model= injection identical to charge3net so tests can mock the network. Integration test skips when the DeepDFT sibling repo is absent (this machine); runs on Adastra where it lives.

…p (D6)

Path B of the D6 plan: skip the rholearn integration (would need multi-week Adastra-side iteration) and train a small SchNet-style invariant message-passing net directly on D2's per-atom basis coefficients. MSE loss; AdamW; gradient accumulation per batch since per-structure forward is variable size.

Architecture (salted_ft/train_baseline.py):

* Z embedding (nn.Embedding, max_z=120).
* GaussianRBF distance featurisation over neighbours within BasisSpec.cutoff.
* Two SchNet-style cfconv layers.
* Per-atom readout MLP -> BasisSpec.n_coeffs_per_atom.

Caveat: invariant model means l>0 channels of the SALTED basis will be systematically wrong. This is a baseline; upgrade to e3nn/MACE for proper equivariance if it under-performs.

SaltedTrainingDataset joins D2 source (cartesian_site_positions column) and projected coefficients (training targets) by row_index per matching chunk basename, since D2 output does not carry positions.

submit_salted_baseline_adastra.sh: single-GCD MI250 job, 10 epochs, 24h walltime, ROCm env mirrored from the DeepDFT submit.

8 TDD tests pinning: forward output shape, dtype, finiteness, determinism, species-dependence (catches frozen Z embedding), loss-decrease on a synthetic toy, save/load round-trip, and an end-to-end train() call on a synthetic 2-row dataset.

Replaces _rholearn_predict (which only raised NotImplementedError) with _baseline_predict: lazy-loads the SaltedBaselineModel from the D6 ckpt format {basis_spec, model: state_dict}, caches it on the wrapper, and forwards through torch.no_grad(). The result is cast to float64 to match the stub-mode contract.

Removes the eager _ensure_rholearn_importable() check from __init__ since the baseline path does not need the rholearn sibling repo. The rholearn-faithful path was deferred (graph2mat arm is parked, SALTED arm uses path B); when it comes back as a follow-up we will dispatch on ckpt format inside _baseline_predict.

Two new tests: round-trip a baseline state_dict through SALTEDModel and verify the predicted coefficients differ from the stub seed (so we know the ckpt is actually driving inference), and assert a clear RuntimeError on a malformed ckpt.

…izes

Job 5003891 OOM-killed (CPU RAM) at ~10 min: slurmstepd reported "Detected 1 oom_kill event" with 64 GB budget. Root cause is the data-buffer footprint, not a model or training-loop issue.

The upstream-DeepDFT defaults of RotatingPoolData(pool_size=20) + num_workers=4 keep up to 80 full grids in RAM concurrently. For QM9 (~50^3) and MP (~100^3) that is fine. LeMat-Rho's r2SCAN CHGCARs have a long upper tail (200-300^3), and a single 300^3 sample is ~750 MB once density + grid_pos are materialised; a handful of those in the pool blows past 64 GB.

Cut pool_size 20 -> 5 and num_workers 4 -> 2. Effective in-RAM grid count drops 80 -> 10. Hyperparameters that affect training quality (batch_size=2 materials, 1000 probes/material, learning rate, etc.) are unchanged.

Verified locally: full test suite still green (195 pass).
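
The memory budget above is back-of-envelope arithmetic, sketched here. The helper names are hypothetical and the dtype sizes are assumptions: the commit message does not state them, and float32 density with float64 grid positions is simply one combination that lands near the quoted ~750 MB per 300^3 sample.

```python
def grid_bytes(grid_shape, density_itemsize: int, position_itemsize: int) -> int:
    """Bytes for one sample: one density value plus an xyz position per grid point."""
    n_points = grid_shape[0] * grid_shape[1] * grid_shape[2]
    return n_points * density_itemsize + n_points * 3 * position_itemsize


def max_resident_grids(pool_size: int, num_workers: int) -> int:
    """Each worker holds its own pool, so the worst case is the product."""
    return pool_size * num_workers


# Assumed dtypes: 4-byte density, 8-byte positions.
big_sample = grid_bytes((300, 300, 300), density_itemsize=4, position_itemsize=8)
before = max_resident_grids(pool_size=20, num_workers=4)  # upstream defaults
after = max_resident_grids(pool_size=5, num_workers=2)    # values after this commit
```

Multiplying the resident-grid count by the per-sample size gives the worst-case footprint to compare against the 64 GB job budget.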

…Flow (P4)

For each held-out test row, predicts the density via the chosen arm (salted, charge3net, deepdft) using the existing density_model_eval.predict_density, writes a CHGCAR with the n_electrons rescaling salted_ft.io.write_chgcar applies, and submits the paired baseline + predicted r2SCAN single-point Flow via entalsim.dft.scf_speedup.make_scf_speedup_pair plus entalsim.core.submit.submit_workflow.

Driver is dependency-injectable on the two entalsim callables (make_pair_fn, submit_fn) so its tests pass locally without entalsim installed; the CLI imports them at runtime via lazy imports.

Fail-fast guards at run_experiment call time:

* charge3net or deepdft without --ckpt raises ValueError (those arms with no weights produce random-init predictions and waste HPC time)
* salted without --ckpt is allowed; stub mode is the documented fallback while D6 trained weights are pending

One per-row chgcar directory keyed by (model, material_id) so multiple rows never share a CHGCAR file; make_scf_speedup_pair's prev_dir mechanism receives the right directory.

9 TDD tests pinning: dry-run writes one CHGCAR per row + does not submit; make_pair gets metadata with material_id + arm + experiment; --limit caps rows processed; non-dry-run submits per row with the right project + worker; submitted=True/False flag appears on the returned records; charge3net + deepdft without --ckpt fails fast; salted stub-mode ckpt label propagates; per-row CHGCAR directories are unique.
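
The two fail-fast guards can be sketched as a small validation function. The function name and the "stub" label returned for the salted fallback are hypothetical; the source only says that a stub-mode ckpt label propagates, not what the label is.

```python
from typing import Optional

# Arms that produce random-init predictions when no weights are supplied.
ARMS_REQUIRING_CKPT = frozenset({"charge3net", "deepdft"})


def resolve_ckpt_label(model_name: str, ckpt: Optional[str]) -> str:
    """Reject arms that need weights but got none; allow the salted stub fallback."""
    if model_name in ARMS_REQUIRING_CKPT and ckpt is None:
        raise ValueError(
            f"{model_name} needs --ckpt: without weights it would submit "
            "random-init predictions and waste HPC time"
        )
    # Hypothetical label for stub mode; the real label is not given in the source.
    return "stub" if ckpt is None else ckpt
```

Running the check before the row loop starts means a missing checkpoint is reported before any Flow is built or submitted.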

…4 hardening)

Reviewer flagged two blockers on the multi-hour submit loop:

* a single bad row killed the batch and left already-submitted Flows on Mongo with no resume path
* no per-row logging meant a row-200 failure left no breadcrumb for diagnosis

This commit addresses both, plus a chgcar-dir contract nit later.

Per-row resilience:

* try/except Exception around the prediction + flow-build + submit body. A failed row records {"error": repr(e), "submitted": False} and the loop continues with the next row.

Resumable JSONL manifest:

* records stream to {chgcar_dir}/manifest.jsonl by default (overridable via --manifest) AFTER each row, in finally:, so an interrupted run leaves an inspectable record.
* --skip-existing reads the manifest at start and skips rows with submitted=True for THIS model. Failed rows (submitted=False) are always retried.

Observability:

* tqdm.auto wrapper on df_in.iterrows() with desc=f"scf_speedup({model_name})" -- visible progress bar without spamming the log.
* logger.info per row (material_id, arm, n_jobs, submitted) plus logger.exception on per-row failure for full traceback.
* main() configures basicConfig(level=INFO) so the CLI path emits logs straight to stderr.

5 new TDD tests:

* TestPerRowResilience: a corrupt positions cell in row 2 of 3 fails that row only; the other two complete normally.
* TestManifest.test_manifest_jsonl_written_after_each_row: 3 rows -> 3 JSONL lines in the manifest.
* TestManifest.test_manifest_defaults_to_chgcar_dir: implicit manifest path lands at chgcar_dir/manifest.jsonl.
* TestSkipExisting.test_skip_existing_skips_already_submitted_rows: pre-populated manifest with submitted=True skips that row.
* TestSkipExisting.test_skip_existing_does_not_skip_failed_rows: submitted=False rows are retried, not skipped.

14 / 14 tests green; full suite green (204+ tests).
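
The resilience and resume behaviour described in this commit can be sketched with the standard library. Everything below is illustrative: run_rows, load_submitted and process_row are hypothetical names, rows are plain dicts rather than dataframe rows, and the progress bar and logging are left out.

```python
import json
from pathlib import Path


def load_submitted(manifest: Path, model_name: str) -> set:
    """Material IDs already recorded as submitted=True for this model."""
    done = set()
    if manifest.exists():
        for line in manifest.read_text().splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("model") == model_name and record.get("submitted"):
                done.add(record["material_id"])
    return done


def run_rows(rows, model_name, process_row, manifest: Path, skip_existing=False):
    """Process rows one by one; a failing row is recorded and the loop continues."""
    done = load_submitted(manifest, model_name) if skip_existing else set()
    records = []
    for row in rows:
        if row["material_id"] in done:
            continue  # already submitted for this model in an earlier run
        record = {"model": model_name, "material_id": row["material_id"], "submitted": False}
        try:
            record.update(process_row(row))  # predict + build flow + submit
        except Exception as exc:
            record["error"] = repr(exc)
        finally:
            # Append after every row so an interrupted run leaves a usable manifest.
            with manifest.open("a") as fh:
                fh.write(json.dumps(record) + "\n")
            records.append(record)
    return records
```

Because failed rows are written with submitted=False, a rerun with skip_existing=True retries exactly those rows and leaves the successful ones alone.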

Reviewer flagged two items.

LeMaterial#4 CHGCAR directory layout

* was: chgcar_root / f"{model}__{material_id}/CHGCAR"
* now: chgcar_root / model / material_id / CHGCAR
* the flat layout would have been ambiguous for synthesised IDs containing the separator (e.g. "oqmd__1234"). Nested avoids that entirely and is also more ls-friendly when sweeping models.
* new test test_chgcar_layout_is_nested_by_model_then_material_id asserts the path tail.

LeMaterial#8 Test-data realism

* the existing _toy_parquet uses 2-atom H2 cells with grid_shape=(4,4,4) and n_electrons=2.0 -- a missing n_electrons rescale, a positions-reshape bug, or a grid/atom mismatch would all pass silently.
* new TestRealisticRow.test_5_atom_asymmetric_grid_unequal_n_electrons exercises an FeO4 row with grid_shape=(8,10,12) and n_electrons=12.5 != sum(Z). Catches mutations on the reshape and rescale paths.

16 / 16 tests green; full suite green.
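
The ambiguity of the flat layout is easy to demonstrate with pathlib. The helper names are hypothetical, and the second model name in the comment ("salted__oqmd") is invented purely to show how a separator inside either field makes two different keys collide.

```python
from pathlib import Path


def flat_chgcar_path(root, model: str, material_id: str) -> Path:
    """Old layout: model and material_id joined by '__' in one directory name."""
    return Path(root) / f"{model}__{material_id}" / "CHGCAR"


def nested_chgcar_path(root, model: str, material_id: str) -> Path:
    """New layout: one directory level per key, so no separator is needed."""
    return Path(root) / model / material_id / "CHGCAR"


# Flat: ("salted", "oqmd__1234") and ("salted__oqmd", "1234") map to the same path.
# Nested: the two keys stay distinct because each field owns a directory level.
```

The nested form also lets the path tail be asserted directly as (model, material_id, "CHGCAR"), which is what the new layout test checks.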

speckhard added 6 commits May 20, 2026 13:26

speckhard force-pushed the feat/charge3net-adastra branch from 8487ae9 to 8d510d2 Compare May 20, 2026 11:27

speckhard added 23 commits May 20, 2026 13:28

speckhard added 7 commits May 26, 2026 15:30

speckhard wants to merge 36 commits into feat/charge3net from feat/charge3net-adastra