Add log_train_loss_on_step toggle to EasySyntax by sevmag · Pull Request #886 · graphnet-team/graphnet

sevmag · 2026-05-04T22:14:26Z

Summary

Adds an opt-in log_train_loss_on_step flag to EasySyntax that logs a per-step train_loss_step metric in addition to the existing epoch-aggregated train_loss.
Default is False, so existing logging behavior is unchanged.

addresses #882

Adds an opt-in `log_train_loss_on_step` constructor argument that, when enabled, logs the per-batch training loss under `train_loss_step` in addition to the epoch-aggregated `train_loss`. Default is False so existing behavior is unchanged.
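The opt-in behaviour described above can be sketched without Lightning. This is only an illustration: `LogRecorder` and `ToyModel` are hypothetical stand-ins, not graphnet classes, `log` merely mimics the shape of the `self.log(name, value, ...)` call in the diff, and the `on_epoch`/`on_step` settings of the existing `train_loss` call are an assumption.

```python
from typing import Any, Dict, List, Tuple


class LogRecorder:
    """Hypothetical stand-in for the logging side of a LightningModule."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, float, Dict[str, Any]]] = []

    def log(self, name: str, value: float, **kwargs: Any) -> None:
        # Record the call instead of forwarding it to a real logger.
        self.calls.append((name, value, kwargs))


class ToyModel(LogRecorder):
    """Toy model showing how an opt-in flag can gate an extra log call."""

    def __init__(self, log_train_loss_on_step: bool = False) -> None:
        super().__init__()
        self._log_train_loss_on_step = log_train_loss_on_step

    def training_step(self, loss: float, batch_size: int) -> float:
        # Existing behaviour: epoch-aggregated metric (settings assumed).
        self.log(
            "train_loss",
            loss,
            batch_size=batch_size,
            on_epoch=True,
            on_step=False,
        )
        # Opt-in behaviour: additional per-step metric under its own key.
        if self._log_train_loss_on_step:
            self.log(
                "train_loss_step",
                loss,
                batch_size=batch_size,
                on_epoch=False,
                on_step=True,
            )
        return loss


def logged_keys(model: ToyModel) -> List[str]:
    """Return the metric names logged so far, in call order."""
    return [name for name, _, _ in model.calls]
```

With the default (`False`), one training step records only `train_loss`; with the flag enabled, the same step also records `train_loss_step`.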

christianlocatelli · 2026-05-12T11:31:08Z

I will look at this.

christianlocatelli · 2026-05-26T08:44:31Z

        scheduler_class: Optional[type] = None,
        scheduler_kwargs: Optional[Dict] = None,
        scheduler_config: Optional[Dict] = None,
+        log_train_loss_on_step: bool = False,


The argument could be renamed to also_log_train_loss_per_step. This would make it immediately clear that it is an additional option for logging the per-batch loss under a different key.

Suggested change

-        log_train_loss_on_step: bool = False,
+        also_log_train_loss_per_step: bool = False,

It could also be useful to add a docstring explaining the arguments of __init__(), especially also_log_train_loss_per_step.

"""
Args:
    also_log_train_loss_per_step: If `True`, logs an additional per-batch
        metric (`train_loss_step`) alongside the existing per-epoch metric
        (`train_loss`). This can be useful for debugging training
        instabilities or monitoring convergence within long epochs.
"""

christianlocatelli · 2026-05-26T08:47:58Z

        self._scheduler_class = scheduler_class
        self._scheduler_kwargs = scheduler_kwargs or dict()
        self._scheduler_config = scheduler_config or dict()
+        self._log_train_loss_on_step = log_train_loss_on_step


Suggested change

-        self._log_train_loss_on_step = log_train_loss_on_step
+        self._also_log_train_loss_per_step = also_log_train_loss_per_step

christianlocatelli · 2026-05-26T08:54:52Z

+        if self._log_train_loss_on_step:
+            self.log(
+                "train_loss_step",
+                loss,
+                batch_size=batch_size,
+                prog_bar=False,
+                on_epoch=False,
+                on_step=True,
+                sync_dist=True,


This might be computationally expensive with sync_dist=True.
The loss would be synced across GPUs on every batch, which quickly adds up when there are many batches. The docstring at the top should probably state that training might be slowed down. Alternatively, this call could default to sync_dist=False.

Suggested change

-        if self._log_train_loss_on_step:
-            self.log(
-                "train_loss_step",
-                loss,
-                batch_size=batch_size,
-                prog_bar=False,
-                on_epoch=False,
-                on_step=True,
-                sync_dist=True,
+        if self._also_log_train_loss_on_step:
+            self.log(
+                "train_loss_step",
+                loss,
+                batch_size=batch_size,
+                prog_bar=False,
+                on_epoch=False,
+                on_step=True,
+                sync_dist=True,

Good point! Let's set sync_dist to False as the default.
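To put the cost concern in this thread into rough numbers, the sketch below counts how often a logged metric would have to be reduced across devices. It rests on a simplifying assumption that is not taken from the PR: a synced per-step metric costs one reduction per batch, a synced per-epoch metric costs one per epoch, and an unsynced metric costs none. The function names are hypothetical.

```python
import math
from typing import Dict


def num_batches(num_samples: int, batch_size: int) -> int:
    """Batches per epoch, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)


def sync_counts(num_samples: int, batch_size: int, epochs: int) -> Dict[str, int]:
    """Count cross-device reductions needed for one logged metric.

    Assumption: one reduction per batch for a synced per-step metric,
    one reduction per epoch for a synced per-epoch metric.
    """
    batches_per_epoch = num_batches(num_samples, batch_size)
    return {
        "per_step_synced": batches_per_epoch * epochs,
        "per_epoch_synced": epochs,
        "per_step_unsynced": 0,
    }
```

For a made-up run with 1,000,000 samples, a batch size of 256, and 10 epochs, `sync_counts(1_000_000, 256, 10)` gives 39,070 reductions for a synced per-step metric against 10 for a synced per-epoch metric, which is why `sync_dist=False` is the cheaper choice for the per-step key.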

Co-authored-by: Christian Locatelli <97306084+christianlocatelli@users.noreply.github.com>

christianlocatelli

I left some comments about optional naming and doc improvements.

The previous commit ("Apply suggestions from code review") was created via GitHub's batch-suggestion apply, which mangled the indentation and left a name mismatch, so the module no longer imported:

- under-indented `also_log_train_loss_per_step` parameter and attribute
- top-level `if self._also_log_train_loss_on_step:` referencing an attribute that is never set (`_on_step` vs `_per_step`)

Re-apply the reviewer's intent cleanly: rename to `also_log_train_loss_per_step`, log the per-step metric with `sync_dist=False`, and document all `__init__` arguments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
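The name mismatch described in this commit message can be reproduced in isolation. The sketch below is an illustration with hypothetical classes (`Broken`, `Fixed`), not the graphnet code. It covers only the `_on_step` vs `_per_step` mismatch; the indentation problem would fail earlier, when the module is imported.

```python
class Broken:
    """Hypothetical class reproducing the `_on_step` vs `_per_step` mismatch."""

    def __init__(self, also_log_train_loss_per_step: bool = False) -> None:
        # The flag is stored under the `_per_step` name ...
        self._also_log_train_loss_per_step = also_log_train_loss_per_step

    def should_log_step(self) -> bool:
        # ... but read under the `_on_step` name, which is never set.
        return self._also_log_train_loss_on_step


class Fixed:
    """Same class with one consistent attribute name."""

    def __init__(self, also_log_train_loss_per_step: bool = False) -> None:
        self._also_log_train_loss_per_step = also_log_train_loss_per_step

    def should_log_step(self) -> bool:
        return self._also_log_train_loss_per_step


def fails_with_attribute_error(obj) -> bool:
    """Return True if reading the flag raises AttributeError."""
    try:
        obj.should_log_step()
    except AttributeError:
        return True
    return False
```

Here `fails_with_attribute_error(Broken(True))` is `True`, while `Fixed(True).should_log_step()` returns the stored flag. In Python the failure only surfaces when the mismatched attribute is read, so a mismatch like this can go unnoticed until the affected code path runs.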

sevmag · 2026-05-27T01:59:21Z

Hey @christianlocatelli, I implemented your suggestions. Sorry for the commit name, that was claude 😅

christianlocatelli

This looks good to me, thanks for editing the code 👍

sevmag · 2026-05-27T19:59:26Z

@Aske-Rosted tagging you here for completeness (and a potential approval 😅 )

christianlocatelli self-requested a review May 12, 2026 18:33

christianlocatelli reviewed May 26, 2026

Apply suggestions from code review

d4e3508

christianlocatelli reviewed May 26, 2026

sevmag requested a review from christianlocatelli May 27, 2026 01:59

christianlocatelli reviewed May 27, 2026

sevmag wants to merge 3 commits into graphnet-team:main from sevmag:feature/log-train-loss-on-step
