
fix: correct SAC target entropy to -action_dim #31

Open
Mr-Neutr0n wants to merge 1 commit into ikostrikov:main from Mr-Neutr0n:fix/sac-target-entropy-formula

Conversation

@Mr-Neutr0n

Bug

The default target_entropy is set to -action_dim / 2 in the SAC, SAC v1, and REDQ learners. The original SAC paper (Haarnoja et al., 2018) specifies -action_dim as the target entropy heuristic. Since -action_dim / 2 is less negative than -action_dim, the halved default targets a higher entropy than intended, keeping the temperature and policy entropy above the paper's heuristic.

Note that the DRQ learner already used the correct -action_dim formula, making this an inconsistency as well.
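
For context, target_entropy enters SAC only through the temperature objective J(alpha) = E[-alpha * (log pi(a|s) + H_target)] (Haarnoja et al., 2018): alpha is tuned so the policy's entropy tracks the target. A minimal JAX sketch of that loss, using hypothetical names (temperature_loss, log_probs) rather than jaxrl's exact code:

    import jax.numpy as jnp

    def temperature_loss(log_alpha, log_probs, target_entropy):
        # J(alpha) = E[-alpha * (log pi(a|s) + H_target)]. When the policy's
        # entropy (-log_probs) falls below target_entropy, the gradient on
        # log_alpha pushes alpha up, strengthening the entropy bonus.
        alpha = jnp.exp(log_alpha)
        return jnp.mean(alpha * (-log_probs - target_entropy))

Because -action_dim / 2 > -action_dim, the halved default is a higher entropy target, so alpha equilibrates higher and the policy stays more stochastic than the heuristic intends.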

Fix

Changed the default target_entropy from -action_dim / 2 to -action_dim in:

  • jaxrl/agents/sac/sac_learner.py
  • jaxrl/agents/sac_v1/sac_v1_learner.py
  • jaxrl/agents/redq/redq_learner.py

This matches the paper and is consistent with the existing DRQ implementation.
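
Illustratively, the change in each file is a one-line edit to the default (a hypothetical diff; the surrounding code in each learner may differ):

    - target_entropy = -action_dim / 2
    + target_entropy = -action_dim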

Reference

Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML 2018.

The default target entropy was set to -action_dim / 2 instead of
-action_dim as specified in the original SAC paper (Haarnoja et al.,
2018). The halved value targets a higher entropy than intended,
keeping the temperature elevated. The DRQ learner already used the
correct formula.
