fix: correct SAC target entropy to -action_dim #31
Open
Mr-Neutr0n wants to merge 1 commit into
Conversation
The default target entropy was set to -action_dim / 2 instead of -action_dim as specified in the original SAC paper (Haarnoja et al., 2018). This causes under-exploration by targeting a lower entropy than intended. The DRQ learner already used the correct formula.
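For context, the target entropy feeds the temperature update: SAC tunes α so the policy's average entropy tracks the target. A minimal sketch of that objective (function name and signature are illustrative, not the repo's exact code):

```python
import jax.numpy as jnp

def temperature_loss(log_alpha: jnp.ndarray, log_probs: jnp.ndarray,
                     target_entropy: float) -> jnp.ndarray:
    """SAC temperature objective (Haarnoja et al., 2018).

    Minimizing this pushes alpha up whenever the policy's average entropy
    falls below target_entropy, and down otherwise, so entropy settles
    near the target. With target_entropy = -action_dim / 2 it settles at
    half the intended level, i.e. under-exploration.
    """
    entropy = -log_probs.mean()
    return jnp.exp(log_alpha) * (entropy - target_entropy)
```

For a 6-dimensional action space, the paper's heuristic gives a target entropy of -6.0, not -3.0.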
Bug
The default `target_entropy` is set to `-action_dim / 2` in the SAC, SAC v1, and REDQ learners. The original SAC paper (Haarnoja et al., 2018) specifies `-action_dim` as the target entropy heuristic. Using half the correct value causes under-exploration by targeting lower entropy than intended.

Note that the DRQ learner already used the correct `-action_dim` formula, making this an inconsistency across learners as well.

Fix
Changed the default `target_entropy` from `-action_dim / 2` to `-action_dim` in:

- `jaxrl/agents/sac/sac_learner.py`
- `jaxrl/agents/sac_v1/sac_v1_learner.py`
- `jaxrl/agents/redq/redq_learner.py`

This matches the paper and is consistent with the existing DRQ implementation. The change itself is a one-line default in each learner; an illustrative paraphrase (not a verbatim excerpt, as the surrounding code differs per learner):

```python
# In each learner's __init__, when no explicit target is passed:
if target_entropy is None:
    target_entropy = -action_dim  # was: -action_dim / 2
```
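Users who were deliberately relying on the old default can pass `target_entropy=-action_dim / 2` explicitly to keep the previous behavior.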
Reference
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML 2018.